How to: Exclude part of a page from being indexed by SharePoint Search

Posted Wednesday, January 26, 2011 12:07 PM by CoreyRoth

This is something that I have been trying to figure out for quite some time. I’ve seen numerous people ask in the forums, and no one seemed to have a conclusive answer (at least the last time I checked). The issue is simple. You want to index a regular non-SharePoint web site; usually, it’s your company’s public-facing web site. That site has common navigation on every page, with terms such as Contact, Locations, Privacy Policy, and About, that you don’t want indexed. If those terms are indexed, every time a user searches for “contact”, every page on the site comes back in the search results. When I was at FAST University, I had a chance to ask Leonardo Souza about this issue and he told me the secret. Let’s take a look at my example site so you can see what I mean.

Here’s the home page.  As you can see, I spent hours working on the branding for it. :)  The Contact Us and Privacy Policy links are considered the navigation and are repeated on each page.

[Screenshot: home page of the example site]

The Contact Us page looks similar with the same navigation.

[Screenshot: Contact Us page of the example site]

Lastly, the Privacy Policy page has the same navigation as well.

[Screenshot: Privacy Policy page of the example site]

We want to exclude the Contact Us and Privacy Policy links in the navigation from our search results. How do we do that? It’s pretty simple, actually. Just put the content that you do not want indexed in a div tag with a class of noindex. Let’s look at the complete HTML of the home page.

<html>
<head>
    <title>Super Neat Home Page</title>
</head>
<body>
    <div>
        Welcome to our awesome site. We are the best! <a href="test.html">Awesome Stuff</a>
        If you need to get a hold of us, click <a href="contactus.html">here</a>. Worried,
        we'll <a href="privacy.html">sell you out?</a>
    </div>
    <div class="noindex">
        <a href="contactus.html">Contact Us</a> <a href="privacy.html">Privacy Policy</a>
    </div>
</body>
</html>

You can see that the Contact Us and Privacy Policy links are inside <div class="noindex">. You might have noticed that the body of the page also has links to these two pages. I had to include these so that those pages would get indexed: since the common navigation is excluded, the crawler had no other way to follow those links. This is something you will want to consider when you are designing master pages, because you will need at least one link to each page somewhere on the site outside the excluded section.
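The linking requirement described above is easy to miss on a larger site, so a quick audit can help. The following Python sketch is not part of the original technique. It uses only the standard library's html.parser and assumes, as described above, that the crawler skips everything (links included) inside a <div class="noindex">, nested elements too. The names NoIndexLinkAudit and links_only_in_noindex are hypothetical, chosen for illustration, and hrefs are compared as raw strings without resolving relative URLs. It reports link targets that appear only inside excluded sections across the pages you give it.

```python
from html.parser import HTMLParser


class NoIndexLinkAudit(HTMLParser):
    """Sort link targets by whether they sit inside a <div class="noindex">."""

    def __init__(self):
        super().__init__()
        self.div_stack = []          # one flag per open <div>: True if it is a noindex div
        self.excluded_links = set()  # hrefs seen inside a noindex div
        self.indexed_links = set()   # hrefs seen anywhere else

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div":
            classes = (attributes.get("class") or "").split()
            self.div_stack.append("noindex" in classes)
        elif tag == "a" and attributes.get("href"):
            # Assumption: anything nested inside a noindex div is skipped too.
            inside_noindex = any(self.div_stack)
            bucket = self.excluded_links if inside_noindex else self.indexed_links
            bucket.add(attributes["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


def links_only_in_noindex(pages):
    """Return hrefs that never appear outside a noindex div on any given page."""
    excluded, indexed = set(), set()
    for html in pages:
        parser = NoIndexLinkAudit()
        parser.feed(html)
        parser.close()
        excluded |= parser.excluded_links
        indexed |= parser.indexed_links
    return excluded - indexed


if __name__ == "__main__":
    page = """
    <html><body>
        <div>Click <a href="contactus.html">here</a> to reach us.</div>
        <div class="noindex">
            <a href="contactus.html">Contact Us</a> <a href="privacy.html">Privacy Policy</a>
        </div>
    </body></html>
    """
    # privacy.html is linked only from the excluded navigation.
    print(links_only_in_noindex([page]))
```

Passing the HTML of every page on the site in one call gives a site-wide answer: any href left in the result has no link outside a noindex block, so under the assumption above the crawler would have no path to it.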

Since I learned about this in the context of FAST Search for SharePoint, I decided to look at it first.  The first thing I will do is show you the results of the entire content source.  That way you will believe me that all of the pages are in the index. :)  I do this with the ContentSource keyword as I have mentioned in my handy keywords post.

[Screenshot: FAST Search results for the entire content source]

The search results show the four pages from the site. Now let’s verify that the noindex class worked. Searching for the word contact yields a single result.

[Screenshot: FAST Search results for “contact”]

Searching for privacy policy also yields a single result.

[Screenshot: FAST Search results for “privacy policy”]

The noindex class works great with FAST Search for SharePoint. At this point, though, I wondered whether this would also work with Enterprise Search in SharePoint 2010. I decided to give it a try and, sure enough, it works there too.

[Screenshot: SharePoint 2010 Enterprise Search results for “contact”]

Will this also work in SharePoint 2007? I haven’t had time to try it yet. If you have tried it before, please leave a comment and let us know. Maybe you already knew about this technique, but I think there are plenty of people who don’t, so I hope this post helps. I highly recommend making use of the noindex class any time you want to index a non-SharePoint site, such as your public-facing company web site. By excluding redundant sections of the page, you make your search results much more usable.

Comments

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Monday, February 7, 2011 10:12 AM by Leonardo Souza

Very nice post, Corey! It is indeed a great way to exclude sections you don't want to be crawled when you have control over the website's content.

I'm glad to have shared this trick with you and even more that you tested in SharePoint 2010 and it works there too. You are now the expert on this :)

Cheers,

Leo

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Monday, February 7, 2011 10:17 AM by CoreyRoth

@Leo thanks again for sharing the trick!

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Thursday, March 31, 2011 3:53 AM by Joakim F

Sometimes you see parts of pages being tagged with Robots, noindex, nofollow in an iFrame. Would that work in the same way, or would the crawler reject the whole page then?

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Monday, April 4, 2011 1:18 PM by CoreyRoth

@Joakim That's a good question.  I would have to try it out.  My initial guess is that it wouldn't follow the contents of the IFRAME, but I could be wrong.

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Tuesday, June 28, 2011 6:49 AM by qasem

Dear gurus,

I am seeing unexpected behaviour with SharePoint 2010 crawling. I wrapped all of my custom .NET web parts in <div class="noindex">, reset the index from Central Administration, and then ran a full crawl, but the search still returns 2 unexpected results, as I explain in this image. Please zoom in on the image to see my comment:

www.flickr.com/.../5880124663

any help will be appreciated

Thanks

Qassem

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Tuesday, June 28, 2011 9:45 PM by CoreyRoth

@Qassem Hard to say why this might occur.  You may consider opening a support ticket with Microsoft.

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Wednesday, October 19, 2011 4:11 AM by sebngu

first of all great post.

I have a situation where i want to exclude the <head> section as well. Do you know how this can be done?

I have tried and failed with <head class="noindex">

and there is no way i can put a <div> tag inside <head>.

thanks in advance for your answer.

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Monday, November 7, 2011 8:25 PM by CoreyRoth

@sebngu I'm not sure that it's possible in this case.

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Wednesday, April 25, 2012 4:29 AM by Keval

Dear Corey Roth,

I was looking for this solution. Gr8 work! Salute to you!

Thanks,

Keval

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Wednesday, March 20, 2013 5:46 AM by Mikko

The noindex class does not seem to work if you use nested tags. So you can't wrap a large section in a div class=noindex and expect it to be excluded by the crawler.

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Wednesday, June 19, 2013 4:38 AM by emma w

Thanks for this.

I know little about sharepoint, so please forgive if this is a numpty question: One thing I don't get about the above example is why you don't get the home page listed under search results for 'contact' since you now have the link on there outside of the noindex tag. Is it because the link text is not 'contact us' but 'here'?

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Thursday, October 3, 2013 10:59 AM by Drew

Looks like it's been a while since you've commented on this blog post, but I'll leave my question anyway and hope for the best.

Is it possible to add this css class with javascript at the time of page load, in such a way that SharePoint will ignore the appropriate divs?

We have a very involved site with lots of custom controls which have a common root div. I should be able to add the css class to the appropriate divs, but I'm concerned that it won't affect the Search index because javascript may not fire for the crawler...

I'm going to try it out in the meantime, but figured I'd at least ask to see if you or anyone else had tried this before.

# re: How to: Exclude part of a page from being indexed by SharePoint Search

Friday, October 11, 2013 10:38 AM by CoreyRoth

@Drew that I am not sure of.  It's worth a try.  Be sure and let us know your results.  Thanks!
