
Dot Net Mafia

Group site for developer blogs dealing with (usually) .NET, SharePoint 2013, SharePoint 2010, Office 365, SharePoint Online, and other Microsoft products, as well as some discussion of general programming related concepts.


Corey Roth [MVP]

A SharePoint MVP bringing you the latest time saving tips for SharePoint 2013, Office 365 / SharePoint Online and Visual Studio 2013.

How to: Exclude part of a page from being indexed by SharePoint Search

This is something that I have been trying to figure out for quite some time.  I’ve seen numerous people ask in the forums and no one seemed to have a conclusive answer (at least the last time I checked).  The issue is simple.  You want to index a regular non-SharePoint web site.  Usually, it’s your company’s public-facing web site.  That site has common navigation on every page with terms such as Contact, Locations, Privacy Policy, About, etc. that you don’t want to be indexed.  If it is indexed, every time a user types in contact, they end up having every page on the site returned in the search results.  When I was at FAST University, I had a chance to ask Leonardo Souza about this issue and he told me the secret.  Let’s take a look at my example site so you can see what I mean.

Here’s the home page.  As you can see, I spent hours working on the branding for it. :)  The Contact Us and Privacy Policy links are considered the navigation and are repeated on each page.

EnterpriseSearchNoIndexHomePage

The Contact Us page looks similar with the same navigation.

EnterpriseSearchNoIndexContactPage

Lastly, the Privacy Policy page has the same navigation as well.

EnterpriseSearchNoIndexPrivacyPolicyPage

We want to exclude the contact us and privacy policy links in the navigation from our search results.  How do we do that?  It’s pretty simple actually.  Just put the content that you do not want indexed in a div tag with a class of noindex.   Let’s look at the complete HTML of the home page.

<html>
<head>
    <title>Super Neat Home Page</title>
</head>
<body>
    <div>
        Welcome to our awesome site. We are the best! <a href="test.html">Awesome Stuff</a>
        If you need to get a hold of us, click <a href="contactus.html">here</a>. Worried,
        we'll <a href="privacy.html">sell you out?</a>
    </div>
    <div class="noindex">
        <a href="contactus.html">Contact Us</a> <a href="privacy.html">Privacy Policy</a>
    </div>
</body>
</html>

You can see that the Contact Us and Privacy Policy links are inside <div class="noindex">.  You might have noticed that the body of the page also has links to these two pages.  I had to include these so that those pages would get indexed.  Since the common navigation is excluded, there was no way for the crawler to follow those links.  This is something you will want to consider when you are designing master pages, because you will need to have at least one link to each page on the site somewhere outside the excluded section.
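One way to catch this before a crawl is a quick local check of each page's markup.  The following is a minimal Python sketch, not anything SharePoint provides.  It assumes, as described above, that links inside a div with the noindex class are not followed, and it reports any href that appears only inside such a div.  The names LinkCollector, unreachable_links, and HOME_PAGE are made up for this illustration.

```python
from html.parser import HTMLParser

# The home page from this post, used as sample input.
HOME_PAGE = """
<html>
<head><title>Super Neat Home Page</title></head>
<body>
    <div>
        Welcome to our awesome site. We are the best! <a href="test.html">Awesome Stuff</a>
        If you need to get a hold of us, click <a href="contactus.html">here</a>. Worried,
        we'll <a href="privacy.html">sell you out?</a>
    </div>
    <div class="noindex">
        <a href="contactus.html">Contact Us</a> <a href="privacy.html">Privacy Policy</a>
    </div>
</body>
</html>
"""


class LinkCollector(HTMLParser):
    """Collects hrefs, split by whether they sit inside a div with class noindex."""

    def __init__(self):
        super().__init__()
        self.div_stack = []     # one bool per open <div>: is it a noindex div?
        self.excluded = set()   # hrefs found inside a noindex div
        self.crawlable = set()  # hrefs found everywhere else

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self.div_stack.append("noindex" in classes)
        elif tag == "a" and attrs.get("href"):
            # Assumption: anything nested under a noindex div is excluded too.
            target = self.excluded if any(self.div_stack) else self.crawlable
            target.add(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


def unreachable_links(html):
    """Return hrefs that appear only inside noindex divs on this page."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.excluded - collector.crawlable


if __name__ == "__main__":
    print(unreachable_links(HOME_PAGE))  # prints set()
```

An empty set means every navigation target also has at least one link in the indexable part of that page.  The check looks at a single page only, so a link from any other crawled page would do just as well.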

Since I learned about this in the context of FAST Search for SharePoint, I decided to look at it first.  The first thing I will do is show you the results of the entire content source.  That way you will believe me that all of the pages are in the index. :)  I do this with the ContentSource keyword as I have mentioned in my handy keywords post.

EnterpriseSearchNoIndexFASTContentSource

The search results show the four pages from the site.  Now let’s verify that the noindex class worked.  Searching for the word contact yields a single result.

EnterpriseSearchNoIndexFASTContact

Searching for privacy policy also yields a single result.

EnterpriseSearchNoIndexFASTPrivacyPolicy

The noindex class works great with FAST Search for SharePoint.  At this point, though, I wondered whether it would also work with Enterprise Search in SharePoint 2010.  I decided to give it a try and, sure enough, it works there too.

EnterpriseSearchNoIndexContact

Will this also work in SharePoint 2007?  I haven’t had time to try it yet.  If you have tried it before, please leave a comment and let us know.  Maybe you already knew about this technique, but I think there are plenty of people who don’t, so I hope this post helps.  I highly recommend making use of the noindex class any time you want to index a non-SharePoint site, such as your public-facing company web site.  By excluding redundant sections of the page, you make your search results much more usable.
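If you want a rough preview of what a page contributes to the index before running a crawl, a small script can approximate it.  This is only a sketch in Python under stated assumptions, not how the SharePoint or FAST crawler is implemented.  It drops all text inside any div with the noindex class, including nested elements (an assumption I have not verified against the real crawler), and does a plain substring match.  IndexableText, indexable_text, would_match, and HOME_PAGE are hypothetical names used just for this example.

```python
from html.parser import HTMLParser

# The home page from this post, used as sample input.
HOME_PAGE = """
<html>
<head><title>Super Neat Home Page</title></head>
<body>
    <div>
        Welcome to our awesome site. We are the best! <a href="test.html">Awesome Stuff</a>
        If you need to get a hold of us, click <a href="contactus.html">here</a>. Worried,
        we'll <a href="privacy.html">sell you out?</a>
    </div>
    <div class="noindex">
        <a href="contactus.html">Contact Us</a> <a href="privacy.html">Privacy Policy</a>
    </div>
</body>
</html>
"""


class IndexableText(HTMLParser):
    """Collects page text that sits outside every div with class noindex."""

    def __init__(self):
        super().__init__()
        self.div_stack = []  # one bool per open <div>: is it a noindex div?
        self.chunks = []     # text fragments kept for the preview

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            self.div_stack.append("noindex" in classes)

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()

    def handle_data(self, data):
        # Keep text only when no enclosing div is marked noindex.
        if not any(self.div_stack):
            self.chunks.append(data)


def indexable_text(html):
    """Return the text outside noindex divs with whitespace collapsed."""
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def would_match(html, term):
    """Naive stand-in for a search hit: case-insensitive substring match."""
    return term.lower() in indexable_text(html).lower()


if __name__ == "__main__":
    print(would_match(HOME_PAGE, "contact"))  # prints False
    print(would_match(HOME_PAGE, "awesome"))  # prints True
```

The home page does not match contact here because the only remaining link to the contact page uses the link text "here", and the href itself is not treated as page text.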

Comments

 

Leonardo Souza said:

Very nice post, Corey! It is indeed a great way to exclude sections you don't want to be crawled when you have control over the website's content.

I'm glad to have shared this trick with you and even more that you tested in SharePoint 2010 and it works there too. You are now the expert on this :)

Cheers,

Leo

February 7, 2011 10:12 AM
 

CoreyRoth said:

@Leo thanks again for sharing the trick!

February 7, 2011 10:17 AM
 

Joakim F said:

Sometimes you see parts of pages being tagged with Robots, noindex, nofollow in an iFrame. Would that work in the same way or will the crawler reject the whole page then?

March 31, 2011 3:53 AM
 

CoreyRoth said:

@Joakim That's a good question.  I would have to try it out.  My initial guess is that it wouldn't follow the contents of the IFRAME, but I could be wrong.

April 4, 2011 1:18 PM
 

qasem said:

Dear gurus,

I am seeing unexpected behaviour with SharePoint 2010 crawling. I wrapped all of my custom .NET web parts in <div class="noindex">, reset the index from Central Administration, and then ran a full crawl, but the search still returns two results that I did not expect. I explain this in the image below; please zoom in on the image to see my comment.

www.flickr.com/.../5880124663

any help will be appreciated

Thanks

Qassem


June 28, 2011 6:49 AM
 

CoreyRoth said:

@Qassem Hard to say why this might occur.  You may consider opening a support ticket with Microsoft.

June 28, 2011 9:45 PM
 

sebngu said:

first of all great post.

I have a situation where i want to exclude the <head> section as well. Do you know how this can be done?

I have tried and failed with <head class="noindex">

and there is no way i can put a <div> tag inside <head>.

thanks in advance for your answer.

October 19, 2011 4:11 AM
 

CoreyRoth said:

@sebngu I'm not sure that it's possible in this case.

November 7, 2011 8:25 PM
 

SharePoint 2010: Excluding Navigation Nodes From Search Crawl said:

Pingback from  SharePoint 2010: Excluding Navigation Nodes From Search Crawl

December 15, 2011 10:04 AM
 

Keval said:

Dear Corey Roth,

I was looking for this solution. Gr8 work! Salute to you!

Thanks,

Keval

April 25, 2012 4:29 AM
 

Hide navigation nodes from showing in search result « Drift Bottle said:

Pingback from  Hide navigation nodes from showing in search result «  Drift Bottle

December 9, 2012 3:15 PM
 

Resources: SharePoint Search Center and Searching | lionadi said:

Pingback from  Resources: SharePoint Search Center and Searching | lionadi

March 18, 2013 7:21 AM
 

Mikko said:

The noindex class seems not to be working if you use nested tags. So you can't wrap a large section into a div class=noindex and expect it to be excluded by crawler.

March 20, 2013 5:46 AM
 

emma w said:

Thanks for this.

I know little about sharepoint, so please forgive if this is a numpty question: One thing I don't get about the above example is why you don't get the home page listed under search results for 'contact' since you now have the link on there outside of the noindex tag. Is it because the link text is not 'contact us' but 'here'?

June 19, 2013 4:38 AM
 

Drew said:

Looks like it's been a while since you've commented on this blog post, but I'll leave my question anyway and hope for the best.

Is it possible to add this css class with javascript at the time of page load, in such a way that SharePoint will ignore the appropriate divs?

We have a very involved site with lots of custom controls which have a common root div. I should be able to add the css class to the appropriate divs, but I'm concerned that it won't affect the Search index because javascript may not fire for the crawler...

I'm going to try it out in the meantime, but figured I'd at least ask to see if you or anyone else had tried this before.

October 3, 2013 10:59 AM
 

CoreyRoth said:

@Drew that I am not sure of.  It's worth a try.  Be sure and let us know your results.  Thanks!

October 11, 2013 10:38 AM


About CoreyRoth

Corey Roth is an independent SharePoint consultant specializing in ECM, Apps, and Search.