In the previous chapter we saw what a sitemap is; in this one, let’s look at the sitemap index.
Introduction
This article explains the XML schema for the sitemap index in the sitemap protocol. If you have more than one sitemap, you can list them all in a sitemap index file and submit that file to Google. Specify all URLs using the same syntax: if you give your site location as http://www.example.com/, your URL list shouldn’t contain http://example.com/.
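The consistency rule above can be checked before a sitemap is written. The following is a minimal sketch in Python, assuming that "same syntax" means the same scheme and host as the site location; the function name inconsistent_urls is hypothetical and not part of the sitemap protocol.

```python
from urllib.parse import urlsplit


def inconsistent_urls(site_location, urls):
    """Return the URLs whose scheme or host differs from the site location."""
    site = urlsplit(site_location)
    expected = (site.scheme.lower(), site.netloc.lower())
    mismatched = []
    for url in urls:
        parts = urlsplit(url)
        # Compare scheme and host only; paths are allowed to differ.
        if (parts.scheme.lower(), parts.netloc.lower()) != expected:
            mismatched.append(url)
    return mismatched


urls = [
    "http://www.example.com/about.html",
    "http://example.com/contact.html",  # missing "www", so it is reported
]
print(inconsistent_urls("http://www.example.com/", urls))
```

Any URL that the function returns should be rewritten to match the site location before it goes into the sitemap.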
Limits of XML file
A sitemap file must not contain more than 50,000 URLs and must not be larger than 10 MB when uncompressed (the current protocol on sitemaps.org allows up to 50 MB). If your sitemap exceeds either limit, break it into several smaller sitemap files. This also keeps your web server from being overloaded by serving very large files to Google.
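Splitting a long URL list into sitemap-sized pieces could look like the sketch below. It only enforces the 50,000-URL limit; the file-size limit would still need to be checked after each file is written. The function name split_into_sitemaps and the generated page URLs are made up for illustration.

```python
MAX_URLS_PER_SITEMAP = 50_000  # URL limit per sitemap file


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `limit` URLs."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


# Hypothetical site with 120,000 pages.
all_urls = [f"http://www.example.com/page{n}.html" for n in range(120_000)]
chunks = split_into_sitemaps(all_urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

Each chunk would become its own sitemap file, and each of those files would get one entry in the sitemap index.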
A sitemap index file must:
- Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag.
- Include a <sitemap> parent entry for each sitemap.
- Include a <loc> child entry inside each <sitemap> parent tag.
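The three requirements above can be checked mechanically. Here is a rough sketch using Python’s standard xml.etree.ElementTree module; it assumes the file declares the sitemaps.org 0.9 namespace on the root element, and the function name check_sitemap_index is hypothetical. It is not a full schema validation.

```python
import xml.etree.ElementTree as ET

# Namespace prefix in ElementTree's "{uri}tag" notation (assumed to be declared).
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap_index(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != NS + "sitemapindex":
        problems.append("root element is not <sitemapindex>")
    entries = root.findall(NS + "sitemap")
    if not entries:
        problems.append("no <sitemap> entries found")
    for position, entry in enumerate(entries, start=1):
        loc = entry.find(NS + "loc")
        # Every <sitemap> needs a non-empty <loc> child.
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<sitemap> entry {position} has no <loc>")
    return problems


document = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>http://www.example.com/sitemap1.xml.gz</loc></sitemap>"
    "<sitemap></sitemap>"
    "</sitemapindex>"
)
print(check_sitemap_index(document))
```

A file that is not well-formed XML will make ET.fromstring raise a ParseError before any of these checks run.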
Sample sitemap index
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2004-01-26</lastmod>
</sitemap>
<sitemap>
….
….
</sitemap>
</sitemapindex>
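A file like the sample above does not have to be written by hand. The sketch below builds one with Python’s standard xml.etree.ElementTree module (the xml_declaration argument needs Python 3.8 or later). The function name build_sitemap_index and the second sitemap URL are hypothetical, and the output is not pretty-printed.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries):
    """Build a sitemap index from (loc, lastmod) pairs; lastmod may be None."""
    # Setting xmlns as a plain attribute keeps the output free of prefixes.
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    # Returns bytes that start with an XML declaration.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


index_xml = build_sitemap_index([
    ("http://www.example.com/sitemap1.xml.gz", "2004-01-26"),
    ("http://www.example.com/sitemap2.xml.gz", None),  # hypothetical second file
])
print(index_xml.decode("utf-8"))
```

The returned bytes can be written straight to a file such as sitemap_index.xml on the web server.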
Sitemap extensions
The sitemap protocol can be extended with additional tags for specific content types such as videos, news and images.
Informing search engine crawlers
Once the sitemap file is created, place it on your web server. You then need to inform the search engines that support this protocol of its location.
You can do this in any of the following ways:
- Sending an HTTP request to the search engine.
- Specifying the location in your site’s robots.txt file.
- Submitting it through the search engine’s submission interface.
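The first two options can be sketched in Python as follows. The robots.txt entry uses the Sitemap: directive, and the HTTP request uses the /ping?sitemap= form described in the sitemaps.org protocol. The search engine host searchengine.example and the file name sitemap_index.xml are placeholders, and not every search engine accepts ping requests, so treat this as an illustration rather than a guaranteed interface.

```python
from urllib.parse import quote


def robots_txt_line(sitemap_url):
    """Return the line to add to robots.txt to announce a sitemap."""
    return f"Sitemap: {sitemap_url}"


def ping_url(search_engine_base, sitemap_url):
    """Build a ping URL; the sitemap URL must be URL-encoded."""
    return f"{search_engine_base}/ping?sitemap={quote(sitemap_url, safe='')}"


sitemap = "http://www.example.com/sitemap_index.xml"  # placeholder location
print(robots_txt_line(sitemap))
print(ping_url("http://searchengine.example", sitemap))  # placeholder host
```

The ping URL would then be requested with any HTTP client, for example urllib.request.urlopen, once the real search engine address is known.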
Conclusion
Hopefully, you now understand sitemaps and their significance. The Sitemaps protocol lets you tell search engines which content needs to be indexed. If you don’t want some pages of your site to be indexed, use a robots.txt file or the robots meta tag. To learn more about the robots.txt file, wait for the upcoming article.