Sitemap Updates
This SOP is a work in progress. If you notice anything that is out of date, could be clearer, or could be more efficient, please update the SOP or add a comment to start a conversation.
Purpose
The purpose of this document is to explain what Sitemaps are and how they impact how search engines understand websites.
What are Sitemaps?
A site map (or sitemap) is a list of the pages of a website. The term covers three different things: planning documents used by a website’s designers, human-visible listings of the pages on a site (typically hierarchical), and structured listings intended for web crawlers such as search engines. This SOP deals with the last kind.
Or another way of saying it:
A sitemap is a file that lists the web pages of your site to tell Google and other search engines about the organization of your site content. Search engine web crawlers like Googlebot read this file to crawl your site.
This is an example of an XML sitemap, which is probably the most commonly used format:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.net/?id=who</loc>
<lastmod>2009-09-22</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.net/?id=what</loc>
<lastmod>2009-09-22</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>
</urlset>
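You will see that the first line is always the same. The <?xml version="1.0" encoding="UTF-8"?> declaration tells whatever reads the file that this is an XML file encoded in UTF-8. All entries sit inside a single <urlset> element, which must be closed with </urlset> at the end of the file.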
Each web page usually has a spot in the sitemap, though we don’t need to include them all. Usually, the most important pages should be included. You make an “entry” by using the <url> tag. So:
<url>
Each URL will be in one of these as you can see above.
</url>
These are the basic tags you will encounter inside each entry:
<loc>: The URL of the page
<lastmod>: The date the page was last updated
<changefreq>: How often the page is updated (weekly, monthly, etc.)
<priority>: How important this page is compared to the rest of the site (between 0.0 and 1.0)
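The tags above can also be written out by a script instead of by hand. The following is a minimal sketch using Python's standard library, not the generator our platforms actually use; the function name build_sitemap and the shape of the entry dictionaries are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string for a list of entry dicts.

    Each entry needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional and are only written when present.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # tostring() omits the XML declaration here, so add it by hand.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# The same two pages as the example sitemap earlier in this SOP.
sitemap_xml = build_sitemap([
    {"loc": "http://www.example.net/?id=who", "lastmod": "2009-09-22",
     "changefreq": "monthly", "priority": 0.8},
    {"loc": "http://www.example.net/?id=what", "lastmod": "2009-09-22",
     "changefreq": "monthly", "priority": 0.5},
])
print(sitemap_xml)
```

Building the file through an XML library means special characters in URLs are escaped automatically and the closing tags can never be forgotten.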
You can also create a sitemap index, which is a file that lists the URLs of other sitemaps. That way, one file references them all, and you can still split the pages across several files if need be.
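A file that points to other sitemaps uses <sitemapindex> and <sitemap> in place of <urlset> and <url>. Below is a minimal sketch of generating one in Python; the function name build_sitemap_index and the two child sitemap URLs are hypothetical and only there to show the shape of the file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML that points at each URL in sitemap_urls."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical child sitemaps, for illustration only.
index_xml = build_sitemap_index([
    "http://www.example.net/sitemap-pages.xml",
    "http://www.example.net/sitemap-inventory.xml",
])
print(index_xml)
```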
Why Do We Use Sitemaps?
We want search engines such as Google to crawl our site. Google will crawl every link it can find, but just because a page is crawled doesn’t mean it will be indexed. The problem is that search engines will not index (or are very unlikely to index) an entire website. Submitting a sitemap can help show Google what we would like it to crawl and index.
Again, submitting a sitemap DOES NOT mean that all pages will be indexed. That is always up to the search engine. Submitting a sitemap doesn’t guarantee that the site will be crawled, either. Think of it as more of a request to the search engines.
Let’s put this in terms our clients can understand:
Imagine that Google is driving into your city (the client website) to map it out for other people so they can come visit. So, you give Google and the car driver (the crawler) a map of your city, which is the site map in our case.
So they drive through your city looking at every spot. They mark the places they think are important. Those spots are the ones that are indexed for people to see. So, even though you gave them a map of the entire city, and Google did in fact drive by every place on that map, Google picked only certain parts of town to share with others to visit, for whatever reason.
Remember that construction happens: the city changes and the map becomes outdated. In the same way, when a website changes, its sitemap becomes outdated. This is not the end of the world. Google will still drive every road (link) it can find, regardless of whether it is in the sitemap, as long as there is a road (link) to that spot (page).
So, in the end, the best we can do is hand over the most up-to-date map we have and hope for the best.
What are Sitemap Errors?
Things break, change, or just don’t work correctly. This is okay. Sitemap errors range from “this can be ignored” to “take care of this ASAP”.
Occasionally a sitemap will report errors. For instance, if the auto generator adds invalid links to the map, it’s usually not worth the time to fix: those pages don’t exist, so nothing is affected. The sitemap should still be processed and the site should show some indexing.
The major problem is when NO INDEXING OCCURS for a while. In Google Search Console, in the Sitemaps tab under Index, you’ll be able to see the status of a sitemap getting crawled and indexed. Indexing can take over a week for some sites and as little as two days for others, so err on the side of caution and wait at least a week before worrying. If there is still no indexing after two weeks, there are probably some issues.
The major errors you are most likely to see are bad files, usually the result of a bad generation, in which case a new sitemap just needs to be generated. There are some other ways around this (such as using another source for the sitemap), but avoid doing that and check with a pod lead if regenerating the sitemap doesn’t work.
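Before regenerating, it can help to confirm that a file really is "bad". The sketch below is a quick local sanity check and not what Google Search Console itself runs; the function name check_sitemap is an assumption. It only verifies that the text parses as XML, has the expected root element, and contains at least one <loc> entry.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap(xml_text):
    """Return (ok, detail): the list of <loc> URLs, or an error message."""
    try:
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as err:
        return False, f"not well-formed XML: {err}"
    if root.tag not in (SITEMAP_NS + "urlset", SITEMAP_NS + "sitemapindex"):
        return False, f"unexpected root element: {root.tag}"
    locs = [el.text for el in root.iter(SITEMAP_NS + "loc")]
    if not locs:
        return False, "no <loc> entries found"
    return True, locs


good = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.example.net/?id=who</loc></url>"
    "</urlset>"
)
# An unescaped ampersand in a URL is a typical cause of a bad file.
bad = good.replace("?id=who", "?id=who&page=2")

print(check_sitemap(good))
print(check_sitemap(bad))
```

A file that fails this check will also fail for search engines, so it is a reasonable first thing to rule out before escalating to a pod lead.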
Updating a Sitemap
Need-to-Know Notes
Ampersands: due to the formatting of sitemaps, all ampersands in a URL must be changed to &amp; so the file can be properly read.
For example:
https://www.rivamiami.com/--inventory?category=atv&make=yamaha
This would have to be converted to:
https://www.rivamiami.com/--inventory?category=atv&amp;make=yamaha
Complete example to be added to the sitemap:
<url><loc>https://www.rivamiami.com/--inventory?category=atv&amp;make=yamaha</loc><changefreq>monthly</changefreq><priority>1.00</priority></url>
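If there are many URLs to convert, the ampersand replacement (each & in the URL becomes &amp; in the sitemap) can be scripted. This is a minimal sketch using escape from Python's standard library; the helper name sitemap_url_entry and its default values are assumptions for illustration, taken from the example entry above.

```python
from xml.sax.saxutils import escape


def sitemap_url_entry(url, changefreq="monthly", priority="1.00"):
    """Return a one-line <url> entry with the URL escaped for XML."""
    # escape() turns & into &amp; (and < and > into &lt; and &gt;).
    safe_url = escape(url)
    return (
        f"<url><loc>{safe_url}</loc>"
        f"<changefreq>{changefreq}</changefreq>"
        f"<priority>{priority}</priority></url>"
    )


entry = sitemap_url_entry(
    "https://www.rivamiami.com/--inventory?category=atv&make=yamaha"
)
print(entry)
```

Only pass in the raw URL as it appears in the browser. Running an already-converted URL through the helper would escape it a second time and produce &amp;amp;.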
To update the Sitemap on our platforms, follow the procedure below:
Helpful Tips
If you’re making a large update to your sitemap, it’s highly recommended to record the day you updated it. That way you have a reference point to see the effects of updating your sitemap.