One of the best ways to inform search engines about new content on your website is a sitemap. A sitemap is a basic XML file that lists all the URLs available to be crawled. It can also contain information such as how often a page's content changes, when it was last updated, and how important the page is in the context of your whole website.

There are two ways to inform search engines how to locate your sitemap: reference it from your robots.txt file, or submit it directly through each search engine's webmaster tools. A robots.txt reference looks like this (the URL is a placeholder):

User-agent: *
Sitemap: https://example.com/sitemap.xml

If you've just created a website and have yet to acquire inbound links (more on that in later blog posts), a sitemap is a great way to get into search engines as soon as possible. When I ran another blog, it took only a week to appear in Google search results.

Most major search engines support a common sitemap format, defined at sitemaps.org. Let's go through the basic syntax:

Each sitemap should start with the following lines:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

Within urlset is where we define our URLs, each inside a url tag. Let's go through the tags a url entry can contain:

  • loc tag identifies the URL itself
  • lastmod tag identifies when the page was last updated. This is very important if your content is expected to change frequently.
  • changefreq identifies how often the content of the page is expected to change. Possible values are: always, hourly, daily, weekly, monthly, yearly or never
    • always means the content changes every time it is accessed
    • never should be used for URLs that are archived. This doesn't necessarily mean crawlers won't ever revisit them.
  • priority defines how important you feel the page is relative to the other pages on your site. Valid values range from 0.0 to 1.0, and the default is 0.5.

The only required tag is loc.
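Putting these tags together, a single entry might look like this (the URL, date, and values are illustrative, not taken from a real sitemap):

  <url>
    <loc>https://example.com/blog/sitemaps</loc>
    <lastmod>2020-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>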

Note: A single sitemap can't exceed 50,000 URLs. If yours does, you need to split it into several files and tie them together with a sitemap index file.
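The splitting itself is simple to automate. Here is a minimal sketch that chunks a list of URLs so each chunk fits in one sitemap and renders each chunk as XML; the example.com URLs and the chunk count are illustrative, and a real script would also write each document to its own file:

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50000  # limit imposed by the sitemap protocol


def chunk_urls(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks that each fit in one sitemap."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def build_sitemap(urls):
    """Render one chunk of URLs as a minimal sitemap XML document."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url in urls:
        # escape() guards against &, < and > in URLs breaking the XML
        lines.append('  <url><loc>%s</loc></url>' % escape(url))
    lines.append('</urlset>')
    return '\n'.join(lines)


# Illustrative: 120,000 URLs need three sitemap files.
urls = ['https://example.com/page/%d' % i for i in range(120000)]
chunks = chunk_urls(urls)
```

Each resulting document would then be listed in a sitemap index file so crawlers can find all of them from one place.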

That's pretty much all the basics you need to get started. I won't go into further detail on this topic - that's for the next blog post!