This website uses cookies

This website uses cookies to give you the best and most relevant experience. By continuing to browse this site, you are agreeing to our use of cookies. Learn More.

Generating XML Sitemaps from SQL Server

XML sitemaps, Atom and RSS are three popular xml based formats used to expose website content to interested parties such as crawlers. This article describes the use of T-SQL and xml functionality in SQL Server to generate xml sitemaps directly from a SQL Server database. The same principles can be applied to generate other xml formats such as Atom and RSS.

In the following examples, the Products and Categories tables from the Northwind database will be joined to generate fictitious product urls. The query below will be modified to generate the desired xml output.

T-Sql script
/*
script to generate a url in the form of in the form of http://yourwebsite/{categoryname}/{productname}/{productid}.
dbo.Clean is a custom scalar function that puts all the characters in lowercase, replaces space with a dash and substitutes invalid characters
*/
USE Northwind
GO
SELECT TOP 5 [ProductID]
    ,C.CategoryName
    ,[ProductName]
    ,'http://yourwebsite/' 
		+ [dbo].[Clean](C.CategoryName) + '/' 
		+ [dbo].[Clean](P.ProductName) + '/' 
		+ [dbo].[Clean](CAST(P.ProductID AS VARCHAR(10))) AS "URL"
FROM [Products] P
INNER JOIN Categories C ON P.CategoryID = C.CategoryID 
ORDER BY P.ProductName
ProductID CategoryName ProductName URL
Query results
17 Meat/Poultry Alice Mutton http://yourwebsite/meat-poultry/alice-mutton/17
3 Condiments Aniseed Syrup http://yourwebsite/condiments/aniseed-syrup/3
40 Seafood Boston Crab Meat http://yourwebsite/seafood/boston-crab-meat/40
60 Dairy Products Camembert Pierrot http://yourwebsite/dairy-products/camembert-pierrot/60
18 Seafood Carnarvon Tigers http://yourwebsite/seafood/carnarvon-tigers/18

The following script generates xml sitemaps with the optional tags (lastmod, changefreq, priority) but without validation namespaces. The FOR XML clause is used in conjunction with the PATH mode.

XML sitemap generation with TSQL
USE Northwind
GO

;WITH XMLNAMESPACES (
	DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9'
)
SELECT TOP 5 
  --required
  'http://yourwebsite/' 
       + [dbo].[Clean](C.CategoryName) + '/' 
       + [dbo].[Clean](P.ProductName) + '/' 
       + [dbo].[Clean](CAST(P.ProductID AS VARCHAR(10))) AS "loc",
  
  --optional
  CONVERT(VARCHAR(10), GetDate(), 20) AS "lastmod", 

  --optional
  'daily' AS "changefreq", 

  --optional
  1 AS "priority"
FROM [Products] P
INNER JOIN Categories C ON P.CategoryID = C.CategoryID 
ORDER BY P.ProductName
FOR XML PATH('url'), ROOT('urlset'), ELEMENTS
Query results
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://yourwebsite/meat-poultry/alice-mutton/17</loc>
        <lastmod>2016-01-31</lastmod>
        <changefreq>daily</changefreq>
        <priority>1</priority>
    </url>
    <url>
        <loc>http://yourwebsite/condiments/aniseed-syrup/3</loc>
        <lastmod>2016-01-31</lastmod>
        <changefreq>daily</changefreq>
        <priority>1</priority>
    </url>
    <url>
        <loc>http://yourwebsite/seafood/boston-crab-meat/40</loc>
        <lastmod>2016-01-31</lastmod>
        <changefreq>daily</changefreq>
        <priority>1</priority>
    </url>
    <url>
        <loc>http://yourwebsite/dairy-products/camembert-pierrot/60</loc>
        <lastmod>2016-01-31</lastmod>
        <changefreq>daily</changefreq>
        <priority>1</priority>
    </url>
    <url>
        <loc>http://yourwebsite/seafood/carnarvon-tigers/18</loc>
        <lastmod>2016-01-31</lastmod>
        <changefreq>daily</changefreq>
        <priority>1</priority>
    </url>
</urlset>

To enable validation, 2 additional headers must be added to the sitemap urlset node. This requires a slightly different approach with some string handling to get the proper xsi prefix for schemaLocation.

Adding headers in sitemap urlset node
USE Northwind
GO

DECLARE @Xml XML
;WITH XMLNAMESPACES (
	'http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd' as "schemaLocation", 
	'http://www.w3.org/2001/XMLSchema-instance' as xsi,
	DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9'
	)
SELECT @Xml = (
    SELECT TOP 5 
	    --required loc
	    'http://yourwebsite/' 
            + [dbo].[Clean](C.CategoryName) + '/' 
            + [dbo].[Clean](P.ProductName) + '/' 
            + [dbo].[Clean](CAST(P.ProductID AS VARCHAR(10))) AS "loc",

	    CONVERT(VARCHAR(10), GetDate(), 20) AS "lastmod", --optional
	    'daily' AS "changefreq", --optional
	    1 AS "priority" --optional
    FROM [Products] P
    INNER JOIN Categories C ON P.CategoryID = C.CategoryID 
    ORDER BY P.ProductName
    FOR XML PATH('url'), ROOT('urlset'), ELEMENTS
)

/*
 the query above gives the following urlset node
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

 xmlns:schemaLocation should be replaced with xsi:schemaLocation
 replace the proper prefix xsi instead of xmlns with string manipulation
*/
SELECT @Xml = REPLACE(CAST(@Xml AS NVARCHAR(MAX)), 'xmlns:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"', 'xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"')
SELECT @Xml
Query results with additional namespaces
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
    <url>
        <loc>http://yourwebsite/meat-poultry/alice-mutton/17</loc>
        <lastmod>2016-01-31</lastmod>
        <changefreq>daily</changefreq>
        <priority>1</priority>
    </url>
    <url>
        <loc>http://yourwebsite/condiments/aniseed-syrup/3</loc>
        <lastmod>2016-01-31</lastmod>
        <changefreq>daily</changefreq>
        <priority>1</priority>
    </url>
    <url>
        <loc>http://yourwebsite/seafood/boston-crab-meat/40</loc>
        <lastmod>2016-01-31</lastmod>
        <changefreq>daily</changefreq>
        <priority>1</priority>
    </url>
    <url>
        <loc>http://yourwebsite/dairy-products/camembert-pierrot/60</loc>
        <lastmod>2016-01-31</lastmod>
        <changefreq>daily</changefreq>
        <priority>1</priority>
    </url>
    <url>
        <loc>http://yourwebsite/seafood/carnarvon-tigers/18</loc>
        <lastmod>2016-01-31</lastmod>
        <changefreq>daily</changefreq>
        <priority>1</priority>
    </url>
</urlset>

While the output now conforms to xml sitemap protocol, it is a good practice to limit the number of records that goes in a single file (generally less than 10000 at the time of writing). In real life scenarios where the number of records is high, this would give rise to multiple sitemap files which have to be referenced inside a sitemap index file. One way to achieve this is to add support for pagination inside the sql query - one parameter for the starting record number and another for the number of records to be returned (using OFFSET/FETCH NEXT in SQL Server 2012 and above, or a common table expression with ROW OVER in previous versions).

Limiting number of records
/*
    pagination parameters to be passed by client
    @StartingRecordNumber INT, 
    @NumberOfRecords INT
    add the line below between ORDER BY and FOR XML (SQL 2012 and above)
*/
OFFSET @StartingRecordNumber ROWS FETCH NEXT @NumberOfRecords ROWS ONLY

The client can then loop and create a sitemap file and its corresponding entry into a sitemap index file until no more records are returned.

Pseudocode for generating sitemap index file
//pseudocode
while (haveMoreRecordsInDB)
{
    data = getRecords(startingRecordNumber, numberOfRecordsToReturn);
    filename = createSiteMapFile(data);
    addFilenameToSitemapIndex(filename);
    startingRecordNumber = startingRecordNumber + numberOfRecordsToReturn;
}
saveSitemapIndexFile();