Swimburger

PowerShell Snippet: Crawling a sitemap

Niels Swimberghe - PowerShell

Here's a PowerShell function you can use to validate that every page in your sitemap returns an HTTP 200 status code.
You can also use it to warm up your website, for example to make sure its cache is populated again after a cold start.

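Function CrawlSitemap
{
    Param(
        [parameter(Mandatory=$true)]
        [string] $SiteMapUrl
    );

    # Download the sitemap and parse the response as XML
    $SiteMapXml = Invoke-WebRequest -Uri $SiteMapUrl -UseBasicParsing -TimeoutSec 180;
    # Each <url> element describes one page; its <loc> child holds the address
    $Urls = ([xml]$SiteMapXml).urlset.url;
    ForEach ($Url in $Urls){
        $Loc = $Url.loc;
        try{
            $result = Invoke-WebRequest -Uri $Loc -UseBasicParsing -TimeoutSec 180;
            Write-Host $result.StatusCode - $Loc;
        }catch{
            # Windows PowerShell throws System.Net.WebException for failed requests,
            # PowerShell 6+ throws a different type, so catch all and inspect the response
            $Response = $_.Exception.Response;
            if ($Response) {
                Write-Warning (([int]$Response.StatusCode).ToString() + " - " + $Loc);
            } else {
                # No HTTP response at all (for example a timeout or DNS failure)
                Write-Warning ($_.Exception.Message + " - " + $Loc);
            }
        }
    }
}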

You can use the script as follows:

CrawlSitemap -SiteMapUrl 'https://www.swimburger.net/sitemap.xml';

I personally use it as part of my Continuous Delivery pipeline to warm up my site and Cloudflare's cache.
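
For pipeline use, it can help to get the results back as objects so the step can fail when a page is unhealthy. The following is a minimal sketch, not the pipeline code I actually run: `Get-SitemapStatus` is a hypothetical name, the `TimeoutSec` parameter is an addition, and reporting `0` for requests that received no HTTP response is an assumption made for illustration.

```powershell
# Hypothetical variant of CrawlSitemap (not from the original snippet): emits one
# object per URL instead of writing to the host, so callers can filter the results.
function Get-SitemapStatus {
    param(
        [Parameter(Mandatory = $true)]
        [string] $SiteMapUrl,
        [int] $TimeoutSec = 180
    )

    $siteMap = Invoke-WebRequest -Uri $SiteMapUrl -UseBasicParsing -TimeoutSec $TimeoutSec
    foreach ($url in ([xml]$siteMap).urlset.url) {
        try {
            $response = Invoke-WebRequest -Uri $url.loc -UseBasicParsing -TimeoutSec $TimeoutSec
            $status = [int]$response.StatusCode
        } catch {
            # Assumption: report 0 when there is no HTTP response (timeout, DNS failure)
            $errorResponse = $_.Exception.Response
            $status = if ($errorResponse) { [int]$errorResponse.StatusCode } else { 0 }
        }
        [pscustomobject]@{ Url = $url.loc; StatusCode = $status }
    }
}

# Example pipeline step (commented out so loading the function has no side effects):
# $failed = @(Get-SitemapStatus -SiteMapUrl 'https://www.swimburger.net/sitemap.xml' |
#     Where-Object { $_.StatusCode -ne 200 })
# if ($failed.Count -gt 0) { $failed | Format-Table; exit 1 }
```

Returning objects rather than calling Write-Host leaves the decision about what counts as a failure to the caller, which may or may not suit your pipeline.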
Hope it's useful!
