PowerShell Snippet: Crawling a sitemap
Niels Swimberghe - - PowerShell
Here's a PowerShell function that you can use to validate that all pages in your sitemap return an HTTP status code 200.
You can also use it to warm up your website, for example to make sure its cache is populated after a cold boot.
Function CrawlSitemap {
    Param(
        [parameter(Mandatory=$true)]
        [string] $SiteMapUrl
    );
    $SiteMapXml = Invoke-WebRequest -Uri $SiteMapUrl -UseBasicParsing -TimeoutSec 180;
    $Urls = ([xml]$SiteMapXml).urlset.ChildNodes
    ForEach ($Url in $Urls){
        $Loc = $Url.loc;
        try{
            $result = Invoke-WebRequest -Uri $Loc -UseBasicParsing -TimeoutSec 180;
            Write-Host $result.StatusCode - $Loc;
        }catch [System.Net.WebException] {
            Write-Warning (([int]$_.Exception.Response.StatusCode).ToString() + " - " + $Loc);
        }
    }
}
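To see what the `urlset.ChildNodes` lookup does without making any requests, here is a minimal sketch that parses a sitemap from a string. The helper name `Get-SitemapLoc` and the example.com sample are hypothetical and not part of the function above. The sketch also skips non-element nodes such as comments, which is an extra assumption about what a sitemap might contain.

```powershell
# Hypothetical helper (not from the original function): extract <loc> values from sitemap XML text.
function Get-SitemapLoc {
    param(
        [Parameter(Mandatory = $true)]
        [string] $SitemapXml
    )

    # Casting to [xml] lets PowerShell navigate elements as properties.
    $doc = [xml]$SitemapXml

    # Keep only element children that carry a <loc>; comment nodes are skipped.
    foreach ($node in $doc.urlset.ChildNodes) {
        if ($node.NodeType -eq [System.Xml.XmlNodeType]::Element -and $node.loc) {
            [string]$node.loc
        }
    }
}

# Made-up sample sitemap, for illustration only.
$sample = @'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- a comment node without a loc -->
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
'@

$locs = @(Get-SitemapLoc -SitemapXml $sample)
$locs
```

Each string in `$locs` is what the crawl loop would pass to `Invoke-WebRequest -Uri`.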
You can use the script as follows:
CrawlSitemap -SiteMapUrl 'https://www.swimburger.net/sitemap.xml';
I personally use it as part of my Continuous Delivery pipeline to warm up my site and Cloudflare's cache.
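If you want a pipeline step to fail when a page is broken, one option is to collect results as objects instead of writing them to the host. The following is only a sketch of that idea, not my actual pipeline step: the function name `Test-PageStatus`, the injectable `-Request` script block, and the example.com URLs are all assumptions made so the example can run without a network.

```powershell
# Hypothetical variant: return one result object per URL so a build step can inspect them.
function Test-PageStatus {
    param(
        [Parameter(Mandatory = $true)]
        [string[]] $Url,

        # Assumption: the request is injectable so the function can be tried offline.
        [scriptblock] $Request = {
            param($u)
            (Invoke-WebRequest -Uri $u -UseBasicParsing -TimeoutSec 180).StatusCode
        }
    )

    foreach ($u in $Url) {
        try {
            $status = [int](& $Request $u)
        } catch {
            # Any failure (timeout, DNS error, 4xx/5xx) is recorded as status 0 in this sketch.
            $status = 0
        }
        [pscustomobject]@{ Url = $u; StatusCode = $status; Ok = ($status -eq 200) }
    }
}

# Offline demonstration with a fake request that fails for one URL.
$fake = { param($u) if ($u -like '*missing*') { throw 'not found' } else { 200 } }
$results = @(Test-PageStatus -Url 'https://example.com/', 'https://example.com/missing' -Request $fake)
$failed = @($results | Where-Object { -not $_.Ok })
Write-Host "$($failed.Count) of $($results.Count) pages failed"
```

In a real pipeline you would omit `-Request` so the default `Invoke-WebRequest` call is used, and end the step with something like `if ($failed.Count -gt 0) { exit 1 }`.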
Hope it's useful!