Swimburger

PowerShell Script: Scan documentation for broken links

Niels Swimberghe



A lot of documentation links to other locations on the web using URLs. Unfortunately, many URLs change over time, and it's easy to introduce an incorrect URL through a typo or a slip of the finger.
Here's a small PowerShell script you can run against your documentation repositories. It tells you which URLs do not resolve, either directly or through a redirect, to a successful response such as HTTP status code 200:

Param(
    [Parameter(Mandatory=$true)]
    [string] $DocsRootPath
)
# use as ./CrawlDocsForBrokenLinks -DocsRootPath path/to/docs

# URL regex tailored to Markdown: a URL ends at whitespace, a bracket, a closing parenthesis,
# or the end of the line, which covers the []() and [][] link forms
$UrlRegex = '(?:https?):\/\/[a-z0-9\.:].*?(?=[\s\]\[\)]|$)';
Get-ChildItem -Path $DocsRootPath -File -Recurse -Filter "*.md" `
    | Select-String -Pattern $UrlRegex -AllMatches `
    | ForEach-Object { 
    [Microsoft.PowerShell.Commands.MatchInfo]$MatchInfo = $PSItem; 
    $MatchInfo.Matches `
        | Where-Object { $_.Value.StartsWith('http://') -or $_.Value.StartsWith('https://') } `
        | ForEach-Object {
            $Value = $PSItem.Value;
            $Value = $Value.Trim('"').Trim("'");

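            try {
                # Request the URL; -ErrorAction Stop makes sure every failure reaches the catch block
                $Response = Invoke-WebRequest `
                    -Uri $Value `
                    -UseBasicParsing `
                    -ErrorAction Stop;
            }
            catch {
                # Response is $null when no HTTP response arrived at all, so the status code prints as 0
                $Response = $PSItem.Exception.Response;
                Write-Output "$([int]$Response.StatusCode) - $($MatchInfo.Path):$($MatchInfo.LineNumber) ($($Value))";
            }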
        };
};

The code does the following:

  • Recursively finds files under the given path, keeping only Markdown files
  • Extracts the URLs from those files using a regular expression
  • For each URL, makes an HTTP request and, if it is not successful, writes to the console:
    • the status code
    • the path to the file
    • the line number where the URL was found
    • the URL itself
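
The extraction step can be tried on its own, without making any HTTP requests. The following is a minimal sketch, not the script's exact regular expression: it assumes a simplified pattern in which a URL runs until whitespace, a bracket, a closing parenthesis, or the end of the line. The sample lines are hypothetical.

```powershell
# Hypothetical sample lines; they do not come from a real repository.
$SampleLines = @(
    'See [the docs](https://example.com/docs) for details.',
    '[ref]: http://example.org/page',
    'No links on this line.'
)

# Simplified pattern (an assumption, not the regex used in the script above):
# a URL continues until whitespace, a bracket, or a closing parenthesis.
$Pattern = 'https?://[^\s\[\]\)]+'

# Collect the matched URL text from every line.
$Found = foreach ($Line in $SampleLines) {
    [regex]::Matches($Line, $Pattern) | ForEach-Object { $_.Value }
}

$Found
```

Note that [regex]::Matches is case-sensitive by default, while Select-String is not, so this sketch only finds lowercase http:// and https:// prefixes.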

Save the code to a file named CrawlDocsForBrokenLinks.ps1, then open a PowerShell session and invoke it like this:

./CrawlDocsForBrokenLinks.ps1 -DocsRootPath path/to/docs
# Output looks like this
#    404 - C:\Users\nswimberghe\source\repos\azure-docs\README.md:22 (https://marketplace.visualstudio.com/items?itemName=docsmsft)
#    404 - C:\Users\nswimberghe\source\repos\azure-docs\ThirdPartyNotices.md:3 (https://creativecommons.org/licenses/by/4)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\best-practices-availability-paired-regions.md:90 (https://github.com/uglide/azure-content/blob/master/articles/resiliency/resiliency-technical-guidance)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\cloud-services-php-create-web-role.md:166 (http://127.0.0.1:81)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\cloud-services-php-create-web-role.md:170 (http://127.0.0.1:81)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\nodejs-use-node-modules-azure-apps.md:27 (https://github.com/woloski/nodeonazure-blog/blob/master/articles/startup-task-to-run-npm-in-azure)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-java-phone-call-example.md:175 (http://localhost:8080/TwilioCloud/callform)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-nodejs-how-to-use-voice-sms.md:134 (https://CHANGE_ME.azurewebsites.net/outbound_call)
#    404 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-nodejs-how-to-use-voice-sms.md:244 (https://www.twilio.com/blog/2013/04/introduction-to-twilio-client-with-node-js)
#    404 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-nodejs-how-to-use-voice-sms.md:246 (https://www.twilio.com/blog/2012/09/building-a-real-time-sms-voting-app-part-1-node-js-couchdb)
#    404 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-nodejs-how-to-use-voice-sms.md:247 (https://www.twilio.com/blog/2013/06/pair-programming-in-the-browser-with-twilio)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-php-how-to-use-voice-sms.md:97 (https://github.com/twilio/twilio-php/blob/master/README)
#    404 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-php-how-to-use-voice-sms.md:145 (http://readthedocs.org/docs/twilio-php/en/latest/usage/rest)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-php-how-to-use-voice-sms.md:255 (https://github.com/twilio/twilio-php/blob/master/README)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-php-make-phone-call.md:26 (https://github.com/twilio/twilio-php/blob/master/README)
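
A status code of 0 in this output is not a real HTTP status. The sketch below shows how such a value comes about, assuming the request failed without any HTTP response (for example, nothing was listening on the target address), so the exception carried no response object. The $Response value is set to $null by hand to simulate that case, and the file path is hypothetical.

```powershell
# Simulate a failure with no HTTP response attached
# (assumption: Exception.Response was $null in the catch block).
$Response = $null

# Outside strict mode, reading a property of $null yields $null,
# and casting $null to [int] gives 0.
$StatusCode = [int]$Response.StatusCode

# Build a line in the same shape as the script's output (hypothetical path and URL).
$Line = "$StatusCode - docs/example.md:12 (http://localhost:8080)"
$Line
```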

You can redirect the output to a file like this:

./CrawlDocsForBrokenLinks.ps1 -DocsRootPath path/to/docs > brokenlinks.log
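
Once the output is in a log file, it can also be summarized. Here is a rough sketch, assuming each line follows the "status - path:line (url)" shape shown above. The sample lines are hypothetical; in practice they could come from Get-Content brokenlinks.log instead.

```powershell
# Hypothetical log lines in the format the script writes.
$LogLines = @(
    '404 - C:\docs\README.md:22 (https://example.com/missing)',
    '0 - C:\docs\setup.md:166 (http://127.0.0.1:81)',
    '404 - C:\docs\guide.md:3 (https://example.com/gone)'
)

# Split each line into status code, file path, line number, and URL.
$Entries = foreach ($Line in $LogLines) {
    if ($Line -match '^(?<Status>\d+) - (?<Path>.+):(?<Line>\d+) \((?<Url>.+)\)$') {
        [pscustomobject]@{
            StatusCode = [int]$Matches.Status
            Path       = $Matches.Path
            LineNumber = [int]$Matches.Line
            Url        = $Matches.Url
        }
    }
}

# Count the broken links per status code.
$Entries | Group-Object StatusCode | Sort-Object Name | Select-Object Name, Count
```

Grouping by status code makes it easier to separate true 404s from entries with status 0, which are often placeholder or localhost URLs that were never meant to resolve.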

When you open the log file in VS Code, you can Ctrl+click the path:linenumber combination, and VS Code will open the file and place your cursor on the correct line!

Hopefully, this makes it easier to maintain working URLs in your documentation. Good luck!
