Swimburger

Auto generate Heading Anchors using HTML AgilityPack DOM Manipulation

Niels Swimberghe - .NET


Screenshot of the HTML Agility Pack homepage

For very long documents it can be hard to share a specific segment with others. One common way websites solve this is by providing "Heading Anchors".
I'm not sure "Heading Anchors" is the correct term, but it's the most descriptive name I've come across. A heading anchor is a hyperlink attached to a heading, so that each section of an article can be deep linked. When you browse to the link, the browser scrolls directly to that heading. Heading anchors are often implemented by adding the pound sign "#" as a hyperlink next to the heading. Here's a nice example from css-tricks.com:

Heading Anchors example from Css-Tricks.com

Manually adding an anchor to every heading would be a painful solution. So let's learn how we can achieve this by generating the Heading Anchors using the HTML AgilityPack .NET library.

Generate Heading Anchors using HTML AgilityPack

HTML AgilityPack (HAP) is a .NET library for parsing, querying, and manipulating HTML.

To generate the Heading Anchors we'll need to:

  1. Parse the HTML, wherever it comes from (database, CMS, etc.)
  2. Select our headings using XPath
  3. Add "#" anchors using DOM Manipulation
  4. Output the manipulated HTML

To follow along, you can use this GitHub repository containing all the sample code. There's more relevant code in the repository, but the important part is the following function:

public string AddHeadingAnchorsToHtml(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    // select all possible headings in the document
    var headings = doc.DocumentNode.SelectNodes("//h1 | //h2 | //h3 | //h4 | //h5 | //h6");
    if (headings != null)
    {
        foreach (var heading in headings)
        {
            var headingText = heading.InnerText;
            // if heading has id, use it
            string headingId = heading.Attributes["id"]?.Value;
            if (headingId == null)
            {
                // if heading does not have an id, generate a safe id by creating a slug based on the heading text
                // slug is a URL/SEO friendly part of a URL, this is a good option for generating anchor fragments
                // Source: http://predicatet.blogspot.com/2009/04/improved-c-slug-generator-or-how-to.html
                // assumption: the phrase should only contain standard a-z characters or numbers
                headingId = ToSlug(headingText);
                // for the fragment to work (jump to the relevant content), the heading id and the fragment need to match
                heading.Attributes.Append("id", headingId);
            }

            // use a non-breaking space so the heading text and the #-sign don't end up on separate lines
            heading.InnerHtml += "&nbsp;";
            // create the heading anchor which points to the heading
            var headingAnchor = HtmlNode.CreateNode($"<a href=\"#{headingId}\" aria-label=\"Anchor for heading: {headingText}\">#</a>");
            // append the anchor behind the heading text content
            heading.AppendChild(headingAnchor);
        }
    }

    return doc.DocumentNode.InnerHtml;
}

In summary, the above code does the following:

  1. Parse the HTML by using the HtmlDocument.LoadHtml function
  2. Select all headings by passing an XPath query to the DocumentNode.SelectNodes function
  3. Iterate over each heading and
    1. Generate an ID for each heading by slugifying the heading text. The ToSlug method is based on the slug generator article linked in the code comment.
      If the heading already has an ID, reuse it.
    2. Create an HTML anchor whose href attribute is a fragment URL built from the heading ID
    3. Append the anchor to the heading so the '#'-anchor shows up next to the heading text
  4. Return the manipulated HTML
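
The ToSlug helper is called in the function above but not shown there. Here is a minimal sketch of what such a helper could look like, assuming heading text only contains standard a-z characters, numbers, and spaces (the same assumption stated in the code comment). The class name SlugHelper is hypothetical, and the version in the sample repository may well differ.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not necessarily the implementation used in the sample repository.
public static class SlugHelper
{
    public static string ToSlug(string phrase)
    {
        // lower-case the text so the generated fragment is consistent
        string slug = phrase.ToLowerInvariant();
        // remove every character that is not a-z, 0-9, whitespace, or a hyphen
        slug = Regex.Replace(slug, @"[^a-z0-9\s-]", "");
        // collapse runs of whitespace into a single space and trim both ends
        slug = Regex.Replace(slug, @"\s+", " ").Trim();
        // turn the remaining spaces into hyphens
        return slug.Replace(' ', '-');
    }
}
```

With this sketch, SlugHelper.ToSlug("Generate Heading Anchors using HTML AgilityPack") returns "generate-heading-anchors-using-html-agilitypack". Note that two headings with identical text would end up with the same ID, and a heading made up entirely of other characters would produce an empty slug, so real content may need extra handling.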

If you play around with the sample, you'll see the HTML is coming from an HTML file stored on the server and the resulting HTML is returned directly to the browser. The result looks like this:

Heading Anchors demo screenshot

IMPORTANT NOTE: Parsing, querying, and manipulating the DOM is resource-intensive. Keep that in mind when using HTML AgilityPack and cache the result where necessary.
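
One way to apply caching is to keep the transformed HTML in memory so each document is only processed once. The sketch below is an assumption for illustration, not code from the sample repository: the class name CachedHtmlTransformer and the cacheKey parameter are hypothetical, and the transform delegate stands in for a function such as AddHeadingAnchorsToHtml.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory cache around an expensive HTML transformation.
public class CachedHtmlTransformer
{
    private readonly ConcurrentDictionary<string, string> cache =
        new ConcurrentDictionary<string, string>();
    private readonly Func<string, string> transform;

    // transform is the expensive function, e.g. AddHeadingAnchorsToHtml
    public CachedHtmlTransformer(Func<string, string> transform)
    {
        this.transform = transform;
    }

    // cacheKey identifies the document, e.g. a page id combined with a version or last-modified stamp
    public string GetOrAdd(string cacheKey, string html)
    {
        // only run the transformation when the key has not been cached yet;
        // under concurrent access the factory may run more than once, but only one result is stored
        return cache.GetOrAdd(cacheKey, _ => transform(html));
    }
}
```

Entries in this sketch never expire, so the key should change whenever the underlying HTML changes. For larger sites, a cache with expiration and size limits is likely a better fit.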

Summary

Using the HTML AgilityPack library, we parsed, queried, and manipulated HTML to generate Heading Anchors for a richer URL sharing experience.

BONUS: Using the scroll-behavior CSS property, we can enable a smooth scroll animation in supporting browsers.
