Markdown: Generating Heading IDs

In late October, I submitted a pull request (#125) to blackfriday, a Go package for processing Markdown, in order to satisfy a feature request deemed useful for Hugo (GitHub-style header generation).

The change itself wasn’t very difficult: only about 25 lines of code, including code that Dmitri Shuralyov (@shurcooL) had written to create a sanitized anchor name[1].

The change to enable this in Hugo is stuck because there’s no good way to prevent or handle duplicate heading IDs.

There are two problems here:

  1. Single document duplicate IDs; and
  2. Multiple document duplicate IDs.

Single Document Duplicate IDs

It is possible to generate the same heading ID more than once in a single Markdown document. This is a problem for blackfriday. Markdown like this:

# Header
# Header
# Header

produces HTML like this:

<h1 id="header">Header</h1>
<h1 id="header">Header</h1>
<h1 id="header">Header</h1>

I implemented a naïve approach in a second pull request (#126), but it is the wrong way to do this, and I will be revising it at my next opportunity. It appends an incrementing counter as a suffix to prevent header collisions (under this model, the heading IDs for the example above would be header, header-1, and header-2). This doesn’t prevent a simple name collision, where the third heading becomes header-1 and collides with the ID generated for Header 1:

# Header
# Header 1
# Header

It also does not prevent a collision like this (resulting in header, header, and header-1):

# Header
# Header {#header}
# Header

Both collisions are undesirable, but the second (where the user explicitly asked for header) is worse than the first[2].

In a slightly less naïve approach, we can detect header collisions and append a suffix (like -1) to each header that collides (for the first example, resulting in header, header-1, and header-1-1), but that feels wrong, unintuitive, and unnecessarily complex:

  1. parser.headers would be changed from map[string]int to map[string]bool, and would grow larger for each header that collides, because each of the forms header, header-1, and header-1-1 would be put into parser.headers.
  2. parser.createSanitizedAnchorName would need to be modified to be something like what follows. (The code here is untested.)

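// Assumes this file imports "unicode" and that p.headers is a map[string]bool.
func (p *parser) createSanitizedAnchorName(text string) string {
	var anchorName []rune
	for _, r := range text {
		switch {
		case r == ' ':
			// Spaces become hyphens.
			anchorName = append(anchorName, '-')
		case unicode.IsLetter(r) || unicode.IsNumber(r):
			// Letters and digits are kept, lowercased; everything else is dropped.
			anchorName = append(anchorName, unicode.ToLower(r))
		}
	}
	return p.ensureUniqueAnchor(string(anchorName))
}

// ensureUniqueAnchor appends "-1" until the anchor is unused in this
// document, records it, and returns it (header, header-1, header-1-1, ...).
func (p *parser) ensureUniqueAnchor(anchor string) string {
	for p.headers[anchor] {
		anchor += "-1"
	}
	p.headers[anchor] = true
	return anchor
}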

While I don’t like the collision-with-append approach, it will solve all but the most pathological cases, where a user actively tries to sabotage heading ID generation. Most of those can be handled by running ensureUniqueAnchor over every ID, whether it was generated by createSanitizedAnchorName or provided explicitly by the user. The resulting IDs may not match what a user has given, but they are at least guaranteed not to collide within a single document[3]. This is the modified approach I will be submitting to blackfriday soon.

Multiple Document Duplicate IDs

This is a problem for Hugo, and blackfriday can’t solve it. Two or more rendered documents are likely to have identical headings (consider a site of web API documentation, where two pages might each have a heading ## Endpoint). If these documents are then rendered into a list page (such as the default Hugo index page), that list page will have multiple headings with identical fragment IDs.

Most Hugo themes that render page {{ .Content }} into a list page (including Hyde, the more-or-less default theme, and my own theme, Cabaret) do not include the {{ .TableOfContents }}, so this won’t usually be a problem, except for HTML validation. If possible, we don’t want to generate invalid HTML.

We’ve already fixed this once in Hugo by adding a prefix based on the page ID[4]. The same page-ID technique can be reused for cross-document heading IDs, so that the two headings would be rendered as #endpoint-deadbeef and #endpoint-beefdead[5].
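A sketch of the cross-document suffix, assuming the page ID is the full hex-encoded MD5 of the page’s logical source name (per footnote 4; the exact length and formatting Hugo would use is an assumption). The pageID and scopedAnchor functions and the api/users.md and api/orders.md names are hypothetical.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// pageID hashes the page's logical source name with MD5 and hex-encodes
// the full 16-byte digest (32 characters).
func pageID(logicalName string) string {
	sum := md5.Sum([]byte(logicalName))
	return hex.EncodeToString(sum[:])
}

// scopedAnchor appends the page ID to a heading ID that is already unique
// within its own document, so identical headings on different pages differ.
func scopedAnchor(anchor, id string) string {
	return anchor + "-" + id
}

func main() {
	users := scopedAnchor("endpoint", pageID("api/users.md"))
	orders := scopedAnchor("endpoint", pageID("api/orders.md"))
	fmt.Println(users)
	fmt.Println(orders)
	fmt.Println(users != orders) // the two ## Endpoint headings no longer clash
}
```

Because the suffix depends only on the source name, the same page gets the same IDs on every build, which keeps links to those fragments stable.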

Another alternative[6] would be to strip all heading IDs from list pages. That’s not impossible, but it is equally unpredictable (especially if a theme rendered the {{ .TableOfContents }}).
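For comparison, stripping IDs could be as blunt as this regular-expression sketch. The stripHeadingIDs function is hypothetical; it assumes the plain id="..." attribute form blackfriday emits in the examples above, and it is not a real HTML parser. Any table-of-contents links pointing at those IDs would silently break, which is part of what makes this option unpredictable.

```go
package main

import (
	"fmt"
	"regexp"
)

// headingIDAttr matches an id="..." attribute on an opening h1 to h6 tag
// and captures the tag name so it can be kept.
var headingIDAttr = regexp.MustCompile(`(<h[1-6])\s+id="[^"]*"`)

// stripHeadingIDs removes heading IDs from HTML destined for a list page,
// leaving all other elements (and their IDs) untouched.
func stripHeadingIDs(html string) string {
	return headingIDAttr.ReplaceAllString(html, "$1")
}

func main() {
	fmt.Println(stripHeadingIDs(`<h2 id="endpoint">Endpoint</h2>`)) // <h2>Endpoint</h2>
}
```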

To make this reliable, it would be necessary to introduce a new method, available on both Node and Page objects, that renders a URL or URL fragment with the appropriate page ID appended. Pending feedback on this and a couple of other link helper methods, this is how I will propose that this feature be fully enabled in Hugo.

  1. If you want similar functionality, you should instead use his extracted package for this through import "", rather than copying the code as I did.
  2. This type of collision cannot be fixed with the “smarter” approach described, either, because the explicit header ID (#header) would be changed to something different. Blackfriday can’t be modified to detect this and parse it differently because it parses and renders documents in a single pass.
  3. This could also be helped if blackfriday provided any logging facility whatsoever. When header ID collisions are detected, it should be logged and reported to the user at a minimum. Ideally, the document should be rejected (e.g., crash-first behaviour), but that would not match user expectations.
  4. An MD5 of the logical name for the source file.
  5. This does require an additional change to blackfriday such that the desired suffix can be passed as part of the renderer, but I do not expect much resistance to that change.
  6. This would be difficult in the current implementation of Hugo because the renderer does not know the difference between rendering in a Page context as opposed to a List or Node context.

Bastille in Toronto, October 2014

I saw Bastille last night at the Air Canada Centre, and “Bad Blood: The Last Stand” was a pretty damned good show.

Plumage: A RubyMotion CLI App

I wanted to play with terminal.sexy[1] recently to see if I wanted a different terminal colour scheme—but as I use, I needed to convert the colour scheme to the iTerm 2 format for import.

  1. A Terminal Color Scheme Designer, via One Thing Well.

Ruby Net::LDAP Under New Management

TL;DR: Net::LDAP for Ruby is under new management: Michael Schaarschmidt (@schaary). Back in 2003, I registered the net-ldap project on RubyForge. I had time available, and I thought I needed LDAP for a project I was working on. As I started looking at the implementation of LDAP, I found that there were things more interesting and more pressing that I could work on. Then, as now, I wasn’t sure that I wanted to understand LDAP.

New GPG Key

mime-types for Io

Welcome back. I haven’t posted here in five and a half years and the website has been offline for at least two years, but that will be changing, starting now.

I recently finished reading Seven Languages in Seven Weeks. As an exercise, I ported mime-types for Ruby to Io on the 17th of September.

Series: Seven Languages

FOSSLC Panels and Me

I’ve been extraordinarily fortunate recently to be invited to participate in two panels presented by FOSSLC and hosted by the University of Toronto.

Mac Recipe Management Programs, Planning a Revisit

Mostly through a couple of bundles that I’ve purchased recently, I have acquired full licences to Acacia Tree Software’s SousChef and MacGourmet Deluxe (which is, remember, MacGourmet with all of the plug-ins included).

Mac Recipe Management Programs

It’s time to declutter the house. One of the things I want to get rid of is all the recipe magazines and loose recipes that I have. To do this, I need to keep the recipes that I like or want to try, so I need a recipe management program. I currently use Yum 2.7.4, which is good, but not great. I decided to seriously evaluate the various recipe management programs available for the Mac.

Aw, Damn (Au Revoir, M Decoux)

Guy Decoux, an extraordinary Rubyist, died earlier this month in tragic circumstances.

Vacationing with the iPhone

This past August, my wife and I went out to Nova Scotia with my parents in their RV[1]. I’ll be uploading some of the better pictures I took to my Flickr stream in the near future[2].

  1. We call it “The Bus”, since it’s a 40-foot Mandalay with four slides, resulting in about 400 square feet of living space when parked.
  2. …which will flood my Tumblr and FriendFeed, but such is life.

A Legend Passes

More Old Magazines

Amnesty International Condemns Canada on Death Penalty

Medical Office Magazine Collections

There’s a Ruby debugger?

Why I (now) wholeheartedly support MMP

Andrew Coyne: Why conservatives should support proportional representation

Is there anybody going to listen to my story…

On Derek Sivers’ return to PHP…

Ontario Votes: Voting Format Referendum

Beyond time

RubyConf 2006—Day 3 (Sunday, 22 October 2006)

RubyConf 2006—Day 2 (Matz’s keynote, Saturday, 21 October 2006)

RubyConf 2006—Day 2 (Saturday, 21 October 2006)

RubyConf 2006—Day 1 (Friday evening, 20 October 2006)

RubyConf 2006—Day 1 (Friday, 20 October 2006)

RubyConf 2006—Day 0 (Thursday, 19 October 2006)

Ruby on Windows: A Note for Microsoft

D*ck T*ping and Semantics

ARIEL: A Mentor’s Mini-Review

Vacation Rubyist Meet Wrapup

Deutsches Rubyists

London Ruby Users Group

Rubyists in London, Germany, Amsterdam?

Eggplant Curry

Pumpkin Cheesecake

Complex Data Structures in PL/SQL

An Extremely Brief Introduction to PL/SQL