The Tao of Unicode Sparklines

I’ve long been enamored of the sparkline, a graphical device which its inventor Edward Tufte defines thusly:

A sparkline is a small intense, simple, word-sized graphic with typographic resolution. Sparklines mean that graphics are no longer cartoonish special occasions with captions and boxes, but rather sparkline graphics can be everywhere a word or number can be: embedded in a sentence, table, headline, map, spreadsheet, graphic.

Nowadays you can create sparklines in many tools including Excel and Google Sheets, both of which can use the technique to pack a summary of a series of numbers into a single cell. By stacking such cells vertically you can create views that compress vast amounts of information.

In A virtuous cycle for analytics I noted that we often use Metabase to display tables and charts based on extracts from our Postgres warehouse. I really wanted to use sparklines to summarize views of activity over time, but that isn’t yet an option in Metabase.

When Metabase is connected to Postgres, though, you can write Metabase questions that can not only call built-in Postgres functions but can also call user-defined functions. Can such a function accept an array of numbers and return a sparkline for display in the Metabase table viewer? Yes, if you use Unicode characters to represent the variable-height bars of a sparkline.

There’s a page at rosettacode.org devoted to Unicode sparklines based on this sequence of eight characters:

U+2581 LOWER ONE EIGHTH BLOCK
U+2582 LOWER ONE QUARTER BLOCK
U+2583 LOWER THREE EIGHTHS BLOCK
U+2584 LOWER HALF BLOCK
U+2585 LOWER FIVE EIGHTHS BLOCK
U+2586 LOWER THREE QUARTERS BLOCK
U+2587 LOWER SEVEN EIGHTHS BLOCK
U+2588 FULL BLOCK

Notice that 2581, 2582, and 2588 are narrower than the rest. I’ll come back to that at the end.

If you combine them into a string of eight characters you get this result:

▁▂▃▄▅▆▇█

Notice that the fourth and eight characters in the sequence drop below the baseline. I’ll come back to that at the end too.

These characters can be used to define eight buckets into which numbers in a series can be quantized. Here are some examples from the rosettacode.org page:

“1 2 3 4 5 6 7 8 7 6 5 4 3 2 1” -> ▁▂▃▄▅▆▇█▇▆▅▄▃▂▁
“1.5, 0.5 3.5, 2.5 5.5, 4.5 7.5, 6.5” -> ▂▁▄▃▆▅█▇
“0, 1, 19, 20” -> ▁▁██
“0, 999, 4000, 4999, 7000, 7999” -> ▁▁▅▅██

To write a Postgres function that would do this, I started with the Python example from rosettacode.org:

bar = '▁▂▃▄▅▆▇█'
barcount = len(bar)
 
def sparkline(numbers):
    mn, mx = min(numbers), max(numbers)
    extent = mx - mn
    sparkline = ''.join(bar[min([barcount - 1,
                                 int((n - mn) / extent * barcount)])]
                        for n in numbers)
    return mn, mx, sparkline

While testing it I happened to try an unchanging sequence, [3, 3, 3, 3], which fails with a divide-by-zero error. In order to address that, and to unpack the algorithm a bit for readability, I arrived at this Postgres function:

create function sparkline(numbers bigint[]) returns text as $$

    def bar_index(num, _min, barcount, extent):
        index = min([barcount - 1, int( (num - _min) / extent * bar_count)])
        return index

    bars = '\u2581\u2582\u2583\u2584\u2585\u2586\u2587\u2588'
    _min, _max = min(numbers), max(numbers)
    extent = _max - _min 

    if extent == 0:  # avoid divide by zero if all numbers are equal
        extent = 1

    bar_count = len(bars)
    sparkline = ''
    for num in numbers:
        index = bar_index(num, _min, bar_count, extent)
        sparkline = sparkline + bars[index]

    return sparkline

$$ language plpython3u;

Here’s a psql invocation of the function:

analytics=# select sparkline(array[1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]);
    sparkline
-----------------
▁▂▃▄▅▆▇█▇▆▅▄▃▂▁
(1 row)

And here’s an example based on actual data:

Each row represents a university course in which students and teachers are annotating the course readings. Each bar represents a week’s worth of activity. Their heights are not comparable from row to row; some courses do a lot of annotating and some not so much; each sparkline reports relative variation from week to week; the sum and weekly max columns report absolute numbers.

This visualization makes it easy to see that annotation was occasional in some courses and continuous in others. And when you scroll, the temporal axis comes alive; try scrolling this view to see what I mean.

We use the same mechanism at three different scales. One set of sparklines reports daily activity for students in courses; another rolls those up to weekly activity for courses at a school; still another rolls all those up to weekly activity for each school in the system.

At the level of individual courses, the per-student sparkline views can show patterns of interaction. In the left example here, vertical bands of activity reflect students annotating for particular assignments. In the right example there may be a trace of such temporal alignment but activity is less synchronized and more continuous.

When we’re working in Metabase we can use its handy mini bar charts to contextualize the row-wise sums.

The sparkline-like mini bar chart shows a row’s sum relative to the max for the column. Here we can see that a course with 3,758 notes has about 1/4 the number of notes as the most note-heavy course at the school.

Because these Unicode sparklines are just strings of text in columns of SQL or HTML tables, they can participate in sort operations. In our case we can sort on all columns including ones not shown here: instructor name, course start date, number of students. But the default is to sort by the sparkline column which, because it encodes time, orders courses by the start of annotation activity.

The visual effect is admittedly crude, but it’s a good way to show certain kinds of variation. And it’s nicely portable. A Unicode sparkline looks the same in a psql console, an HTML table, or a tweet. The function will work in any database that can run it, using Python or another of the languages demoed at rosettacode.org. For example, I revisited the Workbench workflow described in A beautiful power tool to scrape, clean, and combine data and added a tab for Lake levels.

When I did that, though, the effect was even cruder than what I’ve been seeing in my own work.

In our scenarios, with longer strings of characters, the differences average out and things align pretty well; the below-the-baseline effect has been annoying but not a deal breaker. But the width variation in this example does feel like a deal breaker.

What if we omit the problematic characters U+2581 (too narrow) and U+2584/U+2588 (below baseline and too narrow)?

There are only 5 buckets into which to quantize numbers, and their heights aren’t evenly distributed. But for the intended purpose — to show patterns of variation — I think it’s sufficient in this case. I tried swapping the 5-bucket method into the function that creates sparklines for our dashboards but I don’t think I’ll switch. The loss of vertical resolution makes our longer sparklines less useful, and the width variation is barely noticeable.

Unicode evolves, of course, so maybe there will someday be a sequence of characters that’s friendlier to sparklines. Maybe there already is? If so please let me know, I’d love to use it.


1 https://blog.jonudell.net/2021/07/21/a-virtuous-cycle-for-analytics/
2 https://blog.jonudell.net/2021/07/24/pl-pgsql-versus-pl-python-heres-why-im-using-both-to-write-postgres-functions/
3 https://blog.jonudell.net/2021/07/27/working-with-postgres-types/
4 https://blog.jonudell.net/2021/08/05/the-tao-of-unicode-sparklines/
5 https://blog.jonudell.net/2021/08/13/pl-python-metaprogramming/
6 https://blog.jonudell.net/2021/08/15/postgres-and-json-finding-document-hotspots-part-1/
7 https://blog.jonudell.net/2021/08/19/postgres-set-returning-functions-that-self-memoize-as-materialized-views/
8 https://blog.jonudell.net/2021/08/21/postgres-functional-style/
9 https://blog.jonudell.net/2021/08/26/working-in-a-hybrid-metabase-postgres-code-base/
10 https://blog.jonudell.net/2021/08/28/working-with-interdependent-postgres-functions-and-materialized-views/
11 https://blog.jonudell.net/2021/09/05/metabase-as-a-lightweight-app-server/
12 https://blog.jonudell.net/2021/09/07/the-postgres-repl/

11 thoughts on “The Tao of Unicode Sparklines

  1. Excellent work!

    That below-baseline problem seems to be highly dependent on the font being used. For example, in the main body of your post, the issue is present with whatever serif font is being used to display it in my browser. However, the problem is not present in the code snippet sections that are using a monospace font. I don’t know if you can control which font family is used in the various applications you’re using. But if you can choose a monospace font, that may solve the issue and get your two extra characters/buckets back.

    1. Good observation. In Metabase, where this approach was incubated, there’s no control over the font, and that’s likewise true in the Workbench example I gave. However there is another context where I do control the font so I’ll give that try. Thanks!

      1. Hmm. The baseline issue does appear, albeit subtly, in the monospace examples above. Also in any monospace font I try in the browser. But not when you paste the sequence into an actual console.

  2. Nice!

    One could also use the pixel-style patterns from the “Braille Patterns” Unicode block (https://unicode-explorer.com/b/2800), which would result in narrower diagrams:

    ⣠⣾⣦⡀

    Unfortunatelly, the spacing between characters is bigger than the spacing between individual Braille dots…

  3. That’s a good thought!

    I’m sure there must be some solution here. Hmm. What about using color intensity instead of block height?

    These are the emoji modifiers from https://unicode-explorer.com/b/1F300

    🏻🏼🏽🏾🏿

    Oh, that won’t work because they only modify other things. Maybe there’s a sequence of base glyphs with a useful color progression though?

  4. When possible, inline SVG is another simple way to render sparklines, displayed with HTML fragments like this:

    See https://www.w3.org/TR/SVG/paths.html#PathData for more details, but basically 0,0 is the top left corner, and the path is a series of commands:
    M x y: move to x, y
    h dx: move horizontally +/- dx
    V y: move vertically to y

    Code here: https://github.com/kubernetes/test-infra/blob/d9e4e78e5efba49af9901a80e119811a51939f44/triage/render.js#L144

    1. > Yes, absolutely. This approach trades that for simplicity and portability. The two are not mutually exclusive of course. (https://twitter.com/judell/status/1423397509712416768)

      This approach originated in the context of Metabase views which, although they are HTML, do not afford any control over the HTML It has also turned out to be quite interesting to be able to use in a console, e.g. psql.

  5. I’m pretty sure that the line height problem here is that the font being used doesn’t have all the glyphs, so the glyphs for the missing codepoints are being pulled from a backup font which does have them. This results in them having different font metrics from the rest.

Leave a Reply