I’ve long been enamored of the sparkline, a graphical device which its inventor Edward Tufte defines thusly:
A sparkline is a small intense, simple, word-sized graphic with typographic resolution. Sparklines mean that graphics are no longer cartoonish special occasions with captions and boxes, but rather sparkline graphics can be everywhere a word or number can be: embedded in a sentence, table, headline, map, spreadsheet, graphic.
Nowadays you can create sparklines in many tools including Excel and Google Sheets, both of which can use the technique to pack a summary of a series of numbers into a single cell. By stacking such cells vertically you can create views that compress vast amounts of information.
In A virtuous cycle for analytics I noted that we often use Metabase to display tables and charts based on extracts from our Postgres warehouse. I really wanted to use sparklines to summarize views of activity over time, but that isn’t yet an option in Metabase.
When Metabase is connected to Postgres, though, you can write Metabase questions that can not only call built-in Postgres functions but can also call user-defined functions. Can such a function accept an array of numbers and return a sparkline for display in the Metabase table viewer? Yes, if you use Unicode characters to represent the variable-height bars of a sparkline.
There’s a page at rosettacode.org devoted to Unicode sparklines based on this sequence of eight characters:
|U+2581||▁||LOWER ONE EIGHTH BLOCK|
|U+2582||▂||LOWER ONE QUARTER BLOCK|
|U+2583||▃||LOWER THREE EIGHTHS BLOCK|
|U+2584||▄||LOWER HALF BLOCK|
|U+2585||▅||LOWER FIVE EIGHTHS BLOCK|
|U+2586||▆||LOWER THREE QUARTERS BLOCK|
|U+2587||▇||LOWER SEVEN EIGHTHS BLOCK|
Notice that 2581, 2582, and 2588 are narrower than the rest. I’ll come back to that at the end.
If you combine them into a string of eight characters you get this result:
Notice that the fourth and eight characters in the sequence drop below the baseline. I’ll come back to that at the end too.
These characters can be used to define eight buckets into which numbers in a series can be quantized. Here are some examples from the rosettacode.org page:
“1 2 3 4 5 6 7 8 7 6 5 4 3 2 1” -> ▁▂▃▄▅▆▇█▇▆▅▄▃▂▁
“1.5, 0.5 3.5, 2.5 5.5, 4.5 7.5, 6.5” -> ▂▁▄▃▆▅█▇
“0, 1, 19, 20” -> ▁▁██
“0, 999, 4000, 4999, 7000, 7999” -> ▁▁▅▅██
To write a Postgres function that would do this, I started with the Python example from rosettacode.org:
bar = '▁▂▃▄▅▆▇█' barcount = len(bar) def sparkline(numbers): mn, mx = min(numbers), max(numbers) extent = mx - mn sparkline = ''.join(bar[min([barcount - 1, int((n - mn) / extent * barcount)])] for n in numbers) return mn, mx, sparkline
While testing it I happened to try an unchanging sequence, [3, 3, 3, 3], which fails with a divide-by-zero error. In order to address that, and to unpack the algorithm a bit for readability, I arrived at this Postgres function:
create function sparkline(numbers bigint) returns text as $$ def bar_index(num, _min, barcount, extent): index = min([barcount - 1, int( (num - _min) / extent * bar_count)]) return index bars = '\u2581\u2582\u2583\u2584\u2585\u2586\u2587\u2588' _min, _max = min(numbers), max(numbers) extent = _max - _min if extent == 0: # avoid divide by zero if all numbers are equal extent = 1 bar_count = len(bars) sparkline = '' for num in numbers: index = bar_index(num, _min, bar_count, extent) sparkline = sparkline + bars[index] return sparkline $$ language plpython3u;
Here’s a psql invocation of the function:
analytics=# select sparkline(array[1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]); sparkline ----------------- ▁▂▃▄▅▆▇█▇▆▅▄▃▂▁ (1 row)
And here’s an example based on actual data:
Each row represents a university course in which students and teachers are annotating the course readings. Each bar represents a week’s worth of activity. Their heights are not comparable from row to row; some courses do a lot of annotating and some not so much; each sparkline reports relative variation from week to week; the sum and weekly max columns report absolute numbers.
This visualization makes it easy to see that annotation was occasional in some courses and continuous in others. And when you scroll, the temporal axis comes alive; try scrolling this view to see what I mean.
We use the same mechanism at three different scales. One set of sparklines reports daily activity for students in courses; another rolls those up to weekly activity for courses at a school; still another rolls all those up to weekly activity for each school in the system.
At the level of individual courses, the per-student sparkline views can show patterns of interaction. In the left example here, vertical bands of activity reflect students annotating for particular assignments. In the right example there may be a trace of such temporal alignment but activity is less synchronized and more continuous.
When we’re working in Metabase we can use its handy mini bar charts to contextualize the row-wise sums.
The sparkline-like mini bar chart shows a row’s sum relative to the max for the column. Here we can see that a course with 3,758 notes has about 1/4 the number of notes as the most note-heavy course at the school.
Because these Unicode sparklines are just strings of text in columns of SQL or HTML tables, they can participate in sort operations. In our case we can sort on all columns including ones not shown here: instructor name, course start date, number of students. But the default is to sort by the sparkline column which, because it encodes time, orders courses by the start of annotation activity.
The visual effect is admittedly crude, but it’s a good way to show certain kinds of variation. And it’s nicely portable. A Unicode sparkline looks the same in a psql console, an HTML table, or a tweet. The function will work in any database that can run it, using Python or another of the languages demoed at rosettacode.org. For example, I revisited the Workbench workflow described in A beautiful power tool to scrape, clean, and combine data and added a tab for Lake levels.
When I did that, though, the effect was even cruder than what I’ve been seeing in my own work.
In our scenarios, with longer strings of characters, the differences average out and things align pretty well; the below-the-baseline effect has been annoying but not a deal breaker. But the width variation in this example does feel like a deal breaker.
What if we omit the problematic characters U+2581 (too narrow) and U+2584/U+2588 (below baseline and too narrow)?
There are only 5 buckets into which to quantize numbers, and their heights aren’t evenly distributed. But for the intended purpose — to show patterns of variation — I think it’s sufficient in this case. I tried swapping the 5-bucket method into the function that creates sparklines for our dashboards but I don’t think I’ll switch. The loss of vertical resolution makes our longer sparklines less useful, and the width variation is barely noticeable.
Unicode evolves, of course, so maybe there will someday be a sequence of characters that’s friendlier to sparklines. Maybe there already is? If so please let me know, I’d love to use it.