News in the fediverse

14 Feb 202324 May 2023 ~ Jon Udell ~ 2 Comments

On a recent episode of You’re Wrong About, Sarah Marshall delivered a crisp summary of how the 24-hour news cycle came to be. This led me to realize that many among us do not remember when news was confined to time slots: the 7 o’clock news, the 11 o’clock news, the morning paper. I think it might be healthy to bring that back in some form. From the excitement I heard in their voices I gather that Sarah Marshall and her co-presenter Blair Braverman feel the same way. When I hear people complain about Mastodon’s lack of breaking news, I think: “Feature, not bug!” Maybe what the 24-hour cycle has been breaking is us.

So when press.coop launched today I was a bit conflicted. On the one hand, I would like to use Mastodon to read mainstream news headlines just as I once used RSS to do. (Full disclosure: press.coop is run as a public service by hello.coop whose founder, Dick Hardt, is a longtime friend.) On the other hand, when reading Mastodon timelines I’m enjoying NOT knowing what’s happening in the world right now.

What if you could exclude news from the home timeline, put it on a list, and go there when — and only when — in a news mindset? That’s a feature of the Fedilab client, I learned from Keith Soltys.

What would it take to implement the same idea in the Mastodon reader I’m developing? It couldn’t be just an extra WHERE condition, could it?

Well, actually, it could.

Problem solved. Now I can read news in Mastodon when and how I want, and never see it otherwise.

If you want that same control, you shouldn’t have to use a particular Android client, or Steampipe client, or any other. There’s a Cambrian explosion of such creatures right now. The more they can share important DNA, the better for all of us.

I hope that the Steampipe plugin for Mastodon, which enables the dashboards I’m building using Steampipe’s own dashboards-as-code system, can provide some useful common DNA. A rule like news only on lists, not timelines, once expressed in SQL, can be used (and thus not reinvented) by any kind of Steampipe (read: Postgres) client: psql (or any Postgres CLI), Metabase or Tableau or any Postgres-compatible BI tool, Python or JavaScript or any programming language. Steampipe is a versatile component. Its mapping from APIs to SQL can, in theory, provide the data layer for any application.

My Steampipe + Mastodon adventure notwithstanding, let’s hold on to that larger idea. The fediverse is our chance to reboot the social web and gain control of our information diets. Since our diets all differ, it ought to be trivial for anyone — in any client — to turn on a rule like news only on lists, not timelines. I’ve shown one way it can be trivial. Please show me others!

Protocols, APIs, and conventions

6 Feb 202324 May 2023 ~ Jon Udell ~ 2 Comments

The Fediverse is wired together by protocols like ActivityPub and WebFinger which, as yet, I know very little about. That’s because the Steampipe plugin, which supports the dashboards I’ve been building and describing in this series, doesn’t require me to understand or use those protocols.

It does, however, require me to understand and use the Mastodon API. Mostly I use that API by way of the Go SDK for Mastodon (thanks, mattn!), sometimes I make REST calls directly. Either way, my read-only dashboards use a fairly small subset of the Mastodon API. The full API is quite broad and deep; it enables API clients to read from and write to Mastodon servers in all sorts of ways. Here are the chapters of the Mastodon API book: apps, accounts, admin, instance, search, statuses, timelines, notifications, oembed. These chapters define what’s common to all Mastodon clients, including web apps, phone apps, native OS apps, and Steampipe dashboards.

So far I’ve ignored protocol-enabled interop in order to focus on API-enabled interop. I’m aware that the Fediverse includes much more than just Mastodon. I intend to explore BookWrym, Friendica, Funkwhale, Lemmy, takahe, PeerTube, Pixelfed, PeerTube, and others in due time. But right now the Mastodon ecosystem is plenty to try to wrap my head around.

For example, there’s a new web client for Mastodon: elk.zone. With the recent addition of support for lists, it has become my favorite way to interact in Mastodon space. So naturally I wanted to be able to click through from Steampipe dashboards to Elk, and use it as an alternative to the batteries-included Mastodon web app.

It turned out to be easy to enable that integration. Not thanks to ActivityPub, and not even thanks to the API. It works thanks to a third level of interop at play: common patterns for account URLs and toot URLs.

Here’s the account URL for Ward Cunningham who hangs his shingle at mastodon.radio: https://mastodon.radio/@k9ox. But as we saw in instance-qualified Mastodon URLs, if you visit that URL directly — and if it’s not your home server — you can’t follow Ward there, or add him to a list. You’ll need to copy that URL, paste it into your home server’s search box, run the search, and arrive at an instance-qualified URL where you can follow him or add him to a list: https://mastodon.social/@k9ox. If you’re home is fosstodon.org this would instead be https://fosstodon.org/@k9ox.

Similarly here is one of Ward’s toots at mastodon.radio: https://mastodon.radio/@k9ox/109802968820955379. If you want to reply or boost or favorite, you can’t do it there. The URL you need is again one that routes through your home server: https://mastodon.social/@k9ox/109802969999396562. Note that the ids for the same toot differ! That difference surprised me and some others, and is a topic for another episode. Here I’ll just note that these two patterns govern how we interact when crossing server boundaries in Mastodon space using the stock web client.

When I started using Elk, another layer of pattern emerged. Here are those same URLs in Elk:

https://elk.zone/mastodon.social/@k9ox

https://elk.zone/mastodon.social/@k9ox/109802969999396562

As it turns out, I just needed to make two of the Steampipe plugin’s transform functions prepend elk.zone to the instance-qualified URLs, then make such prefixing a configuration option. Now when I visit Mastodon links from dashboards, to reply or boost or follow or enlist, I land in the Elk experience that I prefer.

ActivityPub and WebFinger are formal standards. I would describe the Mastodon API as a de-facto standard. But this prefixing maneuver is just a convention. It’s not guaranteed to work with another web client, and not even guaranteed to work across all URLs presented by the stock Mastodon client. That’s OK by me. Conventions are incredibly useful. The Twitter hashtag is just a convention, after all, inspired in turn by an IRC convention.

We’re in one of those Internet moments of rapid innovation, when new conventions can unlock emergent behaviors. It hadn’t even occurred to me that the Steampipe dashboards could support Elk. A few hours after I thought they might, they did. I’ve seen this kind of thing before, perhaps most notably when the blogosophere adopted <link rel="alternate" type="application/rss+xml" href="{feedUrl}"> to enable browsers to auto-discover RSS feeds. That happened about 20 years ago, and quite suddenly when a handful of leading blogging tools adopted the convention in a matter of days. It was a lot of fun to live through that era. If you missed it, enjoy the sequel that’s unfolding now!

Mapping the wider Fediverse

2 Feb 202324 May 2023 ~ Jon Udell ~ 1 Comment

I began this journey convinced that Steampipe could help the Fediverse evolve, but not sure exactly how. My first thought was to use Steampipe’s API-wrangling superpower to study patterns of communication (and conflict) across the Fediverse. But as one of many Twitter escapees last November, I soon realized that the network I was joining reflected a culture that had been humming along nicely for six years and didn’t particularly want to be the object of sociological study.

As I argued in Autonomy, packet size, friction, fanout, and velocity, Mastodon bakes in certain kinds of friction for reasons. You’ve likely heard about a default unfriendliness to search, which is both a technical setting and a cultural choice that privileges the experience of current flow over the mining of past flow. Even more fundamentally, the ID of a toot not only differs from server to server but also obfuscates the toot’s date, another technical/cultural choice that means you can’t randomly access history by date. None of these frictions is insurmountable. They will be overcome for purposes good and bad. I hope and expect that communities will be able to choose their desired amounts and kinds of friction while still interoperating with others. But for my project it seemed that trying to survey the wider Fediverse wasn’t the right place to start.

So instead I began to explore a different way to read my home timeline. The dashboards that I’ve built and described in this series have become, for me at least, an effective way to scan recent Mastodon flow, then click into the stock client to post, reply, or boost. After overcoming a few obstacles, things are starting to feel like the Bloomberg terminal for Mastodon that I envision.

One of those obstacles was the awkward copy/paste/search of foreign toot URLs that was required in order to interact with them. That’s now overcome by Instance-qualified Mastodon URLs. Another obstacle was the difficulty of curating and effectively reading topical lists of people. The strategies described in Lists and people on Mastodon and Working with Mastodon lists have improved matters nicely. And relationship graphs turned out to be a more useful alternate view of the current flow than I had expected.

I think I’ve proven that a set of Steampipe dashboards, layered on a plugin that maps the Mastodon API to tables that the dashboards query, can improve the ability to absorb and react to Mastodon flow. An unproven corollary: Steampipe’s dashboards-as-code system is only one of potentially many clients of the Mastodon plugin. Any dashboarding system or web app or native app could tap into the same query capability to deliver still another way to experience the flow. But that’s a future episode.

Meanwhile, with a decent reading experience in place, this seemed like a good time to circle back to the question of surveying the wider Fediverse. To begin enabling that I added a couple of new tables to the plugin: mastodon_peer and mastodon_domain_block.

Here’s a query that uses mastodon_peer.

with data as (
  select
    'https://' || server as server
  from
    mastodon_toot
  where
    timeline = 'home'
  limit 100
),
servers as (
  select
    server,
    count(*) as occurrences
  from
    data
  group by
    server
)
select
  s.server,
  s.occurrences,
  count(p.peer) as peers
from
  servers s
join
  mastodon_peer p
on
  s.server = p.server
group by
  s.server,
  s.occurrences
order by
  peers desc

In Engish: gather the most recent 100 toots in my home timeline, count the occurrences of each origin server, then ask each origin server how many other servers it talks to. Unsurprisingly my home server, mastodon.social, occurs most often. And because it’s the marquee Mastodon server it has the most peers.

+----------------------------------+-------------+-------+
| server                           | occurrences | peers |
+----------------------------------+-------------+-------+
| https://mastodon.social          | 11          | 49732 |
| https://fosstodon.org            | 1           | 33973 |
| https://octodon.social           | 1           | 29983 |
| https://infosec.exchange         | 2           | 26833 |
| https://indieweb.social          | 9           | 26279 |
| https://hachyderm.io             | 3           | 19911 |
| https://social.treehouse.systems | 3           | 18110 |
| https://journa.host              | 1           | 18021 |
| https://nerdculture.de           | 9           | 17984 |
| https://werd.social              | 2           | 13792 |
| https://dan.mastohon.com         | 2           | 13351 |
| https://masto.nyc                | 1           | 10917 |
| https://mastodon.archive.org     | 1           | 9582  |
| https://social.fossdle.org       | 1           | 8343  |
| https://devdilettante.com        | 12          | 6898  |
+----------------------------------+-------------+-------+

Here’s a query that uses mastodon_domain_block.

with data as (
  select
    'https://' || server as server
  from
    mastodon_toot
  where
    timeline = 'home'
  limit 100
),
servers as (
  select
    server,
    count(*) as occurrences
  from
    data
  group by
    server
)
select
  s.server,
  s.occurrences,
  count(d.domain) as "blocked domains"
from
  servers s
join
  mastodon_domain_block d
on
  s.server = d.server
group by
  s.server,
  s.occurrences
order by
  "blocked domains" desc

This one says: Again gather the origin servers in my recent home timeline, but this time ask each one how many other servers it blocks. Here we see that octodon.social, which happened to show up in my timeline when I ran the query, blocks a lot more servers than mastodon.social does.

+--------------------------+-------------+-----------------+
| server                   | occurrences | blocked domains |
+--------------------------+-------------+-----------------+
| https://octodon.social   | 1           | 510             |
| https://mastodon.social  | 8           | 181             |
| https://hachyderm.io     | 4           | 125             |
| https://infosec.exchange | 4           | 66              |
| https://nerdculture.de   | 1           | 36              |
| https://indieweb.social  | 4           | 23              |
+--------------------------+-------------+-----------------+

One could, and maybe at some point I will, comprehensively acquire and store this data. But meanwhile, how might it enhance the experience of reading recent Mastodon flow? Here’s what I’ve got so far.

We’ve already seen the first table which answers the question: “How many servers does each of the origin servers in my timeline block?” The second table answers a different question: “Which servers are most often blocked by the origin servers in my timeline?”

The blocking server list column shows an interesting mix of consensus and variation, and I think that will be something to explore comprehensively across the Fediverse. But for now I like how this view contextualizes what’s in my home timeline. Like the relationship graphs, it’s highly dynamic because my immediate network neighborhood is changing all the time. Whatever the current set of servers happens to be, though, I now have some clues about how connected each of those servers is and how aggressively each one is blocking others. This feels like a good first step toward mapping the wider Fediverse.

Verification for Mastodon

30 Jan 20235 Apr 2023 ~ Jon Udell ~ Leave a comment

This post attests to my ownership of this Mastodon account.

Images considered harmful (sometimes)

26 Jan 202324 May 2023 ~ Jon Udell ~ Leave a comment

The Mastodon dashboards I’ve been developing and describing in this series are backed by a Steampipe plugin that translates SQL queries to Mastodon API calls. Like all Steampipe plugins you can use this one to run those queries in all sorts of ways: from psql or another Postgres CLI (perhaps via cron, perhaps in a CI/CD pipeline); from Metabase or Grafana or any Postgres-compatible BI tool; from Python or JavaScript or any programming language. The Steampipe core is a versatile software component that you can plug into just about any environment.

There’s also, of course, Steampipe’s dashboards as code approach which powers the alternate Mastodon UX that I’ve been exploring in this series. You can think of this dashboarding tool as a primitive web browser with a freakish talent for querying Postgres and weaving SQL results into widgets such as infocards, input controls, charts, tables, and relationship graphs. You compose widgets using HCL (Hashicorp Configuration Language) instead of HTML, arrange them using basic layout syntax, and view them by connecting your browser to the local Steampipe dashboard server or to cloud.steampipe.io.

The alternate Mastodon UX built this way was aptly described (thanks again Greg Wilson!) as A Bloomberg terminal for Mastodon. As a reminder, the stock Mastodon web client looks like this.

And the pages served by the Mastodon dashboards look like this.

I can scan the latter view far more effectively.

Now, I’ve been trying out a bunch of alternate Mastodon clients lately. I love Ivory on my phone, and Elk in the browser, and I use them when I want an experience that feels social-media-ish. But I don’t use Ivory or Elk or Toot! or the stock Mastodon web clients (browser, phone) when I want an experience that feels RSS-reader-ish.

For some information landscapes I want to unfold a big map, spread it out, and scan for points of interest. Social media is that kind of landscape, RSS readers were the first way I surveyed it effectively, and these Mastodon dashboards are becoming my new way.

When I compare those two screenshots though, it’s not just the density of the latter that I notice, but also the absence of pictures. At first I fretted about that. Could the dashboard pages render full-strength HTML? Perhaps they will, and there are a few different ways that could happen, but meanwhile I’ve come to embrace the text-only constraint. When I’ve got my map unfolded, and I’m scanning my home timeline or my lists for things to click into, images can be a distraction. I wouldn’t have chosen to omit them, but I find that their absence enables me to focus very effectively on who is speaking, and what they are saying or boosting.

There is also, of course, the graphical view afforded by relationship graphs. These feel very map-like in the way they reveal clusters of people interacting with one another. I’m finding them more useful than anticipated.

But the text that I read in these dashboards is image-free for now. And I think it’s having a calming effect. There are plenty of times when I want images, don’t get me wrong, and that’s partly why I use a mix of conventional Mastodon clients. But there are times when I want to dial down the clutter, just like there are times when I switch my phone to monochrome. Two things can be true: The pictures we share with one another are a source of joy, and they are sensory overload. I’ll make them optional here when I can, and would like to have the option to use all social media interfaces in text mode.

Working with Mastodon lists

21 Jan 202324 May 2023 ~ Jon Udell ~ 1 Comment

Since the early days of the blogosphere I have cherished the ability to view the world through the eyes of people more qualified than me to understand and explain what happens in particular domains. Although Twitter lists were a great way to collect people who provide those perspectives, I made little use of them. Prompted by Martin Fowler’s frustration with lists I described my first steps toward reviving the technique in Lists and people on Mastodon.

First I encapsulated the relevant APIs in a pair of tables provided by the Steampipe plugin for Mastodon: mastodon_list and mastodon_list_account. Then I used those tables to enhance the Mastodon dashboard in a couple of ways. The followers and following tabs now show me who is or isn’t on a list.

And I can use the list tab to read recent posts from people on each list.

With these basic building blocks in place I want to be more intentional about curating these lists. To that end I thought I’d share the lists I’ve built so far, and invite suggestions. You can export your lists from the Mastodon web client using Preferences -> Data export -> Lists. The resulting CSV file has two columns: the name of a list and the Mastodon handle for a person.

list	person
Library	liza@post.lurk.org
Library	alexwade@fosstodon.org
Library	jdiffin@glammr.us

Here’s the export I really wanted.

list	person	server	url	note
Library	liza	post.lurk.org	https://mastodon.social/@liza	Technologist in Residence at the Harvard Library Innovation Lab. Interactive fiction, generative art, Democratic political tech. Amateur scholar of fringe utopian sci-fi. I will pet your dog.
Library	alexwade	fosstodon.org	https://mastodon.social/@alexwade	Librarian, open science advocate, geek. VP Data Products, @digitalscience (Seattle, WA) Formerly: AllenAI (AI2), CZI, Amazon, Microsoft Research, UW
Library	jdiffin	glammr.us	https://mastodon.social/@jdiffin	Head of Library Technology Services Section, NLM. Opinions are my own.

It’s easy to create that list using the Steampipe plugin for Mastodon.

steampipe query “select title as list, username, server, instance_qualified_account_url, note from mastodon_list l join mastodon_list_account a on l.id = a.list_id order by list, username” –output csv > mastodon_lists.csv

(I’m using instance-qualified URLs relative to my home server, if your home is elsewhere than mastodon.social you’ll want to adjust the links accordingly.)

I’ve uploaded the output of that query to a Google sheet. The most recent addition to my climate list is Peter Gleick who was quoted yesterday in Farhad Manhjoo’s What Will ‘Weather Whiplash’ Mean for California? I’d been looking for an expert perspective on California water policy, so I checked to see if Peter is on Mastodon, found that he is, and added him to my climate list. Here’s an example of the kind of perspective I’m looking for.

California reservoirs have captured enough water from the recent storms to supply all urban water use in the state for a year. Not counting the massive snowpack. Those claims that “all the water is being wasted flowing to the ocean” is nonsense. — Mastodon link

Of course it’s a complex issue, and I don’t think I’m wrong to be very interested in strategies for capturing more rainfall and using it to recharge aquifers. But this was a useful reality check.

Who are the experts you follow in various domains? Can I cherrypick from your lists? And can we imagine a way of remixing and sharing these lists? I’m looking at you, Lucas Gonze, with fond memories of how Webjay did that for MP3 playlists. Can we do that for Mastodon lists too?

Of course the attention economy is threatened by the Fediverse

19 Jan 202319 Jan 2023 ~ Jon Udell ~ 13 Comments

Megan McArdle says this week, in the Washington Post, that “Twitter might be replaced, but not by Mastodon or other imitators.” I’m not linking to the article, you can easily find it, but that title is all we need for my purpose here, along with this bit of context: she has 93K followers on Twitter.

Nobody wants to walk away from that kind of audience. Well, almost nobody. Sam Harris’ recent Twitter exit is a rare example of someone concluding that a large follower count is a net negative. If I were in his shoes I’m not sure I’d be able to do the same. When my own audience was at its peak — at BYTE during the dawn of the Internet, then at InfoWorld in the early years of the blogosphere — I could press the Publish button on my blog and watch in realtime as the responses rolled in on waves of dopamine. It’s addictive, there’s never enough, you’re always looking for the next hit.

When Twitter started, that momentum carried forward for a while. I never racked up a huge follower count — it maxed out just shy of 6K — but most of those people followed me early on, thanks to the the ad-supported publications that had brought me to their attention. My Twitter following reached a plateau years ago. Did I wish for 100K followers? Sure, I’d be lying to pretend otherwise. But gradually I came to see that there was a sweet spot, somewhere between (let’s say) 200 and 15,000 followers, where it was possible to enjoy the kinds of pleasant and stimulating interaction that I’d first experienced in web forums and the blogosophere.

Until it wasn’t. Like a frog in slowly boiling water, I failed to notice how the Twitter experience degraded over time. Fewer and fewer of my 6K followers corresponded regularly, and my social graph there became stagnant. For me the Mastodon reboot has been a delightful replay of the early blogosphere: new acquaintances, collegial discussion, positive energy.

If you occupy a privileged position in the attention economy, as Megan McArdle does now, and as I once did in a more limited way, then no, you won’t see Mastodon as a viable replacement for Twitter. If I were still a quasi-famous columnist I probably wouldn’t either. But I’m no longer employed in the attention economy. I just want to hang out online with people whose words and pictures and ideas intrigue and inspire and delight me, and who might feel similarly about my words and pictures and ideas. There are thousands of such people in the world, not millions. We want to congregate in different online spaces for different reasons. Now we can and I couldn’t be happier. When people say it can’t work, consider why, and who benefits from it not working.

Here’s a graph of the Fediverse as it appears from my perspective right now.

It looks and feels healthy and it’s working just great. I don’t want us to replace Twitter, or imitate it. I want The Internet Transition that I hope is underway.

Mastodon relationship graphs

16 Jan 202324 May 2023 ~ Jon Udell ~ 2 Comments

The new release of Steampipe is all about relationship graphs. Our blog post shows how these graphs provide contextual awareness for devops and security pros who can now see all the resources related to an EC2 instance, or determine at a glance whether the permissions related to an IAM role are properly scoped. As always, developers can explore and remix the code that builds these graphs, and adapt the idioms for their own purposes in any data domain.

These relationship graphs are driven by SQL queries that define nodes and edges. Such queries can use any column of any table provided by any Steampipe plugin to form nodes, and then edges between nodes. If you want to see connections among the people and objects represented by diverse APIs, you can now use SQL idioms to graph them. The only limit is your imagination.

Naturally I imagined graphing Mastodon relationships. So far I’ve built two graphs that visualize my home timeline. Here’s the first one.

Here we’re looking at the most recent 50 boosts (the Mastodon version of retweet) in my homeline. This is the query to find them.

select
  *
from
  mastodon_toot
where
  timeline = 'home'
  and reblog_server is not null
  limit  50

If we focus on Brian Marick we can see that:

Brian belongs to mastdn.social
Brian boosted a post by Tim Bray
Tim belongs to hachyderm.io

So this graph shows people on a selected server boosting people on other servers. In this case mastdn.social is the selected server, but we can refocus the graph on any other server that’s sending boosts.

The second graph zooms out to show the web of boost relationships among servers. If anyone on infosec.exchange boosts anyone on mastodon.world, there’s an edge connecting the two nodes. Although it’s not happening anywhere in this graph, the arrow can point both ways and would if anyone on mastodon.world were also boosting anyone on infosec.exchange.

Let’s build up the first graph step by step.

Step 1: Identify the selected server

Here’s the definition of the node that represents the selected server.

node {
  category = category.selected_server
  args = [ self.input.server.value ]
  sql = <<EOQ
    select
      server as id,
      server as title,
      jsonb_build_object(
        'server', server
      ) as properties
    from
      mastodon_boosts()
    where
      server = $1
  EOQ
}

Per the documentation, a node’s query must at least select a column aliased as id. Here it’s the server column in a row returned by the above query. I’ve packaged that query into a SQL function, mastodon_boosts, to hide details (timeline = 'home' reblog_server is not null limit 50) and make it easier to focus on what’s special about each node. In this case the special quality is the server column that gives the node its identity matches the selected server.

If the graph block includes only this node, and mastdn.social is the selected server, here is the rendering. Not much to see here yet!

The node defines a bag of properties that can be any of the columns returned by the underlying query; these appear when you hover the node. The node also refers to a category that governs the node’s icon, color, and link. Here’s the category for the selected server.

category "selected_server" {
  color = "darkgreen"
  icon = "server"
  href  = "https://{{.properties.'server'}}"
}

Step 2: Identify boosted servers

Now we’ll add boosted servers. This node uses the same set of records: the 50 most recent boosts in my feed. Again it finds just those whose server column matches the selected server. But the id is now the reblog_server which is the target, instead of the origin, of boosts from the selected server.

node {
  category = category.boosted_server
  args = [ self.input.server.value ]
  sql = <<EOQ
    select
      reblog_server as id,
      reblog_server as title
    from
      mastodon_boosts()
    where
      server = $1
    EOQ
}

Here’s the graph with both selected_server and boosted_server nodes. We’ve used another category to differentiate the boosted nodes.

There’s only one selected server but it can send boosts to more than one boosted server. The default rendering folds them into one node but you can click to unfold and see all of them.

Step 3: Identify people who boost others

Where are the people? Let’s add them next, starting with the people who are sending boosts.

node {
  category = category.person
  args = [ self.input.server.value ]
  sql = <<EOQ
    select
      username as id,
      display_name as title,
	    jsonb_build_object(
        'instance_qualified_account_url', instance_qualified_account_url
      ) as properties
    from
      mastodon_boosts()
    where
     server = $1
    EOQ
  }

The username column gives the node its identity. Note also the property instance_qualified_account_url. That’s the synthetic column we added to the Mastodon plugin last time to ensure that links to people and toots will work properly in the Mastodon client. Because it’s included in a property here, and because category.person refers to that property, links representing people in the graph will resolve properly.

Step 4: Identify people who are boosted

This node takes its identify from the reblog_username column, and uses the synthetic column instance_qualified_reblog_url to provide the link.

node {
  category = category.boosted_person
  args = [ self.input.server.value ]
  sql = <<EOQ
    select
      reblog_username as id,
      reblog_username as title,
      jsonb_build_object(
        'instance_qualified_reblog_url', instance_qualified_reblog_url
      ) as properties
    from
      mastodon_boosts()
    where
      server = $1
  EOQ
}

Step 5: Connect boosters on the selected server to that server

So far we’ve seen only nodes, whose queries minimally return the id property. An edge connects nodes by way of a query that minimally returns columns aliased to from_id and to_id.

edge {
  sql = <<EOQ
    select
      username as from_id,
      server as to_id,
      'belongs to' as title
    from
      mastodon_boosts()
  EOQ
}

You’ll also want to provide a title to label the edge. Here this edge occurs twice to represent “John Mashey belongs to mstdn.social” and “Brian Marick belongs to mstdn.social.”

Step 6: Connect people on boosted servers to their servers

This edge works the same way, but captures the relationship between boosted people and their servers.

edge {
  args = [ self.input.server.value ]
  sql = <<EOQ
    select
      reblog_username as from_id,
      reblog_server as to_id,
      'belongs to' as title
    from
      mastodon_boosts()
    where
      server = $1
  EOQ
}

Step 7: Connect boosters to the people they boost

Finally we add an edge to connect boosters to the people they boost.

edge {
  category = category.boost
  args = [ self.input.server.value ]
  sql = <<EOQ
    select
      username as from_id,
      reblog_username as to_id,
      'boosts' as title,
      jsonb_build_object(
        'reblog_username', reblog_username,
        'reblog_server', reblog_server,
        'content', reblog ->> 'content'
      ) as properties
    from
      mastodon_boosts()
    where
      server = $1
  EOQ
}

And now we’ve completed the first graph shown above.

Graphing GitHub relationships

You can use this grammar of nodes and edges to describe relationships in any domain. Here’s a graph that looks across all the Steampipe-related repos and shows recently-updated PRs from external contributors.

And here’s one that uses any Steampipe plugin to show recently-updated pull requests for a selected repo.

These two views share a common SQL query and serve complementary purposes. The table is handy for sorting by date or author, the graph highlights one-to-many relationships.

Lifting the burden of context assembly

In What TimeDance got right I mourned the passing of a tool for scheduling meetings that had excelled at bringing together the messages and documents related to a meeting. I called this “context assembly” — a term I’d picked up from Jack Ozzie, cofounder of Groove, another collaboration tool whose passing I mourn. Context assembly is hard work. Too often the burden falls on people who only need to use that context and would rather not spend time and effort creating it.

We’ve seen how SQL can unify access to APIs. Now it can also help us see relationships among the data we extract from those APIs.

Instance-qualified Mastodon URLs

31 Dec 202224 May 2023 ~ Jon Udell ~ 1 Comment

In Lists and people on Mastodon I showed how I added a list column to the following tab of the Mastodon browser I’m building. That was a step in the direction of easier and more powerful list management. It enables me to see whether the people I follow are assigned to lists, and to consider who should be on a list (or perhaps on a different list).

Today, as I began to use that new affordance in earnest, I discovered a new challenge. In order to assign someone to a list, or change a list assignment, I clicked the link in the account_url column to open that person’s profile in the Mastodon web app. That was fine for accounts on my home server, mastodon.social. An account URL like Shelley Powers’ https://mastodon.social/@burningbird brings me to Shelley’s profile on my home server where the list manager is available.

But if I’m following someone elsewhere, like Ward Cunningham at https://mastodon.radio/@k9ox, the account URL brings me to Ward’s profile on that server where the list manager isn’t available. In order to assign Ward to a list I had to capture his account URL, paste it into the search box in my home server’s web app, and then click the resulting link: https://mastodon.social/@k9ox@mastodon.radio.

That got old real fast, so I adjusted the following tab to display the latter flavor of URL which I’ll call an instance-qualified URL.

Steampipe provides a few ways to make that adjustment. As a user of the dashboard, you can use Postgres’ regular expression functions to do the transformation in the SQL query that drives the view. But you’d rather not have to. It’s much nicer if the plugin does that for you, so the SQL can just refer to a column called instance_qualified_url.

I chose the latter approach. As the author of a Steampipe plugin you want to make life as easy as possible for users of the plugin. When you’re the author of both the plugin and the dashboard, as I am in this case, you can enjoy a nice virtuous cycle. As the dashboard evolves you discover ways to improve the plugin, which leads to more use of the dashboard, which suggests further opportunities to improve the plugin. I have been greatly enjoying the coevolution of these two components!

Adding a new column to a Steampipe table

To make the change, I extended the structure that defines the columns of the tables mapped from Mastodon Account API. A Steampipe plugin defines columns using a list of structs like this.

...,
{
	Name:        "url",
	Type:        proto.ColumnType_STRING,
	Description: "URL for the account.",
},
...,

That struct says: “When the name of a top-level field in the API response is url, tell Steampipe to make a database column with that name and with the Postgres type text.”

You can also transform values in API responses to synthesize new columns that don’t appear in API responses. Here’s the struct I added for this case.

...,
{	
	Name:        "instance_qualified_account_url",
	Type:        proto.ColumnType_STRING,
	Description: "Account URL prefixed with my instance.",
	Transform:   transform.FromValue().Transform(instanceQualifiedAccountUrl),
},
...

That one says: “Send the API response to the transform function instanceQualifiedAccountUrl, and use its result as the value of the column.

Here’s the function.

func instanceQualifiedAccountUrl(ctx context.Context, input *transform.TransformData) (interface{}, error) {
	url := input.Value.(*mastodon.Status).Account.URL
	qualifiedUrl := qualifiedUrl(ctx, url)
	return qualifiedUrl, nil
}

It delegates the real work to another function.

func qualifiedUrl(ctx context.Context, url string) string {
	plugin.Logger(ctx).Debug("instanceQualifiedUrl", "server", homeServer, "url", url)
	re := regexp.MustCompile(`https://([^/]+)/@(.+)`)
	matches := re.FindStringSubmatch(url)
	if len(matches) == 0 {
		return url
	}
	person := matches[1]
	server := matches[2]
	qualifiedUrl := fmt.Sprintf("%s/@%s@%s", homeServer, server, person)
	plugin.Logger(ctx).Debug("instanceQualifiedUrl", "qualifiedUrl", qualifiedUrl)
	schemelessHomeServer := strings.ReplaceAll(homeServer, "https://", "")
	qualifiedUrl = strings.ReplaceAll(qualifiedUrl, "@"+schemelessHomeServer, "")
	plugin.Logger(ctx).Debug("qualifiedUrl", "qualifiedUrl", qualifiedUrl)
	return qualifiedUrl
}

Why? Two different sets of column definitions need the same transformation. instanceQualifiedAccountUrl works with responses from the Account API. But account URLs also appear in the Status API that drives timeline views. Those use a different transform function, instanceQualifiedStatusUrl, to do the same transformation for a different API response.

From account URLs to status URLs

The instanceQualifiedAccountUrl column solved the original problem. I was able to remove my plugin-author hat, put on my dashboard-author hat, and refer to account urls as instance-qualified URLs in all the tabs that display them. Any such link now leads to a profile that I view through the lens of mastodon.social and that enables me to use the web app’s list manager directly, without the cumbersome copy/paste/search procedure.

My happy dance didn’t last long, though. Newly sensitized to that copy/paste/search friction, I realized it was still happening when I try to reply to items that appear in a timeline view. Here is a recent example: https://techpolicy.social/@mnot/109610641523489182.

That’s the URL displayed in the dashboard. When I click it I land on Mark’s server and can view the item, but if I try to reply I’m confronted with the dreaded copy/paste/search operation.

No problem! I’ll use a similar transform! Not so fast. I can form an URL like https://mastodon.social/@mnot@techpolicy.social/109610641523489182 but it doesn’t go anywhere.

If I do the copy/paste/search operation, I land on a similar-but-different URL: https://mastodon.social/@mnot@techpolicy.social/109610641692667630. It has the same structure but a different toot ID. This URL is also the one that appears in the web app’s home timeline, which is why I can reply directly from that view.

I’m out of my depth here so I’ll just end with an appeal for help. It makes sense that a home server will assign its own id to an item fetched from a foreign server, and that the web app will use that id. But I’m not seeing a way to aquire that id directly from the API. I suspect it’s possible to acquire it by way of search, but doing that for every item in a timeline will quickly exhaust the tight budget for API requests (just 300 every 5 minutes).

So, Lazy Mastodon, am I just stuck here or is there a way to transform foreign status URLs into instance-relative status URLs?

Update: Solved!

After chatting with Jari Pennanen I took another look and realized the needed ID was available in the API response after all, I just wasn’t using it (facepalm). And in fact there are two flavors of the ID — one for original toots, another for boosts. Columns for both cases are added here and the tweak to make the dashboard use them here.

Here is the result.

Thanks for being my rubber duck, Jari! The instance-qualified toot and reblog URLs make this dashboard massively more useful.

How many people in my Mastodon feed also tweeted today?

29 Dec 202224 May 2023 ~ Jon Udell ~ 1 Comment

I lasted tweeted on Dec 22. (It was, unsurprisingly, a link to a blog post about Mastodon.) Today I wondered what percentage of the people who appear in my Mastodon timeline today also appeared on Twitter today.

To start, I wrote this query which tries to match Twitter and Mastodon usernames. When it finds a match, it reports the day on which that person last tweeted.

with mastodon as (
  select
    substring(username from 1 for 15) as username, -- twitter names are max 15 chars
    'from:' || substring(username from 1 for 15) as query -- we will query twitter using, e.g., 'from:judell'
  from
    mastodon_toot
  where
    timeline = 'home'
  limit
    500
)
select
  m.username as mastodon_person,
  t.author->>'username' as twitter_person,
  max(to_char(t.created_at, 'YYYY-MM-DD')) as last_tweet_day
from 
  mastodon m
left join
  twitter_search_recent t -- see https://hub.steampipe.io/plugins/turbot/twitter/tables/twitter_search_recent
on
  t.query = m.query
group by
  mastodon_person,
  twitter_person
order by
  last_tweet_day desc

This is my favorite kind of Steampipe query: two different APIs, each represented as a Postgres table, combined with a SQL JOIN.

The result looks like this, with nulls for failed matches.

+-----------------+-----------------+----------------+
| mastodon_person | twitter_person  | last_tweet_day |
+-----------------+-----------------+----------------+
| AlanSill        | null            | null           |
| Colarusso       | null            | null           |
| ...                                                |
| williamgunn     | null            | null           |
| xian            | null            | null           |
| ...                                                |
| futurebird      | futurebird      | 2022-12-29     |
| glynmoody       | glynmoody       | 2022-12-29     |
| ...                                                |
| khinsen         | khinsen         | 2022-12-23     |
| blaine          | blaine          | 2022-12-23     |
+-----------------+-----------------+----------------+

Next I created a table from the above query.

create table public.mastodon_twitter as 
  -- sql as above

And then ran this query.

select
  last_tweet_day,
  count(*)
from
  mastodon_twitter
where
  last_tweet_day is not null
group by
  last_tweet_day
order by
  last_tweet_day desc

Here’s the result.

+----------------+-------+
| last_tweet_day | count |
+----------------+-------+
| 2022-12-29     | 36    |
| 2022-12-28     | 6     |
| 2022-12-27     | 1     |
| 2022-12-26     | 1     |
| 2022-12-25     | 2     |
| 2022-12-23     | 2     |
+----------------+-------+

The 500 toots represented here were created by 93 people who tooted today.

select count(*) from mastodon_twitter

+-------+
| count |
+-------+
| 93    |
+-------+

Of those 93 people, 48 have matching usernames.

select count(*) from mastodon_twitter where last_tweet_day is not null

+-------+
| count |
+-------+
| 48    |
+-------+

Of the 48 with matching usernames, 36 also tweeted today.

So there’s my answer: 75% of the people who appeared in my Mastodon home timeline (when I sampled it just now) also appeared on Twitter today.

My belated introduction to Super Mario Brothers

26 Dec 202226 Dec 2022 ~ Jon Udell ~ 6 Comments

On a hike today my friend noticed a clump of Amanita Muscaria and took away this glorious specimen. As we headed down the trail he said: “I can never see one of these without hearing the music from Super Mario Brothers — beep, boop.” He went on to explain that the spotted red mushroom was a coveted power-up that made Mario grow large.

I knew nothing about this. I have never played Super Mario Brothers, or any other iconic game, or really any computer-based game at all, except briefly in the early 1990s when I played Rogue during an office craze that lasted a few weeks, and around the same time played Putt-Putt with our young kids.

This seems unusual for someone like me. I have spent my adult life deeply engrossed in computer-based activity, and am often described to others, by friends and family, as a “computer guy.” That makes me a geek and/or nerd by definition. But I’ve never been comfortable with either of those terms.

Over the years I’ve known a great many others in the geek/nerd tribe who self-identify as gamers. It always puzzles me. I’m perfectly happy to spend long hours in front of a screen reading or writing or coding. But games? I simply lack that gene. It baffles me how other members of the tribe — especially adults who like me already put in long hours in front of screens — could then pile up more screen time playing games, when they could be outside hiking, or cycling, or gardening, or doing really anything else at all.

Perhaps this explains my reluctance to self-identify as a member of the geek/nerd tribe. If I really belonged I’d be gamer. Since I’m not I must be an impostor. I wonder if there’s a word — in some other language than English probably — for when you share certain core attributes with a tribe but not others?

Lists and people on Mastodon

22 Dec 202224 May 2023 ~ Jon Udell ~ 1 Comment

I hadn’t thought to use Mastodon lists until I read the Frustration with lists chapter of Martin Fowler’s Exploring Mastodon, in which he writes:

I like lists because they allow me to divide up my timeline to topics I want to read about at different times. They are frustrating because the tools to manage them in Twitter are very limited, so it’s more hassle to set up the kind of environment I’d like. Mastodon also has lists, sadly its current management tools are equally bad.

This seemed like a good challenge for Steampipe. To tackle it, I first needed to add some new tables to the plugin to encapsulate the list APIs: mastodon_list and mastodon_list_account. I’ll save that story for another time. Here I’ll just show that together they enable queries like this.

select
  l.title as list,
  array_agg(a.username order by a.username) as people
from
  mastodon_list l
join
  mastodon_list_account a
on
  l.id = a.list_id
group by
  l.title

+--------------+--------------------------------------+
| list         | people                               |               
+--------------+--------------------------------------+
| Academic     | ____, ______, ____, ___              |
| Education    | ___, ______  ___, ______             |
| Energy       | ___, ______, ____ __                 |
| Fediverse    | ____ __,                             |
| Humor        | ____, ____ __, ____ __               |
| Journalism   | ___ __, ___ ____,  ___, ______       |
| Library      | __                                   |
| Net          | ___ __, _____, ___ __, __ __, ____   |
| Science      | __, ____ __, ______                  |
| Software     | ____ __, ______, ____ __             |
+--------------+--------------------------------------+

That’s a useful view, and I’ve now included it, but it didn’t address Martin’s specific need.

To manage these lists, I really need a display that shows every account that I follow in a table with its lists. That way I can easily see which list each account is on, and spot any accounts that aren’t on a list.

For that I needed to add a list column to the Following tab.

This was the original query.

select
  url,
  case when display_name = '' then username else display_name end as person,
  to_char(created_at, 'YYYY-MM-DD') as since,
  followers_count as followers,
  following_count as following,
  statuses_count as toots,
  note
from
  mastodon_following
order by
  person

The new version captures the above join of mastodon_list and mastodon_list_account, and joins that to the mastodon_following (people I follow) table. It’s a left join which means I’ll always get all the people I follow. If you’re not on a list, your list column will be null.

with data as (
  select
    l.title as list,
    a.*
  from
    mastodon_list l
  join
    mastodon_list_account a
  on
    l.id = a.list_id
),
combined as (
  select
    d.list,
    f.url,
    case when f.display_name = '' then f.username else f.display_name end as person,
    to_char(f.created_at, 'YYYY-MM-DD') as since,
    f.followers_count as followers,
    f.following_count as following,
    f.statuses_count as toots,
    f.note
  from
    mastodon_following f
  left join
    data d
  on
    f.id = d.id
)
select
  *
from
  combined
order by
  person

That query drives the new version of the Following tab.

It’s pretty sparse, I’ve only just begun adding people to lists. And honestly I’m not sure I’ll want to keep doing this curation, it’s the kind of thing that can become a burden, I need to play around some more before I commit. Meanwhile, the default sort puts unlisted people first so they’re easy to find.

To provide a better way to find people who are on lists, I expanded the List tab in a couple of ways. It had included a dropdown of lists by which to filter the home timeline. Now that dropdown has counts of people on each list.

input "list" {
  type = "select"
  width = 2
  sql = <<EOQ
    with list_account as (
      select
        l.title
      from
        mastodon_list l
      join
          mastodon_list_account a
      on
        l.id = a.list_id
    ),
    counted as (
      select
        title,
        count(*)
      from
        list_account
      group by
        title
      order by
        title
    )
    select
      title || ' (' || count || ')' as label,
      title as value
    from
      counted
    order by
      title
  EOQ
}

I also used this query to expand the List tab.

select
  l.title as list,
  array_to_string( array_agg( lower(a.username) order by lower(a.username)), ', ') as people
from
  mastodon_list l
join
  mastodon_list_account a
on
  l.id = a.list_id
group by
  l.title

The result is the list / people table on the right.

I know that some won’t cotton to this SQL-forward programming model. But for others who will, I wanted to show a few detailed examples to give you a sense of what’s possible at the intersection of Mastodon and Steampipe.

If you’re not tuned into SQL (like I wasn’t for a very long time), here’s your takeaway: as SQL goes, this stuff is not too scary. Yes there are joins, yes there’s an array_agg that transposes a column into a list. It’s not beginner SQL. But lots of people know how to use join and array_agg in these ways, lots more could easily learn how, and with SQL ascendant nowadays these are skills worth having.

Create your own Mastodon UX

19 Dec 202224 May 2023 ~ Jon Udell ~ 3 Comments

I’ve been discussing Mastodon UX wishlists with some new acquaintances there. This excerpt from A Bloomberg terminal for Mastodon concludes with part of my own wishlist.

In a Mastodon timeline, a chatty person can dominate what you see at a glance. When we participate in social media we are always making bids for one another’s attention. As publishers of feeds it’s wise to consider how a flurry of items can overwhelm a reader’s experience. But it’s also useful to consider ways that feed readers can filter a chatty source. Steampipe’s SQL foundation affords an easy and natural way to do that. Here’s part of the query that drives the list view.
select distinct on (list, person, hour) -- only one per list/user/hour
  person,
  url,
  hour,
  toot
from
  data
order by
  hour desc, list, person
It was easy to implement a rule that limits each person to at most one toot per hour. Next steps here will be to apply this rule to other views, show the number of collapsed toots, and enable such rules on a per-person basis.

As a warmup exercise, I decided to first add a simple control for boosts that enables me to see my home timeline with or without boosts. To give technically-inclined readers a sense of what’s involved in doing this kind of thing with Steampipe, I’ll describe the changes here. I’m obviously biased but I find this programming environment to be accessible and productive. If it seems that way to you as well, you might want to try out some of the items on your own UX wishlist. And if you do, let me know how it goes!

Here are the original versions of the two files that I changed to add the new feature. First there’s home.sp which defines the dashboard for the home timeline.

dashboard "Home" {
  
  tags = {
    service = "Mastodon"
  }

  container {
    // a text widget with the HTML links that define the menu of dashboards
  }

  container {
    text {
    // a block that displays the HTML links that form a menu of dashboards
    }

    card {
    // a block that reports the name of my server
    }

    input "limit" {
      width = 2
      title = "limit"
      sql = <<EOQ
        with limits(label) as (
          values 
            ( '50' ),
            ( '100' ),
            ( '200' ),
            ( '500' )
        )
        select
          label,
          label::int as value
        from 
          limits
      EOQ
    }    
  }


  container { 

    table {
      title = "home: recent toots"
      query = query.timeline
      args = [ "home", self.input.limit ]
      column "person" {
        wrap = "all"
      }
      column "toot" {
        wrap = "all"
      }
      column "url" {
        wrap = "all"
      }
    }
  }

}

And here’s the new version. It adds an input block called boosts, and passes its value to the referenced query.

dashboard "Home" {
  
  tags = {
    service = "Mastodon"
  }

  container {
    // a text widget with the HTML links that define the menu of dashboards
  }

  container {
    text {
    // a block that displays the HTML links that form a menu of dashboards
    }

    card {
    // a block that reports the name of my server
    }

    input "limit" {
    // as above
    }

    input "boosts" {
      width = 2
      title = "boosts"
      sql = <<EOQ
        with boosts(label, value) as (
          values
            ( 'include', 'include' ),
            ( 'hide', ' ' ),
            ( 'only', ' 🢁 ' )
        )
        select
          label,
          value
        from
          boosts
      EOQ
    }

  }

  container { 

    table {
      // as above
      args = [ "home", self.input.limit, self.input.boosts ]
    }
  }

}

Steampipe dashboards are built with two languages. HCL (Hashicorp configuration language) defines the UX widgets, and SQL fills them with data. In this case we’re selecting static values for the boosts input. But any Steampipe query can run there! For example, here is the input block I use on the dashboard that filters the timeline by the list to which I’ve assigned people.

input "list" {
  type = "select"
  width = 2
  title = "search home timeline"
  sql = <<EOQ
    select
      title as label,
      title as value
    from
      mastodon_list
    order by
      title
  EOQ
}

Now here is the referenced query, query.timeline, from the file query.sp which contains queries used by all the dashboards.

query "timeline" {
  sql = <<EOQ
    with toots as (
      select
        account_url as account,
        case 
          when display_name = '' then user_name 
          else display_name
        end as person,
        case
          when reblog -> 'url' is null then
            content
          else
            reblog_content
        end as toot,
        to_char(created_at, 'MM-DD HH24:MI') as created_at,
        case
          when reblog -> 'url' is not null then '🢁'
          else ''
        end as boosted,
        case
          when in_reply_to_account_id is not null then ' 🢂 ' || ( select acct from mastodon_account where id = in_reply_to_account_id )
          else ''
        end as in_reply_to,
        case
          when reblog -> 'url' is not null then reblog ->> 'url'
          else url
        end as url
      from
        mastodon_toot
      where
        timeline = $1
      limit $2
    )
    select
      account,
      person || 
        case 
          when in_reply_to is null then ''
          else in_reply_to
        end as person,
      boosted || ' ' || toot as toot,
      url
    from
      toots
    order by
      created_at desc
  EOQ
  param "timeline" {}
  param "limit" {}
}

And here is the new version of that query.

query "timeline" {
  sql = <<EOQ
    with toots as (
    // as above      
    ),
    boosted as (
      select
        $3 as boost,
        boosted,
        account,
        in_reply_to,
        person,
        toot,
        url
      from
        toots
      order by
        created_at desc
    )
    select
      account,
      person ||
        case
          when in_reply_to is null then ''
          else in_reply_to
        end as person,
      boosted || ' ' || toot as toot,
      url
    from
      boosted
    where
      boost = boosted
      or boost = 'include'
      or boost = 'n/a'
  EOQ
  param "timeline" {}
  param "limit" {}
  param "boost" {}
}

The original version uses a single CTE (aka common table expression aka WITH clause), toots, to marshall data for the concluding SELECT. The new version inserts another CTE, boosts, into the pipeline. It uses $3 to reference param "boost" {} which maps to the self.input.boosts passed from home.sp

The SQL code is all standard. Postgres is the engine inside Steampipe, and I sometimes use Postgres-specific idioms, but I don’t think any of those are happening here.

The HCL code may be unfamiliar. Steampipe uses HCL because its core audience are DevSecOps pros who are familiar with Terraform, which is HCL-based. But its a pretty simple language that can be used to describe all kinds of resources. Here the resources are widgets that appear on dashboards.

The other thing to know, if you want to roll up your sleeves and try building your own dashboards, is that the developer experience is — again in my biased opinion! — pretty great because if you’re using an autosaving editor you’ll see your changes (to both HCL and SQL code) reflected in realtime.

To illustrate that, here’s the screencast we included in our blog post introducing the dashboard system.

Not shown there, because we wanted to focus on the happy path, is realtime feedback when your SQL queries provoke Postgres errors. The experience feels very much like the one Bret Victor champions in Inventing on Principle. The core principle: “Creators need an immediate connection to what they’re creating.”

Here’s the wrong way that too often constrains us.

If there’s anything wrong with the scene, or if I go and make changes, or if I have further ideas, I have to go back to the code, and I edit the code, compile and run, see what it looks like. Anything wrong, I go back to the code. Most of my time is spent working in the code, working in a text editor blindly, without an immediate connection to this thing, which is what I’m actually trying to make.

And here is the right way.

I’ve got this picture on the side, and the code on the side, and this part draws the sky and this draws the mountains and this draws the tree, and when I make any change to the code, the picture changes immediately. So the code and the picture are always in sync; there is no compile and run. I just change things in the code and I see things change in the picture.

We want to work the right way wherever we can. The experience isn’t available everywhere, yet, but it is available in Steampipe where it powerfully enables the experimentation and prototyping that many of us are inspired to do as we delve into Mastodon.

If you want to try this for yourself, please check out the setup instructions for the plugin that maps Mastodon APIs to Postgres tables, and the dashboards that use those tables, and ping me (on Mastodon if you like!) with any questions you may have.

A Bloomberg terminal for Mastodon

17 Dec 202224 May 2023 ~ Jon Udell ~ 5 Comments

As I mentioned last time, the Steampipe dashboard for Mastodon has evolved in unexpected ways. I imagined that the components — a plugin that maps Mastodon APIs to Postgres foreign tables, and a suite of views that query the APIs — would combine to enable a broad overview of activity in the Fediverse. That didn’t pan out for two reasons.

First, I learned that the Mastodon community didn’t appreciate the kind of surveillance required for such analysis. That was the original community, I should stress, and things have changed dramatically, but I want to respect the original ethos. Plenty of people will, nevertheless, crawl and index the Fediverse, but I don’t need to put my shoulder to that wheel. And if I did I’d be pushing Steampipe out of its sweet spot: realtime acquisition, querying, and visualization of API-sourced data.

Second, Mastodon’s API allows 300 requests every 5 minutes. You can use Steampipe in batch mode to defeat that limit, and you can store data permanently in its Postgres database, but that cuts across the grain with respect to both Steampipe and Mastodon. All Mastodon clients are subject to the same API rate limit. If you use the web app, or one of the phone apps, you will likely never have seen a message announcing that you’ve hit the limit and need to wait a few minutes. I never saw that message until I started querying the API with Steampipe while also using the web app.

So if Mastodon culture and tech resist deep data mining, and the system is optimized for clients that live within an API budget of 300 requests every 5 minutes, what kind of Mastodon client could Steampipe enable? It wouldn’t be a conventional client because Steampipe is a read-only system. The path forward would be some kind of reader, or browser, that augments the interactive apps.

The outcome, so far, is a suite of dashboards that display tabular views (along with some charts) of the home, local, and federated timelines, of my toot history and my favorites, of my follows and followers, of my notifications, of searches for terms, people, and hashtags, and of the timelines formed by the lists to which I’ve assigned people I follow. These are all HTML tables rendered by Steampipe’s dashboard server. The columns are all sortable, and the cells of the tables can contain only links or plain text.

Given that the toot content returned from the Mastodon API is HTML, the plain-text-only constraint felt, initially, like a blocker. No images? No links in toot content? What good is that?

Some constraints are worth embracing, though, and that may prove true here. The views created this way put a lot of information onto the screen. Here’s my default view in the stock client.

At a glance I can see three items on the home timeline, and if I want to scroll through 100 items I can only do so awkwardly in small gulps.

Here’s my home timeline in the Steampipe dashboard. I can see a dozen items at a glance, and can easily scan 100 items in gulps of that size.

When I described this effect to Greg Wilson he gave me the title for this post: “That sounds like the Bloomberg terminal for Mastodon.” I’ve never used one, and I’m aware that its design is often derided as a UX disaster, but as I understand it the product is built to enable traders to scan fast-moving data feeds from many different sources. In that sense I do think it’s an interesting and useful comparison.

The underlying principle is one I’ve learned from Edward Tufte: present information at maximum density. Our brains are built to take in a lot of information at a glance, and if it’s organized well we can do that very effectively. It feels like that’s happening for me when I scan these densely-packed views of Mastodon activity.

To enhance the effect, I’ve begun to apply filters. In a Mastodon timeline, for example, a chatty person can dominate what you see at a glance. When we participate in social media we are always making bids for one another’s attention. As publishers of feeds it’s wise to consider how a flurry of items can overwhelm a reader’s experience. But it’s also useful to consider ways that feed readers can filter a chatty source. Steampipe’s SQL foundation affords an easy and natural way to do that. Here’s part of the query that drives the list view.

select distinct on (list, user_name, person, hour) -- only one per list/user/hour
  person,
  url,
  hour,
  toot
from
  data
order by
  hour desc, list, person

It was easy to implement a rule that limits each person to at most one toot per hour. Next steps here will be to apply this rule to other views, show the number of collapsed toots, and enable such rules on a per-person basis.

There are always links into the Mastodon web app, and I follow them when I want to view images, boost someone, or reply to someone. The dashboards help me scan a lot of Mastodon activity quickly, and decide which items I want to interact with. Your 500-character toot is all you’ve got to grab my attention, and I’ll only see it as an unformatted chunk of plain text. That’s a pretty severe constraint, and not everyone will want to embrace it, but it’s working pretty well for me so far.

I expect that our dashboard system will support formatted text and images in cells of HTML tables. When it does I’d like to make it an option you can turn on or off in Mastodon dashboards. What should the default be? I suspect I’ll want plain text and no images, especially if image captions can appear along with the text of toots. Some of the original Mastodon cultural norms aren’t surviving the onslaught of new people, but writing descriptions of images is one that’s held up so far, and it’s a wonderful thing. So write a short thoughtful post, write a caption for your image if you include one, and if you capture my attention I’ll click through to view and interact.

Browsing the Fediverse

10 Dec 202224 May 2023 ~ Jon Udell ~ 1 Comment

A month ago, when the Great Discontinuity happened, I started working on a Steampipe plugin to enable SQL queries against the Mastodon API, along with a companion Steampipe “mod” (suite of dashboards) to display and chart the results of those queries.

I expect these dashboards will soon be available in Steampipe Cloud, where it will take just a few seconds to pop in your Mastodon access token (from, e.g., https://mastodon.social/settings/applications/new) and begin using the dashboards.

Meanwhile, if you’re so inclined, you can find the plugin here and the dashboards here. If you’re reasonably technical you can pretty quickly and easily install Steampipe, clone these repos, build the plugin, and start using the dashboards.

Why would you want to? My own motivation, originally, was to do Mastodon analytics. I thought Steampipe’s SQLification of the API would be a handy way to discern and monitor activity trends during a period of extraordinary flux. And that’s proven to be true, to a limited extent. Here’s a snapshot of the dashboard that uses the instance activity API.

I’m watching this chart with great interest. Where does it go from here? I’m not going to hazard a guess. Everything’s up in the air right now, and anything could happen.

But as I added tables to the plugin to encapsulate more of the Mastodon API, and added dashboards to visualize those tables, my focus shifted. I began to see the suite of dashboards as a Mastodon reader/browser that complements the web and phone clients, and that’s how I mainly use them now.

I think the key benefit is one of Edward Tufte’s core principles: information density. Each of these dashboards shows more activity than you can see at a glance in the web or phone interfaces. I find this very helpful for searching and browsing. When I see items of interest that I want to interact with, I click through to the web app in order to boost, reply, or favorite.

Will this way of browsing Mastodon appeal to you? To get a feel for what it’s like, here are snapshots of some of the dashboards I’ve built so far.

dashboard.Notification

dashboard.PeopleSearch

dashboard.StatusSearch

dashboard.TagSearch

For me, at least, this approach has become an effective way to browse the fediverse, find interesting people, read what they boost, and keep track of my own activity.

If you are dev-minded, by the way, please note that these dashboards are just one way to skin the results of queries against the plugin. Any SQL client can connect to Steampipe’s Postgres endpoint. You could use dashboards like Metabase or Grafana, or you could embed Steampipe as a component in an app.

When your database is an HTTP client

9 Dec 2022 ~ Jon Udell ~ 3 Comments

Here are three things I once hoped — but no longer expect — to outlive:

1. PDF files

2. passwords

3. The occasional need to scrape data from web pages

PDF files and passwords are topics for another day, but web scraping is timely. Today I was asked to corral data from the Steampipe blog, and the path of least resistance was (of course!) to extract it from the site.

I was all warmed up for the exercise because we’d just published a post dramatically entitled “Why build an HTTP client into a database? So you can ingest web data directly!” In that post I show three solutions enabled by the Net plugin’s net_http_request table.

Since the dawn of the web, scraping has worked this way: Use a script to fetch the data, then save it for retrieval and analysis. You might use the script language to query the data, or the query language of a database.

A couple of years ago I found a way to unify those ingredients: Run the script inside the database. You can do a lot with Postgres’ built-in procedural language, and even more if you activate Python inside Postgres. I went deeply into both and explained why in an earlier episode.

PL/Python was great for advanced uses, and I used it for a bit of web scraping too. It all worked fine, and I never thought to ask this question: “What if the database is the HTTP client, and SQL the common way to reason over data coming through that pipe?”

The examples in the post show what that’s like. In its simplest form you write a query like this.

select
    response_status_code,
    jsonb_pretty(response_headers) as headers
  from
    net_http_request
  where
    url = 'https://steampipe.io'

The result is like this.

+----------------------+-------------------------------------------------------+
| response_status_code | headers                                               |
+----------------------+-------------------------------------------------------+
| 200                  | {                                                     |
|                      |     "Age": [                                          |
|                      |         "45557"                                       |
|                      |     ],                                                |
|                      |     "Date": [                                         |
|                      |         "Fri, 09 Dec 2022 06:46:40 GMT"               |
|                      |     ],                                                |
|                      |     "Etag": [                                         |
|                      |         "W/\"614a142998557b388e053bfa4408cf70\""      |

The response_status_code is a regular Postgres column, the headers column is a JSONB column that you can index into (e.g. headers ->> 'etag'). If you also select the response_body column of the net_http_request table you’ll get another regular Postgres column containing the text of the web page. If it’s HTML text, you can use regular expressions to match patterns in it. If it’s JSON text you can use Postgres’ JSON functions to query and transform it.

You can join fields nested in the JSON with other Postgres columns. And those other columns can belong to tables populated in any of the ways Steampipe plugins populate tables: from JSON or GraphQL API endpoints, from CSV or Terraform or Yaml files, from anywhere really. As a developer writing Steampipe queries (and flowing results into dashboards) you see all of these sources as tables, you query them individually in all the ways Postgres can, and you join across diverse sources in a common way.

Of course web pages are structured in ways that regular expressions can’t easily grok. It’s easy to match links, but parsing HTML tag names and attributes is a job for a real parser. I’d made a start on an HTML plugin for Steampipe. There were already two tables: one to extract links from a web page, one to transform HTML tables to CSV format. So today, when tasked with tabulating blog metadata, I added a third table to enable these queries.

-- find the title

select
  page,
  tag_name,
  tag_content
from
  html_tag
where
  page = 'https://steampipe.io/blog/selective-select'
  and tag_name = 'title'

-- list the meta tags

select
  page,
  tag_name,
  tag_attrs
from
  html_tag
where
  page = 'https://steampipe.io/blog/selective-select'
  and tag_name = 'meta'

That’s a powerful way to reason over HTML data! It was easy for me to extend the HTML plugin in this way, and I assure you that I’m no 10x programmer. The Steampipe plugin SDK and the wonderful goquery package are doing all the heavy lifting. I just had to stitch the components together, and if you’re any kind of programmer, with or without Go experience, you could pretty readily do the same.

Mastodon, Steampipe, and RSS

6 Dec 202224 May 2023 ~ Jon Udell ~ 2 Comments

I was determined to write my Mastodon #introduction today. To get started I used the tag search in the dashboard I’m building.

The idea was to look at a bunch of other #introduction posts to get a feel for how mine should go. When you search specifically for hashtags, the Mastodon search API returns this information.

"hashtags": [
    {
      "name": "introduction",
      "url": "https://mastodon.social/tags/introduction",
      "history": [
        {
          "day": "1574553600",
          "uses": "10",
          "accounts": "9"
        },
        // ...
      ]
    },

A first version of the dashboard, having only this data to work with, just listed the names of tags matching the search term along with corresponding URLs. Here was the initial query.

select 
  name,
  url
from 
  mastodon_search_hashtag 
where 
  query = 'introduction'

That produced a list of links, like https://mastodon.social/tags/introduction, to home pages for variants of the tag. These are useful links! Each goes to a page where you can see who is posting to the tag.

To make this view slightly more useful, I tapped the third element of the API response, history, in a revised query.

with data as (
  select 
    name,
    url,
    ( jsonb_array_elements(history) ->> 'uses' )::int as uses
  from 
    mastodon_search_hashtag 
  where 
    query = 'introduction'
)
select
  name,
  url,
  sum(uses)
from
  data
group by
  name, url
order by
  sum desc

These results help me decide which variant to use.

+-------------------+---------------------------------------------------+------+
| name              | url                                               | sum  |
+-------------------+---------------------------------------------------+------+
| introduction      | https://mastodon.social/tags/introduction         | 1816 |
| introductions     | https://mastodon.social/tags/introductions        | 218  |
| introductionpost  | https://mastodon.social/tags/introductionpost     | 19   |
| introductionfr    | https://mastodon.social/tags/introductionfr       | 6    |

But I still need to visit each link’s page to explore how it’s being used. It would be nice to surface more context in the dashboard, and I found a nifty way to do it, but first let’s dwell on the revised query for a minute. Postgres’ JSON features are powerful and it’s often a challenge (at least for me) to visualize how they work.

The Postgres jsonb_array_elements() function is what’s called a set-returning function. Here it unpacks Postgres’ JSON representation of the list of history structures returned from the Mastodon API. In its simplest form, the function call jsonb_array_elements(history) produces a temporary table with per-tag, per-day data.

select
  name,
  jsonb_array_elements(history) as history
from
  mastodon_search_hashtag 
where 
  query = 'introduction'

+--------------------------------+----------------------------------------------------+
| name                           | history                                            |
+--------------------------------+----------------------------------------------------+
| introduction                   | {"accounts":"16","day":"1670371200","uses":"19"}   |
| introduction                   | {"accounts":"250","day":"1670284800","uses":"269"} |
| introduction                   | {"accounts":"259","day":"1670198400","uses":"274"} |
| introduction                   | {"accounts":"253","day":"1670112000","uses":"270"} |
| introduction                   | {"accounts":"245","day":"1670025600","uses":"269"} |
| introduction                   | {"accounts":"345","day":"1669939200","uses":"383"} |
| introduction                   | {"accounts":"307","day":"1669852800","uses":"339"} |
| introductionsfr                | {"accounts":"0","day":"1670371200","uses":"0"}     |
| introductionsfr                | {"accounts":"0","day":"1670284800","uses":"0"}     |
| introductionsfr                | {"accounts":"0","day":"1670198400","uses":"0"}     |
| introductionsfr                | {"accounts":"0","day":"1670112000","uses":"0"}     |
| introductionsfr                | {"accounts":"0","day":"1670025600","uses":"0"}     |

history is a JSONB column that holds an object with three fields. The revised query uses Postgres’ JSON indexing operator ->> to reach into that object and hoist the number of daily uses into its own column, so it can be the target of a SQL SUM function.

OK, ready for the nifty solution? Recall that https://mastodon.social/tags/introduction is the home page for that variant of the tag. There you can see introduction posts from people using the tag. Those posts typically include other tags. In the dashboard shown above you can see that Kathy Nickels is using these: #Music #Art #Equestrian #Nature #Animals. The tags appear in her introduction post.

I didn’t immediately see how to capture them for use in the dashboard. Then I remembered that certain classes of Mastodon page have corresponding RSS feeds, and wondered if the tag pages are members of one such class. Sure enough they are, and https://mastodon.social/tags/introduction.rss is a thing. That link, formed by tacking .rss onto the base URL, provides the extra context I was looking for. Here’s the final version of the query.

with data as (
  select 
    name,
    url,
    ( jsonb_array_elements(history) ->> 'uses' )::int as uses
  from 
    mastodon_search_hashtag 
  where 
    query = 'introduction'
  ),
  uses as (
    select 
      name,
      url || '.rss' as feed_link,
      sum(uses) as recent_uses
    from 
      data 
    group 
      by connection, name, url
  )
  select
    u.name,
    r.guid as link,
    to_char(r.published, 'YYYY-MM-DD') as published,
    r.categories
  from
    uses u
  join
    rss_item r
  on 
    r.feed_link = u.feed_link
  where
    recent_uses > 1
  order by 
    recent_uses desc, published desc
)

The new ingredients, courtesy of the RSS feed, are: guid which links to an individual introduction like Kathy’s, published which is the day the introduction appeared, and categories which has the tags used in the introduction post. Sweet! Now I can scan the dashboard to get a sense of which introductions I want to check out.

The first three queries use the Steampipe plugin for Mastodon, and in particular its mastodon_search_hashtag table which encapsulates the Mastodon API for searching tags. The final version joins that table with the rss_item table provided by the RSS plugin, using the common base URL as the basis of the join.

This delights me in so many ways. When the blogosphere first emerged in the early 2000s, some of us discovered that the RSS protocol was capable of far more than just delivering feeds to RSS readers. The other new hot protocol in that era was XML web services. As an InfoWorld analyst I was supposed to be cheering the latter as an enterprise-grade technology, but I couldn’t help noticing that RSS kept turning out to be a great way to move data between cooperating systems. That’s always been true, and I love how this example reminds us that it’s still true.

I’m equally delighted to show how Steampipe enables this modern exercise in RSS-powered integration. Steampipe was, originally, an engine for mapping results from JSON API endpoints to SQL tables. Over time, though, it has broadened its view of what constitutes an API. You can use Steampipe to query CSV files, or Terraform files, or — as we see here — RSS feeds. Data comes in all kinds of flavors. Steampipe abstracts those differences and brings all the flavors into a common space where you can reason over them using SQL.

And finally, it’s just wonderful to be at the intersection of Mastodon, Steampipe, and RSS in this remarkable moment. I’ll readily admit that nostalgia is a factor. But RSS did bust things wide open twenty years ago, Mastodon’s doing that now, and I love that RSS can help it happen again.

Now I need to write that #introduction!

Autonomy, packet size, friction, fanout, and velocity

28 Nov 202224 May 2023 ~ Jon Udell ~ 4 Comments

Nostalgia is a dangerous drug and it’s always risky to wallow in it. So those of us who fondly remember the early blogosphere, and now want to draw parallels to the fediverse, should do so carefully. But we do want to learn from history.

Here’s one way to compare five generations of social software along the five dimensions named in the title of this post.

              Autonomy    Packet Size   Friction    Fanout    Velocity
 
Usenet        medium      high          medium      medium    low

Blogosphere   high        high          high        low       low

Facebook      low         high          low         medium    high

Twitter       low         low           low         high      high

Fediverse     high        medium        high        medium    medium

These are squishy categories, but I think they surface key distinctions. Many of us who were active in the blogosphere of the early 2000s enjoyed a high level of autonomy. Our RSS readers were our Internet dashboards. We loaded them with a curated mix of official and individual voices. There were no limits on the size of packets exchanged in this network. You could write one short paragraph or a 10,000-word essay. Networking wasn’t frictionless because blog posts did mostly feel like essays, and because comments didn’t yet exist. To comment on my blog post you’d write your own blog post linking to it.

That friction limited the degree to which a post would fan out through the network, and the velocity of its propagation. The architecture of high friction, low fanout, and low velocity was a winning combination for a while. In that environment I felt connected but not over-connected, informed but not overloaded.

Twitter flipped things around completely. It wasn’t just the loss of autonomy as ads and algos took over. With packets capped at 120 characters, and tweets potentially seen immediately by everyone, friction went nearly to zero. The architecture of low friction created an addictive experience and enabled powerful effects. But it wasn’t conducive to healthy discourse.

The fediverse can, perhaps, strike a balance. Humans didn’t evolve to thrive in frictionless social networks with high fanout and velocity, and arguably we shouldn’t. We did evolve in networks governed by Dunbar’s number, and our online networks should respect that limit. We need less friction within communities of knowledge and practice, more friction between them. We want messages to fan out pervasively and rapidly within communities, but less so between them.

We’re at an extraordinary inflection point right now. Will the fediverse enable us to strike the right balance? I think it has the right architectural ingredients to land where I’ve (speculatively) placed it in that table. High autonomy. As little friction as necessary, but not too little. As much fanout and velocity as necessary, but not too much. Nobody knows how things will turn out, predictions are futile, behavior is emergent, but I am on the edge of my seat watching this all unfold.

Debuggable explanations

17 Nov 202217 Nov 2022 ~ Jon Udell ~ 8 Comments

I’ve been reviewing Greg Wilson’s current book project, Software Design in Python. Like the earlier JavaScript-based Software Design by Example it’s a guided tour of tools, techniques, and components common to many software systems: testing frameworks, parsers, virtual machines, debuggers. Each chapter of each of these books shows how to build the simplest possible working version of one of these things.

Though I’ve used this stuff for most of my life, I’ve never studied it formally. How does an interpreter work? The chapter on interpreters explains the basic mechanism using a mixture of prose and code. When I read the chapter I can sort of understand what’s happening, but I’m not great at mental simulation of running code. I need to run the code in a debugger, set breakpoints, step through execution, and watch variables change. Then it sinks in.

The GitHub repo for the book includes all the text and all the code. I’d like to put them side-by-side, so that as I read the narrative I can run and debug the code that’s being described. Here’s how I’m doing that in VSCode.

This is pretty good! But it wasn’t dead simple to get there. In a clone of the repo, the steps included:

Find the HTML file for chapter 3.
Install a VSCode extension to preview HTML.
Find the code for chapter 3.
Adjust the code to not require command-line arguments.
Arrange the text and code in side-by-side panes.

Though it’s all doable, the activation threshold is high enough to thwart my best intention of repeating the steps for every chapter.

Whether in VSCode or another IDE or some other kind of app, what would be the best way to lower that activation threshold?

Curating the Studs Terkel archive

25 Sep 20226 Oct 2022 ~ Jon Udell ~ 2 Comments

I can read much faster than I can listen, so I rarely listen to podcasts when I’m at home with screens on which to read. Instead I listen on long hikes when I want to shift gears, slow down, and absorb spoken words. Invariably some of those words strike me as particularly interesting, and I want to capture them. Yesterday, what stuck was these words from a 1975 Studs Terkel interview with Muhammad Ali:

Everybody’s watching me. I was rich. The world saw me, I had lawyers to fight it, I was getting credit for being a strong man. So that didn’t really mean nothing. What about, I admire the man that had to go to jail named Joe Brown or Sam Jones, who don’t nobody know who’s in the cell, you understand? Doing his time, he got no lawyers’ fees to pay. And when he get out, he won’t be praised for taking a stand. So he’s really stronger than me. I had the world watching me. I ain’t so great. I didn’t do nothing that’s so great. What about the little man don’t nobody know? He’s really the one.

I heard these words on an episode of Radio OpenSource about the Studs Terkel Radio Archive, an extraordinary compilation of (so far) about a third of his 5600 hours of interviews with nearly every notable person during the latter half of the twentieth century.

If you weren’t aware of him, the Radio OpenSource episode, entitled Studs Terkel’s Feeling Tone, is the perfect introduction. And it’s delightful to hear one great interviewer, Chris Lydon, discuss with his panel of experts the interviewing style of perhaps the greatest interviewer ever.

Because I’d heard Muhammad Ali’s words on Radio OpenSource, I could have captured them in the usual way. I always remember where I was when I heard a segment of interest. If that was 2/3 of the way through my hike, I’ll find the audio at the 2/3 mark on the timeline. I made a tool to help me capture and share a link to that segment, but it’s a clumsy solution.

What you’d really rather do is search for the words in a transcript, select surrounding context, use that selection to define an audio segment, and share a link to both text and audio. That’s exactly what I did to produce this powerful link, courtesy of WFMT’s brilliant remixer, which captures both the written words I quoted above and the spoken words synced to them.

That’s a dream come true. Thank you! It’s so wonderful that I hesitate to ask, but … WFMT, can you please make the archive downloadable? I would listen to a lot of those interviews if I could put them in my pocket and take them with me on hikes. Then I could use your remixer to help curate the archive.

GitHub for English teachers

27 Aug 202227 Aug 2022 ~ Jon Udell ~ 14 Comments

I’ve long imagined a tool that would enable a teacher to help students learn how to write and edit. In Thoughts in motion I explored what might be possible in Federated Wiki, a writing tool that keeps version history for each paragraph. I thought it could be extended to enable the kind of didactic editing I have in mind, but never found a way forward.

In How to write a press release I tried bending Google Docs to this purpose. To narrate the process of editing a press release, I dropped a sample release into a GDoc and captured a series of edits as named versions. Then I captured the versions as screenshots and combined them with narration, so the reader of the blog post can see each edit as a color-coded diff with an explanation.

The key enabler is GDoc’s File -> Version history -> Name current version, along with File -> See version history‘s click-driven navigation of the set of diffs. It’s easy to capture a sequence of editing steps that way.

But it’s much harder to present those steps as I do in the post. That required me to make, name, and organize a set of images, then link them to chunks of narration. It’s tedious work. And if you want to build something like this for students, that’s work you shouldn’t be doing. You just want to do the edits, narrate them, and share the result.

This week I tried a different approach when editing a document written by a colleague. Again the goal was not only to produce an edited version, but also to narrate the edits in a didactic way. In this case I tried bending GitHub to my purpose. I put the original doc in a repository, made step-by-step edits in a branch, and created a pull request. We were then able to review the pull request, step through the changes, and review each as a color-coded diff with an explanation. No screenshots had to be made, named, organized, or linked to the narration. I could focus all my attention on doing and narrating the edits. Perfect!

Well, perfect for someone like me who uses GitHub every day. If that’s not you, could this technique possibly work?

In GitHub for the rest of us I argued that GitHub’s superpowers could serve everyone, not just programmers. In retrospect I felt that I’d overstated the case. GitHub was, and remains, a tool that’s deeply optimized for programmers who create and review versioned source code. Other uses are possible, but awkward.

As an experiment, though, let’s explore how awkward it would be to recreate my Google Docs example in GitHub. I will assume that you aren’t a programmer, have never used GitHub, and don’t know (or want to know) anything about branches or commits or pull requests. But you would like to be able to create a presentation that walks a learner though a sequence of edits, with step-by-step narration and color-coded diffs. At the end of this tutorial you’ll know how to do that. The method isn’t as straightforward as I wish it were. But I’ll describe it carefully, so you can try it for yourself and decide whether it’s practical.

Here’s the final result of the technique I’ll describe.

If you want to replicate that, and don’t already have a GitHub account, create one now and log in.

Ready to go? OK, let’s get started.

Step 1: Create a repository

Click the + button in the top right corner, then click New repository.

Here’s the next screen. All you must do here is name the repository, e.g. editing-step-by-step, then click Create repository. I’ve ticked the Add a README file box, and chosen the Apache 2.0 license, but you could leave the defaults — box unchecked, license None — as neither matters for our purpose here.

Step 2: Create a new file

On your GitHub home page, click the Repositories tab. Your new repo shows up first. Click its link to open it, then click the Add file dropdown and choose Create new file. Here’s where you land.

Step 3: Add the original text, create a new branch, commit the change, and create a pull request

What happens on the next screen is bewildering, but I will spare you the details because I’m assuming you don’t want to know about branches or commits or pull requests, you just want to build the kind of presentation I’ve promised you can. So, just follow this recipe.

Name the file (e.g. sample-press-release.txt
Copy/paste the text of the document into the edit box
Select Create a new branch for this commit and start a pull request
Name the branch (e.g. edits)
Click Propose new file

On the next screen, title the pull request (e.g. edit the press release) and click Create pull request.

Step 4: Visit the new branch and begin editing

On the home page of your repo, use the main dropdown to open the list of branches. There are now two: main and edits. Select edits

Here’s the next screen.

Click the name of the document you created (e.g. sample-press-release.txt to open it.

Click the pencil icon’s dropdown, and select Edit this file.

Make and preview your first edit. Here, that’s my initial rewrite of the headline. I’ve written a title for the commit (Step 1: revise headline), and I’ve added a detailed explanation in the box below the title. You can see the color-coded diff above, and the rationale for the change below.

Click Commit changes, and you’re back in the editor ready to make the next change.

Step 5: Visit the pull request to review the change

On your repo’s home page (e.g. https://github.com/judell/editing-step-by-step), click the Pull requests button. You’ll land here.

Click the name of the pull request (e.g. edit the press release) to open it. In the rightmost column you’ll see links with alphanumeric labels.

Click the first one of those to land here.

This is the first commit, the one that added the original text. Now click Next to review the first change.

This, finally, is the effect we want to create: a granular edit, with an explanation and a color-coded diff, encapsulated in a link that you can give to a learner who can then click Next to step through a series of narrated edits.

Lather, rinse, repeat

To continue building the presentation, repeat Step 4 (above) once per edit. I’m doing that now.

… time passes …

OK, done. Here’s the final edited copy. To step through the edits, start here and use the Next button to advance step-by-step.

If this were a software project you’d merge the edits branch into the main branch and close the pull request. But you don’t need to worry about any of that. The edits branch, with its open pull request, is the final product, and the link to the first commit in the pull request is how you make it available to a learner who wants to review the presentation.

GitHub enables what I’ve shown here by wrapping the byzantine complexity of the underlying tool, Git, in a much friendlier interface. But what’s friendly to a programmer is still pretty overwhelming for an English teacher. I still envision another layer of packaging that would make this technique simpler for teachers and learners focused on the craft of writing and editing. Meanwhile, though, it’s possible to use GitHub to achieve a compelling result. Is it practical? That’s not for me to say, I’m way past being able to see this stuff through the eyes of a beginner. But if that’s you, and you’re motivated to give this a try, I would love to know whether you’re able to follow this recipe, and if so whether you think it could help you to help learners become better writers and editors.

How to rewrite a press release: a step-by-step guide

13 Aug 202213 Aug 2022 ~ Jon Udell ~ 1 Comment

As a teaching fellow in grad school I helped undergrads improve their expository writing. Some were engineers, and I invited them to think about writing and editing prose in the same ways they thought about writing and editing code. Similar rules apply, with different names. Strunk and White say “omit needless words”; coders say “DRY” (don’t repeat yourself.) Writers edit; coders refactor. I encouraged students to think about writing and editing prose not as a creative act (though it is one, as is coding) but rather as a method governed by rules that are straightforward to learn and mechanical to apply.

This week I applied those rules to an internal document that announces new software features. It’s been a long time since I’ve explained the method, and thanks to a prompt from Greg Wilson I’ll give it a try using another tech announcement I picked at random. Here is the original version.

I captured the transformations in a series of steps, and named each step in the version history of a Google Doc.

Step 1

The rewritten headline applies the following rules.

Lead with key benefits. The release features two: support for diplex-matched antennas and faster workflow. The original headline mentions only the first, I added the second.

Clarify modifiers. A phrase like “diplex matched antennas” is ambiguous. Does “matched” modify “diplex” or “antennas”? The domain is unfamiliar to me, but I suspected it should be “diplex-matched” and a web search confirmed that hunch.

Omit needless words. The idea of faster workflow appears in the original first paragraph as “new efficiencies aimed at streamlining antenna design workflows and shortening design cycles.” That’s a long, complicated, yet vague way of saying “enables designers to work faster.”

Step 2

The original lead paragraph was now just a verbose recap of the headline. So poof, gone.

Step 3

The original second paragraph, now the lead, needed a bit of tightening. Rules in play here:

Strengthen verbs. “NOUN is a NOUN that VERBs” weakens the verb. “NOUN, a NOUN, VERBs” makes it stronger.

Clarify modifiers. “matching network analysis” -> “matching-network analysis”. (As I look at it again now, I’d revise to “analysis of matching networks.”)

Break up long, weakly-linked sentences. The original was really two sentences linked weakly by “making it,” so I split them.

Omit needless words. A word that adds nothing, like “applications” here, weakens a sentence.

Strengthen parallelism. If you say “It’s ideal for X and Y” there’s no problem. But when X becomes “complex antenna designs that involve multi-state and multi-port aperture or impedance tuners,” and Y becomes “corporate feed networks with digital phase shifters,” then it helps to make the parallelism explicit: “It’s ideal for X and for Y.”

Step 4

Omit needless words. “builds on the previous framework with additional” -> “adds”.

Simplify. “capability to connect” -> “ability to connect”.

Show, don’t tell. A phrase like “time-saving options in the schematic editor’s interface” tells us that designers save time but doesn’t show us how. That comes next: “the capability to connect two voltage sources to a single antenna improves workflow efficiency.” The revision cites that as a shortcut.

Activate the sentence. “System and radiation efficiencies … can be effortlessly computed from a single schematic” makes efficiencies the subject and buries the agent (the designer) who computes them. The revision activates that passive construction. Similar rules govern the rewrite of the next paragraph.

Step 5

When I reread the original fourth paragraph I realized that the release wasn’t only touting faster workflow, but also better collaboration. So I adjusted the headline accordingly.

Step 6

Show, don’t tell. The original version tells, the new one shows.

Simplify. “streamline user input” -> “saves keystrokes” (which I might further revise to “clicks and keystrokes”).

Final result

Here’s the result of these changes.

I haven’t fully explained each step, and because the domain is unfamiliar I’ve likely missed some nuance. But I’m certain that the final version is clearer and more effective. I hope this step-by-step narration helps you see how and why the method works.

The Velvet Bandit’s COVID series

7 Aug 2022 ~ Jon Udell ~ 3 Comments

The Velvet Bandit is a local street artist whose work I’ve admired for several years. Her 15 minutes of fame happened last year when, as reported by the Press Democrat, Alexandria Ocasio-Cortez wore a gown with a “Tax the Rich” message that closely resembled a VB design.

I have particularly enjoyed a collection that I think of as the Velvet Bandit’s COVID series, which appeared on the boarded-up windows of the former Economy Inn here in Santa Rosa. The building is now under active renovation and the installation won’t last much longer, so I photographed it today and made a slideshow.

I like this image especially, though I have no idea what it means.

If you would like to buy some of her work, it’s available here. I gather sales have been brisk since l’affaire AOC!

Subtracting devices

1 Aug 20221 Aug 2022 ~ Jon Udell ~ 8 Comments

People who don’t listen to podcasts often ask people who do: “When do you find time to listen?” For me it’s always on long walks or hikes. (I do a lot of cycling too, and have thought about listening then, but wind makes that impractical and cars make it dangerous.) For many years my trusty podcast player was one or another version of the Creative Labs MuVo which, as the ad says, is “ideal for dynamic environments.”

At some point I opted for the convenience of just using my phone. Why carry an extra, single-purpose device when the multi-purpose phone can do everything? That was OK until my Quixotic attachment to Windows Phone became untenable. Not crazy about either of the alternatives, I flipped a coin and wound up with an iPhone. Which, of course, lacks a 3.5mm audio jack. So I got an adapter, but now the setup was hardly “ideal for dynamic environments.” My headset’s connection to the phone was unreliable, and I’d often have to stop walking, reseat it, and restart the podcast.

If you are gadget-minded you are now thinking: “Wireless earbuds!” But no thanks. The last thing I need in my life is more devices to keep track of, charge, and sync with other devices.

I was about to order a new MuVo, and I might still; it’s one of my favorite gadgets ever. But on a recent hike, in a remote area with nobody else around, I suddenly realized I didn’t need the headset at all. I yanked it out, stuck the phone in my pocket, and could hear perfectly well. Bonus: Nothing jammed into my ears.

It’s a bit weird when I do encounter other hikers. Should I pause the audio or not when we cross paths? So far I mostly do, but I don’t think it’s a big deal one way or another.

Adding more devices to solve a device problem amounts to doing the same thing and expecting a different result. I want to remain alert to the possibility that subtracting devices may be the right answer.

There’s a humorous coda to this story. It wasn’t just the headset that was failing to seat securely in the Lightning port. Charging cables were also becoming problematic. A friend suggested a low-tech solution: use a toothpick to pull lint out of the socket. It worked! I suppose I could now go back to using my wired headset on hikes. But I don’t think I will.

What happened to simple, basic web hosting?

5 Jun 20225 Jun 2022 ~ Jon Udell ~ 34 Comments

For a friend’s memorial I signed up to make a batch of images into a slideshow. All I wanted was the Simplest Possible Thing: a web page that would cycle through a batch of images. It’s been a while since I did something like this, so I looked around and didn’t find anything that seemed simple enough. The recipes I found felt like overkill. Here’s all I wanted to do:

Put the images we’re gathering into a local folder
Run one command to build slideshow.html
Push the images plus slideshow.html to a web folder

Step 1 turned out to be harder than expected because a bunch of the images I got are in Apple’s HEIC format, so I had to find a tool that would convert those to JPG. Sigh.

For step 2 I wrote the script below. A lot of similar recipes you’ll find for this kind of thing will create a trio of HTML, CSS, and JavaScript files. That feels to me like overkill for something as simple as this, I want as few moving parts as possible, so the Python script bundles everything into slideshow.html which is the only thing that needs to be uploaded (along with the images).

Step 3 was simple: I uploaded the JPGs and slideshow.html to a web folder.

Except, whoa, not so fast there, old-timer! True, it’s easy for me, I’ve maintained a personal web server for decades and I don’t think twice about pushing files to it. Once upon a time, when you signed up with an ISP, that was a standard part of the deal: you’d get web hosting, and would use an FTP client — or some kind of ISP-provided web app — to move files to your server.

As I realized a few years ago, that’s now a rare experience. It seems that for most people, it’s far from obvious how to push a chunk of basic web stuff to a basic web server. People know how to upload stuff to Google Drive, or WordPress, but those are not vanilla web hosting environments.

It’s a weird situation. The basic web platform has never been more approachable. Browsers have converged nicely on the core standards. Lots of people could write a simple app like this one. Many more could at least /use/ it. But I suspect it will be easier for many nowadays to install Python and run this script than to push its output to a web server.

I hate to sound like a Grumpy Old Fart. Nobody likes that guy. I don’t want to be that guy. So I’ll just ask: What am I missing here? Are there reasons why it’s no longer important or useful for most people to be able to use the most basic kind of web hosting?

import os

l = [i for i in os.listdir() if i.endswith('.jpg')]

divs = ''

for i in l:
  divs += f"""
<div class="slide">
  <img src="{i}">
</div>
"""


# Note: In a Python f-string, CSS/JS squiggies ({}) need to be doubled

html = f"""
<html>

<head>
  <title>My Title</title>
  <style>
    body {{ background-color: black }} 
    .slide {{ text-align: center; display: none; }}
    img {{ height: 100% }}
  </style>
</head>

<body>

<div id="slideshow">
<div role="list">
{divs}
</div>
</div>

<script>
const slides = document.querySelectorAll('.slide')

const time = 5000

slides[0].style.display = 'block';

let i = 0

setInterval( () => {{
  i++
  if (i === slides.length) {{
    i = 0
  }}
  for (let j = 0; j <= i; j++ ) {{
    if ( j === i ) {{
      slides[j].style.display = 'block'
    }} else {{
      slides[j].style.display = 'none'
    }}
    
  }}
}}, time)
</script>

</body>
</html>
"""

with open('slideshow.html', 'w') as f:
  f.write(html)

Appreciating “Just Have a Think”

12 May 202212 May 2022 ~ Jon Udell ~ Leave a comment

Just Have a Think, a YouTube channel created by Dave Borlace, is one of my best sources for news about, and analysis of, the world energy transition. Here are some hopeful developments I’ve enjoyed learning about.

Solar Wind and Wave. Can this ocean hybrid platform nail all three?

New energy storage tech breathing life and jobs back into disused coal power plants

Agrivoltaics. An economic lifeline for American farmers?

Solar PV film roll. Revolutionary new production technology

All of Dave’s presentations are carefully researched and presented. A detail that has long fascinated me: how the show displays source material. Dave often cites IPCC reports and other sources that are, in raw form, PDF files. He spices up these citations with some impressive animated renderings. Here’s one from the most recent episode.

The progressive rendering of the chart in this example is an even fancier effect than I’ve seen before, and it prompted me to track down the original source. In that clip Dave cites IRENA, the International Renewable Energy Agency, so I visited their site, looked for the cited report, and found it on page 8 of World Energy Transitions Outlook 2022. That link might or might not take you there directly, if not you can scroll to page 8 where you’ll find the chart that’s been animated in the video.

The graphical finesse of Just Have a Think is only icing on the cake. The show reports a constant stream of innovations that collectively give me hope we might accomplish the transition and avoid worst-case scenarios. But still, I wonder. That’s just a pie chart in a PDF file. How did it become the progressive rendering that appears in the video?

In any case, and much more importantly: Dave, thanks for the great work you’re doing!

Capture the rain

16 Apr 202217 Apr 2022 ~ Jon Udell ~ 3 Comments

It’s raining again today, and we’re grateful. This will help put a damper on what was shaping up to be a terrifying early start of fire season. But the tiny amounts won’t make a dent in the drought. The recent showers bring us to 24 inches of rain for the season, about 2/3 of normal. But 10 of those 24 inches came in one big burst on Oct 24.

Here are a bunch of those raindrops sailing down the Santa Rosa creek to the mouth of the Russian River at Jenner.

With Sam Learner’s amazing River Runner we can follow a drop that fell in the Mayacamas range as it makes its way to the ocean.

Until 2014 I’d only ever lived east of the Mississipi River, in Pennsylvania, Michigan, Maryland, Massachusetts, and New Hampshire. During those decades there may never have been a month with zero precipitation.

I still haven’t adjusted to a region where it can be dry for many months. In 2017, the year of the devastating Tubbs Fire, there was no rain from April through October.

California relies heavily on the dwindling Sierra snowpack for storage and timed release of water. Clearly we need a complementary method of storage and release, and this passage in Kim Stanley Robinson’s Ministry for the Future imagines it beautifully.

Typically the Sierra snowpack held about fifteen million acre-feet of water every spring, releasing it to reservoirs in a slow melt through the long dry summers. The dammed reservoirs in the foothills could hold about forty million acre-feet when full. Then the groundwater basin underneath the central valley could hold around a thousand million acre-feet; and that immense capacity might prove their salvation. In droughts they could pump up groundwater and put it to use; then during flood years they needed to replenish that underground reservoir, by capturing water on the land and not allow it all to spew out the Golden Gate.

…

Now the necessity to replumb the great valley for recharge had forced them to return a hefty percentage of the land to the kind of place it had been before Europeans arrived. The industrial agriculture of yesteryear had turned the valley into a giant factory floor, bereft of anything but products grown for sale; unsustainable ugly, devastated, inhuman, and this in a place that had been called the “Serengeti of North America,” alive with millions of animals, including megafauna like tule elk and grizzly bear and mountain lion and wolves. All those animals had been exterminated along with their habitat, in the first settlers’ frenzied quest to use the valley purely for food production, a kind of secondary gold rush. Now the necessity of dealing with droughts and floods meant that big areas of the valley were restored, and the animals brought back, in a system of wilderness parks or habitat corridors, all running up into the foothills that ringed the central valley on all sides.

The book, which Wikipedia charmingly classifies as cli-fi, grabbed me from page one and never let go. It’s an extraordinary blend of terror and hope. But this passage affected me in the most powerful way. As Marc Reisner’s Cadillac Desert explains, and as I’ve seen for myself, we’ve already engineered the hell out of California’s water systems, with less than stellar results.

Can we redo it and get it right this time? I don’t doubt our technical and industrial capacity. Let’s hope it doesn’t take an event like the one the book opens with — a heat wave in India that kills 20 million people in a week — to summon the will.

Life in the neighborhood

30 Jan 202230 Jan 2022 ~ Jon Udell ~ 2 Comments

I’ve worked from home since 1998. All along I’ve hoped many more people would enjoy the privilege and share in the benefits. Now that it’s finally happening, and seems likely to continue in some form, let’s take a moment to reflect on an underappreciated benefit: neighborhood revitalization.

I was a child of the 1960s, and spent my grade school years in a newly-built suburb of Philadelphia. Commuter culture was well established by then, so the dads in the neighborhood were gone during the day. So were some of the moms, mine included, but many were at home and were able to keep an eye on us kids as we played in back yards after school. And our yards were special. A group of parents had decided not to fence them, thus creating what was effectively a private park. The games we played varied from season to season but always involved a group of kids roaming along that grassy stretch. Nobody was watching us most of the time. Since the kitchens all looked out on the back yards, though, there was benign surveillance. Somebody’s mom might be looking out at any given moment, and if things got out of hand, somebody’s mom would hear that.

For most kids, a generation later, that freedom was gone. Not for ours, though! They were in grade school when BYTE Magazine ended and I began my remote career. Our house became an after-school gathering place for our kids and their friends. With me in my front office, and Luann in her studio in the back, those kids enjoyed a rare combination of freedom and safety. We were mostly working, but at any given moment we could engage with them in ways that most parents never could.

I realized that commuter culture had, for several generations, sucked the daytime life out of neighborhoods. What we initially called telecommuting wasn’t just a way to save time, reduce stress, and burn less fossil fuel. It held the promise of restoring that daytime life.

All this came back to me powerfully at the height of the pandemic lockdown. Walking around the neighborhood on a weekday afternoon I’d see families hanging out, kids playing, parents working on landscaping projects and tinkering in garages, neighbors talking to one another. This was even better than my experience in the 2000s because more people shared it.

Let’s hold that thought. Even if many return to offices on some days of the week, I believe and hope that we’ve normalized working from home on other days. By inhabiting our neighborhoods more fully on weekdays, we can perhaps begin to repair a social fabric frayed by generations of commuter culture.

Meanwhile here is a question to ponder. Why do we say that we are working from and not working at home?

Remembering Diana

23 Jan 2022 ~ Jon Udell ~ 3 Comments

The other day Luann and I were thinking of a long-ago friend and realized we’d forgotten the name of that friend’s daughter. Decades ago she was a spunky blonde blue-eyed little girl; we could still see her in our minds’ eyes, but her name was gone.

“Don’t worry,” I said confidently, “it’ll come back to one us.”

Sure enough, a few days later, on a bike ride, the name popped into my head. I’m sure you’ve had the same experience. This time around it prompted me to think about how that happens.

To me it feels like starting up a background search process that runs for however long it takes, then notifies me when the answer is ready. I know the brain isn’t a computer, and I know this kind of model is suspect, so I wonder what’s really going on.

– Why was I was so sure the name would surface?

– Does a retrieval effort kick off neurochemical change that elaborates over time?

– Before computers, what model did people use to explain this phenomenon?

So far I’ve only got one answer. That spunky little girl was Diana.

The (appropriately) quantified self

2 Jan 2022 ~ Jon Udell ~ 10 Comments

A year after we moved to northern California I acquired a pair of shiny new titanium hip joints. There would be no more running for me. But I’m a lucky guy who gets to to bike and hike more than ever amidst spectacular scenery that no-one could fully explore in a lifetime.

Although the osteoarthritis was more advanced on the right side, we opted for bilateral replacement because the left side wasn’t far behind. Things hadn’t felt symmetrical in the years leading up to the surgery, and that didn’t change. There’s always a sense that something’s different about the right side.

We’re pretty sure it’s not the hardware. X-rays show that the implants remain firmly seated, and there’s no measurable asymmetry. Something about the software has changed, but there’s been no way to pin down what’s different about the muscles, tendons, and ligaments on that side, whether there’s a correction to be made, and if so, how.

Last month, poking around on my iPhone, I noticed that I’d never opened the Health app. That’s beause I’ve always been ambivalent about the quantified self movement. In college, when I left competive gymnastics and took up running, I avoided tracking time and distance. Even then, before the advent of fancy tech, I knew I was capable of obsessive data-gathering and analysis, and didn’t want to go there. It was enough to just run, enjoy the scenery, and feel the afterglow.

When I launched the Health app, I was surprised to see that it had been counting my steps since I became an iPhone user 18 months ago. Really? I don’t recall opting into that feature.

Still, it was (of course!) fascinating to see the data and trends. And one metric in particular grabbed my attention: Walking Asymmetry.

Walking asymmetry is the percent of time that your steps with one foot are faster or slower than the other foot.
…
An even or symmetrical walk is often an important physical therapy goal when recovering from injury.

Here’s my chart for the past year.

I first saw this in mid-December when the trend was at its peak. What caused it? Well, it’s been rainy here (thankfully!), so I’ve been riding less, maybe that was a factor?

Since then I haven’t biked more, though, and I’ve walked the usual mile or two most days, with longer hikes on weekends. Yet the data suggest that I’ve reversed the trend.

What’s going on here?

Maybe this form of biofeedback worked. Once aware of the asymmetry I subconsciously corrected it. But that doesn’t explain the November/December trend.

Maybe the metric is bogus. A phone in your pocket doesn’t seem like a great way to measure walking asymmetry. I’ve also noticed that my step count and distances vary, on days when I’m riding, in ways that are hard to explain.

I’d like to try some real gait analysis using wearable tech. I suspect that data recorded from a couple of bike rides, mountain hikes, and neighborhood walks could help me understand the forces at play, and that realtime feedback could help me balance those forces.

I wouldn’t want to wear it all the time, though. It’d be a diagnostic and therapeutic tool, not a lifestyle.