The future of SQL: conversational hands-on problem solving

Here’s the latest installment in the series on working with LLMs: https://thenewstack.io/the-future-of-sql-conversational-hands-on-problem-solving/

I keep returning to the theme of choral explanations (#4 on my list of best practices), and it’s especially relevant in the SQL domain where there are just so many ways to write a query.

Exploring the range of possibilities used to be arduous, time-consuming, and hard to justify. Now it’s becoming hard to justify not doing it: optimizations (sometimes major ones) can and do emerge.

The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

16 Learn by Doing: How LLMs Should Reshape Education

17 How to Learn Unfamiliar Software Tools with ChatGPT

18 Using AI to Improve Bad Business Writing

19 Code in Context: How AI Can Help Improve Our Documentation

Code in Context: How AI Can Help Improve Our Documentation

Here’s the latest installment in the series on working with LLMs: https://thenewstack.io/code-in-context-how-ai-can-help-improve-our-documentation/.

Writing documentation from scratch is as uncommon as writing code from scratch. More typically, you’re updating or expanding or refactoring existing docs. My expectation was that an LLM-powered tool primed with both code and documentation could provide a powerful assist, and Unblocked did.

I don’t know how to measure the boost it gave me. But I do know that I’ll never again want to undertake this kind of project without a tool that can help me assemble the necessary context.


The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

16 Learn by Doing: How LLMs Should Reshape Education

17 How to Learn Unfamiliar Software Tools with ChatGPT

18 Using AI to Improve Bad Business Writing

Creating a GPT Assistant That Writes Pipeline Tests

Here’s the latest installment in the series on working with LLMs: Creating a GPT Assistant That Writes Pipeline Tests.

Once you get the hang of writing these tests, it’s mostly boilerplate, so I figured my team of assistants could help. I recruited Cody, GitHub Copilot, and Unblocked — with varying degrees of success. Then I realized I hadn’t yet tried creating a GPT. As OpenAI describes them, “GPTs are custom versions of ChatGPT that users can tailor for specific tasks or topics by combining instructions, knowledge, and capabilities.”


The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

16 Learn by Doing: How LLMs Should Reshape Education

17 How to Learn Unfamiliar Software Tools with ChatGPT

How to Learn Unfamiliar Software Tools with ChatGPT

Here’s the latest installment in the series on working with LLMs: How to Learn Unfamiliar Software Tools with ChatGPT.

Ideally, tools like GeoGebra and Metabase provide interfaces so intuitive that you rarely need to read the docs, and you can learn the software just by poking around in it. In reality, of course, we need those docs — and they still need to be excellent. But now, we’ll extract a new benefit from them. When we can partner with machines that have read the docs, and can look over our shoulders as we try to do the things described in the docs, we’ll turbocharge our ability to dive into unfamiliar software tools and quickly learn how to use them.


The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

16 Learn by Doing: How LLMs Should Reshape Education

You say feature, I say bug: the enshittification of Microsoft Paint

I’ve happily used MS Paint as my basic bitmap editor since Windows 3, almost 25 years ago. Mostly I’ve used it to create images from screenshots, but that has suddenly become way harder. Formerly, when I’d cut a region, the now-empty region would display using the default white background. Now it displays a checkered background instead.

Here is the procedure to refill the white background:

  1. Switch the foreground color to white.
  2. Use the Fill tool to fill the checkered region.
  3. Switch the foreground back to black.

ARE YOU KIDDING ME?

Nope. It’s evidently an unintended consequence of a pair of new features: layers and transparency.

To get started, click on the new Layers button in the toolbar, which will open a panel on the side of the canvas.

Microsoft also revealed today that an upcoming Paint feature is support for image transparency, which will add the ability to open and save transparent PNG files.

During editing, users will notice a prominent checkerboard pattern displayed on the canvas, serving as a visual indicator and highlighting the transparent regions within the image.

This ensures that when content is erased from the canvas, it is completely removed, eliminating the need to cover unwanted regions of an image with white fill.

bleepingcomputer.com

I never asked for these “long-awaited” new features. Paint is (or was) useful to me precisely because it only does the kind of basic bitmap editing I need when compositing screenshots. But I can opt out, right?

Nope.

ARE YOU KIDDING ME?

Nope.

This feature (layers and image transparency) seems to be introduced in September 2023 and doesn’t actually allow to be turned off.

Doing what vengy proposes for each and every image being edited is a natural madness and will drive even the most sane person crazy.

What worked for me was to uninstall Paint and replace it with a classic version:

  1. Uninstalling can be done by simply right-clicking Paint icon in Start Menu and selecting Uninstall from context menu.
  2. Classic Paint can be get from here or here.

  3. Download and install it.

  4. Go to Settings → Apps → Apps & Features → More settings → App execution aliases.

  5. Toggle the switch to Off for mspaint.exe and pbrush.exe items.

superuser.com

Evidently people are willing to hack their systems in order to revert to a now-unsupported version that they prefer. As insane as it would be, I’m considering whether to become one of those people. Sigh. I guess 25 years was a pretty good run.

Learn by Doing: How LLMs Should Reshape Education

Here’s the latest installment in the series on working with LLMs: Learn by Doing: How LLMs Should Reshape Education.

If you’re teaching SQL, this article points to a pedagogical challenge/opportunity: How would I create a lesson that guides a student to an understanding of CROSS JOIN without ever mentioning or explicitly teaching anything about it?

If you’re teaching anything else, the same question could (I’ll argue should) apply. How to scaffold learning by doing?
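For readers who haven’t met the concept, here’s the kind of minimal query a scaffolded lesson might lead a student to discover on their own. This is a hypothetical example (tables and values invented for illustration), using SQLite as a stand-in for any SQL engine:

```python
import sqlite3

# Hypothetical lesson data: every size paired with every color.
# CROSS JOIN computes exactly this Cartesian product -- the idea a
# student could arrive at before ever hearing the keyword's name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sizes(size TEXT);
    CREATE TABLE colors(color TEXT);
    INSERT INTO sizes VALUES ('S'), ('M'), ('L');
    INSERT INTO colors VALUES ('red'), ('blue');
""")
rows = conn.execute(
    "SELECT size, color FROM sizes CROSS JOIN colors ORDER BY size, color"
).fetchall()
print(len(rows))  # 3 sizes x 2 colors = 6 pairings
```

The lesson design question, then: what task makes a student need all six pairings, so that the Cartesian product emerges as the obvious answer rather than a memorized keyword?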


The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

7 Guiding Principles for Working with LLMs

Here’s the latest installment in the series on working with LLMs: 7 Guiding Principles for Working with LLMs.


The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

Watch your hands

I’m lucky to have two hands, let’s be clear, and the minor struggles I’ve had with them over the years don’t qualify as any kind of real hardship. Yet there have been a lot of small injuries — an ongoing annoyance made worse by being mostly my fault.

Consider the most recent incident. To relieve strain on my left hand, I switched a few months ago from a steel-string to a nylon-string guitar. I knew I wanted the lesser force required to press nylon strings, and the vacation from steel strings has really helped. The wider spacing between the strings is also better for my hands, I realized as I settled in. I’d started on a classical guitar, then hadn’t owned one in decades; it feels good to have one again, and I’ve been playing it a lot.

Being the guy who wrote a blog post about an early warning system for RSI not even six months ago, I see the absurdity of my situation. Back in July I was rehabbing an overextended little finger. Now I’m rehabbing a thumb and related muscles insulted by my failure to properly adapt to the new instrument.

You can wrap your thumb around the narrower neck of a steel-string guitar in order to grab the lowest string. You can do that with the wider neck of a classical guitar too. But as I probably learned the first time and then forgot, you really shouldn’t. A D major chord with F# in the bass is too much of a stretch for the thumb, at least for my thumb, on a classical guitar. You won’t see classical guitarists do that. Instead they’ll make a tripod with the index, middle, and ring fingers.

So once again I get to rewire my hand posture. Which, again, is a minor hardship, not remotely comparable to the guy I mentioned last time who had to switch sides and learn to fret with his right hand. As I also mentioned there, he found an upside. Now he’s a switch-hitter who can use both halves of his brain directly. In my case, I’m trying to embrace the rewiring as a way to break a habit and form new neural pathways. It’d be nice, though, if that weren’t always a response to self-inflicted injury!

But like I said, it’s a minor hardship. My hands could have been mangled in my dad’s car engine that one time, or in the anchor chain of Ben Smith’s boat that other time: two escapes from disaster that still provoke the occasional nightmare. I’m lucky to have these two hands, and again vow to take better care of them.

Why is it so hard (for me, at least) to detect and avoid injurious hand postures? I guess it’s because whatever you’re projecting — when you write sentences or lines of code, or play notes and chords — has to flow through your hands with minimal conscious attention to your hands. Note to self: pay more attention.

Critical mass in the Goldilocks zone

The use of the phrase “critical mass” in this NYTimes story about the enshittification of Goodreads stopped me in my tracks.

Given all of Goodreads’s issues, it might seem easy enough to encourage writers and readers simply to flock to another forum. Sites like The Storygraph and Italic Type have sprung up as promising alternatives, but they’re still far from reaching a critical mass of users.

Nuclear physicists know they are dancing with the devil when they bring fissile material to criticality. They also know that the reaction can be controlled, that it must be, and that the means of control obey well-understood principles.

Social sites typically push toward supercriticality with no such understanding. If Goodreads enshittifies at 125 million users, why would another service expect a different outcome at similar scale?

We can learn from a natural experiment. Not mentioned in the story is a long-running service, LibraryThing, that’s been going strong since 2005. I interviewed its founder and lead developer, Tim Spalding, back in 2008. Listening to that interview again today reminded me that everything I loved about LibraryThing remains important and matters even more now.

LibraryThing was, and remains, a place where you make and share lists of books in order to connect with other people and with books — not primarily via friend relationships but rather book relationships. It’s a small business that’s kept Tim and his crew happily engaged in serving a few million bibliophiles, some of whom pay a membership fee to be super-cataloguers.

I’m not in LibraryThing’s core demographic. Books aren’t as central to my life as they are to members of the service who carefully curate their own lists, tag books and authors, contribute to a common knowledge wiki, and write reviews. But I appreciate their work when I visit the site.

Today I added Ed Yong’s remarkable An Immense World to my list. Among the book’s dozens of reviews on the site, I found a 2000-word essay that usefully highlights many of the strange (to humans) powers of animal perception that Yong describes.

I guess LibraryThing isn’t on the Times’ radar because it hasn’t reached a critical mass of … what, exactly? Tens of millions of people? Hundreds of millions? I’m glad it hasn’t! That’s a recipe for meltdown. LibraryThing has been going strong, for almost two decades, in the Goldilocks zone: neither too little activity nor too much, just the right amount for meaningful experiences at human scale.

I feel the same way about Mastodon. Conventional wisdom says it’s dead in the water: “nobody” goes there, no Mastodon apps rank highly in the app stores. But if critical mass means operating at the scale of Twitter or Facebook, then who wants that? Who benefits from the inevitable enshittification? Not me, and probably not you.

LibraryThing shows that a controlled reaction, at smaller scale, is sustainable over time. Mastodon so far has been successful in the same way, and I see no reason why that can’t continue. Although Mastodon is young, my connections there date back to social networking’s roots in the early blogosphere. It feels like the right amount of critical mass. For me, a billion people on Mastodon is an anti-goal. I would much rather see hundreds or maybe thousands of healthy communities emerge, each operating in its own Goldilocks zone. Many small and safe critical masses, instead of a few huge and dangerous ones, powering small businesses whose proprietors are — like Tim Spalding and his crew — able to maintain real relationships with their customers.

That global conversation we thought we were having on Twitter? We don’t know how to control that reaction and I’m not sure it makes sense to try.

Don’t look ahead. Look sideways as you climb the hill.

I do a lot more cycling in Sonoma County, California than was possible in Cheshire County, New Hampshire. The Mediterranean climate here, which enables me to ride year-round, is a blessing for my mental and physical well-being. And because the topography is even more rugged, I’m doing more climbing than ever.

Yesterday, Luann dropped me at the Coleman Overlook on the coast and I cycled home from there. I’m not a fast climber, I can’t keep up with younger friends when we cycle together, and my rides aren’t extreme by local standards. But my cumulative elevation gain over the course of a year far exceeds what it ever was back east, and I’ve had plenty of time to reflect on climbing strategy.

A better gear ratio would help, but my older road bike won’t accommodate one. So on the steepest pitches I switch to a weaving ascent that eases the grade, which I’ve decided is OK. For a while I resisted shoes with cleats that lock into pedals, but now I use them to gain extra leverage, which really helps.

It’s never mainly about the equipment, though; the real challenge is always mental. How do you think about reaching the top of the big hill you’re climbing? One piece of conventional wisdom: don’t look ahead. If you look down at the road, you aren’t forced to think about the grade, or your slow progress up it. Instead you see pavement moving beneath you, and feel steady progress.

Of course that defeats the purpose of cycling through spectacular Sonoma County landscapes. Recently a friend suggested a different strategy: look to the side. Of course! There’s little or no traffic on many of these routes, so it’s safe to do that. And the effect is mesmerizing.

I’ve described it like this:

Everything looks different from everywhere. You’re always seeing multiple overlapping planes receding into the distance, like dioramas. And they change dramatically as you move around even slightly. Even just ten paces in any direction, or a slight change in elevation, can alter the sightlines completely and reveal or hide a distant landmark.

So, don’t look ahead to the top of the hill, and don’t look down at the road. Look left and right to see sliding panels of majestic scenery. It really helps!

Puzzling over the Postgres query planner with LLMs

Here’s the latest installment in the series on LLM-assisted coding over at The New Stack: Puzzling over the Postgres Query Planner with LLMs.


The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

Debugging SQL with LLMs

Here’s the latest installment in the series on LLM-assisted coding over at The New Stack: Techniques for Using LLMs to Improve SQL Queries.

The join was failing because the two network_interfaces columns contained JSONB objects with differing shapes; Postgres’ JSONB containment operator, @>, couldn’t match them. Since the JSONB objects are arrays, and since the desired match was a key/value pair common to both arrays, it made sense to explode the array and iterate through its elements looking to match that key/value pair.

Initial solutions from ChatGPT, Copilot Chat, and newcomer Unblocked implemented that strategy using various flavors of cross joins involving Postgres’ jsonb_array_elements function.
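The explode-and-match strategy is easiest to see outside SQL. Here’s a minimal Python sketch of the same logic the cross-join solutions implement with jsonb_array_elements; the column contents and the key name are hypothetical, for illustration only:

```python
# Each "column" holds an array of JSONB-like objects with differing
# shapes, so whole-object containment (Postgres' @> operator) fails.
# Instead, explode both arrays -- as jsonb_array_elements does in SQL
# -- and look for a key/value pair common to elements of each.

def interfaces_match(a, b, key="network_interface_id"):
    """True if any element of a shares a value for `key` with any
    element of b, regardless of the elements' other fields."""
    values_a = {e[key] for e in a if key in e}
    values_b = {e[key] for e in b if key in e}
    return bool(values_a & values_b)

# Hypothetical rows: same interface id, otherwise different shapes.
row1 = [{"network_interface_id": "eni-1", "status": "in-use"}]
row2 = [{"network_interface_id": "eni-1", "owner": "acct-123"}]
print(interfaces_match(row1, row2))  # True
```

In the SQL versions, the set intersection here corresponds to a join condition on the exploded elements rather than on the whole JSONB objects.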


The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

Let’s Talk: Conversational Software Development

Here’s number 12 in the series on LLM-assisted coding over at The New Stack: Let’s Talk: Conversational Software Development.

I keep coming back to the theme of the first article in this series: When the rubber duck talks back. Thinking out loud always helps. Ideally, you get to do that with a human partner. A rubber duck, though a poor substitute, is far better than nothing.

Conversing with LLMs isn’t like either of those options; it’s something else entirely, and we’re all in the midst of figuring out how it can work. Asking an LLM to write code, and having it magically appear? That’s an obvious life-changer. Talking with an LLM about the code you’re partnering with it to write? I think that’s a less obvious but equally profound life-changer.


The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

How to Use LLMs for Dynamic Documentation

Here’s #11 in the new series on LLM-assisted coding over at The New Stack:
How to Use LLMs for Dynamic Documentation

My hunch is that we’re about to see a fascinating new twist on the old idea of literate programming. Some explanations can, will, and should be written by code authors alone, or by those authors in partnership with LLMs. Others can, will, and should be conjured dynamically by code readers who ask LLMs for explanations on the fly.


The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

How LLMs teach you things you didn’t know you didn’t know

Here’s #9 in the new series on LLM-assisted coding over at The New Stack:
Learning While Coding: How LLMs Teach You Implicitly

LLMs can deliver just-in-time knowledge tailored to real programming tasks; it’s a great way to learn about coding idioms and libraries.

As I mentioned on Mastodon, I know we are in a hype cycle, and I’m trying to report these findings in a quiet and matter-of-fact way. But when Greg Lloyd played this quote back to me, I got excited all over again.

This is the kind of tacit knowledge transfer that can happen when you work with another person, you don’t explicitly ask a question, and your partner doesn’t explicitly answer it. The knowledge just surfaces organically, and transfers by osmosis.

I’m certain this augmented way of learning will carry forward in some form, and improve the learning experience in other domains too.


The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

Is there an early warning system for RSI?

My RSI (repetitive strain injury) struggles began 25 years ago, after an intensive push to build and ship the BYTE magazine website. The tendons in both wrists became badly inflamed, and I spent a couple of months in rehab: anti-inflammatory meds, alternating heat and ice, stretching. It mostly resolved, but my left thumb joint remained painful.

One day, as I wondered what could possibly be the cause, I looked down at my hands as I was typing. I use the world’s best (and rarest) split keyboard, so there’s a left spacebar and a right spacebar. Suddenly it was obvious. I was always hitting the left spacebar with my left thumb, an unconscious habit repeated hundreds or thousands of times a day. And I was really slamming it too, another habit that I still have to monitor.

Thus began one of a series of interventions to remap the relationship between fingers and keys. In that case, I assigned my right thumb to the spacebar, intending to switch to an alternating pattern once things settled down. All these years later, I look down at my hands and see that never happened. I’m still using the right spacebar exclusively. I should probably try to change that, but there’s a more pressing problem.

Over the years, I got bitten a few more times by the effects of asymmetry. When my left hand became painful again, I realized I was mousing exclusively with that hand, and switched the mouse to my right side. I think I’ve gone back and forth a few times; currently it’s again on the right.

Another time, when I was doing a lot of lap swimming, my right shoulder became painful. Once again, it was something that should have been obvious, but wasn’t to the medical people I consulted, and wasn’t to me either until, once again, I recognized another asymmetry: I was always breathing on the left side, which put extra strain on the right shoulder. When I switched to an alternating pattern, breathing every third stroke instead of every second, that problem resolved.

You’d think I’d have learned my lesson by now. But when the knuckle of my left pinky suddenly became inflamed two months ago, it was the same story all over again. I attributed it to playing more guitar than usual, but when laying off didn’t help I thought I might have triggered arthritis. An x-ray was negative, though. What could it possibly be?

Then last night, while typing, I felt a twinge and looked down at my hands. It was blindingly obvious: I use emacs key mappings; the CTRL key is part of many of the basic chords; my poor left pinky is overloaded. Amping up the guitar playing probably pushed me over the edge, but that overload seems clearly to be the root cause.

So I’m typing this post in order to adjust to yet another remapping. On this keyboard, there’s also a right CTRL key, but it’s not convenient to my right pinky, so I’ve recruited the right SHIFT key (which I apparently never use) to be my new CTRL key. It’s not going to be an easy adjustment, but I won’t complain; I’m just glad to know I’ve found the way forward.

Given these experiences, I’ve thought a lot about how we can unconsciously injure ourselves with repetitive and asymmetric patterns of movement. In my case, I don’t seem to receive any signals until damage is already done. I wish I could notice sooner, do the necessary remapping, and continue with less disruption. But it just doesn’t seem to work that way.

I really can’t complain, though. Recently I met a guy in a guitar shop whose left wrist and hand became so painful that he had to switch sides and learn to fret with his right hand. It’s hard for me to conceive of doing that kind of remapping, but you do what you have to do. And as a bonus, he’s now a switch-hitter: as he recovers some use of his left hand, he finds that he can achieve different effects as a right-handed and left-handed player.

I’m pretty sure I’ll get through this episode. The question will then become: how to avoid the next one? When we use computer keyboards or musical instruments, we need our hands to work unconsciously and automatically. It’s only when injuries happen that we have to surface what’s been unconscious, enter manual mode, and retrain until we can form new unconscious habits. I wish there were an early warning system.

A new series on LLM-assisted coding

In the 20th episode of my Mastodon series I pivoted to a new topic: LLM-assisted coding. After three posts in the new series, it got picked up by The New Stack. Here’s the full list so far; I’ll update it here as the series continues there.

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

Although I’ve spent decades working with software, and still regularly write code in one form or another, I’m only an average programmer. There’s no doubt that LLM assistants can help me be better. But how much better? In what ways exactly? That’s what I aim to find out. My day job embeds me in a fast-moving open-source project, Steampipe, to which I want to contribute as much as I can. That ambition was, until recently, tempered by my own limitations. Now, augmented by the collective knowledge embodied in LLM assistants, I’m raising my sights. Larry Wall famously said: “Easy things should be easy, and hard things should be possible.” It’s starting to look like even hard things can be easy. But the devil’s in the details, so in this series I’ll focus on concrete examples of how LLM assistants do (and don’t) help me become a better developer.

Why LLM-assisted table transformation is a big deal

Last week I had to convert a table in a Google Doc to a JSON structure that will render as an HTML page. This is the sort of mundane task that burns staggering amounts of information workers’ time, effort, and attention. Until recently there were two ways to do it. For most people, it’s a mind-numbing manual exercise. For some (like me), it’s sometimes possible to write a script to do the transformation, but that’s another mind-numbing exercise, and it’s always tough to decide in advance whether the time and effort saved will be worth the trouble.

Tabular data is famously complex. Recently, I wrote a Steampipe plugin to find tables in web pages and transform them into Postgres tables. Then, naively, I tried to do the same for PDF files. It had been a while since I used any of the available libraries for parsing PDFs. It hadn’t gone well last time; I thought maybe things had improved since, but no, it remains an unsolved problem. Which is tragic for many reasons, not least that the corpus of scientific knowledge lives in PDF files, and the tabular info they contain remains unavailable to machine processing. That’s not just my opinion. I consulted Peter Murray-Rust, who has spent decades working on this problem, and he confirmed that as yet there’s no general solution.

An eye-opening outcome

So you can imagine my surprise and delight when ChatGPT-4 accomplished the task. As usual, I first tried the Simplest Thing That Could Possibly Work. Rather than exporting the GDoc to one of the available formats, I just copied from the screen, pasted into the chat, and asked for a Markdown table. What you get, when you paste a copied GDoc table, is completely unformatted: one item per line, with no indication of what’s a table header, or a section header, or a normal row. There’s no apparent structure at all. And yet the tool was able to find the structure in that undifferentiated stream of items, and faithfully recreate the table.

I then presented an example of the JSON format, and asked for a transformation of the Markdown table to that format. It was a big table with a dozen sections, each with a handful of rows, and that didn’t go well. What I’m learning, though, is that the same strategy we use as programmers — decompose big tasks into smaller chunks — works well with LLMs. That’s why I opted for Markdown as an intermediate format, instead of asking to transform the raw data directly to the target JSON format.

The next part was especially interesting. I asked for a list of the section names, which it reported correctly. Then I asked to translate each section, by name, to the target format. That worked. It still required a dozen manual steps, because I had to append each generated JSON chunk to the final output, but that took a fraction of the time and effort that would otherwise be required.
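The chunking strategy described above can be sketched in code. This is a purely hypothetical illustration: the convention that a section header is a table row with a single non-empty cell is my assumption, not the format of the original GDoc table, and the function name is mine.

```typescript
// Hypothetical sketch of the decomposition strategy: split a Markdown
// table into named sections so each one can be handed to the LLM
// separately. The section-header convention (a row with exactly one
// non-empty cell) is an assumption for illustration.
function splitSections(markdown: string): Map<string, string[]> {
  const sections = new Map<string, string[]>();
  let current = "";
  for (const line of markdown.trim().split("\n")) {
    const cells = line
      .split("|")
      .map((c) => c.trim())
      .filter((c) => c !== "");
    if (cells.length === 1) {
      // A row whose only non-empty cell is its label starts a new section.
      current = cells[0];
      sections.set(current, []);
    } else if (current !== "") {
      sections.get(current)!.push(line);
    }
  }
  return sections;
}
```

Each entry of the resulting map is small enough to translate to JSON in its own prompt, which is the whole point of the intermediate Markdown step.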

We are on the cusp of a new era of cognitive surplus. People shouldn’t be laboriously transposing document formats, as millions (maybe billions) of information workers spend minutes (maybe hours) doing every day. These are tedious chores; it’s impossible to overstate how taxing they are, or what a relief it is to outsource them to the machine. That said…

A few caveats

Beginner’s luck?: I was unable to reproduce the behavior exactly as described here. This seems typical so far with LLMs, and I sometimes wonder if some of the best outcomes are just beginner’s luck. But actually, I think it may have more to do with interaction boundaries. In ChatGPT (and friends), New Chat (or equivalent) doesn’t seem to reset in a way that enables me to reproduce a result. Maybe there is no way?

Context window: I tried prompting with docx and html exports of the GDoc, but they were too large. My guess is that more context headroom wouldn’t necessarily have enabled me to do the whole job in one go, versus multiple interactions, because with a bigger prompt the path to a desired transformation becomes less likely to be followed. Again, who knows? We’re in a realm of experimentation that’s full of such unknowns.

Verification: My first rule for effective use of an LLM to perform some well-defined task has been: It’s quick and easy to verify the result. Right away, simple bash and Python scripts met that criterion. A script that combines a bunch of isomorphic CSV files into one big file, while deduplicating the headers, either works or it doesn’t. That’s easy to verify. And it’s quick, too, even if you wind up iterating a few times. The table transformation described here falls into the same category. It did require a proofread, and I made a couple of tiny tweaks, but verification was quick and easy. Had it been 10x larger, though, the burden of verification would suck some of the value out of the LLM solution.
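The CSV-combining task mentioned above is easy to express as a pure function, which is part of why it’s easy to verify. Here’s a minimal sketch (function name and in-memory design are mine; real scripts would read files from disk): merge several isomorphic CSV files, keeping the shared header only once.

```typescript
// Sketch of the easily-verified task described above: merge isomorphic
// CSV files (identical header rows) into one, deduplicating the header.
// Operates on already-read file contents rather than doing I/O.
function mergeCsv(contents: string[]): string {
  const out: string[] = [];
  for (const csv of contents) {
    const lines = csv.trim().split("\n");
    if (out.length === 0) {
      out.push(...lines); // first file: keep its header row
    } else {
      out.push(...lines.slice(1)); // later files: drop the repeated header
    }
  }
  return out.join("\n");
}
```

Verification here is a glance at the output: one header, all the rows. That property is what makes this class of task such a good fit for LLM assistance.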

Other LLMs: None of the others did well on this task. I expect they’ll all improve, though, and as they do, it’ll become more and more feasible to ask for multiple results in parallel, and look for consensus among them.

PDF tables in the scientific record

It’s my understanding that arxiv.org is in the CommonCrawl, so the LLMs are seeing zillions of tables in scientific papers. Tables are used there for layout, for data, or for a mixture of both purposes. Nowadays, when a conversation turns to the brokenness of scholarly publishing, and the perpetual dominance of data-unfriendly PDF formats, my eyes start to bleed. Semantic markup, smoothly implemented from inception to collaborative editing to HTML publication, was always the dream. Maybe it’ll never happen that way; instead, we’ll just use anything to write and edit, we’ll publish to PDF (but please not two-column!), and then we’ll extract document structure at the end of the toolchain. I never thought that might be possible, and I’m not sure it’s the right answer, but this is a good time to be open to unconventional possibilities. Whatever unlocks the data in the world’s corpus of scholarly PDFs is worth doing.

Radical just-in-time learning

Buoyed by the success of my LLM-assisted refactoring exercise, I set a new and (for me) more ambitious goal. While the refactoring task was a bit challenging, I didn’t have to learn new concepts and syntax. The basic patterns and Go idioms for Steampipe plugins were already familiar, I just had to find the right way to rearrange things.

Now, though, I wanted to tackle another project that I’d been blocked on: a slider widget for Steampipe dashboards. Why? Well, there are a million uses for an input control that reports continuously varying values. For example, the Mastodon dashboard currently limits results by means of a picklist with hardcoded values.

input "limit" {
  title = "Max results"
  type = "select"  
  sql = <<EOQ
    with limits(label) as (
      values
        ( '20' ),
        ( '40' ),
        ( '100' ),
        ( '200' ),
        ( '500' )
    )
    select
      label,
      label::int as value
    from
      limits
  EOQ
}

It would be nice, instead, to code a widget like this…

input "limit" {
  title = "Max results"
  type = "slider"
  sql = <<EOQ
    select
      20 as min,
      500 as max,
      40 as step,
      80 as default
  EOQ
}

…that renders something like this:

Why was this an ambitious goal? In this case there were a whole bunch of new concepts and idioms to absorb. A Steampipe dashboard resource touches many different aspects of the Steampipe system. The main components are: Steampipe itself, the base program that launches both Postgres and the dashboard server; the dashboard server that forwards queries to Steampipe and receives query results; and the dashboard app that defines widgets instantiated with query results. Multiple communication channels are in play, including a websocket connection from the dashboard server to the dashboard app, and event-driven message flow within the app.

For aspiring plugin developers, there’s an SDK and documentation about common idioms and patterns. No such tooling or guidance is yet available to the widget developer. Parts of the dashboard system are written in Go code, parts in React-flavored TypeScript. I’d used TypeScript before, but had never touched React, so I not only needed to learn some key React concepts, I also had to grok how they look and feel in TypeScript.

We’ve seen plenty of examples of LLMs writing Fibonacci generators, then writing tests for them, then translating from Python to JavaScript. Far more typical, I think, are tasks that require diving into an unfamiliar code base and learning just enough to make useful progress on some tactical goal.

Can you fly that thing?

Here’s my favorite scene in the Matrix.

Neo: Can you fly that thing?

(looks toward helicopter)

Trinity: Not yet.

(picks up phone)

Tank: Operator.

Trinity: I need a pilot program for a B-212 helicopter. Hurry.

(eyelids flutter briefly)

Trinity: Let’s go.

(they jump in the chopper and fly off the roof)

That’s what I want, and what we all need, for coding (among many other things). Could LLM assistants help me rapidly learn what I needed to know to add a basic slider to Steampipe’s gallery of widgets? That was my experiment, and here’s my report.

An overly-optimistic first try

I started by feeding examples of existing input widgets to ChatGPT-4 and Sourcegraph Cody, and asking how to modify the patterns for a new slider widget. Predictably that yielded confident but useless results. Although Sourcegraph has now indexed the Steampipe repo, and is presumably making embeddings available to Cody, it wasn’t clear to me whether, or to what extent, that made a difference. No big deal, I wasn’t really expecting this approach to work, but it never hurts to ask!

Next I tried prompting with a José Reyes blog post that contains a detailed explanation of how he made a significant improvement to the chart widget. Would that material supply enough useful context? Not for ChatGPT-4 or Cody, evidently. If there’s a way to craft a prompt that uses José’s explanation to produce a working slider, I’d love to see it.

If you haven’t seen José’s blog, by the way, I commend it to you. Over the past few months he’s shown an astonishing ability to jump into diverse code bases and do useful things with them. His Steampipe series, in six installments so far, covers a wide variety of Steampipe topics in great depth and provides a wealth of instructive examples.

It’s humbling, to be honest. I’ve worked for a while now on the periphery of Steampipe, building plugins and dashboards, but have barely scratched the surface as a contributor to the core. Although I’m wary of applying the 10x label to a developer, José clearly deserves it. Me, not so much, 1x at best. I’m a fountain of good ideas that I often struggle to implement quickly and well, especially in code bases that span unfamiliar languages, frameworks, and components. If LLM augmentation can get me even to 1.5x or 2x, never mind 10x, that’ll be a huge win for me — and for a whole lot of average developers like me.

A more realistic next try

Programmers aim to decompose tasks into small, manageable chunks. The same discipline applies when working with LLM assistants. For starters, I lowered the goal post. I originally hoped to implement a two-handled slider, like the one I used in this AV editor. But first I needed a basic understanding of how to wire up any kind of new dashboard widget, and for that I wanted the simplest possible slider.

I asked both ChatGPT-4 and Cody, in various ways, for ideas. At one point, Cody proposed <input type="range">. Oh? No kidding? I didn’t know that was a thing, thanks Cody! A native HTML element was just the right starter element. There would still be plenty to learn about Steampipe/React/TypeScript conventions for wiring such a thing into Steampipe’s dashboard system. No need to complicate matters with a library that implements a fancier slider but brings extra conceptual baggage.

As a widget developer, you’re operating in an environment where the results of your widget’s query are magically available in a data property that you can unpack in the widget’s TypeScript code. How is that data made available to the widget? It’s complicated. I spent a good while spelunking to find out exactly how it’s done; I still can’t clearly articulate all the ingredients, or the data flows that interconnect them, and I didn’t find a way for the LLMs to usefully aid the investigation. At one point I found myself in Chromium’s devtools, looking at websocket traffic and setting breakpoints to isolate event sources and handlers. I can envision LLMs enlisting debuggers, just as ChatGPT can now consult WolframAlpha via a plugin, but we’re not there yet. Manual debugging (where available!) will remain a critical skill for exploring unfamiliar code.

Tracing that data flow wasn’t strictly necessary, though. For my purposes, it was OK to just accept that the data property magically holds the results of the query. Now I needed to work out how to use that data to initialize the widget, and then wire up a handler to set a new value when the slider’s handle moves. As a React/TypeScript newbie, though, even that was nontrivial. Last time I tried I got lost in the abstractions and came up empty-handed. This time, I hoped my pair of rubber ducks could help me break through.

Exploring examples

I modeled the slider on another widget type, select, which builds a picklist and reports the selected item. It wasn’t an ideal starting point because there’s a family of select widgets, interrelated in ways that require more complex coding than the slider would. Neither of the LLMs managed to distill the essence of a slider from those examples, but in the back-and-forth dialog with them I began to put the pieces together.

It was straightforward to put a slider onto the canvas. Then it took me a while to work out how to wire it into the React machinery. At one point, as multiple useEffect hooks came into play, I found myself wondering why their dependency arrays were sometimes empty. Both LLMs gave the same answer: an empty array means the effect only runs once after the initial render.
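That answer can be made concrete with a plain-TypeScript model of the comparison React performs. To be clear, this is not React’s actual code; it’s a sketch of the documented behavior, to show why an empty dependency array means the effect runs only after the first render.

```typescript
// Model of React's dependency-array check (not React's actual code).
// An effect re-runs when any dependency changed, per Object.is; with an
// empty array there is nothing to change, so it runs only once, when
// there are no previous deps (i.e., after the initial render).
function shouldRerun(prev: unknown[] | undefined, next: unknown[]): boolean {
  if (prev === undefined) return true; // first render: always run
  if (prev.length !== next.length) return true;
  return next.some((dep, i) => !Object.is(dep, prev[i]));
}
```

Under this model, `shouldRerun(undefined, [])` is true (the initial run) and `shouldRerun([], [])` is false forever after, which matches what both LLMs told me.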

I’m an example-driven learner, though, so I wanted to see which of the useEffect hooks in the code base work that way. ChatGPT-4 can’t see the whole repo, so it doesn’t know. Cody, in VSCode, does see the repo, and is further enlightened by the embeddings in Sourcegraph’s indexed copy of the repo. Cody found some of the examples, but not all. Then it occurred to me to try Sourcegraph’s structural search. Here’s the query that found all the examples.

context:global repo:^github\.com/turbot/steampipe$ useEffect(..., []) 

It’s interesting to see that, while Cody is joined at the hip to the Sourcegraph web app — and indeed even runs in a sidebar there, as well as in VSCode — it doesn’t yet seem to tap into structural search. Perhaps that will become the first external tool that plugs into Cody?

Controlled vs uncontrolled components

At one point, as I worked out how to persist the slider’s selected value, I encountered what I now realize is a common React error.

Warning: A component is changing a controlled input to be uncontrolled.

Me: Huh?

Cody: This warning is occurring because you are setting a value prop on the element, but also listening for onChange events. This is conflicting – either the input should be controlled (by setting the value prop) or uncontrolled (by listening for onChange events). To fix this, you have a few options:

  • Make the input controlled by setting value and handling changes with an onChange handler that calls setValue().

  • Make the input uncontrolled by removing the value prop and keeping the onChange handler.

  • Use a ref to access the DOM input and read its value when needed: const inputRef = useRef<HTMLInputElement>(null);

Eventually I worked out how to do it both ways. Then I asked ChatGPT-4 and Cody to compare them, evaluate tradeoffs, and weigh in on which approach to take. One favored uncontrolled’s simplicity, the other liked controlled’s flexibility. Ultimately that’s for the maintainer to decide, if this turns into a pull request. For now I’m sticking with controlled, with a better sense of why than I could otherwise have easily gained.

A catalog of concepts

Beyond the controlled/uncontrolled distinction, here (courtesy of ChatGPT-4) are some of the concepts and idioms that this exercise introduced me to:

  • Functional Component: SliderInput is defined as a functional component. In TypeScript, the props that this component accepts are explicitly typed as InputProps.
  • Props and TypeScript Interfaces: In TypeScript, interfaces are used to ensure the props have the correct type. The data and name props are passed to the SliderInput component, and their types are defined by the InputProps interface.
  • Destructuring Props: The props are destructured in the function parameter, allowing the properties to be accessed directly as variables (data and name) instead of as properties on a props object.
  • useState Hook with TypeScript: The useState hook is used to add state to functional components. In TypeScript, you can provide a type argument to this hook to specify the type of the state variable. In this case, value is defined as a state variable with the type number.
  • useEffect Hook: The useEffect hook lets you perform side effects in function components. In this case, it’s used for two purposes:

    • To update the value state when the stateValue prop changes.
    • To dispatch an action to set the initial value of the slider when the defaultValue changes.
  • Custom Hooks with TypeScript: Custom hooks are a mechanism to reuse stateful logic between components. The useDashboard hook is used to access the dispatch function and selectedDashboardInputs state from a dashboard context. These returned values also have their own types defined.
  • Event Handling with TypeScript: Event handlers are used to handle user interactions with the component. In this case, the sliderChange function is triggered when the slider value changes. The type of the event object e is inferred from the context, but it could also be explicitly defined with React.ChangeEvent<HTMLInputElement> for stricter type checking.
  • JSX and TypeScript (TSX): JSX is a syntax extension for JavaScript that is used to describe what the UI should look like. When using TypeScript, this syntax is referred to as TSX. TypeScript provides additional type safety in TSX code.

The output of this exercise was a few dozen lines of code involving all that conceptual stuff. But the result of the exercise feels bigger. As a developer of Steampipe dashboards, I can think of a few other affordances I’d like to have. Yes, it’s open source, so I can contribute, but I’m hard-pressed to justify climbing that learning curve. The LLM assistants ease that curve; I’m a better learner with them than without them; more feels possible than before. Doug Engelbart dreamed of augmenting human intellect. I wish he’d lived to see this moment.

Ed Iacobucci on DayJet, from IT Conversations

In a 1991 BYTE review entitled Citrix’s New Multiuser OS/2 I wrote:

Citrix Systems, a team of IBM refugees led by OS/2 guru Ed Iacobucci, has extended OS/2 into the multiuser territory occupied by Unix, Xenix, QNX, Theos, and a variety of DOS-based operating systems.

That’s how I met Ed Iacobucci. He left Citrix in 2000 to found DayJet, a company that aimed to make real the vision of point-to-point air travel that James Fallows had championed in his book Free Flight.

In 2007 I interviewed Ed for IT Conversations. It was one of the most memorable of the podcasts I made for that series.

Ed died in 2013.

The other day, Alan Levine pointed me to Descript, a new tool for making podcasts with transcripts; it features a technique, first shown to me by Laurian Gridinoc, that uses timecoded text as the interface for audio editing.

I’ve been waiting for that technique to appear in a mainstream podcast production app, so when I heard Descript was doing this I jumped right into a remake of the Ed Iacobucci podcast. It was a remarkable experience in many ways.

There was no automatic transcription back then, so while I did a few experiments with Mechanical Turk, there were no transcripts of my shows. I uploaded the show’s MP3 file to Descript and it did a fantastic job of speech-to-text, faster and better than anything else I’ve tried. But that was just the start.

I’m listed as the post-production engineer on that original podcast, which meant I edited the audio for clarity: eliding phrases or sentences, removing ums and likes. Doing that in a waveform editor was laborious to say the least. With Descript I was able to do much more of that kind of editing, better and faster, because text selection as the interface to audio editing is the absolute holy grail.

But I wasn’t just editing the audio, I was also tuning the transcript at the same time: correcting the recognized text (pretty rarely!), tweaking the auto-generated speaker labels, refining the flow to strike a harmonious balance between the words you hear and the words you read.

There’s much more to say about Descript, which by the way deployed a significant upgrade in the middle of my project, but for now I’ll just say: Thank you! It’s a brilliant piece of software that enabled me to revisit one of my most treasured conversations and bring it to life in a way that people can now search for and read, as well as hear, with maximum fidelity. DayJet folded in 2008; it was a remarkable tale of innovation; here’s hoping Ed’s dream will come true.

Here’s a link to the new version of the show’s MP3 file. Here’s the interview as published by Descript. And here’s the HTML export of the transcript.


Phil Windley: Up next on Interviews with Innovators, host Jon Udell talks with Ed Iacobucci, co-founder of DayJet. Widely known in IT circles as the co-founder of Citrix, he left in 2000 to pursue his interest in aviation. In 2000 he co-founded DayJet, a company whose mission is to deliver on-demand jet travel services.

From IT Conversations.

Jon Udell: I have an ulterior and extremely selfish motivation for having this conversation. So you’ve been to Peterborough when you came to visit us at BYTE. You had to schlep to Manchester and then drive an hour, right?

Ed Iacobucci: And getting to Manchester wasn’t even easy coming out of Florida.

Jon Udell: Now, what you probably didn’t know is that 20 miles west of Peterborough is the town I live in, which has a fabulous, and of course completely underutilized regional airport. We have in Keene an airport where we have, I think the third longest runway in the state.

It can land and does occasionally land Air Force One during political seasons. And back when you visited me, at BYTE in the, what was it like the late eighties, early nineties, we had commercial air service out of Keene that was subsidized by the essential air service program of the government. And so I was actually able to drive two minutes to my airport and I could be midtown Manhattan in an hour, quicker than anyone else could get there.

Ed Iacobucci: But I bet it was hard to get to Huntsville, Alabama, though.

You can fix one degree of freedom, but not both.

Jon Udell: At some point I picked up on this emerging air taxi movement and I read the James Fallows book, and I just reread it the other day to prepare for this conversation and I’ve been basically just desperate for this to play out.

I’m just really curious to know, first of all, how you got into this and second of all how it’s going and, and third of all, most importantly for me, cause it’s all about me, you know, when can I be using my own airport again the way I’d like to?

Ed Iacobucci: (laughs) Well, that’s the objective.

I can’t tell you that we’re gonna achieve it all overnight. It’s one of those frontiers that’s never been properly addressed. You know, I spent most of my career, as you did yours, trying to elevate the computational and communication capabilities that businesses have through systems and networks and such. But one thing that we never really have addressed, on a broad scale, is the same interconnectivity from a physical standpoint.

Jon Udell: And of course architecturally it’s the exact same style, right?

Ed Iacobucci: It’s an awful lot of similarities.

Jon Udell: Yeah. It’s the peer-to-peer air travel network.

Ed Iacobucci: That’s exactly what it is. I love talking to people that understand what I’m saying when I talk technology. It’s a different world that I’m involved in now, but basically, it is peer-to-peer versus hierarchical networks and all the same issues relating to efficiency and best routing in a dynamic environment.

That’s not the only issues, but those are some of the issues. That’s what attracted me to the project. I think if you remember when I left Citrix, one of the things that I wanted to do other than retire, which lasted about two months, that’s pretty boring actually. Retirement isn’t all it’s cracked up to be. One of the things that I wanted to pursue, at least intellectually was where do we go with software as a service? Which remember we started exploring back in the late nineties. And Citrix, obviously that took on a life of its own.

And with ASPs and with a massively interconnected and computationally rich web, you know, one of the things that we’ve always done as technologists is focus on the evolution of infrastructure that people use for delivering things.

But there hasn’t been as much of a focus on vertically integrated, highly specialized systems, which, in many ways we always kind of poo-pooed. I mean, real technologists work on horizontal platforms, you know, kind of a macho attitude. In reality, that’s one step, but every pendulum swings.

And I think that the place where there’s a lot of value that can be added by a lot of smart people is, instead of building a highly horizontal platform that appeals to a lot of people, is go the other end of the spectrum and build a very highly specialized, vertical solution that solves one problem really well and leverage that platform not by making it available to a broad audience, cause almost by definition there isn’t one, but focus on the needs of a fully integrated service delivery. And that’s really what drove me to something that looks so different, like a service industry.

Jon Udell: The appeal of the air taxi idea though, is extremely broad. I mean, initially it’s not going to be something that most people, I guess, will be able to afford. But the idea can go extremely broad, right?

Ed Iacobucci: Oh yeah. If we actually dive into the market itself, I like to joke with people, saying that Citrix was a lot of fun, but man, was it hard to explain to people why you’d wanna do remote computing when they didn’t think they needed it.

This on the other hand is the exact opposite. Everybody understands exactly what it is that we’re trying to achieve: seamless point to point travel on a cost efficient basis. And they grasp the concept very quickly, but very seldom do most people think deeply enough in terms of what the complexities are that are behind it, because, as a lot of appealing concepts are, it’s easy to come up with an idea, but it’s really hard to make it real.

A lot of people have had this idea for many, many years. In fact, that’s probably one that’s been kicked around for ages, but it’s not until recently — and by recently, I mean, I don’t know, 15 years, 20 years — that we’ve had the kind of tools that you need to really tackle the problem. Not solve it, but just tackle it.

Jon Udell: So let’s go through what the enablers are. Now the Fallows book lays out two broad categories of technological enabler. One is a new generation of general aviation craft, which are smarter, faster, lighter, and safer than what we’ve had, kind of unchanged really, for 30 or 40 years.

So that’s the airplane part of it. And he focuses a lot on that. In fact, a lot of the book is the story of, in particular, Cirrus.

Ed Iacobucci: Sure.

Jon Udell: In parallel with that, and he talks less about it, but I’m hoping that we’ll talk a lot more about it, is this notion that you also have got now this thing called the internet. And one of the things that it can do is make demand visible and aggregate demand.

That if you could go to the internet and put a bid out that says, well, I’m here, and if there’s three or four other people within a 50 mile radius that wanna go to this other place, or various places within a 50 mile radius of some destination, then we can discover one another in a very dynamic way.

That was never conceivable until fairly recently. But I think that you’re talking about also operational routing capabilities.

Ed Iacobucci: There’s a lot of complicating factors and, from a broad brush basis, I agree totally. It’s the hardware and the communications and software, obviously, that all come together. But I mean, at it’s very root, first of all, let’s talk about just general trends. What the real enabler in all this is really what we’ve been doing for 20 years.

It’s Moore’s law, it’s cheap memory and it’s good communications. I mean, that’s really what enables manufacturers to build airplanes better. That’s what essentially the glass cockpits are. I used to joke about it when we first started working on this is that it’s, you know, it’s a thin client.

It’s three pieces of glass up in the front and what used to be individual steam gauges all became applications that run an operating system. It’s highly redundant and, you know, multiple data buses and all sorts of enhancements to guarantee a certain level of reliability but at its core it’s an application running on an operating system. So the IT gains that we’ve made, or the technology gains, I should say, in the last 20 years, come very strongly to play in terms of how these new airplanes are designed.

Now having said that, let me back off for a second and say, I have a slightly different opinion that it’s not necessarily the airplane that’s revolutionary because this is something that I’ve learned in talking to a lot of manufacturers. There’s been a lot of hoopla about, you know, these VLJs (very light jets) are really what’s gonna change the world.

In reality, it isn’t really the airplane. Other than a few things like the integration of information systems, it’s still the same basic laws of physics that come to play. It’s still a wing, it’s two turbo fan engines, it’s one or two pilots. Structurally and aerodynamically it’s not a massive change from what existed before. The cost points are better. The manufacturing advantage that a modern manufacturer can do by intelligently engaging (with the) supply chain. Precision design and manufacturing are incremental improvements. But in the final analysis, a VLJ might give you 10, 20% better capability than a pre-existing design.

The other thing the VLJ lets you do is build volume because, as you’ll soon see, all these models are based on large, dense networks.

Jon Udell: You need a big fleet of taxis for this to make sense.

Ed Iacobucci: You can’t do it with one or two, which is traditionally what people have done. And the problem traditionally is when people independently buy and then let out airplanes to the charter world is that they all wind up being personal things that are all different.

And if they’re all different, then you can’t optimize, you can’t do fleet maintenance programs, you can’t scale. So those are all barriers to scaling. So what you need is a large fleet of identical aircraft that are modern, that can be acquired in fleet quantities.

Part of what this revolution’s about, be it a VLJ or turbine or turboprop, I mean, the airplane technology isn’t as important as the fact that you can make them in large quantities.

Jon Udell: And you mean literally identical, or do you mean interoperable and interchangeable?

Ed Iacobucci: Literally identical. Well, it’s the same thing. From an operations standpoint, we abstract it a little bit and say, as long as they have the same operating characteristics, when you train on one, you can fly any of them. Whereas private aircraft on charter, every one of them is different because they have different instrument sets.

So on the aviation side, airplane design certainly benefited from technology, but it isn’t a revolution. It’s more of a rapid evolution, from a manufacturing standpoint. I think there’s a good shot that, you know, we can start seeing something that looks a little bit more like a revolution in that sense, that you can make a lot of them fast. Be that Honda, be that Eclipse, be that whoever, they’re all gonna wind up in the same place.

But the real revolution, I think, is often missed because it’s least understood. It’s in the business model. You and I have spent a lot of our lives in technology and we’ve seen the effect of creating or identifying new markets, new customers, new value propositions, and how things just spring up outta nowhere.

This is something that’s been part of our lives, our careers. But it has not really played out very well in many industries, one of which is aviation. The core business model of commercial aviation hasn’t changed at all.

They’re scheduled services. You can say they went to hubs and then Southwest did something a little bit less hubby, but it’s all about building a schedule and advertising and getting people to show up. Which is kind of like defining a big freight train that goes around on a schedule, and then you try to reach everybody possible.

And you have big wars in terms of who piles in and how many seats there are. And it’s a fixed market. I like to say, it’s the same one and a half million people that fly from Atlanta or Orlando every year. They fly on different airlines and they all fight each other over who gets the share.

But it’s not a growing market. It’s essentially the same zero sum game.

Jon Udell: And everyone’s going through the same hubs and those hubs are beyond capacity.

Ed Iacobucci: And the charter world, nothing changes. The tax subsidies change from year to year, so it’s more attractive or less attractive for people of means to buy airplanes and then give them to somebody to manage them, to help offset costs.

But it isn’t a fundamental sea change in model. So what we tried to do was kind of abstract that, and I actually had kind of a false start after I left Citrix. I feel very fortunate that I’ve been able to be in a position where I could actually buy my own airplane, which to me isn’t so much the luxury of having an aircraft.

It’s really the luxury of being able to schedule your time. That’s really what it’s all about. It has nothing to do with carpeting and entertainment systems or gold-plated anything, cause I’m very down to earth when it comes to that. You know, I travel in blue jeans and a t-shirt all the time cause that’s the way I’ve lived most of my life.

Jon Udell: But you go when you go when and where you want.

Ed Iacobucci: But I go when and where I want. That’s the real luxury. And that appeals to everybody. You don’t have to be wealthy to see the value. Everybody has an intrinsic value of time.

So when I first left Citrix, I engaged Nancy (we were just married about then, so she was transitioning from fiancée to wife) to manage our bigger airplanes while I tried to figure out a business plan to make it, you know, like business jets for the masses. And I found out very quickly that that’s a really hard problem because the deck is stacked against you.

The aircraft are extremely expensive. You know, a Challenger 604 is about 25, 26 million new. It’s a very high capital cost. Everything is handbuilt. It’s a very high level of service, very customized service. You can’t mass produce it in any reasonable way.

The expectations of the customers are high. The asset utilization’s low. And believe it or not, the competition is incredible because a lot of people have bought airplanes that they throw at charter to offset costs. So it’s just like all the elements of a crummy business all rolled up in one. And coming right outta software, to me it was a shock to see that this was really such a bad business environment. So basically I hung it up there, I said, nah, I don’t think I wanna spend a lot of time on this.

Then something happened. Two things actually, since you brought it up already. I read Fallows’ book, and about the same time I went to Agenda 2001, where Vern Raburn, who I knew from years past, most notably from Slate, although he was at Lotus and Symantec also, was pitching his new sub-million-dollar jet for the masses.

That was his idea. I was skeptical, knowing a little bit about airplanes, but in the same panel we had Dr. Bruce Holmes who was the head of the NASA, well, one of the A’s in NASA is aeronautics, right, it’s not all just space.

He was the chief strategist for aeronautics: the small airport systems, and the integration and technology to utilize the underutilized assets that we have in our country.

Jon Udell: Just as an aside, for people who haven’t read the book, there’s a subplot about NASA and about Bruce Holmes and about Dan Goldin and about how these guys in a very, kind of underground way, were really seriously trying to figure out how to reinvent air travel and find a way to utilize these 5,000 regional airports that we have around the country.

Ed Iacobucci: Which is a huge capital investment that our country’s made.

Jon Udell: Exactly. Exactly.

Ed Iacobucci: And yet we’re still piling everything into 15 major airports.

Jon Udell: Exactly. I think Bruce Holmes is the guy who clocked all of his travel door to door, for a period of years.

Ed Iacobucci: He still does.

Jon Udell: And he found out that …

Ed Iacobucci: Bruce works for us now.

Jon Udell: He does? No kidding.

Ed Iacobucci: I hired him this year.

Jon Udell: Wow.

Ed Iacobucci: To be our chief strategist for next generation air traffic systems.

Jon Udell: That’s awesome!

So the story is that he clocked all of his trips and he figured out that for anything under 500 miles, it was like just as fast to drive.

Ed Iacobucci: Exactly.

Which, to me, was a tremendous epiphany. Of course, we all know that.

Jon Udell: Yeah.

Ed Iacobucci: We just never really actually took the trouble to measure it all.

Jon Udell: Exactly.

So what did he say at Agenda that turned things around for you?

Ed Iacobucci: So I went to Agenda, which, as you know, is a technology conference, always has been, first and foremost. There was a panel there about aviation, and it was a real oddball panel. They had Fallows, they had Dr. Holmes and they had Vern on the panel. I was already in the middle of trying to investigate how to do aviation. I was on the verge of giving up and I was looking for other things to do. And it dawned on me that what we needed to make point-to-point travel a reality was a platform. That’s what we were missing. The platforms were big, expensive handbuilt creatures that were bought by very, very wealthy people and, as I said, underutilized expensive assets.

But what we needed was something more akin to what the PC was before we had the explosion of local area networks. So it dawned on me that what Vern was saying about his Eclipse (which was still a concept; this was 2001, he hadn’t really built anything yet) was an approach to what he called making the Model T of business jets: in other words, the first mass-produced, precision-manufactured aircraft.

And so that got me thinking. I gave Vern and his wife a lift on our airplane after Agenda, back to Albuquerque with Nancy. And we went and took a look at his new facilities and immediately started thinking of software, network operating systems, and so forth. And that set the wheel in motion.

Jon Udell: So when you use the word platform, you are not just referring to the equivalence of the plane to the PC, but you are referring to the software systems that animate that piece of hardware.

Ed Iacobucci: In the early days, I looked at the airplane as the PC. I mean, this is kind of an oversimplification, but let’s say that the airplane was the mass-produced PC, and the engine maker (originally Williams, then ultimately Pratt and Whitney) was the Intel, making the thing that had to be mass produced to make these things a reality.

It was an incredibly important, high-value component. And then I looked at ourselves as some kind of blend of an operating system / network operating system / application that lay on top of all that and delivered low-cost, point-to-point travel.

Jon Udell: Because that’s what operating systems do: resource utilization.

Ed Iacobucci: Resource management.

Jon Udell: Okay, so we’re getting down to it. Describe what kind of network operating system you are building.

Ed Iacobucci: Okay, so here’s the thing. It didn’t take very long to look at the business models and say that if you take a $25 million airplane and replace it with 25 $1 million airplanes, it fixes a lot of problems. It gives you a lot of robustness in terms of being able to service individual demands. Couple that also, by the way, with doing it by the seat instead of by the plane: that lets you interleave packets, or payloads. It became very clear that what we needed to do was build a very large self-optimizing network that would take a lot of other factors into consideration.

Like the physics of the airplane, the temperature, the loads. The beauty of aviation is that it’s like physics meets business. How much you can carry depends on what the temperature is. The altitude, the runway lengths. Safety is all expressed in terms of parameters that you can operate under, right?

But now the optimizer has to take it into account as it starts shuffling around different customers. So it’s not just a straight optimization, it’s an optimization that has to be done very quickly, real time, and it has an incredible number of constraints, many, many more than you would normally have.

So that’s why it’s so important. You have to eliminate all the optional constraints. Like the customers that want black jelly beans. Well, they aren’t gonna be our customers, right? You have to be willing to take any flight, with no special services, cause special services complicate things.

Jon Udell: But that’s okay. Southwest has found that people are very happy to go along with that idea.

Ed Iacobucci: It’s the same, it’s the same notion. It simplifies, but, but when you’re talking about a real time on demand network, it becomes even more important. So I had to hire mathematicians. I actually had a group of folks that I’d had previous experience with in adaptive compression algorithms.

Really smart guys. We brought them on, gave ’em the challenge of their lifetime, and actually for the rest of their lives. The problem is you never find a solution to the problem cause it’s what mathematicians call NP-hard. Which means that you can take every computer on earth that’s ever gonna be manufactured between now and the end of our lives and put it to work on the project til the end of our solar system, and you’re not gonna find the optimal solution. So you have to move from traditional hard optimization into heuristics that are aided by optimization techniques. And so then we brought in Georgia Tech, which I’m an alumnus of, and I’m on their engineering advisory board, brought in the operations research group. And these are really heavy-hitter guys that have been doing optimization. Actually, some of ’em were my professors when I was at school, it was funny to see ’em again. These guys did optimization routines for large air carriers. But the essential difference is that in a fixed schedule, there is a fixed schedule.

And optimizing assets around a fixed schedule is a vastly different problem than trying to determine the most optimal solution in real time for something that doesn’t have any fixed schedule and that’s morphing every request that comes in. Basically their response was, nobody’s ever done this, and these are the, you know, smart OR guys, and since then, MIT guys have said the same thing, you know, a lot of people.
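
The heuristic approach described here can be sketched in miniature. The following is a hypothetical simplification (the `distance`, `insertion_cost`, and `assign` functions are invented for illustration; the real system’s constraints are vastly richer): a greedy-insertion heuristic places each incoming request on whichever aircraft’s plan it extends most cheaply, rather than re-solving the whole problem exactly.

```python
# A minimal greedy-insertion heuristic for on-demand scheduling:
# each incoming request is placed on whichever aircraft's plan it
# extends most cheaply, instead of globally re-optimizing.
# (Hypothetical simplification, invented for illustration.)

def distance(a, b):
    return abs(a - b)  # airports modeled as points on a line

def insertion_cost(plan, origin, dest):
    # Extra distance flown if this trip is appended to the plan.
    last = plan[-1] if plan else 0  # aircraft start at position 0
    return distance(last, origin) + distance(origin, dest)

def assign(requests, n_aircraft):
    plans = [[] for _ in range(n_aircraft)]
    for origin, dest in requests:
        best = min(plans, key=lambda p: insertion_cost(p, origin, dest))
        best.extend([origin, dest])
    return plans

# Two aircraft, two requests: the second request chains onto the
# aircraft already positioned at its origin.
print(assign([(0, 5), (5, 2)], 2))  # [[0, 5, 5, 2], []]
```

Each new request disrupts the previous plan only locally, which is what makes the approach fast enough to answer in real time, at the price of optimality.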

It really is a problem that hasn’t been very adequately addressed. So now it looks really exciting for me cause, you know, if it was just like chartering airplanes, that is not very exciting. But now we’ve got new science, new math; there are a lot of green fields in areas where we can get the collaboration of some major universities.

Top-notch people want to work with us and want to assign PhD students to work with us. An opportunity to set up a group of really smart people. And so we did that. And that’s how we got started. And that was the beginnings of the optimization. And the idea is that every reservation that comes in, you have to figure out how to value these things.

So we have invented a market model where the pricing is based on how you value your time. If you value your time highly, you don’t wanna waste any time at all; you want to say, I’m gonna tell you I’m gonna be there at nine and I’ve gotta be there by 10:30. And we know that it’s about an hour and a half flight. That’s a very expensive ticket.

But then if you say, I’m willing to leave as early as six in the morning and be there by noon for something that nominally takes an hour, then we can afford to give a very large discount for that seat.

Without knowing whether we’re gonna match anybody up or not. Statistically, we can determine pricing models. And we did that through another area of science that we had to bring in, which I’ll explain in a second. But that flexible model gave us the degrees of freedom so the optimizers could build some really effective schedules. And so then we used the Georgia Tech team to do traditional formulations and check what we would determine in real time. They would run it over the weekend on their mesh of computers and determine if they could improve upon it by adding a lot of computational time.
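
The time-value pricing idea can be illustrated with a toy function. Everything here is invented for illustration (the `seat_price` function, the per-slack-hour discount rate, and the cap are assumptions, not the actual pricing model): the wider the completion window a customer accepts relative to the nominal flight time, the larger the discount.

```python
# Toy illustration (invented numbers) of pricing a seat by time
# flexibility: more slack between the nominal flight time and the
# accepted completion window earns a bigger, capped discount.

def seat_price(base_fare, flight_hours, window_hours, max_discount=0.6):
    slack = max(window_hours - flight_hours, 0)
    # Discount rate per slack-hour and the cap are both invented.
    discount = min(0.15 * slack, max_discount)
    return round(base_fare * (1 - discount), 2)

# A ~1.5-hour flight: a tight 1.5-hour window vs. a 6-hour window.
print(seat_price(1000, 1.5, 1.5))  # 1000.0 (no flexibility, full fare)
print(seat_price(1000, 1.5, 6.0))  # 400.0 (4.5h slack, discount capped)
```

The point of the shape, not the numbers: flexible customers are cheaper to serve because their trips give the optimizer room to combine payloads, so some of that saving can be passed back as a discount.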

So we’ve kind of determined that our algorithms and heuristics are within 5% or less of what ultimately might be optimal solutions for problems that you can test. Which are smaller problems. I mean, they can’t use the traditional techniques for a typical network because like I said, it’s far too big.

So for problems of, I don’t know, 10 airports and 20 aircraft, or 50 aircraft, they can test our optimality. It doesn’t work for 300 aircraft…

That was half of the problem. I don’t wanna bore you with a lot of details here, so

Jon Udell: It’s not boring, believe me. But feel free to move along.

Ed Iacobucci: The other side of the equation was that we also had to crack the problem of mesh computing and how you parallelize things, cause these problems aren’t really well suited to being solved in parallel. So we had to build a strategy around that and build a mesh computing platform and all that stuff. That’s kind of like nuts and bolts, but that went on in parallel. But the big problem that we had then, which was one that had been nagging me from the beginning, was: if we really succeed at building a new market, which I believe we are, how do I know how many people are gonna use a service that never existed before? Especially if it’s people that maybe hadn’t even flown before.

Jon Udell: Yeah, this is the web 2.0 problem of, oops, I’m too successful.

Ed Iacobucci: Or, how do you determine what is gonna be successful or not? Cause these are really big asset investment decisions you have to make: you have to lay out airplanes, you have to lay out bases and people, and you have to go places and you have to make marketing investments. And, believe me, there are far more ways you can go bankrupt than there are ways you can make money. Because as it turns out, to jump forward a little bit, these value networks, which is really what we’re building, are value networks between regional points.

The composition of the value network determines the load of the value network. Which is, you know, I guess intuitive to somebody that looks at networks. But from a social standpoint, it’s all the same rules apply. If I have a network of nodes A, B, C, and D and I add E, that could have an impact on all the nodes at their various times of the day.

But if I add F it could have impact some nodes differently than other nodes. So it’s an interrelated loading. A problem that’s very, very difficult to determine. So for that, I thought, you know, we’ve got all these guys that are taking basically chaos and organizing it into order so we can file flight plans and make it all look organized or actually be organized on the back end.

But what I need is another group of people that creates organized chaos or organized complexity that mimics the social behaviors of a region of travelers that can be used to test how well we can reorganize that into order.

Jon Udell: So you mean you need to actually be able to model…

Ed Iacobucci: …you need to model the markets and the demand. And that’s not a simple problem either, cause it’s gonna depend on pricing and value, time/value trade-offs, and the density of your transportation network and what nodes you introduce and what the interactions between the nodes are, cause every city has a different effect than others.

For that, I thought, you know, I did study operations research way back when and never really used it heavily, but one of the things I always loved was modeling. In fact, before I even did Citrix, my other alternative plan was doing something similar to what Maxis did, that is, building a platform for large-scale simulations, but I never did that.

They did a marvelous job, by the way. I’m thinking the kind of thing that Sim City represented in a generic sense is the kind of technology we wanna bring in. And that happens to be, the modern phraseology for that is complexity science.

When I was at school it was discrete time simulation, but then it got a biological twist to it. And so it became complexity science. At one point in time it was called chaos theory, but complexity science is the more accepted terminology for it. And I had some people that I had done some work with; in fact, through one of my directors, we had both served on the board of a Bios institute in Santa Fe, which was an offshoot of the Santa Fe Institute, which was biological modeling or evolutionary modeling of large complex systems.

So we got in touch with some of those folks, and really, really smart guys that have been doing this all their lives, and we offered ’em a job. We said, Hey, come on board and we’re gonna build the most sophisticated regional traveler model that anybody’s ever built that’s gonna have a great deal of fidelity.

They were gonna use it not just for postulating the future, but actually for building a business. And integrating that as part of our long term planning systems. And so they came on and worked for about four years. And the result was this other piece of technology, which gets married to our system, which is called ABM.

We call it ABM. It’s not very imaginative. It’s agent-based model.

Jon Udell: Not anti ballistic missile.

Ed Iacobucci: No, no. And it’s basically, I mean, the simplest way to describe it is, it’s Sim City on steroids. A commercial grade Sim City on steroids. Very targeted on the specific problem of regional travel.

So we’ve got like nine different types of agents, or sims, that operate, and they’re populated according to IRS statistics in 10-square-mile zones in the country. And they all have different rules on how they book trips and what flexibility they have.

And then we loaded on top of that a bunch of demographic data. Some we bought, some we got from DOT, some we got from IRS. And then we loaded on top of that all the schedules for all the airlines between all the airports in the contiguous 48. And then we actually developed some driving algorithms so we can estimate driving times. Not great circle, but actual routes and added congestion, time of day congestion through various nodes when driving.

Then we added train schedules and basically had this very sophisticated, very high-fidelity model of the transportation options that you would face if you lived in one 10-square-mile region of the US and needed to go to another one.
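
In the spirit of the agent-based model described above, here is a minimal sketch. All rules and numbers are invented for illustration (the `choose_mode` and `simulate` functions and the hourly-value distribution are assumptions, not the actual model): each simulated traveler carries a value of time and picks the mode with the lower total cost.

```python
import random

# Minimal agent-based demand sketch: agents with different time-values
# choose between driving and an air-taxi seat. Rules and numbers are
# invented for illustration only.

def choose_mode(agent, drive_hours, fly_hours, fare):
    drive_cost = agent["hourly_value"] * drive_hours
    fly_cost = agent["hourly_value"] * fly_hours + fare
    return "fly" if fly_cost < drive_cost else "drive"

def simulate(agents, drive_hours=7.0, fly_hours=2.0, fare=400):
    choices = [choose_mode(a, drive_hours, fly_hours, fare) for a in agents]
    return choices.count("fly") / len(agents)

random.seed(0)
agents = [{"hourly_value": random.uniform(10, 200)} for _ in range(1000)]
# Fraction of travelers for whom flying wins at this fare and trip length.
print(simulate(agents))
```

A real model layers on demographics, schedules, congestion, and booking flexibility, but the core loop is the same: populate agents, give them decision rules, and observe the demand that emerges.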

Jon Udell: That’s pretty amazing, that’d be a useful service in and of itself.

Ed Iacobucci: Well, the interesting thing is after we started developing that, that’s when we started doing a lot more work with the FAA, DOT, NASA, uh, actually not only Eclipse, but Pratt and Whitney, then GE when they came into the market, and Honda and Embraer. And so, you know, we’re a little startup company, but we’ve pretty much been in the business of trying to plot out the future, and the manufacturers started sending us design specs for power plants and hypothetical airplanes.

We actually found that we had to build some real Chinese firewalls. Cause we were getting all this confidential data from competitors to see how does this airplane design work in loading, you know? So we were able to make some really quantitative assessment.

Jon Udell: Well, this leads into another question I was getting ready to ask, which is that presumably you’re not gonna be the only such service operating on the planet. At some point there comes the need to federate these things, right?

It sounds like this other piece that you’ve built is potentially part of the answer to how you are able to make that happen.

Ed Iacobucci: Well, lemme back up one second. One thing that I need to add, that’s a complexity or actually a reality check of this whole thing, was the biggest curve that we had to handle as we went through this. As a technologist, a longtime technologist, I’d like to look at the problem, break it down in pieces, and then figure out, in this case, how could we build an ASP service? Or how do we get the federation of this across a bunch of people that are buying assets?

Cause I don’t really wanna buy airplanes and run ’em. But it turns out, interestingly and surprisingly, one of the constraints that we have to live with is the regulatory environment we’re in today. Which winds up being a great big reality test. You can’t build a business that requires this kind of an investment with an assumption in your business plan that you’re gonna change regulations to make it happen.

There’s no investor in the world that’s gonna invest in that. So one of the constraints we had on our business from the beginning was, this has to operate in the regulatory domain and environment that we have today. We better get pretty darn good at understanding how that works.

So right from the beginning, I started making trips to Washington, meeting with senior FAA folks. Telling them, man, listen, this stuff that NASA’s doing, that we want to do a kind of practical implementation of, makes a lot of sense, and we also recognize you can’t change all the rules for something that doesn’t exist yet.

So help us understand how we follow the rules so that you can at least not feel like you have to stand in our way. So that turned out to be a long road, a very rewarding road, we built some really good relationships with the regulators. But the punchline is because there is such an interwoven relationship between operating control and safety, it’s very hard to say you own the airplane, you guys over there own the pilots, and we’ll run it.

To play by the rules, you almost have to vertically integrate across the board. So you have to own the airplanes, hire the crews, train the crews, manage all the concurrency of crews, cause there are limitations on how much they can fly in a day, a month, a year and all that stuff. So you have to manage all of that; those are all safety aspects, which can be done. It’s all fairly well understood rules, but you have to kind of integrate the whole thing.

Same with the airplane.

Jon Udell: So you do have to kind of own it for now.

Ed Iacobucci: You have to own it all, from the trip formation, the planning, and you have to do it fast. Because if you’re gonna build a system where 95% of the trips are managed by a computer, instead of vast rooms of human beings doing flight plans, which is non-scalable by the way, then you have to be able to embody all those business and regulatory processes in a system that uses the planning tools, and to develop stuff that conforms. So we wound up building a very vertically integrated company. We used our relationship with Eclipse to put in place a very long term, large scale order for airplanes.

We’re working with other manufacturers as well, but mostly Eclipse right now. This has been a long, long, long road that we’ve had to travel, because not only did we have to build the core technology for routing and real-time optimization, but we had to develop the market models, and run market studies to back-check the market models to make sure that we’re not inhaling our own exhaust, or whatever the right metaphor is, right? We had the regulatory side, where we had to develop a relationship over four years, and the culmination of all that is a system that we’re about to launch, and it’s gotten a lot of attention.

I’m trying not to hype it a lot. But it’s one of those things that everybody wants to talk about because it’s such a compelling story.

Jon Udell: We’ve been hearing about it for a long time now, it’s about to get switched on in some form that people can see and experience.

Ed Iacobucci: So what this means to you is that you’ll be able to go to that airport, in your community, and be able to say, I need to go to some other airport in my value network. I don’t think it’s the model that says you can fly anywhere you want to fly, everybody flies where they want and it’s totally open.

I don’t think there’s enough scale to make that make business sense, cause there are the economics of flying the airplanes. So you have to focus on what we call day ports, which are kinda like preferred airports that we do the per-seat services between. We’ve identified five, and we’re gonna grow that dramatically as we go along based on customer input; actually, the reservations that customers make themselves tell us a lot about what direction they want to go in.

Jon Udell: So what are those original five non hub airports?

Ed Iacobucci: They’re all in Florida, because we wanted to keep it all close by here so we could watch the launch carefully. There’s one in South Florida, in Boca Raton, which is between Palm Beach and Fort Lauderdale, basically on the east coast. Lakeland, which is between Orlando and Tampa in the central region. Gainesville, which is where the University of Florida is, and a business hub getting up towards the panhandle. Then Tallahassee, the state capital, and then Pensacola, way out on the panhandle. So it’s kind of like a network that initially spans the center of Florida from the east coast.

Jon Udell: And these are all characteristic of that group of 5,000 airports that Fallows talks about.

The regional, underutilized, you know, you drive up to one and there’s not a parking problem and there’s Cessnas sitting on the runway and stuff.

Ed Iacobucci: Absolutely. Two of ’em actually have commercial traffic, but it’s very limited traffic, very limited service I should say. They go mostly through Atlanta; Tallahassee and Gainesville are the two. To get anywhere else in the network commercially, you go through Atlanta.

The value in this whole thing is saving time, right? I like to say that we’re really not an aviation company or a technology company or a marketing company. We’re in the business of time arbitrage. Some people sell us their time that they don’t need, and we resell it more expensively to other people that can use it. So we use the flexibility one customer has to give another customer, one that doesn’t have flexibility, a service, and we aggregate ’em in a way that makes the business make money.

And it comes together.

Jon Udell: When do you switch this thing on? What’s the first day?

Ed Iacobucci: We actually have started turning on some customers in a controlled fashion, because what we want to do is guarantee the service level. We discovered nobody wants to use this as a business-critical service if they’re gonna get rejected frequently. They may not get accepted, you know; given the capacity, we may not be able to find a solution to their request in the prescribed time, which is 10 seconds now, by the way. From the time that you tell us what you wanna do, we’ll tell you whether we can do it or not in 10 seconds. And it’s all done virtually, right? Because there are no real airplanes; this is all done in advance. So we have a virtual mapping of what the capacity of the network is, and so we can construct a yea or a nay answer.
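
The ten-second yea-or-nay answer described here can be sketched as a check against virtual capacity. This is a hedged illustration (the `VirtualCapacity` class, the window keys, and the seat counts are all invented, not the actual system): commitments are tracked against virtual aircraft-seat capacity per window, and the assignment to physical aircraft happens later.

```python
from collections import defaultdict

# Sketch of answering a booking request from a virtual capacity map
# rather than a concrete schedule. Names and structure are invented.

class VirtualCapacity:
    def __init__(self, seats_per_window):
        self.capacity = dict(seats_per_window)   # window -> virtual seats
        self.committed = defaultdict(int)        # window -> seats promised

    def request(self, window, seats):
        # Answer immediately from the virtual mapping; which physical
        # aircraft flies the trip is decided later by the optimizer.
        if self.committed[window] + seats <= self.capacity.get(window, 0):
            self.committed[window] += seats
            return True
        return False

net = VirtualCapacity({("BCT", "GNV", "morning"): 8})
print(net.request(("BCT", "GNV", "morning"), 6))  # True (fits)
print(net.request(("BCT", "GNV", "morning"), 4))  # False (only 2 seats left)
```

Because the answer is computed against aggregate virtual capacity, it can be returned in seconds even while the underlying assignment of aircraft and crews keeps being reshuffled.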

Jon Udell: So if you make a promise that you can do it, you don’t necessarily have complete knowledge at that moment about all of the components that are gonna satisfy that request, right?

You’re making a bet, and it’s a good bet.

Ed Iacobucci: Well, it’s more than a bet. We know we can do it. We don’t know with what assets, we know we’ve got 10 virtual airplanes that are gonna be available that day. And we know we’ve made X number of commitments for these windows.

And we know we’ve got the virtual crews. Now the reality of it is, which one is it gonna be and what the sequencing is gonna be, and when we pick you up and drop you off, and what order the airplanes fly around in that’s changing all the time.

And what the load will be and therefore what your profit will or won’t be.

And every request that comes in disrupts the previous schedule, but it gives us an opportunity to find a better schedule. It’s not a schedule, though. It’s a plan, basically. Schedule implies that we publish it. We don’t; that’s one of the things that made this all hard: to operate under the on-demand rules, we couldn’t publish a schedule.

Jon Udell: So that also leads to this other question, which is, I heard Esther Dyson talking about this topic recently, and she said, like you said, it’s really not about the airplanes, it is though an awful lot about the customer experience. What’s your sense of how people react to this highly dynamic environment in which I’m presuming you might get a number of updates as to, well now you’re leaving at such and such time, right?

Ed Iacobucci: Well that, you know, this is what we’ve been grappling with for the last six months.

We know how to make it work. We know how to scale it. We know how to make the business work, and we know approximately the demand that we’re gonna see because we’ve been investing a lot in developing markets and customers in advance. What we’ve been grappling with is how do we take this basically entirely fluid system…

Jon Udell: …and present it to people…

Ed Iacobucci: and present it in a way that people can reliably use. And what that means is you just add more constraints. In other words, you’re not gonna be flipped around all over the place. We start with, essentially, you negotiate as big a window as you can accept, because the bigger the window, the cheaper the ticket, right? And it’s not a departure window. It’s a window in which we will complete the mission for you. And you basically have to block that out of your schedule. So if you have a lot of meetings during the day, you’re gonna necessarily be pushed to smaller windows, but that’s okay.

You’re more productive, and then it may cost a little more. But it allows you to be productive. If you don’t have any hard commitments one way or the other, you can give us a bigger travel window, and then somewhere in there, the flight will be generated.

Now, obviously you can’t say to somebody, you know, you negotiated all morning to get me from A to B, which is maybe an hour flight. People don’t feel comfortable with that, so you have to shrink that down. So then the next challenge, operationally, is: how do you shrink those windows down over time as you get closer to flight time?

Leaving enough space in the windows for disruption recovery, and not flooding people with notifications. So there’s a fine balance, and we’re learning that. And really what we’ve discovered is that the night before, we can crunch it down pretty much.

The big windows are useful for us in the days leading up to the travel, because that’s when we’re doing most of our optimizing. Right before the day of travel, we can afford to crunch that down pretty far. So we crunch the window down to a much smaller size and we notify the customer the night before that they have to be at the airport by time X, and they’ll be at their destination by time Y, leaving a little room for us to wiggle around within that. That’s our commitment, and it’s a reasonable window, you know? So for maybe an hour flight they may have to allocate two hours. Which is far better than driving or anything else.
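
The window-shrinking policy can be sketched as a simple function of days until travel. The `committed_window` function and its shrink schedule are invented for illustration; the only number taken from the conversation is the night-before commitment of roughly two hours for an hour-long flight.

```python
# Sketch of window shrinking: the promised completion window narrows
# as the travel day approaches, keeping a buffer for disruption
# recovery. The shrink schedule itself is invented for illustration.

def committed_window(nominal_flight_hours, days_out, buffer_hours=1.0):
    if days_out >= 3:
        slack = 4.0      # days in advance: keep the full negotiated slack
    elif days_out >= 1:
        slack = 2.0      # getting close: start crunching it down
    else:
        slack = buffer_hours  # night before: near-final times, small buffer
    return nominal_flight_hours + slack

print(committed_window(1.0, days_out=5))  # 5.0 hours blocked out
print(committed_window(1.0, days_out=0))  # 2.0 hours: flight plus buffer
```

Wide early windows give the optimizer freedom while most of the reshuffling happens; the final crunch trades that freedom for a firm, customer-friendly commitment.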

Jon Udell: Exactly, exactly. It’s like, what’s it compared to? So this is absolutely fascinating, Ed, this is so cool.

Ed Iacobucci: Yeah, it’s fun because it’s like, I feel like we’re kind of inventing a market, not inventing, but we’re pioneering a market and a customer set and it’s all those things that we’ve learned over the years, all packaged up in something that doesn’t look like technology, but it is all about deployment and acceptance of technology.

Jon Udell: Wow. Thank you so much for making the time to do this and congratulations on getting to this point. I know you said it was a long road, but on the other hand, I mean what you’ve done in I think not much more than five years…

Ed Iacobucci: right, it’s five years…

Jon Udell: seems like a hell of a lot.

Ed Iacobucci: Well, it does when you look back on it, but you know, we’ll see. And the true test is gonna be the customers and I think they’re gonna be happy with it. So we’ll just stay tuned.

Jon Udell: All right. Well, thanks so much, Ed.

Ed Iacobucci: Take care, Jon.

Jon Udell: Yeah, you too.

Phil Windley: You’ve been listening to Interviews with Innovators, with host Jon Udell.

The post-production audio engineer for this program was Jon Udell. Our website editor was Niels Makel; the series producer was Niels Makel. This is Phil Windley. I hope you’ll join me next time for another great edition of Interviews with Innovators on IT Conversations.

When the rubber duck talks back

I’d been needing to refactor the pagination logic in the Mastodon plugin for Steampipe. After a couple of abortive tries, I took another run at it this week with the help of the latest generation of LLM-powered coding assistants.

Here was the problem. The pre-release version of the plugin consolidated pagination for many tables in one place. That was a good thing, but the downside was that there was only one Steampipe table to represent what should have been many of them. So you could say select * from mastodon_timeline, but then you had to qualify it with where timeline = 'home' or where timeline = 'local' and so on. For a user of the plugin this was awkward: you’d rather say select * from mastodon_timeline_home or select * from mastodon_timeline_local, and reserve the where clause for more specific purposes. The v1 plugin made separate tables, but duplicated the pagination logic on a per-table basis. It worked, and was good enough to ship the plugin in time to demo at FediForum, but it obviously needed improvement.

ChatGPT-4 and Sourcegraph Cody

Since then, Sourcegraph has released its new coding assistant, Cody, which you can run as a VSCode extension or on sourcegraph.com. This set up the possibility for an interesting comparison. ChatGPT-4 builds on OpenAI’s LLM; Sourcegraph’s Cody, on the other hand, uses Anthropic’s Claude.

Another key difference is that ChatGPT only has the context you paste into it. Cody, sitting inside VSCode, can see your repository and has all that context. And if you index your repo, which is something Sourcegraph are willing to do for beta users on request, then Cody has access to what are called embeddings that represent the structure of your code in various ways. These embeddings, according to Sourcegraph, can powerfully enhance your LLM prompts. Even without embeddings, Cody offers quite a range of assistance, from a high-level overview of what your repo does to line-level improvements. It’s all packaged, in the extension, as a set of recipes behind buttons with names like Explain selected code, Improve variable names, and Smell code. I haven’t yet used these recipes enough to form solid opinions, though. For this exercise I used Cody mostly in a ChatGPT-like conversational way. In that mode, it’s wonderful to be able to select the code you want to talk about, instead of pasting it into the chat.

In both cases, as should be no surprise, it wasn’t enough to just ask the tools to consolidate the pagination logic. They were perfectly happy to propose solutions that could never work and might not even compile. So I began with a simpler version of the problem. Mastodon uses the same pagination machinery for APIs that return arrays of different kinds of results: Statuses (toots), Accounts, and Notifications. By focusing on these separately I reduced the duplicate pagination from 13 instances to three. Then, in a separate pass, I worked out how to collapse those into a single paginate function that accepted one of three data-fetching function parameters.

I tried to pay careful attention to prompts and completions as I went along, but in the heat of the action I didn’t do a great job of that, partly because I was switching back and forth between the two tools. But I’m quite happy with the result. There was one key insight in particular which, fascinatingly, I am hard pressed to assign credit for. Was it me or one of the assistants? I think it was me, but in a way that doesn’t matter, and isn’t the point of this story.

The key insight

Here was the insight. When I was building the transitional paginateStatus function, the first attempt returned results to the calling code in each table’s List function, which was responsible for streaming the data to Steampipe. This led to a series of detours to work around the problem that the returned data could be quite large, and chew up a lot of memory. That could probably be solved with a goroutine that would stream results back to the caller, instead of returning them as a batch. I tried prodding both LLMs to come up with that kind of solution, had no luck with several tries in both cases, but then came the insight. The helper functions could stream results directly to Steampipe, and just return nil or err to the calling List function.

With that dramatic simplification I was able to complete the phase 1 refactoring, which yielded three pagination functions: paginateStatus, paginateAccount, and paginateNotification. Phase 2, which consolidated those into a single paginate function, was a bit more prosaic. I did need some help understanding how the necessary switch statements could switch on the timeline types passed into the paginate function. Both assistants had seen lots of examples of this pattern, and both helpfully augmented my imperfect knowledge of golang idioms.
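Both ideas — a single paginate function parameterized by a data-fetching function, and streaming each result out immediately rather than returning a batch — can be sketched together in Go. This is only an illustration; the names and signatures are hypothetical, not the plugin’s actual API.

```go
package main

import "fmt"

// fetchPage stands in for one of the data-fetching functions (statuses,
// accounts, notifications) passed into paginate. Illustrative only.
type fetchPage func(page int) (items []string, more bool, err error)

// paginate hands each item to the caller-supplied stream function as it
// arrives (in the real plugin that role is played by Steampipe's stream)
// and returns only nil or an error -- never a large batch of results.
func paginate(fetch fetchPage, stream func(item string)) error {
	for page := 0; ; page++ {
		items, more, err := fetch(page)
		if err != nil {
			return err
		}
		for _, item := range items {
			stream(item)
		}
		if !more {
			return nil
		}
	}
}

func main() {
	// Simulate a two-page API response.
	pages := [][]string{{"toot1", "toot2"}, {"toot3"}}
	fetch := func(page int) ([]string, bool, error) {
		return pages[page], page < len(pages)-1, nil
	}
	count := 0
	if err := paginate(fetch, func(string) { count++ }); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("streamed", count, "items")
}
```

Because paginate never accumulates results, memory stays flat no matter how many pages the API returns, and the calling List function only has to propagate nil or err.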

Partnering with machine intelligence

I came away with a profound sense that the real value of these assistants isn’t any particular piece of code that they get “right” or “wrong” but rather the process of collaborating with them. When you’re working alone, you have an ongoing conversation with yourself, usually in your own head. The point of talking to a rubber duck is to voice that conversation so you can more effectively reason about it. Externalizing your thinking in that way is intrinsically valuable. But when the rubber duck talks back, it’s a whole new game. As Garry Kasparov famously wrote:

The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. Their skill at manipulating and coaching their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.

I’m not worried about robot overlords. Instead, I look forward to collaborating with robot partners.


1 https://blog.jonudell.net/2022/11/28/autonomy-packet-size-friction-fanout-and-velocity/
2 https://blog.jonudell.net/2022/12/06/mastodon-steampipe-and-rss/
3 https://blog.jonudell.net/2022/12/10/browsing-the-fediverse/
4 https://blog.jonudell.net/2022/12/17/a-bloomberg-terminal-for-mastodon/
5 https://blog.jonudell.net/2022/12/19/create-your-own-mastodon-ux/
6 https://blog.jonudell.net/2022/12/22/lists-and-people-on-mastodon/
7 https://blog.jonudell.net/2022/12/29/how-many-people-in-my-mastodon-feed-also-tweeted-today/
8 https://blog.jonudell.net/2022/12/31/instance-qualified-mastodon-urls/
9 https://blog.jonudell.net/2023/01/16/mastodon-relationship-graphs/
10 https://blog.jonudell.net/2023/01/21/working-with-mastodon-lists/
11 https://blog.jonudell.net/2023/01/26/images-considered-harmful-sometimes/
12 https://blog.jonudell.net/2023/02/02/mapping-the-wider-fediverse/
13 https://blog.jonudell.net/2023/02/06/protocols-apis-and-conventions/
14 https://blog.jonudell.net/2023/02/14/news-in-the-fediverse/
15 https://blog.jonudell.net/2023/02/26/mapping-people-and-tags-on-mastodon/
16 https://blog.jonudell.net/2023/03/07/visualizing-mastodon-server-moderation/
17 https://blog.jonudell.net/2023/03/14/mastodon-timelines-for-teams/
18 https://blog.jonudell.net/2023/04/03/the-mastodon-plugin-is-now-available-on-the-steampipe-hub/
19 https://blog.jonudell.net/2023/04/11/migrating-mastodon-lists/
20 https://blog.jonudell.net/2023/05/24/when-the-rubber-duck-talks-back/

Migrating Mastodon lists

My original Mastodon account was on mastodon.social. I chose it because it’s the largest instance, it’s operated by the developers of the service, and it runs the reference implementation of the software. I wanted to experience the fediverse, initially, as most people do, then consider whether to migrate to another server — and if so, to which one.

Last week I migrated to social.coop which, like cosocial.ca, is what Darius Kazemi calls a community-owned independent social media site. I don’t think the acronym COISMS will fly, but the business model is just what I’ve been looking for. I don’t want to use “free” services that make me the product they sell. I want to pay for a service that’s funded by my dollars, not by my data. That’s why I bailed on Gmail years ago, in favor of Fastmail, and it’s why I’m happy to spend $10/month — the price of a couple of cappuccinos — to support the people who keep the lights on at social.coop and maintain civility.

The instructions for migrating from one server to another offer two paths: Profile redirect and Profile move. I chose the former, because I assumed I would retain API access to mastodon.social post-migration, and thus have the option to use Steampipe to query both accounts. Why do that? You can’t migrate your posts from the old server to the new one, only your followers and (optionally) your follows, bookmarks, blocks, and mutes. Steampipe can connect to two different Mastodon servers at the same time, so I figured I could run queries against both.

Lessons learned

That turned out to be a wrong assumption. I can reactivate my mastodon.social account by turning off the redirection, but I’m vague on the implications of doing that. Meanwhile it’s still possible to export my posts, so I can always reacquire them that way if needed.

Here was another wrong assumption. I think that if I’d chosen to move my account, instead of redirect it, my profile would have transferred to the new server. Instead it was on me to fill out the new profile. I’d created @judell but, when I kicked off the migration, I hadn’t yet completed the profile. So when followers were notified that judell@social.coop was now following them, there was no picture or bio or verified website. That led some people to think it was a bogus account and block it. It wasn’t a catastrophe, and I’ve spoken with a few of them to resolve the matter, but it was an unforced error that would have been easy to avoid.

The view from here

Here’s a comparison of weekly activity on two servers, by way of the server dashboard.

What’s it like moving to a server with only 1% of the flow that happens on mastodon.social? Not too different! Because I’ve moved my followers and follows, and because I interact primarily with the home timeline and with lists, the experience is mostly the same. For example, here’s the graph of boost relationships among servers in my home timeline. It looks just like it did before.

It’s wonderful to have a portable social graph that you can bring along as you move around in the fediverse. The key differences are that I’m meeting new people, and spending more time on the local timeline. On a big server like mastodon.social the local timeline feels like the Twitter firehose, for those who remember when that was still a thing you could watch. I’d look at it occasionally, because it could be a source of useful serendipity, but mostly it was just too random. The local timeline on social.coop is still somewhat random, but I find it more useful — not only because it represents a far smaller population but also because the people who’ve signed up share a common interest in the coop model.

I want my lists

There was, however, one big obstacle. You can export lists from the old server and import them into the new server, but there’s no export/import for the accounts you’ve assigned to those lists. Since I rely heavily on lists to navigate Mastodon flow in a focused and intentional way, this was a problem I really needed to fix. I tried Eliot Nash’s masto-list-importer, but when transferring my 460 person/list mappings I hit API rate limits. So instead I worked out a Steampipe-based solution, which is a useful alternative and also a good way to illustrate the steps involved.

Let’s assume that I want to migrate again, from social.coop to fosstodon.org. I’ve exported the names of my lists from social.coop, and imported them to fosstodon.org. Now I need to populate those lists. The SQL solution I came up with proceeds by steps, each being a common table expression (CTE) that accomplishes one part of a complex transformation, then passes results to the next step.

Step 1: Gather the accounts assigned to each social.coop list.

We start by joining two tables to enumerate accounts assigned to each list.

  with accounts_by_list as (
  select 
    a.acct as old_account,
    a.id as old_account_id,
    l.id as old_list_id,
    l.title as title
  from
    social_coop.mastodon_my_list l
  join
     social_coop.mastodon_list_account a
  on
    l.id = a.list_id
)
select * from accounts_by_list

The output, in accounts_by_list, is effectively a temporary table that you can review to make sure the results are as expected. The ability to verify the output of each phase of a CTE pipeline, before sending it to the next phase, is a key advantage of this approach. Nested subqueries are much harder to debug!

Step 2: Map the accounts on social.coop to accounts on fosstodon.org

Two transformations happen in this phase. If an account is a bare username, say personA, then it’s a social.coop account. On fosstodon.org that same account will be represented as personA@social.coop. Conversely, if personB@fosstodon.org exists on social.coop, that account becomes the bare username personB there. All other accounts (e.g. personC@hachyderm.io) pass through unchanged.

with accounts_by_list as (
  -- as above
),
adjusted_accounts_by_list as (
  select
    old_account,
    old_account_id,
    old_list_id,
    title,
    case 
      when old_account !~ '@' then old_account || '@social.coop' 
      when old_account ~ '@fosstodon.org' then replace(old_account, '@fosstodon.org', '')
      else old_account
    end as new_account
  from
     accounts_by_list
)
select * from adjusted_accounts_by_list
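The same mapping rule can be sketched as a small Go function — a hypothetical helper that mirrors the case expression above, not part of any actual tooling.

```go
package main

import (
	"fmt"
	"strings"
)

// mapAccount translates an account name exported from social.coop into
// the form fosstodon.org expects: bare usernames gain @social.coop,
// @fosstodon.org accounts lose their suffix, and everything else passes
// through unchanged. Mirrors the SQL CASE expression in step 2.
func mapAccount(old string) string {
	switch {
	case !strings.Contains(old, "@"):
		return old + "@social.coop"
	case strings.HasSuffix(old, "@fosstodon.org"):
		return strings.TrimSuffix(old, "@fosstodon.org")
	default:
		return old
	}
}

func main() {
	for _, a := range []string{"personA", "personB@fosstodon.org", "personC@hachyderm.io"} {
		fmt.Printf("%s -> %s\n", a, mapAccount(a))
	}
}
```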

Step 3: Map the list titles to the new server.

The names of the lists are the same in both places, but their ids differ. For example, my Fediverse list is 1043 on social.coop and 6771 on fosstodon.org. To invoke the API call that adds someone to a list, we’ll need to use the latter id.

with accounts_by_list as (
  -- as above
),
adjusted_accounts_by_list as (
  -- as above
),
adjusted_list_ids as (
  select
    a.*,
    l.id as new_list_id
  from
    adjusted_accounts_by_list a
  join
    fosstodon.mastodon_my_list l
  on
    a.title = l.title
)
select * from adjusted_list_ids

Step 4: Map the account ids to the new server.

Like lists, accounts on the new server also have different ids, and those are also required for the API call. We can find the new ids by joining the new_account column from step 2 with the table fosstodon.mastodon_my_following.

with accounts_by_list as (
  -- as above
),
adjusted_accounts_by_list as (
  -- as above
),
adjusted_list_ids as (
  -- as above
),
adjusted_account_ids as (
  select 
    a.*,
    f.id as new_account_id
  from
    adjusted_list_ids a
  join
    fosstodon.mastodon_my_following f
  on
    f.acct = a.new_account
)
select * from adjusted_account_ids

Step 5: Build the API calls

There are lots of ways to skin this cat. Because you can use Steampipe as a component, you could use any programming language with a Postgres driver to run this query, and call the Mastodon API with the appropriate list and account ids.
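For instance, here’s a minimal Go sketch of the URL construction, using the example list and account ids from above. A real program would read new_list_id and new_account_id from the query’s result set via a Postgres driver, then POST each URL with an Authorization: Bearer header; all names here are illustrative.

```go
package main

import (
	"fmt"
	"net/url"
)

// addToListURL builds the Mastodon API endpoint that assigns an account
// to a list -- the same call the generated curl commands make.
func addToListURL(server, listID, accountID string) string {
	return fmt.Sprintf("https://%s/api/v1/lists/%s/accounts?%s",
		server, listID, url.Values{"account_ids[]": {accountID}}.Encode())
}

func main() {
	// Example ids taken from the query output shown below.
	fmt.Println(addToListURL("fosstodon.org", "6771", "279462"))
}
```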

And then, of course, there’s always curl.

with accounts_by_list as (
  -- as above
),
adjusted_accounts_by_list as (
  -- as above
),
adjusted_list_ids as (
  -- as above
),
adjusted_account_ids as (
  -- as above
)
select
  'curl -X POST -H "Authorization: Bearer ***" https://fosstodon.org/api/v1/lists/' ||
  new_list_id ||
  '/accounts/?' ||
  'account_ids[]=' ||
  new_account_id ||
  '; sleep 1;' as command
from
  adjusted_account_ids

To avoid throttling, I appended sleep 1 to each line. I saved the query in migrate-lists.sql, and exported the output to a file.

steampipe query --output csv migrate-lists.sql > migrate.sh

That was close, but not quite right. The output looked like this:

command
"curl -X POST -H ""Authorization: Bearer ***"" https://fosstodon.org/api/v1/lists/6771/accounts/?account_ids[]=279462; sleep 1;"
"curl -X POST -H ""Authorization: Bearer ***"" https://fosstodon.org/api/v1/lists/6771/accounts/?account_ids[]=109283421559929728; sleep 1;"

I needed to remove the header line, strip the double quote at each end of every line, and collapse the doubled double quotes. For that I turned to ChatGPT (v4). It took a few tries to get it right, but soon enough it produced a working bash script that I’m very glad I did not have to write.

#!/bin/bash

# read the original script from migrate.sh and store it in an array
readarray -t original_script < migrate.sh

# remove the first line of the original script
modified_script=("${original_script[@]:1}")

# loop through the modified script array and replace double double-quotes with a single double-quote and remove double quotes at the beginning and end of each line
for (( i=0; i<${#modified_script[@]}; i++ )); do
    modified_script[$i]=$(sed 's/""/"/g' <<< "${modified_script[$i]}")
    modified_script[$i]=$(sed 's/^"//' <<< "${modified_script[$i]}")
    modified_script[$i]=$(sed 's/"$//' <<< "${modified_script[$i]}")
done

# print out the modified script with newlines
printf '%s\n' "${modified_script[@]}"

And that did the trick. With my lists fully populated on social.coop, I'm now reading Mastodon from my new home server in the same list-oriented way as I did on the old one.

If you move servers and want to migrate the people on your lists, you should definitely try masto-list-importer first, and consider the method here as fallback if that doesn't work for you. I agree with Eliot Nash that list transfer ought to be baked into the Mastodon UI. People rightly complain that it can be hard to find the people you want to follow and topics you want to explore. You can follow hashtags but that can be too random. Assigning people to topical lists strikes a good balance. But only if those list assignments are portable!



The Mastodon plugin is now available on the Steampipe Hub

When Twitter changed hands last November I switched to Mastodon; ever since I’ve enjoyed happier and more productive social networking. To enhance my happiness and productivity I began working on a Mastodon plugin for Steampipe. My initial goal was to study the fediverse writ large. Which people and which servers are powerful connectors? How do moderation policies work? What’s it like to join a small server versus a large one? These are important questions, and you can use the plugin to begin to answer them. But I soon realized that as a newcomer to a scene that’s been evolving for six years, and has not welcomed such analysis, I should start by looking for ways to enhance the experience of reading Mastodon. So I began building a set of dashboards that augment the stock Mastodon client or (my preference) elk.zone. And I’ve narrated that project in a series of posts.

Last week we released the plugin to the Steampipe hub. If you’ve installed Steampipe, you can now get the plugin using steampipe plugin install mastodon. The next phases of this project will explore using the plugin and dashboards in Steampipe Cloud, and speeding up the dashboards by means of persistent Postgres tables and Steampipe Cloud snapshots. Meanwhile, here’s a recap of what I’ve learned thus far.

See more with less distraction

While the dashboards use charts and relationship graphs, they are mainly tables of query results. Because Steampipe dashboards don’t (yet) render HTML, these views display plain text only — no images, no styled text. I’ve embraced this constraint, and I find it valuable in two ways. First, I’m able to scan many more posts at a glance than is possible in conventional clients, and more effectively choose which to engage with. When I described this effect to a friend he said: “It’s a Bloomberg terminal for Mastodon!” As those of us who rode the first wave of the blogosphere will recall, RSS readers were a revelation for the same reason.

Second, I find that the absence of images and styled text has a calming effect. To maintain a healthy information diet you need to choose sources wisely but, no matter where you go, sites deploy a barrage of attention-grabbing devices. I find dialing down the noise helpful, for the same reason that I often switch my phone to monochrome mode. Attention is our scarcest resource; the fewer distractions, the better.

There’s a tradeoff, of course; sometimes an image is the entire point of a post. So while I often read Mastodon using these Steampipe dashboards, I also use Elk directly. The Steampipe dashboards work alongside conventional Mastodon clients, and indeed depend on them: I click through from the dashboards to Elk in order to boost, reply, or view images. That experience is enhanced by instance-qualified URLs that translate foreign URLs to ones that work on your home server.

Manage people and lists

The ability to assign people to lists, and read in a list-oriented way, is a handy Twitter affordance that I never used much because it was easy to let the algorithms govern my information diet. Because Mastodon doesn’t work like that, lists have become the primary way I read the fediverse flow. Of the 800+ people I follow so far, I’ve assigned more than half to lists with titles like *Climate* and *Energy* and *Software*. To help me do that, several dashboards report how many of the people I follow are assigned to lists (or not).

I want as many people on lists as possible. So I periodically review the people I follow, put unassigned people on lists, and track the ratio of people who are, or aren’t, on lists. Here’s the query for that.

with list_account as (
  select
    a.id,
    l.title as list
  from
    mastodon_my_list l
    join mastodon_list_account a on l.id = a.list_id
),
list_account_follows as (
  select
    list
  from
    mastodon_my_following
    left join list_account using (id)
)
select 'follows listed' as label, count(*) from list_account_follows where list is not null
union
select 'follows unlisted' as label, count(*) from list_account_follows where list is null

When you read in a list-oriented way, as is also true when you read by following hashtags, there are always people whose chattiness becomes a distraction. To control that I’ve implemented the following rule: show at most one original toot per person per list per day. Will I miss some things this way? Sure! But if someone has said something that resonates with other people, I’m likely to hear about it from someone else. It’s a tradeoff that’s working well for me so far.

Here’s the SQL implementation of the rule.

with data as (
  select
    list_id,
    to_char(created_at, 'YYYY-MM-DD') as day,
    case
      when display_name = '' then username
      else display_name
    end as person,
    instance_qualified_url as url,
    substring(content from 1 for 200) as toot
  from
    mastodon_toot_list
  where
    list_id = '42994'
    and reblog is null -- only original posts
    and in_reply_to_account_id is null -- only original posts
  limit
    40
)
select
  distinct on (person, day) -- only one per person per day
  day,
  person,
  toot,
  url
from
  data
order by
  day desc,
  person;

On the home timeline’s dashboard I’ve made it optional to include or hide boosts, which can be the majority of items. On the list-reading dashboard I’ve opted to always exclude them, but the SQL idiom for doing so — select distinct on (person, day) — is simple, easy to understand, and easy to change.

Visualize relationships

I’ve so far found three ways in which relationship graphs can make Mastodon more legible. First, in Mastodon relationship graphs, I showed how to use SQL-defined nodes and edges to show boost relationships among people and servers. In another article I used the same tools to map relationships among people and tags. And most recently I used them to explore server-to-server moderation.

In all three cases the format conveys information not directly available from tabular views. Clusters of interesting people pop out, as do people who share tags. And when I graphed servers that block other servers I discovered an unexpected category: some servers that block others are themselves also blocked, like infosec.exchange in this example.


blocked and blocking mastodon servers

The Steampipe combo of SQL-oriented API access and dashboards as code is a uniquely productive way to build relationship graphs that can unlock insights in any domain. As we’ve seen with Kubernetes, they can help make cloud infrastructure more legible. The Mastodon graphs suggest that the same can happen in the social networking realm.

Use RSS feeds

When you append .rss to the URL of a Mastodon account or tag, you produce an RSS feed like https://mastodon.social/@judell.rss or https://mastodon.social/tags/steampipe.rss. These feeds provide a kind of auxiliary API that includes data not otherwise available from the primary API: related tags, which appear in the feeds as RSS category elements. Steampipe really shines here thanks to the RSS plugin, which enables joins with the primary Mastodon API. This query augments items in a tag’s feed with the tags that appear in each item.

with data as (
  select
    name,
    url || '.rss' as feed_link
  from
    mastodon_search_hashtag
  where
    query = $1
    and name = query
  limit 1
)
select
  to_char(r.published, 'YYYY-MM-DD') as published,
  d.name as tag,
  (
    select string_agg(trim(JsonString::text, '"'), ', ')
    from jsonb_array_elements(r.categories) JsonString
  ) as categories,
  r.guid as link,
  ( select content as toot from mastodon_search_toot where query = r.guid ) as content
from
  data d
join
  rss_item r
on
  r.feed_link = d.feed_link
order by
  r.published desc
limit 10

A similar query drives the graph discussed in Mapping people and tags on Mastodon.


mapping people and tags on mastodon

In that example, surfacing the connection between a user, @themarkup, and a pair of tags, scotus and section230, was useful in two ways. First, it helped me instantly spot the item that I most wanted to read, which was buried deep in the search results. Second, it helped me discover a source that I’ll return to for guidance on similar topics. Of course I added that source to my Law list!

Own the algorithm

Everyone who comes to Mastodon appreciates not having an adversarial algorithm control what they see in their timelines. Most of us aren’t opposed to algorithmic influence per se, though; we just don’t like the adversarial nature of it. How can we build algorithms that work with us, not against us? We’ve already seen one example: the list-reading dashboard displays just one item per list per person per day. That’s a policy that I was able to define, and easily implement, with Steampipe. And in fact I adjusted it after using it for a while. The original policy was hourly; that was too chatty, so I switched to daily by making a trivial change to the SQL query.

In News in the fediverse I showed another example. The Mastodon server press.coop aggregates feeds from mainstream news sources. I was happy to have those feeds, but I didn’t want to see those news items mixed in with my home timeline. Rather, I wanted to assign them to a News list and read them only when I visit that list in a news-reading mindset. The fediverse offers an opportunity to reboot the social web and gain control of our information diets. Since our diets all differ, it ought to be possible — and even easy — for anyone to turn on a rule like *news only on lists, not timelines*. Steampipe can make it so.

Steampipe as component

When you ask people on Mastodon about these kinds of features, the response is often “Have you tried client X? It offers feature Y.” But that solution doesn’t scale. It would require massive duplication of effort for every client to implement every such policy; meanwhile, people don’t want to switch to client X just for feature Y (which might entail losing feature Z). Could policies be encapsulated and made available to any Mastodon client? It’s interesting to think about Steampipe as a component that delivers that encapsulation. A timeline built by SQL queries, and governed by SQL-defined policies, is a resource available to any app that can connect to Postgres, either locally or in Steampipe Cloud.

If you’re curious about the Steampipe + Mastodon combo, install the plugin, try out the sample queries, then clone the mod and check out the dashboards. Do they usefully augment your Mastodon reader? What would improve them? Can you use these ingredients to invent your own customized Mastodon experience? Join our Slack community and let us know how it goes!



Mastodon timelines for teams

Welcome to episode 17 of the Steampipe+Mastodon series, in which we introduce a new subplot: timeline history. So far, the examples I’ve shown and discussed work with current timelines. We’ve seen SQL queries that fetch results from realtime calls to the Mastodon API, and Steampipe dashboards that display those results. But Steampipe isn’t just an API siphon, it’s also a Postgres database. As such it not only supports the transient tables created by Steampipe’s foreign data wrapper and plugins, it also enables you to create your own native tables. And you can use those native tables to accumulate data from the transient foreign tables.

Because saving and searching Mastodon data is a controversial topic in the fediverse — none of us wants to recapitulate Big Social — I’ve focused thus far on queries that explore recent Mastodon flow, of which there are plenty more to write. But nobody should mind me remembering my own home timeline, so a few weeks ago I made a tool to read it hourly and add new toots to a Postgres table.

Before you can add any toots to a table, of course, you’ve got to create that table. Here’s how I made this one.

create table mastodon_home_timeline as 
  select * from mastodon_toot_home limit 200

Once created, the table can be updated with new toots like so.

with data as (
  select
    account,
    -- more
    -- columns
    username
  from
    mastodon_toot_home
  limit 200
)
insert into mastodon_home_timeline (
  account,
  -- more
  -- columns
  username
)
select
  *
from 
  data
where 
  id not in ( select t.id from mastodon_home_timeline t )

To run that query from a crontab, on a machine where Steampipe is installed, save it as mastodon_home_timeline.sql, then schedule it.

15 * * * * cd /home/jon/mastodon; steampipe query mastodon_home_timeline.sql

That’s it! Now the number reported by select count(*) from mastodon_home_timeline is growing hourly.
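The archiving step is plain SQL, so the anti-join idea is easy to try outside Steampipe too. Here’s a minimal sketch using Python’s sqlite3 as a stand-in (the two-column table shapes are invented for illustration; the real tables live in Steampipe’s Postgres):

```python
import sqlite3

# In-memory stand-ins for the transient foreign table and the native
# archive table; Steampipe's real tables have many more columns.
db = sqlite3.connect(":memory:")
db.execute("create table mastodon_toot_home (id text, account text)")
db.execute("create table mastodon_home_timeline (id text, account text)")

# Simulate the first hourly fetch.
db.executemany("insert into mastodon_toot_home values (?, ?)",
               [("1", "alice"), ("2", "bob")])

def archive_new_toots():
    # Insert only rows whose id is not already archived (the anti-join).
    db.execute("""
        insert into mastodon_home_timeline
        select * from mastodon_toot_home
        where id not in ( select t.id from mastodon_home_timeline t )
    """)

archive_new_toots()

# The next fetch sees the old toots plus one new one; only the new
# toot lands in the archive.
db.execute("insert into mastodon_toot_home values ('3', 'carol')")
archive_new_toots()

count = db.execute("select count(*) from mastodon_home_timeline").fetchone()[0]
print(count)  # 3: ids 1, 2, 3 each archived exactly once
```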

I’ve only been collecting toots for a couple of weeks, and haven’t yet begun to explore that data; we’ll see what happens when we get there. Meanwhile, though, I want to show how such exploration can be a team exercise.

A friend of mine, whom I’ll call Elvis, shares my interest in teasing out connections among people, servers, and hashtags. He could capture his own timeline using the method shown here. But since we’ll be looking at this data together, we agreed that I’ll gather both our timelines. To enable that, he shared a (revokable) Mastodon API token that I’ve used to configure Steampipe with credentials for both our accounts.

connection "mastodon_social_jon" {
    plugin = "mastodon"
    server = "https://mastodon.social"
    access_token = "..."
}

connection "mastodon_social_elvis" {
    plugin = "mastodon"
    server = "https://mastodon.social"
    access_token = "..."
}

Steampipe’s foreign data wrapper turns each of these named connections into its own Postgres schema. Although we happen to share the same home server, by the way, we needn’t. A team collaborating like this could pool timelines from mastodon.social and hachyderm.io and fosstodon.org and any other Mastodon-API-compatible server.

(You can do the same thing with AWS or Slack or GitHub or any other kind of account by defining multiple connections. Steampipe makes API calls concurrently across parallel connections.)

With this configuration I can read my timeline like so.

select * from mastodon_social_jon.mastodon_toot_home limit 200

And Elvis’ like so.

select * from mastodon_social_elvis.mastodon_toot_home limit 200

If I want to query both in realtime, for example to count the combined total, I can use a SQL UNION. Or I can define an umbrella connection that aggregates these two.

connection "all_mastodon" {
    plugin = "mastodon"
    type = "aggregator"
    connections = [ "mastodon_social_jon", "mastodon_social_elvis" ]
}

connection "mastodon_social_jon" {
    plugin = "mastodon"
    server = "https://mastodon.social"
    access_token = "..."
}

connection "mastodon_social_elvis" {
    plugin = "mastodon"
    server = "https://mastodon.social"
    access_token = "..."
}

Now the query select * from all_mastodon.mastodon_toot_home limit 200 makes API calls on behalf of both accounts — in parallel — and combines the results. When we follow the resulting URLs in order to reply or boost, we’ll do so as individual identities. And we’ll be able to use Steampipe queries and dashboards in that same single-user mode. But we’ll also be able to pool our timelines and point our queries and dashboards at the combined history.
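Since each connection is just a Postgres schema, the UNION alternative mentioned above is ordinary schema-qualified SQL. Here’s a toy illustration using SQLite’s ATTACH to mimic the per-connection schemas (the toot counts are invented):

```python
import sqlite3

# Two attached in-memory databases stand in for the two per-connection
# Postgres schemas that Steampipe's foreign data wrapper creates.
db = sqlite3.connect(":memory:")
db.execute("attach ':memory:' as mastodon_social_jon")
db.execute("attach ':memory:' as mastodon_social_elvis")

for schema, n_toots in [("mastodon_social_jon", 3), ("mastodon_social_elvis", 2)]:
    db.execute(f"create table {schema}.mastodon_toot_home (id text)")
    db.executemany(f"insert into {schema}.mastodon_toot_home values (?)",
                   [(str(i),) for i in range(n_toots)])

# Count the combined total across both "connections" with a UNION ALL,
# which is what the aggregator connection effectively automates.
combined = db.execute("""
    select count(*) from (
      select id from mastodon_social_jon.mastodon_toot_home
      union all
      select id from mastodon_social_elvis.mastodon_toot_home
    )
""").fetchone()[0]
print(combined)  # 5
```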

Will that prove interesting? Useful? That remains to be seen. I think it’s one of many experiments worth trying as the fediverse sorts itself out. And I see Steampipe as one laboratory in which to run such experiments. With SQL as the abstraction over APIs, aggregation of connections, and dashboards as code, you have all the ingredients needed to iterate rapidly, at low cost, toward shared Mastodon spaces tailored for teams or groups.



Visualizing Mastodon server moderation

In Mapping the wider fediverse I showed how a new table added to the Mastodon plugin, mastodon_domain_block, enables queries that find which servers are moderating which other servers. For example, here are servers on nerdculture.de’s list of offenders.

select
  server,
  domain,
  severity
from
  mastodon_domain_block
where
  server = 'https://nerdculture.de'

+------------------------+--------------------------------+----------+
| server                 | domain                         | severity |
+------------------------+--------------------------------+----------+
| https://nerdculture.de | roysbeer.place                 | silence  |
| https://nerdculture.de | *.10minutepleroma.com          | suspend  |
| https://nerdculture.de | *.activitypub-troll.cf         | suspend  |

...snip...

| https://nerdculture.de | shitposter.club                | suspend  |
| https://nerdculture.de | wolfgirl.bar                   | suspend  |
| https://nerdculture.de | www2.gabbers.me                | suspend  |
+------------------------+--------------------------------+----------+

I used the new query pattern to build a dashboard to report, for each server in the home timeline:

  • The list of blocking servers.
  • The count of blocked servers for each blocking server.
  • The list of blocking servers for each blocked server, and the count of those blocking servers.

This was a good start, but I had a hunch that relationship graphs would reveal non-obvious connections among these servers. And indeed they do! Here’s a view of the new graph-enriched dashboard.

The left pane shows the blocked servers shunned by a blocking server selected from those in the home timeline. The right pane has the inverse view: the blocking servers that shun a selected blocked server. These were the two categories I defined for the first iteration of these graphs.

category "blocking_server" {
  color = "darkgreen"
  icon = "server"
}

category "blocked_server" {
  color = "darkred"
  icon = "server"
}

Here’s the code for one of the graphs.

graph {
  node {
    base = node.blocking_server
  }
  node {
    base = node.blocked_server
  }
  node {
    base = node.blocked_and_blocking_server
  }
  edge {
    args = [ self.input.blocking_server.value ]
    base = edge.match_blocked_server
  }
  edge {
    args = [ self.input.blocking_server.value ]
    base = edge.match_blocking_server
  }
}

Here is the definition of node.blocking_server, which refers to category.blocking_server.

node "blocking_server" {
  category = category.blocking_server
  sql = <<EOQ
    with servers as (
      select distinct
        blocking_server,
        blocked_server
      from
        blocking_servers(${local.limit})
    )
    select
      blocking_server as id,
      blocking_server as title
    from
      servers
    order by
      blocking_server
  EOQ
}

The FROM clause calls blocking_servers(), a set-returning function defined like so.

create or replace function public.blocking_servers(max int) returns table (
  blocking_server text,
  blocked_server text
) as $$
with servers as (
  select distinct
    server as domain,
    'https://' || server as server_url
  from
    mastodon_toot
  where
    timeline = 'home'
  limit max
),
blocking_and_blocked as (
  select
    s.domain as blocking_domain,
    d.domain as blocked_domain
  from
    servers s
  join
    mastodon_domain_block d
  on
    s.server_url = d.server
)
select
  blocking_domain,
  blocked_domain
from
  blocking_and_blocked
order by
  blocking_domain, blocked_domain
$$ language sql;

I thought these ingredients would suffice. But when I began poking around in the graphs made with these definitions, infosec.exchange behaved strangely. Sometimes it would appear as a blocking server, other times as a blocked server. I was missing a category!

category "blocked_and_blocking_server" {
  color = "orange"
  icon = "server"
}

As you can see in the graph, infosec.exchange not only blocks 73 servers, it is also blocked by two servers: religion.masto.host and weatherishappening.net. Why?

weatherishappening.net calls infosec.exchange a “HOSTING DEATH SPIRAL FASCIST ORGANIZATION” and blocks it with “Limited” severity.

religion.masto.host blocks infosec.exchange at the “Suspended” level, but does not explain why (“Reason not available”).

Although those servers could, in turn, be blocked by others, no such blocks appear in the block lists of my immediate server neighborhood.

> select count(*) from blocking_servers(100) where blocked_server = 'weatherishappening.net'
+-------+
| count |
+-------+
| 0     |
+-------+

> select count(*) from blocking_servers(100) where blocked_server = 'religion.masto.host'
+-------+
| count |
+-------+
| 0     |
+-------+

There is another switch hitter in my neighborhood, though. c.im is blocked by me.dm and octodon.social.

Why?

me.dm (Medium) has “Suspended” c.im for “Hate speech”.

octodon.social has “Suspended” c.im for “Reason not available”.

When the opinions and policies of your server differ from those of mine, we see different realities through our respective lenses. Could such fragmentation drive the fediverse-curious back into the arms of Big Social? I’m sure that will happen — indeed is happening — to some degree.

But I hope that some of us, at least, will learn to thrive in diverse networks of online communities, aware of the kaleidoscopic interplay of filters but not overwhelmed by it. That skill will serve us well IRL too. To acquire it, we’ll need to visualize the operation of our filters. One great way to do that: SQL queries that drive relationship graphs.



Mapping people and tags on Mastodon

In Mastodon relationship graphs I showed how to use Steampipe to map Mastodon network neighborhoods. When I use the word map here, I’m channeling Denis Wood’s The Power of Maps:

Every map shows this … but not that, and every map shows what it shows this way … but not the other way. Not only is this inescapable but it is precisely because of this interested selectivity — this choice of word or sign or aspect of the world to make a point — that the map is enabled to work.

The aspect chosen by those neighborhood maps is the boost — the Mastodon version of a retweet. One of the maps focuses on a selected instance that appears in the home timeline. It shows people who belong to that instance and who boost toots from people on the same or different instances.

The other map zooms out to show boost relationships among all the instances that appear in the home timeline. This view wouldn’t be legible if it included people, so it omits them in order to focus on server-to-server relationships.

These maps represent (or as Denis Wood emphasizes, “re-present”) a set of toots. They omit original toots to which nobody replies, and they also omit replies, in order to focus on boost relationships. What about replies? That would be a different map, one that might also be interesting to draw.

Meanwhile, though, I’ve come up with another map to display the tags that appear in the results of a Mastodon tag search, along with the accounts that use those tags. It proved its worth today when I was looking for perspectives on Section 230 of the Communications Decency Act. As you’ve likely noticed, the US Supreme Court is reconsidering Section 230. My understanding of the topic wasn’t current, I wanted to refresh it, and I especially wanted to test whether Mastodon could provide a useful alternative to a conventional web search.

One affordance that Mastodon provides: search for toots that use the #Section230 tag. Here are two ways to map the results of that search.

On the left is a conventional Mastodon view: a list of toots that match the query. In this case the article I ultimately wanted to read appears way down in that list. The toot that announced it was from The Markup, “a nonprofit newsroom that investigates how powerful institutions are using technology to change our society.” The article, Section 230 Is a Load-Bearing Wall — Is It Coming Down?, transcribes part of a conversation with two legal scholars whom I know to be reliable guides to Net-related issues.

On the right is my Steampipe-based Mastodon tag explorer. Working with the same data, it surfaced The Markup’s article in a way that brought it immediately to my attention. The first thing that caught my eye was the conjunction of two tags: #section230 and #scotus. Since the Supreme Court’s interest in Section 230 is what’s driving the current news cycle, I wanted to hear from legal scholars qualified to discuss the Court’s interest in Section 230. So the tag conjunction was a significant landmark.

The map displayed two nodes that connect to both #section230 and #scotus. How did I choose between them? My prior familiarity with The Markup led me to click on that node and visit the Markup’s Mastodon instance where I read the article.

Had I been following The Markup then, as I am now, I would likely have seen the article on the news list to which I’ve assigned The Markup’s account. But that wouldn’t have changed the experience of searching for the #section230 tag. The relationship graph works by reformulating the results of that search. It omits the text of toots that contain the tag, and the images in those toots, in order to highlight two aspects of the result list: people (or accounts) and tags. It contextualizes those tags by charting their relative frequency in the result list. And it attaches, to each tag node, a link to a new graph focused on that tag.

This “interested selectivity” enables the map to do its work: find accounts that use given tags. Like a tag node, an account node provides a link — in this case, to the account’s Mastodon home page. It also reports the account’s description using a property that appears when hovering the node. So if I were unfamiliar with The Markup I could reveal its description without leaving the graph. Here’s the query that adds that property to the node.

select
  note
from 
  mastodon_search_account
where 
  query = 'https://mastodon.themarkup.org/@themarkup'

+---------------------+
| note                |
+---------------------+
|  Watching Big Tech. |
+---------------------+

That query is embedded in another query that joins across two Steampipe plugins: one that wraps the Mastodon API and another that queries RSS feeds. That’s because, as noted in Mastodon, Steampipe, and RSS, the RSS feeds that Mastodon provides for tags enrich the results available from the core API.
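The post doesn’t reproduce that outer query, but the shape of a cross-plugin join is ordinary SQL: one table from the Mastodon plugin, one from the RSS plugin, joined on a shared URL. A toy sketch with sqlite3 stand-ins (the column names here are invented for illustration):

```python
import sqlite3

# Stand-ins for two plugin tables: the Mastodon account search shown in
# the post, and the RSS items Mastodon publishes for each tag.
db = sqlite3.connect(":memory:")
db.execute("create table mastodon_search_account (url text, note text)")
db.execute("create table rss_item (author_url text, title text)")
db.execute("insert into mastodon_search_account values "
           "('https://mastodon.themarkup.org/@themarkup', 'Watching Big Tech.')")
db.execute("insert into rss_item values "
           "('https://mastodon.themarkup.org/@themarkup', "
           "'Section 230 Is a Load-Bearing Wall')")

# Join RSS items to account descriptions, the way a cross-plugin
# Steampipe query can join tables that wrap two different APIs.
row = db.execute("""
    select r.title, a.note
    from rss_item r
    join mastodon_search_account a on a.url = r.author_url
""").fetchone()
print(row)  # ('Section 230 Is a Load-Bearing Wall', 'Watching Big Tech.')
```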

Enabling SQL to query diverse APIs in a common way is one of Steampipe’s core superpowers. Enabling such queries to form the nodes and edges of relationship graphs is another. Used together, these two superpowers enable maps that select what is helpful, omit what isn’t, and thus “re-present” information for a given purpose.



News in the fediverse

On a recent episode of You’re Wrong About, Sarah Marshall delivered a crisp summary of how the 24-hour news cycle came to be. This led me to realize that many among us do not remember when news was confined to time slots: the 7 o’clock news, the 11 o’clock news, the morning paper. I think it might be healthy to bring that back in some form. From the excitement I heard in their voices I gather that Sarah Marshall and her co-presenter Blair Braverman feel the same way. When I hear people complain about Mastodon’s lack of breaking news, I think: “Feature, not bug!” Maybe what the 24-hour cycle has been breaking is us.

So when press.coop launched today I was a bit conflicted. On the one hand, I would like to use Mastodon to read mainstream news headlines just as I once used RSS to do. (Full disclosure: press.coop is run as a public service by hello.coop whose founder, Dick Hardt, is a longtime friend.) On the other hand, when reading Mastodon timelines I’m enjoying NOT knowing what’s happening in the world right now.

What if you could exclude news from the home timeline, put it on a list, and go there when — and only when — in a news mindset? That’s a feature of the Fedilab client, I learned from Keith Soltys.

What would it take to implement the same idea in the Mastodon reader I’m developing? It couldn’t be just an extra WHERE condition, could it?

Well, actually, it could.
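The post doesn’t show the condition itself, but the shape of the rule is easy to imagine: filter toots from known news servers out of the home timeline view. A toy sketch with sqlite3 (the table and column names are invented; the real query runs against the Steampipe plugin’s tables):

```python
import sqlite3

# A hypothetical timeline table; 'server' identifies the posting
# account's instance, and press.coop is the news source to exclude.
db = sqlite3.connect(":memory:")
db.execute("create table toots (server text, content text)")
db.executemany("insert into toots values (?, ?)", [
    ("press.coop", "breaking headline"),
    ("mastodon.social", "a friend's toot"),
])

# Home timeline query: one extra WHERE condition keeps the news out.
# The news-list view simply omits (or inverts) the condition.
home = db.execute(
    "select content from toots where server <> 'press.coop'"
).fetchall()
print(home)  # [("a friend's toot",)]
```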

Problem solved. Now I can read news in Mastodon when and how I want, and never see it otherwise.

If you want that same control, you shouldn’t have to use a particular Android client, or Steampipe client, or any other. There’s a Cambrian explosion of such creatures right now. The more they can share important DNA, the better for all of us.

I hope that the Steampipe plugin for Mastodon, which enables the dashboards I’m building using Steampipe’s own dashboards-as-code system, can provide some useful common DNA. A rule like news only on lists, not timelines, once expressed in SQL, can be used (and thus not reinvented) by any kind of Steampipe (read: Postgres) client: psql (or any Postgres CLI), Metabase or Tableau or any Postgres-compatible BI tool, Python or JavaScript or any programming language. Steampipe is a versatile component. Its mapping from APIs to SQL can, in theory, provide the data layer for any application.

My Steampipe + Mastodon adventure notwithstanding, let’s hold on to that larger idea. The fediverse is our chance to reboot the social web and gain control of our information diets. Since our diets all differ, it ought to be trivial for anyone — in any client — to turn on a rule like news only on lists, not timelines. I’ve shown one way it can be trivial. Please show me others!



Protocols, APIs, and conventions

The Fediverse is wired together by protocols like ActivityPub and WebFinger which, as yet, I know very little about. That’s because the Steampipe plugin, which supports the dashboards I’ve been building and describing in this series, doesn’t require me to understand or use those protocols.

It does, however, require me to understand and use the Mastodon API. Mostly I use that API by way of the Go SDK for Mastodon (thanks, mattn!), sometimes I make REST calls directly. Either way, my read-only dashboards use a fairly small subset of the Mastodon API. The full API is quite broad and deep; it enables API clients to read from and write to Mastodon servers in all sorts of ways. Here are the chapters of the Mastodon API book: apps, accounts, admin, instance, search, statuses, timelines, notifications, oembed. These chapters define what’s common to all Mastodon clients, including web apps, phone apps, native OS apps, and Steampipe dashboards.

So far I’ve ignored protocol-enabled interop in order to focus on API-enabled interop. I’m aware that the Fediverse includes much more than just Mastodon. I intend to explore BookWyrm, Friendica, Funkwhale, Lemmy, takahe, PeerTube, Pixelfed, and others in due time. But right now the Mastodon ecosystem is plenty to try to wrap my head around.

For example, there’s a new web client for Mastodon: elk.zone. With the recent addition of support for lists, it has become my favorite way to interact in Mastodon space. So naturally I wanted to be able to click through from Steampipe dashboards to Elk, and use it as an alternative to the batteries-included Mastodon web app.

It turned out to be easy to enable that integration. Not thanks to ActivityPub, and not even thanks to the API. It works thanks to a third level of interop at play: common patterns for account URLs and toot URLs.

Here’s the account URL for Ward Cunningham, who hangs his shingle at mastodon.radio: https://mastodon.radio/@k9ox. But as we saw in instance-qualified Mastodon URLs, if you visit that URL directly — and if it’s not your home server — you can’t follow Ward there, or add him to a list. You’ll need to copy that URL, paste it into your home server’s search box, run the search, and arrive at an instance-qualified URL where you can follow him or add him to a list: https://mastodon.social/@k9ox. If your home is fosstodon.org this would instead be https://fosstodon.org/@k9ox.

Similarly here is one of Ward’s toots at mastodon.radio: https://mastodon.radio/@k9ox/109802968820955379. If you want to reply or boost or favorite, you can’t do it there. The URL you need is again one that routes through your home server: https://mastodon.social/@k9ox/109802969999396562. Note that the ids for the same toot differ! That difference surprised me and some others, and is a topic for another episode. Here I’ll just note that these two patterns govern how we interact when crossing server boundaries in Mastodon space using the stock web client.

When I started using Elk, another layer of pattern emerged. Here are those same URLs in Elk:

https://elk.zone/mastodon.social/@k9ox

https://elk.zone/mastodon.social/@k9ox/109802969999396562

As it turns out, I just needed to make two of the Steampipe plugin’s transform functions prepend elk.zone to the instance-qualified URLs, then make such prefixing a configuration option. Now when I visit Mastodon links from dashboards, to reply or boost or follow or enlist, I land in the Elk experience that I prefer.

ActivityPub and WebFinger are formal standards. I would describe the Mastodon API as a de-facto standard. But this prefixing maneuver is just a convention. It’s not guaranteed to work with another web client, and not even guaranteed to work across all URLs presented by the stock Mastodon client. That’s OK by me. Conventions are incredibly useful. The Twitter hashtag is just a convention, after all, inspired in turn by an IRC convention.

We’re in one of those Internet moments of rapid innovation, when new conventions can unlock emergent behaviors. It hadn’t even occurred to me that the Steampipe dashboards could support Elk. A few hours after I thought they might, they did. I’ve seen this kind of thing before, perhaps most notably when the blogosphere adopted <link rel="alternate" type="application/rss+xml" href="{feedUrl}"> to enable browsers to auto-discover RSS feeds. That happened about 20 years ago, and quite suddenly when a handful of leading blogging tools adopted the convention in a matter of days. It was a lot of fun to live through that era. If you missed it, enjoy the sequel that’s unfolding now!



Mapping the wider Fediverse

I began this journey convinced that Steampipe could help the Fediverse evolve, but not sure exactly how. My first thought was to use Steampipe’s API-wrangling superpower to study patterns of communication (and conflict) across the Fediverse. But as one of many Twitter escapees last November, I soon realized that the network I was joining reflected a culture that had been humming along nicely for six years and didn’t particularly want to be the object of sociological study.

As I argued in Autonomy, packet size, friction, fanout, and velocity, Mastodon bakes in certain kinds of friction for reasons. You’ve likely heard about a default unfriendliness to search, which is both a technical setting and a cultural choice that privileges the experience of current flow over the mining of past flow. Even more fundamentally, the ID of a toot not only differs from server to server but also obfuscates the toot’s date, another technical/cultural choice that means you can’t randomly access history by date. None of these frictions is insurmountable. They will be overcome for purposes good and bad. I hope and expect that communities will be able to choose their desired amounts and kinds of friction while still interoperating with others. But for my project it seemed that trying to survey the wider Fediverse wasn’t the right place to start.

So instead I began to explore a different way to read my home timeline. The dashboards that I’ve built and described in this series have become, for me at least, an effective way to scan recent Mastodon flow, then click into the stock client to post, reply, or boost. After overcoming a few obstacles, things are starting to feel like the Bloomberg terminal for Mastodon that I envision.

One of those obstacles was the awkward copy/paste/search of foreign toot URLs that was required in order to interact with them. That’s now overcome by Instance-qualified Mastodon URLs. Another obstacle was the difficulty of curating and effectively reading topical lists of people. The strategies described in Lists and people on Mastodon and Working with Mastodon lists have improved matters nicely. And relationship graphs turned out to be a more useful alternate view of the current flow than I had expected.

I think I’ve proven that a set of Steampipe dashboards, layered on a plugin that maps the Mastodon API to tables that the dashboards query, can improve the ability to absorb and react to Mastodon flow. An unproven corollary: Steampipe’s dashboards-as-code system is only one of potentially many clients of the Mastodon plugin. Any dashboarding system or web app or native app could tap into the same query capability to deliver still another way to experience the flow. But that’s a future episode.

Meanwhile, with a decent reading experience in place, this seemed like a good time to circle back to the question of surveying the wider Fediverse. To begin enabling that, I added a couple of new tables to the plugin: mastodon_peer and mastodon_domain_block.

Here’s a query that uses mastodon_peer.

with data as (
  select
    'https://' || server as server
  from
    mastodon_toot
  where
    timeline = 'home'
  limit 100
),
servers as (
  select
    server,
    count(*) as occurrences
  from
    data
  group by
    server
)
select
  s.server,
  s.occurrences,
  count(p.peer) as peers
from
  servers s
join
  mastodon_peer p
on
  s.server = p.server
group by
  s.server,
  s.occurrences
order by
  peers desc

In English: gather the most recent 100 toots in my home timeline, count the occurrences of each origin server, then ask each origin server how many other servers it talks to. Unsurprisingly, my home server, mastodon.social, occurs most often. And because it’s the marquee Mastodon server, it has the most peers.

+----------------------------------+-------------+-------+
| server                           | occurrences | peers |
+----------------------------------+-------------+-------+
| https://mastodon.social          | 11          | 49732 |
| https://fosstodon.org            | 1           | 33973 |
| https://octodon.social           | 1           | 29983 |
| https://infosec.exchange         | 2           | 26833 |
| https://indieweb.social          | 9           | 26279 |
| https://hachyderm.io             | 3           | 19911 |
| https://social.treehouse.systems | 3           | 18110 |
| https://journa.host              | 1           | 18021 |
| https://nerdculture.de           | 9           | 17984 |
| https://werd.social              | 2           | 13792 |
| https://dan.mastohon.com         | 2           | 13351 |
| https://masto.nyc                | 1           | 10917 |
| https://mastodon.archive.org     | 1           | 9582  |
| https://social.fossdle.org       | 1           | 8343  |
| https://devdilettante.com        | 12          | 6898  |
+----------------------------------+-------------+-------+

Here’s a query that uses mastodon_domain_block.

with data as (
  select
    'https://' || server as server
  from
    mastodon_toot
  where
    timeline = 'home'
  limit 100
),
servers as (
  select
    server,
    count(*) as occurrences
  from
    data
  group by
    server
)
select
  s.server,
  s.occurrences,
  count(d.domain) as "blocked domains"
from
  servers s
join
  mastodon_domain_block d
on
  s.server = d.server
group by
  s.server,
  s.occurrences
order by
  "blocked domains" desc

This one says: again gather the origin servers in my recent home timeline, but this time ask each one how many other servers it blocks. Here we see that octodon.social, which happened to show up in my timeline when I ran the query, blocks far more servers than mastodon.social does.

+--------------------------+-------------+-----------------+
| server                   | occurrences | blocked domains |
+--------------------------+-------------+-----------------+
| https://octodon.social   | 1           | 510             |
| https://mastodon.social  | 8           | 181             |
| https://hachyderm.io     | 4           | 125             |
| https://infosec.exchange | 4           | 66              |
| https://nerdculture.de   | 1           | 36              |
| https://indieweb.social  | 4           | 23              |
+--------------------------+-------------+-----------------+

One could, and maybe at some point I will, comprehensively acquire and store this data. But meanwhile, how might it enhance the experience of reading recent Mastodon flow? Here’s what I’ve got so far.
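As a sketch of what that acquisition and storage could look like (assuming Steampipe’s Postgres endpoint with a writable schema alongside the plugin’s foreign tables; the peer_snapshot table name is hypothetical), a scheduled query could capture per-server peer counts over time:

```sql
-- Hypothetical table for periodic snapshots of per-server peer counts.
create table if not exists peer_snapshot (
  snapshot_at timestamptz,
  server text,
  peers bigint
);

-- Run on a schedule (e.g. via cron) to accumulate history.
insert into peer_snapshot
select
  now() as snapshot_at,
  server,
  count(peer) as peers
from
  mastodon_peer
group by
  server;
```

Replaying such snapshots would make it possible to watch connectivity grow or shrink, rather than only sampling whatever the current moment shows.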

We’ve already seen the first table, which answers the question: “How many servers does each of the origin servers in my timeline block?” The second table answers a different question: “Which servers are most often blocked by the origin servers in my timeline?”

The blocking server list column shows an interesting mix of consensus and variation, and I think that will be something to explore comprehensively across the Fediverse. But for now I like how this view contextualizes what’s in my home timeline. Like the relationship graphs, it’s highly dynamic because my immediate network neighborhood is changing all the time. Whatever the current set of servers happens to be, though, I now have some clues about how connected each of those servers is and how aggressively each one is blocking others. This feels like a good first step toward mapping the wider Fediverse.
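One way to see both signals at once, sketched here under the same assumptions as the queries above, is to join both tables in a single query. The count(distinct ...) aggregates guard against the row multiplication that joining two one-to-many tables would otherwise cause, and the left join keeps servers that publish no block list.

```sql
with data as (
  select
    'https://' || server as server
  from
    mastodon_toot
  where
    timeline = 'home'
  limit 100
),
servers as (
  select
    server,
    count(*) as occurrences
  from
    data
  group by
    server
)
select
  s.server,
  s.occurrences,
  count(distinct p.peer) as peers,
  count(distinct d.domain) as "blocked domains"
from
  servers s
join
  mastodon_peer p
on
  s.server = p.server
left join
  mastodon_domain_block d
on
  s.server = d.server
group by
  s.server,
  s.occurrences
order by
  peers desc
```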


1 https://blog.jonudell.net/2022/11/28/autonomy-packet-size-friction-fanout-and-velocity/
2 https://blog.jonudell.net/2022/12/06/mastodon-steampipe-and-rss/
3 https://blog.jonudell.net/2022/12/10/browsing-the-fediverse/
4 https://blog.jonudell.net/2022/12/17/a-bloomberg-terminal-for-mastodon/
5 https://blog.jonudell.net/2022/12/19/create-your-own-mastodon-ux/
6 https://blog.jonudell.net/2022/12/22/lists-and-people-on-mastodon/
7 https://blog.jonudell.net/2022/12/29/how-many-people-in-my-mastodon-feed-also-tweeted-today/
8 https://blog.jonudell.net/2022/12/31/instance-qualified-mastodon-urls/
9 https://blog.jonudell.net/2023/01/16/mastodon-relationship-graphs/
10 https://blog.jonudell.net/2023/01/21/working-with-mastodon-lists/
11 https://blog.jonudell.net/2023/01/26/images-considered-harmful-sometimes/
12 https://blog.jonudell.net/2023/02/02/mapping-the-wider-fediverse/
13 https://blog.jonudell.net/2023/02/06/protocols-apis-and-conventions/
14 https://blog.jonudell.net/2023/02/14/news-in-the-fediverse/
15 https://blog.jonudell.net/2023/02/26/mapping-people-and-tags-on-mastodon/
16 https://blog.jonudell.net/2023/03/07/visualizing-mastodon-server-moderation/
17 https://blog.jonudell.net/2023/03/14/mastodon-timelines-for-teams/
18 https://blog.jonudell.net/2023/04/03/the-mastodon-plugin-is-now-available-on-the-steampipe-hub/
19 https://blog.jonudell.net/2023/04/11/migrating-mastodon-lists/
20 https://blog.jonudell.net/2023/05/24/when-the-rubber-duck-talks-back/