Ed Iacobucci on DayJet, from IT Conversations

In a 1991 BYTE review entitled Citrix’s New Multiuser OS/2 I wrote:

Citrix Systems, a team of IBM refugees led by OS/2 guru Ed Iacobucci, has extended OS/2 into the multiuser territory occupied by Unix, Xenix, QNX, Theos, and a variety of DOS-based operating systems.

That’s how I met Ed Iacobucci. He left Citrix in 2000 and went on to found DayJet, a company that aimed to make real the vision of point-to-point air travel that James Fallows had championed in his book Free Flight.

In 2007 I interviewed Ed for IT Conversations. It was one of the most memorable of the podcasts I made for that series.

Ed died in 2013.

The other day, Alan Levine pointed me to Descript, a new tool for making podcasts with transcripts; it features a technique, first shown to me by Laurian Gridinoc, that uses timecoded text as the interface for audio editing.

I’ve been waiting for that technique to appear in a mainstream podcast production app, so when I heard Descript was doing this I jumped right into a remake of the Ed Iacobucci podcast. It was a remarkable experience in many ways.

There was no automatic transcription back then, and although I did a few experiments with Mechanical Turk, there were no transcripts of my shows. I uploaded the show’s MP3 file to Descript and it did a fantastic job of speech-to-text, faster and better than anything else I’ve tried. But that was just the start.

I’m listed as the post-production engineer on that original podcast, which meant I edited the audio for clarity: eliding phrases or sentences, removing ums and likes. Doing that in a waveform editor was laborious to say the least. With Descript I was able to do much more of that kind of editing, better and faster, because text selection as the interface to audio editing is the absolute holy grail.

But I wasn’t just editing the audio, I was also tuning the transcript at the same time: correcting the recognized text (pretty rarely!), tweaking the auto-generated speaker labels, refining the flow to strike a harmonious balance between the words you hear and the words you read.

There’s much more to say about Descript, which by the way deployed a significant upgrade in the middle of my project, but for now I’ll just say: Thank you! It’s a brilliant piece of software that enabled me to revisit one of my most treasured conversations and bring it to life in a way that people can now search for and read, as well as hear, with maximum fidelity. DayJet folded in 2008; it was a remarkable tale of innovation; here’s hoping Ed’s dream will come true.

Here’s a link to the new version of the show’s MP3 file. Here’s the interview as published by Descript. And here’s the HTML export of the transcript.


Phil Windley: Up next on Interviews with Innovators, host Jon Udell talks with Ed Iacobucci, co-founder of DayJet. Widely known in IT circles as the co-founder of Citrix, he left in 2000 to pursue his interest in aviation, and went on to co-found DayJet, a company whose mission is to deliver on-demand jet travel services.

From IT Conversations.

Jon Udell: I have an ulterior and extremely selfish motivation for having this conversation. So you’ve been to Peterborough when you came to visit us at BYTE. You had to schlep to Manchester and then drive an hour, right?

Ed Iacobucci: And getting to Manchester wasn’t even easy coming out of Florida.

Jon Udell: Now, what you probably didn’t know is that 20 miles west of Peterborough is the town I live in, which has a fabulous, and of course completely underutilized, regional airport. We have in Keene an airport with, I think, the third longest runway in the state.

It can land, and does occasionally land, Air Force One during political seasons. And back when you visited me at BYTE in the, what was it, like the late eighties, early nineties, we had commercial air service out of Keene that was subsidized by the Essential Air Service program of the government. And so I was actually able to drive two minutes to my airport and be in midtown Manhattan in an hour, quicker than anyone else could get there.

Ed Iacobucci: But I bet it was hard to get to Huntsville, Alabama, though.

You can fix one degree of freedom, but not both.

Jon Udell: At some point I picked up on this emerging air taxi movement and I read the James Fallows book, and I just reread it the other day to prepare for this conversation and I’ve been basically just desperate for this to play out.

I’m just really curious to know, first of all, how you got into this, and second of all, how it’s going, and third of all, most importantly for me, cause it’s all about me, you know, when can I be using my own airport again the way I’d like to?

Ed Iacobucci: (laughs) Well, that’s the objective.

I can’t tell you that we’re gonna achieve it all overnight. It’s one of those frontiers that’s never been properly addressed. You know, I spent most of my career, as you did yours, trying to elevate the computational and communication capabilities that businesses have through systems and networks and such. But one thing that we never really have addressed, on a broad scale, is the same interconnectivity from a physical standpoint.

Jon Udell: And of course architecturally it’s the exact same style, right?

Ed Iacobucci: There’s an awful lot of similarities.

Jon Udell: Yeah. It’s the peer-to-peer air travel network.

Ed Iacobucci: That’s exactly what it is. I love talking to people that understand what I’m saying when I talk technology. It’s a different world that I’m involved in now, but basically, it is peer-to-peer versus hierarchical networks and all the same issues relating to efficiency and best routing in a dynamic environment.

Those aren’t the only issues, but those are some of the issues. That’s what attracted me to the project. I think if you remember when I left Citrix, one of the things that I wanted to do other than retire, which lasted about two months, that’s pretty boring actually. Retirement isn’t all it’s cracked up to be. One of the things that I wanted to pursue, at least intellectually, was where do we go with software as a service? Which, remember, we started exploring back in the late nineties. And at Citrix, obviously, that took on a life of its own.

And with ASPs and with a massively interconnected and computationally rich web, you know, one of the things that we’ve always done as technologists is focus on the evolution of infrastructure that people use for delivering things.

But there hasn’t been as much of a focus on vertically integrated, highly specialized systems, which in many ways we always kind of pooh-poohed. I mean, real technologists work on horizontal platforms, you know, kind of a macho attitude. In reality, that’s one step, but every pendulum swings.

And I think that the place where there’s a lot of value that can be added by a lot of smart people is, instead of building a highly horizontal platform that appeals to a lot of people, to go to the other end of the spectrum and build a very highly specialized, vertical solution that solves one problem really well, and leverage that platform not by making it available to a broad audience, cause almost by definition there isn’t one, but by focusing on the needs of a fully integrated service delivery. And that’s really what drove me to something that looks so different, like a service industry.

Jon Udell: The appeal of the air taxi idea though, is extremely broad. I mean, initially it’s not going to be something that most people, I guess, will be able to afford. But the idea can go extremely broad, right?

Ed Iacobucci: Oh yeah. If we actually dive into the market itself, I like to joke with people, saying that Citrix was a lot of fun, but man, was it hard to explain to people why you’d wanna do remote computing when they didn’t think they needed it.

This, on the other hand, is the exact opposite. Everybody understands exactly what it is that we’re trying to achieve: seamless point-to-point travel on a cost-efficient basis. And they grasp the concept very quickly, but very seldom do most people think deeply enough in terms of what the complexities are that are behind it, because, as with a lot of appealing concepts, it’s easy to come up with an idea, but it’s really hard to make it real.

A lot of people have had this idea for many, many years. In fact, that’s probably one that’s been kicked around for ages, but it’s not until recently — and by recently, I mean, I don’t know, 15 years, 20 years — that we’ve had the kind of tools that you need to really tackle the problem. Not solve it, but just tackle it.

Jon Udell: So let’s go through what the enablers are. Now the Fallows book lays out two broad categories of technological enabler. One is a new generation of general aviation craft, which are smarter, faster, lighter, and safer than what we’ve had, kind of unchanged really, for 30 or 40 years.

So that’s the airplane part of it. And he focuses a lot on that. In fact, a lot of the book is the story of, in particular, Cirrus.

Ed Iacobucci: Sure.

Jon Udell: In parallel with that, and he talks less about it, but I’m hoping that we’ll talk a lot more about it, is this notion that you also have got now this thing called the internet. And one of the things that it can do is make demand visible and aggregate demand.

That if you could go to the internet and put a bid out that says, well, I’m here, and if there’s three or four other people within a 50 mile radius that wanna go to this other place, or various places within a 50 mile radius of some destination, then we can discover one another in a very dynamic way.

That was never conceivable until fairly recently. But I think that you’re also talking about operational routing capabilities.

Ed Iacobucci: There’s a lot of complicating factors and, from a broad brush basis, I agree totally. It’s the hardware and the communications and software, obviously, that all come together. But I mean, at its very root, first of all, let’s talk about just general trends. The real enabler in all this is really what we’ve been doing for 20 years.

It’s Moore’s law, it’s cheap memory, and it’s good communications. I mean, that’s really what enables manufacturers to build airplanes better. That’s what essentially the glass cockpits are. I used to joke, when we first started working on this, that it’s, you know, it’s a thin client.

It’s three pieces of glass up in the front, and what used to be individual steam gauges all became applications that run on an operating system. It’s highly redundant, with, you know, multiple data buses and all sorts of enhancements to guarantee a certain level of reliability, but at its core it’s an application running on an operating system. So the IT gains that we’ve made, or the technology gains, I should say, in the last 20 years come very strongly into play in terms of how these new airplanes are designed.

Now having said that, let me back off for a second and say, I have a slightly different opinion that it’s not necessarily the airplane that’s revolutionary because this is something that I’ve learned in talking to a lot of manufacturers. There’s been a lot of hoopla about, you know, these VLJs (very light jets) are really what’s gonna change the world.

In reality, it isn’t really the airplane. Other than a few things like the integration of information systems, it’s still the same basic laws of physics that come into play. It’s still a wing, it’s two turbofan engines, it’s one or two pilots. Structurally and aerodynamically it’s not a massive change from what existed before. The cost points are better. The manufacturing advantages that a modern manufacturer can gain by intelligently engaging with the supply chain, and by precision design and manufacturing, are incremental improvements. But in the final analysis, a VLJ might give you 10, 20% better capability than a pre-existing design.

The other thing the VLJ lets you do is build volume because, as you’ll soon see, all these models are based on large, dense networks.

Jon Udell: You need a big fleet of taxis for this to make sense.

Ed Iacobucci: You can’t do it with one or two, which is traditionally what people have done. And the problem, traditionally, when people independently buy and then let out airplanes to the charter world, is that they all wind up being personal things that are all different.

And if they’re all different, then you can’t optimize, you can’t do fleet maintenance programs, you can’t scale. So those are all barriers to scaling. So what you need is a large fleet of identical aircraft that are modern, that can be acquired in fleet quantities.

Part of what this revolution’s about, be it a VLJ or turbine or turboprop, I mean, the airplane technology isn’t as important as the fact that you can make them in large quantities.

Jon Udell: And you mean literally identical, or do you mean interoperable and interchangeable?

Ed Iacobucci: Literally identical. Well, it’s the same thing. From an operations standpoint, we abstract it a little bit and say, as long as they have the same operating characteristics, when you train on one, you can fly any of them. Whereas private aircraft on charter, every one of them is different because they have different instrument sets.

So on the aviation side, airplane design certainly benefited from technology, but it isn’t a revolution. It’s more of a rapid evolution, from a manufacturing standpoint. I think there’s a good shot that, you know, we can start seeing something that looks a little bit more like a revolution in that sense, that you can make a lot of them fast. Be that Honda, be that Eclipse, be that whoever, they’re all gonna wind up in the same place.

But the real revolution, I think, is often missed because it’s least understood. It’s in the business model. You and I have spent a lot of our lives in technology and we’ve seen the effect of creating or identifying new markets, new customers, new value propositions, and how things just spring up outta nowhere.

This is something that’s been part of our lives, our careers. But it has not really played out very well in many industries, one of which is aviation. The core business model of commercial aviation hasn’t changed at all.

They’re scheduled services. You can say they went to hubs and then Southwest did something a little bit less hubby, but it’s all about building a schedule and advertising and getting people to show up. Which is kind of like defining a big freight train that goes around on a schedule, and then you try to reach everybody possible.

And you have big wars in terms of who piles in and how many seats there are. And it’s a fixed market. I like to say, it’s the same one and a half million people that fly from Atlanta or Orlando every year. They fly on different airlines and they all fight each other over who gets the share.

But it’s not a growing market. It’s essentially the same zero sum game.

Jon Udell: And everyone’s going through the same hubs and those hubs are beyond capacity.

Ed Iacobucci: And in the charter world, nothing changes. The tax subsidies change from year to year, so it’s more attractive or less attractive for people of means to buy airplanes and then give them to somebody to manage them, to help offset costs.

But it isn’t a fundamental sea change in model. So what we tried to do was kind of abstract that, and I actually had kind of a false start after I left Citrix. I feel very fortunate that I’ve been able to be in a position where I could actually buy my own airplane, which to me isn’t so much the luxury of having an aircraft.

It’s really the luxury of being able to schedule your time. That’s really what it’s all about. It has nothing to do with carpeting and entertainment systems or gold-plated anything, cause I’m very down to earth when it comes to that. You know, I travel in blue jeans and a t-shirt all the time, cause that’s the way I lived most of my life.

Jon Udell: But you go when and where you want.

Ed Iacobucci: But I go when and where I want. That’s the real luxury. And that appeals to everybody. You don’t have to be wealthy to see the value. Everybody has an intrinsic value of time.

So when I first left, I engaged Nancy (I think when I first left Citrix, we were just married about then, so she was transitioning from fiancée to wife) to manage our bigger airplanes while I tried to figure out a business plan to make it, you know, like business jets for the masses. And I found out very quickly that that’s a really hard problem, because the deck is stacked against you.

The aircraft are extremely expensive. You know, a Challenger 604 is about 25, 26 million new. It’s a very high capital cost. Everything is handbuilt. It’s a very high level of service, very customized service. You can’t mass produce it in any reasonable way.

The expectations of the customers are high. The asset utilization’s low. And believe it or not, the competition is incredible, because a lot of people have bought airplanes that they throw at the charter market to offset costs. So it’s just like all the elements of a crummy business all rolled up in one. And coming right outta software, to me it was a shock to see that this was really such a bad business environment. So basically I hung it up there, I said, nah, I don’t think I wanna spend a lot of time on this.

Then something happened. Two things actually, since you brought it up already. I read Fallows’ book, and about the same time I went to Agenda 2001, and Vern Raburn, who I knew from years past, most notably from Slate, although he was at Lotus and Symantec also, was pitching his new sub-million-dollar jet for the masses.

That was his idea. I was skeptical, knowing a little bit about airplanes, but on the same panel we had Dr. Bruce Holmes, who was the head of the NASA, well, one of the A’s in NASA is aeronautics, right, it’s not all just space.

He was the chief strategist for the aeronautics side: the small airport systems, and the integration and technology to utilize underutilized assets that we have in our country.

Jon Udell: Just as an aside, for people who haven’t read the book, there’s a subplot about NASA and about Bruce Holmes and about Dan Goldin and about how these guys in a very, kind of underground way, were really seriously trying to figure out how to reinvent air travel and find a way to utilize these 5,000 regional airports that we have around the country.

Ed Iacobucci: Which is a huge capital investment that our country’s made.

Jon Udell: Exactly. Exactly.

Ed Iacobucci: And yet we’re still piling everything into 15 major airports.

Jon Udell: Exactly. I think Bruce Holmes is the guy who clocked all of his travel door to door, for a period of years.

Ed Iacobucci: He still does.

Jon Udell: And he found out that …

Ed Iacobucci: Bruce works for us now.

Jon Udell: He does? No kidding.

Ed Iacobucci: I hired him this year

Jon Udell: Wow.

Ed Iacobucci: To be our chief strategist for next generation air traffic systems.

Jon Udell: That’s awesome!

So the story is that he clocked all of his trips and he figured out that for anything under 500 miles, it was like just as fast to drive.

Ed Iacobucci: Exactly.

Which, to me, was a tremendous epiphany. Of course, we all know that.

Jon Udell: Yeah.

Ed Iacobucci: We just never really actually took the trouble to measure it all.

Jon Udell: Exactly.

So what did he say at Agenda that turned things around for you?

Ed Iacobucci: So I went to Agenda, which, as you know, is a technology conference, always has been, first and foremost. There was a panel there about aviation, and it was a real oddball panel. They had Fallows, they had Dr. Holmes, and they had Vern on the panel. I was already in the middle of trying to investigate how to do aviation. I was on the verge of giving up and I was looking for other things to do. And it dawned on me that what we needed to make point-to-point travel a reality was a platform. That’s what we were missing. The platforms were big, expensive, handbuilt creatures that were bought by very, very wealthy people and, as I said, underutilized expensive assets.

But what we needed was something that was more akin to what the PC was before we had the explosion of local area networks. So it dawned on me that what Vern was saying about his Eclipse (which was still a concept, this was 2001, he hadn’t really built anything yet) was an approach to what he called making the Model T of business jets: in other words, being the first mass-produced, precision-manufactured aircraft.

And so that got me thinking. Nancy and I gave Vern and his wife a lift back to Albuquerque on our airplane after Agenda. And we went and took a look at his new facilities and immediately started thinking of software, network operating systems, and so forth. And that set the wheels in motion.

Jon Udell: So when you use the word platform, you are not just referring to the equivalence of the plane to the PC, but you are referring to the software systems that animate that piece of hardware.

Ed Iacobucci: In the early days, I looked at the airplane as the PC. I mean, this is kind of an oversimplification, but let’s say that the airplane was the mass-produced PC, and the engines (originally Williams and then ultimately Pratt and Whitney) were the Intel, making the thing that had to be mass-produced to make these things a reality.

It was an incredibly important, high-value component. And then I looked at ourselves as some kind of a blend of an operating system / network operating system / application that lay on top of all that and delivered low-cost, point-to-point travel.

Jon Udell: Because that’s what operating systems do: resource utilization.

Ed Iacobucci: Resource management.

Jon Udell: Okay, so we’re getting down to it. Describe what kind of network operating system you are building.

Ed Iacobucci: Okay, so here’s the thing. It didn’t take very long to look at the business models and say, if you take a $25 million airplane and replace it with 25 $1 million airplanes, then it fixes a lot of problems. It gives you a lot of robustness in terms of being able to service individual demands. If you couple that also, by the way, with doing it by the seat instead of by the plane, that lets you interleave packets, or payloads. It became very clear that what we needed to do was build a very large self-optimizing network that would take a lot of other factors into consideration.

Like the physics of the airplane, the temperature, the loads. The beauty of aviation is that it’s like physics meets business. How much you can carry depends on what the temperature is. The altitude, the runway lengths. Safety is all expressed in terms of parameters that you can operate under, right?

But now the optimizer has to take it into account as it starts shuffling around different customers. So it’s not just a straight optimization, it’s an optimization that has to be done very quickly, real time, and it has an incredible number of constraints, many, many more than you would normally have.

So that’s why it’s so important. You have to eliminate all the optional constraints. Like the customers that want black jelly beans. Well, they aren’t gonna be our customers, right? You have to be willing to take any flight, with no special services, cause special services complicate things.

Jon Udell: But that’s okay. Southwest has found that people are very happy to go along with that idea.

Ed Iacobucci: It’s the same notion. It simplifies, but when you’re talking about a real-time on-demand network, it becomes even more important. So I had to hire mathematicians. I actually had a group of folks that I’d had previous experience with in adaptive compression algorithms.

Really smart guys. We brought them on, gave ’em a challenge of their lifetime, and actually for the rest of their lives. The problem is you never find a solution to the problem, cause it’s what mathematicians call NP-hard. Which means that you can take every computer on earth that’s ever gonna be manufactured between now and the end of our lives and put it to work on the project til the end of our solar system, and you’re not gonna find the optimal solution. So you have to move from traditional hard optimization into heuristics that are aided by optimization techniques. And so then we brought in Georgia Tech, which I’m an alumnus of, and I’m on their engineering advisory board, brought in the operations research group. And these are really heavy hitter guys that have been doing optimization. Actually, some of ’em were my professors when I was at school, it was funny to see ’em again. These guys did optimization routines for large air carriers. But the difference, the essential difference, is that in a fixed schedule, there is a fixed schedule.

And optimizing assets around a fixed schedule is a vastly different problem than trying to determine the most optimal solution in real time for something that doesn’t have any fixed schedule and that’s morphing with every request that comes in. Basically their response was, nobody’s ever done this, and these are the, you know, smart OR guys, and since then, MIT guys have said the same thing, you know, a lot of people.

It really is a problem that hasn’t been very adequately addressed. So now it looks really exciting for me, cause, you know, if it was just like chartering airplanes, that is not very exciting. But now we’ve got new science, new math, a lot of green fields in areas where we can get the collaboration of some major universities.

Top-notch people want to work and want to assign PhD students to work with us. An opportunity to set up a group of really smart people. And so we did that. And that’s how we got started. And that was the beginnings of the optimization. And the idea is that, for every reservation that comes in, you have to figure out how to value these things.

So we have invented a market model where the pricing is based on how you value your time. If you value your time highly, you don’t wanna waste any time at all, you want to say, I’m gonna tell you I’m gonna be there at nine and I’ve gotta be there by 10:30. And we know that it’s about an hour and a half flight. That’s a very expensive ticket.

But then if you say, I’m willing to leave as early as six in the morning and be there by noon for something that nominally takes an hour, then we can afford to give a very large discount for that seat.

Without knowing whether we’re gonna match anybody up or not. Statistically, we can determine pricing models. And we did that through, I’ll explain in a second, another area of science that we had to bring in. But with that flexible model, we had the degrees of freedom so that the optimizers could build some really effective schedules. And so then we used the Georgia Tech team to do traditional formulations and run what we would determine in real time. They would run it over the weekend on their mesh of computers and determine if they could improve upon it by adding a lot of computational time.

So we’ve kind of determined that our algorithms and heuristics are within 5% or less of what ultimately might be optimal solutions for problems that you can test, which are smaller problems. I mean, they can’t use the traditional techniques for a typical network because, like I said, it’s far too big.

So for problems of, I don’t know, 10 airports and 20 aircraft, or 50 aircraft, they can test our optimality. It doesn’t work for 300 aircraft…

That was half of the problem. I don’t wanna bore you with a lot of details here, so

Jon Udell: It’s not boring, believe me. But feel free to move along.

Ed Iacobucci: The other side of the equation was that we also had to knock off the problem of mesh and how you parallelize things, cause these problems aren’t really well suited to being solved in parallel. So we had to build a strategy around that and build a mesh computing platform and all that stuff. That’s kind of like nuts and bolts, but that went on in parallel. But the big problem that we had then, one that had been nagging me from the beginning, was: if we really succeed at building a new market, which I believe we are, how do I know how many people are gonna use a service that never existed before? Especially if it’s people that maybe hadn’t even flown before.

Jon Udell: Yeah, this is the web 2.0 problem of, oops, I’m too successful.

Ed Iacobucci: Or, how do you determine what is gonna be successful or not? Cause these are really big asset investment decisions you have to make: you have to lay out airplanes, you have to lay out bases and people, and you have to go places and you have to do marketing investments. And, believe me, there’s far more ways you can go bankrupt than there are that you can make money, because, as it turns out, to jump forward a little bit, what we’re really building is a value network between regional points.

The composition of the value network determines the load of the value network. Which is, you know, I guess intuitive to somebody that looks at networks. But from a social standpoint, all the same rules apply. If I have a network of nodes A, B, C, and D and I add E, that could have an impact on all the nodes at their various times of the day.

But if I add F, it could impact some nodes differently than other nodes. So it’s an interrelated loading, a problem that’s very, very difficult to determine. So for that, I thought, you know, we’ve got all these guys that are taking basically chaos and organizing it into order so we can file flight plans and make it all look organized, or actually be organized, on the back end.

But what I need is another group of people that creates organized chaos or organized complexity that mimics the social behaviors of a region of travelers that can be used to test how well we can reorganize that into order.

Jon Udell: So you mean you need to actually be able to model…

Ed Iacobucci: …you need to model the markets and the demand. And that’s not a simple problem either, cause it’s gonna depend on pricing and value, time/value trade-offs, and the density of your transportation network, and what nodes you introduce and what the interactions between the nodes are, cause every city has a different effect than others.

For that, I thought, you know, I did study operations research way back when and never really used it heavily, but one of the things I always loved was modeling. And, in fact, before I even did Citrix, my other alternative plan was doing something similar to what Maxis did, and that’s to build a platform for large-scale simulations, but I never did that.

They did a marvelous job, by the way. I’m thinking the kind of thing that Sim City represented, in a generic sense, is the kind of technology we wanna bring in. And the modern phraseology for that happens to be complexity science.

When I was at school it was discrete time simulation, but then it got a biological twist to it. And so it became complexity science. At one point in time it was called chaos theory, but complexity science is the more accepted terminology for it. And I had some people that I had done some work with; in fact, through one of my directors, we had both served on the board of a Bios institute in Santa Fe, which was an offshoot of the Santa Fe Institute, which was biological modeling or evolutionary modeling of large complex systems.

So we got in touch with some of those folks, really, really smart guys that have been doing this all their lives, and we offered ’em a job. We said, hey, come on board, and we’re gonna build the most sophisticated regional traveler model that anybody’s ever built, one that’s gonna have a great deal of fidelity.

They were gonna use it not just for postulating the future, but actually for building a business, and integrating that as part of our long-term planning systems. And so they came on and worked for about four years. And the result was this other piece of technology, which gets married to our system, which is called ABM.

We call it ABM. It’s not very imaginative. It’s an agent-based model.

Jon Udell: Not anti ballistic missile.

Ed Iacobucci: No, no. And it’s basically, I mean, the simplest way to describe it is, it’s Sim City on steroids. A commercial grade Sim City on steroids. Very targeted on the specific problem of regional travel.

So we’ve got like nine different types of agents, or sims, that operate, and they’re populated according to IRS statistics in 10-square-mile zones in the country. And they all have different rules on how they book trips and what flexibility they have.

And then we loaded on top of that a bunch of demographic data. Some we bought, some we got from DOT, some we got from IRS. And then we loaded on top of that all the schedules for all the airlines between all the airports in the contiguous 48. And then we actually developed some driving algorithms so we can estimate driving times. Not great circle routes, but actual routes, with time-of-day congestion added through various nodes when driving.

Then we added train schedules, and basically had this very sophisticated, very high-fidelity model of the transportation options that you would face if you lived in one 10-square-mile region of the US and needed to go to another one.

Jon Udell: That’s pretty amazing, that’d be a useful service in and of itself.

Ed Iacobucci: Well, the interesting thing is, after we started developing that, that’s when we started doing a lot more work with the FAA, DOT, NASA, actually not only Eclipse, but Pratt and Whitney, then GE when they came into the market, and Honda and Embraer. And so, you know, we’re a little startup company, but we’ve pretty much been in the business of trying to plot out the future, and the manufacturers started sending us design specs for power plants and hypothetical airplanes.

We actually found that we had to build some real Chinese firewalls, cause we were getting all this confidential data from competitors, to see how does this airplane design work in loading, you know? So we were able to make some really quantitative assessments.

Jon Udell: Well, this leads into another question I was getting ready to ask, which is that presumably you’re not gonna be the only such service operating on the planet. At some point there comes the need to federate these things, right?

It sounds like this other piece that you’ve built is potentially part of the answer to how you are able to make that happen.

Ed Iacobucci: Well, lemme back up one second. One thing that I need to add, that’s a complexity, or actually a reality check, of this whole thing: it was the biggest curve that we had to handle as we went through this. As a technologist, a longtime technologist, I’d like to look at the problem, break it down in pieces, and then figure out, in this case, how could we build an ASP service? Or how do we get the federation of this across a bunch of people that are buying assets?

Cause I don’t really wanna buy airplanes and run ’em. But it turns out, interestingly and surprisingly, one of the constraints that we have to live with is the regulatory environment we’re in today. Which winds up being a great big reality test. You can’t build a business that requires this kind of an investment with an assumption in your business plan that you’re gonna change regulations to make it happen.

There’s no investor in the world that’s gonna invest in that. So one of the constraints we had on our business from the beginning was, this has to operate in the regulatory domain and environment that we have today. We better get pretty darn good at understanding how that works.

So right from the beginning, I started making trips to Washington, meeting with senior FAA folks. Telling them, man, listen, this stuff that NASA’s doing, that we want to do a kind of practical implementation of, makes a lot of sense, and we also recognize you can’t change all the rules for something that doesn’t exist yet.

So help us understand how we follow the rules so that you can at least not feel like you have to stand in our way. So that turned out to be a long road, a very rewarding road, we built some really good relationships with the regulators. But the punchline is because there is such an interwoven relationship between operating control and safety, it’s very hard to say you own the airplane, you guys over there own the pilots, and we’ll run it.

To play by the rules, you almost have to vertically integrate across the board. So you have to own the airplanes, hire the crews, train the crews, manage all the concurrency of crews, cause there’s limitations on how much they can fly in a day, a month, a year, and all that stuff. So you have to manage that. Those are all safety aspects, which can be done. It’s all fairly well understood rules, but you have to kind of integrate the whole thing.

Same with the airplane.

Jon Udell: So you do have to kind of own it for now.

Ed Iacobucci: You have to own it all, from the trip formation to the planning, and you have to do it fast. Because if you’re gonna build a system where 95% of the trips are managed by a computer, instead of vast rooms of human beings doing flight plans, which is non-scalable by the way, then you have to be able to embody all those business and regulatory processes in a system that uses the planning tools, and develop stuff that conforms. So we wound up building a very vertically integrated company. We used our relationship with Eclipse to put in place a very long term, large scale order for airplanes.

We’re working with other manufacturers as well, but mostly Eclipse right now. This has been a long, long, long road that we’ve had to travel, because not only did we have to build the core technology for routing and real-time optimization, but we had to develop the market models, and had to run market studies to back-check the market models to make sure that we’re not inhaling our own exhaust, or whatever the right metaphor is, right? We had the regulatory side, where we had to develop a relationship over four years. And the culmination of all that is a system that we’re about to launch, and it’s gotten a lot of attention.

I’m trying not to hype it a lot. But it’s one of those things that everybody wants to talk about because it’s such a compelling story.

Jon Udell: We’ve been hearing about it for a long time now, it’s about to get switched on in some form that people can see and experience.

Ed Iacobucci: So what this means to you is that you’ll be able to go to that airport, in your community, and be able to say, I need to go to some other airport in my value network. I don’t think it’s the model that says you can fly anywhere you want to fly, everybody flies where they want and it’s totally open.

I don’t think there’s enough scale to make that make business sense, cause there’s economics of flying the airplanes. So you have to focus on what we call day ports, which are kinda like preferred airports that we do the per-seat services between. We’ve identified five, and we’re gonna grow that dramatically as we go along based on customer input; actually, the reservations that they make themselves tell us a lot about what direction they want to go in.

Jon Udell: So what are those original five non-hub airports?

Ed Iacobucci: They’re all in Florida, because we wanted to keep it all close by here so we could watch the launch carefully. There’s one in South Florida in Boca Raton, which is between Palm Beach and Fort Lauderdale, basically on the east coast. Lakeland, which is between Orlando and Tampa in the central region. Gainesville, which is where the University of Florida is, and the business hub getting up towards the panhandle. Tallahassee, the state capital. And then Pensacola, way out on the panhandle. So it’s kind of like a network that initially spans the center of Florida from the east coast.

Jon Udell: And these are all characteristic of that group of 5,000 airports that Fallows talks about.

The regional, underutilized, you know, you drive up to one and there’s not a parking problem and there’s Cessnas sitting on the runway and stuff.

Ed Iacobucci: Absolutely. Two of ’em actually have commercial traffic, but it’s very limited traffic, very limited service I should say. They go mostly through Atlanta; Tallahassee and Gainesville are the two. To get anywhere else in the network commercially, you go through Atlanta.

The value in this whole thing is saving time, right? I like to say that we’re really not an aviation company or a technology company or a marketing company. We’re in the business of time arbitrage. Some people sell us their time that they don’t need, and we resell it more expensively to other people that can use it. So we use the flexibility one customer has to give another customer, one that doesn’t have flexibility, a service, and we aggregate ’em in a way that makes the business make money.

And it comes together.

Jon Udell: When do you switch this thing on? What’s the first day?

Ed Iacobucci: We actually have started turning on some customers in a controlled fashion, because what we want to do is guarantee service level. We discovered nobody wants to use this as a business-critical service if they’re gonna get rejected frequently. They may not get accepted; you know, given the capacity, we may not be able to find a solution to their request in the prescribed time, which is 10 seconds now, by the way. From the time that you tell us what you wanna do, we’ll tell you whether we can do it or not in 10 seconds. And it’s all done virtually, right? Because there’s no real airplanes; this is all done in advance. So we have a virtual mapping of what the capacity of the network is, and so we can construct a yea or a nay answer.

Jon Udell: So if you make a promise that you can do it, you don’t necessarily have complete knowledge at that moment about all of the components that are gonna satisfy that request, right?

You’re making a bet, and it’s a good bet.

Ed Iacobucci: Well, it’s more than a bet. We know we can do it. We don’t know with what assets, we know we’ve got 10 virtual airplanes that are gonna be available that day. And we know we’ve made X number of commitments for these windows.

And we know we’ve got the virtual crews. Now the reality of it is: which one is it gonna be, and what is the sequencing gonna be, and when do we pick you up and drop you off, and what order do the airplanes fly around in? That’s changing all the time.

And what the load will be and therefore what your profit will or won’t be.

And every request that comes in disrupts the previous schedule, but it gives us an opportunity to find a better schedule. It’s not a schedule, it’s a plan, basically. Schedule implies that we publish it. We don’t. That’s one of the things we had to do that made this all hard: to operate under the on-demand rules, we couldn’t publish a schedule.

Jon Udell: So that also leads to this other question, which is, I heard Esther Dyson talking about this topic recently, and she said, like you said, it’s really not about the airplanes, it is though an awful lot about the customer experience. What’s your sense of how people react to this highly dynamic environment in which I’m presuming you might get a number of updates as to, well now you’re leaving at such and such time, right?

Ed Iacobucci: Well that, you know, this is what we’ve been grappling with for the last six months.

We know how to make it work. We know how to scale it. We know how to make the business work, and we know approximately the demand that we’re gonna see because we’ve been investing a lot in developing markets and customers in advance. What we’ve been grappling with is how do we take this basically entirely fluid system…

Jon Udell: …and present it to people…

Ed Iacobucci: and present it in a way that people can reliably use. And what that means is you just add more constraints; in other words, you’re not gonna be flipped around all over the place. We start with, essentially, you negotiate as big a window as you can accept, because the bigger the window, the cheaper the ticket, right? And it’s not a departure window. It’s a window in which we will complete the mission for you. And you basically have to block that outta your schedule. So if you have a lot of meetings during the day, you’re gonna necessarily be pushed to smaller windows, but that’s okay.

You’re more productive, and then it may cost a little more. But it allows you to be productive. If you don’t have any hard commitments one way or the other, you can give us a bigger travel window, and then somewhere in there, the flight will be generated.

Now, obviously you can’t say to somebody, you know, you negotiated all morning to get me from A to B, which is maybe an hour flight. People don’t feel comfortable with that, so you have to shrink that down. So then the next challenge operationally is: how do you shrink those windows down over time as you get closer to flight time?

Leaving enough space in the windows for disruption recovery, and not flooding people with notifications. So there’s a fine balance, and we’re learning that. And really what we’ve discovered is that the night before, we can crunch it down pretty much.

The big windows are useful for us in the days leading up to the travel, because that’s when we’re doing most of our optimizing. When we get right before the day of travel, we can afford to crunch that down pretty far. So we crunch the window down to a much smaller size, and we notify the customer the night before that they have to be at the airport by time X, and they’ll be at their destination by time Y, leaving a little room for us to wiggle around within that. That’s our commitment, and it’s a reasonable window, you know? So for maybe an hour flight, they may have to allocate two hours. Which is far better than driving or anything else.

Jon Udell: Exactly, exactly. It’s like, what’s it compared to? So this is absolutely fascinating, Ed, this is so cool.

Ed Iacobucci: Yeah, it’s fun because it’s like, I feel like we’re kind of inventing a market, not inventing, but we’re pioneering a market and a customer set and it’s all those things that we’ve learned over the years, all packaged up in something that doesn’t look like technology, but it is all about deployment and acceptance of technology.

Jon Udell: Wow. Thank you so much for making the time to do this and congratulations on getting to this point. I know you said it was a long road, but on the other hand, I mean what you’ve done in I think not much more than five years…

Ed Iacobucci: right, it’s five years…

Jon Udell: seems like a hell of a lot.

Ed Iacobucci: Well, it does when you look back on it, but you know, we’ll see. And the true test is gonna be the customers, and I think they’re gonna be happy with it. So we’ll just stay tuned.

Jon Udell: All right. Well, thanks so much, Ed.

Ed Iacobucci: Take care, Jon.

Jon Udell: Yeah, you too.

Phil Windley: You’ve been listening to Interviews with Innovators, with host Jon Udell.

The post-production audio engineer for this program was Jon Udell. Our website editor was Niels Makel; the series producer is Niels Makel. This is Phil Windley. I hope you’ll join me next time for another great edition of Interviews with Innovators on IT Conversations.

When the rubber duck talks back

I’d been needing to refactor the pagination logic in the Mastodon plugin for Steampipe. After a couple of abortive tries, I took another run at it this week with the help of the latest generation of LLM-powered coding assistants.

Here was the problem. The pre-release version of the plugin consolidated pagination for many tables in one place. That was a good thing, but the downside was that there was only one Steampipe table representing what should have been many. So you could say select * from mastodon_timeline, but then you had to qualify with where timeline = 'home' or where timeline = 'local' and so on. For a user of the plugin this was awkward: you’d rather say select * from mastodon_timeline_home or select * from mastodon_timeline_local, and reserve the where clause for more specific purposes. The v1 plugin made separate tables, but duplicated the pagination logic on a per-table basis. It worked, and was good enough to ship the plugin in time to demo at FediForum, but it obviously needed improvement.
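To make the awkwardness concrete, here are the two styles side by side. This is a sketch: mastodon_timeline_home is the plugin’s table name, but the created_at filter is just an illustrative use of the freed-up where clause.

  -- pre-release: one table, with the timeline chosen in the where clause
  select * from mastodon_timeline where timeline = 'home';

  -- v1: one table per timeline; the where clause is free for real filtering
  select * from mastodon_timeline_home
  where created_at > now() - interval '1 day';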

ChatGPT-4 and Sourcegraph Cody

Since then, Sourcegraph has released its new coding assistant, Cody, which you can run as a VSCode extension or on sourcegraph.com. This set up the possibility for an interesting comparison. ChatGPT-4 builds on OpenAI’s LLM; Sourcegraph’s Cody, on the other hand, uses Anthropic’s Claude.

Another key difference is that ChatGPT only has the context you paste into it. Cody, sitting inside VSCode, can see your repository, and has all that context. And if you index your repo, which is something Sourcegraph is willing to do for beta users on request, then Cody has access to what are called embeddings that represent the structure of your code in various ways. These embeddings, according to Sourcegraph, can powerfully enhance your LLM prompts. Even without embeddings, Cody offers quite a range of assistance, from a high-level overview of what your repo does to line-level improvements. It’s all packaged, in the extension, as a set of recipes behind buttons with names like Explain selected code, Improve variable names, and Smell code. I haven’t yet used these recipes enough to form solid opinions, though. For this exercise I used Cody mostly in a ChatGPT-like conversational way. In that mode, it’s wonderful to be able to select the code you want to talk about, instead of pasting it into the chat.

In both cases, as should be no surprise, it wasn’t enough to just ask the tools to consolidate the pagination logic. They were perfectly happy to propose solutions that could never work and might not even compile. So I began with a simpler version of the problem. Mastodon uses the same pagination machinery for APIs that return arrays of different kinds of results: Statuses (toots), Accounts, and Notifications. By focusing on these separately I reduced the duplicate pagination from 13 instances to three. Then, in a separate pass, I worked out how to collapse those into a single paginate function that accepted one of three data-fetching function parameters.

I tried to pay careful attention to prompts and completions as I went along, but in the heat of the action I didn’t do a great job of that, partly because I was switching back and forth between the two tools. But I’m quite happy with the result. There was one key insight in particular which, fascinatingly, I am hard pressed to assign credit for. Was it me or one of the assistants? I think it was me, but in a way that doesn’t matter, and isn’t the point of this story.

The key insight

Here was the insight. When I was building the transitional paginateStatus function, the first attempt returned results to the calling code in each table’s List function, which was responsible for streaming the data to Steampipe. This led to a series of detours to work around the problem that the returned data could be quite large and chew up a lot of memory. That could probably be solved with a goroutine that would stream results back to the caller, instead of returning them as a batch. I tried prodding both LLMs to come up with that kind of solution, and had no luck with several tries in both cases, but then came the insight: the helper functions could stream results directly to Steampipe, and just return nil or err to the calling List function.
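Here’s a minimal sketch of that shape, using the plugin SDK’s StreamListItem and RowsRemaining. The fetchStatusPage helper is a hypothetical stand-in for the plugin’s actual Mastodon client calls, not the real code.

  package mastodon

  import (
      "context"

      "github.com/turbot/steampipe-plugin-sdk/v5/plugin"
  )

  // paginateStatus streams each page of statuses to Steampipe as it
  // arrives, so nothing accumulates in memory. It returns only an
  // error, which the calling List function passes straight through.
  func paginateStatus(ctx context.Context, d *plugin.QueryData, timeline string) error {
      var maxID string
      for {
          // fetchStatusPage is a hypothetical helper wrapping the
          // Mastodon API; it returns one page plus a paging cursor.
          statuses, nextMaxID, err := fetchStatusPage(ctx, timeline, maxID)
          if err != nil {
              return err
          }
          for _, s := range statuses {
              d.StreamListItem(ctx, s)
              // Honor a LIMIT clause by stopping early.
              if d.RowsRemaining(ctx) == 0 {
                  return nil
              }
          }
          if nextMaxID == "" {
              return nil // no more pages
          }
          maxID = nextMaxID
      }
  }

  // Each table's List function shrinks to a delegation.
  func listTimelineHome(ctx context.Context, d *plugin.QueryData, _ *plugin.HydrateData) (interface{}, error) {
      return nil, paginateStatus(ctx, d, "home")
  }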

With that dramatic simplification I was able to complete the phase 1 refactoring, which yielded three pagination functions: paginateStatus, paginateAccount, and paginateNotification. Phase 2, which consolidated those into a single paginate function, was a bit more prosaic. I did need some help understanding how the necessary switch statements could switch on the timeline types passed into the paginate function. Both assistants had seen lots of examples of this pattern, and both helpfully augmented my imperfect knowledge of golang idioms.
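A sketch of how that consolidation can look, building on the previous sketch’s imports plus fmt. The switch-on-type dispatch is the pattern in question; the kind names and per-type page helpers are hypothetical, each assumed to fetch one page, stream its rows, and return the next cursor.

  // paginate replaces the three per-type helpers. The switch picks
  // the data-fetching call for each page; the paging loop and the
  // early-exit logic are shared.
  func paginate(ctx context.Context, d *plugin.QueryData, kind string, timeline string) error {
      var maxID string
      for {
          var nextMaxID string
          var err error
          switch kind {
          case "status":
              nextMaxID, err = streamStatusPage(ctx, d, timeline, maxID)
          case "account":
              nextMaxID, err = streamAccountPage(ctx, d, timeline, maxID)
          case "notification":
              nextMaxID, err = streamNotificationPage(ctx, d, timeline, maxID)
          default:
              return fmt.Errorf("paginate: unknown kind %q", kind)
          }
          if err != nil {
              return err
          }
          if nextMaxID == "" || d.RowsRemaining(ctx) == 0 {
              return nil
          }
          maxID = nextMaxID
      }
  }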

Partnering with machine intelligence

I came away with a profound sense that the real value of these assistants isn’t any particular piece of code that they get “right” or “wrong” but rather the process of collaborating with them. When you’re working alone, you have an ongoing conversation with yourself, usually in your own head. The point of talking to a rubber duck is to voice that conversation so you can more effectively reason about it. Externalizing your thinking in that way is intrinsically valuable. But when the rubber duck talks back, it’s a whole new game. As Garry Kasparov famously wrote:

The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. Their skill at manipulating and coaching their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.

I’m not worried about robot overlords. Instead, I look forward to collaborating with robot partners.



Migrating Mastodon lists

My original Mastodon account was on mastodon.social. I chose it because it’s the largest instance, it’s operated by the developers of the service, and it runs the reference implementation of the software. I wanted to experience the fediverse, initially, as most people do, then consider whether to migrate to another server — and if so, to which one.

Last week I migrated to social.coop which, like cosocial.ca, is what Darius Kazemi calls a community-owned independent social media site. I don’t think the acronym COISMS will fly, but the business model is just what I’ve been looking for. I don’t want to use “free” services that make me the product they sell. I want to pay for a service that’s funded by my dollars, not by my data. That’s why I bailed on Gmail years ago, in favor of Fastmail, and it’s why I’m happy to spend $10/month — the price of a couple of cappuccinos — to support the people who keep the lights on at social.coop and maintain civility.

The instructions for migrating from one server to another offer two paths: Profile redirect and Profile move. I chose the former, because I assumed I would retain API access to mastodon.social post-migration, and thus have the option to use Steampipe to query both accounts. Why do that? You can’t migrate your posts from the old server to the new one, only your followers and (optionally) your follows, bookmarks, blocks, and mutes. Steampipe can connect to two different Mastodon servers at the same time, so I figured I could run queries against both.
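Here’s the kind of cross-server query that makes possible. This is a sketch: each Steampipe connection appears as a schema, and the connection names (mastodon_social, social_coop) are whatever you declare in your config; social_coop matches the one used later in this post.

  -- One query spans both servers, one schema per connection.
  select 'mastodon.social' as server, count(*) as lists
    from mastodon_social.mastodon_my_list
  union all
  select 'social.coop' as server, count(*) as lists
    from social_coop.mastodon_my_list;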

Lessons learned

That turned out to be a wrong assumption. I can reactivate my mastodon.social account by turning off the redirection, but I’m vague on the implications of doing that. Meanwhile it’s still possible to export my posts, so I can always reacquire them that way if needed.

Here was another wrong assumption. I think that if I’d chosen to move my account, instead of redirecting it, my profile would have transferred to the new server. Instead it was on me to fill out the new profile. I’d created @judell@social.coop but, when I kicked off the migration, I hadn’t yet completed the profile. So when followers were notified that judell@social.coop was now following them, there was no picture or bio or verified website. That led some people to think it was a bogus account and block it. It wasn’t a catastrophe, and I’ve spoken with a few of them to resolve the matter, but it was an unforced error that would have been easy to avoid.

The view from here

Here’s a comparison of weekly activity on two servers, by way of the server dashboard.

What’s it like moving to a server with only 1% of the flow that happens on mastodon.social? Not too different! Because I’ve moved my followers and follows, and because I interact primarily with the home timeline and with lists, the experience is mostly the same. For example, here’s the graph of boost relationships among servers in my home timeline. It looks just like it did before.

It’s wonderful to have a portable social graph that you can bring along as you move around in the fediverse. The key differences are that I’m meeting new people, and spending more time on the local timeline. On a big server like mastodon.social the local timeline feels like the Twitter firehose, for those who remember when that was still a thing you could watch. I’d look at it occasionally, because it could be a source of useful serendipity, but mostly it was just too random. The local timeline on social.coop is still somewhat random, but I find it more useful — not only because it represents a far smaller population but also because the people who’ve signed up share a common interest in the coop model.

I want my lists

There was, however, one big obstacle. You can export lists from the old server and import them into the new server, but there’s no export/import for the accounts you’ve assigned to those lists. Since I rely heavily on lists to navigate Mastodon flow in a focused and intentional way, this was a problem I really needed to fix. I tried Eliot Nash’s masto-list-importer, but transferring my 460 person/list mappings ran into API rate limits. So I worked out a Steampipe-based solution, which is a useful alternative and also a good way to illustrate the steps involved.

Let’s assume that I want to migrate again, from social.coop to fosstodon.org. I’ve exported the names of my lists from social.coop, and imported them to fosstodon.org. Now I need to populate those lists. The SQL solution I came up with proceeds in steps, each a common table expression (CTE) that accomplishes one part of a complex transformation and passes its results to the next step.

Step 1: Gather the accounts assigned to each social.coop list.

We start by joining two tables to enumerate accounts assigned to each list.

with accounts_by_list as (
  select
    a.acct as old_account,
    a.id as old_account_id,
    l.id as old_list_id,
    l.title as title
  from
    social_coop.mastodon_my_list l
  join
    social_coop.mastodon_list_account a
  on
    l.id = a.list_id
)
select * from accounts_by_list

The output, in accounts_by_list, is effectively a temporary table that you can review to make sure the results are as expected. The ability to verify the output of each phase of a CTE pipeline, before sending it to the next phase, is a key advantage of this approach. Nested subqueries are much harder to debug!

Step 2: Map the accounts on social.coop to accounts on fosstodon.org.

Two transformations happen in this phase. If an account is a bare username, say personA, then it’s a social.coop account. On fosstodon.org that same account will be represented as personA@social.coop. Conversely, if personB@fosstodon.org exists on social.coop, that account becomes the bare username personB there. All other accounts (e.g. personC@hachyderm.io) pass through unchanged.

with accounts_by_list as (
  -- as above
),
adjusted_accounts_by_list as (
  select
    old_account,
    old_account_id,
    old_list_id,
    title,
    case 
      when old_account !~ '@' then old_account || '@social.coop' 
      when old_account ~ '@fosstodon.org' then replace(old_account, '@fosstodon.org', '')
      else old_account
    end as new_account
  from
     accounts_by_list
)
select * from adjusted_accounts_by_list

Step 3: Map the list titles to the new server.

The names of the lists are the same in both places, but their ids differ. For example, my Fediverse list is 1043 on social.coop and 6771 on fosstodon.org. To invoke the API call that adds someone to a list, we’ll need to use the latter id.

with accounts_by_list as (
  -- as above
),
adjusted_accounts_by_list as (
  -- as above
),
adjusted_list_ids as (
  select
    a.*,
    l.id as new_list_id
  from
    adjusted_accounts_by_list a
  join
    fosstodon.mastodon_my_list l
  on
    a.title = l.title
)
select * from adjusted_list_ids

Step 4: Map the account ids to the new server.

Like lists, accounts on the new server also have different ids, and those are also required for the API call. We can find the new ids by joining the new_account column from step 2 with the table fosstodon.mastodon_my_following.

with accounts_by_list as (
  -- as above
),
adjusted_accounts_by_list as (
  -- as above
),
adjusted_list_ids as (
  -- as above
),
adjusted_account_ids as (
  select 
    a.*,
    f.id as new_account_id
  from
    adjusted_list_ids a
  join
    fosstodon.mastodon_my_following f
  on
    f.acct = a.new_account
)
select * from adjusted_account_ids

Step 5: Build the API calls.

There are lots of ways to skin this cat. Because you can use Steampipe as a component, you could use any programming language with a Postgres driver to run this query, and call the Mastodon API with the appropriate list and account ids.

And then, of course, there’s always curl.

with accounts_by_list as (
  -- as above
),
adjusted_accounts_by_list as (
  -- as above
),
adjusted_list_ids as (
  -- as above
),
adjusted_account_ids as (
  -- as above
)
select
  'curl -X POST -H "Authorization: Bearer ***" https://fosstodon.org/api/v1/lists/' ||
  new_list_id ||
  '/accounts/?' ||
  'account_ids[]=' ||
  new_account_id ||
  '; sleep 1;' as command
from
  adjusted_account_ids

To avoid throttling, I appended sleep 1 to each line. I saved the query in migrate-lists.sql, and exported the output to a file.

steampipe query --output csv migrate-lists.sql > migrate.sh

That was close, but not quite right. The output looked like this:

command
"curl -X POST -H ""Authorization: Bearer ***"" https://fosstodon.org/api/v1/lists/6771/accounts/?account_ids[]=279462; sleep 1;"
"curl -X POST -H ""Authorization: Bearer ***"" https://fosstodon.org/api/v1/lists/6771/accounts/?account_ids[]=109283421559929728; sleep 1;"

I needed to remove the header line, remove the double quotes at either end of each line, and collapse the doubled double quotes. For that I turned to ChatGPT (v4). It took a few tries to get it right, but soon enough it produced a working bash script that I’m very glad I did not have to write.

#!/bin/bash

# read the original script from migrate.sh and store it in an array
readarray -t original_script < migrate.sh

# remove the first line of the original script
modified_script=("${original_script[@]:1}")

# loop through the modified script array and replace double double-quotes with a single double-quote and remove double quotes at the beginning and end of each line
for (( i=0; i<${#modified_script[@]}; i++ )); do
    modified_script[$i]=$(sed 's/""/"/g' <<< "${modified_script[$i]}")
    modified_script[$i]=$(sed 's/^"//' <<< "${modified_script[$i]}")
    modified_script[$i]=$(sed 's/"$//' <<< "${modified_script[$i]}")
done

# print out the modified script with newlines
printf '%s\n' "${modified_script[@]}"

And that did the trick. With my lists fully populated on social.coop, I’m now reading Mastodon from my new home server in the same list-oriented way as I did on the old one.

If you move servers and want to migrate the people on your lists, you should definitely try masto-list-importer first, and consider the method here as a fallback if that doesn’t work for you. I agree with Eliot Nash that list transfer ought to be baked into the Mastodon UI. People rightly complain that it can be hard to find the people you want to follow and the topics you want to explore. You can follow hashtags, but that can be too random. Assigning people to topical lists strikes a good balance. But only if those list assignments are portable!


The Mastodon plugin is now available on the Steampipe Hub

When Twitter changed hands last November I switched to Mastodon; ever since I’ve enjoyed happier and more productive social networking. To enhance my happiness and productivity I began working on a Mastodon plugin for Steampipe. My initial goal was to study the fediverse writ large. Which people and which servers are powerful connectors? How do moderation policies work? What’s it like to join a small server versus a large one? These are important questions, and you can use the plugin to begin to answer them. But I soon realized that as a newcomer to a scene that’s been evolving for six years, and has not welcomed such analysis, I should start by looking for ways to enhance the experience of reading Mastodon. So I began building a set of dashboards that augment the stock Mastodon client or (my preference) elk.zone. And I’ve narrated that project in a series of posts.

Last week we released the plugin to the Steampipe hub. If you’ve installed Steampipe, you can now get the plugin using steampipe plugin install mastodon. The next phases of this project will explore using the plugin and dashboards in Steampipe Cloud, and speeding up the dashboards by means of persistent Postgres tables and Steampipe Cloud snapshots. Meanwhile, here’s a recap of what I’ve learned thus far.
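
Once the plugin is installed, a quick sanity check might look like this minimal sketch, which assumes mastodon_toot_home exposes the created_at, username, and content columns used elsewhere in this series:

-- list a few recent items from the home timeline
select
  created_at,
  username,
  substring(content from 1 for 80) as snippet
from
  mastodon_toot_home
limit 10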

See more with less distraction

While the dashboards use charts and relationship graphs, they are mainly tables of query results. Because Steampipe dashboards don’t (yet) render HTML, these views display plain text only — no images, no styled text. I’ve embraced this constraint, and I find it valuable in two ways. First, I’m able to scan many more posts at a glance than is possible in conventional clients, and more effectively choose which to engage with. When I described this effect to a friend he said: “It’s a Bloomberg terminal for Mastodon!” As those of us who rode the first wave of the blogosphere will recall, RSS readers were a revelation for the same reason.

Second, I find that the absence of images and styled text has a calming effect. To maintain a healthy information diet you need to choose sources wisely but, no matter where you go, sites deploy a barrage of attention-grabbing devices. I find dialing down the noise helpful, for the same reason that I often switch my phone to monochrome mode. Attention is our scarcest resource; the fewer distractions, the better.

There’s a tradeoff, of course; sometimes an image is the entire point of a post. So while I often read Mastodon using these Steampipe dashboards, I also use Elk directly. The Steampipe dashboards work alongside conventional Mastodon clients, and indeed depend on them: I click through from the dashboards to Elk in order to boost, reply, or view images. That experience is enhanced by instance-qualified URLs that translate foreign URLs to ones that work on your home server.
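
Those click-throughs rely on the plugin’s instance_qualified_url columns. A minimal sketch, assuming the column is exposed on mastodon_toot_home as it is on mastodon_toot_list:

-- each URL routes through the home server, so actions like boost work
select
  created_at,
  instance_qualified_url
from
  mastodon_toot_home
limit 5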

Manage people and lists

The ability to assign people to lists, and read in a list-oriented way, is a handy Twitter affordance that I never used much because it was easy to let the algorithms govern my information diet. Because Mastodon doesn’t work like that, lists have become the primary way I read the fediverse flow. Of the 800+ people I follow so far, I’ve assigned more than half to lists with titles like *Climate* and *Energy* and *Software*. To help me do that, several dashboards report how many of the people I follow are assigned to lists (or not).

I want as many people on lists as possible. So I periodically review the people I follow, put unassigned people on lists, and track the ratio of people who are, or aren’t, on lists. Here’s the query for that.

with list_account as (
  select
    a.id,
    l.title as list
  from
    mastodon_my_list l
    join mastodon_list_account a on l.id = a.list_id
),
list_account_follows as (
  select
    list
  from
    mastodon_my_following
    left join list_account using (id)
)
select 'follows listed' as label, count(*) from list_account_follows where list is not null
union
select 'follows unlisted' as label, count(*) from list_account_follows where list is null

When you read in a list-oriented way, as is also true when you read by following hashtags, there are always people whose chattiness becomes a distraction. To control that I’ve implemented the following rule: show at most one original toot per person per list per day. Will I miss some things this way? Sure! But if you’ve said something that resonates with other people, I’m likely to hear about it from someone else. It’s a tradeoff that’s working well for me so far.

Here’s the SQL implementation of the rule.

with data as (
  select
    list_id,
    to_char(created_at, 'YYYY-MM-DD') as day,
    case
      when display_name = '' then username
      else display_name
    end as person,
    instance_qualified_url as url,
    substring(content from 1 for 200) as toot
  from
    mastodon_toot_list
  where
    list_id = '42994'
    and reblog is null -- only original posts
    and in_reply_to_account_id is null -- only original posts
  limit
    40
)
select
  distinct on (person, day) -- only one per person per day
  day,
  person,
  toot,
  url
from
  data
order by
  day desc,
  person;

On the home timeline’s dashboard I’ve made it optional to include or hide boosts, which can be the majority of items. On the list-reading dashboard I’ve opted to always exclude them (that’s the reblog is null condition), and the idiom that enforces the one-per-person-per-day rule — select distinct on (person, day) — is simple, easy to understand, and easy to change.

Visualize relationships

I’ve so far found three ways in which relationship graphs can make Mastodon more legible. First, in Mastodon relationship graphs, I showed how to use SQL-defined nodes and edges to show boost relationships among people and servers. In another article I used the same tools to map relationships among people and tags. And most recently I used them to explore server-to-server moderation.

In all three cases the format conveys information not directly available from tabular views. Clusters of interesting people pop out, as do people who share tags. And when I graphed servers that block other servers I discovered an unexpected category: some servers that block others are themselves also blocked, like infosec.exchange in this example.


[image: blocked and blocking Mastodon servers]

The Steampipe combo of SQL-oriented API access and dashboards as code is a uniquely productive way to build relationship graphs that can unlock insights in any domain. As we’ve seen with Kubernetes, they can help make cloud infrastructure more legible. The Mastodon graphs suggest that the same can happen in the social networking realm.

Use RSS feeds

When you append .rss to the URL of a Mastodon account, or tag, you produce an RSS feed like https://mastodon.social/@judell.rss or https://mastodon.social/tags/steampipe.rss. These feeds provide a kind of auxiliary API that includes data not otherwise available from the primary API: related tags, which appear in the feeds as RSS category elements. Steampipe really shines here thanks to the RSS plugin, which enables joins with the primary Mastodon API. This query augments items in a tag’s feed with the tags that appear in each item.

with data as (
  select
    name,
    url || '.rss' as feed_link
  from
    mastodon_search_hashtag
  where
    query = $1
    and name = query
  limit 1
)
select
  to_char(r.published, 'YYYY-MM-DD') as published,
  d.name as tag,
  (
    select string_agg(trim(JsonString::text, '"'), ', ')
    from jsonb_array_elements(r.categories) JsonString
  ) as categories,
  r.guid as link,
  ( select content as toot from mastodon_search_toot where query = r.guid ) as content
from
  data d
join
  rss_item r
on
  r.feed_link = d.feed_link
order by
  r.published desc
limit 10

A similar query drives the graph discussed in Mapping people and tags on Mastodon.


In that example, surfacing the connection between a user, @themarkup, and a pair of tags, scotus and section230, was useful in two ways. First, it helped me instantly spot the item that I most wanted to read, which was buried deep in the search results. Second, it helped me discover a source that I’ll return to for guidance on similar topics. Of course I added that source to my Law list!

Own the algorithm

Everyone who comes to Mastodon appreciates not having an adversarial algorithm control what they see in their timelines. Most of us aren’t opposed to algorithmic influence per se, though; we just don’t like the adversarial nature of it. How can we build algorithms that work with us, not against us? We’ve already seen one example: the list-reading dashboard displays just one item per list per person per day. That’s a policy that I was able to define, and easily implement, with Steampipe. And in fact I adjusted it after using it for a while. The original policy was hourly; that was too chatty, so I switched to daily by making a trivial change to the SQL query.
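
Here’s the shape of that change, sketched against the list-reading query shown earlier; the hourly expression is my reconstruction:

-- daily granularity (current policy); for hourly, the original used
-- to_char(created_at, 'YYYY-MM-DD-HH24') and distinct on (person, hour)
with data as (
  select
    to_char(created_at, 'YYYY-MM-DD') as day,
    case when display_name = '' then username else display_name end as person,
    substring(content from 1 for 200) as toot
  from
    mastodon_toot_list
  where
    list_id = '42994'
)
select distinct on (person, day)
  day, person, toot
from data
order by person, day desc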

In News in the fediverse I showed another example. The Mastodon server press.coop aggregates feeds from mainstream news sources. I was happy to have those feeds, but I didn’t want to see those news items mixed in with my home timeline. Rather, I wanted to assign them to a News list and read them only when I visit that list in a news-reading mindset. The fediverse offers an opportunity to reboot the social web and gain control of our information diets. Since our diets all differ, it ought to be possible — and even easy — for anyone to turn on a rule like *news only on lists, not timelines*. Steampipe can make it so.

Steampipe as component

When you ask people on Mastodon about these kinds of features, the response is often “Have you tried client X? It offers feature Y.” But that solution doesn’t scale. It would require massive duplication of effort for every client to implement every such policy; meanwhile, people don’t want to switch to client X just for feature Y (which might entail losing feature Z). Could policies be encapsulated and made available to any Mastodon client? It’s interesting to think about Steampipe as a component that delivers that encapsulation. A timeline built by SQL queries, and governed by SQL-defined policies, is a resource available to any app that can connect to Postgres, either locally or in Steampipe Cloud.

If you’re curious about the Steampipe + Mastodon combo, install the plugin, try out the sample queries, then clone the mod and check out the dashboards. Do they usefully augment your Mastodon reader? What would improve them? Can you use these ingredients to invent your own customized Mastodon experience? Join our Slack community and let us know how it goes!


Mastodon timelines for teams

Welcome to episode 17 of the Steampipe+Mastodon series, in which we introduce a new subplot: timeline history. So far, the examples I’ve shown and discussed work with current timelines. We’ve seen SQL queries that fetch results from realtime calls to the Mastodon API, and Steampipe dashboards that display those results. But Steampipe isn’t just an API siphon; it’s also a Postgres database. As such it supports the transient tables created by Steampipe’s foreign data wrapper and plugins, but it also enables you to create your own native tables. And you can use those native tables to accumulate data from the transient foreign tables.

Because saving and searching Mastodon data is a controversial topic in the fediverse — none of us wants to recapitulate Big Social — I’ve focused thus far on queries that explore recent Mastodon flow, of which there are plenty more to write. But nobody should mind me remembering my own home timeline, so a few weeks ago I made a tool to read it hourly and add new toots to a Postgres table.

Before you can add any toots to a table, of course, you’ve got to create that table. Here’s how I made this one.

create table mastodon_home_timeline as 
  select * from mastodon_toot_home limit 200

Once created, the table can be updated with new toots like so.

with data as (
  select
    account,
    -- more
    -- columns
    username
  from
    mastodon_toot_home
  limit 200
)
insert into mastodon_home_timeline (
  account,
    -- more
    -- columns
  username
)
select
  *
from 
  data
where 
  id not in ( select t.id from mastodon_home_timeline t )

To run that query from a crontab, on a machine where Steampipe is installed, save it as mastodon_home_timeline.sql, then schedule it.

15 * * * * cd /home/jon/mastodon; steampipe query mastodon_home_timeline.sql

That’s it! Now the number reported by select count(*) from mastodon_home_timeline is growing hourly.

I’ve only been collecting toots for a couple of weeks, and haven’t yet begun to explore that data yet; we’ll see what happens when we get there. Meanwhile, though, I want to show how such exploration can be a team exercise.

A friend of mine, whom I’ll call Elvis, shares my interest in teasing out connections among people, servers, and hashtags. He could capture his own timeline using the method shown here. But since we’ll be looking at this data together, we agreed that I’ll gather both our timelines. To enable that, he shared a (revocable) Mastodon API token that I’ve used to configure Steampipe with credentials for both our accounts.

connection "mastodon_social_jon" {
    plugin = "mastodon"
    server = "https://mastodon.social"
    access_token = "..."
}

connection "mastodon_social_elvis" {
    plugin = "mastodon"
    server = "https://mastodon.social"
    access_token = "..."
}

Steampipe’s foreign data wrapper turns each of these named connections into its own Postgres schema. Although we happen to share the same home server, by the way, we needn’t. A team collaborating like this could pool timelines from mastodon.social and hachyderm.io and fosstodon.org and any other Mastodon-API-compatible server.

(You can do the same thing with AWS or Slack or GitHub or any other kind of account by defining multiple connections. Steampipe makes API calls concurrently across parallel connections.)

With this configuration I can read my timeline like so.

select * from mastodon_social_jon.mastodon_toot_home limit 200

And Elvis’ like so.

select * from mastodon_social_elvis.mastodon_toot_home limit 200

If I want to query both in realtime, for example to count the combined total, I can use a SQL UNION, as in the sketch below.
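
A minimal sketch; the union deduplicates by id any toots that appear in both timelines:

select count(*) as combined_total
from (
  (select id from mastodon_social_jon.mastodon_toot_home limit 200)
  union
  (select id from mastodon_social_elvis.mastodon_toot_home limit 200)
) as combined

Or I can define an umbrella connection that aggregates these two.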

connection "all_mastodon" {
    plugin = "mastodon"
    type = "aggregator"
    connections = [ "mastodon_social_jon", "mastodon_social_elvis" ]
}

connection "mastodon_social_jon" {
    plugin = "mastodon"
    server = "https://mastodon.social"
    access_token = "..."
}

connection "mastodon_social_elvis" {
    plugin = "mastodon"
    server = "https://mastodon.social"
    access_token = "..."
}

Now the query select * from all_mastodon.mastodon_toot_home limit 200 makes API calls on behalf of both accounts — in parallel — and combines the results. When we follow the resulting URLs in order to reply or boost, we’ll do so as individual identities. And we’ll be able to use Steampipe queries and dashboards in that same single-user mode. But we’ll also be able to pool our timelines and point our queries and dashboards at the combined history.

Will that prove interesting? Useful? That remains to be seen. I think it’s one of many experiments worth trying as the fediverse sorts itself out. And I see Steampipe as one laboratory in which to run such experiments. With SQL as the abstraction over APIs, aggregation of connections, and dashboards as code, you have all the ingredients needed to iterate rapidly, at low cost, toward shared Mastodon spaces tailored for teams or groups.


Visualizing Mastodon server moderation

In Mapping the wider fediverse I showed how a new table added to the Mastodon plugin — mastodon_domain_block — enables queries that find which servers are moderating which other servers. For example, here are servers on nerdculture.de’s list of offenders.

select
  server,
  domain,
  severity
from
  mastodon_domain_block
where
  server = 'https://nerdculture.de'

+------------------------+--------------------------------+----------+
| server                 | domain                         | severity |
+------------------------+--------------------------------+----------+
| https://nerdculture.de | roysbeer.place                 | silence  |
| https://nerdculture.de | *.10minutepleroma.com          | suspend  |
| https://nerdculture.de | *.activitypub-troll.cf         | suspend  |

...snip...

| https://nerdculture.de | shitposter.club                | suspend  |
| https://nerdculture.de | wolfgirl.bar                   | suspend  |
| https://nerdculture.de | www2.gabbers.me                | suspend  |
+------------------------+--------------------------------+----------+

I used the new query pattern to build a dashboard to report, for each server in the home timeline:

  • The list of blocking servers.
  • The count of blocked servers for each blocking server.
  • The list of blocking servers for each blocked server, and the count of those blocking servers.

This was a good start, but I had a hunch that relationship graphs would reveal non-obvious connections among these servers. And indeed they do! Here’s a view of the new graph-enriched dashboard.

The left pane shows the blocked servers shunned by a blocking server selected from those in the home timeline. The right pane has the inverse view: the blocking servers that shun a selected blocked server. These were the two categories I defined for the first iteration of these graphs.

category "blocking_server" {
  color = "darkgreen"
  icon = "server"
}

category "blocked_server" {
  color = "darkred"
  icon = "server"
}

Here’s the code for one of the graphs.

graph {
  node {
    base = node.blocking_server
  }
  node {
    base = node.blocked_server
  }
  node {
    base = node.blocked_and_blocking_server
  }
  edge {
    args = [ self.input.blocking_server.value ]
    base = edge.match_blocked_server
  }
  edge {
    args = [ self.input.blocking_server.value ]
    base = edge.match_blocking_server
  }
}

Here is the definition of node.blocking_server, which refers to category.blocking_server.

node "blocking_server" {
  category = category.blocking_server
  sql = <<EOQ
    with servers as (
      select distinct
        blocking_server,
        blocked_server
      from
        blocking_servers(${local.limit})
    )
    select
      blocking_server as id,
      blocking_server as title
    from
      servers
    order by
      blocking_server
  EOQ
}

The FROM clause calls blocking_servers(), a set-returning function defined like so.

create or replace function public.blocking_servers(max int) returns table (
  blocking_server text,
  blocked_server text
) as $$
with servers as (
  select distinct
    server as domain,
    'https://' || server as server_url
  from
    mastodon_toot
  where
    timeline = 'home'
  limit max
),
blocking_and_blocked as (
  select
    s.domain as blocking_domain,
    d.domain as blocked_domain
  from
    servers s
  join
    mastodon_domain_block d
  on
    s.server_url = d.server
)
select
  blocking_domain,
  blocked_domain
from
  blocking_and_blocked
order by
  blocking_domain, blocked_domain
$$ language sql

I thought these ingredients would suffice. But when I began poking around in the graphs made with these definitions, infosec.exchange behaved strangely. Sometimes it would appear as a blocking server, other times as a blocked server. I was missing a category!

category "blocked_and_blocking_server" {
  color = "orange"
  icon = "server"
}

As you can see in the graph, infosec.exchange not only blocks 73 servers, it is also blocked by two servers: religion.masto.host and weatherishappening.net. Why?

weatherishappening.net calls infosec.exchange a “HOSTING DEATH SPIRAL FASCIST ORGANIZATION” and blocks it with “Limited” severity.

religion.masto.host blocks infosec.exchange at the “Suspended” level, but does not explain why (“Reason not available”).

Although those servers could, in turn, be blocked by others, no such blocks appear in the block lists of my immediate server neighborhood.

> select count(*) from blocking_servers(100) where blocked_server = 'weatherishappening.net'
+-------+
| count |
+-------+
| 0     |
+-------+

> select count(*) from blocking_servers(100) where blocked_server = 'religion.masto.host'
+-------+
| count |
+-------+
| 0     |
+-------+

There is another switch hitter in my neighborhood, though. c.im is blocked by me.dm and octodon.social.

Why?

me.dm (Medium) has “Suspended” c.im for “Hate speech”.

octodon.social has “Suspended” c.im for “Reason not available”.

When the opinions and policies of your server differ from those of mine, we see different realities through our respective lenses. Could such fragmentation drive the fediverse-curious back into the arms of Big Social? I’m sure that will happen — indeed is happening — to some degree.

But I hope that some of us, at least, will learn to thrive in diverse networks of online communities, aware of the kaleidoscopic interplay of filters but not overwhelmed by it. That skill will serve us well IRL too. To acquire it, we’ll need to visualize the operation of our filters. One great way to do that: SQL queries that drive relationship graphs.


Mapping people and tags on Mastodon

In Mastodon relationship graphs I showed how to use Steampipe to map Mastodon network neighborhoods. When I use the word map here, I’m channeling Denis Wood’s The Power of Maps:

Every map shows this … but not that, and every map shows what it shows this way … but not the other way. Not only is this inescapable but it is precisely because of this interested selectivity — this choice of word or sign or aspect of the world to make a point — that the map is enabled to work.

The aspect chosen by those neighborhood maps is the boost — the Mastodon version of a retweet. One of the maps focuses on a selected instance that appears in the home timeline. It shows people who belong to that instance and who boost toots from people on the same or different instances.

The other map zooms out to show boost relationships among all the instances that appear in the home timeline. This view wouldn’t be legible if it included people, so it omits them in order to focus on server-to-server relationships.

These maps represent (or as Denis Wood emphasizes, “re-present”) a set of toots. They omit original toots that nobody boosts, and they also omit replies, in order to focus on boost relationships. What about replies? That would be a different map, one that might also be interesting to draw.

Meanwhile, though, I’ve come up with another map to display the tags that appear in the results of a Mastodon tag search, along with the accounts that use those tags. It proved its worth today when I was looking for perspectives on Section 230 of the Communications Decency Act. As you’ve likely noticed, the US Supreme Court is reconsidering Section 230. My understanding of the topic wasn’t current, I wanted to refresh it, and I especially wanted to test whether Mastodon could provide a useful alternative to a conventional web search.

One affordance that Mastodon provides: search for toots that use the #Section230 tag. Here are two ways to map the results of that search.

On the left is a conventional Mastodon view: a list of toots that match the query. In this case the article I ultimately wanted to read appears way down in that list. The toot that announced it was from The Markup, “a nonprofit newsroom that investigates how powerful institutions are using technology to change our society.” The article, Section 230 Is a Load-Bearing Wall — Is It Coming Down?, transcribes part of a conversation with two legal scholars whom I know to be reliable guides to Net-related issues.

On the right is my Steampipe-based Mastodon tag explorer. Working with the same data, it surfaced The Markup’s article in a way that brought it immediately to my attention. The first thing that caught my eye was the conjunction of two tags: #section230 and #scotus. Since the Supreme Court’s interest in Section 230 is what’s driving the current news cycle, I wanted to hear from legal scholars qualified to discuss the Court’s interest in Section 230. So the tag conjunction was a significant landmark.

The map displayed two nodes that connect to both #section230 and #scotus. How did I choose between them? My prior familiarity with The Markup led me to click on that node and visit the Markup’s Mastodon instance where I read the article.

Had I been following The Markup then, as I am now, I would likely have seen the article on the news list to which I’ve assigned The Markup’s account. But that wouldn’t have changed the experience of searching for the #section230 tag. The relationship graph works by reformulating the results of that search. It omits the text of toots that contain the tag, and the images in those toots, in order to highlight two aspects of the result list: people (or accounts) and tags. It contextualizes those tags by charting their relative frequency in the result list. And it attaches, to each tag node, a link to a new graph focused on that tag.

This “interested selectivity” enables the map to do its work: find accounts that use given tags. Like a tag node, an account node provides a link — in this case, to the account’s Mastodon home page. It also reports the account’s description using a property that appears when hovering over the node. So if I were unfamiliar with The Markup I could reveal its description without leaving the graph. Here’s the query that adds that property to the node.

select
  note
from 
  mastodon_search_account
where 
  query = 'https://mastodon.themarkup.org/@themarkup'

+---------------------+
| note                |
+---------------------+
|  Watching Big Tech. |
+---------------------+

That query is embedded in another query that joins across two Steampipe plugins: one that wraps the Mastodon API and another that queries RSS feeds. That’s because, as noted in Mastodon, Steampipe, and RSS, the RSS feeds that Mastodon provides for tags enrich the results available from the core API.

Enabling SQL to query diverse APIs in a common way is one of Steampipe’s core superpowers. Enabling such queries to form the nodes and edges of relationship graphs is another. Used together, these two superpowers enable maps that select what is helpful, omit what isn’t, and thus “re-present” information for a given purpose.


News in the fediverse

On a recent episode of You’re Wrong About, Sarah Marshall delivered a crisp summary of how the 24-hour news cycle came to be. This led me to realize that many among us do not remember when news was confined to time slots: the 7 o’clock news, the 11 o’clock news, the morning paper. I think it might be healthy to bring that back in some form. From the excitement I heard in their voices I gather that Sarah Marshall and her co-presenter Blair Braverman feel the same way. When I hear people complain about Mastodon’s lack of breaking news, I think: “Feature, not bug!” Maybe what the 24-hour cycle has been breaking is us.

So when press.coop launched today I was a bit conflicted. On the one hand, I would like to use Mastodon to read mainstream news headlines just as I once used RSS to do. (Full disclosure: press.coop is run as a public service by hello.coop whose founder, Dick Hardt, is a longtime friend.) On the other hand, when reading Mastodon timelines I’m enjoying NOT knowing what’s happening in the world right now.

What if you could exclude news from the home timeline, put it on a list, and go there when — and only when — in a news mindset? That’s a feature of the Fedilab client, I learned from Keith Soltys.

What would it take to implement the same idea in the Mastodon reader I’m developing? It couldn’t be just an extra WHERE condition, could it?

Well, actually, it could.
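
Here’s the idea in sketch form, assuming the mastodon_toot table exposes the same server, created_at, username, and content columns used elsewhere in this series:

-- home timeline with news excluded (a sketch)
select
  created_at,
  username,
  content
from
  mastodon_toot
where
  timeline = 'home'
  and server != 'press.coop'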

Problem solved. Now I can read news in Mastodon when and how I want, and never see it otherwise.

If you want that same control, you shouldn’t have to use a particular Android client, or Steampipe client, or any other. There’s a Cambrian explosion of such creatures right now. The more they can share important DNA, the better for all of us.

I hope that the Steampipe plugin for Mastodon, which enables the dashboards I’m building using Steampipe’s own dashboards-as-code system, can provide some useful common DNA. A rule like news only on lists, not timelines, once expressed in SQL, can be used (and thus not reinvented) by any kind of Steampipe (read: Postgres) client: psql (or any Postgres CLI), Metabase or Tableau or any Postgres-compatible BI tool, Python or JavaScript or any programming language. Steampipe is a versatile component. Its mapping from APIs to SQL can, in theory, provide the data layer for any application.

My Steampipe + Mastodon adventure notwithstanding, let’s hold on to that larger idea. The fediverse is our chance to reboot the social web and gain control of our information diets. Since our diets all differ, it ought to be trivial for anyone — in any client — to turn on a rule like news only on lists, not timelines. I’ve shown one way it can be trivial. Please show me others!


Protocols, APIs, and conventions

The Fediverse is wired together by protocols like ActivityPub and WebFinger which, as yet, I know very little about. That’s because the Steampipe plugin, which supports the dashboards I’ve been building and describing in this series, doesn’t require me to understand or use those protocols.

It does, however, require me to understand and use the Mastodon API. Mostly I use that API by way of the Go SDK for Mastodon (thanks, mattn!); sometimes I make REST calls directly. Either way, my read-only dashboards use a fairly small subset of the Mastodon API. The full API is quite broad and deep; it enables API clients to read from and write to Mastodon servers in all sorts of ways. Here are the chapters of the Mastodon API book: apps, accounts, admin, instance, search, statuses, timelines, notifications, oembed. These chapters define what’s common to all Mastodon clients, including web apps, phone apps, native OS apps, and Steampipe dashboards.

So far I’ve ignored protocol-enabled interop in order to focus on API-enabled interop. I’m aware that the Fediverse includes much more than just Mastodon. I intend to explore BookWyrm, Friendica, Funkwhale, Lemmy, takahe, PeerTube, Pixelfed, and others in due time. But right now the Mastodon ecosystem is plenty to try to wrap my head around.

For example, there’s a new web client for Mastodon: elk.zone. With the recent addition of support for lists, it has become my favorite way to interact in Mastodon space. So naturally I wanted to be able to click through from Steampipe dashboards to Elk, and use it as an alternative to the batteries-included Mastodon web app.

It turned out to be easy to enable that integration. Not thanks to ActivityPub, and not even thanks to the API. It works thanks to a third level of interop at play: common patterns for account URLs and toot URLs.

Here’s the account URL for Ward Cunningham who hangs his shingle at mastodon.radio: https://mastodon.radio/@k9ox. But as we saw in instance-qualified Mastodon URLs, if you visit that URL directly — and if it’s not your home server — you can’t follow Ward there, or add him to a list. You’ll need to copy that URL, paste it into your home server’s search box, run the search, and arrive at an instance-qualified URL where you can follow him or add him to a list: https://mastodon.social/@k9ox@mastodon.radio. If your home is fosstodon.org this would instead be https://fosstodon.org/@k9ox@mastodon.radio.

Similarly here is one of Ward’s toots at mastodon.radio: https://mastodon.radio/@k9ox/109802968820955379. If you want to reply or boost or favorite, you can’t do it there. The URL you need is again one that routes through your home server: https://mastodon.social/@k9ox@mastodon.radio/109802969999396562. Note that the ids for the same toot differ! That difference surprised me and some others, and is a topic for another episode. Here I’ll just note that these two patterns govern how we interact when crossing server boundaries in Mastodon space using the stock web client.

When I started using Elk, another layer of pattern emerged. Here are those same URLs in Elk:

https://elk.zone/mastodon.social/@k9ox@mastodon.radio

https://elk.zone/mastodon.social/@k9ox@mastodon.radio/109802969999396562

As it turns out, I just needed to make two of the Steampipe plugin’s transform functions prepend elk.zone to the instance-qualified URLs, then make such prefixing a configuration option. Now when I visit Mastodon links from dashboards, to reply or boost or follow or enlist, I land in the Elk experience that I prefer.
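
In SQL terms the maneuver amounts to a rewrite like this one (a sketch; the plugin actually does this in its Go transform functions, and elk_url is a hypothetical column name):

-- prepend elk.zone to an instance-qualified URL, yielding links like
-- https://elk.zone/mastodon.social/@k9ox@mastodon.radio
select
  'https://elk.zone/' || replace(instance_qualified_url, 'https://', '') as elk_url
from
  mastodon_toot_home
limit 5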

ActivityPub and WebFinger are formal standards. I would describe the Mastodon API as a de-facto standard. But this prefixing maneuver is just a convention. It’s not guaranteed to work with another web client, and not even guaranteed to work across all URLs presented by the stock Mastodon client. That’s OK by me. Conventions are incredibly useful. The Twitter hashtag is just a convention, after all, inspired in turn by an IRC convention.

We’re in one of those Internet moments of rapid innovation, when new conventions can unlock emergent behaviors. It hadn’t even occurred to me that the Steampipe dashboards could support Elk. A few hours after I thought they might, they did. I’ve seen this kind of thing before, perhaps most notably when the blogosphere adopted <link rel="alternate" type="application/rss+xml" href="{feedUrl}"> to enable browsers to auto-discover RSS feeds. That happened about 20 years ago, and quite suddenly when a handful of leading blogging tools adopted the convention in a matter of days. It was a lot of fun to live through that era. If you missed it, enjoy the sequel that’s unfolding now!


Mapping the wider Fediverse

I began this journey convinced that Steampipe could help the Fediverse evolve, but not sure exactly how. My first thought was to use Steampipe’s API-wrangling superpower to study patterns of communication (and conflict) across the Fediverse. But as one of many Twitter escapees last November, I soon realized that the network I was joining reflected a culture that had been humming along nicely for six years and didn’t particularly want to be the object of sociological study.

As I argued in Autonomy, packet size, friction, fanout, and velocity, Mastodon bakes in certain kinds of friction for reasons. You’ve likely heard about a default unfriendliness to search, which is both a technical setting and a cultural choice that privileges the experience of current flow over the mining of past flow. Even more fundamentally, the ID of a toot not only differs from server to server but also obfuscates the toot’s date, another technical/cultural choice that means you can’t randomly access history by date. None of these frictions is insurmountable. They will be overcome for purposes good and bad. I hope and expect that communities will be able to choose their desired amounts and kinds of friction while still interoperating with others. But for my project it seemed that trying to survey the wider Fediverse wasn’t the right place to start.

So instead I began to explore a different way to read my home timeline. The dashboards that I’ve built and described in this series have become, for me at least, an effective way to scan recent Mastodon flow, then click into the stock client to post, reply, or boost. After overcoming a few obstacles, things are starting to feel like the Bloomberg terminal for Mastodon that I envision.

One of those obstacles was the awkward copy/paste/search of foreign toot URLs that was required in order to interact with them. That’s now overcome by Instance-qualified Mastodon URLs. Another obstacle was the difficulty of curating and effectively reading topical lists of people. The strategies described in Lists and people on Mastodon and Working with Mastodon lists have improved matters nicely. And relationship graphs turned out to be a more useful alternate view of the current flow than I had expected.

I think I’ve proven that a set of Steampipe dashboards, layered on a plugin that maps the Mastodon API to tables that the dashboards query, can improve the ability to absorb and react to Mastodon flow. An unproven corollary: Steampipe’s dashboards-as-code system is only one of potentially many clients of the Mastodon plugin. Any dashboarding system or web app or native app could tap into the same query capability to deliver still another way to experience the flow. But that’s a future episode.

Meanwhile, with a decent reading experience in place, this seemed like a good time to circle back to the question of surveying the wider Fediverse. To begin enabling that I added a couple of new tables to the plugin: mastodon_peer and mastodon_domain_block.

Here’s a query that uses mastodon_peer.

with data as (
  select
    'https://' || server as server
  from
    mastodon_toot
  where
    timeline = 'home'
  limit 100
),
servers as (
  select
    server,
    count(*) as occurrences
  from
    data
  group by
    server
)
select
  s.server,
  s.occurrences,
  count(p.peer) as peers
from
  servers s
join
  mastodon_peer p
on
  s.server = p.server
group by
  s.server,
  s.occurrences
order by
  peers desc

In English: gather the most recent 100 toots in my home timeline, count the occurrences of each origin server, then ask each origin server how many other servers it talks to. Unsurprisingly my home server, mastodon.social, occurs most often. And because it’s the marquee Mastodon server it has the most peers.

+----------------------------------+-------------+-------+
| server                           | occurrences | peers |
+----------------------------------+-------------+-------+
| https://mastodon.social          | 11          | 49732 |
| https://fosstodon.org            | 1           | 33973 |
| https://octodon.social           | 1           | 29983 |
| https://infosec.exchange         | 2           | 26833 |
| https://indieweb.social          | 9           | 26279 |
| https://hachyderm.io             | 3           | 19911 |
| https://social.treehouse.systems | 3           | 18110 |
| https://journa.host              | 1           | 18021 |
| https://nerdculture.de           | 9           | 17984 |
| https://werd.social              | 2           | 13792 |
| https://dan.mastohon.com         | 2           | 13351 |
| https://masto.nyc                | 1           | 10917 |
| https://mastodon.archive.org     | 1           | 9582  |
| https://social.fossdle.org       | 1           | 8343  |
| https://devdilettante.com        | 12          | 6898  |
+----------------------------------+-------------+-------+

Here’s a query that uses mastodon_domain_block.

with data as (
  select
    'https://' || server as server
  from
    mastodon_toot
  where
    timeline = 'home'
  limit 100
),
servers as (
  select
    server,
    count(*) as occurrences
  from
    data
  group by
    server
)
select
  s.server,
  s.occurrences,
  count(d.domain) as "blocked domains"
from
  servers s
join
  mastodon_domain_block d
on
  s.server = d.server
group by
  s.server,
  s.occurrences
order by
  "blocked domains" desc

This one says: Again gather the origin servers in my recent home timeline, but this time ask each one how many other servers it blocks. Here we see that octodon.social, which happened to show up in my timeline when I ran the query, blocks a lot more servers than mastodon.social does.

+--------------------------+-------------+-----------------+
| server                   | occurrences | blocked domains |
+--------------------------+-------------+-----------------+
| https://octodon.social   | 1           | 510             |
| https://mastodon.social  | 8           | 181             |
| https://hachyderm.io     | 4           | 125             |
| https://infosec.exchange | 4           | 66              |
| https://nerdculture.de   | 1           | 36              |
| https://indieweb.social  | 4           | 23              |
+--------------------------+-------------+-----------------+

One could, and maybe at some point I will, comprehensively acquire and store this data. But meanwhile, how might it enhance the experience of reading recent Mastodon flow? Here’s what I’ve got so far.

We’ve already seen the first table which answers the question: “How many servers does each of the origin servers in my timeline block?” The second table answers a different question: “Which servers are most often blocked by the origin servers in my timeline?”
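
A query in the same style can answer that second question. Here’s a sketch that reuses the CTE pattern from above, this time inverting the grouping to count how many of my timeline’s origin servers block each domain:

with data as (
  select
    'https://' || server as server
  from
    mastodon_toot
  where
    timeline = 'home'
  limit 100
),
servers as (
  select distinct
    server
  from
    data
)
select
  d.domain,
  count(*) as "blocking servers"
from
  servers s
join
  mastodon_domain_block d
on
  s.server = d.server
group by
  d.domain
order by
  "blocking servers" desc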

The blocking server list column shows an interesting mix of consensus and variation, and I think that will be something to explore comprehensively across the Fediverse. But for now I like how this view contextualizes what’s in my home timeline. Like the relationship graphs, it’s highly dynamic because my immediate network neighborhood is changing all the time. Whatever the current set of servers happens to be, though, I now have some clues about how connected each of those servers is and how aggressively each one is blocking others. This feels like a good first step toward mapping the wider Fediverse.


1 https://blog.jonudell.net/2022/11/28/autonomy-packet-size-friction-fanout-and-velocity/
2 https://blog.jonudell.net/2022/12/06/mastodon-steampipe-and-rss/
3 https://blog.jonudell.net/2022/12/10/browsing-the-fediverse/
4 https://blog.jonudell.net/2022/12/17/a-bloomberg-terminal-for-mastodon/
5 https://blog.jonudell.net/2022/12/19/create-your-own-mastodon-ux/
6 https://blog.jonudell.net/2022/12/22/lists-and-people-on-mastodon/
7 https://blog.jonudell.net/2022/12/29/how-many-people-in-my-mastodon-feed-also-tweeted-today/
8 https://blog.jonudell.net/2022/12/31/instance-qualified-mastodon-urls/
9 https://blog.jonudell.net/2023/01/16/mastodon-relationship-graphs/
10 https://blog.jonudell.net/2023/01/21/working-with-mastodon-lists/
11 https://blog.jonudell.net/2023/01/26/images-considered-harmful-sometimes/
12 https://blog.jonudell.net/2023/02/02/mapping-the-wider-fediverse/
13 https://blog.jonudell.net/2023/02/06/protocols-apis-and-conventions/
14 https://blog.jonudell.net/2023/02/14/news-in-the-fediverse/
15 https://blog.jonudell.net/2023/02/26/mapping-people-and-tags-on-mastodon/
16 https://blog.jonudell.net/2023/03/07/visualizing-mastodon-server-moderation/
17 https://blog.jonudell.net/2023/03/14/mastodon-timelines-for-teams/
18 https://blog.jonudell.net/2023/04/03/the-mastodon-plugin-is-now-available-on-the-steampipe-hub/
19 https://blog.jonudell.net/2023/04/11/migrating-mastodon-lists/
20 https://blog.jonudell.net/2023/05/24/when-the-rubber-duck-talks-back/

Images considered harmful (sometimes)

The Mastodon dashboards I’ve been developing and describing in this series are backed by a Steampipe plugin that translates SQL queries to Mastodon API calls. Like all Steampipe plugins you can use this one to run those queries in all sorts of ways: from psql or another Postgres CLI (perhaps via cron, perhaps in a CI/CD pipeline); from Metabase or Grafana or any Postgres-compatible BI tool; from Python or JavaScript or any programming language. The Steampipe core is a versatile software component that you can plug into just about any environment.
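
For example, with the plugin installed and Steampipe running as a service, any Postgres client can query the home timeline directly. A minimal sketch:

select
  created_at,
  username,
  content
from
  mastodon_toot
where
  timeline = 'home'
limit 5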

There’s also, of course, Steampipe’s dashboards-as-code approach, which powers the alternate Mastodon UX that I’ve been exploring in this series. You can think of this dashboarding tool as a primitive web browser with a freakish talent for querying Postgres and weaving SQL results into widgets such as infocards, input controls, charts, tables, and relationship graphs. You compose widgets using HCL (Hashicorp Configuration Language) instead of HTML, arrange them using basic layout syntax, and view them by connecting your browser to the local Steampipe dashboard server or to cloud.steampipe.io.

The alternate Mastodon UX built this way was aptly described (thanks again Greg Wilson!) as A Bloomberg terminal for Mastodon. As a reminder, the stock Mastodon web client looks like this.

And the pages served by the Mastodon dashboards look like this.

I can scan the latter view far more effectively.

Now, I’ve been trying out a bunch of alternate Mastodon clients lately. I love Ivory on my phone, and Elk in the browser, and I use them when I want an experience that feels social-media-ish. But I don’t use Ivory or Elk or Toot! or the stock Mastodon web clients (browser, phone) when I want an experience that feels RSS-reader-ish.

For some information landscapes I want to unfold a big map, spread it out, and scan for points of interest. Social media is that kind of landscape, RSS readers were the first way I surveyed it effectively, and these Mastodon dashboards are becoming my new way.

When I compare those two screenshots though, it’s not just the density of the latter that I notice, but also the absence of pictures. At first I fretted about that. Could the dashboard pages render full-strength HTML? Perhaps they will, and there are a few different ways that could happen, but meanwhile I’ve come to embrace the text-only constraint. When I’ve got my map unfolded, and I’m scanning my home timeline or my lists for things to click into, images can be a distraction. I wouldn’t have chosen to omit them, but I find that their absence enables me to focus very effectively on who is speaking, and what they are saying or boosting.

There is also, of course, the graphical view afforded by relationship graphs. These feel very map-like in the way they reveal clusters of people interacting with one another. I’m finding them more useful than anticipated.

But the text that I read in these dashboards is image-free for now. And I think it’s having a calming effect. There are plenty of times when I want images, don’t get me wrong, and that’s partly why I use a mix of conventional Mastodon clients. But there are times when I want to dial down the clutter, just like there are times when I switch my phone to monochrome. Two things can be true: The pictures we share with one another are a source of joy, and they are sensory overload. I’ll make them optional here when I can, and would like to have the option to use all social media interfaces in text mode.



Working with Mastodon lists

Since the early days of the blogosphere I have cherished the ability to view the world through the eyes of people more qualified than me to understand and explain what happens in particular domains. Although Twitter lists were a great way to collect people who provide those perspectives, I made little use of them. Prompted by Martin Fowler’s frustration with lists, I described my first steps toward reviving the technique in Lists and people on Mastodon.

First I encapsulated the relevant APIs in a pair of tables provided by the Steampipe plugin for Mastodon: mastodon_list and mastodon_list_account. Then I used those tables to enhance the Mastodon dashboard in a couple of ways. The followers and following tabs now show me who is or isn’t on a list.

And I can use the list tab to read recent posts from people on each list.

With these basic building blocks in place I want to be more intentional about curating these lists. To that end I thought I’d share the lists I’ve built so far, and invite suggestions. You can export your lists from the Mastodon web client using Preferences -> Data export -> Lists. The resulting CSV file has two columns: the name of a list and the Mastodon handle for a person.

list person
Library liza@post.lurk.org
Library alexwade@fosstodon.org
Library jdiffin@glammr.us

Here’s the export I really wanted.

list person server url note
Library liza post.lurk.org https://mastodon.social/@liza@post.lurk.org Technologist in Residence at the Harvard Library Innovation Lab. Interactive fiction, generative art, Democratic political tech. Amateur scholar of fringe utopian sci-fi. I will pet your dog.
Library alexwade fosstodon.org https://mastodon.social/@alexwade@fosstodon.org Librarian, open science advocate, geek. VP Data Products, @digitalscience (Seattle, WA) Formerly: AllenAI (AI2), CZI, Amazon, Microsoft Research, UW
Library jdiffin glammr.us https://mastodon.social/@jdiffin@glammr.us Head of Library Technology Services Section, NLM. Opinions are my own.

It’s easy to create that list using the Steampipe plugin for Mastodon.

steampipe query "select title as list, username, server, instance_qualified_account_url, note from mastodon_list l join mastodon_list_account a on l.id = a.list_id order by list, username" --output csv > mastodon_lists.csv

(I’m using instance-qualified URLs relative to my home server; if your home server isn’t mastodon.social you’ll want to adjust the links accordingly.)

I’ve uploaded the output of that query to a Google sheet. The most recent addition to my climate list is Peter Gleick, who was quoted yesterday in Farhad Manjoo’s What Will ‘Weather Whiplash’ Mean for California? I’d been looking for an expert perspective on California water policy, so I checked to see if Peter is on Mastodon, found that he is, and added him to my climate list. Here’s an example of the kind of perspective I’m looking for.

California reservoirs have captured enough water from the recent storms to supply all urban water use in the state for a year. Not counting the massive snowpack. Those claims that “all the water is being wasted flowing to the ocean” is nonsense. — Mastodon link

Of course it’s a complex issue, and I don’t think I’m wrong to be very interested in strategies for capturing more rainfall and using it to recharge aquifers. But this was a useful reality check.

Who are the experts you follow in various domains? Can I cherrypick from your lists? And can we imagine a way of remixing and sharing these lists? I’m looking at you, Lucas Gonze, with fond memories of how Webjay did that for MP3 playlists. Can we do that for Mastodon lists too?



Of course the attention economy is threatened by the Fediverse

Megan McArdle says this week, in the Washington Post, that “Twitter might be replaced, but not by Mastodon or other imitators.” I’m not linking to the article, you can easily find it, but that title is all we need for my purpose here, along with this bit of context: she has 93K followers on Twitter.

Nobody wants to walk away from that kind of audience. Well, almost nobody. Sam Harris’ recent Twitter exit is a rare example of someone concluding that a large follower count is a net negative. If I were in his shoes I’m not sure I’d be able to do the same. When my own audience was at its peak — at BYTE during the dawn of the Internet, then at InfoWorld in the early years of the blogosphere — I could press the Publish button on my blog and watch in realtime as the responses rolled in on waves of dopamine. It’s addictive, there’s never enough, you’re always looking for the next hit.

When Twitter started, that momentum carried forward for a while. I never racked up a huge follower count — it maxed out just shy of 6K — but most of those people followed me early on, thanks to the ad-supported publications that had brought me to their attention. My Twitter following reached a plateau years ago. Did I wish for 100K followers? Sure, I’d be lying to pretend otherwise. But gradually I came to see that there was a sweet spot, somewhere between (let’s say) 200 and 15,000 followers, where it was possible to enjoy the kinds of pleasant and stimulating interaction that I’d first experienced in web forums and the blogosphere.

Until it wasn’t. Like a frog in slowly boiling water, I failed to notice how the Twitter experience degraded over time. Fewer and fewer of my 6K followers corresponded regularly, and my social graph there became stagnant. For me the Mastodon reboot has been a delightful replay of the early blogosphere: new acquaintances, collegial discussion, positive energy.

If you occupy a privileged position in the attention economy, as Megan McArdle does now, and as I once did in a more limited way, then no, you won’t see Mastodon as a viable replacement for Twitter. If I were still a quasi-famous columnist I probably wouldn’t either. But I’m no longer employed in the attention economy. I just want to hang out online with people whose words and pictures and ideas intrigue and inspire and delight me, and who might feel similarly about my words and pictures and ideas. There are thousands of such people in the world, not millions. We want to congregate in different online spaces for different reasons. Now we can and I couldn’t be happier. When people say it can’t work, consider why, and who benefits from it not working.

Here’s a graph of the Fediverse as it appears from my perspective right now.

It looks and feels healthy and it’s working just great. I don’t want us to replace Twitter, or imitate it. I want The Internet Transition that I hope is underway.

Mastodon relationship graphs

The new release of Steampipe is all about relationship graphs. Our blog post shows how these graphs provide contextual awareness for devops and security pros who can now see all the resources related to an EC2 instance, or determine at a glance whether the permissions related to an IAM role are properly scoped. As always, developers can explore and remix the code that builds these graphs, and adapt the idioms for their own purposes in any data domain.

These relationship graphs are driven by SQL queries that define nodes and edges. Such queries can use any column of any table provided by any Steampipe plugin to form nodes, and then edges between nodes. If you want to see connections among the people and objects represented by diverse APIs, you can now use SQL idioms to graph them. The only limit is your imagination.

Naturally I imagined graphing Mastodon relationships. So far I’ve built two graphs that visualize my home timeline. Here’s the first one.

Here we’re looking at the most recent 50 boosts (the Mastodon version of retweet) in my home timeline. This is the query to find them.

select
  *
from
  mastodon_toot
where
  timeline = 'home'
  and reblog_server is not null
limit 50

If we focus on Brian Marick we can see that:

  • Brian belongs to mstdn.social
  • Brian boosted a post by Tim Bray
  • Tim belongs to hachyderm.io

So this graph shows people on a selected server boosting people on other servers. In this case mstdn.social is the selected server, but we can refocus the graph on any other server that’s sending boosts.

The second graph zooms out to show the web of boost relationships among servers. If anyone on infosec.exchange boosts anyone on mastodon.world, there’s an edge connecting the two nodes. Although it’s not happening anywhere in this graph, the arrow can point both ways and would if anyone on mastodon.world were also boosting anyone on infosec.exchange.

Let’s build up the first graph step by step.

Step 1: Identify the selected server

Here’s the definition of the node that represents the selected server.

node {
  category = category.selected_server
  args = [ self.input.server.value ]
  sql = <<EOQ
    select
      server as id,
      server as title,
      jsonb_build_object(
        'server', server
      ) as properties
    from
      mastodon_boosts()
    where
      server = $1
  EOQ
}

Per the documentation, a node’s query must at least select a column aliased as id. Here it’s the server column in a row returned by the above query. I’ve packaged that query into a SQL function, mastodon_boosts, to hide details (timeline = 'home', reblog_server is not null, limit 50) and make it easier to focus on what’s special about each node. In this case the special quality is that the server column, which gives the node its identity, matches the selected server.
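
The function body isn’t shown in this post, but given the details it hides, a minimal sketch might look like this:

create or replace function public.mastodon_boosts() returns setof mastodon_toot as $$
  select
    *
  from
    mastodon_toot
  where
    timeline = 'home'
    and reblog_server is not null
  limit 50
$$ language sql;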

If the graph block includes only this node, and mstdn.social is the selected server, here is the rendering. Not much to see here yet!

The node defines a bag of properties that can be any of the columns returned by the underlying query; these appear when you hover the node. The node also refers to a category that governs the node’s icon, color, and link. Here’s the category for the selected server.

category "selected_server" {
  color = "darkgreen"
  icon = "server"
  href  = "https://{{.properties.'server'}}"
}

Step 2: Identify boosted servers

Now we’ll add boosted servers. This node uses the same set of records: the 50 most recent boosts in my feed. Again it finds just those whose server column matches the selected server. But the id is now the reblog_server which is the target, instead of the origin, of boosts from the selected server.

node {
  category = category.boosted_server
  args = [ self.input.server.value ]
  sql = <<EOQ
    select
      reblog_server as id,
      reblog_server as title
    from
      mastodon_boosts()
    where
      server = $1
  EOQ
}

Here’s the graph with both selected_server and boosted_server nodes. We’ve used another category to differentiate the boosted nodes.

There’s only one selected server but it can send boosts to more than one boosted server. The default rendering folds them into one node but you can click to unfold and see all of them.

Step 3: Identify people who boost others

Where are the people? Let’s add them next, starting with the people who are sending boosts.

node {
  category = category.person
  args = [ self.input.server.value ]
  sql = <<EOQ
    select
      username as id,
      display_name as title,
      jsonb_build_object(
        'instance_qualified_account_url', instance_qualified_account_url
      ) as properties
    from
      mastodon_boosts()
    where
      server = $1
  EOQ
}

The username column gives the node its identity. Note also the property instance_qualified_account_url. That’s the synthetic column we added to the Mastodon plugin last time to ensure that links to people and toots will work properly in the Mastodon client. Because it’s included in a property here, and because category.person refers to that property, links representing people in the graph will resolve properly.

Step 4: Identify people who are boosted

This node takes its identity from the reblog_username column, and uses the synthetic column instance_qualified_reblog_url to provide the link.

node {
  category = category.boosted_person
  args = [ self.input.server.value ]
  sql = <<EOQ
    select
      reblog_username as id,
      reblog_username as title,
      jsonb_build_object(
        'instance_qualified_reblog_url', instance_qualified_reblog_url
      ) as properties
    from
      mastodon_boosts()
    where
      server = $1
  EOQ
}

Step 5: Connect boosters on the selected server to that server

So far we’ve seen only nodes, whose queries minimally return a column aliased as id. An edge connects nodes by way of a query that minimally returns columns aliased to from_id and to_id.

edge {
  sql = <<EOQ
    select
      username as from_id,
      server as to_id,
      'belongs to' as title
    from
      mastodon_boosts()
  EOQ
}

You’ll also want to provide a title to label the edge. Here this edge occurs twice to represent “John Mashey belongs to mstdn.social” and “Brian Marick belongs to mstdn.social.”

Step 6: Connect people on boosted servers to their servers

This edge works the same way, but captures the relationship between boosted people and their servers.

edge {
  args = [ self.input.server.value ]
  sql = <<EOQ
    select
      reblog_username as from_id,
      reblog_server as to_id,
      'belongs to' as title
    from
      mastodon_boosts()
    where
      server = $1
  EOQ
}

Step 7: Connect boosters to the people they boost

Finally we add an edge to connect boosters to the people they boost.

edge {
  category = category.boost
  args = [ self.input.server.value ]
  sql = <<EOQ
    select
      username as from_id,
      reblog_username as to_id,
      'boosts' as title,
      jsonb_build_object(
        'reblog_username', reblog_username,
        'reblog_server', reblog_server,
        'content', reblog ->> 'content'
      ) as properties
    from
      mastodon_boosts()
    where
      server = $1
  EOQ
}

And now we’ve completed the first graph shown above.

Graphing GitHub relationships

You can use this grammar of nodes and edges to describe relationships in any domain. Here’s a graph that looks across all the Steampipe-related repos and shows recently-updated PRs from external contributors.

And here’s one that uses any Steampipe plugin to show recently-updated pull requests for a selected repo.

These two views share a common SQL query and serve complementary purposes. The table is handy for sorting by date or author; the graph highlights one-to-many relationships.
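
That query isn’t reproduced here, but its shape is familiar. Here’s a sketch, assuming the GitHub plugin’s github_pull_request table; treat the column names as illustrative and check the Steampipe hub docs for the authoritative schema.

select
  repository_full_name,
  number,
  title,
  updated_at
from
  github_pull_request
where
  repository_full_name = 'turbot/steampipe'
order by
  updated_at desc
limit 20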

Lifting the burden of context assembly

In What TimeDance got right I mourned the passing of a tool for scheduling meetings that had excelled at bringing together the messages and documents related to a meeting. I called this “context assembly” — a term I’d picked up from Jack Ozzie, cofounder of Groove, another collaboration tool whose passing I mourn. Context assembly is hard work. Too often the burden falls on people who only need to use that context and would rather not spend time and effort creating it.

We’ve seen how SQL can unify access to APIs. Now it can also help us see relationships among the data we extract from those APIs.



Instance-qualified Mastodon URLs

In Lists and people on Mastodon I showed how I added a list column to the following tab of the Mastodon browser I’m building. That was a step in the direction of easier and more powerful list management. It enables me to see whether the people I follow are assigned to lists, and to consider who should be on a list (or perhaps on a different list).

Today, as I began to use that new affordance in earnest, I discovered a new challenge. In order to assign someone to a list, or change a list assignment, I clicked the link in the account_url column to open that person’s profile in the Mastodon web app. That was fine for accounts on my home server, mastodon.social. An account URL like Shelley Powers’ https://mastodon.social/@burningbird brings me to Shelley’s profile on my home server where the list manager is available.

But if I’m following someone elsewhere, like Ward Cunningham at https://mastodon.radio/@k9ox, the account URL brings me to Ward’s profile on that server where the list manager isn’t available. In order to assign Ward to a list I had to capture his account URL, paste it into the search box in my home server’s web app, and then click the resulting link: https://mastodon.social/@k9ox@mastodon.radio.

That got old real fast, so I adjusted the following tab to display the latter flavor of URL which I’ll call an instance-qualified URL.

Steampipe provides a few ways to make that adjustment. As a user of the dashboard, you can use Postgres’ regular expression functions to do the transformation in the SQL query that drives the view. But you’d rather not have to. It’s much nicer if the plugin does that for you, so the SQL can just refer to a column called instance_qualified_url.
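
For the record, here’s roughly what the in-query approach would look like: a sketch using Postgres’ regexp_replace, assuming mastodon.social as the home server.

select
  regexp_replace(
    url,
    'https://([^/]+)/@(.+)',
    'https://mastodon.social/@\2@\1'
  ) as instance_qualified_account_url
from
  mastodon_following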

I chose the latter approach. As the author of a Steampipe plugin you want to make life as easy as possible for users of the plugin. When you’re the author of both the plugin and the dashboard, as I am in this case, you can enjoy a nice virtuous cycle. As the dashboard evolves you discover ways to improve the plugin, which leads to more use of the dashboard, which suggests further opportunities to improve the plugin. I have been greatly enjoying the coevolution of these two components!

Adding a new column to a Steampipe table

To make the change, I extended the structure that defines the columns of the tables mapped from the Mastodon Account API. A Steampipe plugin defines columns using a list of structs like this.

...,
{
	Name:        "url",
	Type:        proto.ColumnType_STRING,
	Description: "URL for the account.",
},
...,

That struct says: “When the name of a top-level field in the API response is url, tell Steampipe to make a database column with that name and with the Postgres type text.”

You can also transform values in API responses to synthesize new columns that don’t appear in API responses. Here’s the struct I added for this case.

...,
{	
	Name:        "instance_qualified_account_url",
	Type:        proto.ColumnType_STRING,
	Description: "Account URL prefixed with my instance.",
	Transform:   transform.FromValue().Transform(instanceQualifiedAccountUrl),
},
...

That one says: “Send the API response to the transform function instanceQualifiedAccountUrl, and use its result as the value of the column.”

Here’s the function.

func instanceQualifiedAccountUrl(ctx context.Context, input *transform.TransformData) (interface{}, error) {
	url := input.Value.(*mastodon.Status).Account.URL
	qualifiedUrl := qualifiedUrl(ctx, url)
	return qualifiedUrl, nil
}

It delegates the real work to another function.

func qualifiedUrl(ctx context.Context, url string) string {
	plugin.Logger(ctx).Debug("instanceQualifiedUrl", "server", homeServer, "url", url)
	re := regexp.MustCompile(`https://([^/]+)/@(.+)`)
	matches := re.FindStringSubmatch(url)
	if len(matches) == 0 {
		return url
	}
	server := matches[1] // e.g. "mastodon.radio"
	person := matches[2] // e.g. "k9ox"
	qualifiedUrl := fmt.Sprintf("%s/@%s@%s", homeServer, person, server)
	plugin.Logger(ctx).Debug("instanceQualifiedUrl", "qualifiedUrl", qualifiedUrl)
	schemelessHomeServer := strings.ReplaceAll(homeServer, "https://", "")
	qualifiedUrl = strings.ReplaceAll(qualifiedUrl, "@"+schemelessHomeServer, "")
	plugin.Logger(ctx).Debug("qualifiedUrl", "qualifiedUrl", qualifiedUrl)
	return qualifiedUrl
}

Why? Two different sets of column definitions need the same transformation. instanceQualifiedAccountUrl works with responses from the Account API. But account URLs also appear in the Status API that drives timeline views. Those use a different transform function, instanceQualifiedStatusUrl, to do the same transformation for a different API response.

From account URLs to status URLs

The instance_qualified_account_url column solved the original problem. I was able to remove my plugin-author hat, put on my dashboard-author hat, and refer to account URLs as instance-qualified URLs in all the tabs that display them. Any such link now leads to a profile that I view through the lens of mastodon.social, and that enables me to use the web app’s list manager directly, without the cumbersome copy/paste/search procedure.

My happy dance didn’t last long, though. Newly sensitized to that copy/paste/search friction, I realized it was still happening when I tried to reply to items that appear in a timeline view. Here’s a recent example: https://techpolicy.social/@mnot/109610641523489182.

That’s the URL displayed in the dashboard. When I click it I land on Mark’s server and can view the item, but if I try to reply I’m confronted with the dreaded copy/paste/search operation.

No problem! I’ll use a similar transform! Not so fast. I can form a URL like https://mastodon.social/@mnot@techpolicy.social/109610641523489182 but it doesn’t go anywhere.

If I do the copy/paste/search operation, I land on a similar-but-different URL: https://mastodon.social/@mnot@techpolicy.social/109610641692667630. It has the same structure but a different toot ID. This URL is also the one that appears in the web app’s home timeline, which is why I can reply directly from that view.

I’m out of my depth here so I’ll just end with an appeal for help. It makes sense that a home server will assign its own id to an item fetched from a foreign server, and that the web app will use that id. But I’m not seeing a way to acquire that id directly from the API. I suspect it’s possible to acquire it by way of search, but doing that for every item in a timeline would quickly exhaust the tight budget for API requests (just 300 every 5 minutes).

So, Lazy Mastodon, am I just stuck here or is there a way to transform foreign status URLs into instance-relative status URLs?

Update: Solved!

After chatting with Jari Pennanen I took another look and realized the needed ID was available in the API response after all; I just wasn’t using it (facepalm). And in fact there are two flavors of the ID — one for original toots, another for boosts. Columns for both cases are added here, and the tweak to make the dashboard use them here.

Here is the result.

Thanks for being my rubber duck, Jari! The instance-qualified toot and reblog URLs make this dashboard massively more useful.



How many people in my Mastodon feed also tweeted today?

I last tweeted on Dec 22. (It was, unsurprisingly, a link to a blog post about Mastodon.) Today I wondered what percentage of the people who appear in my Mastodon timeline today also appeared on Twitter today.

To start, I wrote this query which tries to match Twitter and Mastodon usernames. When it finds a match, it reports the day on which that person last tweeted.

with mastodon as (
  select
    substring(username from 1 for 15) as username, -- twitter names are max 15 chars
    'from:' || substring(username from 1 for 15) as query -- we will query twitter using, e.g., 'from:judell'
  from
    mastodon_toot
  where
    timeline = 'home'
  limit
    500
)
select
  m.username as mastodon_person,
  t.author->>'username' as twitter_person,
  max(to_char(t.created_at, 'YYYY-MM-DD')) as last_tweet_day
from 
  mastodon m
left join
  twitter_search_recent t -- see https://hub.steampipe.io/plugins/turbot/twitter/tables/twitter_search_recent
on
  t.query = m.query
group by
  mastodon_person,
  twitter_person
order by
  last_tweet_day desc

This is my favorite kind of Steampipe query: two different APIs, each represented as a Postgres table, combined with a SQL JOIN.

The result looks like this, with nulls for failed matches.

+-----------------+-----------------+----------------+
| mastodon_person | twitter_person  | last_tweet_day |
+-----------------+-----------------+----------------+
| AlanSill        | null            | null           |
| Colarusso       | null            | null           |
| ...                                                |
| williamgunn     | null            | null           |
| xian            | null            | null           |
| ...                                                |
| futurebird      | futurebird      | 2022-12-29     |
| glynmoody       | glynmoody       | 2022-12-29     |
| ...                                                |
| khinsen         | khinsen         | 2022-12-23     |
| blaine          | blaine          | 2022-12-23     |
+-----------------+-----------------+----------------+

Next I created a table from the above query.

create table public.mastodon_twitter as 
  -- sql as above

And then ran this query.

select
  last_tweet_day,
  count(*)
from
  mastodon_twitter
where
  last_tweet_day is not null
group by
  last_tweet_day
order by
  last_tweet_day desc

Here’s the result.

+----------------+-------+
| last_tweet_day | count |
+----------------+-------+
| 2022-12-29     | 36    |
| 2022-12-28     | 6     |
| 2022-12-27     | 1     |
| 2022-12-26     | 1     |
| 2022-12-25     | 2     |
| 2022-12-23     | 2     |
+----------------+-------+

The 500 toots represented here were created by 93 people who tooted today.

select count(*) from mastodon_twitter

+-------+
| count |
+-------+
| 93    |
+-------+

Of those 93 people, 48 have matching usernames.

select count(*) from mastodon_twitter where last_tweet_day is not null

+-------+
| count |
+-------+
| 48    |
+-------+

Of the 48 with matching usernames, 36 also tweeted today.

So there’s my answer: 75% of the people who appeared in my Mastodon home timeline (when I sampled it just now) also appeared on Twitter today.
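
The arithmetic, expressed as a query against the same table (36 of 48, or 75%):

select
  round(
    100.0
    * count(*) filter (where last_tweet_day = '2022-12-29')
    / count(*) filter (where last_tweet_day is not null)
  ) as pct_tweeted_today
from
  mastodon_twitter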



My belated introduction to Super Mario Brothers

On a hike today my friend noticed a clump of Amanita muscaria and took away this glorious specimen. As we headed down the trail he said: “I can never see one of these without hearing the music from Super Mario Brothers — beep, boop.” He went on to explain that the spotted red mushroom was a coveted power-up that made Mario grow large.

I knew nothing about this. I have never played Super Mario Brothers, or any other iconic game, or really any computer-based game at all, except briefly in the early 1990s when I played Rogue during an office craze that lasted a few weeks, and around the same time played Putt-Putt with our young kids.

This seems unusual for someone like me. I have spent my adult life deeply engrossed in computer-based activity, and am often described to others, by friends and family, as a “computer guy.” That makes me a geek and/or nerd by definition. But I’ve never been comfortable with either of those terms.

Over the years I’ve known a great many others in the geek/nerd tribe who self-identify as gamers. It always puzzles me. I’m perfectly happy to spend long hours in front of a screen reading or writing or coding. But games? I simply lack that gene. It baffles me how other members of the tribe — especially adults who like me already put in long hours in front of screens — could then pile up more screen time playing games, when they could be outside hiking, or cycling, or gardening, or doing really anything else at all.

Perhaps this explains my reluctance to self-identify as a member of the geek/nerd tribe. If I really belonged I’d be a gamer. Since I’m not, I must be an impostor. I wonder if there’s a word — in some other language than English, probably — for when you share certain core attributes with a tribe but not others?

Lists and people on Mastodon

I hadn’t thought to use Mastodon lists until I read the Frustration with lists chapter of Martin Fowler’s Exploring Mastodon, in which he writes:

I like lists because they allow me to divide up my timeline to topics I want to read about at different times. They are frustrating because the tools to manage them in Twitter are very limited, so it’s more hassle to set up the kind of environment I’d like. Mastodon also has lists, sadly its current management tools are equally bad.

This seemed like a good challenge for Steampipe. To tackle it, I first needed to add some new tables to the plugin to encapsulate the list APIs: mastodon_list and mastodon_list_account. I’ll save that story for another time. Here I’ll just show that together they enable queries like this.

select
  l.title as list,
  array_agg(a.username order by a.username) as people
from
  mastodon_list l
join
  mastodon_list_account a
on
  l.id = a.list_id
group by
  l.title
+--------------+--------------------------------------+
| list         | people                               |               
+--------------+--------------------------------------+
| Academic     | ____, ______, ____, ___              |
| Education    | ___, ______  ___, ______             |
| Energy       | ___, ______, ____ __                 |
| Fediverse    | ____ __,                             |
| Humor        | ____, ____ __, ____ __               |
| Journalism   | ___ __, ___ ____,  ___, ______       |
| Library      | __                                   |
| Net          | ___ __, _____, ___ __, __ __, ____   |
| Science      | __, ____ __, ______                  |
| Software     | ____ __, ______, ____ __             |
+--------------+--------------------------------------+

That’s a useful view, and I’ve now included it, but it didn’t address Martin’s specific need.

To manage these lists, I really need a display that shows every account that I follow in a table with its lists. That way I can easily see which list each account is on, and spot any accounts that aren’t on a list.

For that I needed to add a list column to the Following tab.

This was the original query.

select
  url,
  case when display_name = '' then username else display_name end as person,
  to_char(created_at, 'YYYY-MM-DD') as since,
  followers_count as followers,
  following_count as following,
  statuses_count as toots,
  note
from
  mastodon_following
order by
  person

The new version captures the above join of mastodon_list and mastodon_list_account, and joins that to the mastodon_following (people I follow) table. It’s a left join which means I’ll always get all the people I follow. If you’re not on a list, your list column will be null.

with data as (
  select
    l.title as list,
    a.*
  from
    mastodon_list l
  join
    mastodon_list_account a
  on
    l.id = a.list_id
),
combined as (
  select
    d.list,
    f.url,
    case when f.display_name = '' then f.username else f.display_name end as person,
    to_char(f.created_at, 'YYYY-MM-DD') as since,
    f.followers_count as followers,
    f.following_count as following,
    f.statuses_count as toots,
    f.note
  from
    mastodon_following f
  left join
    data d
  on
    f.id = d.id
)
select
  *
from
  combined
order by
  person

That query drives the new version of the Following tab.

It’s pretty sparse; I’ve only just begun adding people to lists. And honestly I’m not sure I’ll want to keep doing this curation. It’s the kind of thing that can become a burden, so I need to play around some more before I commit. Meanwhile, the default sort puts unlisted people first so they’re easy to find.
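
One way to express that sort in the query itself is a nulls-first order on the list column. A sketch, reusing the combined CTE from the query above:

select
  *
from
  combined
order by
  list nulls first,
  person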

To provide a better way to find people who are on lists, I expanded the List tab in a couple of ways. It had included a dropdown of lists by which to filter the home timeline. Now that dropdown has counts of people on each list.

input "list" {
  type = "select"
  width = 2
  sql = <<EOQ
    with list_account as (
      select
        l.title
      from
        mastodon_list l
      join
          mastodon_list_account a
      on
        l.id = a.list_id
    ),
    counted as (
      select
        title,
        count(*)
      from
        list_account
      group by
        title
      order by
        title
    )
    select
      title || ' (' || count || ')' as label,
      title as value
    from
      counted
    order by
      title
  EOQ
}

I also used this query to expand the List tab.

select
  l.title as list,
  array_to_string( array_agg( lower(a.username) order by lower(a.username)), ', ') as people
from
  mastodon_list l
join
  mastodon_list_account a
on
  l.id = a.list_id
group by
  l.title

The result is the list / people table on the right.

I know that some won’t cotton to this SQL-forward programming model. But for others who will, I wanted to show a few detailed examples to give you a sense of what’s possible at the intersection of Mastodon and Steampipe.

If you’re not tuned into SQL (like I wasn’t for a very long time), here’s your takeaway: as SQL goes, this stuff is not too scary. Yes there are joins, yes there’s an array_agg that transposes a column into a list. It’s not beginner SQL. But lots of people know how to use join and array_agg in these ways, lots more could easily learn how, and with SQL ascendant nowadays these are skills worth having.



Create your own Mastodon UX

I’ve been discussing Mastodon UX wishlists with some new acquaintances there. This excerpt from A Bloomberg terminal for Mastodon concludes with part of my own wishlist.

In a Mastodon timeline, a chatty person can dominate what you see at a glance. When we participate in social media we are always making bids for one another’s attention. As publishers of feeds it’s wise to consider how a flurry of items can overwhelm a reader’s experience. But it’s also useful to consider ways that feed readers can filter a chatty source. Steampipe’s SQL foundation affords an easy and natural way to do that. Here’s part of the query that drives the list view.

select distinct on (list, person, hour) -- only one per list/user/hour
  person,
  url,
  hour,
  toot
from
  data
order by
  hour desc, list, person

It was easy to implement a rule that limits each person to at most one toot per hour. Next steps here will be to apply this rule to other views, show the number of collapsed toots, and enable such rules on a per-person basis.
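
Showing the number of collapsed toots should be a small extension of the same query. Here’s a sketch that reuses the data CTE feeding the list view, counting the extra toots that the one-per-hour rule hides:

select
  list,
  person,
  hour,
  count(*) - 1 as collapsed_toots
from
  data
group by
  list, person, hour
having
  count(*) > 1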

As a warmup exercise, I decided to first add a simple control for boosts that enables me to see my home timeline with or without boosts. To give technically-inclined readers a sense of what’s involved in doing this kind of thing with Steampipe, I’ll describe the changes here. I’m obviously biased but I find this programming environment to be accessible and productive. If it seems that way to you as well, you might want to try out some of the items on your own UX wishlist. And if you do, let me know how it goes!

Here are the original versions of the two files that I changed to add the new feature. First there’s home.sp which defines the dashboard for the home timeline.

dashboard "Home" {
  
  tags = {
    service = "Mastodon"
  }

  container {
    // a text widget with the HTML links that define the menu of dashboards
  }

  container {
    text {
    // a block that displays the HTML links that form a menu of dashboards
    }

    card {
    // a block that reports the name of my server
    }

    input "limit" {
      width = 2
      title = "limit"
      sql = <<EOQ
        with limits(label) as (
          values 
            ( '50' ),
            ( '100' ),
            ( '200' ),
            ( '500' )
        )
        select
          label,
          label::int as value
        from 
          limits
      EOQ
    }    
  }


  container { 

    table {
      title = "home: recent toots"
      query = query.timeline
      args = [ "home", self.input.limit ]
      column "person" {
        wrap = "all"
      }
      column "toot" {
        wrap = "all"
      }
      column "url" {
        wrap = "all"
      }
    }
  }

}

And here’s the new version. It adds an input block called boosts, and passes its value to the referenced query.

dashboard "Home" {
  
  tags = {
    service = "Mastodon"
  }

  container {
    // a text widget with the HTML links that define the menu of dashboards
  }

  container {
    text {
    // a block that displays the HTML links that form a menu of dashboards
    }

    card {
    // a block that reports the name of my server
    }

    input "limit" {
    // as above
    }

    input "boosts" {
      width = 2
      title = "boosts"
      sql = <<EOQ
        with boosts(label, value) as (
          values
            ( 'include', 'include' ),
            ( 'hide', ' ' ),
            ( 'only', ' 🢁 ' )
        )
        select
          label,
          value
        from
          boosts
      EOQ
    }

  }

  container { 

    table {
      // as above
      args = [ "home", self.input.limit, self.input.boosts ]
    }
  }

}

Steampipe dashboards are built with two languages. HCL (Hashicorp configuration language) defines the UX widgets, and SQL fills them with data. In this case we’re selecting static values for the boosts input. But any Steampipe query can run there! For example, here is the input block I use on the dashboard that filters the timeline by the list to which I’ve assigned people.

input "list" {
  type = "select"
  width = 2
  title = "search home timeline"
  sql = <<EOQ
    select
      title as label,
      title as value
    from
      mastodon_list
    order by
      title
  EOQ
}

Now here is the referenced query, query.timeline, from the file query.sp which contains queries used by all the dashboards.

query "timeline" {
  sql = <<EOQ
    with toots as (
      select
        account_url as account,
        case 
          when display_name = '' then user_name 
          else display_name
        end as person,
        case
          when reblog -> 'url' is null then
            content
          else
            reblog_content
        end as toot,
        to_char(created_at, 'MM-DD HH24:MI') as created_at,
        case
          when reblog -> 'url' is not null then '🢁'
          else ''
        end as boosted,
        case
          when in_reply_to_account_id is not null then ' 🢂 ' || ( select acct from mastodon_account where id = in_reply_to_account_id )
          else ''
        end as in_reply_to,
        case
          when reblog -> 'url' is not null then reblog ->> 'url'
          else url
        end as url
      from
        mastodon_toot
      where
        timeline = $1
      limit $2
    )
    select
      account,
      person || 
        case 
          when in_reply_to is null then ''
          else in_reply_to
        end as person,
      boosted || ' ' || toot as toot,
      url
    from
      toots
    order by
      created_at desc
  EOQ
  param "timeline" {}
  param "limit" {}
}

And here is the new version of that query.

query "timeline" {
  sql = <<EOQ
    with toots as (
      -- as above
    ),
    boosted as (
      select
        $3 as boost,
        boosted,
        account,
        in_reply_to,
        person,
        toot,
        url
      from
        toots
      order by
        created_at desc
    )
    select
      account,
      person ||
        case
          when in_reply_to is null then ''
          else in_reply_to
        end as person,
      boosted || ' ' || toot as toot,
      url
    from
      boosted
    where
      boost = boosted
      or boost = 'include'
      or boost = 'n/a'
  EOQ
  param "timeline" {}
  param "limit" {}
  param "boost" {}
}

The original version uses a single CTE (aka common table expression, aka WITH clause), toots, to marshal data for the concluding SELECT. The new version inserts another CTE, boosted, into the pipeline. It uses $3 to reference param "boost" {}, which maps to the self.input.boosts value passed from home.sp.

The SQL code is all standard. Postgres is the engine inside Steampipe, and I sometimes use Postgres-specific idioms, but I don’t think any of those are happening here.

The HCL code may be unfamiliar. Steampipe uses HCL because its core audience is DevSecOps pros who are familiar with Terraform, which is HCL-based. But it's a pretty simple language that can be used to describe all kinds of resources. Here the resources are widgets that appear on dashboards.

The other thing to know, if you want to roll up your sleeves and try building your own dashboards, is that the developer experience is (again, in my biased opinion!) pretty great: if you're using an autosaving editor, you'll see your changes to both HCL and SQL code reflected in realtime.

To illustrate that, here’s the screencast we included in our blog post introducing the dashboard system.

Not shown there, because we wanted to focus on the happy path, is realtime feedback when your SQL queries provoke Postgres errors. The experience feels very much like the one Bret Victor champions in Inventing on Principle. The core principle: “Creators need an immediate connection to what they’re creating.”

Here’s the wrong way that too often constrains us.

If there’s anything wrong with the scene, or if I go and make changes, or if I have further ideas, I have to go back to the code, and I edit the code, compile and run, see what it looks like. Anything wrong, I go back to the code. Most of my time is spent working in the code, working in a text editor blindly, without an immediate connection to this thing, which is what I’m actually trying to make.

And here is the right way.

I’ve got this picture on the side, and the code on the side, and this part draws the sky and this draws the mountains and this draws the tree, and when I make any change to the code, the picture changes immediately. So the code and the picture are always in sync; there is no compile and run. I just change things in the code and I see things change in the picture.

A screenshot of live editing from Bret Victor's Inventing on Principle

We want to work the right way wherever we can. The experience isn’t available everywhere, yet, but it is available in Steampipe where it powerfully enables the experimentation and prototyping that many of us are inspired to do as we delve into Mastodon.

If you want to try this for yourself, please check out the setup instructions for the plugin that maps Mastodon APIs to Postgres tables, and the dashboards that use those tables, and ping me (on Mastodon if you like!) with any questions you may have.



A Bloomberg terminal for Mastodon

As I mentioned last time, the Steampipe dashboard for Mastodon has evolved in unexpected ways. I imagined that the components — a plugin that maps Mastodon APIs to Postgres foreign tables, and a suite of views that query the APIs — would combine to enable a broad overview of activity in the Fediverse. That didn’t pan out for two reasons.

First, I learned that the Mastodon community didn’t appreciate the kind of surveillance required for such analysis. That was the original community, I should stress, and things have changed dramatically, but I want to respect the original ethos. Plenty of people will, nevertheless, crawl and index the Fediverse, but I don’t need to put my shoulder to that wheel. And if I did I’d be pushing Steampipe out of its sweet spot: realtime acquisition, querying, and visualization of API-sourced data.

Second, Mastodon's API allows 300 requests every 5 minutes. You can use Steampipe in batch mode to defeat that limit, and you can store data permanently in its Postgres database, but that cuts across the grain with respect to both Steampipe and Mastodon. All Mastodon clients are subject to the same API rate limit. If you use the web app, or one of the phone apps, you have likely never seen a message announcing that you've hit the limit and need to wait a few minutes. I never saw that message until I started querying the API with Steampipe while also using the web app.

So if Mastodon culture and tech resist deep data mining, and the system is optimized for clients that live within an API budget of 300 requests every 5 minutes, what kind of Mastodon client could Steampipe enable? It wouldn’t be a conventional client because Steampipe is a read-only system. The path forward would be some kind of reader, or browser, that augments the interactive apps.

The outcome, so far, is a suite of dashboards that display tabular views (along with some charts) of the home, local, and federated timelines, of my toot history and my favorites, of my follows and followers, of my notifications, of searches for terms, people, and hashtags, and of the timelines formed by the lists to which I’ve assigned people I follow. These are all HTML tables rendered by Steampipe’s dashboard server. The columns are all sortable, and the cells of the tables can contain only links or plain text.

Given that the toot content returned from the Mastodon API is HTML, the plain-text-only constraint felt, initially, like a blocker. No images? No links in toot content? What good is that?

Some constraints are worth embracing, though, and that may prove true here. The views created this way put a lot of information onto the screen. Here’s my default view in the stock client.

At a glance I can see three items on the home timeline, and if I want to scroll through 100 items I can only do so awkwardly in small gulps.

Here’s my home timeline in the Steampipe dashboard. I can see a dozen items at a glance, and can easily scan 100 items in gulps of that size.

When I described this effect to Greg Wilson he gave me the title for this post: “That sounds like the Bloomberg terminal for Mastodon.” I’ve never used one, and I’m aware that its design is often derided as a UX disaster, but as I understand it the product is built to enable traders to scan fast-moving data feeds from many different sources. In that sense I do think it’s an interesting and useful comparison.

The underlying principle is one I’ve learned from Edward Tufte: present information at maximum density. Our brains are built to take in a lot of information at a glance, and if it’s organized well we can do that very effectively. It feels like that’s happening for me when I scan these densely-packed views of Mastodon activity.

To enhance the effect, I’ve begun to apply filters. In a Mastodon timeline, for example, a chatty person can dominate what you see at a glance. When we participate in social media we are always making bids for one another’s attention. As publishers of feeds it’s wise to consider how a flurry of items can overwhelm a reader’s experience. But it’s also useful to consider ways that feed readers can filter a chatty source. Steampipe’s SQL foundation affords an easy and natural way to do that. Here’s part of the query that drives the list view.

select distinct on (list, user_name, person, hour) -- only one per list/user/hour
  person,
  url,
  hour,
  toot
from
  data
order by
  hour desc, list, person

It was easy to implement a rule that limits each person to at most one toot per hour. Next steps here will be to apply this rule to other views, show the number of collapsed toots, and enable such rules on a per-person basis.
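Showing the number of collapsed toots could build on the same query. Here's a hedged sketch of one way to do it, assuming the same data CTE (with its list and user_name columns) elided above; window functions replace distinct on so the group size is available.

-- sketch: keep one toot per list/user/hour, and count how many
-- others were collapsed into it (assumes the 'data' CTE shown above)
with ranked as (
  select
    person,
    url,
    hour,
    toot,
    count(*) over (partition by list, user_name, person, hour) as toots_in_hour,
    row_number() over (partition by list, user_name, person, hour order by url) as rn
  from
    data
)
select
  person,
  url,
  hour,
  toot,
  toots_in_hour - 1 as collapsed
from
  ranked
where
  rn = 1
order by
  hour desc, person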

There are always links into the Mastodon web app, and I follow them when I want to view images, boost someone, or reply to someone. The dashboards help me scan a lot of Mastodon activity quickly, and decide which items I want to interact with. Your 500-character toot is all you’ve got to grab my attention, and I’ll only see it as an unformatted chunk of plain text. That’s a pretty severe constraint, and not everyone will want to embrace it, but it’s working pretty well for me so far.

I expect that our dashboard system will support formatted text and images in cells of HTML tables. When it does I’d like to make it an option you can turn on or off in Mastodon dashboards. What should the default be? I suspect I’ll want plain text and no images, especially if image captions can appear along with the text of toots. Some of the original Mastodon cultural norms aren’t surviving the onslaught of new people, but writing descriptions of images is one that’s held up so far, and it’s a wonderful thing. So write a short thoughtful post, write a caption for your image if you include one, and if you capture my attention I’ll click through to view and interact.



Browsing the Fediverse

A month ago, when the Great Discontinuity happened, I started working on a Steampipe plugin to enable SQL queries against the Mastodon API, along with a companion Steampipe “mod” (suite of dashboards) to display and chart the results of those queries.

I expect these dashboards will soon be available in Steampipe Cloud, where it will take just a few seconds to pop in your Mastodon access token (from, e.g., https://mastodon.social/settings/applications/new) and begin using the dashboards.

Meanwhile, if you’re so inclined, you can find the plugin here and the dashboards here. If you’re reasonably technical you can pretty quickly and easily install Steampipe, clone these repos, build the plugin, and start using the dashboards.

Why would you want to? My own motivation, originally, was to do Mastodon analytics. I thought Steampipe’s SQLification of the API would be a handy way to discern and monitor activity trends during a period of extraordinary flux. And that’s proven to be true, to a limited extent. Here’s a snapshot of the dashboard that uses the instance activity API.

I’m watching this chart with great interest. Where does it go from here? I’m not going to hazard a guess. Everything’s up in the air right now, and anything could happen.

But as I added tables to the plugin to encapsulate more of the Mastodon API, and added dashboards to visualize those tables, my focus shifted. I began to see the suite of dashboards as a Mastodon reader/browser that complements the web and phone clients, and that’s how I mainly use them now.

I think the key benefit is one of Edward Tufte’s core principles: information density. Each of these dashboards shows more activity than you can see at a glance in the web or phone interfaces. I find this very helpful for searching and browsing. When I see items of interest that I want to interact with, I click through to the web app in order to boost, reply, or favorite.

Will this way of browsing Mastodon appeal to you? To get a feel for what it’s like, here are snapshots of some of the dashboards I’ve built so far.

dashboard.Favorites

dashboard.Following

dashboard.Home

dashboard.List

dashboard.Me

dashboard.Notification

dashboard.PeopleSearch

dashboard.StatusSearch

dashboard.TagSearch

For me, at least, this approach has become an effective way to browse the fediverse, find interesting people, read what they boost, and keep track of my own activity.

If you are dev-minded, by the way, please note that these dashboards are just one way to skin the results of queries against the plugin. Any SQL client can connect to Steampipe’s Postgres endpoint. You could use dashboards like Metabase or Grafana, or you could embed Steampipe as a component in an app.
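For example, once any Postgres client is connected to that endpoint, a plain query against the mastodon_toot table shown earlier works outside the dashboard context too.

-- runnable from psql, Metabase, Grafana, or any other client
-- connected to Steampipe's Postgres endpoint
select
  created_at,
  display_name,
  content
from
  mastodon_toot
where
  timeline = 'home'
limit 10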



When your database is an HTTP client

Here are three things I once hoped — but no longer expect — to outlive:

1. PDF files

2. passwords

3. The occasional need to scrape data from web pages

PDF files and passwords are topics for another day, but web scraping is timely. Today I was asked to corral data from the Steampipe blog, and the path of least resistance was (of course!) to extract it from the site.

I was all warmed up for the exercise because we’d just published a post dramatically entitled “Why build an HTTP client into a database? So you can ingest web data directly!” In that post I show three solutions enabled by the Net plugin’s net_http_request table.

Since the dawn of the web, scraping has worked this way: use a script to fetch the data, then save it for retrieval and analysis. You might query the data in the scripting language, or in the query language of a database.

A couple of years ago I found a way to unify those ingredients: Run the script inside the database. You can do a lot with Postgres’ built-in procedural language, and even more if you activate Python inside Postgres. I went deeply into both and explained why in an earlier episode.

PL/Python was great for advanced uses, and I used it for a bit of web scraping too. It all worked fine, and I never thought to ask this question: “What if the database is the HTTP client, and SQL the common way to reason over data coming through that pipe?”

The examples in the post show what that’s like. In its simplest form you write a query like this.

select
  response_status_code,
  jsonb_pretty(response_headers) as headers
from
  net_http_request
where
  url = 'https://steampipe.io'

The result looks like this.

+----------------------+-------------------------------------------------------+
| response_status_code | headers                                               |
+----------------------+-------------------------------------------------------+
| 200                  | {                                                     |
|                      |     "Age": [                                          |
|                      |         "45557"                                       |
|                      |     ],                                                |
|                      |     "Date": [                                         |
|                      |         "Fri, 09 Dec 2022 06:46:40 GMT"               |
|                      |     ],                                                |
|                      |     "Etag": [                                         |
|                      |         "W/\"614a142998557b388e053bfa4408cf70\""      |

The response_status_code is a regular Postgres column; the headers column is a JSONB column that you can index into (e.g. headers ->> 'Etag'). If you also select the response_body column of the net_http_request table you'll get another regular Postgres column containing the text of the web page. If it's HTML text, you can use regular expressions to match patterns in it. If it's JSON text you can use Postgres' JSON functions to query and transform it.
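For example, here's a minimal sketch that pulls link targets out of a page body with a regular expression; the pattern is illustrative, not robust HTML parsing (more on that below).

-- regexp_matches with the 'g' flag returns one row per match,
-- each row an array of capture groups
select
  (regexp_matches(response_body, 'href="([^"]+)"', 'g'))[1] as link
from
  net_http_request
where
  url = 'https://steampipe.io'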

You can join fields nested in the JSON with other Postgres columns. And those other columns can belong to tables populated in any of the ways Steampipe plugins populate tables: from JSON or GraphQL API endpoints, from CSV or Terraform or Yaml files, from anywhere really. As a developer writing Steampipe queries (and flowing results into dashboards) you see all of these sources as tables, you query them individually in all the ways Postgres can, and you join across diverse sources in a common way.
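As a contrived but concrete sketch of such a join: here the page's Content-Type header, unpacked from JSON, joins against an inline table of expected values. The expected values are illustrative, and note that the header values are arrays (as the output above shows), so the query indexes into them.

-- header values are arrays, so use -> to get the array
-- and ->> 0 to get its first element as text
with page as (
  select
    url,
    response_status_code,
    response_headers -> 'Content-Type' ->> 0 as content_type
  from
    net_http_request
  where
    url = 'https://steampipe.io'
),
expected(url, content_type) as (
  values ('https://steampipe.io', 'text/html; charset=utf-8')
)
select
  p.url,
  p.response_status_code,
  p.content_type = e.content_type as matches
from
  page p
join
  expected e on e.url = p.url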

Of course web pages are structured in ways that regular expressions can’t easily grok. It’s easy to match links, but parsing HTML tag names and attributes is a job for a real parser. I’d made a start on an HTML plugin for Steampipe. There were already two tables: one to extract links from a web page, one to transform HTML tables to CSV format. So today, when tasked with tabulating blog metadata, I added a third table to enable these queries.

-- find the title

select
  page,
  tag_name,
  tag_content
from
  html_tag
where
  page = 'https://steampipe.io/blog/selective-select'
  and tag_name = 'title'

-- list the meta tags

select
  page,
  tag_name,
  tag_attrs
from
  html_tag
where
  page = 'https://steampipe.io/blog/selective-select'
  and tag_name = 'meta'

That’s a powerful way to reason over HTML data! It was easy for me to extend the HTML plugin in this way, and I assure you that I’m no 10x programmer. The Steampipe plugin SDK and the wonderful goquery package are doing all the heavy lifting. I just had to stitch the components together, and if you’re any kind of programmer, with or without Go experience, you could pretty readily do the same.

Mastodon, Steampipe, and RSS

I was determined to write my Mastodon #introduction today. To get started I used the tag search in the dashboard I’m building.

The idea was to look at a bunch of other #introduction posts to get a feel for how mine should go. When you search specifically for hashtags, the Mastodon search API returns this information.

"hashtags": [
    {
      "name": "introduction",
      "url": "https://mastodon.social/tags/introduction",
      "history": [
        {
          "day": "1574553600",
          "uses": "10",
          "accounts": "9"
        },
        // ...
      ]
    },

A first version of the dashboard, having only this data to work with, just listed the names of tags matching the search term along with corresponding URLs. Here was the initial query.

select 
  name,
  url
from 
  mastodon_search_hashtag 
where 
  query = 'introduction'

That produced a list of links, like https://mastodon.social/tags/introduction, to home pages for variants of the tag. These are useful links! Each goes to a page where you can see who is posting to the tag.

To make this view slightly more useful, I tapped the third element of the API response, history, in a revised query.

with data as (
  select 
    name,
    url,
    ( jsonb_array_elements(history) ->> 'uses' )::int as uses
  from 
    mastodon_search_hashtag 
  where 
    query = 'introduction'
)
select
  name,
  url,
  sum(uses)
from
  data
group by
  name, url
order by
  sum desc

These results help me decide which variant to use.

+-------------------+---------------------------------------------------+------+
| name              | url                                               | sum  |
+-------------------+---------------------------------------------------+------+
| introduction      | https://mastodon.social/tags/introduction         | 1816 |
| introductions     | https://mastodon.social/tags/introductions        | 218  |
| introductionpost  | https://mastodon.social/tags/introductionpost     | 19   |
| introductionfr    | https://mastodon.social/tags/introductionfr       | 6    |

But I still need to visit each link’s page to explore how it’s being used. It would be nice to surface more context in the dashboard, and I found a nifty way to do it, but first let’s dwell on the revised query for a minute. Postgres’ JSON features are powerful and it’s often a challenge (at least for me) to visualize how they work.

The Postgres jsonb_array_elements() function is what’s called a set-returning function. Here it unpacks Postgres’ JSON representation of the list of history structures returned from the Mastodon API. In its simplest form, the function call jsonb_array_elements(history) produces a temporary table with per-tag, per-day data.

select
  name,
  jsonb_array_elements(history) as history
from
  mastodon_search_hashtag 
where 
  query = 'introduction'
+--------------------------------+----------------------------------------------------+
| name                           | history                                            |
+--------------------------------+----------------------------------------------------+
| introduction                   | {"accounts":"16","day":"1670371200","uses":"19"}   |
| introduction                   | {"accounts":"250","day":"1670284800","uses":"269"} |
| introduction                   | {"accounts":"259","day":"1670198400","uses":"274"} |
| introduction                   | {"accounts":"253","day":"1670112000","uses":"270"} |
| introduction                   | {"accounts":"245","day":"1670025600","uses":"269"} |
| introduction                   | {"accounts":"345","day":"1669939200","uses":"383"} |
| introduction                   | {"accounts":"307","day":"1669852800","uses":"339"} |
| introductionsfr                | {"accounts":"0","day":"1670371200","uses":"0"}     |
| introductionsfr                | {"accounts":"0","day":"1670284800","uses":"0"}     |
| introductionsfr                | {"accounts":"0","day":"1670198400","uses":"0"}     |
| introductionsfr                | {"accounts":"0","day":"1670112000","uses":"0"}     |
| introductionsfr                | {"accounts":"0","day":"1670025600","uses":"0"}     |

history is a JSONB column that holds an object with three fields. The revised query uses Postgres’ JSON indexing operator ->> to reach into that object and hoist the number of daily uses into its own column, so it can be the target of a SQL SUM function.
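The same operator unpacks the other two fields; here's a small sketch that also converts the epoch-seconds day field (visible in the API response above) into a date.

-- unpack all three history fields, casting as needed
with data as (
  select
    name,
    jsonb_array_elements(history) as history
  from
    mastodon_search_hashtag
  where
    query = 'introduction'
)
select
  name,
  to_timestamp((history ->> 'day')::bigint)::date as day,
  (history ->> 'uses')::int as uses,
  (history ->> 'accounts')::int as accounts
from
  data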

OK, ready for the nifty solution? Recall that https://mastodon.social/tags/introduction is the home page for that variant of the tag. There you can see introduction posts from people using the tag. Those posts typically include other tags. In the dashboard shown above you can see that Kathy Nickels is using these: #Music #Art #Equestrian #Nature #Animals. The tags appear in her introduction post.

I didn’t immediately see how to capture them for use in the dashboard. Then I remembered that certain classes of Mastodon page have corresponding RSS feeds, and wondered if the tag pages are members of one such class. Sure enough they are, and https://mastodon.social/tags/introduction.rss is a thing. That link, formed by tacking .rss onto the base URL, provides the extra context I was looking for. Here’s the final version of the query.

with data as (
  select
    name,
    url,
    ( jsonb_array_elements(history) ->> 'uses' )::int as uses
  from
    mastodon_search_hashtag
  where
    query = 'introduction'
),
uses as (
  select
    name,
    url || '.rss' as feed_link,
    sum(uses) as recent_uses
  from
    data
  group by
    name, url
)
select
  u.name,
  r.guid as link,
  to_char(r.published, 'YYYY-MM-DD') as published,
  r.categories
from
  uses u
join
  rss_item r
on
  r.feed_link = u.feed_link
where
  recent_uses > 1
order by
  recent_uses desc, published desc

The new ingredients, courtesy of the RSS feed, are: guid which links to an individual introduction like Kathy’s, published which is the day the introduction appeared, and categories which has the tags used in the introduction post. Sweet! Now I can scan the dashboard to get a sense of which introductions I want to check out.

The first three queries use the Steampipe plugin for Mastodon, and in particular its mastodon_search_hashtag table which encapsulates the Mastodon API for searching tags. The final version joins that table with the rss_item table provided by the RSS plugin, using the common base URL as the basis of the join.

This delights me in so many ways. When the blogosphere first emerged in the early 2000s, some of us discovered that the RSS protocol was capable of far more than just delivering feeds to RSS readers. The other new hot protocol in that era was XML web services. As an InfoWorld analyst I was supposed to be cheering the latter as an enterprise-grade technology, but I couldn’t help noticing that RSS kept turning out to be a great way to move data between cooperating systems. That’s always been true, and I love how this example reminds us that it’s still true.

I’m equally delighted to show how Steampipe enables this modern exercise in RSS-powered integration. Steampipe was, originally, an engine for mapping results from JSON API endpoints to SQL tables. Over time, though, it has broadened its view of what constitutes an API. You can use Steampipe to query CSV files, or Terraform files, or — as we see here — RSS feeds. Data comes in all kinds of flavors. Steampipe abstracts those differences and brings all the flavors into a common space where you can reason over them using SQL.

And finally, it’s just wonderful to be at the intersection of Mastodon, Steampipe, and RSS in this remarkable moment. I’ll readily admit that nostalgia is a factor. But RSS did bust things wide open twenty years ago, Mastodon’s doing that now, and I love that RSS can help it happen again.

Now I need to write that #introduction!



Autonomy, packet size, friction, fanout, and velocity

Nostalgia is a dangerous drug and it’s always risky to wallow in it. So those of us who fondly remember the early blogosphere, and now want to draw parallels to the fediverse, should do so carefully. But we do want to learn from history.

Here’s one way to compare five generations of social software along the five dimensions named in the title of this post.

              Autonomy    Packet Size   Friction    Fanout    Velocity

Usenet        medium      high          medium      medium    low
Blogosphere   high        high          high        low       low
Facebook      low         high          low         medium    high
Twitter       low         low           low         high      high
Fediverse     high        medium        high        medium    medium

These are squishy categories, but I think they surface key distinctions. Many of us who were active in the blogosphere of the early 2000s enjoyed a high level of autonomy. Our RSS readers were our Internet dashboards. We loaded them with a curated mix of official and individual voices. There were no limits on the size of packets exchanged in this network. You could write one short paragraph or a 10,000-word essay. Networking wasn’t frictionless because blog posts did mostly feel like essays, and because comments didn’t yet exist. To comment on my blog post you’d write your own blog post linking to it.

That friction limited the degree to which a post would fan out through the network, and the velocity of its propagation. The architecture of high friction, low fanout, and low velocity was a winning combination for a while. In that environment I felt connected but not over-connected, informed but not overloaded.

Twitter flipped things around completely. It wasn't just the loss of autonomy as ads and algos took over. With packets capped at 140 characters, and tweets potentially seen immediately by everyone, friction went nearly to zero. The architecture of low friction created an addictive experience and enabled powerful effects. But it wasn't conducive to healthy discourse.

The fediverse can, perhaps, strike a balance. Humans didn’t evolve to thrive in frictionless social networks with high fanout and velocity, and arguably we shouldn’t. We did evolve in networks governed by Dunbar’s number, and our online networks should respect that limit. We need less friction within communities of knowledge and practice, more friction between them. We want messages to fan out pervasively and rapidly within communities, but less so between them.

We’re at an extraordinary inflection point right now. Will the fediverse enable us to strike the right balance? I think it has the right architectural ingredients to land where I’ve (speculatively) placed it in that table. High autonomy. As little friction as necessary, but not too little. As much fanout and velocity as necessary, but not too much. Nobody knows how things will turn out, predictions are futile, behavior is emergent, but I am on the edge of my seat watching this all unfold.



Debuggable explanations

I’ve been reviewing Greg Wilson’s current book project, Software Design in Python. Like the earlier JavaScript-based Software Design by Example it’s a guided tour of tools, techniques, and components common to many software systems: testing frameworks, parsers, virtual machines, debuggers. Each chapter of each of these books shows how to build the simplest possible working version of one of these things.

Though I’ve used this stuff for most of my life, I’ve never studied it formally. How does an interpreter work? The chapter on interpreters explains the basic mechanism using a mixture of prose and code. When I read the chapter I can sort of understand what’s happening, but I’m not great at mental simulation of running code. I need to run the code in a debugger, set breakpoints, step through execution, and watch variables change. Then it sinks in.

The GitHub repo for the book includes all the text and all the code. I’d like to put them side-by-side, so that as I read the narrative I can run and debug the code that’s being described. Here’s how I’m doing that in VSCode.

This is pretty good! But it wasn’t dead simple to get there. In a clone of the repo, the steps included:

  • Find the HTML file for chapter 3.
  • Install a VSCode extension to preview HTML.
  • Find the code for chapter 3.
  • Adjust the code to not require command-line arguments.
  • Arrange the text and code in side-by-side panes.

Though it’s all doable, the activation threshold is high enough to thwart my best intention of repeating the steps for every chapter.

Whether in VSCode or another IDE or some other kind of app, what would be the best way to lower that activation threshold?

Curating the Studs Terkel archive

I can read much faster than I can listen, so I rarely listen to podcasts when I’m at home with screens on which to read. Instead I listen on long hikes when I want to shift gears, slow down, and absorb spoken words. Invariably some of those words strike me as particularly interesting, and I want to capture them. Yesterday, what stuck was these words from a 1975 Studs Terkel interview with Muhammad Ali:

Everybody’s watching me. I was rich. The world saw me, I had lawyers to fight it, I was getting credit for being a strong man. So that didn’t really mean nothing. What about, I admire the man that had to go to jail named Joe Brown or Sam Jones, who don’t nobody know who’s in the cell, you understand? Doing his time, he got no lawyers’ fees to pay. And when he get out, he won’t be praised for taking a stand. So he’s really stronger than me. I had the world watching me. I ain’t so great. I didn’t do nothing that’s so great. What about the little man don’t nobody know? He’s really the one.

I heard these words on an episode of Radio OpenSource about the Studs Terkel Radio Archive, an extraordinary compilation of (so far) about a third of his 5600 hours of interviews with nearly every notable person during the latter half of the twentieth century.

If you weren’t aware of him, the Radio OpenSource episode, entitled Studs Terkel’s Feeling Tone, is the perfect introduction. And it’s delightful to hear one great interviewer, Chris Lydon, discuss with his panel of experts the interviewing style of perhaps the greatest interviewer ever.

Because I’d heard Muhammad Ali’s words on Radio OpenSource, I could have captured them in the usual way. I always remember where I was when I heard a segment of interest. If that was 2/3 of the way through my hike, I’ll find the audio at the 2/3 mark on the timeline. I made a tool to help me capture and share a link to that segment, but it’s a clumsy solution.

What you’d really rather do is search for the words in a transcript, select surrounding context, use that selection to define an audio segment, and share a link to both text and audio. That’s exactly what I did to produce this powerful link, courtesy of WFMT’s brilliant remixer, which captures both the written words I quoted above and the spoken words synced to them.

That’s a dream come true. Thank you! It’s so wonderful that I hesitate to ask, but … WFMT, can you please make the archive downloadable? I would listen to a lot of those interviews if I could put them in my pocket and take them with me on hikes. Then I could use your remixer to help curate the archive.

GitHub for English teachers

I’ve long imagined a tool that would enable a teacher to help students learn how to write and edit. In Thoughts in motion I explored what might be possible in Federated Wiki, a writing tool that keeps version history for each paragraph. I thought it could be extended to enable the kind of didactic editing I have in mind, but never found a way forward.

In How to write a press release I tried bending Google Docs to this purpose. To narrate the process of editing a press release, I dropped a sample release into a GDoc and captured a series of edits as named versions. Then I captured the versions as screenshots and combined them with narration, so the reader of the blog post can see each edit as a color-coded diff with an explanation.

The key enabler is GDoc's File -> Version history -> Name current version, along with File -> See version history's click-driven navigation of the set of diffs. It's easy to capture a sequence of editing steps that way.

But it’s much harder to present those steps as I do in the post. That required me to make, name, and organize a set of images, then link them to chunks of narration. It’s tedious work. And if you want to build something like this for students, that’s work you shouldn’t be doing. You just want to do the edits, narrate them, and share the result.

This week I tried a different approach when editing a document written by a colleague. Again the goal was not only to produce an edited version, but also to narrate the edits in a didactic way. In this case I tried bending GitHub to my purpose. I put the original doc in a repository, made step-by-step edits in a branch, and created a pull request. We were then able to review the pull request, step through the changes, and review each as a color-coded diff with an explanation. No screenshots had to be made, named, organized, or linked to the narration. I could focus all my attention on doing and narrating the edits. Perfect!

Well, perfect for someone like me who uses GitHub every day. If that’s not you, could this technique possibly work?

In GitHub for the rest of us I argued that GitHub’s superpowers could serve everyone, not just programmers. In retrospect I felt that I’d overstated the case. GitHub was, and remains, a tool that’s deeply optimized for programmers who create and review versioned source code. Other uses are possible, but awkward.

As an experiment, though, let's explore how awkward it would be to recreate my Google Docs example in GitHub. I will assume that you aren't a programmer, have never used GitHub, and don't know (or want to know) anything about branches or commits or pull requests. But you would like to be able to create a presentation that walks a learner through a sequence of edits, with step-by-step narration and color-coded diffs. At the end of this tutorial you'll know how to do that. The method isn't as straightforward as I wish it were. But I'll describe it carefully, so you can try it for yourself and decide whether it's practical.

Here’s the final result of the technique I’ll describe.

If you want to replicate that, and don’t already have a GitHub account, create one now and log in.

Ready to go? OK, let’s get started.

Step 1: Create a repository

Click the + button in the top right corner, then click New repository.

Here’s the next screen. All you must do here is name the repository, e.g. editing-step-by-step, then click Create repository. I’ve ticked the Add a README file box, and chosen the Apache 2.0 license, but you could leave the defaults — box unchecked, license None — as neither matters for our purpose here.

Step 2: Create a new file

On your GitHub home page, click the Repositories tab. Your new repo shows up first. Click its link to open it, then click the Add file dropdown and choose Create new file. Here’s where you land.

Step 3: Add the original text, create a new branch, commit the change, and create a pull request

What happens on the next screen is bewildering, but I will spare you the details because I’m assuming you don’t want to know about branches or commits or pull requests, you just want to build the kind of presentation I’ve promised you can. So, just follow this recipe.

  • Name the file (e.g. sample-press-release.txt)
  • Copy/paste the text of the document into the edit box
  • Select Create a new branch for this commit and start a pull request
  • Name the branch (e.g. edits)
  • Click Propose new file

On the next screen, title the pull request (e.g. edit the press release) and click Create pull request.

Step 4: Visit the new branch and begin editing

On the home page of your repo, use the main dropdown to open the list of branches. There are now two: main and edits. Select edits.

Here’s the next screen.

Click the name of the document you created (e.g. sample-press-release.txt) to open it.

Click the pencil icon’s dropdown, and select Edit this file.

Make and preview your first edit. Here, that’s my initial rewrite of the headline. I’ve written a title for the commit (Step 1: revise headline), and I’ve added a detailed explanation in the box below the title. You can see the color-coded diff above, and the rationale for the change below.

Click Commit changes, and you’re back in the editor ready to make the next change.

Step 5: Visit the pull request to review the change

On your repo’s home page (e.g. https://github.com/judell/editing-step-by-step), click the Pull requests button. You’ll land here.

Click the name of the pull request (e.g. edit the press release) to open it. In the rightmost column you’ll see links with alphanumeric labels.

Click the first one of those to land here.

This is the first commit, the one that added the original text. Now click Next to review the first change.

This, finally, is the effect we want to create: a granular edit, with an explanation and a color-coded diff, encapsulated in a link that you can give to a learner who can then click Next to step through a series of narrated edits.

Lather, rinse, repeat

To continue building the presentation, repeat Step 4 (above) once per edit. I’m doing that now.

… time passes …

OK, done. Here’s the final edited copy. To step through the edits, start here and use the Next button to advance step-by-step.

If this were a software project you’d merge the edits branch into the main branch and close the pull request. But you don’t need to worry about any of that. The edits branch, with its open pull request, is the final product, and the link to the first commit in the pull request is how you make it available to a learner who wants to review the presentation.

GitHub enables what I’ve shown here by wrapping the byzantine complexity of the underlying tool, Git, in a much friendlier interface. But what’s friendly to a programmer is still pretty overwhelming for an English teacher. I still envision another layer of packaging that would make this technique simpler for teachers and learners focused on the craft of writing and editing. Meanwhile, though, it’s possible to use GitHub to achieve a compelling result. Is it practical? That’s not for me to say, I’m way past being able to see this stuff through the eyes of a beginner. But if that’s you, and you’re motivated to give this a try, I would love to know whether you’re able to follow this recipe, and if so whether you think it could help you to help learners become better writers and editors.

How to rewrite a press release: a step-by-step guide

As a teaching fellow in grad school I helped undergrads improve their expository writing. Some were engineers, and I invited them to think about writing and editing prose in the same ways they thought about writing and editing code. Similar rules apply, with different names. Strunk and White say "omit needless words"; coders say "DRY" (don't repeat yourself). Writers edit; coders refactor. I encouraged students to think about writing and editing prose not as a creative act (though it is one, as is coding) but rather as a method governed by rules that are straightforward to learn and mechanical to apply.

This week I applied those rules to an internal document that announces new software features. It’s been a long time since I’ve explained the method, and thanks to a prompt from Greg Wilson I’ll give it a try using another tech announcement I picked at random. Here is the original version.

I captured the transformations in a series of steps, and named each step in the version history of a Google Doc.

Step 1

The rewritten headline applies the following rules.

Lead with key benefits. The release features two: support for diplex-matched antennas and faster workflow. The original headline mentions only the first; I added the second.

Clarify modifiers. A phrase like “diplex matched antennas” is ambiguous. Does “matched” modify “diplex” or “antennas”? The domain is unfamiliar to me, but I suspected it should be “diplex-matched” and a web search confirmed that hunch.

Omit needless words. The idea of faster workflow appears in the original first paragraph as “new efficiencies aimed at streamlining antenna design workflows and shortening design cycles.” That’s a long, complicated, yet vague way of saying “enables designers to work faster.”

Step 2

The original lead paragraph was now just a verbose recap of the headline. So poof, gone.

Step 3

The original second paragraph, now the lead, needed a bit of tightening. Rules in play here:

Strengthen verbs. “NOUN is a NOUN that VERBs” weakens the verb. “NOUN, a NOUN, VERBs” makes it stronger.

Clarify modifiers. “matching network analysis” -> “matching-network analysis”. (As I look at it again now, I’d revise to “analysis of matching networks.”)

Break up long, weakly-linked sentences. The original was really two sentences linked weakly by “making it,” so I split them.

Omit needless words. A word that adds nothing, like “applications” here, weakens a sentence.

Strengthen parallelism. If you say “It’s ideal for X and Y” there’s no problem. But when X becomes “complex antenna designs that involve multi-state and multi-port aperture or impedance tuners,” and Y becomes “corporate feed networks with digital phase shifters,” then it helps to make the parallelism explicit: “It’s ideal for X and for Y.”

Step 4

Omit needless words. “builds on the previous framework with additional” -> “adds”.

Simplify. “capability to connect” -> “ability to connect”.

Show, don’t tell. A phrase like “time-saving options in the schematic editor’s interface” tells us that designers save time but doesn’t show us how. That comes next: “the capability to connect two voltage sources to a single antenna improves workflow efficiency.” The revision cites that as a shortcut.

Activate the sentence. “System and radiation efficiencies … can be effortlessly computed from a single schematic” makes efficiencies the subject and buries the agent (the designer) who computes them. The revision activates that passive construction. Similar rules govern the rewrite of the next paragraph.

Step 5

When I reread the original fourth paragraph I realized that the release wasn’t only touting faster workflow, but also better collaboration. So I adjusted the headline accordingly.

Step 6

Show, don’t tell. The original version tells, the new one shows.

Simplify. “streamline user input” -> “saves keystrokes” (which I might further revise to “clicks and keystrokes”).

Final result

Here’s the result of these changes.

I haven’t fully explained each step, and because the domain is unfamiliar I’ve likely missed some nuance. But I’m certain that the final version is clearer and more effective. I hope this step-by-step narration helps you see how and why the method works.

The Velvet Bandit’s COVID series

The Velvet Bandit is a local street artist whose work I’ve admired for several years. Her 15 minutes of fame happened last year when, as reported by the Press Democrat, Alexandria Ocasio-Cortez wore a gown with a “Tax the Rich” message that closely resembled a VB design.

I have particularly enjoyed a collection that I think of as the Velvet Bandit’s COVID series, which appeared on the boarded-up windows of the former Economy Inn here in Santa Rosa. The building is now under active renovation and the installation won’t last much longer, so I photographed it today and made a slideshow.

I like this image especially, though I have no idea what it means.

If you would like to buy some of her work, it’s available here. I gather sales have been brisk since l’affaire AOC!