The Image of Postgres

At the 2015 Postgres conference, the great IT philosopher Robert r0ml Lefkowitz delivered a talk entitled The Image of Postgres. Here’s the blurb.

How do you think about what a database is? If you think of a database as only a place to store your data, then perhaps it does not really matter what the internals of that database are; all you really need is a home for your data to be managed, nothing more.

If you think of a database as a place where you develop applications, then your expectations of your database software change. No longer do you only need data management capabilities, but you require processing functions, the ability to load in additional libraries, interface with other databases, and perhaps even additional language support.

If your database is just for storage, there are plenty of options. If your database is your development framework, you need Postgres.

Why? Well, let’s get philosophical.

For over a year, I’ve been using Postgres as a development framework. In addition to the core Postgres server that stores all the Hypothesis user, group, and annotation data, there’s now also a separate Postgres server that provides an interpretive layer on top of the raw data. It synthesizes and caches product- and business-relevant views, using a combination of PL/pgSQL and PL/Python. Data and business logic share a common environment. Although I didn’t make the connection until I watched r0ml’s talk, this setup harkens back to the 1980s when Smalltalk (and Lisp, and APL) were programming environments with built-in persistence. The “image” in r0ml’s title refers to the Smalltalk image, i.e. the contents of the Smalltalk virtual machine. It may also connote reputation, in the sense that our image of Postgres isn’t that of a Smalltalk-like environment, though r0ml thinks it should be, and my experience so far leads me to agree.

I started writing a book to document what I’ve learned and done with this idea. It’s been a struggle to find motivation because, well, being the patron saint of trailing-edge technologies is often lonely and unrewarding. A book on this particular topic is likely to appeal to very few people. Stored procedures? So last century! Yes, Python provides a modern hook, but I can almost guarantee that one of the comments on my first book — “has a vision, but too weird” — would come back around.

I’m tempted not to bother. Maybe I should just focus on completing and polishing the things the book would describe.

And yet, it’s hard to let go. This isn’t just a compelling idea, it’s delivering excellent results. I rewatched r0ml’s talk today and got fired up again. Does it resonate for you? Would you like to see the ideas developed? If you watch the talk, please let me know.

Here are some excerpts to pique your interest.

On databases vs file systems:

I submit that the difference between the database and a file system is that database is a platform for enforcing your business rules.

On ontology:

client: The business guys are always upset because they want to know how many customers we have and we can’t tell them.

r0ml: That doesn’t sound like a very hard problem. SELECT * from the customer table, right?

client: No you don’t understand the problem.

r0ml: OK, what’s the problem?

client: It depends what you mean by customer because if you’re selling cell phone insurance, is the customer the person who has the cell phone account? What if they have two handsets and they’re both insured? What if it’s a family account and there are kids on the plan, do they count as customers? What if it’s a business account and you have 1000 people covered but only 700 using?

r0ml: How my customers you have, that’s a business rule.

So figuring out what your schema is, and figuring out how you organize the stuff and what do you do in the database, that’s all part of enforcing your business rules.

You have to decide what these things mean.

It’s an ontological problem.

You have to classify your knowledge and then enforce your business rules.

On n-tier architecture:

Let us think about the typical web application architecture. This architecture is called the three-tier architecture because it has four tiers. You have your browser, your web server, the thing that runs Python or PHP or JavaScript or Ruby or Java code, and then the database. And that’s always how you do it. And why do you do it that way? Well because that’s how everybody does it.

On Smalltalk and friends:

This is the BYTE magazine cover from August of 1981. In the 70s and the 80s, programming languages had this sort of unique perspective that’s completely lost to history. The way it worked: a programming environment was a virtual machine image, it was a complete copy of your entire virtual machine memory and that was called the image. And then you loaded that up and it had all your functions and your data in it, and then you ran that for a while until you were sort of done and then you saved it out. And this wasn’t just Smalltalk, Lisp worked that way, APL worked that way, it was kind of like Docker only it wasn’t a separate thing because everything worked that way and so you didn’t worry very much about persistence because it was implied. If you had a programming environment it saved everything that you were doing in the programming environment, you didn’t have to separate that part out. A programming environment was a place where you kept all your data and business logic forever.

So then Postgres is kind of like Smalltalk only different.

What’s the difference? Well we took the UI out of Smalltalk and put it in the browser. The rest of it is the same, so really Postgres is an application delivery platform, just like we had back in the 80s.

Posted in .

12 thoughts on “The Image of Postgres

  1. I’d submit that r0ml still doesn’t understand the number-of-customers problem.

    The problem isn’t that definitions need to be made. That’s relatively quick work. The problem’s that the real customer types will sprout and die all the time, and whoever’s maintaining the database can’t be reorganizing the whole thing as it grows organically in complexity, regulations change, etc. Going by sales category won’t work well, either, because that won’t cover sales’ need to come up with bespoke instances (say, there’s an opportunity to sell phone insurance to a school that’s buying all its teachers phones for use in school-specific circumstances). You can’t tell the sales people “stop selling bespoke phone insurance contracts because you’re making our customer-sorting work difficult”. There’s also a span between the time that a contract type is bespoke and ordinary, and in that time, how are you sorting those customers?

    So the real question is “How do we have a database that can follow an adventitiously-developing business closely and organically, so that we can continue to tell our business people how many customers we have and what kind?” Which is probably a more interesting one.

    The other problem is an arena in which you have “business guys” rather than “business people”.

    cheers,

    amy

  2. > So the real question is …

    I agree completely. The approach I’m taking feels like one good way to achieve the necessary agility.

    > business guys

    Agree there too.

  3. This is great, actually. I’m just finishing getting the syllabus together for a course to do with teaching scientists & writers to work togther in industry, and an important thing here is the ability to hear the question that isn’t asked because the asker doesn’t know enough to be able to frame it, or maybe to think of it at all. (The Passover Hagaddah also covers this circumstance.) Retail trains you well for that; academia, not so much. Do you mind if I use this exchange/post?

    amy

  4. Anything you did to develop/write a treatment for the book, I would totally read. And mostly because as “fashion trends” in I.T. go, more voices = more better. I say this as someone who kinda lazily followed the so-called short lived revolution of NoSQL, big data from Hadoop, to sharding, to all the innovations meant to allow these big files to be useful in some way (not just farming it as AI/ML data sets). That’s a pretty rarified, elite usage, like all the tech surrounding Formula 1 racing, or launching rockets into space. Meanwhile the rest of us have jobs, and show up, and “do the work”. We’re not architects, designers. So anything you have that sheds light on that process, and adds another voice to the mix is valuable to me.

  5. Furthermore, alone the lines of the Postgres presentation by r0lm, everything I ever needed to know about Object Databases, I learned, or had initial contact with by reading Byte magazine: https://en.wikipedia.org/wiki/Object_database. Prior to Wikipedia existing, I collected links to every manufacturer’s datasheet, PDF support page, anything I could because I felt like object databases were handling, well “objects” that were not as well managed in a simple file system as such. Eventually I learned about the Z-Object Database and the “Zope” app server, which was a wonderful playground to try and learn about these things (dBs and App servers) on a desktop scale environment. Never, was I worried about “scale” just discovering and playing, and applying what I learned to work situations. And it was great for that. https://en.wikipedia.org/wiki/Zope

    1. What’s wonderful is that Postgres (and other SQL engines) have now married relational and object (in the form of JSON) strategies.

      Markus Winand, Modern SQL in PostgreSQL, https://www.youtube.com/watch?v=nFfS1HmiWCM, is a nice overview of how “this ain’t your grandfather’s SQL” any more — although many don’t seem to acknowledge that.

  6. Thank you for posting the link to the r0ml talk, as it is one I had missed but had heard about.

    Regarding a book treatment I would really like that. I have a handful of book about concepts of building things that I have read and keep around (your Groupware book is one that I have kept close as it helped with early framing of group interactive platforms I was building from scratch (not in Perl though). But, this type of book I find continually helpful and become insightful and helpful for understanding future projects even though I didn’t have the experience, the learnings from it I do have.

    It is the lessons learned, models for understanding and approaching things, and new patterns that I find helpful.

    1. A challenge is how to present concepts and patterns grounded in very specific real-world examples, while at the same time making a case for their generality.

  7. I find stored procedures far from trailing edge. They are particularly useful when you have multiple applications written in multiple languages that need a consistent way to execute very specific data manipulations.

    Not using nearly as many of them as I used to, but a few strategically deployed embedded procedures and database triggers allows you to add an amazing amount of functionality.

    The biggest issue I’ve always had with embedded pgsql is version control. Recent projects have used Alembic to maintain database versioning, including triggers and stored procedures.

    (Now, we could talk about that last Oracle project on which I worked where we wrote thousands of lines of Oracle PL/SQL code.)

    So glad to be focusing on PostgreSQL these days.

    1. > embedded pgsql

      And along with plpgsql, plpython. Having Python available in that context feels like a game-changer. It doesn’t replace plpgsql, which is the best choice in various circumstances, but it opens up a whole new universe of capability that lives next to data.

      > version control

      Yes, and the problem is compounded for us because our system is a mixture of views and functions in Postgres, and queries managed in Metabase (which, if you haven’t met it, is awesome).

      It’s straightforward to manage the views and functions in GitHub. Metabase is trickier because it stores query text in its db, so we have to extract that in order to manage versioned queries in GitHub.

    2. Stored procedures (really, for the most part, functions) are part of the story. The bigger story I’ve been reaching to articulate is about the shift away from, and now back toward, databases as intelligent processors vs dumb storage.

      So far, https://theartofpostgresql.com/ is the best articulation I’ve found. Among many other things, he shows how there’s no reason to reject the strategy for failing to meet expectations around version control and testing, because you absolutely can meet those expectations.

Leave a Reply