IronPython/Azure status report

As I mentioned here, I’m exploring the viability of Python as a way of programming the newly-announced Microsoft cloud platform, Azure. Partly that’s because I love Python, but mainly it’s because I believe that the culture surrounding Python and other open source dynamic languages can fruitfully cross-pollinate with the culture that infuses Microsoft’s platforms.

One of the reasons these cultures face each other across a great divide is religious attachment to low-level operating systems. In the cloud, though, the differences among these low-level systems are increasingly hidden behind interfaces to higher-order constructs: compute nodes, storage objects. These, in turn, are building blocks for still-higher-order services that will be created — and consumed — both by platform vendors and by the developers who are their customers.

It becomes possible, in this new world, for platforms to support a continuum of access styles. You want object-oriented? Do it that way. RESTful? Go for it. You know the Python or Ruby libraries best? Use them. The .NET Framework? Use that. Or even mix and match according to convenience and taste.

Consider this Python module written by Sriram Krishnan, which wraps the RESTful interface to Azure blobs. It’s written in standard Python, using OpenSSL-based cryptography. When I tried it on my machine, though, I ran into an inconsistency in my local Python installation.

Normally a Python developer would debug and fix the installation. But I was planning to deploy this module in IronPython on Azure, and IronPython doesn’t run compiled modules such as OpenSSL. It can, of course, use equivalent .NET functionality — in this case, the method implementing the SHA-256 flavor of keyed-Hash Message Authentication Code. So I made that small change.
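For readers who haven't met keyed-hash signing, here is a minimal sketch of the computation itself, written against the standard library of a modern CPython rather than the .NET method I actually called. The function name, the base64-encoded key, and the string being signed are illustrative assumptions, not the wrapper's real request format.

```python
import base64
import hashlib
import hmac


def sign(key_b64, string_to_sign):
    """Return the base64-encoded HMAC-SHA256 of string_to_sign.

    key_b64 is assumed to be a base64-encoded shared secret;
    that encoding is an assumption made for this sketch.
    """
    key = base64.b64decode(key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical key and message, purely for illustration.
demo_key = base64.b64encode(b"not-a-real-secret").decode("ascii")
signature = sign(demo_key, "GET\n/some/blob")
```

The point is only that the signing step is a few lines in either world, so swapping one implementation for the other is a small change.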

At this point, having eliminated my module’s only dependency on unmanaged code, I thought I could run it in the Azure development fabric, and then deploy it to the Azure cloud. But no. Azure’s security model currently won’t allow Python even to import pure-Python modules at runtime. A wacky solution might be to use Python’s custom import mechanism to load those modules over the network. More practically, the modules might be provisioned into Azure.
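To make the wacky solution concrete, here is a rough sketch of a custom import hook in a modern CPython 3. It is not something I ran on Azure, and I have no idea whether the security model would tolerate it. An in-memory dictionary stands in for the network fetch, and the class and module names are made up for the example.

```python
import importlib.abc
import importlib.util
import sys


class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds and loads modules whose source lives in a dict.

    The dict is a stand-in for source text fetched over the network.
    """

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path, target=None):
        # Claim only the modules we hold source for.
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        # Returning None asks Python to create a default module object.
        return None

    def exec_module(self, module):
        source = self.sources[module.__name__]
        code = compile(source, "<fetched:%s>" % module.__name__, "exec")
        exec(code, module.__dict__)


# Register the finder, then import a module that exists nowhere on disk.
sys.meta_path.append(DictFinder({"remote_demo": "GREETING = 'Hello from a fetched module'\n"}))
import remote_demo

print(remote_demo.GREETING)
```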

I don’t know how this will play out. Meanwhile, there’s another option: Eliminate all use of Python modules, and rely only on the .NET Framework. So as an experiment, I switched over from Python’s minidom, httplib, time, and base64 modules to their .NET equivalents.

The good news is that this works. I can deploy the module to Azure, and use it in the cloud. The bad news is that, in some cases, I’d rather use the standard Python modules. The .NET equivalent to Python’s httplib, for example, is the HttpWebRequest/HttpWebResponse pair. But these APIs differ from those provided by httplib in a couple of ways that annoy me.

First, there’s an inconsistency in the way headers are handled. You get and set most headers using the Headers collection. But you get and set a few special ones, like Content-Type and Content-Length, using special named properties.

Second, status codes are handled inconsistently. Most responses return status codes. But for error codes, such as those in the 4xx series, an exception is thrown instead.

To me these behaviors are quirks that make it trickier to create RESTful interfaces. I’m sure there are reasons for them, and people who prefer them for those reasons, but I’d rather just use httplib. In any case, if both styles are available, there’s no need to argue. Everybody gets what they need.
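Here is a small sketch of the style I prefer, using http.client, the Python 3 successor to httplib. The throwaway local server that always answers 404 is an assumption made so the example is self-contained. Every request header goes into one dictionary, every response header comes back through the same call, and the 404 arrives as an ordinary status value rather than as an exception.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class NotFoundHandler(BaseHTTPRequestHandler):
    """A throwaway handler that answers every GET with 404."""

    def do_GET(self):
        self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Keep the demo quiet.
        pass


def fetch_status(host, port, path):
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        # Content-Type is set like any other header.
        conn.request("GET", path, headers={"Content-Type": "text/plain"})
        response = conn.getresponse()
        response.read()
        # A 4xx response is returned, not raised.
        return response.status, response.getheader("Content-Type")
    finally:
        conn.close()


def demo():
    server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_status("127.0.0.1", server.server_port, "/missing")
    finally:
        server.shutdown()
        server.server_close()
```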

We’re not there yet in the current Azure preview. Those of us chomping at the bit to run IronPython in the cloud will have to be inventive. I expect things will get easier as both Azure and IronPython mature, and as Python technologies like Django and NWSGI are — I hope — woven into the fabric.

Why might this matter? Again, I’m looking for cross-pollination. Python culture will be able to make really productive use of higher-order Azure services such as identity, access control, workflow, Live Services. And it will also exert a positive influence on the future evolution of the Azure platform.

Carl Hewitt on cloud computing, scalable semantics, and Wikipedia

It was my great privilege to interview Carl Hewitt for this week’s Innovators show. He is principally known for work dating back to the late 1960s and early 1970s, when he helped lay the foundations for a declarative, message-oriented model of computation. Then, and for decades thereafter, the virtues of that model were not widely appreciated because the problems it solves were not evident. Now, in an era of multi-core systems, cloud-based computation, and global interconnectivity, it makes all kinds of sense.

In this conversation, we review the themes Carl sounded in this recent talk at Stanford. (Video is here, and an audio-only version I made for myself is here.)

In one of the most striking moments in that talk, Carl says:

What can I change? Just me. For anything else, I send a message, I say please, and I hope for the best.

Then he laughs and adds:

Does this sound like some circumstances you are familiar with?

Having thought deeply, for 40 years, about the intersection of computation and human affairs, he has arrived at an elegant synthesis: The same organizational and communication patterns govern both realms. As well they should, since the two are now and forever intertwingled.

At the end of our conversation, we turn to Carl’s critique of Wikipedia. He raises important questions about how Wikipedia’s cadre of mostly-anonymous administrators, dedicated to the codification of conventional knowledge, comes into conflict with academics and researchers whose work pushes the boundaries and conflates the categories of conventional knowledge.

Visual numeracy for collective survival

In response to an item last week about regional sources of imported oil, @jesperfj wrote:

Not sure what to conclude? Do informed people like Udell really not know that?

I really didn’t. And the reaction to the item, plus my survey of friends and associates, tells me that while some informed people did, many did not.

From this, I know exactly what to conclude. Like all complex systems, our civilization is buggy. We need many eyes to make the bugs shallow, and there are all kinds of things that the brains behind those eyes can’t know a priori. But with the right kinds of mental prosthetics, we can learn rapidly and bootstrap ourselves into a position to reason effectively.

Data visualization is a crucially important mental prosthetic. But we’ve yet to evolve it much beyond the graphical equivalent of the wooden leg.

Consider this chart:

It’s a somewhat useful way to visualize the fact — counter-intuitive for many — that the Middle East ranks only third among suppliers of oil to the U.S. But here is a much more useful way to visualize the fact — intuitive for everyone — that the Middle East is where most of the world’s oil reserves exist:

What do you call this kind of projection, where country size is proportional to a variable? It’s the sort of wickedly effective graphical device that we should all want to be able to deploy, at a moment’s notice and with minimal effort, in order to make sense of data and reason about the world.

Like Tim Bray, I’m angry about “the financial professionals who paid themselves millions for driving the economy into a brick wall at high speed, then walked away while we pick up the pieces.” But I’m also angry at myself for visualizing, way too late, along with the rest of us, the magnitude of the giant pool of money and its constituent flows.

We could have seen more, seen better, and seen sooner. In many domains, as we go forward, we will have to.

Twine, del.icio.us, and event-driven service integration

Last week on Interviews with Innovators I spoke with Nova Spivack about Twine, a service that’s been variously described as the first mainstream semantic web application and “just del.icio.us 2.0”. You’ll find support for both points of view in my conversation with Nova. It’s true that, unlike del.icio.us and other comparable services, Twine is built squarely on top of what Nova calls a “semantic web stack.” But it’s hard to discern, in Twine’s current incarnation, just what that entails.

One of the bookmarks I imported into Twine, for example, is http://www.educause.edu/HEBCA/623. It’s the home page for an organization called the Higher Education Bridge Certification Authority (HEBCA). In Twine, the item shows up tagged as an Organization. That’s the kind of thing that you’d expect a semantically-aware service to do. But what does it mean for Twine to classify HEBCA as an Organization? It’s unclear. Here’s the offered link. It points to a small collection of items that mention HEBCA, but Twine does not “know” anything at all about HEBCA.

What our conversation revealed, though, is that my method of testing Twine, which involved importing all my del.icio.us bookmarks, was flawed. I had assumed, incorrectly, that Twine would absorb the bookmarked pages themselves. It will, but it doesn’t yet. Currently it absorbs only the del.icio.us metadata, such as title and link. If you want to find out what Twine’s linguistic and semantic analysis can do, you need to pump content into the system.

That’s easier said than done. The only API available for content injection is email. Twine materializes a private email address to which you can send items you want to post as private notes.

I spent a few minutes thinking about writing a script to automate the injection of items I bookmark on del.icio.us. It’s doable, of course, but only by dint of hackery that I would undertake grudgingly and normal folks would never imagine or attempt.
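The easy half of such a script, turning one bookmark into a mail message, might look roughly like this. It is a sketch only: the addresses are placeholders, how Twine interprets the subject and body is something I am assuming rather than something I know, and the hard parts (fetching the bookmarked page and actually sending the mail) are left out.

```python
from email.message import EmailMessage


def build_note(title, url, to_address, from_address):
    """Wrap one bookmark in a plain-text email message.

    Using the title as the subject and the URL as the body is an
    assumption about what the receiving service would want.
    """
    message = EmailMessage()
    message["To"] = to_address
    message["From"] = from_address
    message["Subject"] = title
    message.set_content(url + "\n")
    return message


# Placeholder addresses, not real ones.
note = build_note(
    "Higher Education Bridge Certification Authority",
    "http://www.educause.edu/HEBCA/623",
    "private-inbox@example.com",
    "me@example.com",
)
```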

This kind of integration will get a whole lot easier for everyone when the various services export events representing our actions within them. For example, a couple of weeks ago I reorganized my del.icio.us tagspace, adding the tag socialinformationmanagement to a group of otherwise-tagged items in order to emphasize that particular facet. And I tweeted:

Imagining new kind of FriendFeed event: “Jon Udell updated 9 del.icio.us bookmarks, adding the tag socialinformationmanagement”

In other words, when I perform a public action in some service — like bookmarking an item in del.icio.us, or even just retagging an existing item — the service posts an event on a topic to which other interested services subscribe. In this case, FriendFeed is the interested service. When I configure FriendFeed to monitor my del.icio.us account, it asks del.icio.us for the list of event types that it exports, and I choose which of those to display in FriendFeed.

Of course FriendFeed needn’t be only an event subscriber. It can and should be a publisher too. Another service should be able to ask FriendFeed for the list of event types it aggregates for me — bookmarking an item on del.icio.us, posting a photo on Flickr, adding a book to my LibraryThing library — and then subscribe to all or just some of those events.

(While we’re at it, I want a service that can not only subscribe to my aggregated event feed, but also take actions. One of the actions I’d configure would be: When Jon bookmarks a new item on del.icio.us, fetch the item and inject it into Twine using the specified secret email address.)
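The pattern described above can be sketched in a few lines. Nothing here is a real del.icio.us or FriendFeed API. The Service class, the event type names, and the payload shape are all invented for illustration: a service advertises the event types it exports, a subscriber picks the ones it cares about, and each published event is handed to the matching subscribers.

```python
class Service:
    """A hypothetical service that exports named event types."""

    def __init__(self, name, event_types):
        self.name = name
        self._subscribers = {event_type: [] for event_type in event_types}

    def event_types(self):
        # What a would-be subscriber asks for first.
        return sorted(self._subscribers)

    def subscribe(self, event_type, callback):
        if event_type not in self._subscribers:
            raise ValueError("unknown event type: %s" % event_type)
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        event = {"source": self.name, "type": event_type, "payload": payload}
        for callback in self._subscribers[event_type]:
            callback(event)
        return event


# The bookmarking service exports two event types.
bookmarks = Service("bookmarks", ["bookmark_added", "bookmark_retagged"])

# The aggregator chooses to display only retagging events.
feed = []
bookmarks.subscribe("bookmark_retagged", feed.append)

bookmarks.publish("bookmark_retagged", {"count": 9, "tag": "socialinformationmanagement"})
bookmarks.publish("bookmark_added", {"url": "http://www.educause.edu/HEBCA/623"})
```

An action-taking subscriber is just another callback: instead of appending to a feed, it would fetch the item and forward it somewhere else.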

Of course there’s nothing new here with respect to basic change notification. Weblogs.com has been doing that in the blog realm for many years. Now it’s time to generalize the mechanism across the range of services that manage various aspects of our online lives.

Where the oil comes from: Not from where I thought

At a party the other night, a friend mentioned that the country supplying us with the most oil is Canada. Maybe so, I said, but on a regional basis the Middle East dominates, right? He wasn’t sure, but didn’t think so. And it turns out he was right, at least according to the US Dept. of Energy data he sent me. That data says that the Middle East ranks third among our regional sources, behind North America and Africa.

Here’s the world overview for 2007 in thousands of barrels:

And here’s the regional breakdown:

Region          Thousands of barrels    Share
North America              1,648,765   33.56%
Africa                       980,231   19.95%
Middle East                  837,841   17.05%
South America                784,999   15.98%
Europe                       567,152   11.54%
Asia                          91,236    1.86%
Oceania                        2,774    0.06%

The links go to regional views where you can hover to reveal per-country numbers.
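Once the regional totals are in hand, the arithmetic is trivial. This sketch recomputes the shares from the regional figures listed above. The variable and function names are mine, and the numbers are simply copied from that breakdown.

```python
# 2007 imports by region, in thousands of barrels, as listed above.
REGIONAL_IMPORTS_2007 = {
    "North America": 1648765,
    "Africa": 980231,
    "Middle East": 837841,
    "South America": 784999,
    "Europe": 567152,
    "Asia": 91236,
    "Oceania": 2774,
}


def regional_shares(imports):
    """Return each region's percentage of the total, rounded to two places."""
    total = sum(imports.values())
    return {region: round(100.0 * amount / total, 2) for region, amount in imports.items()}


shares = regional_shares(REGIONAL_IMPORTS_2007)

# Regions ordered from largest supplier to smallest.
ranking = sorted(REGIONAL_IMPORTS_2007, key=REGIONAL_IMPORTS_2007.get, reverse=True)
```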

When I do these kinds of exercises, I’m always struck by two things. First, it amazes me how much of what we think we know is wrong. I was sure that the Middle East was the dominant regional source.

Second, I’m always a bit discouraged by how geeky you still have to be — even with the great online tools we have now — in order to pull answers to simple questions out of raw data. When my friend cited these numbers, the first thing I wanted to know was: How do they break down by region?

I wound up using Dabble DB because I happened to know that it includes all the necessary ingredients:

  • Can import tabular data from web pages
  • Can drop and rename columns in an imported table
  • Given a column with locations — countries, states, zipcodes — can map the corresponding columns
  • Can publish views for anybody to see

This was a huge leg up! But a lot of folks wouldn’t know about that tool. And even if they did, many wouldn’t overcome some of the remaining obstacles. For example:

  • Importing. There are a few different ways to grab data from a web page. You can have Dabble DB parse the page, or you can copy/paste. In this case, I wound up trying both and had better luck with the latter. But we’re still very much in an era when data published to the web is not really intended to be used as data. That first step can be a doozy.
  • Sharing. After pasting in the data and reducing the table to two columns (country names and 2007 imports in thousands of barrels), I had my answer. And if you were an authorized user of the application, I could have shared it with you. But in order to publish to the world, I had to produce a special URL. And then I realized a single one wouldn’t suffice. The shareable views aren’t interactive. You can’t drill down from the world overview to the Middle East segment. So I wound up having to create a view for each region and generate a URL for each view. Keeping track of all that was confusing even for me.

Still, I’m excited. We’re really close to the point where non-specialists will be able to find data online, ask questions of it, produce answers that bear on public policy issues, and share those answers online for review and discussion. A few more turns of the crank, and we’ll be there. And not a moment too soon.

Hello World

In July 1995 I wrote a column in BYTE with the same title as this blog post. It began:

One day this spring, an HTTP request popped out the back of my old Swan 386/25, rattled through our LAN, jumped across an X.25 link to BIX, negotiated its way through three major carriers and a dozen hosts, and made a final hop over a PPP link to its rendezvous with BYTE’s newborn Web server, an Alpha AXP 150 located just 2 feet from the Swan.

Thus began the project on which this column will report monthly. Its mission: To engage BYTE in direct electronic communication with the world, retool our content for digital deployment, and showcase emerging products, technologies, and ideas vital to these tasks. We don’t have all the answers yet — far from it. But we’re starting to learn how a company can provide and use Internet services in a safe, effective, maintainable, and profitable way.

Today I felt that same kind of excitement when I clicked on this URL:

http://elmcity.cloudapp.net

There isn’t much to see. But what happens behind the scenes is quite interesting to me. The URL hits a deployment in the Azure cloud where I’m hosting an IronPython runtime. Then it invokes that runtime on a file that contains this little Python program:

hello = "Hello world"

Finally, it gets back an object representing the result of that program, extracts the value of the hello variable, and pops it into the textbox.
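The real deployment does this from .NET code hosting the IronPython runtime, which I'm not showing here. As a rough stand-in, the same run-then-extract idea looks like this in plain Python, with a dictionary playing the role of the result object. The function name is made up for the sketch.

```python
def run_and_extract(source, variable):
    """Execute a small program and return one variable from its namespace.

    The dict stands in for the result object the host gets back.
    """
    namespace = {}
    exec(source, namespace)
    return namespace[variable]


program = 'hello = "Hello world"'
textbox_value = run_and_extract(program, "hello")
```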

This is the proof of concept I’ve been looking for. Now I can begin an experiment I’ve been wanting to do for a long time. I have an ongoing personal project, elmcity.info, about which I’ve written from time to time. It’s hosted at http://bluehost.com, it’s written in Python using Django, and it’s invoked by way of FastCGI.

Back in the BYTE era, I loved learning about the web by building out a live project, and explaining what I learned step by step. Now I want to explore, and document, what it’s like to build out another live project in the Azure cloud.

Could I do it in Amazon’s cloud? Sure. In fact I already did, as an experiment. And if it were cheaper to run there than on Bluehost, I’d currently be hosting elmcity.info on EC2 instead.

Could I do it in Google’s cloud? Not sure. I didn’t score an account there and can’t yet try. The interactive pieces of my application should slide nicely into AppEngine’s Django framework. But much of the work is done in long-running processes which I believe AppEngine doesn’t yet support.

In any case, it’s obvious why I’ll be focusing on Azure. I suspect, though, that my focus will be different than most. I’m not a hotshot .NET developer, just an average guy who can get some useful things done in environments that enable me to create small, simple, understandable programs, and do so in agile and dynamic ways. I think that Azure — admittedly nascent in its current form, as Ray Ozzie said at the PDC — can be such an environment. Let’s find out.

When the lights go on at the New York Times, our work can start

On election night, the most useful information display I found was the New York Times’ interactive election map. It’s another bravura performance from a team of talented designers and programmers who keep raising the bar. Back in May, two of them — Gabriel Dance and Shan Carter — joined me for a conversation about how they do this work, and why it matters.

Last week, the venture capitalist Tim Oren wrote an essay entitled The Newspaper Crash of 2009… And How You Can Help in which he argues:

The industry has abdicated its social function to support a well-informed electorate, and become a propaganda arm of the left. In so doing, they have sullied their brands and lost the trust of their readers. The economic consequences of this default of their value proposition are now becoming apparent. The Internet and an economic crisis together would be bad enough, but the industry has only itself to blame for the egregious behavior on display for the last few years, and at its worst right now.

And concludes:

When the lights go out at the New York Times, our work will be finished.

The newspaper industry has surely earned this kind of scathing criticism. And it may well fail to capitalize on the amazing opportunities for self-reinvention afforded by the Internet. But the Times is attracting an all-star team of information architects, interactive graphics designers, programmers, and media producers. And according to Gabriel Dance and Shan Carter, these folks are increasingly collaborating with reporters to marshal complex information in ways that make the newspaper’s stories deeper and more open to independent analysis and interpretation.

So I’ll say it differently: When the lights go on at the New York Times, our work can start.

My upcoming World Usability Day talk

Next Thursday is World Usability Day, a distributed event that will happen in lots of places. One of them is Putney, Vermont, not far from my home, where I’ll be speaking at the New England venue, Landmark College.

The program says:

A description of Jon’s talk is forthcoming, but we’ve asked him to help the audience further their thinking about the potential of video on the web in support of teaching and learning, as well as the importance of the structure behind the information with which we all work, exemplified by his work on compiling disparate web resources, as in his work on Keene-related events culled from the internet and viewable at elmcity.info/events.

Great suggestions! Video and structured data are very different domains. That creates a nice opportunity to talk about key underlying principles, and relate them to the practices of teaching and learning. So, here’s the blurb.

Title: Teaching users to be more usable teachers

Description:
Technologists and designers, including those who self-identify as usability professionals, think of themselves as creators of products and services for “the user” or “the consumer”. But as Eric von Hippel argues in Democratizing Innovation, producers and consumers are not, and never have been, distinct groups. At various times and in various contexts, we are all producers and consumers, teachers and learners, co-creators of products, services, experience, and knowledge.

We learn by imitating how good teachers think and act. Conversely, good teachers think and act in ways that inspire and reward imitation. In the era of peer production on peer networks, we can all be better teachers, more usable teachers, by thinking and behaving in ways that others can imitate easily and effectively. From this perspective, online video and structured data aren’t just new ways to distribute entertainment and information. They’re new environments for teaching and learning. Engineers and designers aren’t solely responsible for making these environments usable. We, the inhabitants, must make ourselves usable too.

This is going to be fun!

For Granicus, transparent democracy is just business as usual

This week’s Interviews with Innovators explores the Granicus solution for civic webcasting with CEO Tom Spengler. If you’re lucky enough to live in a city that is a Granicus client, you’re already familiar with how it works. If not, take a look at the Newport Beach, CA site. It’s a beautiful thing. You can see the video and minutes in a synchronized view, jump to the agenda items you care about, and view associated staff reports in context.

For citizens the benefit is clear. If you have access to these proceedings on cable TV — even random access with a DVR — it’s still a challenge to pinpoint a segment you care about. What’s more, there’s no way to form a URL that refers to that segment so you can share it, and so that online discussion about the segment can aggregate around that URL. Granicus gets it right. Agenda items define the natural set of RESTful resources for these meetings, and this system enables people to cite, bookmark, and link to those resources.

Behind the scenes the system enables the town clerk to annotate a copy of the minutes with timecodes, so that the data required for segmentation and synchronization is captured in realtime and available immediately upon conclusion of the meeting. That’s exactly the kind of pragmatic approach that will help make transparent democracy as ordinary and routine as it ought to be.
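I don't know how Granicus stores this data, so take the following as a guess at the general shape of the idea rather than a description of their system. If each agenda item carries the timecode at which it began, then finding the item that corresponds to any moment in the video is a simple lookup. The agenda entries and the function name are invented for the example.

```python
from bisect import bisect_right

# Hypothetical minutes: (start time in seconds, agenda item), sorted by time.
agenda = [
    (0, "Call to order"),
    (95, "Public comment"),
    (610, "Budget hearing"),
]


def item_at(timecodes, seconds):
    """Return the agenda item under discussion at the given playback time."""
    starts = [start for start, _ in timecodes]
    index = bisect_right(starts, seconds) - 1
    if index < 0:
        return None
    return timecodes[index][1]


current = item_at(agenda, 100)
```

Going the other way, from an agenda item to its start time, is what makes a linkable segment possible.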