Recently I’ve posted two examples[1, 2] of Python idioms alongside corresponding C# idioms. It always intrigues me to look at the same concept through different lenses, and it seems to intrigue others as well, so here’s a third installment.
Today’s example comes from a real scenario. I’ve recently added a feature to the elmcity service that enables curators to control their hubs by sending Twitter direct messages to the service. One method, GetDirectMessagesFromTwitter, calls the Twitter API and returns a list of direct messages sent to the elmcity service. Another method, GetDirectMessagesFromAzure, calls the Azure table storage API and returns a list of direct messages stored there. The difference between the two lists — if any — represents new messages to be processed.
Here’s one take on Python and C# idioms for finding the difference between two lists:
Python:
fetched_messages = GetDirectMessagesFromTwitter()
stored_messages = GetDirectMessagesFromAzure()
diff = set(fetched_messages) - set(stored_messages)
return list(diff)
C#:
var fetched_messages = GetDirectMessagesFromTwitter();
var stored_messages = GetDirectMessagesFromAzure();
var diff = fetched_messages.Except(stored_messages);
return diff.ToList();
I can’t decide which one I prefer. Python’s set arithmetic is mathematically pure. But C#’s noun-verb syntax is appealing too. Which do you prefer? And why?
PS: The Python example above is slightly concocted. It won’t work as shown here because I’m modeling Twitter direct messages as .NET objects. IronPython can use those objects, but the set subtraction fails because the objects returned from the two API calls aren’t directly comparable.
A real working example would add something like this:
fetched_message_sigs = [x.text+x.datetime for x in fetched_messages]
stored_message_sigs = [x.text+x.datetime for x in stored_messages]
diff = list(set(fetched_message_sigs) - set(stored_message_sigs))
But that’s a detail that would only obscure the side-by-side comparison I’m making here.
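For readers who want the message objects back rather than just their signatures, here is a minimal sketch of the same workaround. The `Message` class, its `text` and `datetime` fields, and the sample values are stand-ins assumed for illustration; they are not the elmcity service's actual types or data.

```python
class Message(object):
    """Hypothetical stand-in for a direct message object (no __eq__ or __hash__ defined)."""
    def __init__(self, text, datetime):
        self.text = text
        self.datetime = datetime


def signature(message):
    # Two messages count as the same if their text and timestamp match.
    return (message.text, message.datetime)


def new_messages(fetched_messages, stored_messages):
    """Return the fetched messages whose signature is not among the stored ones."""
    stored_sigs = set(signature(m) for m in stored_messages)
    return [m for m in fetched_messages if signature(m) not in stored_sigs]


# Made-up sample data for the sketch.
fetched = [Message("add feed", "2009-10-20 09:00"), Message("start", "2009-10-21 10:30")]
stored = [Message("add feed", "2009-10-20 09:00")]

# Plain set subtraction falls back to object identity here, so nothing is removed.
naive_diff = set(fetched) - set(stored)

# Comparing by signature finds the one genuinely new message.
diff = new_messages(fetched, stored)
```

Using a tuple as the signature, rather than concatenating the fields, avoids accidental collisions and works even when the timestamp is not a string.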
5 thoughts on “More Python and C# idioms: Finding the difference between two lists”
Python sets are wonderful, but they can be a little deceiving, particularly in the memory usage department. They also require that contained objects be hashable (in practice, immutable, with meaningful equality and hash values).
For these reasons, and especially for larger collections that are approaching memory limits, the merge algorithm is a great alternative. http://pastie.org/666041 is an implementation I use for diffing large lists of integers in a Google App Engine application (memory limit ~60 MB). This algorithm only requires memory to store the differences, rather than also including intermediary sets.
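As a rough illustration of the merge idea (a sketch only, not the linked implementation), the function below walks two inputs in step. It assumes both inputs are already sorted in ascending order; the name `sorted_difference` is made up for this example.

```python
def sorted_difference(a, b):
    """Yield items of sorted iterable a that do not appear in sorted iterable b.

    Assumes both inputs are sorted ascending. No intermediate sets are built;
    only one item from each input is held at a time.
    """
    b_iter = iter(b)
    sentinel = object()  # marks the end of b
    current_b = next(b_iter, sentinel)
    for item in a:
        # Advance b until it catches up with the current item from a.
        while current_b is not sentinel and current_b < item:
            current_b = next(b_iter, sentinel)
        if current_b is sentinel or item != current_b:
            yield item


fetched_ids = [1, 2, 3, 5, 8]
stored_ids = [2, 3, 4, 8, 9]
new_ids = list(sorted_difference(fetched_ids, stored_ids))
```

Because it is a generator, the caller can also consume the differences one at a time instead of collecting them into a list.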
Another useful set-related trick is, if you use set’s methods rather than its operator overloads, you may directly pass any sequence as the right operand without first converting it to a set:
set(L1).difference(L2) instead of set(L1) - set(L2)
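The method form can be sketched in standard Python as follows. The sample values in `L1` and `L2` are invented for illustration.

```python
L1 = [1, 2, 3, 4]
L2 = (3, 4, 5)  # any iterable works as the argument: a tuple, a list, a generator

# Method form: the argument can be any iterable, no set() conversion needed.
via_method = set(L1).difference(L2)

# Operator form: both operands must already be sets.
via_operator = set(L1) - set(L2)
```

Both forms give the same result. The operator form raises a TypeError if the right operand is not a set.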
Oh, so there’s the noun-verb pattern. Thanks!
The OO syntax (noun.verb) is great when there’s one noun. When there are zero or two or more it doesn’t work as nicely, and the functional syntax (verb (noun, noun, noun)) works better. When there’s just one noun, you can dispatch on the type of that, which also means you can control namespaces.
For dispatching, the alternative that isn’t limited to one noun is multimethods, though they haven’t really taken off. For namespaces the alternative hasn’t been explored as much, but Koenig lookup (argument-dependent lookup) in C++ is a stab at it.
I think we should also look at sentences in spoken languages. We tend to always have a verb, and then we have some nouns. So the functional approach seems like a better match. However at least in English the verb often occurs after the subject, so noun.verb is a better order.
Noun.verb also gives better completion in IDEs, because the noun is drawn from a limited namespace (locals and globals) and then the verb can also be drawn from a limited namespace (only the methods defined on the type of the noun).
As for my preference, I am not a big fan of uniformity. I like things to look different syntactically so that I can easily distinguish them when scanning code. I don’t have a strong opinion for set difference though.
I like the convenience of C#’s “.Except” – someone in a hurry might not think of the sets immediately, but would certainly bother to look at the list’s methods.
Of course, this only works because you’re using the built-in list type which has rich features. The Python approach works on pretty much anything you can iterate over, which is handy for those of us who sometimes use OTHER data structures besides the built-in ones.
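To illustrate the point about iterables, here is a small sketch in which neither input is a list. The generator `fetched_ids` and the dictionary `stored_ids` are hypothetical examples.

```python
def fetched_ids():
    # A generator: nothing list-like about it, but set() can consume it.
    for n in (1, 2, 3, 4):
        yield n


# Iterating over a dict yields its keys.
stored_ids = {2: "two", 4: "four"}

diff = set(fetched_ids()) - set(stored_ids)
```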
quite good article.