I first wrote about PowerShell back in 2004, when it was called Monad and/or MSH. What most intrigued me was the way .NET objects could flow through its pipeline. I wouldn’t have thought then, or now, of reaching for PowerShell just to do some basic logfile processing. But Scott Hanselman’s log analysis example today got my attention.
My assumption was that Python would offer easier and more natural ways to absorb, consolidate, sort, and emit the kind of simple log data shown in Scott’s example. But when I tried to recreate that example in Python, I developed a new appreciation for PowerShell’s data munging chops. The selection, grouping, summing, and sorting capabilities — and the pervasive use of the pipeline — are potent ways to manipulate .NET objects. But they’re equally potent when you’re just munging a CSV file.
True, Python has a csv module that’s roughly equivalent to PowerShell’s Import-CSV, so you can easily slurp the data into a list of dictionaries, taking fieldnames from the first row. But how do you perform the selection, grouping, summing, and sorting as succinctly and, for lack of a better word, as composably? I didn’t find obvious answers. Of course, just because I failed doesn’t mean it can’t be done. I’d be curious to see solutions in Python (or Ruby, Perl, etc.) that people think capture the spirit of Scott’s example. I suspect there may be interesting lessons to be learned going in both directions.
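To make the comparison concrete, here is a minimal Python sketch of the group, sum, and sort steps using only the standard library. It is not a port of Scott’s one-liner: the column names Name and Hits, the sample rows, and the helper hits_by_name are all assumptions made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data. The column names "Name" and "Hits" are
# assumptions, not the actual layout of the log in Scott's example.
SAMPLE = """Name,Hits
show_a.mp3,10
show_b.mp3,4
show_a.mp3,7
show_c.mp3,4
"""


def hits_by_name(lines):
    """Group rows by Name, sum Hits per group, and sort by total descending."""
    totals = defaultdict(int)
    # DictReader takes field names from the first row, much like Import-CSV.
    for row in csv.DictReader(lines):
        totals[row["Name"]] += int(row["Hits"])
    # Sort by total (descending), then by name so ties come out in a fixed order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for name, hits in hits_by_name(io.StringIO(SAMPLE)):
        # Fixed-width columns keep both fields visible on a narrow display.
        print(f"{name:<12} {hits:>5}")
```

It works, but the grouping and summing are fused into one loop and the sort is a separate function call, so the steps don’t chain left to right the way a pipeline does. That may be the composability gap in a nutshell.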
PS: When I ran the script that Lee Holmes wrote based on Scott’s one-liner, I was initially puzzled to see a one-column display of names, not a two-column display of names and counts. Then I realized it was a formatting problem — the second column was running off the edge of my display. So I changed:
Get-ShowHits | Sort -Desc Hits
to:
Get-ShowHits | Sort -Desc Hits | Format-Table Name,Hits -auto