Tagging mechanisms and strategies part 3: Taxonomy and folksonomy

Should a tag namespace be a top-down taxonomy or a bottom-up folksonomy? My answer is: both. In recent months, as I curate calendar hubs for selected cities, I’ve been working toward an approach that harmonizes the two styles.

Principle: Top-down and bottom-up

In the elmcity context, the most important taggable object is the calendar feed. That’s because when you can characterize a whole feed with a tag, all the events in that feed inherit the tag. The primary sources of taggable feeds are Eventful, Upcoming, Facebook, Meetup, and EventBrite. I call them taggable because, while some of these services tag individual events, none tag feeds based on venues (Eventful, Upcoming) or Pages (Facebook) or groups (Meetup) or organizers (EventBrite). Assigning feed-level tags is an editorial exercise for the curator.
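The inheritance rule can be sketched in a few lines of Python. This is a minimal illustration, not the elmcity implementation: the `Feed` and `Event` classes, the `effective_tags` helper, and the sample feed name and event titles are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    title: str
    # Per-event tags, if the source service supplies any (often empty).
    tags: set = field(default_factory=set)


@dataclass
class Feed:
    name: str
    # Feed-level tags assigned by the curator.
    tags: set
    events: list = field(default_factory=list)


def effective_tags(feed, event):
    """An event's tags are its feed's tags plus any tags of its own."""
    return feed.tags | event.tags


# Hypothetical feed and events, using tags from the Boston vocabulary.
feed = Feed("Berklee events", {"berklee", "music", "university"})
feed.events.append(Event("Faculty recital"))
feed.events.append(Event("Ensemble night", {"jazz"}))

for ev in feed.events:
    print(ev.title, sorted(effective_tags(feed, ev)))
```

Tagging the feed once is enough: every event in it picks up berklee, music, and university without any per-event work.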

To these sources I add as many standalone iCalendar feeds as I can find. For Boston and Seattle, the results add up to lists of over 600 tagged iCalendar feeds. Here’s a table of the current list of tags for Boston, the current list for Seattle, and the intersection of the two lists.

Boston Tags
adoption 1
african 1
animals 28
arabic 1
architecture 38
art 222
asian 30
astronomy 13
baseball 94
basketball 78
berklee 285
boating 7
books 160
boston 7
boston.com 1020
boston.gov 395
boston-latin-hs 109
bpl 153
bu 101
business 191
cathedral-hs 15
children 73
church 469
climbing 23
comedy 174
comics 1
community 883
conferences 96
confernces 1
conflict-resolution 7
cycling 21
dance 156
dining 4
diving 18
dorchester 4
east-boston-hs 17
education 365
english 7
environment 12
european 4
eventbrite 452
eventful 2692
facebook 436
family 79
fashion 4
film 194
finance 2
fitness 2
food 115
football 1
french 3
games 33
german 3
government 42
green-technology 2
harvard 11
health 206
highschool 142
hiking 44
hispanic 1
history 156
hockey 35
indian 4
irish 1
islamic 15
italian 13
japanese 50
jazz 70
language 83
law 5
lectures 304
lgbt 1
library 378
martial-arts 47
massart 10
meditation 10
meetup 975
mensa 46
museum 94
music 2023
nature 91
networking 105
northeastern 167
performing-arts 222
philanthropy 1
philosophy 6
photography 53
poetry 10
politics 153
polyamory 10
portuguese 8
pub-crawl 4
recreation 201
running 59
sailing 7
science 151
seminars 10
simmons 15
social-justice 89
softball 1
south-boston-hs 1
spanish 1
spirituality 106
sports 590
statistics 2
suffolk 2
support 46
surfing 2
swimming 29
synagogue 5
technology 320
theater 90
tourism 43
tours 90
travel 3
umass 9
university 599
upcoming 212
visual-arts 72
volunteer 46
women 25
writing 17
ymca 4
yoga 27

Seattle Tags
africa 4
animals 26
aquarium 13
art 563
arts-and-crafts 14
ballet 4
basketball 31
beer 1
boating 9
books 201
business 62
business-and-technology 31
charity-and-volunteer 10
children 302
chinese 28
church 399
circus 23
cleveland-high 9
climbing 16
coffee 4
comedy 47
comics 1
community 574
conferences 65
cooking 3
dance 151
diving 8
dogs 3
education 103
environment 123
eventbrite 139
eventful 1996
facebook 216
fairs-and-festivals 13
film 136
finance 22
fitness 243
food 40
food-and-dining 26
games 114
garfield-high 12
german 13
government 145
gradeschool 17
green-technology 1
health 192
highschool 35
hiking 74
history 4
ingraham-high 1
insurance 1
italian 3
japanese 17
jazz 46
knitting 35
language 153
latin-american 30
lectures 101
lgbt 21
library 190
martial-arts 1
meetup 1107
museum 77
music 1223
native-american 20
nature 32
networking 54
nonprofit 4
nscc 1
opera 2
pacific-science-center 609
performing-arts 337
philosophy 2
photography 12
police 11
politics 31
real-estate 1
recreation 195
roosevelt-high 4
running 108
science 174
sculpture 1
seattle.gov 449
seattlepi 347
seattleu 12
seminars 23
skiing 2
spanish 19
spirituality 29
sports 151
storytelling 1
sustainability 5
swedish 73
synagogue 4
technology 98
teens 94
theater 166
tourism 11
town-hall-seattle 54
transportation 87
travel 9
trumba 296
university 390
upcoming 66
uw 366
vegan 4
visual-arts 89
volunteer 48
walk-bike-ride 3
walking 41
wallingford 159
wine 9
witches 13
women 26
writing 42
yoga 21
youth 105
Common Tags
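The common tags are the set intersection of the two city vocabularies. Here is a minimal Python sketch of that computation, using a handful of tag counts copied from the Boston and Seattle lists; the variable names are mine, and the real lists are of course much longer.

```python
# A few (tag, count) pairs copied from the Boston and Seattle lists.
boston = {"art": 222, "bu": 101, "hockey": 35, "jazz": 70, "university": 599}
seattle = {"art": 563, "jazz": 46, "knitting": 35, "university": 390, "uw": 366}

# Dictionary key views support set operations directly.
common = sorted(boston.keys() & seattle.keys())
only_boston = sorted(boston.keys() - seattle.keys())
only_seattle = sorted(seattle.keys() - boston.keys())

print("common:", common)
print("Boston only:", only_boston)
print("Seattle only:", only_seattle)
```

The two difference sets are as interesting as the intersection: they are where the city-specific tags (bu, uw) and candidates for the shared core (hockey, knitting) show up.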

Among the dynamics in play here, we can see the general and specific principle at work. For a general tag like university there are city-specific instantiations: bu and northeastern for Boston, uw and seattleu and nscc for Seattle. Likewise for the general tag highschool there are specific tags like boston-latin-hs and cathedral-hs for Boston, garfield-high and ingraham-high for Seattle.

These city-specific tags are top-down in the sense that I, as curator of the hub, have assigned them and made them part of the hub’s core tag vocabulary. But they are also bottom-up in the sense that they represent discoverable sources that are providing enough event flow to warrant such treatment.

These core hub vocabularies are fluid. As I move from hub to hub I’ve been keeping an eye on the common core and refactoring all the hub vocabularies as I go along. I also use these evolving hub vocabularies as templates against which to match vocabularies from other sources.

Mechanism: Tag matching

Some of the source services, notably Eventful and EventBrite, include per-event tags. When one of these tags matches a tag in the (evolving) core vocabulary for that hub, the elmcity service adds it to the list of tags that the event inherited from its feed.

There are also tables for each foreign service that map tags used there to tags in the hub’s core vocabulary. So, for example, the Eventful tag movies_film and the EventBrite tag movies both map to the core tag film.
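Here is a rough Python sketch of matching plus mapping. Only the two film mappings come from the text above; the `CORE` set, the `SERVICE_MAPS` layout, the `match_tags` function, and the foreign tag nightlife are assumptions made for illustration.

```python
# A small, hypothetical slice of a hub's core vocabulary.
CORE = {"film", "jazz", "music", "sports"}

# Per-service tables mapping foreign tags to core tags.
SERVICE_MAPS = {
    "eventful": {"movies_film": "film"},
    "eventbrite": {"movies": "film"},
}


def match_tags(service, event_tags, feed_tags, core=CORE):
    """Return the feed-level tags plus any per-event tags that land in the core."""
    tags = set(feed_tags)
    mapping = SERVICE_MAPS.get(service, {})
    for tag in event_tags:
        tag = mapping.get(tag, tag)  # translate a foreign tag if a mapping exists
        if tag in core:              # keep it only if it matches the core vocabulary
            tags.add(tag)
    return tags


print(sorted(match_tags("eventful", ["movies_film", "nightlife"], {"eventful"})))
print(sorted(match_tags("eventbrite", ["movies", "jazz"], {"eventbrite"})))
```

Foreign tags with no mapping and no match in the core are simply dropped, which keeps the managed vocabulary from filling up with one-off labels.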

As we saw in Portable tags, some iCalendar feeds use the CATEGORIES property of the iCalendar format to express per-event tags. Managing these tags is trickier because, well, they’re unmanaged. Until recently I was suppressing them. Now I’m experimentally allowing them to appear, but segregating them from the core vocabulary. If you check the tags for Boston or Seattle or another city you’ll see that the list divides into two sections. The first presents managed tags: the core vocabulary. The second presents unmanaged tags from iCalendar feeds, enclosed in squiggly brackets to differentiate them from the core vocabulary.

Here’s the current set of unmanaged tags for Boston and Seattle:

Boston
{academics} 10
{adams street} 4
{air pollution control commission hearings}
{alumni relations} 6
{athletics} 6
{bikes} 1
{blc} 3
{boston home center} 2
{boston main streets} 1
{boston public library} 3
{brighton} 2
{central library} 84
{charlestown} 2
{city clerk} 15
{city council} 8
{college of arts &
{college of business
{college of computer & information science}
{college of engineering} 13
{commercial} 1
{connolly} 12
{dnd} 1
{dudley literacy center} 11
{dudley} 19
{east boston} 9
{egleston square} 3
{elderly commission} 1
{election} 1
{faneuil} 2
{fields corner} 11
{group exercise} 4
{grove hall} 11
{honan- allston} 11
{hyde park} 10
{jamaica plain} 7
{licensing} 1
{lower mills} 5
{massart events} 1
{mattapan} 15
{north end} 11
{ongoing} 33
{orient heights} 5
{other} 90
{parker hill} 10
{performing/visual arts} 60
{president} 1
{public event} 139
{public health
{roslindale} 8
{social} 6
{south boston} 18
{south end} 6
{student affairs} 2
{student development} 9
{uphams corner} 2
{washington village} 4
{west end} 13
{west roxbury} 4
Seattle
{animal shelter} 23
{athletics} 6
{boards &
{bothell} 29
{built environments} 3
{career management} 1
{city council} 88
{community centers} 29
{community outreach} 10
{community technology} 18
{concerts} 17
{continuing education} 20
{diversity} 2
{eastside } 58
{emergency} 12
{engineering} 13
{environmental learning} 3
{exhibits} 97
{farther afield} 10
{forums} 8
{global health} 1
{health sciences} 18
{hearing examiner} 12
{hr-benefits} 1
{jackson school of international studies}
{libraries} 1
{meetings} 5
{north sound} 19
{office of the mayor} 2
{other} 1
{panel discussions} 2
{parks} 2
{performing/visual arts} 29
{psychology} 4
{ptsa} 6
{public outreach and
{public} 23
{readings} 1
{research} 1
{sales} 1
{school of art} 92
{school of business} 22
{schoolof art} 1
{seattle area} 188
{seattle fire department} 2
{seattle youth
{south sound} 21
{special events} 16
{sports/spirit} 1
{student activities} 6
{tacoma} 3
{technical communication} 8
{the center for wooden boats – south lake union}
{tours} 27
{training} 13
{urbanization} 1
{vst} 1
{walk bike ride} 3
{workshops} 6

When one of these tags matches a tag in a hub’s core vocabulary I promote it — that is, I treat it as part of the managed core and it no longer shows up in squigglies. That’s a top-down approach. But there’s a complementary bottom-up approach. As I scan the unmanaged tags, both within and across hubs, it can become clear that an unmanaged tag belongs in the managed core. To accomplish that I simply use the unmanaged tag somewhere in the managed core. From then on, occurrences of the unmanaged tag are promoted into the core.
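The promotion rule is easy to sketch in Python. This is an illustration under assumptions, not the service's code: the `display_tags` function is hypothetical, and lowercasing the CATEGORIES values before comparison is my guess at how matching might work.

```python
def display_tags(categories, core):
    """Split iCalendar CATEGORIES values into promoted and unmanaged tags."""
    managed, unmanaged = [], []
    for raw in categories:
        tag = raw.lower()  # assumption: matching is case-insensitive
        if tag in core:
            managed.append(tag)                  # promoted into the managed core
        else:
            unmanaged.append("{" + tag + "}")    # shown in squiggly brackets
    return sorted(managed), sorted(unmanaged)


core = {"library", "music", "tours"}
categories = ["Tours", "Central Library", "Exhibits"]

# Top-down: only tags already in the core are promoted.
print(display_tags(categories, core))

# Bottom-up: the curator starts using "exhibits" in the core vocabulary,
# so later occurrences of the unmanaged tag are promoted too.
core.add("exhibits")
print(display_tags(categories, core))
```

Nothing about the feeds has to change for the second result; the promotion follows entirely from the curator's edit to the core vocabulary.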

A logical next step is to enable curators to edit per-hub maps so that, for example, Boston’s {central library} and Seattle’s {libraries} will be promoted to simply library. I haven’t built this mapping feature yet but it’s on the todo list.

I’m still exploring the interplay between the top-down and bottom-up approaches. But it definitely feels like the right way to handle common vocabularies augmented by different (and regionally varying) vocabularies.

(This series: elmcity tagging principles.)

Tagging mechanisms and strategies part 2: Portable tags

Last month I was looking over the shoulder of my auto mechanic, Jonah, when he was retrieving my service record on his computer. I watched him search for udell and find a file called something like 2011-11-04_udell.odf. (He uses an Open Office spreadsheet to keep track of things.) The first thing Jonah did, upon opening the file, was rename it to 2012-01-14_udell.odf. My thought was: I wish we could teach more people how (and why) to do that.

Jonah’s strategy tags each .ODF file with two items of information: a customer name, and a date. His convention is to keep the date current, so that current projects float to the top in date-ordered folder views. For many people the names of files in a folder are just one unorganized namespace. For Jonah they represent two parallel namespaces — or, as I encourage people to think of it, two sets of tags.
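To make the two-namespace idea concrete, here is a small Python sketch that treats each filename as a pair of tags. The `parse_filename` helper and the customer names smith and jones are hypothetical; the naming pattern follows Jonah’s convention as described above.

```python
from datetime import date


def parse_filename(filename):
    """Split a name like '2011-11-04_udell.odf' into its (date, customer) tags."""
    stem = filename.rsplit(".", 1)[0]           # drop the extension
    date_part, customer = stem.split("_", 1)    # date tag, then customer tag
    return date.fromisoformat(date_part), customer


files = [
    "2011-11-04_udell.odf",
    "2012-01-14_smith.odf",
    "2011-06-30_jones.odf",
]

# ISO dates sort correctly as plain strings, so an ordinary name-ordered
# folder view (reversed) floats the most recent work to the top.
newest_first = sorted(files, reverse=True)
print(newest_first)

# The customer tag supports retrieval by name.
print([f for f in files if parse_filename(f)[1] == "udell"])
```

Because the year-month-day form sorts the same way alphabetically and chronologically, no special tooling is needed; any file browser on any operating system will do.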

One of the benefits of this approach is portability. He could, if needed, transfer those files to another computer, perhaps even one running another operating system, without losing his ability to organize and retrieve records by customer name and date.

Principle: Create and use portable tags

For calendars, the CATEGORIES property of the iCalendar format is the most obvious way to tag events. Unfortunately it isn’t portable. Some content management systems enable users to tag events using the CATEGORIES property. And some calendar applications, like Outlook, also do. But other calendar apps, like Google Calendar and Hotmail Calendar, don’t. If you’re using one of these to publish a calendar, you can’t tag an event as a concert. And if you’re viewing a calendar that has events tagged that way, you won’t see or be able to make use of the concert tag.

There’s a simple and portable solution. iCalendar’s SUMMARY property, which is the title of an event, is universally readable and writable. So if your event stream naturally divides into concerts and lectures, it’s really helpful to identify events accordingly in their titles:

Concert: Joey Pratt Album Release Party with Noah Lefebvre

Lecture: Technology Future Shock: Society, Policy and Innovation in the Digital World

An even better strategy is to provide two separate feeds, one for concerts and the other for lectures. But that’s for a future installment. The key point here is that you can add value to any namespace — a set of files in a folder, a set of events on a calendar — by using tags to qualify filenames or titles.
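A title prefix is also easy for software to pick up. The Python sketch below shows one way a consumer might separate the tag from the rest of the title; the `split_tag_prefix` function and the set of known tags are assumptions for illustration.

```python
def split_tag_prefix(summary, known_tags):
    """Return (tag, title) if the SUMMARY starts with a known 'Tag:' prefix."""
    prefix, sep, rest = summary.partition(":")   # split at the first colon only
    tag = prefix.strip().lower()
    if sep and tag in known_tags:
        return tag, rest.strip()
    return None, summary                         # no recognized prefix


known = {"concert", "lecture"}

titles = [
    "Concert: Joey Pratt Album Release Party with Noah Lefebvre",
    "Lecture: Technology Future Shock: Society, Policy and Innovation in the Digital World",
    "Happy hour",
]

for title in titles:
    print(split_tag_prefix(title, known))
```

Checking the prefix against a known vocabulary matters: the second title contains another colon, and plenty of untagged titles contain colons too.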

Mechanism: Use iCalendar filters to extract tag-based feeds

The elmcity service provides a growing set of filters that can extract subsets of iCalendar feeds based on tags found in the SUMMARY (title) or DESCRIPTION (or URL) properties of events.

In the ideal scenario, providers of feeds would use tags as prefixes to the SUMMARY property. In the real world that doesn’t happen, at least not yet. But the elmcity filter is still useful because it’s natural to include keywords in titles and descriptions. Consider, for example, the calendar for Vinology, a wine bar and restaurant in Ann Arbor. Its calendar mixes two different kinds of events. Some are about food and drink (“small plate special”, “happy hour”). Others are about the jazz acts often appearing at Vinology. By filtering on jazz in the SUMMARY and/or DESCRIPTION of Vinology’s Google calendar, the elmcity service is able to extract just the jazz events and add them to Ann Arbor’s music and jazz calendars.
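A keyword filter of this kind might look like the following Python sketch. It is a guess at the general technique (case-insensitive substring matching), not the elmcity filter itself. Events are modeled as plain dictionaries keyed by iCalendar property name, and the sample events are made up.

```python
def filter_events(events, keyword, fields=("SUMMARY", "DESCRIPTION")):
    """Keep events whose chosen properties mention the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        ev for ev in events
        if any(needle in ev.get(name, "").lower() for name in fields)
    ]


# Made-up events in the spirit of a mixed restaurant-and-music calendar.
events = [
    {"SUMMARY": "Happy hour", "DESCRIPTION": "Small plate special"},
    {"SUMMARY": "Trio night", "DESCRIPTION": "Live jazz in the cellar"},
    {"SUMMARY": "Jazz brunch"},
]

for ev in filter_events(events, "jazz"):
    print(ev["SUMMARY"])

# Restricting the search to titles narrows the result.
print(len(filter_events(events, "jazz", fields=("SUMMARY",))))
```

Substring matching is crude, and it will misfire on titles that merely mention a word, which is one more reason explicit tag prefixes would be an improvement.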

Currently there’s no incentive for Vinology (or anyone else) to adopt this strategy in a more intentional way. That’s because Ann Arbor’s elmcity syndication hub isn’t aligned with attention hubs like AnnArbor.com and ArborWeb.com. If Vinology knew that events tagged with music and/or jazz would show up on those sites in those categories, there would be a strong reason to do it.

(This series: elmcity tagging principles.)

PS: The next Vinology event in the music view of Ann Arbor’s elmcity hub, by the way, is the Doug Horn Trio, this Thursday at 9PM. That event isn’t on the AnnArbor.com calendar or the ArborWeb calendar. To put it there, Vinology would have to take data that it has already entered in its own calendar and reenter it on each of those sites. I think those other calendars should syndicate the data straight from Vinology (and everyone else).

PPS: See also Harry Tuttle’s busy month and The art of organizing search results.

Tagging mechanisms and strategies part 1: General and specific

Back in May I asked: Can elmcity and Delicious continue their partnership? The answer turned out to be no. That’s partly because the new Delicious broke some capabilities I was relying on. But it’s mainly because tagging is so fundamental to the elmcity service that I needed to be able to control, explore, and evolve it.

It continues to evolve, but now’s a good time to review — from the perspective of elmcity curators and contributors — how the principles and mechanisms for tagging calendar feeds (and individual events) illustrate (and extend) some ideas I originally developed during a long infatuation with Delicious. I have a lot to say on this subject, so my plan is to say it in a series of installments of which this is the first.

Principle: Describe things in both general and specific terms

For university calendars, I advise curators to use both a general tag, university, and a specific one. In the case of Seattle some specific tags are uw for the University of Washington, seattleu for Seattle University, and nscc for North Seattle Community College. That makes these views available:

All university-related events, a view that’s currently based on this set of feeds:

University of Washington 376
GoHuskies: Women’s Basketball 22
GoHuskies: Volleyball 3
North Seattle Community College 20
GoHuskies: Basketball 23
Seattle University Redhawks: Basketball 23
Elisabeth Miller Library 15
Seattle University Redhawks: Women’s Basketball 23
Seattle University Redhawks: Volleyball 2
North Seattle Community College (eventful.com) 3
Graduate Student Council at Seattle University – University (facebook.com) 1
Seattle University Redhawks: Swimming 8
Seattle University Redhawks: Women’s Swimming 9
UW Medicine- South Lake Union Campus (eventful.com) 1

Just UW events, based on these feeds:

University of Washington 376
GoHuskies: Women’s Basketball 22
GoHuskies: Volleyball 3
GoHuskies: Basketball 23
Elisabeth Miller Library 15
UW Medicine- South Lake Union Campus (eventful.com) 1

Just NSCC events, based on these feeds:

North Seattle Community College 19
North Seattle Community College (eventful.com) 3

Just Seattle U events, based on these feeds:

Seattle University Redhawks: Basketball 23
Seattle University Redhawks: Women’s Basketball 23
Seattle University Redhawks: Volleyball 2
Graduate Student Council at Seattle University – University (facebook.com) 1
Seattle University Redhawks: Swimming 8
Seattle University Redhawks: Women’s Swimming 9

Mechanism: Multi-tag query

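The views above are all based on single-tag queries:

view=university (all university events)

view=uw (just UW events)

view=nscc (just NSCC events)

view=seattleu (just Seattle U events)

Here are some examples of multi-tag queries: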

view=university,sports (all university sports)

view=seattleu,sports (just Seattle U sports)

view=seattleu,swimming (just Seattle U swimming)

view=university,basketball (all university basketball events)

The last two examples again illustrate the general/specific idea. For sporting events I recommend using the general tag sports and specific tags like swimming and basketball.
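Judging by the examples, a multi-tag query selects items that carry all of the listed tags. Here is a minimal Python sketch of that reading. It selects feeds rather than events to keep things short. The `parse_view` and `select_feeds` helpers and the per-feed tag assignments are my assumptions, inferred from the feed names above.

```python
def parse_view(view):
    """Split a view value like 'seattleu,swimming' into a list of tags."""
    return [t.strip() for t in view.split(",") if t.strip()]


def select_feeds(feeds, view):
    """Return the names of feeds carrying every tag in the view (AND semantics)."""
    wanted = set(parse_view(view))
    return [name for name, tags in feeds.items() if wanted <= tags]


# Hypothetical tag assignments for a few of the Seattle feeds.
feeds = {
    "University of Washington": {"university", "uw"},
    "GoHuskies: Basketball": {"university", "uw", "sports", "basketball"},
    "Seattle University Redhawks: Basketball": {"university", "seattleu", "sports", "basketball"},
    "Seattle University Redhawks: Swimming": {"university", "seattleu", "sports", "swimming"},
    "North Seattle Community College": {"university", "nscc"},
}

print(select_feeds(feeds, "uw"))
print(select_feeds(feeds, "seattleu,swimming"))
print(select_feeds(feeds, "university,basketball"))
```

This works only because each feed carries both its general and its specific tags; a feed tagged basketball but not sports would drop out of the university,sports view.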

Back in 2006, in Del.icio.us is a database, I wrote:

Although it’s intuitively obvious to me, I suspect that most people don’t yet appreciate how easily, and powerfully, tagging systems can work as databases for personal (yet shareable) information management.

Del.icio.us isn’t simply backed by a database, it can function as a database.

I think most people still don’t appreciate that possibility. In the elmcity context I’m hoping to show how it applies not only to personal but also to collective information management.

(This series: elmcity tagging principles.)