[Meh-ta] Aggregating aggregators

Antenna-Nano. Why? Read on.

Let me describe my feed reading habits. Most of the entries I see originate from feed aggregators. I may not read every single entry I come across, but I will at least scan headlines and read summaries to get a sense of what others are interested in.

In terms of anime and related topics, Anime Nano or the AB Antenna are go-to sources, but if you’ve ever subscribed to the full feeds of either, then you know that the unread post count can and does accumulate rapidly.

I’m subscribed to both.

I think you can see how this might be a problem, but it’s actually not too bad most of the time. In fact, it’s roughly only half bad. There’s a lot of overlap between the two feeds since many sites have been admitted to both, and it’s also probably a major reason why many pick just one of the two, if they pick one at all.

It would be useful to me, and perhaps to others, if I could take a union of sorts: a new set with the entries of both feeds but devoid of duplicates. I thought that, if nothing else, it would be a rudimentary study in web services programming.

So I wrote Antenna-Nano, which periodically polls both aggregators and maintains a list of all unique entries within a 10-hour window. There are actually two feeds that do the exact same thing, one being the main site and the other being a FeedBurner feed that acts as a kludge-tastic cron service whenever it refreshes.
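For the curious, the union-minus-duplicates idea can be sketched in a few lines of Python. This is not the actual Antenna-Nano code, just a minimal illustration under some assumptions: each entry is a plain dict with hypothetical `link` and `published` keys, the permalink is what identifies a post across both aggregators, and `merge_entries` is a name made up for this example.

```python
from datetime import datetime, timedelta, timezone

# The 10-hour window comes from the post; everything else here is illustrative.
WINDOW = timedelta(hours=10)


def merge_entries(feeds, now, window=WINDOW):
    """Union the entries of several feeds, dropping duplicates and stale items."""
    cutoff = now - window
    unique = {}
    for entries in feeds:
        for entry in entries:
            if entry["published"] < cutoff:
                continue  # older than the window, so it falls off the list
            # Assumption: the same post carries the same permalink in both feeds.
            # setdefault keeps the first copy seen and ignores later duplicates.
            unique.setdefault(entry["link"], entry)
    # Newest first, as a feed reader would expect.
    return sorted(unique.values(), key=lambda e: e["published"], reverse=True)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    nano = [{"link": "http://example.com/a", "published": now - timedelta(hours=1)}]
    antenna = [
        {"link": "http://example.com/a", "published": now - timedelta(hours=1)},
        {"link": "http://example.com/b", "published": now - timedelta(hours=3)},
    ]
    for entry in merge_entries([nano, antenna], now):
        print(entry["link"])
```

Keying on the permalink is the simplest choice; if the two aggregators rewrote links differently, a normalized URL or the entry's GUID would be a better key.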

As a result, expect updates every 30 minutes. I’d prefer every 10 minutes, but I don’t have much of a case griping over free cron for a hobby project. I’m pretty happy with it otherwise.

Those of you with more experience than me (it wouldn’t take much) are probably pointing to Yahoo Pipes right now while scratching your heads over my choice of Google App Engine. Actually, I do have a Yahoo Pipes version and it is indeed very easy to do what I want, but the output is ugly for the sole reason that it has sketchy Unicode support.

It may have been fine even a year ago, when most people used only the 7-bit ASCII subset of UTF-8, but these days more people are using a couple of non-ASCII characters to decorate their titles and post descriptions. But it’s not just exotic characters that Yahoo Pipes has trouble with: I’ve even seen something as mundane as the apostrophe encoded into its 3-byte equivalent.
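To make the 3-byte apostrophe concrete, here is a small Python sketch. The typographic apostrophe (U+2019) really does take three bytes in UTF-8; what is an assumption on my part is the failure mode, namely that those bytes get misread as Windows-1252 somewhere along the way. I can't say that this is what Yahoo Pipes actually does, and `garble` and `repair` are hypothetical helper names.

```python
def garble(text):
    """Simulate UTF-8 bytes being misread as Windows-1252 (assumed failure mode)."""
    return text.encode("utf-8").decode("cp1252")


def repair(text):
    """Undo garble(); only works if the text was damaged in exactly that way."""
    return text.encode("cp1252").decode("utf-8")


apostrophe = "\u2019"  # the curly apostrophe
encoded = apostrophe.encode("utf-8")
print(len(encoded))  # 3
print(encoded)       # b'\xe2\x80\x99'

title = "It\u2019s a title"
damaged = garble(title)
print(damaged)                   # Itâ€™s a title
print(repair(damaged) == title)  # True
```

The round trip only succeeds when every byte survives the misreading. Some bytes have no Windows-1252 character at all, and a pipeline that drops or replaces them makes this kind of repair impossible.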

Going with a pipe is more elegant, though, and I did try to make it work, but the result was me subjecting myself to a day’s worth of Unicode hell as I tried to build a custom filter to de-Yahoo the data. We can discuss the sordid details if you’re so inclined, but in the end I found no workarounds.