[Meh-ta] Aggregating aggregators

Antenna-Nano. Why? Read on.

Let me describe my feed reading habits. Most of the entries I see originate from feed aggregators. I may not read every single entry I come across, but I will at least scan headlines and read summaries to get a sense of what others are interested in.

In terms of anime and related topics, Anime Nano and the AB Antenna are go-to sources, but if you’ve ever subscribed to the full feeds of either, then you know that the unread post count can and does accumulate rapidly.

I’m subscribed to both.

I think you can see how this might be a problem, but it’s actually not too bad most of the time. In fact, it’s roughly only half bad. There’s a lot of overlap between the two feeds since many sites have been admitted to both, and it’s also probably a major reason why many pick just one of the two, if they pick one at all.

It would be useful to me — and perhaps others — if I could take a union of sorts, with the resultant being a new set with the entries of both but devoid of duplicates. I thought that, if nothing else, it would be a rudimentary study in web services programming.

So I wrote Antenna-Nano, which periodically polls both aggregators and maintains a list of all unique entries within a 10-hour window. There are actually two feeds that do the exact same thing, one being the main site and the other being a FeedBurner feed that acts as a kludge-tastic cron service whenever it refreshes.
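For the curious, here is a minimal Python sketch of the merge step described above. It is not the actual Antenna-Nano code. It assumes each entry has already been parsed into a dict with `link`, `title`, and `published` keys, and that two entries with the same link are duplicates; the function name `merge_entries` and that choice of key are both made up for illustration.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=10)  # the 10-hour window mentioned above


def merge_entries(feed_a, feed_b, now):
    """Union two lists of entries, dropping duplicates and stale items.

    Each entry is assumed to be a dict with 'link', 'title', and
    'published' (a timezone-aware datetime). Treating the link as the
    identity of an entry is an assumption made for this sketch.
    """
    cutoff = now - WINDOW
    seen = {}
    for entry in list(feed_a) + list(feed_b):
        if entry["published"] < cutoff:
            continue  # older than the window, so skip it
        # setdefault keeps the first copy seen, so feed_a wins on overlap
        seen.setdefault(entry["link"], entry)
    # Newest first, which is the order most feed readers expect
    return sorted(seen.values(), key=lambda e: e["published"], reverse=True)
```

Calling `merge_entries(nano_entries, antenna_entries, datetime.now(timezone.utc))` on each poll would give the combined list. Keying on the link is the simplest choice, though it would miss a post that the two aggregators list under different URLs.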

As a result, expect updates every 30 minutes. I’d prefer every 10 minutes, but I don’t have much of a case griping over free cron for a hobby project. I’m pretty happy with it otherwise.

Those of you with more experience than me (it wouldn’t take much) are probably pointing to Yahoo Pipes right now while scratching your heads over my choice of Google App Engine. Actually, I do have a Yahoo Pipes version and it is indeed very easy to do what I want, but the output is ugly for the sole reason that it has sketchy Unicode support.

It may have been fine even a year ago, when most people used only the 7-bit ASCII subset of UTF-8, but these days more people are using a couple of non-ASCII characters to decorate their titles and post descriptions. But it’s not just exotic characters that Yahoo Pipes has trouble with: I’ve even seen something as mundane as the apostrophe encoded into its 3-byte equivalent.
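For reference, the 3-byte figure matches how UTF-8 encodes the curly apostrophe (U+2019). The Python sketch below shows those bytes and what they look like if something downstream reads them as a single-byte encoding. Windows-1252 is used here purely as an assumption for illustration; it is not a diagnosis of what Yahoo Pipes actually does, and the function names are made up.

```python
APOSTROPHE = "\u2019"  # the curly apostrophe many blog engines emit


def utf8_bytes(text):
    """Return the UTF-8 encoding of text as a bytes object."""
    return text.encode("utf-8")


def misread_as_cp1252(text):
    """Simulate reading UTF-8 bytes as Windows-1252 (assumed failure mode)."""
    return text.encode("utf-8").decode("cp1252")


def undo_misread(garbled):
    """Reverse the damage, valid only if it is exactly the misread above."""
    return garbled.encode("cp1252").decode("utf-8")


encoded = utf8_bytes(APOSTROPHE)      # three bytes: e2 80 99
garbled = misread_as_cp1252(APOSTROPHE)  # three junk characters
repaired = undo_misread(garbled)      # back to the single apostrophe
```

A round trip like `undo_misread` only works when the garbling is this one specific misreading, which is part of why a general "de-Yahoo" filter is harder than it sounds.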

Going with a pipe is more elegant, though, and I did try to make it work, but the result was me subjecting myself to a day’s worth of Unicode hell as I tried to build a custom filter to de-Yahoo the data. We can discuss the sordid details if you’re so inclined, but in the end I found no workarounds.

8 Comments

  1. Posted May 31, 2008 at 9:28 am | Permalink

    AWESOME! Thanks!

  2. Posted May 31, 2008 at 10:00 am | Permalink

    Wow, that’s really cool! You rock :)

  3. Posted May 31, 2008 at 11:46 am | Permalink

    Very cool indeed! I’m so incredibly tempted to use it, but I think I can’t really afford reading *all* blog postings, but rather reduce myself to the blogs I have in my Google Reader.

  4. Posted May 31, 2008 at 1:08 pm | Permalink

    Neat idea, though I just grab individual feeds and place them in my feed reader. If I ever get enough time in my life (ie: college), then I’ll get back to this.

  5. Posted May 31, 2008 at 1:44 pm | Permalink

    Neat, I guess, but could you let me know what your user agent is so I can make sure you’re not abusing the rss? And maybe ask permission next time you want to do something like this :)

  6. Posted May 31, 2008 at 2:20 pm | Permalink

    I’m going to have to very politely ask you if you read the Antenna’s FAQ, located here: http://antenna.dev.animeblogger.net/faq/ The second question and answer is pertinent in this case, in part it says “Also it’s not allowed to pull the Antenna’s feed and make it part of your own aggregator.”

    I don’t wish to play the bad guy, but I’m going to have to ask you to stop until we’ve talked about this. This is nothing personal, and it’s quite clear in the FAQ (which has been in place since September) that this isn’t an allowed usage of the Antenna.

  7. introspect
    Posted May 31, 2008 at 2:39 pm | Permalink

    As requested, I’ve disabled the app for the time being, and sent apologies to the admins of ANO and ABA.

  8. Posted June 1, 2008 at 2:37 am | Permalink

    lol aggregator red-tape.

One Trackback

  1. By Heterochromia - Blogroll issues. Help me? on May 31, 2008 at 4:12 pm

    [...] onto my actual, simple problem: introspect has compiled a brilliant feed combining the two big aggregators. When I looked through them, I realized that the blogosphere has grown quite a lot yet again, and [...]
