Words in Boxes

Nouns, verbs, and occasionally adjectives.

Wednesday, December 08, 2010

NReadability

I just released Wordcycler 1.4. The only change is that Wordcycler now fetches every page of multi-page articles, which has been a long-standing user request.

If you're a frequent Instapaper user, you might ask why it has taken so long when Instapaper itself has been doing this for a while. The answer is that Wordcycler, since version 1.2, doesn't pull the article text from Instapaper.com; instead it pulls each article from its original site and cleans it locally. I do that to avoid slamming Instapaper with hundreds of requests at once (you'd be surprised how many people maintain 400-item reading lists).

The technology behind Wordcycler's page cleaning is NReadability, an open-source C# port of the JavaScript Readability bookmarklet. It strips away all the non-article content of a web page and cleans up the formatting, just like Instapaper's text view.

NReadability is maintained by Marek Stój, and powers his Instafetch app for Android and Windows Phone 7. I was extremely lucky to find NReadability: he posted it mere days before Instapaper stepped up its rate limiting, which broke every Wordcycler version prior to 1.2. I plugged it into my code and was able to release a repaired version 1.2 almost immediately.

The one thing NReadability didn't do was fetch multi-page articles, so porting that from readability.js was my side project for a few weeks in November. About a week ago, Marek accepted my patch and released version 1.3.1.0. I’m happy I was able to contribute back to this great project.
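
For a rough idea of what following a multi-page article involves, here is a minimal sketch in Python. It is not the NReadability patch and not the readability.js logic, which is considerably more careful about picking the right link. It assumes, purely for illustration, that the next page is linked by an anchor whose text is something like "Next". The names `NextLinkFinder` and `fetch_all_pages`, and the `fetch` callback that returns a page's HTML, are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption for this sketch: a "next page" link is recognizable by its label.
NEXT_LABELS = ("next", "next page")


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a> whose text matches NEXT_LABELS."""

    def __init__(self):
        super().__init__()
        self.next_href = None
        self._current_href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside an anchor that has an href.
        if self._current_href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            label = "".join(self._text).strip().lower()
            if self.next_href is None and label in NEXT_LABELS:
                self.next_href = self._current_href
            self._current_href = None


def fetch_all_pages(start_url, fetch, max_pages=20):
    """Follow next-page links from start_url and return each page's HTML.

    `fetch` is any callable mapping a URL to an HTML string. The `seen` set
    and `max_pages` cap guard against link loops and runaway pagination.
    """
    pages = []
    seen = set()
    url = start_url
    while url and url not in seen and len(pages) < max_pages:
        seen.add(url)
        html = fetch(url)
        pages.append(html)
        finder = NextLinkFinder()
        finder.feed(html)
        # Resolve relative links against the page they appeared on.
        url = urljoin(url, finder.next_href) if finder.next_href else None
    return pages
```

Each page returned this way would still need to be cleaned and stitched together into one article, which is the part NReadability handles.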

I'm James Sulak, a software developer in Houston, Texas.

You can also find me on Twitter, or if you're curious, on my old-fashioned home page. If you want to contact me directly, you can e-mail comments@wordsinboxes.com.