Words in Boxes

Nouns, verbs, and occasionally adjectives.

Monday, July 30, 2007

Visualizing Wikipedia

Graduate student Chris Harrison has created some really interesting internet visualizations, including Wikipedia "clusterballs":

This visualization shows the structure of three levels of Wikipedia category pages and their interconnections. Centered in the graph is a parent node. Pages that are linked from this parent node are rendered inside the ball. Finally, pages that are linked to the latter (secondary) nodes are rendered on the outer ring. Links between category pages are illustrated by edges, which are color coded to represent their depth from the parent node. Nodes are clustered such that edge lengths are minimized. This forces highly connected groups of pages to clump together, essentially forming topical groups. The center acts as an anchor while the ring provides a fixed perimeter. This allows the secondary, super-categories to "float" above clusters.
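
To make the three-level structure concrete, here is a minimal sketch of how the nodes and depth-tagged edges could be collected. This is not Harrison's code: the `links` dictionary, its category names, and the function name `three_level_edges` are all made up for illustration, and tagging each edge with the deeper of its two endpoints is my assumption about what "depth from the parent node" means. It also skips the clustering step, which would need a layout algorithm on top.

```python
from collections import deque

def three_level_edges(links, root):
    """Collect nodes within two links of root, plus depth-tagged edges.

    links: dict mapping a category name to the categories it links to
           (hypothetical data, not pulled from Wikipedia).
    Returns (depths, edges), where depths maps node -> 0, 1, or 2 and
    edges is a list of (source, target, edge_depth) tuples.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depths[node] == 2:
            continue  # outer ring: do not expand any further
        for child in links.get(node, []):
            if child not in depths:
                depths[child] = depths[node] + 1
                queue.append(child)

    edges = []
    for src, depth in depths.items():
        for dst in links.get(src, []):
            if dst in depths:
                # Assumption: an edge is colored by its deeper endpoint.
                edges.append((src, dst, max(depth, depths[dst])))
    return depths, edges

# Toy category graph, invented for this example.
links = {
    "Science": ["Physics", "Biology"],
    "Physics": ["Optics", "Mechanics"],
    "Biology": ["Genetics", "Optics"],
    "Optics": ["Lasers"],
}
depths, edges = three_level_edges(links, "Science")
```

Here "Science" is the center, "Physics" and "Biology" sit inside the ball, and "Optics", "Mechanics", and "Genetics" land on the outer ring. "Lasers" is three links away, so it is left out.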

I'm planning on framing a few prints for the apartment walls. I'm fortunate to have a girlfriend who shares my slightly dorky taste in decorating. This is also a good opportunity to point out the blog Data Mining: Text Mining, Visualization and Social Media, which frequently links to interesting data visualizations and diagrams (including this one).

I'm James Sulak, a software developer in Houston, Texas.

You can also find me on Twitter or, if you're curious, on my old-fashioned home page. If you want to contact me directly, you can e-mail comments@wordsinboxes.com.