Tuesday, May 18, 2010

jr. webcrawling

Yesterday I was working on a project to retrieve the 1st- and 2nd-degree Twitter followers of an unconference's participants before a list of them had been built. The participants' Twitter handles were listed on individual WordPress pages under a single directory. I used a two-step process to extract the handles.

1. I used an old Java tool called WebSPHINX to crawl the directory of the site I was looking at and concatenate all of its pages into one massive page.

2. I posted that page in the sandbox of my site and directed Dapper to it. From there, I was able to create a Dapp identifying the fields I wanted, group them together, and create a CSV document to put into Excel.

This was my first time playing with Dapper, and I can definitely see a lot of great uses for it!
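
For anyone who would rather script it, here is a rough Python sketch of the same two steps. It is not what I actually ran (that was WebSPHINX plus Dapper). It assumes the pages have already been downloaded as HTML strings and that each handle appears as a link to twitter.com; the function names and the link pattern are my own guesses for illustration.

```python
import csv
import io
import re

# Assumption: handles appear as links like http://twitter.com/name or
# https://www.twitter.com/#!/name. Non-profile paths (e.g. /share) would
# also match and would need filtering by hand.
HANDLE_RE = re.compile(
    r"https?://(?:www\.)?twitter\.com/(?:#!/)?([A-Za-z0-9_]{1,15})\b",
    re.IGNORECASE,
)


def concatenate_pages(pages):
    """Step 1: join the HTML of every crawled page into one big document."""
    return "\n".join(pages)


def extract_handles(html):
    """Step 2: pull out unique handles, in order of first appearance."""
    seen = set()
    handles = []
    for match in HANDLE_RE.finditer(html):
        handle = match.group(1)
        key = handle.lower()  # Twitter handles are case-insensitive
        if key not in seen:
            seen.add(key)
            handles.append(handle)
    return handles


def handles_to_csv(handles):
    """Write the handles as a one-column CSV string, ready for Excel."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["handle"])
    for handle in handles:
        writer.writerow([handle])
    return buffer.getvalue()
```

Calling `handles_to_csv(extract_handles(concatenate_pages(pages)))` on a list of page strings gives CSV text that can be saved to a file and opened in Excel.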


Saturday, May 15, 2010

Twitter Network Visualization to Measure Events

I put together a brief write-up as a follow-up to my Mapping Social Networks white paper. In it, I used the Twitter API to monitor a network of unconference participants before and after the event. This kind of analysis can be a useful way to visualize network connections and growth without relying on traditional questionnaires or surveys.
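
To give a flavor of the before-and-after comparison, here is a small Python sketch. It is an illustration only, not the analysis from the write-up. It assumes each snapshot has already been collected as a set of (follower, followed) pairs; collecting those pairs from the Twitter API is left out, and the function names are hypothetical.

```python
def summarize(edges):
    """Basic size and density figures for one snapshot of follow edges.

    Assumption: nodes are inferred from the edges, so participants with
    no connections at all are not counted.
    """
    nodes = {user for edge in edges for user in edge}
    n = len(nodes)
    possible = n * (n - 1)  # directed edges possible among n users
    density = len(edges) / possible if possible else 0.0
    return {"nodes": n, "edges": len(edges), "density": density}


def compare_snapshots(before, after):
    """Compare two snapshots and list the follows gained and lost."""
    return {
        "before": summarize(before),
        "after": summarize(after),
        "new_edges": sorted(after - before),
        "lost_edges": sorted(before - after),
    }
```

A rise in edge count and density between the two snapshots is the kind of growth the visualization is meant to show.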

Download: the two-page summary.

Copyright 2006 jumpSLIDE networks. All rights reserved.