Wednesday, November 25, 2009

Automated Data Collection Strategies on Groups

I am a researcher in [a country] working on a Facebook-based, agenda-setting study. Ideally, I would like to perform an analysis of participation in a specific, public FB group but I have run into a snag. I am trying to capture all of the content on the FB group's wall without resorting to tedious copy and paste methods. Do you know of anyone who has successfully worked with this type of data set or how one would go about automating the copying process?


As far as I know, it’s not possible to use web crawlers or automated data collection programs with Facebook. They have mechanisms built into the site to block or ban users who exhibit inhuman tendencies (like following a hundred links at a time). Offhand I can think of two workarounds, though.

First, I believe you can set a Facebook Page (as in the Facebook Pages platform application, a front end used by companies and other stakeholders: http://www.facebook.com/advertising/?pages) to push out to an RSS feed. I think that will only capture wall content put up by the page operators, however.

Second, you could write a client-side script with Greasemonkey and Firefox or something similar. I think you’d have to visit the page to download it and then perform some kind of programmatic operation on it to sort or refine the data. It might be possible to write a script that works with a user account that’s always logged on: it would refresh the page, say, every hour, download it, parse the code to pull out the posts, stop when it encounters a post it has already recorded, and append the new text to a file. That’s a lot of work, but it might be possible to do without tripping Facebook’s bot-catching mechanisms.
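The parse, compare, and append step of that second workaround could look something like the Python below. This is only a sketch under assumptions: the `wall_post` class name is a made-up placeholder rather than Facebook’s actual markup, the function names and the tab-separated archive file are my own invention, the page is assumed to list the newest posts first, and downloading the page is left out entirely.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path


class WallPostParser(HTMLParser):
    """Collects the text of each <div class="wall_post"> (hypothetical class name)."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._depth = 0    # <div> nesting depth inside the current post, 0 = outside
        self._chunks = []  # text fragments of the current post

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # a nested <div> inside a post
        elif "wall_post" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace so each post fits on one line of the archive.
                self.posts.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_posts(html):
    """Return post texts in page order (assumed newest first)."""
    parser = WallPostParser()
    parser.feed(html)
    parser.close()
    return parser.posts


def fingerprint(text):
    """Stable key used to recognise a post that was already saved."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def append_new_posts(html, archive):
    """Append posts not yet in the archive file; return how many were added."""
    archive = Path(archive)
    seen = set()
    if archive.exists():
        lines = archive.read_text(encoding="utf-8").splitlines()
        seen = {line.split("\t", 1)[0] for line in lines if line}

    fresh = []
    for post in extract_posts(html):
        key = fingerprint(post)
        if key in seen:
            break  # reached a post recorded on an earlier run, so stop
        fresh.append((key, post))

    with archive.open("a", encoding="utf-8") as fh:
        for key, post in reversed(fresh):  # write oldest first
            fh.write(f"{key}\t{post}\n")
    return len(fresh)
```

Calling `append_new_posts(saved_page_html, "wall.tsv")` after each download would add only the posts that have appeared since the last run. Note that fingerprinting the text alone treats two posts with identical wording as the same post; including the author and timestamp in the fingerprint, if the markup exposes them, would be safer.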

Honestly, I haven’t done anything like this myself, so I’d suggest contacting Eric Gilbert (UIUC) or Fred Stutzman (UNC Chapel Hill) and asking them. They’re quite a bit more familiar with the security and programming side of Facebook research.
