Sunday, September 11, 2011

Twitter stream to MongoDB in Java and Clojure

I needed to grab a bunch of Twitter messages for an upcoming data mining project. Since the Twitter streaming API delivers its data as JSON, it seemed like a good time to try out MongoDB, which stores its documents as BSON, a binary encoding of JSON-like documents. I put together this Java program using the MongoDB Java Driver and Twitter4J. Then I thought I'd try the same thing in Clojure, this time using http.async.client, congomongo, and clojure-json. I'm still not sure how to make the Clojure version stop after a specified number of tweets have been downloaded. I tried adding an if statement inside the doseq, but I don't think I fully understand how doseq works yet.
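For what it's worth, here is a minimal sketch of the "stop after N tweets" idea in plain Java (9 or later). It is not the program from this post: TweetLimiter and storeFirst are hypothetical names, the Iterator stands in for the tweet stream, and the Consumer stands in for an insert into a MongoDB collection. No Twitter4J or MongoDB driver calls appear here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of Twitter4J or the MongoDB driver.
class TweetLimiter {

    // Pulls items from source and hands each one to sink until `limit` items
    // have been stored or the source runs out. Returns the number stored.
    static int storeFirst(Iterator<String> source, int limit, Consumer<String> sink) {
        int stored = 0;
        // Check the count first so we never block waiting for a tweet we won't keep.
        while (stored < limit && source.hasNext()) {
            sink.accept(source.next());
            stored++;
        }
        return stored;
    }

    public static void main(String[] args) {
        // Stand-in for the tweet stream: a fixed list of JSON strings.
        List<String> fakeStream = List.of(
                "{\"text\":\"first\"}",
                "{\"text\":\"second\"}",
                "{\"text\":\"third\"}");
        // Stand-in for the MongoDB collection: an in-memory list.
        List<String> fakeCollection = new ArrayList<>();

        int stored = storeFirst(fakeStream.iterator(), 2, fakeCollection::add);
        System.out.println(stored + " stored: " + fakeCollection);
    }
}
```

The same shape probably carries over to Clojure: rather than putting an if inside the doseq body, which only skips work for the remaining items and does not end the loop, the sequence can be limited before it is iterated, for example (doseq [tweet (take n tweets)] ...). Whether the underlying HTTP connection also needs to be closed explicitly afterwards depends on the client, which I have not checked here.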