Filtering the Twitter Streaming API

Jun 30, 2010

A couple of months ago I gave a brief introduction to how I've been parsing the Twitter Streaming API in my Ruby applications. Part of the new feature set of the Streaming API is the ability to filter the stream so that it only includes tweets from your preferred users, tweets containing a set of keywords, tweets from within a specific geographic location, or a combination of the above. There are limits on how many filter rules you can provide, and they vary depending on your access level, so check the documentation for more details.

Extending the Twitter class created in the last post should be fairly straightforward. First there is a new URL:

url = URI.parse("http://#{username}:#{password}@stream.twitter.com/1/statuses/filter.json")

And we have to POST to this address instead of requesting with GET:

Yajl::HttpStream.post(url, :symbolize_keys => true)

We also need to pass through the predicates we want to filter on. I've opted for building up a list of the settings, only including each one if it has been supplied. I've also given the options different names to the ones Twitter specifies, to try and prevent me confusing them in the future (does follow mean "follow this user" or "follow these keywords"? Avoid the confusion by calling them users and keywords instead):

params = []
params << "follow=#{[*filters[:users]].join(",")}" if filters[:users]
params << "track=#{[*filters[:keywords]].join(",")}" if filters[:keywords]
params << "locations=#{[*filters[:locations]].join(",")}" if filters[:locations]

You'll notice above that I'm actually splatting the value of each setting into an Array and then calling join on that. The reason is so I can pass through just a single value (:users => 12) or a list of values (:users => [12,13]) and they'll both work the same way.
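To see the splat in action without touching the network, here is a small sketch you can paste into irb. The filter_params helper is a hypothetical name of mine and isn't part of the Twitter class; it just wraps the three lines above so the resulting string is easy to inspect. The coordinates are made-up values for illustration only.

```ruby
# Hypothetical helper (not part of the Twitter class): builds the POST body
# from the friendlier option names used in this post.
def filter_params(filters = {})
  params = []
  params << "follow=#{[*filters[:users]].join(",")}" if filters[:users]
  params << "track=#{[*filters[:keywords]].join(",")}" if filters[:keywords]
  params << "locations=#{[*filters[:locations]].join(",")}" if filters[:locations]
  params.join("&")
end

# A single value and a list of values both become a comma-separated string.
puts filter_params(:users => 12)
puts filter_params(:users => [12, 13])

# Several filters are joined with "&".
puts filter_params(:users => 12, :keywords => ["ruby", "yajl"])

# Array#join also flattens nested arrays, so coordinate pairs can be grouped.
puts filter_params(:locations => [[-122.75, 36.8], [-121.75, 37.8]])
```

Note that nothing here URL-encodes the values, so keywords containing spaces or characters such as # or & would probably need escaping (for example with URI.encode_www_form_component) before being sent.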

All that is left is to wrap it all up in a new method, and add it to our class:

require 'uri'
require 'yajl/http_stream'

class Twitter
  # Give up after this many failed connection attempts in a row.
  MAX_ALLOWED_ERRORS = 1200

  def self.filter_stream(username, password, filters = {}, &block)
    url = URI.parse("http://#{username}:#{password}@stream.twitter.com/1/statuses/filter.json")

    # Only include the predicates that were actually supplied.
    params = []
    params << "follow=#{[*filters[:users]].join(",")}" if filters[:users]
    params << "track=#{[*filters[:keywords]].join(",")}" if filters[:keywords]
    params << "locations=#{[*filters[:locations]].join(",")}" if filters[:locations]

    consecutive_errors = 0
    while consecutive_errors < MAX_ALLOWED_ERRORS do
      begin
        Yajl::HttpStream.post(url, params.join("&"), :symbolize_keys => true) do |status|
          # A parsed status means the connection is healthy again.
          consecutive_errors = 0
          yield(status)
        end
      rescue Yajl::HttpStream::InvalidContentType
        consecutive_errors += 1
      end
      # Back off a little longer after each consecutive failure.
      sleep(0.25*consecutive_errors)
    end
  end
end
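If you're wondering how patient that retry loop is, the back-off can be worked out offline. This is only a rough sketch of the arithmetic, assuming every attempt fails with InvalidContentType and ignoring the time spent on the requests themselves; the delays variable is just for illustration.

```ruby
MAX_ALLOWED_ERRORS = 1200

# Delay slept after the nth consecutive error, mirroring
# sleep(0.25*consecutive_errors) in the method above.
delays = (1..MAX_ALLOWED_ERRORS).map { |n| 0.25 * n }

puts "first delay:   #{delays.first} seconds"
puts "longest delay: #{delays.last} seconds"
puts "total wait:    #{(delays.sum / 3600.0).round(1)} hours"
```

So the delay grows linearly from a quarter of a second up to five minutes, and the loop only gives up after roughly two days of uninterrupted failures. Any status that comes through resets the counter to zero.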

Taking it further

As previously mentioned, I'll extend on this series to cover:
