Filtering the Twitter Streaming API

Jun 30, 2010

A couple of months ago I gave a brief introduction to how I've been parsing the Twitter Streaming API in my Ruby applications. Part of the new feature set of the Streaming API is the ability to filter the stream so that it only includes tweets from your preferred users, tweets containing a set of keywords, tweets from within a specific geographic location, or a combination of the above. There are limits on how many filter rules you can provide, and they vary depending on your access level, so check the documentation for more details.

Extending the Twitter class created in the last post should be fairly straightforward. First there is a new URL:

url = URI.parse("http://#{username}:#{password}@stream.twitter.com/1/statuses/filter.json")

And we have to POST to this address instead of requesting with GET:

Yajl::HttpStream.post(url, :symbolize_keys => true)
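For a rough sense of what that POST involves, here is a sketch that builds (but never sends) an equivalent request with Ruby's standard Net::HTTP classes. Yajl::HttpStream takes care of all this for you; the placeholder credentials and the track=ruby body are made up purely for illustration.

```ruby
require 'net/http'
require 'uri'

# Build the request object only; nothing is sent over the network here.
uri = URI.parse("http://stream.twitter.com/1/statuses/filter.json")
request = Net::HTTP::Post.new(uri.path)

# Placeholder credentials, sent as an HTTP Basic Authorization header.
request.basic_auth("username", "password")

# The filter predicates travel in the form-encoded request body.
request.set_form_data("track" => "ruby")

puts request.method        # POST
puts request.path          # /1/statuses/filter.json
puts request.content_type  # application/x-www-form-urlencoded
puts request.body          # track=ruby
```

The point is simply that the predicates go in the request body rather than in the query string, which is why the call switches from get to post.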

We also need to pass through the predicates we want to filter on. I've opted for building up a list of the settings, only including each one if it has been supplied. I've also given the options different names from the ones Twitter specifies, to try to stop me confusing them in the future (does follow mean "follow this user" or "follow these keywords"? Avoid the confusion by calling them users and keywords instead):

params = []
params << "follow=#{[*filters[:users]].join(",")}" if filters[:users]
params << "track=#{[*filters[:keywords]].join(",")}" if filters[:keywords]
params << "locations=#{[*filters[:locations]].join(",")}" if filters[:locations]

You'll notice above that I'm actually splatting the value of each setting into an Array and then calling join on that. The reason is so I can pass through either a single value (:users => 12) or a list of values (:users => [12,13]) and both will work the same way.
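To see how the splat and the join fit together, here is a minimal sketch. The filter_body method and its param_names mapping are hypothetical names, not part of the class in this post. Escaping the values with URI.encode_www_form is also an addition, on the assumption that keywords may contain spaces or punctuation; the snippets in this post join the raw values without escaping them.

```ruby
require 'uri'

# Hypothetical helper: turns the friendly option names into a form-encoded
# body using Twitter's parameter names. Each option may be a single value
# or a list of values.
def filter_body(filters)
  param_names = { :users => "follow", :keywords => "track", :locations => "locations" }

  pairs = param_names.map do |option, name|
    next unless filters[option]            # skip options that were not supplied
    [name, [*filters[option]].join(",")]   # [*12] => [12], [*[12, 13]] => [12, 13]
  end

  # compact drops the nil entries left by skipped options.
  URI.encode_www_form(pairs.compact)
end

puts filter_body(:users => 12)
# follow=12
puts filter_body(:users => [12, 13], :keywords => "ruby on rails")
# follow=12%2C13&track=ruby+on+rails
```

Kernel#Array would do much the same job as the splat here: Array(12) and Array([12, 13]) both return an Array.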

All that is left is to wrap it all up in a new method, and add it to our class:

require 'uri'
require 'yajl/http_stream'

class Twitter
  # Give up after this many consecutive failed connection attempts.
  MAX_ALLOWED_ERRORS = 1200

  def self.filter_stream(username, password, filters = {}, &block)
    url = URI.parse("http://#{username}:#{password}@stream.twitter.com/1/statuses/filter.json")

    # Only include the predicates that were actually supplied.
    params = []
    params << "follow=#{[*filters[:users]].join(",")}" if filters[:users]
    params << "track=#{[*filters[:keywords]].join(",")}" if filters[:keywords]
    params << "locations=#{[*filters[:locations]].join(",")}" if filters[:locations]

    consecutive_errors = 0
    while consecutive_errors < MAX_ALLOWED_ERRORS do
      begin
        Yajl::HttpStream.post(url, params.join("&"), :symbolize_keys => true) do |status|
          # A parsed status means the connection is healthy again.
          consecutive_errors = 0
          yield(status)
        end
      rescue Yajl::HttpStream::InvalidContentType
        consecutive_errors += 1
      end
      # Wait a little longer after each consecutive failure before reconnecting.
      sleep(0.25 * consecutive_errors)
    end
  end
end
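To get a feel for what the retry numbers in filter_stream add up to, here is a network-free sketch of the same loop. The retry_with_linear_backoff method and its sleeper argument are hypothetical stand-ins: the block plays the part of the Yajl::HttpStream.post call, and the sleeper records each wait instead of actually sleeping. It is also slightly simplified, since the real method resets the counter whenever a status arrives rather than when the call returns.

```ruby
# Sketch of the retry loop: wait 0.25 seconds longer after each
# consecutive failure, and stop once max_errors failures pile up.
def retry_with_linear_backoff(max_errors, sleeper, step = 0.25)
  consecutive_errors = 0
  while consecutive_errors < max_errors
    begin
      yield
      consecutive_errors = 0      # a successful attempt resets the counter
    rescue RuntimeError
      consecutive_errors += 1
    end
    sleeper.call(step * consecutive_errors)
  end
end

# Simulate a stream that fails on every attempt and record each wait.
waits = []
retry_with_linear_backoff(1200, lambda { |seconds| waits << seconds }) do
  raise "simulated connection failure"
end

p waits.first(3)        # [0.25, 0.5, 0.75]
puts waits.last         # 300.0
puts waits.inject(:+)   # 180150.0
```

So with a limit of 1200 the final pause is five minutes, and a stream that never recovers is abandoned after roughly 50 hours of cumulative waiting.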

Taking it further

As previously mentioned, I'll extend on this series to cover:

Hi, I'm Glenn! 👋 I've spent most of my career working with or at startups. I'm currently the Director of Product @ Ockam where I'm helping developers build applications and systems that are secure-by-design. It's time we started securely connecting apps, not networks.
