Glenn Gillen

Filtering the Twitter Streaming API

A couple of months ago I gave a brief introduction on how I've been parsing the Twitter Streaming API in my Ruby applications. Part of the new feature set of the Streaming API is the ability to filter the stream so it only includes tweets from your preferred users, tweets that match a set of keywords, tweets from within a specific geographic location, or a combination of these. There are limits on how many filter rules you can provide, and they vary depending on your access level, so check the documentation for more details.

Extending the Twitter class created in the last post should be fairly straightforward. First, there is a new URL:

url = URI.parse("http://#{username}:#{password}@stream.twitter.com/1/statuses/filter.json")

And we have to POST to this address instead of requesting with GET:

Yajl::HttpStream.post(url, :symbolize_keys => true)

We also need to pass through the predicates we want to filter on. I've opted for building up a list of the settings, including each one only if it has been supplied. I've also given the options different names from the ones Twitter specifies, to stop me confusing them in the future (does follow mean "follow this user" or "follow these keywords"? Avoid the confusion by calling them users and keywords instead):

params = []
params << "follow=#{[*filters[:users]].join(",")}" if filters[:users]
params << "track=#{[*filters[:keywords]].join(",")}" if filters[:keywords]
params << "locations=#{[*filters[:locations]].join(",")}" if filters[:locations]

You'll notice above that I'm splatting the value of each setting into an Array and then calling join on that. The reason is so I can pass through just a single value (:users => 12) or a list of values (:users => [12,13]) and they'll both work the same way.
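
To see the splat trick in isolation, here is a small sketch. The helper name build_filter_params is my own invention for illustration and is not part of the class in this post, and the sample IDs, keywords, and bounding box are made-up values. It assumes that locations are given as longitude,latitude pairs describing a bounding box, southwest corner first, which is how I understand Twitter's documentation:

```ruby
# Hypothetical helper (not part of the original class): builds the POST body
# from the same :users, :keywords and :locations options used above.
def build_filter_params(filters = {})
  params = []
  # [*value] wraps a single value in an Array and leaves an Array unchanged,
  # so join(",") works for both a single value and a list of values.
  params << "follow=#{[*filters[:users]].join(",")}" if filters[:users]
  params << "track=#{[*filters[:keywords]].join(",")}" if filters[:keywords]
  params << "locations=#{[*filters[:locations]].join(",")}" if filters[:locations]
  params.join("&")
end

single = build_filter_params(:users => 12)
list   = build_filter_params(:users => [12, 13], :keywords => ["ruby", "rails"])
# Assumed example bounding box (roughly San Francisco): SW corner, then NE corner.
box    = build_filter_params(:locations => [-122.75, 36.8, -121.75, 37.8])

puts single # follow=12
puts list   # follow=12,13&track=ruby,rails
puts box    # locations=-122.75,36.8,-121.75,37.8
```

Note that nothing here URL-encodes the values, so keywords containing spaces or ampersands would need extra handling before being sent.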

All that is left is to wrap it all up in a new method, and add it to our class:

require 'uri'
require 'yajl/http_stream'

class Twitter
  # Give up after this many consecutive failed connection attempts.
  MAX_ALLOWED_ERRORS = 1200

  def self.filter_stream(username, password, filters = {}, &block)
    url = URI.parse("http://#{username}:#{password}@stream.twitter.com/1/statuses/filter.json")
    # Build the POST body, including only the filters that were supplied.
    params = []
    params << "follow=#{[*filters[:users]].join(",")}" if filters[:users]
    params << "track=#{[*filters[:keywords]].join(",")}" if filters[:keywords]
    params << "locations=#{[*filters[:locations]].join(",")}" if filters[:locations]
    consecutive_errors = 0
    while consecutive_errors < MAX_ALLOWED_ERRORS do
      begin
        Yajl::HttpStream.post(url, params.join("&"), :symbolize_keys => true) do |status|
          # A successfully parsed status means the connection is healthy again.
          consecutive_errors = 0
          yield(status)
        end
      rescue Yajl::HttpStream::InvalidContentType
        consecutive_errors += 1
      end
      # Back off a little longer after each consecutive failure.
      sleep(0.25*consecutive_errors)
    end
  end
end
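
As a rough sketch of how the method might be called: the credentials and keywords below are placeholders, and format_status is a hypothetical helper of mine rather than anything from the class above. It assumes each status arrives as a Hash with symbolized keys, with :text at the top level and :screen_name nested under :user, as in Twitter's tweet JSON. Messages without those fields (delete notices, for example) are skipped.

```ruby
# Hypothetical helper: turns one parsed status into a printable line.
# Returns nil for stream messages that are not tweets (e.g. delete notices).
def format_status(status)
  return nil unless status.is_a?(Hash) && status[:text] && status[:user]
  "@#{status[:user][:screen_name]}: #{status[:text]}"
end

# Offline check with a hand-written sample instead of a live connection.
sample = { :text => "Streaming with Ruby", :user => { :screen_name => "example" } }
puts format_status(sample)

# With the Twitter class above loaded and real credentials (placeholders here),
# the call would look something like:
#
#   Twitter.filter_stream("username", "password", :keywords => ["ruby", "rails"]) do |status|
#     line = format_status(status)
#     puts line if line
#   end
```

Keeping the formatting in its own method means it can be exercised without opening a connection to the stream.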

Taking it further

As previously mentioned, I'll extend on this series to cover:
