Quick and Dirty Twitter API in Python

Wednesday, Nov 19, 2014

QUICK DISCLAIMER: this is a quick and dirty solution to a problem, so may not represent best coding practice, and has absolutely no error checking or handling. Use with caution…

A recent project required me to scrape some data from Twitter. I considered using Tweepy, but as it was a project for the MSc in Computational Journalism, I thought it would be more interesting to write our own simple Twitter API wrapper in Python.

The code presented here will allow you to make any request to the Twitter API that uses the GET method, so it is really only useful for getting data from Twitter, not sending data to it. It also works only with the REST API, not the streaming API, so if you’re looking for realtime monitoring, this is not the API wrapper you’re looking for. Finally, it uses a single user’s authentication (yours), so it is not set up to let other users access Twitter through your application.

The first step is to get some access credentials from Twitter. Head over to https://apps.twitter.com/ and register a new application. Once the application is created, you’ll be able to access its details. Under ‘Keys and Access Tokens’ are four values we’re going to need for the API: the Consumer Key and Consumer Secret, and the Access Token and Access Token Secret. Copy all four values into a new Python file as variables named client_id, client_secret, access_token and access_secret (the names the code in this post uses), and save it as ‘_credentials.py’. Once we have the credentials, we can write some code to make some API requests!
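
As a rough sketch, ‘_credentials.py’ might look like the following. The variable names are the ones the code later in this post refers to; the values are made-up placeholders, so swap in your own and keep the file out of version control.

```python
# _credentials.py
# Placeholder values only (hypothetical): replace each string with the
# matching value from your application's 'Keys and Access Tokens' page.
client_id = "YOUR_CONSUMER_KEY"            # Consumer Key
client_secret = "YOUR_CONSUMER_SECRET"     # Consumer Secret
access_token = "YOUR_ACCESS_TOKEN"         # Access Token
access_secret = "YOUR_ACCESS_TOKEN_SECRET" # Access Token Secret
```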

First, we define a Twitter API object that will carry out our API requests. We need to store the API URL, and some details to allow us to throttle our requests to Twitter so that they stay within its rate limits.

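import base64
import hmac
import json
import threading
import time
import urllib.error
import urllib.parse
import urllib.request
import uuid
from hashlib import sha1

# the four values saved in _credentials.py
from _credentials import client_id, client_secret, access_token, access_secret

class Twitter_API:

 def __init__(self):

   # URL for accessing API
   scheme = "https://"
   api_url = "api.twitter.com"
   version = "1.1"

   self.api_base = scheme + api_url + "/" + version

   # seconds between queries to each endpoint
   # queries in this project limited to 180
   # per 15 minutes; dividing by 175 leaves a small margin
   query_interval = float(15 * 60)/(175)

   # rate limiting timer ('earliest' starts at 0.0 so the
   # first call never has to wait)
   self.__monitor = {'wait':query_interval,
     'earliest':0.0,
     'timer':None}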
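
To make the arithmetic concrete, here is a small sketch of the interval calculation. The helper name seconds_between_queries is hypothetical; the limit of 180 requests per 15 minutes and the divisor of 175 come from the comments and code above.

```python
def seconds_between_queries(requests_per_window, window_minutes=15):
    """Spacing, in seconds, that spreads the requests evenly over the window."""
    return float(window_minutes * 60) / requests_per_window

# spacing that exactly uses up the 180-request limit
print(round(seconds_between_queries(180), 2))  # 5.0
# spacing the class uses: budgeting for 175 requests leaves a little slack
print(round(seconds_between_queries(175), 2))  # 5.14
```

Waiting slightly longer than five seconds between calls keeps a steady stream of requests just under the limit.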

We add a rate limiting method that will make our API sleep if we are requesting things from Twitter too fast:

#
# rate_controller puts the thread to sleep
# if we're hitting the API too fast
def __rate_controller(self, monitor_dict):

 #
 # join the timer thread
 if monitor_dict['timer'] is not None:
   monitor_dict['timer'].join()

 # sleep if necessary (an unset 'earliest' means no wait)
 while monitor_dict['earliest'] is not None and time.time() < monitor_dict['earliest']:
   time.sleep(max(0.0, monitor_dict['earliest'] - time.time()))

 # work out when the next API call can be made
 earliest = time.time() + monitor_dict['wait']
 timer = threading.Timer( earliest-time.time(), lambda: None )
 monitor_dict['earliest'] = earliest
 monitor_dict['timer'] = timer
 monitor_dict['timer'].start()
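
The same idea can be tried on its own, without the class or the timer thread. The sketch below is a simplified stand-in rather than the method above: MinIntervalLimiter and its clock and sleep parameters are hypothetical names, injected so the behaviour can be checked without really waiting.

```python
import time

class MinIntervalLimiter:
    """Blocks so that successive calls to wait() are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.time, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.earliest = None  # no call made yet, so the first one is not delayed

    def wait(self):
        # sleep until the earliest permitted time, if one has been set
        if self.earliest is not None:
            remaining = self.earliest - self.clock()
            if remaining > 0:
                self.sleep(remaining)
        # record when the next call may go ahead
        self.earliest = self.clock() + self.interval


limiter = MinIntervalLimiter(interval=0.05)
for _ in range(3):
    # the first call returns at once; each later call waits out the interval
    limiter.wait()
```

Calling wait() before every request gives the same spacing as the rate controller, at the cost of blocking the calling thread.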

The Twitter API requires us to supply authentication headers in the request. One of these headers is a signature, created by encoding details of the request. We can write a function that will take in all the details of the request (method, url, parameters) and create the signature:

#
# make the signature for the API request
def get_signature(self, method, url, params):

 # percent-encode every parameter key and value; OAuth 1.0a needs
 # spaces as %20, which quote_plus (it emits "+") does not produce
 encoded_params = {}
 for k, v in params.items():
   encoded_k = urllib.parse.quote(str(k), safe="")
   encoded_v = urllib.parse.quote(str(v), safe="")
   encoded_params[encoded_k] = encoded_v

 # sort the parameters alphabetically by key
 sorted_keys = sorted(encoded_params.keys())

 # create a string from the parameters: key=value pairs joined by "&"
 signing_string = "&".join(
   key + "=" + encoded_params[key] for key in sorted_keys)

 # construct the base string: METHOD&encoded-url&encoded-parameters
 base_string = method.upper()
 base_string += "&"
 base_string += urllib.parse.quote(url, safe="")
 base_string += "&"
 base_string += urllib.parse.quote(signing_string, safe="")

 # construct the key from the two secrets in _credentials.py
 signing_key = urllib.parse.quote(client_secret, safe="") + "&" + urllib.parse.quote(access_secret, safe="")

 # sign the base string with HMAC-SHA1 (a keyed hash, not encryption),
 # and base64 encode the result
 hashed = hmac.new(signing_key.encode(), base_string.encode(), sha1)
 signature = base64.b64encode(hashed.digest())
 return signature.decode("utf-8")
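
To see the signing steps in isolation, here is a minimal sketch that takes the secrets as arguments instead of reading them from ‘_credentials.py’. The function names (percent_encode, signature_base_string, sign_request) and the secrets in the usage lines are made up for illustration. It percent-encodes with urllib.parse.quote and an empty safe list, so that spaces become %20 as OAuth 1.0a expects.

```python
import base64
import hashlib
import hmac
import urllib.parse


def percent_encode(value):
    """Percent-encode everything except unreserved characters (space -> %20)."""
    return urllib.parse.quote(str(value), safe="")


def signature_base_string(method, url, params):
    """Build METHOD&encoded-url&encoded-parameter-string."""
    encoded = {percent_encode(k): percent_encode(v) for k, v in params.items()}
    param_string = "&".join(k + "=" + encoded[k] for k in sorted(encoded))
    return "&".join([method.upper(), percent_encode(url), percent_encode(param_string)])


def sign_request(method, url, params, consumer_secret, token_secret):
    """HMAC-SHA1 the base string and return the base64-encoded digest."""
    key = percent_encode(consumer_secret) + "&" + percent_encode(token_secret)
    base = signature_base_string(method, url, params)
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("utf-8")


url = "https://api.twitter.com/1.1/statuses/user_timeline.json"
params = {"screen_name": "martinjc", "count": 2}
print(signature_base_string("get", url, params))
# GET&https%3A%2F%2Fapi.twitter.com%2F1.1%2Fstatuses%2Fuser_timeline.json&count%3D2%26screen_name%3Dmartinjc

# made-up secrets, purely to show the call
signature = sign_request("get", url, params, "consumer-secret", "token-secret")
```

Note that the parameter string is encoded a second time when it goes into the base string, which is why the "=" and "&" separators show up as %3D and %26.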

Finally, we can write a method to actually make the API request:

def query_get(self, endpoint, aspect, get_params=None):

 #
 # rate limiting
 self.__rate_controller(self.__monitor)

 # ensure we're dealing with strings as parameters
 # (get_params defaults to None rather than a shared mutable {})
 str_param_data = {}
 for k, v in (get_params or {}).items():
   str_param_data[str(k)] = str(v)

 # construct the query url
 url = self.api_base + "/" + endpoint + "/" + aspect + ".json"

 # add the header parameters for authorisation
 # (the timestamp must be whole seconds, and the version the string "1.0")
 header_parameters = {
   "oauth_consumer_key": client_id,
   "oauth_nonce": uuid.uuid4().hex,
   "oauth_signature_method": "HMAC-SHA1",
   "oauth_timestamp": int(time.time()),
   "oauth_token": access_token,
   "oauth_version": "1.0"
 }

 # collect all the parameters together for creating the signature
 signing_parameters = {}
 for k, v in header_parameters.items():
   signing_parameters[k] = v
 for k, v in str_param_data.items():
   signing_parameters[k] = v

 # create the signature and add it to the header parameters
 header_parameters["oauth_signature"] = self.get_signature("GET", url, signing_parameters)

 # add the OAuth headers: key="value" pairs separated by ", "
 # (join avoids hard-coding the number of header parameters)
 header_string = "OAuth " + ", ".join(
   urllib.parse.quote(str(k), safe="") + "=\"" + urllib.parse.quote(str(v), safe="") + "\""
   for k, v in header_parameters.items())

 headers = {
   "Authorization": header_string
 }

 # create the full url including parameters
 url = url + "?" + urllib.parse.urlencode(str_param_data)
 request = urllib.request.Request(url, headers=headers)

 # make the API request
 try:
   response = urllib.request.urlopen(request)
 except urllib.error.HTTPError as e:
   print(e)
   raise e
 except urllib.error.URLError as e:
   print(e)
   raise e

 # read the response and return the json
 raw_data = response.read().decode("utf-8")
 return json.loads(raw_data)
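
The header-building step in the middle of query_get can also be looked at on its own. This is a sketch with a hypothetical helper name, build_oauth_header, and invented parameter values; it only shows the shape of the Authorization value, which is the word OAuth followed by comma-separated key="value" pairs.

```python
import urllib.parse


def build_oauth_header(oauth_params):
    """Format OAuth parameters as the value of an Authorization header."""
    pairs = []
    for key, value in oauth_params.items():
        encoded_key = urllib.parse.quote(str(key), safe="")
        encoded_value = urllib.parse.quote(str(value), safe="")
        pairs.append('{}="{}"'.format(encoded_key, encoded_value))
    return "OAuth " + ", ".join(pairs)


# invented values, only to show the output format
example = {
    "oauth_nonce": "abc123",
    "oauth_signature": "a+b/c=",
    "oauth_version": "1.0",
}
print(build_oauth_header(example))
# OAuth oauth_nonce="abc123", oauth_signature="a%2Bb%2Fc%3D", oauth_version="1.0"
```

The signature is base64, so it can contain "+", "/" and "=", all of which have to be percent-encoded before they go into the header.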

Putting this all together, we have a simple Python class that acts as an API wrapper for GET requests to the Twitter REST API, including the signing and authentication of those requests. Using it is as simple as:

ta = Twitter_API()

# retrieve tweets for a user
params = {
  "screen_name": "martinjc",
}

user_tweets = ta.query_get("statuses", "user_timeline", params)
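
query_get hands back whatever JSON Twitter returned, parsed into Python objects; for statuses/user_timeline that is a list of tweet dictionaries. As a sketch of what to do next, the hypothetical helper below pulls out the created_at and text fields. The sample data is invented so the snippet runs without touching the network.

```python
def summarise_tweets(tweets):
    """Return one 'created_at: text' line per tweet dictionary."""
    lines = []
    for tweet in tweets:
        created = tweet.get("created_at", "unknown date")
        text = tweet.get("text", "")
        lines.append(created + ": " + text)
    return lines


# invented sample standing in for the value of user_tweets
sample_tweets = [
    {"created_at": "Wed Nov 19 10:00:00 +0000 2014", "text": "First example tweet"},
    {"created_at": "Wed Nov 19 11:30:00 +0000 2014", "text": "Second example tweet"},
]

for line in summarise_tweets(sample_tweets):
    print(line)
```

With the real wrapper you would pass user_tweets to summarise_tweets instead of the sample list.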

As always, the full code is online on GitHub, in both my personal account and the account for the MSc Computational Journalism.