
   Milan Golubovic

Taming the Twitter API v1.1

Some time ago Twitter introduced a new version of its API, and things have changed significantly. Even for some “simple” actions we can no longer use our favorite REST API; we must stream data in real time, authenticate with OAuth and do similar scary things that the average PHP developer tries to avoid.

And why did Twitter do all this? Well, it takes far less data transfer and fewer resources on their side to notify you when a new tweet is posted than to send you, for example, a list of all tweets for every page view on your website. So now all the work is on our side, poor web developers: watch the stream, store incoming tweets and provide some way to get information about them when needed. And that’s exactly what we are going to do here. But hold on, this doesn’t have to be as hard as it looks at first sight. So, we are going to:

  • Open a stream to Twitter using the new API v1.1 (with the new authentication method)
  • Store every new tweet we receive, together with some info about it, in the database
  • Provide a simple interface for getting information about the stored tweets, and also an interface for administering the terms we receive through our streaming connection

So, let’s go!

First, we won’t write all the code from scratch. Why should we, when there are a lot of useful PHP classes available? This time we will use one created by Mike Pultz. So, please visit his page and pick up the class code here:

mikepultz.com/2013/06/mining-twitter-api-v1-1-streams-from-php-with-oauth/

Below the class code there is a small, 4-row snippet showing how to use the class. We will use it that way, but with a small change:
The author intended for us to put our tweet-handling code inside his class, in the process_tweet() method, so he made that method private. I thought it would be better to keep his file as clean as possible (in case the class gets updated, for example), so I only changed the method’s signature: I made it inheritable (protected rather than private is enough). This way we keep our code in our own file, extend the original class and override process_tweet() with our own version.

So, we will have one script (let’s call it “get_tweets.php”) that includes and extends the ctwitter_stream class, basically something like this:

<?php

require 'ctwitter_stream.php';

class tweet extends ctwitter_stream
{
  // Called by the parent class for every message received from the stream
  function process_tweet(array $_data)
  {
    // ... do our stuff here
  }
}

Our class, simply called “tweet”, extends “ctwitter_stream” and overrides its method. So, we will use our new class instead of the original one, something like:

$t = new tweet();
$t->login('consumer_key', 'consumer secret', 'access token', 'access secret');
$t->start($terms);

We create an object of our new class, pass the needed parameters (of course you have to enter your own credentials instead of the dummy ones) and call the start() method with an array of terms (hash tags) we want to follow. We won’t pass a static array of terms; instead we will make a nice small interface for administering them. That way you can have, for example, one script running and collecting tweets for multiple websites.
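To see why the method has to be inheritable, here is a minimal sketch that runs without Twitter at all. The fake_stream class and its replay() method are made up for this illustration and only stand in for ctwitter_stream; the real class reads from the network instead. If process_tweet() were private in the parent, PHP would keep calling the parent's own version and ignore ours.

```php
<?php
// Stand-in for ctwitter_stream (hypothetical), only so this sketch runs on its own.
class fake_stream
{
  // protected, not private, so that a subclass can override it
  protected function process_tweet(array $_data)
  {
    return true;
  }

  // Hypothetical helper: pushes each fake tweet through process_tweet()
  public function replay(array $tweets)
  {
    foreach ($tweets as $tweet) {
      $this->process_tweet($tweet);
    }
  }
}

class tweet_collector extends fake_stream
{
  public $seen = array();

  // Same name and parameter list as the parent method, so this version gets called
  protected function process_tweet(array $_data)
  {
    $this->seen[] = $_data['text'];
    return true;
  }
}

$collector = new tweet_collector();
$collector->replay(array(array('text' => 'Hello #portugal')));
// $collector->seen now holds the text of the replayed tweet
```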

So, we will track our terms and we must store tweets too. That means we need a database with two tables (we’ll use the utf8_general_ci collation).

  • The first one we will call “term” and it will have only two fields:
    • “term_text”, which can be of type varchar(255) and holds the terms (hash tags)
    • “term_count”, which counts term appearances and serves one simple request: how many times a tag has appeared in total
  • The second one we’ll call “tweet” and we will use it to store tweets. We will keep the matched terms (hash tags), the time the tweet was stored and the tweet itself, in JSON format. That means we end up with the following fields:
    • terms (text field, where we store terms)
    • date (date-time field)
    • json (long text field for storing tweets)
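If you prefer to create the tables from code, the description above could translate into something like the following sketch. The function name schema_statements() is made up here, and the NOT NULL constraints, the unsigned integer type and the default of 0 are my assumptions; the dump in the archive may define the columns differently.

```php
<?php
// Returns the two CREATE TABLE statements for the tables described above.
// Column types beyond those named in the post are assumptions.
function schema_statements()
{
  return array(
    "CREATE TABLE IF NOT EXISTS `term` (
       `term_text`  VARCHAR(255) NOT NULL,
       `term_count` INT UNSIGNED NOT NULL DEFAULT 0
     ) DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci",
    "CREATE TABLE IF NOT EXISTS `tweet` (
       `terms` TEXT     NOT NULL,
       `date`  DATETIME NOT NULL,
       `json`  LONGTEXT NOT NULL
     ) DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci",
  );
}
```

Each statement can be pasted into phpMyAdmin or passed to the query function once the connection is open.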

OK, we have our database now (you can use phpMyAdmin or a similar tool to create it), and we have to connect to it. So we will need code like this:

$db_username = "db_username";
$db_password = "db_password";
$db_host = "localhost";
$db_database = "twitter";

mysql_connect($db_host,$db_username,$db_password);
mysql_query('SET NAMES utf8');
mysql_query('SET CHARACTER SET utf8');
mysql_select_db($db_database);
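One thing to be aware of: the mysql_* functions were deprecated in PHP 5.5 and removed in PHP 7.0, so on a newer server the code above will not run. A rough equivalent with the mysqli extension might look like this. The db_connect() helper is a name I made up for the sketch, and the rest of the scripts would need their queries converted too.

```php
<?php
// Hypothetical helper: opens a mysqli connection with the same settings as above.
function db_connect($host, $username, $password, $database)
{
  $db = new mysqli($host, $username, $password, $database);
  // On PHP versions where mysqli does not throw on failure, check explicitly
  if ($db->connect_error) {
    die('Database connection failed: ' . $db->connect_error);
  }
  // Sets the connection character set, the job done above by the SET NAMES query
  $db->set_charset('utf8');
  return $db;
}

// Usage (needs a running MySQL server):
// $db = db_connect($db_host, $db_username, $db_password, $db_database);
```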

You can keep the database variables in a separate file if you like and just include that file before the connection code:

require_once ("db_data.php");

After connecting to the database we’ll read the array of terms we want to track with a simple MySQL query:

$terms = array();
$result = mysql_query("SELECT term_text FROM term");
mysql_err(__FILE__,__LINE__);
if ($result && mysql_num_rows($result)) {
  while($l=mysql_fetch_array($result, MYSQL_ASSOC)){
    $terms[] = $l['term_text'];
  }
}

So the scenario is:
– we include the original class and override its method
– we connect to the database and load the list of terms we are tracking
– we create an object of our “tweet” class and call its start() method.

BTW, mysql_err() is a little function of mine that I use instead of the standard mysql_error() PHP function; it gives me some extra info about MySQL errors that occur. You’ll find it in db_data.php and you are free to use it if you like.

The start() call will connect to Twitter and keep the connection open. Every time a tweet containing one of the terms we’ve passed arrives, our version of process_tweet() will be called, so we need code that stores the received data:

function process_tweet(array $_data)
{
  global $terms;

  // The stream also delivers messages without text (e.g. delete notices)
  if (!isset($_data['text'])) {
    return true;
  }

  $tracked_terms = array();
  foreach ($terms as $term){
    if (stripos($_data['text'], $term) !== false){
      $tracked_terms[] = $term;
      // let's increase term count (escape the term before it goes into SQL)
      $safe_term = mysql_real_escape_string($term);
      mysql_query("UPDATE term SET term_count = term_count+1 WHERE term_text='$safe_term'");
      mysql_err(__FILE__,__LINE__);
    }
  }
  // Saving tweet
  if (count($tracked_terms)){
    $tracked_terms = mysql_real_escape_string(implode(',',$tracked_terms));
    $json = mysql_real_escape_string(json_encode($_data));
    mysql_query("INSERT INTO tweet SET terms='$tracked_terms', date=now(), json='$json'");
    mysql_err(__FILE__,__LINE__);
  }
  return true;
}

So, we use the global $terms array to check which terms triggered this call, and we update the counters for those terms.
After that we save the tweet: we build $tracked_terms as a comma-separated list, and since the function receives the tweet as an already decoded array, we just encode it back to JSON and escape it for the database. If you want to see exactly what data our function receives, for testing just replace the whole function body with:

print_r($_data);

and the function will print out all data passed to it.
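The term-matching step of process_tweet() can also be tried on its own, without a stream or a database. Here is a small sketch; match_terms() is a name I made up for it, and the empty-term guard is an addition of mine, since stripos() does not behave the same for an empty needle across PHP versions.

```php
<?php
// Returns the tracked terms that occur in the tweet text (case-insensitive).
function match_terms($text, array $terms)
{
  $matched = array();
  foreach ($terms as $term) {
    // Skip empty terms; stripos() treats an empty needle differently across PHP versions
    if ($term !== '' && stripos($text, $term) !== false) {
      $matched[] = $term;
    }
  }
  return $matched;
}

$matched = match_terms('Hello #Portugal, great game!', array('portugal', 'game', 'spain'));
// Joined the same way as for the "terms" column
$terms_column = implode(',', $matched);
```

Keep in mind that stripos() matches substrings, so a tracked term like “game” also counts tweets that only contain “games”.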
OK, so our first script is done. When started, it will connect to Twitter, keep the connection open, and receive and store in the database all incoming tweets.

Now we need another script that will handle calls from our websites. That script will provide an interface for managing tracked tags, and also for fetching tweets and information about them (mostly counts).

I called that second script “twitter.php” (how clever, isn’t it?).

At the beginning we need some simple authentication mechanism so that only we (our website) can call this script. I did it with this code:

$password = 'some_password';
if (!isset($_REQUEST['password']) || $_REQUEST['password'] != $password)
  die('Authentication failed.');

So, whenever we call this script we need to pass the “password” parameter:

twitter.php?password=some_password

or we’ll get just the ‘Authentication failed.’ message and no data.

If you can come up with some other authentication solution feel free to use it of course.
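One small hardening step, if you want it, is to compare the password with hash_equals() (available since PHP 5.6), which takes the same time whether the strings differ at the first or the last character. The is_authorized() helper below is a made-up name for this sketch and is not part of the original script.

```php
<?php
// Returns true only when the request carries the expected password.
function is_authorized(array $request, $expected)
{
  // A missing parameter or an array (password[]=x) is rejected outright
  if (!isset($request['password']) || !is_string($request['password'])) {
    return false;
  }
  // hash_equals() compares the two strings in constant time
  return hash_equals($expected, $request['password']);
}

// Usage in twitter.php:
// if (!is_authorized($_REQUEST, $password)) die('Authentication failed.');
```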

After this we are good to connect to the database and query it for info about our tweets. We’ll use the same code as in our first script. We could actually move the database connection lines into “db_data.php” too and just include it. It’s up to you.

Now we’ll check the other passed parameter(s) and, depending on them, perform some action and/or return some data. For example, we check whether a term is tracked like this:

// Check whether the passed term is tracked
if (isset($_REQUEST['is_tracked'])){
  $result=mysql_query("SELECT * FROM term WHERE term_text = '".
    mysql_real_escape_string($_REQUEST['is_tracked'])."' ");
  mysql_err(__FILE__,__LINE__);
  if ($result && mysql_num_rows($result)) {
    echo '1'; exit(0);
  }
  echo '0'; exit(0);
}

So, when you call our script like:

twitter.php?is_tracked=portugal&password=some_password

it will return (write out) ‘1’ if the term (hash tag) is tracked (we have it in the database) or ‘0’ if it’s not. For working with terms we have the following parameters:

– “is_tracked” – checks whether a term is tracked, e.g.
twitter.php?is_tracked=portugal&password=some_password
returns 1 if the term exists or 0 if it doesn’t

– “add_term” – adds a term for tracking, e.g.
twitter.php?add_term=portugal&password=some_password
returns 1 if the term was added by this call or 0 if it wasn’t (it already existed)

– “remove_term” – removes a term, e.g.
twitter.php?remove_term=portugal&password=some_password
returns 1
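On the website side, these calls can be built with a small helper so that terms containing “#” or spaces get URL-encoded properly. This is only a sketch: twitter_call_url() and the example.com address are made up for the illustration.

```php
<?php
// Hypothetical helper: builds the URL for one call to twitter.php.
function twitter_call_url($base_url, $action, $term, $password, array $extra = array())
{
  $params = array_merge(array($action => $term, 'password' => $password), $extra);
  // http_build_query() URL-encodes names and values; '&' is passed explicitly
  // so the result does not depend on the arg_separator.output ini setting
  return $base_url . '?' . http_build_query($params, '', '&');
}

$url = twitter_call_url('http://example.com/twitter.php', 'is_tracked', 'portugal', 'some_password');

// With allow_url_fopen enabled, the answer ('1' or '0') could then be read with:
// $answer = file_get_contents($url);
```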

The other part of our little interface is about tweets:

twitter.php?get_tweets_count_for=term&password=some_password

So we pass the term and the password, and the script returns the total count of tweets containing that term. This is the faster version, which reads the “term” table.

The other version takes a time range, so we have the extra optional parameters “from” and “to”, e.g.:

twitter.php?get_tweets_count_for=term&password=some_password&from=2013-07-20&to=2013-07-30

or just one of them:

twitter.php?get_tweets_count_for=term&password=some_password&to=2013-07-30

The “from” and “to” parameters can be in MySQL date (“YYYY-MM-DD”) or date-time (“YYYY-MM-DD hh:mm:ss”) format.
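Since “from” and “to” end up inside an SQL query, it pays to accept only those two formats. Here is one way it could be done. This is a sketch and not the code from the archive; is_mysql_date() and date_conditions() are names I made up.

```php
<?php
// Accepts "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss"; anything else is rejected.
// Only the shape is checked, not whether the date exists in the calendar.
function is_mysql_date($value)
{
  return is_string($value)
    && preg_match('/^\d{4}-\d{2}-\d{2}( \d{2}:\d{2}:\d{2})?$/D', $value) === 1;
}

// Builds the extra WHERE conditions for the optional "from" and "to" parameters.
// Values that fail validation are ignored, so nothing unchecked reaches the query.
function date_conditions(array $request)
{
  $sql = '';
  if (isset($request['from']) && is_mysql_date($request['from'])) {
    $sql .= " AND date >= '" . $request['from'] . "'";
  }
  if (isset($request['to']) && is_mysql_date($request['to'])) {
    $sql .= " AND date <= '" . $request['to'] . "'";
  }
  return $sql;
}
```

The result could be appended to a query on the “tweet” table, for instance one that starts with SELECT COUNT(*) FROM tweet WHERE FIND_IN_SET('game', terms). Note that a plain date such as to=2013-07-30 is compared as midnight at the start of that day, so tweets from later on July 30 are not included; pass “2013-07-30 23:59:59” if you want the whole day.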

And finally, if we want to get actual tweets we’ll use a syntax like this:

twitter.php?get_tweets_for=game&password=some_password&to=2013-07-04&max_count=4

So, again we have the term (hash tag), the optional “from”, “to” and “max_count” parameters and, of course, the password, as always. This call returns a JSON array, so it has to be decoded with the json_decode() PHP function before use.

And one more thing: after changing the list of followed terms we have to stop and restart our get_tweets.php script so that the new list of terms is passed to Twitter, but that’s out of scope for this post. Ask your server admin for help if you can’t do that on your own.

So, that’s it. I hope you’ll find this post useful. And remember that all the scripts mentioned here and a database dump are in this archive. The code is well documented, so please read the comments.
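Decoding on the website side could look roughly like this. The sample response is invented and shortened, and I am assuming that each element of the array is one stored tweet with the usual “text” and “user” fields; check the real output of your script before relying on that shape.

```php
<?php
// Sample response (made up, shortened); the real one comes from the get_tweets_for call.
$response = '[{"text":"Great game tonight","user":{"screen_name":"alice"}}]';

// true = decode into associative arrays instead of objects
$tweets = json_decode($response, true);
if (!is_array($tweets)) {
  // Not valid JSON, e.g. the plain "Authentication failed." message
  $tweets = array();
}

$lines = array();
foreach ($tweets as $tweet) {
  $lines[] = '@' . $tweet['user']['screen_name'] . ': ' . $tweet['text'];
}
```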

Happy tweeting!
