Kafka, Spouts and Tuples: Twitter Explains The Role of Humans In Its Real-Time Search

In a lengthy blog post, Twitter’s engineering group describes the way it handles search queries in real time. It’s fascinating stuff, even if I can’t say I fully understand sentences like this:

The Storm topology attaches a spout to this Kafka queue, and the spout emits a tuple containing the query and other metadata (e.g., the time the query was issued and its location) to a bolt for processing.
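
To make that sentence a little more concrete, here is a rough sketch in plain Python of the pattern it describes: a spout reads from a queue and emits tuples, and a bolt processes them. This is not Storm's or Kafka's actual API and not Twitter's code. The class names (QuerySpout, CountBolt), the run_topology helper, and the counting logic are all hypothetical, and an in-memory deque stands in for the Kafka queue.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# A tuple as described in the quote: the query plus some metadata.
# The exact fields here are an assumption for illustration.
QueryTuple = Tuple[str, float, str]  # (query, time issued, location)


class QuerySpout:
    """Hypothetical spout: pulls records off the queue and emits them one by one."""

    def __init__(self, queue: Deque[QueryTuple]) -> None:
        self.queue = queue  # in-memory stand-in for the Kafka queue

    def next_tuple(self) -> Optional[QueryTuple]:
        # Emit the oldest pending record, or None when the queue is empty.
        return self.queue.popleft() if self.queue else None


class CountBolt:
    """Hypothetical bolt: processes each tuple, here by counting queries."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def process(self, tup: QueryTuple) -> None:
        query, _issued_at, _location = tup
        self.counts[query] = self.counts.get(query, 0) + 1


def run_topology(queue: Deque[QueryTuple]) -> Dict[str, int]:
    """Wire the spout to the bolt and drain the queue."""
    spout = QuerySpout(queue)
    bolt = CountBolt()
    while (tup := spout.next_tuple()) is not None:
        bolt.process(tup)
    return bolt.counts
```

In a real Storm deployment the spout and bolt would run continuously and in parallel across many machines; the loop above only shows the direction in which the data flows.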

What I found most interesting is the essential role of humans in the process:

We’ve built a real-time human computation engine to help us identify search queries as soon as they’re trending, send these queries to real humans to be judged, and then incorporate the human annotations into our back-end models.

Twitter uses workers supplied by Amazon’s Mechanical Turk to perform the evaluations, and the post goes into great depth to explain how they vet the queries and work together to deliver responses. This aspect of real-time search is compelling because, while machines are certainly involved, the human brain plays a vital role. In essence, Twitter is saying that real time is not solely the domain of computing: humans, given the right tools and working environment, can do fast, high-quality work that makes real-time search better than machines could make it alone.


