In a lengthy blog post, Twitter’s engineering group describes the way it handles search queries in real time. It’s fascinating stuff, even if I can’t say I fully understand sentences like this:
The Storm topology attaches a spout to this Kafka queue, and the spout emits a tuple containing the query and other metadata (e.g., the time the query was issued and its location) to a bolt for processing.
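To make that sentence a little more concrete, here is a minimal sketch in plain Python. It does not use Storm or Kafka: a `deque` stands in for the Kafka queue, a generator plays the spout, and an ordinary function plays the bolt. The names `QueryTuple`, `spout`, `counting_bolt`, and `run_topology` are hypothetical, and the choice to have the bolt count queries is an assumption, since the post only says the tuple goes to a bolt "for processing."

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, List, NamedTuple


class QueryTuple(NamedTuple):
    """One emitted tuple: the query plus the metadata the post mentions."""
    query: str
    issued_at: float  # time the query was issued (epoch seconds, assumed)
    location: str     # where the query came from


def spout(queue: Deque[dict]) -> Iterator[QueryTuple]:
    """Stand-in spout: drain the queue and emit one tuple per record."""
    while queue:
        record = queue.popleft()
        yield QueryTuple(record["query"], record["issued_at"], record["location"])


def counting_bolt(tuples: Iterable[QueryTuple]) -> Dict[str, int]:
    """Stand-in bolt: count how often each query appears (an assumed task)."""
    counts: Dict[str, int] = {}
    for t in tuples:
        counts[t.query] = counts.get(t.query, 0) + 1
    return counts


def run_topology(records: List[dict]) -> Dict[str, int]:
    """Wire the spout to the bolt, as a topology would."""
    return counting_bolt(spout(deque(records)))
```

Feeding `run_topology` a list of dictionaries with `query`, `issued_at`, and `location` keys returns a per-query count. In a real deployment the spout and bolt would run as separate, distributed components, which this single-process sketch does not attempt to model.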
What I found most interesting is the essential role of humans in the process:
We’ve built a real-time human computation engine to help us identify search queries as soon as they’re trending, send these queries to real humans to be judged, and then incorporate the human annotations into our back-end models.
Twitter uses workers supplied by Amazon’s Mechanical Turk to perform the evaluations, and it explains in great depth how those workers vet the queries and work together to deliver responses. This aspect of real-time processing is compelling because, while machines are certainly involved, the human brain plays a vital role. Twitter’s point is that real time is not solely the domain of computing: humans, given the right tools and working environment, can do fast, high-quality work that makes real-time results better than machines could deliver alone.
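The three steps Twitter describes (spot a trending query, send it to human judges, fold their answers back into a model) can be sketched roughly as follows. This is an illustration under stated assumptions, not Twitter's implementation: the threshold test for "trending," the majority vote over judges, and the names `trending`, `majority_label`, and `annotate` are all hypothetical, and each judge is modeled as a simple function rather than a Mechanical Turk task.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def trending(counts: Dict[str, int], threshold: int) -> List[str]:
    """Treat a query as trending once its count reaches the threshold (assumed rule)."""
    return sorted(q for q, n in counts.items() if n >= threshold)


def majority_label(labels: Iterable[str]) -> str:
    """Combine several human judgments by majority vote (assumed aggregation)."""
    return Counter(labels).most_common(1)[0][0]


def annotate(
    counts: Dict[str, int],
    threshold: int,
    judges: List[Callable[[str], str]],
    model: Dict[str, str],
) -> Dict[str, str]:
    """Send each trending query to every judge and store the agreed label."""
    for query in trending(counts, threshold):
        labels = [judge(query) for judge in judges]
        model[query] = majority_label(labels)  # the "back-end model" is just a dict here
    return model
```

Each judge is any callable that takes a query string and returns a label, so a real system could swap in a function that posts a task to a crowdsourcing service and waits for the answer.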