Arun Prasath, an engineer at Walmart Labs, describes the retailer’s 2013 social analytics agenda in a recent blog post. His account crystallizes both the excitement and the terror of using social data in real time:
Social data mining comes with incredible challenges, which only makes it all the more exciting for our super smart engineers to come to work every day. Data volume is formidably huge. We are talking about petabytes here. Real-time social data processing requires sophisticated data stores and blazingly fast algorithms. The noise levels are exorbitant, the language used in social forums is heavily informal, unstructured and often ungrammatical, and filtering out that helpful insight out of the huge amount of noise is super hard. Just consider algorithmically parsing – “OMG!!! dis is sooo coool! i luv ma new fone. i cant believ ma luck 4 chosin this! #wellwhatdoyathink”. Popular text analytics and natural language processing techniques based on standard language models simply fail. We need altogether different techniques to filter out and focus on social data that is relevant to us, which in itself is a daunting task. The next step is to map this to meaningful retail products. All of these are difficult tasks.