Abstract - Natural Language Processing
Cross-stream event detection

Miles Osborne

University of Edinburgh

 

Abstract
Social media (especially Twitter) is widely seen as a source of real-time breaking news. For example, when Osama Bin Laden was killed by US forces, the news was first made public on Twitter. Rapidly finding all breaking news has clear economic and humanitarian benefits, but it also presents hard computational challenges. We need to detect news-related novelty in massive streams (upwards of two thousand posts per second) as quickly as possible. Efficiency is not the only consideration, however: we also need to confront the enormous quantity of irrelevant posts. In this talk I will outline how we tackle the first problem using Locality Sensitive Hashing, taking constant time per post. In tandem, I will mention how we use Storm to parallelise this computation, yielding a system capable of processing two thousand tweets per second. The second problem is tackled by intersecting the Twitter stream with Wikipedia page requests, filtering out spurious first stories. Taken together, this results in processing more than 250 million items per day. Finally, I will consider the question of whether Twitter really does lead newswire for breaking news. Joint work with Sasa Petrovic (Edinburgh), Craig MacDonald (Glasgow), Iadh Ounis (Glasgow) and Richard McCreadie (Glasgow).
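
To make the constant-time claim concrete, here is a minimal sketch of novelty detection with Locality Sensitive Hashing. It is an illustration under stated assumptions, not the system described in the talk: the class name `LshNoveltyDetector`, the random-hyperplane hash family, the bag-of-words representation and the bounded bucket size are all hypothetical choices. Each post is hashed into a few tables and compared only against the small, fixed number of earlier posts that share a bucket, so the work per post does not grow with the length of the stream.

```python
import math
import random
from collections import Counter, defaultdict, deque


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(tok, 0) for tok, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


class LshNoveltyDetector:
    """Toy first-story detector (hypothetical; all parameters are assumptions)."""

    def __init__(self, num_bits=8, num_tables=4, bucket_size=20, seed=0):
        self.num_bits = num_bits
        self.seed = seed
        # Each bucket keeps only the most recent `bucket_size` posts, which
        # bounds the number of comparisons made for any new post.
        self.tables = [
            defaultdict(lambda: deque(maxlen=bucket_size))
            for _ in range(num_tables)
        ]

    def _weight(self, table, bit, token):
        # Component of random hyperplane (table, bit) along `token`, derived
        # deterministically so the vocabulary need not be known in advance.
        rng = random.Random(f"{self.seed}:{table}:{bit}:{token}")
        return rng.gauss(0.0, 1.0)

    def _signature(self, table, vec):
        # One bit per hyperplane: which side of the plane the post falls on.
        bits = 0
        for bit in range(self.num_bits):
            dot = sum(w * self._weight(table, bit, tok) for tok, w in vec.items())
            bits = (bits << 1) | (dot >= 0)
        return bits

    def score(self, text):
        """Return 1 - (best cosine similarity to colliding posts), then store the post."""
        vec = Counter(text.lower().split())
        best = 0.0
        for t, table in enumerate(self.tables):
            bucket = table[self._signature(t, vec)]
            for other in bucket:
                best = max(best, cosine(vec, other))
            bucket.append(vec)
        return 1.0 - best
```

A post that scores close to 1 resembles nothing it collided with and is a candidate first story; a score close to 0 means a near-duplicate was already seen. Any threshold on this score would be a further assumption to tune on real data.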

Bio:
Miles Osborne is a Reader in Informatics at Edinburgh, with research interests in Machine Translation, Social Media and large-scale processing of natural language. He received his PhD from the University of York in 1994 and then held postdoctoral positions at Cambridge and Groningen before joining Edinburgh. He spent a sabbatical at Google in 2006, working within their Machine Translation group, and is spending 2013 to 2014 on sabbatical at Johns Hopkins University.