Continuous Semantic Crawling Events

From Knoesis wiki
Revision as of 04:51, 25 December 2012 by Pavan (Talk | contribs) (Hashtag Analysis)

Abstract

The need to tap into the "wisdom of the crowd" via social networks in real-time has already been demonstrated during critical events such as the Arab Spring and the recently concluded US Elections. As Twitter becomes a platform of choice for streaming event-related information in real-time, we face several challenges related to filtering, real-time monitoring, and tracking the dynamic evolution of an event. We present a novel approach to continuously track an evolving event on Twitter by leveraging hashtags that are filtered using evolving background knowledge (Wikipedia). Our approach (1) collects evolving hashtags by adapting tag co-occurrence information; (2) exploits the semantics of events for selecting hashtags by monitoring and leveraging the corresponding Wikipedia event pages; and (3) filters tweets using hashtags that are determined to be semantically relevant to the event. We evaluated our approach on two recent events: the United States Presidential Elections 2012 and Hurricane Sandy. The results demonstrate that Wikipedia can be leveraged to determine, rank, and evolve a small set of high-quality event-related hashtags in real-time, which can then be used to filter the tweet stream for event-relevant tweets.

Hashtag Analysis

We performed a preliminary analysis of hashtags before designing a solution to this problem. The analysis addresses two questions:

  • How many hashtags contribute to retrieving the event-related tweets?
  • Can these hashtags be detected automatically?
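One way to make the first question concrete is to measure what fraction of tweets is retrieved by the k most frequent hashtags. The following is a minimal sketch of that idea, not the analysis actually run on the Twitris data: the function name `coverage_by_top_k`, the input layout (each tweet as a list of its hashtags), and the toy sample are all assumptions made for illustration.

```python
from collections import Counter


def coverage_by_top_k(tweets, k):
    """Return the fraction of tweets containing at least one of the
    k most frequent hashtags (hypothetical helper for illustration)."""
    if not tweets:
        return 0.0
    # Count the number of tweets each hashtag appears in, case-insensitively.
    counts = Counter()
    for tags in tweets:
        counts.update({tag.lower() for tag in tags})
    # Rank by descending tweet count; break ties alphabetically so the
    # result is deterministic.
    ranked = sorted(counts, key=lambda tag: (-counts[tag], tag))
    top = set(ranked[:k])
    covered = sum(1 for tags in tweets if any(tag.lower() in top for tag in tags))
    return covered / len(tweets)


# Hypothetical toy data: each tweet is represented by its hashtags.
sample = [
    ["#OWS", "#occupy"],
    ["#ows"],
    ["#occupy", "#nyc"],
    ["#news"],
]

for k in (1, 2, 3):
    print(k, coverage_by_top_k(sample, k))
```

Plotting this coverage against k for a real event would show how quickly a small hashtag set saturates retrieval, which is the quantity the first question asks about.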

To answer these questions, we used the datasets for two events from the Twitris system: (1) Occupy Wall Street (OWS) and (2) the Colorado Shooting (CMS). The details of the datasets are provided below.

Dataset for Analysis from Twitris
Event   Tweets     Hashtags (Distinct)    Start Date   End Date
CMS     122062     192512 (12350)         7/20/12      9/10/12
OWS     6077378    15963209 (191602)      9/29/11      9/20/12
Total   6199440    16155721
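The figures in the table can be turned into simple per-event ratios. The sketch below copies the numbers from the table and assumes that the "Hashtags" column counts every hashtag occurrence while the parenthesized value counts distinct hashtags; the dictionary layout and the `summarize` helper are hypothetical names introduced for illustration.

```python
# Figures copied from the dataset table. Assumption: "hashtags" is the
# total number of hashtag occurrences and "distinct" is the number of
# unique hashtags.
dataset = {
    "CMS": {"tweets": 122062, "hashtags": 192512, "distinct": 12350},
    "OWS": {"tweets": 6077378, "hashtags": 15963209, "distinct": 191602},
}


def summarize(stats):
    """Derive simple ratios for one event (hypothetical helper)."""
    return {
        # Average number of hashtag occurrences per tweet.
        "hashtags_per_tweet": stats["hashtags"] / stats["tweets"],
        # Distinct hashtags relative to all hashtag occurrences.
        "distinct_share": stats["distinct"] / stats["hashtags"],
    }


for event, stats in dataset.items():
    s = summarize(stats)
    print(
        f"{event}: {s['hashtags_per_tweet']:.2f} hashtags per tweet, "
        f"{s['distinct_share']:.2%} distinct among all occurrences"
    )

# Sanity check: the per-event rows add up to the Total row of the table.
total_tweets = sum(s["tweets"] for s in dataset.values())
total_hashtags = sum(s["hashtags"] for s in dataset.values())
```

Under that reading of the columns, both events average between one and three hashtags per tweet, and distinct hashtags make up only a small share of all hashtag occurrences.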

Approach

Evaluation