Difference between revisions of "Continuous Semantic Crawling Events"


Revision as of 05:35, 25 December 2012

Abstract

The need to tap into the "wisdom of the crowd" via social networks in real time has already been demonstrated during critical events such as the Arab Spring and the recently concluded US elections. As Twitter becomes a platform of choice for streaming event-related information in real time, we face several challenges related to filtering, real-time monitoring, and tracking the dynamic evolution of an event. We present a novel approach to continuously track an evolving event on Twitter by leveraging hashtags that are filtered using evolving background knowledge (Wikipedia). Our approach (1) collects evolving hashtags by adapting tag co-occurrence information; (2) exploits the semantics of events for selecting hashtags by monitoring and leveraging the corresponding Wikipedia event pages; and (3) filters tweets using hashtags that are determined to be semantically relevant to the event. We evaluated our approach on two recent events: the United States Presidential Elections 2012 and Hurricane Sandy. The results demonstrate that Wikipedia can be leveraged to determine, rank, and evolve a small set of high-quality event-related hashtags in real time to filter the tweet stream for event-relevant tweets.

Hashtag Analysis

Before designing a solution to this problem, we performed a preliminary analysis of hashtags. The analysis addresses two questions:

  • How many hashtags contribute to retrieving the event-related tweets?
  • Can these hashtags be detected automatically?

To answer these questions, we used datasets for two events from the Twitris system: (1) Occupy Wall Street (OWS) and (2) the Colorado Shooting (CMS). Details of the datasets are provided in the table below.

Dataset for Analysis from Twitris

Event   Tweets    Hashtags (Distinct)   Start Date   End Date
CMS     122062    192512 (12350)        7/20/12      9/10/12
OWS     6077378   15963209 (191602)     9/29/11      9/20/12
Total   6199440   16155721
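
As a rough illustration of the first question, the sketch below measures what fraction of tweets contain at least one of the k most frequent hashtags. It is only a minimal sketch and not the analysis performed on the Twitris data: the function name coverage_by_top_k, the representation of each tweet as a set of hashtags, and the sample hashtags are all assumptions made for illustration.

```python
from collections import Counter


def coverage_by_top_k(tweets, k):
    """Return the fraction of tweets containing at least one of the
    k most frequent hashtags.

    `tweets` is assumed to be a list of sets of hashtag strings
    (a hypothetical representation chosen for this sketch).
    """
    if not tweets:
        return 0.0
    # Count each hashtag once per tweet in which it appears.
    counts = Counter(tag for tags in tweets for tag in set(tags))
    top = {tag for tag, _ in counts.most_common(k)}
    covered = sum(1 for tags in tweets if top & set(tags))
    return covered / len(tweets)


# Toy data (hypothetical hashtags, not taken from the Twitris dataset).
sample = [
    {"#ows", "#occupy"},
    {"#ows"},
    {"#occupy", "#nyc"},
    {"#nyc"},
    {"#ows", "#p2"},
]

for k in (1, 3):
    print(k, coverage_by_top_k(sample, k))
```

Plotting this coverage against k on a real dataset would indicate whether a small number of hashtags accounts for most of the event-related tweets.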


Power Law of Hashtag Frequencies

We analyzed the frequency of hashtags in the event-relevant tweets and found that the hashtag frequencies follow a power law <ref>Zipf, G. K. Human Behavior and the Principle of Least Effort, 1949</ref>, as shown in the figure.
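
A power-law relationship between rank and frequency appears as an approximately straight line on a log-log plot. The sketch below builds a rank-frequency list from a flat list of hashtags and estimates the slope of that line by ordinary least squares. It is a quick diagnostic under stated assumptions, not the procedure used to produce the figure: the function names rank_frequency and loglog_slope and the synthetic tags are hypothetical, and a least-squares fit on log-log data is only a rough check rather than a rigorous test for a power law.

```python
import math
from collections import Counter


def rank_frequency(hashtags):
    """Return [(rank, frequency), ...] with rank 1 as the most frequent hashtag."""
    freqs = sorted(Counter(hashtags).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(rank_freq):
    """Least-squares slope of log(frequency) against log(rank).

    Requires at least two ranks. A slope near -1 corresponds to the
    classic Zipf distribution.
    """
    xs = [math.log(rank) for rank, _ in rank_freq]
    ys = [math.log(freq) for _, freq in rank_freq]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data: the tag at rank r occurs 120 // r times (120, 60, 40, 30, 24),
# which is an exact Zipf distribution over five hypothetical tags.
synthetic = [f"#tag{r}" for r in range(1, 6) for _ in range(120 // r)]

pairs = rank_frequency(synthetic)
print(pairs)
print(loglog_slope(pairs))
```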

Approach

Evaluation

<references />