Twarql Continuous Semantics

From Knoesis wiki

Introduction

Recent years have seen a significant change in the dissemination of news and information. Observations of unfolding events are increasingly shared in real time through ubiquitously accessible microblogging platforms. However, the volume of information being shared is growing exponentially: Twitter alone generates more than 100 million microposts a day. This avalanche of data makes it difficult to seek out specific information, especially in real time. Event-specific information is often only temporarily interesting and goes stale quickly, so to achieve the highest information gain, select content must reach the user quickly. This kind of information tracking proved its importance in the recent Egypt protests, where Twitter and other social networking sites were used as major platforms for protesters to organize gatherings and to stay updated on major developments. This work presents a Semantic Web approach to support dynamic event tracking on Twitter.

In this work we offer a solution to the problem of event following based on the dynamic creation of semantic event models. The user needs to specify an area of interest only once, at which point an event model is created automatically. As the event unfolds, microposts are analyzed and, based on new developments, an updated model is created that subsequently filters microposts for the next iteration of the cycle. This work thus presents an early realization of Continuous Semantics.

Architecture

Background

The following two applications form an integral part of our architecture:

Twarql

  • Project page: Twarql
  • Short Description: Twarql encodes information from microblog posts as Linked Open Data and hence provides the flexibility to filter microposts by leveraging background knowledge.
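To make the idea of filtering with background knowledge concrete, here is a minimal sketch in plain Python. It is not Twarql's implementation: tuples stand in for RDF triples, and every identifier (the `ex:` and `post:` names, the `mentions` and `locatedIn` predicates, the function name) is a hypothetical chosen for illustration. It only shows how a filter can select posts through a fact that the posts themselves never state.

```python
# Toy illustration only: tuples stand in for RDF triples.
# All identifiers below are hypothetical, not Twarql's vocabulary.

# Background knowledge, as would come from a Linked Open Data source.
BACKGROUND = {
    ("ex:Cairo", "ex:locatedIn", "ex:Egypt"),
    ("ex:Alexandria", "ex:locatedIn", "ex:Egypt"),
    ("ex:Paris", "ex:locatedIn", "ex:France"),
}

# Annotations assumed to have been extracted from microposts.
POST_TRIPLES = {
    ("post:1", "ex:mentions", "ex:Cairo"),
    ("post:2", "ex:mentions", "ex:Paris"),
    ("post:3", "ex:mentions", "ex:Alexandria"),
}


def posts_mentioning_places_in(country):
    """Return ids of posts that mention any place located in `country`."""
    # Step 1: use background knowledge to find the relevant places.
    places = {
        subj for (subj, pred, obj) in BACKGROUND
        if pred == "ex:locatedIn" and obj == country
    }
    # Step 2: keep posts whose annotations mention one of those places.
    return sorted(
        subj for (subj, pred, obj) in POST_TRIPLES
        if pred == "ex:mentions" and obj in places
    )


if __name__ == "__main__":
    print(posts_mentioning_places_in("ex:Egypt"))
```

None of the posts mentions Egypt directly; the match comes from joining the post annotations with the background facts, which is the kind of flexibility the description above refers to.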

Doozer

  • Project page: Doozer
  • Short Description: Doozer is an application that generates or extracts a domain model from Wikipedia or other similarly structured knowledge sources. It takes as input an incomplete description of a domain, such as a query or a list of seed concepts. Doozer then expands on these seeds to obtain related concepts, which are in turn evaluated for how indicative they are of the domain.
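The expand-then-evaluate idea can be illustrated with a small sketch. This is not Doozer's algorithm: the link graph is invented toy data, the scoring rule (fraction of a candidate's links that point back to seed concepts) and the 0.6 threshold are assumptions made purely for illustration, and the names `LINKS`, `expand`, and `reduce_candidates` are hypothetical.

```python
# Toy link graph standing in for a Wikipedia-like knowledge source.
# The concepts and links are invented for illustration.
LINKS = {
    "Egypt protests": {"Tahrir Square", "Hosni Mubarak", "Cairo"},
    "Tahrir Square": {"Cairo", "Egypt protests"},
    "Hosni Mubarak": {"Egypt protests", "Cairo"},
    "Cairo": {"Egypt protests", "Tahrir Square", "Nile", "Pyramids"},
    "Nile": {"Cairo", "Africa"},
    "Pyramids": {"Cairo", "Giza", "Tourism"},
}


def expand(seeds):
    """Add every concept that is directly linked from a seed."""
    expanded = set(seeds)
    for concept in seeds:
        expanded |= LINKS.get(concept, set())
    return expanded


def reduce_candidates(candidates, seeds, threshold=0.6):
    """Keep seeds, plus candidates whose links mostly point back to seeds.

    The score is an assumed stand-in for an indicativeness measure.
    """
    seeds = set(seeds)
    kept = set(seeds)
    for concept in candidates - seeds:
        neighbours = LINKS.get(concept, set())
        if not neighbours:
            continue
        score = len(neighbours & seeds) / len(neighbours)
        if score >= threshold:
            kept.add(concept)
    return kept


if __name__ == "__main__":
    seeds = {"Egypt protests", "Cairo"}
    print(sorted(reduce_candidates(expand(seeds), seeds)))
```

In this toy data, concepts that link mostly to the seeds are retained, while broadly connected concepts reached through a single seed are dropped.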

Continuous Semantics for Real-time Social Data

Event information enters the cycle as streaming microposts from Twitter. Twarql filters microposts matching certain user-defined constraints (e.g. a SPARQL query), creating a corpus of relevant microposts. Keyphrase extraction techniques are used to select prominent relevant terms, in order to keep the focus on the unfolding event. The selected keyphrases are fed into Doozer for the automatic creation of a domain model that specifically describes the event in focus. The model is then translated into a Twarql filter. The last step in the cycle is to update the micropost filter in Twarql to reflect the model created by Doozer.

Figure: Continuous semantics architecture.
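The cycle described above (filter, extract keyphrases, build a model, update the filter) can be sketched as a simple loop. This is a rough illustration under strong assumptions, not the system's implementation: keyword matching stands in for a Twarql query, term frequency stands in for keyphrase extraction, `build_model` is a placeholder for Doozer's role, and keeping the user's initial terms in every updated filter is a design assumption. All function names are hypothetical.

```python
from collections import Counter

# Small stopword list, assumed for this toy example.
STOPWORDS = {"the", "a", "in", "at", "of", "to", "and", "is", "are", "on"}


def filter_posts(posts, terms):
    """Keep posts containing at least one filter term (stand-in for a query)."""
    return [p for p in posts if terms & set(p.lower().split())]


def extract_keyphrases(corpus, top_k):
    """Pick the most frequent non-stopword terms (stand-in for keyphrase extraction)."""
    counts = Counter(
        word
        for post in corpus
        for word in post.lower().split()
        if word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(top_k)]


def build_model(keyphrases):
    """Placeholder for domain model creation; here the model is a set of terms."""
    return set(keyphrases)


def model_to_filter(model):
    """Translate the model into filter terms (trivial in this sketch)."""
    return set(model)


def run_cycle(batches, initial_terms, top_k=10):
    """Run one iteration per batch and return the filter after each step."""
    terms = set(initial_terms)
    history = [terms]
    for batch in batches:
        corpus = filter_posts(batch, terms)
        if corpus:  # only update the filter when something matched
            model = build_model(extract_keyphrases(corpus, top_k))
            # Assumption: the user's original interest is always retained.
            terms = set(initial_terms) | model_to_filter(model)
        history.append(terms)
    return history


if __name__ == "__main__":
    batches = [
        ["protest in cairo today", "football results"],
        ["cairo crowds gather at tahrir", "weather is nice"],
    ]
    for step, terms in enumerate(run_cycle(batches, {"protest"})):
        print(step, sorted(terms))
```

In this toy run, the second batch contains no post with the original term, yet one of its posts is still picked up because the first iteration added a new term to the filter. That is the behaviour the cycle is meant to provide: the filter follows the event as its vocabulary shifts.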

People

Pavan Kapanipathi
Christopher Thomas
Pablo Mendes
Amit Sheth

References

  1. C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, and S. Hellmann. DBpedia - A crystallization point for the Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web, 2009.
  2. J. Leskovec, L. Backstrom, and J. Kleinberg. Meme-tracking and the dynamics of the news cycle. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 497-506. ACM, 2009.
  3. P. Mendes, A. Passant, and P. Kapanipathi. Twarql: tapping into the wisdom of the crowd. In Triplification Challenge 2010 at 6th International Conference on Semantic Systems (I-SEMANTICS), 2010.
  4. P. Mendes, A. Passant, P. Kapanipathi, and A. Sheth. Linked open social signals. In IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT'10), 2010.
  5. A. Passant, P. Laublet, J. G. Breslin, and S. Decker. A URI is Worth a Thousand Tags: From Tagging to Linked Data with MOAT. International Journal on Semantic Web and Information Systems (IJSWIS), 5(3):71-94, 2009.
  6. A. P. Sheth, C. J. Thomas, and P. Mehra. Continuous Semantics to Analyze Real Time Data. IEEE Internet Computing, 14(6):84-89, 2010.
  7. C. J. Thomas, P. Mehra, R. Brooks, and A. P. Sheth. Growing Fields of Interest Using an Expand and Reduce Strategy for Domain Model Extraction. In IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 1:496-502, 2008.