Twitris

From Knoesis wiki

Revision as of 17:41, 15 April 2013

Twitris is a Semantic Web application that facilitates understanding of social perceptions through semantics-based processing of massive amounts of event-centric data. Twitris 2.0 addresses challenges in large-scale processing of social data while preserving its spatio-temporal-thematic properties. Twitris 2.0 also covers context-based semantic integration of multiple Web resources and exposes semantically enriched social data to the public domain. Semantic Web technologies enable the system's integration and analysis abilities.

Introduction

Well over a billion people have become 'citizens' of an Internet- or Web-enabled social community. Web 2.0 fostered the open environment and the applications for tagging, blogging, wikis, and social networking that have made information consumption, production, and sharing remarkably easy. With over 5 billion mobile connections, over a billion of them with data connections (smartphones), and many more people able to communicate using SMS, digital media can be shared with the rest of humanity instantly. As a result, humanity is interconnected as never before. This interconnected network of people, who actively observe, report, collect, analyze, and disseminate information via text, audio, or video messages, increasingly through pervasively connected mobile devices, has given rise to what we term citizen sensing (Sheth, 2009-a), (Sheth, 2009-b). This phenomenon differs from traditional centralized information dissemination and consumption environments, where citizens primarily act as consumers of information reported by several authoritative sources.

Figure 1: Twitris: three primary dimensions of analysis

This citizen sensing is complemented by the growing ability to access, integrate, dissect, and analyze the individual and collective thinking of humanity, a capability recognized as collective intelligence. Citizen sensing involves humans in the loop, and with them come all the complexities of human communication, as well as the intelligence it captures. As citizen sensing has gained momentum, it has generated millions of observations, creating significant information overload. In many cases it becomes nearly impossible to make sense of the information around a topic of interest, and analyzing the numerous social signals in this data deluge can be extremely challenging. In response, Twitris has been developed with the vision of performing semantics-empowered analysis of a broad variety of social media exchanges.

Twitris, named by combining Twitter with Tetris, a tile-matching puzzle game, has incorporated increasingly sophisticated analysis of social data and associated metadata, combining it with background knowledge and, more recently (albeit not discussed here), with machine sensor data, that is, data captured from the sensors and devices that make up the Internet of Things (IoT). Twitris’ evolution can be characterized in three phases (and corresponding versions of the system). Figure 1 outlines the corresponding dimensions Twitris considers.

Twitris is a comprehensive platform for analyzing social content along multiple dimensions, leading to in-depth insights into various aspects of an event or situation. The central thesis behind this work is that citizen sensor observations are inherently multi-dimensional, and that taking these dimensions into account while processing, aggregating, connecting, and visualizing data will provide useful principles for organizing and consuming it. Twitris evolved in three phases, characterized by the following versions of the system:

  • Twitris v1: Spatio-Temporal-Thematic (STT) processing of Twitter and associated news, multimedia and Wikipedia content (Sheth, 2009-b), (Nagarajan, 2009-a) (Jadhav, 2010)
  • Twitris v2: People-Content-Network Analysis (PCNA) (Purohit, 2011-a) with use of background knowledge and semantic metadata extraction and querying/exploration
  • Twitris v3: sentiment-emotion-intent (SEI) extraction (Chen, 2012), (Wang, 2012), (Nagarajan, 2009-b), along with personalization (Kapanipathi, 2011-a) and an emerging continuous semantics capability (Sheth, 2010) involving semantic processing of streaming (i.e., real-time) social data using dynamically generated and updated domain models for semantics and context

In practice, Twitris development is not as neatly phased as painted above: the issues identified above are not explicitly segregated by version, because Twitris has been in continuous development, with senior students graduating and new students picking up the work. Four talks, including a tutorial, cover many of the issues addressed by Twitris (Sheth, 2009-a), (Nagarajan, 2010-a), (Nagarajan, 2011), (Sheth, 2011).

Key Points

The social media group at Kno.e.sis investigates the role and benefits of a semantic approach, especially metadata extraction and enrichment and the contextual application of relevant background knowledge, and demonstrates examples on real-world data using the system (Twitris) developed at Kno.e.sis.

  1. Event-specific analysis of citizen sensing, with its opportunities and challenges in understanding temporal, spatial, and thematic cues
  2. Facets of people-content-network analysis with focus on user-community engagement analysis
  3. Real time social media data analysis, and the concept of continuous semantics supported by dynamic model creation
  4. Sentiment and emotion identification from citizen sensing data
  5. Recent advances in developing semantic abstracts, or semantic perception, to convert massive amounts of raw observational data into nuggets of information and insights that can aid human decision making

Historical Background

The idea for the research and technology development leading to Twitris arose on November 26, 2008, when terrorists struck Mumbai, India, and over the next three days wreaked havoc in nine locations. Each of the nine sub-events of this overall event, separated by time and location (space), had distinct thematic elements or topical content. The importance of Twitter, especially in terms of citizen sensing (the ability of a regular person to use his or her mobile device to share personal observations, thoughts, and beliefs well before traditional news media have a chance to report and to shape opinions), was extensively discussed in the immediate aftermath of this momentous event. This event also gave us a clear case for the needs and benefits of analyzing social media content such as tweets and Flickr posts, together with related news stories, along three dimensions: spatial (location of observation), or where; temporal (time of observation), or when; and thematic (the event in question), or what (Battle, 2009), (Impact Lab, 2008), (Keralaravind, 2008).

Twitris Platform and Three Stages of Its Evolution

A. Twitris v1: Spatio-Temporal-Thematic (STT) processing of Twitter and associated news, multimedia and Wikipedia content

Twitris fig2.jpg

Twitris v1 (Jadhav, 2010) was designed with the following three major steps:

  1. Data collection: collect user posted tweets pertaining to an event from Twitter, associated news, multimedia, and Wikipedia content
  2. Data analysis: a) process obtained tweets to extract strong event descriptors considering spatial, temporal, and thematic event attributes; b) process event-related news, multimedia, and Wikipedia content to get event context and gain a better understanding
  3. Visualization: present extracted summaries on the Twitris v1 user interface
Twitris fig3.jpg

Twitris v1 performs two-step processing to extract strong event descriptors from tweets. First, it creates spatio-temporal clusters of the tweet corpus surrounding an event, since every event is different and we want to preserve the social perceptions that generated this data. TF-IDF is then computed to fetch the n-grams from this set. The second step associates spatial, temporal, and thematic bias with these n-grams by enhancing their weights, while preserving the contextual relevance of these event descriptors to the event. Further details of the text-processing algorithm are available in (Nagarajan, 2009-a). The Twitris v1 user interface (Figure 3, a and b) facilitates effective browsing of the when (temporal/time), where (space/location), and what (thematic/context) slices of the social perceptions behind an event.
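
A minimal Python sketch of the two-step idea follows. It is an illustration under simplifying assumptions, not the algorithm in (Nagarajan, 2009-a): tweets are clustered by an exact (location, date) key, each cluster is treated as one document for TF-IDF over unigrams and bigrams, and the spatial/temporal/thematic bias is a caller-supplied multiplier. The names `stt_descriptors`, `bias`, and `theme_bias`, and the sample tweets, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n_max=2):
    """Yield every 1..n_max-gram of a token list as a space-joined string."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def stt_descriptors(tweets, bias, n_max=2, top_k=3):
    """Return the top_k descriptors for each (location, date) cluster.

    tweets: list of (location, date, text) tuples.
    bias:   hypothetical callable (location, date, ngram) -> weight multiplier,
            standing in for the spatial/temporal/thematic weighting step.
    """
    # Step 1a: spatio-temporal clustering, simplified to grouping on (location, date).
    clusters = defaultdict(list)
    for loc, day, text in tweets:
        clusters[(loc, day)].append(text.lower().split())

    # Step 1b: n-gram counts per cluster; each cluster acts as one TF-IDF document.
    tf = {key: Counter(g for toks in docs for g in ngrams(toks, n_max))
          for key, docs in clusters.items()}
    df = Counter(g for counts in tf.values() for g in counts)  # clusters containing g
    n_clusters = len(tf)

    result = {}
    for key, counts in tf.items():
        total = sum(counts.values())
        scores = {}
        for g, c in counts.items():
            tfidf = (c / total) * math.log(n_clusters / df[g])
            # Step 2: enhance the weight with the (hypothetical) STT bias.
            scores[g] = tfidf * bias(key[0], key[1], g)
        # Highest score first; ties broken alphabetically for stable output.
        result[key] = sorted(scores, key=lambda g: (-scores[g], g))[:top_k]
    return result


# Usage with made-up tweets and a toy thematic bias.
THEME = {"taj", "hotel", "attack"}


def theme_bias(location, day, gram):
    return 2.0 if any(w in THEME for w in gram.split()) else 1.0


tweets = [
    ("Mumbai", "2008-11-26", "gunfire at taj hotel"),
    ("Mumbai", "2008-11-26", "taj hotel under attack"),
    ("Delhi", "2008-11-27", "praying for mumbai"),
]
descriptors = stt_descriptors(tweets, theme_bias)
print(descriptors[("Mumbai", "2008-11-26")])  # ['hotel', 'taj', 'taj hotel']
```

Because IDF is computed across clusters, an n-gram that appears in every cluster scores zero; with only one spatio-temporal cluster the sketch cannot discriminate at all. That is a consequence of this simplification, not a documented property of Twitris.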

Twitris fig4.jpg

The objective of the Twitris v1 user interface is to integrate the results of the data analysis (extracted descriptors and surrounding discussions) with emerging visualization paradigms to facilitate sensemaking. To start browsing, users select an event. Once the user chooses an event, the date is set to the earliest date of recorded observations for that event, and the map is overlaid with markers indicating the spatial locations from which observations were made on that date. We call this the spatio-temporal slice.
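
The rule for choosing the initial spatio-temporal slice can be stated as a small function. This Python sketch assumes observations are dictionaries with `event`, `date`, and `location` keys; the function name `initial_slice`, the record layout, and the sample data are hypothetical and only illustrate the rule described above.

```python
from datetime import date


def initial_slice(observations, event):
    """Return (start_date, marker_locations) for an event's first spatio-temporal slice.

    observations: iterable of dicts with hypothetical keys "event", "date", "location".
    """
    obs = [o for o in observations if o["event"] == event]
    if not obs:
        return None, []
    start = min(o["date"] for o in obs)  # earliest date with recorded observations
    # One marker per distinct location observed on that date, sorted for stable output.
    markers = sorted({o["location"] for o in obs if o["date"] == start})
    return start, markers


observations = [
    {"event": "mumbai-2008", "date": date(2008, 11, 27), "location": "Colaba"},
    {"event": "mumbai-2008", "date": date(2008, 11, 26), "location": "Nariman Point"},
    {"event": "mumbai-2008", "date": date(2008, 11, 26), "location": "Colaba"},
    {"event": "other-event", "date": date(2008, 11, 25), "location": "Delhi"},
]
start, markers = initial_slice(observations, "mumbai-2008")
print(start, markers)  # 2008-11-26 ['Colaba', 'Nariman Point']
```

Moving the date forward in the interface would amount to repeating the same filter with a later date in place of `start`.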

Twitris fig5.jpg
Twitris fig6.jpg

Users can further explore activity in a particular location by clicking on its overlay marker. The event descriptors extracted from observations in this spatio-temporal setting are displayed as an event descriptor cloud, in which the spatio-temporal-thematic (STT) score of each descriptor determines its size. To provide event context and a better understanding of the event, we enhanced Twitris by integrating event-related news, multimedia (images and videos), and Wikipedia articles. We leveraged explicit semantic information from DBpedia to identify relevant news and Wikipedia articles. When a user clicks on a particular descriptor, we display tweets containing that descriptor, the top current news items, and related Wikipedia articles.
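
The descriptor cloud maps each descriptor's STT score to a display size. The text does not say which scaling function Twitris uses, so this Python sketch assumes simple linear min-max scaling into a hypothetical 10 to 36 pixel range; `cloud_font_sizes` and its parameters are illustrative names.

```python
def cloud_font_sizes(stt_scores, min_px=10, max_px=36):
    """Map descriptor -> STT score to descriptor -> font size in pixels.

    Linear min-max scaling and the pixel bounds are assumptions for illustration.
    """
    if not stt_scores:
        return {}
    lo, hi = min(stt_scores.values()), max(stt_scores.values())
    span = hi - lo
    sizes = {}
    for term, score in stt_scores.items():
        # If all scores are equal, every descriptor gets the midpoint size.
        frac = (score - lo) / span if span else 0.5
        sizes[term] = round(min_px + frac * (max_px - min_px))
    return sizes


print(cloud_font_sizes({"taj hotel": 0.9, "gunfire": 0.5, "mumbai": 0.1}))
# {'taj hotel': 36, 'gunfire': 23, 'mumbai': 10}
```

A logarithmic scale is a common alternative when a few descriptors dominate the scores; since both mappings are monotonic, either one preserves the ordering of the STT scores.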

B. Twitris v2: People-Content-Network Analysis (PCNA) with use of background knowledge and semantic metadata extraction and querying/exploration

The 2008 Mumbai terrorist attacks gave the impetus to study events along STT dimensions and to focus on connecting them with relevant news content. Social media continues to grow and to revolutionize the way users interact with each other and with information. Social network users are not only creators and recipients of information but also critical relays that propagate it. This powerful ability to share has played an important role in events with varied social significance, audience, and duration, such as political movements (e.g., the Jasmine Revolution in Tunisia), brand management and marketing, and, perhaps most visibly, crisis and disaster management (e.g., the Haitian and Japanese earthquakes). The Twitris team began to examine issues such as the role of the nature of content in high versus low information diffusion (the propagation of messages via friendship/follower connections among users of a social network) (Nagarajan, 2010-b) and user engagement (given a discussion topic on social media, what motivates a user to engage in the discussion for his or her first interaction) (Purohit, 2011-a), (Ruan, 2012). Consequently, Twitris v2 embarked on a more comprehensive analysis along the three pillars of what makes anything social: who is engaging in the social activity, what is being communicated, and how this communication flows between those engaged in it. The idea is to gain insight into how permanent and transient networks arise, and what information flows across such networks and why. Twitris v2 gained significant capability to extract more types of metadata, and its infrastructure became more semantic through the use of the Semantic Web standard RDF and relevant background knowledge. The latter enabled Twitris v2 to support deep exploration using DBpedia and SPARQL queries over metadata extracted from the tweets.
The Twitris v2 research focus on coordination during disasters also led to integrating Twitris with Ushahidi’s SwiftRiver open source platform and to supporting the ingestion of SMS messages, which were used during events such as the 2010 Pakistan floods.
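
Twitris v2 used RDF and explored metadata extracted from tweets with SPARQL. To show how the people, content, and network dimensions meet in a single query without depending on an RDF library, this Python sketch models metadata as plain (subject, predicate, object) tuples and hand-codes the join that a two-pattern SPARQL query would perform. The `ex:` predicates, the sample tweets, and the helper names are hypothetical; a real system would load RDF into a triple store and issue actual SPARQL.

```python
# Hypothetical tweet metadata as (subject, predicate, object) triples.
# "ex:" predicates are made up; "dbpedia:" names follow DBpedia resource naming.
TRIPLES = [
    ("tweet:1", "ex:author", "user:alice"),
    ("tweet:1", "ex:mentions", "dbpedia:Haiti"),
    ("tweet:2", "ex:author", "user:bob"),
    ("tweet:2", "ex:mentions", "dbpedia:Haiti"),
    ("tweet:2", "ex:retweetOf", "tweet:1"),
    ("tweet:3", "ex:author", "user:carol"),
    ("tweet:3", "ex:mentions", "dbpedia:Pakistan"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching one pattern; None acts like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def authors_mentioning(triples, entity):
    """People + content: hand-coded join equivalent to
    SELECT DISTINCT ?user WHERE { ?t ex:mentions <entity> . ?t ex:author ?user }
    """
    tweets = {s for s, _, _ in match(triples, p="ex:mentions", o=entity)}
    return sorted({o for s, _, o in match(triples, p="ex:author") if s in tweets})


def retweet_count(triples, tweet):
    """Network: how many tweets are recorded as retweets of `tweet`."""
    return len(match(triples, p="ex:retweetOf", o=tweet))


print(authors_mentioning(TRIPLES, "dbpedia:Haiti"))  # ['user:alice', 'user:bob']
print(retweet_count(TRIPLES, "tweet:1"))             # 1
```

Linking mentioned entities to DBpedia identifiers is what lets background knowledge join in: any facts stored about `dbpedia:Haiti` could be matched with the same pattern mechanism.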

