Twitris

From Knoesis wiki

Sheth, A., Purohit, H., Smith, G. A., Brunn, J., Jadhav, A., Kapanipathi, P., Lu, C. & Wang, W. Twitris - A System for Collective Social Intelligence. Encyclopedia of Social Network Analysis and Mining (ESNAM), 2017, pp. 1-23.


Twitris (latest version), a Semantic Web application, facilitates understanding of social perceptions through semantics-based processing of massive amounts of event-centric data on social media. Twitris addresses challenges in large-scale processing of social data, preserving spatio-temporal-thematic properties and focusing on multi-dimensional analysis of spatio-temporal-thematic, people-content-network, and sentiment-emotion-intent facets. Twitris also covers context-based semantic integration of multiple Web resources and exposes semantically enriched social data to the public domain. Semantic Web technologies enable the system's integration and analysis abilities. It has applications for studying and analyzing social sensing and perception of a broad variety of events: politics and elections, social movements and uprisings, crises and disasters, entertainment, environment, decision making and coordination, brand management, campaign effectiveness, etc.

Introduction

More than two-thirds of all Internet users have become “citizens” of an Internet- or Web-enabled social community. Web 2.0 fostered the open environment and applications for tagging, blogging, wikis, and social networking sites that have made information consumption, production, and sharing remarkably easy. With over five billion mobile connections, of which a majority (over 2.5 billion) use smartphones, a significant portion of humanity has the ability to communicate using SMS, and digital media can be shared with the rest of humanity instantly. As a result, humanity is interconnected as never before. This interconnected network of people who actively observe, report, collect, analyze, and disseminate information via text, audio, or video messages, increasingly through pervasively connected mobile devices, has led to what we term citizen sensing (Sheth 2009a, b). This phenomenon differs from traditional centralized information dissemination and consumption environments, where citizens primarily act as consumers of information reported by a few authoritative sources.

This citizen sensing is complemented by the growing ability to access, integrate, dissect, and analyze individual and collective thinking of humanity, giving us a capability that is recognized as collective intelligence. Citizen sensing involves humans in the loop and with it all the complexities associated with and intelligence captured in human communication. As citizen sensing has gained momentum, it is generating millions of observations and creating significant information overload. In many cases, it becomes nearly impossible to make sense of the information around a topic of interest. Given this data deluge, analyzing the numerous social signals can be extremely challenging. In response to this growing citizen sensing data deluge, Twitris has been developed with the vision of performing semantics-empowered analysis of a broad variety of social media exchanges.

Twitris, named by combining Twitter with Tetris, a tile-matching puzzle game, has incorporated increasingly sophisticated analysis of social data and associated metadata, combining it with background knowledge and, more recently (albeit not discussed here), machine sensor data captured from the sensors and devices that make up the Internet of Things (IoT). Twitris' evolution can be characterized in three phases (and corresponding versions of the system). Figure 1 outlines the corresponding dimensions Twitris considers.

Figure 1: Twitris — three primary dimensions of analysis

Twitris is a comprehensive platform for analyzing social content along multiple dimensions leading to in-depth insights into various aspects of an event or a situation. The central thesis behind this work is that citizen sensor observations are inherently multidimensional in nature, and taking these dimensions into account while processing, aggregating, connecting, and visualizing data will provide useful organization and consumption principles. Twitris evolved in three research phases, characterized by the following versions of the system:

  • Twitris v1: Spatio-temporal-thematic (STT) processing of Twitter and associated news, multimedia, and Wikipedia content (Sheth 2009b; Nagarajan et al. 2009a; Jadhav et al. 2010).
  • Twitris v2: People-content-network analysis (PCNA) (Purohit et al. 2011, Purohit et al. 2012, Purohit et al. 2014b) with use of background knowledge and semantic metadata extraction and querying/exploration.
  • Twitris v3: Sentiment-emotion-intent (SEI) extraction with multidimensional user and content modeling (Chen et al. 2012; Wang et al. 2012; Nagarajan et al. 2009b; Purohit et al. 2013b; Purohit et al. 2014a; Purohit et al. 2015) along with personalization (Kapanipathi et al. 2011a) and emerging continuous semantics (Sheth et al. 2010) capability involving semantic social stream (i.e., real-time) processing using dynamically generated and updated domain models for semantics and context.

Following the above research phases, Twitris went through a substantial system re-architecture and software engineering work with emphasis on user interface/user experience, usability, scalability, and robustness, leading to a version termed Twitris-C. This version has since been licensed to spin off a startup, Cognovi Labs, in 2016.

The above versions, or phases, of Twitris' development are not as cleanly separated as described above; that is, the issues identified above are not explicitly segregated by version, because Twitris has been in continuous development, with senior students graduating and new students picking up the work. Around a dozen talks, including tutorials, cover many of the issues addressed by Twitris (Sheth 2009a; Nagarajan 2010; Nagarajan et al. 2011; Sheth 2011; Purohit et al. 2013d).

Key Points

Throughout this entry, we investigate the role and benefits of using a semantic approach, especially by metadata extraction and enrichments and contextually applying relevant background knowledge, along with demonstrating examples on real-world data for research, humanitarian, and commercial applications using the Twitris system (also termed platform given its comprehensive and extensive nature) developed at Kno.e.sis. The entry focuses on the following key points:

  1. Event-specific analysis of citizen sensing and discussion of opportunities and challenges in understanding temporal, spatial, and thematic cues.
  2. Facets of people-content-network analysis with focus on user-community engagement analysis.
  3. Real-time social media data analysis and the concept of continuous semantics supported by dynamic model creation.
  4. Sentiment, emotion, and intent identification from citizen sensing data.
  5. Recent advances in developing semantic abstracts or semantic perception to convert massive amounts of raw observational data into nuggets of information and insights that can aid in human decision-making and real-time decision-making.

Historical Background

The idea for the research leading to Twitris arose on November 26, 2008. Terrorists struck Mumbai, India, and over the next three days they proceeded to cause mayhem in nine locations. Each of the nine sub-events of this overall event, separated by time and location (space), had distinct thematic elements or topical content. The importance of Twitter, especially in terms of citizen sensing (the ability of a regular person to use his or her mobile device to share personal observations, thoughts, and beliefs well before the traditional news media has a chance to report and to shape opinions), was extensively discussed in the immediate aftermath of this momentous event. This event also gave us a clear case for the needs and benefits of analyzing social media content such as tweets and Flickr posts and related news stories along three dimensions: spatial (location of observation), where; temporal (time of observation), when; and thematic (the event in question), what (Battle 2009; Impact Lab 2008).

Twitris Platform and Three Stages of Its Evolution

Twitris v1: Spatio-Temporal-Thematic (STT) Processing of Twitter and Associated News, Multimedia, and Wikipedia Content

Twitris v1 (Jadhav et al. 2010) was designed with the following three major steps:

  1. Data collection: collect user posted tweets pertaining to an event from Twitter, associated news, multimedia, and Wikipedia content (Figure 2).
  2. Data analysis: (a) process obtained tweets to extract strong event descriptors considering spatial, temporal, and thematic event attributes and (b) process event related news, multimedia, and Wikipedia content to get event context and gain a better understanding.
  3. Visualization: present extracted summaries on Twitris v1 user interface.
Figure 2: A snapshot of spatio-temporal-thematic slice of citizen sensing showing content related to Mumbai terrorism (thematic) related to Taj hotel (spatial, thematic), during a period of interest (temporal)

Twitris v1 performs two-step processing to extract strong event descriptors from tweets. First, it creates the spatio-temporal clusters of the tweet corpus surrounding an event since every event is different and we want to preserve the social perceptions that generated this data. TFIDF computation is performed to fetch the n-grams from this set. The second step involves the association of spatial, temporal, and thematic bias to these n-grams by means of enhancing the weights while preserving the contextual relevance of these event descriptors to the event. Further details of the text-processing algorithm are available in Nagarajan et al. (2009a). The Twitris v1 user interface (Figure 3) facilitates effective browsing of when (temporal/time), where (space/location), and what (thematic/context) slices of social perceptions behind an event.

Figure 3: The STT biased scoring mechanism of Twitris v1 for relevance and ranking of key phrases compared to traditional TFIDF-based ranking: “mumbai” ranked highest based on TFIDF is far less informative compared to “foreign relations perspectives.”
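
The two-step scoring described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the algorithm of Nagarajan et al. (2009a): treating each spatio-temporal cluster as one document, the smoothed IDF, the multiplicative `boost`, the function names, and the toy tweets are all hypothetical choices made for this sketch.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Return lowercase word n-grams (unigrams up to n_max-grams)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]

def tfidf_scores(cluster, all_clusters):
    """TF-IDF of each n-gram in one spatio-temporal cluster.

    Assumption of this sketch: each cluster (the tweets from one
    place and time window) is treated as a single document.
    """
    tf = Counter(g for tweet in cluster for g in ngrams(tweet))
    docs = [set(g for tweet in c for g in ngrams(tweet)) for c in all_clusters]
    n_docs = len(docs)
    scores = {}
    for gram, count in tf.items():
        df = sum(1 for d in docs if gram in d)
        # Smoothed IDF: an n-gram present in every cluster scores zero.
        scores[gram] = count * math.log((1 + n_docs) / (1 + df))
    return scores

def stt_bias(scores, thematic_terms, boost=2.0):
    """Hypothetical bias step: boost n-grams containing a thematic cue word."""
    return {g: s * (boost if any(t in g.split() for t in thematic_terms) else 1.0)
            for g, s in scores.items()}

# Toy clusters: tweets observed near Mumbai vs. near Delhi on the same day.
clusters = [
    ["mumbai attack at taj hotel", "taj hotel fire mumbai"],
    ["mumbai attack news from delhi", "delhi reacts to mumbai"],
]
scores = tfidf_scores(clusters[0], clusters)
biased = stt_bias(scores, thematic_terms={"taj"})
for gram in sorted(biased, key=biased.get, reverse=True)[:3]:
    print(gram, round(biased[gram], 3))
```

In this toy data, “mumbai” occurs in every cluster and therefore receives no weight, while cluster-specific phrases such as “taj hotel” rise to the top, which mirrors the contrast the figure caption draws between plain TFIDF ranking and STT-biased ranking.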

The objective of the Twitris v1 user interface is to integrate the results of the data analysis (extracted descriptors and surrounding discussions) with emerging visualization paradigms to facilitate sensemaking. To start browsing, users are required to select an event. Once the user chooses a theme, the date is set to the earliest date of recorded observations for an event and the map is overlaid with markers indicating the spatial locations from which observations were made on that date. We call this the spatio-temporal slice (Figures 4 and 5).

Figure 4: Early version of Twitris v1 user interfaces for displaying thematic component (using STT biasing) on right (b) based on spatial and temporal selection on left (a)
Figure 5: Twitris v1 user interface with spatio-temporal slice and multimedia widgets.

Users can further explore activity in a particular space by clicking on the overlay marker. The event descriptors extracted from observations in this spatio-temporal setting are displayed as an event descriptor cloud. The spatio-temporal-thematic (STT) scores determine the size of the descriptor in the tag cloud (Nagarajan et al. 2009a). In order to get event context and better understanding of the event, we enhanced Twitris by integrating event-related news, multimedia (images and videos), and Wikipedia articles. We leveraged explicit semantic information from DBpedia to identify relevant news and Wikipedia articles. When a user clicks on a particular descriptor, we display tweets containing the event descriptors and the top current news items as well as related Wikipedia articles (Figure 6).

Figure 6: Twitris v1 user interface (b) with event descriptor cloud, related tweets, news, and Wikipedia articles for the event “Austin plane attack”. Joe Stack, the man responsible for the Austin suicide plane attack on the IRS office, posted his suicide note about the attack online. He was a former bass player for the Billy Eli band. Here Twitris captures STT event descriptors summarizing the important facets.


Twitris v2: People Content Network Analysis (PCNA) with use of background knowledge and semantic metadata extraction and querying/exploration

The Mumbai terrorism event of 2008 gave the impetus to study the event from STT dimensions and to focus on connecting with relevant news content. Social media continues to grow and revolutionize the way users interact with each other and information. Social network users are not only creators and recipients of the information but also critical relays to propagate information. This powerful ability of sharing has played an important role in events with varied social significance, audience, and duration, such as political movements (e.g., the Jasmine Revolution in Tunisia), brand management and marketing, and, perhaps most visibly, crisis and disaster management (e.g., Haitian and Japanese earthquakes). The Twitris team started to look at the issues such as the role of content nature for high vs. low attributed information diffusion (a phenomenon of propagating messages via friendship/follower connections among users of social network) (Nagarajan et al. 2010b) and user engagement (given a discussion topic surrounding an event on social media, what motivates a user to engage in the discussion for his/her first and subsequent interaction across the various phases of the event) (Purohit et al. 2011; Ruan et al. 2012, Purohit et al. 2014c). Consequently, Twitris v2 embarked on a more comprehensive analysis along the three pillars of what makes anything social: who is engaging in the social activity, what is being communicated, and how does this communication flow between those engaged in the social activity. The idea is to gain insights into how permanent and transient networks arise and what and why information flows across such networks. Twitris v2 developed the significant capability to extract more types of metadata, and the infrastructure became more semantic with the use of Semantic Web standard RDF as well as relevant background knowledge. 
The latter enabled Twitris v2 to support deep exploration using DBpedia and SPARQL over metadata extracted from the tweets. Twitris v2 research that focused on coordination during disasters also led to integrating Twitris with Ushahidi's SwiftRiver open source platform and to supporting ingestion of SMS messages, which were used for events such as the Pakistan floods in 2010. Let us look at some examples of Twitris v2 capabilities:

  • Evolving ad hoc nature of social media communities:
    • Event-centric communities with varied nature (Purohit et al. 2011) often bring together users from different parts of the social network, especially in Twitter where we keep switching discussions of our interests, and we may not already be connected to other participants of those communities. Therefore, in such ad hoc communities, it is difficult to depend on just follower graphs for understanding the dynamics. Twitris v2 introduced analysis of user interaction networks so that human dynamics in the evolving communities can be understood at granular levels — influencer analysis, contextually important people with roles to engage with, community evolution, etc. Twitris v2 built this feature by extending our research in the user interaction network analysis on brand-page communities (Purohit et al. 2012) and crisis response coordination for identifying important actors in social media communities (Purohit et al. 2014b).
  • Contrast in the structure of interaction networks:
    • Figure 7 shows the networks of influencers in two topical communities during the Occupy Wall Street (OWS) movement, “Occupy Chicago” on the left and “Occupy LA” on the right. Such an analysis provides insights into not only the real dynamics of the actors (e.g., what organizations supporters belong to and to whom they are strongly connected) but also the potential of the influencers to drive actions in the communities (tightly connected influencers are likely to drive effective “call for action” propagation in the communities). In this figure, the influencer network of Occupy LA is highly connected and self-organized compared to the sparsely connected one for Occupy Chicago and is, therefore, likely to reach the masses effectively for any call for action. Even the Facebook page for Occupy LA reflected such activism.
  • Slicing and dicing the networks by user features:
    • To glean insights about actionable information in the ad hoc communities, we need to understand the participants better. Therefore, Twitris v2 introduced slicing and dicing analysis of the interaction networks by providing user/node-centric features. For example, the professional or organizational affiliation of users provides clues to the causes of the dynamics: who are the people behind the organized network of Occupy LA? Does having users from the same type of organization lead to coordinated actions? Similarly, Twitris v2 introduced content-centric analysis, thus realizing the full potential of PCNA. Users are clustered by grouping them into sentiment segments of the target topic, thus answering questions such as which candidate, Mitt Romney or Ron Paul, is stronger in the influencer network from a sentiment perspective (Figure 8), and on what issues.
  • Understanding group dynamics by community evolution:
    • Twitris v2 focused on the larger goal of predictive ability for group dynamics (Purohit et al. 2014c), and the people-content-network analysis (PCNA) framework was the key to the untapped potential of group dynamics. Therefore, Twitris v2 created clusters in the ad hoc communities based on the sentiment of the users for a targeted topic over time and associated events on the timeline for causal analytics. Figure 9 shows an example of community evolution centered around Republican presidential nominee Mitt Romney during March 1–31, 2012. It shows three snapshots taken over a 10-day period, and we observed an extremely modularized community at the end of the analysis, which was not really the case for the closest competitor, Rick Santorum. And as we know, Santorum exited the race on April 9. Thus, the analysis of community evolution made Twitris v2 capable of understanding the group dynamics of ad hoc communities by not limiting the output to understanding individual users but extending it to group behavior.
Figure 7: Contrast in the community structure of influencers in user interaction networks, centered on two popular events #OccupyChicago and #OccupyLA.
Figure 8: Sentiment of the influencers for the target candidate in the interaction network centered on that target: Romney (1st cluster) vs. Ron Paul (2nd cluster).
Figure 9: Interaction network evolution for topical community surrounding Mitt Romney, US Presidential Election 2012
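
The interaction-network analysis described in the list above can be illustrated with a small sketch. It is only an assumption-laden toy: the text does not state which measures Twitris v2 uses, so in-degree as an influencer proxy, edge density as a connectivity measure, and all user handles below are hypothetical.

```python
def build_interaction_network(interactions):
    """Build a directed graph from (source_user, target_user) pairs,
    e.g. source retweeted, replied to, or mentioned target."""
    graph = {}
    for src, dst in interactions:
        if src == dst:
            continue  # ignore self-interactions
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    return graph

def top_influencers(graph, k=3):
    """Rank users by in-degree: how many distinct users engaged with them."""
    indegree = {user: 0 for user in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    return sorted(indegree, key=lambda u: (-indegree[u], u))[:k]

def density(graph, nodes):
    """Fraction of possible directed edges that exist among `nodes`."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u in nodes for v in graph.get(u, ()) if v in nodes)
    return edges / (n * (n - 1))

# Hypothetical retweet/mention pairs from one topical community.
interactions = [("a", "b"), ("c", "b"), ("b", "a"), ("d", "a"), ("d", "b")]
graph = build_interaction_network(interactions)
leaders = top_influencers(graph, k=2)
print(leaders, density(graph, leaders))
```

Comparing the density of the top-influencer subgraph across two communities gives a crude numeric counterpart to the visual contrast between a tightly connected and a sparsely connected influencer network.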

Twitris v2 leverages Semantic Web technologies by using background knowledge such as DBpedia to provide deeper insights about the event. Background knowledge changes the way you can look at the information, as it puts the information in context. This is especially important for tweets because they are short and therefore individually lack the volume of information that provides an informative context. For example, in Figure 10, questions such as “Who are the dead people that are mentioned in the context of OWS movement” can be answered using the background knowledge, whereas a simple keyword search cannot put the information in tweets in context. Further, to answer the questions in the figure and generate answers such as Rosa Parks, the system has to have the background knowledge that this named entity is a person and also that she is dead. Going deeper into the background knowledge provides information that Rosa Parks was famous for the Montgomery Bus Boycott during the US civil rights movement in 1955–1956.

Figure 10: Leveraging Semantic Web technologies to provide insights of events.
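
To make the role of background knowledge concrete, here is a minimal sketch that answers the “dead people mentioned” question over a toy in-memory triple set. It is a stand-in for illustration only: the text says Twitris v2 uses RDF, DBpedia, and SPARQL, whereas this sketch uses plain Python, hypothetical predicate names, a placeholder entity, and made-up tweets.

```python
# Toy background knowledge as (subject, predicate, object) triples.
# Predicate names are hypothetical, loosely inspired by DBpedia-style properties.
KNOWLEDGE = {
    ("Rosa Parks", "type", "Person"),
    ("Rosa Parks", "deathYear", "2005"),
    ("Rosa Parks", "knownFor", "Montgomery Bus Boycott"),
    ("Example Activist", "type", "Person"),  # placeholder entity, no death year
    ("Wall Street", "type", "Place"),
}

def objects(subject, predicate):
    """Return all objects for a given subject and predicate."""
    return {o for s, p, o in KNOWLEDGE if s == subject and p == predicate}

def deceased_people_mentioned(tweets):
    """Return people with a recorded death year who are mentioned in the tweets.

    A keyword search alone could find the name, but only the background
    knowledge tells us the entity is a person and is deceased.
    """
    people = {s for s, p, o in KNOWLEDGE if p == "type" and o == "Person"}
    found = set()
    for tweet in tweets:
        text = tweet.lower()
        for person in people:
            if person.lower() in text and objects(person, "deathYear"):
                found.add(person)
    return sorted(found)

tweets = [
    "Rosa Parks would have joined #OWS",
    "Example Activist speaking at Wall Street today",
]
for person in deceased_people_mentioned(tweets):
    print(person, "-", ", ".join(sorted(objects(person, "knownFor"))))
```

Following the `knownFor` link is the toy equivalent of “going deeper” into the background knowledge to add context to a name that appears in a tweet.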

Twitris v3: Emotion-Sentiment-Intent, Real-Time View, and Other Advancements

Behind every (well, most of the important) tweet, there is a human. And a human is complex. Through a tweet, a person expresses emotion, sentiment, and intent. Understanding this dimension is a key to unlock the true potential of social media. This is especially true for monetization of social media. Understanding an underlying intent can tell us if a user is expressing a transactional (potentially for buying a product) intent, seeking information, or just sharing information (Nagarajan et al. 2009b). Sentiment is perhaps the most sought after type of analysis of social data. Currently, it is the primary basis of social media analysis to predict whether a product or a movie will succeed, who is more likely to win an election, or to attempt to identify consumer interest and hence use it for targeting the advertisement. Analysis of or identification of emotion is likely the dark horse of the three — while techniques for its analysis are not yet as mature as sentiment analysis, it is likely to be combined with the other two to give far more signal than without it.

A key innovation in sentiment analysis, employed in Twitris v3, is topic-specific sentiment analysis, which associates sentiment with an entity (Chen et al. 2012, Chen 2016). This enables us to identify different sentiments associated with different entities in a single tweet. For example, in the tweet “The King's Speech was bloody brilliant. Colin Firth and Geoffrey Rush were fantastic!” we can identify both the sentiment (i.e., bloody brilliant) associated with the movie “The King's Speech” and the sentiment (fantastic) associated with the actors Colin Firth and Geoffrey Rush. More recently, we have been associating sentiments with events: when there is a significant change in sentiment, we attempt to associate it with real-world events. For example, by tracking both the event- and entity-specific sentiments, Twitris v3 is able to capture a substantial increase of positive sentiment toward President Obama on the immigration issue on June 15, 2012 (the day on which President Obama outlined a new immigration policy), and associate it with event descriptors such as “dream act,” “obama 's immigration move,” and “new immigration policy.” Figure 11 shows that Twitter users have opposite sentiments toward two candidates, Obama (green/positive) and Romney (red/negative), on the same topic, “final debate.” The reason is that Obama received more positive feedback from Twitter users than Romney did, which is in line with the impression from news media. This example demonstrates Twitris' power in identifying topic-specific sentiments.

Compared with sentiment, emotion is more implicit. Consider, for example, “I will have a calculus test in two hours, but I'm not prepared at all.” We can infer that the person is nervous about the test, though there are no explicit emotion words such as “nervous” or “panic.” Because emotion is so often implicit, labeling sentences with emotions is difficult and time consuming. In Twitris v3, we are able to automatically create a large emotion-labeled dataset (of about 2.5 million tweets) covering joy, sadness, anger, love, fear, thankfulness, and surprise by harnessing emotion-related hashtags available in the tweets (Wang et al. 2012). Machine-learning classifiers are trained on the large dataset to learn how to identify people's emotions behind their tweets. As another key innovation, Twitris v3 can analyze people's emotional responses to different events. For example, Figure 12 shows the volume of joyful tweets, reaching peaks on October 3, 2012 (first debate), October 16, 2012 (second debate), October 22, 2012 (third debate), and October 19, 2012 (Obama appeared on the Daily Show). The reason is that Twitter users were very enthusiastic about all three presidential debates and Obama's appearance on the Daily Show TV program. Beyond analyzing emotions in tweets, Twitris v3 is also able to identify emotions in blogs, news headlines, etc., because we adapt the classifiers trained on Twitter data to other domains using a relatively small amount of labeled emotion data from those domains.
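
The hashtag-based labeling idea can be sketched as follows. This is a minimal illustration, not the procedure of Wang et al. (2012): the seven emotion categories come from the text, but the specific hashtags, the rule of discarding tweets with conflicting emotion hashtags, and the function name are assumptions made here.

```python
import re

# Hypothetical hashtag-to-emotion map; the hashtags themselves are illustrative.
EMOTION_HASHTAGS = {
    "#joy": "joy", "#happy": "joy",
    "#sad": "sadness",
    "#angry": "anger",
    "#love": "love",
    "#scared": "fear",
    "#thankful": "thankfulness",
    "#surprised": "surprise",
}

HASHTAG_RE = re.compile(r"#\w+")

def label_by_hashtag(tweet):
    """Return (text_without_emotion_hashtags, emotion), or None when the tweet
    has no emotion hashtag or carries hashtags for conflicting emotions."""
    tags = [t.lower() for t in HASHTAG_RE.findall(tweet)]
    emotions = {EMOTION_HASHTAGS[t] for t in tags if t in EMOTION_HASHTAGS}
    if len(emotions) != 1:
        return None
    # Strip the label-bearing hashtags so a classifier cannot just memorize them.
    text = HASHTAG_RE.sub(
        lambda m: "" if m.group().lower() in EMOTION_HASHTAGS else m.group(),
        tweet,
    )
    return " ".join(text.split()), emotions.pop()

print(label_by_hashtag("Calculus test in two hours and not prepared #scared"))
```

The resulting (text, label) pairs could then serve as training data for any standard text classifier; removing the label-bearing hashtag from the text keeps the classifier from trivially learning the label from the tag itself.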

Beyond sentiment and emotion, social media posts also express intentional behavior, such as the intent to ask for help during a disaster response. The Twitris research project led to the design of intent mining techniques, especially in the context of disaster response, where mining the intent to seek or offer help can greatly assist the coordination of response operations. Identifying the intent of a post can be formulated as a text classification problem (Purohit et al. 2015), although it is a different type of problem: it is concerned with the future state of affairs, in contrast to topic classification, which focuses on the subject matter of the post, and to sentiment and emotion classification, which focus on the current state of affairs. For instance, in the message “RT @xyz: Again, people in #yeg feeling helpless about #yycflood and wanting to help, go donate blood. Clinics in #yyc are closed.?!!!” topic classification focuses on the medical resource ‘blood’, while sentiment and emotion classification focuses on the negative feeling expressed via ‘helpless’. In contrast, intent classification concerns the author’s intended future action, i.e., ‘wanting to help/donate’. We developed techniques (Purohit et al. 2013b, Purohit et al. 2014a, Purohit et al. 2015) for intent classification using a hybrid feature representation that combines knowledge-guided patterns (top-down processing) with a bag-of-tokens model (bottom-up processing). Pattern-aided text classification was found to perform well on well-formatted text and, therefore, shows potential to improve intent-based classification of short social media text. We employed diverse patterns from a variety of knowledge sources, including declarative patterns from domain experts, syntactic-semantic patterns from psycholinguistics and discourse analysis theories about conversations, and contrast-mining-based patterns, to tackle the sparsity challenge in intent classification.
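
The hybrid feature representation can be sketched in a few lines. The two regular-expression patterns and the feature names below are hypothetical stand-ins chosen for illustration; the patterns used in the cited work come from domain experts, linguistic theory, and contrast mining, and are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical knowledge-guided patterns (the top-down part of the representation).
PATTERNS = {
    "P_SEEK": re.compile(r"\b(need|looking for|please send)\b", re.I),
    "P_OFFER": re.compile(r"\b(want(ing)? to (help|donate)|can (offer|donate))\b", re.I),
}

def hybrid_features(post):
    """Combine bag-of-tokens counts (bottom-up) with binary pattern flags (top-down)."""
    tokens = re.findall(r"[a-z0-9#@']+", post.lower())
    features = Counter(f"tok={t}" for t in tokens)
    for name, pattern in PATTERNS.items():
        features[name] = int(bool(pattern.search(post)))
    return dict(features)

features = hybrid_features("people in #yeg wanting to help, go donate blood")
print(features["P_OFFER"], features["P_SEEK"], features["tok=blood"])
```

A feature dictionary of this shape can be fed to a standard classifier; the pattern flags give the model a dense, intent-bearing signal that sparse token counts from short posts often lack.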

Figure 11: Twitter users show the opposite sentiments towards two candidates on the same topic “final debate” in 2012 presidential election.

We also explored the commercial intent problem domain, asking how to automatically identify users' intents from posts so that monetization can be better targeted to users' needs (Nagarajan et al. 2009b). The highlight of our study is that we discover and differentiate three types of posts: (a) transactional posts, e.g., “I am looking for a 32 GB iTouch”; (b) information sharing posts, e.g., “I like my new 32 GB iTouch”; and (c) information seeking posts, e.g., “what do you think about 32 GB iTouch?” For monetization purposes, transactional posts and information seeking posts are more valuable than information sharing posts because users are looking for information that advertisers can exploit. By extracting intent/keywords/cues from transactional and information seeking posts, our system achieved an accuracy of 52% on ad impressions using MySpace and Facebook data, while the baseline, without using our system, achieved an accuracy of only 30%.

All the above-mentioned assets of content (sentiment, emotion, and intent) exist because humans act with a purpose. When such individual-level purposes start to bring higher engagement in groups, they become a source of group-level actions, apparently leading to the evolution of human dynamics in the social network. Therefore, we are exploring the integral role of intent, together with sentiment and emotions, in purposeful group actions. Specifically, we are focusing on the intent and sentiments behind group coordination because coordinated activity has the potential to make or break the system (Figure 13).

Figure 12: Peak patterns of the emotion joy due to excitement of Twitter users caused by three debates and one TV program (the Daily Show) in the 2012 presidential election
Figure 13: Some of the capabilities of Twitris v3: (1) show popular Topics, also called social signals (weighted n-grams) related to the chosen event for today and any day of the past since the event began to be tracked; (2) search from among the event related tweets with autocomplete, popular event hashtags, and active users and explore content for deep analysis (e.g., who are the dead people mentioned most often in the “occupy wall street event”) using background knowledge (default source is Wikipedia/DBpedia) and Semantic Web technologies (RDF/SPARQL); (3) show key topics of discussions by locations/regions, states, and country (e.g., see the differences in social signals from Mississippi (a “red state”) vs. Massachusetts (a “blue state”) related to President Obama's Nobel Prize); (4) see event relevant tweets in real time on a world map or any region; (5) analyze topic-/people-/region-specific sentiment (e.g., for the US election, sentiment on candidates by states, and by topics identified by election specific topics); (6) see the networks with insights from static (e.g., followers) and dynamic features (e.g., retweet) and people/demographics (e.g., with knowledge of profession of each person); (7) display tweets, recent news, and Wikipedia pages related to selected events and social signals; (8) show event-specific multimedia (images and video); (9) see tweet traffic; (10) change date of video/analysis; (11) select location of interest — each pin shows a collection of social signals emanating from a location; and (13) select an event of interest (e.g., US Election, Occupy Wall Street, Japanese Tsunami).

Advanced Research

Detailed research on social data analysis encompasses social intelligence in real time (Gruhl et al. 2010), which involved a Kno.e.sis-IBM collaboration leading to the operationally deployed BBC Sound Index system. In addition, Twitris has been used for research on multiple fronts for understanding social behavior in online communities, including prediction of topic volume on Twitter (Ruan et al. 2012), brand tracking (Purohit et al. 2012a), psycholinguistic analysis during emerging coordination for actionable intentional behaviors of help (Purohit et al. 2013a; Purohit et al. 2013b; Purohit et al. 2014a; Purohit et al. 2015), privacy-aware content dissemination (Kapanipathi et al. 2011b) and personalization through user interest modeling (Kapanipathi et al. 2014), user-community engagement (Purohit et al. 2011, Purohit et al. 2014c), information diffusion (Nagarajan et al. 2010b), trust in social media (Thirunarayan and Anantharam 2011), studying election events (Chen et al. 2012), analyzing cursing behavior on social media (Wang et al. 2014), and monetization of social activities (Nagarajan et al. 2009b) reported in over 40 publications and summarized in comprehensive tutorials (Nagarajan et al. 2011, Purohit et al. 2013d).

Another recent research direction for Twitris is to incorporate diverse and more fine-grained types of subjective information extraction processes in the system. To address the limitations of current subjectivity and sentiment analysis efforts focused on classifying the text polarity, specifically, whether the expressed opinion for a specific topic in a given text (e.g., document, sentence, word/phrase) is positive, negative, or neutral, we have proposed a framework for subjective information extraction (Chen 2016). The state-of-the-art narrow definition considers subjective information and sentiment as the same object, while other types of subjective information (e.g., emotion, intent, preference, expectation) are either not taken into account or are handled similarly without sufficient differentiation. This limitation may prevent the exploitation of subjective information from reaching its full potential. We extend the definition of subjective information and develop a unified framework that captures the key components of diverse types of subjective information. We define a subjective experience as a quadruple (h, s, e, c), where h is an individual who holds the experiences, s is a stimulus (or target) that elicits the experiences, e.g., an entity or an event, e is a set of expressions that are used to describe the subjective experiences, e.g., the sentiment words/phrases or the opinion claims, and c is a classification or assessment that characterizes or measures the subjectivity. Accordingly, the problem of identifying different types of subjective information can all be formulated as a data-mining task that aims to automatically derive the four components of the quadruple from text. Figure 14 summarizes this research framework.

Figure 14: Overview of subjective information extraction framework that summarizes the advanced Twitris research for subjectivity (Chen 2016).
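
As an illustration only, the quadruple (h, s, e, c) defined above can be written down as a small record type. The following Python sketch is not taken from Twitris: the class name, field names, the example values, and the "type:value" convention for the classification label are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SubjectiveExperience:
    """One subjective experience as the quadruple (h, s, e, c)."""
    holder: str                  # h: individual who holds the experience
    stimulus: str                # s: entity or event that elicits it
    expressions: FrozenSet[str]  # e: words/phrases describing the experience
    classification: str          # c: assessment label (assumed "type:value" form)


def subjectivity_type(exp: SubjectiveExperience) -> str:
    """Return the part of the label before ':' (an assumed convention)."""
    return exp.classification.split(":", 1)[0]


# Hypothetical, hand-written instance; nothing here is extracted automatically.
example = SubjectiveExperience(
    holder="@some_user",
    stimulus="new phone battery",
    expressions=frozenset({"dies too fast", "disappointed"}),
    classification="sentiment:negative",
)
```

Under this representation, sentiment, emotion, intent, and the other subjectivity types would differ only in the values taken by c, while the extraction task remains that of filling the same four fields from text.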

Twitris-C: Commercialization and Further Advances

In the following, we describe two of the recent research advancements in Twitris. The first one is user filtering analysis to determine whether a user is a bot, which is essential for the quality of downstream analysis of user-generated content (see Figure 15). We developed a system that quickly and accurately weeds out tweets that were not authored by humans, even if the user account is owned by an actual human. To evaluate it against the popular service BotOrNot, we collected all of the bots found over a fifteen-minute period and ran them through BotOrNot. BotOrNot determined 67.16% of the accounts behind the tweets we labeled to be “bot”-owned accounts. After evaluating the tweet content of the accounts it determined to be “human”, we observed automatically generated content even though the account holder was human. Refer to Brunn (2016) for more details.

Figure 15: Real-time labeling of bot-generated and human-generated tweets in Twitris.

Beyond the above functional capabilities, Twitris-C, a commercial-grade version of Twitris, has undergone extensive system and software engineering related to the following:

  • Information sourcing pipeline and aggregation framework: while Twitris originally started with a focus on analysis of Twitter, it has since been expanded to source and aggregate data from multiple sources. This includes a JSON API that accepts data, typically from proprietary, privacy-controlled, or enterprise sources. Examples include feedback and product review data, as well as data from Facebook. Another important source of data that a version of Twitris handles is Web forums. Modules to incorporate Reddit and Instagram are expected shortly.
  • Twitris has been engineered with the understanding that all components can and will be updated. To this end, we have implemented a modularized architecture where possible.
    • Processing
      • Twitris’ real-time processing system uses Apache Storm, which is inherently modular in its use of “bolts”. These bolts are self-contained processing units. To add a new form of analysis, a filter or a new database connection, all one has to do is add a bolt.
      • Additionally, Twitris’ processing pipeline allows for custom plugins to be integrated into the system. Plugins can be written in a number of different programming languages, and can be turned on or off, made available for specific campaigns or campaign types, and assigned custom output fields via the Twitris UI.
    • Web Services
      • Twitris uses the Django web framework for all interaction with processed data. Django allows developers to write custom apps to interact with Twitris’ data in new and interesting ways.
    • Front End
      • Twitris’ UI uses a custom built JavaScript framework that extends Backbone.js. This framework allows developers to quickly and easily write plugins to add to the available functionality.
  • UI/UX enhancements for both campaign designer and end user (those using campaigns for getting insights and making decisions).
    • Campaign creation and editing in real time.
      • Widgets on the management page aid the campaign designer in monitoring the effectiveness of the campaign.
    • Ability to generate If This Then That (IFTTT) classifiers for incoming data.
      • In addition, the ability to use the output of custom plugins in the classifier.
    • Widget to search Twitter and import the last week's worth of data.
      • Useful for supplementing streaming data or bootstrapping a new campaign.
    • Sophisticated text search can be used in combination with faceted search simultaneously to provide full query control.
    • Web app is scalable from mobile to desktop.
  • Scalability (Twitris runs on a large OpenStack cluster with 864 CPUs, 17 TB main memory, 18 TB SSD, and 435 TB of disk space).
  • Real-time monitoring of all its components, leading to improved resource allocation, optimization capabilities, and recovery.
    • A combination of Nagios, Grafana, Graphite, and StatsD allows system administrators to:
      • Monitor the status of each system through web dashboards.
      • Receive email alerts for:
        • Resource overutilization.
        • Node failure.
      • Identify systems that are underutilizing or overutilizing resources.
    • Apache JMeter allows us to load-test the system while we monitor the resulting stress on it.
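
The modular design described in the list above (self-contained processing units, plugins that can be switched on or off, and If This Then That style rules over incoming data) can be illustrated with a minimal Python sketch. It does not use the Apache Storm API and is not Twitris code: the `Pipeline` class, the unit names, the keyword rules, and the record fields are all hypothetical and chosen only to show the pattern.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

Record = Dict[str, object]
# A processing unit takes a record and returns it (possibly enriched),
# or None to drop the record from the stream.
Unit = Callable[[Record], Optional[Record]]


@dataclass
class Pipeline:
    """Ordered collection of named units that can be toggled on or off."""
    units: Dict[str, Unit] = field(default_factory=dict)
    enabled: Dict[str, bool] = field(default_factory=dict)

    def add(self, name: str, unit: Unit, enabled: bool = True) -> None:
        self.units[name] = unit
        self.enabled[name] = enabled

    def process(self, record: Record) -> Optional[Record]:
        current: Optional[Record] = record
        for name, unit in self.units.items():
            if not self.enabled[name]:
                continue  # disabled units are skipped, not removed
            current = unit(current)
            if current is None:
                return None  # a unit filtered the record out
        return current


def drop_empty(record: Record) -> Optional[Record]:
    """Filter unit: discard records with no text."""
    return record if str(record.get("text", "")).strip() else None


def rule_classifier(record: Record) -> Optional[Record]:
    """If-this-then-that unit: the first matching keyword sets the label."""
    rules = [("need", "request"), ("donate", "offer")]  # hypothetical rules
    text = str(record["text"]).lower()
    for keyword, label in rules:
        if keyword in text:
            return {**record, "label": label}
    return {**record, "label": "other"}


pipeline = Pipeline()
pipeline.add("drop_empty", drop_empty)
pipeline.add("rule_classifier", rule_classifier)
result = pipeline.process({"text": "We need drinking water"})
```

Adding a new form of analysis then amounts to one more `pipeline.add(...)` call, and setting `pipeline.enabled["rule_classifier"] = False` switches a unit off without touching the others. A real Storm topology would additionally handle distribution, acknowledgement, and fault tolerance, which this sketch leaves out.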

Illustrative Real-world Applications

Twitris has been used in a research context for studying and analyzing social sensing and perception of a broad variety of events: politics and elections, social movements and uprisings, crisis and disasters, epidemiology and epidemic tracking, environment, etc. Its commercialization by Cognovi Labs has allowed it to pursue more commercial applications including brand tracking and advertising campaign effectiveness, sports and entertainment, defense and intelligence, and empowering professional users. We present several real-world applications. Some of the applications, especially those focused on disaster response and popular events of 2016 - Brexit and the US Presidential Election - involve real-time analysis and its use for outcome prediction.

The Twitris team has contributed to the response to several disasters involving earthquakes, hurricanes, tornadoes, and floods: see http://knoesis.org/amit/media for examples. One significant event for which Twitris was used in real time was the historic flooding of Jammu and Kashmir (J&K) state in Northern India in September 2014. This event suffered from a dysfunctional formal disaster response, partly because all the administrative areas and buildings were affected by flood water. In response, several citizen-led initiatives took up the relief and rescue efforts. Twitris supported the scalable relief effort of the largest citizen-led response initiative, JKFloodRelief.org (Purohit et al. 2016a; also cited in several mainstream media such as Hindustan Times (Saxena, 2014)), by using its technology to quickly filter information streams for rescue calls and create a situational awareness map for volunteers (see Figure 16), as well as to identify influential users for coordinating the dissemination of a prioritized set of needs and for sourcing important information (see Figure 17).

Figure 16: Rescue and Evacuation Stream Map during the 2014 Jammu & Kashmir Floods (Purohit et al. 2016a).
Figure 17: Network of influential users discussing ‘relief donations help’ during the 2014 Jammu & Kashmir floods, which helped identify influential users to spread important messages in the network as well as to source relevant information (Purohit et al. 2016a).
Figure 18: Snapshot of Twitris platform based simulation tool for filtering the social stream during functional exercise of emergency response teams in Dayton on May 28, 2014 (Hampton et al. 2015).

A version of Twitris has also been used for creating a simulation interface for an emergency response exercise (Hampton et al. 2015). The data used for this simulation was repurposed from a 2013 Boston Bombing dataset, given the exercise's focus on man-made disasters in an urban setting. As shown in Figure 18, the tool provides filtering via search (top left), by evolving topics (left pane), and by location (middle pane) for the otherwise intractable real-time stream (right pane). The local disaster management and response officials found this to be a highly valuable tool for training/exercise and planning.

Twitris-C was licensed to create a startup, Cognovi Labs, in 2016. The technology has since been used for a series of significant successes. Specifically, we were able to predict Brexit hours before the polls closed in the UK and before the US markets closed (Donovan, 2016; Sheth, 2016a), and it was used to correctly predict the outcome of the 2016 US presidential election in real time (Cognovi Labs, 2016; Sheth, 2016b).

Some of the other applications in which Twitris played an important role include:

  • Measuring public attitude, providing timely analysis for public engagement and policy making on important social issues: this was exemplified by the use case of gender-based violence (GBV) (Purohit et al. 2016b).
  • Using social media for understanding brand development (Yuksel et al. 2016). In this case Twitris was used over Facebook data related to the brand under study.
  • Social media analysis for epidemiological surveillance. Examples of such uses include:
    • Studying prescription drug abuse associated with opioid dependence (Daniulaityte et al. 2015).
    • Studying drug abuse trends in the context of marijuana legalization through the analysis of cannabis- and synthetic cannabinoid–related tweets (Daniulaityte et al. 2016, Lamy et al. 2016).
    • Studying conversations on Zika related to disease characteristics: symptoms, transmission, prevention, and treatment (Miller et al. 2017).
  • Analyzing community level health and disease challenges: this was exemplified by analyzing clinical depressive symptoms in Twitter (Yazdavar et al. 2016).

The use of Twitris for supporting personalized digital health, with a use case of asthma in children, is described online.

Acknowledgements

We acknowledge the contributions of these alumni and team members, whose work has benefited Twitris in different ways: Karthik Gomadam, Meena Nagarajan, Ajith Ranabahu, Pramod Anantharam, Shreyansh Bhatt, Prof. Krishnaprasad Thirunarayan, and Prof. Valerie Shalin. This work was partially supported by these NSF-funded grants: “SoCS: Social Media Enhanced Organizational Sensemaking in Emergency Response” (IIS1111182), “I-Corps: Towards Commercialization of Twitris- a system for collective intelligence” (1343041), and “PFI: AIR - TT: Market Driven Innovations and Scaling up of Twitris- A System for Collective Social Intelligence” (1542911). It is also partially supported by these NIH grants: “Modeling Social Behavior for Healthcare Utilization in Depression” (1 R01 MH105384-01A1) and “Trending: Social media analysis to monitor cannabis and synthetic cannabinoid use” (5R01DA039454-02). Any opinions, findings, conclusions or recommendations expressed in this material are those of the investigator(s) and do not necessarily reflect the views of the sponsor.

Future Directions

In the near term, Twitris and its continued enhancement aim to (a) rapidly create campaign-specific knowledge graphs (background knowledge) using a knowledge graph creation tool and use the knowledge graph in enhanced semantic processing, (b) enhance subjectivity analysis, (c) study time and location correlated issues across data from multiple heterogeneous data streams, (d) carry out more studies that combine social media analysis with more traditional data collection (e.g., surveys), and (e) enhance ability to carry out more behavioral economics applications. In the medium term, campaigns will include not only social media content but also streaming data from sensors (Internet of Things).

Related Projects Using Twitris

NSF SoCS: Social Media Enhanced Organizational Sensemaking in Emergency Response

NIH eDrugTrends: Social Media Analysis to Monitor Cannabis and Synthetic Cannabinoid Use

Harassment: Context-aware Online Harassment Detection on Social Media

Project Safe Neighborhood (PSN): Westwood Partnership to Prevent Juvenile Repeat Offenders

Hazards SEES: Social and Physical Sensing Enabled Decision Support

kHealth: Semantic Multisensory Mobile Approach to Personalized Asthma Care

Modeling Social Behavior for Healthcare Utilization in Depression

References

This publication is a revision of: Sheth, A., Jadhav, A., Kapanipathi, P., Lu, C., Purohit, H., Smith, G. A., and Wang, W. Twitris - A System for Collective Social Intelligence. Encyclopedia of Social Network Analysis and Mining (ESNAM), 2014.

  1. Battle, C. (2009) New media's moment in Mumbai. Foreign Policy J, 15 Jan 2009. Accessed on 25 Feb 2017 at http://www.foreignpolicyjournal.com/2009/01/15/new-media.
  2. Brunn, J. (2016). Bots in the Election. Accessed on 9 Jan 2017 at http://blog.knoesis.org/2016/12/bots-in-election.html.
  3. Chen, L., Wang W., Nagarajan, M., Wang, S., and Sheth, A. (2012). Extracting diverse sentiment expressions with target-dependent polarity from Twitter. In: Proceedings of the 6th international AAAI conference on weblogs and social media (ICWSM), Dublin, 5–7 June 2012.
  4. Chen, L., Wang, W., and Sheth, A. P. (2012, December). Are Twitter users equal in predicting elections? A study of user groups in predicting 2012 US Republican Presidential Primaries. In International Conference on Social Informatics (pp. 379-392). Springer Berlin Heidelberg.
  5. Chen, L. (2016). Subjectivity — Tapping All the Valuable Insights Beyond Sentiment for Nextgen Information Extraction. Accessed 9 Jan 2017 http://blog.knoesis.org/2016/09/subjectivity-tapping-all-valuable.html.
  6. Cognovi Labs (2016, November 11). Twitter Analytics Startup Predicts Trump Upset in Real-Time. Accessed on 25 Feb 2017 at http://finance.yahoo.com/news/cognovi-labs-twitter-analytics-startup-181700129.html.
  7. Daniulaityte, R., Carlson, R., Brigham, G., Cameron, D., and Sheth, A. (2015). “Sub is a weird drug:” A web‐based study of lay attitudes about use of buprenorphine to self‐treat opioid withdrawal symptoms. The American Journal on Addictions, 24(5), 403-409.
  8. Donovan, J. (2016, June 29). The Twitris sentiment analysis tool by Cognovi Labs predicted the Brexit hours earlier than polls. Accessed on 25 Feb 2017 at https://techcrunch.com/2016/06/29/the-twitris-sentiment-analysis-tool-by-cognovi-labs-predicted-the-brexit-hours-earlier-than-polls/.
  9. Gruhl, D., Nagarajan, M., Pieper, J., Robson, C., and Sheth, A. (2010). Multimodal social intelligence in a real-time dashboard system. VLDB J (Data Manage Mining Soc Netw Soc Media) (Special issue) 19(6): 825–848.
  10. Hampton, A., Bhatt, S., Smith, A., Brunn, J., Purohit, H., Shalin, V. L., Flach, J., and Sheth, A. P. (2015). On Using Synthetic Social Media Stimuli in an Emergency Preparedness Functional Exercise. arXiv preprint arXiv:1503.00760.
  11. Impact Lab (2008) Twitter provided a vital link in Mumbai terrorist attacks, November 28, 2008. Accessed on 25 Feb 2017 at http://www.impactlab.net/2008/11/28/twitter-provided-a-vital-link-in-mumbai-terrorist-attacks/.
  12. Jadhav, A., Purohit, H., Kapanipathi, P., Anantharam, P., Ranabahu A., Nguyen, V., Mendes, P., Smith, A. G., Cooney, M., and Sheth, A. (2010). Twitris 2.0: semantically empowered system for understanding perceptions from social data. In: Semantic web application challenge at ISWC, Shanghai, 7–11 Nov 2010.
  13. Kapanipathi, P., Orlandi, F., Sheth, A., and Passant, A. (2011a). Personalized filtering of the Twitter stream. In: 2nd workshop on semantic personalized information management at ISWC 2011, Koblenz, 23–27 Oct 2011.
  14. Kapanipathi, P., Anaya, J., Sheth, A., Slatkin, B., and Passant, A. (2011b). Privacy-aware and scalable content dissemination in distributed social networks. In: International semantic web conference (ISWC), Koblenz, 23–27 Oct 2011.
  15. Kapanipathi, P., Jain, P., Venkataramani, C., and Sheth, A. (2014, May). User interests identification on Twitter using a hierarchical knowledge base. In European Semantic Web Conference (pp. 99-113). Springer International Publishing.
  16. Lamy, F. R., Daniulaityte, R., Sheth, A., Nahhas, R. W., Martins, S. S., Boyer, E. W., and Carlson, R. G. (2016). “Those edibles hit hard”: Exploration of Twitter data on cannabis edibles in the US. Drug and alcohol dependence, 164, 64-70.
  17. Miller, M., Banerjee, T., Muppalla, R., Romine, W., and Sheth, A. (2017). What Are People Tweeting about Zika? An Exploratory Study Concerning Symptoms, Treatment, Transmission, and Prevention. arXiv preprint arXiv:1701.07490.
  18. Nagarajan, M. (2010). Understanding user-generated content on social media. Ph.D. dissertation, Wright State University.
  19. Nagarajan, M., Gomadam K., Sheth A., Ranabahu A., Mutharaju R., and Jadhav A. (2009a). Spatio-temporal-thematic analysis of citizen-sensor data — challenges and experiences. In: Tenth international conference on web information systems engineering, Poznan, 5–7 Oct 2009.
  20. Nagarajan, M., Baid, K., Sheth, A., and Wang, S. (2009b). Monetizing user activity on social networks — challenges and experiences. In: IEEE/WIC/ACM international conference on web intelligence, Milan, 15–18 Sept 2009.
  21. Nagarajan, M., Purohit, H., and Sheth, A. (2010). A qualitative examination of topical tweet and retweet practices. In: 4th international AAAI conference on weblogs and social media (ICWSM), Washington, DC, 23–26 May 2010, pp 295–298.
  22. Nagarajan, M., Sheth, A., and Velmurugan, S. (2011). Citizen sensor data mining, social media analytics and development centric web applications. In: Proceedings of the WWW 2011, Hyderabad, 28 Mar—1 Apr 2011.
  23. Purohit, H., Ruan, Y., Joshi, A., Parthasarathy, S., and Sheth, A. (2011). Understanding user-community engagement by multi-faceted features: a case study on Twitter. SoME 2011 (workshop on social media engagement, in conjunction with WWW 2011), Hyderabad, 28 Mar—1 Apr 2011.
  24. Purohit, H., Ajmera, J., Joshi, S., Verma, A., and Sheth, A. (2012). Finding influential authors in brand-page communities. In: 6th international AAAI conference on weblogs and social media (ICWSM), Dublin, 5–7 June 2012.
  25. Purohit, H., Hampton, A., Shalin, V. L., Sheth, A. P., Flach, J., and Bhatt, S. (2013a). What kind of #conversation is Twitter? Mining #psycholinguistic cues for emergency coordination. Computers in Human Behavior, 29(6), 2438-2447.
  26. Purohit, H., Castillo, C., Diaz, F., Sheth, A., and Meier, P. (2013b). Emergency-relief coordination on social media: Automatically matching resource requests and offers. First Monday, 19(1).
  27. Purohit, H. and Sheth, A. P. (2013c). Twitris v3: From Citizen Sensing to Analysis, Coordination and Action. In ICWSM, pp. 746-747.
  28. Purohit, H., Meier, P., Castillo, C., and Sheth, A. P. (2013d). Crisis Mapping, Citizen Sensing and Social Media Analytics: Leveraging Citizen Roles for Crisis Response. In ICWSM tutorials. Accessed on 25 Feb 2017 at https://www.slideshare.net/knoesis/icwsm-2013-tutorial-crisis-mapping-citizen-sensing-and-social-media-analytics.
  29. Purohit, H., Hampton, A., Bhatt, S., Shalin, V. L., Sheth, A. P., and Flach, J. M. (2014a). Identifying seekers and suppliers in social media communities to support crisis coordination. Computer Supported Cooperative Work (CSCW), 23(4-6), 513-545.
  30. Purohit, H., Bhatt, S., Hampton, A., Shalin, V. L., Sheth, A. P., and Flach, J. M. (2014b). With Whom to Coordinate, Why and How in Ad-Hoc Social Media Communications during Crisis Response. In ISCRAM, pp. 797-791.
  31. Purohit, H., Ruan Y., Fuhry, D., Parthasarathy, S., and Sheth, A. P. (2014c). On Understanding the Divergence of Online Social Group Discussion. ICWSM, 14, 396-405.
  32. Purohit, H., Dong, G., Shalin, V., Thirunarayan, K., and Sheth, A. (2015). Intent Classification of Short-Text on Social Media. In 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity) (pp. 222-228). IEEE.
  33. Purohit, H., Dalal, M., Singh, P., Nissima, B., Moorthy, V., Vemuri, A., Krishnan, V., Khursheed, R., Balachandran, S., Kushwah, H., and Rajgaria, A. (2016a). Empowering Crisis Response-Led Citizen Communities: Lessons Learned from JKFloodRelief.org Initiative. In C. Graham (Ed.), Strategic Management and Leadership for Systems Development in Virtual Spaces (pp. 270-292). Hershey, PA: IGI Global. doi:10.4018/978-1-4666-9688-4.ch015.
  34. Purohit, H., Banerjee, T., Hampton, A., Shalin, V., Bhandutia, N., and Sheth, A. (2016b). Gender-based violence in 140 characters or fewer: A #BigData case study of Twitter. First Monday, 21(1). doi:10.5210/fm.v21i1.6148.
  35. Ruan, Y., Purohit, H., Fuhry, D., Parthasarathy, S., and Sheth, A. (2012). Prediction of topic volume on Twitter. In: 4th international ACM conference of web science (WebSci), Evanston, 22—24 June 2012.
  36. Saxena, V. (2014). Digital soldiers emerge heroes in Kashmir flood rescue. Hindustan Times News. Accessed on 25 Feb 2017 at http://www.hindustantimes.com/floodfuryhitsjk/digital-soldiers-emerge-heroes-in-kashmir-flood-rescue/article1-1262099.aspx.
  37. Sheth, A. (2009a). Semantic integration of citizen sensor data and multilevel sensing: a comprehensive path towards event monitoring and situational awareness. In: From E-Gov to connected governance: the role of cloud computing, Web 2.0 and Web 3.0 semantic technologies, Falls Church, 17 Feb 2009.
  38. Sheth, A. (2009b). Citizen Sensing, Social Signals, and Enriching Human Experience. IEEE Internet Computing, July/August 2009, pp. 80–85.
  39. Sheth, A. (2011). Citizen sensing-opportunities and challenges in mining social signals and perceptions. In: Invited talk at Microsoft Research Faculty Summit 2011, Redmond, 19 July 2011.
  40. Sheth, A., Thomas, C., and Mehra, P. (2010). Continuous semantics to analyze real-time data. IEEE Internet Comput 14(6):84–89.
  41. Sheth, A. (2016a, June 24). #Brexit: “there is a big trouble for #remain” — Some Lessons from Real-time #socialmedia Analysis [LinkedIn article]. Accessed on 25 Feb 2017 at https://www.linkedin.com/pulse/brexit-big-trouble-remain-some-lessons-from-real-time-amit-sheth.
  42. Sheth, A. (2016b, November 9). Election Day #SocialMedia Analysis #Election2016 [LinkedIn article]. Accessed on 25 Feb 2017 at https://www.linkedin.com/pulse/election-day-socialmedia-analysis-election2016-06nov2016-amit-sheth?trk=prof-post.
  43. Thirunarayan, K. and Anantharam, P. (2011). Trust networks: interpersonal, sensor, and social. In: Proceedings of 2011 international conference on collaborative technologies and systems (CTS 2011), Philadelphia, 23–27 May 2011.
  44. Wang, W., Chen L., Thirunarayan, K., and Sheth, A. (2012). Harnessing Twitter ‘Big Data’ for automatic emotion identification. In: Proceedings of international conference on social computing (SocialCom), 2012, Amsterdam, 3–5 Sept 2012.
  45. Wang, W., Chen, L., Thirunarayan, K., and Sheth, A. P. (2014, February). Cursing in English on Twitter. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing (pp. 415-425). ACM.
  46. Yazdavar, A. H., Al-Olimat, H. S., Banerjee, T., Thirunarayan, K., Pathak, J., and Sheth, A. (2016, August). Analyzing Clinical Depressive Symptoms in Twitter. Paper presented at 23rd NIMH Conference on Mental Health Services Research (MHSR): Harnessing Science to Strengthen the Public Health Impact, Bethesda, Maryland.
  47. Yuksel, K., Biggemann, S., Sheth, A., and Brunn, J. (2016). Using Social Media Data to Understand Brand Development. 2016 Direct/Interactive Marketing Research Summit (EDGE16), Los Angeles, CA, October 15, 2016.

Recommended Reading

Sheth, A. and Thirunarayan, K. (2012). Semantics Empowered Web 3.0: Managing Enterprise, Social, Sensor, and Cloud-based Data and Services for Advanced Applications, Morgan & Claypool Publishers, December 9, 2012. ISBN: 1608457168
