Linked Open Social Signals

Revision as of 00:14, 20 May 2010 by Pablo (Talk | contribs) (Architecture)


Team: Pablo N. Mendes (Kno.e.sis), Alex Passant (DERI), Pavan Kapanipathi (Kno.e.sis) and Amit P. Sheth (Kno.e.sis).

At any second of the day, millions of Web users are simultaneously publishing microblog posts (microposts) containing opinions, observations and suggestions. These "social signals" may represent invaluable information for businesses and researchers around the world, but analyzing them can be extremely challenging. Microblog posts are streamed in large quantities every second, in textual format, creating significant information overload for users interested in making sense of the information around a topic of interest.

In this project we investigate the representation of social signals as Linked Open Data in order to enable flexibility in handling the information overload of those interested in collectively analyzing social signals for sensemaking. Our approach can be summarized as follows:

Twarql Demonstration http://bit.ly/twarql
SparqlPuSH Demonstration http://bit.ly/sparqlpush
  • extract content (entity mentions, hashtags and URLs) from microposts;
  • encode content in a structured format (RDF) using shared vocabularies (FOAF, SIOC, MOAT, etc.);
  • enable structured querying of microposts (SPARQL);
  • enable subscription to a stream of microposts that match a given query (Concept Feeds);
  • enable scalable real-time delivery of streaming data (SparqlPuSH).
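
The first two steps of the list above can be illustrated with a minimal sketch. It is not the Twarql implementation: it only pulls hashtags and URLs out of a micropost with regular expressions and prints N-Triples. The properties sioc:content and sioc:links_to come from the SIOC vocabulary; the `ex:hashtag` property, the post URI and the function names are hypothetical placeholders chosen for this example, and entity-mention extraction is left out.

```python
import re

SIOC = "http://rdfs.org/sioc/ns#"
EX = "http://example.org/vocab#"  # hypothetical namespace, for illustration only

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract(text):
    """Return the hashtags and URLs mentioned in a micropost."""
    # Drop punctuation that often trails a URL at the end of a sentence.
    urls = [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]
    # Remove URLs first so that "#fragment" parts are not read as hashtags.
    without_urls = URL_RE.sub(" ", text)
    hashtags = [t.lower() for t in HASHTAG_RE.findall(without_urls)]
    return {"hashtags": hashtags, "urls": urls}


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(post_uri, text):
    """Encode one micropost as a list of N-Triples lines."""
    found = extract(text)
    lines = [f'<{post_uri}> <{SIOC}content> "{escape_literal(text)}" .']
    for tag in found["hashtags"]:
        # ex:hashtag is an assumed property, not part of SIOC or MOAT.
        lines.append(f'<{post_uri}> <{EX}hashtag> "{tag}" .')
    for url in found["urls"]:
        lines.append(f'<{post_uri}> <{SIOC}links_to> <{url}> .')
    return lines


post = "Latest on the #OilSpill cleanup: http://bit.ly/twarql #BP"
triples = to_ntriples("http://example.org/post/1", post)
print("\n".join(triples))
```

Once microposts are available as triples like these, they can be loaded into any RDF store and queried with SPARQL, which is the basis for the remaining steps.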

The open source software developed to realize the vision of Linked Open Social Signals is called Twarql. You can find more information on how to use it on our wiki page. We also work closely with the Twitris and SMOB projects.

Demonstration

We have two demonstration videos and a live demo. The first video shows the user perspective: interacting with the system to formulate a query and obtain the microblog posts that match it. The second video focuses on the server side and shows the modules of our architecture at work, distributing the microposts via PubSubHubbub. You can also try out our live demo, which is currently streaming tweets about the oil spill. The featured streams illustrate concept feeds in action, and the query page allows you to define your own concept feed through our query formulation interface.

Architecture

The driving engineering requirements in our system are: scalability and (near) real time delivery of semantically annotated information. In order to address those requirements, our architecture separates concerns, and includes decoupled implementations for collection, processing, persistence, subscription and delivery components. The coarser components of our architecture are: (i) Social Sensor Server, (ii) Semantic Publisher, (iii) Distribution Hub and (iv) Application Server.

TwarqlArchitecture.png
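
The separation of concerns between the four components can be sketched as a small in-memory model. This is only an illustration under simplifying assumptions, not the actual Twarql code: all class and method names are hypothetical, a plain Python predicate stands in for a SPARQL query, annotation is reduced to hashtag extraction, and a callback stands in for HTTP delivery by a hub.

```python
from collections import defaultdict


class DistributionHub:
    """Delivers posts to every subscriber of a feed (stand-in for a push hub)."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def subscribe(self, feed_id, callback):
        self._callbacks[feed_id].append(callback)

    def publish(self, feed_id, post):
        for callback in self._callbacks[feed_id]:
            callback(post)


class SemanticPublisher:
    """Matches annotated posts against registered feeds and forwards matches."""

    def __init__(self, hub):
        self._hub = hub
        self._feeds = {}  # feed_id -> predicate (a real system would store a query)

    def register_feed(self, feed_id, matches):
        self._feeds[feed_id] = matches

    def on_annotated_post(self, post):
        for feed_id, matches in self._feeds.items():
            if matches(post):
                self._hub.publish(feed_id, post)


class SocialSensorServer:
    """Collects raw microposts, annotates them and hands them to the publisher."""

    def __init__(self, publisher):
        self._publisher = publisher

    def ingest(self, text):
        # Simplified annotation: collect lowercase hashtags only.
        tags = {w[1:].lower() for w in text.split()
                if w.startswith("#") and len(w) > 1}
        self._publisher.on_annotated_post({"text": text, "hashtags": tags})


class ApplicationServer:
    """Entry point for clients: registers a feed and subscribes the client to it."""

    def __init__(self, publisher, hub):
        self._publisher = publisher
        self._hub = hub

    def create_feed(self, feed_id, matches, callback):
        self._publisher.register_feed(feed_id, matches)
        self._hub.subscribe(feed_id, callback)


hub = DistributionHub()
publisher = SemanticPublisher(hub)
sensor = SocialSensorServer(publisher)
app = ApplicationServer(publisher, hub)

inbox = []  # stands in for a subscribed client
app.create_feed("oilspill", lambda post: "oilspill" in post["hashtags"], inbox.append)

sensor.ingest("Cleanup crews arrive #oilspill")
sensor.ingest("Lunch was great #food")
print(inbox)
```

Because each component only talks to its neighbour through a narrow interface, collection, matching and delivery could be scaled or replaced independently, which is the point of the decoupling described above.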


Twarql

Twarql is the name we gave to the software implementation realizing the Linked Open Social Signals vision. On the client side, users only need a regular Web browser to use our service. Query formulation, subscription requests, data visualization and analytical interfaces run on the client (e.g. a JavaScript-enabled Web browser) and communicate with the Application Server, with all communication done over HTTP. When a user submits a query, the Application Server relays the request to the Semantic Publisher, which passes the results collected from the Social Sensor Server on to Distribution Hubs for delivery. In these slides we describe the workflow between the components of the architecture:

SparqlPuSH

SparqlPuSH Demonstration

Frequently Asked Questions (FAQ)

  • Information relevance: Are tweets interesting at all?
  • Commercial Value: What are the use cases? Is there commercial interest?
    • Social Media Revolution: http://socialnomics.net/
      • Because of the speed in which social media enables communication, word of mouth now becomes world of mouth;
      • BRANDS: 25% of search results for the world’s 20 largest brands are links to user-generated content; 34% of bloggers post opinions about products and brands; 78% of consumers trust peer recommendations; only 14% trust advertisements; only 18% of traditional TV campaigns generate a positive ROI; we will no longer search for products and services, they will find us via social media.
      • NEWS: 24 of the 25 largest newspapers are experiencing record declines in circulation; 60 million status updates happen on Facebook daily; we no longer search for the news, the news finds us.
    • Many companies are using microblogging data. Some call it real-time web intelligence or social media business intelligence.
    • http://www.evri.com/
      • Evri's technology doesn't just hear the conversation, it listens and understands what's being said. Our semantic analysis helps you give users the context around the conversation.
    • http://tweetmeme.com/
    • http://www.sysomos.com/
    • http://www.bing.com/twitter
    • http://www.gnip.com/
    • http://faveeo.com/front-temp
      • "help internet users manage the incredibly huge amount of information available everyday on blogs, news sites all around the web by leveraging and integrating the social networks and the collective intelligence they contain."
    • http://www.ellerdale.com/
      • "the ellerdale project makes data more relevant and valuable. ellerdale develops and licenses a web intelligence platform optimized for large, real-time data feeds, including all tweets sent world-wide. "
    • Content recommendation
  • Information Overload: but how can you make sense of so much data?
    • As reported by ReadWriteWeb recently, during an emergency it’s practically impossible to get status updates on things like roads, hospitals, airports, and people using Twitter [1]
    • Twitter as a poor vehicle for marketing [2]. Many people make up hashtags as they tweet, exploding the semantic graph, creating more semantic dispersion. Some promising new tools that can help you quickly put a hashtag in context — or let people easily look up the meaning of the hashtags you launch or use [3]
    • Wouldn’t it be cool if Twitter had a topic backbone and you could snap your tweets to it as you write them? [4]
  • Information Delivery: Real Time, Push vs Pull
    • "Real-time search is a response to a fundamental shift in the way people use the Web. People used to visit a page, click a link, and visit another page. Now they spend a lot of time monitoring streams of data--tweets, status updates, headlines--from services like Facebook and Twitter, as well as from blogs and news outlets." http://www.technologyreview.com/computing/25079/page1/
    • Siegel’s rule for information life span: The half-life relevance of a piece of pushed information is about the same as the frequency of the medium. [6]
      • Twitter developed a new set of frameworks @anywhere for adding this Twitter experience anywhere on the web. Imagine being able to follow a New York Times journalist directly from her byline, tweet about a video without leaving YouTube, and discover new Twitter accounts while visiting the Yahoo! home page—and that’s just the beginning. [7]
    • Persistent Search: http://billburnham.blogs.com/burnhamsbeat/2006/04/persistent_sear.html
    • Understanding the Real-Time Web for Web Developers [8]
    • Decentralized Microblogging

Related

  • Bibliography of Research on Twitter & Microblogging [9]
  • Priamos: a middleware architecture for real-time Semantic Web [10]

At Kno.e.sis

  • Social Signals @kno.e.sis
  • A. Sheth, Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness, February 17, 2009.
  • A. Sheth, Citizen Sensing, Social Signals, and Enriching Human Experience- IEEE Internet Computing, July/August 2009.
  • Meenakshi Nagarajan, Karthik Gomadam, Amit P. Sheth, Ajith Ranabahu, Raghava Mutharaju, Ashutosh Jadhav: Spatio-Temporal-Thematic Analysis of Citizen Sensor Data: Challenges and Experiences. WISE 2009: 539-553 [11]

At DERI

At Twitter.com

Semantic Microblogging

  • http://semantictwitter.appspot.com/
    • HyperTwitter provides semantic hashtags on Twitter: associate hashtags with each other and then perform searches. Clever, though you might want to create a special Twitter account for making the associations rather than sending these commands through your main Twitter account.
    • Technical Report
  • http://twitlogic.fortytwo.net/
    • "TwitLogic is a semantic data aggregator which brings together a collection of compact formats for structured microblog content with Semantic Web vocabularies and best practices in order to augment the Semantic Web with real-time, user-driven data. "

Maybe related

  • Short and Tweet: Experiments on Recommending Content from Information Streams [12]
  • Cheng. Fall'09 class project at iSchool (Berkeley). Classifying Metatweets pdf
  • Klout on health care: [13]
  • Topsy on health care: [14]
  • Krishnamurthy, Gill, Arlitt. SIGCOMM'08. A few chirps about twitter. pdf
    • classifies 100,000 users into broadcasters, acquaintances, miscreants or evangelists.

Streaming SPARQL

  • Barbieri et al. C-SPARQL: SPARQL for Continuous Querying WWW'09 poster EDBT'10
  • Barbieri and Della Valle, LDOW2010. A Proposal for Publishing Data Streams as Linked Data (A Position Paper) [15]
  • Streaming SPARQL - Extending SPARQL to Process Data Streams ESWC'08
  • A SPARQL Engine for Streaming RDF Data SITIS'07

Scalability

Publication Timeline

Internal

Link to internal project page: http://knoesis.wright.edu/internal/wiki/index.php/Lotter