Linked Open Social Signals


Team: Pablo N. Mendes (Kno.e.sis), Alex Passant (DERI), Pavan Kapanipathi (Kno.e.sis) and Amit P. Sheth (Kno.e.sis).

At any second of the day, millions of Web users are simultaneously publishing microblog posts (microposts) with opinions, observations and suggestions: "social signals" that may represent invaluable information for businesses and researchers around the world. However, analyzing these numerous social signals can be extremely challenging. Microblog posts are streamed in large quantities every second, as free text, creating significant information overload for any user interested in making sense of the information around a topic.

In this project we investigate the representation of social signals as Linked Open Data, so that those who collectively analyze social signals for sensemaking can handle this information overload more flexibly. Our approach can be summarized as follows:

  • extract content (entity mentions, hashtags and URLs) from microposts;
  • encode content in a structured format (RDF) using shared vocabularies (FOAF, SIOC, MOAT, etc.);
  • enable structured querying of microposts (SPARQL);
  • enable subscription to a stream of microposts that match a given query (Concept Feeds);
  • enable scalable real-time delivery of streaming data (SparqlPuSH).
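
The first two steps above can be illustrated with a minimal sketch. It is not the Twarql implementation: the regular expressions, the tag namespace `TAG_BASE` and the tiny `ENTITY_URIS` lookup table are assumptions made for illustration, and the choice of `sioc:topic` and `sioc:links_to` is one plausible mapping onto the SIOC vocabulary rather than the project's documented schema. The output is a list of N-Triples lines.

```python
import re

# Real vocabulary namespaces (SIOC core, SIOC types, RDF).
SIOC = "http://rdfs.org/sioc/ns#"
SIOCT = "http://rdfs.org/sioc/types#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical namespace for hashtag resources (assumption for this sketch).
TAG_BASE = "http://example.org/tag/"

# Hypothetical lookup table standing in for a real entity spotter.
ENTITY_URIS = {"oil spill": "http://dbpedia.org/resource/Oil_spill"}

URL_RE = re.compile(r"https?://[^\s<>\"]+")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract(text):
    """Return the URLs, hashtags and known entity mentions in a micropost."""
    urls = URL_RE.findall(text)
    # Blank out URLs first so that fragment identifiers are not read as hashtags.
    without_urls = URL_RE.sub(" ", text)
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(without_urls)]
    lowered = without_urls.lower()
    entities = [uri for form, uri in ENTITY_URIS.items() if form in lowered]
    return {"urls": urls, "hashtags": hashtags, "entities": entities}


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(post_uri, text):
    """Encode one micropost as N-Triples lines using SIOC terms."""
    found = extract(text)
    triples = [
        f"<{post_uri}> <{RDF_TYPE}> <{SIOCT}MicroblogPost> .",
        f'<{post_uri}> <{SIOC}content> "{escape_literal(text)}" .',
    ]
    for uri in found["entities"]:
        triples.append(f"<{post_uri}> <{SIOC}topic> <{uri}> .")
    for tag in found["hashtags"]:
        triples.append(f"<{post_uri}> <{SIOC}topic> <{TAG_BASE}{tag}> .")
    for url in found["urls"]:
        triples.append(f"<{post_uri}> <{SIOC}links_to> <{url}> .")
    return triples


if __name__ == "__main__":
    sample = "Cleanup continues after the oil spill #BP http://example.org/report"
    for line in to_ntriples("http://example.org/post/1", sample):
        print(line)
```

Once posts are available as triples like these, a structured query only needs to match on the `sioc:topic` or `sioc:links_to` values instead of searching the raw text.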

The open source software developed to realize the vision of Linked Open Social Signals is called Twarql. You can find more information on how to use it on our wiki page. We also work closely with the Twitris and SMOB projects.


We have two demonstration videos and a live demo. The first video shows the user perspective: interacting with the system to formulate a query and obtain the microblog posts that match it. The second video focuses on the server side and shows the modules of our architecture at work, distributing the microposts via PubSubHubbub. You can also try our live demo, which is currently streaming tweets about the oil spill. The featured streams illustrate concept feeds in action, and the query page lets you define your own concept feed through our query formulation interface.


The driving engineering requirements of our system are scalability and (near) real-time delivery of semantically annotated information. To address these requirements, our architecture separates concerns and includes decoupled implementations of the collection, processing, persistence, subscription and delivery components. The coarse-grained components of our architecture are: (i) Social Sensor Server, (ii) Semantic Publisher, (iii) Distribution Hub and (iv) Application Server.
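
To make the separation of concerns concrete, here is a toy in-memory sketch of the four components. Only the component names come from the description above; every method name, the hashtag-based matching rule and the in-process callbacks are assumptions made for illustration, and persistence is left out. In the real system the components are separate services rather than Python objects.

```python
import re
from collections import defaultdict


class SocialSensorServer:
    """Collection: yields raw microposts (here from an in-memory list)."""

    def __init__(self, posts):
        self._posts = list(posts)

    def stream(self):
        yield from self._posts


class SemanticPublisher:
    """Processing: annotates posts and matches them against registered feeds."""

    def __init__(self):
        self._feeds = {}  # feed id -> set of required hashtags

    def register_feed(self, feed_id, hashtags):
        self._feeds[feed_id] = {tag.lower() for tag in hashtags}

    def matching_feeds(self, post):
        tags = {tag.lower() for tag in re.findall(r"#(\w+)", post)}
        # A feed matches when all of its required hashtags occur in the post.
        return [fid for fid, required in self._feeds.items() if required <= tags]


class DistributionHub:
    """Delivery: fans each matched post out to the subscribers of a feed."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, feed_id, callback):
        self._subscribers[feed_id].append(callback)

    def distribute(self, feed_id, post):
        for callback in self._subscribers[feed_id]:
            callback(post)


class ApplicationServer:
    """Subscription: the only component a client talks to."""

    def __init__(self, publisher, hub):
        self._publisher = publisher
        self._hub = hub

    def request_feed(self, feed_id, hashtags, callback):
        self._publisher.register_feed(feed_id, hashtags)
        self._hub.subscribe(feed_id, callback)


def run(sensor, publisher, hub):
    """Push every collected post through matching and delivery."""
    for post in sensor.stream():
        for feed_id in publisher.matching_feeds(post):
            hub.distribute(feed_id, post)


if __name__ == "__main__":
    sensor = SocialSensorServer(["Tar balls on the beach #oilspill",
                                 "Lunch was great #food"])
    publisher, hub = SemanticPublisher(), DistributionHub()
    app = ApplicationServer(publisher, hub)
    app.request_feed("oil", ["oilspill"], print)
    run(sensor, publisher, hub)
```

Because each class only depends on the narrow interface of its neighbour, any one of them could be replaced (for example, swapping the in-memory hub for a networked one) without touching the others.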



Twarql is the name we gave to the software implementation realizing the Linked Open Social Signals vision. On the client side, users only need a regular Web browser to use our service. Query formulation, subscription requests, data visualization and analytical interfaces run on the client (e.g. a JavaScript-enabled Web browser) and communicate with the Web through the Application Server, with all communication carried over HTTP. When a user submits a query, the Application Server relays the request to the Semantic Publisher, which passes the results collected from the Social Sensor Server on to Distribution Hubs for delivery. In these slides we describe the workflow between the components of the architecture:


SparqlPuSH Demonstration
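
As a rough illustration of the delivery step, the sketch below builds the two HTTP requests that a PubSubHubbub hub typically receives: a publisher ping announcing that a feed has new content, and a subscriber request for that feed. The hub, topic and callback URLs are hypothetical, the parameter set is the minimal one from the PubSubHubbub specification (some hub versions require additional fields), and nothing here is taken from the SparqlPuSH code. The requests are constructed but not sent.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Hypothetical endpoints, used only for illustration.
HUB_URL = "http://hub.example.org/"
TOPIC_URL = "http://example.org/feeds/oil-spill"
CALLBACK_URL = "http://client.example.org/callback"


def hub_request(params):
    """Build a form-encoded POST to the hub without sending it."""
    body = urlencode(params).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(HUB_URL, data=body, headers=headers, method="POST")


def publish_ping(topic_url):
    """Publisher side: tell the hub that the topic feed has new content."""
    return hub_request({"hub.mode": "publish", "hub.url": topic_url})


def subscribe_request(topic_url, callback_url):
    """Subscriber side: ask the hub to push updates of a topic to a callback."""
    return hub_request({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
    })


def decode_body(request):
    """Decode a request body back into a dict, for inspection."""
    return parse_qs(request.data.decode("ascii"))


if __name__ == "__main__":
    print(decode_body(publish_ping(TOPIC_URL)))
    print(decode_body(subscribe_request(TOPIC_URL, CALLBACK_URL)))
```

Passing either request to `urllib.request.urlopen` would send it to a real hub. The point of the pattern is that the publisher pings the hub once per update and the hub handles fan-out, so delivery cost does not grow on the publisher as subscribers are added.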

Frequently Asked Questions (FAQ)

  • Commercial Value: What are the use cases? Is there commercial interest?
    • Social Media Revolution:
      • Because of the speed with which social media enables communication, word of mouth now becomes world of mouth;
      • BRANDS: 25% of search results for the world’s 20 largest brands are links to user-generated content; 34% of bloggers post opinions about products and brands; 78% of consumers trust peer recommendations; only 14% trust advertisements; only 18% of traditional TV campaigns generate a positive ROI. We will no longer search for products and services; they will find us via social media.
      • NEWS: 24 of the 25 largest newspapers are experiencing record declines in circulation; 60 million status updates happen on Facebook daily. We no longer search for the news; the news finds us.
    • Many companies are using microblogging data. Some call it real-time web intelligence or social media business intelligence:
      • Evri: "Evri's technology doesn't just hear the conversation, it listens and understands what's being said. Our semantic analysis helps you give users the context around the conversation."
      • Klout is a San Francisco based company that provides social media analytics measuring a user's influence across their social network.
      • "help internet users manage the incredibly huge amount of information available everyday on blogs, news sites all around the web by leveraging and integrating the social networks and the collective intelligence they contain."
      • "the ellerdale project makes data more relevant and valuable. ellerdale develops and licenses a web intelligence platform optimized for large, real-time data feeds, including all tweets sent world-wide."
    • Content recommendation


  • Bibliography of Research on Twitter & Microblogging [9]
  • Priamos a middleware architecture for real time semantic web [10]

At Kno.e.sis

  • Social Signals @kno.e.sis
  • A. Sheth, Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness, February 17, 2009.
  • A. Sheth, Citizen Sensing, Social Signals, and Enriching Human Experience, IEEE Internet Computing, July/August 2009.
  • Meenakshi Nagarajan, Karthik Gomadam, Amit P. Sheth, Ajith Ranabahu, Raghava Mutharaju, Ashutosh Jadhav: Spatio-Temporal-Thematic Analysis of Citizen Sensor Data: Challenges and Experiences. WISE 2009: 539-553 [11]



Semantic Microblogging

    • HyperTwitter brings semantic hashtags to Twitter: associate hashtags with each other and then perform searches over them. Clever, though you might want to create a dedicated Twitter account for making the associations rather than sending these commands through your main account.
    • Technical Report
    • "TwitLogic is a semantic data aggregator which brings together a collection of compact formats for structured microblog content with Semantic Web vocabularies and best practices in order to augment the Semantic Web with real-time, user-driven data."

Maybe related

  • Short and Tweet: Experiments on Recommending Content from Information Streams [12]
  • Cheng. Fall'09 class project at iSchool (Berkeley). Classifying Metatweets pdf
  • Klout on health care: [13]
  • Topsy on health care: [14]
  • Krishnamurthy, Gill, Arlitt. SIGCOMM'08. A few chirps about twitter. pdf
    • classifies 100,000 users into broadcasters, acquaintances, miscreants or evangelists.

Streaming SPARQL

  • Barbieri et al. C-SPARQL: SPARQL for Continuous Querying WWW'09 poster EDBT'10
  • Barbieri and Della Valle, LDOW2010. A Proposal for Publishing Data Streams as Linked Data (A Position Paper) [15]
  • Streaming SPARQL - Extending SPARQL to Process Data Streams ESWC'08
  • A SPARQL Engine for Streaming RDF Data SITIS'07


Publication Timeline


Link to internal project page: