Difference between revisions of "Linked Open Social Signals"

From Knoesis wiki
(Twarql)
 
(48 intermediate revisions by 2 users not shown)
Line 1: Line 1:
At any second of the day, millions of Web users are simultaneously publishing opinions, observations and suggestions, or generally "social signals" that may represent invaluable information for businesses and researchers around the world.  
+
'''Team''': [http://pablomendes.com Pablo N. Mendes] (Kno.e.sis), [http://apassant.net Alex Passant] (DERI), [http://knoesis.wright.edu/students/pavan Pavan Kapanipathi]  (Kno.e.sis) and [http://knoesis.wright.edu/amit Amit P. Sheth]  (Kno.e.sis).
  
In this work we investigate the representation of social signals as structured data in order to enable flexibility in handling the information overload of those interested in collectively analyzing social signals for sensemaking.  
+
At any second of the day, millions of Web users are simultaneously publishing microblog posts (microposts) with opinions, observations and suggestions, or generally "social signals" that may represent invaluable information for businesses and researchers around the world. However, analyzing these numerous social signals can be extremely challenging. Microblog posts are streamed in large quantities every second, in textual format, creating significant information overload for the user interested
 +
in making sense of the information around a topic of interest.
  
This is work in progress by Pablo N. Mendes (Kno.e.sis), Alex Passant (DERI), Pavan Kapanipathi  (Kno.e.sis) and Amit P. Sheth  (Kno.e.sis). It builds upon [[Twitris]] and [http://smob.me SMOB].
+
In this project we investigate the representation of social signals as Linked Open Data in order to enable flexibility in handling the information overload of those interested in collectively analyzing social signals for sensemaking. Our approach can be summarized as follows:
 +
[[Image:FirstFrame.png|thumb||right|top||link=http://bit.ly/twarql|alt=Twarql Demonstration|Twarql Demonstration http://bit.ly/twarql]]
 +
[[Image:SparqlPush.png|thumb||right|bottom||link=http://bit.ly/sparqlpush|alt=SparqlPuSH Demonstration|SparqlPuSH Demonstration http://bit.ly/sparqlpush]]
 +
* extract content (entity mentions, hashtags and URLs) from microposts;
 +
* encode content in a structured format (RDF) using shared vocabularies (FOAF, SIOC, MOAT, etc.);
 +
* enable structured querying of microposts (SPARQL);
 +
* enable subscription to a stream of microposts that match a given query (Concept Feeds);
 +
* enable scalable real-time delivery of streaming data ([http://code.google.com/p/sparqlpush SparqlPuSH]).
  
= Quick Info =
+
The open source software developed to realize the vision of Linked Open Social Signals is called [[Twarql]]. You can find more information on how to use it on our [[Twarql|wiki page]]. We also work closely with the [[Twitris]] and [http://smob.me SMOB] projects.
* Real Time: the load estimate for the health care topic drinking from the firehose is  
+
** 1 post per second
+
** 35K triples per hour (tph) or 10 triples per second, steady over HTTP SPARQL Update. Feasible?
+
* Writeup: Get it from here.
+
  
= Pitching =
+
= Demonstration =
  
* Introduction: But are tweets interesting at all?
+
We have two demonstration videos and a live demo. The [http://bit.ly/twarql first video] demonstrates the user perspective, interacting with the system to formulate a query and obtain microblog posts that match that query. The [http://bit.ly/sparqlpush second video] focuses on the server side and demonstrates the modules of our architecture at work, distributing the microposts via pubsubhubbub.
** Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress. That’s a LOT of tweets, by the way: Twitter processes more than 50 million tweets every day, with the total numbering in the billions. http://www.loc.gov/tweet/how-tweet-it-is.html
+
You can try out our live demo that is currently streaming tweets about the [http://en.wikipedia.org/wiki/Deepwater_Horizon_oil_spill oil spill]. The [http://knoesis1.wright.edu/twarql/featured.html featured streams] will illustrate concept feeds in action, and the [http://knoesis1.wright.edu/twarql/query.html query page] will allow you to define your own concept feed through our query formulation interface.
  
* Annotation: Why annotate tweets?
+
= Architecture =
** Ideas for Twitter’s new Annotations — from obvious to intriguing http://digital.venturebeat.com/2010/04/16/twitter-annotations/
+
  
* Information Overload: but how can you make sense of so much data?
+
The driving engineering requirements in our system are: scalability and (near) real time delivery of semantically annotated
** As reported by ReadWriteWeb recently, during an emergency it’s practically impossible to get status updates on things like roads, hospitals, airports, and people using Twitter [http://www.readwriteweb.com/archives/a_new_twitter_hashtag_syntax_to_help_during_catast.php]
+
information. In order to address those requirements, our architecture separates concerns, and includes decoupled implementations for collection, processing, persistence, subscription and delivery components. The coarser components of our architecture are: (i) Social Sensor Server, (ii) Semantic Publisher, (iii) Distribution Hub and (iv) Application Server.
** Twitter as a poor vehicle for marketing [http://blog.hubspot.com/blog/tabid/6307/bid/4694/Why-Twitter-Hashtags-and-Trending-Topics-Are-Useless-to-Marketers.aspx]. Many people make up hashtags as they tweet, exploding the semantic graph, creating more semantic dispersion. Some promising new tools that can help you quickly put a hashtag in context — or let people easily look up the meaning of the hashtags you launch or use [http://www.contentious.com/2009/03/03/whats-that-hashtag-new-glossary-tools-for-twitter/]
+
 
** Wouldn’t it be cool if Twitter had a topic backbone and you could snap your tweets to it as you write them? [http://thepowerofpull.com/pull/twitter-is-unstructured-web-push]
+
[[image:TwarqlArchitecture.png|center]]
 +
 
 +
 
 +
== Twarql ==
 +
 
 +
Twarql is the name of the software implementation that realizes the Linked Open Social Signals vision. On the client side, users need only a regular Web browser to use our service. Query formulation, subscription requests, data visualization and analytical interfaces run on the client (e.g. in a JavaScript-enabled Web browser) and communicate with the Web through the Application Server, with all communication carried over HTTP. When a user submits a query, the Application Server relays the request to the Semantic Publisher, which passes the results collected from the Social Sensor Server on to Distribution Hubs for delivery. In these slides we describe the workflow between the components of the architecture:
 +
 
 +
{{#widget:SlideShare
 +
|id=4008063
 +
|width=425
 +
|height=355
 +
}}
 +
 
 +
== SparqlPuSH ==
 +
 
 +
* More about the SPARQL Push implementation: [http://apassant.net/blog/2010/04/18/sparql-pubsubhubbub-sparqlpush Introducing sparqlPuSH], [http://code.google.com/p/sparqlpush open source implementation]
 +
 
 +
[[Image:SparqlPush.png||||||link=http://bit.ly/sparqlpush|alt=SparqlPuSH Demonstration|SparqlPuSH Demonstration http://bit.ly/sparqlpush]]
 +
 
 +
= Frequently Asked Questions (FAQ) =
 +
 
 +
* '''Information relevance''': Are tweets interesting at all?
 +
** "An unprecedented analysis reveals that the micro-blogging service is remarkably effective at spreading "important" information." http://www.technologyreview.com/blog/guest/25128/?a=f
 +
** "We have classified the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline news or persistent news in nature." http://an.kaist.ac.kr/~haewoon/papers/2010-www-twitter.pdf
 +
** "Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress. That’s a LOT of tweets, by the way: Twitter processes more than 50 million tweets every day, with the total numbering in the billions." http://www.loc.gov/tweet/how-tweet-it-is.html
 +
** "We are trying to identify three pieces of information: where the customer experienced problems, what type of problem, and when they experienced it," from AT&T http://www.readwriteweb.com/archives/tweeting_your_iphone_angst_att_is_listening_on_twitter.php
  
* Use cases and commercial interest
+
* '''Commercial Value''': What are the use cases? Is there commercial interest?
 +
** Social Media Revolution: [http://socialnomics.net/2010/05/05/social-media-revolution-2-refresh/ http://socialnomics.net/]
 +
*** Because of the speed in which social media enables communication, word of mouth now becomes world of mouth;
 +
*** BRANDS: 25% of search results for the World’s Top 20 largest brands are links to user-generated content; 34% of bloggers post opinions about products & brands; 78% of consumers trust peer recommendations; only 14% trust advertisements; only 18% of traditional TV campaigns generate a positive ROI; we will no longer search for products and services, they will find us via social media.
 +
*** NEWS: 24 of the 25 largest newspapers are experiencing record declines in circulation; 60 million status updates happen on Facebook daily; we no longer search for the news, the news finds us;
 
** Many companies are using microblogging data. Some call it real-time web intelligence or business intelligence for social media:
 
** Many companies are using microblogging data. Some call it real-time web intelligence or business intelligence for social media:
 
** http://www.evri.com/
 
** http://www.evri.com/
 +
*** Evri's technology doesn't just hear the conversation, it listens and understands what's being said. Our semantic analysis helps you give users the context around the conversation.
 +
** http://klout.com
 +
*** Klout is a San Francisco-based company that provides social media analytics measuring a user's influence across their social network.
 
** http://tweetmeme.com/
 
** http://tweetmeme.com/
 
** http://www.sysomos.com/
 
** http://www.sysomos.com/
 
** http://www.bing.com/twitter
 
** http://www.bing.com/twitter
 
** http://www.gnip.com/
 
** http://www.gnip.com/
 +
** http://faveeo.com/front-temp
 +
*** "help internet users manage the incredibly huge amount of information available everyday on blogs, news sites all around the web by leveraging and integrating the social networks and the collective intelligence they contain."
 
** http://www.ellerdale.com/
 
** http://www.ellerdale.com/
*** the ellerdale project makes data more relevant and valuable. ellerdale develops and licenses a web intelligence platform optimized for large, real-time data feeds, including all tweets sent world-wide.  
+
*** "the ellerdale project makes data more relevant and valuable. ellerdale develops and licenses a web intelligence platform optimized for large, real-time data feeds, including all tweets sent world-wide. "
 
** Content recommendation  
 
** Content recommendation  
 
*** http://getglue.com/
 
*** http://getglue.com/
  
* Information Delivery: Push vs Pull
+
* '''Information Overload''': but how can you make sense of so much data?
 +
** As reported by ReadWriteWeb recently, during an emergency it’s practically impossible to get status updates on things like roads, hospitals, airports, and people using Twitter [http://www.readwriteweb.com/archives/a_new_twitter_hashtag_syntax_to_help_during_catast.php]
 +
** Twitter as a poor vehicle for marketing [http://blog.hubspot.com/blog/tabid/6307/bid/4694/Why-Twitter-Hashtags-and-Trending-Topics-Are-Useless-to-Marketers.aspx]. Many people make up hashtags as they tweet, exploding the semantic graph, creating more semantic dispersion. Some promising new tools that can help you quickly put a hashtag in context — or let people easily look up the meaning of the hashtags you launch or use [http://www.contentious.com/2009/03/03/whats-that-hashtag-new-glossary-tools-for-twitter/]
 +
** Wouldn’t it be cool if Twitter had a topic backbone and you could snap your tweets to it as you write them? [http://thepowerofpull.com/pull/twitter-is-unstructured-web-push]
 +
** "51% of workers spend half their work days managing and processing information rather than using that information to do their jobs" http://www.readwriteweb.com/enterprise/2010/11/enterprise-poll-information-overload.php
 +
* '''Annotation''': Why annotate tweets?
 +
** Ideas for Twitter’s new Annotations — from obvious to intriguing http://digital.venturebeat.com/2010/04/16/twitter-annotations/
 +
** Preliminary look at Twitter Annotations [http://mehack.com/extremely-preliminary-look-at-twitters-annota]
 +
** [http://bitwacker.wordpress.com/2010/01/08/is-semtweet-client-service-nanoformat Is #semtweet a client, service or nanoformat?]
 +
** Twitter Annotations are a big deal. http://www.mmmeeja.com/blog/semantic-web/twitter-annotations-rdf.html
 +
** [http://scobleizer.com/2009/11/20/twitter-to-turn-on-advertising-you-will-love-heres-how-supertweet/ Super Tweets], advertising
 +
** [http://gigaom.com/2010/06/20/twitter-annotations-are-coming-what-do-they-mean-for-twitter-and-the-web Twitter Annotations are coming. What do they mean for Twitter and the Web?]
 +
** [http://www.readwriteweb.com/archives/how_twitter_annotations_could_bring_the_real-time_semantic_web_together.php How Twitter Annotations Could Bring the Real-Time and the Semantic Web Together]
 +
 
 +
* '''Information Delivery''': Real Time, Push vs Pull
 +
* "Real-time search is a response to a fundamental shift in the way people use the Web. People used to visit a page, click a link, and visit another page. Now they spend a lot of time monitoring streams of data--tweets, status updates, headlines--from services like Facebook and Twitter, as well as from blogs and news outlets." http://www.technologyreview.com/computing/25079/page1/
 
** Siegel’s rule for information life span: The half-life relevance of a piece of pushed information is about the same as the frequency of the medium. [http://thepowerofpull.com/pull/twitter-is-unstructured-web-push]
 
** Siegel’s rule for information life span: The half-life relevance of a piece of pushed information is about the same as the frequency of the medium. [http://thepowerofpull.com/pull/twitter-is-unstructured-web-push]
 
*** Twitter developed a new set of frameworks @anywhere for adding this Twitter experience anywhere on the web. Imagine being able to follow a New York Times journalist directly from her byline, tweet about a video without leaving YouTube, and discover new Twitter accounts while visiting the Yahoo! home page—and that’s just the beginning. [http://blog.twitter.com/2010/03/anywhere.html]
 
*** Twitter developed a new set of frameworks @anywhere for adding this Twitter experience anywhere on the web. Imagine being able to follow a New York Times journalist directly from her byline, tweet about a video without leaving YouTube, and discover new Twitter accounts while visiting the Yahoo! home page—and that’s just the beginning. [http://blog.twitter.com/2010/03/anywhere.html]
Line 43: Line 95:
 
** Decentralized Microblogging
 
** Decentralized Microblogging
 
*** [http://jeffsayre.com/2010/02/24/a-flock-of-twitters-decentralized-semantic-microblogging decentralized semantic microblogging].
 
*** [http://jeffsayre.com/2010/02/24/a-flock-of-twitters-decentralized-semantic-microblogging decentralized semantic microblogging].
 +
* Google Real time search http://www.wallblog.co.uk/2010/08/27/google-launches-home-for-real-time-with-alerts-for-social-media-monitoring/
  
= Architecture =
+
* Use of SPARQL
 
+
** SPARQLZ http://www.readwriteweb.com/archives/sparqlz.php?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+readwriteweb+%28ReadWriteWeb%29
[http://apassant.net/blog/2010/04/18/sparql-pubsubhubbub-sparqlpush sparqlPuSH]
+
http://code.google.com/p/sparqlpush
+
 
+
  
 
= Related =
 
= Related =
Line 75: Line 125:
  
 
== Semantic Microblogging ==
 
== Semantic Microblogging ==
 +
 +
* [http://bitwacker.wordpress.com/2010/01/08/is-semtweet-client-service-nanoformat Is #semtweet a client, service or nanoformat?]
 
* http://smob.me
 
* http://smob.me
 
* http://semantictweet.com
 
* http://semantictweet.com
Line 84: Line 136:
 
** HyperTwitter provides semantic hashtags on Twitter. Associate hashtags with each other and then perform searches. Clever. Though you might want to create a special Twitter account for doing the associations rather than sending these commands through your main Twitter account.
 
** HyperTwitter provides semantic hashtags on Twitter. Associate hashtags with each other and then perform searches. Clever. Though you might want to create a special Twitter account for doing the associations rather than sending these commands through your main Twitter account.
 
** [http://www.heppnetz.de/files/hypertwitter-TR.pdf Technical Report]
 
** [http://www.heppnetz.de/files/hypertwitter-TR.pdf Technical Report]
 +
* http://twitlogic.fortytwo.net/
 +
** "TwitLogic is a semantic data aggregator which brings together a collection of compact formats for structured microblog content with Semantic Web vocabularies and best practices in order to augment the Semantic Web with real-time, user-driven data. "
  
 
* http://www.semanticwave.com/blog/archives/2008/01/hashtags.jsp
 
* http://www.semanticwave.com/blog/archives/2008/01/hashtags.jsp
Line 111: Line 165:
  
 
[[Category:Information Extraction]][[Category:Information Exploration]][[Category:Annotation]][[Category:Microblogging]][[Category:RDF]][[Category:SPARQL]]
 
[[Category:Information Extraction]][[Category:Information Exploration]][[Category:Annotation]][[Category:Microblogging]][[Category:RDF]][[Category:SPARQL]]
 +
 +
= Publication Timeline =
 +
 +
* Related project [[Dynamic Linked Open Data]] accepted as a poster at WebSci'10.
 +
* Submitted to [http://www.yorku.ca/wiiat10/index.php?1 WI'2010] on April 2nd, 2010.
 +
 +
= Internal =
 +
 +
Link to internal project page: http://knoesis.wright.edu/internal/wiki/index.php/Lotter

Latest revision as of 14:21, 4 February 2016

Team: Pablo N. Mendes (Kno.e.sis), Alex Passant (DERI), Pavan Kapanipathi (Kno.e.sis) and Amit P. Sheth (Kno.e.sis).

At any second of the day, millions of Web users are simultaneously publishing microblog posts (microposts) with opinions, observations and suggestions, or generally "social signals" that may represent invaluable information for businesses and researchers around the world. However, analyzing these numerous social signals can be extremely challenging. Microblog posts are streamed in large quantities every second, in textual format, creating significant information overload for the user interested in making sense of the information around a topic of interest.

In this project we investigate the representation of social signals as Linked Open Data in order to enable flexibility in handling the information overload of those interested in collectively analyzing social signals for sensemaking. Our approach can be summarized as follows:

Twarql Demonstration http://bit.ly/twarql
SparqlPuSH Demonstration http://bit.ly/sparqlpush
  • extract content (entity mentions, hashtags and URLs) from microposts;
  • encode content in a structured format (RDF) using shared vocabularies (FOAF, SIOC, MOAT, etc.);
  • enable structured querying of microposts (SPARQL);
  • enable subscription to a stream of microposts that match a given query (Concept Feeds);
  • enable scalable real-time delivery of streaming data (SparqlPuSH).
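
The first two steps of the approach above (extraction and RDF encoding) can be illustrated with a minimal Python sketch. This is not the Twarql implementation: the regular expressions, the function names `extract` and `to_ntriples`, the `http://example.org/` IRIs and the choice of SIOC properties are assumptions made for illustration, and entity-mention extraction is left out because it requires a dedicated annotation service.

```python
import re

# Vocabulary terms from SIOC; which properties Twarql actually uses is an assumption here.
SIOC = "http://rdfs.org/sioc/ns#"
SIOCT = "http://rdfs.org/sioc/types#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TAG_BASE = "http://example.org/tag/"  # hypothetical namespace for hashtag resources

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract(text):
    """Return (hashtags, urls) found in a micropost."""
    urls = URL_RE.findall(text)
    # Drop URLs first so that "#fragment" parts of links are not read as hashtags.
    without_urls = URL_RE.sub(" ", text)
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(without_urls)]
    return hashtags, urls


def escape_literal(value):
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(post_iri, text):
    """Encode one micropost as a list of N-Triples lines."""
    hashtags, urls = extract(text)
    lines = [
        f"<{post_iri}> <{RDF_TYPE}> <{SIOCT}MicroblogPost> .",
        f'<{post_iri}> <{SIOC}content> "{escape_literal(text)}" .',
    ]
    for tag in hashtags:
        lines.append(f"<{post_iri}> <{SIOC}topic> <{TAG_BASE}{tag}> .")
    for url in urls:
        lines.append(f"<{post_iri}> <{SIOC}links_to> <{url}> .")
    return lines


if __name__ == "__main__":
    post = "Crews reach the coast #oilspill #BP http://bit.ly/twarql"
    for line in to_ntriples("http://example.org/post/1", post):
        print(line)
```

Once microposts are available as triples like these, the remaining steps (structured querying and concept feeds) amount to evaluating a query over them.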

The open source software developed to realize the vision of Linked Open Social Signals is called Twarql. You can find more information on how to use it on our wiki page. We also work closely with the Twitris and SMOB projects.

Demonstration

We have two demonstration videos and a live demo. The first video demonstrates the user perspective, interacting with the system to formulate a query and obtain microblog posts that match that query. The second video focuses on the server side and demonstrates the modules of our architecture at work, distributing the microposts via pubsubhubbub. You can try out our live demo that is currently streaming tweets about the oil spill. The featured streams will illustrate concept feeds in action, and the query page will allow you to define your own concept feed through our query formulation interface.

Architecture

The driving engineering requirements of our system are scalability and (near) real-time delivery of semantically annotated information. To address those requirements, our architecture separates concerns and includes decoupled implementations of the collection, processing, persistence, subscription and delivery components. The coarse-grained components of our architecture are: (i) Social Sensor Server, (ii) Semantic Publisher, (iii) Distribution Hub and (iv) Application Server.

TwarqlArchitecture.png
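
The decoupling between matching and delivery can be sketched in a few lines of Python. This is a simplified in-memory model under stated assumptions, not the actual system: a concept feed is represented by a Python predicate standing in for a SPARQL query, the hub calls local callbacks instead of pushing over HTTP, and the class and method names (`DistributionHub`, `SemanticPublisher`, `register_feed`, `on_post`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]            # (subject, predicate, object)
Annotated = List[Triple]                 # triples describing one micropost
Query = Callable[[Annotated], bool]      # stand-in for a SPARQL query
Subscriber = Callable[[Annotated], None]


class DistributionHub:
    """Pushes each published post to the subscribers of a feed."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Subscriber]] = {}

    def subscribe(self, feed_id: str, callback: Subscriber) -> None:
        self._subscribers.setdefault(feed_id, []).append(callback)

    def publish(self, feed_id: str, post: Annotated) -> None:
        for callback in self._subscribers.get(feed_id, []):
            callback(post)


class SemanticPublisher:
    """Matches annotated posts against registered feeds and hands hits to the hub."""

    def __init__(self, hub: DistributionHub) -> None:
        self._hub = hub
        self._feeds: Dict[str, Query] = {}

    def register_feed(self, feed_id: str, query: Query) -> None:
        self._feeds[feed_id] = query

    def on_post(self, post: Annotated) -> List[str]:
        """Process one incoming post; return the ids of the feeds it matched."""
        matched = [fid for fid, query in self._feeds.items() if query(post)]
        for feed_id in matched:
            self._hub.publish(feed_id, post)
        return matched


if __name__ == "__main__":
    hub = DistributionHub()
    publisher = SemanticPublisher(hub)
    publisher.register_feed(
        "oilspill",
        lambda post: any(p == "topic" and o == "oilspill" for _, p, o in post),
    )
    hub.subscribe("oilspill", print)
    publisher.on_post([("post:1", "topic", "oilspill")])
```

Because the publisher only knows feed identifiers and the hub only knows subscribers, either side can be scaled or replaced independently, which is the point of separating the subscription and delivery components.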


Twarql

Twarql is the name of the software implementation that realizes the Linked Open Social Signals vision. On the client side, users need only a regular Web browser to use our service. Query formulation, subscription requests, data visualization and analytical interfaces run on the client (e.g. in a JavaScript-enabled Web browser) and communicate with the Web through the Application Server, with all communication carried over HTTP. When a user submits a query, the Application Server relays the request to the Semantic Publisher, which passes the results collected from the Social Sensor Server on to Distribution Hubs for delivery. In these slides we describe the workflow between the components of the architecture:

SparqlPuSH

SparqlPuSH Demonstration

Frequently Asked Questions (FAQ)

  • Commercial Value: What are the use cases? Is there commercial interest?
    • Social Media Revolution: http://socialnomics.net/
      • Because of the speed in which social media enables communication, word of mouth now becomes world of mouth;
      • BRANDS: 25% of search results for the World’s Top 20 largest brands are links to user-generated content; 34% of bloggers post opinions about products & brands; 78% of consumers trust peer recommendations; only 14% trust advertisements; only 18% of traditional TV campaigns generate a positive ROI; we will no longer search for products and services, they will find us via social media.
      • NEWS: 24 of the 25 largest newspapers are experiencing record declines in circulation; 60 million status updates happen on Facebook daily; we no longer search for the news, the news finds us;
    • Many companies are using microblogging data. Some call it real-time web intelligence or business intelligence for social media:
    • http://www.evri.com/
      • Evri's technology doesn't just hear the conversation, it listens and understands what's being said. Our semantic analysis helps you give users the context around the conversation.
    • http://klout.com
      • Klout is a San Francisco-based company that provides social media analytics measuring a user's influence across their social network.
    • http://tweetmeme.com/
    • http://www.sysomos.com/
    • http://www.bing.com/twitter
    • http://www.gnip.com/
    • http://faveeo.com/front-temp
      • "help internet users manage the incredibly huge amount of information available everyday on blogs, news sites all around the web by leveraging and integrating the social networks and the collective intelligence they contain."
    • http://www.ellerdale.com/
      • "the ellerdale project makes data more relevant and valuable. ellerdale develops and licenses a web intelligence platform optimized for large, real-time data feeds, including all tweets sent world-wide. "
    • Content recommendation

Related

  • Bibliography of Research on Twitter & Microblogging [9]
  • Priamos a middleware architecture for real time semantic web [10]

At Kno.e.sis

  • Social Signals @kno.e.sis
  • A. Sheth, Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness, February 17, 2009.
  • A. Sheth, Citizen Sensing, Social Signals, and Enriching Human Experience- IEEE Internet Computing, July/August 2009.
  • Meenakshi Nagarajan, Karthik Gomadam, Amit P. Sheth, Ajith Ranabahu, Raghava Mutharaju, Ashutosh Jadhav: Spatio-Temporal-Thematic Analysis of Citizen Sensor Data: Challenges and Experiences. WISE 2009: 539-553 [11]

At DERI

At Twitter.com

Semantic Microblogging

  • http://semantictwitter.appspot.com/
    • HyperTwitter provides semantic hashtags on Twitter. Associate hashtags with each other and then perform searches. Clever. Though you might want to create a special Twitter account for doing the associations rather than sending these commands through your main Twitter account.
    • Technical Report
  • http://twitlogic.fortytwo.net/
    • "TwitLogic is a semantic data aggregator which brings together a collection of compact formats for structured microblog content with Semantic Web vocabularies and best practices in order to augment the Semantic Web with real-time, user-driven data. "

Maybe related

  • Short and Tweet: Experiments on Recommending Content from Information Streams [12]
  • Cheng. Fall'09 class project at iSchool (Berkeley). Classifying Metatweets pdf
  • Klout on health care: [13]
  • Topsy on health care: [14]
  • Krishnamurthy, Gill, Arlitt. SIGCOMM'08. A few chirps about Twitter. pdf
    • classifies 100,000 users into broadcasters, acquaintances, miscreants or evangelists.

Streaming SPARQL

  • Barbieri et al. C-SPARQL: SPARQL for Continuous Querying WWW'09 poster EDBT'10
  • Barbieri and Della Valle, LDOW2010. A Proposal for Publishing Data Streams as Linked Data (A Position Paper) [15]
  • Streaming SPARQL - Extending SPARQL to Process Data Streams ESWC'08
  • A SPARQL Engine for Streaming RDF Data SITIS'07

Scalability

Publication Timeline

Internal

Link to internal project page: http://knoesis.wright.edu/internal/wiki/index.php/Lotter