Real Time Twitter Filtering Framework

From Knoesis wiki


Twitter, a popular microblogging platform, generates approximately 500 million tweets every day. These tweets are filtered by domain so that analysts can gain insight into what online users think about the corresponding topics. For instance, brands monitor tweets to track their products' successes and problems, and journalists follow Twitter for real-time news and developments on particular issues.

Architecture and Approach

Tweet Topic Classification

Clustering of Tweets - "Our results suggest that the clusters produced by traditional unsupervised methods can often be incoherent from a topical perspective, but utilizing a supervised methodology that utilize the hash-tags as indicators of topics produce surprisingly good results. We also offer a discussion on temporal effects of our methodology and training set size considerations. Lastly, we describe a simple method of finding the most representative tweet in a cluster, and provide an analysis of the results."
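
The idea of using hashtags as topic indicators can be illustrated with a minimal sketch. It is not the method of the quoted work: it assumes a small, hand-picked set of topic hashtags, treats each one as a class label, and trains a plain multinomial Naive Bayes model with add-one smoothing. The names HashtagNaiveBayes, label_from_hashtags and tokenize, and the sample tweets, are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")
TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the tweet, drop hashtags so the label cannot leak, split into words."""
    return TOKEN.findall(HASHTAG.sub(" ", text.lower()))


def label_from_hashtags(text, topic_tags):
    """Return the first hashtag that is a known topic tag, or None if there is none."""
    for tag in HASHTAG.findall(text.lower()):
        if tag in topic_tags:
            return tag
    return None


class HashtagNaiveBayes:
    """Multinomial Naive Bayes trained on tweets labeled by their own hashtags."""

    def __init__(self, topic_tags):
        self.topic_tags = set(topic_tags)
        self.word_counts = defaultdict(Counter)  # topic -> word frequencies
        self.doc_counts = Counter()              # topic -> number of training tweets
        self.vocab = set()

    def fit(self, tweets):
        for text in tweets:
            label = label_from_hashtags(text, self.topic_tags)
            if label is None:
                continue  # tweets without a topic hashtag carry no label
            words = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        """Return the most probable topic, or None if the model has no training data."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


# Hypothetical training tweets; the last one has no topic hashtag and is skipped.
train = [
    "Great goal in the second half #soccer",
    "The striker scored a late goal #soccer",
    "New phone has an amazing camera #tech",
    "This laptop battery lasts all day #tech",
    "no hashtag here",
]
model = HashtagNaiveBayes({"soccer", "tech"}).fit(train)
print(model.predict("what a goal by the striker"))
```

Hashtags are removed from the token stream before training so that the classifier learns from the surrounding words and can be applied to tweets that carry no hashtag at all.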

Top K Ranking of Tweets for Clusters

The clustering work quoted above describes one approach: finding the most representative tweet in each cluster.
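
One simple way to rank tweets within a cluster can be sketched as follows. This is an assumption for illustration, not the method of the quoted work, whose details are not given here: each tweet is scored by the cosine similarity between its term-frequency vector and the summed term-frequency vector of the whole cluster (the centroid), and the k highest-scoring tweets are returned. The function names and the sample cluster are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9#@']+")


def vectorize(text):
    """Bag-of-words term frequencies for one tweet."""
    return Counter(TOKEN.findall(text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_representative(cluster, k=1):
    """Return the k tweets most similar to the cluster centroid, best first."""
    vectors = [vectorize(t) for t in cluster]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)  # summed counts; cosine ignores the overall scale
    scored = [(cosine(v, centroid), t) for v, t in zip(vectors, cluster)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:k]]


# Hypothetical cluster with one off-topic tweet.
cluster = [
    "earthquake hits the city",
    "strong earthquake hits the city center tonight",
    "my cat is asleep",
]
print(top_k_representative(cluster, k=2))
```

Because the centroid is dominated by terms that many tweets in the cluster share, off-topic tweets score low and fall to the bottom of the ranking.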

Active Learning or Semi-Supervised Learning on Twitter

  1. Empirical Study of Topic Modeling on Twitter
  2. Searching for Quality Microblog Posts: Filtering and Ranking Based on Content Analysis and Implicit Links
  3. Semantics + Filtering + Search = Twitcident: Exploring Information in Social Web Streams
  4. Small Worlds with a Difference: New Gatekeepers and the Filtering of Political Information on Twitter
  5. Active Learning with Efficient Feature Weighting Methods for Improving Data Quality and Classification Accuracy
  6. A Semi-Supervised Bayesian Network Model for Microblog Topic Classification


  • Pavan Kapanipathi
  • Alan Smith
  • Adarsh Alex