Analyzing URL Chatter on Twitter

From Knoesis wiki
Revision as of 17:07, 21 November 2009 by Pavan (Talk | contribs)

Project Description

This project analyzes the URLs found in tweets in relation to the tweets' themes. It uses the data (tweets) crawled for the Twitris project.

Introduction (Objectives and Motivation)

Twitter is a free social networking and microblogging service. It lets users post their thoughts about an event, what they see, what they do, and so on, in messages of up to 140 characters called tweets. The Twitris project, developed by the Knoesis Center, is a Semantic Web application that uses tweets to facilitate browsing for news and information, using social perceptions as the fulcrum. The Twitris project performs:

  • Crawling of tweets
  • Spatio-Temporal-Thematic Analysis
  • Browsing using social signals as the fulcrum.

This project is an extension of the Twitris project that analyzes tweets containing URLs. It uses the data crawled for Twitris and adapts some functionality from the Twitris system.

Because Twitter is a microblogging service that allows only 140 characters per post, space must be used carefully. URL-shortening services therefore transform a long URL into a short URL, typically between 25 and 30 characters long.

A URL is the address of a document or resource on the World Wide Web. A document or resource is owned and published, read, and searched for, which gives three perspectives for analyzing a URL: the publisher perspective, the user perspective, and the search engine perspective. The publisher learns where and how the published document is being viewed; the user can choose the URLs that are most talked about for a theme of interest; and the search engine can use the analysis to improve search.
The project analyzes the URLs in tweets over time and across the themes extracted by the Twitris project.

Status

Week 1

Programming Language: Java

  • Extracting the URLs from the tweets - Done
  • Recognizing short/tiny URLs and transforming them into long URLs - Done
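The two Week 1 steps might look roughly like the sketch below. It is not the project's actual code: the class name TweetUrls, the URL pattern, and the list of shortener hosts are all assumptions made for illustration. The expandOnce method follows a single HTTP redirect using the standard HttpURLConnection class and needs network access to run.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TweetUrls {
    // Assumption: a URL is "http://" or "https://" followed by non-whitespace characters.
    private static final Pattern URL_PATTERN =
            Pattern.compile("https?://\\S+", Pattern.CASE_INSENSITIVE);

    // Assumption: an illustrative, incomplete list of URL-shortening hosts.
    private static final Set<String> SHORTENER_HOSTS =
            Set.of("bit.ly", "tinyurl.com", "is.gd", "ow.ly");

    /** Returns every URL found in the tweet text, in order of appearance. */
    static List<String> extractUrls(String tweet) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(tweet);
        while (m.find()) {
            // Drop punctuation that trails the URL in running text.
            urls.add(m.group().replaceAll("[.,;:!?)\\]\"']+$", ""));
        }
        return urls;
    }

    /** True if the URL's host is one of the assumed shortener hosts. */
    static boolean isShortUrl(String url) {
        try {
            String host = new URI(url).getHost();
            return host != null && SHORTENER_HOSTS.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Follows one redirect; returns the input unchanged if there is none. */
    static String expandOnce(String shortUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(shortUrl).toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // read the Location header ourselves
        conn.setRequestMethod("HEAD");
        try {
            int code = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            return (code >= 300 && code < 400 && location != null) ? location : shortUrl;
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller would pass each tweet's text to extractUrls, test each result with isShortUrl, and call expandOnce only for the short ones.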

Week 2

Programming Language: Java

  • Creating a table that links the URLs to the tweets containing them - Done
  • Analyzing the URLs against the currently available themes - Done
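As an illustration of the Week 2 analysis, the sketch below counts how often each URL is mentioned under each theme and ranks the URLs for one theme. It works on an in-memory list rather than a database table, and the UrlMention record, its fields, and the method names are hypothetical; the real table layout is not described here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class UrlThemeAnalysis {
    /** Hypothetical row: one URL found in one tweet, with the theme assigned to that tweet. */
    record UrlMention(long tweetId, String url, String theme) {}

    /** Counts mentions per URL, grouped by theme (theme -> url -> count). */
    static Map<String, Map<String, Long>> countByTheme(List<UrlMention> mentions) {
        Map<String, Map<String, Long>> counts = new TreeMap<>();
        for (UrlMention m : mentions) {
            counts.computeIfAbsent(m.theme(), t -> new TreeMap<>())
                  .merge(m.url(), 1L, Long::sum);
        }
        return counts;
    }

    /** Returns up to 'limit' URLs for a theme, most mentioned first; ties are ordered by URL. */
    static List<String> topUrls(List<UrlMention> mentions, String theme, int limit) {
        Map<String, Long> counts = countByTheme(mentions).getOrDefault(theme, Map.of());
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

The ranking corresponds to the user perspective, in which a user picks the most talked-about URLs for a theme of interest. In a database-backed version the same grouping would presumably be done in a query.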

Week 3

Languages: SQL, Java (Servlets), JavaScript (jQuery), HTML, XML

  • Writing queries for the related operations
  • Working with the Timeline JavaScript to integrate it with the project

Week 4

  • Integrating the code to show the desired results

Future work

  • 1. Make sure an expanded URL is not itself a short URL by checking it recursively
  • 2. Theme and entity extraction from the tweets, rather than using the currently available themes
  • 3. Provisions in the database to record the popularity of a URL within a particular theme
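The recursive check in item 1 could be sketched as follows. This is one possible approach, not an implemented feature. The redirect lookup is passed in as a function so the logic can be exercised without network access, and the class name, the MAX_HOPS limit, and the cycle guard are all assumptions.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

final class RecursiveExpander {
    // Assumption: give up after this many redirects to avoid very long chains.
    static final int MAX_HOPS = 10;

    /**
     * Follows redirects until a URL has no further target.
     * 'redirectOf' returns the next URL for a given URL, or empty if it does not redirect.
     */
    static String expandFully(String url, Function<String, Optional<String>> redirectOf) {
        return expand(url, redirectOf, new HashSet<>(), 0);
    }

    private static String expand(String url,
                                 Function<String, Optional<String>> redirectOf,
                                 Set<String> seen,
                                 int hops) {
        // Stop on the hop limit or when a URL repeats (a redirect loop).
        if (hops >= MAX_HOPS || !seen.add(url)) {
            return url;
        }
        return redirectOf.apply(url)
                .map(next -> expand(next, redirectOf, seen, hops + 1))
                .orElse(url);
    }
}
```

In practice the lookup function would wrap a single-hop HTTP redirect check. Keeping it as a parameter also allows resolved results to be cached.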