Analyzing URL Chatter on Twitter
From Knoesis wiki
Project Description
This project analyzes the URLs that appear in tweets, together with the theme of each tweet. It uses the tweet data crawled for the Twitris project.
Objectives
- Classify the content of the URLs
- Understand how users perceive the websites
Motivation
- Search engine perspective: how to choose a page that is interesting to the user, given the keywords
- Publisher perspective: what people think about the page (URL)
Status
Week 1
- Extracting the URLs from the tweets - Done
- Recognizing short/tiny URLs and expanding them into their long URLs - Done
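The two Week 1 steps could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the regular expression, the function names `extract_urls` and `is_short_url`, and the list of shortener hosts are all assumptions made for the example.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: an http(s) URL runs until the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")

# Illustrative set of shortener hosts; a real list would be longer.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "is.gd", "ow.ly"}


def extract_urls(tweet_text):
    """Return every URL found in a tweet, without trailing punctuation."""
    return [m.rstrip(".,;:!?)\"'") for m in URL_PATTERN.findall(tweet_text)]


def is_short_url(url):
    """Return True if the URL's host is one of the assumed shortener hosts."""
    host = (urlparse(url).hostname or "").lower()
    return host in SHORTENER_HOSTS
```

Matching on a fixed host list is only one way to recognize short URLs; it misses shorteners that are not in the list, so the list would need to be maintained.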
Week 2
- Creating a table that stores the URLs together with their tweets - In Progress
- Analyzing the URLs against the currently available themes - In Progress
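One possible layout for the URL table is sketched below using SQLite from the Python standard library. The table names, column names, and helper functions are hypothetical, and the project's actual database and schema may differ. The 300-character limit comes from the Assumptions section; the per-theme count query is one way to approach the popularity idea listed under Future work.

```python
import sqlite3

MAX_URL_LENGTH = 300  # limit taken from the Assumptions section


def create_schema(conn):
    """Create the (hypothetical) urls and tweet_urls tables."""
    conn.executescript(f"""
        CREATE TABLE IF NOT EXISTS urls (
            url_id   INTEGER PRIMARY KEY,
            long_url TEXT NOT NULL UNIQUE
                     CHECK (length(long_url) <= {MAX_URL_LENGTH})
        );
        CREATE TABLE IF NOT EXISTS tweet_urls (
            tweet_id INTEGER NOT NULL,
            url_id   INTEGER NOT NULL REFERENCES urls(url_id),
            theme    TEXT NOT NULL,
            PRIMARY KEY (tweet_id, url_id, theme)
        );
    """)


def record_url(conn, tweet_id, long_url, theme):
    """Store one (tweet, URL, theme) occurrence and return the URL's id."""
    if len(long_url) > MAX_URL_LENGTH:
        raise ValueError("URL exceeds the assumed maximum length")
    row = conn.execute(
        "SELECT url_id FROM urls WHERE long_url = ?", (long_url,)
    ).fetchone()
    if row is None:
        cur = conn.execute("INSERT INTO urls (long_url) VALUES (?)", (long_url,))
        url_id = cur.lastrowid
    else:
        url_id = row[0]
    conn.execute(
        "INSERT OR IGNORE INTO tweet_urls (tweet_id, url_id, theme) VALUES (?, ?, ?)",
        (tweet_id, url_id, theme),
    )
    return url_id


def url_popularity(conn, theme):
    """Return (long_url, tweet_count) pairs for a theme, most tweeted first."""
    return conn.execute(
        """SELECT u.long_url, COUNT(*) FROM tweet_urls t
           JOIN urls u ON u.url_id = t.url_id
           WHERE t.theme = ?
           GROUP BY u.long_url
           ORDER BY COUNT(*) DESC, u.long_url""",
        (theme,),
    ).fetchall()
```

Keeping the URLs in their own table means each long URL is stored once, however many tweets mention it.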
Week 3
Week 4
Future work
- 1. Make sure the expanded URL is not itself a short URL by checking it recursively
- 2. Themes: extract entities from the tweets rather than using the currently available themes
- 3. Provisions in the DB for tracking the popularity of a URL within a particular theme.
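Item 1 above could be approached along the following lines. This is a hedged sketch: it uses a loop with a hop limit rather than literal recursion, and the names `expand_fully`, `resolve_once`, `resolve_once_http`, and `max_hops` are hypothetical. The single-hop resolver is passed in as a parameter so the loop can be exercised without network access.

```python
import http.client
from urllib.parse import urlsplit, urljoin


def expand_fully(url, resolve_once, max_hops=10):
    """Follow redirects until the URL no longer redirects.

    resolve_once(url) must return the redirect target, or None when the
    URL does not redirect. Stops on redirect loops and after max_hops.
    """
    seen = {url}
    for _ in range(max_hops):
        target = resolve_once(url)
        if target is None or target in seen:
            return url
        seen.add(target)
        url = target
    return url


def resolve_once_http(url, timeout=5):
    """One possible resolver: issue a HEAD request and read Location."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status in (301, 302, 303, 307, 308) and location:
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, location)
        return None
    finally:
        conn.close()
```

In use, `expand_fully(short_url, resolve_once_http)` would follow a chain of shorteners to the final page. The hop limit and the set of already visited URLs guard against chains that never terminate.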
Assumptions
- URL max length in the DB is 300