Location Prediction of Twitter Users
Revision as of 20:42, 7 July 2014
PAGE UNDER CONSTRUCTION
Introduction
Existing approaches to predicting the location of a Twitter user can be broadly grouped into two categories:
- Network based solutions
- Content based solutions
Architecture
Our approach comprises three primary components:
- Knowledge Base Generator extracts local entities for each city from Wikipedia and scores them based on their relevance to the city
- User Profile Generator extracts the Wikipedia entities from the tweets of a user
- Location Predictor uses the output of Knowledge Base Generator and User Profile Generator to predict the location of a user
Knowledge Base Generator
We use the following four measures to score the local entities of a city, with respect to the city:
- Pointwise Mutual Information
In information theory, the pointwise mutual information of two events (specific outcomes of two random variables) is a measure of their association. We use this idea to determine the association between a city and its local entities.
- Betweenness Centrality
We build a directed graph for each city using its internal links. The internal links correspond to the nodes of the graph. For a link from the Wikipedia page of one local entity to that of another, we draw an edge from the former to the latter. For example, in the graph of New York City, an edge from Statue of Liberty to Manhattan indicates a link from the Wikipedia page of Statue of Liberty to the Wikipedia page of Manhattan. The betweenness centrality of each node (representing a local entity) gives the importance of the node relative to the rest of the nodes in the graph.
- Semantic Overlap Measures
We use the hyperlink structure of Wikipedia to compute the semantic relatedness of a city and its local entities. We use the following set based measures to compute the semantic overlap between a city and its local entities:
- Jaccard Index is a symmetric, set based measure that defines the similarity of two sets in terms of their overlap and is normalized for their sizes. We use this measure to find the similarity between a city and its local entities.
- Tversky Index is an asymmetric similarity measure on two sets. The Jaccard Index treats a city and a local entity symmetrically, but a local entity generally represents only a part of the city. We therefore also use the Tversky Index as a unidirectional measure of the similarity of the local entity to the city.
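The four measures above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the count-based estimate of the probabilities in `pmi`, the representation of a page as a set of linked page titles, the Tversky weights `alpha=1.0` and `beta=0.0`, and all function and parameter names are hypothetical. The betweenness function uses Brandes' algorithm for unweighted directed graphs and returns unnormalized scores.

```python
import math
from collections import deque


def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information log2(p(x, y) / (p(x) * p(y))).

    Assumption: probabilities are estimated from raw counts.
    """
    if n_xy == 0:
        return float("-inf")  # the two events never co-occur
    return math.log2((n_xy * n_total) / (n_x * n_y))


def jaccard(a, b):
    """Symmetric overlap |A & B| / |A | B| of two link sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tversky(x, y, alpha=1.0, beta=0.0):
    """Asymmetric overlap |X & Y| / (|X & Y| + alpha*|X - Y| + beta*|Y - X|).

    Assumption: alpha=1, beta=0 measures how much of X is contained in Y.
    """
    x, y = set(x), set(y)
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    return common / denom if denom else 0.0


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    graph maps each node to the set of nodes it links to (directed edges).
    """
    nodes = set(graph) | {w for succ in graph.values() for w in succ}
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths to each node.
        sigma = {v: 0 for v in nodes}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score
```

With `alpha=1.0` and `beta=0.0`, `tversky(entity_links, city_links)` is the fraction of the entity's links that also appear on the city's page, which matches the unidirectional reading described above. Setting `alpha=beta=1.0` reduces it to the Jaccard Index.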
User Profile Generator
To use the local entities from our knowledge base to predict a user's location, we need to map the entities in the user's tweets to Wikipedia articles. Linking entities in tweets to Wikipedia articles has been well researched. The task involves linking named entities mentioned in tweets to the corresponding real-world entities in Wikipedia. We use Zemanta [1]. We chose Zemanta because of its relatively superior performance (cite) and the rate limit extension (10,000 requests per day) provided for research purposes.
Location Predictor
To predict the location of a user, we compute a score for each city whose local entities overlap with the entities found in the user's tweets. The score is the product of the local entity's score with respect to the city and the frequency of occurrence of that entity in the user's tweets. We then rank the cities by score in descending order and predict the top k cities for the user.
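The scoring and ranking step can be sketched in Python as below. This is an illustrative sketch only: the function name `predict_top_k`, the dictionary layout of the knowledge base, and the example scores are hypothetical. In particular, summing the per-entity products when a user mentions several local entities of the same city is an assumption; the text above defines only the per-entity product.

```python
from collections import Counter


def predict_top_k(knowledge_base, user_entities, k=3):
    """Rank cities for one user and return the top k as (city, score) pairs.

    knowledge_base: {city: {local_entity: score of entity w.r.t. city}}
    user_entities:  Wikipedia entities extracted from the user's tweets,
                    one list item per mention.
    """
    counts = Counter(user_entities)  # frequency of each entity in the tweets
    city_scores = {}
    for city, entity_scores in knowledge_base.items():
        overlap = counts.keys() & entity_scores.keys()
        if not overlap:
            continue  # only cities with overlapping local entities are scored
        # Assumption: per-entity products (score * frequency) are summed.
        city_scores[city] = sum(entity_scores[e] * counts[e] for e in overlap)
    ranked = sorted(city_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

For example, a user who mentions Manhattan twice and Statue of Liberty once would rank New York City above any city whose knowledge base entry gives those entities lower scores, and cities with no overlapping entities are left out of the ranking entirely.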
Evaluation
We conducted our experiments on the test data set created by Cheng et al. This data set was created in 2010 and contains 5119 active users from the continental United States, with 1000+ tweets per user. Their locations are listed as latitude and longitude coordinates, which are generally more reliable than the location information from Twitter profiles.