Difference between revisions of "Domain Specific Graph Selection From Linked Open Data"

From Knoesis wiki
Jump to: navigation, search
Line 12: Line 12:
 
====Rank Properties====
 
====Rank Properties====
 
#Rank first hop properties
 
#Rank first hop properties
PMI(p<sub>1i</sub>, t<sub>1</sub>) = log { Prob(p<sub>1i</sub>,t<sub>1</sub>) / (Prob(p<sub>p1i</sub>) * Prob(t<sub>1</sub>)) }   
+
PMI(p<sub>1i</sub>, t<sub>1</sub>) = log { Prob(p<sub>1i</sub>,t<sub>1</sub>) / (Prob(p<sub>1i</sub>) * Prob(t<sub>1</sub>)) }   
  
 
==Concerns==
 
==Concerns==

Revision as of 22:38, 6 January 2015

Abstract

With the Linked Open Data (LOD) initiative, there has been a great deal of work in developing a variety of open knowledge graphs freely accessible on the Web such as DBPedia, Yago and Freebase. Lately, the focus is moving greatly towards in leveraging the knowledge graphs as a background source for various applications. Knowledge graphs can provide a valuable source of information to improve tasks such as recommendations, (semantic) similarity calculation and named entity disambiguation. One of the main usage of a knowledge graph for these applications is to find out how entities are related to each other. For example, a movie recommendation system is interested in finding out how movies are related to provide recommendations based on the implicit feedback. Existing approaches consider the neighborhood graph within predefined number of hops for this purpose. But, given a particular domain not all properties and entities are relevant. Given the movie domain, Military Unit and Death Place of a movie director will have a significantly less importance compared to his Movies and Awards. In this work, we propose a schema driven approach to select the domain specific sub graph by taking PMI as the degree of association to rank the properties and classes of the neighbourhood graph in the context of the domain (e.g movie). We evaluate our approach in the movie domain with DBPedia knowledge graph to show the effectiveness our approach for movie recommendation.

Contribution

  • Propose a method to create a domain specific subgraph given the DBpedia type to represent the domain
  • Evaluate the domain specific subgraph in the movie domain, to see its effectiveness for movie recommendation

Approach

todo:need to define the sub graph properly here Proposed approach exploits the schema of the knowledge graph primarily to identify the potential candidates for sub graph (hop types and properties) and then leverages the instance statistics to rank and filter the domain specific hop types and properties. We use the PMI as a measure to rank the properties in the context of the domain of interest to find the co occurrence of properties with the type (eg:- movie).

Rank Properties

  1. Rank first hop properties

PMI(p1i, t1) = log { Prob(p1i,t1) / (Prob(p1i) * Prob(t1)) }

Concerns

  • PMI is based on the co occurence of the properties with the given type. One of the disadvantage of using PMI is, higher could mean just a one occurence of the type with that property. PMI captures specificity of the property given the type. What kind of a justification do we provide just for considering the specificity. How about the properties that are generic but still domain specific. We did not consider conditional independence since it only captures the popularity
  • As of now, we consider P(p2i, t1 p1 t) and need to analyze whether we should consider P(p2i, t1). In the second case, when ranking two hop properties too get only the hop 1 type t1 as the context
  • We need to justify our approach versus semantic similarity
  • Possibility of using material science domain also for the evaluation
  • Comparison can be done using,
    • Page rank with equal probability as the transitional probability for four hops
    • Page rank with PMI probability as the transitional probability for four hops
    • Page rank with conditional probability as the transitional probability for four hops

To Do

Sarasi

  • Complete the second hop property ranking
  • Calculate the conditional independence
  • Need to make sure there is no intersection between first hop property set and second hop property set

Pavan