Difference between revisions of "Entity Summary"

Revision as of 16:53, 15 May 2014

Check back on Friday May 16th 2014 for more details...

Creating Faceted (diversified) Entity Summaries

Creating entity summaries has attracted recent interest in the Semantic Web community. Our approach, FACES (FACeted Entity Summaries), generates diversified and user-friendly summaries.


Problem

Problem Statement - An entity is usually described by a set of conceptually different facts to improve coverage. We want to select a ‘representative’ subset of these facts as a summary that uniquely identifies the entity.

Definitions 1-4 define the basic notions related to entity summaries. They are as stated in [1].

Definition 1 : A data graph is a digraph G = ⟨V, A, Lbl_V, Lbl_A⟩, where (i) V is a finite set of nodes, (ii) A is a finite set of directed edges where each a ∈ A has a source node Src(a) ∈ V and a target node Tgt(a) ∈ V, (iii) Lbl_V : V → E ∪ L and (iv) Lbl_A : A → P are labeling functions that map nodes to entities or literals and edges to properties. Here E, L and P denote the sets of entities, literals and properties, respectively.

Definition 2 : A feature f is a property-value pair where Prop(f) ∈ P and Val(f) ∈ E ∪ L denote the property and the value, respectively. An entity e has a feature f in a data graph G = ⟨V, A, Lbl_V, Lbl_A⟩ if there exists a ∈ A such that Lbl_A(a) = Prop(f), Lbl_V(Src(a)) = e and Lbl_V(Tgt(a)) = Val(f).

Definition 3 : Given a data graph G, the feature set of an entity e, denoted by FS(e), is the set of all features of e that can be found in G.

Definition 4 : Given FS(e) and a positive integer k < |FS(e)|, a summary of entity e is Summ(e) ⊂ FS(e) such that |Summ(e)| = k.
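
Definitions 1-4 can be illustrated with a minimal Python sketch. It makes a simplifying assumption: each node is identified directly by its label, so the labeling functions collapse into plain strings. The names Edge, feature_set, is_summary and the toy graph are hypothetical and are not taken from [1] or from the FACES implementation.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

# A feature is a (property, value) pair, as in Definition 2.
Feature = Tuple[str, str]


class Edge(NamedTuple):
    """One directed, labeled edge of the data graph (Definition 1).

    Assumption: nodes are identified by their labels, so src and tgt
    hold the entity or literal directly.
    """
    src: str   # label of the source node (an entity)
    prop: str  # label of the edge (a property)
    tgt: str   # label of the target node (an entity or a literal)


def feature_set(graph: List[Edge], entity: str) -> FrozenSet[Feature]:
    """FS(e): every feature of `entity` that can be found in `graph` (Definition 3)."""
    return frozenset((a.prop, a.tgt) for a in graph if a.src == entity)


def is_summary(candidate: FrozenSet[Feature], fs: FrozenSet[Feature], k: int) -> bool:
    """Definition 4: k is positive, k < |FS(e)|, and candidate is a size-k subset of FS(e)."""
    return 0 < k < len(fs) and len(candidate) == k and candidate <= fs


# Hypothetical toy graph, for illustration only.
GRAPH = [
    Edge("Marie_Curie", "birthPlace", "Warsaw"),
    Edge("Marie_Curie", "field", "Physics"),
    Edge("Marie_Curie", "field", "Chemistry"),
    Edge("Warsaw", "country", "Poland"),
]

fs = feature_set(GRAPH, "Marie_Curie")
candidate = frozenset({("birthPlace", "Warsaw"), ("field", "Physics")})
print(sorted(fs))
print(is_summary(candidate, fs, k=2))
```

The sketch only checks whether a candidate is a valid summary in the sense of Definition 4. Choosing which k features to keep is the actual summarization problem and is not shown here.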


Evaluation

System                    k = 5    FACES % ↑   k = 10   FACES % ↑   Time/entity (seconds)
FACES                     1.4314   NA          4.3350   NA          0.76
RELIN                     0.4981   187 %       2.5188   72 %        10.96
RELINM                    0.6008   138 %       3.0906   40 %        11.08
SUMMARUM                  1.2249   17 %        3.4207   27 %        NA
Ideal summary agreement   1.9168               4.6415

Table 1. Quality of summaries under each setting, % quality improvement (↑) of FACES over each other system for k = 5 and k = 10, and average time taken per entity.
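
The page does not state how the FACES % ↑ columns are computed. Assuming they are the relative gain of the FACES score over the other system's score, rounded to the nearest integer, the following sketch reproduces the Table 1 figures. The function name percent_improvement and the variable names are hypothetical.

```python
def percent_improvement(faces_score: float, other_score: float) -> int:
    """Relative gain of FACES over another system, in percent.

    Assumption: improvement = (FACES - other) / other * 100, rounded.
    """
    return round((faces_score - other_score) / other_score * 100)


# Quality scores from Table 1, given as (k = 5, k = 10).
FACES_SCORES = (1.4314, 4.3350)
OTHER_SCORES = {
    "RELIN": (0.4981, 2.5188),
    "RELINM": (0.6008, 3.0906),
    "SUMMARUM": (1.2249, 3.4207),
}

improvements = {
    system: tuple(
        percent_improvement(faces, other)
        for faces, other in zip(FACES_SCORES, scores)
    )
    for system, scores in OTHER_SCORES.items()
}

for system, (at_k5, at_k10) in improvements.items():
    print(f"{system}: {at_k5} % (k = 5), {at_k10} % (k = 10)")
```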


System                    k = 5    FACES % ↑   k = 10   FACES % ↑
FACES                     1.8649   NA          5.6931   NA
RELIN                     0.7339   154 %       3.3993   69 %
RELINM                    0.8695   114 %       4.1551   37 %
SUMMARUM                  1.6484   13 %        4.4919   27 %
Ideal summary agreement   2.3194               5.6228

Table 2. Quality of correct concepts picked, measured using property overlap under each setting, and % quality improvement (↑) of FACES over each other system for k = 5 and k = 10.


Experiment     FACES %   RELINM %   SUMMARUM %
Experiment 1   84 %      16 %       NA
Experiment 2   54 %      16 %       30 %

Table 3. User preferences for entity summaries, evaluated with 69 participants.


Setting   Google search API   Sindice search API
k = 5     3.5                 3.4
k = 10    0.5333              0.5428

Table 4. Comparison between the Google and Sindice search APIs for a small random sample of 5 entities.

Dataset

Evaluation data is available for download: http://knoesis.wright.edu/researchers/kalpa/faces_evaluation.zip


References

[1] Cheng, Gong, Thanh Tran, and Yuzhong Qu. "RELIN: relatedness and informativeness-based centrality for entity summarization." In The Semantic Web–ISWC 2011, pp. 114-129. Springer Berlin Heidelberg, 2011.