Revision as of 19:49, 15 May 2014

Check back on Friday May 16th 2014 for more details...

Creating Faceted (diversified) Entity Summaries

Creating entity summaries has been of interest in the Semantic Web community in the recent past. In our approach, called FACES (FACeted Entity Summaries), we are interested in generating diversified and user-friendly summaries.

Problem

Problem Statement - An entity is usually described using a conceptually diverse set of facts to improve coverage. We want to select a ‘representative’ subset of this set as a good summary that uniquely identifies the entity.

Definitions 1-4 define basic notions related to entity summaries. They are as stated in [1].

Definition 1 : A data graph is a digraph G = ⟨V, A, Lbl_V, Lbl_A⟩, where (i) V is a finite set of nodes, (ii) A is a finite set of directed edges where each a ∈ A has a source node Src(a) ∈ V and a target node Tgt(a) ∈ V, (iii) Lbl_V : V → E ∪ L and (iv) Lbl_A : A → P are labeling functions that map nodes to entities or literals and edges to properties.

Definition 2 : A feature f is a property-value pair where Prop(f) ∈ P and Val(f) ∈ E ∪ L denote the property and the value, respectively. An entity e has a feature f in a data graph G = ⟨V, A, Lbl_V, Lbl_A⟩ if there exists a ∈ A such that Lbl_A(a) = Prop(f), Lbl_V(Src(a)) = e and Lbl_V(Tgt(a)) = Val(f).

Definition 3 : Given a data graph G, the feature set of an entity e, denoted by FS(e), is the set of all features of e that can be found in G.

Definition 4 : Given FS(e) and a positive integer k < |FS(e)|, a summary of entity e is Summ(e) ⊂ FS(e) such that |Summ(e)| = k.
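
Definitions 1-4 can be illustrated with a minimal Python sketch. The triple-based graph representation, the names `Feature`, `feature_set` and `is_summary`, and the sample facts are assumptions made for illustration; they are not taken from [1].

```python
from typing import NamedTuple

class Feature(NamedTuple):
    prop: str   # Prop(f)
    value: str  # Val(f)

# Hypothetical data graph: each labeled edge is a (source, property, target) triple.
GRAPH = [
    ("Marie_Curie", "profession", "Physicist"),
    ("Marie_Curie", "spouse", "Pierre_Curie"),
    ("Marie_Curie", "birthPlace", "Warsaw"),
    ("Pierre_Curie", "profession", "Physicist"),
]

def feature_set(graph, entity):
    """FS(e): all property-value pairs on edges whose source is the entity."""
    return {Feature(p, v) for (s, p, v) in graph if s == entity}

def is_summary(summary, fs, k):
    """Definition 4: a subset of FS(e) with exactly k features, 0 < k < |FS(e)|."""
    return 0 < k < len(fs) and summary <= fs and len(summary) == k

fs = feature_set(GRAPH, "Marie_Curie")
summ = {Feature("profession", "Physicist"), Feature("spouse", "Pierre_Curie")}
print(len(fs), is_summary(summ, fs, 2))
```

Here the full feature set has three features, so any two of them form a valid summary for k = 2, while the full set itself does not qualify for k = 3.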

Faceted entity summaries

An entity is described by a feature set. A feature f is characterized by its property Prop(f) and its value Val(f). In fact, a property binds a specific meaning to an entity using a value. We observe in general that different properties represent different aspects of an entity. For example, the profession and spouse properties of an entity (of type person) represent two different aspects. The first defines an intangible value and the second defines a human; one describes the entity’s professional life and the other its social life. Based on this observation, we can formalize facets for a feature set.

Facets: The feature set FS(e) of an entity e can be partitioned into a collection of facets F(e). The notion of a facet of a feature set can be defined using partitions as follows.

Definition 5 : The set F(e), a collection of facets of e, is a partition of FS(e). That is, the following conditions hold for F(e): (1) ∅ ∉ F(e); (2) ⋃_{X∈F(e)} X = FS(e); (3) if X, Y ∈ F(e) and X ≠ Y, then X ∩ Y = ∅.
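
The partition conditions (no empty facet, the facets together cover the feature set, and distinct facets are disjoint) can be checked mechanically. The following is a small sketch; the function name `is_facet_partition` and the sample property names are hypothetical.

```python
def is_facet_partition(facets, feature_set):
    """Check whether `facets` is a partition of `feature_set`."""
    # Treat the collection as a set of sets, so repeated facets count once.
    facets = list({frozenset(x) for x in facets})
    no_empty = all(len(x) > 0 for x in facets)             # no empty facet
    covers = set().union(*facets) == set(feature_set)      # union is FS(e)
    disjoint = all(                                        # distinct facets do not overlap
        not (x & y)
        for i, x in enumerate(facets)
        for y in facets[i + 1:]
    )
    return no_empty and covers and disjoint

features = {"profession", "field", "spouse", "child"}
good = [{"profession", "field"}, {"spouse", "child"}]
overlapping = [{"profession", "field", "spouse"}, {"spouse", "child"}]
print(is_facet_partition(good, features), is_facet_partition(overlapping, features))
```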

Note that if the number of facets is n and the size of the summary is k, at least one feature from each facet is included in the summary if k > n. If k < n, then at most one feature from each facet is included in the summary.
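
One simple way to obtain this behavior is a round-robin pass over the facets. The sketch below is an illustration only: the function name `pick_summary`, the per-feature scores, and the round-robin strategy are assumptions and should not be read as the FACES ranking algorithm.

```python
def pick_summary(facets, k):
    """Select up to k features by taking the best remaining feature of each facet in turn.

    facets: list of lists of (feature, score) pairs; the scores are hypothetical.
    """
    # Sort each facet by descending score, then order facets by their best score.
    queues = [sorted(f, key=lambda pair: pair[1], reverse=True) for f in facets if f]
    queues.sort(key=lambda q: q[0][1], reverse=True)
    summary = []
    while len(summary) < k and any(queues):
        for q in queues:
            if q and len(summary) < k:
                summary.append(q.pop(0)[0])
    return summary

facets = [
    [("profession: Physicist", 0.9), ("field: Radioactivity", 0.7)],
    [("spouse: Pierre Curie", 0.8)],
    [("birthPlace: Warsaw", 0.6)],
]
print(pick_summary(facets, 2))  # one feature from each of two facets
print(pick_summary(facets, 4))  # every facet is represented
```

With n = 3 facets, k = 2 yields at most one feature per facet, and k = 4 yields at least one feature from every facet.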

Approach

The FACES approach generates faceted entity summaries that are both concise and comprehensive. Conciseness is about selecting a small number of facts. Comprehensiveness is about selecting facts that represent all aspects of an entity, which improves coverage. Diversity is about selecting facts that are orthogonal to each other so that the few selected facts enrich coverage. Hence, diversity improves comprehensiveness when the number of features to include in a summary is limited. Conciseness may be achieved with various ranking and filtering techniques, but creating summaries that satisfy the conciseness and comprehensiveness constraints simultaneously is not a trivial task. The approach needs to recognize the facets of an entity that its features represent, so that the summary can cover as many facets as possible (making it diverse and comprehensive) without redundancy (keeping it concise).

The number and nature of the clusters (corresponding to abstract concepts) in a feature set are not known a priori for an entity and are hard to guess without human intervention or explicit knowledge. Therefore, neither a supervised clustering algorithm nor an unsupervised clustering algorithm that requires a prescribed number of clusters can be used in this context. To achieve this objective, we have adapted a flexible unsupervised clustering algorithm based on Cobweb [2][3] and have designed a ranking algorithm for feature selection.

Hierarchical conceptual clustering

We use the Cobweb [2] algorithm as the hierarchical conceptual clustering algorithm in our problem. "Cobweb is an incremental system for hierarchical clustering. The system carries out a hill-climbing search through a space of hierarchical classification schemes using operators that enable bidirectional travel through this space." Cobweb uses a heuristic measure called category utility to guide the search. Category utility is a tradeoff between intra-class similarity and inter-class dissimilarity of objects, which are described by attribute-value pairs. Intra-class similarity is the conditional probability of the form P(A_i = V_ij | C_k), where A_i = V_ij is an attribute-value pair and C_k is a class. When this probability is larger, more members of the class share the value. Inter-class dissimilarity is the conditional probability P(C_k | A_i = V_ij). When this probability is larger, fewer objects sharing the value are in other classes. Therefore, category utility (CU) is defined using the product of intra-class similarity and inter-class dissimilarity. For a partition {C₁, C₂, ..., C_n}, CU is defined as follows,

	CU(C₁, C₂, ..., C_n) = (1/n) Σ_k P(C_k) [ Σ_i Σ_j P(A_i = V_ij | C_k)² − Σ_i Σ_j P(A_i = V_ij)² ]
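
As a rough illustration, category utility in the form given by Fisher [2] (the average over classes of P(C_k) times the gain in the summed squared attribute-value probabilities) can be computed as below. The representation of objects as attribute-to-value dictionaries and the function names are assumptions for this sketch; it also assumes every class is non-empty.

```python
from collections import Counter

def sum_sq_probs(objects):
    """Sum over all attribute-value pairs of P(A_i = V_ij)^2 within `objects`."""
    counts = Counter((a, v) for obj in objects for a, v in obj.items())
    n = len(objects)
    return sum((c / n) ** 2 for c in counts.values())

def category_utility(partition):
    """CU of a partition; each class is a non-empty list of attribute->value dicts."""
    everything = [obj for cls in partition for obj in cls]
    baseline = sum_sq_probs(everything)  # expected score without class knowledge
    gain = sum(
        len(cls) / len(everything) * (sum_sq_probs(cls) - baseline)
        for cls in partition
    )
    return gain / len(partition)

a, b = {"type": "person"}, {"type": "person"}
c, d = {"type": "place"}, {"type": "place"}
cu_pure = category_utility([[a, b], [c, d]])   # classes agree on the value
cu_mixed = category_utility([[a, c], [b, d]])  # classes mix the values
print(cu_pure, cu_mixed)
```

The partition that groups objects with the same value scores higher than the mixed one, which is the preference the hill-climbing search exploits.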

Evaluation

Table 1. Evaluating the quality of summaries under each setting, % quality improvement (↑) using FACES compared to others for k=5 and k=10, respectively and average time taken per entity.
System	k = 5	FACES % ↑	k = 10	FACES % ↑	time/entity in seconds
FACES	1.4314	NA	4.3350	NA	0.76 sec.
RELIN	0.4981	187 %	2.5188	72 %	10.96 sec.
RELINM	0.6008	138 %	3.0906	40 %	11.08 sec.
SUMMARUM	1.2249	17 %	3.4207	27 %	NA
Ideal summ agreement	1.9168		4.6415
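
The percentage columns in Table 1 are consistent with a relative gain of the FACES score over the other system's score. The short check below assumes that reading; the function name `improvement_pct` is hypothetical and the scores are the k = 5 values from Table 1.

```python
def improvement_pct(faces_score, other_score):
    """Relative gain of FACES over another system, in percent."""
    return (faces_score - other_score) / other_score * 100

# k = 5 scores from Table 1
faces = 1.4314
others = [("RELIN", 0.4981), ("RELINM", 0.6008), ("SUMMARUM", 1.2249)]
rounded = {name: round(improvement_pct(faces, score)) for name, score in others}
print(rounded)
```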

Table 2. Evaluating the quality of correct concepts picked using property overlap under each setting, % quality improvement (↑) using FACES compared to others for k=5 and k=10, respectively.
System	k = 5	FACES %↑	k = 10	FACES %↑
FACES	1.8649	NA	5.6931	NA
RELIN	0.7339	154 %	3.3993	69 %
RELINM	0.8695	114 %	4.1551	37 %
SUMMARUM	1.6484	13 %	4.4919	27 %
Ideal summ agreement	2.3194		5.6228

Table 3. Evaluating user preferences for entity summaries using 69 user participants.
Experiment	FACES %	RELINM %	SUMMARUM %
Experiment 1	84 %	16 %	NA
Experiment 2	54 %	16 %	30 %

Table 4. Comparison between Google and Sindice search APIs for a small random sample of entities (5 entities).
Google search API (k = 5)	Sindice search API (k = 5)	Google search API (k = 10)	Sindice search API (k = 10)
3.5	3.4	0.5333	0.5428

Dataset

Evaluation data is available for download.

References

[1] Cheng, Gong, Thanh Tran, and Yuzhong Qu. "RELIN: relatedness and informativeness-based centrality for entity summarization." In The Semantic Web–ISWC 2011, pp. 114-129. Springer Berlin Heidelberg, 2011.
[2] Fisher, D.H.: Knowledge acquisition via incremental conceptual clustering. Machine learning 2(2), 139–172 (1987)
[3] Gennari, J.H., Langley, P., Fisher, D.: Models of incremental concept formation. Artificial intelligence 40(1), 11–61 (1989)


