Difference between revisions of "Entity Summary"

From Knoesis wiki
Line 33: Line 33:
 
-->
 
-->
  
 +
<!--
 
== Clustering ==
 
 
The FACES approach generates faceted entity summaries that are both concise and comprehensive. Conciseness means selecting a small number of facts. Comprehensiveness means selecting facts that represent all aspects of an entity, which improves coverage. Diversity means selecting facts that are orthogonal to each other, so that the few selected facts broaden coverage. Hence, diversity improves comprehensiveness when the number of features in a summary is limited. Conciseness can be achieved with various ranking and filtering techniques, but creating summaries that satisfy the conciseness and comprehensiveness constraints simultaneously is not trivial. The approach needs to recognize the facets of an entity that its features represent, so that the summary covers as many facets as possible (diversity and comprehensiveness) without redundancy (conciseness). The number and nature of the clusters (corresponding to abstract concepts) in an entity's feature set are not known a priori and are hard to guess without human intervention or explicit knowledge. Therefore, neither a supervised clustering algorithm nor an unsupervised clustering algorithm that requires a prescribed number of clusters can be used in this context. To address this, we have adapted a flexible unsupervised clustering algorithm based on Cobweb [2][3] and designed a ranking algorithm for feature selection.
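The interplay between conciseness and diversity described above can be illustrated with a small Python sketch. It is not the FACES ranking algorithm: it assumes the facets (clusters) have already been found, that each facet's features are already sorted best-first by some score, and that the facets are visited in the order given. The function name `pick_faceted_summary` and the toy feature labels are hypothetical.

```python
from itertools import zip_longest


def pick_faceted_summary(facets, k):
    """Pick up to k features, visiting the facets round-robin.

    facets: list of lists; each inner list holds one facet's features,
            assumed to be sorted best-first (hypothetical input format).
    k:      maximum summary length (conciseness constraint).
    """
    summary = []
    # zip_longest yields the 1st feature of every facet, then the 2nd, ...
    # Shorter facets are padded with None, which is skipped below.
    for round_of_features in zip_longest(*facets):
        for feature in round_of_features:
            if feature is not None and len(summary) < k:
                summary.append(feature)
    return summary


# Toy input: three facets with placeholder feature labels.
toy_facets = [["f1a", "f1b"], ["f2a"], ["f3a", "f3b"]]
print(pick_faceted_summary(toy_facets, 2))
print(pick_faceted_summary(toy_facets, 4))
```

With a budget smaller than the number of facets, every selected feature comes from a different facet; a second feature from the same facet is taken only after each facet has contributed once.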
Line 50: Line 51:
 
[[image:es_cobweb_algorithm.png|thumb|none|400px| Figure 1. Cobweb algorithm]]
 
 
[[image:es_cobweb_operators.png|thumb|none|400px| Figure 2. Auxiliary Cobweb operators]]
 
 +
 +
-->
  
  
Line 73: Line 76:
 
-->
 
-->
  
 +
 +
<!--
 
= Evaluation =
 
 
In this evaluation we compare FACES against RELIN and SUMMARUM. We chose DBpedia as the dataset because it was used by RELIN and is a large dataset containing entities from multiple domains. We randomly extracted 50 entities from English DBpedia version 3.9. We asked 15 human judges to create entity summaries of length 5 and length 10 (ideal summaries) and used them as the gold standard. We also made sure that each entity received at least 7 ideal summaries. We could not reuse the experimental data of RELIN because its authors confirmed that the data are not available. We exclude properties such as owl:sameAs, rdf:type, db:wordnet_type, db:wikiPageWikiLink, db:wikiPageExternalLink, db:wikiPageUsesTemplate, db:wikiPageRevisionID, db:wikiPageID, dc:subject, and db:Template. In the sample dataset, there are at least 17 distinct properties and 19 to 88 distinct features per entity.
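The property exclusion described above could be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the evaluation: features are modelled as (property, value) pairs of prefixed-name strings, whereas real DBpedia data uses full IRIs, and the helper names `filter_features` and `count_distinct` are hypothetical. The text says "properties such as", so the set below may not be exhaustive.

```python
# Properties named in the text as excluded from processing
# (the original list is introduced with "such as", so it may be partial).
EXCLUDED_PROPERTIES = {
    "owl:sameAs", "rdf:type", "db:wordnet_type", "db:wikiPageWikiLink",
    "db:wikiPageExternalLink", "db:wikiPageUsesTemplate",
    "db:wikiPageRevisionID", "db:wikiPageID", "dc:subject", "db:Template",
}


def filter_features(features, excluded=EXCLUDED_PROPERTIES):
    """Keep the (property, value) pairs whose property is not excluded."""
    return [(prop, value) for prop, value in features if prop not in excluded]


def count_distinct(features):
    """Return (number of distinct properties, number of distinct features)."""
    distinct_properties = {prop for prop, _ in features}
    return len(distinct_properties), len(set(features))


# Toy entity description with placeholder values (not real DBpedia data).
toy_features = [
    ("rdf:type", "db:ExampleClass"),
    ("db:exampleProperty", "db:ExampleValueA"),
    ("db:exampleProperty", "db:ExampleValueB"),
    ("dc:subject", "db:ExampleCategory"),
]
kept = filter_features(toy_features)
print(kept)
print(count_distinct(kept))
```

Counting distinct properties and distinct features after filtering gives the per-entity statistics of the kind reported for the sample dataset.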

Revision as of 21:43, 15 September 2014