Difference between revisions of "Property Alignment"

From Knoesis wiki
Revision as of 23:10, 6 June 2013

Property Alignment on Linked Datasets

Property alignment in Linked Open Data (LOD), or in linked datasets generally, is a non-trivial task because of their complex data representations. Concept (class) and instance level alignment have been investigated in the recent past, but property alignment has not yet received much attention. We therefore propose an approach that can handle complex data representations and also achieve a higher ratio of correct matches. Our approach builds on a fundamental building block of interlinked datasets such as LOD: Entity Co-Reference (ECR) links. We match property extensions to derive a measure that approximates owl:equivalentProperty. We use ECR links to find equivalent instances for a particular property extension, and then accumulate the number of matching extensions to decide whether a pair of properties from the two datasets match.


Approach

In this initial experiment, we explored property extension matching using the owl:sameAs and skos:exactMatch interlinking relationships as ECR links. Later, we will explore less restrictive links such as skos:closeMatch, as well as links such as rdfs:seeAlso that certain datasets use for their own requirements, and check the performance.
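As an illustration of how ECR links might be collected, the following is a minimal sketch that builds a lookup from a list of (subject, predicate, object) triples, keeping only owl:sameAs and skos:exactMatch. The function name build_ecr_index, the tuple representation of triples and the symmetric treatment of links are assumptions made for this example; they are not taken from our implementation.

```python
from collections import defaultdict

# Full URIs of the two link types used as ECR links in the initial experiment.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

STRICT_ECR_PREDICATES = frozenset({OWL_SAME_AS, SKOS_EXACT_MATCH})


def build_ecr_index(triples, predicates=STRICT_ECR_PREDICATES):
    """Map each resource to the set of resources it is linked to.

    `triples` is an iterable of (subject, predicate, object) string tuples.
    Hypothetical helper: links are stored in both directions, but no
    transitive closure is computed.
    """
    index = defaultdict(set)
    for s, p, o in triples:
        if p in predicates:
            index[s].add(o)
            index[o].add(s)
    return dict(index)
```

Passing a larger predicate set (for example one that also contains the skos:closeMatch URI) would be one way to try the less restrictive links mentioned above.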


Fig. 1. Matching mechanism of the extension based approach

Figure 1 shows how the matching process works in our extension-based algorithm. Each property pair is matched independently of the others by analyzing each instance associated with that property (in the extension slots). The algorithm processes subject instances from the starting dataset: it extracts the triples of each subject instance and finds the corresponding subject instance in the second dataset by traversing an ECR link. The object values for the property pair are then matched, again using ECR links. The final result of this matching process is illustrated by the example in Figure 2. To decide the final matching pairs, we keep track of two statistical measures, MatchCount and Co-appearanceCount, which are shown in Figure 2 and described in the paper (to appear in I-SEMANTICS 2013). These measures help to reduce incorrect mappings such as "birth_place" and "place_of_birth".

Fig. 2. Matching example for the extension based approach
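The matching process described above can be sketched roughly as follows. This is a simplified illustration, not our implementation. It assumes that MatchCount counts the co-referent subject pairs for which the two properties have equal or ECR-linked object values, and that Co-appearanceCount counts the co-referent subject pairs on which both properties appear; the exact definitions are those given in the paper. The function name count_matches, the tuple representation of triples and the ecr dictionary (mapping a resource to the set of resources it is linked to) are hypothetical.

```python
from collections import defaultdict


def group_by_subject(triples):
    """Return subject -> property -> set of object values."""
    grouped = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        grouped[s][p].add(o)
    return grouped


def count_matches(triples_a, triples_b, ecr):
    """Return {(prop_a, prop_b): (match_count, co_appearance_count)}.

    Assumed definitions (see the lead-in): both counts are taken over
    pairs of subjects connected by an ECR link.
    """
    subjects_a = group_by_subject(triples_a)
    subjects_b = group_by_subject(triples_b)
    counts = defaultdict(lambda: [0, 0])

    for s_a, props_a in subjects_a.items():
        # Traverse ECR links from the subject in the starting dataset.
        for s_b in ecr.get(s_a, ()):
            props_b = subjects_b.get(s_b)
            if not props_b:
                continue
            for p_a, objs_a in props_a.items():
                for p_b, objs_b in props_b.items():
                    pair = counts[(p_a, p_b)]
                    pair[1] += 1  # both properties appear on this subject pair
                    # Object values match if equal or connected by an ECR link.
                    if any(
                        o_a == o_b or o_b in ecr.get(o_a, ())
                        for o_a in objs_a
                        for o_b in objs_b
                    ):
                        pair[0] += 1
    return {pair: tuple(c) for pair, c in counts.items()}


if __name__ == "__main__":
    # Toy data with made-up identifiers, for illustration only.
    ecr = {"a:Einstein": {"b:einstein"}, "a:Ulm": {"b:ulm"}}
    triples_a = [("a:Einstein", "a:birthPlace", "a:Ulm")]
    triples_b = [
        ("b:einstein", "b:place_of_birth", "b:ulm"),
        ("b:einstein", "b:place_of_death", "b:princeton"),
    ]
    for pair, (match, co_appear) in count_matches(triples_a, triples_b, ecr).items():
        print(pair, match, co_appear)
```

The sketch only accumulates the two counts per property pair. How they are combined to accept or reject a pair is described in the paper and is not reproduced here.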

Experiment and Datasets

To evaluate our approach on linked datasets, we used sample datasets from Linked Open Data (LOD): DBpedia, Freebase, LinkedMDB, DBLP L3S and DBLP RKB_Explorer. DBpedia and Freebase are major multi-domain hubs in LOD that connect other datasets together. LinkedMDB is a specialized dataset in the movie domain, and DBLP L3S and DBLP RKB_Explorer are specialized datasets for scientific publications. Our evaluation therefore covers each combination of dataset types. The DBpedia and Freebase alignment is a multi-domain to multi-domain alignment, whereas the DBpedia and LinkedMDB alignment is a multi-domain to specific-domain alignment. The alignment of the two DBLP datasets is a specific-domain to specific-domain alignment task.

For this experiment, we selected the person, film and software domains from the DBpedia and Freebase datasets because these domains have more complex data representations and variations.

Analysis