From Knoesis wiki

Latest revision as of 08:40, 12 June 2014

RDF Graph Model

We propose a new formal model for representing any set of RDF triples as a labeled directed multigraph, or LDM. Two existing approaches represent RDF triples as either a Node-LabeledArc-Node (NLAN) diagram or a bipartite (BI) graph. We illustrate all three approaches with the following example.

Example

Triple Subject Predicate Object
T1 BobDylan isMarriedTo SaraLownds
T2 BarackObama isMarriedTo MichelleObama
T3 isMarriedTo rdfs:subPropertyOf isSpouseOf
T4 BobDylan isSpouseOf SaraLownds
T5 BarackObama isSpouseOf MichelleObama

For the set of RDF triples in the table above, we show how each approach represents them as a graph.

The NLAN model

This figure demonstrates the currently recommended approach for representing RDF triples as a node-labeled arc-node diagram
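As a minimal sketch of the NLAN view, the snippet below builds the example graph under the usual W3C reading of an RDF graph: subjects and objects become nodes, and each triple becomes an arc from subject to object that carries the predicate only as a label. The names `TRIPLES` and `nlan_degrees` are illustrative and are not taken from the original experiments.

```python
from collections import Counter

# The five example triples (T1-T5) from the example table.
TRIPLES = [
    ("BobDylan", "isMarriedTo", "SaraLownds"),
    ("BarackObama", "isMarriedTo", "MichelleObama"),
    ("isMarriedTo", "rdfs:subPropertyOf", "isSpouseOf"),
    ("BobDylan", "isSpouseOf", "SaraLownds"),
    ("BarackObama", "isSpouseOf", "MichelleObama"),
]

def nlan_degrees(triples):
    """Return (nodes, in_degree, out_degree) for an NLAN diagram.

    Only subjects and objects are nodes; predicates appear solely as
    arc labels, so a predicate adds no degree to any node.
    """
    nodes = set()
    in_deg, out_deg = Counter(), Counter()
    for s, _p, o in triples:
        nodes.update((s, o))
        out_deg[s] += 1  # the arc s -[p]-> o leaves the subject
        in_deg[o] += 1   # ... and enters the object
    return nodes, in_deg, out_deg

nodes, in_deg, out_deg = nlan_degrees(TRIPLES)
print(sorted(nodes))
print(out_deg["isMarriedTo"], in_deg["isMarriedTo"])  # 1 0
```

In this sketch, `isMarriedTo` is a node only because T3 uses it as a subject; its uses as an arc label in T1 and T2 are not connected to that node.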

The BI model

This figure demonstrates the bipartite graph model approach for representing RDF triples as a graph
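A minimal sketch of a bipartite encoding, assuming the common formulation in which every triple becomes its own statement node joined by undirected edges to its subject, predicate, and object; the construction behind the figure may differ in detail. `bi_degrees` is a hypothetical helper name.

```python
from collections import Counter

# The five example triples (T1-T5) from the example table.
TRIPLES = [
    ("BobDylan", "isMarriedTo", "SaraLownds"),
    ("BarackObama", "isMarriedTo", "MichelleObama"),
    ("isMarriedTo", "rdfs:subPropertyOf", "isSpouseOf"),
    ("BobDylan", "isSpouseOf", "SaraLownds"),
    ("BarackObama", "isSpouseOf", "MichelleObama"),
]

def bi_degrees(triples):
    """Total degree of every node in the bipartite (BI) graph.

    One side holds a statement node per triple, keyed ("triple", i) so it
    cannot collide with a resource name; the other side holds every
    resource. Edges are undirected, so only a total degree is defined.
    """
    degree = Counter()
    for i, (s, p, o) in enumerate(triples, start=1):
        stmt = ("triple", i)
        for resource in (s, p, o):  # one edge per role: subject, predicate, object
            degree[stmt] += 1
            degree[resource] += 1
    return degree

degree = bi_degrees(TRIPLES)
print(degree["isMarriedTo"], degree["isSpouseOf"])  # 3 3
```

Here `isMarriedTo` reaches degree 3 by combining its two predicate positions (T1, T2) with its subject position (T3), because all three roles attach to the same resource node.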

The LDM model

This figure demonstrates the new approach for representing RDF triples as a labeled directed multigraph
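A minimal sketch of an LDM-style structure for the same triples. It assumes that every predicate is also a vertex and that each edge stores a reference to that vertex as its label, so `isMarriedTo` (an edge label in T1 and T2, a subject in T3) is a single vertex; it also allows parallel edges, such as the two edges from `BobDylan` to `SaraLownds`. The class `LDM` and its fields are hypothetical and are not the authors' implementation.

```python
from collections import defaultdict

# The five example triples (T1-T5) from the example table.
TRIPLES = [
    ("BobDylan", "isMarriedTo", "SaraLownds"),
    ("BarackObama", "isMarriedTo", "MichelleObama"),
    ("isMarriedTo", "rdfs:subPropertyOf", "isSpouseOf"),
    ("BobDylan", "isSpouseOf", "SaraLownds"),
    ("BarackObama", "isSpouseOf", "MichelleObama"),
]

class LDM:
    """Sketch of a labeled directed multigraph whose edge labels are vertices."""

    def __init__(self):
        self.vertices = set()
        self.edges = []                      # (source, label, target); parallel edges allowed
        self.labeled_by = defaultdict(list)  # label vertex -> indices into self.edges

    def add_triple(self, s, p, o):
        # The predicate is added as a vertex, so its use as an edge label and
        # any use as a subject or object refer to the same vertex.
        self.vertices.update((s, p, o))
        self.labeled_by[p].append(len(self.edges))
        self.edges.append((s, p, o))

g = LDM()
for triple in TRIPLES:
    g.add_triple(*triple)

print(len(g.vertices))              # 7
print(g.labeled_by["isMarriedTo"])  # [0, 1]
print([e for e in g.edges if e[0] == "isMarriedTo"])
```

Here the edges labeled `isMarriedTo` (T1 and T2) can be reached from the same `isMarriedTo` vertex that is the subject of T3.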


More complex examples

We use the Singleton_Property approach to represent the duration of the marriage between Bob Dylan and Sara Lownds.

This figure demonstrates the new approach for representing a more complex fact as a labeled directed multigraph
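A minimal sketch of the singleton property pattern, assuming the formulation in which a fresh property is minted for one fact (here `isMarriedTo#1`), linked to its generic property through `rdf:singletonPropertyOf`, and then described by further triples. The function name, the `#n` naming scheme, and the `startDate`/`endDate` predicates with placeholder values are hypothetical; the figure may use different names.

```python
from itertools import count

_ids = count(1)  # hypothetical counter used to mint unique singleton properties

def singleton_triples(s, p, o, meta):
    """Return the triples that state (s, p, o) through a fresh singleton
    property and attach metadata (a dict of predicate -> value) to it."""
    sp = f"{p}#{next(_ids)}"  # a property instance used for this one fact only
    triples = [
        (s, sp, o),
        (sp, "rdf:singletonPropertyOf", p),
    ]
    triples += [(sp, key, value) for key, value in meta.items()]
    return triples

facts = singleton_triples(
    "BobDylan", "isMarriedTo", "SaraLownds",
    {"startDate": "START_DATE", "endDate": "END_DATE"},  # placeholder values
)
for t in facts:
    print(t)
```

Because `isMarriedTo#1` occurs as a predicate in one triple and as a subject in the others, a graph model for such data has to handle the same resource both as an edge label and as a node.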

Empirical studies

RDF Datasets

We use four RDF datasets that are publicly available on the Web.

  • BKR-SP: created by Vinh Nguyen et al. (ACM: http://dl.acm.org/citation.cfm?id=2567973). This dataset is available at Singleton_Property.
  • YAGO2S-SP: also created by Vinh Nguyen et al. (ACM: http://dl.acm.org/citation.cfm?id=2567973). This dataset is also available at Singleton_Property.
  • DBPedia 3.9: download from http://wiki.dbpedia.org/Downloads39
  • Freebase: download from https://developers.google.com/freebase/data (for our experiment, we downloaded this dataset on March 30).

Experimental Result Files

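We created multiple MapReduce jobs to compute the degree distributions of all three approaches on the four RDF datasets. The resulting degree distribution files are available at:

  • LDM degree distribution files: https://drive.google.com/file/d/0B5AIWZ9-TifAelNpTlJmeEFNT2c/edit?usp=sharing
  • BI degree distribution files: https://drive.google.com/file/d/0B5AIWZ9-TifAVEY4R0hSY3dUN0E/edit?usp=sharing
  • NLAN degree distribution files: https://drive.google.com/file/d/0B5AIWZ9-TifAeG8xWWlidmNJZGs/edit?usp=sharing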
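The MapReduce jobs themselves are not reproduced here. As a rough in-memory sketch of the usual two-stage pattern, assuming NLAN-style degrees (one out-edge per subject and one in-edge per object), the first stage counts each node's degree and the second counts how many nodes have each degree. `run_stage` and the mapper and reducer names are hypothetical; on Hadoop each stage would be a separate job, and the shuffle is simulated here by sorting and grouping.

```python
from itertools import groupby
from operator import itemgetter

# The five example triples (T1-T5) from the example table.
TRIPLES = [
    ("BobDylan", "isMarriedTo", "SaraLownds"),
    ("BarackObama", "isMarriedTo", "MichelleObama"),
    ("isMarriedTo", "rdfs:subPropertyOf", "isSpouseOf"),
    ("BobDylan", "isSpouseOf", "SaraLownds"),
    ("BarackObama", "isSpouseOf", "MichelleObama"),
]

def run_stage(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle (sort and group
    by key), then reduce each group."""
    mapped = sorted(kv for record in records for kv in mapper(record))
    for key, group in groupby(mapped, key=itemgetter(0)):
        yield from reducer(key, [value for _, value in group])

def degree_mapper(triple):
    # Each triple adds one out-edge to its subject and one in-edge to its object.
    # Nodes never seen in a role emit nothing, so degree 0 is absent below.
    s, _p, o = triple
    yield (s, "out"), 1
    yield (o, "in"), 1
    yield (s, "total"), 1
    yield (o, "total"), 1

def sum_reducer(key, values):
    yield key, sum(values)

def histogram_mapper(record):
    # Re-key each ((node, kind), degree) record as ((kind, degree), 1).
    (_node, kind), degree = record
    yield (kind, degree), 1

node_degrees = list(run_stage(TRIPLES, degree_mapper, sum_reducer))
distribution = dict(run_stage(node_degrees, histogram_mapper, sum_reducer))
for (kind, degree), n_nodes in sorted(distribution.items()):
    print(kind, degree, n_nodes)
```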

Plotting Degree distributions

Here we plot the degree distributions of the three approaches (LDM, NLAN, BI) on the four RDF datasets.

For each dataset, we compute in-degree, out-degree, and total-degree distributions for the LDM and NLAN approaches. The BI approach has only a total-degree distribution because the bipartite graph is undirected. For each plot, we also compare the power-law fit against an exponential fit.
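For reference, a sketch of the standard maximum-likelihood power-law fit of Clauset, Shalizi, and Newman, under the assumption that the alpha, xmin, sigma, and D min columns in the tables below follow its conventions (fitted exponent, start of the fitted tail, standard error of alpha, and the minimized Kolmogorov-Smirnov distance). For each candidate xmin it estimates alpha from the tail, measures the KS distance D between the empirical and fitted tail CDFs, and keeps the xmin with the smallest D. It uses the continuous approximation for brevity; the tables may have been produced with different settings, for example a discrete estimator or binned degrees, so it is not the exact procedure behind the reported numbers.

```python
import math
import random

def fit_alpha(tail, xmin):
    """Continuous MLE of the power-law exponent for values >= xmin and its
    standard error; returns None when every value equals xmin."""
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        return None
    n = len(tail)
    alpha = 1.0 + n / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(n)

def ks_distance(sorted_tail, xmin, alpha):
    """Largest gap between the empirical CDF of the sorted tail and the
    fitted CDF F(x) = 1 - (x / xmin) ** (1 - alpha)."""
    n = len(sorted_tail)
    d = 0.0
    for i, x in enumerate(sorted_tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(model - i / n), abs(model - (i + 1) / n))
    return d

def fit_power_law(data, min_tail=50):
    """Try each distinct positive value as xmin and keep the fit with the
    smallest KS distance D."""
    values = sorted(x for x in data if x > 0)
    best = None
    for i, xmin in enumerate(values):
        if i > 0 and values[i - 1] == xmin:
            continue  # same candidate as the previous position
        tail = values[i:]  # already sorted, as ks_distance expects
        if len(tail) < min_tail:
            break  # later candidates leave even smaller tails
        fit = fit_alpha(tail, xmin)
        if fit is None:
            continue
        alpha, sigma = fit
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best["D"]:
            best = {"alpha": alpha, "xmin": xmin, "sigma": sigma,
                    "D": d, "n_tail": len(tail)}
    return best

# Synthetic check: inverse-CDF samples from a continuous power law with
# alpha = 2.5 and xmin = 1 (seed and sample size are arbitrary).
rng = random.Random(42)
sample = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000)]
result = fit_power_law(sample, min_tail=100)
print(result)
```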

LDM

DS        Degree type  alpha  xmin      sigma  R     p         D min  Tail coverage
BKR       in           1.12   1         0.02   6.78  1.24E-11  0.15   60%
BKR       out          1.23   1024      0.04   4.01  6.10E-05  0.14   28%
BKR       total        1.21   955       0.04   3.95  7.87E-05  0.13   58.33%
YAGO2S    in           1.11   1         0.02   6.33  2.52E-10  0.14   96.3%
YAGO2S    out          1.11   1         0.02   5.98  2.18E-09  0.14   96.3%
YAGO2S    total        1.13   4         0.02   6.36  2.03E-10  0.12   92.31%
DBPEDIA   in           1.49   974383    0.13   1.76  0.08      0.11   25%
DBPEDIA   out          1.46   524288    0.11   2.47  0.01      0.12   28.57%
DBPEDIA   total        1.13   16        0.02   5.38  7.39E-08  0.15   85.19%
FREEBASE  in           1.16   256       0.02   5.56  2.65E-08  0.11   70%
FREEBASE  out          1.81   16777216  0.26   1.54  0.12      0.12   16.67%
FREEBASE  total        1.14   64        0.02   5.91  3.51E-09  0.12   79.31%
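The R and p columns read naturally as a normalized log-likelihood-ratio test between the power-law and exponential fits: a positive R favors the power law, and the tabulated pairs are consistent with p = erfc(|R| / sqrt(2)), the two-sided normal p-value of a normalized ratio (for example, R = 1.76 gives p of about 0.078, reported as 0.08). The sketch below assumes continuous power-law and exponential models fitted to the same tail; the function names and the synthetic sample are illustrative, not the authors' code.

```python
import math
import random

def loglik_power_law(x, xmin, alpha):
    # log of the continuous power-law density (alpha - 1) / xmin * (x / xmin) ** -alpha
    return math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin)

def loglik_exponential(x, xmin, lam):
    # log of the exponential density lam * exp(-lam * (x - xmin)), shifted to start at xmin
    return math.log(lam) - lam * (x - xmin)

def compare_power_law_exponential(tail, xmin):
    """Return (R, p) for values >= xmin: R > 0 favors the power law, and p is
    the two-sided significance of the normalized log-likelihood ratio."""
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)  # power-law MLE
    lam = n / sum(x - xmin for x in tail)                     # exponential MLE
    diffs = [loglik_power_law(x, xmin, alpha) - loglik_exponential(x, xmin, lam)
             for x in tail]
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    r = sum(diffs) / (sd * math.sqrt(n))  # approximately N(0, 1) if both fits are equally good
    p = math.erfc(abs(r) / math.sqrt(2.0))
    return r, p

# Heavy-tailed synthetic data (alpha = 2.5, xmin = 1; seed and size are arbitrary).
rng = random.Random(7)
tail = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(2000)]
print(compare_power_law_exponential(tail, 1.0))

# The same p-value relation applied to two tabulated R values.
for r_tab, p_tab in [(1.76, 0.08), (2.271051161, 0.023143881)]:
    print(r_tab, p_tab, math.erfc(r_tab / math.sqrt(2.0)))
```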

BKR-SP

YAGO2S-SP

DBPEDIA

FREEBASE

NLAN

This table shows the parameters of the best-fitting power-law distribution for each dataset using the NLAN approach.

Dataset    Type   alpha        xmin    Dmin         sigma        R            p            Tail Coverage
BKR-SP     in     1.933211288  895825  0.127628625  0.329940015  2.271051161  0.023143881  12.5%
BKR-SP     out    1.343236826  3705    0.115026299  0.083247158  3.894424558  9.84E-05     31.58%
BKR-SP     total  1.21774704   491     0.141337886  0.041150323  3.228857504  0.001242858  58.33%
YAGO2S-SP  in     1.123959864  2       0.13926462   0.017708552  6.236336995  4.48E-10     92.31%
YAGO2S-SP  out    1.145682633  8       0.12183686   0.029136527  4.997441147  5.81E-07     73.33%
YAGO2S-SP  total  1.128566546  4       0.132747188  0.019165569  5.732859944  9.88E-09     84.62%
DBPEDIA    in     1.648887842  970956  0.10993169   0.205196353  1.899457176  0.057504392  12.5%
DBPEDIA    out    1.668078604  715028  0.128189532  0.222692868  2.574136552  0.01004906   0%
DBPEDIA    total  1.566221177  649090  0.147983393  0.163453975  1.774194648  0.076030959  12.5%
FREEBASE   in     1.176440054  410     0.117914742  0.028622356  4.937155187  7.93E-07     64.29%
FREEBASE   out    1.10925354   1       0.119199458  0.015007128  6.185548611  6.19E-10     96.3%
FREEBASE   total  1.148300227  59      0.123663745  0.0223571    5.311427686  1.09E-07     77.78%


BKR-SP

YAGO2S-SP

DBPEDIA

FREEBASE

BI

This table shows the parameters of the best-fitting power-law distribution for each dataset using the BI approach.

Dataset   alpha        xmin      Dmin         sigma        R            p            Tail Coverage
BKR       1.201698959  491       0.127904446  0.037454556  4.762452101  1.91E-06     62.5%
Yago2s    1.129751796  4         0.124153477  0.018728059  6.136824765  8.42E-10     92.31%
DBpedia   1.381014259  524288    0.116475129  0.092409531  2.766134059  0.005672521  29.63%
Freebase  1.684343461  24352099  0.105289281  0.216408404  2.006340037  0.044819981  13.79%

Degree Distribution

Comparison

This table compares the power-law degree distributions of the three types of graphs (NLAN, LDM, and BI) on the four RDF datasets (BKR, YAGO2S, DBPEDIA, and FREEBASE). Each value is the percentage of data points in a degree distribution that are covered by the fitted power law. A distribution with a higher percentage follows the power law more closely in its tail.

DS        degree  NLAN    LDM     BI
BKR       in      12.5%   60%     NA
BKR       out     31.58%  28%     NA
BKR       total   58.33%  58.33%  62.5%
YAGO2S    in      92.31%  96.3%   NA
YAGO2S    out     73.33%  96.3%   NA
YAGO2S    total   84.62%  92.31%  92.31%
DBPEDIA   in      12.5%   25%     NA
DBPEDIA   out     0%      28.57%  NA
DBPEDIA   total   12.5%   85.19%  29.63%
FREEBASE  in      64.29%  70%     NA
FREEBASE  out     96.3%   16.67%  NA
FREEBASE  total   77.78%  79.31%  13.79%

LDM and NLAN

In general, LDM distributions have higher coverage than NLAN distributions. Of the 12 degree distributions, LDM has higher coverage in 9 and NLAN in only 2 (the BKR and FREEBASE out-degree distributions); the two tie on the BKR total-degree distribution. In particular, LDM coverage is higher for all four in-degree distributions and for three of the four total-degree distributions, the only exception being the BKR tie.
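The counts in the paragraph above can be rechecked directly from the comparison table. The dictionary below copies the NLAN and LDM tail-coverage values from that table; `tally` is an illustrative helper.

```python
# Tail coverage (%) from the comparison table: (dataset, degree) -> (NLAN, LDM).
COVERAGE = {
    ("BKR", "in"): (12.5, 60.0),
    ("BKR", "out"): (31.58, 28.0),
    ("BKR", "total"): (58.33, 58.33),
    ("YAGO2S", "in"): (92.31, 96.3),
    ("YAGO2S", "out"): (73.33, 96.3),
    ("YAGO2S", "total"): (84.62, 92.31),
    ("DBPEDIA", "in"): (12.5, 25.0),
    ("DBPEDIA", "out"): (0.0, 28.57),
    ("DBPEDIA", "total"): (12.5, 85.19),
    ("FREEBASE", "in"): (64.29, 70.0),
    ("FREEBASE", "out"): (96.3, 16.67),
    ("FREEBASE", "total"): (77.78, 79.31),
}

def tally(pairs):
    """Return (LDM higher, NLAN higher, ties) over (NLAN, LDM) coverage pairs."""
    ldm_higher = sum(1 for nlan, ldm in pairs if ldm > nlan)
    nlan_higher = sum(1 for nlan, ldm in pairs if nlan > ldm)
    return ldm_higher, nlan_higher, len(pairs) - ldm_higher - nlan_higher

print("all", tally(list(COVERAGE.values())))  # all (9, 2, 1)
for kind in ("in", "out", "total"):
    print(kind, tally([v for (_ds, k), v in COVERAGE.items() if k == kind]))
# in (4, 0, 0)
# out (2, 2, 0)
# total (3, 0, 1)
```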

LDM and BI

For the total-degree distributions, LDM coverage is substantially higher than BI coverage on 2 of the 4 datasets: DBPedia (LDM: 85.19% vs. BI: 29.63%) and Freebase (LDM: 79.31% vs. BI: 13.79%). On YAGO2S the two are equal (92.31%), and on BKR the BI coverage is slightly higher (62.5% vs. 58.33%).
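A short check of the LDM versus BI total-degree comparison, using the values from the comparison table; the printed differences are in percentage points, and the variable names are illustrative.

```python
# Total-degree tail coverage (%) from the comparison table: dataset -> (LDM, BI).
TOTAL_COVERAGE = {
    "BKR": (58.33, 62.5),
    "YAGO2S": (92.31, 92.31),
    "DBPEDIA": (85.19, 29.63),
    "FREEBASE": (79.31, 13.79),
}

for dataset, (ldm, bi) in TOTAL_COVERAGE.items():
    print(f"{dataset}: LDM - BI = {ldm - bi:+.2f} points")

ldm_higher = [d for d, (ldm, bi) in TOTAL_COVERAGE.items() if ldm > bi]
print(ldm_higher)  # ['DBPEDIA', 'FREEBASE']
```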