Provenir Ontology

What is Provenir Ontology (PO)?
A reference ontology for modeling domain-specific provenance
Figure 1: Provenir Ontology Schema

Provenance, from the French word ‘provenir’ meaning “to come from”, describes the lineage of an entity. In eScience, provenance is critical for interpreting scientific results accurately. Information provenance has been recognized as a hard problem in computing science (British Computer Society, 2004), and many research issues in provenance are yet to be addressed. For example, a common provenance model that facilitates interoperability of provenance metadata and supports analysis using inferencing rules has not been defined. We introduce the provenir ontology as a common provenance model, which forms the core component of a modular provenance management framework for eScience.

Domain-specific details are an important component of provenance representation. However, a single monolithic provenance ontology that models all possible details from different domains (for example, biology, marine sciences, and astronomy) is clearly not feasible. Hence, our proposed modular approach involves the integrated use of multiple ontologies, each modeling provenance metadata specific to a particular domain (for example, the ProPreO ontology represents proteomics domain-specific provenance [1]). These ontologies use the provenir ontology as the common reference model, which makes it easier for their associated instances to interoperate. This modular framework represents a scalable and flexible approach to provenance modeling that can be adapted to the specific requirements of different domains. In the next two sections, we describe the classes and the properties in the provenir ontology (Figure 1).

Classes of Provenir Ontology (PO)

To represent provenance metadata classes, we use the two well-defined, primitive concepts of “occurrent” and “continuant” from philosophical ontology [2]. Continuant is defined as “… entities which endure, or continue to exist, through time while undergoing different sorts of changes, including changes of place” [2]. Occurrent is defined as “… entities that unfold themselves in successive temporal phases” [2]. We define three base classes in the provenir ontology that represent the primary components of provenance: “data”, “agent”, and “process”. Two of the base classes, “data” and “agent”, are defined as specializations (subclasses) of the continuant class. The third base class, “process”, is a synonym of occurrent. We present the definition of each class below, including its inheritance relationship:

  1. data: This class models continuant entities that represent the starting material, intermediate material, end products of a scientific experiment, and parameters that affect the execution of a scientific process. Data inherit the properties of continuants such as enduring or existing while undergoing changes.
  2. process: This class models the occurrent entities that affect individuals of data (by processing, modifying, creating, or deleting them, among other dynamic activities).
  3. agent: This class models the continuant entities that causally affect the individuals of process.

In addition to these three base classes, five subclasses of data (two direct and three indirect) are defined. The direct subclasses of data are:

  1. data_collection: This class represents atomic or composite data entities that are acted upon during a scientific process.
  2. parameter: This class models individuals that affect the behavior of a scientific process, either as constraints or as input to individuals of the agent and process classes.
Three subclasses of parameter are defined, one for each of the spatial, temporal, and thematic (domain-specific) dimensions:
  • temporal_parameter: This class captures the temporal details associated with individuals of data_collection class (for example, the timestamp associated with a sensor reading), process (for example, the duration of a protein analysis process), and agent (for example, the time period during which a sensor was working correctly).
  • spatial_parameter: This class represents the spatial metadata associated with individuals of the process, agent, or data_collection classes. The geographical location of an ocean buoy is an example of a spatial parameter.
  • domain_parameter: The domain_parameter class is used to model domain-specific parameters (for example, tolerable salinity levels for ocean buoys).
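
The class hierarchy described above can be summarized in a few lines of Python. This is only an illustrative sketch and not part of the ontology itself: the class names come from the text, while the dictionary PARENT and the helper functions ancestors, is_subclass_of, and subclasses are hypothetical names chosen for this example.

```python
# Illustrative sketch only. Class names come from the text above;
# PARENT, ancestors, is_subclass_of and subclasses are hypothetical helpers.

# Each class maps to its direct superclass (None marks a root).
# "process" is a synonym of occurrent, so it is treated as a root here.
PARENT = {
    "continuant": None,
    "process": None,
    "data": "continuant",
    "agent": "continuant",
    "data_collection": "data",
    "parameter": "data",
    "temporal_parameter": "parameter",
    "spatial_parameter": "parameter",
    "domain_parameter": "parameter",
}


def ancestors(cls):
    """Return the superclasses of cls, nearest first."""
    chain = []
    parent = PARENT[cls]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_subclass_of(cls, other):
    """True if other is a direct or indirect superclass of cls."""
    return other in ancestors(cls)


def subclasses(cls, direct_only=False):
    """Return the sorted subclasses of cls (direct only, or all)."""
    if direct_only:
        return sorted(c for c, p in PARENT.items() if p == cls)
    return sorted(c for c in PARENT if is_subclass_of(c, cls))


if __name__ == "__main__":
    print(subclasses("data", direct_only=True))
    print(subclasses("data"))
```

Under this encoding, data has two direct subclasses and three indirect ones, and every parameter subclass inherits from continuant through data.
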
Properties of Provenir Ontology (PO)

In this section, we define a set of foundational properties in the provenir ontology. Instead of defining a new set of properties, we adapt the properties defined in the Relation Ontology (RO) [2] from the Open Biomedical Ontologies (OBO) Foundry:

  1. part_of – This property is defined for each of the three base classes of the provenir ontology. The restriction on this property is that the domain and range values belong to the same class. For example, if data is defined as the domain of the property, the range is also data, and vice versa. As defined in the RO [2], this property satisfies the standard axioms of mereology, that is, reflexivity, antisymmetry, and transitivity.
  2. contained_in – In the provenir ontology, ro:contained_in is defined with constraints similar to those of ro:part_of, that is, the domain and range values belong to the same class and do not overlap. The property is defined for the data and agent classes. Consistent with its definition in RO, the property is also defined to be non-transitive in the provenir ontology.
  3. adjacent_to – This property is defined for disjoint continuants in RO. In the provenir ontology, it is defined only for the agent class, where the spatial adjacency of individuals of the agent class may have an effect on data values. For example, the presence of a sensor generating a magnetic field may affect the quality of observations made by an adjacent sensor. We note that properties corresponding to the mereotopological relations defined in RO [2], such as partial overlap and tangential proper part, can be added to ontologies that extend the provenir ontology.
  4. transformation_of – This property is similar to the ro:transformation_of property that is asserted between two entities that preserve their identity between the two transformation stages.
  5. derives_from – This property represents the derivation history of data entities as a chain or pathway. Unlike the ro:transformation_of property, which links identical entities, ro:derives_from links distinct individuals of data. For example, a peptide sample is derived from a protein sample.
  6. preceded_by – This temporal property is defined for distinct individuals of process class. Similar to its interpretation in [2], more specific types of properties, such as “immediately_preceded_by” [2] with more precise semantics may be defined in ontologies which extend provenir ontology.
  7. has_participant – This is the primary property linking data to process, where the individual of data class participates in a process.
  8. has_agent – This is a causal property that links agent to process, where the agent is directly responsible for the change in state of the process. Similar to the description used in [2], the provenir ontology also allows the use of this property to “capture the directionality” of scientific experiments, for example, which agent caused the activation of a process.
  9. has_parameter – This property links an individual of the parameter class to an individual of the data_collection, agent, or process class.
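
The behavior of two of these properties can be sketched in Python: derives_from as a chain that is followed to recover a derivation history, and part_of as a reflexive and transitive relation. This is a minimal sketch under stated assumptions, not an implementation of the ontology. Only the property names and the peptide/protein example come from the text; the individuals tissue_sample, sensor_a, buoy_1, and buoy_array, and the functions derivation_history and is_part_of, are hypothetical.

```python
# Illustrative sketch only. The property names derives_from and part_of come
# from the text; all individuals other than the peptide/protein pair and both
# helper functions are hypothetical.

# derives_from assertions: each data individual maps to the one it derives from.
DERIVES_FROM = {
    "peptide_sample": "protein_sample",
    "protein_sample": "tissue_sample",  # hypothetical extra step in the chain
}

# part_of assertions between individuals of the same base class (here: agent).
PART_OF = {
    ("sensor_a", "buoy_1"),
    ("buoy_1", "buoy_array"),
}


def derivation_history(entity):
    """Follow derives_from links and return the chain of source entities."""
    history = []
    seen = {entity}
    current = DERIVES_FROM.get(entity)
    while current is not None and current not in seen:
        history.append(current)
        seen.add(current)
        current = DERIVES_FROM.get(current)
    return history


def is_part_of(x, y, asserted=PART_OF):
    """Check part_of using reflexivity and transitivity over the assertions."""
    if x == y:  # reflexivity
        return True
    frontier = [x]
    visited = {x}
    while frontier:  # transitivity: walk every chain of asserted links
        node = frontier.pop()
        for part, whole in asserted:
            if part == node and whole not in visited:
                if whole == y:
                    return True
                visited.add(whole)
                frontier.append(whole)
    return False


if __name__ == "__main__":
    print(derivation_history("peptide_sample"))
    print(is_part_of("sensor_a", "buoy_array"))
```

A non-transitive property such as contained_in would, by contrast, only be checked against the directly asserted pairs.
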
Two specialized properties describing the temporal and spatial parameters are also defined:
  • has_temporal_value – This is a specific property that assigns a temporal value to individuals of the data_collection, process, and agent classes.
  • located_in – An instance of data or agent is associated with exactly one spatial region that is its exact location at a given instant of time. In the provenir ontology, this relation has two domain classes, agent and data_collection, with spatial_parameter as the range class.
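
The domain and range restrictions on the parameter-related properties can be checked mechanically. The sketch below is illustrative only and rests on assumptions that are not stated in the text: has_parameter is read as pointing from data_collection, agent, or process to parameter, and has_temporal_value is assumed to have temporal_parameter as its range. Only the domain and range of located_in are given explicitly above. The names SIGNATURES, PARENT, with_superclasses, and is_valid are hypothetical.

```python
# Illustrative sketch only. SIGNATURES, PARENT, with_superclasses and is_valid
# are hypothetical names. The located_in signature is stated in the text; the
# direction of has_parameter and the range of has_temporal_value are ASSUMED.

# property name -> (allowed domain classes, allowed range classes)
SIGNATURES = {
    "located_in": ({"agent", "data_collection"}, {"spatial_parameter"}),
    "has_parameter": ({"data_collection", "agent", "process"}, {"parameter"}),
    "has_temporal_value": (
        {"data_collection", "process", "agent"},
        {"temporal_parameter"},  # assumed range
    ),
}

# Direct superclass of each parameter subclass, as described in the text.
PARENT = {
    "temporal_parameter": "parameter",
    "spatial_parameter": "parameter",
    "domain_parameter": "parameter",
}


def with_superclasses(cls):
    """Return cls together with all of its known superclasses."""
    result = {cls}
    while cls in PARENT:
        cls = PARENT[cls]
        result.add(cls)
    return result


def is_valid(prop, subject_class, object_class):
    """Check a (subject, property, object) triple against the signature."""
    domain, range_ = SIGNATURES[prop]
    subject_ok = bool(with_superclasses(subject_class) & domain)
    object_ok = bool(with_superclasses(object_class) & range_)
    return subject_ok and object_ok


if __name__ == "__main__":
    print(is_valid("located_in", "agent", "spatial_parameter"))
    print(is_valid("located_in", "process", "spatial_parameter"))
```

Because the parameter subclasses inherit from parameter, a triple such as (process, has_parameter, domain_parameter) passes the check, while located_in rejects a process as its subject.
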
Citing Provenir ontology

Please use the following reference when citing the Provenir ontology:

  • Satya S. Sahoo, Amit Sheth, 'Provenir ontology: Towards a Framework for eScience Provenance Management', Microsoft eScience Workshop, Pittsburgh, PA Oct 15-17, 2009

References

1. Sahoo SS, Thomas C, Sheth A, York WS, Tartir S. Knowledge modeling and its application in life sciences: a tale of two ontologies. In: Proceedings of the 15th International Conference on World Wide Web (WWW '06); 2006 May 23-26; Edinburgh, Scotland. p. 317-326.
2. Smith B, Ceusters W, Klagges B, Kohler J, Kumar A, Lomax J, et al. Relations in biomedical ontologies. Genome Biol 2005;6(5):R46.