Difference between revisions of "MaterialWays"

From Knoesis wiki

Revision as of 15:28, 21 March 2014

Introduction

Several foundational elements required to achieve Sir Tim Berners-Lee’s vision for a semantic web are in place and available to the materials community. The semantic web, sometimes referred to as the web-of-data, focuses on ontologies as well as the linking of data for machine-to-machine data interchange (implemented via RDF and OWL). Linkage between multiple datasets, files and their respective metadata can be established in an ad hoc fashion without having to adhere to specific database table structures. Linked data without context is of limited value. A semantic web for materials requires common vocabularies. An example of a common vocabulary is the Dublin Core (DC) ontology, a set of universally accepted metadata used to describe a resource (e.g. a document).

The development and publishing of vocabulary using RDFS/OWL is one of the initial steps required to link relevant materials information across disparate (federated) sources. The development of common vocabularies could be jump-started via crowd sourcing and curated by materials subject matter experts (SME). Additionally, collaborative efforts with professional societies and other organizations (e.g. ASTM terminology standards, CEN, ASM, TMS, etc.) could be used to accelerate vocabulary/ontology development. Over time, multiple vocabularies would likely winnow down to key sets of generally accepted terms and mappings between terms having the same meaning. Taxonomies, a form of ontology, can express simple relationships in the materials domain.
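The idea of mapping terms that have the same meaning across vocabularies can be illustrated with a small sketch. It assumes a handful of invented term names (the `asm:`, `astm:` and `cen:` prefixes and the terms themselves are placeholders, not real IRIs) and groups every term connected by a same-meaning mapping, then picks one preferred term per group. This is only one possible way to handle such mappings, not a description of how this project does it.

```python
from collections import defaultdict

# Hypothetical "same meaning" mappings between terms of several vocabularies.
# The prefixes and term names are illustrative assumptions, not real IRIs.
SAME_MEANING = [
    ("asm:TensileStrength", "astm:UltimateTensileStrength"),
    ("astm:UltimateTensileStrength", "cen:TensileStrength"),
    ("asm:Prepreg", "astm:PreimpregnatedMaterial"),
]


def group_equivalent_terms(pairs):
    """Group terms connected by same-meaning mappings (union-find)."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups = defaultdict(set)
    for term in parent:
        groups[find(term)].add(term)
    return list(groups.values())


def preferred_term(group):
    """Pick a deterministic representative: the alphabetically first term."""
    return min(group)


groups = group_equivalent_terms(SAME_MEANING)
canonical = {term: preferred_term(g) for g in groups for term in g}
```

In a published vocabulary, such mappings would more likely be stated with properties like `owl:equivalentClass` or `skos:exactMatch` than held in a Python list; the grouping logic stays the same.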

More sophisticated relationships between materials processing, structure and properties can be expressed using complex ontologies. These ontologies need to be developed and implemented using World Wide Web Consortium (W3C) recommendations like RDF/OWL or widely accepted semantic technology standards such as time.owl and DC. As the above elements are being established on a larger scale, various forms of materials informatics could be developed to greatly expand the materials data and design space for the materials scientists and engineers.

Success requires innovative approaches during the development of agents to query linked materials data, applications to mash-up and integrate data, and reasoning/inferencing engines specifically tailored to the materials domain. Machine learning and other innovative “data hungry” approaches to extract knowledge could be developed and applied for materials design.

Project Description

(needs to be updated)

Federated Semantic Services Platform for Open Materials Science and Engineering. This three-year project will undertake three broad classes of tasks. The first relates to creating semantic infrastructure, including the ability to create semantic metadata for a variety of data types using domain models and knowledge bases. The second relates to semantic search for all varieties of data, including resources with service-based access. The third relates to the development of a novel semantic data exchange scheme for materials science (termed Linked Open Materials Data) using an open-data-based approach.

KDDM: Materials Database Knowledge Discovery and Data Mining. The Knoesis Center, in collaboration with AFRL/RX, is applying knowledge and technology in informatics to the materials domain, thus introducing the materials and process community to better data management practices. A data exchange system that allows researchers to index, search, and compare data will help shorten the transition cycle in materials science, which usually takes 5 to 10 years. This multi-disciplinary project seeks to span informatics and materials science to fill this gap.

Complementary Activities Undertaken by Others

  • NIST

Use Cases

What do the materials scientists and engineers (S&E) want? Each of the use cases below assumes the triplets have been transformed into a vocabulary/ontology that describes all processes and intermediate products that lead to a finished material or product.

Discovery

While the primary objective of linked data is to enable machine-to-machine interoperability, it’s generally considered a best practice to expose the same data for human consumption. It’s easy to envision tools that give a seasoned materials expert, as well as a student, the ability to “follow their nose” through rich sets of metadata and data describing products and processes within the materials domain. As they fluidly traverse the network, combing through a combination of structured and unstructured data and information, they are likely to gain awareness and understanding more readily than through the retrieval and perusal of documents returned via keyword searches. An additional advantage is the ability to capture structured data along the way and save it for deeper analysis at a later date. Google’s Knowledge Graph, Bing’s Snapshot, and Facebook’s Graph Search all exemplify steps in this direction.
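As a rough illustration of the “follow-their-nose” idea, the sketch below walks outgoing links from one resource in a tiny in-memory set of statements. All resource and property names (the `ex:` terms) are hypothetical, and a real tool would dereference URIs or query an endpoint rather than read a Python list.

```python
from collections import deque

# A tiny in-memory set of (subject, predicate, object) statements.
# All names are hypothetical placeholders for linked materials data.
TRIPLES = [
    ("ex:EpoxyResin", "ex:inputTo", "ex:PrepregManufacturing"),
    ("ex:CarbonFiber", "ex:inputTo", "ex:PrepregManufacturing"),
    ("ex:PrepregManufacturing", "ex:yields", "ex:PrepregRoll"),
    ("ex:PrepregRoll", "ex:describedBy", "ex:PrepregDataSheet"),
]


def follow_your_nose(start, triples, max_hops=3):
    """Breadth-first walk over outgoing links, returning {resource: hops}."""
    outgoing = {}
    for subject, _predicate, obj in triples:
        outgoing.setdefault(subject, []).append(obj)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in outgoing.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


reachable = follow_your_nose("ex:EpoxyResin", TRIPLES)
```

The walk only follows links in the subject-to-object direction, so `ex:CarbonFiber` is not reached from `ex:EpoxyResin`; a browsing tool would probably follow incoming links as well.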

Development for a Shared Materials Vocabulary

Guidance for Creating Vocabularies

A new (June 2013) W3C recommendation for capturing multidimensional data sets has been published. Perhaps this could be used to capture array data typically encountered in materials research (e.g. process time, temperature, pressure, degree-of-cure prediction).

Units

At some point we'll want to include "units" for the terms in the vocabulary. A couple of sources of information:

As OntoML does not cover units and quantities to the extent that is required within eCl@ss XML 1.0, an additional format named unitsML is included.

Vocabulary Sources

  • Linked Open Vocabularies: Vocabularies (RDFS or OWL ontologies) used in the Linked Data Cloud. Here you will find vocabularies listed and individually described by metadata, classified by vocabulary spaces, and interlinked using the dedicated vocabulary VOAF.

Current Draft Vocabulary via SPARQL Endpoint

Namespace: http://knoesis.org/matvocab/

Examples:

Milestones

Date        Milestone
Sep 2013    Glossary received from ASM
Oct 2013    ASM Handbook 21 accessible via URL
Oct 2013    Mil-hdbks 5 and accessible via URL

Ontology Development

Existing Materials Ontologies or Schemas

Existing Ontologies

  • MatOnto (How can we get a copy of this ontology?)
    • Towards an Ontology for Data-driven Discovery of New Materials (.pdf): Materials scientists and nano-technologists are struggling with the challenge of managing the large volumes of multivariate, multidimensional and mixed-media data sets being generated from the experimental, characterisation, testing and post-processing steps associated with their search for new materials. In addition, they need to access large publicly available databases containing: crystallographic structure data; thermodynamic data; phase stability data and ionic conduction data. Materials scientists are demanding data integration tools to enable them to search across these disparate databases and to correlate their experimental data with the public databases, in order to identify new fertile areas for searching. Systematic data integration and analysis tools are required to generate targeted experimental programs that reduce duplication of costly compound preparation, testing and characterisation. This paper presents MatOnto – an extensible ontology, based on the DOLCE upper ontology, that aims to represent structured knowledge about materials, their structure and properties and the processing steps involved in their composition and engineering. The primary aim of MatOnto is to provide a common, extensible model for the exchange, re-use and integration of materials science data and experimentation. (circa 2008 or 2009)
  • MatSEEK
    • MatSEEK: An Ontology-Based Federated Search Interface for Materials Scientists (Feb 2009)
    • The MatSeek system is an ontology-based federated search interface to key materials science databases and analytical tools. By combining Semantic Web and Web 2.0 technologies, MatSeek provides materials scientists with a single Web interface that enables them to search across disparate databases containing crystal-structure data, ionic-conductivity data, and phase stability data; render 3D crystal-structure images; calculate bond lengths and angles; retrieve relevant scholarly references; and identify potential new materials with the structure and properties required to satisfy specific applications. The MatOnto ontology underlying MatSeek enables integration of data across disparate databases, and Web 2.0 technologies enable iterative searching across the databases. The results retrieved from searching the previous database are used as input to the query on the next database. By providing materials scientists with a single, integrated Web interface to the critical materials science databases and analytical tools, MatSeek represents a significant advance toward a full-fledged materials-informatics workbench.
  • MASON (Can't open it with Protégé)
  • Plinius
    • The Plinius ontology of ceramic materials covers the conceptualisation of the chemical composition of materials. The design decisions underlying ontology development at our group are discussed. The ontology of ceramic materials is given as a conceptual construction kit, involving several sets of atomic concepts and construction rules for making complex concepts. One of its implementations, that in Ontolingua, is presented. (source)

Existing Schemas

  • EC MatDB
    • XML Related MatDB Tools for Data Exchange and Interoperability (proceedings): The web-enabled materials properties database MatDB of the European Commission Joint Research Centre (EC-JRC) is a database application for the storage, retrieval and evaluation of experimentally measured materials data coming from European R&D projects. Data exchange and interoperability are important database issues to reduce costs of expensive material tests. Many organizations world-wide are participating in the development of GEN IV reactors. To reduce costs the GEN IV International Forum has agreed to interoperate and exchange data for the screening and qualification of candidate materials. To simplify the complexity of data mapping between differently structured databases, adoption of a standardized XML schema is the favored option. The paper focuses on MatDB XML related tools and items:
      • Upgrade, extension and implementation of the MatDB XML schema within a planned US/EC cooperation;
      • European standardization activities for data exchange, interoperability and the development of standard formats for engineering materials data;
      • MatDB data cite participation.
    • EC MatDB Schema

Reviews and Synopsis

These schemas are at various stages of development, each with its own benefits and limitations. The most promising appears to be the EC MatDB schema, about which you can read more.

Ontology Development Approaches (beyond vocabularies)

The P^3-Triplet Approach

The materials domain consists of millions of concepts. Many are fairly static and well understood while others are evolving at a rapid pace. Intuition suggests that the materials design and development domain’s current state of chaos and intrinsic heterogeneity would benefit from a bit of structure. One conceptual construct that moves in that direction is the Product-Process-Product (P^3) Triplet. Triplets focus a smallish number of subject matter experts (SME) on three naturally related materials domain entities: one or more materials or products that are subjected to a materials or fabrication process to yield a higher-value material or product. For example, epoxy resin and carbon fiber are subjected to a prepreg manufacturing process to yield a roll of prepreg material. Do we know if this construct will yield what the materials domain experts want? Not at this time, and there are other approaches for engineering a domain vocabulary or ontology. That said, P^3-Triplets do have some inherent qualities that are compelling:

  • Each triplet is generally aligned with a subdomain of subject matter expertise. For example, a materials R&D organization of 500 scientists and engineers may only have a handful of processing experts for polymeric matrix composites.
  • Elements of each triplet seem to be generally aligned with protected information (e.g. intellectual property rights) which may ease the implementation of access control. For example, the properties associated with a composite material are generally made available to the commercial community; however, the processing steps used to create the composite may be closely held.
  • They enable a means to link the entire breadth of processes beginning with the extraction of raw material to the final process for a finished product.
  • Specific “important” product or processing activities can be expressed preferentially and made available for use. That is, you don’t have to build the entire skyscraper before someone can work in it.

Triplet Anatomy

Elements of a P^3-Triplet can themselves be considered elements of an RDF triple:

  • Subject - the “input” material or product
  • Predicate - the materials or manufacturing process
  • Object - the “output” material or product

Additionally, each triplet element carries any number of relationships or assertions, each of which generally takes the form of an RDF triple. The assertions strive to express the important relationships between various parameters at the schema and instance levels. However, like Star Trek’s tribbles, triplets and their assertions can grow exponentially. Whether they become a troublesome mess or something that helps reduce chaos in the materials community remains to be seen.
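The triplet anatomy above can be sketched as a small data structure. This is a minimal illustration, not the project's data model: the class name `P3Triplet`, the `ex:` names and the `ex:hasParameter` assertion are all assumptions made up for the example; only the epoxy resin, carbon fiber and prepreg roll come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class P3Triplet:
    """One Product-Process-Product triplet (hypothetical structure)."""
    inputs: list    # subject(s): the "input" materials or products
    process: str    # predicate: the materials or manufacturing process
    output: str     # object: the "output" material or product
    assertions: list = field(default_factory=list)  # extra (s, p, o) statements

    def as_triples(self):
        """Return one (subject, predicate, object) per input, then the assertions."""
        core = [(item, self.process, self.output) for item in self.inputs]
        return core + list(self.assertions)


prepreg = P3Triplet(
    inputs=["ex:EpoxyResin", "ex:CarbonFiber"],
    process="ex:prepregManufacturing",
    output="ex:PrepregRoll",
    # Illustrative assertion about the process itself (invented parameter).
    assertions=[("ex:prepregManufacturing", "ex:hasParameter", "ex:ResinContent")],
)

for subject, predicate, obj in prepreg.as_triples():
    print(subject, predicate, obj)
```

Because a triplet may have more than one input, the sketch emits one subject-predicate-object statement per input material. Whether that or a single grouped statement is the better modeling choice is left open here.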


Triplet.png

A series of P^3-Triplets for a Polymeric Matrix Composite (PMC). Note the product of a process results in a product that may be used as input for another process; therefore, the triplets overlap.

CompositeTripletSeries.PNG
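The overlap between consecutive triplets can be checked mechanically. The sketch below assumes a three-step series in which only the prepreg step comes from the text; the lay-up and autoclave cure steps, and all `ex:` names, are invented for illustration and are not taken from the figure.

```python
# Each triplet is (inputs, process, output); names are hypothetical.
SERIES = [
    (["ex:EpoxyResin", "ex:CarbonFiber"], "ex:prepregManufacturing", "ex:PrepregRoll"),
    (["ex:PrepregRoll"], "ex:layup", "ex:UncuredLaminate"),
    (["ex:UncuredLaminate"], "ex:autoclaveCure", "ex:CuredPanel"),
]


def overlaps(series):
    """Return (product, producing_process, consuming_process) for each overlap."""
    produced_by = {output: process for _inputs, process, output in series}
    links = []
    for inputs, process, _output in series:
        for item in inputs:
            if item in produced_by:
                links.append((item, produced_by[item], process))
    return links


def raw_inputs(series):
    """Inputs that no triplet in the series produces (the start of the chain)."""
    produced = {output for _inputs, _process, output in series}
    consumed = {item for inputs, _process, _output in series for item in inputs}
    return sorted(consumed - produced)


chain_links = overlaps(SERIES)
starting_materials = raw_inputs(SERIES)
```

Here `chain_links` lists the products shared between two triplets, and `starting_materials` lists the inputs that enter the series from outside it.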
  • Ontology Concept Elicitation Tool (OnCET) Development: This tool is being designed and developed to directly elicit ontological concepts and their relationships from the user. The user provides the relationship (predicate) between two parameters or entities (concepts). The source for these relationships can be the user's subject matter expertise, or they can be captured while the user reads a document from the materials and manufacturing domain.

Milestones

Date           Milestone
dd mmm yyyy    OnCET convert to web-based application
dd mmm yyyy    Excel export of Triplets and triples
dd mmm yyyy    Integration with iExplore

Materials Data

Materials Databases

  • European Commission Joint Research Centre
    • EC MatDB
  • National Institute for Aviation Research (NIAR)
  • MatWeb, Your Source for Materials Information: The heart of MatWeb is a searchable database of material data sheets, including property information on thermoplastic and thermoset polymers such as ABS, nylon, polycarbonate, polyester, polyethylene and polypropylene; metals such as aluminum, cobalt, copper, lead, magnesium, nickel, steel, superalloys, titanium and zinc alloys; ceramics; plus semiconductors, fibers, and other engineering materials. There are over 59,000 data sheets in the collection.
    • Subjects Include: Aluminum, Ceramics, Materials Science, Metals, Nylons, Polycarbonate, Polyester, Polymers, Steels, and Titanium
    • Publisher: Automation Creations, Inc.

Materials Data Models

A significant amount of the data used for materials development and usage is tabular in nature. One approach being explored is the use of W3C's Data Cube and QUDT ontologies. QUDT is being co-developed by NASA-Ames and TopQuadrant.
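To make the tabular-data idea concrete, the sketch below turns a few rows of cure-cycle data into Turtle, with one `qb:Observation` per row attached to a `qb:DataSet` from the Data Cube vocabulary. The row values, the `ex:` namespace and the property names `ex:time`, `ex:temperature` and `ex:pressure` are assumptions for illustration. A complete cube would also need a data structure definition, and units (for example, taken from QUDT) would be attached as attributes; both are omitted here.

```python
# Tabular cure-cycle data, one row per observation. Values are made up.
ROWS = [
    {"time_min": 0, "temperature_c": 25.0, "pressure_kpa": 101.0},
    {"time_min": 60, "temperature_c": 120.0, "pressure_kpa": 586.0},
    {"time_min": 120, "temperature_c": 177.0, "pressure_kpa": 586.0},
]

# qb: is the Data Cube namespace; ex: is a placeholder namespace.
PREFIXES = """\
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix ex: <http://example.org/cure#> .
"""

# Template for one observation; ex: properties are hypothetical.
OBSERVATION = """\
ex:obs{index} a qb:Observation ;
    qb:dataSet {dataset} ;
    ex:time {time_min} ;
    ex:temperature {temperature_c} ;
    ex:pressure {pressure_kpa} .
"""


def rows_to_turtle(rows, dataset="ex:cureCycle1"):
    """Serialize each row as a qb:Observation belonging to one qb:DataSet."""
    parts = [PREFIXES, f"{dataset} a qb:DataSet .\n"]
    for index, row in enumerate(rows, start=1):
        parts.append(OBSERVATION.format(index=index, dataset=dataset, **row))
    return "\n".join(parts)


turtle = rows_to_turtle(ROWS)
print(turtle)
```

Each row becomes a self-describing resource, so rows from different tables can later be linked or queried together without sharing a table layout.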

Useful References

Process Modeling

Data Modeling, Feature Identities, Descriptors and Handles with Philip Sargent Cambridge, UK

Modelling Materials Processing: An overview by Philip Sargent, Cambridge, UK

Materials Information and Conceptual Data Modelling

Msc

Non-structured Materials Science Data Sharing Based on Semantic Annotation

Towards an Ontology for Data-driven Discovery of New Materials

Integrated Computational Materials Engineering (ICME)

Ontology

Related Links

Read latest news in Material Science here: Materialstoday

Kno.e.sis Semantic Tools

Contact us

knoesismat@gmail.com

Generally Useful Information

Keyboard Shortcuts

  • <shift><click> on a link will open the linked page in a new window

Software Tools

  • Graphical representations (saved as pdf) of the schemas were created using QXmlEdit