CyberInfrastructure Proposal For EarthCube Community

From Knoesis wiki
Revision as of 21:26, 30 July 2012

This proposal describes a cyberinfrastructure that lets long-tail scientists share and discover their data. The proposed system supports three main scenarios, each treated as a separate workflow.


Publishing Data in Digital Format

This workflow serves long-tail scientists whose data is already in a digital format, primarily Excel, CSV, and relational databases. Users will be able to upload their data into the system and automatically annotate documents and captions with terms from community-developed vocabularies such as GCMD and AGI. Consumers can use keywords and topic terms to discover resources. Tables and images can be searched through their captions, using keywords and topic terms that can be standardized semantically. The system will provide flexible search and tools for harmonizing structural, content, and semantic heterogeneity.
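The annotation step of this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `VOCABULARY` list is a hypothetical stand-in for terms from GCMD or AGI, the function names `suggest_terms` and `annotate_csv` are invented for the example, and approximate string matching with the standard library's `difflib` is one possible way to produce suggestions, not necessarily the method the system would use.

```python
import csv
import difflib
import io

# Hypothetical stand-in for a community vocabulary; the real GCMD and AGI
# term lists are much larger and are not reproduced here.
VOCABULARY = ["precipitation", "sea surface temperature", "soil moisture", "elevation"]


def suggest_terms(label, vocabulary=VOCABULARY, cutoff=0.6):
    """Return up to three vocabulary terms that approximately match a label."""
    normalized = label.strip().lower().replace("_", " ")
    return difflib.get_close_matches(normalized, vocabulary, n=3, cutoff=cutoff)


def annotate_csv(text):
    """Map each column header of an uploaded CSV to its suggested terms."""
    header = next(csv.reader(io.StringIO(text)))
    return {column: suggest_terms(column) for column in header}


# A tiny made-up upload: one header row and one data row.
sample = "Site,Soil_Moisture,Precipitation\nA1,0.31,12.5\n"
annotations = annotate_csv(sample)
```

In a real deployment the user would review each suggestion list and accept, reject, or replace the proposed terms before the annotations are stored.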


Publishing Data in Original Form/Legacy Data

This workflow will allow long-tail scientists to publish data in its original form, for example data embedded in technical papers. The system will provide tools to read the data in its original form and annotate it.

Publishing Data in Linked Data

Architecture

Architecture2.png

  • Data Registry

Data publishers will register their data through the data registry, which collects important provenance information such as author and location.

  • Annotator

Registered data will be annotated using standard vocabularies (such as GCMD and the AGI index), which are stored in a vocabulary registry. Annotation tools will suggest possible matches, and the user will be able to refine the system's suggestions. Annotations will be stored in the Meta Data Store.

  • Indexer

Collected data and its associated metadata will be indexed to facilitate searching.

  • Simple Search

Simple Search supports keyword-based queries: the user specifies some keywords and the system returns a ranked list of results.

  • Faceted Search

In addition to Simple Search, the system will provide Faceted Search, where the user supplies key-value pairs to search and discover data. Users can also incorporate provenance into a search.

  • Mapping to RDF

As defined in the more advanced workflow, data can be transformed to RDF using existing tools, which allows data publishers to publish it in a standard form.

  • Data Publisher

This component will publish the RDF-converted data as Linked Open Data, so that it can be accessed and queried from anywhere in the world.

  • Semantic Browsing

Semantic Browsing will allow users to navigate the RDF data sets by following their triples.
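To make the roles of these components concrete, the following Python sketch walks a few registry records through indexing, keyword search, faceted search, and a triple-based view. Everything in it is an assumption made for illustration: the `RECORDS` contents, the function names, the match-count ranking, and the `http://example.org/` namespace are hypothetical, and a production system would more likely rely on an existing search engine and RDF toolkit than on hand-written code like this.

```python
from collections import defaultdict

# Hypothetical registry records: provenance fields plus annotation terms.
RECORDS = {
    "ds1": {"author": "Lee", "location": "Ohio", "terms": ["soil moisture", "precipitation"]},
    "ds2": {"author": "Kim", "location": "Utah", "terms": ["elevation"]},
    "ds3": {"author": "Lee", "location": "Utah", "terms": ["precipitation"]},
}


def build_index(records):
    """Indexer: map each lowercase word in the annotations to dataset ids."""
    index = defaultdict(set)
    for dataset_id, record in records.items():
        for term in record["terms"]:
            for word in term.lower().split():
                index[word].add(dataset_id)
    return index


def simple_search(index, query):
    """Simple Search: rank datasets by how many query words they match."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for dataset_id in index.get(word, ()):
            scores[dataset_id] += 1
    # Highest score first; ties broken by dataset id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def faceted_search(records, **facets):
    """Faceted Search: keep datasets whose fields equal every key-value pair."""
    return sorted(d for d, r in records.items()
                  if all(r.get(k) == v for k, v in facets.items()))


def to_triples(records, base="http://example.org/"):
    """Mapping to RDF: emit (subject, predicate, object) tuples per record."""
    triples = []
    for dataset_id, record in sorted(records.items()):
        subject = base + "dataset/" + dataset_id
        triples.append((subject, base + "author", record["author"]))
        triples.append((subject, base + "location", record["location"]))
        for term in record["terms"]:
            triples.append((subject, base + "annotatedWith", term))
    return triples


def browse(triples, subject):
    """Semantic Browsing: list the predicate-object pairs of one subject."""
    return [(p, o) for s, p, o in triples if s == subject]


index = build_index(RECORDS)
ranked = simple_search(index, "soil moisture precipitation")
by_facets = faceted_search(RECORDS, author="Lee", location="Utah")
triples = to_triples(RECORDS)
```

Here the faceted query uses provenance fields (author and location) as the key-value pairs, which is one way provenance could be incorporated into search.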

Form of Data

Table

Geographic-impacts-table.png

Image

Gis relief 600.jpg

Unstructured Data

Unstructuredtostructured.jpg

Links

Annotator - Kino http://wiki.knoesis.org/index.php/Kino

Semantic Browsing - iExplore http://knoesis.wright.edu/iExplore/