Revision as of 21:46, 30 July 2012

Objective

This proposal describes a cyberinfrastructure that lets long-tail scientists share and discover data. The proposed platform will allow scientists to upload their data and standardize it using well-known vocabularies. The system will annotate documents and captions with terms from community-developed vocabularies such as GCMD and AGI. Users will be able to refine the automatically suggested annotations according to their preferences. Once data is published, users can discover resources using keywords and topic terms. Faceted search will also be provided, based on the collected annotations. The system will provide flexible search and tools for harmonizing structural, content, and semantic heterogeneity.

Publishing Data in Original Form/Legacy Data

This workflow will allow long-tail scientists to publish data in its original form, such as data embedded in technical papers. It will provide tools to read the data in its original form and annotate it.

Publishing Data in Digital Format

This workflow serves long-tail scientists whose data is already in a digital format, including Excel, CSV, and relational database formats. Users will be able to upload their data into the system, which will automatically annotate documents and captions with terms from community-developed vocabularies such as GCMD and AGI. Consumers can use keywords and topic terms to discover resources. Tables and images can be searched via their captions, using keywords and topic terms that can be standardized for semantics.

Publishing Data in Linked Data

Architecture

The following image illustrates the architecture of the proposed system.

Data Registry

Data publishers will register their data through the data registry, which will collect important provenance information such as author and location.
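As a rough illustration of what a registry entry might hold, the sketch below models a registered dataset with a few provenance fields. The class names, field names, and sample values are all hypothetical; the proposal names only author and location as examples of provenance.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DatasetRecord:
    """Hypothetical registry entry; field names are illustrative assumptions."""
    dataset_id: str
    title: str
    author: str
    location: str
    file_format: str  # e.g. "CSV" or "EXCEL"
    extra_provenance: Dict[str, str] = field(default_factory=dict)


class DataRegistry:
    """In-memory stand-in for the Data Registry component."""

    def __init__(self) -> None:
        self._records: Dict[str, DatasetRecord] = {}

    def register(self, record: DatasetRecord) -> None:
        # Refuse duplicate identifiers so provenance is never silently overwritten.
        if record.dataset_id in self._records:
            raise ValueError(f"dataset already registered: {record.dataset_id}")
        self._records[record.dataset_id] = record

    def get(self, dataset_id: str) -> Optional[DatasetRecord]:
        return self._records.get(dataset_id)


registry = DataRegistry()
registry.register(
    DatasetRecord("ds-001", "Stream gauge readings", "A. Researcher", "Ohio, USA", "CSV")
)
```

A real registry would persist these records and probably validate the provenance fields; the in-memory dictionary here only shows the shape of the data.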

Annotator

Registered data will be annotated using standard vocabularies (such as GCMD and the AGI index), which are stored in a vocabulary registry. Annotation tools will suggest possible matches, and the user will be able to refine the system's suggestions. Annotations will be stored in the Metadata Store.
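One simple way to picture the suggest-then-refine loop is word overlap between a caption and each vocabulary term. The sketch below is only an assumption about how suggestions could be produced; the proposal does not specify a matching method, and the vocabulary list contains placeholder terms rather than actual GCMD or AGI entries.

```python
import re
from typing import Iterable, List, Set, Tuple

# Placeholder terms for illustration; not actual GCMD or AGI entries.
VOCABULARY = ["precipitation", "soil moisture", "sea surface temperature", "groundwater"]


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def suggest_terms(
    caption: str, vocabulary: Iterable[str], min_score: float = 0.5
) -> List[Tuple[str, float]]:
    """Score each term by the fraction of its words that occur in the caption."""
    caption_tokens = tokenize(caption)
    suggestions = []
    for term in vocabulary:
        term_tokens = tokenize(term)
        score = len(term_tokens & caption_tokens) / len(term_tokens)
        if score >= min_score:
            suggestions.append((term, score))
    # Best score first; ties broken alphabetically for stable output.
    return sorted(suggestions, key=lambda pair: (-pair[1], pair[0]))


def refine(suggestions: List[Tuple[str, float]], rejected: Set[str]) -> List[str]:
    """Apply the user's refinement by dropping the terms they rejected."""
    return [term for term, _ in suggestions if term not in rejected]


suggestions = suggest_terms(
    "Table 2: Monthly precipitation and soil moisture at site A", VOCABULARY
)
accepted = refine(suggestions, rejected={"soil moisture"})
```

The accepted terms are what would be written to the metadata store; a production annotator would likely use stemming, synonyms, or the vocabulary hierarchy rather than exact word overlap.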

Indexer

Collected data and its associated metadata will be indexed to facilitate searching.
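A minimal sketch of such an index, assuming an inverted index that maps each token to the datasets containing it. The proposal does not name an indexing technique, and the dataset identifiers and text below are made up for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return dict(index)


# Hypothetical datasets: the text stands for titles, captions, and annotations.
documents = {
    "ds-001": "Stream gauge readings precipitation",
    "ds-002": "Groundwater levels precipitation",
}
index = build_index(documents)
```

Looking up a term then becomes a single dictionary access, for example `index["precipitation"]`, instead of a scan over every dataset.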

Simple Search

Simple Search supports keyword-based queries: the user specifies keywords and the system returns a ranked list of results.
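The sketch below shows one possible ranking, assuming results are ordered by how often the query keywords occur in each dataset's text. The scoring rule and the sample documents are assumptions for illustration; the proposal does not say how results are ranked.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def simple_search(query: str, documents: Dict[str, str]) -> List[Tuple[str, int]]:
    """Rank documents by the total number of query-keyword occurrences."""
    query_terms = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical dataset descriptions.
documents = {
    "ds-001": "Daily precipitation totals; precipitation measured by rain gauge",
    "ds-002": "Groundwater levels and precipitation",
    "ds-003": "Soil moisture profiles",
}
results = simple_search("precipitation gauge", documents)
```

Raw term counts favor long documents, so a real implementation would more likely use a normalized score such as TF-IDF.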

Faceted Search

In addition to Simple Search, the system will provide Faceted Search, where the user supplies key-value pairs to search for and discover data. Users can also incorporate provenance into a search.
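As a minimal sketch of key-value filtering, the example below treats each dataset as a flat dictionary of facets, including a provenance facet (author). The record layout, facet names, and values are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]


def faceted_search(records: List[Record], **facets: str) -> List[Record]:
    """Return the records whose fields match every requested key-value pair."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in facets.items())
    ]


def facet_counts(records: List[Record], facet: str) -> Counter:
    """Count how many records carry each value of a facet, for display in the UI."""
    return Counter(record[facet] for record in records if facet in record)


# Hypothetical annotated datasets; "author" acts as a provenance facet.
records = [
    {"id": "ds-001", "topic": "precipitation", "format": "CSV", "author": "A. Researcher"},
    {"id": "ds-002", "topic": "groundwater", "format": "EXCEL", "author": "A. Researcher"},
    {"id": "ds-003", "topic": "precipitation", "format": "EXCEL", "author": "B. Scientist"},
]
hits = faceted_search(records, topic="precipitation", author="B. Scientist")
```

`facet_counts(records, "format")` gives the per-value totals that a faceted interface typically shows next to each filter option.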

Mapping to RDF

As defined in the more advanced workflow, data can be transformed to RDF using existing tools, which allows data publishers to publish the data in a standard form.
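To make the transformation concrete, the sketch below turns CSV rows into N-Triples lines using only the standard library. It is not one of the existing tools the proposal refers to; the `http://example.org/` namespace, the column names, and the one-triple-per-cell mapping are assumptions for illustration.

```python
import csv
import io
from typing import Iterator

BASE = "http://example.org/"  # placeholder namespace, an assumption


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str, id_column: str) -> Iterator[str]:
    """Yield one triple per non-empty cell, using the id column as the subject.

    Assumes ids and column names are already safe to embed in a URI.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}record/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{column}>"
            yield f'{subject} {predicate} "{escape_literal(value)}" .'


csv_text = "station,rainfall_mm\nS1,12.5\nS2,3.0\n"
triples = list(csv_to_ntriples(csv_text, id_column="station"))
```

Every value is emitted as a plain string literal here; a fuller mapping would assign datatypes and reuse predicates from the standard vocabularies used for annotation.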

Data Publisher

This component will publish the RDF-converted data as Linked Open Data, so that it can be accessed and queried from anywhere in the world.

Semantic Browsing

Semantic Browsing will allow users to navigate the RDF data sets by following their triples.
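Navigation over triples can be pictured as following links out of and into a node. The sketch below uses a small in-memory list of (subject, predicate, object) tuples; the `ex:` identifiers and predicates are invented for illustration and do not come from the proposal.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples; the "ex:" names are placeholders.
TRIPLES: List[Triple] = [
    ("ex:dataset1", "ex:hasTopic", "ex:precipitation"),
    ("ex:dataset1", "ex:createdBy", "ex:researcherA"),
    ("ex:dataset2", "ex:hasTopic", "ex:precipitation"),
    ("ex:researcherA", "ex:worksAt", "ex:universityX"),
]


def outgoing(triples: List[Triple], node: str) -> List[Tuple[str, str]]:
    """Links leaving a node, as (predicate, object) pairs."""
    return [(p, o) for s, p, o in triples if s == node]


def incoming(triples: List[Triple], node: str) -> List[Tuple[str, str]]:
    """Links arriving at a node, as (subject, predicate) pairs."""
    return [(s, p) for s, p, o in triples if o == node]


# Browse from a dataset to its topic, then to every dataset sharing that topic.
topic = dict(outgoing(TRIPLES, "ex:dataset1"))["ex:hasTopic"]
related = sorted(s for s, p in incoming(TRIPLES, topic) if p == "ex:hasTopic")
```

Each step moves one hop along a triple, which is what a browsing interface would expose as clickable links.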

@@ Line 1: / Line 1: @@
-This proposal proposes a cyberInfrastructure for long tails scientists to share and discover their data. Proposed system mainly supports
+=Objective ==
-three scenarios which can be identified in separate work flows.
+This proposal proposes a cyberInfrastructure for long tails scientists to share and discover their data. Basically the proposed platform will allow scientists to upload their data and make it standardized using well known vocabularies. System will associate/annotate documents and captions with terms from community developed vocabularies such as GCMD and AGI such as GCMD and AGI. Users will be able to further refine the automatically suggested annotation based on their preference. Once published the data users can use keywords, topic terms to discover resources. Faceted search functionality will also provide based on the collected annotation. System will provide flexible search and tools for harmonizing structural, content and semantic heterogeneity.
-== Publishing Data in Digital Format ==
-This workflow will serve the need for long tail scientists who have their data in a digital format. Essentially this will include data in EXCEL, CSV and relational data base format. Users will be given the ability to upload their data in to the system and automatically associate/annotate documents and captions with terms from community developed vocabularies such as GCMD and AGI. Consumers can use keywords and topic terms to discover resources. Tables and images can be searched via captions using keywords and topic terms that can be standardized for semantics. System will provide flexible search and tools for harmonizing structural, content and semantic heterogeneity.
+=== Publishing Data in Original Form/Legacy Data ===
+This will allow long tail scientists to publish data in original form which can be in technical papers and etc. This will provide tools to read the data in original form and annotate the data.
+=== Publishing Data in Digital Format ===
+This workflow will serve the need for long tail scientists who have their data in a digital format. Essentially this will include data in EXCEL, CSV and relational data base format. Users will be given the ability to upload their data in to the system and automatically associate/annotate documents and captions with terms from community developed vocabularies such as GCMD and AGI. Consumers can use keywords and topic terms to discover resources. Tables and images can be searched via captions using keywords and topic terms that can be standardized for semantics.
-== Publishing Data in Original Form/Legacy Data ==
-This will allow long tail scientists to publish data in original form which can be in technical papers and etc. This will provide tools to read the data in original form and annotate the data.
-== Publishing Data in Linked Data ==
+=== Publishing Data in Linked Data ===
 == Architecture ==
+The following image illustrates the architecture of the proposed system.
 [[File:Architecture2.png]]

Difference between revisions of "CyberInfrastructure Proposal For EarthCube Community"


