CyberInfrastructure Proposal For EarthCube Community
Revision as of 21:46, 30 July 2012
Objective
This proposal describes a cyberinfrastructure that lets long-tail scientists share and discover data. The proposed platform will allow scientists to upload their data and standardize it using well-known vocabularies. The system will associate (annotate) documents and captions with terms from community-developed vocabularies such as GCMD and AGI. Users will be able to refine the automatically suggested annotations according to their preferences. Once data are published, users can discover resources using keywords and topic terms. Faceted search will also be provided, based on the collected annotations. The system will provide flexible search and tools for harmonizing structural, content, and semantic heterogeneity.
Publishing Data in Original Form/Legacy Data
This workflow will allow long-tail scientists to publish data in its original form, for example data embedded in technical papers. It will provide tools to read the data in that original form and annotate it.
Publishing Data in Digital Format
This workflow will serve long-tail scientists whose data are already in a digital format, including Excel, CSV, and relational database formats. Users will be able to upload their data into the system, which will automatically associate (annotate) documents and captions with terms from community-developed vocabularies such as GCMD and AGI. Consumers can use keywords and topic terms to discover resources. Tables and images can be searched via their captions, using keywords and topic terms that can be semantically standardized.
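The annotation step for an uploaded CSV file could be sketched roughly as follows. This is only an illustration under stated assumptions: the vocabulary list is a made-up sample standing in for terms that would come from GCMD or AGI, the function names `suggest_terms` and `annotate_csv` are hypothetical, and fuzzy string matching from Python's standard `difflib` module is used as a stand-in for whatever matching technique the real annotator applies.

```python
import csv
import difflib
import io

# Hypothetical sample vocabulary; a real system would load GCMD or AGI terms.
VOCABULARY = ["Air Temperature", "Precipitation Amount", "Soil Moisture", "Wind Speed"]


def suggest_terms(label, vocabulary=VOCABULARY, limit=3, cutoff=0.4):
    """Return up to `limit` vocabulary terms that resemble `label`, best first."""
    normalized = label.replace("_", " ").strip().lower()
    # Compare in lower case, but return terms in their original spelling.
    lookup = {term.lower(): term for term in vocabulary}
    matches = difflib.get_close_matches(normalized, list(lookup), n=limit, cutoff=cutoff)
    return [lookup[match] for match in matches]


def annotate_csv(text):
    """Map each column header of a CSV document to its suggested terms."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    return {column: suggest_terms(column) for column in header}


sample = "air_temp,precip_amount,station_id\n21.5,0.0,A1\n"
suggestions = annotate_csv(sample)
for column, terms in suggestions.items():
    print(column, "->", terms)
```

The suggestions are only a starting point: as described above, the user would review each list and keep, replace, or discard the proposed terms before publishing.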
Publishing Data in Linked Data
Architecture
The architecture of the proposed system (illustrated in Architecture2.png) consists of the following components.
- Data Registry
Data publishers will register their data through the data registry, which will collect important provenance information such as author and location.
- Annotator
Registered data will be annotated using standard vocabularies (such as GCMD and the AGI index), which are stored in a vocabulary registry. The annotation tool will suggest possible matches, and the user can further refine those suggestions. Annotations will be stored in the metadata store.
- Indexer
Collected data and their associated metadata will be indexed to support searching.
- Simple Search
Simple Search supports keyword-based queries: the user specifies keywords and the system returns a ranked list of results.
- Faceted Search
In addition to Simple Search, the system will provide Faceted Search, in which the user supplies key-value pairs to search for and discover data. Users can also incorporate provenance information into a search.
- Mapping to RDF
As defined in the more advanced workflow, the data can be transformed to RDF using existing tools, which allows data publishers to publish their data in a standard form.
- Data Publisher
This component will publish the RDF-converted data as Linked Open Data, so that it can be accessed and queried from anywhere in the world.
- Semantic Browsing
Semantic Browsing will allow users to navigate the RDF data sets by following their triples.
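To make the interplay of these components more concrete, the following minimal in-memory sketch shows how registration, indexing, simple search, faceted search, and a mapping to RDF-style triples might fit together. Everything here is an assumption made for illustration: the `Record` and `Catalog` classes, the scoring rule (count of matching keywords), the `http://example.org/dataset/` base URI, and the choice of Dublin Core predicates are not taken from the proposal, which does not specify an implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field

DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


@dataclass
class Record:
    """A registered data set (hypothetical schema)."""
    record_id: str
    title: str
    provenance: dict                                   # e.g. author, location
    annotations: list = field(default_factory=list)    # vocabulary terms


class Catalog:
    """Toy stand-in for the Data Registry, Indexer, and both search modes."""

    def __init__(self):
        self.records = {}
        self.keyword_index = defaultdict(set)  # token -> set of record ids

    def register(self, record):
        """Store the record and index the words of its title and annotations."""
        self.records[record.record_id] = record
        text = " ".join([record.title, *record.annotations])
        for token in text.lower().split():
            self.keyword_index[token].add(record.record_id)

    def simple_search(self, query):
        """Return record ids ranked by how many query keywords they match."""
        scores = defaultdict(int)
        for token in query.lower().split():
            for record_id in self.keyword_index.get(token, ()):
                scores[record_id] += 1
        return sorted(scores, key=lambda rid: (-scores[rid], rid))

    def faceted_search(self, **facets):
        """Return ids of records whose provenance matches every key-value pair."""
        return sorted(
            rid for rid, rec in self.records.items()
            if all(rec.provenance.get(key) == value for key, value in facets.items())
        )


def to_triples(record, base="http://example.org/dataset/"):
    """Map a record to (subject, predicate, object) tuples in the RDF style."""
    subject = base + record.record_id
    triples = [(subject, DCTERMS + "title", record.title)]
    if "author" in record.provenance:
        triples.append((subject, DCTERMS + "creator", record.provenance["author"]))
    triples.extend((subject, DCTERMS + "subject", term) for term in record.annotations)
    return triples


catalog = Catalog()
r1 = Record("r1", "Soil moisture survey", {"author": "A. Smith", "location": "Ohio"}, ["Soil Moisture"])
r2 = Record("r2", "Wind speed log", {"author": "B. Jones", "location": "Ohio"}, ["Wind Speed"])
catalog.register(r1)
catalog.register(r2)

print(catalog.simple_search("soil moisture"))
print(catalog.faceted_search(location="Ohio"))
print(to_triples(r1))
```

A production system would more likely rely on an existing search engine for indexing and on an RDF library or triple store for publishing; the sketch only shows which information each component consumes and produces.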
Form of Data
Table
Image
Unstructured Data
Links
Annotator - Kino http://wiki.knoesis.org/index.php/Kino
Semantic Browsing - iExplore http://knoesis.wright.edu/iExplore/