Manuscript Details

A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi.

Research in the parasite domain requires analyzing experimental lab data together with relevant public data resources. This task is difficult for biologists, who may not have the computational skills needed to process data that come in different formats and are stored at different locations. We therefore developed an intuitive, easy-to-use semantic problem solving environment (SPSE) for parasite research, in which researchers can query their lab data integrated with public data resources using ontologies. Other features of SPSE include integrated support for capturing and querying provenance information, and a visual query-processing tool that allows biologists to formulate complex queries without learning the query language syntax. We demonstrate the significance of SPSE by using it to query integrated data to identify gene knockout and/or other intervention (i.e., drug or vaccination) targets for T. cruzi. These queries help parasite researchers discover new or existing knowledge that is implicitly present in the data. The evaluation results show that SPSE offers better usability than existing systems and approaches, and that it supports the design of complex queries, which was not available earlier. This work was completed in collaboration with the Center for Tropical and Emerging Global Diseases (CTEGD), University of Georgia, and the Intelligent Thought and Action Lab, THINC Lab, University of Georgia, and is currently under review for publication in the journal PLoS Neglected Tropical Diseases.

Below we list all three queries, each with a Cuebee demo that shows how complex SPARQL queries can be formulated easily with minimal skills and ontology background.
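
To make the idea of formulating a query without writing its syntax more concrete, here is a minimal Python sketch that assembles a SPARQL SELECT string from a list of triple patterns. It is only an illustration and does not describe how Cuebee works internally: the function name `build_select` and the two example patterns are hypothetical, while the prefixes and the graph IRI are taken from the queries on this page.

```python
# Hypothetical helper for illustration; not part of Cuebee or SPSE.
PREFIXES = {
    "BASE": "http://knoesis.wright.edu/ParasiteExperiment.owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "so": "http://purl.org/obo/owl/sequence#",
}


def build_select(variables, patterns, graph, prefixes=PREFIXES):
    """Assemble a SPARQL SELECT string from (subject, predicate, object) triples."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines.append("SELECT " + " ".join(variables))
    lines.append(f"WHERE {{ GRAPH <{graph}> {{")
    # One triple pattern per line, each terminated by a dot.
    lines.extend(f"  {s} {p} {o} ." for s, p, o in patterns)
    lines.append("} }")
    return "\n".join(lines)


query = build_select(
    ["?gene", "?pathway"],
    [
        ("?gene", "rdf:type", "so:gene"),
        ("?gene", "BASE:involved_in", "?pathway"),
    ],
    graph="http://knoesis.wright.edu/tcruzi4",
)
print(query)
```

A visual tool can let the user pick classes and relationships from the ontology and generate the corresponding patterns, so the user never has to type the PREFIX, GRAPH, or triple syntax by hand.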

Query 1: Show proteins that are downregulated in the epimastigote stage and exist in a single metabolic pathway.

Query 1, Cuebee demo

SPARQL Query:

PREFIX BASE: <http://knoesis.wright.edu/ParasiteExperiment.owl#>
PREFIX PATHWAY: <http://purl.org/obo/owlapi/pathway#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX so: <http://purl.org/obo/owl/sequence#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX NCI: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>

SELECT ?gene ?microarray ?pathwaylabel ?logbasevalue ?entry ?pathway
FROM <http://knoesis.wright.edu/tcruzi4>
WHERE { GRAPH <http://knoesis.wright.edu/tcruzi4>
{
?microarray BASE:has_output_value ?gene .
?microarray BASE:has_output_value ?logbase .
?microarray rdf:type NCI:Microarray_Analysis .
?logbase rdf:type BASE:log_base2_ratio .
?logbase BASE:has_value ?logbasevalue .
?gene rdf:type so:gene .
?entry owl:sameAs ?gene .
?entry rdf:type so:gene .
?entry BASE:involved_in ?pathway .
?pathway rdf:type BASE:pathway .
?pathway rdfs:label ?pathwaylabel
FILTER regex(?microarray , "epimastigote") .
FILTER (?logbasevalue < -1) .
{
	SELECT  ?gene count(?pathway)
	FROM <http://knoesis.wright.edu/tcruzi4>
	WHERE { GRAPH <http://knoesis.wright.edu/tcruzi4>
	{
	?microarray BASE:has_output_value ?gene .
	?microarray rdf:type NCI:Microarray_Analysis .
	?gene rdf:type so:gene .
	?entry owl:sameAs ?gene .
	?entry rdf:type so:gene .
	?entry BASE:involved_in ?pathway .
	?pathway rdf:type BASE:pathway .
	?pathway rdfs:label ?pathwaylabel
	FILTER regex(?microarray , "epimastigote") .
	}
	}
	group by ?gene
	having count(?pathway) = 1
	}
}
}
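
The logic of Query 1 (keep genes whose log2 ratio is below -1 and that belong to exactly one pathway) can be sketched in plain Python over a handful of rows. This is an illustration under stated assumptions, not the system's implementation: the gene names, pathway names, and ratios are invented, and the sketch counts distinct pathways per gene, whereas the subquery above counts pathway bindings.

```python
from collections import defaultdict

# Toy rows of (gene, pathway, log2 ratio in epimastigotes); all values are invented.
rows = [
    ("geneA", "glycolysis", -1.8),
    ("geneB", "glycolysis", -2.1),
    ("geneB", "purine salvage", -2.1),
    ("geneC", "sterol biosynthesis", 0.4),
]


def single_pathway_downregulated(rows, threshold=-1.0):
    """Return genes below the log2 threshold that map to exactly one pathway."""
    pathways = defaultdict(set)
    ratio = {}
    for gene, pathway, log2_ratio in rows:
        pathways[gene].add(pathway)   # plays the role of GROUP BY ?gene
        ratio[gene] = log2_ratio
    return sorted(
        gene
        for gene, paths in pathways.items()
        # len(paths) == 1 plays the role of HAVING count(?pathway) = 1
        if len(paths) == 1 and ratio[gene] < threshold
    )


print(single_pathway_downregulated(rows))  # ['geneA']
```

Here geneB is dropped because it appears in two pathways, and geneC is dropped because its ratio is not below the threshold.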


Query 2: Give the gene knockout summaries, both for plasmid construction and strain creation, for all gene knockout targets that are 2-fold upregulated in amastigotes at the transcript level and that have orthologs in Leishmania but not in Trypanosoma brucei.

Query 2, Cuebee demo

SPARQL Query:

PREFIX BASE: <http://knoesis.wright.edu/ParasiteExperiment.owl#>
PREFIX PATHWAY: <http://purl.org/obo/owlapi/pathway#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX so: <http://purl.org/obo/owl/sequence#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX NCI: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX provenir: <http://knoesis.wright.edu/provenir/provenir.owl#>
PREFIX ro: <http://obofoundry.org/ro/ro.owl#>
PREFIX PLO: <http://paige.ctegd.uga.edu/ParasiteLifecycle.owl#>

SELECT DISTINCT ?gene ?KOStatus ?KOLog ?knockoutProject ?StrainID  ?StrainStatusURL ?strainSummary
FROM <http://knoesis.wright.edu/tcruzi4>
WHERE { GRAPH <http://knoesis.wright.edu/tcruzi4>
{
?knockout provenir:has_parameter ?gene .
?knockout ro:part_of ?knockoutProject .
?knockoutProject rdf:type BASE:knockout_project_protocol .
?gene rdf:type so:gene .
?StrainID provenir:has_parameter ?gene .
?StrainID rdf:type BASE:strain_creation_protocol .
?knockoutProject provenir:has_parameter ?Log .
?Log rdf:type BASE:KOlog .
?Log rdfs:label ?KOLog .
?knockoutProject provenir:has_parameter ?StatusURL .
?StatusURL rdf:type BASE:status .
?StatusURL rdfs:label ?KOStatus .
?microarray BASE:has_output_value ?logbase .
?microarray rdf:type NCI:Microarray_Analysis .
?microarray BASE:has_output_value ?gene .
?logbase rdf:type BASE:log_base2_ratio .
?logbase BASE:has_value ?log2ratio .
?gene BASE:is_orthologous_to ?ortholog .
?ortholog rdf:type so:gene .
?ortholog ro:derives_from ?x .
?x rdf:type PLO:organism .
FILTER regex(?x, "LMA") .
FILTER regex(?microarray, "amastigote") .
FILTER (?log2ratio > 1) .
OPTIONAL {
?StrainID provenir:has_parameter ?StrainStatusURL .
?StrainStatusURL rdf:type BASE:status .
}
OPTIONAL {
?StrainID provenir:has_parameter ?StrainSummaryURL .
?StrainSummaryURL rdf:type BASE:strain_summary .
?StrainSummaryURL rdfs:label ?strainSummary .
}
}
}
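
The filter on ?log2ratio in Query 2 relies on the relation between fold change and the log base 2 ratio: a ratio of 1 corresponds to a doubling of expression and a ratio of -1 to a halving. The short Python sketch below works through that arithmetic. The function names and the expression values are hypothetical and are not taken from the dataset.

```python
import math


def log2_ratio(treated, reference):
    """log2 of the expression ratio; +1 means doubled, -1 means halved."""
    return math.log2(treated / reference)


def is_upregulated(treated, reference, min_log2=1.0):
    """Mirror FILTER (?log2ratio > 1): strictly more than 2-fold up."""
    return log2_ratio(treated, reference) > min_log2


print(log2_ratio(400.0, 100.0))       # 2.0, a 4-fold increase
print(is_upregulated(400.0, 100.0))   # True
print(is_upregulated(200.0, 100.0))   # False: exactly 2-fold gives 1.0, which is not > 1
```

Because the comparison is strict, a gene that is exactly 2-fold upregulated would not pass the filter as written; `>=` would include it.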


Query 3: Give the strain summary for all strains created through single or double knockout of a protein kinase gene.

Query 3, Cuebee demo

SPARQL Query:

PREFIX BASE: <http://knoesis.wright.edu/ParasiteExperiment.owl#>
PREFIX PATHWAY: <http://purl.org/obo/owlapi/pathway#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX so: <http://purl.org/obo/owl/sequence#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX NCI: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX provenir: <http://knoesis.wright.edu/provenir/provenir.owl#>
PREFIX ro: <http://obofoundry.org/ro/ro.owl#>

SELECT DISTINCT ?StrainID ?strainSummary
WHERE { GRAPH <http://knoesis.wright.edu/tcruzi4>
{
?gene BASE:has_GO_annotation ?go_term .
?go_term rdfs:label ?GOtermLabel .
?gene rdf:type so:gene .
?go_term rdf:type BASE:GO_term .
FILTER regex(?GOtermLabel, "kinase") .
FILTER regex(?GOtermLabel, "protein") .
?StrainID provenir:has_parameter ?gene .
?StrainID rdf:type BASE:strain_creation_protocol .
?StrainID provenir:has_parameter ?StrainSummaryURL .
?StrainSummaryURL rdf:type BASE:strain_summary.
?StrainSummaryURL rdfs:label ?strainSummary .
}
}
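
Query 3 selects protein kinase genes by requiring that the GO term label match both "kinase" and "protein". The following Python sketch mimics that pair of filters on a few labels. The labels are chosen for illustration and are not taken from the dataset, and `is_protein_kinase` is a hypothetical name. Like SPARQL regex without flags, `re.search` is case-sensitive by default.

```python
import re


def is_protein_kinase(label):
    """True when the label contains both substrings, as the two FILTERs require."""
    return (
        re.search("kinase", label) is not None
        and re.search("protein", label) is not None
    )


labels = [
    "protein kinase activity",
    "protein serine/threonine kinase activity",
    "kinase activity",
    "Protein Kinase binding",
]
for label in labels:
    print(label, is_protein_kinase(label))
```

The first two labels match, the third lacks "protein", and the last fails only because of capitalization. Substring matching is also permissive in the other direction: a label such as "protein kinase inhibitor activity" would pass both filters.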