MUDDIS

From Knoesis wiki

Research Project

MUDDIS, Multidimensional Semantic Integration Approach for Knowledge Discovery

Project Summary

MUDDIS is a collaborative project with the National Institutes of Health. The driving principle behind its multidimensional approach is to enable effective, domain-specific knowledge discovery based on gene annotations, using scientific and provenance information from different resources. Finding the similarity between genes is useful in many areas of the life sciences and biomedicine, such as model organism research and drug discovery in humans. We group genes based on their functional annotations, structural annotations, the disorders they are responsible for, and gene-drug interactions. The novelty of this work lies in querying multiple data sources and assembling the collected data for similarity calculations at different levels of granularity. Data from the literature, open public databases such as OMIM, and gene-centered information at NCBI serve as individual resources for different features of a gene. Each additional feature increases the value of the knowledge that can be explained with these individual resources.

To illustrate the utility of MUDDIS, we designed an evaluation framework and examined the correlation between MUDDIS similarity scores and structural similarity scores such as those from HomoloGene, and compared MUDDIS scores with curated similarity data such as MGI. The most significant findings are candidates for knowledge discovery, which allow domain experts to identify, produce, and verify new hypotheses.
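The correlation analysis described above could be sketched as follows. This is a minimal illustration, not the MUDDIS evaluation code: it assumes MUDDIS and HomoloGene scores have already been paired per gene pair, and the function names and example scores are hypothetical (they are not project results). Because the two scores live on different scales, the sketch uses Spearman rank correlation, which only compares orderings, alongside plain Pearson correlation, using the standard library only.

```python
from math import sqrt
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical paired scores for four gene pairs (illustrative only, not project data).
muddis_scores = [0.91, 0.45, 0.78, 0.12]
homologene_scores = [88.0, 40.5, 75.2, 15.3]
print(spearman(muddis_scores, homologene_scores))  # 1.0: identical rankings
```

The same functions could be applied to MUDDIS scores paired with curated MGI data; the choice between Pearson and Spearman is a modeling assumption, not something this page specifies.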

Contribution

  • Data integration for finding the similarity between genes for knowledge discovery.
  • Describing different features for gene annotation.
  • Extracting data from a range of data sets, from structured to unstructured.
  • Providing a systematic approach for computing the similarity between genes based on semantic similarity, building up from term-term similarity to set-set, feature-feature, and finally gene-gene similarity.
  • Providing a comprehensive evaluation framework for this platform.
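The term-term, set-set, feature-feature, and gene-gene progression listed above could be sketched as follows. This is a minimal illustration under stated assumptions, not the MUDDIS implementation: the term-term measure is a hypothetical placeholder (exact match) where an ontology-based semantic similarity measure would normally go, set-set similarity is assumed to be a best-match average, and gene-gene similarity is assumed to be a weighted average of feature-feature scores. All names, weights, and terms are hypothetical.

```python
from typing import Callable, Dict, Optional, Set

# A term-term similarity function returns a score in [0, 1] for two annotation terms.
TermSim = Callable[[str, str], float]


def exact_match(a: str, b: str) -> float:
    """Hypothetical placeholder term-term similarity: 1.0 if identical, else 0.0.
    An ontology-aware semantic similarity measure would be plugged in here."""
    return 1.0 if a == b else 0.0


def set_similarity(s1: Set[str], s2: Set[str], term_sim: TermSim) -> float:
    """Set-set similarity via best-match average (an assumption): each term is
    matched to its most similar term in the other set, then both directions are averaged."""
    if not s1 or not s2:
        return 0.0
    best_1 = sum(max(term_sim(a, b) for b in s2) for a in s1) / len(s1)
    best_2 = sum(max(term_sim(b, a) for a in s1) for b in s2) / len(s2)
    return (best_1 + best_2) / 2


def gene_similarity(
    gene_a: Dict[str, Set[str]],
    gene_b: Dict[str, Set[str]],
    term_sim: TermSim = exact_match,
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Gene-gene similarity as a weighted average of feature-feature similarities.
    Each gene maps a feature name to a set of terms; only features annotated on
    both genes contribute, and unlisted features get weight 1.0."""
    weights = weights or {}
    shared = [f for f in gene_a if f in gene_b and gene_a[f] and gene_b[f]]
    total_weight = sum(weights.get(f, 1.0) for f in shared)
    if total_weight == 0:
        return 0.0
    score = sum(
        weights.get(f, 1.0) * set_similarity(gene_a[f], gene_b[f], term_sim)
        for f in shared
    )
    return score / total_weight


# Hypothetical genes annotated with placeholder terms for two or three features.
gene_x = {"function": {"F1", "F2"}, "disorder": {"D1"}}
gene_y = {"function": {"F1"}, "disorder": {"D1"}, "drug": {"R1"}}
print(gene_similarity(gene_x, gene_y))  # function 0.75, disorder 1.0 -> 0.875
```

Skipping features that only one gene carries is a design choice made for this sketch; an alternative would be to score missing features as 0, which penalizes sparsely annotated genes.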

Candidate Features for Annotation

  • Functional annotations
  • Structural annotations
  • Genes responsible for disorders
  • Gene-drug interactions

Data Sources

Data sources in this study can be classified into three types: a) data extracted from scientific literature and academic articles, b) data from structured data sets and well-known sources, and c) data curated by experts.

  • Entrez Gene is an integration system developed by NCBI, used here to extract data for different features of the gene of interest. It provides access to Gene Ontology terms and to articles related to the gene (through their PMIDs); the MeSH terms assigned to these articles are used to find disorders and drugs related to the gene of interest.
  • HomoloGene is used in the evaluation. It detects homologs among the annotated genes of several completely sequenced eukaryotic genomes. HomoloGene scores are derived from sequence alignments of both DNA and protein sequences.
  • MGI (Mouse Genome Informatics) provides access to integrated, curated data on mouse genes and genome features, from sequences and genomic maps to gene expression and disease models. It is also a repository for raw data and detailed protocols from the Mouse Phenome Project, which collects baseline phenotypic data on genetically diverse and commonly used inbred mouse strains.
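The Entrez Gene path described in the list above (gene, then linked PubMed articles, then their MeSH terms) could be sketched with NCBI's public E-utilities. This minimal sketch only builds request URLs and parses a PubMed XML record, so it runs offline; the helper names are hypothetical, the sample record is a trimmed illustration rather than real data, and deciding which MeSH headings count as disorders or drugs is left out because the page does not say how that mapping is done.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_to_pubmed_url(gene_id: str) -> str:
    """ELink URL listing PubMed articles (PMIDs) linked to an Entrez Gene ID."""
    params = {"dbfrom": "gene", "db": "pubmed", "linkname": "gene_pubmed", "id": gene_id}
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


def pubmed_records_url(pmids: List[str]) -> str:
    """EFetch URL returning full PubMed records (including MeSH headings) as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def mesh_descriptors(pubmed_xml: str) -> List[str]:
    """Collect MeSH descriptor names from the MeshHeading elements of EFetch XML."""
    root = ET.fromstring(pubmed_xml)
    names = []
    for heading in root.iter("MeshHeading"):
        descriptor = heading.find("DescriptorName")
        if descriptor is not None and descriptor.text:
            names.append(descriptor.text)
    return names


# Trimmed, illustrative PubMed record (not a real article).
sample = """<PubmedArticleSet><PubmedArticle><MedlineCitation>
<MeshHeadingList>
<MeshHeading><DescriptorName>Neoplasms</DescriptorName></MeshHeading>
<MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
</MeshHeadingList>
</MedlineCitation></PubmedArticle></PubmedArticleSet>"""

print(gene_to_pubmed_url("7157"))  # 7157 is the Entrez Gene ID of human TP53
print(mesh_descriptors(sample))
```

A live pipeline would fetch these URLs (respecting NCBI's request-rate limits) and feed the returned XML into `mesh_descriptors`.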

Project Period

2012 – Current

Use cases

  • Building a Foundation for Knowledge Discovery.
  • A Solution for the Gene Fusion Problem.
  • Comparing with Phylogenetic Trees to Find Missing Information.

Collaborative Partners

  • Kno.e.sis Center, Wright State University
  • University of Georgia

External Collaborations

National Institutes of Health

Contact

Dr. Mary Panahiazar