Knoesis Alchemy of Healthcare

A semantic platform that provides NLP and machine learning functionality over an integrated knowledge base comprising diverse domain-specific medical and healthcare knowledge sources and Kno.e.sis health ontologies.

Overview

There has been a substantial increase in the need for health-related support systems in medicine and self-diagnosis. Existing support systems are developed using labor-intensive statistical learning processes. Our approach instead uses existing human-curated knowledge bases (KBs) in the medical domain to create a domain-specific knowledge graph. Commonly used KBs in the medical domain include the ICD (International Classification of Diseases) and SNOMED-CT (Systematized Nomenclature of Medicine-Clinical Terms), among many others.

We are developing a comprehensive platform, Kno.e.sis Alchemy of Health (kAofH), that combines an extensive health knowledge graph with machine/deep learning and natural language processing (NLP) techniques for extracting features from biomedical, clinical, and general health-related text. kAofH aims to perform the following critical, goal-oriented tasks:

Intent identification
Implicit entity recognition
Entity linking using ezDI biomedical knowledge source
Relationship extraction utilizing compound entities
Segment labeling of biomedical unstructured/structured data
Event extraction, nuance extraction
Domain-specific sentiment mining in healthcare
Relevant Specialization Identification for Web-Based Intervention
Severity identification and classification
Building a patient- and domain-centric evolving knowledge graph (KG)

These components of kAofH build on prior work on NLP and Semantic Technologies at Kno.e.sis. Furthermore, the kAofH platform is envisioned and designed to evolve and support other social good projects at Kno.e.sis.

Use Cases

The prominent use cases of kAofH include entity and relation extraction from unstructured text to build an evolving knowledge graph.

Entity Recognition Use Case

Health-related, user-generated text from social media and forums (e.g., Twitter, Reddit, Patient.info) poses a challenge because the language often expresses ideas, opinions, and facts implicitly. Consider the sentence “his tip of the appendix is inflamed”. The entity to be recognized implicitly here is ‘Appendicitis’, derived from the concept ‘inflammation of appendix’. Identifying such implicit mentions requires domain knowledge (e.g., PubMed Health, UMLS). The task becomes even more complicated in the biomedical domain, where entities seldom occur as single entities. For example, “An excessive endogenous or exogenous stimulation by estrogen induces adenomatous hyperplasia of the endometrium”. In this sentence, adenomatous hyperplasia and endometrium occur as one compound entity, although each could also have been captured as a single entity. Appropriate background knowledge resources such as UMLS (Unified Medical Language System) can be leveraged for more complex tasks, such as recognition and validation of compound entities.
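The idea of matching an implicit mention against concept definitions can be sketched as follows. This is a minimal illustration, not the kAofH implementation: the `DEFINITIONS` dictionary is a hand-written stand-in for a knowledge source such as UMLS, and the prefix-truncation stemmer, the overlap score, and the 0.5 threshold are all assumptions chosen for brevity.

```python
import re

# Hypothetical stand-in for concept definitions from a knowledge source.
DEFINITIONS = {
    "Appendicitis": "inflammation of the appendix",
    "Gastritis": "inflammation of the lining of the stomach",
    "Asthma": "chronic inflammation and narrowing of the airways",
}

STOPWORDS = {"a", "an", "and", "her", "his", "is", "of", "the"}


def stems(text):
    """Lowercase, drop stopwords, and truncate each word to 6 characters.

    Truncation is a crude stemmer: "inflamed" and "inflammation"
    both become "inflam".
    """
    words = re.findall(r"[a-z]+", text.lower())
    return {word[:6] for word in words if word not in STOPWORDS}


def recognize_implicit_entity(sentence, threshold=0.5):
    """Return the concept whose definition is best covered by the sentence."""
    sentence_stems = stems(sentence)
    best_concept, best_score = None, 0.0
    for concept, definition in DEFINITIONS.items():
        definition_stems = stems(definition)
        # Fraction of the definition's stems that also appear in the sentence.
        score = len(sentence_stems & definition_stems) / len(definition_stems)
        if score > best_score:
            best_concept, best_score = concept, score
    return best_concept if best_score >= threshold else None


print(recognize_implicit_entity("his tip of the appendix is inflamed"))
```

For the example sentence, the definition of Appendicitis is fully covered by the sentence, so it is returned even though the word itself never appears in the text. A realistic system would replace the toy dictionary and stemmer with a curated knowledge source and proper linguistic normalization.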

Relation Extraction Use Case

Extracting relationships from text is an important task and is key to many health-related applications, including biomedical knowledge discovery. Prior work has modeled relation extraction using statistical learning methods, including Bayesian approaches, Markov Random Fields, co-occurrence methods, rule-based methods, and mutual information measures. The need for large amounts of labeled data and the failure to identify complex relations have limited the impact of such methods. However, background knowledge of the relevant domain can enhance the efficiency of the model. We present two examples to illustrate our use case.

Consider the following example:

  “Patients with congenital heart problem can be treated with lasix.”

In this straightforward healthcare text, the entities that can be identified using domain knowledge are congenital heart disease (DBpedia or UMLS) and lasix (MedDRA or DBpedia). The latter is the brand name of the drug Furosemide. One can employ a Bayesian approach, conditional random fields, or state-of-the-art embedding approaches to identify the relationship between the entities, namely "treated".
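The two steps in this example, linking mentions to known concepts and then labeling the relation between them, can be sketched as below. The sketch deliberately swaps in a simple rule-based matcher for the statistical or embedding models mentioned above; the `LEXICON` and `TRIGGERS` tables, the relation label `treated_by`, and the function names are hypothetical and stand in for lookups against sources such as UMLS, MedDRA, or DBpedia.

```python
# Hypothetical mini-lexicon: surface form -> (normalized concept, semantic type).
LEXICON = {
    "congenital heart problem": ("congenital heart disease", "Disease"),
    "lasix": ("furosemide", "Drug"),
}

# Hypothetical trigger phrases mapped to relation labels.
TRIGGERS = {"treated with": "treated_by"}


def link_entities(sentence):
    """Return (concept, type) pairs for lexicon entries found in the sentence,
    ordered by their position in the text."""
    text = sentence.lower()
    found = []
    for surface, (concept, semantic_type) in LEXICON.items():
        position = text.find(surface)
        if position != -1:
            found.append((position, concept, semantic_type))
    return [(concept, semantic_type) for _, concept, semantic_type in sorted(found)]


def extract_relation(sentence):
    """Return a (disease, relation, drug) triple, or None if no rule fires."""
    entities = link_entities(sentence)
    diseases = [concept for concept, kind in entities if kind == "Disease"]
    drugs = [concept for concept, kind in entities if kind == "Drug"]
    text = sentence.lower()
    for trigger, label in TRIGGERS.items():
        if trigger in text and diseases and drugs:
            return (diseases[0], label, drugs[0])
    return None


triple = extract_relation(
    "Patients with congenital heart problem can be treated with lasix."
)
print(triple)
```

Normalizing "lasix" to "furosemide" before extracting the relation means the resulting triple refers to the drug itself rather than to one brand name, which is where the background knowledge contributes.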

Now consider the following example, taken from BioInfer, and the question of how to identify the relation between its biomedical entities:

 “Cloning and characterization of the mouse MColn1 gene reveal an alternatively spliced transcript not seen in humans.”

For this example, Figure 1 illustrates how our approach extracts relations by employing a knowledge source for contextual enrichment of the content.

Figure 1: Relation Extraction Use case explained using background knowledge and statistical models.

Personalized Health Knowledge Graph (PHKG) from Unstructured Text

A “personalized” Health Knowledge Graph (HKG) needs to be created because the health conditions (e.g., symptoms, side effects, prescribed drugs, living conditions) of two patients with the same disease will likely differ. Such a patient-specific resource will give clinicians more meaningful support for intervening appropriately. For instance, Patient A and Patient B both suffer from asthma, but Patient A is moderately allergic to pollen whereas Patient B is severely affected by it. kAofH provides a platform that can help create a PHKG with appropriate integrated knowledge sources. Figure 2 outlines the creation of a PHKG using the medical knowledge sources with which we semantically annotate the content, thus generating a patient-specific KG.

Figure 2: Personalized Health Knowledge Graph built from Reddit posts in the r/Health subreddit. Explicit relations are present within 1 hop; implicit relations are discovered beyond 1 hop.
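The Patient A and Patient B example, together with the 1-hop distinction in the Figure 2 caption, can be sketched with plain triples and a breadth-first search. All node names, predicates, and background triples below are illustrative assumptions and do not come from a real knowledge base or from the kAofH code.

```python
from collections import deque

# Background knowledge shared by all patients (illustrative only).
BACKGROUND = [
    ("Asthma", "hasSymptom", "Wheezing"),
    ("Pollen", "isA", "Allergen"),
]


def build_phkg(patient, facts):
    """Combine patient-specific (predicate, object) facts with background triples."""
    personal = [(patient, predicate, obj) for predicate, obj in facts]
    return personal + BACKGROUND


def hop_distances(triples, start):
    """Breadth-first search over subject -> object edges, starting at `start`."""
    adjacency = {}
    for subject, _, obj in triples:
        adjacency.setdefault(subject, []).append(obj)
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


phkg_a = build_phkg(
    "PatientA", [("hasCondition", "Asthma"), ("moderatelyAllergicTo", "Pollen")]
)
phkg_b = build_phkg(
    "PatientB", [("hasCondition", "Asthma"), ("severelyAllergicTo", "Pollen")]
)

distances_a = hop_distances(phkg_a, "PatientA")
explicit_a = {node for node, hops in distances_a.items() if hops == 1}
implicit_a = {node for node, hops in distances_a.items() if hops > 1}
print(sorted(explicit_a), sorted(implicit_a))
```

The two graphs share the same background triples but differ in the patient-specific edges, which is what makes each graph personalized. Nodes one hop from the patient correspond to explicitly stated facts, while nodes farther away are reachable only through background knowledge.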

Evolving Knowledge Graph

While the PHKG is a critical component in defining the health conditions of a patient, the dynamic nature of the treatment process requires the PHKG to be dynamic as well. A static KG cannot capture the progression of a disease, so interventions based on it will not be effective. Hence, the knowledge graph must evolve, adding missing and new entities and relations as they are discovered over time and contextualizing them with the patient's health status.
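One way to picture an evolving graph is to record when each triple was first observed, so that the state of the graph at any earlier date can be reconstructed. The sketch below is a minimal illustration under that assumption; the class name `EvolvingKG`, its methods, and the example triples and dates are hypothetical.

```python
from datetime import date


class EvolvingKG:
    """A triple store that remembers when each triple was first observed."""

    def __init__(self):
        self._first_seen = {}

    def add(self, triple, observed_on):
        """Record a triple; return True if it is new to the graph."""
        if triple in self._first_seen:
            # Keep the earliest observation date for an already known triple.
            self._first_seen[triple] = min(self._first_seen[triple], observed_on)
            return False
        self._first_seen[triple] = observed_on
        return True

    def snapshot(self, as_of):
        """Return the set of triples known on or before `as_of`."""
        return {
            triple
            for triple, seen in self._first_seen.items()
            if seen <= as_of
        }


kg = EvolvingKG()
kg.add(("PatientA", "hasCondition", "Asthma"), date(2018, 1, 5))
kg.add(("PatientA", "reportsSymptom", "Wheezing"), date(2018, 3, 2))

early = kg.snapshot(date(2018, 2, 1))
late = kg.snapshot(date(2018, 4, 1))
print(len(early), len(late))
```

Comparing two snapshots shows which entities and relations were added in between, which is one simple way to follow the progression of a patient's condition over time.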

Augmented Personalized Health (APH)

APH seeks to improve healthcare by aggregating, analyzing, and personalizing the use of relevant physical, cyber, and social data obtained from wearables, sensors, the Internet of Things (IoT), mobile applications, Electronic Medical Records (EMRs), and web-based information (social media, forums, etc.). Diseases like obesity and asthma need timely monitoring, evaluation, and inferencing. The collected data requires significant human (clinician) involvement to convert data tables into information for proper decision making. kAofH can be employed to minimize this human involvement and automate the process from data collection to information extraction and storage in machine-readable form. In the KHealth project, which involves health data collected from wearables and sensors, kAofH can aid in creating rules by utilizing appropriate background knowledge, which will assist clinicians in treating their patients.

Figure 3: kAofH can support Augmented Personalized Health with diverse and rich information

Kno.e.sis Alchemy API Architecture

The API is envisioned to integrate multiple knowledge sources relevant to projects at Kno.e.sis. Furthermore, it will contain ontologies developed at Kno.e.sis, such as the Drug Abuse Ontology and the Kno.e.sis KHealth Asthma Ontology. These will be used for annotating, disambiguating, and normalizing content from social media, blogs, and forums.

Currently, the API integrates the following medical knowledge bases (KBs): SNOMED-CT, ICD-10, and MedDRA, providing easy access to the main medical terminologies. Figure 4 shows the vital components integrated into the Kno.e.sis Alchemy API. Figure 5 provides an example subgraph created using the API from an unstructured Reddit post. The API creates the graph in Turtle (“TTL”) format; for presentation, the graph has been redrawn using draw.io. The API supports optimized full-text search, access to terms and synonyms, and mapping between multiple KBs to avoid duplication. It also manages concepts and the relations between them using an efficient data structure.

Figure 4: Integration of Relevant Knowledge Sources to Kno.e.sis Alchemy API

Figure 5: Contextual subgraph created using the Alchemy API and stored in Turtle format. Refer to Figure 2 for the annotated text.
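To give a sense of what a graph stored in Turtle format looks like, the sketch below serializes a few triples by hand. It is not the Alchemy API: the `ex` prefix, the `http://example.org/kaofh/` namespace, the function `to_turtle`, and the example triples are assumptions, and the serializer only handles simple alphanumeric local names.

```python
import re

PREFIX = "ex"
NAMESPACE = "http://example.org/kaofh/"  # hypothetical namespace

# Only plain identifiers are accepted, so no Turtle escaping is required.
SIMPLE_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def to_turtle(triples):
    """Serialize (subject, predicate, object) name triples as Turtle text."""
    lines = [f"@prefix {PREFIX}: <{NAMESPACE}> .", ""]
    for triple in sorted(set(triples)):
        for name in triple:
            if not SIMPLE_NAME.match(name):
                raise ValueError(f"unsupported name: {name!r}")
        subject, predicate, obj = triple
        lines.append(f"{PREFIX}:{subject} {PREFIX}:{predicate} {PREFIX}:{obj} .")
    return "\n".join(lines) + "\n"


turtle_text = to_turtle(
    [
        ("PatientA", "hasCondition", "Asthma"),
        ("Asthma", "hasSymptom", "Wheezing"),
    ]
)
print(turtle_text)
```

In practice an RDF library would handle escaping, literals, and multiple namespaces; the hand-written serializer is used here only to keep the example self-contained.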

Team

Faculty

Dr. Amit P. Sheth (Kno.e.sis, Wright State University)
Dr. Krishnaprasad Thirunarayan (Kno.e.sis, Wright State University)

External Collaboration

Dr. Saeedeh Shekarpour (University of Dayton, Former Postdoc @ Kno.e.sis)
Ugur Kursuncu (University of Georgia, Visiting Researcher @ Kno.e.sis)
Shweta Yadav (Indian Institute of Technology, Patna, Visiting Researcher @ Kno.e.sis)

Post-doc

Dr. Amelie Gyrard (Kno.e.sis, Wright State University)

Graduate Students

Manas Gaur (Kno.e.sis, Wright State University)
Swati Padhee (Kno.e.sis, Wright State University)
Joy Prakash Sain (Kno.e.sis, Wright State University)
Amanuel Alambo (Kno.e.sis, Wright State University)

Alumni

Dr. Sarasi Lalithsena (Former PhD at Kno.e.sis, currently at IBM Almaden)
Dr. Sujan Perera (Former PhD at Kno.e.sis, currently at IBM Almaden)

Related Projects

Publications

Manas Gaur, Ugur Kursuncu, Amanuel Alambo, Amit Sheth, Raminta Daniulaityte, Krishnaprasad Thirunarayan, Jyotishman Pathak. "Let Me Tell You About Your Mental Health!" Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention. In The 27th ACM International Conference on Information and Knowledge Management (CIKM’18). Torino, Italy: Association for Computing Machinery; 2018.
Ugur Kursuncu, Manas Gaur, Usha Lokala, Krishnaprasad Thirunarayan, Amit Sheth and I. Budak Arpinar. "Predictive Analysis on Twitter: Techniques and Applications". Book Chapter in "Emerging Research Challenges and Opportunities in Computational Social Network Analysis and Mining", (2018).
Amanuel Alambo, Manas Gaur, Ugur Kursuncu, Krishnaprasad Thirunarayan, Jeremiah Schumm, Jyotishman Pathak, Amit Sheth, “Personalized Prediction of Suicidal Risk for Web-based Intervention”, 24th Mental Health Services Research Conference, National Institute of Mental Health, (2018).
Amelie Gyrard, Manas Gaur, Krishnaprasad Thirunarayan, Amit Sheth and Saeedeh Shekarpour. Personalized Health Knowledge Graph. 1st Workshop on Contextualized Knowledge Graph (CKG) co-located with International Semantic Web Conference, (2018).
Ugur Kursuncu, Manas Gaur, Usha Lokala, Anurag Illendula, Krishnaprasad Thirunarayan, Raminta Daniulaityte, Amit Sheth, & I. Budak Arpinar. “What's ur type?” Contextualized Classification of User Types in Marijuana-related Communications using Compositional Multiview Embedding. arXiv preprint arXiv:1806.06813, (2018).
Soon Jye Kho, Swati Padhee, Goonmeet Bajaj, Krishnaprasad Thirunarayan, Amit Sheth. "Domain-specific Use Cases for Knowledge-enabled Social Media Analysis." Book Chapter in Emerging Research Challenges and Opportunities in Computational Social Network Analysis and Mining, (2018).

References

Jadhav, Ashutosh. "Knowledge Driven Search Intent Mining." (2016).
Perera, Sujan. "Knowledge-driven Implicit Information Extraction." (2016).
Cameron, Delroy Huborn. "A context-driven subgraph model for literature-based discovery." (2014).
Ramakrishnan, Cartic. "Extracting, Representing and Mining Semantic Metadata from Text: Facilitating Knowledge Discovery in Biomedicine." (2008).
Sahu, Sunil Kumar, et al. "Relation extraction from clinical texts using domain invariant convolutional neural network." arXiv preprint arXiv:1606.09370 (2016).
Ghassemi, Marzyeh, et al. "Opportunities in Machine Learning for Healthcare." arXiv preprint arXiv:1806.00388 (2018).
Pyysalo, Sampo, et al. "BioInfer: a corpus for information extraction in the biomedical domain." BMC bioinformatics 8.1 (2007).
