Difference between revisions of "Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus"

From Knoesis wiki
 
[[File:UMLS.png| The various subdomains integrated in the UMLS [1] |800px|center |link=https://www.researchgate.net/profile/Olivier-Bodenreider/publication/8954995/figure/fig1/AS:280280456810498@1443835476088/The-various-subdomains-integrated-in-the-UMLS.png]]
 
  
 
<embedvideo service="youtube">https://www.youtube.com/watch?v=9G14WFM7lCg</embedvideo>
  
 
==Learn More about the UMLS Metathesaurus==
 
* [https://www.nlm.nih.gov/research/umls/quickstart.html Quick Start Guide]
 
==People==
*'''Artificial Intelligence Institute, University of South Carolina''':
**[https://sc.edu/study/colleges_schools/engineering_and_computing/faculty-staff/amitsheth.php Advisor: Dr. Amit P. Sheth]
**[https://www.linkedin.com/in/joeyyip/ Hong Yung (Joey) Yip]
**[https://www.linkedin.com/in/thilini-w/ Thilini Wijesiriwardene]
  
 
==References==
#Bodenreider, O. (2004). The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research, 32(suppl_1), D267-D270.
#Sheth, A., & Thirunarayan, K. (2021). The Duality of Data and Knowledge Across the Three Waves of AI. arXiv preprint arXiv:2103.13520.

Revision as of 17:47, 9 September 2021

Motivation and Background

The Unified Medical Language System (UMLS) [1] is a rich repository of biomedical vocabularies developed by the US National Library of Medicine. It is an effort to overcome challenges to the effective retrieval of machine-readable information. One such challenge is the variety of ways in which the same concept is expressed across different terminologies, also known as source vocabularies. For example, the concept of "Addison’s Disease" is expressed as "Primary hypoadrenalism" in the Medical Dictionary for Regulatory Activities (MedDRA) and as "Primary adrenocortical insufficiency" in the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10). The lack of integration between these synonymous terms often leads to poor interoperability between information systems (for example, it is unclear how to map a concept from one terminology to another) and to confusion among health professionals. Hence, the UMLS aims to integrate the various terminologies and provide a crosswalk among them, as well as to facilitate the creation of more effective and interoperable biomedical information systems and services, including electronic health records. To date, it has been increasingly used in areas such as

  • linking health information, medical terms, drug names, and billing codes across different computer systems
  • public health statistics reporting
  • search engine retrieval
  • terminology research
  • data mining

The UMLS Metathesaurus terminology integration system is maintained, revised, and updated twice every year.

However, given the current size of the Metathesaurus, with 16.1 million atoms from 220 source vocabularies grouped into 4.44 million concepts, its construction and maintenance process is costly, time-consuming, and error-prone, as it relies primarily on

  • lexical and semantic processing for suggesting groupings of synonymous terms
  • the expertise of UMLS editors for curating these synonymy predictions
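
As a rough illustration of what lexical processing for synonymy suggestion might look like, the sketch below groups atoms whose names reduce to the same normalized key (lowercased, punctuation stripped, words sorted). This is not the actual UMLS lexical tooling: the normalization rule, the `normalize` and `suggest_groups` helpers, and the `SRC_A`/`SRC_B` source labels are assumptions made purely for illustration.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Toy normalization (assumed rule): lowercase, drop punctuation, sort words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(words))


def suggest_groups(atoms):
    """Group (source, name) atoms that share the same normalized key."""
    groups = defaultdict(list)
    for source, name in atoms:
        groups[normalize(name)].append((source, name))
    return dict(groups)


# Hypothetical atoms; SRC_A and SRC_B are placeholder source labels.
atoms = [
    ("SRC_A", "Addison's disease"),
    ("SRC_B", "Disease, Addison's"),
    ("MedDRA", "Primary hypoadrenalism"),
    ("ICD-10", "Primary adrenocortical insufficiency"),
]

groups = suggest_groups(atoms)
for key, members in groups.items():
    print(key, "->", members)
```

In this toy example the two word-order variants fall into one group, while "Primary hypoadrenalism" and "Primary adrenocortical insufficiency" stay apart even though they name the same concept. Gaps of this kind are what the semantic processing and the UMLS editors have to close.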

Opportunities

The enormous knowledge accumulated over 30 years of manual curation, coupled with the advent of statistical and symbolic AI research [2], opens up opportunities to improve the UMLS Metathesaurus construction and maintenance process, specifically by developing novel approaches that aid and complement the efforts of the UMLS human editors in inserting and updating biomedical vocabularies in the existing UMLS Metathesaurus for future releases.

Biomedical Vocabulary Alignment at Scale

Initial efforts include defining and addressing terminology integration at the full scale and diversity of the UMLS Metathesaurus using a supervised learning approach, specifically by

  • assessing the feasibility of using deep learning (DL) techniques for terminology integration at scale in the UMLS Metathesaurus and
  • developing a scalable supervised learning approach to improve synonymy predictions compared to the current lexical and semantic processing in the Metathesaurus
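
One way to make the supervised formulation concrete is to treat synonymy prediction as binary classification over pairs of atoms, with labels derived from existing concept groupings: two atoms in the same concept form a positive pair, and two atoms in different concepts form a negative pair. The sketch below builds such labeled pairs from a toy concept table. The pairwise framing, the `C_TOY_*` identifiers, and the `build_pairs` helper are assumptions for illustration and do not describe the project's actual pipeline.

```python
from itertools import combinations

# Toy concept table; identifiers are made up and are not real UMLS CUIs.
concepts = {
    "C_TOY_1": [
        "Addison's disease",
        "Primary hypoadrenalism",
        "Primary adrenocortical insufficiency",
    ],
    "C_TOY_2": ["Headache", "Cephalalgia"],
}


def build_pairs(concepts):
    """Return (atom_a, atom_b, label) triples; label is 1 if both atoms share a concept."""
    atom_to_concept = {
        atom: cid for cid, atoms in concepts.items() for atom in atoms
    }
    pairs = []
    for a, b in combinations(sorted(atom_to_concept), 2):
        label = int(atom_to_concept[a] == atom_to_concept[b])
        pairs.append((a, b, label))
    return pairs


pairs = build_pairs(concepts)
positives = [p for p in pairs if p[2] == 1]
negatives = [p for p in pairs if p[2] == 0]
print(len(pairs), len(positives), len(negatives))
```

Enumerating every pair only works for a toy table like this one. With 16.1 million atoms the number of possible pairs is on the order of 10^14, so a scalable approach would presumably need to sample negative pairs rather than generate them all.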