Difference between revisions of "Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus"

From Knoesis wiki
Figure: The various subdomains integrated in the UMLS [1]

Revision as of 17:12, 9 September 2021

Motivation and Background

The Unified Medical Language System (UMLS) [1] is a rich repository of biomedical vocabularies developed by the US National Library of Medicine. It is an effort to overcome challenges to the effective retrieval of machine-readable information, one of which is the variety of ways the same concept is expressed by different terminologies, also known as source vocabularies. For example, the concept of "Addison’s Disease" is expressed as "Primary hypoadrenalism" in the Medical Dictionary for Regulatory Activities (MedDRA) and as "Primary adrenocortical insufficiency" in the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10). The lack of integration between these synonymous terms often leads to poor interoperability between information systems (for example, how does one map a concept from one terminology to another?) and to confusion among health professionals. Hence, the UMLS aims to integrate the various terminologies and provide crosswalks among them, and to facilitate the creation of more effective and interoperable biomedical information systems and services, including electronic health records. It is increasingly used in areas such as:

  • linking health information, medical terms, drug names, and billing codes across different computer systems
  • public health statistics reporting
  • search engine retrieval
  • terminology research
  • data mining
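The mapping question raised above (how does one map a concept from one terminology to another?) can be sketched in a few lines of Python. This is only an illustrative toy, not the Metathesaurus data model or any UMLS API: the two terms and their source vocabularies come from the Addison's disease example in the text, while the concept identifier `CONCEPT-ADDISON` and the helper names `build_index` and `crosswalk` are assumptions made up for this sketch.

```python
from collections import defaultdict

# Hypothetical concept identifier used only for this sketch; it is not a
# real Metathesaurus identifier.
CONCEPT_ID = "CONCEPT-ADDISON"

# (source vocabulary, term, concept) triples from the example in the text.
ATOMS = [
    ("MedDRA", "Primary hypoadrenalism", CONCEPT_ID),
    ("ICD-10", "Primary adrenocortical insufficiency", CONCEPT_ID),
]


def build_index(atoms):
    """Index terms by (source, term) and concepts by source vocabulary."""
    term_to_concept = {}
    concept_to_terms = defaultdict(dict)
    for source, term, concept in atoms:
        term_to_concept[(source, term.lower())] = concept
        concept_to_terms[concept][source] = term
    return term_to_concept, concept_to_terms


def crosswalk(term, from_source, to_source, atoms=ATOMS):
    """Return the term that `to_source` uses for the same concept, or None."""
    term_to_concept, concept_to_terms = build_index(atoms)
    concept = term_to_concept.get((from_source, term.lower()))
    if concept is None:
        return None  # the term is unknown in the originating vocabulary
    return concept_to_terms[concept].get(to_source)


print(crosswalk("Primary hypoadrenalism", "MedDRA", "ICD-10"))
```

The point of the sketch is that once synonymous terms share a concept, translating between vocabularies reduces to two lookups; the hard part, deciding which terms belong to the same concept, is what the integration effort has to solve.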

The UMLS Metathesaurus terminology integration system is maintained, revised, and updated twice every year.

However, with over 200 source vocabularies, the construction and maintenance process is costly, time-consuming, and error-prone, as it relies primarily on

  • lexical and semantic processing for suggesting groupings of synonymous terms
  • the expertise of UMLS editors for curating these synonymy predictions
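To give a rough feel for what lexical processing for suggesting synonym groupings might involve, the sketch below groups terms whose normalized forms coincide. It is a deliberately simplified stand-in, not the lexical tooling actually used for the Metathesaurus: the normalization rules (lowercasing, dropping possessives and punctuation, sorting tokens), the function names `normalize` and `suggest_groups`, and the variant string "Disease, Addison" are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def normalize(term):
    """Toy normalization: lowercase, drop possessive 's, strip punctuation,
    and sort the remaining tokens so that word order does not matter."""
    lowered = term.lower().replace("\u2019", "'")  # unify curly apostrophes
    lowered = re.sub(r"'s\b", "", lowered)
    tokens = re.findall(r"[a-z0-9]+", lowered)
    return " ".join(sorted(tokens))


def suggest_groups(terms):
    """Return lists of terms that share a normalized form (candidate synonyms)."""
    groups = defaultdict(list)
    for term in terms:
        groups[normalize(term)].append(term)
    return [group for group in groups.values() if len(group) > 1]


candidates = [
    "Addison's Disease",
    "Disease, Addison",        # hypothetical word-order variant
    "Primary hypoadrenalism",  # synonymous in meaning, but lexically unrelated
]
print(suggest_groups(candidates))
```

In this toy run the two lexical variants are grouped, while "Primary hypoadrenalism" is not, even though the text treats it as expressing the same concept. Purely lexical matching cannot see that kind of synonymy, which is one reason semantic processing and editor curation are also part of the process.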

Opportunities

The current version of the Metathesaurus contains approximately 4.44 million concepts and 16.1 million unique concept names from 218 source vocabularies. The availability of human-curated knowledge of this size and diversity, coupled with the advent of statistical and symbolic AI research [2], opens up opportunities to improve the UMLS Metathesaurus construction and maintenance process, specifically by developing novel approaches that aid and complement the efforts of the UMLS human editors in inserting and updating biomedical vocabularies in the existing UMLS Metathesaurus for future releases.

Preliminary Work

Learn More about the UMLS Metathesaurus

References

  1. Bodenreider, O. (2004). The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research, 32(suppl_1), D267-D270.
  2. Sheth, A., & Thirunarayan, K. (2021). The Duality of Data and Knowledge Across the Three Waves of AI. arXiv preprint arXiv:2103.13520.