Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus

Background and Motivation

The Unified Medical Language System (UMLS) [1] is a rich repository of biomedical vocabularies developed by the US National Library of Medicine. It is an effort to overcome challenges to the effective retrieval of machine-readable information. One such challenge is the variety of ways in which the same concept is expressed across different terminologies, also known as source vocabularies. For example, the concept of "Addison’s Disease" is expressed as "Primary hypoadrenalism" in the Medical Dictionary for Regulatory Activities (MedDRA) and as "Primary adrenocortical insufficiency" in the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10). The lack of integration between these synonymous terms often leads to poor interoperability between information systems (e.g., how does one map a concept from one terminology to another?) and to confusion among health professionals. Hence, the UMLS aims to integrate the various terminologies and provide cross-walks among them, and to facilitate the creation of more effective and interoperable biomedical information systems and services, including electronic health records. To date, it is increasingly used in areas such as

  • linking health information, medical terms, drug names, and billing codes across different computer systems
  • public health statistics reporting
  • search engine retrieval
  • terminology research
  • data mining

The UMLS Metathesaurus terminology integration system is maintained, revised, and updated twice every year.

However, given the current size of the Metathesaurus (16.1 million atoms from 220 source vocabularies, grouped into 4.44 million concepts), the construction and maintenance process is costly, time-consuming, and error-prone, as it relies primarily on

  • lexical and semantic processing for suggesting groupings of synonymous terms
  • the expertise of UMLS editors for curating these synonymy predictions
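To make the first step concrete, here is a minimal sketch of lexical processing used to suggest groupings of synonymous terms. It is only an illustration under stated assumptions: the `normalize` function is a crude stand-in written for this example (it is not the UMLS lexical tooling), and the `EXAMPLE_*` source names and their term variants are hypothetical. Only the MedDRA and ICD-10 terms come from the example given earlier on this page.

```python
import re
from collections import defaultdict

# Each atom is a (term string, source vocabulary) pair. The first two come from
# the Addison's Disease example on this page; the "EXAMPLE_*" sources and their
# term variants are hypothetical placeholders added for illustration.
ATOMS = [
    ("Primary hypoadrenalism", "MedDRA"),
    ("Primary adrenocortical insufficiency", "ICD-10"),
    ("Addison's Disease", "EXAMPLE_A"),
    ("Disease, Addison's", "EXAMPLE_B"),
    ("addison's disease", "EXAMPLE_C"),
]


def normalize(term: str) -> str:
    """Crude lexical key: lowercase, drop possessives and punctuation, sort tokens."""
    text = term.lower().replace("\u2019", "'")  # treat curly apostrophes like straight ones
    text = re.sub(r"'s\b", "", text)            # "addison's" -> "addison"
    tokens = re.findall(r"[a-z0-9]+", text)     # keep alphanumeric tokens only
    return " ".join(sorted(tokens))             # ignore word order


def suggest_groups(atoms):
    """Group atoms whose normalized keys are identical."""
    groups = defaultdict(list)
    for term, source in atoms:
        groups[normalize(term)].append((term, source))
    return dict(groups)


groups = suggest_groups(ATOMS)
for key, members in groups.items():
    print(key, "->", members)
```

In this toy run the three spelling and word-order variants of "Addison's Disease" fall into one group, while "Primary hypoadrenalism" and "Primary adrenocortical insufficiency" each stay alone because they share no lexical key with the others. Catching synonyms of that kind is where semantic processing and editor curation come in.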

The enormous knowledge accumulated over 30 years of manual curation, coupled with the advent of statistical and symbolic AI research [2], opens up opportunities to improve the UMLS Metathesaurus construction and maintenance process. Specifically, novel approaches can be developed to aid and complement the efforts of the UMLS human editors in inserting and updating biomedical vocabularies in the existing UMLS Metathesaurus for future releases.

Learn More about the UMLS Metathesaurus

Biomedical Vocabulary Alignment at Scale

Initial efforts include defining and addressing terminology integration at the full scale and diversity of the UMLS Metathesaurus using a supervised learning-based approach, specifically in

  • assessing the feasibility of using deep learning (DL) techniques for terminology integration at scale in the UMLS Metathesaurus and
  • developing a scalable supervised learning approach to improve synonymy predictions compared to the current lexical and semantic processing in the Metathesaurus
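One common way to frame synonymy prediction as a supervised learning problem is binary classification over pairs of atoms, with labels derived from existing curated groupings. The sketch below shows only that labeling step, on toy data. The framing, the `labeled_pairs` helper, the `CONCEPT_*` identifiers, and the second toy concept are assumptions made for illustration; they are not taken from this page and should not be read as the project's actual pipeline.

```python
from itertools import combinations

# Toy curated groupings: concept id -> atom strings. The ids are hypothetical
# placeholders, not real UMLS identifiers, and the second concept is an
# illustrative addition.
CONCEPTS = {
    "CONCEPT_1": [
        "Addison's Disease",
        "Primary hypoadrenalism",
        "Primary adrenocortical insufficiency",
    ],
    "CONCEPT_2": ["Headache", "Cephalalgia"],
}


def labeled_pairs(concepts):
    """Yield (atom_a, atom_b, label): 1 if both atoms share a concept, else 0.

    Assumes each atom string appears in exactly one concept (true for this toy data).
    """
    atom_to_concept = {
        atom: cid for cid, atoms in concepts.items() for atom in atoms
    }
    for a, b in combinations(sorted(atom_to_concept), 2):
        yield a, b, int(atom_to_concept[a] == atom_to_concept[b])


pairs = list(labeled_pairs(CONCEPTS))
positives = [p for p in pairs if p[2] == 1]
negatives = [p for p in pairs if p[2] == 0]
print(len(positives), "synonymous pairs,", len(negatives), "non-synonymous pairs")
```

Enumerating all pairs grows quadratically with the number of atoms, so at the scale of millions of atoms a real system would need some form of sampling or candidate selection. The exhaustive loop here is only practical for toy inputs.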



This research was supported in part by the Intramural Research Program of the National Library of Medicine (NLM), National Institutes of Health (NIH). This program is administered by the Oak Ridge Institute for Science and Education through an inter-agency agreement between the U.S. Department of Energy and the National Library of Medicine.

Publications


  1. Nguyen, V., & Bodenreider, O. (2021) Adding an Attention Layer Improves the Performance of a Neural Network Architecture for Synonymy Prediction in the UMLS Metathesaurus. In International Medical Informatics Association (MedInfo) 2021.
  2. Nguyen, V., Yip, H. Y., & Bodenreider, O. (2021, April). Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus. In Proceedings of the Web Conference (WWW) 2021 (pp. 2672-2683).
  3. Tran, T. T., Nghiem, S. V., Le, V. T., Quan, T. T., Nguyen, V., Yip, H. Y., & Bodenreider, O. (2020, November). Siamese KG-LSTM: A deep learning model for enriching UMLS Metathesaurus synonymy. In 2020 12th International Conference on Knowledge and Systems Engineering (KSE) (pp. 281-286). IEEE.
  4. Yip, H. Y., Nguyen, V., & Bodenreider, O. (2019). Construction of UMLS Metathesaurus with Knowledge-Infused Deep Learning. In BlockSW/CKG@ISWC.

References


  1. Bodenreider, O. (2004). The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research, 32(suppl_1), D267-D270.
  2. Sheth, A., & Thirunarayan, K. (2021). The Duality of Data and Knowledge Across the Three Waves of AI. arXiv preprint arXiv:2103.13520.