# Knowledge will Propel Machine Understanding of Content

## Abstract

Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of the value of utilizing knowledge whenever it is available or can be created purposefully. In this article, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques.

## Introduction

Recent success in the area of Machine Learning (ML) for Natural Language Processing (NLP) has been largely credited to the availability of enormous training datasets and computing power to train complex computational models~\cite{halevy2009unreasonable}. Complex NLP tasks such as statistical machine translation and speech recognition have greatly benefited from the Web-scale unlabeled data that is freely available for consumption by learning systems such as deep neural nets. However, many traditional research problems related to NLP, such as part-of-speech tagging and named entity recognition (NER), require labeled or human-annotated data, but the creation of such datasets is expensive in terms of the human effort required. In spite of early assertion of the unreasonable effectiveness of data (i.e., data alone is sufficient), there is an increasing recognition for utilizing knowledge to solve complex AI problems. Even though knowledge base creation and curation is non-trivial, it can significantly improve result quality, reliability, and coverage. A number of AI experts, including Yoav Shoham~\cite{shoham2015knowledge}, Oren Etzioni, and Pedro Domingos~\cite{domingos2012few,domingos2015master}, have talked about this in recent years. In fact, codification and exploitation of declarative knowledge can be both feasible and beneficial in situations where there is not enough data or adequate methodology to learn the nuances associated with the concepts and their relationships.

The value of domain/world knowledge in solving complex problems was recognized much earlier[43]. These efforts were centered around language understanding. Hence, the major focus was towards representing linguistic knowledge. The most popular artifacts of these efforts are FrameNet[29] and WordNet[22], which were developed by realizing the ideas of frame semantics[11] and lexical-semantic relations[6], respectively. Both these resources have been used extensively by the NLP research community to understand the semantics of natural language documents.

The building and utilization of the knowledge bases took a major leap with the advent of the Semantic Web in the early 2000s. For example, it was the key to the first patent on Semantic Web and a commercial semantic search/browsing and personalization engine over 15 years ago[33], where knowledge in multiple domains complemented ML techniques for information extraction (NER, semantic annotation) and building intelligent applications\footnote{\url{http://j.mp/15yrsSS}}. Major efforts in the Semantic Web community have produced large, cross-domain (e.g., DBpedia, Yago, Freebase, Google Knowledge Graph) and domain specific (e.g., Gene Ontology, MusicBrainz, UMLS) knowledge bases in recent years which have served as the foundation for the intelligent applications discussed next.

The value of these knowledge bases has been demonstrated for determining semantic similarity [20,42], question answering [30], ontology alignment [14], and word sense disambiguation (WSD) [21], as well as major practical AI services, including Apple's Siri, Google's Semantic Search, and IBM's Watson. For example, Siri relies on knowledge extracted from reputable online resources to answer queries on restaurant searches, movie suggestions, nearby events, etc. In fact, question answering, which is the core competency of Siri, was built by partnering with Semantic Web and Semantic Search service providers who extensively utilize knowledge bases in their applications\footnote{\url{https://en.wikipedia.org/wiki/Siri}}. The Jeopardy version of IBM Watson uses semi-structured and structured knowledge bases such as DBpedia, Yago, and WordNet to strengthen the evidence and answer sources to fuel its DeepQA architecture[10]. A recent study [19] has shown that Google search results can be negatively affected when it does not have access to Wikipedia. Google Semantic Search is fueled by Google Knowledge Graph\footnote{\url{http://bit.ly/22xUjZ6}}, which is also used to enrich search results similar to what the Taalee/Semagix semantic search engine did 15 years ago\footnote{\url{https://goo.gl/A54hno}}[33,34].

While knowledge bases are used in an auxiliary manner in the above scenarios, we argue that they have a major role to play in understanding real-world data. Real-world data has a greater complexity that has yet to be fully appreciated and supported by automated systems. This complexity emerges from various dimensions. Human communication has added many constructs to language that help people better organize knowledge and communicate effectively and concisely. However, current information extraction solutions fall short in processing several implicit constructs and information that is readily accessible to humans. One source of such complexity is our ability to express ideas, facts, and opinions in an implicit manner. For example, the sentence \textit{The patient showed accumulation of fluid in his extremities, but respirations were unlabored and there were no use of accessory muscles} refers to the clinical conditions of shortness of breath and edema, which would be understood by a clinician. However, the sentence does not contain names of these clinical conditions -- rather it contains descriptions that imply the two conditions. Current literature on entity extraction has not paid much attention to implicit entities [28].

Another complexity in real-world scenarios and use cases is data heterogeneity due to their multimodal nature. There is an increasing availability of physical (including sensor/IoT), cyber, and social data that are related to events and experiences of human interest [31]. For example, in our personalized digital health application for managing asthma in children\footnote{\url{http://bit.ly/kAsthma}}, we use numeric data from sensors for measuring a patient's physiology (e.g., exhaled nitric oxide) and immediate surroundings (e.g., volatile organic compounds, particulate matter, temperature, humidity), collect data from the Web for the local area (e.g., air quality, pollen, weather), and extract textual data from social media (i.e., tweets and web forum data relevant to asthma) [1]. Each of these modalities provides complementary information that is helpful in evaluating a hypothesis provided by a clinician and also helps in disease management. We can also relate anomalies in the sensor readings (such as spirometer) to asthma symptoms and potential treatments (such as taking rescue medication). Thus, understanding a patient's health and well-being requires integrating and interpreting multimodal data and gleaning insights to provide reliable situational awareness and decisions. Knowledge bases play a critical role in establishing relationships between multiple data streams of diverse modalities, disease characteristics and treatments, and in transcending multiple abstraction levels [32]. For instance, we can relate the asthma severity level of a patient, measured exhaled nitric oxide, relevant environmental triggers, and prescribed asthma medications to one another to come up with personalized actionable insights and decisions.

Knowledge bases can come in handy when there is not enough hand-labeled data for supervised learning. For example, emoji sense disambiguation, which is the ability to identify the meaning of an emoji in the context of a message in a computational manner [40,41], is a problem that can be solved using supervised and knowledge-based approaches. However, there is no hand-labeled emoji sense dataset in existence that can be used to solve this problem using supervised learning algorithms. One reason for this could be that emoji have only recently become popular, despite having been first introduced in the late 1990s [40]. We have developed a comprehensive emoji sense knowledge base called EmojiNet [40,41] by automatically extracting emoji senses from open web resources and integrating them with BabelNet. Using EmojiNet as a sense inventory, we have demonstrated that the emoji sense disambiguation problem can be solved with carefully designed knowledge bases, obtaining promising results [41].

In this paper, we argue that careful exploitation of knowledge can greatly enhance the current ability of (big) data processing. At Kno.e.sis, we have dealt with several complex situations where:

1. Large quantities of hand-labeled data required for unsupervised (self-taught) techniques to work well are not available, or the annotation effort is significant.
2. The text to be recognized is complex (i.e., beyond simple entity - person/location/organization), requiring novel techniques for dealing with complex/compound entities [27], implicit entities [25,26], and subjectivity (emotions, intention) [13,38].
3. Multimodal data -- numeric, textual and image, qualitative and quantitative, certain and uncertain -- are available naturally[1,2,4,39].

Our recent efforts have centered around exploiting different kinds of knowledge bases and using semantic techniques to complement and enhance ML, statistical techniques, and NLP. Our ideas are inspired by the human brain's ability to learn and generalize knowledge from a small amount of data (i.e., humans do not need to examine tens of thousands of cat faces to recognize the next unseen cat shown to them), analyze situations by simultaneously and synergistically exploiting multimodal data streams, and understand more complex and nuanced aspects of content, especially by knowing (through common-sense knowledge) semantics/identity preserving transformations.

## Challenges in creating and using knowledge bases

The last decade saw an increasing use of background knowledge for solving diverse problems. While applications such as searching, browsing, and question answering can use large, publicly available knowledge bases in their current form, others like movie recommendation, biomedical knowledge discovery, and clinical data interpretation are challenged by the limitations discussed below.

Lack of organization of knowledge bases: Proper organization of knowledge bases has not kept pace with their rapid growth, both in terms of variety and size. Users find it increasingly difficult to find relevant knowledge bases or relevant portions of a large knowledge base for use in domain-specific applications (e.g., movie, clinical, biomedical). This highlights the need to identify and select relevant knowledge bases such as the linked open data cloud, and extract the relevant portion of the knowledge from broad coverage sources such as Wikipedia and DBpedia. We are working on automatically indexing the domains of the knowledge bases [17] and exploiting the semantics of the entities and their relationships to select relevant portions of a knowledge base [18].

Gaps in represented knowledge: The existing knowledge bases can be incomplete with respect to a task at hand. For example, applications such as computer assisted coding (CAC) and clinical document improvement (CDI) require comprehensive knowledge about a particular domain (e.g., cardiology, oncology)\footnote{\url{https://goo.gl/nXDY8x}}. We observe that although the existing medical knowledge bases (e.g., Unified Medical Language System (UMLS)) are rich in taxonomical relationships, they lack non-taxonomical relationships among clinical entities. We have developed data-driven algorithms that use real-world clinical data (such as EMRs) to discover missing relationships between clinical entities in existing knowledge bases, and then have these validated by a domain expert in the loop [24]. Yet another challenge is creating personalized knowledge bases for specific tasks. For example, in [35], personal knowledge graphs are created based on the content consumed by a user, taking into account the dynamically changing vocabulary, and this is applied to improve subsequent filtering of relevant content.

Inefficient metadata representation and reasoning techniques: The scope of what is captured in the knowledge bases is rapidly expanding, and involves capturing more subtle aspects such as subjectivity (intention, emotions, sentiments), spatial and temporal information, and provenance. Traditional triple-based representation languages developed by the Semantic Web community (e.g., RDF, OWL) are unsuitable for capturing such metadata due to their limited expressivity. For example, representation of spatio-temporal context or uncertainty associated with a triple is \textit{ad hoc}, inefficient, and lacks semantic integration for formal reasoning. These limitations and requirements are well-recognized by the Semantic Web community, with some recent promising research to address them. For example, the singleton-property based representation [23] adds the ability to make statements about a triple (i.e., to express the context of a triple), and probabilistic soft logic [15] adds the ability to associate a probability value with a triple and reason over such triples. It will be exciting to see applications that exploit such enhanced hybrid knowledge representation models and perform 'human-like' reasoning over them.
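To make the singleton-property idea more concrete, the following is a minimal sketch, not the representation defined in [23]. It assumes that plain Python tuples can stand in for RDF triples and that a unique property name can be minted by appending a counter. The entity names (`PatientA`, `Wheezing`), the property `hasSymptom`, the metadata keys, and both helper functions are hypothetical and serve only as illustration.

```python
# Illustrative sketch only: plain tuples stand in for RDF triples.
# Every entity, property, and metadata key below is a hypothetical example.

SINGLETON_PROPERTY_OF = "singletonPropertyOf"


def assert_with_context(triples, subject, predicate, obj, context, counter):
    """Add one statement plus metadata about that specific statement.

    A unique property (e.g. "hasSymptom#1") is minted for this single
    statement and linked back to the generic property it instantiates.
    """
    singleton = f"{predicate}#{counter}"
    triples.append((subject, singleton, obj))
    triples.append((singleton, SINGLETON_PROPERTY_OF, predicate))
    for key, value in context.items():
        triples.append((singleton, key, value))
    return singleton


def statements_about(triples, predicate):
    """Yield (subject, object, metadata) for each singleton of `predicate`."""
    singletons = {s for s, p, o in triples
                  if p == SINGLETON_PROPERTY_OF and o == predicate}
    for s, p, o in triples:
        if p in singletons:
            meta = {mp: mo for ms, mp, mo in triples
                    if ms == p and mp != SINGLETON_PROPERTY_OF}
            yield s, o, meta


kb = []
assert_with_context(kb, "PatientA", "hasSymptom", "Wheezing",
                    {"observedOn": "2016-05-01", "source": "SensorStream"},
                    counter=1)
assert_with_context(kb, "PatientA", "hasSymptom", "Cough",
                    {"observedOn": "2016-05-03", "source": "ClinicNote"},
                    counter=2)

results = list(statements_about(kb, "hasSymptom"))
for subject, obj, meta in results:
    print(subject, "hasSymptom", obj, meta)
```

Because the context is attached to the per-statement property rather than to the subject or object, two statements that share a generic predicate can carry different temporal or provenance metadata without changing the triple shape.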

Next, we discuss several applications that utilize knowledge bases and multimodal data to circumvent or overcome some of the aforementioned challenges due to insufficient manually-created knowledge.

Application 1: Emoji sense disambiguation. With the rise of social media, emoji have become extremely popular in online communication. People are using emoji as a new language on social media to add color and whimsy to their messages. Without rigid semantics attached to them, emoji symbols take on different meanings based on the context of a message. This has resulted in ambiguity in emoji use (see Figure~\ref{emojiensesexamples}). Only recently have there been efforts to extend NLP techniques used for machine translation, word sense disambiguation, and search to the realm of emoji [41]. The ability to automatically process, derive meaning from, and interpret text fused with emoji will be essential as society embraces emoji as a standard form of online communication. Having access to knowledge bases that are specifically designed to capture emoji meaning can play a vital role in representing, contextually disambiguating, and converting pictorial forms of emoji into text, thereby leveraging and generalizing NLP techniques for processing a richer medium of communication.

As a step towards building machines that can understand emoji, we have developed EmojiNet[40,41], the first machine readable sense inventory for emoji. It links Unicode emoji representations to their English meanings extracted from the Web, enabling systems to link emoji with their context-specific meanings. EmojiNet is constructed by integrating multiple emoji resources with BabelNet, which is the most comprehensive multilingual sense inventory available to date. For example, for the emoji 'face with tears of joy' \emoji{1F602}, EmojiNet lists 14 different senses, ranging from happy to sad. An application designed to disambiguate emoji senses can use the senses provided by EmojiNet to automatically learn message contexts where a particular emoji sense could appear. Emoji sense disambiguation could improve the research on sentiment and emotion analysis. For example, consider the emoji \emoji{1F602}, which can take the meanings \textit{happy} and \textit{sad} based on the context in which it has been used. Current sentiment analysis applications do not differentiate between these two meanings when they process \emoji{1F602}. However, finding the meanings of \emoji{1F602} by emoji sense disambiguation techniques [41] can improve sentiment prediction. Emoji similarity calculation is another task that could benefit from knowledge bases and multimodal data analysis. Similar to computing similarity between words, we can calculate the similarity between emoji characters. We have demonstrated how EmojiNet can be utilized to solve the problem of emoji similarity [42]. Specifically, we have shown that emoji similarity measures based on the rich emoji meanings available in EmojiNet can outperform conventional emoji similarity measures based on distributional semantic models and also help to improve applications such as sentiment analysis[42].
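As a rough illustration of how a sense inventory can drive both tasks described above, the sketch below uses a simple word-overlap heuristic (in the spirit of Lesk-style word sense disambiguation) and a Jaccard measure over sense-definition words. This is an assumption made for illustration and is not claimed to be the method used in [41] or [42]. The inventory `TOY_SENSES`, its sense words, and the functions `tokens`, `disambiguate`, and `similarity` are all hypothetical; none of this is EmojiNet data or the EmojiNet API.

```python
# Illustrative sketch only. TOY_SENSES is a made-up, tiny sense inventory;
# it is NOT EmojiNet data and does not use the EmojiNet API.
import re

TOY_SENSES = {
    "\U0001F602": {  # face with tears of joy
        "happy": "laugh funny joke joy hilarious fun",
        "sad": "cry tears upset hurt miss lonely",
    },
    "\U0001F62D": {  # loudly crying face
        "sad": "cry tears upset sob hurt lonely",
    },
}


def tokens(text):
    """Return the set of lowercase alphabetic word tokens in `text`."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(emoji, message, inventory=TOY_SENSES):
    """Pick the sense whose definition words overlap the message the most."""
    context = tokens(message)
    scores = {sense: len(context & tokens(definition))
              for sense, definition in inventory[emoji].items()}
    return max(scores, key=scores.get), scores


def similarity(emoji_a, emoji_b, inventory=TOY_SENSES):
    """Jaccard similarity over the pooled sense-definition words."""
    words_a = tokens(" ".join(inventory[emoji_a].values()))
    words_b = tokens(" ".join(inventory[emoji_b].values()))
    return len(words_a & words_b) / len(words_a | words_b)


joy = "\U0001F602"
sense_1, _ = disambiguate(joy, "that joke was so funny I could not stop")
sense_2, _ = disambiguate(joy, "I miss you so much, I cannot stop the tears")
sim = similarity(joy, "\U0001F62D")
print(sense_1, sense_2, round(sim, 3))
```

A downstream sentiment classifier could then condition on the selected sense instead of treating every occurrence of the emoji identically. With a real inventory, the definitions would be far richer than the handful of words assumed here.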