Knowledge Will Propel Machine Understanding of Content

Abstract

Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to learning from massive amounts of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of the value of utilizing knowledge whenever it is available or can be created purposefully. In this article, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit that knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability to achieve deeper understanding and exploitation of multimodal data and the continued incorporation of knowledge into learning techniques.


Introduction

Recent success in the area of Machine Learning (ML) for Natural Language Processing (NLP) has been largely credited to the availability of enormous training datasets and the computing power to train complex computational models~\cite{halevy2009unreasonable}. Complex NLP tasks such as statistical machine translation and speech recognition have greatly benefited from the Web-scale unlabeled data that is freely available for consumption by learning systems such as deep neural nets. However, many traditional NLP problems, such as part-of-speech tagging and named entity recognition (NER), require labeled or human-annotated data, and creating such datasets is expensive in terms of the human effort required. In spite of early assertions of the unreasonable effectiveness of data (i.e., that data alone is sufficient), there is increasing recognition of the value of utilizing knowledge to solve complex AI problems. Even though knowledge base creation and curation are non-trivial, they can significantly improve result quality, reliability, and coverage. A number of AI experts, including Yoav Shoham~\cite{shoham2015knowledge}, Oren Etzioni, and Pedro Domingos~\cite{domingos2012few,domingos2015master}, have made this point in recent years. In fact, the codification and exploitation of declarative knowledge can be both feasible and beneficial in situations where there is not enough data or an adequate methodology to learn the nuances associated with concepts and their relationships.
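
To make the last point concrete, the following is a minimal sketch (not from the article) of how a small piece of declarative knowledge, here a hand-curated gazetteer mapping surface forms to entity types, can drive entity spotting with no labeled training data at all; the gazetteer entries and the spot_entities function are illustrative assumptions.

 # Minimal sketch: knowledge-based entity spotting using a hand-curated gazetteer.
 # No labeled training data is required; coverage comes entirely from the knowledge base.
 # The gazetteer contents below are illustrative assumptions, not taken from the article.
 gazetteer = {
     "parkinson's disease": "Disorder",
     "levodopa": "Drug",
     "dayton": "City",
 }
 
 def spot_entities(text, kb):
     """Return (surface form, entity type) pairs for every KB entry found in the text."""
     lowered = text.lower()
     return [(surface, etype) for surface, etype in kb.items() if surface in lowered]
 
 print(spot_entities("Levodopa is commonly prescribed for Parkinson's disease.", gazetteer))
 # [("parkinson's disease", 'Disorder'), ('levodopa', 'Drug')]

In practice, such a lookup would typically be combined with a statistical tagger, with the knowledge base supplying the long tail of domain-specific entities that the learned model has never seen in its training data.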