DAO is the acronym for Drug Abuse Ontology. The PREDOSE research team at the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) has developed preliminary techniques that automatically extract semantic information from Web-based data. This information includes entities, generic sentiment expressions, relationships and triples. To perform entity identification, the research team relies on a combination of lexical and semantics-based techniques, based on a manually curated Drug Abuse Ontology (DAO, pronounced "dow"), which is the first ontology for prescription drug abuse.
OntoGraph of DAO
Hierarchical structure of DAO
DAO schema in the PREDOSE Research Plan
Method: Using custom web crawlers that scrape user-generated content (UGC) from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. The annotation scheme is modeled in the DAO and includes domain-specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, and routes of administration. The DAO is also used to help recognize three types of data: (1) entities, (2) relationships and (3) triples. PREDOSE then uses a combination of lexical and semantics-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO.
Automatic Qualitative Coding
This is the most challenging aspect of PREDOSE. The aim is to use various information extraction techniques to extract, from web forums, semantic information considered semantically equivalent to qualitative codes. The Drug Abuse Ontology (DAO) was manually created to model the prescription drug abuse domain and is the first ontology on drug abuse in the literature. The DAO is used to facilitate search, and it also serves as the annotation scheme for entity, relationship and triple extraction.
Information Extraction Layer
The information extraction layer of the PREDOSE platform (Fig. 4) utilizes the DAO to extract entities, relationships, and triples. The Automatic Qualitative Coding Module of PREDOSE consists of the following five components: (1) the Drug Abuse Ontology; (2) an entity identification component; (3) a relationship identification component; (4) a triple extraction component; and (5) a sentiment extraction component.
The Drug Abuse Ontology (DAO) is a formal representation of concepts, and the relationships between them, for the prescription drug abuse domain. Fig. 4 (left) shows that the DAO consists of (1) a schema and (2) an instance base of assertions. The schema contains classes, type definitions and the relationships permissible between them, defined manually during the ontology creation process. Such definitions serve as the basis for the annotation scheme in PREDOSE and include both hierarchical and associative relationships. A hierarchical relationship (property) between two classes expresses membership between them. For example, Drugs occur in different Classes such as Cannabinoids, Buprenorphine, Opioids, Sedatives, and Stimulants. A Cannabinoid is a type of Drug, and the isA hierarchical property expresses the relationship between them. Associative relationships are non-hierarchical relationships between classes. For instance, the statement "<Suboxone_Injection CAUSES bad headache>" expresses a causality association between a Drug and a Side Effect and does not imply membership of one concept in the other. The current DAO contains 43 classes and 20 properties. Although it is a relatively shallow representation, the DAO is very precise, given its creation by domain experts on the Center for Interventions, Treatment, and Addictions Research (CITAR) team. The DAO is also enriched with links to concepts in external ontologies, through a very careful, manually supervised process. Among the 43 DAO classes, 11 have been mapped to URIs in the DrugBank, Freebase, DBpedia and Cyc ontologies, using the sameAs property. The DAO also contains 16 object properties, which are properties that associate class instances with one another. In addition to object properties, the DAO models two data type properties (has_value and has_slang_term). Data type properties associate a concept with a literal value.
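The distinction between hierarchical and associative relationships can be illustrated with a small Python sketch. This is not the DAO itself, only a simplified stand-in under stated assumptions: the dictionaries `IS_A`, `INSTANCE_OF` and `SCHEMA`, the helper functions, the class name `SideEffect`, and the assignment of Suboxone_Injection to the Buprenorphine class are all hypothetical choices made for illustration.

```python
# Simplified, hypothetical stand-in for a fragment of the DAO.
# Only the class names and the CAUSES example come from the text;
# the data layout and the instance-to-class assignments are assumptions.

# Hierarchical (isA) relationships: class -> parent class.
IS_A = {
    "Cannabinoid": "Drug",
    "Buprenorphine": "Drug",
    "Opioid": "Drug",
    "Sedative": "Drug",
    "Stimulant": "Drug",
}

# Instance base: instance -> class (assumed assignments).
INSTANCE_OF = {
    "Suboxone_Injection": "Buprenorphine",
    "bad headache": "SideEffect",
}

# Associative relationships permitted by the schema:
# property -> (class of the subject, class of the object).
SCHEMA = {
    "CAUSES": ("Drug", "SideEffect"),
}


def is_subclass_of(cls, ancestor):
    """Follow isA links upward; a class counts as a subclass of itself."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = IS_A.get(cls)
    return False


def is_valid_triple(subject, prop, obj):
    """Check that a <subject prop obj> triple is permitted by the schema."""
    if prop not in SCHEMA:
        return False
    subject_class, object_class = SCHEMA[prop]
    s_cls = INSTANCE_OF.get(subject)
    o_cls = INSTANCE_OF.get(obj)
    if s_cls is None or o_cls is None:
        return False
    return is_subclass_of(s_cls, subject_class) and is_subclass_of(o_cls, object_class)


if __name__ == "__main__":
    print(is_subclass_of("Cannabinoid", "Drug"))
    print(is_valid_triple("Suboxone_Injection", "CAUSES", "bad headache"))
```

In this sketch the isA links are used only to decide class membership, while the CAUSES entry constrains which pairs of classes may be associated, mirroring the point that an associative relationship does not imply membership.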
For example, the property has_slang_term associates a drug with a slang term. The ability to accurately and formally represent slang term-to-drug associations is important for two reasons. The first is that accurate slang term mappings will positively impact search and retrieval of relevant documents when performing the content analysis. Search methods devoid of such domain knowledge are at a disadvantage and may be unable to retrieve relevant documents containing only slang term mentions but no standard references to a given drug.
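The effect of slang term mappings on retrieval can be sketched in Python as a simple query expansion step. This is an illustration only, not the PREDOSE search implementation: the `HAS_SLANG_TERM` table, the functions `expand_query` and `search`, the example slang terms and the sample posts are all assumptions, not entries taken from the DAO.

```python
import re

# Hypothetical slang mappings in the spirit of the has_slang_term property.
# These example terms are illustrative and are not drawn from the DAO.
HAS_SLANG_TERM = {
    "Suboxone": ["subs", "bupe"],
    "OxyContin": ["oxy"],
}


def expand_query(drug):
    """Return the standard drug name plus any slang terms mapped to it."""
    return [drug] + HAS_SLANG_TERM.get(drug, [])


def search(posts, drug):
    """Return the posts that mention the drug by name or by a mapped slang term."""
    terms = expand_query(drug)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [post for post in posts if pattern.search(post)]


if __name__ == "__main__":
    posts = [
        "took some bupe last night",
        "Suboxone helped with withdrawal",
        "no drugs mentioned here",
    ]
    for hit in search(posts, "Suboxone"):
        print(hit)
```

A search restricted to the standard name would miss the first post, which refers to the drug only by slang; expanding the query with the mapped terms retrieves it.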
Various slang-term-to-drug mapping sources were also used, including numerous online dictionaries and resources such as DrugSlang, NIDA, NDCP and www.erowid.com. Altogether, 307 slang terms were collected, curated and added to the DAO. A total of 193 concepts contained slang term mappings. The DAO is, therefore, the result of a joint manual effort between domain experts at CITAR and computer scientists at Kno.e.sis. It was created and edited using the Protégé Ontology Editor.