Revision as of 04:29, 6 December 2011

PREDOSE is the name of the NIH-funded PREscription Drug abuse Online-Surveillance and Epidemiology project (July 2011 - July 2013), which is an inter-disciplinary collaborative project between the Ohio Center for Excellence in Knowledge-enabled Computing (Kno.e.sis) and the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. The overall aim of PREDOSE is to develop automated techniques for web forum data analysis related to the illicit use of pharmaceutical opioids.

Overview

The non-medical use of pharmaceutical opioids has been identified as one of the fastest growing forms of drug abuse in the U.S. This increase has had two major effects: 1) it has expanded the pathways to heroin addiction, and 2) it has resulted in escalating rates of accidental overdose deaths. In response, the White House Office of National Drug Control Policy (ONDCP) has recently launched the Epidemic: Responding to America’s Prescription Drug Abuse Crisis initiative to curb prescription drug abuse through education and drug monitoring programs, among other approaches.

To design effective and responsive prevention and policy measures, public health professionals require timely and reliable information on new and emerging drug trends. Although existing epidemiological data systems provide critically important information about drug abuse trends, they are often time-lagged. There is therefore a need for epidemiological sources that can complement existing drug trend monitoring systems and enhance their capacity for early identification of new and emerging trends. The World Wide Web (Web) has been identified as one of the leading data sources for detecting patterns and changes in the non-medical use of pharmaceutical and other illicit drugs. Many Web 2.0 social platforms, including Web forums, provide venues for individuals to freely share their experiences, post questions, and offer comments about different drugs. PREDOSE aims to leverage web forum data to provide timely information on new and emerging trends in the non-medical use of pharmaceutical opioids.

The project has two (2) specific aims:

Goals

To determine user knowledge, attitudes and behavior related to the non-medical use of pharmaceutical opioids (namely buprenorphine) as discussed on Web-based forums
To determine spatio-temporal trends and patterns in pharmaceutical opioid abuse as discussed on Web-based forums

Project Team

Principal Investigators: Raminta Daniulaityte, Amit P. Sheth
Co-Investigators: Robert Carlson, Russel Falck
Graduate Students: Delroy Cameron, Lu Chen
Past Members: Kaustav Saha (Undergraduate Summer Intern 2009), Sujan Udayanga

Project Overview

Problem: Historically, qualitative research in drug abuse intervention programs has been characterized by manual data collection, initiated through interactive interview sessions with individuals or groups of drug users. The transcribed interviews (audio-to-text) obtained from this process are typically annotated by researchers with themes that emerged from the interview sessions. This process is called qualitative coding. Various tools, such as NVivo, have been developed to facilitate this annotation process. Such tools commonly provide researchers with additional functionality, including search, retrieval and various levels of data analysis. However, the intensive manual effort required makes the interview-based approach difficult to scale. Furthermore, to effectively process the large volume of heterogeneous Web-based data, the field requires a highly automated way of accessing and processing Web data.

Proposed Solution: Researchers at the Kno.e.sis Center at Wright State University have successfully applied Semantic Web, Machine Learning and Natural Language Processing techniques to automatically extract knowledge from structured biomedical text. Substantial progress has also been made in using these (and other) techniques to understand the content and identify social perceptions in informal text from MySpace, Facebook, and Twitter, through metadata extraction and spatio-temporal and thematic analysis (i.e., semantic analysis). These information processing techniques, with appropriate adaptations, can now be applied to the needs of public health and drug abuse research on conversational and informal text, such as that found in web forums.

Research Plan

The overall research plan has three (3) distinct stages:

Data Collection: is an intended alternative to manually conducted interviews. It operates under the assumption that information similar to that gathered from interview sessions can be found in online forums. Data crawling software can therefore be used to collect such data from web sources, reducing the laborious reliance on interviews as the sole source of qualitative data.
Automatic Qualitative Coding: is the process of automating human-generated qualitative codes, mainly through entity identification, relationship identification and complete triple extraction. This process aims at capturing the semantics of information expressed in the web forums with sufficient accuracy to enable subsequent analysis. The complete range of techniques for triple extraction includes rule-based, pattern-based, statistical/probabilistic and semantics-based analysis, all of which will play a critical role in this phase.
Data Analysis & Interpretation: is the final stage of the project. The resulting RDF data (i.e., the Drug Abuse Ontology, DAO) produced in stage 2 will be analyzed using existing Semantic Web tools at Kno.e.sis, or new tools to be developed where appropriate. Tasks such as search, automatic summarization, reasoning and discovery are anticipated outcomes of this phase.
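As an illustration of the RDF output mentioned in the stages above, the following minimal Python sketch serializes extracted (subject, predicate, object) triples as N-Triples lines. The namespace `http://example.org/dao#`, the `Triple` type, and the term names (`post42`, `mentionsDrug`, and so on) are placeholders invented for this example; they are not the actual DAO vocabulary or the project's code.

```python
from typing import Iterable, NamedTuple

# Placeholder namespace for illustration only; not the real DAO namespace.
DAO = "http://example.org/dao#"


class Triple(NamedTuple):
    subject: str             # local name of the subject resource
    predicate: str           # local name of the property
    obj: str                 # local name of a resource, or literal text
    is_literal: bool = False


def iri(local_name: str) -> str:
    """Wrap a local name in the placeholder namespace as an N-Triples IRI."""
    return f"<{DAO}{local_name}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples, one N-Triples statement per line."""
    lines = []
    for t in triples:
        obj = literal(t.obj) if t.is_literal else iri(t.obj)
        lines.append(f"{iri(t.subject)} {iri(t.predicate)} {obj} .")
    return "\n".join(lines)


example = [
    Triple("post42", "mentionsDrug", "Buprenorphine"),
    Triple("post42", "hasText", 'took "bupe" today', is_literal=True),
]
print(to_ntriples(example))
```

In practice an RDF library would usually handle serialization; the hand-written version here only shows the shape of the data that the later analysis stage would consume.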

Fig1: Research Plan

The overall architecture of PREDOSE contains various sub-components. We discuss each of these in further detail below:

Stage 1: Data Collection

Web Site Selection: Web forums are selected for the study based on the following criteria: 1) they allow free discussion of psychoactive drug use; 2) they contain information on illicit pharmaceutical drug use; and 3) they are publicly accessible. Additionally, since it is important that this study collects relevant and timely information, such forums are also expected to be very active, both in terms of number of users and topics of discussion.
Web Crawling: Various popular tools exist for crawling and parsing web data (e.g., Apache Nutch, Jericho HTML Parser). Periodic crawling is necessary to update our databases with the most recent data published by the selected sources. Standardized web forum software somewhat alleviates the traditional problems involved in mining web data, since it gives forum sites a regular structure that our custom crawlers can exploit.
Data Cleaning: One of the most challenging problems in dealing with web data is decoding special HTML characters to obtain ASCII text and separating special characters from standard text.
Location Resolution: Collecting location data is important for spatio-temporal-thematic analysis. It would not be surprising if drug abuse practices for some specific drugs (e.g., heroin) vary widely across continents. The most anticipated variations are in drug mixtures. For example, combining heroin and cocaine may be common in one region, while the practice is entirely uncommon in another.
Informal Text Database: It is necessary to collect and store a wide selection of data for this study. Database tables include users, posts, source and location (city, state, country, continent, zip).
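The Data Cleaning step described above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function name `clean_post` is hypothetical, tags are removed with a simple regular expression rather than a full HTML parser, and characters with no ASCII equivalent are dropped.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")   # naive tag matcher; a real parser is more robust
WS_RE = re.compile(r"\s+")


def clean_post(raw_html: str) -> str:
    """Return plain ASCII text for one forum post body (hypothetical helper)."""
    # Remove markup first, so that an entity such as &lt; is not mistaken for a tag.
    text = TAG_RE.sub(" ", raw_html)
    # Decode HTML character references, e.g. &amp; -> & and &nbsp; -> U+00A0.
    text = html.unescape(text)
    # Decompose accented letters and compatibility characters (NFKD),
    # then drop whatever still cannot be represented in ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of whitespace left behind by removed tags.
    return WS_RE.sub(" ", text).strip()


print(clean_post("Took 8mg of <b>bupe</b> &amp; felt fine.<br>"))
```

Replacing tags with a space keeps words on either side of a tag from running together, at the cost of occasionally leaving a space before punctuation.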

Stage 2: Automatic Qualitative Coding

This is the most challenging aspect of this project. The aim is to use various information extraction techniques to extract triples from web forum data. Such extraction is to be undertaken in three steps:

Entity Identification: The most challenging aspect of entity identification from web forum data is the informal nature of the text. Web forum data is characterized by a proliferation of slang terms instead of standard references to known drugs. Fortunately, mappings from slang terms to known drugs are available online through various sources, such as NIDA, NDCP, Erowid and Urban Dictionary. We exploit these sources as a starting point for recognizing slang terms that reference known drugs. However, these mappings create the unfortunate side effect of ambiguity. For example, "Oxy" can refer to Oxycontin, Generic Oxycontin, Oxycontin OP or Oxycontin OC. Hence, techniques for slang term disambiguation become necessary. We have so far taken a probabilistic approach to entity disambiguation, since the terms surrounding an ambiguous slang term are often slang themselves and therefore do not help semantics-based approaches that leverage the ontology schema.
Relationship Extraction: We anticipate that the success of our entity extraction, along with the Drug Abuse Ontology schema, will directly impact relationship extraction. Alternative relationship extraction techniques have been covered elsewhere and will be adapted where appropriate.
Triple Extraction: Previous work in the lab has successfully implemented rule-based triple extraction (Ramakrishnan C, Mendes P. N., etc.) on structured biomedical literature. In other work, (Thomas C, Mehra P, etc.) have implemented a statistical/probabilistic approach to triple extraction, also on structured text. Such techniques are not likely to apply directly to informal web forum text. Hence, one approach is to translate our informal text into structured text once entities and relationships have been identified. Alternatively, stand-alone pattern-based, probabilistic and semantics-based techniques can be used to complete triple extraction, depending on the effectiveness of the entity and relationship extraction.
Drug Abuse Ontology (DAO): The final output of triple extraction is the population of the Drug Abuse Ontology instance base. We intend to maintain this instance base, together with the DAO schema, as a dynamic ontology created from user-generated content (UGC).
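To make the idea of probabilistic slang disambiguation concrete, the sketch below uses a small Naive Bayes classifier with add-one smoothing over the words that surround an ambiguous term. This is one possible technique chosen for illustration; the text above does not specify which probabilistic model the project uses. The slang dictionary, the class name `SlangDisambiguator`, and the labelled training contexts are all invented for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical slang dictionary: slang term -> candidate drug names.
SLANG_CANDIDATES = {
    "oxy": ["Oxycontin OP", "Oxycontin OC"],
    "bupe": ["Buprenorphine"],
}


class SlangDisambiguator:
    """Naive Bayes over context words, with add-one (Laplace) smoothing."""

    def __init__(self, candidates):
        self.candidates = candidates
        self.sense_counts = Counter()            # labelled examples per drug
        self.word_counts = defaultdict(Counter)  # drug -> context word counts
        self.vocab = set()

    def train(self, labelled):
        """labelled: iterable of (drug_name, list_of_context_words)."""
        for sense, context in labelled:
            self.sense_counts[sense] += 1
            for word in context:
                self.word_counts[sense][word] += 1
                self.vocab.add(word)

    def resolve(self, slang, context):
        """Return the most probable candidate drug for a slang term in context."""
        senses = self.candidates[slang.lower()]
        total = sum(self.sense_counts[s] for s in senses)
        best, best_score = None, -math.inf
        for sense in senses:
            # Smoothed log prior over the candidates for this slang term.
            score = math.log((self.sense_counts[sense] + 1) / (total + len(senses)))
            # Smoothed log likelihoods; the extra +1 reserves mass for unseen words.
            denom = sum(self.word_counts[sense].values()) + len(self.vocab) + 1
            for word in context:
                score += math.log((self.word_counts[sense][word] + 1) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best


# Invented training contexts, for illustration only.
model = SlangDisambiguator(SLANG_CANDIDATES)
model.train([
    ("Oxycontin OC", ["snort", "crush", "old"]),
    ("Oxycontin OC", ["crush", "rail"]),
    ("Oxycontin OP", ["gel", "new", "formula"]),
    ("Oxycontin OP", ["gel", "cant", "crush"]),
])
print(model.resolve("oxy", ["gel", "formula"]))
```

Because the classifier treats context words as opaque tokens, slang in the surrounding text can still contribute evidence, which matches the motivation given above for preferring a probabilistic approach over a schema-driven one.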

Stage 3: Data Analysis & Interpretation

Semantic Web Tools: Many tools for data analysis exist at Kno.e.sis. These include 1) Twitris, for spatio-temporal-thematic analysis; 2) Cuebee, for automatic complex query creation over RDF data; and 3) Scooner, for guided navigation of documents annotated with semantic metadata (entities or triples). Once the DAO has been created, the data can be loaded into any of these tools to support analysis. Alternatively, new tools can be created on demand.
Spatio-Temporal-Thematic Analysis: Discussion on the integration of web forum data into Twitris has already begun. Owing to the use of the slang term dictionary, qualitative researchers will be able to observe posts that contain easily identifiable and unambiguous references to known drugs in various locations.
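The kind of aggregation that underlies spatio-temporal-thematic analysis can be sketched as a simple count of resolved drug mentions per location and month. This is only an illustration of the idea and does not reflect how Twitris works internally; the `Mention` record, its fields, and the sample data are assumptions made for this example.

```python
from collections import Counter
from datetime import date
from typing import Iterable, NamedTuple


class Mention(NamedTuple):
    drug: str      # resolved drug name (theme)
    state: str     # resolved location of the poster (space)
    posted: date   # date of the forum post (time)


def trend_counts(mentions: Iterable[Mention]) -> Counter:
    """Count mentions per (drug, state, year-month) bucket."""
    counts = Counter()
    for m in mentions:
        counts[(m.drug, m.state, m.posted.strftime("%Y-%m"))] += 1
    return counts


# Invented sample data, for illustration only.
sample = [
    Mention("Buprenorphine", "OH", date(2011, 11, 3)),
    Mention("Buprenorphine", "OH", date(2011, 11, 20)),
    Mention("Buprenorphine", "CA", date(2011, 12, 1)),
]
for bucket, n in sorted(trend_counts(sample).items()):
    print(bucket, n)
```

Monthly buckets are an arbitrary choice here; a finer or coarser time window could be substituted by changing the format string.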

Funding

This project is sponsored by the National Institutes of Health (NIH) Grant No. R21 DA030571-01A1, titled “A Study of Social Web Data on Buprenorphine Abuse using Semantic Web Technology,” awarded to the Ohio Center for Excellence in Knowledge-enabled Computing (Kno.e.sis) and the Center for Interventions, Treatment and Addictions Research (CITAR). Any opinions, findings, conclusions or recommendations expressed in this material are those of the investigator(s) and do not necessarily reflect the views of the National Institutes of Health.

Contact: Delroy Cameron

