Difference between revisions of "PREDOSE"

(Complete outline created)

Revision as of 03:15, 19 July 2011

Introduction

The non-medical use of pharmaceutical opioids has been identified as one of the fastest growing forms of drug abuse in the U.S. Furthermore, significant increases in the illicit use of pharmaceutical opioids have expanded the pathways to heroin addiction and resulted in escalating rates of accidental overdose deaths. To design effective and responsive prevention and policy measures, public health professionals require timely and reliable information on new and emerging drug trends. Although existing epidemiological data systems provide critically important information about drug abuse trends, they are often time-lagged. There is therefore a need for epidemiological sources that could complement existing drug trend monitoring systems and enhance their capacity for early identification of new and emerging trends. The World Wide Web (Web) has been identified as one of the leading data sources for detecting patterns and changes in the non-medical use of pharmaceutical and other illicit drugs. Many Web 2.0 empowered social platforms, including Web forums, provide venues for individuals to freely share their experiences, post questions, and offer comments about different drugs.


This project aims to address this critical need for relevant and timely information by pursuing two specific goals:

Goals
  1. To determine user knowledge, attitudes, and behavior related to the non-medical use of pharmaceutical opioids (namely buprenorphine) as discussed on Web-based forums
  2. To determine spatio-temporal trends and patterns in pharmaceutical opioid abuse as discussed on Web-based forums

Project Team

Principal Investigators: Raminta Daniulaityte, Amit P. Sheth
Co-Investigators: Robert Carlson, Russel Falck
Graduate Students: Delroy Cameron, Sujan Udayanga

Project Overview

Problem: Historically, qualitative research has been characterized by manual data collection, initiated through interactive interview sessions with an individual addict or a group of addicts. The audio-to-text transcribed interviews obtained from this process are then typically annotated by researchers or domain experts with the themes or topics that surfaced during the interview sessions. This process is called qualitative coding. Various tools, such as ... have been developed to facilitate this annotation process and to provide additional services such as search, retrieval, and data analysis. However, the manual effort required makes the interview-based approach very difficult to scale. Furthermore, to process the large volume and complexity of Web-based data effectively, the field needs a highly automated way of accessing and processing Web data.

Proposed Solution: Researchers at the Kno.e.sis Center at Wright State University have successfully applied Semantic Web, Machine Learning, and Natural Language Processing techniques to automatically extract knowledge from biomedical text. Substantial progress has also been made in using these and other techniques to understand content and identify social perceptions through metadata extraction and spatio-temporal and thematic analysis (broadly termed semantic analysis) of informal text on MySpace, Facebook, and Twitter. These information processing techniques, with appropriate adaptations, can now be applied to the needs of public health and drug abuse research.

Research Plan

The overall research plan has three distinct stages. The first stage, Data Collection, is intended as an alternative to manually conducted interviews. It operates under the assumption that the kind of information gathered in interview sessions is also expressed in online forums, so data crawling software can collect qualitative data from web sources in place of laborious interviews. The second stage is Automatic Qualitative Coding. Through entity identification, relationship identification, and complete triple extraction, this process aims to capture the semantics of the information expressed in the web forum data with acceptable levels of precision and recall. A range of techniques, including pattern-based, statistical (probabilistic), and semantics-based analysis, will play a critical role in this stage. The final stage is Data Analysis & Interpretation of the RDF data (i.e., the Drug Abuse Ontology, DAO) collected in stage 2, using existing semantic web tools at Kno.e.sis or new tools to be developed where appropriate.

Stage 1: Data Collection

  • Web Site Selection
  • Web Crawler Technology
  • Data Cleaning
  • Informal Text Database
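
The Data Cleaning step can be illustrated with a minimal sketch. It assumes that crawled forum posts arrive as raw HTML fragments and that cleaning means stripping markup and normalizing whitespace before the text is stored. The function name `clean_post` and the choice of Python's standard-library parser are assumptions made for illustration, not the project's actual tooling.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("br", "p", "div"):
            # Treat block-level breaks as word boundaries.
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_post(raw_html: str) -> str:
    """Hypothetical cleaning step: HTML fragment in, plain text out."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace left behind by removed markup.
    return re.sub(r"\s+", " ", text).strip()
```

A real cleaning stage would likely also handle quoted replies, signatures, and forum-specific markup, which this sketch ignores.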

Stage 2: Automatic Qualitative Coding

  1. Information Extraction
    • Entity Identification
    • Relationship Extraction
    • Triple Extraction
  2. Drug Abuse Ontology

Fig. 1: Research Plan
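
The information extraction steps of Stage 2 (entity identification, relationship extraction, triple extraction) can be sketched in their simplest, pattern-based form. Everything named here is hypothetical: the lexicon entries, the relation cue phrases, the predicate labels, and the function names are made up for illustration and are not taken from the Drug Abuse Ontology or the project's code. The statistical and semantics-based techniques mentioned in the research plan are not shown.

```python
import re
from typing import List, Tuple

# Hypothetical lexicon mapping surface forms (including slang) to concept labels.
ENTITY_LEXICON = {
    "buprenorphine": "Buprenorphine",
    "bupe": "Buprenorphine",
    "suboxone": "Suboxone",
    "nausea": "Nausea",
    "headache": "Headache",
    "withdrawal": "Withdrawal",
}

# Hypothetical cue phrases mapped to predicate labels.
RELATION_PATTERNS = [
    (re.compile(r"\b(?:gave me|gives me|caused|causes)\b"), "causes"),
    (re.compile(r"\b(?:helped with|helps with|relieved|relieves)\b"), "relieves"),
]

Triple = Tuple[str, str, str]


def find_entities(sentence: str) -> List[Tuple[int, int, str]]:
    """Entity identification: (start, end, concept) spans sorted by position."""
    lowered = sentence.lower()
    spans = []
    for surface, concept in ENTITY_LEXICON.items():
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", lowered):
            spans.append((m.start(), m.end(), concept))
    return sorted(spans)


def extract_triples(sentence: str) -> List[Triple]:
    """Triple extraction: link the nearest entity on each side of a relation cue."""
    lowered = sentence.lower()
    entities = find_entities(sentence)
    triples = []
    for pattern, predicate in RELATION_PATTERNS:
        for m in pattern.finditer(lowered):
            left = [e for e in entities if e[1] <= m.start()]
            right = [e for e in entities if e[0] >= m.end()]
            if left and right:
                triples.append((left[-1][2], predicate, right[0][2]))
    return triples
```

For example, `extract_triples("The bupe gave me a headache.")` yields a single (subject, predicate, object) triple linking the concept for "bupe" to the concept for "headache". Triples of this shape are what would then be serialized as RDF.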

Stage 3: Data Analysis & Interpretation

  • Semantic Web Tools
  • Spatio-Temporal-Thematic Analysis
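
As a rough sketch of what spatio-temporal-thematic analysis could look like at its simplest, the following counts mentions of one concept (theme) per region (space) and month (time). The `Mention` record and its fields are assumptions made for illustration. The source does not say how location or time would be attached to extracted data.

```python
from collections import Counter
from datetime import date
from typing import Iterable, NamedTuple, Tuple


class Mention(NamedTuple):
    concept: str  # thematic dimension, e.g. an ontology concept label
    region: str   # spatial dimension (hypothetical field)
    posted: date  # temporal dimension: date of the forum post


def monthly_counts(mentions: Iterable[Mention], concept: str) -> Counter:
    """Count mentions of `concept`, keyed by (region, 'YYYY-MM')."""
    counts: Counter = Counter()
    for m in mentions:
        if m.concept == concept:
            key: Tuple[str, str] = (m.region, m.posted.strftime("%Y-%m"))
            counts[key] += 1
    return counts
```

Comparing these counts across consecutive months for a region is one simple way to surface a change in how often a drug is discussed.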


Funding

This project is sponsored by National Institutes of Health (NIH) Grant Award No. R21 DA030571-01A1, titled “A Study of Social Web Data on Buprenorphine Abuse using Semantic Web Technology,” awarded to the Ohio Center for Excellence in Knowledge-enabled Computing (Kno.e.sis) and the Center for Interventions, Treatment and Addictions Research (CITAR). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the investigator(s) and do not necessarily reflect the views of the National Institutes of Health.

Contact: Delroy Cameron