Difference between revisions of "Knowledge-Aware-Search"
Revision as of 04:45, 23 May 2013
This wiki contains supplementary materials for the research article currently under review entitled: Knowledge-Aware Search. This work was developed as part of the PREDOSE project, which is an inter-disciplinary project between the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) and the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. PREDOSE is the acronym for PREscription Drug abuse Online Surveillance and Epidemiology.
Note that for our Evaluation, all queries were run on THE SAME WEB FORUM SITE for Google, Hakia, DuckDuckGo and our System
Contents
- 1 Overview
- 2 People
- 3 Framework
- 4 Web Application
- 5 Evaluation
- 6 Funding
Overview
While semantic search has become a viable alternative to classical keyword-based search, a review of existing semantic search techniques and semantic search engines (Hakia, DuckDuckGo) reveals a considerable misalignment between the mental model of a user's information need and the knowledge model developed to meet such needs. These systems assume that assertions in ontologies provide sufficient coverage to appropriately interpret the user's information need. Hence, minimal (often inadequate) support is provided for interpreting additional elements (such as those that convey intensity, frequency, time intervals, etc.) that are not necessarily modeled in ontologies. In reality, many complex search scenarios require knowledge of such constructs and extend beyond the boundaries of ontologies altogether. In this work, we develop a context-free grammar that defines the query language interpretable by our knowledge-aware system. In an evaluation against the popular search engine Google, the popular semantic search engine Hakia, and the crowd-sourcing-based search engine DuckDuckGo, our Knowledge-Aware Search system outperformed all three in retrieving relevant documents for two complex information needs.
People
Delroy Cameron
Nishita Jaykumar
Gaurish Anand
Krishnaprasad Thirunarayan
Gary A. Smith
Amit P. Sheth
Swapnil Soni
Kera Z. Watkins
Framework
Our Knowledge-Aware Search Framework consists of three components: 1) a module for Query Interpretation; 2) analytics for Document Annotation; and 3) a Query Matcher.
Query Interpretation
Our module for query interpretation consists of 1) a Knowledge Model and 2) a Context-Free Grammar.
Knowledge Model
Context-Free Grammar and Query Language Specification
LEGEND
RULES | |
ONTOLOGY | |
LEXICON | |
LEXICO-ONTOLOGY | |
EXAMPLES |
Ubiquitous Nonterminals
EQUIVALENCE
> | greater than | more than | above | in excess of | slightly above | little more | bit more | slightly more | high | higher | highest | higher than | |
< | less than | lower than | below | in lack of | slightly below | little less | bit less | slightly less | |
= | exactly | precisely | |
>= | greater than or equal to | more than | above | in excess of | slightly above | little more | bit more | slightly more | exactly | precisely | high | higher | highest | higher than | |
<= | less than or equal to | less than | lower than | below | in lack of | slightly below | little less | bit less | slightly less | exactly | precisely |
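The EQUIVALENCE rows above can be read as a phrase-to-operator lookup. The following is a minimal sketch of that reading, not the project's implementation: the dictionary holds only a subset of the phrases listed above, and the name `operators_for` is hypothetical. Because some phrases (for example "more than") appear in more than one row, the lookup returns every candidate operator rather than a single one.

```python
# Subset of the EQUIVALENCE rows above, transcribed for illustration only.
EQUIVALENCE = {
    ">": ["greater than", "more than", "above", "in excess of", "higher than"],
    "<": ["less than", "lower than", "below", "in lack of"],
    "=": ["exactly", "precisely"],
    ">=": ["greater than or equal to", "more than", "above", "exactly", "precisely"],
    "<=": ["less than or equal to", "less than", "below", "exactly", "precisely"],
}


def operators_for(phrase):
    """Return a sorted list of every operator whose row contains the phrase."""
    # Normalise case and collapse repeated whitespace before the lookup.
    key = " ".join(phrase.lower().split())
    return sorted(op for op, phrases in EQUIVALENCE.items() if key in phrases)
```

How an ambiguous phrase is resolved to one operator is not specified on this page, so the sketch leaves that decision to the caller.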
DEFINITIONS
NUMERIC_AMOUNT | -999.99 | ... | 0 | ... | 3-4 | ... | 24/7 | 999 | |
WORDED_AMOUNT | one | once | two | twice | three | thrice | four | five | six | seven | eight | nine | ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen | twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety | hundred | |
NUMBER | 0 | 1 | ... | 100 | |
AMOUNT | NUMBER | WORDED_AMOUNT | |
RANGE | [0 - NUMBER] WORDS |
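To make the AMOUNT rule (NUMBER | WORDED_AMOUNT) concrete, here is a small sketch that maps a single token to an integer. It is an illustration under stated assumptions: the numeric value attached to each word is our own reading of the word list, only part of that list is included, and `parse_amount` is a hypothetical name.

```python
# Partial WORDED_AMOUNT lexicon; the integer values are an assumption,
# since the table above lists the words without values.
WORDED_AMOUNT = {
    "one": 1, "once": 1, "two": 2, "twice": 2, "three": 3, "thrice": 3,
    "four": 4, "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "ten": 10, "twenty": 20, "fifty": 50, "hundred": 100,
}


def parse_amount(token):
    """AMOUNT -> NUMBER | WORDED_AMOUNT. Returns an int, or None if no rule applies."""
    t = token.lower()
    # NUMBER is defined above as 0 | 1 | ... | 100.
    if t.isascii() and t.isdigit() and int(t) <= 100:
        return int(t)
    return WORDED_AMOUNT.get(t)
```

Values such as "3-4" or "24/7" belong to NUMERIC_AMOUNT rather than AMOUNT and are deliberately not handled here.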
NS Nonterminals
Class Name | Class Source | Class Type |
INTERVAL | Alphabet | Compound |
FREQUENCY | Alphabet | Compound |
DOSAGE | Alphabet | Compound |
ENTITY | Ontology | Simple |
ROA (Route-Of-Administration) | Lexicon/Ontology | Simple |
DRUGFORM | Lexicon/Ontology | Simple |
SIDEEFFECT | Lexicon/Ontology | Simple |
EMOTION | Lexicon | Simple |
PRONOUN | Lexicon | Simple |
INTENSITY | Lexicon | Simple |
SENTIMENT | Lexicon | Simple |
Ontology Nonterminals
Buprenorphine | addnok | bup | bupe | bupes | bupey | buprel | buprenex | buprenorphine | buprenorphine analgesic | buprenorphine opioid dependence | buprenorphone | buprigesic | bups | butrans | film | films | morgesic | norspan | probuphine | saboxine | sobos | strip | strips | sub | subbies | subox | suboxene | suboxone | suboxone film | suboxone tablet | subs | subutex | tecs | temgesic | tex | tidigesic | xone | |
Heroin |
Alphabet Nonterminals
INTERVAL
PAST_DETERMINER | ago | prior | previous | since | before | last | past | ||
PRESENT_DETERMINER | now | about | around | several | couple | every | all | this | ||
FUTURE_DETERMINER | next | later | after | ||
PERIOD | PAST_DETERMINER | PRESENT_DETERMINER | FUTURE_DETERMINER | ||
TIME_INDICATOR | |||
HOUR | hour | hours | hr | hrs | ||
MINUTE | minute | minutes | min | mins | ||
SECOND | second | seconds | sec | secs | ||
DURATION_INDICATOR | |||
DECADE | decade | decades | ||
YEAR | year | years | yr | yrs | annum | ||
MONTH | month | months | mth | mths | mo | ||
WEEK | week | weeks | wk | wks | ||
DAY | day | days | night | nights | nite | nites | morning | mornings | mornin | evening | evenin | evenings | afternoon | noon | ||
DURATION_PERIOD | PAST_DETERMINER RANGE PERIOD | years ago, weeks prior | |
PRESENT_DETERMINER RANGE PERIOD | weeks now | ||
FUTURE_DETERMINER RANGE PERIOD | weeks later, days after | ||
PERIOD_DURATION | PAST_DETERMINER RANGE DURATION_INDICATOR | last year, previous day | |
PRESENT_DETERMINER RANGE DURATION_INDICATOR | about a year, around a month | ||
FUTURE_DETERMINER RANGE DURATION_INDICATOR | later years, next day | ||
TIME_PERIOD | TIME_INDICATOR RANGE PAST_DETERMINER | hours ago, minutes before | |
TIME_INDICATOR RANGE PRESENT_DETERMINER | hours now | ||
TIME_INDICATOR RANGE FUTURE_DETERMINER | hours later, minutes after | ||
PERIOD_TIME | PAST_DETERMINER RANGE TIME_INDICATOR | last hour | |
PRESENT_DETERMINER RANGE TIME_INDICATOR | several hours, couple of minutes | ||
FUTURE_DETERMINER RANGE TIME_INDICATOR | next hour | ||
AMOUNT_TIME_PERIOD | AMOUNT RANGE TIME_PAST_PERIOD | 5 minutes ago | |
AMOUNT RANGE TIME_PRESENT_PERIOD | 10 hours now | ||
AMOUNT RANGE TIME_FUTURE_PERIOD | 5 minutes later | ||
AMOUNT_TIME | AMOUNT RANGE TIME_INDICATOR | 15 seconds | |
PERIOD_AMOUNT_TIME | PAST_DETERMINER RANGE AMOUNT_TIME | last 2 hours, past 2 minutes | |
PRESENT_DETERMINER RANGE AMOUNT_TIME | around 2 hours | ||
FUTURE_DETERMINER RANGE AMOUNT_TIME | next 15 seconds, after 2 minutes | ||
AMOUNT_DURATION_PERIOD | AMOUNT RANGE DURATION_PAST_PERIOD | 5 years ago | |
AMOUNT RANGE DURATION_PRESENT_PERIOD | 5 years now | ||
AMOUNT RANGE DURATION_FUTURE_PERIOD | 5 years later, 9 months after | ||
AMOUNT_DURATION | AMOUNT RANGE DURATION_INDICATOR | 15 months | |
PERIOD_AMOUNT_DURATION | PAST_DETERMINER RANGE AMOUNT_DURATION | last 15 weeks, last 2 years | |
PRESENT_DETERMINER RANGE AMOUNT_DURATION | about 3 months, around five years | ||
FUTURE_DETERMINER RANGE AMOUNT_DURATION | next 15 weeks, after 2 years | ||
INTERVAL | DURATION_PERIOD | PERIOD_DURATION | TIME_PERIOD | PERIOD_TIME | AMOUNT_TIME_PERIOD | AMOUNT_TIME | PERIOD_AMOUNT_TIME | AMOUNT_DURATION_PERIOD | AMOUNT_DURATION | PERIOD_AMOUNT_DURATION |
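The AMOUNT_TIME, AMOUNT_DURATION and AMOUNT_*_PERIOD productions above can be approximated with a token scan. The sketch below is not the grammar-driven parser used by the system. It assumes that RANGE means "at most `max_gap` intervening words", uses only a subset of the determiner and unit lists, handles numeric amounts only, and introduces the hypothetical names `tense_of` and `find_intervals`.

```python
import re

# Subsets of the determiner and unit lists above (illustrative, not exhaustive).
PAST = {"ago", "prior", "before", "last", "past"}
PRESENT = {"now", "about", "around"}
FUTURE = {"next", "later", "after"}
UNITS = {
    "hour", "hours", "hr", "hrs", "minute", "minutes", "min", "mins",
    "second", "seconds", "sec", "secs", "year", "years", "yr", "yrs",
    "month", "months", "week", "weeks", "day", "days",
}


def tense_of(token):
    """Classify a determiner token as past, present or future (else None)."""
    if token in PAST:
        return "past"
    if token in PRESENT:
        return "present"
    if token in FUTURE:
        return "future"
    return None


def find_intervals(text, max_gap=1):
    """Find AMOUNT RANGE unit [RANGE determiner] spans.

    Returns a list of (amount, unit, tense) tuples; tense is None when no
    determiner follows the unit within the allowed gap.
    """
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if not tok.isdigit():
            continue
        # Look for a unit with at most max_gap words between it and the amount.
        window = range(i + 1, min(i + 2 + max_gap, len(tokens)))
        unit_at = next((j for j in window if tokens[j] in UNITS), None)
        if unit_at is None:
            continue
        # Look for a trailing determiner under the same gap rule.
        tense = None
        for k in range(unit_at + 1, min(unit_at + 2 + max_gap, len(tokens))):
            tense = tense_of(tokens[k])
            if tense:
                break
        hits.append((int(tok), tokens[unit_at], tense))
    return hits
```

For instance, `find_intervals("I quit 5 years ago")` yields one hit with amount 5, unit "years" and tense "past", which corresponds to the AMOUNT_DURATION_PERIOD example "5 years ago" in the table.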
FREQUENCY
FREQUENCY_ITEM | hourly | daily | weekly | bi-weekly | biweekly | monthly | yearly | annually | |
FREQUENCY_INDICATOR | times | times a | times an | both times | |
PER_INDICATOR | per | / | FREQUENCY_INDICATOR | |
PER_SECOND | PER_INDICATOR RANGE SECOND | /second, times a second |
PER_MINUTE | PER_INDICATOR RANGE MINUTE | per minute, /minute |
PER_HOUR | PER_INDICATOR RANGE HOUR | hourly | per hour, times an hour |
PER_DAY | PER_INDICATOR RANGE DAY | daily | nightly | / day, per day |
PER_WEEK | PER_INDICATOR RANGE WEEK | weekly, bi-weekly | biweekly | per week, /week |
PER_MONTH | PER_INDICATOR RANGE MONTH | monthly | bi-monthly | bimonthly | /month, times a month |
PER_YEAR | PER_INDICATOR RANGE YEAR | yearly | annually | per year, /year |
PER_DECADE | PER_INDICATOR RANGE DECADE | times a decade, per decade |
PER_TIME_INDICATOR | PER_SECOND | PER_MINUTE | PER_HOUR | PER_DAY | PER_WEEK | PER_MONTH | PER_YEAR | PER_DECADE |
per min, per hour, /min, /hour |
PER_DURATION_INDICATOR | PER_INDICATOR RANGE DURATION_INDICATOR | per day, per week, /week, /month |
AMOUNT_PER_TIME_INDICATOR | AMOUNT RANGE PER_TIME_INDICATOR | 5 per min, per hour, 24 mg /min |
AMOUNT_FREQUENCY | AMOUNT RANGE FREQUENCY_INDICATOR | 5 times |
AMOUNT_FREQUENCY_DURATION | AMOUNT_FREQUENCY RANGE DURATION_INDICATOR | 5 times a day |
FREQUENCY_DURATION | FREQUENCY_INDICATOR RANGE DURATION_INDICATOR | times a day |
FREQUENCY_TIME | FREQUENCY_INDICATOR RANGE TIME_INDICATOR | times an hour |
PERIOD_FREQUENCY_DURATION | PERIOD_DETERMINER RANGE FREQUENCY_INDICATOR | several times a day |
PERIOD_FREQUENCY_TIME | PERIOD_DETERMINER RANGE FREQUENCY_TIME | several times an hour |
AMOUNT_FREQUENCY_TIME | AMOUNT RANGE FREQUENCY_TIME | 5 times an hour |
AMOUNT_PER_TIME | AMOUNT RANGE PER_TIME_INDICATOR | 5 per min, per hour, 24 mg /min |
AMOUNT_PER_DURATION | AMOUNT RANGE FREQUENCY_TIME | 5 per day, 10 per week |
FREQUENCY | PER_TIME_INDICATOR | PER_DURATION_INDICATOR | AMOUNT_FREQUENCY_DURATION | PERIOD_FREQUENCY_DURATION | PERIOD_FREQUENCY_TIME | AMOUNT_FREQUENCY_TIME | AMOUNT_PER_TIME | AMOUNT_PER_DURATION | FREQUENCY_ITEM |
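A few of the FREQUENCY alternatives above ("5 times a day", "per week", "/min", "daily") can be recognised with a single regular expression plus a word lookup. This is a hedged sketch rather than the system's parser: it ignores RANGE gaps and worded amounts, covers only some unit spellings, and the names `CANONICAL`, `PATTERN` and `parse_frequency` are hypothetical.

```python
import re

# Map a few unit spellings from the tables above to one canonical unit name.
CANONICAL = {
    "second": "second", "sec": "second", "minute": "minute", "min": "minute",
    "hour": "hour", "hr": "hour", "day": "day", "week": "week", "wk": "week",
    "month": "month", "mth": "month", "year": "year", "yr": "year",
    "decade": "decade",
}
# FREQUENCY_ITEM words and the unit each one implies.
FREQUENCY_ITEM = {
    "hourly": "hour", "daily": "day", "weekly": "week",
    "monthly": "month", "yearly": "year", "annually": "year",
}

# Optional numeric AMOUNT, then a PER_INDICATOR ("times a/an", "per", "/"),
# then a time or duration unit with an optional plural "s".
PATTERN = re.compile(
    r"(?:(?P<amount>\d+)\s*)?"
    r"(?:\btimes\s+an?\s+|\bper\s+|/\s*)"
    r"(?P<unit>second|sec|minute|min|hour|hr|day|week|wk|month|mth|year|yr|decade)s?\b"
)


def parse_frequency(text):
    """Return (amount or None, canonical unit), or None if nothing matches."""
    t = text.lower()
    m = PATTERN.search(t)
    if m:
        amount = int(m.group("amount")) if m.group("amount") else None
        return amount, CANONICAL[m.group("unit")]
    for word, unit in FREQUENCY_ITEM.items():
        # The lookbehind keeps "bi-weekly" from being read as "weekly".
        if re.search(r"(?<![\w-])" + word + r"\b", t):
            return None, unit
    return None
```

Returning `None` for the amount when only a rate is given (as in "per week") keeps the PER_* rules distinguishable from the AMOUNT_* rules.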
DOSAGE
CUBIC_CENTIMETER | cc | ccs | cubic c | cubic cs | cubic centimeter | cubic centimeters | cubic centimetres | cubic centimetre | cubic centi-meter | cubic centi-meters | cubic centi-metre | cubic centi-metres | |
BAG | bag | bags | |
POUND | lb | lbs | pound | pounds | |
GRAM | g | gs | gm | gms | gram | grams | |
MILLIGRAM | mg | mgs | milligram | milligrams | milli-gram | milli-grams | |
MICROGRAM | ug | ugs | mcg | mcgs | microgram | micrograms | micro-gram | micro-grams | |
KILOGRAM | kg | kgs | kilogram | kilograms | kilo-gram | kilo-grams | |
LITER | litre | litres | liter | liters | |
MILLILITER | ml | mls | millilitre | millilitres | milliliter | milliliters | milli-litre | milli-litres | milli-liter | milli-liters | |
MICROLITER | mcl | mcls | microlitre | microlitres | microliter | microliters | micro-litre | micro-litres | micro-liter | micro-liters | |
OUNCE | oz | ozs | ounce | ounces | |
TABLET | tablet | tablets | tab | tabs | |
UNIT | CUBIC_CENTIMETER | BAG | POUND | GRAM | MILLIGRAM | MICROGRAM | KILOGRAM | LITER | MILLILITER | MICROLITER | OUNCE | TABLET | |
NUMERIC_AMOUNT_UNIT | NUMERIC_AMOUNT | UNIT |
1-5 grams, 2 mcg |
WORDED_NUMERIC_AMOUNT_UNIT | WORDED_AMOUNT | UNIT |
hundred milligrams, one ecstasy tablet |
DOSAGE | NUMERIC_AMOUNT_UNIT | WORDED_NUMERIC_AMOUNT_UNIT |
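The NUMERIC_AMOUNT_UNIT branch of DOSAGE can be sketched as "a number, or a hyphenated numeric range, followed by a known unit spelling". The example below makes several assumptions: it includes only a handful of unit spellings, skips the WORDED_NUMERIC_AMOUNT_UNIT branch, and uses the hypothetical names `UNIT_FORMS`, `DOSAGE_RE` and `find_dosages`.

```python
import re

# A few surface forms from the unit rows above, keyed to their class name.
UNIT_FORMS = {
    "g": "GRAM", "gram": "GRAM", "grams": "GRAM",
    "mg": "MILLIGRAM", "mgs": "MILLIGRAM", "milligrams": "MILLIGRAM",
    "mcg": "MICROGRAM", "ug": "MICROGRAM",
    "ml": "MILLILITER", "mls": "MILLILITER",
    "tab": "TABLET", "tabs": "TABLET", "tablet": "TABLET", "tablets": "TABLET",
}

# A number (optionally decimal, optionally a range such as "1-5"),
# optional whitespace, then a candidate unit word.
DOSAGE_RE = re.compile(
    r"(?P<amount>\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?)\s*(?P<unit>[a-z-]+)"
)


def find_dosages(text):
    """Return (amount_text, UNIT_CLASS) pairs for numeric dosages in the text."""
    hits = []
    for m in DOSAGE_RE.finditer(text.lower()):
        unit = UNIT_FORMS.get(m.group("unit"))
        if unit:  # keep only words that are known unit spellings
            hits.append((m.group("amount"), unit))
    return hits
```

The amount is kept as text so that ranges such as "1-5" survive unchanged. Converting it to a number or an interval is left to the caller.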
Lexicon Nonterminals
PRONOUN
DEMONSTRATIVE_PRONOUN | this | that | these | those | |
PERSONAL_PRONOUN | i | me | you | she | her | he | him | it | we | us | they | them | |
POSSESSIVE_PRONOUN | my | our | ours | your | yours | his | her | hers | its | their | theirs | mine | |
REFLEXIVE_PRONOUN | myself | ourselves | yourself | yourselves | himself | herself | itself | themselves | |
RELATIVE_PRONOUN | that | which | who | whom | whose | whichever | whoever | whomever | |
INDEFINITE_PRONOUN | anybody | anyone | anything | each | either | everybody | everyone | everything | neither | nobody | no one | nothing | one | somebody | someone | something | both | few | many | several | all | any | most | none | some | |
INTERROGATIVE_PRONOUN | what | who | which | whom | whose |
INTENSITY
LOW | low | very low | lower | lower than | lowest | small | very small | smaller | smaller than | smallest | less | less than | less is more | least | |
AVERAGE | average | ideal | |
HIGH | high | very high | higher | highest | large | very large | larger | largest | more | most | excess | excessive |
ROA (Route-of-Administration)
ENTERAL | ate | chewing | drink | eat | insufflate | plug | plugged | smoke | smoked | sniff | snort |
EPIDURAL | inject | injected | injection |
INTRAARTERIAL | IV | IVed | IV’ed | IVing | IV’ing | inject | injected | injection |
INTRACARDIAC | inject | injected | injection |
INTRACEREBRAL | inject | injected | injection |
INTRADERMAL | IV | IV’ed | IVing | inject | injected | injection | sniff | snort | snorting | bumping | railing | doozing |
INTRAMUSCULAR | inject | injected | injection | skin poppin |
INTRAVENOUS | IV | IVed | IV’ed | IVing | IV’ing | inject | injected | injection |
INHALATIONAL | smoke | smokes | smoked | smoking | sniff | sniffed | sniffing | snort | snorted | snorting | bumping | railing | doozing |
INTRAPERITONEAL | inject | injected | injection |
INTRATHECAL | inject | injected | injection |
INTRAOSSEOUS INFUSION | inject | injected | injection |
NASAL | sniff | snort | snorting | bumping | railing | doozing |
PARENTERAL | inject | injected | injection |
TRANSDERMAL | patch | patches |
TRANSMUCOSAL | snort | snorted | snorting | sniff | sniffed | sniffing | bumping | railing | doozing |
TOPICAL | patch | patches |
SUBCUTANEOUS | inject | injected | injection |
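Many surface forms in the ROA lexicon above belong to more than one class ("inject" appears under most injection routes, and "patch" under both TRANSDERMAL and TOPICAL). One way to make that ambiguity explicit is an inverted index from surface form to candidate classes. The sketch below uses only five of the rows above, and the names `build_index` and `roa_candidates` are hypothetical. It does not claim to reflect how the system disambiguates.

```python
from collections import defaultdict

# Five rows of the ROA lexicon above, lower-cased for matching.
ROA = {
    "INTRAVENOUS": ["iv", "ived", "iving", "inject", "injected", "injection"],
    "SUBCUTANEOUS": ["inject", "injected", "injection"],
    "NASAL": ["sniff", "snort", "snorting", "bumping", "railing", "doozing"],
    "TRANSDERMAL": ["patch", "patches"],
    "TOPICAL": ["patch", "patches"],
}


def build_index(lexicon):
    """Invert class -> forms into form -> set of classes."""
    index = defaultdict(set)
    for roa_class, forms in lexicon.items():
        for form in forms:
            index[form].add(roa_class)
    return index


INDEX = build_index(ROA)


def roa_candidates(token):
    """Return all ROA classes a token may denote, sorted alphabetically."""
    return sorted(INDEX.get(token.lower(), ()))
```

A token that maps to several classes would need further context (for example the drug form or a co-occurring entity) to be resolved to a single route.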
Lexico-ontology Nonterminals
Knowledge-Aware-Search-lexico-ontology
Template Pattern Productions
Knowledge-Aware-Search-Productions
Document Annotation
AQL
Knowledge-Aware-Search-AQL
Query Matcher
Web Application
Knowledge Aware Search Live Demo
Evaluation
Knowledge-Aware-Search-Evaluation
Funding
This project is sponsored by the National Institutes of Health (NIH) Grant No. R21 DA030571-01A1 awarded to the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) and the Center for Interventions, Treatment and Addictions Research (CITAR) titled “A Study of Social Web Data on Buprenorphine Abuse using Semantic Web Technology.” Any opinions, findings, conclusions or recommendations expressed in this material are those of the investigator(s) and do not necessarily reflect the views of the National Institutes of Health.
Contact: Delroy Cameron