Knowledge-Aware-Search
This wiki contains supplementary materials for the research article, currently under review, entitled "Knowledge-Aware Search." This work was developed as part of the PREDOSE project, an interdisciplinary collaboration between the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) and the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. PREDOSE stands for PREscription Drug abuse Online Surveillance and Epidemiology.
Note that for our Knowledge-Aware-Search-Evaluation, all queries were run on THE SAME WEB FORUM for Google, Hakia, DuckDuckGo and our System
Contents
- 1 Overview
- 2 People
- 3 Framework
- 4 Demo & Live Web Application
- 5 Evaluation
- 6 Funding
Overview
While semantic search has become a viable alternative to classical keyword-based search, a review of existing semantic search techniques and semantic search engines (Hakia, DuckDuckGo) reveals a considerable misalignment between the mental model of a user's information need and the knowledge model developed to meet such needs. Existing approaches assume that assertions in ontologies provide sufficient coverage to appropriately interpret the user's information need. Hence, minimal (often inadequate) support is provided for interpreting additional elements (such as those that convey intensity, frequency, time intervals, etc.) that are not necessarily modeled in ontologies. In reality, many complex search scenarios require knowledge of such constructs and extend beyond the boundaries of ontologies altogether. In this work, we develop a context-free grammar that defines the query language interpretable by our knowledge-aware system. In an evaluation against the popular search engine Google, the popular semantic search engine Hakia, and the crowd-sourcing based search engine DuckDuckGo, our Knowledge-Aware Search system outperformed the state of the art in retrieving relevant documents for two complex information needs.
People
Delroy Cameron
Nishita Jaykumar
Gaurish Anand
Krishnaprasad Thirunarayan
Gary A. Smith
Amit P. Sheth
Swapnil Soni
Kera Z. Watkins
Framework
Our Knowledge-Aware Search Framework consists of three components: 1) a module for Query Interpretation; 2) text analytics for Document Annotation and 3) a Query Matcher.
Query Interpretation
Our module for query interpretation consists of: 1) a Knowledge Model and 2) a Context-Free Grammar.
Knowledge Model
The Knowledge Model is the Drug Abuse Ontology (DAO - pronounced dow), which is the first ontology for prescription drug abuse ever created. An online version of the DAO is available for browsing. In our Knowledge-Aware Search system, we interpret keywords according to the lexical/syntactic match between keywords and concept labels in the ontology. We then expand the query with the labels and slang terms of the matching concepts (whether a Class or an Individual) and with all labels and slang terms of the Subclasses of those concepts in the ontology hierarchy.
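The expansion step described above can be sketched as follows. This is a minimal illustration and not the system's implementation: the `Concept` class, the `expand` and `expand_query` functions, and the `Opioid` parent class are hypothetical names invented for the example, and only a handful of the Buprenorphine slang terms are included.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A toy ontology node: one label, its slang terms, and its subclasses."""
    label: str
    slang: list[str] = field(default_factory=list)
    subclasses: list["Concept"] = field(default_factory=list)


def expand(concept: Concept) -> set[str]:
    """Collect labels and slang terms for a concept and all of its subclasses."""
    terms = {concept.label.lower(), *(s.lower() for s in concept.slang)}
    for child in concept.subclasses:
        terms |= expand(child)
    return terms


def expand_query(keyword: str, ontology: list[Concept]) -> set[str]:
    """Expand a keyword that lexically matches a concept label or slang term."""
    key = keyword.lower()
    stack = list(ontology)
    while stack:
        node = stack.pop()
        if key == node.label.lower() or key in (s.lower() for s in node.slang):
            return expand(node)
        stack.extend(node.subclasses)
    return {key}  # no match in the ontology: keep the keyword unchanged


# Hypothetical fragment; this hierarchy is invented for illustration only.
buprenorphine = Concept("Buprenorphine", ["bupe", "subs", "suboxone", "subutex"])
opioid = Concept("Opioid", subclasses=[buprenorphine, Concept("Heroin")])
```

Under these assumptions, `expand_query("opioid", [opioid])` would return the labels and slang terms of both subclasses, while a keyword with no lexical match is passed through unchanged.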
NB: There is no novelty in this approach, and we make no claims to that effect in this aspect of our work. Our novelty comes from demonstrating the insufficiency of ontology-based query interpretation for complex information needs that require background knowledge outside of the ontology. Hence, we develop a context-free grammar to define the query language interpretable by the system, which includes information from: 1) the ontology; 2) lexicons; 3) lexicons combined with the ontology (lexico-ontology) and 4) the alphabet of the grammar. In the next section we present details on the grammar.
Context-Free Grammar and Query Language Specification
LEGEND
ALPHABET | |
ONTOLOGY | |
LEXICON | |
LEXICO-ONTOLOGY | |
EXAMPLES |
Ubiquitous Nonterminals
EQUIVALENCE
> | greater than | more than | above | in excess of | slightly above | little more | bit more | slightly more | high | higher | highest | higher than | |
< | less than | lower than | below | in lack of | slightly below | little less | bit less | slightly less | |
= | exactly | precisely | |
>= | greater than or equal to | more than | above | in excess of | slightly above | little more | bit more | slightly more | exactly | precisely | high | higher | highest | higher than | |
<= | less than or equal to | less than | lower than | below | in lack of | slightly below | little less | bit less | slightly less | exactly | precisely |
DEFINITIONS
NUMERIC_AMOUNT | -999.99 | ... | 0 | ... | 3-4 | ... | 24/7 | 999 | |
WORDED_AMOUNT | one | once | two | twice | three | thrice | four | five | six | seven | eight | nine | ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen | twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety | hundred | |
NUMBER | 0 | 1 | ... | 100 | |
AMOUNT | NUMBER | WORDED_AMOUNT | |
RANGE | [0 - NUMBER] WORDS |
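One way to read the ubiquitous nonterminals is as lookup tables plus a bounded word gap. The sketch below is illustrative only: the function names (`operators_for`, `parse_amount`, `range_gap`) and the `AMOUNT_NEAR_TABLET` pattern are hypothetical, the phrase lists are abbreviated, and RANGE is interpreted here as "at most NUMBER intervening words", which is an assumption about the `[0 - NUMBER] WORDS` notation.

```python
import re

# Phrase lists abbreviated from the EQUIVALENCE table.
EQUIVALENCE = {
    ">": ["greater than", "more than", "above", "in excess of", "higher than"],
    "<": ["less than", "lower than", "below", "in lack of"],
    "=": ["exactly", "precisely"],
    ">=": ["greater than or equal to", "more than", "above", "exactly", "precisely"],
    "<=": ["less than or equal to", "less than", "below", "exactly", "precisely"],
}

# Abbreviated WORDED_AMOUNT lexicon with numeric values.
WORDED_AMOUNT = {"one": 1, "once": 1, "two": 2, "twice": 2, "three": 3,
                 "thrice": 3, "ten": 10, "twenty": 20, "hundred": 100}


def operators_for(phrase: str) -> set[str]:
    """Return every operator whose phrase list contains the phrase.

    A phrase can be ambiguous: "more than" appears under both > and >=.
    """
    p = " ".join(phrase.lower().split())
    return {op for op, phrases in EQUIVALENCE.items() if p in phrases or p == op}


def parse_amount(token: str):
    """Return the value of a NUMBER or WORDED_AMOUNT token, else None."""
    t = token.lower()
    if t.isdecimal():
        return int(t)
    return WORDED_AMOUNT.get(t)


def range_gap(max_words: int) -> str:
    """Regex fragment allowing 0..max_words words between two constituents."""
    return r"(?:\s+\S+){0,%d}?\s+" % max_words


# Hypothetical use: a token followed, within two words, by a TABLET term.
AMOUNT_NEAR_TABLET = re.compile(
    r"\b(\w+)" + range_gap(2) + r"(tablets?|tabs?)\b", re.IGNORECASE
)
```

The captured first token is not guaranteed to be an amount, so a caller would still pass it through `parse_amount` and discard matches for which it returns `None`.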
NS Nonterminals
Class Name | Class Source | Class Type |
INTERVAL | Alphabet | Compound |
FREQUENCY | Alphabet | Compound |
DOSAGE | Alphabet | Compound |
ENTITY | Ontology | Simple |
ROA (Route-Of-Administration) | Lexicon/Ontology | Simple |
DRUGFORM | Lexicon/Ontology | Simple |
SIDEEFFECT | Lexicon/Ontology | Simple |
EMOTION | Lexicon | Simple |
PRONOUN | Lexicon | Simple |
INTENSITY | Lexicon | Simple |
SENTIMENT | Lexicon | Simple |
Ontology Nonterminals
Buprenorphine | addnok | bup | bupe | bupes | bupey | buprel | buprenex | buprenorphine | buprenorphine analgesic | buprenorphine opioid dependence | buprenorphone | buprigesic | bups | butrans | film | films | morgesic | norspan | probuphine | saboxine | sobos | strip | strips | sub | subbies | subox | suboxene | suboxone | suboxone film | suboxone tablet | subs | subutex | tecs | temgesic | tex | tidigesic | xone | |
Heroin |
Alphabet Nonterminals
INTERVAL
PAST_DETERMINER | ago | prior | previous | since | before | last | past | ||
PRESENT_DETERMINER | now | about | around | several | couple | every | all | this | ||
FUTURE_DETERMINER | next | later | after | ||
PERIOD | PAST_DETERMINER | PRESENT_DETERMINER | FUTURE_DETERMINER | ||
TIME_INDICATOR | |||
HOUR | hour | hours | hr | hrs | ||
MINUTE | minute | minutes | min | mins | ||
SECOND | second | seconds | sec | secs | ||
DURATION_INDICATOR | |||
DECADE | decade | decades | ||
YEAR | year | years | yr | yrs | annum | ||
MONTH | month | months | mth | mths | mo | ||
WEEK | week | weeks | wk | wks | ||
DAY | day | days | night | nights | nite | nites | morning | mornings | mornin | evening | evenin | evenings | afternoon | noon | ||
DURATION_PERIOD | PAST_DETERMINER RANGE PERIOD | years ago, weeks prior | |
PRESENT_DETERMINER RANGE PERIOD | weeks now | ||
FUTURE_DETERMINER RANGE PERIOD | weeks later, days after | ||
PERIOD_DURATION | PAST_DETERMINER RANGE DURATION_INDICATOR | last year, previous day | |
PRESENT_DETERMINER RANGE DURATION_INDICATOR | about a year, around a month | ||
FUTURE_DETERMINER RANGE DURATION_INDICATOR | later years, next day | ||
TIME_PERIOD | TIME_INDICATOR RANGE PAST_DETERMINER | hours ago, minutes before | |
TIME_INDICATOR RANGE PRESENT_DETERMINER | hours now | ||
TIME_INDICATOR RANGE FUTURE_DETERMINER | hours later, minutes after | ||
PERIOD_TIME | PAST_DETERMINER RANGE TIME_INDICATOR | last hour | |
PRESENT_DETERMINER RANGE TIME_INDICATOR | several hours, couple of minutes | ||
FUTURE_DETERMINER RANGE TIME_INDICATOR | next hour | ||
AMOUNT_TIME_PERIOD | AMOUNT RANGE TIME_PAST_PERIOD | 5 minutes ago | |
AMOUNT RANGE TIME_PRESENT_PERIOD | 10 hours now | ||
AMOUNT RANGE TIME_FUTURE_PERIOD | 5 minutes later | ||
AMOUNT_TIME | AMOUNT RANGE TIME_INDICATOR | 15 seconds | |
PERIOD_AMOUNT_TIME | PAST_DETERMINER RANGE AMOUNT_TIME | last 2 hours, past 2 minutes | |
PRESENT_DETERMINER RANGE AMOUNT_TIME | around 2 hours | ||
FUTURE_DETERMINER RANGE AMOUNT_TIME | next 15 seconds, after 2 minutes | ||
AMOUNT_DURATION_PERIOD | AMOUNT RANGE DURATION_PAST_PERIOD | 5 years ago | |
AMOUNT RANGE DURATION_PRESENT_PERIOD | 5 years now | ||
AMOUNT RANGE DURATION_FUTURE_PERIOD | 5 years later, 9 months after | ||
AMOUNT_DURATION | AMOUNT RANGE DURATION_INDICATOR | 15 months | |
PERIOD_AMOUNT_DURATION | PAST_DETERMINER RANGE AMOUNT_DURATION | last 15 weeks, last 2 years | |
PRESENT_DETERMINER RANGE AMOUNT_DURATION | about 3 months, around five years | ||
FUTURE_DETERMINER RANGE AMOUNT_DURATION | next 15 weeks, after 2 years | ||
INTERVAL | DURATION_PERIOD | PERIOD_DURATION | TIME_PERIOD | PERIOD_TIME | AMOUNT_TIME_PERIOD | AMOUNT_TIME | PERIOD_AMOUNT_TIME | AMOUNT_DURATION_PERIOD | AMOUNT_DURATION | PERIOD_AMOUNT_DURATION |
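As an illustration of how one INTERVAL production could be recognized in text, the sketch below compiles AMOUNT_DURATION_PERIOD (for example "5 years ago") into a regular expression. It is a simplification under stated assumptions: the terminal lists are abbreviated, only numeric amounts are handled, the RANGE gap is taken to be zero words, and the names `alt`, `UNIT_OF`, and `find_intervals` are hypothetical.

```python
import re

# Terminals abbreviated from the INTERVAL tables.
PAST = ["ago", "prior", "previous", "since", "before", "last", "past"]
FUTURE = ["next", "later", "after"]
DURATION = {
    "YEAR": ["years", "year", "yrs", "yr", "annum"],
    "MONTH": ["months", "month", "mths", "mth", "mo"],
    "WEEK": ["weeks", "week", "wks", "wk"],
    "DAY": ["days", "day", "nights", "night"],
}


def alt(words):
    """Build a regex alternation, longest term first so 'years' beats 'year'."""
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


# Map every surface form back to its DURATION_INDICATOR class.
UNIT_OF = {w: unit for unit, words in DURATION.items() for w in words}

# AMOUNT_DURATION_PERIOD with a zero-word RANGE, e.g. "5 years ago".
AMOUNT_DURATION_PERIOD = re.compile(
    rf"\b(?P<amount>\d+)\s+(?P<unit>{alt(UNIT_OF)})\s+(?P<period>{alt(PAST + FUTURE)})\b",
    re.IGNORECASE,
)


def find_intervals(text):
    """Return (amount, unit class, direction) for each match in the text."""
    results = []
    for m in AMOUNT_DURATION_PERIOD.finditer(text):
        direction = "past" if m.group("period").lower() in PAST else "future"
        unit = UNIT_OF[m.group("unit").lower()]
        results.append((int(m.group("amount")), unit, direction))
    return results
```

The remaining INTERVAL alternatives could be compiled the same way and tried in turn; a full parser generated from the grammar would be the more faithful option.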
FREQUENCY
FREQUENCY_ITEM | hourly | daily | weekly | bi-weekly | biweekly | monthly | yearly | annually | |
FREQUENCY_INDICATOR | times | times a | times an | both times | |
PER_INDICATOR | per | / | FREQUENCY_INDICATOR | |
PER_SECOND | PER_INDICATOR RANGE SECOND | /second, times a second |
PER_MINUTE | PER_INDICATOR RANGE MINUTE | per minute, /minute |
PER_HOUR | PER_INDICATOR RANGE HOUR | hourly | per hour, times an hour |
PER_DAY | PER_INDICATOR RANGE DAY | daily | nightly | / day, per day |
PER_WEEK | PER_INDICATOR RANGE WEEK | weekly | bi-weekly | biweekly | per week, /week |
PER_MONTH | PER_INDICATOR RANGE MONTH | monthly | bi-monthly | bimonthly | /month, times a month |
PER_YEAR | PER_INDICATOR RANGE YEAR | yearly | annually | per year, /year |
PER_DECADE | PER_INDICATOR RANGE DECADE | times a decade, per decade |
PER_TIME_INDICATOR | PER_SECOND | PER_MINUTE | PER_HOUR | PER_DAY | PER_WEEK | PER_MONTH | PER_YEAR | PER_DECADE | per min, per hour, /min, /hour |
PER_DURATION_INDICATOR | PER_INDICATOR RANGE DURATION_INDICATOR | per day, per week, /week, /month |
AMOUNT_PER_TIME_INDICATOR | AMOUNT RANGE PER_TIME_INDICATOR | 5 per min, per hour, 24 mg /min |
AMOUNT_FREQUENCY | AMOUNT RANGE FREQUENCY_INDICATOR | 5 times |
AMOUNT_FREQUENCY_DURATION | AMOUNT_FREQUENCY RANGE DURATION_INDICATOR | 5 times a day |
FREQUENCY_DURATION | FREQUENCY_INDICATOR RANGE DURATION_INDICATOR | times a day |
FREQUENCY_TIME | FREQUENCY_INDICATOR RANGE TIME_INDICATOR | times an hour |
PERIOD_FREQUENCY_DURATION | PERIOD_DETERMINER RANGE FREQUENCY_DURATION | several times a day |
PERIOD_FREQUENCY_TIME | PERIOD_DETERMINER RANGE FREQUENCY_TIME | several times an hour |
AMOUNT_FREQUENCY_TIME | AMOUNT RANGE FREQUENCY_TIME | 5 times an hour |
AMOUNT_PER_TIME | AMOUNT RANGE PER_TIME_INDICATOR | 5 per min, per hour, 24 mg /min |
AMOUNT_PER_DURATION | AMOUNT RANGE PER_DURATION_INDICATOR | 5 per day, 10 per week |
FREQUENCY | PER_TIME_INDICATOR | PER_DURATION_INDICATOR | AMOUNT_FREQUENCY_DURATION | PERIOD_FREQUENCY_DURATION | PERIOD_FREQUENCY_TIME | AMOUNT_FREQUENCY_TIME | AMOUNT_PER_TIME | AMOUNT_PER_DURATION | FREQUENCY_ITEM |
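A few of the FREQUENCY alternatives can be approximated with a single pattern that normalizes an expression to a count per unit. The sketch below is only an approximation: it covers amount-plus-indicator forms such as "5 times a day", "10 per week", and "3/day", falls back to FREQUENCY_ITEM words such as "daily", and ignores the RANGE gap. The names `PATTERN` and `parse_frequency` and the abbreviated lexicons are assumptions made for the example.

```python
import re

# Abbreviated lexicons; values are the grammar's time and duration classes.
FREQUENCY_ITEM = {"hourly": "HOUR", "daily": "DAY", "weekly": "WEEK",
                  "monthly": "MONTH", "yearly": "YEAR", "annually": "YEAR"}
UNIT = {"second": "SECOND", "sec": "SECOND", "minute": "MINUTE", "min": "MINUTE",
        "hour": "HOUR", "hr": "HOUR", "day": "DAY", "week": "WEEK", "wk": "WEEK",
        "month": "MONTH", "year": "YEAR", "yr": "YEAR"}
WORDED = {"once": 1, "twice": 2, "thrice": 3, "one": 1, "two": 2, "three": 3}

_units = "|".join(sorted(UNIT, key=len, reverse=True))
_amount = r"\d+|" + "|".join(WORDED)

# AMOUNT, an optional "times", then "a" / "an" / "per" / "/", then a unit.
PATTERN = re.compile(
    rf"\b(?P<amount>{_amount})\s*(?:times?\s+)?(?:an?\s+|per\s+|/\s*)(?P<unit>{_units})s?\b"
)


def parse_frequency(text):
    """Return (count, unit class) for the first frequency expression, else None."""
    lowered = text.lower()
    m = PATTERN.search(lowered)
    if m:
        raw = m.group("amount")
        count = int(raw) if raw.isdecimal() else WORDED[raw]
        return count, UNIT[m.group("unit")]
    # Fall back to single-word FREQUENCY_ITEM terms, read as once per unit.
    for word, unit in FREQUENCY_ITEM.items():
        if re.search(rf"\b{word}\b", lowered):
            return 1, unit
    return None
```

Normalizing to a (count, unit) pair is one possible design choice; it would let a query such as "more than 3 times a day" be compared numerically against annotated documents.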
DOSAGE
CUBIC_CENTIMETER | cc | ccs | cubic c | cubic cs | cubic centimeter | cubic centimeters | cubic centimetres | cubic centimetre | cubic centi-meter | cubic centi-meters | cubic centi-metre | cubic centi-metres | |
BAG | bag | bags | |
POUND | lb | lbs | pound | pounds | |
GRAM | g | gs | gm | gms | gram | grams | |
MILLIGRAM | mg | mgs | milligram | milligrams | milli-gram | milli-grams | |
MICROGRAM | ug | ugs | mcg | mcgs | microgram | micrograms | micro-gram | micro-grams | |
KILOGRAM | kg | kgs | kilogram | kilograms | kilo-gram | kilo-grams | |
LITER | litre | litres | liter | liters | |
MILLILITER | ml | mls | millilitre | millilitres | milliliter | milliliters | milli-litre | milli-litres | milli-liter | milli-liters | |
MICROLITER | mcl | mcls | microlitre | microlitres | microliter | microliters | micro-litre | micro-litres | micro-liter | micro-liters | |
OUNCE | oz | ozs | ounce | ounces | |
TABLET | tablet | tablets | tab | tabs | |
UNIT | CUBIC_CENTIMETER | BAG | POUND | GRAM | MILLIGRAM | MICROGRAM | KILOGRAM | LITER | MILLILITER | MICROLITER | OUNCE | TABLET | |
NUMERIC_AMOUNT_UNIT | NUMERIC_AMOUNT | UNIT | 1-5 grams, 2 mcg |
WORDED_NUMERIC_AMOUNT_UNIT | WORDED_AMOUNT | UNIT | hundred milligrams, one ecstasy tablet |
DOSAGE | NUMERIC_AMOUNT_UNIT | WORDED_NUMERIC_AMOUNT_UNIT |
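The DOSAGE productions pair an amount with a unit, which can be sketched as an amount pattern followed by an alternation over the unit surface forms. The example below is a rough approximation rather than the system's annotator: the unit and amount lists are abbreviated, the names `SURFACE_TO_UNIT` and `find_dosages` are hypothetical, and intervening words (as in "one ecstasy tablet") are not handled.

```python
import re

# Surface forms per unit class, abbreviated from the DOSAGE table.
UNITS = {
    "GRAM": ["g", "gs", "gm", "gms", "gram", "grams"],
    "MILLIGRAM": ["mg", "mgs", "milligram", "milligrams"],
    "MICROGRAM": ["ug", "ugs", "mcg", "mcgs", "microgram", "micrograms"],
    "MILLILITER": ["ml", "mls", "milliliter", "milliliters", "millilitre", "millilitres"],
    "TABLET": ["tablet", "tablets", "tab", "tabs"],
}
WORDED_AMOUNT = {"one": 1, "two": 2, "three": 3, "ten": 10, "hundred": 100}

SURFACE_TO_UNIT = {s: u for u, forms in UNITS.items() for s in forms}
# Longest surface form first so "grams" is preferred over "g".
_unit_alt = "|".join(sorted(SURFACE_TO_UNIT, key=len, reverse=True))
_word_alt = "|".join(WORDED_AMOUNT)

# Numeric amounts may be decimals or hyphenated ranges such as "1-5".
DOSAGE = re.compile(
    rf"\b(?P<amount>\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?|{_word_alt})\s*(?P<unit>{_unit_alt})\b",
    re.IGNORECASE,
)


def find_dosages(text):
    """Return (amount text, unit class) pairs for each amount + unit match."""
    return [
        (m.group("amount").lower(), SURFACE_TO_UNIT[m.group("unit").lower()])
        for m in DOSAGE.finditer(text)
    ]
```

Mapping every surface form to a unit class means that "2 mcg" and "2 micrograms" resolve to the same MICROGRAM annotation.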
Lexicon Nonterminals
PRONOUN
DEMONSTRATIVE_PRONOUN | this | that | these | those | |
PERSONAL_PRONOUN | i | me | you | she | her | he | him | it | we | us | they | them | |
POSSESSIVE_PRONOUN | my | our | ours | your | yours | his | her | hers | its | their | theirs | mine | |
REFLEXIVE_PRONOUN | myself | ourselves | yourself | yourselves | himself | herself | itself | themselves | |
RELATIVE_PRONOUN | that | which | who | whom | whose | whichever | whoever | whomever | |
INDEFINITE_PRONOUN | anybody | anyone | anything | each | either | everybody | everyone | everything | neither | nobody | no one | nothing | one | somebody | someone | something | both | few | many | several | all | any | most | none | some | |
INTERROGATIVE_PRONOUN | what | who | which | whom | whose |
INTENSITY
LOW | low | very low | lower | lower than | lowest | small | very small | smaller | smaller than | smallest | less | less than | less is more | least | |
AVERAGE | average | ideal | |
HIGH | high | very high | higher | highest | large | very large | larger | largest | more | most | excess | excessive |
ROA (Route-of-Administration)
ENTERAL | ate | chewing | drink | eat | insufflate | plug | plugged | smoke | smoked | sniff | snort |
EPIDURAL | inject | injected | injection |
INTRAARTERIAL | IV | IVed | IV’ed | IVing | IV’ing | inject | injected | injection |
INTRACARDIAC | inject | injected | injection |
INTRACEREBRAL | inject | injected | injection |
INTRADERMAL | IV | IV’ed | IVing | inject | injected | injection | sniff | snort | snorting | bumping | railing | doozing |
INTRAMUSCULAR | inject | injected | injection | skin poppin |
INTRAVENOUS | IV | IVed | IV’ed | IVing | IV’ing | inject | injected | injection |
INHALATIONAL | smoke | smokes | smoked | smoking | sniff | sniffed | sniffing | snort | snorted | snorting | bumping | railing | doozing |
INTRAPERITONEAL | inject | injected | injection |
INTRATHECAL | inject | injected | injection |
INTRAOSSEOUS INFUSION | inject | injected | injection |
NASAL | sniff | snort | snorting | bumping | railing | doozing |
PARENTERAL | inject | injected | injection |
TRANSDERMAL | patch | patches |
TRANSMUCOSAL | snort | snorted | snorting | sniff | sniffed | sniffing | bumping | railing | doozing |
TOPICAL | patch | patches |
SUBCUTANEOUS | inject | injected | injection |
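Many ROA terms appear under several routes, so a term alone does not identify a single route. A minimal sketch of an inverted index that makes this ambiguity explicit is shown below; the table is abbreviated and the names `build_index` and `candidate_routes` are hypothetical.

```python
from collections import defaultdict

# Abbreviated copy of the ROA table: route -> surface terms.
ROA = {
    "INTRAVENOUS": ["iv", "ived", "iving", "inject", "injected", "injection"],
    "INTRAMUSCULAR": ["inject", "injected", "injection", "skin poppin"],
    "NASAL": ["sniff", "snort", "snorting", "bumping", "railing", "doozing"],
    "TRANSDERMAL": ["patch", "patches"],
    "TOPICAL": ["patch", "patches"],
}


def build_index(roa):
    """Invert route -> terms into term -> set of candidate routes."""
    index = defaultdict(set)
    for route, terms in roa.items():
        for term in terms:
            index[term].add(route)
    return index


INDEX = build_index(ROA)


def candidate_routes(term):
    """Return all routes a term may denote; empty set if the term is unknown."""
    return set(INDEX.get(term.lower(), set()))
```

With this abbreviated table, "patch" resolves to both TRANSDERMAL and TOPICAL, so any disambiguation would have to come from surrounding context.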
Lexico-ontology Nonterminals
Knowledge-Aware-Search-lexico-ontology
Template Pattern Productions
Knowledge-Aware-Search-Productions
Document Annotation
AQL
Knowledge-Aware-Search-AQL
Query Matcher
Demo & Live Web Application
PLEASE SELECT A RESOLUTION >720p TO WATCH THIS VIDEO
Knowledge Aware Search Live Demo
Evaluation
Knowledge-Aware-Search-Evaluation
Funding
This project is sponsored by the National Institutes of Health (NIH) Grant No. R21 DA030571-01A1, titled “A Study of Social Web Data on Buprenorphine Abuse using Semantic Web Technology,” awarded to the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) and the Center for Interventions, Treatment and Addictions Research (CITAR). Any opinions, findings, conclusions or recommendations expressed in this material are those of the investigator(s) and do not necessarily reflect the views of the National Institutes of Health.
Contact: Delroy Cameron