Revision as of 16:44, 15 May 2015

Domain Specific Document Retrieval Framework for Near Real-time Social Health Data

Introduction

With the advent of web search and microblogging, the share of Online Health Information Seekers (OHIS) who use these services to share and seek real-time health information has grown rapidly. When OHIS turn to search engines or microblogging search services for real-time information, however, the results are not promising. Most web search engines and microblogging services are limited to keyword-based techniques for retrieving useful information for a given query. The top results are often dominated by breaking news. Similarly, across both microblogging and web search, the results do not contain real-time information. It is extremely difficult for users to retrieve relevant results from a query alone, and they may be overwhelmed by information overload.

In our approach, we search documents on Twitter using three distinguishing features: triple-pattern based mining, near real-time retrieval, and search over the URLs contained in tweets. First, a triple-pattern (subject, predicate, object) mining technique extracts triple patterns from microblog messages related to chronic health conditions. The triple pattern is defined in the initial question. Second, to make the system near real-time, the search results are divided into six-hour intervals. Third, in addition to the tweets themselves, we use the content of the URLs mentioned in the tweets as a data source. Finally, the results are ranked by relevance and popularity, so that at any given time the most relevant information for a question is displayed, rather than results chosen on temporal relevance alone.
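As a rough illustration of the first two features, the sketch below filters tweets against a triple pattern and groups the matches into six-hour windows. It is a minimal sketch under stated assumptions, not the platform's implementation: the `TriplePattern` class, the in-order substring matching in `matches`, and the alignment of windows to 00:00, 06:00, 12:00, and 18:00 are all hypothetical choices made for the example. The actual mining technique is not specified here and is likely more sophisticated.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

WINDOW_HOURS = 6  # interval length stated in the text


@dataclass(frozen=True)
class TriplePattern:
    """Hypothetical container for a (subject, predicate, object) pattern."""
    subject: str
    predicate: str
    obj: str


def matches(pattern: TriplePattern, text: str) -> bool:
    """Naive stand-in for triple mining: the three terms must appear
    in subject, predicate, object order (case-insensitive)."""
    lowered = text.lower()
    pos = 0
    for term in (pattern.subject, pattern.predicate, pattern.obj):
        pos = lowered.find(term.lower(), pos)
        if pos < 0:
            return False
        pos += len(term)
    return True


def window_start(ts: datetime) -> datetime:
    """Start of the six-hour window containing ts (assumed alignment
    to 00:00, 06:00, 12:00, 18:00)."""
    hour = ts.hour - ts.hour % WINDOW_HOURS
    return ts.replace(hour=hour, minute=0, second=0, microsecond=0)


def bucket_matches(pattern: TriplePattern, tweets):
    """Group the texts of matching (timestamp, text) pairs by window."""
    buckets = defaultdict(list)
    for ts, text in tweets:
        if matches(pattern, text):
            buckets[window_start(ts)].append(text)
    return dict(buckets)
```

Calling `bucket_matches` with a pattern and a list of `(timestamp, text)` pairs returns a dictionary keyed by window start, so each six-hour slice can be ranked and displayed on its own.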

Architecture

Our Social Health Signal platform is based on (a) large-scale real-time Twitter data processing, (b) semantic web techniques and domain knowledge, and (c) triple-pattern based text mining. The system is divided into three major components:

Processing Pipeline: collects the tweets and extracts their metadata.
Pattern Extractor: extracts the documents relevant to a given query.
Rank Calculator: calculates the rank of the results.
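
The Rank Calculator step could be sketched as a weighted combination of relevance and popularity. This is only an illustration: the `Result` fields, the use of a share count as the popularity signal, the normalization by the largest share count, and the `relevance_weight` default of 0.7 are assumptions introduced for the example, not details taken from the platform.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical search result produced by the Pattern Extractor."""
    text: str
    relevance: float  # assumed relevance score in [0, 1]
    shares: int       # assumed popularity signal, e.g. a share count


def rank(results, relevance_weight=0.7):
    """Order results by a weighted sum of relevance and normalized popularity."""
    max_shares = max((r.shares for r in results), default=0)

    def score(r: Result) -> float:
        # Scale popularity to [0, 1] so both terms are comparable.
        popularity = r.shares / max_shares if max_shares else 0.0
        return relevance_weight * r.relevance + (1 - relevance_weight) * popularity

    return sorted(results, key=score, reverse=True)
```

Applying `rank` separately to the results of each time interval would keep the ordering tied to what is relevant and popular at that particular time.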

Figure: Architecture (Architecture_shs.png)

Difference between revisions of "Socialhealthsignal"
