Relevance and Quality of Health Information on the Web Tim Tang DCS Seminar October, 2005
2 Outline Motivation - Aims Experiments & Results –Domain specific vs. general search –A Quality Focused Crawler Conclusion & Future work
3 Why health information on the Web? Internet is a free medium High user demand for health information Health information of varying quality Incorrect health advice is dangerous
4 Problems Normal definition of relevance: Topical relevance Normal way to search: Word-matching Q: Are these applicable to health information? A: Not completely; we also need quality = usefulness of the information
5 Problem: Quality of health info The varying quality of health information in search results
6 Wrong advice
7 Dangerous information
9 Dangerous Information
10 Problem: Commercial sites Health information for commercial purposes
11 Commercial promotion
12 Problem: Types of search engine The difference between domain-specific search and general-purpose search.
13 Querying BPS
14 Querying Google: Irrelevant information
15 Problem of domain-specific portals Domain-specific portals may be good, but … It often requires intensive effort to build and maintain (will be discussed more in experiment 2)
16 Aims To analyse the relative performance of domain specific and general purpose search engines To discover how to provide effective domain specific search, particularly in the health domain To automate the quality assessment of medical web sites
17 Two experiments First: Compare search results for health info between general and domain specific engines Second: Build and evaluate a Quality focused crawler for a health topic
18 The First Experiment A Comparison of the relative performance of general purpose search engines and domain-specific search engines In Journal of Information Retrieval ’05 – Special Issue with Nick Craswell, Dave Hawking, Kathy Griffiths and Helen Christensen
19 Domain specific vs. General engines General search engines: Google, Yahoo, MSN search, … Domain specific: Search service for scientific papers, search service for health, or a topic in the health domain. A depression portal: BluePages
20 BluePages Search (BPS)
21 BPS result list
22 Engines –Google –GoogleD (Google with “depression”) –BPS –4sites (4 high quality depression sites) –HealthFinder (HF): A health portal search named Health Finder –HealthFinderD (HFD): HF with depression
23 Queries 101 queries about depression: –50 treatment queries suggested by domain experts –51 non-treatment queries collected from 2 query logs: domain-specific query log and general query log. Examples: –Treatment queries: acupuncture, antidepressant, chocolate –Non-treatment queries: depression symptoms, clinical depression
24 Experiment details Run the 101 queries on the 6 engines. For each query, top 10 results from each engine are collected. All results were judged by research assistants: degrees of relevance, recommendation of advice Relevance and quality for all engines were then compared
25 Results Table columns: Engine, Relevance, Quality. Rows: GoogleD, BPS, 4sites, Google, HFS
26 Findings Google is not good in either relevance or quality. GoogleD can retrieve more relevant pages, but fewer high-quality pages. 4sites and BPS provide good quality but have poor coverage. It’s important to have a domain-specific portal which provides both high quality and high coverage. How to improve coverage?
27 Experiment 2 Building a high quality domain-specific portal using focused crawling techniques In CIKM ’05 With Dave Hawking, Nick Craswell, Kathy Griffiths
28 A Quality Focused Crawler Why? –The first experiment shows: Quality can be achieved using domain specific portals –The current method for building such a portal is expensive. – Focused crawling may be a good way to build a health portal with high coverage, while reducing human effort.
29 The problems of BPS Manual judgments of health sites by domain experts for two weeks to decide what to include. 207 Web sites are included, i.e., a lot of useful web pages are left out. Tedious maintenance process: Web pages change, cease to exist, new pages, etc. Also, the first experiment shows: High quality but quite low coverage.
30 Focused Crawling (FC) Designed to selectively fetch content relevant to a specified topic of interest using the Web’s hyperlink structure. Examples of topics: sport, health, cancer, or scientific papers, etc.
31 FC Process Diagram: URL Frontier -> dequeue -> Download -> Link extractor -> {URLs, link info} -> Classifier -> {URLs, scores} -> enqueue -> URL Frontier. Link info = anchor text, URL, source page’s content, and so on.
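The loop on this slide can be sketched roughly as follows. This is a minimal illustration, not the crawler used in the experiments: `focused_crawl`, the toy link graph `TOY_WEB`, and the word-overlap scorer `toy_score` are all hypothetical stand-ins for the real downloader, link extractor and classifier.

```python
import heapq
from itertools import count

def focused_crawl(seeds, fetch, extract_links, score_link, max_pages=100):
    """Best-first crawl: always dequeue the URL with the highest score."""
    tie = count()  # insertion counter, breaks ties between equal scores
    frontier = [(-1.0, next(tie), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    crawled = []
    while frontier and len(crawled) < max_pages:
        _, _, url = heapq.heappop(frontier)           # dequeue
        page = fetch(url)                             # download
        if page is None:
            continue
        crawled.append(url)
        for target, link_info in extract_links(url, page):  # link extractor
            if target not in seen:
                seen.add(target)
                score = score_link(link_info)         # classifier
                heapq.heappush(frontier, (-score, next(tie), target))  # enqueue
    return crawled

# Hypothetical in-memory "web": page -> list of (target URL, anchor text).
TOY_WEB = {
    "seed": [("a", "psychotherapy for depression"), ("b", "cheap car insurance")],
    "a": [("c", "cognitive therapy")],
    "b": [("d", "car loans")],
    "c": [],
    "d": [],
}

def toy_fetch(url):
    return TOY_WEB.get(url)

def toy_extract(url, page):
    return list(page)

def toy_score(anchor_text):
    # Assumed scorer: fraction of anchor words that are on-topic.
    topic = {"depression", "therapy", "psychotherapy"}
    words = anchor_text.lower().split()
    return sum(w in topic for w in words) / len(words)

order = focused_crawl(["seed"], toy_fetch, toy_extract, toy_score, max_pages=4)
print(order)
```

In this toy run the on-topic pages `a` and `c` are fetched before the off-topic page `b`, which is the behaviour a focused crawler is after.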
32 FC: simple example Crawling pages about psychotherapy
33 Relevance prediction anchor text: text appearing in a hyperlink text around the link: 50 bytes before and after the link URL words: parse the URL address
34 Relevance Indicators URL: herapy.html => URL words: depression, com, psychotherapy Anchor text: psychotherapy Text around the link: –50 bytes before: section, learn –50 bytes after: talk, therapy, standard, treatment
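A rough sketch of how the three kinds of link evidence could be extracted. The URL and page text below are hypothetical, the 50-byte window is approximated by 50 characters, and stop-word removal (which the slide's example appears to apply) is left out.

```python
import re

def tokenize(text):
    """Lowercase alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def url_words(url):
    """Split a URL into words, dropping the scheme and a few boilerplate tokens."""
    without_scheme = re.sub(r"^[a-z]+://", "", url.lower())
    return [w for w in tokenize(without_scheme) if w not in {"www", "html", "htm"}]

def link_features(page_text, anchor_start, anchor_end, url, window=50):
    """Collect URL words, anchor text, and the text around the link."""
    before = page_text[max(0, anchor_start - window):anchor_start]
    after = page_text[anchor_end:anchor_end + window]
    return {
        "url_words": url_words(url),
        "anchor_words": tokenize(page_text[anchor_start:anchor_end]),
        "before_words": tokenize(before),
        "after_words": tokenize(after),
    }

# Hypothetical page fragment and link target, for illustration only.
text = ("In this section you can learn about psychotherapy, "
        "a talk therapy that is a standard treatment.")
start = text.index("psychotherapy")
end = start + len("psychotherapy")
feats = link_features(text, start, end,
                      "http://www.example.org/depression/psychotherapy.html")
print(feats)
```

The resulting word lists are the kind of features that could then be fed to a classifier.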
35 Methods Machine learning approach: Train and test relevant and irrelevant URLs using the discussed features. Evaluated different learning algorithms: k-nearest neighbor, Naïve Bayes, C4.5, Perceptron. Result: The C4.5 decision tree was the best at predicting relevance. The same method was applied to predict quality, but it was not successful.
36 Quality prediction Using evidence-based medicine, and Using Relevance Feedback (RF) technique
37 Evidence-based Medicine Interventions that are supported by a systematic review of the evidence as effective. Examples of effective treatments for depression: –Antidepressants –ECT (electroconvulsive therapy) –Exercise –Cognitive behavioral therapy These treatments were divided into single and 2-word terms.
38 Relevance Feedback Well-known IR approach of query by examples. Basic idea: Do an initial query, get feedback from users about which documents are relevant, then add words from the relevant documents to the query. Goal: Add terms to the query in order to get more relevant results.
39 RF Algorithm Identify the N top-ranked documents Identify all terms from these documents Select the terms with highest weights Merge these terms with the original query Identify the new top-ranked documents for the new query (Usually, 20 terms are added in total)
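The term-selection steps above might look like this in code. The slide does not say how terms are weighted, so the weight used here (number of top-ranked documents containing the term, times a log inverse document frequency) is an assumption. The collection is a toy example, and the final re-ranking step is omitted because it needs a search engine.

```python
import math
from collections import Counter

def expand_query(query_terms, collection, ranked_ids, n_top=2, n_add=2):
    """Add the n_add highest-weighted terms from the top-ranked documents."""
    top_docs = [collection[i] for i in ranked_ids[:n_top]]
    df_all = Counter(t for doc in collection for t in set(doc))
    df_top = Counter(t for doc in top_docs for t in set(doc))
    # Assumed weighting: frequent in the top documents, rare in the collection.
    weights = {t: df_top[t] * math.log(len(collection) / df_all[t]) for t in df_top}
    candidates = sorted((t for t in weights if t not in query_terms),
                        key=lambda t: (-weights[t], t))
    return list(query_terms) + candidates[:n_add]

# Toy collection of tokenized documents (hypothetical).
collection = [
    ["depression", "treatment", "therapy", "antidepressant"],
    ["depression", "therapy", "exercise"],
    ["football", "season", "treatment"],
    ["car", "insurance", "season"],
    ["weather", "season", "treatment"],
]
expanded = expand_query(["depression"], collection, ranked_ids=[0, 1, 2, 3, 4])
print(expanded)
```

Here "therapy" is added first because it occurs in both top-ranked documents and nowhere else in the collection.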
40 Our Modified RF approach Not for relevance, but Quality Not only single terms, but also phrases Generate a list of single terms and 2-word phrases and their associated weights Select the top weighted terms and phrases Cut-off point: the lowest-ranked term that appears in the evidence-based treatment list 20 phrases and 29 single words form a ‘quality query’
41 Terms represent topic “depression” Term: Weight. Depression: 13.3; Health: 6.9; Treatment: 5.7; Mental: 5.4; patient: 3.3; Medication: 3; ECT: 2.4; antidepressants: 1.9; Mental health: 1.2; Cognitive therapy: 0.84
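One possible reading of the cut-off rule for the quality query, sketched for single terms only. The first eight weights are taken from the table above. The two trailing terms (`click`, `homepage`) and the contents of `TREATMENTS` are hypothetical, added only to show where the list would be cut.

```python
def select_quality_terms(weighted_terms, treatment_terms):
    """Keep ranked terms down to the lowest-ranked evidence-based treatment term."""
    ranked = sorted(weighted_terms.items(), key=lambda item: -item[1])
    cutoff = max((i for i, (term, _) in enumerate(ranked) if term in treatment_terms),
                 default=-1)
    return ranked[:cutoff + 1]

WEIGHTS = {
    "depression": 13.3, "health": 6.9, "treatment": 5.7, "mental": 5.4,
    "patient": 3.3, "medication": 3.0, "ect": 2.4, "antidepressants": 1.9,
    "click": 0.5, "homepage": 0.3,   # hypothetical low-weight terms
}
TREATMENTS = {"antidepressants", "ect", "exercise"}  # assumed single-word treatment terms

quality_terms = select_quality_terms(WEIGHTS, TREATMENTS)
print([term for term, _ in quality_terms])
```

Everything ranked below the last treatment term is dropped, so the treatment list acts as a data-driven stopping point rather than a fixed term count.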
42 Predicting Quality For downloaded pages, a quality score (QScore) is computed using a modification of the BM25 formula, taking into account term weights. The quality of a page is then predicted based on the quality of all downloaded pages linking to it. (Assumption: Good pages are usually inter-connected) Predicted quality score of a page with n downloaded source pages: PScore = (Σ QScore) / n, where the sum runs over the n source pages
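The averaging step can be written out as a short sketch. The QScore values here are made-up inputs, since the modified BM25 formula itself is not given on the slide. Returning `None` for a page with no downloaded source pages is an assumption.

```python
def predicted_quality(target, in_links, qscores):
    """Mean QScore over the already-downloaded pages that link to target."""
    sources = [s for s in in_links.get(target, []) if s in qscores]
    if not sources:
        return None  # no downloaded source page yet, so no prediction
    return sum(qscores[s] for s in sources) / len(sources)

# Hypothetical data: p1 and p2 are downloaded and scored, p3 is not.
qscores = {"p1": 0.75, "p2": 0.25}
in_links = {"t": ["p1", "p2", "p3"]}
pscore = predicted_quality("t", in_links, qscores)
print(pscore)
```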
43 Combining relevance and quality Need to have a way of balancing relevance and quality Quality and relevance score combination is new Our method uses a product of the two scores Other ways to combine these scores will be explored in future work A quality focused crawler relies on this combined score to order the crawl queue
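A small sketch of ordering candidate URLs by the product of the two scores. The URLs and score values are hypothetical, and both scores are assumed to lie in [0, 1].

```python
def combined_score(relevance, quality):
    """Product of the two scores: a URL must do well on both to rank high."""
    return relevance * quality

# Hypothetical candidates: URL -> (predicted relevance, predicted quality).
candidates = {"u1": (0.9, 0.2), "u2": (0.6, 0.7), "u3": (0.4, 0.9)}
ordered = sorted(candidates,
                 key=lambda u: combined_score(*candidates[u]),
                 reverse=True)
print(ordered)
```

With a product, the highly relevant but low-quality `u1` drops to the back of the queue, whereas a sum would rank it level with `u2`'s neighbours. That balancing effect is one reason a product is a natural first choice.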
44 The Three Crawlers A Web crawler (spider): –A program which browses the WWW in a methodical, automated manner –Usually used by a search engine to index web pages to provide fast searches. We built three crawlers: –The Breadth-first crawler: Traverses the link graph in a FIFO fashion (serves as baseline for comparison) –The Relevance crawler: For relevance purpose, ordering the crawl queue using the C4.5 decision tree –The Quality crawler: For both relevance and quality, ordering the crawl queue using the combination of the C4.5 decision tree and RF techniques.
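For contrast with the two score-ordered crawlers, the breadth-first baseline can be sketched with a plain FIFO queue. The link graph below is a hypothetical toy example with no downloading involved.

```python
from collections import deque

def breadth_first_crawl(seeds, out_links, max_pages=100):
    """Visit pages in FIFO order, ignoring relevance and quality."""
    queue = deque(seeds)
    seen = set(seeds)
    crawled = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()              # oldest URL first
        crawled.append(url)
        for target in out_links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return crawled

# Hypothetical link graph: page -> pages it links to.
TOY_LINKS = {"seed": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
bfs_order = breadth_first_crawl(["seed"], TOY_LINKS)
print(bfs_order)
```

The relevance and quality crawlers differ only in replacing the FIFO queue with a queue ordered by predicted score.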
45 Results
46 Relevance
47 Relevance Results The relevance and quality crawls each stabilised after 3000 pages, at 80% and 88% relevance respectively. The BF crawl continued to degrade over time, falling to 40% at 10,000 pages. The quality crawler outperformed the relevance crawler due to the incorporation of the RF quality scores.
48 Quality
49 High quality pages AAQ = Above Average Quality: top 25%
50 Low quality pages BAQ = Below Average Quality: bottom 25%
51 Quality Results The quality crawler performed significantly better than the relevance crawler. (50% better towards the end of the crawl) All the crawls did well in crawling high quality pages. The quality crawler performed very well, with more than 50% of its pages being high quality. The quality crawl has only about 5% of its pages from low quality sites, while the BF crawl has about 3 times as many.
52 Findings Topical-relevance could be well predicted using link anchor context. Link anchor context could not be used to predict quality. Relevance feedback technique proved its usefulness in quality prediction.
53 Overall Conclusions Domain-specific search engines could offer better quality of results than general search engines. The current way to build a domain-specific portal is expensive. We have successfully used focused crawling techniques, relevance decision tree and relevance feedback technique to build high-quality portals cheaply.
54 Future work So far we have only experimented with one health topic. Our plan is to repeat the same experiments with another topic, and generalise the technique to another domain. Other ways of combining relevance and quality should be explored. Experiments to compare our quality crawl with other health portals are necessary. How to remove spam from the crawl is another important step.