Internet Searching and Browsing in a Multilingual World An Experiment on the Chinese Business Intelligence Portal Acknowledgment: NSF/NIJ Grant.



2 Outline
Motivation
The Chinese Business Intelligence Portal
  – System Description
  – Results of Usability Study
Conclusions

Introduction

4 Motivation
As the Internet grows in popularity worldwide, more users want to access Web content in their native languages
  – The majority of the global online population (63.5%) lives in non-English-speaking areas (Global-Reach, 2002)
  – This population is estimated to grow much faster than the English-speaking population
However, existing search engines may not serve their needs, because most technologies have been developed for English-speaking users

5 This Presentation
The following slides present our efforts in creating and evaluating intelligent Web portals that address the above needs
  – Chinese business information serves as our research testbed
Through these studies, we aim to better understand human interaction and analysis with automated systems developed for Internet searching and browsing in a multilingual world

The Chinese Business Intelligence Portal (CBizPort)

7 CBizPort
The Chinese Business Intelligence Portal (CBizPort)
  – Two versions of the user interface: Simplified Chinese and Traditional Chinese
  – URLs Introduction: Portal:
  – Each version has the same user interface and provides the same functions
      Encoding conversion
      Meta searching major Chinese information sources
      Summarization, Categorization
      Providing links to major Chinese business Web resources
  – The following slides show the system architecture and screen shots of CBizPort
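As an illustration of the encoding-conversion function listed above, here is a minimal Python sketch. The slides do not say how CBizPort converts text, so everything here is an assumption: that Traditional Chinese input arrives as Big5 bytes, that Simplified Chinese sources expect GB2312 bytes, and that a character table maps one script to the other. The one-entry `TRAD_TO_SIMP` table and the `big5_to_gb2312` helper are hypothetical names used only for this example.

```python
# Hypothetical, deliberately tiny Traditional -> Simplified character table.
# A real converter would need a complete mapping; this is illustration only.
TRAD_TO_SIMP = str.maketrans({"貿": "贸"})


def big5_to_gb2312(raw: bytes) -> bytes:
    """Convert Big5 (Traditional) bytes to GB2312 (Simplified) bytes."""
    text = raw.decode("big5")                   # bytes -> Unicode text
    simplified = text.translate(TRAD_TO_SIMP)   # swap characters that differ between scripts
    return simplified.encode("gb2312")          # Unicode text -> GB2312 bytes


# Assumed example query: "trade" typed in Traditional Chinese.
query = "貿易".encode("big5")
converted = big5_to_gb2312(query)
print(converted.decode("gb2312"))
```

The point of the sketch is that re-encoding alone is not enough: characters that differ between the two scripts have to be mapped before the text can be encoded for the other region's sources.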


9 Keywords:
  – Meta searches 8 major information sources of Mainland China, Hong Kong, and Taiwan
  – Provides links to major Chinese business Web sites and resources
  – Provides both Simplified and Traditional Chinese versions of the user interface
  – Allows input of multiple key terms

10 Search Page, Result Page
Summarizer: a two-sentence summary on the left, the original page on the right
Categorizer: Web pages grouped by key phrases extracted by a mutual information algorithm (non-exclusive categorization)
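The idea of non-exclusive categorization can be sketched in Python as follows. This is an illustration under stated assumptions, not CBizPort's code: the key phrases are taken as already extracted (the mutual information step is not shown), matching is plain substring search, and the `categorize` function and sample pages are hypothetical.

```python
from collections import defaultdict


def categorize(pages, key_phrases):
    """Group page ids under every key phrase that appears in the page text.

    A page may land in several groups, which is what makes the
    categorization non-exclusive.
    """
    groups = defaultdict(list)
    for page_id, text in pages.items():
        for phrase in key_phrases:
            if phrase in text:
                groups[phrase].append(page_id)
    return dict(groups)


# Assumed sample data for illustration only.
pages = {
    "p1": "Motorola opens a factory in Tianjin",
    "p2": "Tianjin port trade volume rises",
    "p3": "Motorola reports quarterly earnings",
}
key_phrases = ["Motorola", "Tianjin"]  # assumed to come from a phrase extractor
groups = categorize(pages, key_phrases)
```

Here page `p1` appears under both "Motorola" and "Tianjin", whereas an exclusive scheme would force it into a single folder.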

11 Evaluation of CBizPort
Objectives
1. To evaluate the performance of the summarizer as a preview function and the categorizer as an overview function
2. To compare CBizPort with regional Chinese search engines to study its effectiveness and usability
3. To evaluate, in comparison with existing regional Chinese search engines, the information quality obtained from CBizPort and its capability of searching for cross-regional business information

12 Experimental Design
Searching and browsing were studied
Scenario-based, culturally oriented tasks, e.g.,
  – A search task (4 min): “Find two cities in mainland China that Motorola has set up its manufacturing operations”
  – A browse task (5 min): “Describe, in a number of distinct themes, the economic impacts of removing trade barriers between mainland China and Taiwan towards Hong Kong”
Theme identification method (Chen et al., 2001)
  – Pilot test: 3 subjects used up all the time in most tasks, so the study focused only on effectiveness, not efficiency

13 10 Tasks in the Experiment (1 hour)

Tool | Setting | Subjects from Hong Kong | Subjects from Taiwan | Subjects from China
CBizPort | Basic searching (with neither summarizer nor analyzer) | SO1, BO1 | SO2, BO2 | SO3, BO3
CBizPort | Basic searching + summarizer only | SM1, BM1 | SM1, BM1 | SM1, BM1
CBizPort | Basic searching + categorizer only | SA1, BA1 | SA1, BA1 | SA1, BA1
Regional Chinese SE | General searching and browsing | SG1, BG1 | SG1, BG1 | SG1, BG1
Regional Chinese SE | Cross-regional searching and browsing | SC1, BC1 | SC2, BC2 | SC3, BC3

S = search task; B = browse task; O = Basic searching (with neither summarizer nor analyzer); M = Basic searching + summarizer only; A = Basic searching + categorizer only; G = General searching and browsing; C = Cross-regional searching and browsing; the same number signals the same question across different regions
(Random assignment of tasks is used for different settings)

14 Comparisons
[Diagram: search and browse tasks are compared between the benchmark search engine (Openfind, YahooHK or Sina.com) and CBizPort, with or without summarizer and with or without categorizer]

15 Subjects
30 subjects, 10 from each region, were recruited
  – Rationale: equal influence of regional impacts
Each subject used CBizPort and another search tool according to his/her origin

Subject’s origin | Search tool | CBizPort version
Hong Kong | YahooHK | Traditional Chinese
Taiwan | Openfind | Traditional Chinese
Mainland China | Sina.com | Simplified Chinese

16 Experts
Three experts, one from each region, were recruited to provide answers to all browse tasks
  – First, the experts identified the set of relevant answers (organized into themes) to a browse task
  – Then, they modified the answers by adding those of the subjects’ responses that they judged relevant
  – The above two steps were repeated for all the other browse tasks

17 Hypotheses
Three sets of hypotheses were tested
  – CBizPort’s Enhanced Analysis Capabilities
      Searching and browsing
      With or without summarizer/categorizer
  – SE Performance Comparison
      Searching and browsing capabilities
      Individual settings and combination*
  – Users’ Subjective Evaluation
      Information quality
      Cross-regional searching capability
      Overall satisfaction
  – Auxiliary hypotheses: Performance of the three regions is not significantly different
* We tried to mimic a situation in which each subject was allowed to use both CBizPort and the benchmark search engine together to solve the same problem

18 [Diagram: CBizPort, Experts’ answers, Benchmark SE]


20 Performance Measures
Accuracy = percentage of correct answers
Precision = number of correct themes identified by users / total number of themes identified by users
Recall = number of correct themes identified by users / total number of themes identified by an expert
F value = 2*Recall*Precision / (Precision + Recall)
Information quality: accessibility, appropriateness of amount, believability, completeness, etc. (Wang & Strong, 2002)
Subjective evaluation: cross-regional searching capability, overall satisfaction, protocol analysis, post-hoc test (to study whether the three SEs yield significantly different results)
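The precision, recall and F value definitions above can be worked through with a short Python sketch. It assumes that themes can be compared as exact strings; in the study, relevance was judged by human experts, so the set intersection here is only a stand-in. The function name and the sample themes are hypothetical.

```python
def precision_recall_f(user_themes, expert_themes):
    """Compute precision, recall and F value for one browse task.

    A user theme counts as correct when it also appears in the expert's
    list (an assumption; the study relied on expert judgment).
    """
    user = set(user_themes)
    expert = set(expert_themes)
    correct = user & expert

    precision = len(correct) / len(user) if user else 0.0
    recall = len(correct) / len(expert) if expert else 0.0
    if precision + recall == 0:
        f_value = 0.0
    else:
        f_value = 2 * recall * precision / (precision + recall)
    return precision, recall, f_value


# Assumed sample data: 4 themes from a subject, 3 from an expert, 2 in common.
p, r, f = precision_recall_f(
    ["tourism", "logistics", "banking", "retail"],
    ["tourism", "logistics", "shipping"],
)
```

With two correct themes out of four named by the subject and three named by the expert, precision is 2/4, recall is 2/3, and the F value is 4/7.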

21 Accuracy of search tasks

22 Precision of browse tasks

23 Recall of browse tasks

24 F value of browse tasks

25 Information Quality

26 Users’ Subjective Evaluation

27 Subjects’ Verbal Comments
Subjects liked summarizer and categorizer
  – Subj.#15: “… good performance in summarization and categorization, more focused results can be found”; #26: “… very handy”; #6: “…useful tools to enhance the searching ability” (11 subjects)
CBizPort provides a wide coverage and variety of searching options
  – Subj.#2: “… Yahoo Search Engine is more limited when search certain term in a specific region … While CBizport can fulfill what Yahoo couldn’t do.”; #4: “… more search engines to choose from” (4 subjects)

28 Subjects’ Verbal Comments (2)
Subjects are familiar with benchmark SEs
  – Subj#27: “I am familiar with the format of Openfind. So that's the reason that I am more satisfied with it than CBizPort.” (4 subjects)
Benchmark SEs are not good at cross-regional information searching
  – Subj#15: “Sina gives many results but they are not focused, and is poor at searching HK and Taiwan results”; #5: “provide more accurate regional searching”
CBizPort is user friendly but slow
  – #3: “Yahoo not as precise as CBizPort”; #28: “… easier to search” (7 subjects); “slow” (3 subjects)

29 Conclusions
CBizPort’s summarizer and categorizer provide helpful analysis capabilities for users’ search and browse tasks
  – CBizPort’s searching and browsing performance is comparable to that of regional Chinese search engines
CBizPort can significantly augment the searching and browsing ability of regional Chinese search engines, thus improving human integration of regional information and analysis
  – Information quality, cross-regional searching capability and overall satisfaction of CBizPort are comparable to those of regional Chinese search engines
CBizPort is better than regional Chinese search engines in terms of analysis functions, cross-regional searching capabilities and user-friendliness