1
College of Information
Connecting with Users beyond Language Boundaries through Multilingual Information Access for Digital Collections Jiangping Chen May
2
Presentation Outline
- The Metadata Records Translation (MRT) Project
- The concept of Multilingual Information Access
- Research goals
- Research plan
- HeMT: An integrated multilingual participatory platform
- HeMT Users
- Functions
- Collaboration and crowdsourcing
- We need your help!
Jiangping Chen - TCDL 2012
4
What is Multilingual Information Access (MLIA)?
- An extension of Cross-Language Information Retrieval (CLIR)
- To facilitate universal information access by overcoming language barriers
- Includes, but is not limited to: CLIR, Cross-Language Question Answering (CLQA), Cross-Language Summarization, and bilingual or multilingual searching, browsing, and presentation
5
Chinese-English Information Retrieval
[Diagram: Queries (C) -> Query Translation -> Queries (E) -> DL or IR system over English Documents -> Results (E) -> MT System -> Results (C)]
Sample query: 白内障有哪些新药? (What are the newest medicines/treatments for cataract?)
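The query-translation flow on this slide can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in: the two small lexicons (`ZH_EN`, `EN_ZH`) take the place of a real MT system, `ENGLISH_DOCS` takes the place of a digital library collection, and term-overlap counting takes the place of a real IR ranking function.

```python
# Toy sketch of query-translation CLIR. The lexicons, documents, and scoring
# are hypothetical stand-ins, not the systems described in the slides.

# Hypothetical Chinese-to-English lexicon standing in for query translation.
ZH_EN = {"白内障": "cataract", "新药": "new medicines", "治疗": "treatment"}
# Hypothetical English-to-Chinese lexicon used to present results in Chinese.
EN_ZH = {"cataract": "白内障", "medicines": "药物", "new": "新", "treatment": "治疗"}

ENGLISH_DOCS = {
    "d1": "new medicines for cataract treatment",
    "d2": "history of digital libraries",
}


def translate_query(query_zh: str) -> list[str]:
    """Queries (C) -> Queries (E): look up lexicon entries found in the query."""
    terms = []
    for zh, en in ZH_EN.items():
        if zh in query_zh:
            terms.extend(en.split())
    return terms


def search(terms: list[str]) -> list[tuple[str, int]]:
    """Rank English documents by the number of query terms they contain."""
    scored = []
    for doc_id, text in ENGLISH_DOCS.items():
        words = set(text.split())
        score = sum(1 for term in terms if term in words)
        if score:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def translate_result(text_en: str) -> str:
    """Results (E) -> Results (C): word-by-word gloss, unknown words kept."""
    return " ".join(EN_ZH.get(word, word) for word in text_en.split())


def clir(query_zh: str) -> list[tuple[str, str]]:
    """Run the whole pipeline for one Chinese query."""
    hits = search(translate_query(query_zh))
    return [(doc_id, translate_result(ENGLISH_DOCS[doc_id])) for doc_id, _ in hits]


if __name__ == "__main__":
    print(clir("白内障有哪些新药?"))
```

In this arrangement only the short query and the returned results pass through translation, so the English collection itself is never modified.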
6
Chinese-English Information Retrieval
[Diagram: English Documents -> MT System -> Chinese Documents; Queries (C) -> DL or IR system over Chinese Documents -> Results (C)]
Sample query: 白内障有哪些新药? (What are the newest medicines/treatments for cataract?)
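The document-translation flow on this slide can be sketched in the same spirit. This is a toy under stated assumptions: `EN_ZH` is a hypothetical lexicon standing in for an MT system, `ENGLISH_DOCS` is a made-up collection, and matching uses character-bigram overlap, a simple technique for Chinese text that avoids word segmentation. None of these choices are taken from the slides.

```python
# Toy sketch of document-translation CLIR: the collection is translated once,
# then Chinese queries are matched against the translated text directly.

# Hypothetical English-to-Chinese lexicon standing in for an MT system.
EN_ZH = {"cataract": "白内障", "new": "新", "medicines": "药", "libraries": "图书馆"}

ENGLISH_DOCS = {
    "d1": "new medicines for cataract",
    "d2": "digital libraries",
}


def translate_document(text_en: str) -> str:
    """English Documents -> Chinese Documents; unknown words are dropped."""
    return "".join(EN_ZH.get(word, "") for word in text_en.split())


def build_chinese_index() -> dict[str, str]:
    """Translate the whole collection ahead of query time."""
    return {doc_id: translate_document(text) for doc_id, text in ENGLISH_DOCS.items()}


def bigrams(text: str) -> set[str]:
    """Overlapping two-character substrings of the text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def search(query_zh: str, index: dict[str, str]) -> list[tuple[str, int]]:
    """Rank translated documents by shared character bigrams with the query."""
    query_grams = bigrams(query_zh)
    scored = [(doc_id, len(query_grams & bigrams(text))) for doc_id, text in index.items()]
    hits = [pair for pair in scored if pair[1] > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    index = build_chinese_index()
    print(search("白内障有哪些新药?", index))
```

Here the translation cost is paid once for the collection, and no translation happens at query time.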
7
MLIA is Important
- Information creators and users are multilingual
- The need to access information in many languages
  - For economic development
  - For knowledge sharing/cultural exchange
  - For learning
  - For national security
8
MLIA is Difficult
- MLIA involves translation
  - Translation is difficult even for professional translators
  - Machine translation is considered the most difficult natural language processing problem
- Information retrieval (IR) is challenging
  - Understanding users’ information needs is difficult
  - Even the most sophisticated IR algorithm cannot satisfy every user
9
MLIA Research
- Fortunately, significant progress has been made in MLIA
  - MLIA evaluation forums: TREC, NTCIR, CLEF
  - Statistical MT workshops
  - Google launched a cross-language search service (2007 – 2011), which built upon many years of research in machine translation (MT) and cross-language information retrieval (CLIR)
13
MLIA for Digital Libraries
- Digital library users are multilingual
- Many digital collections are multilingual
- MLIA research has been conducted for years, but real applications of the research results to digital libraries are rare
- MT systems are producing promising results
14
Bilingual or Multilingual Digital Libraries in the United States
Library Name | URL | Languages
Meeting of Frontiers | | English/Russian
France in America | | English/French
Parallel Histories | | English/Spanish
International Children's Digital Library | | Digital objects in 11 languages; users can run keyword searches in 51 languages
The Perseus Digital Library | | Greek, English, Latin
Chen & Ruiz – SIGIR 2009
15
The Five Digital Libraries Share the Following Characteristics
- They have been funded by various funding agencies, especially those of the federal government
- They are the products of collaboration: people from different countries work together to produce the bilingual or multilingual collections
- They serve a broad or global user community in which users speak different languages
- They do not employ cross-language information retrieval techniques or machine translation
Chen & Ruiz – SIGIR 2009
16
MLIA for DL: Questions/Concerns
- How useful are current MLIA and MT technologies for digital libraries?
- What are the costs and benefits for DLs to provide multilingual information access?
- Are there other solutions to language barriers for MLIA in DLs?
  - Crowdsourcing
  - Computer-assisted mechanisms
18
The Metadata Records Translation (MRT) Project
- A collaborative project among four units in three countries: UNT College of Information, UNT Libraries, Wuhan University in China, and UAEM in Mexico
- A two-year research project funded by an IMLS National Leadership Grant and UNT
- The goals include:
  (1) to understand to what extent current MT technologies generate adequate translations for metadata records
  (2) to explore effective metadata records translation strategies for digital collections
20
HeMT: an Integrated Multilingual Participatory Platform
- A database-driven Web application serving three different types of users
- With interfaces (webpages) in three languages: English, Simplified Chinese, and Spanish
- Easy to use for translators and evaluators
  - The system will be used by evaluators from the US, China, and Mexico, so it is important to keep it easy to understand and fast to load, with images kept to a minimum
- Extendable to other languages: a language-independent database structure is desired
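One way to read "language-independent database structure" is a schema in which languages are rows rather than columns, so that supporting a new language needs no schema change. The sketch below uses Python's built-in `sqlite3` module to illustrate that idea. The table and column names (`language`, `record`, `record_text`) and the sample text are hypothetical and are not taken from HeMT itself.

```python
import sqlite3

# Hypothetical schema: each translation is a row keyed by record, language,
# and source, so no column is tied to a particular language.
SCHEMA = """
CREATE TABLE language (code TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE record (id INTEGER PRIMARY KEY, collection TEXT NOT NULL);
CREATE TABLE record_text (
    record_id INTEGER NOT NULL REFERENCES record(id),
    lang_code TEXT NOT NULL REFERENCES language(code),
    source TEXT NOT NULL,
    body TEXT NOT NULL,
    PRIMARY KEY (record_id, lang_code, source)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the three interface languages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO language VALUES (?, ?)",
        [("en", "English"), ("zh-Hans", "Simplified Chinese"), ("es", "Spanish")],
    )
    return conn


def add_text(conn, record_id: int, lang_code: str, source: str, body: str) -> None:
    """Store one version of a metadata record (original, manual, or MT output)."""
    conn.execute(
        "INSERT OR IGNORE INTO record (id, collection) VALUES (?, ?)",
        (record_id, "demo"),
    )
    conn.execute(
        "INSERT INTO record_text VALUES (?, ?, ?, ?)",
        (record_id, lang_code, source, body),
    )


def texts_for(conn, record_id: int, lang_code: str) -> list[tuple[str, str]]:
    """Return (source, body) pairs for one record in one language."""
    rows = conn.execute(
        "SELECT source, body FROM record_text "
        "WHERE record_id = ? AND lang_code = ? ORDER BY source",
        (record_id, lang_code),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_db()
    add_text(conn, 1, "en", "original", "Photograph of a bridge")
    add_text(conn, 1, "es", "manual", "Fotografía de un puente")
    # Adding a language is a data change only: one new row, no new columns.
    conn.execute("INSERT INTO language VALUES (?, ?)", ("fr", "French"))
    add_text(conn, 1, "fr", "manual", "Photographie d'un pont")
    print(texts_for(conn, 1, "fr"))
```

The `source` column is an illustrative design choice that lets a manual translation and several MT outputs for the same record coexist in one language.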
21
HeMT Homepage
22
HeMT Users
- Translators: conduct manual translation of metadata records
  - Recruited from Texas
- Reviewers: monitor system progress, update multilingual term list
  - The research team
- Evaluators: conduct human evaluation of the system
  - Recruited from China, Mexico, and U.S.
23
HeMT Site Map
[Site map diagram. Pages: HeMT Homepage, User Registration, Training Lesson, Evaluation Start Page, Individual MT System Evaluation, Comparative Evaluation, Manual Translation, Monitoring & Visualization, User Profile Revision. Path labels: New User, New Evaluator, Evaluator, Translator, Reviewer.]
25
Training of the Evaluators
- The training lesson is in three languages
- The lesson describes
  - The MRT project and its purposes
  - Procedures for evaluators
  - Evaluation measures (such as adequacy, fluency, best system, worst system, etc.) and the criteria for assigning values to these measures
  - Comments: when and how should I provide comments?
- Quiz at the end of the lesson to make sure evaluators understand how to do the evaluation
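As an illustration of how judgments on these measures might be summarized, the sketch below averages adequacy and fluency per MT system and tallies "best system" votes. The 1 to 5 rating scale, the field names, and the system labels are assumptions made for this example; the slides do not specify the scale or the data format used in HeMT.

```python
from collections import Counter
from statistics import mean

# Assumed rating scale for this sketch only; the slides do not state one.
MIN_SCORE, MAX_SCORE = 1, 5


def summarize(ratings: list[dict]) -> dict[str, dict]:
    """Average adequacy and fluency per system from evaluator judgments."""
    by_system: dict[str, list[dict]] = {}
    for rating in ratings:
        for measure in ("adequacy", "fluency"):
            if not MIN_SCORE <= rating[measure] <= MAX_SCORE:
                raise ValueError(f"{measure} out of range: {rating[measure]}")
        by_system.setdefault(rating["system"], []).append(rating)
    return {
        system: {
            "adequacy": mean(r["adequacy"] for r in rows),
            "fluency": mean(r["fluency"] for r in rows),
            "judgments": len(rows),
        }
        for system, rows in by_system.items()
    }


def tally_best(votes: list[str]) -> list[tuple[str, int]]:
    """Count how often each system was picked as best, most frequent first."""
    return Counter(votes).most_common()


if __name__ == "__main__":
    sample = [
        {"system": "A", "adequacy": 4, "fluency": 3},
        {"system": "A", "adequacy": 5, "fluency": 4},
        {"system": "B", "adequacy": 2, "fluency": 3},
    ]
    print(summarize(sample))
    print(tally_best(["A", "A", "B"]))
```

Rejecting out-of-range values at aggregation time is one simple guard against data-entry mistakes by evaluators.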
26
Training Lesson in Chinese
Training home page in Chinese
27
Training Lesson in Spanish
31
The Bottom Part of the Individual Evaluation Page
2018/9/19 Jiangping Chen
33
The Bottom Part of the Comparative Evaluation Page
36
Comments on Crowdsourcing and Collaboration for MRT Project
- Collaboration is crucial to achieving the goal of this project
  - Partners, consultants
- Crowdsourcing may save money, but may require more time and more effective management
- Online training lessons for participants are important
37
Next Step ……
- We need more people to evaluate Spanish machine translation results
  - Please pick up a handout if you or anyone you know may be interested in participating
- We want to collaborate with institutions that have metadata records in Simplified Chinese or Spanish for training our MT system
- We want to collaborate with U.S. DLs to evaluate MT for MLIA
38
Thank You! College of Information
Any comments and suggestions are welcome! Please contact: