C SC 620 Advanced Topics in Natural Language Processing 3/11 Lecture 15.

Machine Translation
Readings in Machine Translation, Eds. Nirenburg, S. et al. MIT Press
Part 1: Historical Perspective
Reading list:
– Introduction. Nirenburg, S.
– 1. Translation. Weaver, W.
– 3. The Mechanical Determination of Meaning. Reifler, E.
– 5. A Framework for Syntactic Translation. Yngve, V.
– 6. The Present Status of Automatic Translation of Languages. Bar-Hillel, Y.

Machine Translation: Next Readings
Readings in Machine Translation, Eds. Nirenburg, S. et al. MIT Press
Part 1: Historical Perspective
Reading list:
– 12. Correlational Analysis and Mechanical Translation. Ceccato, S.
– 13. Automatic Translation: Some Theoretical Aspects and the Design of a Translation System. Kulagina, O. and I. Mel’cuk
– 16. Automatic Translation and the Concept of Sublanguage. Lehrberger, J.
– 17. The Proper Place of Men and Machines in Language Translation. Kay, M.

Papers available
On the shelf (improperly) marked for LING 696G in Linguistics (Douglass), opposite the front office

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
1.2 Unreasonableness of Aiming at Fully Automatic High Quality Translation (FAHQT)
– Misplaced optimism from the first years
  – A large number of problems were readily solved
  – Output of machine-simulated translations was often of a form that an intelligent and expert reader could make good sense and use of
– Not sufficiently realized
  – The gap between such output and high quality translation was still enormous
  – The problems solved were the simplest ones, whereas the “few” remaining problems were the harder ones

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
1.2 Unreasonableness of Aiming at Fully Automatic High Quality Translation (FAHQT)
– Most groups seem to have realized that FAHQT will not be attained in the near future
– Consequence 1: keep trying
  – Hope that the pursuit of this aim will yield interesting theoretical insights which will justify this endeavor, whether or not these insights will ever be exploited for some practical purpose
– Consequence 2: try for something easier
  – With a better chance of attainability in the near future

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
1.2 Unreasonableness of Aiming at Fully Automatic High Quality Translation (FAHQT)
– Those who are interested in MT as a primarily practical device must realize that full automation is incompatible with high quality
– Sacrifice quality, or
– Reduce self-sufficiency of the machine output
– Post-editing => computer-aided translation (CAT)

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
1.3 Commercial Partly Mechanized, High Quality Translation Attainable in the Near Future
– Cost-benefit tradeoffs
– Problem 1: Input
  – Typing: 0.25 to 0.5c/word
  – Human Russian-to-English translation: 1 to 3c/word
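The cost comparison on this slide can be worked through as simple arithmetic. The sketch below uses only the per-word figures quoted above; the function name, and the reading of the figures as "input typing cost relative to human translation cost", are illustrative assumptions rather than anything stated in the paper.

```python
# Per-word costs in US cents, as quoted on the slide.
TYPING_COST = (0.25, 0.5)        # keying the source text into the machine
TRANSLATION_COST = (1.0, 3.0)    # human Russian-to-English translation

def input_cost_share(typing, translation):
    """Return the typing cost as a fraction of the human translation cost."""
    return typing / translation

# Best case: cheapest typing against the most expensive human translation.
best = input_cost_share(TYPING_COST[0], TRANSLATION_COST[1])
# Worst case: most expensive typing against the cheapest human translation.
worst = input_cost_share(TYPING_COST[1], TRANSLATION_COST[0])

print(f"input alone costs {best:.0%} to {worst:.0%} of a human translation")
```

Under these figures, getting the text into the machine already consumes a noticeable share of what a human translation would cost, before any machine time or post-editing is counted.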

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
1.3 Commercial Partly Mechanized, High Quality Translation Attainable in the Near Future
– Problem 2: A concerted effort will have to be made by a pretty large group in order to prepare the necessary dictionaries
  – Not straightforward
– Modern Note: (Free or readily available) high-quality lexical resources are still hard to come by even today

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
1.3 Commercial Partly Mechanized, High Quality Translation Attainable in the Near Future
– Problem 3: Determining the optimal division of labor between human and machine
  – Easy for human, hard for machine
    – Example: period as end-marker or other purpose
– Problem 4: Source language forms stored in dictionary as fully-inflected forms or canonical forms
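The period example under Problem 3 can be made concrete with a small sketch. The abbreviation list and the single-initial rule below are hypothetical heuristics chosen for illustration; they are not from the paper, and a real system would need far more than this.

```python
# Hypothetical abbreviation list; purely illustrative.
ABBREVIATIONS = {"dr.", "prof.", "e.g.", "i.e.", "etc."}

def is_sentence_end(tokens, i):
    """Guess whether the period at the end of tokens[i] closes a sentence."""
    tok = tokens[i]
    if not tok.endswith("."):
        return False
    if tok.lower() in ABBREVIATIONS:
        return False                      # period belongs to an abbreviation
    if len(tok) == 2 and tok[0].isupper():
        return False                      # period follows an initial such as "Y."
    return True

tokens = "Dr. Y. Bar-Hillel wrote the survey. It was critical.".split()
ends = [t for i, t in enumerate(tokens) if is_sentence_end(tokens, i)]
print(ends)  # ['survey.', 'critical.']
```

The heuristic already fails on a sentence that ends in an initial or an abbreviation, which is the point of the slide: a human resolves this without effort, while the machine needs ever more rules.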

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
1.4 Compromising in the Wrong Direction
– Since we cannot have 100% automatic HQ translation, let us be satisfied with a machine output which is complete and unique, i.e. a smooth text of the kind you will get from a human translator but which has less than 100% chance of being correct
  – “95%”

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
1.4 Compromising in the Wrong Direction
– Implementation 1: Print the most frequent target-language counterpart of a source-language word whose ambiguity has not been resolved
  – Requires large scale statistical studies
– Implementation 2: Work with syntactical and semantical rules of analysis with a degree of validity of no more than 95%, so long as this degree is sufficient to insure uniqueness and smoothness of the translation
  – Esthetically appealing but …
  – Wrong and dangerous: can the reader detect mistranslations?
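Implementation 1 amounts to a most-frequent-counterpart lookup. The sketch below is a minimal illustration under stated assumptions: the observation data are invented, the French glosses are only examples, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical observations: (source word, target word a human translator chose).
observations = [
    ("pen", "stylo"), ("pen", "stylo"), ("pen", "stylo"),   # writing instrument
    ("pen", "enclos"),                                      # enclosure
]

def most_frequent_counterpart(observations):
    """Map each source word to its most frequently observed target word."""
    counts = {}
    for src, tgt in observations:
        counts.setdefault(src, Counter())[tgt] += 1
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}

table = most_frequent_counterpart(observations)
print(table["pen"])  # stylo
```

The output is always a single smooth choice. When the enclosure sense was intended, the text still reads fluently and nothing tells the reader that a guess was made, which is the danger the slide points to.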

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
1.4 Compromising in the Wrong Direction
– No need to compromise in the direction of reducing the reliability of the machine output
– Fail-safe output
  – Provide post-editor with all possible help (alternatives to select from)
  – Modern Note:
    – No MT system gives a rating of how confident it is in the translation
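One way to picture fail-safe output is a renderer that keeps unresolved alternatives visible for the post-editor instead of silently choosing one. This is a sketch only: the lexicon, the brace notation, and the function name are assumptions made for illustration.

```python
# Hypothetical word-level lexicon: each source word maps to its candidate targets.
lexicon = {
    "the": ["le"],
    "pen": ["stylo", "enclos"],
}

def fail_safe_render(words, lexicon):
    """Render a word-by-word output that exposes unresolved alternatives."""
    out = []
    for w in words:
        options = lexicon.get(w, [w])      # unknown words pass through unchanged
        if len(options) == 1:
            out.append(options[0])
        else:
            out.append("{" + "|".join(options) + "}")   # leave the choice to the post-editor
    return " ".join(out)

print(fail_safe_render(["the", "pen"], lexicon))  # le {stylo|enclos}
```

The output is less smooth than a committed translation, but every point where the machine could not decide is marked, so reliability is not traded away for fluency.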

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
1.5 A Critique of the Overestimation of Statistics and the “Empirical Approach”
– Warning against overestimating the impact of statistical information on the problem of MT and related questions
  – Modern Note:
    – Statistical MT and other applications have been very popular in the past decade or so …
      – Large corpora available on-line
      – Cheap CPU power
      – Perceived failure of symbolic approaches

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
1.5 A Critique of the Overestimation of Statistics and the “Empirical Approach”
– “I believe this overestimation is a remnant of the time, seven or eight years ago, when many people thought that the statistical theory of communication would solve many, if not all, of the problems of communication”
– Much valuable time spent on gathering statistics
– Not every statistic on linguistic matters is automatically of importance for MT

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
1.5 A Critique of the Overestimation of Statistics and the “Empirical Approach”
– Adherents of the “Empirical Approach”
  – Distrustful of existing grammar books and dictionaries
    – Most existing grammar books are normative
    – Translation dictionaries out-of-date
  – Regard it as necessary to establish grammatical rules from scratch, through human analysis of a large enough corpus of source-language material, constantly improving upon the formulation of these rules by constantly enlarging this corpus
    – No justification that this is any faster than modifying existing sources

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
2. Critical Survey of the Achievements of the Particular MT Research Groups
– 2.1.1 The Seattle Group (U of Washington, E. Reifler)
  – Low quality of output
  – Word-by-word translation plus
    – Word-order
    – Reducing syntactical and lexical ambiguities
  – Unbelievably optimistic claims
    – Compounding: “found moreover that only three matching procedures and four matching steps are necessary to deal effectively with any of these ten types of compounds of any language in which they occur”
    – “it will not be very long before the remaining linguistic problems in machine translation will be solved for a number of important languages”

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
Other tidbits
– Interlingua: artificial mediating language
  – n languages, 2n programs: a reduction from n(n-1)
– Interlingua: a real language
  – n languages, 2(n-1) programs
– Artificial interlingua
  – Logical, unambiguous
  – Assumption that translation from a natural language into a logical one is somehow simpler than translation from one natural language into another is unwarranted
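The program counts on this slide follow from counting translation directions. The sketch below simply evaluates the three formulas from the slide for a few values of n; the function names are illustrative.

```python
def direct_programs(n):
    """One program per ordered language pair: n(n-1)."""
    return n * (n - 1)

def artificial_interlingua_programs(n):
    """One program into and one out of the interlingua per language: 2n."""
    return 2 * n

def natural_interlingua_programs(n):
    """The pivot is one of the n languages, so only n-1 need programs: 2(n-1)."""
    return 2 * (n - 1)

for n in (3, 5, 10):
    print(n, direct_programs(n),
          artificial_interlingua_programs(n),
          natural_interlingua_programs(n))
# 3 6 6 4
# 5 20 10 8
# 10 90 20 18
```

The saving only appears once n exceeds 3 for an artificial interlingua, and it grows quickly after that, since the direct approach is quadratic in n while both interlingua schemes are linear.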

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
Other tidbits
– Idea of a completely symmetrical n-ary dictionary, each entry consisting of exactly n words, one for each of the n languages concerned, is wholly unrealistic
  – Interlingual thesaurus
– Assume L1 -> L2 and L2 -> L3: how much better would L1 -> L3 be compared to L1 -> L2 -> L3, and would it be cost-effective?

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
3. Conclusion
– FAHQT not a reasonable goal, not even for scientific texts
– Human translator is often obliged to make intelligent use of extra-linguistic knowledge which sometimes has to be of considerable breadth and depth.
– Without this knowledge he would often be in no position to resolve semantic ambiguities
– At present no way of constructing machines with such a knowledge is known, nor of writing programs which will ensure intelligent use of this knowledge
– Modern Note: still true today …

Paper 6: The Present Status of Automatic Translation of Languages. Y. Bar-Hillel
3. Conclusion
– For the preparation of practical MT programs, great linguistic sophistication seems to be neither requisite nor even especially helpful at the present state of the art
– Basic linguistic research is of great importance as such, and its support should preferably not be based on the pretense that it will lead to an improvement of MT techniques
– It is likely that far-reaching illumination of the human factor in translation will not be achieved without an enormous amount of such basic research, but this is a very long-range affair that should preferably be kept separate from immediate goals