Computational Linguistics: New Vistas


Computational Linguistics: New Vistas Dr. Narayan Choudhary, ezDI, LLC. November 22, 2013 Tezpur University, Assam

What we have been doing so far
Global perspective:
  Machine Translation
  Information Retrieval/Extraction
  Spelling and Grammar Checking
  TTS & STT
  Language Learning and Teaching
  Sentiment Analysis
Indian perspective:
  Barely scratched all of the above at the academic/research level
  Real-world, industry-oriented applications are yet to come for most of the languages
February 17, 2019 Narayan Choudhary, Tezpur University, Assam

New Vistas
Application areas being explored at the global level:
  Adding new domains for information extraction: legal services, market intelligence, clinical linguistics, bio-informatics
  Speech-to-speech translation (real time)
  Deep linguistic analysis, which involves interfacing at all levels of linguistic analysis

COLING: Indian Achievements
Machine Translation Systems
  Quite a few: AnglaBharati, AnuBharati, Shakti, Anusaraka, Shiva, Mantra (domain restricted)
  Publicly available: none of the above (or you are not going to use them anyway!)
Corporate Products
  Google Translate, Bing Translate, Systran
  Need to reach the level where they can be trusted to some extent
Evaluation Scores
  Not available (they won’t publish them!)

Other Tools and Resources for Indian Languages
Spelling and Grammar Checker
  Quite a few reported (CDAC Pune, Microsoft Proofing Tools)
  None works well (but you can always tweak one with your own resources, if you have them)
  Grammar checker not reported yet
Dictionaries
  Many reported (quite a few online bilingual and monolingual dictionaries available for many languages)
  Problems of standardization remain
  Issues of copyright

Speech Processing in Indian Languages
Text to Speech
  Quite a few projects
  Lack coherence and need improvement
  Many languages yet to get one
Speech to Text
  Academic scratches and limited domain (shrutlekhan-rajbhasha)
  Many languages need to be added

Corpus Resources
Global Scenario
  Quite a few good-sized corpora annotated at various levels of linguistic analysis
  Brown Corpus, LOB Corpus, Penn Treebank (for English); quite a few other corpora available in various other languages (Japanese, Chinese, Arabic and all the European Union languages)
Indian Scenario
  ILCI is the first major initiative to cater to the need for annotated corpora in all the major Indian languages
  Raw text (un-annotated corpora) available for research purposes (Gyan Nidhi)

Web as a corpus
Increasingly, the text available on the web is being used as a corpus
  Market intelligence, sentiment analysis, trend prediction etc. are done on corpora collected from the web
  Increasing presence of Indian languages on the web
  Use of web-generated corpora in industry and academia
Bottlenecks in using the web as a corpus
  Language identification
  Text-encoding standards
  Heavy pre-processing required
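The language-identification and text-encoding bottlenecks can be illustrated with a minimal sketch. It assumes the web text has already been decoded to Unicode and uses only Python's standard unicodedata module; the function name dominant_script is hypothetical and does not come from the talk. The sketch normalizes the text and guesses the dominant script from Unicode character names.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Guess the dominant script of `text` from Unicode character names.

    Hypothetical helper for illustration only; not part of any toolkit.
    """
    # Normalize first so composed and decomposed forms are counted alike.
    normalized = unicodedata.normalize("NFC", text)
    counts = Counter()
    for ch in normalized:
        # Count only letters (L*) and combining marks (M*); digits,
        # punctuation and whitespace are shared across scripts.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        name = unicodedata.name(ch, "")
        if name:
            # The first word of the name is the script for most letters,
            # e.g. "DEVANAGARI LETTER KA" -> "DEVANAGARI".
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


if __name__ == "__main__":
    print(dominant_script("नमस्ते दुनिया"))
    print(dominant_script("অসম"))
```

Script detection is only a first filter. Assamese and Bengali share one Unicode script block, as do Hindi and Marathi, so telling such languages apart needs language-specific evidence (for example character or word statistics) on top of a step like this.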

Linguistic Analysis: Available Tools
Global Perspective (English):
  Quite a few tools available for automated PoS annotation, syntactic and dependency parsing, semantic role labeling
  Quite good accuracy on the domains they were trained on
  Stanford NLP tools, GATE, OpenNLP, ClearTK
Indian Perspective:
  Quite a few PoS taggers and chunkers reported
  No syntactic parsers, dependency parsers or semantic role labelers done yet
  Research work reported from IIIT Hyderabad, IIT Bombay, IIT Kharagpur
  No concentrated platform for linguistic analysis

Semantic Resources
WordNets
  Major project running at IITB (Indo-WordNet)
Domain Specific Ontologies
  Medical, Legal, Business (one can do it in Indian English; there is a need, even if there is not enough encouragement for Indian languages)
Argument Structure Mapping (PropBank etc.) for major Indian languages
Resources for Sentiment Analysis
Discourse Analysis
Anaphora Resolution

Boost-ups from linguists needed
Understand the needs
Create good corpora
Create tag-sets
  PoS, chunk tags, tree-banking guidelines for all the major languages
Annotation at different levels
  Parts of speech
  Chunk labels
  Full syntactic parsing
Resources for semantic role labeling (PropBank, verb bank, FrameNets)
Help develop morphological analyzers, pre-processors, tokenizers
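Annotation at different levels can be pictured as layers attached to the same tokens. The sketch below is a minimal illustration and not a format proposed in the talk: the Token class, the group_chunks helper, the Penn Treebank-style PoS tags and the BIO chunk labels are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    form: str   # surface word
    pos: str    # part-of-speech tag (illustrative tag names)
    chunk: str  # chunk label in BIO format, e.g. "B-NP", "I-NP", "O"


def group_chunks(tokens: List[Token]) -> List[Tuple[str, List[str]]]:
    """Collect BIO chunk labels into (chunk_type, words) spans."""
    spans: List[Tuple[str, List[str]]] = []
    current: Optional[str] = None  # type of the chunk currently open
    for tok in tokens:
        if tok.chunk == "O":
            current = None  # token lies outside any chunk
            continue
        prefix, _, label = tok.chunk.partition("-")
        if prefix == "I" and current == label:
            # Continue the chunk that is currently open.
            spans[-1][1].append(tok.form)
        else:
            # A B- tag (or a stray I- tag) opens a new chunk.
            spans.append((label, [tok.form]))
            current = label
    return spans


# One sentence carrying two annotation layers: PoS tags and chunk labels.
sentence = [
    Token("The", "DT", "B-NP"),
    Token("boy", "NN", "I-NP"),
    Token("ate", "VBD", "B-VP"),
    Token("rice", "NN", "B-NP"),
]

if __name__ == "__main__":
    for label, words in group_chunks(sentence):
        print(label, " ".join(words))
```

Keeping each layer as a separate field on the token means a further level, such as a syntactic head or a semantic role, can be added later without re-annotating the lower levels.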

Need for Concentrated Effort
Make a concentrated effort: sporadic effort will not yield results
Connect researchers (done best when each language’s researchers collaborate)
Pitch in for a standard
  Follow the global standards first and then go for the language-specific needs
Stay connected and updated; you are living in a fast-changing world

Thank You for your attention!