Center for Computational Learning Systems
  Independent research center within the Engineering School
  NLP people at CCLS: Mona Diab, Nizar Habash, Martin Jansche, Rebecca Passonneau, Owen Rambow
  We are part of “The NLP Group” but not of the CS department
  What we do:
    o Researchers
    o Work with Kathy and Julia
    o Our own projects
    o Sometimes teach
    o Supervise students (PhD, Masters, independent studies)
  Some of us are in CEPSR, some in the Interchurch Building
  Some NLP Group meetings will take place in Interchurch Center
CLiMB 2: Computational Linguistics for Metadata Building, phase 2
  Becky Passonneau (with University of Maryland)
  Interactive workbench for image cataloguers/indexers:
    o Use NLP to extract descriptive terms from scholarly text
  Mellon Foundation
Automated Readers Advisor, Heiskell Talking Books and Braille Library (NYPL)
  Becky Passonneau
  Replace some of librarians’ tasks in current over-the-phone borrowing system with automated dialogue system
  Use Wizard-of-Oz paradigm for data collection
  Joint project with CCNY (Esther Levin)
Tracking Emergent Narrative Skills (TENS)
  Becky Passonneau
  Current data set: ten-year-olds retelling silent movies
  Develop quantitative methods to compare semantic and pragmatic content (e.g., adapt the Pyramid Method for evaluating summary content)
  Joint project with University of Connecticut (Elena Levy)
Arabic NLP
  CADIM Group: Mona Diab, Nizar Habash, Owen Rambow
  Focus on Standard Arabic AND the dialects
  NLP tools for Arabic:
    o Morphological analysis (exists)
    o Morphological tagging (exists, best-performing)
        Tokenization
        POS tagging (best-performing)
        Diacritization (best-performing)
    o Word-sense disambiguation (in progress)
    o Sentence-boundary detection for ASR (in progress)
    o Parsing (initial research)
    o Named-entity recognition (joint with Fair Isaac, in progress)
    o …
Machine Translation
  Nizar Habash
  Focus: Arabic-English MT
  Different hybrid MT approaches explored:
    o Linguistic preprocessing for statistical MT
        Morphological and syntactic preprocessing
    o Adding statistical resources to rule-based MT systems
        Automatically extracted phrase tables combined with Generation-Heavy MT
  Columbia's first participation in NIST MTEval (2006)
Word Sense Modeling and Disambiguation
  Mona Diab
  Using corpora (including multilingual parallel and similar) for unsupervised learning
  Arabic WordNet
  Arabic PropBank
Summarization: Social Networks
  Aaron Harnly (PhD student) and Owen Rambow, with Kathy McKeown
  Study interaction between:
    o -intrinsic factors
        Language in (lexicon, syntax, …) genre
    o Structure of dialog
        Threads
        Speech acts
    o Relation among people
        Roles in organization
        Social networks
  Use to predict one factor from the others
  Use in high-level summaries of large amounts of communication
Multilingual Metagrammars
  Owen Rambow (with University of Pennsylvania)
  Goal: high-level abstract representation of syntax of (many/all) natural languages, from which we can automatically generate grammars that can be used for NLP
  Have: Universal Grammar component and language-specific modules for Korean, German, Yiddish
  Next: Icelandic, Mainland Scandinavian, English, Kashmiri, …