School of Computing, FACULTY OF ENGINEERING. An open discussion and exchange of ideas. Introduced by Eric Atwell, Language.


Presentation transcript:

School of Computing, FACULTY OF ENGINEERING
An open discussion and exchange of ideas
Introduced by Eric Atwell, Language Research Group
Natural Language Processing (NLP) + Visualization and Virtual Reality (VVR)

Saman Hina (NLP seminar coordinator):
… Eric will present aspects of NLP research projects which involve "visualisation" of text, to seek advice on further visualisation techniques NLP researchers should consider; other NLPers can also ask about visualisation techniques they could use. The VVR "angle" may be that current visualisation methods work mainly for numerical datasets, so the VVR people might benefit from ideas on text analytics techniques which might "turn text into numbers": what sorts of number-vectors can represent meanings of texts, and how to extract them.
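As one concrete reading of "turn text into numbers", the sketch below maps each text to a bag-of-words count vector and compares texts by cosine similarity. This is only one of many possible number-vector representations and is not taken from the talk; the example texts and the function names (`tokenize`, `bag_of_words`, `cosine`) are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text, vocabulary):
    """Return the count of each vocabulary word in the text, in vocabulary order."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented example texts, for illustration only.
texts = [
    "the baby had a fever and a cough",
    "the baby had a cough and fever for days",
    "word clouds make pretty pictures",
]
vocabulary = sorted({w for t in texts for w in tokenize(t)})
vectors = [bag_of_words(t, vocabulary) for t in texts]

# Texts sharing many words should score higher than texts sharing none.
similar = cosine(vectors[0], vectors[1])
unrelated = cosine(vectors[0], vectors[2])
```

Once texts are numeric vectors like these, visualisation methods designed for numerical datasets can in principle be applied to them, although the vectors are usually high-dimensional and sparse.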

Typical NLP research
NLP research often involves developing an algorithm to automatically process some text and output an analysis, e.g.:
- For each word, its Part of Speech (or semantic class, or …)
- For each sentence, its grammatical structure (parse-tree)
- For each text, its classification: genre, sentiment, CoD (Cause of Death), interesting with respect to a specific task or users
Often this is done by Machine Learning: given a training dataset of example words/sentences/texts, each marked (beforehand) with its Class, learn a Classifier which can predict the Class of any new, unseen word/sentence/text.
The algorithm is automatic, so where does Visualisation fit?
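The train-then-predict workflow described here can be sketched with a very small Naïve Bayes text classifier written in plain Python. This is a minimal sketch under stated assumptions, not the method used in any of the projects mentioned: the training texts, the "pos"/"neg" labels and the function names are all invented, and a real study would use a toolkit and a far larger corpus.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Learn class and word counts from (text, label) training pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data: each text is marked beforehand with its class.
training = [
    ("good great film", "pos"),
    ("great fun", "pos"),
    ("bad boring film", "neg"),
    ("awful boring plot", "neg"),
]
model = train(training)
label_for_new_text = predict(model, "great fun film")
```

The learned model here is just a table of counts per class, which is exactly the kind of classifier output that becomes hard to inspect by eye once the vocabulary grows.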

Visualisation of feature space?
Machine Learning is automatic (e.g. using the WEKA toolkit); the classification is not done by humans …
BUT ML relies on mapping each word/sentence/text into a set of FEATURES which characterise the data.
- Visualisation may guide the researcher in exploring the dataset, to choose useful features?
- OR: ML with different parameter settings can produce different classification models; Visualisation may help the researcher to compare the models?
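To make the idea of mapping each text into a set of features concrete, the sketch below computes three simple surface features per text and averages them per class, which is the sort of summary a researcher might plot when deciding which features look useful. The three features, the example texts and the class labels "a" and "b" are hypothetical choices for illustration only.

```python
from collections import defaultdict
from statistics import mean


def extract_features(text):
    """Map one text to a small dictionary of numeric surface features."""
    words = text.split()
    if not words:
        return {"num_words": 0, "mean_word_length": 0.0, "type_token_ratio": 0.0}
    return {
        "num_words": len(words),
        "mean_word_length": mean(len(w) for w in words),
        # Distinct word forms divided by total words.
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
    }


def mean_features_by_class(examples):
    """Average every feature over the texts of each class."""
    grouped = defaultdict(list)
    for text, label in examples:
        grouped[label].append(extract_features(text))
    summary = {}
    for label, rows in grouped.items():
        summary[label] = {name: mean(r[name] for r in rows) for name in rows[0]}
    return summary


# Invented labelled texts, for illustration only.
examples = [
    ("short text", "a"),
    ("another short text", "a"),
    ("a considerably longer text with many more words in it", "b"),
]
summary = mean_features_by_class(examples)
```

A feature whose per-class averages differ clearly (here, `num_words`) is a candidate for the classifier; one whose averages are nearly identical across classes is less likely to help.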

Lexical semantic space

by Justin Washtell

Typical NLP dataset: a CORPUS (plural: Corpora or Corpuses)
- Quran: English translation; interesting subset of verses
- Leeds Arabic NLP: Arabic morphological analysis tools
- Quranic Arabic Corpus
- Verbal Autopsy interviews: narrative text + yes/no, numbers
- SNOMED-CT: Systematized Nomenclature of Medicine Clinical Terms, adopted by UK NHS and US health authorities

Verbal Autopsy Dataset
Verbal Autopsy: interview of a mother after the death of her baby.
Data collected as part of a main trial over a 7-year period; 10,000 interview reports.
Data collected includes:
- Signs and symptoms that led to the death
- History of any ailments
- Socio-economic characteristics
- Care seeking and treatment
- Fertility and obstetric history
Classification of Cause of Death by doctors at LSHTM (London School of Hygiene and Tropical Medicine, University of London), based on signs, symptoms and expert knowledge.

Problems with VA data
- Both quantitative and qualitative
- Missing values (-)
- 215 variables (plus narrative text)
- Entries can have opaque codes: sex = 1, 2, 8 or 9; Weight = 1.45, 9.99 or 8.88
- Continuous revision of the questionnaire created blank values for some variables
- Visualization of the decision tree is problematic (size = 1043, leaves = 601); also other classifier outputs, e.g. Naïve Bayes
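One way to handle missing values and opaque codes before learning or visualising is to recode them to an explicit "missing" marker. The sketch below assumes, purely for illustration, that sex codes 8 and 9 and weight values 8.88 and 9.99 stand for "not known" or "not applicable"; the slide does not say what these codes mean, so the `MISSING_CODES` table, the variable names and the sample records are all hypothetical.

```python
# Hypothetical code book: which codes count as "missing" is an assumption,
# not something stated in the Verbal Autopsy documentation quoted here.
MISSING_CODES = {
    "sex": {8, 9},
    "weight": {8.88, 9.99},
}


def clean_record(record):
    """Replace '-' entries and assumed sentinel codes with None."""
    cleaned = {}
    for name, value in record.items():
        if value == "-" or value in MISSING_CODES.get(name, set()):
            cleaned[name] = None
        else:
            cleaned[name] = value
    return cleaned


def missing_rate(records, name):
    """Fraction of cleaned records in which the named variable is missing."""
    cleaned = [clean_record(r) for r in records]
    return sum(1 for r in cleaned if r.get(name) is None) / len(cleaned)


# Invented records, for illustration only.
records = [
    {"sex": 1, "weight": 1.45, "fever": "-"},
    {"sex": 9, "weight": 9.99, "fever": "yes"},
    {"sex": 2, "weight": 8.88, "fever": "no"},
    {"sex": 8, "weight": 2.10, "fever": "yes"},
]
weight_missing = missing_rate(records, "weight")
```

Reporting a per-variable missing rate like this is also a cheap first visualisation target: with 215 variables, a sorted bar chart of missing rates shows quickly which variables were affected by questionnaire revisions.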

Visualising Corpus Linguistics
Paul Rayson presented an overview of techniques at the CL2009 International Conference on Corpus Linguistics: Paul Rayson and John Mariani, "Visualising Corpus Linguistics".
I like the Key Word Clouds from CL2001 … CL2009!
… Wordle etc. make pretty pictures, for PR etc.; BUT do word clouds actually help guide NLP research?
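A key word cloud needs a score for each word saying how characteristic it is of one corpus compared with another. A statistic commonly used for this in corpus linguistics is log-likelihood keyness; whether the clouds mentioned above were built with it is an assumption here, and the tiny token lists below are invented. The sketch assumes both corpora are non-empty.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness of one word.

    a, b: frequency of the word in the study and reference corpus.
    c, d: total number of tokens in the study and reference corpus.
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_study)
    if b > 0:
        ll += b * math.log(b / expected_ref)
    return 2 * ll


def keywords(study_tokens, reference_tokens, top_n=5):
    """Words over-represented in the study corpus, most key first."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref[word]
        if a / c > b / d:  # keep only words relatively more frequent in the study corpus
            scored.append((log_likelihood(a, b, c, d), word))
    # Highest score first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top_n]]


# Invented token lists, for illustration only.
study_tokens = "fever fever fever cough the the".split()
reference_tokens = "the the the a a cough".split()
key_list = keywords(study_tokens, reference_tokens)
```

In a cloud, the font size of each word would be driven by its keyness score rather than its raw frequency, which is what distinguishes a key word cloud from a plain Wordle-style picture.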

Open to discussion
Over to you:
- NLPers can ask about visualisation techniques they could use
- VVRers can ask about ideas on text analytics techniques which might turn text into numbers
- And/or any other ideas?
… THANK YOU for your participation