Digital Text and Data Processing


Digital Text and Data Processing Week 6

Difficulty vs. enjoyment

Course evaluation
- Final essay (ca. 4,000 words)
  - Report of your individual research project (50%)
  - Critical reflection on digital humanities research (50%)
- Five “Coding Challenges”, which need to be marked as sufficient

Homework
Write a brief text (max. 500 words) about your individual research project. Answer the following questions:
- Which texts have you selected for your corpus?
- Which research question do you intend to answer?
- Which types of analyses will be most useful for your research question?
The course syllabus mentions five possible topics. If you want to focus on another topic, also provide a brief description of your theoretical question.

Applications
TDM technologies have been used to study:
- Literary genres (e.g. Sarah Allison et al., Quantitative Formalism: An Experiment)
- Literary characters (e.g. Stephen Ramsay, Reading Machines: Toward an Algorithmic Criticism)
- Date of creation (Richard Forsyth, “Stylochronometry with Substrings, or: A Poet Young and Old”)
- Authorship

Applications
- Themes (Martha Nell Smith et al., “‘Undiscovered Public Knowledge’: Mining for Patterns of Erotic Language in Emily Dickinson’s Correspondence with Susan Huntington (Gilbert) Dickinson”)
- Lexical repetitions (T. E. Clement, “‘A Thing Not Beginning and Not Ending’”)
- Rhyme and meter
- Allusions (N. Coffee et al., “The Tesserae Project: Intertextual Analysis of Latin Poetry”)

Feature extraction
Features:
- Most frequent words
- Type-token ratio
- Grammatical categories
- Repetitions of words
- Sentiments
Used to study:
- Genres
- Texts with specific themes
- Literary characters
- Authorship
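
Two of the features listed on this slide, most frequent words and the type-token ratio, can be sketched in a few lines of Python. This is a minimal illustration rather than the course's own tooling: the function names and the naive tokenisation rule (lowercase letters, with an optional apostrophe part) are assumptions made for the example.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Number of distinct words (types) divided by the total number of words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def most_frequent(tokens, n=3):
    """Return the n most frequent words together with their counts."""
    return Counter(tokens).most_common(n)


sample = "The Signora had no business to do it, said Miss Bartlett, no business at all."
tokens = tokenize(sample)
print(type_token_ratio(tokens))
print(most_frequent(tokens, 2))
```

The type-token ratio depends strongly on text length, so it is usually only comparable between samples of similar size.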

Final three weeks
- W6: Natural Language Processing, Semantic Tagging
- W7: Complexity metrics (sentence length, syllables); Topic Modelling
- W8: Course wrap-up; Mapping geographic information
Which other analyses may be useful?

Natural Language Processing
Making computers understand languages spoken by human beings
Applications:
- Part of Speech Tagging
- Sentiment analysis
- Information extraction
- Machine translation
- Summarising
- Paraphrasing

Part of speech tagging
Providing the syntactic category of each word within a sentence:
“The Signora had no business to do it,” said Miss Bartlett, “no business at all.”
The/DT Signora/NNP had/VBD no/DT business/NN to/TO do/VB it/PRP said/VBD Miss/NNP Bartlett/NNP no/DT business/NN at/IN all/DT
Can be done in the SMILE Text Analyzer, among many other tools
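
Once a text has been tagged in the word/TAG notation shown on this slide, the tags can be counted to obtain a profile of grammatical categories. The sketch below assumes the tagger output is a whitespace-separated string in that notation; `parse_tagged` is a hypothetical helper name, not part of any particular tool.

```python
from collections import Counter


def parse_tagged(tagged_text):
    """Split 'word/TAG' items into (word, tag) pairs.

    rpartition splits on the last slash, so a word that itself
    contains a slash keeps it.
    """
    pairs = []
    for item in tagged_text.split():
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs


tagged = ("The/DT Signora/NNP had/VBD no/DT business/NN to/TO do/VB it/PRP "
          "said/VBD Miss/NNP Bartlett/NNP no/DT business/NN at/IN all/DT")

pairs = parse_tagged(tagged)
tag_counts = Counter(tag for _, tag in pairs)
print(tag_counts.most_common(3))
```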

Brill’s POS tagger
Combination of a lexicon-based and a rule-based approach
A lexicon entry looks as follows: Talk VB NN
Initial results are improved with transformation rules, e.g. VB NN PREVIOUSTAG JJ (change VB to NN when the previous tag is JJ)
- she could re-enter the world of rapid/JJ talk/VB, which was alone familiar to her
- So she did want to talk/VB about her broken engagement
In the first sentence the rule corrects talk/VB to talk/NN; in the second, talk/VB is kept.
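
The two steps described on this slide, an initial lexicon lookup followed by transformation rules, can be illustrated with a toy example. This is a simplified sketch and not Brill's actual implementation: the mini-lexicon, the rule encoding as a tuple, and the default tag NN for unknown words are all assumptions made for illustration.

```python
# Hypothetical mini-lexicon: each word lists its possible tags, first choice first.
LEXICON = {
    "rapid": ["JJ"],
    "talk": ["VB", "NN"],
    "to": ["TO"],
}

# A rule (from_tag, to_tag, previous_tag) reads: change from_tag to to_tag
# when the preceding word carries previous_tag.
RULES = [("VB", "NN", "JJ")]


def initial_tags(words, default="NN"):
    """Step 1: give every word the first tag in its lexicon entry."""
    return [LEXICON.get(w.lower(), [default])[0] for w in words]


def apply_rules(tags, rules):
    """Step 2: apply each transformation rule across the sentence, left to right."""
    tags = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


def tag(words):
    """Tag a list of words: lexicon lookup first, then rule-based correction."""
    return list(zip(words, apply_rules(initial_tags(words), RULES)))


print(tag(["rapid", "talk"]))  # talk becomes NN after the adjective
print(tag(["to", "talk"]))     # talk stays VB after "to"
```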

Perl NLP modules
- Lingua::EN::Tagger (a “trained” POS tagger)
- Lingua::EN::Fathom (readability measures)
- Lingua::EN::Sentence
Also: Lingua::DE, Lingua::FR, Lingua::ES, Lingua::Klingon
For Dutch: Frog

Lemmatisation
Word          POS   Lemma
I             PRP
made          VBD   make
my            PRP$
song          NN
a             DT
coat
covered             cover
with          IN
embroideries  NNS   embroidery
out
of
old           JJ
mythologies         mythology
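
The simplest form of lemmatisation is a table lookup. The sketch below uses only the word-lemma pairs from the example above and treats every other word as its own lemma. Real lemmatisers rely on much larger lexicons and usually on the POS tag as well, so this is an illustration of the idea rather than a usable tool; `LEMMAS` and `lemmatise` are hypothetical names.

```python
# Lemma table taken from the slide's example; words not listed are their own lemma.
LEMMAS = {
    "made": "make",
    "covered": "cover",
    "embroideries": "embroidery",
    "mythologies": "mythology",
}


def lemmatise(words):
    """Replace each word by its lemma if the table has one, else keep the word."""
    return [LEMMAS.get(w.lower(), w) for w in words]


line = "I made my song a coat covered with embroideries out of old mythologies"
print(lemmatise(line.split()))
```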