Digital Text and Data Processing

1 Digital Text and Data Processing
Week 6

2

3 Difficulty vs. enjoyment

4 Course evaluation
Final essay (ca. 4,000 words):
Report of your individual research project (50%)
Critical reflection on digital humanities research (50%)
Five “Coding Challenges”, which need to be marked as sufficient

5 Homework
Write a brief text (max. 500 words) about your individual research project. Answer the following questions:
Which texts have you selected for your corpus?
Which research question do you intend to answer?
Which types of analyses will be most useful for your research question?
The course syllabus mentions five possible topics. Also provide a brief description of your theoretical question if you want to focus on another topic.

6 Applications
TDM technologies have been used to study:
Literary genres (e.g. Sarah Allison et al., Quantitative Formalism: An Experiment)
Literary characters (e.g. Stephen Ramsay, Reading Machines: Toward an Algorithmic Criticism)
Date of creation (e.g. Richard Forsyth, “Stylochronometry with Substrings, or: A Poet Young and Old”)
Authorship

7 Applications
Themes (Martha Nell Smith et al., “‘Undiscovered Public Knowledge’: Mining for Patterns of Erotic Language in Emily Dickinson’s Correspondence with Susan Huntington (Gilbert) Dickinson”)
Lexical repetitions (T. E. Clement, “‘A Thing Not Beginning and Not Ending’”)
Rhyme and meter
Allusions (N. Coffee et al., “The Tesserae Project: Intertextual Analysis of Latin Poetry”)

8 Feature extraction Most frequent words Genres Type-token ratio
Grammatical categories Repetitions of words Sentiments Genres Texts with specific themes Literary characters Authorship
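
Two of the features listed on this slide, most frequent words and the type-token ratio, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names are hypothetical, and the tokenizer just lowercases the text and keeps runs of letters and apostrophes.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def most_frequent_words(tokens, n=3):
    """Return the n most common tokens together with their counts."""
    return Counter(tokens).most_common(n)


def type_token_ratio(tokens):
    """Number of distinct words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


sample = ("The Signora had no business to do it, said Miss Bartlett, "
          "no business at all.")
tokens = tokenize(sample)

print(most_frequent_words(tokens, 2))
print(round(type_token_ratio(tokens), 3))
```

Note that the type-token ratio depends strongly on text length, so it is usually only comparable across samples of similar size.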

9 Final three weeks
W6: Natural Language Processing, Semantic Tagging
W7: Complexity metrics (sentence length, syllables); Topic Modelling
W8: Course wrap up; Mapping geographic information
Which other analyses may be useful?

10

11 Natural Language Processing
Making computers understand languages spoken by human beings
Applications:
Part of Speech Tagging
Sentiment analysis
Information extraction
Machine translation
Summarising
Paraphrasing

12 Part of speech tagging
Providing the syntactic category of each word within a sentence:
“The Signora had no business to do it,” said Miss Bartlett, “no business at all.”
The/DT Signora/NNP had/VBD no/DT business/NN to/TO do/VB it/PRP said/VBD Miss/NNP Bartlett/NNP no/DT business/NN at/IN all/DT
Can be done in the SMILE Text Analyzer, among many other tools

13 Brill’s POS tagger
Combination of a lexicon-based and a rule-based approach
A lexicon entry looks as follows: Talk VB NN
Initial results are improved with transformation rules, e.g. VB NN PREVIOUSTAG JJ (change the tag VB to NN when the previous tag is JJ)
she could re-enter the world of rapid/JJ talk/VB, which was alone familiar to her
So she did want to talk/VB about her broken engagement
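
The two-step idea on this slide can be sketched as follows: first give every word its most likely tag from the lexicon, then apply transformation rules that correct tags in context. This is a toy illustration, not Brill's actual implementation. Only the lexicon entry for "talk" and the rule VB NN PREVIOUSTAG JJ come from the slide; the other lexicon entries, the default tag and all function names are assumptions.

```python
# Toy lexicon: each word lists its possible tags, most likely tag first.
# Only the "talk" entry is taken from the slide; the rest are assumed.
LEXICON = {
    "talk": ["VB", "NN"],
    "rapid": ["JJ"],
    "world": ["NN"],
    "the": ["DT"],
    "of": ["IN"],
    "to": ["TO"],
}

# Each rule is (from_tag, to_tag, tag required on the previous word).
RULES = [("VB", "NN", "JJ")]  # VB NN PREVIOUSTAG JJ


def initial_tags(words, default="NN"):
    """Step 1: assign each word its most likely tag from the lexicon."""
    return [(w, LEXICON.get(w.lower(), [default])[0]) for w in words]


def apply_rules(tagged, rules=RULES):
    """Step 2: rewrite a tag whenever a rule's context condition holds."""
    result = list(tagged)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(result)):
            word, tag = result[i]
            if tag == from_tag and result[i - 1][1] == prev_tag:
                result[i] = (word, to_tag)
    return result


def tag(words):
    """Run the lexicon lookup followed by the transformation rules."""
    return apply_rules(initial_tags(words))


print(tag("the world of rapid talk".split()))  # "talk" follows JJ: VB becomes NN
print(tag("to talk".split()))                  # "talk" follows TO: stays VB
```

In the first example sentence from the slide the rule fires and "talk" is retagged as a noun; in the second the context does not match, so the initial verb tag is kept.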

14 Perl NLP modules
Lingua::EN::Tagger (a “trained” POS Tagger)
Lingua::EN::Fathom (Readability measures)
Lingua::EN::Sentence
Also: Lingua::DE, Lingua::FR, Lingua::ES, Lingua::Klingon
For Dutch: Frog

15 Lemmatisation
Token          POS     Lemma
I              PRP
made           VBD     make
my             PRP$
song           NN
a              DT
coat
covered                cover
with           IN
embroideries   NNS     embroidery
out
of
old            JJ
mythologies            mythology
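
A dictionary lookup is the simplest way to illustrate the mapping shown on this slide. The sketch below is a toy example, not how a full lemmatiser works: the lookup table contains only the four word forms whose lemmas appear on the slide, and the function name is hypothetical. Real lemmatisers typically combine a large lexicon with POS information.

```python
# Inflected forms from the slide, mapped to their lemmas.
LEMMAS = {
    "made": "make",
    "covered": "cover",
    "embroideries": "embroidery",
    "mythologies": "mythology",
}


def lemmatise(tokens):
    """Pair each token with its lemma; unknown tokens are left unchanged."""
    return [(t, LEMMAS.get(t.lower(), t)) for t in tokens]


line = "I made my song a coat covered with embroideries out of old mythologies"
for token, lemma in lemmatise(line.split()):
    print(f"{token}\t{lemma}")
```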

