Dafydd Gibbon Universität Bielefeld Germany

Virtual Distances between Languages
Methods from Dialectometry and Stylometry in Digital Humanities
Dafydd Gibbon, Universität Bielefeld, Germany
Bielefeld-Abidjan Cooperation, 2015-11-16

Background
My work in teaching and in the field in West Africa has been mainly in Côte d’Ivoire and Nigeria.
Bielefeld 2015-11-16 Dafydd Gibbon: Distances between Languages

From fieldwork to Digital Humanities
Research goals:
  Analysis of oral literature in a Digital Humanities framework.
  Phonetic analysis; sound-gesture synchronisation in speech and music.
  Visualisation of differences between languages as ‘virtual distances’.

A personal view of the ancestry of Digital Humanities
Timeline: Humanities: Theology, Philology, Literary Studies, Linguistics, Law, ... 1950 Computation Computational Linguistics Speech Technologies Mathematical Computational Linguistics NLP, Computational Corpus Linguistics Humanities Computing Documentary Linguistics Digital Humanities ?

Background in Linguistics
Lexicostatistics: Morris Swadesh (Swadesh wordlist)
  create wordlists for each language (interviews, corpus)
  determine cognate words (related in form and meaning)
  determine pairwise similarities → triangular distance table
  create family tree using the table
  Problems: accidental similarity, borrowing, taboo, ...
Dialectometry: two major centres in Europe:
  Rijksuniversiteit Groningen
  Universität Salzburg
Similar methodology in Digital Humanities:
  Stylometry: authorship attribution, style typology
  MFW: most frequent lexical words, most frequent grammatical words
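
The lexicostatistic steps listed on this slide (wordlists, cognate judgements, pairwise similarities, triangular distance table) can be sketched roughly as follows. This is only an illustration under assumptions: the language names and cognate-class codes are invented placeholders, and the distance used here (share of non-cognate meanings) is one simple choice, not necessarily the measure used in the talk.

```python
from itertools import combinations

# Hypothetical cognate-class codes: for each language, one code per
# wordlist meaning; equal codes mean the words were judged cognate.
# Language names and codes are invented for illustration only.
cognate_classes = {
    "LangA": [1, 1, 1, 1, 1],
    "LangB": [1, 1, 2, 1, 1],
    "LangC": [1, 2, 2, 2, 1],
}


def cognate_distance(a, b):
    """Share of meanings whose words are NOT cognate (0.0 = identical)."""
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing / len(a)


def triangular_table(classes):
    """Distance for every unordered language pair (one triangle only)."""
    return {
        (p, q): cognate_distance(classes[p], classes[q])
        for p, q in combinations(sorted(classes), 2)
    }


table = triangular_table(cognate_classes)
for (p, q), d in table.items():
    print(f"{p} - {q}: {d:.2f}")
```

A table of this kind could then be passed to a clustering routine to build the family tree; that step is left out here.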

Stylometric visualisation – Maciej Eder, Kraków

Dialectometric visualisation – Dialectometry project, Salzburg

Language similarity visualisation – Gerhard Jäger, Tübingen

Language typology – visualisation of ‘virtual distances’
Project: comparison of Kru languages
Hypothesis: The geographical distances between the Kru languages are reflected in the differences of their consonant systems.
Method: Data mining with legacy language atlases
Results: Coming up
Here: Visualisation of relations induced from the legacy data for 19 Kru languages (of the 39 in the Ethnologue database of language metadata)

Côte d’Ivoire: Kru languages – consonants
Visualisation workflow:

Côte d’Ivoire: Kru languages

Côte d’Ivoire: Kru languages – consonants
A practical systematisation procedure for a machine-readable database:
  Step 1: A word processor or spreadsheet or DBMS table.
  Step 2: Export as CSV (character/comma/tab separated value) table.
  Step 3: Process manually or automatically: analyse and format as desired.
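
As a rough sketch of Steps 2 and 3, a CSV export can be read into a presence/absence structure with Python's standard `csv` module. The column layout, the `+`/`-` coding, the language names and all values below are assumptions made for illustration; they are not taken from the atlas data.

```python
import csv
import io

# Hypothetical CSV export: one row per language, one column per
# consonant, "+" if the consonant is in the inventory, "-" if not.
# Names and values are placeholders, not data from the atlases.
CSV_TEXT = """language,gw,ŋw,kp
LangA,+,-,+
LangB,+,+,+
LangC,-,-,+
"""


def load_inventories(text):
    """Parse the CSV text into {language: {consonant: bool}}."""
    reader = csv.DictReader(io.StringIO(text))
    inventories = {}
    for row in reader:
        name = row.pop("language")
        inventories[name] = {c: v.strip() == "+" for c, v in row.items()}
    return inventories


inventories = load_inventories(CSV_TEXT)
for name, features in inventories.items():
    present = [c for c, has in features.items() if has]
    print(name, "has", ", ".join(present))
```

With a real export, `io.StringIO(text)` would be replaced by a file opened with `newline=""` and an explicit encoding.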

Côte d’Ivoire: Kru languages – consonants
Then: 171 language comparisons to do: $(n^2 - n)/2 = (19^2 - 19)/2 = 171$
for 44 features each time: $171 \times 44 = 7524$ feature comparisons.
That’s a helluva lot. So in comes the software …
How is the comparison done?
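
The arithmetic on this slide can be checked in a few lines. The function name `pair_count` is just an illustrative choice; the figures 19 and 44 are the ones given on the slide.

```python
from itertools import combinations


def pair_count(n):
    """Number of unordered pairs among n objects: (n^2 - n) / 2."""
    return (n * n - n) // 2


n_languages = 19  # from the slide
n_features = 44   # from the slide

pairs = pair_count(n_languages)
# Cross-check the closed form by actually enumerating the pairs.
assert pairs == len(list(combinations(range(n_languages), 2)))

print(pairs)               # 171
print(pairs * n_features)  # 7524
```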

Côte d’Ivoire: Kru languages – consonants
Probably the best-known algorithm for pairwise comparison of sequences: Levenshtein Edit Distance
Definition: The minimum number of deletions, insertions and substitutions needed to change one sequence into another.
Other distance measures can be used. In stylometry, numerical distances are used (comparison of MFW, most frequent words, in texts).
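
The definition above corresponds to the standard dynamic-programming formulation of Levenshtein distance. A minimal sketch with unit costs for all three operations follows. The talk does not say which implementation or cost scheme its software uses, so this should be read as a textbook version only.

```python
def levenshtein(a, b):
    """Minimum number of deletions, insertions and substitutions
    needed to turn sequence a into sequence b (unit costs)."""
    # prev[j] holds the distance between the processed prefix of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i items of a into nothing: i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x by y (or keep)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Because the function only compares items for equality, it also accepts lists of phoneme symbols, so multi-character segments such as `gw` can be treated as single units.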

Côte d’Ivoire: Kru languages – consonants
But: the phoneme data matrix is deceptively 2-dimensional: 19 languages × 44 consonants.
The 19 objects are actually located in a 44-dimensional quality space.
Here are 2 of these dimensions, applied to the 4 languages Godie, Dida, Wobe and Neyo:
[Diagram: Godie, Dida, Wobe and Neyo arranged on two axes, +/− [gw] and +/− [ŋw]]
Even distinctive features involve around 12 dimensions.
How to visualise all 44 dimensions in 2 dimensions?

Côte d’Ivoire: Kru languages – consonants

Côte d’Ivoire: Manding languages – phonological rules

Côte d’Ivoire: Kru languages – consonants
Spread of differences between 19 Kru consonant inventories for 44 features, which we want to visualise.
Useful strategy: interpret and map differences as distances in quality space.
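
One simple way to read "differences as distances" is to treat each inventory as a vector of presence/absence values and count the positions in which two vectors differ (Hamming distance). The sketch below assumes that reading. The vectors and language names are invented, and the real matrix has 19 rows and 44 columns rather than 3 and 6.

```python
from itertools import combinations

# Hypothetical presence/absence vectors (1 = consonant in inventory).
# Invented placeholder data: 3 languages x 6 features.
vectors = {
    "LangA": [1, 0, 1, 1, 0, 1],
    "LangB": [1, 1, 1, 0, 0, 1],
    "LangC": [0, 0, 1, 1, 1, 1],
}


def hamming(u, v):
    """Number of features on which two inventories differ."""
    return sum(1 for x, y in zip(u, v) if x != y)


distances = {
    (p, q): hamming(vectors[p], vectors[q])
    for p, q in combinations(sorted(vectors), 2)
}

for (p, q), d in distances.items():
    print(f"{p} - {q}: {d}")
```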

Côte d’Ivoire: Kru languages – consonants
Strategy #1: Squash to 2 dimensions!
  Differences are interpreted as distances
  Distances are represented spatially as a distance map
  The dimensions are squashed – like a system of springs – into 2 dimensions
  Further dimensions may be represented by colours, etc.
Strategy #2: Select elite features!
  Check the features for their importance in distinguishing objects
  Randomly start with an important feature and build a hierarchy of features distinguishing between sets of objects until all are distinguished
  Different choices lead to different results, different insights

Dealing with high orders of dimensionality: apply Strategy #1 and Strategy #2, and visualise the results!

Strategy 1: Squash those dimensions!
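
The "system of springs" image can be sketched as a simple relaxation: every pair of objects is joined by a spring whose rest length is the target distance, and points in the plane are nudged until the springs are as relaxed as possible. The code below is an illustrative toy, not the algorithm behind the visualisations in the talk. The function name, step count, step size and the 3-4-5 target distances are all assumptions.

```python
import math
import random


def spring_layout(target, steps=2000, rate=0.05, seed=0):
    """Relax points in 2-D so pairwise distances approach target[i][j]."""
    n = len(target)
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Positive if the spring is stretched, negative if compressed.
                stretch = dist - target[i][j]
                # Move both ends along the spring to reduce the stretch.
                shift_x = rate * stretch * dx / dist
                shift_y = rate * stretch * dy / dist
                pos[i][0] += shift_x
                pos[i][1] += shift_y
                pos[j][0] -= shift_x
                pos[j][1] -= shift_y
    return pos


# Hypothetical target distances between three objects.
target = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
]
layout = spring_layout(target)
for i in range(len(target)):
    for j in range(i + 1, len(target)):
        d = math.hypot(layout[i][0] - layout[j][0], layout[i][1] - layout[j][1])
        print(f"{i}-{j}: target {target[i][j]:.1f}, layout {d:.2f}")
```

With more objects than the plane can accommodate exactly, the springs cannot all relax at once, and the remaining stretch is the distortion introduced by squashing.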

Strategy 2: Pick out the best features!
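
A feature hierarchy of the kind described for Strategy 2 can be sketched as a recursive split: pick a feature that divides the current set of languages, split on it, and repeat until every language stands alone. The sketch below makes two assumptions that are not in the slides: "importance" is measured by how evenly a feature splits the set, and ties are broken by feature order rather than randomly. The data and names are invented.

```python
def build_hierarchy(names, features, data):
    """Recursively split `names` by the most evenly dividing feature.

    Returns a language name (leaf), a sorted list of names that no
    feature can separate, or a tuple
    (feature, subtree_with_feature, subtree_without_feature).
    """
    if len(names) == 1:
        return names[0]

    def count_with(f):
        return sum(data[n][f] for n in names)

    # Only features that actually divide this set are candidates.
    splitting = [f for f in features if 0 < count_with(f) < len(names)]
    if not splitting:
        return sorted(names)

    # Smaller imbalance = more even split = more "important" here.
    best = min(splitting, key=lambda f: abs(2 * count_with(f) - len(names)))
    yes = [n for n in names if data[n][best]]
    no = [n for n in names if not data[n][best]]
    return (
        best,
        build_hierarchy(yes, features, data),
        build_hierarchy(no, features, data),
    )


# Hypothetical presence/absence data (1 = feature present).
data = {
    "LangA": {"f1": 1, "f2": 0, "f3": 1},
    "LangB": {"f1": 1, "f2": 1, "f3": 1},
    "LangC": {"f1": 0, "f2": 0, "f3": 1},
    "LangD": {"f1": 0, "f2": 1, "f3": 0},
}
tree = build_hierarchy(sorted(data), ["f1", "f2", "f3"], data)
print(tree)
```

Listing the features in a different order changes which of several equally good features is chosen first, which is one concrete way in which different choices lead to different hierarchies.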

Côte d’Ivoire: Kru languages – consonants
The moral of this story is that
  legacy data in linguistic atlases can be given a new lease of life and a solid quantitative foundation, in addition to any further research on dialect relations and history which may be pursued;
  standard arrangements of quantitative information (e.g. tables) may be useful;
  graphical visualisations are helpful in either suggesting or underlining lines of investigation.
Demo: DistGraph

Many thanks – and looking forward to further discussions and future cooperation!