Guillaume Segerer CNRS - LLACAN - France Niger-Congo Languages as a playground for lexical comparison LYON, May 12-14, 2008 New Directions.

Guillaume Segerer CNRS - LLACAN - France Niger-Congo Languages as a playground for lexical comparison LYON, May 12-14, 2008 New Directions in Historical Linguistics Paper presented May 14 Document revised May 16

Languages of Africa

Niger-Congo Languages

Niger-Congo clusters

The experiment
The present experiment consists of:
- testing the validity of the Niger-Congo phylum by measuring its homogeneity
- doing real mass comparison: 506 languages examined (~1/3 of all NC languages)
- using only a few lexical roots, chosen intuitively from empirical experience
It can be further refined by:
- considering more languages (more data is actually available)
- choosing different lexical roots (but how?)
- taking into account adjacent phyla (Nilo-Saharan, Afro-Asiatic, Khoisan)

Language Sample (506 lgs)

10 supposedly common NC lexical roots
1. TU(P): to spit
2. MED ~ MOD: to swallow
3. NYU: to drink
4. DUM: to bite
5. TE: tree
6. NYI(N): tooth
7. TU: ear
8. DEM: tongue
9. DI: to eat
10. TAT: three

Distribution of root 1

Distribution of root 2

Distribution of root 3

Distribution of root 4

Distribution of root 5

Distribution of root 6

Distribution of root 7

Distribution of root 8

Distribution of root 9

Distribution of root 10

Weighted sample

Probabilities 1
consonants
- labial: symbol P
- dental/coronal: symbol T
- palatal: symbol C
- velar/uvular: symbol K
vowels
- front: symbol I
- central: symbol A
- back: symbol U
example: DEM ‘tongue’ coded as TIP
- probability: 1/4 x 1/3 x 1/4 = 1/48
- tolerance I ~ A > new probability 1/24
Out of the 506 sample languages, 1/24 of them, i.e. about 21 languages, may by chance have a word for ‘tongue’ of the shape TIP ~ TAP
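The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration, assuming (as the slide's figures imply) that the four consonant classes are equally likely, the three vowel classes are equally likely, and segments are independent of each other. The function names `shape_probability` and `tolerant_probability` are hypothetical and do not come from the presentation.

```python
from fractions import Fraction

# Coding classes taken from the slide; uniform weights are an assumption.
CONSONANT_CLASSES = ("P", "T", "C", "K")
VOWEL_CLASSES = ("I", "A", "U")


def shape_probability(shape):
    """Chance probability of one coded shape such as 'TIP'."""
    p = Fraction(1)
    for symbol in shape:
        if symbol in CONSONANT_CLASSES:
            p *= Fraction(1, len(CONSONANT_CLASSES))
        elif symbol in VOWEL_CLASSES:
            p *= Fraction(1, len(VOWEL_CLASSES))
        else:
            raise ValueError(f"unknown class symbol: {symbol!r}")
    return p


def tolerant_probability(shapes):
    """Chance probability of matching any one of several distinct shapes."""
    return sum((shape_probability(s) for s in set(shapes)), Fraction(0))


# DEM 'tongue': TIP alone, then with the I ~ A tolerance.
p_strict = shape_probability("TIP")
p_tongue = tolerant_probability(["TIP", "TAP"])

# Expected number of chance matches in the 506-language sample.
expected = p_tongue * 506

print(p_strict, p_tongue, float(expected))
```

Exact fractions are used so that the results can be compared directly with the 1/48 and 1/24 shown on the slide.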

Probabilities 2
Probabilities for each of the 10 roots
- TAT - three: TAT ~ TIT > 1/24
- DUM - bite: TUP ~ TUT > 1/24
- DEM - tongue: TIP ~ TAP > 1/24
- MED ~ MOD - swallow: PIT ~ PUT > 1/24
- TU(P) - spit: TU ~ CU > 1/6
- TU - ear: TU ~ CU > 1/6
- NYI(N) - tooth: CI ~ TI > 1/6
- TE - tree: TI ~ TA ~ CI > 1/4
- DI - eat: TI ~ CA ~ CI > 1/4
- NYU - drink: CU ~ TU ~ KU > 1/3
probability to have all 10 items: 1/…
languages in the sample that have all 10 items: Akpafu (Kwa), Sukuma (Bantu F21), Runyankore (Bantu E13), Andoni (Benue-Congo)
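As a rough check on how small the joint probability is, the ten per-root values listed on this slide can simply be multiplied together. This sketch assumes the ten roots are statistically independent, which the slide does not state, and the result is not necessarily the figure given in the original slide. The dictionary name `LISTED_PROBABILITIES` is hypothetical.

```python
from fractions import Fraction
from math import prod

# Per-root chance probabilities exactly as listed on the slide.
LISTED_PROBABILITIES = {
    "TAT (three)": Fraction(1, 24),
    "DUM (bite)": Fraction(1, 24),
    "DEM (tongue)": Fraction(1, 24),
    "MED ~ MOD (swallow)": Fraction(1, 24),
    "TU(P) (spit)": Fraction(1, 6),
    "TU (ear)": Fraction(1, 6),
    "NYI(N) (tooth)": Fraction(1, 6),
    "TE (tree)": Fraction(1, 4),
    "DI (eat)": Fraction(1, 4),
    "NYU (drink)": Fraction(1, 3),
}

# Assumption: independence between roots, so the joint probability is the product.
p_all_ten = prod(LISTED_PROBABILITIES.values(), start=Fraction(1))

# Expected number of languages, out of 506, showing all ten shapes by chance.
expected_in_sample = float(p_all_ten * 506)

print(p_all_ten, expected_in_sample)
```

Under these assumptions the expected number of chance matches in the sample is far below one, which is the contrast the slide draws with the languages that actually show all ten items.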

Probabilities 3
- probability to have at least 1 item: 18% of 1565 lgs > 286 lgs
- probability to have at least 2 items: 19% ~ 27% of 286 lgs > 55 ~ 79 lgs
- probability to have at least 3 items: 29% ~ 37% of 55 ~ 79 lgs > 16 ~ 29 lgs

Some questions...
- Can this method be used to classify a language?
- What is the minimal number of items needed to identify a language cluster?
- Is there a method (other than intuitive) to identify these items?
- Can this technique be applied to any language family / cluster?
- What are the implications of these phenomena?
- ...

A restricted distribution **GOP : possible Atlantic lexical innovation