Published by Brenda Rosamund Jefferson; modified over 8 years ago
Advances in Automated Language Classification
ASJP Consortium (Dik Bakker)
ASJP: Automatic Reconstruction

Overview
Project (since May 2007): ASJP (Automated Similarity Judgment Program)
[Diagram: language numbers; data sources; databases; tools; results]

Members of ASJP: Sören Wichmann (Germany; Netherlands), Viveka Velupillai (Germany), André Müller (Germany), Robert Mailhammer (Germany), Hagen Jung (Germany), Eric Holman (US), Anthony Grant (UK), Dmitry Egorov (Russia), Pamela Brown (US), Cecil Brown (US), Dik Bakker (UK; Netherlands)

Overall goal: automatic reconstruction of language relationships
Basis: distance matrices between individual languages, computed from linguistic features
Method: lexicostatistics, i.e. mass comparison of basic lexical items, extended by all relevant data available
[Diagram: project architecture. Components: Swadesh lists (2440 languages); ASJP software (ASJP1, ASJP2); distance matrices; tree software (TREE SFTW); statistics software (STAT SFTW); calibration against Ethnologue (ETHN), WALS and expert classifications (EXPRT); geography (GEO GRAPH) and map software (MAP SFTW); historical facts (HIST FACTS); phonological inventories (PHON INVENT; Jeff Mielke, 500+); loanwords (LOANS)]
Overview
OVERALL GOAL: Reconstruction of language relationships
Derived goals:
- Critical assessment and refinement of existing classifications
- Classify newly described and unclassified languages
- Search for (ir)regularities in phylogenies
- Test hypotheses (e.g. Atkinson et al. 2008; ‘elbow’ phenomenon)
- Experimentally find the best/optimal dating method
- Automatically detect borrowings
Overview
1. The list of basic lexical items
2. Comparing words & languages
3. Some results: genetic proximity
4. On inheritance vs borrowing
5. Imminent extensions
1. The list of basic lexical items
Lexical items
Word list: Swadesh 100 basic meanings, selected as:
- having a word in most languages
- collected in fieldwork (lexicon / grammar)
- inherited rather than borrowed
- culturally independent
- stable over time
- having few synonyms
? LWT
Swadesh 100-item list:
 1. I         21. dog       41. nose      61. die        81. smoke
 2. you       22. louse     42. mouth     62. kill       82. fire
 3. we        23. tree      43. tooth     63. swim       83. ash
 4. this      24. seed      44. tongue    64. fly        84. burn
 5. that      25. leaf      45. claw      65. walk       85. path
 6. who       26. root      46. foot      66. come       86. mountain
 7. what      27. bark      47. knee      67. lie        87. red
 8. not       28. skin      48. hand      68. sit        88. green
 9. all       29. flesh     49. belly     69. stand      89. yellow
10. many      30. blood     50. neck      70. give       90. white
11. one       31. bone      51. breasts   71. say        91. black
12. two       32. grease    52. heart     72. sun        92. night
13. big       33. egg       53. liver     73. moon       93. hot
14. long      34. horn      54. drink     74. star       94. cold
15. small     35. tail      55. eat       75. water      95. full
16. woman     36. feather   56. bite      76. rain       96. new
17. man       37. hair      57. see       77. stone      97. good
18. person    38. head      58. hear      78. sand       98. round
19. fish      39. ear       59. know      79. earth      99. dry
20. bird      40. eye       60. sleep     80. cloud     100. name
Annotation: Otomi from Spanish
Lexical items: further reduction
Early analyses have shown:
- The 40 most stable of the 100 items give the same results, with
  - less work
  - less missing data
  - faster processing; combinatorial explosion: 40 vs 100 items means roughly $10^9$ vs $10^{10}$ comparisons
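The order-of-magnitude claim can be checked with a little arithmetic. This sketch assumes that, for every language pair, each word of one list is compared with each word of the other (as the LDND background correction described later requires), so one language pair costs n_items² word comparisons. That cost model is our assumption, not something stated on the slide; `math.comb` needs Python 3.8+.

```python
from math import comb


def language_pairs(n_languages: int) -> int:
    """Unordered pairs of languages: n * (n - 1) / 2."""
    return comb(n_languages, 2)


def word_comparisons(n_languages: int, n_items: int) -> int:
    """Word-level comparisons if, for every language pair, each of the n_items
    words of one list is compared with each of the n_items words of the other."""
    return language_pairs(n_languages) * n_items ** 2


for items in (40, 100):
    print(f"{items} items: {word_comparisons(2440, items):.1e} comparisons")
```

Under this model the 100-item list costs 100²/40² = 6.25 times as many comparisons as the 40-item list, whatever the number of languages.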
Lexical items: further reduction
Most stable: $S_{SM} = \frac{R - U}{1 - U}$ (* see references)
R = mean proportion of ‘same form’ for meaning $SM_i$ within a genus
U = mean proportion of ‘same form’ for different meanings $SM_x$ within a genus
N.B. $S_{SM}$ values correlate highly between families
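A minimal sketch of the stability index, assuming one word list per language in a genus. The `same_form` predicate is hypothetical: the slides do not say how ‘same form’ is judged, so plain identity stands in for it here. We also read U as pooled over all pairs of words with different meanings, which is one plausible reading of the definition above.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, List

Lexicon = Dict[str, str]  # meaning -> word form (e.g. in ASJPcode)


def stability(r: float, u: float) -> float:
    """S = (R - U) / (1 - U): agreement above the chance level U, rescaled so
    that chance agreement gives 0 and perfect agreement gives 1."""
    return (r - u) / (1 - u)


def genus_stability(genus: List[Lexicon], meaning: str,
                    same_form: Callable[[str, str], bool]) -> float:
    """Stability of one meaning within one genus (one word list per language)."""
    pairs = list(combinations(genus, 2))
    # R: share of language pairs that have the 'same form' for this meaning
    r = mean(float(same_form(a[meaning], b[meaning]))
             for a, b in pairs if meaning in a and meaning in b)
    # U: share of 'same form' judgments between words for *different* meanings
    u = mean(float(same_form(a[m1], b[m2]))
             for a, b in pairs for m1 in a for m2 in b if m1 != m2)
    return stability(r, u)


# Toy genus of three languages; identity stands in for 'same form'.
genus = [{"EYE": "ko", "FIRE": "pa"},
         {"EYE": "ko", "FIRE": "ti"},
         {"EYE": "ko", "FIRE": "pa"}]
print(genus_stability(genus, "EYE", lambda x, y: x == y))  # 1.0
```

Subtracting U matters because short, phonologically simple words can coincide by chance; a meaning only counts as stable if its words agree more often than unrelated words do.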
[Chart: Ethnologue (Goodman-Kruskal); WALS (Pearson); scale from ++ to --]
[Slide: the 100 Swadesh meanings, with the 40 most stable highlighted]
Lexical items: transcription
First phase of the project (2007): problems with a full IPA representation of words:
- data entry via keyboard
- simple programming languages (Fortran; Pascal)
Solution: recoding to a simplified ASJPcode (ASCII only)
Lexical items: transcription
ASJPcode:
- 7 vowels
- 34 consonants
- ‘closest sound’
- operators for nasalization, labialization, palatalization, aspiration and glottalization
Abaza (Caucasian):
Meaning   IPA            ASJPcode
PERSON    ʕʷɨʧʼʲʷʕʷɨs    Xw~3Cw"yXw~3s
LEAF      bɣʲɨ           bxy~3
SKIN      ʧʷazʲ          Cw~azy~
HORN      ʧʼʷɨʕʷa        Cw"~3Xw~a
NOSE      pɨnʦʼa         p3nc"a
TOOTH     pɨʦ            p3c
Lexical items
Collected to date:
- close to 2500 languages (incl. dialects and proto-languages)
- mean number of items per language: 35.8 (out of 40)
Areal distribution (not a sample!):
Americas 27%, Eurasia 23%, Australia/PNG 18%, Austronesia 15%, Africa 14%, Creoles 2%, Artificial 1%
[Map: languages currently sampled]
2. Comparing words and languages
Comparing words
Two strategies:
1. ASJP context rules
a. Between 2 words: for meaning $SM_i$, $WORD_{lg1}$ == $WORD_{lg2}$ if the pattern of $W_{lg1}$ unifies with the pattern of $W_{lg2}$
   (C/V = any consonant/vowel; c/v = a specific consonant/vowel; X = any string; (…) = optional; # = word boundary)
   R1   #(V)cVcX#       #XcVcX#
   R2   #Xc(V)c(V)cX#   #Xc(V)c(V)cX#
   …
   R12  #AVcvX#         #VcvX#        (A = h, w, y)
   R13  #(V)ccVX#       #(V)ccVX#
   …
   R22  #cv#            #(CV)cv#
   Example: #yapi# vs #opi# (cf. R12, with A = y)
   Value: 0 or 1
b. Between 2 languages:
   RELATEDNESS = (matching words / total pairs) × 100
   DISTANCE: LSP = 100 - (matching words / total pairs) × 100
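Once the context rules have produced a 0/1 value per meaning, the language-level scores are simple percentages. The sketch below takes those 0/1 judgments as input (it does not implement the rules themselves). The example uses the five AGUACATEC/MOCHO pairs shown in the comparison further on, reading "-" there as "no rule matched": rules fire for TWO (R1), BONE (R3) and WATER (R10) only.

```python
from typing import Sequence


def relatedness(matches: Sequence[bool]) -> float:
    """Percentage of shared meanings whose two words are judged 'the same'
    (1) rather than 'different' (0) by the context rules."""
    if not matches:
        raise ValueError("no meanings attested in both languages")
    return 100 * sum(matches) / len(matches)


def lsp_distance(matches: Sequence[bool]) -> float:
    """DISTANCE LSP = 100 - relatedness."""
    return 100 - relatedness(matches)


# ONE, TWO, BONE, EAR, WATER for AGUACATEC vs MOCHO
matches = [False, True, True, False, True]
print(relatedness(matches), lsp_distance(matches))  # 60.0 40.0
```

These five items are only an excerpt, so the result differs from the LSP computed over all 35 shared items on that slide.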
Comparing words
2. Levenshtein distance (LD)
a. Between 2 words: the number of edit operations (substitutions, insertions, deletions) needed to turn one form into the other; min = 0, max = length of the longer word
b. Between 2 languages: mean LD over all word pairs with the same meaning
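A compact dynamic-programming sketch of plain Levenshtein distance and its language-level mean. Standard LD counts deletions as well as changes and additions, so the result is symmetric. The sample words are five of the AGUACATEC/MOCHO pairs listed in the comparison further on. The dictionary layout (meaning to ASJPcode word) is our assumption.

```python
from statistics import mean
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))             # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                              # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def mean_ld(lang1: Dict[str, str], lang2: Dict[str, str]) -> float:
    """Mean LD over all meanings attested in both word lists."""
    shared = lang1.keys() & lang2.keys()
    return mean(levenshtein(lang1[m], lang2[m]) for m in shared)


agu = {"ONE": "xun", "TWO": "kob", "BONE": "baq", "EAR": "SCin", "WATER": "a7"}
mhc = {"ONE": "hun", "TWO": "kabe7", "BONE": "baq", "EAR": "Cikin", "WATER": "ha7"}
print(mean_ld(agu, mhc))  # 1.6
```

The raw value grows with word length (kob/kabe7 scores 3, xun/hun only 1), which is the first problem the next slide addresses.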
Comparing words
Two problems with simple LD:
1. The value depends on the length of the longer word.
   Normalize: $\mathrm{LDN} = \mathrm{LD} / L_{\max}$
2. Languages differ in phonological overlap, so even unrelated words can look alike by chance.
   Eliminate the ‘background noise’: $\mathrm{LDND} = \mathrm{LDN} / \overline{\mathrm{LDN}}_{\mathrm{diff}}$, where $\overline{\mathrm{LDN}}_{\mathrm{diff}}$ is the mean LDN of word pairs with different meanings
Comparing words: Levenshtein distance
a. Between 2 words: LDND ranges from 0 to 100 (and can exceed 100)
b. Between 2 languages: mean of the LDNDs of all words in common
Comparing languages
AGUACATEC (agu) <> MOCHO (mhc): MAYAN (45) > MAYAN [GeoD = 97; GenD = 1.86]
ONE     xun = hun       -     LDND = 37.4
TWO     kob = kabe7     R1    LDND = 67.3
BONE    baq = baq       R3    LDND = 0.0
EAR     SCin = Cikin    -     LDND = 67.3
WATER   a7 = ha7        R10   LDND = 37.4
TOTAL   LSP = 58.14     LDND = 51.68 (n = 35)
High correlation between LSP and LDND
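A sketch combining LDN and LDND under our reading of the definitions: the background is the mean LDN over all pairs of words with different meanings in the two lists, and the language-level value is the mean of the per-word LDNDs, scaled by 100. It is not the ASJP program's code. It uses the same five AGUACATEC/MOCHO pairs as in the comparison above.

```python
from itertools import product
from statistics import mean
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LD normalized by the length of the longer word (0 = identical, 1 = maximal)."""
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd(lang1: Dict[str, str], lang2: Dict[str, str]) -> float:
    """Mean per-word LDND (x 100) over the meanings shared by two word lists."""
    shared = sorted(lang1.keys() & lang2.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared meanings")
    # 'Background noise': mean LDN of word pairs with *different* meanings
    background = mean(ldn(lang1[m1], lang2[m2])
                      for m1, m2 in product(shared, repeat=2) if m1 != m2)
    return mean(100 * ldn(lang1[m], lang2[m]) / background for m in shared)


agu = {"ONE": "xun", "TWO": "kob", "BONE": "baq", "EAR": "SCin", "WATER": "a7"}
mhc = {"ONE": "hun", "TWO": "kabe7", "BONE": "baq", "EAR": "Cikin", "WATER": "ha7"}
print(round(ldnd(agu, mhc), 1))
```

As a sanity check on this reading: the slide's per-word values (37.4 for ONE and WATER, where LDN = 1/3; 67.3 for TWO and EAR, where LDN = 3/5) both imply the same background of about 0.89, as a single shared denominator should.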
Comparing languages
High correlation LSP ~ LDND:
MAYA (n = 34)              0.93**
INDO-EUROPEAN (n = 129)    0.97**
AMERINDIAN (n = 511)       0.59**
Comparing languages
BEST PERFORMERS
Within families            Across families
1. EYE         0.496       1. I        0.072
2. LOUSE       0.480       2. DIE      0.065
3. DIE         0.469       3. WE       0.061
4. BREAST      0.415       4. YOU      0.057
5. STONE       0.364       5. BREAST   0.057
- Shortness
- Sound symbolism?
Comparing languages
WORST PERFORMERS
Within families            Across families
36. HORN       0.107       36. NIGHT   0.028
37. SEE        0.099       37. HEAR    0.027
38. KNEE       0.095       38. HORN    0.027
39. NIGHT      0.079       39. STAR    0.024
40. MOUNTAIN   0.075       40. KNEE    0.023
117
ASJP: Automatic Reconstruction117 LANG1LANG2FAM1FAM2 LSP LDND AGUACATECCHICOMUCELTECMAYAN 96.5594.75 AGUACATECCHOL_TILAMAYAN 86.1180.10 AGUACATECCHONTAL_TABASCOMAYAN 90.0083.97 AGUACATECIXIL_CHAJULMAYAN 47.5049.25 AGUACATECKAQCHIKEL_NORTHERNMAYAN 74.3664.40 AGUACATECMAYA_YUCATANMAYAN 78.9576.15 AGUACATECMOCHOMAYAN 54.2951.68 AGUACATECQANJOBAL_EASTERNMAYAN 45.0050.59 AGUACATECRABINAL_ACHIMAYAN 70.0059.03 AGUACATECSAKAPULTEKOMAYAN 70.0061.83 AGUACATECSIPAKAPENSEMAYAN 66.6754.97 AGUACATECTEKTITEKOMAYAN 52.5057.24 AGUACATECTZELTAL_OXCHUCMAYAN 86.8472.93 AGUACATECTZOTZIL_SAN_ANDRESMAYAN 92.5079.64 for 2440 lgs: ~ 3,000,000 ( * 36 2 ~ ± 3.10 9 )
3. Genetic proximity
[Diagram: Swadesh lists (2440 languages) → ASJP2 → distance matrices → SplitsTree; MEGA4 (Neighbour Joining)]
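Neighbour joining is the tree-building method named above. MEGA4 and SplitsTree have their own implementations. The sketch below is a plain textbook version of the standard Saitou-Nei algorithm, not the code of either program. It takes a labelled symmetric distance matrix (e.g. LDND values) and returns a Newick string. Ties are broken by taking the first pair found, and the last edge is written on one side only.

```python
from typing import Dict, List, Optional, Tuple


def neighbour_joining(labels: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted tree from a symmetric distance matrix; return Newick."""
    d: Dict[str, Dict[str, float]] = {
        a: {b: dist[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)
    }
    nodes = list(labels)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # pick the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b)
        best: Optional[Tuple[float, str, str]] = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # branch lengths from a and b to their new parent node
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        parent = f"({a}:{la:.3f},{b}:{lb:.3f})"
        d[parent] = {parent: 0.0}
        for c in nodes:
            if c not in (a, b):  # distance from the new node to every other node
                d[parent][c] = d[c][parent] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [parent]
    a, b = nodes
    return f"({a}:{d[a][b]:.3f},{b}:0.000);"


labels = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
print(neighbour_joining(labels, dist))
```

The five-taxon matrix is a standard textbook example, not ASJP data. With real data the labels would be language names and the matrix would hold pairwise LSP or LDND distances.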
[Tree: ASJP, LSP distances (n = 34)]
Correlation with Ethnologue (ETHN): .325** (n = 69)
- More structure than ETHN
- Separation
[Tree: LDND (Levenshtein)]
Correlation with ETHN: .195** (LSP: .325)
[Tree: ASJP, LDND; subgroups marked: Cholan, Tzeltalan, Yucatecan]
Correlation with Ethnologue, by family:
Family              N lgs   LSP    LDND
Altaic                 30   .723   .688
Maya                   34   .325   .195
Afro-Asiatic          128   .147   .172
Trans-New Guinea      148   .294   .325
Niger-Congo           379   .089   .125
** all significant at the 0.01 level
Improving the fit
Enrich lexical data with typological data:
[Diagram: Swadesh (2440) + WALS (2580) → SWALSH; ASJP, distance matrices, tree software; SWALSH (2440), later SWALSH (550)]
- NOT 1:1 with ASJP languages
- WALS variables very unevenly spread
- Maximum subset: 85 most stable
Most stable WALS variables
WALS  Variable description                           Stability within genus
31    Sex-based and Non-sex-based Gender Systems     0.81
118   Predicative Adjectives                         0.74
30    Number of Genders                              0.73
119   Nominal and Locational Predication             0.71
29    Syncretism in Verbal Person/Number Marking     0.71
Improving the fit
Enrich lexical data with typological data:
- Maximum subset: 85 most stable WALS features
- Correlation with Swadesh distances: 0.063 (> 0.001) ?
- Mantel test, 10,000 simulations: from +0.050 to -0.043 (mean 0.009)
Improving the fit: enrich lexical with typological data
- Database: 40 most stable Swadesh items + 85 most stable WALS features
- Optimal weight of both?
[Figures: improving the fit]
4. On inheritance vs borrowing
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
I      dun = zun        *  LDND = 36.6
YOU    mun = wun        *  LDND = 36.6
HORN   tLar = k"arC     *  LDND = 66.0
FIRE   c"a = c"a        *  LDND = 0.0
FULL   c"ura = ac"uf    *  LDND = 66.0
NEW    c"iya = c"EyEr   *  LDND = 55.0
6 items < 70.0: genetically related!!
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE      uno = unu           *  LDND = 36.9
TWO      dos = dos           *  LDND = 0.0
PERSON   persona = petsona   *  LDND = 15.8
STAR     estreya = estrecas  *  LDND = 27.6
NIGHT    noCe = noces        *  LDND = 44.2
NEW      nuevo = nueba       *  LDND = 44.2
6 items < 70.0
NOT related: chance? Or borrowing?
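The "6 items < 70.0" test is a simple threshold count. The sketch below uses only the LDND values displayed for the two pairs above (the full word lists may contain further similar items). It shows that the count comes out the same for the related Avar/Agul pair and the unrelated Spanish/Chamorro pair, which is why further evidence is needed to separate inheritance from borrowing.

```python
from typing import Dict, List

# LDND values of the word pairs shown on the two slides above
AVAR_AGUL = {"I": 36.6, "YOU": 36.6, "HORN": 66.0,
             "FIRE": 0.0, "FULL": 66.0, "NEW": 55.0}
SPANISH_CHAMORRO = {"ONE": 36.9, "TWO": 0.0, "PERSON": 15.8,
                    "STAR": 27.6, "NIGHT": 44.2, "NEW": 44.2}


def similar_items(ldnd_by_meaning: Dict[str, float],
                  threshold: float = 70.0) -> List[str]:
    """Meanings whose word pair lies below the LDND threshold."""
    return sorted(m for m, v in ldnd_by_meaning.items() if v < threshold)


for name, table in [("Avar/Agul", AVAR_AGUL), ("Spanish/Chamorro", SPANISH_CHAMORRO)]:
    print(name, len(similar_items(table)))  # 6 in both cases
```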
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR: estreya = estrecas * LDND = 27.6
SPA > CHA: f/g = 0.17/0.82 > 0.00/0.00 (f/g = % < 0.70)
SPA > CHA: wwF = 83-99 > 102-85 (83 = mean LDND of estreya in IE; 99 = mean LDND of estreya in AU; 102 = mean LDND of estrecas in AU; 85 = estrecas in IE)
SPA > CHA: phwF = 100.00 > 0.52 (phonological fit of estreya in IE / AU; of estrecas in AU / IE)
SYN: CHA = puti7on (f: 1.00)
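The annotations suggest two family-level statistics. f/g would be the share of languages in the family (f) or genus (g) whose word for the same meaning lies below the 0.70 cut-off. wwF would be a word's mean distance to the corresponding words of a family. The sketch below is one possible reading under those assumptions, not the project's code. It applies the cut-off to plain LDN rather than LDND, the function names are hypothetical, and the comparison list is a toy input, not ASJP data.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """Levenshtein distance normalized by the longer word's length."""
    return levenshtein(a, b) / max(len(a), len(b))


def share_similar(word: str, group: list[str], cutoff: float = 0.70) -> float:
    """Hypothetical f/g: fraction of group words closer to `word` than the cut-off."""
    return sum(ldn(word, other) < cutoff for other in group) / len(group)


def mean_distance(word: str, group: list[str]) -> float:
    """Hypothetical wwF: mean normalized distance of `word` to a group of words."""
    return sum(ldn(word, other) for other in group) / len(group)


# Toy comparison list for STAR (illustrative only, not a real family sample)
toy_group = ["estreya", "estrela", "hoshi", "bintang"]
print(share_similar("estreya", toy_group))  # 0.5
print(mean_distance("estreya", toy_group))
```

On this reading, a low f for the Chamorro form (0.00 on the slide) means it has no look-alikes in its own family, which is what one expects of a loan.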
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
ONE: uno = unu * LDND = 36.9
  SPA > CHA: f/g = 0.24/0.82 > 0.03/0.00; wwF = 97-106 > 110-97; phwF = 12.00 > 0.44
TWO: dos = dos * LDND = 0.0
  SPA > CHA: f/g = 0.62/1.00 > 0.12/0.00; wwF = 78-99 > 102-78; phwF = 100.00 > 0.22
NIGHT: noCe = noces * LDND = 44.2
  SPA > CHA: f/g = 0.23/0.55 > 0.04/0.00; wwF = 89-100 > 105-92; phwF = 100.00 > 0.10
NEW: nuevo = nueba * LDND = 44.2
  SPA > CHA: f/g = 0.50/0.64 > 0.04/0.00; wwF = 68-104 > 105-80; phwF = 4.27 > 0.03
PERSON: persona = petsona * LDND = 15.8
  SPA > CHA: f/g = 0.20/0.64 > 0.01/0.00; wwF = 89-98 > 98-90; phwF = 32.40 > 0.13
  SYN: CHA = taotao (f: 1.00)
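Across all six items, the three indicators point the same way. This minimal sketch shows one way to combine them into a direction decision. It is a hypothetical majority vote, not the ASJP program's actual rule. The field readings are assumptions taken from the annotations on the STAR slide: the first value of f/g is the share of the own family, wwF gives own-family then other-family mean LDND, and phwF is an own/other ratio. The numbers are copied from the slides.

```python
from dataclasses import dataclass


@dataclass
class WordFit:
    """Per-word statistics as read off the slides (interpretation assumed)."""
    f: float      # first value of f/g: share of own family with similar forms
    own: float    # wwF: mean LDND to the word's own family
    other: float  # wwF: mean LDND to the other language's family
    phw: float    # phwF: phonological fit, own family / other family


def vote(a: WordFit, b: WordFit) -> int:
    """Hypothetical score: positive points to a as donor, negative to b."""
    score = (a.f > b.f) - (a.f < b.f)
    # A donor word fits its own family; a loan fits the foreign family better.
    if a.own < a.other and b.own > b.other:
        score += 1
    elif b.own < b.other and a.own > a.other:
        score -= 1
    # Ratio > 1: phonologically at home in its own family; < 1: foreign-looking.
    if a.phw > 1 > b.phw:
        score += 1
    elif b.phw > 1 > a.phw:
        score -= 1
    return score


def direction(a: WordFit, b: WordFit, name_a: str, name_b: str) -> str:
    s = vote(a, b)
    if s > 0:
        return f"{name_a} > {name_b}"
    if s < 0:
        return f"{name_b} > {name_a}"
    return "undecided"


# (SPA, CHA) statistics copied from the slides
ITEMS = {
    "STAR": (WordFit(0.17, 83, 99, 100.00), WordFit(0.00, 102, 85, 0.52)),
    "ONE": (WordFit(0.24, 97, 106, 12.00), WordFit(0.03, 110, 97, 0.44)),
    "TWO": (WordFit(0.62, 78, 99, 100.00), WordFit(0.12, 102, 78, 0.22)),
    "NIGHT": (WordFit(0.23, 89, 100, 100.00), WordFit(0.04, 105, 92, 0.10)),
    "NEW": (WordFit(0.50, 68, 104, 4.27), WordFit(0.04, 105, 80, 0.03)),
    "PERSON": (WordFit(0.20, 89, 98, 32.40), WordFit(0.01, 98, 90, 0.13)),
}

for meaning, (spa, cha) in ITEMS.items():
    print(meaning, direction(spa, cha, "SPA", "CHA"))
```

Every item scores 3 of 3 in favour of SPA > CHA, consistent with the ">" shown on each slide.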
Inherited or borrowed? Further output filters:
1. Minimum N potential borrowings
2. All in the same direction
3. Geographic information
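The first three filters can be sketched as a function over the per-item candidates for one language pair. The slides do not give the actual threshold for filter 1 or say how filter 3 is computed. The Spain/Guam case shows that geography has to be weighed against historical facts, not just distance. So `min_n` and `geo_plausible` below are hypothetical stand-ins, and `apply_filters` and `Candidate` are illustrative names. The example feeds in the six Chamorro forms that the "Borrowed!" output lists, with 40 shared items.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    meaning: str    # e.g. "STAR"
    form: str       # form in the recipient language, e.g. "estrecas"
    direction: str  # e.g. "spa>cha"


def apply_filters(cands: list[Candidate], n_shared: int, min_n: int = 3,
                  geo_plausible: bool = True):
    """Return (direction, count, percent of shared items), or None if rejected.

    min_n and geo_plausible are hypothetical parameters standing in for
    filters 1 and 3; the slides do not give their actual settings.
    """
    if len(cands) < min_n:                  # filter 1: minimum N
        return None
    directions = {c.direction for c in cands}
    if len(directions) != 1:                # filter 2: all in the same direction
        return None
    if not geo_plausible:                   # filter 3: geographic information
        return None
    return directions.pop(), len(cands), 100 * len(cands) / n_shared


forms = {"ONE": "unu", "TWO": "dos", "PERSON": "petsona",
         "STAR": "estrecas", "NIGHT": "noces", "NEW": "nueba"}
cands = [Candidate(m, f, "spa>cha") for m, f in forms.items()]
print(apply_filters(cands, n_shared=40))  # ('spa>cha', 6, 15.0)
```

The returned count and percentage correspond to "6 (=15.0%)" in the "Borrowed!" output.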
SPANISH (spa) INDO-EUROPEAN (128) > ROMANCE (12), EURASIA, SPAIN
vs.
CHAMORRO (cha) AUSTRONESIAN (678) > CHAMORRO, OCEANIA, GUAM
[GEODIST = 13244; GENDIST = 3.00]
[Diagram with components: Swadesh (2440), distance matrices, ETHN, WALS, EXPRT, ASJP1, ASJP2, TREE SFTW, STAT SFTW, GEO GRAPH MAP SFTW, HIST FACTS]
HIST FACTS: 'Spaniards in the Pacific since the 16th century'
Inherited or borrowed? Further output filters (continued):
4. Role of form and meaning (?)
LWT
Borrowed!
BOR = spa TO cha: 6 (= 15.0%)
LDND = 76.63 (shared = 40; crit = 70.00 - U)
DATABASE: unu (*spa), dos (*spa), petsona (*spa), estrecas (*spa), noces (*spa), nueba (*spa)
5. Imminent extensions
GARBAGE IN, GARBAGE OUT
Lexical items: transcription
Second year of project (2008-9): replace ASJP code by full IPA representations
Juliette Jeff
Lexical items: transcription
Second year of project (2008-9): problems with full IPA representation solved:
1. scan/download/… full IPA representations
2. automatic conversion IPA to integer (Python)
3. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar
Lexical items: transcription
Abaza (Caucasian):
Meaning: PERSON
IPA: ʕʷɨʧʼʲʷʕʷɨs
Decimal: 661,695,616,679,700,690,695,661,695,616,115
ASJPcode: 88,119,126,51,67,34,121,119,126,88,119,126,51,115 (= Xw~3C"yw~Xw~3s)
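The IPA-to-integer step amounts to taking each character's Unicode code point, which is exactly what the Decimal line lists. The recoding step then maps these to ASJPcode. The sketch below reproduces both lines for the Abaza example. The mapping tables and the rule for placing `~` are hypothetical: `~` is appended after a base symbol whose modifiers include a w/y secondary articulation. They were chosen only to reproduce this one word and are not the project's transduction grammar.

```python
# Abaza PERSON from the slide, written with escapes so the code points are unambiguous
ABAZA_PERSON = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"

# Hypothetical base-symbol table, covering only the symbols in this word
BASE = {
    "\u0295": "X",  # ʕ voiced pharyngeal fricative
    "\u0268": "3",  # ɨ close central unrounded vowel
    "\u02a7": "C",  # ʧ voiceless postalveolar affricate
    "s": "s",
}
SECONDARY = {"\u02b7": "w", "\u02b2": "y"}  # labialization, palatalization
EJECTIVE = {"\u02bc": '"'}                  # ejective / glottalization mark


def to_decimal(ipa: str) -> list[int]:
    """IPA string -> list of Unicode code points (the 'Decimal' line)."""
    return [ord(ch) for ch in ipa]


def to_asjp(ipa: str) -> str:
    """Toy transducer: base symbol, its modifiers, then '~' if any w/y was added."""
    out: list[str] = []
    i = 0
    while i < len(ipa):
        ch = ipa[i]
        if ch not in BASE:
            raise ValueError(f"no mapping for {ch!r} (U+{ord(ch):04X})")
        out.append(BASE[ch])
        i += 1
        secondary = False
        while i < len(ipa) and (ipa[i] in SECONDARY or ipa[i] in EJECTIVE):
            if ipa[i] in SECONDARY:
                out.append(SECONDARY[ipa[i]])
                secondary = True
            else:
                out.append(EJECTIVE[ipa[i]])
            i += 1
        if secondary:
            out.append("~")
    return "".join(out)


print(to_decimal(ABAZA_PERSON))  # [661, 695, 616, 679, 700, 690, 695, 661, 695, 616, 115]
print(to_asjp(ABAZA_PERSON))     # Xw~3C"yw~Xw~3s
```

Raising an error on unmapped symbols makes gaps in the table visible instead of silently dropping characters. That matters when scanned or downloaded IPA contains unexpected code points.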
Lexical items: transcription
Second year of project (2008-9):
1. automatic conversion IPA to integer (Python)
2. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar
Why not run on full IPA?
- Caucasian: correlations IPA ~ ASJP > 0.9
- but ASJP gives a better fit with the classifications: IPA is too specific
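A figure like "correlations IPA ~ ASJP > 0.9" compares two distance matrices computed from the same languages under two transcriptions. The slides do not say how the correlation was computed. A common choice, assumed here, is the Pearson correlation over the upper triangles of the two symmetric matrices. Matrix entries are not independent, so significance is normally assessed with a Mantel permutation test rather than the usual parametric test; that step is omitted. The matrices below are toy values and the function names are hypothetical.

```python
import math


def upper_triangle(m: list[list[float]]) -> list[float]:
    """Entries above the diagonal of a symmetric distance matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def matrix_correlation(a: list[list[float]], b: list[list[float]]) -> float:
    """Correlate two distance matrices over their language pairs."""
    return pearson(upper_triangle(a), upper_triangle(b))


# Toy LDND matrices for three languages from two transcriptions (illustrative only)
ipa_dist = [[0, 30, 80], [30, 0, 90], [80, 90, 0]]
asjp_dist = [[0, 28, 85], [28, 0, 88], [85, 88, 0]]
print(round(matrix_correlation(ipa_dist, asjp_dist), 3))
```

A high correlation only shows that the two transcriptions rank language pairs similarly. The slide's point that ASJP fits the classifications better is a separate comparison against an external classification.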
Lexical items: transcription
IPA: ʕʷɨʧʼʲʷʕʷɨs
Decimal: 661,695,616,679,700,690,695,661,695,616,115
ASJP++ code (= any Unicode subset), derived by a formal grammar:
'a' <- 661, 895, 416, …
…
C [-V] <- C [+V] / _ #
C [+V] <- C [-V, +PL] / _ C [+V]
Optimal level of abstraction for historical phonological reconstruction?
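The two sample rules read like context-sensitive rewrite rules of the form output <- input / environment. The slide does not define the features, so the following is an assumption. If V stands for voicing and PL for plosive, the first rule devoices a voiced consonant word-finally. The second voices a voiceless plosive before a voiced consonant. A minimal sketch that applies both rules simultaneously, reading contexts from the input; the feature table and the name `apply_rules` are hypothetical.

```python
# Hypothetical feature table: segment -> (consonant, voiced, plosive)
FEATURES = {
    "p": (True, False, True), "b": (True, True, True),
    "t": (True, False, True), "d": (True, True, True),
    "k": (True, False, True), "g": (True, True, True),
    "s": (True, False, False), "z": (True, True, False),
    "m": (True, True, False), "n": (True, True, False),
    "a": (False, True, False), "e": (False, True, False),
    "i": (False, True, False), "o": (False, True, False),
    "u": (False, True, False),
}
DEVOICE = {"b": "p", "d": "t", "g": "k", "z": "s"}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}


def apply_rules(word: str) -> str:
    """Apply both rules simultaneously, reading contexts from the input word."""
    segs = list(word)
    out = list(segs)
    for i, seg in enumerate(segs):
        consonant, voiced, plosive = FEATURES[seg]
        if not consonant:
            continue
        if i == len(segs) - 1:
            # C[-V] <- C[+V] / _ #  (voiced consonant devoices word-finally)
            if voiced and seg in DEVOICE:
                out[i] = DEVOICE[seg]
        else:
            next_consonant, next_voiced, _ = FEATURES[segs[i + 1]]
            # C[+V] <- C[-V, +PL] / _ C[+V]  (voiceless plosive voices before a voiced consonant)
            if plosive and not voiced and next_consonant and next_voiced:
                out[i] = VOICE[seg]
    return "".join(out)


for w in ["bad", "akda", "atma", "asda"]:
    print(w, "->", apply_rules(w))
```

Simultaneous application avoids rule-ordering effects. A grammar that needs ordered rules would instead feed each rule's output into the next.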
[Diagram with components: distance matrices, ETHN, WALS, EXP, ASJP1, ASJP2, TREE SFTW, STAT SFTW, GEO GRAPH MAP SFTW, HIST FACTS, Swadesh, Phon Invent, Borrowing!]
Lexical items: transcription
Family             NLGS   LSP    LDND
Altaic               30   .723   .688
Maya                 34   .325   .195
Afro-Asiatic        128   .147   .172
Trans-New Guinea    148   .294   .325
Niger-Congo         379   .089   .125
** all significant at the 0.01 level
Lexical items: transcription
Family             NLGS   PHON
Altaic               30   .723+
Maya                 34   .325+
Afro-Asiatic        128   .172+
Trans-New Guinea    148   .325+
Niger-Congo         379   .125+
** all significant at the 0.01 level
- Holman, Eric et al. (2008). Advances in automated language classification. In Arppe, Antti, Kaius Sinnemäki and Urpo Nikanne (eds.), Quantitative Investigations in Theoretical Linguistics, 40-43. Helsinki: University of Helsinki.
- Holman et al. (forthc. 2008). Explorations in automated language classification. Folia Linguistica.
- Brown et al. (forthc. 2008). Automated classification of the world's languages: A description of the method and preliminary results. Sprachtypologie und Universalienforschung.
- Bakker et al. (2009?). Using WALS for the ASJP project.
email.eva.mpg.de./~wichmann/ASJPHomePage