Advances in Automated Language Classification ASJP Consortium (Dik Bakker)


1 Advances in Automated Language Classification ASJP Consortium (Dik Bakker)

2 ASJP: Automatic Reconstruction2 Overview Project (MAY 2007 - ): ASJP (Automated Similarity Judgment Program)

3 ASJP: Automatic Reconstruction3 Overview Project: ASJP (Automated Similarity Judgment Program) LANGUAGE NUMBERS

4 ASJP: Automatic Reconstruction4 Overview Project: ASJP (Automated Similarity Judgment Program) Data sources TOOLS

5 ASJP: Automatic Reconstruction5 Overview Project: ASJP (Automated Similarity Judgment Program) Data bases Results Data sources TOOLS

6 ASJP: Automatic Reconstruction6 Overview Project: ASJP (Automated Similarity Judgment Program)

7 ASJP: Automatic Reconstruction7 Overview Project: ASJP members: Sören Wichmann (Germany; Netherlands), Viveka Velupillai (Germany), André Müller (Germany), Robert Mailhammer (Germany), Hagen Jung (Germany), Eric Holman (US), Anthony Grant (UK), Dmitry Egorov (Russia), Pamela Brown (US), Cecil Brown (US), Dik Bakker (UK; Netherlands)

8 ASJP: Automatic Reconstruction8 Overview Project: ASJP (Automated Similarity Judgment Program)

9 ASJP: Automatic Reconstruction9 Overview Project: ASJP (Automated Similarity Judgment Program) Overall goal: Automatic reconstruction of language relationships

10 ASJP: Automatic Reconstruction10 Overview Project: ASJP (Automated Similarity Judgment Program) Overall goal: Automatic reconstruction of language relationships Basis: Distance matrices between individual languages on the basis of linguistic features

11 ASJP: Automatic Reconstruction11 Overview Project: ASJP (Automated Similarity Judgment Program) Overall goal: Automatic reconstruction of language relationships Basis: Distance matrices between individual languages on the basis of linguistic features Method: Lexicostatistics: mass comparison of basic lexical items,

12 ASJP: Automatic Reconstruction12 Overview Project: ASJP (Automated Similarity Judgment Program) Overall goal: Automatic reconstruction of language relationships Basis: Distance matrices between individual languages on the basis of linguistic features Method: Lexicostatistics: mass comparison of basic lexical items, extended by all relevant data available

13 ASJP: Automatic Reconstruction13 Swadesh (2440)

14 ASJP: Automatic Reconstruction14 Swadesh (2440) ASJP software

15 ASJP: Automatic Reconstruction15 Swadesh (2440) ASJP software distance matrices

16 ASJP: Automatic Reconstruction16 Swadesh (2440) distance matrices ASJP1 ASJP2

17 ASJP: Automatic Reconstruction17 Swadesh (2440) distance matrices ASJP1 ASJP2 TREE SFTW

18 ASJP: Automatic Reconstruction18 Swadesh (2440) distance matrices ETHN WALS EXPRT ASJP1 ASJP2 calibration TREE SFTW STAT SFTW

19 ASJP: Automatic Reconstruction19 Swadesh (2440) distance matrices ETHN WALS EXPRT calibration TREE SFTW STAT SFTW ASJP1 ASJP2

20 ASJP: Automatic Reconstruction20 Swadesh (2440) distance matrices ETHN WALS EXPRT TREE SFTW STAT SFTW GEO GRAPH MAP SFTW ASJP1 ASJP2

21 ASJP: Automatic Reconstruction21 Swadesh (2440) distance matrices ETHN WALS EXPRT TREE SFTW STAT SFTW GEO GRAPH MAP SFTW HIST FACTS ASJP1 ASJP2

22 ASJP: Automatic Reconstruction22 Swadesh (2440) distance matrices ETHN WALS EXPRT ASJP1 ASJP2 TREE SFTW STAT SFTW GEO GRAPH MAP SFTW HIST FACTS PHON INVENT

23 ASJP: Automatic Reconstruction23 Swadesh (2440) distance matrices ETHN WALS EXPRT ASJP1 ASJP2 TREE SFTW STAT SFTW GEO GRAPH MAP SFTW HIST FACTS PHON INVENT Jeff Mielke 500+

24 ASJP: Automatic Reconstruction24 Swadesh (2440) distance matrices ETHN WALS EXPRT ASJP1 ASJP2 TREE SFTW STAT SFTW GEO GRAPH MAP SFTW HIST FACTS PHON INVENT LOANS

25 ASJP: Automatic Reconstruction25 Overview OVERALL GOAL: Reconstruction of Language Relationships

26 ASJP: Automatic Reconstruction26 Overview OVERALL GOAL: Reconstruction of Language Relationships Derived goals:

27 ASJP: Automatic Reconstruction27 Overview OVERALL GOAL: Reconstruction of Language Relationships Derived goals: - Critical assessment and refinement of existing classifications

28 ASJP: Automatic Reconstruction28 Overview OVERALL GOAL: Reconstruction of Language Relationships Derived goals: - Critical assessment and refinement of existing classifications - Classify newly described and unclassified languages

29 ASJP: Automatic Reconstruction29 Overview OVERALL GOAL: Reconstruction of Language Relationships Derived goals: - Critical assessment and refinement of existing classifications - Classify newly described and unclassified languages - Search for (ir)regularities in phylogenies

30 ASJP: Automatic Reconstruction30 Overview OVERALL GOAL: Reconstruction of Language Relationships Derived goals: - Critical assessment and refinement of existing classifications - Classify newly described and unclassified languages - Search for (ir)regularities in phylogenies - Test hypotheses (e.g. Atkinson et al 2008; ‘elbow’ phenomenon)

31 ASJP: Automatic Reconstruction31 Overview OVERALL GOAL: Reconstruction of Language Relationships Derived goals: - Critical assessment and refinement of existing classifications - Classify newly described and unclassified languages - Search for (ir)regularities in phylogenies - Test hypotheses (e.g. Atkinson et al 2008; ‘elbow’ phenomenon) - Experimentally find an optimal dating method

32 ASJP: Automatic Reconstruction32 Overview OVERALL GOAL: Reconstruction of Language Relationships Derived goals: - Critical assessment and refinement of existing classifications - Classify newly described and unclassified languages - Search for (ir)regularities in phylogenies - Test hypotheses (e.g. Atkinson et al 2008; ‘elbow’ phenomenon) - Experimentally find an optimal dating method - Automatically detect borrowings

33 ASJP: Automatic Reconstruction33 Overview OVERALL GOAL: Reconstruction of Language Relationships Derived goals: - Critical assessment and refinement of existing classifications - Classify newly described and unclassified languages - Search for (ir)regularities in phylogenies - Test hypotheses (e.g. Atkinson et al 2008; ‘elbow’ phenomenon) - Experimentally find the best/optimal dating method - Automatically detect borrowings

34 ASJP: Automatic Reconstruction34 Overview 1. The list of basic lexical items

35 ASJP: Automatic Reconstruction35 Overview 1. The list of basic lexical items 2. Comparing words & languages

36 ASJP: Automatic Reconstruction36 Overview 1. The list of basic lexical items 2. Comparing words & languages 3. Some results: genetic proximity

37 ASJP: Automatic Reconstruction37 Overview 1. The list of basic lexical items 2. Comparing words & languages 3. Some results: genetic proximity 4. On Inheritance vs Borrowing

38 ASJP: Automatic Reconstruction38 Overview 1. The list of basic lexical items 2. Comparing words & languages 3. Some results: genetic proximity 4. On Inheritance vs Borrowing 5. Imminent extensions

39 ASJP: Automatic Reconstruction39 1. The list of basic lexical items

40 ASJP: Automatic Reconstruction40 Lexical items Word list: Swadesh 100 basic meanings

41 ASJP: Automatic Reconstruction41 Lexical items Word list: Swadesh 100 basic meanings - Word coined in most languages

42 ASJP: Automatic Reconstruction42 Lexical items Word list: Swadesh 100 basic meanings - Word coined in most languages - Collected in field work lexicon / grammar

43 ASJP: Automatic Reconstruction43 Lexical items Word list: Swadesh 100 basic meanings - Word coined in most languages - Collected in field work lexicon / grammar - Inherited rather than borrowed

44 ASJP: Automatic Reconstruction44 Lexical items Word list: Swadesh 100 basic meanings - Word coined in most languages - Collected in field work lexicon / grammar - Inherited rather than borrowed - Culturally independent

45 ASJP: Automatic Reconstruction45 Lexical items Word list: Swadesh 100 basic meanings - Word coined in most languages - Collected in field work lexicon / grammar - Inherited rather than borrowed - Culturally independent - Stable over time

46 ASJP: Automatic Reconstruction46 Lexical items Word list: Swadesh 100 basic meanings - Word coined in most languages - Collected in field work lexicon / grammar - Inherited rather than borrowed - Culturally independent - Stable over time - Few synonyms

47 ASJP: Automatic Reconstruction47 Lexical items Word list: Swadesh 100 basic meanings - Word coined in most languages - Collected in field work lexicon / grammar - Inherited rather than borrowed - Culturally independent - Stable over time - Few synonyms ?

48 ASJP: Automatic Reconstruction48 Lexical items Word list: Swadesh 100 basic meanings - Word coined in most languages - Collected in field work lexicon / grammar - Inherited rather than borrowed - Culturally independent - Stable over time - Few synonyms ? LWT

49 ASJP: Automatic Reconstruction49
 1. I        21. dog       41. nose     61. die     81. smoke
 2. you      22. louse     42. mouth    62. kill    82. fire
 3. we       23. tree      43. tooth    63. swim    83. ash
 4. this     24. seed      44. tongue   64. fly     84. burn
 5. that     25. leaf      45. claw     65. walk    85. path
 6. who      26. root      46. foot     66. come    86. mountain
 7. what     27. bark      47. knee     67. lie     87. red
 8. not      28. skin      48. hand     68. sit     88. green
 9. all      29. flesh     49. belly    69. stand   89. yellow
10. many     30. blood     50. neck     70. give    90. white
11. one      31. bone      51. breasts  71. say     91. black
12. two      32. grease    52. heart    72. sun     92. night
13. big      33. egg       53. liver    73. moon    93. hot
14. long     34. horn      54. drink    74. star    94. cold
15. small    35. tail      55. eat      75. water   95. full
16. woman    36. feather   56. bite     76. rain    96. new
17. man      37. hair      57. see      77. stone   97. good
18. person   38. head      58. hear     78. sand    98. round
19. fish     39. ear       59. know     79. earth   99. dry
20. bird     40. eye       60. sleep    80. cloud  100. name

50 ASJP: Automatic Reconstruction50 [Swadesh 100-item list, as on slide 49] Otomi from Spanish

51 ASJP: Automatic Reconstruction51 Lexical items: further reduction Early analyses have shown: - Most stable 40/100 item subset gives same results

52 ASJP: Automatic Reconstruction52 Lexical items: further reduction Early analyses have shown: - Most stable 40/100 item subset gives same results → Less work

53 ASJP: Automatic Reconstruction53 Lexical items: further reduction Early analyses have shown: - Most stable 40/100 item subset gives same results → Less work → Less missing data

54 ASJP: Automatic Reconstruction54 Lexical items: further reduction Early analyses have shown: - Most stable 40/100 item subset gives same results → Less work → Less missing data → Faster processing; combinatorial explosion: 40 : 100 ~ 10^9 < 10^10 COMPARISONS

56 ASJP: Automatic Reconstruction56 Lexical items: further reduction Most stable: $S_{SM} = (R - U) / (1 - U)$ * (* see references)

57 ASJP: Automatic Reconstruction57 Lexical items: further reduction Most stable: $S_{SM} = (R - U) / (1 - U)$ R = mean proportion ‘same form’ for $SM_i$ / genus

58 ASJP: Automatic Reconstruction58 Lexical items: further reduction Most stable: $S_{SM} = (R - U) / (1 - U)$ R = mean proportion ‘same form’ for $SM_i$ / genus U = mean proportion ‘same form’ for different $SM_x$ / genus

59 ASJP: Automatic Reconstruction59 Lexical items: further reduction Most stable: $S_{SM} = (R - U) / (1 - U)$ R = mean proportion ‘same form’ for $SM_i$ / genus U = mean proportion ‘same form’ for different $SM_x$ / genus N.B. $S_{SM}$: high correlation between families
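
The stability measure above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the slides: `same_form` is a hypothetical stand-in that treats two words as the "same form" only when they are identical, the toy genus `toy` is invented, and R and U are computed for a single genus rather than averaged over many genera.

```python
from itertools import combinations


def same_form(a: str, b: str) -> bool:
    # Hypothetical judgement: identical strings count as the same form.
    return a == b


def stability(genus: dict, meaning: str) -> float:
    """Return (R - U) / (1 - U) for one meaning within one genus.

    genus maps language name -> {meaning: word form}.
    """
    pairs = list(combinations(sorted(genus), 2))
    # R: proportion of language pairs sharing a form for the SAME meaning.
    same = [same_form(genus[a][meaning], genus[b][meaning]) for a, b in pairs]
    # U: proportion of matches when the second word has a DIFFERENT meaning.
    diff = [
        same_form(genus[a][meaning], genus[b][other])
        for a, b in pairs
        for other in genus[b]
        if other != meaning
    ]
    r = sum(same) / len(same)
    u = sum(diff) / len(diff)
    if u == 1:
        raise ValueError("U = 1 leaves the measure undefined")
    return (r - u) / (1 - u)


# Invented toy data: three languages of one genus, two meanings.
toy = {
    "A": {"eye": "wit", "stone": "tun"},
    "B": {"eye": "wit", "stone": "ton"},
    "C": {"eye": "wit", "stone": "tun"},
}
print(stability(toy, "eye"), stability(toy, "stone"))
```

In this toy genus "eye" has the same form in every pair and "stone" in one pair out of three, so "eye" comes out as the more stable item.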

60 ASJP: Automatic Reconstruction60 Ethnologue (Goodman-Kruskal) WALS (Pearson) ++ --

61 ASJP: Automatic Reconstruction61 [Swadesh 100-item list, as on slide 49, without numbering]

62 ASJP: Automatic Reconstruction62 [Swadesh 100-item list, as on slide 49, without numbering] 40 Most Stable

64 ASJP: Automatic Reconstruction64 Lexical items: transcription First phase of project (2007): Problems with full IPA representation of words:

65 ASJP: Automatic Reconstruction65 Lexical items: transcription First phase of project (2007): Problems with full IPA representation of words: - data entry via keyboard

66 ASJP: Automatic Reconstruction66 Lexical items: transcription First phase of project (2007): Problems with full IPA representation of words: - data entry via keyboard - simple programming language (Fortran; Pascal)

67 ASJP: Automatic Reconstruction67 Lexical items: transcription First phase of project (2007): Problems with full IPA representation of words: - data entry via keyboard - simple programming language (Fortran; Pascal) → Recoding to simplified ASJPcode (only ASCII)

68 ASJP: Automatic Reconstruction68 Lexical items: transcription ASJPcode:

69 ASJP: Automatic Reconstruction69 Lexical items: transcription ASJPcode: 7 Vowels

70 ASJP: Automatic Reconstruction70 Lexical items: transcription ASJPcode: 7 Vowels 34 Consonants

71 ASJP: Automatic Reconstruction71 Lexical items: transcription ASJPcode: 7 Vowels 34 Consonants ‘Closest sound’

72 ASJP: Automatic Reconstruction72 Lexical items: transcription ASJPcode: 7 Vowels 34 Consonants Operators for: Nasalization, Labialization, Palatalization, Aspiration, Glottalization

73 ASJP: Automatic Reconstruction73 Abaza (Caucasian): Meaning PERSON LEAF SKIN HORN NOSE TOOTH

74 ASJP: Automatic Reconstruction74 Abaza (Caucasian):
Meaning   IPA
PERSON    ʕʷɨʧʼʲʷʕʷɨs
LEAF      bɣʲɨ
SKIN      ʧʷazʲ
HORN      ʧʼʷɨʕʷa
NOSE      pɨnʦʼa
TOOTH     pɨʦ

75 ASJP: Automatic Reconstruction75 Abaza (Caucasian):
Meaning   IPA            ASJPcode
PERSON    ʕʷɨʧʼʲʷʕʷɨs    Xw~3Cw"yXw~3s
LEAF      bɣʲɨ           bxy~3
SKIN      ʧʷazʲ          Cw~azy~
HORN      ʧʼʷɨʕʷa        Cw"~3Xw~a
NOSE      pɨnʦʼa         p3nc"a
TOOTH     pɨʦ            p3c

76 ASJP: Automatic Reconstruction76 Lexical items Collected to date: - Close to 2500 languages (incl. dialects and proto)

77 ASJP: Automatic Reconstruction77 Lexical items Collected to date: - Close to 2500 languages (incl. dialects and proto) - Mean number of items/language: 35.8 (/40)

78 ASJP: Automatic Reconstruction78 Lexical items Areal distribution (not a sample!): Americas: 27%, Eurasia: 23%, Australia/PNG: 18%, Austronesia: 15%, Africa: 14%, Creoles: 2%, Artificial: 1%

79 ASJP: Automatic Reconstruction79 Languages currently sampled

80 ASJP: Automatic Reconstruction80 2. Comparing words and languages

81 ASJP: Automatic Reconstruction81 Comparing words Two strategies:

82 ASJP: Automatic Reconstruction82 Comparing words Two strategies: 1. ASJP rules

83 ASJP: Automatic Reconstruction83 Comparing words 1. ASJP context rules

84 ASJP: Automatic Reconstruction84 Comparing words ASJP context rules a. between 2 words

85 ASJP: Automatic Reconstruction85 Comparing words ASJP context rules SM i : WORD lg1 == WORD lg2

86 ASJP: Automatic Reconstruction86 Comparing words ASJP context rules (C/V=general; c/v=specific; X=*) SM_i: WORD_lg1 == WORD_lg2
R1    #(V)cVcX#        #XcVcX#
R2    #Xc(V)c(V)cX#    #Xc(V)c(V)cX#
…
R12   #AVcvX#          #VcvX#          A=hwy
R13   #(V)ccVX#        #(V)ccVX#
…
R22   #cv#             #(CV)cv#

87 ASJP: Automatic Reconstruction87 Comparing words ASJP context rules (rules as on slide 86) pattern W_lg1 UNIFIES pattern W_lg2

90 ASJP: Automatic Reconstruction90 Comparing words ASJP context rules (rules as on slide 86) #yapi# #opi#
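
As an illustration of how one such context rule might be checked, here is a minimal Python sketch of rule R12 only. It is a reading of the slide, not the ASJP implementation: it assumes that lowercase `c`/`v` must be the identical consonant and vowel in both words, that `A` is one of h, w, y, that `X` is any remainder, and that the seven ASJPcode vowels are `i e E 3 a u o`. The function name `r12_match` is hypothetical, and anything that is not a vowel (including modifier symbols) is treated as a consonant.

```python
import re

VOWELS = "ieE3auo"  # assumed ASJPcode vowel symbols

# R12, first word:  A V c v X   (A = h, w or y)
PATTERN_1 = re.compile(rf"[hwy][{VOWELS}]([^{VOWELS}])([{VOWELS}])(.*)")
# R12, second word: V c v X
PATTERN_2 = re.compile(rf"[{VOWELS}]([^{VOWELS}])([{VOWELS}])(.*)")


def r12_match(word1: str, word2: str) -> bool:
    """True if word1/word2 unify under this reading of rule R12."""
    m1 = PATTERN_1.fullmatch(word1)
    m2 = PATTERN_2.fullmatch(word2)
    if not (m1 and m2):
        return False
    # The specific consonant (c) and vowel (v) must be the same in both words.
    return m1.group(1) == m2.group(1) and m1.group(2) == m2.group(2)


print(r12_match("yapi", "opi"))
print(r12_match("yapi", "oti"))
```

The pair `yapi` / `opi` from the slide appears to instantiate this rule: the initial glide and the first vowel are free, while `p` and `i` must agree.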

91 ASJP: Automatic Reconstruction91 Comparing words ASJP context rules a. between 2 words value 0 or 1

92 ASJP: Automatic Reconstruction92 Comparing words ASJP context rules a. between 2 words value 0 or 1 b. between 2 languages: RELATEDNESS (n of matching words / total pairs) * 100

93 ASJP: Automatic Reconstruction93 Comparing words ASJP context rules a. between 2 words value 0 or 1 b. between 2 languages: DISTANCE LSP=100 – ((matching words / total pairs) * 100 )
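
The language-level LSP distance defined above is simple arithmetic; a minimal Python sketch follows. The function name `lsp_distance` and the example counts are illustrative assumptions.

```python
def lsp_distance(matching_words: int, total_pairs: int) -> float:
    """LSP distance: 100 minus the percentage of word pairs judged to match."""
    if total_pairs <= 0:
        raise ValueError("total_pairs must be positive")
    if not 0 <= matching_words <= total_pairs:
        raise ValueError("matching_words must lie between 0 and total_pairs")
    return 100.0 - (matching_words / total_pairs) * 100.0


# Hypothetical counts: 16 matching word pairs out of 35 compared.
print(round(lsp_distance(16, 35), 2))
```

Identical word lists give a distance of 0 and lists with no matches give 100.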

94 ASJP: Automatic Reconstruction94 Comparing words 2. Levenshtein Distance

95 ASJP: Automatic Reconstruction95 Comparing words Levenshtein Distance a. between 2 words: number of transformations to get from the shorter form to the longer one (changes, additions) min = 0 / max = length of the longest word

96 ASJP: Automatic Reconstruction96 Comparing words Levenshtein Distance a. between 2 words: number of transformations to get from the shorter form to the longer one (changes, additions) b. between 2 languages: mean LD for total number of pairs

97 ASJP: Automatic Reconstruction97 Comparing words Two problems with simple LD:

98 ASJP: Automatic Reconstruction98 Comparing words Two problems: 1. Value depends on length of longest word

99 ASJP: Automatic Reconstruction99 Comparing words Two problems: 1. Value depends on length of longest word → Normalize: $\mathrm{LDN} = \mathrm{LD} / L_{\max}$

100 ASJP: Automatic Reconstruction100 Comparing words Two problems: 1. Value depends on length of longest word → Normalize: $\mathrm{LDN} = \mathrm{LD} / L_{\max}$ 2. Differences between lgs in phonological overlap

101 ASJP: Automatic Reconstruction101 Comparing words Two problems: 1. Value depends on length of longest word → Normalize: $\mathrm{LDN} = \mathrm{LD} / L_{\max}$ 2. Differences between lgs in phonological overlap → Eliminate ‘background noise’: $\mathrm{LDND} = \mathrm{LDN} / \mathrm{LDN}_{\text{different pairs}}$

102 ASJP: Automatic Reconstruction102 Comparing words Levenshtein Distance a. between 2 words: LDND = 0 - 100 (+)

103 ASJP: Automatic Reconstruction103 Comparing words Levenshtein Distance a. between 2 words: LDND = 0 - 100 (+) b. between 2 languages: Mean of all LDND’s of words in common
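
The three steps above (LD, LDN, LDND) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ASJP software: every character is treated as one symbol (ASJPcode modifiers such as `~` and `"` would need special handling), the baseline is taken as the mean LDN over all pairs of words with different meanings, and the function names and the two-word toy lexicons are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LD normalised by the length of the longer word (words must be non-empty)."""
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd(lex1: dict, lex2: dict):
    """Return (mean LDND, per-meaning LDND) on a 0-100(+) scale."""
    shared = sorted(set(lex1) & set(lex2))
    # Baseline: words with DIFFERENT meanings, capturing chance similarity.
    noise = [ldn(lex1[m], lex2[n]) for m in shared for n in shared if m != n]
    baseline = sum(noise) / len(noise)
    per_word = {m: 100.0 * ldn(lex1[m], lex2[m]) / baseline for m in shared}
    return sum(per_word.values()) / len(per_word), per_word


# Toy lexicons reusing two word pairs from the slides; two items are far too
# few for a meaningful baseline, so the numbers are for illustration only.
lang1 = {"one": "xun", "bone": "baq"}
lang2 = {"one": "hun", "bone": "baq"}
mean_ldnd, per_word = ldnd(lang1, lang2)
print(levenshtein("kob", "kabe7"), mean_ldnd, per_word)
```

Because the divisor is an average over unrelated word pairs, individual LDND values can exceed 100, which is consistent with the "(+)" on the slide.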

104 ASJP: Automatic Reconstruction104 Comparing languages AGUACATEC (agu) <> MOCHO (mhc) MAYAN (45) > MAYAN [GeoD=97; GenD=1.86]
Meaning   agu    mhc     Rule   LDND
ONE       xun    hun     -      37.4
TWO       kob    kabe7   R1     67.3
BONE      baq    baq     R3      0.0
EAR       SCin   Cikin   -      67.3
WATER     a7     ha7     R10    37.4

105 ASJP: Automatic Reconstruction105 Comparing languages AGUACATEC (agu) <> MOCHO (mhc) (word pairs as on slide 104) TOTAL: LSP = 58.14

106 ASJP: Automatic Reconstruction106 Comparing languages AGUACATEC (agu) <> MOCHO (mhc) (word pairs as on slide 104) TOTAL: LSP = 58.14; LDND = 51.68 (n=35)

107 ASJP: Automatic Reconstruction107 Comparing languages AGUACATEC (agu) <> MOCHO (mhc) (word pairs as on slide 104) HIGH CORRELATION: LSP = 58.14; LDND = 51.68 (n=35)

108 ASJP: Automatic Reconstruction108 Comparing languages HIGH CORRELATION LSP ~ LDND

109 ASJP: Automatic Reconstruction109 Comparing languages HIGH CORRELATION LSP ~ LDND
MAYA (n=34)             0.93**
INDO-EUROPEAN (n=129)   0.97**
AMERINDIAN (n=511)      0.59**

110 ASJP: Automatic Reconstruction110 Comparing languages BEST PERFORMERS
Within families: 1. EYE 0.496  2. LOUSE 0.480  3. DIE 0.469  4. BREAST 0.415  5. STONE 0.364

111 ASJP: Automatic Reconstruction111 Comparing languages BEST PERFORMERS
Within families: 1. EYE 0.496  2. LOUSE 0.480  3. DIE 0.469  4. BREAST 0.415  5. STONE 0.364
Across families: 1. I 0.072  2. DIE 0.065  3. WE 0.061  4. YOU 0.057  5. BREAST 0.057

113 ASJP: Automatic Reconstruction113 Comparing languages BEST PERFORMERS
Within families: 1. EYE 0.496  2. LOUSE 0.480  3. DIE 0.469  4. BREAST 0.415  5. STONE 0.364
Across families: 1. I 0.072  2. DIE 0.065  3. WE 0.061  4. YOU 0.057  5. BREAST 0.057
- Shortness - Sound Symbolism?

114 ASJP: Automatic Reconstruction114 Comparing languages WORST PERFORMERS
Within families: 36. HORN 0.107  37. SEE 0.099  38. KNEE 0.095  39. NIGHT 0.079  40. MOUNTAIN 0.075

115 ASJP: Automatic Reconstruction115 Comparing languages WORST PERFORMERS
Within families: 36. HORN 0.107  37. SEE 0.099  38. KNEE 0.095  39. NIGHT 0.079  40. MOUNTAIN 0.075
Across families: 36. NIGHT 0.028  37. HEAR 0.027  38. HORN 0.027  39. STAR 0.024  40. KNEE 0.023

117 ASJP: Automatic Reconstruction117
LANG1       LANG2                 FAM1/FAM2   LSP     LDND
AGUACATEC   CHICOMUCELTEC         MAYAN       96.55   94.75
AGUACATEC   CHOL_TILA             MAYAN       86.11   80.10
AGUACATEC   CHONTAL_TABASCO       MAYAN       90.00   83.97
AGUACATEC   IXIL_CHAJUL           MAYAN       47.50   49.25
AGUACATEC   KAQCHIKEL_NORTHERN    MAYAN       74.36   64.40
AGUACATEC   MAYA_YUCATAN          MAYAN       78.95   76.15
AGUACATEC   MOCHO                 MAYAN       54.29   51.68
AGUACATEC   QANJOBAL_EASTERN      MAYAN       45.00   50.59
AGUACATEC   RABINAL_ACHI          MAYAN       70.00   59.03
AGUACATEC   SAKAPULTEKO           MAYAN       70.00   61.83
AGUACATEC   SIPAKAPENSE           MAYAN       66.67   54.97
AGUACATEC   TEKTITEKO             MAYAN       52.50   57.24
AGUACATEC   TZELTAL_OXCHUC        MAYAN       86.84   72.93
AGUACATEC   TZOTZIL_SAN_ANDRES    MAYAN       92.50   79.64
for 2440 lgs: ~3,000,000 language pairs (× 36^2 ≈ 3·10^9 comparisons)
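
The order of magnitude quoted above can be checked with a few lines of Python. The sketch assumes that every language is paired with every other language once and that each pair involves about 36 × 36 word-by-word comparisons (roughly the mean list length squared); the variable names are illustrative.

```python
from math import comb

n_languages = 2440
words_per_language = 36  # approximate mean list length, as on the slide

# Unordered language pairs: n * (n - 1) / 2
language_pairs = comb(n_languages, 2)

# All-against-all word comparisons within each language pair
word_comparisons = language_pairs * words_per_language ** 2

print(f"{language_pairs:,} language pairs")
print(f"{word_comparisons:.2e} word comparisons")
```

The pair count is just under three million, and the word-level total is in the low billions, in line with the rounded figures on the slide.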

118 ASJP: Automatic Reconstruction118 3. Genetic proximity

119 ASJP: Automatic Reconstruction119 Swadesh (2440) distance matrices ASJP2 SplitsTree

120 ASJP: Automatic Reconstruction120 Swadesh (2440) distance matrices ASJP2 SplitsTree MEGA4

121 ASJP: Automatic Reconstruction121 Swadesh (2440) distance matrices ASJP2 SplitsTree MEGA4 Neighbour Joining

122 ASJP: Automatic Reconstruction122 LSP ASJP

123 ASJP: Automatic Reconstruction123 LSP Correlation: ETHN.325**

124 ASJP: Automatic Reconstruction124 LSP (n = 34) Correlation: ETHN.325** (n = 69)

125 ASJP: Automatic Reconstruction125 LSP Correlation: ETHN.325** More structure than ETHN

126 ASJP: Automatic Reconstruction126 LSP Correlation: ETHN.325** Separation

127 ASJP: Automatic Reconstruction127 LDND Levenshtein

128 ASJP: Automatic Reconstruction128 LDND Correlation: ETHN.195** Levenshtein

129 ASJP: Automatic Reconstruction129 LDND Correlation: ETHN.195** (LSP =.325) Levenshtein

130 ASJP: Automatic Reconstruction130 ASJP LDND

131 ASJP: Automatic Reconstruction131 ASJP LDND cholan

132 ASJP: Automatic Reconstruction132 ASJP LDND cholan tzeltalan

133 ASJP: Automatic Reconstruction133 ASJP LDND cholan tzeltalan

134 ASJP: Automatic Reconstruction134 ASJP LDND yucatecan

135 ASJP: Automatic Reconstruction135 ASJP LDND

136 ASJP: Automatic Reconstruction136 ASJP LDND

137 ASJP: Automatic Reconstruction137
Family             N LGS   LSP    LDND
Altaic                30   .723   .688
Maya                  34   .325   .195
Afro-Asiatic         128   .147   .172
Trans New-Guinea     148   .294   .325
Niger-Congo          379   .089   .125
**all significant > 0.01

139 ASJP: Automatic Reconstruction139 Improving the fit Enrich lexical with typological data:

140 ASJP: Automatic Reconstruction140 Swadesh (2440) distance matrices ASJP TREE SFTW WALS (2580) ~

141 ASJP: Automatic Reconstruction141 distance matrices ASJP TREE SFTW SWALSH (2440)

142 ASJP: Automatic Reconstruction142 Improving the fit Enrich lexical with typological data:

143 ASJP: Automatic Reconstruction143 Improving the fit Enrich lexical with typological data: - NOT 1:1 with ASJP languages

144 ASJP: Automatic Reconstruction144 distance matrices ASJP TREE SFTW SWALSH (550)

145 ASJP: Automatic Reconstruction145 Improving the fit Enrich lexical with typological data: - NOT 1:1 with ASJP languages - WALS variables very unevenly spread

146 ASJP: Automatic Reconstruction146 Improving the fit Enrich lexical with typological data: - NOT 1:1 with ASJP languages - WALS variables very unevenly spread - Maximum subset: 85 most stable

147 ASJP: Automatic Reconstruction147 Most stable WALS variables
WALS Variable   Description                                    Stability Within Genus
31              Sex-based and Non-sex-based Gender Systems     0.81
118             Predicative Adjectives                         0.74
30              Number of Genders                              0.73
119             Nominal and Locational Predication             0.71
29              Syncretism in Verbal Person/Number Marking     0.71

148 ASJP: Automatic Reconstruction148 Improving the fit Enrich lexical with typological data: - Maximum subset: 85 most stable

149 ASJP: Automatic Reconstruction149 Improving the fit Enrich lexical with typological data: - Maximum subset: 85 most stable - Correlation with Swadesh: 0.063 (> 0.001) ?

150 ASJP: Automatic Reconstruction150 Improving the fit Enrich lexical with typological data: - Maximum subset: 85 most stable - Correlation with Swadesh: 0.063 (> 0.001) - Mantel Test: 10,000 simulations:

151 ASJP: Automatic Reconstruction151 Improving the fit Enrich lexical with typological data: - Maximum subset: 85 most stable - Correlation with Swadesh: 0.063 (> 0.001) - Mantel Test: 10,000 simulations: best +0.050 - 0.043 (mean 0.009)
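
For readers unfamiliar with the Mantel test mentioned above, the following is a minimal Python sketch of the general technique: correlate the upper triangles of two distance matrices, then re-correlate after randomly permuting the languages of one matrix. The slides do not say how the ASJP test was configured, so the one-sided p-value, the fixed seed, the function names and the 4 × 4 toy matrix are all assumptions.

```python
import random
from itertools import combinations


def upper_triangle(matrix, order):
    """Distances for every unordered pair, with rows/columns taken in `order`."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def mantel(d1, d2, permutations=10000, seed=0):
    """Return (observed r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # relabel the languages of the second matrix
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Invented symmetric distance matrix for four languages.
toy = [[0, 1, 2, 3],
       [1, 0, 4, 5],
       [2, 4, 0, 6],
       [3, 5, 6, 0]]
r, p = mantel(toy, toy, permutations=200)
print(r, p)
```

The permutation step is what distinguishes a Mantel test from a plain correlation: the entries of a distance matrix are not independent, so significance is judged against relabelled versions of the same matrix.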

152 ASJP: Automatic Reconstruction152 Improving the fit Enrich lexical with typological data: - Database 40 most stable Swadesh + 85 most stable WALS features

153 ASJP: Automatic Reconstruction153 Improving the fit Enrich lexical with typological data: - Database 40 most stable Swadesh + 85 most stable WALS features - Optimal weight of both?

154 ASJP: Automatic Reconstruction154 Improving the fit

155 ASJP: Automatic Reconstruction155 Improving the fit

156 ASJP: Automatic Reconstruction156 Improving the fit

157 ASJP: Automatic Reconstruction157 4. On Inheritance vs Borrowing

158 ASJP: Automatic Reconstruction158 Inherited or borrowed? AVAR (AVA) / AGUL (AGL)

159 ASJP: Automatic Reconstruction159 Inherited or borrowed? AVAR (AVA) / AGUL (AGL)
I    : dun = zun       * LDND = 36.6
YOU  : mun = wun       * LDND = 36.6
HORN : tLar = k"arC    * LDND = 66.0
FIRE : c"a = c"a       * LDND =  0.0
FULL : c"ura = ac"uf   * LDND = 66.0
NEW  : c"iya = c"EyEr  * LDND = 55.0

160 ASJP: Automatic Reconstruction160 Inherited or borrowed? AVAR (AVA) / AGUL (AGL) (word pairs as on slide 159) → 6 items < 70.0

161 ASJP: Automatic Reconstruction161 Inherited or borrowed? AVAR (AVA) / AGUL (AGL) (word pairs as on slide 159) → 6 items < 70.0 → Genetically related !!
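
The counting step behind "6 items < 70.0" can be sketched in Python as below. The LDND values are copied from the Avar/Agul slide; the function name `similar_items` and the use of 70.0 as a default threshold are illustrative assumptions, not the ASJP procedure.

```python
# LDND values for the Avar/Agul word pairs listed on the slide.
avar_agul = {
    "I": 36.6,
    "YOU": 36.6,
    "HORN": 66.0,
    "FIRE": 0.0,
    "FULL": 66.0,
    "NEW": 55.0,
}


def similar_items(scores: dict, threshold: float = 70.0) -> list:
    """Meanings whose word pair falls below the LDND threshold."""
    return sorted(m for m, value in scores.items() if value < threshold)


hits = similar_items(avar_agul)
print(len(hits), hits)
```

A count like this only flags similar word pairs; as the Spanish/Chamorro slides that follow point out, it cannot by itself separate inheritance from borrowing.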

162 ASJP: Automatic Reconstruction162 Inherited or borrowed? SPANISH (SPA) / CHAMORRO (CHA)

163 ASJP: Automatic Reconstruction163 Inherited or borrowed? SPANISH (SPA) / CHAMORRO (CHA)
ONE    : uno = unu           * LDND = 36.9
TWO    : dos = dos           * LDND =  0.0
PERSON : persona = petsona   * LDND = 15.8
STAR   : estreya = estrecas  * LDND = 27.6
NIGHT  : noCe = noces        * LDND = 44.2
NEW    : nuevo = nueba       * LDND = 44.2

164 ASJP: Automatic Reconstruction164 Inherited or borrowed? SPANISH (SPA) / CHAMORRO (CHA) (word pairs as on slide 163) → 6 items < 70.0

165 ASJP: Automatic Reconstruction165 Inherited or borrowed? SPANISH (SPA) / CHAMORRO (CHA) (word pairs as on slide 163) NOT Related: Chance?

166 ASJP: Automatic Reconstruction166 Inherited or borrowed? SPANISH (SPA) / CHAMORRO (CHA) (word pairs as on slide 163) NOT Related: Chance? Or Borrowing?

167 ASJP: Automatic Reconstruction167 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6

168 ASJP: Automatic Reconstruction168 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA: f/g= 0.17/0.82 (= % < 0.70)

169 ASJP: Automatic Reconstruction169 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA <> CHA:f/g= 0.17/0.82 0.00/0.00

170 ASJP: Automatic Reconstruction170 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 >0.00/0.00

171 ASJP: Automatic Reconstruction171 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 > 0.00/0.00 SPA <> CHA:wwF=

172 ASJP: Automatic Reconstruction172 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 > 0.00/0.00 SPA:wwF= 83 (= mean LDND estreya in IE)

173 ASJP: Automatic Reconstruction173 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 > 0.00/0.00 SPA:wwF= 83-99 (= mean estreya in AU)

174 ASJP: Automatic Reconstruction174 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 > 0.00/0.00 SPA <> CHA:wwF= 83-99 <> 102 (= mn estrecas / AU)

175 ASJP: Automatic Reconstruction175 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 > 0.00/0.00 SPA <> CHA:wwF= 83-99 <> 102-85 (= estrecas / IE)

176 ASJP: Automatic Reconstruction176 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 > 0.00/0.00 SPA <> CHA:wwF= 83-99 <> 102-85

177 ASJP: Automatic Reconstruction177 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 > 0.00/0.00 SPA > CHA:wwF= 83-99 > 102-85

178 ASJP: Automatic Reconstruction178 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 > 0.00/0.00 SPA > CHA:wwF= 83-99 > 102-85 SPA <> CHA:phwF=

179 ASJP: Automatic Reconstruction179 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 > 0.00/0.00 SPA > CHA:wwF= 83-99 > 102-85 SPA:phwF=100.00 (phon estreya in IE / AU)

180 ASJP: Automatic Reconstruction180 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 > 0.00/0.00 SPA > CHA:wwF= 83-99 > 102-85 SPA<> CHA:phwF=100.00 <> 0.52 (phon estrecas in AU/ IE )

181 ASJP: Automatic Reconstruction181 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 > 0.00/0.00 SPA > CHA:wwF= 83-99 > 102-85 SPA > CHA:phwF=100.00 > 0.52

182 ASJP: Automatic Reconstruction182 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO STAR : estreya=estrecas * LDND=27.6 SPA > CHA:f/g= 0.17/0.82 > 0.00/0.00 SPA > CHA:wwF= 83-99 > 102-85 SPA > CHA:phwF=100.00 > 0.52 SYN: CHA= puti7on (f: 1.00)

183 ASJP: Automatic Reconstruction183 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO ONE : uno=unu * LDND=36.9 SPA > CHA f/g= 0.24/0.82 > 0.03/0.00 wwF= 97-106 > 110-97 phwF= 12.00 > 0.44

184 ASJP: Automatic Reconstruction184 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO TWO : dos=dos * LDND= 0.0 SPA > CHA f/g= 0.62/1.00 > 0.12/0.00 wwF= 78-99 > 102-78 phwF=100.00 > 0.22

185 ASJP: Automatic Reconstruction185 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO NIGHT : noCe=noces * LDND=44.2 SPA > CHA f/g= 0.23/0.55 > 0.04/0.00 wwF= 89-100 > 105-92 phwF=100.00 > 0.10

186 ASJP: Automatic Reconstruction186 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO NEW : nuevo=nueba * LDND=44.2 SPA > CHA f/g=0.50/0.64 > 0.04/0.00 wwF= 68-104 > 105-80 phwF=4.27 > 0.03

187 ASJP: Automatic Reconstruction187 Inherited or borrowed? SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO PERSON : persona=petsona * LDND=15.8 SPA > CHA f/g= 0.20/0.64 > 0.01/0.00 wwF= 89-98 > 98-90 phwF=32.40 > 0.13 SYN: CHA= taotao (f: 1.00)

188 ASJP: Automatic Reconstruction188 Inherited or borrowed? Further output filters:

189 ASJP: Automatic Reconstruction189 Inherited or borrowed? Further output filters: 1. Minimum N potential borrowings

190 ASJP: Automatic Reconstruction190 Inherited or borrowed? Further output filters: 1. Minimum N potential borrowings 2. All in the same direction

191 ASJP: Automatic Reconstruction191 Inherited or borrowed? Further output filters: 1. Minimum N potential borrowings 2. All in the same direction 3. Geographic information

192 ASJP: Automatic Reconstruction192 Inherited or borrowed? SPANISH (spa) INDO-EUROPEAN (128) > ROMANCE (12) EURASIA SPAIN VS. CHAMORRO (cha) AUSTRONESIAN (678) > CHAMORRO OCEANIA GUAM [GEODIST=13244; GENDIST=3.00]

193 ASJP: Automatic Reconstruction193 [Diagram labels: Swadesh (2440); distance matrices; ETHN; WALS; EXPRT; TREE SFTW; STAT SFTW; GEO GRAPH; MAP SFTW; ASJP1; ASJP2; HIST FACTS] ‘Spaniards in Pacific since 16th century’

194 ASJP: Automatic Reconstruction194 Inherited or borrowed? Further output filters: 1. Minimum N potential borrowings 2. All in the same direction 3. Geographic information 4. Role of form and meaning ( ? )

195 ASJP: Automatic Reconstruction195 Inherited or borrowed? Further output filters: 1. Minimum N potential borrowings 2. All in the same direction 3. Geographic information 4. Role of form and meaning ( ? ) LWT

196 ASJP: Automatic Reconstruction196 Borrowed! BOR = spa TO cha 6 (=15.0%) LDND = 76.63 (shared=40; crit=70.00 - U) DATABASE: unu(*spa) dos(*spa) petsona(*spa) estrecas(*spa) noces(*spa) nueba(*spa)
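Output filters 1 and 2 (a minimum number of potential borrowings, all in the same direction) and the 6-of-40 share reported above can be sketched as follows. The `Candidate` record, the function names, and the threshold passed as `min_n` are hypothetical; the slides give no concrete cut-off value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One potential loanword; field names are illustrative."""
    meaning: str
    donor: str      # language code of the presumed source
    recipient: str  # language code of the presumed borrower


def passes_filters(candidates, min_n):
    """Filter 1: at least min_n candidates. Filter 2: all in one direction."""
    if len(candidates) < min_n:
        return False
    directions = {(c.donor, c.recipient) for c in candidates}
    return len(directions) == 1


def borrowed_share(candidates, n_shared):
    """Percentage of the shared word-list items flagged as borrowed."""
    return 100.0 * len(candidates) / n_shared


SPA_CHA = [
    Candidate(m, "spa", "cha")
    for m in ("ONE", "TWO", "PERSON", "STAR", "NIGHT", "NEW")
]

# min_n=5 is an arbitrary example threshold, not a value from the slides.
print(passes_filters(SPA_CHA, min_n=5))
print(f"{borrowed_share(SPA_CHA, n_shared=40):.1f}%")
```

With the six Spanish-to-Chamorro items and 40 shared meanings, the share works out to the 15.0% shown on the slide.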

197 ASJP: Automatic Reconstruction197 5. Immanent extensions

198 ASJP: Automatic Reconstruction198

199 ASJP: Automatic Reconstruction199 GARBAGE IN → GARBAGE OUT

200 ASJP: Automatic Reconstruction200 Lexical items: transcription Second year of project (2008-9): Replace ASJP code by full IPA representations

201 ASJP: Automatic Reconstruction201 Lexical items: transcription Second year of project (2008-9): Replace ASJP code by full IPA representations Juliette Jeff

202 ASJP: Automatic Reconstruction202 Lexical items: transcription Second year of project (2008-9): Problems with full IPA representation solved:

203 ASJP: Automatic Reconstruction203 Lexical items: transcription Second year of project (2008-9): Problems with full IPA representation solved: 1. scan/download/… full IPA representations

204 ASJP: Automatic Reconstruction204 Lexical items: transcription Second year of project (2008-9): Problems with full IPA representation solved: 1. scan/download/… full IPA representations 2. automatic conversion IPA to integer (Python)

205 ASJP: Automatic Reconstruction205 Lexical items: transcription Second year of project (2008-9): Problems with full IPA representation solved: 1. scan/download/… full IPA representations 2. automatic conversion IPA to integer (Python) 3. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar

206 ASJP: Automatic Reconstruction206 Lexical items: transcription Abaza (Caucasian): Meaning:PERSON

207 ASJP: Automatic Reconstruction207 Lexical items: transcription Abaza (Caucasian): Meaning:PERSON IPA:ʕʷɨʧʼʲʷʕʷɨs

208 ASJP: Automatic Reconstruction208 Lexical items: transcription Abaza (Caucasian): Meaning:PERSON IPA:ʕʷɨʧʼʲʷʕʷɨs Decimal: 661,695,616,679,700,690,695,661,695,616,115

209 ASJP: Automatic Reconstruction209 Lexical items: transcription Abaza (Caucasian): Meaning:PERSON IPA:ʕʷɨʧʼʲʷʕʷɨs Decimal: 661,695,616,679,700,690,695,661,695,616,115 ASJPcode: 88,119,126,51,67,34,121,119,126,88,119,126,51,115 ( = Xw~3Cw"y~Xw~3s)
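The "IPA to integer" step for the Abaza example amounts to reading off the Unicode code point of each character. A minimal sketch, assuming the IPA string is stored as plain Unicode text (the function names are illustrative, and the project's actual script is not shown in the slides):

```python
def ipa_to_decimal(word: str) -> list[int]:
    """Return the decimal Unicode code point of every character in the word."""
    return [ord(ch) for ch in word]


def decimal_to_text(codes: list[int]) -> str:
    """Inverse step: rebuild the string from its code points."""
    return "".join(chr(n) for n in codes)


# Abaza PERSON, written with escapes so the code points are unambiguous.
ABAZA_PERSON = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"

codes = ipa_to_decimal(ABAZA_PERSON)
print(",".join(str(n) for n in codes))
# prints 661,695,616,679,700,690,695,661,695,616,115
```

The conversion is lossless, so `decimal_to_text(codes)` returns the original IPA string; the abstraction only happens in the later recoding to ASJPcode.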

210 ASJP: Automatic Reconstruction210 Lexical items: transcription Second year of project (2008-9): 1. automatic conversion IPA to integer (Python) 2. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar Why not run on full IPA??

211 ASJP: Automatic Reconstruction211 Lexical items: transcription Second year of project (2008): 1. automatic conversion IPA to integer (Python) 2. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar Caucasian: correlations IPA ~ ASJP > 0.9

212 ASJP: Automatic Reconstruction212 Lexical items: transcription Second year of project (2008): 1. automatic conversion IPA to integer (Python) 2. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar - correlations IPA ~ ASJP > 0.9 - but: ASJP better fit with classifications → IPA too specific

213 ASJP: Automatic Reconstruction213 Lexical items: transcription IPA:ʕʷɨʧʼʲʷʕʷɨs Decimal: 661,695,616,679,700,690,695,661,695,616,115 ASJP ++ code:( = any unicode subset ) ‘a’ <- 661, 895, 416, … formal grammar

214 ASJP: Automatic Reconstruction214 Lexical items: transcription IPA:ʕʷɨʧʼʲʷʕʷɨs Decimal: 661,695,616,679,700,690,695,661,695,616,115 ASJP ++ code:( = any unicode subset ) ‘a’ <- 661, 895, 416, … … C [-V] <- C [+V] / - # C [+V] <- C [-V, +PL] / - C [+V] formal grammar

215 ASJP: Automatic Reconstruction215 Lexical items: transcription IPA:ʕʷɨʧʼʲʷʕʷɨs Decimal: 661,695,616,679,700,690,695,661,695,616,115 ASJP ++ code:( = any unicode subset ) optimal level of abstraction for historical phonological reconstruction? ‘a’ <- 661, 895, 416, … … C [-V] <- C [+V] / - # C [+V] <- C [-V, +PL] / - C [+V]
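The two context-sensitive rules on these slides can be read as rewrite rules of the form "output <- input / context". The sketch below is one possible reading, not the project's grammar: it takes the first rule as word-final devoicing and the second as voicing of a voiceless plosive before a voiced consonant. The tiny segment inventory and all names are hypothetical, and the many-to-one mapping of code points to symbols is left out.

```python
# Hypothetical mini-inventory; a real grammar would cover the full symbol set.
VOICED_TO_VOICELESS = {"b": "p", "d": "t", "g": "k", "z": "s", "v": "f"}
VOICELESS_PLOSIVE_TO_VOICED = {"p": "b", "t": "d", "k": "g"}
VOICED_CONSONANTS = set(VOICED_TO_VOICELESS) | set("mnlr")


def final_devoicing(segments):
    """C[-V] <- C[+V] / _ # : devoice a voiced obstruent at the end of the word."""
    if segments and segments[-1] in VOICED_TO_VOICELESS:
        return segments[:-1] + [VOICED_TO_VOICELESS[segments[-1]]]
    return list(segments)


def voicing_assimilation(segments):
    """C[+V] <- C[-V, +PL] / _ C[+V] : voice a plosive before a voiced consonant."""
    out = list(segments)
    for i in range(len(segments) - 1):
        # The context is checked on the input, so the rule applies simultaneously.
        if (segments[i] in VOICELESS_PLOSIVE_TO_VOICED
                and segments[i + 1] in VOICED_CONSONANTS):
            out[i] = VOICELESS_PLOSIVE_TO_VOICED[segments[i]]
    return out


def transduce(word, rules=(final_devoicing, voicing_assimilation)):
    """Apply the rules in order to a word given as a string of one-symbol segments."""
    segments = list(word)
    for rule in rules:
        segments = rule(segments)
    return "".join(segments)


for example in ("hund", "atda", "apta"):
    print(example, "->", transduce(example))
```

Keeping each rule as a separate function makes the rule order explicit, which matters because a different order can give a different output for the same word.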

216 ASJP: Automatic Reconstruction216 [Diagram labels: distance matrices; ETHN; WALS; EXP; ASJP1; ASJP2; TREE SFTW; STAT SFTW; GEO GRAPH; MAP SFTW; HIST FACTS; Swadesh; Phon Invent] Borrowing!

217 ASJP: Automatic Reconstruction217 Lexical items: transcription
                    N LGS   LSP    LDND
Altaic                 30   .723   .688
Maya                   34   .325   .195
Afro-Asiatic          128   .147   .172
Trans New-Guinea      148   .294   .325
Niger-Congo           379   .089   .125
**all significant > 0.01

218 ASJP: Automatic Reconstruction218 Lexical items: transcription
                    N LGS          PHON
Altaic                 30   .723   +
Maya                   34   .325   +
Afro-Asiatic          128   .172   +
Trans New-Guinea      148   .325   +
Niger-Congo           379   .125   +
**all significant > 0.01

219 ASJP: Automatic Reconstruction219
- Holman, Eric et al. (2008). Advances in automated language classification. In Arppe, Antti, Kaius Sinnemäki and Urpu Nikanne (eds.), Quantitative Investigations in Theoretical Linguistics, 40-43. Helsinki: University of Helsinki.
- Holman et al. (forthc. 2008). Explorations in automated language classification. Folia Linguistica.
- Brown et al. (forthc. 2008). Automated Classification of the World’s languages: A description of the method and preliminary results. Sprachtypologie und Universalienforschung.
- Bakker et al. (2009?). Using WALS for the ASJP project.

220 ASJP: Automatic Reconstruction220 email.eva.mpg.de./~wichmann/ASJPHomePage

221 ASJP: Automatic Reconstruction221 ?

