1 Analyses of first names in The Netherlands: full population studies
Gerrit Bloothooft, Institute of Linguistics OTS, Utrecht University
Bloothooft, Gerrit (The Netherlands): Naming and subcultures in the Netherlands (2 a), Tuesday 14.00
Ladies and gentlemen, in the Netherlands naming mechanisms changed considerably during the last century. Until 1950 naming after relatives was dominant, and by tradition well over 90% of children were given the first name of a grandparent. This has changed radically since then, leaving parents free to follow their own preferences. I assume that these preferences are related to socio-cultural factors that originate in subcultures in society. In this presentation I will demonstrate that this is indeed the case. This requires identifying the relevant subcultures and their naming preferences.

2 Dutch studies on first names
Limited scientific work so far
Dictionary ( entries)
Few socio-linguistic studies
Limited scope, small samples
Topic is extremely popular in the media
CTL colloquium June 2006

3 Research dimensions in onomastics
Name: form and spelling, origin, motives, time, place
All require a lot of data

4 Full population
Gemeentelijke Basis Administratie (GBA), Civil Administration
Available electronically from 1994
Legal right to use data for scientific research
16+ million people

5 Connected!
UiL-OTS and the Meertens Institute are connected to the GBA as of June 1, 2006
The right to make a rich data extraction for the full population (all persons with Dutch nationality): planned for July 1, 2006

6 Research proposal NWO
The first name revolution in the 20th century in The Netherlands – the first name as a measure of social and linguistic change

7 Milestones
Traditional naming (after relatives) decreased enormously during the 20th century, especially in the second half
Full freedom for parents through the name law of 1970
-> Naming of children became a very personal linguistic and social expression during the last 50 years

8 Major topics
Changes in naming after relatives
Relations between names and social classes (sets and spelling)
Regional spread of names, dialectal influences
Life cycles of names

9 What do we get (per person)
All first names
Date, place, postal code and country of birth, gender, date of death (after 1994)
Parents: first names, date & place of birth
Children: first names, date & place of birth
Administrative number of every person with an own record
This is unprecedented (also internationally)!

10 Looking for mechanisms
All research topics can be described as the search for large-scale mechanisms and relations
Away from the individual name, towards much higher aggregation levels

11 Towards name sets
From 16+ million names with over different first names to a much lower number of name sets that have homogeneous properties

12 A previous study ( )
First names from the National Social Security Bank (SVB)
All children born since 1983
first name (official, no nickname, but..)
year of birth
family code (separate table)
postal code (four digits)
I was fortunate that we recently acquired a database with the first names of all children born since 1983. This database includes the year of birth, a family code, through which we know which children belong to the same family, and a postal code, which allows geographical studies. The data came from the National Social Security Bank, which is responsible for all social payments, including those for children. Almost all parents are entitled to receive this payment for their children.

13 A very rich source
4.2 million children (1983-2002)
per year
1.9 million families
different first names
unique names
3,120 names with frequency > 100 represent 85% of the children
The database is very large. It contains the first names of over 3.5 million children from the period we studied. These children came from 1.9 million families and had over 150 thousand different names, of which two-thirds occurred only once. A little more than 3,000 names had a frequency over 100, and these represent 85 % of all children. Many studies can be envisioned using such a rich and complete source. I present only one investigation, specifically targeted at the rare information on the names of children per family.
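The summary figures on this slide (number of different names, share of names that occur only once, coverage by names above a frequency threshold) can be computed from a plain list of first names. The following is a minimal sketch, not the procedure used in the study; the function name `name_statistics` and the toy data are assumptions made for illustration.

```python
from collections import Counter


def name_statistics(first_names, threshold=100):
    """Summarise a list of first names (one entry per child)."""
    counts = Counter(first_names)
    total = sum(counts.values())
    distinct = len(counts)
    # Names that occur exactly once in the whole list.
    unique = sum(1 for c in counts.values() if c == 1)
    # Names whose frequency exceeds the threshold.
    frequent = {name: c for name, c in counts.items() if c > threshold}
    return {
        "children": total,
        "distinct_names": distinct,
        "unique_share": unique / distinct,
        "frequent_names": len(frequent),
        "frequent_coverage": sum(frequent.values()) / total,
    }


# Hypothetical toy data, not taken from the SVB database.
toy = ["Mark"] * 5 + ["Linda"] * 3 + ["Teun", "Iris"]
stats = name_statistics(toy, threshold=2)
```

On the real data the threshold would be 100, as on the slide; the toy call uses 2 only so that the small list produces non-trivial values.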

14 Data reduction needed
Far too many names to describe one by one
Names with common properties
Not from an etymological point of view
Not from a linguistic point of view
Based on choices of parents: name use!

15 Naming and social classes
Hypothesis: there are social classes with their own naming preferences
These classes/subcultures may relate to
culture/language (Frisian, Arabic, Turkish, Surinamese, Antillean,..)
religion (Catholic, Protestant, Islam,..)
sociological status (education, income,..)
geography (urban, rural, regional,..)

16 Research aims:
Identification of social classes (and their naming preferences) on the basis of the first names of children per family
Study of the relation between these subcultures (first names) and socio-cultural and geographic factors
I think so, and therefore the research aims are to identify subcultures and their naming preferences on the basis of the first names of children per family. If we succeed in this effort, we want to analyze these subcultures with respect to geographic and socio-cultural factors.

17 Method (a chain of names)
Parents choose first names (with higher probability) from a set that is popular in their subculture (relatives, friends, neighbours,..) [Social Group size is about 150]
This is informative only if there is more than one child (more than one name) in a family
Pairs of first names (from a family) as unit for analysis
If parents choose names for their children from a certain set, this is only informative for us if there is more than one child. With two or more first names from a family we know that if one of the names belongs to a set, the other names are likely to belong to that set as well. We use pairs of first names (from the same family) as the unit for further analysis.

18 Method (a chain of names)
Children in one family: Mark, Peter, Linda
If Mark is popular in a subculture, then Peter and Linda may be popular as well
Name pairs: Mark - Peter, Peter - Mark, Mark - Linda, Linda - Mark, Peter - Linda, Linda - Peter
Here is an example. A family has the children Mark, Peter and Linda. If we have found that Mark is popular in a subculture, then this family suggests that Peter and Linda may be popular in that subculture as well. The name pairs that underlie further analysis are in this case Mark - Peter, Peter - Mark, Mark - Linda, Linda - Mark, Peter - Linda, Linda - Peter. On the basis of such name pairs we can, for instance, get information on the names of all brothers and sisters of Mark, and their frequency.

19 Method (a chain of names)
Select all families with two or more children (1.17 million families, 2.81 million children)
Derive all pairs of first names (from a single family) (in all, 2.12 million different pairs)
Compute the frequency of each pair
The higher the frequency of a pair, the more likely the first names in the pair belong to the same set
In our database we had 1.17 million families with more than one child. These families had 2.81 million children. In all, 2.12 million different pairs of names were found (many of them with low frequency).
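The three steps on this slide (select families with two or more children, derive all name pairs per family, count each pair) can be sketched as follows. This is an illustration under assumptions: the input is taken to be a list of families, each a list of first names, and the function name `count_name_pairs` is hypothetical. Pairs are counted in both directions, as in the Mark, Peter, Linda example.

```python
from collections import Counter
from itertools import permutations


def count_name_pairs(families):
    """Count ordered pairs of first names that occur within one family."""
    pair_counts = Counter()
    for children in families:
        # Families with a single child carry no pair information.
        if len(children) < 2:
            continue
        # permutations(..., 2) yields both (a, b) and (b, a).
        pair_counts.update(permutations(children, 2))
    return pair_counts


# Hypothetical toy input; the first family is the example from the slides.
families = [["Mark", "Peter", "Linda"], ["Mark", "Peter"], ["Iris"]]
pair_counts = count_name_pairs(families)
```

If unordered pairs are preferred, `itertools.combinations` over the sorted names of each family would give one entry per pair instead of two.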

20 Most frequent name pairs
Frequency  Pair of first names
1091       Johannes  Maria
790        Johannes  Johanna
754        Jeroen    Martijn
727        Johanna   Maria
….
572        Mohamed   Fatima
459        Lars      Niels
Here are the most frequent pairs of first names. We see Johannes and Maria on top, followed by Johannes and Johanna. Interestingly, the combination Johanna and Maria is already in fourth place, indicating strong links between the three names Johannes, Maria and Johanna. Other examples of closely related names are the Dutch names Jeroen and Martijn, the Arabic names Mohamed and Fatima, and the Scandinavian names Lars and Niels.

21 Clustering of first names
Define a measure that reflects the relationship between two names
Combine names that mutually have a strong relationship into a set
Johannes, Maria, Johanna, …

22 Name relationship measure
Esther: 7,967 girls, brothers and sisters, 276 times sister Judith (= 2.1 %)
Judith: 4,828 girls, 8,033 brothers and sisters, 276 times sister Esther (= 3.4 %)
Geometric average (2.7 %)
A symmetric measure of relationship between the two names
In some more detail, let us look at the names Esther and Judith. There are almost 8000 girls with the name Esther, and these have about brothers and sisters. Of these, 276 were named Judith, which is 2.1 % of all brothers and sisters. There were fewer girls named Judith, but of course these had 276 sisters named Esther as well, which is 3.4 % of all brothers and sisters of Judith. We used the geometric mean of both percentages (2.7 % in this case) as a measure of the relationship between the two first names. If this percentage were 100, all sisters of Esther would be named Judith, and vice versa.
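The measure can be checked with a few lines of arithmetic. The sketch below uses the figures that survive in this transcript: Judith's 8,033 brothers and sisters and the 276 shared cases. Esther's sibling total is not legible here, so her share is taken as the stated 2.1 %. The function names are assumptions made for illustration.

```python
from math import sqrt


def sibling_share(pair_count, sibling_total):
    """Percentage of a name's brothers and sisters that carry the other name."""
    return 100.0 * pair_count / sibling_total


def relation_measure(share_a, share_b):
    """Geometric mean of the two percentages; symmetric in both names."""
    return sqrt(share_a * share_b)


judith_share = sibling_share(276, 8033)  # about 3.4 %
esther_share = 2.1  # percentage as stated on the slide
measure = relation_measure(esther_share, judith_share)  # about 2.7 %
```

Because the geometric mean multiplies the two shares, the measure is high only when the preference holds in both directions, and it equals 100 only when both shares are 100.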

23 Clustering of first names
Name pairs from a (subculture-related) set have the highest relation measure
Esther: Judith 2.7, Mirjam 2.4, Ruben 1.2, David 1.1
Judith: Esther 2.7, Mirjam 1.6, Ruben 1.0, Miriam 0.8
If we look at the most related names of brothers and sisters of both Esther and Judith, at the top we immediately see a preference for names from the Old Testament. These names are likely members of the same set.

24 Clustering
Start with strongly related name-pairs
Add new name-pair to existing cluster or start a new cluster
Iterative procedure

25 Clustering results
Frequency of a pair > 4
4,013 first names
result: 340 name sets
Limited number of large sets
High number of small sets
top-25 of sets is most illustrative
2,887 first names
2.64 million children (75%)
An iterative clustering procedure was used to establish the membership of a first name in a certain set. We limited the available material to first names from pairs that had a frequency higher than four. That left about four thousand first names. The procedure resulted in 340 different name sets. Some included a large number of names, others only a few. The top 25 of these sets already proved very illustrative: they contained almost 2,900 first names and covered 75 % of all 3.5 million children.
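The slides do not spell out the iterative procedure, so the following is only one possible reading of "start with strongly related name-pairs, add a new name-pair to an existing cluster or start a new cluster": a greedy pass over the pairs in order of decreasing relation measure. The function name `cluster_names`, the `min_strength` cut-off and the Jeroen - Martijn value are assumptions; the other values come from the Esther and Judith slide. Filtering on pair frequency (> 4) is assumed to have happened beforehand.

```python
def cluster_names(pair_strengths, min_strength=0.0):
    """Greedily group names, strongest pairs first (hypothetical reading)."""
    cluster_of = {}  # name -> cluster id
    clusters = {}    # cluster id -> set of names
    ordered = sorted(pair_strengths.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), strength in ordered:
        if strength < min_strength:
            break  # all remaining pairs are weaker still
        id_a, id_b = cluster_of.get(a), cluster_of.get(b)
        if id_a is None and id_b is None:
            # Neither name is placed yet: start a new cluster.
            new_id = len(clusters)
            clusters[new_id] = {a, b}
            cluster_of[a] = cluster_of[b] = new_id
        elif id_a is None:
            clusters[id_b].add(a)
            cluster_of[a] = id_b
        elif id_b is None:
            clusters[id_a].add(b)
            cluster_of[b] = id_a
        # If both names are already placed, their clusters are left as they are.
    return list(clusters.values())


strengths = {
    ("Jeroen", "Martijn"): 3.0,  # hypothetical value
    ("Esther", "Judith"): 2.7,
    ("Esther", "Mirjam"): 2.4,
    ("Judith", "Mirjam"): 1.6,
    ("Esther", "Ruben"): 1.2,
    ("Judith", "Ruben"): 1.0,
    ("Judith", "Miriam"): 0.8,
}
name_sets = cluster_names(strengths, min_strength=1.0)
```

A name is assigned once, by its strongest pair, and never moved afterwards; a procedure that also merges or splits clusters in later iterations would behave differently.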

26 Features of name sets
Period of maximum popularity (refine!): Traditional, Pre-modern ( ), Modern
Language: Dutch, Frisian, English, American, French, Spanish, Italian, [Arabic, Turkish], Common Western
Topic area: Nature, History & Culture, Old Testament
Length: Short (one syllable), long
In general terms the name sets can be described using the following features. The period of maximum popularity of the first names: there are traditional names, predominantly used until 1950; names that gained popularity in the fifties and sixties, which we call pre-modern names; and names that became popular since then, the modern names. Furthermore, the language is an important factor. We have Dutch, Frisian, English, American, French, Spanish, Italian, Arabic and Turkish names. In this presentation I exclude the Arabic and Turkish first names because they form separate and closed classes. Then there are sets that include names that are quite commonly used in Western societies, like Mark. There are also sets related to topic areas, such as nature, history & culture, and the Old Testament. Finally, the length of the name proved to be an important factor. A distinction emerged between short names (of one syllable) and longer names.

27 A map of name sets
Presentation of a map of name sets
Based on mutual relations between name sets
The closer two name sets on the map, the more related the sets
Since it is impossible to discuss the resulting name sets one by one, I attempted a presentation of name sets in the form of a map. In such a map, the closer the name sets, the more related they are.

28 Map of name sets
[Figure: map of name sets. Labels: Spanish & Italian; Long American & English; Short American & English; Pre-modern English & French; Long names from the Old Testament; Names from nature; Long names from history and culture; Short modern Common Western; Pre-modern Common Western; Long French; Scandinavian; Pre-modern Dutch; Short modern Dutch; Traditional Dutch (Latin | Dutch); Short traditional Dutch; Frisian]
Here is the result. We see … Name sets in bold type are the relatively larger ones, while color indicates declining name sets. The decline is most dramatic for the Traditional Dutch set, which dropped from over 90 to about 10 % of all names in a few generations, and for the pre-modern sets, but the American & English sets also passed their maximum popularity a few years ago.

29 Dimensions
[Figure: schematic of the name sets along three dimensions. Labels: Foreign, Common Western, Dutch, Frisian; Traditional, Pre-modern, Modern; Short, Long]
The previous map can be schematized with a few major dimensions. The language factor: Dutch & Frisian -> Common Western -> Foreign. Independent of this one are the time factor: Traditional -> Pre-modern -> Modern, and the length factor: short -> long names.

30 Map of name sets with most frequent names
[Figure: map of name sets, each with its most frequent first name. Spanish & Italian: RICARDO; Long American & English: MICHAEL; Short American & English, Pre-modern English & French: DENNIS, KIM; Names from the Old Testament: DANIËL; Names from nature: IRIS; Names from history and culture: LAURENS; Short modern Common Western: TIM; Pre-modern Common Western: MARK; French, Scandinavian: NIELS, CHARLOTTE; Pre-modern Dutch: JEROEN; Short modern Dutch: BART; Traditional Dutch: JOHANNES | JAN; Short traditional Dutch: TEUN; Frisian: JELLE]
For those of you who want to get a flavor of the first names themselves, here are the name sets again, each with its most frequent first name.

31 Geographical distribution
four-digit postal code area level [3584]
Big differences between pc areas
city quarters
villages (religion)
Enough children for characterisation
On average 1200 births per pc in 20 years
Some further name grouping needed
We now wanted to characterize each postal code area. For this, we computed for each name group the deviation from the grand average for the Netherlands. The most deviating name group (in a positive direction) won.
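The characterisation rule in the notes (the name group with the largest positive deviation from the national average wins) can be sketched as below. The notes do not say whether the deviation is a difference or a ratio; this sketch assumes a simple difference in percentage points. The function name and all percentages in the example are hypothetical.

```python
def dominant_name_group(area_shares, national_shares):
    """Return the name group that most exceeds its national share."""
    deviations = {
        group: area_shares.get(group, 0.0) - national_share
        for group, national_share in national_shares.items()
    }
    # The group with the largest (most positive) deviation wins.
    return max(deviations, key=deviations.get)


# Hypothetical percentages of children per name group.
national = {"Frisian": 3.0, "Foreign": 24.0, "Short modern": 13.0}
area = {"Frisian": 20.0, "Foreign": 18.0, "Short modern": 12.0}
winner = dominant_name_group(area, national)
```

With a difference, large groups can win on a modest relative excess; dividing by the national share instead would favour small groups such as Frisian names even more strongly.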

32 Further grouping
Traditional names (Latin form)
Traditional names (Dutch)
Frisian names
Pre-modern names (Dutch, Western)
Foreign names (English)
Short modern names (Dutch, Western, Scand.)
Names from OT, history, culture, nature
Arabic & Turkish names [unrelated group]
Other [low frequency]
% 8 5 3 12 24 13 7 23

33 Grouping on the map of name sets
[Figure: the map of name sets with the sets combined into groups. Group labels: Traditional (Latin, Dutch); Frisian; Pre-Modern; Foreign; Short; History & Culture]
In terms of our previous map this grouping shows as follows.

34 Traditional (Dutch)
Aaltje, Barend, Dirkje, Evert, Geertje, Harm, Jantje, Klaas, Margje, Teunis

35 Traditional (Latin form)
Adriana, Bernardus, Christina, Eduard, Elisabeth, Franciscus, Geertruida, Hubertus, Johanna, Krijn, Maria

36 Frisian names
Aafke, Bauke, Douwe, Froukje, Joppe, Jitske, Jelle, Menno, Sietske, Onno, Wietske, Wiebe

37 Pre-modern names (Dutch, Western)
Anniek, Anita, Carla, Frank, Jochem, Jeroen, Linda, Mark, Marloes, Paul, Suzanne

38 Foreign names (English)
Amanda, Dennis, Danny, Chantal, Henry, Isabella, Kim, Kevin, Melissa, Ricardo, Samantha, Stephen

39 Short names (modern, Dutch, Western, Scand.)
Anne, Bart, Eva, Gijs, Lisa, Kaj, Niels, Sanne, Sofie, Tim

40 Short names - Religion
[Figure: maps of short names and of religion. Legend for religion: None, Protestant, Catholic]

41 Old Testament, history, culture, nature
Daniël, Esther, Judith, Naomi, Willemijn, Diederik, Frederieke, Maurits, Iris, Fleur, Jasmijn

42 Income, Religion
[Figure: maps of income and religion. Legend: Lowest, Highest]

43 Arabic and Turkish names
Fatima, Mohamed, Noura, Hamza, Sara, Yassin, Fatma, Mustafa, Hatice, Mehmet

44 Further geographical analysis
Per pc area: percentage of children per name group (8 values)
These percentages reflect the social composition of the pc area
Factor analysis on data from 3584 pc areas
10 typical profiles
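The input to the factor analysis is a profile per postal code area: the percentage of children in each name group. A minimal sketch of that first step follows, assuming each child is available as a (postal code, name group) record; the function name `area_profiles`, the postal codes and the records are hypothetical. The factor analysis itself is not shown.

```python
from collections import Counter, defaultdict


def area_profiles(records):
    """Map each postal code to the percentage of children per name group."""
    counts = defaultdict(Counter)
    for postal_code, name_group in records:
        counts[postal_code][name_group] += 1
    profiles = {}
    for postal_code, group_counts in counts.items():
        total = sum(group_counts.values())
        profiles[postal_code] = {
            group: 100.0 * n / total for group, n in group_counts.items()
        }
    return profiles


# Hypothetical records: one (postal code, name group) tuple per child.
records = [
    ("1000", "Short"), ("1000", "Short"),
    ("1000", "Foreign"), ("1000", "Pre-modern"),
    ("2000", "Frisian"), ("2000", "Frisian"),
]
profiles = area_profiles(records)
```

Each profile sums to 100, so the group percentages of an area are not independent of one another, which is worth keeping in mind when interpreting factors derived from them.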

45 10 profiles
Traditional – Latin form
Traditional – Dutch
Transitional, Traditional Dutch to pre-modern
Transitional, Traditional Latin form to foreign
Pre-modern
Foreign
Short
Elite
Arabic-Turkish
Frisian

46 Example profile: Traditional – Latin form
[Figure: chart of the percentage of children per name group. Groups: Traditional – Latin form; Traditional – Dutch; Frisian names; Pre-modern names; Foreign names; Short names; Names from OT, history, culture, nature; Arabic and Turkish names; other]

47 Naming map of the Netherlands
[Figure: map of the Netherlands by naming profile. Legend: Frisian; pre-modern; Arab Turkish; traditional Dutch; elite; >foreign; traditional Latin; short; foreign]

48 EU constitution votes
[Figure: maps of EU constitution votes and education level]

49 Educational level
[Figure: map of education level. Legend: Highest, Lowest]

50 Conclusions
Successful data reduction
Name groups & subcultures: language, income, education, religion
Geographic representation: four-digit postal code area just right
The factor time should be included

51 The Wegener connection
Direct marketing company
Organises a national consumer questionnaire twice a year
families per year
Wide range of information
Income, education level
Includes first names and year of birth of all family members

52 Correlation at family level (instead of postal code level)
Name set &
Income of parents
Educational level (of both parents)
(newspapers, underwear, cars, insurance, holidays,…..) preferences of parents

53 Mathematical studies
Life cycle of a name
Zipf’s behavior
A few names with high frequency, a lot of names that are unique
Information function of a name in communication

55 Research dimensions in onomastics
Name: form and spelling, origin, motives, time, place
YES, we can do great research on this with the full population data!

56 Contact
Book: Over voornamen, Het Spectrum (2004)
Homepage:
Mail: Trans 10, 3512 JK Utrecht, The Netherlands

