Gerrit Bloothooft Institute of Linguistics OTS Utrecht University

Slides:

Advertisements

Similar presentations

Introduction to Geopolitics

Advertisements

Measuring Ethno-Cultural Characteristics in Population Censuses United Nations Economic Commission for Europe Statistical Division Regional Training Workshop.

Analyses of first names in The Netherlands: full population studies Gerrit Bloothooft Institute of Linguistics OTS Utrecht University.

ISCI second International conference University of Western Sydney November, 2009.

Structural Inference of Hierarchies in Networks BY Yu Shuzhi 27, Mar 2014.

First names in the Netherlands from preferences of parents to socio-geographic representations Gerrit Bloothooft Institute of Linguistics OTS Utrecht University.

Proper names in The Netherlands from the Civil Registration: full population data Gerrit Bloothooft Utrecht University / Meertens Institute KNAW

Lecture 2 Income Inequality, Mobility, and the Limits of Opportunity.

The third International Population Geography Conference Liverpool, June 2006 Proximity of adult children to their elderly parents in the Netherlands.

Definitions In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test is a standard procedure for testing.

Estimation and Confidence Intervals

Becoming an American and citizenship

Society  Any relatively self-contained and self- sufficient group united by social relationships.  Two central components of society: Social Structure.

From Theory to Practice: Inference about a Population Mean, Two Sample T Tests, Inference about a Population Proportion Chapters etc.

Five Themes of Geography (Mr. Help)

Additional analysis of poverty in Scotland 2013/14 Communities Analytical Services July 2015.

Methodological questions of migration and ethnocultural diversity in Population Censuses StatCapCA Training Workshop No 3 Dushanbe, March 2007 Werner Haug.

1 Chapter 8 Hypothesis Testing 8.2 Basics of Hypothesis Testing 8.3 Testing about a Proportion p 8.4 Testing about a Mean µ (σ known) 8.5 Testing about.

American Relationships… Family, Marriage & Divorce, Homosexuality

UNIT 5.  The related activities of sorting, searching and merging are central to many computer applications.  Sorting and merging provide us with a.

Chapter 4 Macrosociology: Studying Larger Groups and Societies Key Terms.

1 Definitions In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test is a standard procedure for testing.

Descriptive Statistics Research Writing Aiden Yeh, PhD.

Languages in the Contemporary World Although languages have common properties, from the point of view of their users, it is the differences that count,

Comes from the Greek word ethnos meaning “people” or “nation”

Lecture 7 Gender & Age.

Chapter 1: The What and the Why of Statistics

Research Design

Language UNIT 3 REVIEW. Language  What is it? Language is a systematic means of communicating ideas and feelings through the use of signs, gestures,

2.3 Texans and Geography.

Family and household structure Part 2

3. The X and Y samples are independent of one another.

Statistical Data Analysis

Distribution of the Sample Means

CHAPTER 11 Inference for Distributions of Categorical Data

SAMPLING (Zikmund, Chapter 12.

Assistant Headteacher

28 November - 1 December 2016, Amman, Jordan

THE CHANGING AMERICAN SOCIETY: SUBCULTURES

Chapter 6-Section 4 Voter Behavior

Log Linear Modeling of Independence

Introduction to Summary Statistics

Chapter 6: Voters and Voter Behavior Section 4

The European Statistical Training Programme (ESTP)

Week 6 – Review & Intro. Topic: Declining birth rate world wide

28 November - 1 December 2016, Amman, Jordan

Introduction to Summary Statistics

Analyses of first names in The Netherlands: full population studies

Chapter 6: Voters and Voter Behavior

Liberal Attitude Change in the Post-Industrial West

CHAPTER 11 Inference for Distributions of Categorical Data

BR: T1D13 Over time, cultures experience change. Can you think of some examples of how culture has changed here in the U.S.? What do you think causes.

One-Way Analysis of Variance

CHAPTER 11 Inference for Distributions of Categorical Data

Chapter 6: Voters and Voter Behavior Section 4

SAMPLING (Zikmund, Chapter 12).

Statistical Data Analysis

Chapter 6: Voters and Voter Behavior Section 4

CHAPTER 11 Inference for Distributions of Categorical Data

CHAPTER 11 Inference for Distributions of Categorical Data

CHAPTER 11 Inference for Distributions of Categorical Data

CHAPTER 11 Inference for Distributions of Categorical Data

CHAPTER 11 Inference for Distributions of Categorical Data

SOCIOLOGY 110 Break-out Session

CHAPTER 11 Inference for Distributions of Categorical Data

CHAPTER 11 Inference for Distributions of Categorical Data

Chapter 5: The analysis of nonresponse

Introduction about sociology

Chapter 1 Section 1 Being an America

Presentation transcript:

Gerrit Bloothooft Institute of Linguistics OTS Utrecht University First names in the Netherlands from preferences of parents to socio-geographic representations Gerrit Bloothooft Institute of Linguistics OTS Utrecht University Bloothooft, Gerrit (The Netherlands) Naming and subcultures in the Netherlands (2 a) Tuesday 14.00 Ladies and gentlemen, in The Netherlands the naming mechanisms have considerably changed during the last century. Until 1950 the naming of relatives was dominant and far over 90 % of the children got the first name of their grandparents by tradition. This has radically changed since then, leaving freedom for parents to follow their own preferences. I assume that these preferences are related to socio-cultural factors that originate in subcultures in society. In this presentation I will demonstrate that this is indeed the case. This implies the identification of relevant subcultures and their naming preferences.

Dutch studies on first names Limited scientific work Dictionary (20.000 entries) Few socio-linguistic studies Limited scope, small samples Topic is extremely popular in the media Linguistics Groningen - 2005

Linguistics Groningen - 2005 First names data Hard to get from civil registration privacy issues New horizons because of digitization of the population administration (and archives) but distributed storage Linguistics Groningen - 2005

A full population study First names from the National Social Security Bank (SVB) All children born since 1983 first name (official, no call name, but..) year of birth family code (separate table) unique! postal code I was in the lucky circumstance that we recently acquired a database with the first name of all children born since 1983. This database included year of birth, a family code – through which we knew what children belong to the same family - , and a postal code – allowing geographical studies. The data came from the National Social Security Bank who is responsible for all social payments, among which those for children. Almost all parents are entitled to receive this payment for their children. Linguistics Groningen - 2005

Linguistics Groningen - 2005 A very rich source 4.2 million children (1983-2002) 200.000 per year 1.9 million families 176.800 different first names 108.500 unique names 3.120 names with frequency > 100 represent 85% of the children The database is very large. It contains the first name from over 3.5 million children from the period we studied: 1983-1999. These children came from 1.9 million families. These children had over 150 thousand different names, of which two-third occurred only once. A little more than 3000 names had a frequency over 100 and represent 85 % of all children. Many studies can be envisioned using such a rich and complete source. I present only one investigation, specially targetted to the rare information on names of children per family. Linguistics Groningen - 2005

Linguistics Groningen - 2005 Datareduction needed Far too many names to describe one by one Names with common properties Not from etymological point of view Not from linguistic point of view Based on choices of parents name use! Linguistics Groningen - 2005

Naming and subcultures Hypothesis: There are subcultures with own naming preferences These subcultures may relate to culture/language (Frisian, Arabic, Turkish, Surinam, Antillean,..) religion (Catholic, Protestant, Islam,..) sociological status (education, income,..) geography (urban, rural, regional,..) Linguistics Groningen - 2005

Naming and subcultures Research aims: Identification of subcultures (and their naming preferences) on the basis of the first names of children per family Study of the relation between these subcultures (first names) and socio-cultural and geographic factors I think so, and therefore the research aims are to try to identify subcultures and naming preferences on the basis of first names of children per family. If we succeed in this effort, we want to analyze these subcultures with respect to geographic and socio-cultural factors. Linguistics Groningen - 2005

Linguistics Groningen - 2005 Once again Analysis (grouping) of first names on the basis of the choices of the parents, i.e. name use NOT on any other scientific assumption It is very important to keep in mind that such a study is based on CHOICES OF THE PARENTS. I did not make any additional assumption on grouping of first names beforehand. Linguistics Groningen - 2005

Linguistics Groningen - 2005 Contents Method Sets of first names A map of name sets Geographic distribution of name sets Regional name profiles Socio-cultural factors of name sets Conclusions In the following, I will shortly explain the method used to arrive at sets of first names which are supposedly related to subcultures. I will clarify relations between these sets by presenting them in a map. Then I will demonstrate how these name sets (or subcultures) have prominence in specific regions of the Netherlands, with some pointers to socio-cultural and religious factors. Finally, I will come to conclusions. Linguistics Groningen - 2005

Method (a chain of names) Parents choose first names from a set that is popular in their subculture (relatives, friends, neighbors,..) (with higher probability) This is informative only if there is more than one child (more than one name) in a family Pairs of first names (from a family) as unit for analysis If parents choose names for their children from a certain set, this is only informative for us if there is more than one child. With two or more first names from a family we have the information that if one of the names belongs to a set, there is a likelihood that the other names belong to the set as well. We use pairs of first names (from the same family) as a unit for further analysis. Linguistics Groningen - 2005

Method (a chain of names) Family: Mark, Peter, Linda If Mark is popular in a subculture, then Peter and Linda may be popular as well Name pairs: Mark - Peter, Peter - Mark, Mark - Linda, Linda - Mark, Peter - Linda, Linda - Peter Here is an example. A family has the children Mark, Peter and Linda. If we have found that Mark is popular in a subculture, then this family suggests that Peter and Linda may be popular as well in that subculture. The name pairs that underlie further analysis are in this case Mark - Peter, Peter - Mark, Mark - Linda, Linda - Mark, Peter - Linda, Linda – Peter. On the basis of such name pairs we can get information of the names of all brothers and sisters of Mark for instance, and their frequency. Linguistics Groningen - 2005

Method (a chain of names) Select all families with two or more children (1.17 million families, 2.81 million children) Derive all pairs of first names (from a single family) (in all, 2.12 million different pairs) Compute the frequency of each pair The higher the frequency of a pair, the more likely the first names in the pair belong to the same set In our database we had 1.17 million families with more than one child. These families had 2.81 million children. In all 2.12 million DIFFERENT pairs of names were found (many of these with low frequency). Linguistics Groningen - 2005

Most frequent name pairs Frequency Pair of first names 1091 Johannes Maria 790 Johanna 754 Jeroen Martijn 727 …. 572 Mohamed Fatima 459 Lars Niels Here are the most frequent pairs of first names. We see Johannes and Maria on top followed by Johannes and Johanna. Interestingly the combination Johanna and Maria is already on the fourth place, indicating strong links between the three names Johannes, Maria and Johanna. Other examples of closely related names are the Dutch names Jeroen and Martijn, the Arabic names Mohamed and Fatima, and the Scandinavian names Lars and Niels. Linguistics Groningen - 2005

Clustering of first names Define measure that reflects relationship between two names Combine names that mutually have a strong relationship into a set Johannes, Maria, Johanna, … In some more detail, let us look at the names Esther and Judith. There are almost 8000 girls with the name Esther, and these have about 13.000 brothers and sisters. Of these, 276 were named Judith, which is 2.1 % of all brothers and sisters. There were less girls named Judith but of course these had 276 sisters Esther as well, which is 3.4 % of all brothers and sisters of Judith. We used the geometric mean of both percentages (which is 2.7 % in this case) as a measure for the relationship between both first names. If this percentage is 100, this would imply that all sisters of Esther would be named Judith, and reversily. Linguistics Groningen - 2005

Name relationship measure Esther 7.967 girls 12.973 brothers and sisters 276 times sister Judith (= 2.1 %) Judith 4.828 girls 8.033 brothers and sisters 276 times sister Esther (= 3.4 %) Geometric average (2.7 %) A symmetric measure of relationship between the two names In some more detail, let us look at the names Esther and Judith. There are almost 8000 girls with the name Esther, and these have about 13.000 brothers and sisters. Of these, 276 were named Judith, which is 2.1 % of all brothers and sisters. There were less girls named Judith but of course these had 276 sisters Esther as well, which is 3.4 % of all brothers and sisters of Judith. We used the geometric mean of both percentages (which is 2.7 % in this case) as a measure for the relationship between both first names. If this percentage is 100, this would imply that all sisters of Esther would be named Judith, and reversily. Linguistics Groningen - 2005

Linguistics Groningen - 2005 Alternative measure In terms of probablities Prob(name_pair) / indepentProb(name_pair) indepentProb = if there is no specific preference Series of problems High-frequent name pairs should get a stronger weight (estimation inaccuracies for low-frequent pairs) In some more detail, let us look at the names Esther and Judith. There are almost 8000 girls with the name Esther, and these have about 13.000 brothers and sisters. Of these, 276 were named Judith, which is 2.1 % of all brothers and sisters. There were less girls named Judith but of course these had 276 sisters Esther as well, which is 3.4 % of all brothers and sisters of Judith. We used the geometric mean of both percentages (which is 2.7 % in this case) as a measure for the relationship between both first names. If this percentage is 100, this would imply that all sisters of Esther would be named Judith, and reversily. Linguistics Groningen - 2005

Clustering of first names Name pairs from a (subculture-related) set have the highest relation measure Esther: Judith 2.7 Mirjam 2.4 Ruben 1.2 David 1.1 Judith: Esther 2.7 Mirjam 1.6 Ruben 1.0 Miriam 0.8 If we look at the most related names of brothers and sisters of both Esther and Judith, in the top we immediately see a preference for names from the Old Testament. These names are likely member of the same set. Linguistics Groningen - 2005

Linguistics Groningen - 2005 Clustering Start with strongly related name-pairs Add new name-pair to existing cluster or start a new cluster Iterative procedure Linguistics Groningen - 2005

Linguistics Groningen - 2005 Clustering results 4.013 first names Frequency of a pair > 4 result: 340 name sets Limited number of large sets High number of small sets top-25 of sets is most illustrative 2.887 first names 2.64 million children (75%) An iterative clustering procedure was used to establish the membership of a first name to a certain set. We limited the available material to first names from pairs that had a frequency higher than four. That were about four thousand first names. The procedure resulted in 340 different name sets. Some included a large number of names, others only a few. The top-25 of these sets already proved to be very illustrative, they contained almost 2900 first names and had a coverage of 75 % of all 3.5 million children. Linguistics Groningen - 2005

Linguistics Groningen - 2005 Features of name sets Period of maximum popularity Traditional, Pre-modern (1950-1980), Modern Language Dutch, Frisian, English, American, French, Spanish, Italian, [Arabic, Turkish] Common Western Topic area Nature, History & Culture, Old Testament Length Short (one syllable), long In general terms the name sets could be described using the following features: The period of maximum popularity of the first names: These are traditional names, predominantly used until 1950, names that gained popularity in the fifties and sixties which we called pre-modern names, and names that became popular since then, the modern names. Furthermore, the language is an important factor. We have Dutch, Frisian, English, American, French, Spanish, Italian, Arabic and Turkish names. In this presentation I exclude the Arabic and Turkish first names because they form separate and closed classes. Then there are sets that include names that are quite commonly used in Western Societies, like Mark. There are also sets related to topic areas, such as nature, history & culture and the Old Testament. Finally the length of the name proved to be an important factor. A distinction showed between short names (of one syllable) and longer names. Linguistics Groningen - 2005

Linguistics Groningen - 2005 A map of name sets Presentation of a map of name sets Based on mutual relations between name sets The closer two name sets on the map, the more related the sets Since it is impossible to discuss the resulting name sets one by one, I attempted a presentation of name sets in the form of a map. In such a map, the closer the name sets, the more related they are. Linguistics Groningen - 2005

Linguistics Groningen - 2005 Spanish & Italian Long American & English Short American & English Pre-modern English & French Long names from the Old Testament Names from nature Long names from history and culture Short modern Common Western Pre-modern Common Western Long French Scandinavian Pre-modern Dutch Short modern Dutch Traditional Dutch Latin | Dutch Short traditional Dutch Frisian Here is the result. We see … Bold type name sets are the relatively larger ones, while color indicates declining name sets. Colors indicated declining name sets. This is most dramatically for the Traditional Dutch set, which dropped from over 90 to about 10 % of all names in a few generations, the pre-modern sets, but also the American & English sets already had their maximum popularity a few years ago. Linguistics Groningen - 2005

Linguistics Groningen - 2005 Dimensions Foreign Common Western Dutch, Frisian Long Short Modern Pre-modern Traditional The previous map can be schematized with a couple of major dimensions: The language factor: Dutch & Frisian -> Common Western -> Foreign And independent from this one: The time factor: Traditional -> Pre-modern -> Modern The Length factor: short -> long names Linguistics Groningen - 2005

Linguistics Groningen - 2005 Spanish & Italian RICARDO Long American & English MICHAEL Short American & English Pre-modern English & French DENNIS KIM Names from the Old Testament DANIËL Names from nature IRIS Names from history and culture LAURENS Short modern TIM Common Western Pre-modern MARK Common Western French Scandinavian NIELS CHARLOTTE Pre-modern Dutch JEROEN Short modern Dutch BART Traditional Dutch JOHANNES | JAN Short traditional Dutch TEUN Frisian JELLE For those of you who want to get a flavor of the first names themselves, here are the name sets again but each with the most frequent first name. Linguistics Groningen - 2005

Geographical distribution Postal code area level [3584] Big differences between pc areas city neighborhoods villages (religion) Enough children for characterisation ~1200 births per pc in 20 years Some further name grouping needed We now wanted to characterize each postal code area. For this, we computed for each name group the deviation from the grand average for The Netherlands. The most deviating name group (in a positive way) won. Linguistics Groningen - 2005

Linguistics Groningen - 2005 Further grouping Traditional names (Latin form) Traditional names (Dutch) Frisian names Pre-modern names (Dutch, Western) Foreign names (English) Short modern names (Dutch, Western, Skand) Names from OT, history, culture, nature Arabic & Turkish names [unrelated group] Other [low frequent] % 8 5 3 12 24 13 7 23 Linguistics Groningen - 2005

Linguistics Groningen - 2005 Spanish & Italian Long American & English Short American & English Pre-modern English & French Names from the Old Testament Names from nature Names from history and culture Short modern Western Pre-modern Western French Scandinavian Pre-modern Dutch Short modern Dutch Traditional Dutch Short traditional Dutch Foreign History & Culture Pre-Modern Short In terms of our previous map this grouping shows as follows Traditional Latin Dutch Frisian Linguistics Groningen - 2005

Linguistics Groningen - 2005 Traditional (Dutch) Aaltje Barend Dirkje Evert Geertje Harm Jantje Klaas Margje Teunis Linguistics Groningen - 2005

Traditional (Latin form) Adriana Bernardus Christina Eduard Elisabeth Franciscus Geertruida Hubertus Johanna Krijn Maria Linguistics Groningen - 2005

Linguistics Groningen - 2005 Frisian names Aafke Bauke Douwe Froukje Joppe Jitske Jelle Menno Sietske Onno Wietske Wiebe Linguistics Groningen - 2005

Pre-modern names (Dutch, Western) Anniek Anita Carla Frank Jochem Jeroen Linda Mark Marloes Paul Suzanne Linguistics Groningen - 2005

Foreign names (English) Amanda Dennis Danny Chantal Henry Isabella Kim Kevin Melissa Ricardo Samantha Stephen Linguistics Groningen - 2005

Short names (modern, Dutch, Western, Skand) Anne Bart Eva Gijs Lisa Kaj Niels Sanne Sofie Tim Linguistics Groningen - 2005

Linguistics Groningen - 2005 Short names - Religion Religion None Protestant Catholic Linguistics Groningen - 2005

Old testament history, culture, nature Daniël Esther Judith Naomi Willemijn Diederik Frederieke Maurits Iris Fleur Jasmijn Linguistics Groningen - 2005

Linguistics Groningen - 2005 Income Religion Lowest Highest Linguistics Groningen - 2005

Arabic and Turkish names Fatima Mohamed Noura Hamza Sara Yassin Fatma Mustafa Hatice Mehmet Linguistics Groningen - 2005

Further geographical analysis Per pc area: percentage of children per name group (8 values) These percentages reflect social composition of the pc area Factor analysis on data from 3584 pc areas 10 typical profiles We now wanted to characterize each postal code area. For this, we computed for each name group the deviation from the grand average for The Netherlands. The most deviating name group (in a positive way) won. Linguistics Groningen - 2005

Linguistics Groningen - 2005 10 profiles Traditional – Latin form Traditional – Dutch Transitional, Traditional Dutch to pre-modern Transitional, Traditional Latin form to foreign Pre-modern Foreign Short Elite Arabic-Turkish Frisian Linguistics Groningen - 2005

Example profile Traditional – Latin form Traditional – Latin form Traditional – Dutch Frisian names Pre-modern names Foreign names Short names Names from OT, history, culture, nature Arabic and Turkish names other % 37 18 1 8 12 6 6 0 12 Linguistics Groningen - 2005

Naming map of the Netherlands Frisian pre-modern Arab Turkish trad. Dutch elite foreign trad.Latin short foreign Linguistics Groningen - 2005

Linguistics Groningen - 2005 EU constitution votes Education level Linguistics Groningen - 2005

Linguistics Groningen - 2005 Educational level Education level Highest Lowest Linguistics Groningen - 2005

Naming map of Groningen province trad. Dutch > pre-modern foreign pre-modern elite Linguistics Groningen - 2005

Naming map of Groningen city (typical city pattern) % households with income in highest 20% class Linguistics Groningen - 2005

Linguistics Groningen - 2005

Typical Groningen names Oldambt: Boelo, Doeko, Adzo, Elzo, Popko, Rienko, Wubbo | Grieto, Trienko Frisian: Alke, Bouktje, Rikste, Eisse, Wiert Peat-colonies: Hinderika, Harmannes, Geessien, Hillechien regional names are becoming rare Linguistics Groningen - 2005

Linguistics Groningen - 2005 Conclusions Successful representation of Linguistics Groningen - 2005

Linguistics Groningen - 2005 Further studies Changes in naming Missing data 1940-1982; towards full population data Current study 1983-2002; towards 5 year period analysis Who starts name renewal, how does it spread Names Call names & official names (using consumer questionnaires) Spelling choices Social factors in naming Role of naming after relatives (in first, second, third name) Gender dependencies Income, education, religion Mathematics of naming (chaos theory) Name pronunciation (for speech synthesis) Linguistics Groningen - 2005

Linguistics Groningen - 2005 Contact E-mail: Gerrit.Bloothooft@let.uu.nl Homepage: www.let.uu.nl/~Gerrit.Bloothooft/personal Mail: Trans 10, 3512 JK Utrecht, The Netherlands Linguistics Groningen - 2005