Quantifying culture: a discussion and reflection on current methodology Michael Regier BAR, BSc, MSc Department of Statistics, UBC.

2 Outline
– NET definition of culture
– Operational problem
– Three current solutions
– Conclusions
– Open discussion

3 NET definition of culture
The cultural NET has defined culture as “a complex interplay of meanings that represent and shape the individual and collective lives of people”.
This definition is Postmodern in spirit:
– Individuals and communities are shaped by meanings (e.g. language, country of birth, religion).
– Identity is socially (contextually) constructed.

4 Operational problem 1: measurement
Although the cultural NET has a definition of culture, the definition provides little guidance on how to make it operational.
– No meta-narrative is available. An example of such a meta-narrative would be: “All persons who have ancestral links, within four generations, to the Indian subcontinent will be identified as South Asian.”
We need a function that maps (measures) the Postmodern definition to a value (e.g. a binary indicator or a population count).
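To make the idea of a measurement function concrete, here is a minimal Python sketch, assuming DA-level census counts (category to number of people) as the input. The names `binary_indicator` and `population_count` and the counts are hypothetical illustrations, not part of the original methodology; both measures keep a single number and discard the rest of the Postmodern definition, which is exactly the difficulty this slide describes.

```python
from typing import Callable, Dict

# Census counts for one dissemination area (DA): category -> number of people.
DACounts = Dict[str, int]

# A "measure" maps a DA's counts to a single value.
Measure = Callable[[DACounts], float]


def binary_indicator(category: str, cutoff: float) -> Measure:
    """Hypothetical measure: 1.0 if the share of `category` is at least `cutoff`, else 0.0."""
    def measure(counts: DACounts) -> float:
        total = sum(counts.values())
        share = counts.get(category, 0) / total if total else 0.0
        return 1.0 if share >= cutoff else 0.0
    return measure


def population_count(category: str) -> Measure:
    """Hypothetical measure: number of people in the DA reporting `category`."""
    def measure(counts: DACounts) -> float:
        return float(counts.get(category, 0))
    return measure


# Illustrative DA of 550 people (invented numbers).
da = {"South Asian": 220, "Canadian": 190, "East Asian": 140}
print(binary_indicator("South Asian", 0.33)(da))
print(population_count("South Asian")(da))
```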

5 Operational problem 2: cultural data
In the absence of individual-level cultural data, census data is used.
Neighbourhoods are considered equivalent to the dissemination area (DA):
– The DA is the smallest geographic area for which census data is freely released.
– In general, the DA is the basic building block for other census-based geographic areas.
– A DA is a compact (e.g. square) area bounded by visible features (e.g. a road or river), with an average of 550 people.
Census data is collected on predetermined measures (e.g. mother tongue, country of birth, ethnic origins):
– These definitions do not match the definitions used to abstract patients from the registry database.
– They are a proxy for an individual measure of culture.
– They are a proxy for the “true” definition of culture.

6 Three solutions
– Cut-off measure
– Compression and clustering (CC) measure
– Bayesian measure

7 Cut-off measure
– Methodology inherited from the Nova Scotia research team.
– Uses a single cultural attribute (e.g. South Asian ethnicity) to define a DA.
– A cut-off is selected for a specific census variable by exploring the impact of different cut-offs (e.g. 10%, 25%, 33%, 50%) on the number of people reporting the attribute of interest.
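The exploration step can be sketched as a small table: for each candidate cut-off, how many DAs are flagged and how many of the people reporting the attribute live in those DAs. This is a hypothetical sketch; the DA identifiers and counts are invented for illustration, and choosing among the printed rows remains the researcher's decision.

```python
# Hypothetical DA records: (DA id, people reporting the attribute, DA population).
DAS = [
    ("DA-001", 30, 500),
    ("DA-002", 160, 600),
    ("DA-003", 280, 550),
    ("DA-004", 90, 450),
]


def explore_cutoffs(das, cutoffs):
    """Map each cut-off to (DAs flagged, people captured, share of all reporting people captured)."""
    total_reporting = sum(n for _, n, _ in das)
    results = {}
    for c in cutoffs:
        # A DA is flagged when its share of the attribute meets the cut-off.
        captured = [n for _, n, pop in das if n / pop >= c]
        results[c] = (len(captured), sum(captured), sum(captured) / total_reporting)
    return results


for cutoff, (n_das, people, share) in explore_cutoffs(DAS, [0.10, 0.25, 0.33, 0.50]).items():
    print(f"cut-off {cutoff:.0%}: {n_das} DAs, {people} people ({share:.0%} of those reporting)")
```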

8 Advantages and disadvantages of the cut-off measure
Advantages
– Uses freely available information
– Easy to implement
– Easy to interpret
– Quick identification of DAs of interest
Disadvantages
– Interesting sub-populations are predetermined by the researcher
– Arbitrary data reduction
  – Multinomial distributions are reduced to binomial distributions based on a selective search and non-algorithmic decisions
– Unclear classification of a DA when multiple indicators are used
  – e.g. a DA that is 40% South Asian, 35% Canadian, and 25% East Asian
– Demands the use of complex interactions
  – Potential for over-fitting and spurious relationships
  – These models are difficult to interpret
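The unclear-classification problem can be reproduced with the slide's own example DA (40% South Asian, 35% Canadian, 25% East Asian). The helper `flags` is hypothetical; it simply applies one cut-off to every indicator at once.

```python
def flags(shares, cutoff):
    """Hypothetical helper: every category whose share meets the cut-off."""
    return sorted(cat for cat, s in shares.items() if s >= cutoff)


# The example DA from the slide, as proportions.
da = {"South Asian": 0.40, "Canadian": 0.35, "East Asian": 0.25}

for cutoff in (0.25, 0.33, 0.50):
    print(cutoff, flags(da, cutoff))
```

At 33% the DA is claimed by two groups and at 50% by none, so the classification reflects the chosen cut-off as much as the DA itself.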

9 Compression and clustering measure (CC)
Uses Principal Component Analysis (PCA) with a clustering algorithm to capture the complex interplay of meanings as defined by a single census variable:
– Multinomial distributions are compressed (data reduction) from the p-dimensional simplex into an r-dimensional space, where r < p.
– The principal components in the r-dimensional space are clustered using the family of Mahalanobis distances.
– Clusters are chosen by minimizing the ratio of intra-cluster to inter-cluster variation.
The clusters represent collections of DAs that share data-driven similarities.
Clusters are mutually exclusive:
– There is no cross-classification, unlike the cut-off method.
Clusters represent cultural context.
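A compact sketch of a CC-style pipeline in NumPy follows. Several choices are assumptions not stated on the slide: the centred log-ratio (clr) transform with a pseudocount to move compositions off the simplex before PCA; k-means (with a deterministic farthest-first start) as the clustering algorithm; scaling the component scores to unit variance, so that Euclidean distance equals a Mahalanobis distance under the scores' covariance; and the Calinski-Harabasz form of the intra-/inter-cluster ratio, which uses mean squares because the ratio of raw sums of squares keeps shrinking as clusters are added. The simulated profiles are purely illustrative.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of DA category counts (assumed preprocessing)."""
    x = np.asarray(counts, dtype=float) + pseudocount  # pseudocount avoids log(0)
    x = x / x.sum(axis=1, keepdims=True)               # close each row onto the simplex
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


def compress(z, r):
    """PCA via SVD, keeping r < p components; scores are scaled to unit variance."""
    zc = z - z.mean(axis=0)
    u, s, _ = np.linalg.svd(zc, full_matrices=False)
    scores = u[:, :r] * s[:r]
    # PC scores are uncorrelated, so after this scaling Euclidean distance equals
    # the Mahalanobis distance under the scores' covariance matrix.
    return scores / scores.std(axis=0, ddof=1)


def kmeans(x, k, iters=100):
    """Lloyd's algorithm with a deterministic farthest-first initialisation."""
    centres = [x[0]]
    while len(centres) < k:
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(x[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(iters):
        dist = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


def variance_ratio(x, labels):
    """Intra- to inter-cluster mean-square ratio (1 / Calinski-Harabasz index)."""
    groups = [x[labels == j] for j in np.unique(labels)]
    n, k = len(x), len(groups)
    grand = x.mean(axis=0)
    within = sum(((g - g.mean(axis=0)) ** 2).sum() for g in groups)
    between = sum(len(g) * ((g.mean(axis=0) - grand) ** 2).sum() for g in groups)
    return (within / (n - k)) / (between / (k - 1))


def cc_measure(counts, r=2, k_range=range(2, 6)):
    """Return the chosen number of clusters and a cluster label for every DA."""
    scores = compress(clr(counts), r)
    best_k = min(k_range, key=lambda k: variance_ratio(scores, kmeans(scores, k)))
    return best_k, kmeans(scores, best_k)


# Illustrative data only: 60 simulated DAs of 550 people drawn from three
# made-up compositions over p = 4 categories of one census variable.
rng = np.random.default_rng(0)
profiles = [[0.55, 0.15, 0.15, 0.15], [0.15, 0.55, 0.15, 0.15], [0.15, 0.15, 0.55, 0.15]]
counts = np.vstack([rng.multinomial(550, p, size=20) for p in profiles])
k, labels = cc_measure(counts)
print(k, np.bincount(labels))
```

Each resulting cluster is a set of DAs with similar full compositions, which is what lets the method keep an attribute in its context rather than isolating it with a single threshold.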

10 Advantages and disadvantages of the compression and clustering measure
Advantages
– Data driven
  – The researcher interprets the clusters but does not predetermine them
  – Prevents analysis driven by predetermined results
– Clear analytic methodology based on widely accepted statistical techniques for clustering data
– The CC method represents context
  – Single cultural constructs are not isolated from their context
  – Important attributes arise naturally from the data
  – Inherently contains the complex interplay of meanings
  – Can find naturally occurring ethnic enclaves
– No need for interactions in the model
– Needs a statistician
– Requires inter-disciplinary (or trans-disciplinary) collaboration to interpret the indicator
Disadvantages
– Interpretation can be difficult
– Not easy to implement
  – Technical issues remain in developing the indicator

11 Bayesian measure
Any operational definition of culture will fail to fully quantify the NET definition of culture.
All operational definitions of culture carry some uncertainty about how well they capture the NET definition of culture.
A Bayesian approach allows us to:
– model the uncertainty of our definition
– analytically incorporate researcher knowledge
– naturally incorporate covariates and geographic hierarchies
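One simple way to see what modelling uncertainty means for a DA classification is a Beta-Binomial model of a DA's share of an attribute: instead of a yes/no cut-off, it reports the posterior probability that the share reaches the cut-off. This is a hedged sketch, not the approach under development: the function name, the uniform Beta(1, 1) default prior (the place where researcher knowledge would enter), and the counts are assumptions, and covariates and geographic hierarchies would need a hierarchical model beyond this example.

```python
import random


def prob_share_at_least(k, n, cutoff, prior_a=1.0, prior_b=1.0, draws=20_000, seed=1):
    """Posterior probability that a DA's attribute share theta is at least `cutoff`.

    Hypothetical Beta-Binomial model: k of n residents report the attribute and
    theta ~ Beta(prior_a, prior_b) a priori (Beta(1, 1) is uniform), so the
    posterior is Beta(prior_a + k, prior_b + n - k). The tail probability is
    estimated by Monte Carlo sampling with the standard library.
    """
    rng = random.Random(seed)
    a, b = prior_a + k, prior_b + n - k
    hits = sum(rng.betavariate(a, b) >= cutoff for _ in range(draws))
    return hits / draws


# Three illustrative DAs of 550 people; 180 of 550 sits just below a 33% cut-off.
for reporting in (50, 180, 300):
    print(reporting, round(prob_share_at_least(reporting, 550, 0.33), 3))
```

A hard 33% cut-off assigns the middle DA a definite 0, whereas a posterior probability near one half records that the data cannot really decide. An informative prior (larger `prior_a` and `prior_b`) would shift these probabilities toward the researcher's prior belief.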

12 Implementation of the Bayesian measure
– A Bayesian approach has only recently been considered.
– The approach is being considered in the context of survival analysis.
– Bayesian methods will require a statistician.

13 Which measure to use?
The “correct” measure depends on the context and the research question:
– Cut-off approach
  – Quick look at the data
  – Descriptive statistics, preliminary models, preliminary investigation into spatial-temporal trends
– Compression and clustering
  – Descriptive statistics, model construction, trend analysis (spatial-temporal), identifying predictors, inference
– Bayesian
  – Incidence and survival, model construction, trend analysis (spatial-temporal), identifying predictors, inference

14 Conclusions
– Making the conceptual definition of culture operational will result in a variety of functional definitions.
– Functional definitions should be used with caution, as each may be reasonable only for a certain type of investigation.
– Functional definitions are not “off-the-shelf” definitions.

Thoughts, comments, and discussion