Re-thinking Modelling: a Call for the Use of Data Mining in Data-driven Social Simulation
Samer Hassan, Javier Arroyo, Celia Gutiérrez
Universidad Complutense de Madrid
Contents
- Data-driven ABM
- DM-assisted Methodology
- Case Study: Mentat
- Application
- Conclusions
Research Aim
- Data-driven: non-KISS; empirical validation; specific (case study); expressive
- Theoretical: KISS; structural validation; abstract; general
Classical Logic of Simulation
Data-Driven Logic
Data-driven Approach
- Complexity
- Large amounts of data
- Auxiliary AI:
  - Fuzzy logic
  - Ontologies
  - Evolutionary computation
  - Data mining
Data Mining
- Extracting patterns and relevant information from large amounts of data
- Pre-processing of empirical data
  - Cluster finding
  - Discovery of hidden patterns
  - Locates redundancies
- Post-processing of simulation output
  - Clustering: discovery of hidden patterns; validation of clusters; locates inconsistencies
  - Classification
  - Cluster matching
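One simple way to "locate redundancies" during pre-processing is to flag pairs of numeric variables that are almost perfectly correlated. The following is a minimal sketch under stated assumptions: the column names, the toy values and the 0.9 threshold are hypothetical, and Pearson correlation is only one of several possible redundancy measures, not necessarily the one used by the authors.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical toy data: "income" closely follows "education".
columns = {
    "education": [1, 2, 3, 4, 5],
    "income": [10, 21, 29, 41, 50],
    "ideology": [5, 3, 6, 2, 5],
}
flagged = redundant_pairs(columns)
```

In this toy data only the education/income pair exceeds the threshold, so one of the two variables would be a candidate for removal, subject to the domain expert's judgement.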
Methodology for DM-assisted ABM
Methodology for DM-assisted ABM: Data Collection
- Initial point
- Validation points (necessarily ≠ initial point)
- Type
  - Explicit
  - Externalised
  - Empirical distributions
  - Secondary sources
- Methods
  - Quantitative, e.g. surveys
  - Qualitative, e.g. interviews
Methodology for DM-assisted ABM: Analysis
- Preprocessing of empirical data
- Roles
  - Domain expert: guides DM exploration; interpretation
  - DM expert
- Confirm or refine theories
Methodology for DM-assisted ABM: Selection of Relevant Data
- Filtering
- Adaptation of data
  - Normalisation
  - Discretisation
- Domain expert: theory
- DM: redundancies; overlooked independent variables
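The two data adaptation steps named on this slide, normalisation and discretisation, can be illustrated with a short sketch. It assumes min-max scaling and equal-width binning, which are common choices but are not stated in the slides; the function names and the sample ages are hypothetical.

```python
def min_max_normalise(values):
    """Rescale numeric values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretise(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 25, 40, 62, 90]               # hypothetical survey ages
normalised_ages = min_max_normalise(ages)  # first value 0.0, last value 1.0
age_bins = discretise(ages, 3)             # three equal-width age groups
```

Normalisation matters for distance-based methods such as clustering, because otherwise a variable with a wide range (such as age) would dominate variables measured on short ordinal scales.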
Methodology for DM-assisted ABM: Data Analysis
- Large data collections
- Guided by theory
- Types
  - Cluster analysis
  - Principal component analysis
  - Time series methods
  - Association rules
Methodology for DM-assisted ABM: Interpretation of Results
- Theory expert
  - Relates results to theory
- New findings are added to the findings base
Methodology for DM-assisted ABM: ABM Building
- Based on findings
- Modeller
- Steps
  - Formalisation
  - Data-driven design
  - Implementation
  - Initialisation
Methodology for DM-assisted ABM: Simulation
- Fine-tuning the ABM
  - Sensitivity analysis
  - Intensive testing
- Output
  - Record agent trace
Methodology for DM-assisted ABM: Validation
- Analysis of the results
  - Empirical validation
  - Theoretical consistency
- Roles
  - DM expert: analyses the data
  - Domain expert: extracts conclusions
- Iterative cycle
The Problem
- Aim: simulate how social values change in a society over a given period
- Plenty of factors involved
- Inertia of generational change: to what extent do demographic dynamics explain the change in mentality?
- Inter-generational:
  - Agent characteristics remain constant
  - Macro aggregation evolves
Mentat: architecture
- Agent:
  - Mental state attributes
  - Life cycle patterns
  - Demographic micro-evolution: couples, reproduction, inheritance
Mentat: architecture
- World:
  - 3000 agents
  - Grid 100x100
  - Demographic model
  - 8 independent parameters
- Social network:
  - Communication with Moore neighbourhood
  - Friends network
  - Family network
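To make the world description concrete, the sketch below places 3000 agents on a 100x100 grid and looks up the agents in one agent's Moore neighbourhood (the up to eight surrounding cells). It is an illustration only, not Mentat's actual code: it assumes one agent per cell, non-wrapping borders and a neighbourhood radius of 1, none of which are specified in the slides, and all names are hypothetical.

```python
import random

GRID_SIZE = 100   # grid is 100x100, as stated on the slide
N_AGENTS = 3000   # population size, as stated on the slide


def place_agents(n_agents, size, seed=0):
    """Place agents on distinct random cells; returns {agent_id: (x, y)}."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(size) for y in range(size)]
    return dict(enumerate(rng.sample(cells, n_agents)))


def moore_neighbours(agent_id, positions, occupied, size, radius=1):
    """Return ids of agents located in the Moore neighbourhood of agent_id."""
    x, y = positions[agent_id]
    found = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue  # skip the agent's own cell
            nx, ny = x + dx, y + dy
            # Assumption: the grid does not wrap around at its borders.
            if 0 <= nx < size and 0 <= ny < size and (nx, ny) in occupied:
                found.append(occupied[(nx, ny)])
    return found


positions = place_agents(N_AGENTS, GRID_SIZE)
occupied = {cell: agent_id for agent_id, cell in positions.items()}
neighbours_of_0 = moore_neighbours(0, positions, occupied, GRID_SIZE)
```

With 3000 agents on 10000 cells, roughly 30% of cells are occupied, so under these assumptions an agent would have between two and three neighbours on average to communicate with.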
Data Collection in Mentat
- Initial data:
  - EVS-1980: representative sample of Spain
  - Qualitative info
  - Empirically-grounded demographic equations
- Validation data:
  - EVS-1990
  - EVS-1999
Analysis in Mentat: Selection of relevant data
- Sources: EVS-1980, 1990, 1999
- Options:
  1. Algorithm for the best subset of variables
  2. Rely on domain expert
- Tested domain knowledge: option (2) chosen
- Variables adaptation: normalisation

Name            Type          Range
gender          categorical
age             numeric       ≥18
studies         numeric       ≥5
civil state     categorical
economy         numeric       real
ideology        ordinal       1-10
conf. church    ordinal       1-4
church att.     ordinal       1-7
relig. person   categorical
Analysis in Mentat: Data Analysis
- Algorithm selection
  - Wrapped k-means
  - Explore different k (number of clusters)
- Discarded variables
  - Gender and age: they cause irrelevant clusters to appear, e.g. widowed women
  - Economy: redundant, highly correlated with education
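Exploring different values of k can be sketched as a loop that runs k-means once per candidate k and records how compact the resulting clusters are. The slides do not define "wrapped k-means", so this sketch assumes it simply means k-means run inside such an outer loop. The implementation is a plain Lloyd-style k-means written for illustration; the data points are hypothetical, already-normalised values, and the compactness measure (within-cluster sum of squared distances) is an assumed choice.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=50, seed=0):
    """Lloyd-style k-means; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Hypothetical respondents described by two normalised variables.
points = [
    (0.10, 0.10), (0.15, 0.12), (0.12, 0.18),
    (0.50, 0.90), (0.55, 0.85), (0.52, 0.95),
    (0.90, 0.20), (0.85, 0.25), (0.95, 0.15),
]

# Run k-means for several candidate k and record the inertia of each run.
inertia_by_k = {k: k_means(points, k)[2] for k in range(1, 6)}
```

Inertia tends to fall as k grows, so it cannot pick k on its own; in the process described on the slides, the candidate clusterings would also be judged by the domain expert for sociological meaning.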
Analysis in Mentat: Interpretation
- Sociological research
  - Religious typology (RLGTYPE)
  - Based on 3 variables
  - Types: ecclesiastical, low-intensity, alternatives and non-religious
- Clusters found (1980, 1999)
  - Based on the 9 - 3 = 6 variables
  - 5 clusters with sociological meaning
  - Consistent with RLGTYPE
- Theoretical observations of the pattern evolution:
  - Religiosity strength falls
  - Ideological spectrum shifts to the left (education and economy)
  - The newest type of religiosity, “alternatives”, rises (youngsters)
Analysis in Mentat
Validation in Mentat
- Mentat re-building and simulation explored
- Mentat output clustered
  - Same 5 clusters found
  - Similar evolution trends
  - 3 theoretical observations shown
- Inconsistencies detected
  - Percentages of the individual liberal clusters do not match, although their aggregate does
  - Graphics show fewer youngsters
  - Liberal clusters deeply affected
- Guide to re-design
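The kind of inconsistency reported here, where individual cluster percentages differ while their aggregate matches, can be checked by comparing cluster shares between the empirical data and the simulation output. The sketch below uses hypothetical cluster names and made-up label lists purely to show the mechanics; it does not reproduce any Mentat result.

```python
from collections import Counter


def cluster_shares(labels):
    """Return the percentage of individuals that fall in each cluster."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: 100.0 * n / total for cluster, n in counts.items()}


def share_gaps(empirical, simulated):
    """Absolute difference, in percentage points, for every cluster."""
    clusters = set(empirical) | set(simulated)
    return {c: abs(empirical.get(c, 0.0) - simulated.get(c, 0.0)) for c in clusters}


# Hypothetical cluster assignments (not real survey or simulation data).
empirical_labels = ["liberal_1"] * 3 + ["liberal_2"] * 2 + ["other"] * 5
simulated_labels = ["liberal_1"] * 2 + ["liberal_2"] * 3 + ["other"] * 5

empirical = cluster_shares(empirical_labels)
simulated = cluster_shares(simulated_labels)
gaps = share_gaps(empirical, simulated)

# Aggregating the two liberal clusters hides the per-cluster mismatch.
liberal_empirical = empirical["liberal_1"] + empirical["liberal_2"]
liberal_simulated = simulated["liberal_1"] + simulated["liberal_2"]
```

In this toy case each liberal cluster is off by ten percentage points while the aggregated liberal share is identical, which is why a per-cluster comparison is more informative than a comparison of aggregates alone.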
Conclusions
- DM-assisted ABM methodology
- Suitable for DDABM
  - Complexity
  - Large amounts of data
- Limitations
  - KISS
  - Qualitative sources
- Uses
  - Build new ABM
  - Re-think existing DDABM
  - Reveal hidden facts
  - Detect inconsistencies
Thanks for your attention!
Samer Hassan
Universidad Complutense de Madrid
License
This presentation is licensed under a Creative Commons Attribution license. You are free to copy, modify and distribute it as long as the original work and author are cited.