Unsupervised Machine Learning in Agent-Based Modeling

Luke Robinson and Forrest Stonedahl*
Mathematics and Computer Science Department, Augustana College
*Faculty Advisor

I. Overview

Agent-based models (ABMs) are used by researchers in a variety of fields to model natural phenomena. In an ABM, a wide range of behaviors and outcomes can be observed depending on the model's parameters. In many cases, these behaviors can be categorized into discrete outcomes identifiable by human observers. Our goal was to use clustering algorithms to identify those outcomes from model output data. If a computer can complete this task reliably, investigating an ABM becomes easier for human users. For this project, we used the Wolf Sheep Predation model from the models library of NetLogo 5.3.1 [2]. For data analysis and visualization, we used Python 3.5.2 with the scikit-learn and matplotlib libraries.

II. Agent-Based Model

Agent-based modeling is "a form of computational modeling whereby a phenomenon is modeled in terms of agents and their interactions" [1]. In the Wolf Sheep Predation model, wolves (black) and sheep (white) move randomly on a field of grass and dirt patches. When a wolf and a sheep occupy the same space, the wolf eats the sheep. If a wolf or sheep consumes a sufficient amount of food, it reproduces. If a wolf or sheep goes long enough without food, it dies. A final detail: if the 'grass?' toggle is ON, grass can be depleted by the sheep (resulting in a brown patch, which will eventually grow back). If 'grass?' is OFF, grass is an infinite resource for the sheep.

BehaviorSpace Experiment

BehaviorSpace is a tool in NetLogo that lets the user run an experiment consisting of many runs of a model with different combinations of the model's parameters. A bracketed triple such as [80 10 120] gives a start value, an increment, and an end value. The data for this project came from a BehaviorSpace experiment with the following parameters:

    ["sheep-reproduce" 3 5 7]
    ["initial-number-sheep" [80 10 120]]
    ["wolf-gain-from-food" [12 8 28]]
    ["wolf-reproduce" 3 5 7]
    ["show-energy?" false]
    ["sheep-gain-from-food" 3 5 7]
    ["initial-number-wolves" [30 10 70]]
    ["grass-regrowth-time" [20 10 40]]
    ["grass?" true false]

Output measures: count sheep, count wolves
Stop condition: not any? turtles or count sheep > 1500
Time limit: 1500 steps
Total runs: 24,300

III. Clustering Algorithms

K-Means

K-Means requires the user to specify the number of clusters, and it clusters data by trying to separate samples into groups of equal variance [4]. Our data forms two clusters: simulations where the sheep population exploded, and simulations where it didn't. K-Means generally found these two clusters, but misclassified some points on the boundary (the blue points between 800 and 1100 sheep).

K-Means(n_clusters=2) silhouette score = 0.86

DBSCAN

DBSCAN is short for 'Density-Based Spatial Clustering of Applications with Noise' [3]. This algorithm finds core samples of high density and expands clusters from them. An epsilon (eps) value of 50 sets the maximum distance at which two points count as being in the same neighborhood. A min_samples value of 30 specifies how many data points must lie in a point's neighborhood for that point to be considered a core sample of a cluster. The black dots on the diagonal fringe were classified as noise (not part of any cluster).

DBSCAN(eps=50, min_samples=30) silhouette score = 0.66

Affinity Propagation

The Affinity Propagation clustering algorithm chooses 'exemplars' (shown as large dots at left), which are points that best represent the points around them, and then forms clusters by growing outwards from the exemplars [4]. Because of the time complexity of this algorithm and the size of our dataset, we had to run it on a random subset of only 2,000 data points. The sparser dataset contributed to this algorithm finding more clusters than the other algorithms.

AffinityPropagation(damping=0.75) silhouette score = 0.60

Silhouette Scores

The silhouette score is a clustering metric ranging from -1 (worst) to 1 (best). It is computed by comparing the distances between points within a cluster to the distances between points in different clusters [4].

IV. Parameter Analysis

We investigated the two clusters found by K-Means by examining, for each cluster, the distribution of the ABM parameters of the runs assigned to it. The histograms for most of the variables showed an even distribution for both clusters, with the notable exceptions of the 'wolf-gain-from-food' and 'grass?' variables.

When 'wolf-gain-from-food' has a higher value, the run is more likely to be in cluster 1 (red), where the sheep population did not explode. Conversely, runs with lower values are more likely to be in cluster 2. This makes sense because, presumably, the wolves' increased gain from food enabled their population to grow and limit the growth of the sheep population.

[Histogram legend: 'grass?' = ON (grass depletion); 'grass?' = OFF (no grass depletion)]

In cluster 1, three quarters of the runs had grass depletion enabled. In cluster 2, nearly all runs had grass depletion disabled. This points to a key outcome from this model: an ecosystem with three interacting species (grass, sheep, and wolves) is more stable (less likely to have a population explosion) than a two-species ecosystem.

V. Overarching Goals and Future Work

[Flowchart; node labels: ABM, BehaviorSpace Experiment, Clustering Algorithm on Output Data, Cluster 1, Cluster 2, Cluster 3, Cluster Evaluations, Cluster Visualizations, Analysis of Parameters, Parameter Visualizations]

The flowchart above shows the process of evaluating agent-based models using unsupervised learning techniques. Ideally, this is an iterative process that can be refined to help a modeler build the most informative model and gain a full understanding of it. There are many opportunities for future work on this project. Research could be done into the best ways to adjust the parameters of the ABM based on the parameter distributions that appear in clusterings of the model's output data. Attempts could also be made to automate some or all of this process to further reduce the amount of work for the researcher.

VI. References

[1] Wilensky, U., & Rand, W. (2015). An Introduction to Agent-Based Modeling: Modeling Natural, Social, and Engineered Complex Systems with NetLogo. Cambridge, Mass.: MIT Press.
[2] Wilensky, U. (1999). NetLogo. http://ccl.northwestern.edu/netlogo/ Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.
[3] sklearn.cluster.DBSCAN. Retrieved February 13, 2017, from http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
[4] 2.3. Clustering. Retrieved February 13, 2017, from http://scikit-learn.org/stable/modules/clustering.html
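The full-factorial sweep that BehaviorSpace performs over the experiment's parameters can be sketched in Python. This is an illustrative sketch, not the authors' code: the helper `expand` and the names `PARAMS` and `combinations` are made up here, and the values are transcribed from the experiment listing. The sweep yields 12,150 distinct parameter combinations. The reported total of 24,300 runs would be consistent with two repetitions per combination, but the repetition count is not stated on the poster, so that reading is only an assumption.

```python
from itertools import product


def expand(start, step, stop):
    """Expand a BehaviorSpace-style [start step stop] range; stop is inclusive."""
    return list(range(start, stop + 1, step))


# Parameter values transcribed from the experiment listing on the poster.
PARAMS = {
    "sheep-reproduce": [3, 5, 7],
    "initial-number-sheep": expand(80, 10, 120),
    "wolf-gain-from-food": expand(12, 8, 28),
    "wolf-reproduce": [3, 5, 7],
    "show-energy?": [False],
    "sheep-gain-from-food": [3, 5, 7],
    "initial-number-wolves": expand(30, 10, 70),
    "grass-regrowth-time": expand(20, 10, 40),
    "grass?": [True, False],
}

# One dict per parameter combination (full-factorial sweep).
combinations = [dict(zip(PARAMS, values)) for values in product(*PARAMS.values())]

print(len(combinations))  # number of distinct parameter combinations
print(combinations[0])    # first combination in sweep order
```

Each dict in `combinations` corresponds to one model configuration; a run's two output measures (count sheep, count wolves) would be recorded alongside it.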
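The comparison of K-Means and DBSCAN described in Section III can be sketched with scikit-learn. The experiment's output data is not available here, so this sketch uses an invented two-blob dataset standing in for (count sheep, count wolves); the blob centers, spreads, and sizes are assumptions chosen only to mimic a "bounded" and an "exploded" outcome. The algorithm settings (`n_clusters=2`, `eps=50`, `min_samples=30`) are the ones reported on the poster, and the scores printed will not match the poster's values. Affinity Propagation is left out to keep the sketch short.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the per-run output (count sheep, count wolves).
# These numbers are invented for illustration, not taken from the experiment.
bounded = rng.normal(loc=[150.0, 80.0], scale=[40.0, 25.0], size=(400, 2))
exploded = rng.normal(loc=[1500.0, 5.0], scale=[20.0, 3.0], size=(400, 2))
X = np.clip(np.vstack([bounded, exploded]), 0.0, None)  # counts cannot be negative

labelings = {
    "K-Means": KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X),
    "DBSCAN": DBSCAN(eps=50, min_samples=30).fit_predict(X),
}

results = {}
for name, labels in labelings.items():
    distinct = set(labels.tolist())
    n_clusters = len(distinct - {-1})    # DBSCAN marks noise with label -1
    n_noise = int(np.sum(labels == -1))
    # silhouette_score needs at least two distinct labels; noise points, if any,
    # are scored here as though they formed a cluster of their own.
    score = silhouette_score(X, labels) if len(distinct) > 1 else float("nan")
    results[name] = (n_clusters, n_noise, score)
    print(f"{name}: clusters={n_clusters}, noise={n_noise}, silhouette={score:.2f}")
```

How noise points should be treated when scoring DBSCAN is a design choice; the poster does not say which convention was used, so the one above is an assumption.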
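The silhouette score used to evaluate each clustering can be worked through by hand. For a point, let a be its mean distance to the other points in its own cluster and b its mean distance to the points of the nearest other cluster; the point's silhouette is (b - a) / max(a, b), and the score is the mean over all points. The function below is a minimal pure-Python sketch of that standard definition (the name `silhouette` and the four-point example are made up for illustration); for real data, scikit-learn's `silhouette_score` is the practical choice.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # Mean distance to the other members of p's cluster (dist(p, p) is 0).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance from p to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Two tight pairs of points, far apart from each other.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])  # labels follow the visible grouping
bad = silhouette(points, [0, 1, 0, 1])   # labels cut across the grouping
print(round(good, 3), round(bad, 3))
```

A labeling that matches the visible grouping scores close to 1, while one that cuts across it scores below 0, which is the sense in which higher silhouette scores indicate better-separated clusters.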