Machine Learning Documentation Initiative
Workshop on the Modernisation of Statistical Production
Topic iii) Innovation in technology and methods driving opportunities for modernisation
Kenneth Chu and Claude Poirier
Geneva, Switzerland, April 2015

What is Machine Learning (ML)?
An application of artificial intelligence in which algorithms use available information to process (or assist in processing) statistical data. Twenty applications were reported.
[Figure: application areas: Coding, Editing, Linkage, Collection]
18/11/2015 Statistics Canada Statistique Canada

Why should we consider ML ?  Relatively new discipline of computer science No needs for probabilistic models Less stringent for the BIG Data era  NSOs should all explore the use of ML 18/11/2015 Statistics Canada Statistique Canada 3

Classes of ML  Ex.1: Logistic regression [statistics] Training data: Binary response (0:1) and predictors Maximum likelihood leads to model parameters Resulting model is used to predict responses  Ex.2: Support Vector Machines [non-statistics] Training data: Binary response (0:1) and predictors Hyperplanes in the space of predictors separate responses SVM optimisation problem comes from geometry  Decision trees, neural networks, Bayesian networks 18/11/2015 Statistics Canada Statistique Canada 4 SUPERVISED ML

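Ex. 2 can be sketched the same way. Below is a minimal soft-margin linear SVM trained by subgradient descent on the hinge loss; this training method is an assumption chosen for brevity, and dedicated SVM libraries solve the optimisation problem differently. The 0/1 response is recoded to -1/+1, the usual SVM convention. The data and the hyperparameters `lam`, `lr` and `epochs` are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=5000):
    """Soft-margin linear SVM by subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))),
    with labels y_i in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(w, b, x):
    """Map the side of the hyperplane w . x + b = 0 back to the 0/1 coding."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

# Hypothetical training data: two predictors, binary response (0/1)
X = [[1, 1], [1, 2], [2, 1], [4, 4], [5, 4], [4, 5]]
y01 = [0, 0, 0, 1, 1, 1]
y_pm = [1 if v == 1 else -1 for v in y01]  # SVM convention: -1/+1
w, b = train_linear_svm(X, y_pm)
print([svm_predict(w, b, x) for x in X])
```

The hyperplane w . x + b = 0 is the geometric separator referred to in Ex. 2; records with y(w . x + b) >= 1 lie outside the margin and contribute nothing to the hinge-loss gradient.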
Classes of ML: unsupervised ML
- Ex. 1: Principal Component Analysis (PCA) [statistical]
  PCA summarises a data set by finding orthogonal sub-spaces that represent most of the variation
  Unlike supervised ML, there is no response variable in this setting
- Ex. 2: Cluster Analysis [non-statistical]
  Cluster analysis seeks to identify groupings in the given data
  Again, there is no response variable in this setting
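As a minimal sketch of Ex. 1, the code below computes the sample covariance matrix and finds the first principal component by power iteration, then reports the share of total variance that component explains. Power iteration is an assumed, self-contained way to get the leading eigenvector (it fails if the starting vector is orthogonal to it). The data set is hypothetical: points lying close to the line y = 2x.

```python
import math

def covariance_matrix(data):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]

def first_principal_component(data, iterations=100):
    """Leading eigenvector of the covariance matrix by power iteration,
    plus the share of total variance it explains."""
    cov = covariance_matrix(data)
    d = len(cov)
    v = [1.0] * d  # starting vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Cv is the variance of the data projected onto v
    var_along_v = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total_var = sum(cov[i][i] for i in range(d))
    return v, var_along_v / total_var

data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.1], [4.0, 7.9]]
v, share = first_principal_component(data)
print(v, share)
```

Here the component comes out close to (1, 2)/√5 and its explained-variance share is close to 1, because almost all of the variation lies along one direction.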

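The slide does not name a clustering method; as an assumption, the sketch below uses k-means (Lloyd's algorithm), one common form of cluster analysis. Initialising with the first k points is a simplification; k-means++ or several random restarts are the usual choice. The points are hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; initial centroids are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

No response variable is used anywhere: the groups emerge from distances between records alone.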
Applications  Automated Coding Bayesian classifier (Germany): Occupation coding CASCOT (United Kingdom): Occupation coding Indexing utility (Ireland): Individual consumption SVM (New Zealand): Occupation and Qualification 18/11/2015 Statistics Canada Statistique Canada 6

Applications  Data Editing Bayesian Networks (Eurostat): Voting intentions Classification Trees (Portugal): Foreign trade data Cluster Analysis (USA): Census of agriculture CART (New Zealand): Census of population Random Forests (New Zealand): Donor imputation Association Analysis (New Zealand): Edit rules 18/11/2015 Statistics Canada Statistique Canada 7

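Classification trees and CART grow a tree by repeatedly choosing the split that most reduces impurity. The sketch below shows only that core step: finding the best single binary split using the Gini impurity criterion that CART uses for classification. The records (two hypothetical numeric variables, with label 1 meaning the record was flagged for follow-up) are invented for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature_index, threshold, weighted_gini) of the best split x[f] <= t."""
    n = len(y)
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send records both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

X = [(25, 1), (32, 0), (47, 1), (51, 0), (62, 1), (70, 0)]
y = [0, 0, 0, 1, 1, 1]
print(best_split(X, y))
```

A full tree applies the same search recursively to each side until a stopping rule is met; random forests average many such trees grown on resampled data.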
Applications  Record Linkage Neither like coding, nor editing Quality of linkages depends on pre-processing more than matching No applications of Machine Learning in official statistics were listed 18/11/2015 Statistics Canada Statistique Canada 8

Applications  Other areas – Data collection Classification Tree (USA): Non-response prediction Classification Tree (USA): Reporting errors Naïve Bayes text mining (Italy): Web scraping K-nearest neighbours (Hungary): Tax audit Image Processing (Canada): Remote sensing 18/11/2015 Statistics Canada Statistique Canada 9

Concluding remarks  Several machine learning applications  Gap in the area of record linkage  Attention required outside statistical paradigms  Next: Applying Machine Learning on BIG Data Will this be possible only on a case-by-case basis? 18/11/2015 Statistics Canada Statistique Canada 10

Thank you Merci  For more information,Pour plus d’information, please contact:veuillez contacter : 18/11/2015 Statistics Canada Statistique Canada 11