Published by Dominic Singleton. Modified over 9 years ago.
1
Machine Learning Documentation Initiative
Workshop on the Modernisation of Statistical Production
Topic iii) Innovation in technology and methods driving opportunities for modernisation
Kenneth Chu and Claude Poirier
Geneva, Switzerland, 15-17 April 2015
2
What is Machine Learning (ML)?
Application of artificial intelligence in which algorithms use available information to process (or assist the processing of) statistical data
20 applications were reported.
Areas: Coding, Editing, Linkage, Collection
18/11/2015 Statistics Canada Statistique Canada
3
Why should we consider ML?
Relatively new discipline of computer science
No need for probabilistic models
Less stringent for the Big Data era
National statistical offices (NSOs) should all explore the use of ML
4
Classes of ML: Supervised ML
Ex.1: Logistic regression [statistics]
  Training data: binary response (0 or 1) and predictors
  Maximum likelihood estimation yields the model parameters
  Resulting model is used to predict responses
Ex.2: Support Vector Machines [non-statistics]
  Training data: binary response (0 or 1) and predictors
  Hyperplanes in the space of predictors separate responses
  SVM optimisation problem comes from geometry
Other examples: decision trees, neural networks, Bayesian networks
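The three steps listed for logistic regression (training data, maximum likelihood, prediction) can be sketched as follows. This is a minimal illustration only, assuming a single predictor and plain gradient ascent on the log-likelihood; the data, the function names `fit_logistic` and `predict`, and the learning-rate settings are made up for the example and do not come from the presentation.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Estimate parameters by gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            pi = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - pi  # derivative of the log-likelihood w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Predict the binary response for one vector of predictors."""
    p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
    return 1 if p >= 0.5 else 0

# Hypothetical training data: one predictor, binary response.
# The classes overlap on purpose so that the maximum likelihood estimate is finite.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

w = fit_logistic(X, y)
print("intercept, slope:", w)
print("prediction at 1.0:", predict(w, [1.0]))
print("prediction at 3.5:", predict(w, [3.5]))
```

A production system would normally rely on a statistical package rather than a hand-written optimiser; the loop is spelled out here only to show where the likelihood enters.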
5
Classes of ML: Unsupervised ML
Ex.1: Principal Component Analysis [statistics]
  PCA summarizes a set of data by finding orthogonal sub-spaces that represent most of the variation
  There is no longer a response variable in the setting
Ex.2: Cluster Analysis [non-statistics]
  CA seeks to determine grouping in given data
  Again, there are no response variables in the setting
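As a rough illustration of the PCA idea, the sketch below finds the single direction that carries most of the variation in a small data set, using power iteration on the sample covariance matrix. The data and the function name `first_principal_component` are invented for this example, and power iteration is just one simple way to obtain the leading component; note that no response variable appears anywhere.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return the leading principal direction and its share of total variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration; assumes the start vector is not orthogonal
    # to the leading eigenvector.
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along v (v has unit length).
    eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
    total = sum(cov[a][a] for a in range(p))
    return v, eigval / total

# Hypothetical data: two variables that move almost together.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.1]]

direction, share = first_principal_component(data)
print("first component direction:", direction)
print("share of variation explained:", share)
```

Because the two made-up variables are nearly collinear, one component should account for almost all of the variation, which is the sense in which PCA "summarizes" the data.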
6
Applications: Automated Coding
Bayesian classifier (Germany): Occupation coding
CASCOT (United Kingdom): Occupation coding
Indexing utility (Ireland): Individual consumption
SVM (New Zealand): Occupation and Qualification
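To give a feel for what automated coding with a Bayesian classifier can look like, here is a small multinomial naive Bayes sketch with add-one smoothing. It is not the method used by any of the offices listed above; the occupation descriptions, the codes (`TEACH`, `NURSE`, `DRIVER`) and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Count words per code from (description, code) training pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, code in examples:
        words = text.lower().split()
        class_counts[code] += 1
        word_counts[code].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(model, text):
    """Return the code with the highest posterior log-score for a description."""
    class_counts, word_counts, vocab = model
    n = sum(class_counts.values())
    best_code, best_score = None, -math.inf
    for code, count in class_counts.items():
        total = sum(word_counts[code].values())
        score = math.log(count / n)  # log prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed word likelihood.
            score += math.log((word_counts[code][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_code, best_score = code, score
    return best_code

# Hypothetical training set: free-text descriptions with made-up codes.
training = [
    ("primary school teacher", "TEACH"),
    ("secondary school teacher of mathematics", "TEACH"),
    ("registered nurse", "NURSE"),
    ("hospital nurse night shift", "NURSE"),
    ("truck driver", "DRIVER"),
    ("bus driver city transit", "DRIVER"),
]

model = train_nb(training)
for description in ["school teacher", "night nurse", "delivery truck driver"]:
    print(description, "->", classify(model, description))
```

In practice a coding system would also need a rule for sending low-confidence cases to manual coding; that step is left out of this sketch.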
7
Applications: Data Editing
Bayesian Networks (Eurostat): Voting intentions
Classification Trees (Portugal): Foreign trade data
Cluster Analysis (USA): Census of agriculture
CART (New Zealand): Census of population
Random Forests (New Zealand): Donor imputation
Association Analysis (New Zealand): Edit rules
8
Applications: Record Linkage
Neither like coding, nor editing
Quality of linkages depends on pre-processing more than matching
No applications of Machine Learning in official statistics were listed
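The point about pre-processing can be illustrated with a small sketch: the same two name strings are compared before and after standardisation. The names, the `standardise` rules and the choice of `difflib.SequenceMatcher` as the similarity measure are assumptions made for this example, not a description of any linkage system mentioned in the presentation.

```python
import difflib
import re
import unicodedata

def standardise(name):
    """Strip accents, lowercase, drop punctuation and collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())

def similarity(a, b):
    """Similarity ratio between 0 and 1 from the standard library's SequenceMatcher."""
    return difflib.SequenceMatcher(None, a, b).ratio()

# Hypothetical records referring to the same person in two files.
raw_a = "Tremblay,  Jean-François"
raw_b = "tremblay jean francois"

before = similarity(raw_a, raw_b)
after = similarity(standardise(raw_a), standardise(raw_b))
print("similarity on raw strings:", before)
print("similarity after standardisation:", after)
```

The comparison function is identical in both cases; only the cleaning step changes, which is why the quality of the pre-processing can matter more than the matching rule itself.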
9
Applications: Other areas (Data collection)
Classification Tree (USA): Non-response prediction
Classification Tree (USA): Reporting errors
Naïve Bayes text mining (Italy): Web scraping
K-nearest neighbours (Hungary): Tax audit
Image Processing (Canada): Remote sensing
10
Concluding remarks
Several machine learning applications were reported
Gap in the area of record linkage
Attention required outside statistical paradigms
Next: Applying Machine Learning to Big Data
Will this be possible only on a case-by-case basis?
11
Thank you
For more information, please contact: Claude.Poirier@statcan.gc.ca