1
Model Discovery through Metalearning
Brigham Young University DARPA D3M Kick-off Meeting, March 2017
2
Model Discovery
Given some data mining task, consisting of:
- Objectives
- Possible constraints
find a sequence of data mining primitives and their associated parameters (i.e., a workflow) that is most likely to produce a model that meets the stated objectives and satisfies the constraints.
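To make the problem statement concrete, here is a minimal Python sketch of a task (one objective plus one constraint) and of workflows (ordered primitives with parameters). It assumes each candidate workflow already carries estimated accuracy and training time, for example from a metalearner. All class names, primitive names, fields and numbers are hypothetical illustrations, not part of the described system.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    primitive: str                               # hypothetical primitive name
    params: dict = field(default_factory=dict)   # its associated parameters

@dataclass
class Workflow:
    steps: list                    # ordered list of Step objects
    expected_accuracy: float       # assumed estimate, e.g. from a metalearner
    expected_train_seconds: float  # assumed estimate of training cost

@dataclass
class Task:
    objective: str                            # Workflow attribute to maximize
    max_train_seconds: float = float("inf")   # one example constraint

def select_workflow(task, candidates):
    """Return the feasible workflow that best meets the objective, or None."""
    feasible = [w for w in candidates
                if w.expected_train_seconds <= task.max_train_seconds]
    if not feasible:
        return None
    return max(feasible, key=lambda w: getattr(w, task.objective))

# Hypothetical candidates: a slow, accurate workflow and a fast, weaker one.
candidates = [
    Workflow([Step("impute_mean"), Step("random_forest", {"n_trees": 500})], 0.91, 120.0),
    Workflow([Step("impute_mean"), Step("naive_bayes")], 0.84, 2.0),
]
best = select_workflow(Task("expected_accuracy", max_train_seconds=60.0), candidates)
print(best.steps[-1].primitive)  # naive_bayes: the forest violates the time limit
```

The constraint filters candidates before the objective ranks them. Relaxing the time limit would let the random forest win.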
3
Rice’s Framework
[Figure: Rice’s framework diagram, with a component labeled "Workflow Space"]
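For reference, Rice’s (1976) algorithm selection framework can be summarized as follows. The notation is the standard textbook form. Reading the algorithm space as a space of workflows follows the slide’s label and is an interpretation, not a quotation. A problem $x$ is described by features $f(x)$, and a selection mapping $S$ should approximate the best-performing choice for $x$:

```latex
\begin{aligned}
&x \in \mathcal{P} && \text{problem (task) space} \\
&f(x) \in \mathcal{F} && \text{feature (meta-feature) space} \\
&S : \mathcal{F} \to \mathcal{A} && \text{selection mapping into the algorithm (workflow) space} \\
&p(a, x) \in \mathcal{Y} && \text{performance of } a \in \mathcal{A} \text{ on } x \\[4pt]
&a^{*}(x) = \arg\max_{a \in \mathcal{A}} p(a, x), && S\bigl(f(x)\bigr) \approx a^{*}(x)
\end{aligned}
```

Metalearning amounts to learning $S$ from past pairs $\bigl(f(x), p(\cdot, x)\bigr)$.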
4
Why Is ML/DM Model Selection a Challenge?
- Most current efforts focus on designing new primitives or extending existing ones to address their known limitations
- There is little insight into what works well where
- Intelligent but uninformed users are at a loss where to start
- They resort to trial and error or hired expertise, which is costly and sub-optimal
What is needed are robust systems that offer guidelines and support to practitioners.
5
Intelligent Discovery Assistant (IDA)
Requirements
- An IDA has access to all available primitives, and has information about their inputs, outputs, preconditions and effects (IOPE)
- An IDA extracts all useful information from its input
- An IDA can link information among input characteristics, properties of primitives, and workflows
- An IDA learns from previous tasks and uses that knowledge to improve its performance
- An IDA supports, and adapts to, the addition of new datasets and new primitives
- An IDA combines both advice and execution
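The IOPE requirement lets an IDA reject invalid workflows before running them. The following is a minimal sketch under stated assumptions. Primitives are modeled with a single input type, a single output type, a precondition predicate and an effect function over a dictionary of data properties. The property `has_missing` and both primitive names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Primitive:
    name: str
    input_type: str                        # I: kind of data consumed
    output_type: str                       # O: kind of data produced
    precondition: Callable[[dict], bool]   # P: must hold before running
    effect: Callable[[dict], dict]         # E: how data properties change

def check_workflow(steps, props, start_type="raw_table"):
    """Simulate a workflow symbolically; return (ok, message) without executing it."""
    current_type = start_type
    for p in steps:
        if p.input_type != current_type:
            return False, f"{p.name}: expects {p.input_type}, got {current_type}"
        if not p.precondition(props):
            return False, f"{p.name}: precondition failed"
        props = p.effect(props)            # propagate the declared effect
        current_type = p.output_type
    return True, "ok"

# Hypothetical primitives: mean imputation removes missing values;
# the learner requires that none remain.
impute = Primitive("impute_mean", "raw_table", "raw_table",
                   lambda d: True, lambda d: {**d, "has_missing": False})
gnb = Primitive("gaussian_nb", "raw_table", "model",
                lambda d: not d["has_missing"], lambda d: d)

print(check_workflow([gnb], {"has_missing": True}))          # (False, 'gaussian_nb: precondition failed')
print(check_workflow([impute, gnb], {"has_missing": True}))  # (True, 'ok')
```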
6
Metalearning IDA
Base-level learning
- Accumulates experience on a specific task
- E.g., credit rating, medical diagnosis, mine-rock discrimination, fraud detection
Metalearning
- Accumulates experience on the performance of models over multiple tasks
- Uses the resulting metaknowledge to inform the discovery of improved models
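One common way to turn accumulated metaknowledge into advice is a nearest-neighbour metalearner. It finds the past tasks whose meta-features most resemble the new one and recommends the algorithm that performed best on them. The sketch below is a generic illustration of that idea, not necessarily the approach used in this project. The meta-features, dataset names and accuracies are made up.

```python
import math
from statistics import mean, pstdev

def recommend(meta_db, perf_db, query, k=2):
    """meta_db: {dataset: [meta-features]}; perf_db: {dataset: {algorithm: accuracy}}."""
    cols = list(zip(*meta_db.values()))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]     # guard against constant features

    def scale(v):                              # z-score so features are comparable
        return [(x - m) / s for x, m, s in zip(v, mus, sds)]

    q = scale(query)
    nearest = sorted(meta_db, key=lambda d: math.dist(scale(meta_db[d]), q))[:k]
    algorithms = perf_db[nearest[0]].keys()
    avg = {a: mean(perf_db[d][a] for d in nearest) for a in algorithms}
    return max(avg, key=avg.get), nearest

# Hypothetical metadata: [number of instances, number of features].
meta_db = {"d1": [100, 5], "d2": [120, 6], "d3": [10000, 200], "d4": [9000, 180]}
perf_db = {"d1": {"knn": 0.80, "svm": 0.75}, "d2": {"knn": 0.82, "svm": 0.78},
           "d3": {"knn": 0.70, "svm": 0.90}, "d4": {"knn": 0.68, "svm": 0.88}}
print(recommend(meta_db, perf_db, [110, 5]))  # ('knn', ['d1', 'd2'])
```

Averaging raw accuracies is the simplest aggregation. Averaging ranks across neighbours is a frequent alternative when accuracy scales differ between tasks.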
7
Contributions and Extensions
- Design of the Data Mining Advisor
- Design of landmarkers as novel meta-features for metalearning
- Design of model-based meta-features
- Design of cluster-based classification algorithm selection
- Analysis of machine learning algorithms based on instance-level behavior
- Reconsideration and analysis of the value of parameter optimization
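Landmarkers characterize a dataset by the performance of simple, fast learners run on it. The sketch below computes three such meta-features in pure Python: majority-class accuracy, leave-one-out 1-nearest-neighbour accuracy, and the training accuracy of the best single-feature decision stump. This particular choice of landmarkers is an assumption made for illustration. The function names are hypothetical.

```python
import math
from collections import Counter

def majority_landmarker(X, y):
    """Accuracy of always predicting the most frequent class."""
    return Counter(y).most_common(1)[0][1] / len(y)

def one_nn_landmarker(X, y):
    """Leave-one-out accuracy of 1-nearest-neighbour (Euclidean); needs >= 2 rows."""
    correct = 0
    for i, xi in enumerate(X):
        j = min((j for j in range(len(X)) if j != i), key=lambda j: math.dist(xi, X[j]))
        correct += y[j] == y[i]
    return correct / len(X)

def stump_landmarker(X, y):
    """Training accuracy of the best one-feature threshold split."""
    n, best = len(y), 0.0
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            # each side predicts its majority class
            hits = sum(Counter(s).most_common(1)[0][1] for s in (left, right) if s)
            best = max(best, hits / n)
    return best

def landmark_features(X, y):
    return {"majority": majority_landmarker(X, y),
            "1nn": one_nn_landmarker(X, y),
            "stump": stump_landmarker(X, y)}

X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = ["a", "a", "a", "b", "b", "b"]
print(landmark_features(X, y))  # {'majority': 0.5, '1nn': 1.0, 'stump': 1.0}
```

A large gap between a landmarker and the majority baseline suggests that the corresponding learner's bias suits the dataset.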
8
System Overview
- Metalearner: leverages experience
- Experimenter: exploits and explores the space of experiments
- Recommender: operationalizes results
Give use case
9
Metalearner Module
Objective: Develop a core incremental metalearning module capable of accommodating the complete DM process. This will include prerequisite research in task characterization and algorithm behavior.
- Incremental updates: new datasets, new metafeatures, and new primitives
- Integration of ontology, case-based reasoning and induction
- Meta-analyses of ML literature
- Autonomous experiments
- Behavior-based algorithm clustering
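Behavior-based clustering groups algorithms by how they actually behave on instances, not by their design. One simple behavioral distance is classifier output difference (COD): the fraction of instances on which two classifiers' predictions disagree. The sketch below computes COD and then applies single-link clustering with a distance threshold, implemented as connected components via union-find. The predictions, algorithm labels and threshold are hypothetical.

```python
from itertools import combinations

def cod(pred_a, pred_b):
    """Classifier output difference: fraction of instances with differing predictions."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)

def cluster_algorithms(predictions, threshold):
    """Single-link clustering: algorithms linked by COD <= threshold share a cluster."""
    parent = {name: name for name in predictions}

    def find(x):                           # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(predictions, 2):
        if cod(predictions[a], predictions[b]) <= threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for name in predictions:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical predictions of four classifiers on the same six instances.
predictions = {"c45": [0, 1, 1, 0, 1, 0], "cart": [0, 1, 1, 0, 1, 1],
               "nb": [1, 0, 1, 1, 0, 0], "bayesnet": [1, 0, 1, 1, 0, 1]}
print(cluster_algorithms(predictions, threshold=0.2))
# [['bayesnet', 'nb'], ['c45', 'cart']]
```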
10
Experimenter Module
Objective: Research and develop a process of incremental metalearning by constructing an agent-based system that performs experiments via exploration and exploitation of primitives and parameters.
- Exploitation of existing metadata (OpenML)
- Exploration of new areas
  - Systematic experiments (e.g., parameter optimization)
  - Active metalearning (e.g., designing new datasets)
- Continuously collect data, create and execute workflows, collect results, and update the metalearner
- Users may submit an entire dataset or only its metafeatures
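The trade-off between exploiting known-good experiments and exploring new ones is often handled with an epsilon-greedy rule. The sketch below applies that rule to choosing the next (dataset, workflow) experiment. The reward is assumed to be some scalar measure of an experiment's usefulness, such as its effect on the metalearner. Epsilon-greedy is a stand-in chosen for illustration. The slides do not name a specific strategy, and the class, its parameters and the candidate labels are hypothetical.

```python
import random

class ExperimentScheduler:
    """Epsilon-greedy selection among candidate experiments (illustrative sketch)."""

    def __init__(self, candidates, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)                # seeded for reproducibility
        self.value = {c: 0.0 for c in candidates}     # running mean of observed reward
        self.count = {c: 0 for c in candidates}

    def next_experiment(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.value))  # explore: any candidate
        return max(self.value, key=self.value.get)    # exploit: best so far

    def record(self, candidate, reward):
        """Incrementally update the mean reward of a finished experiment."""
        self.count[candidate] += 1
        self.value[candidate] += (reward - self.value[candidate]) / self.count[candidate]

sched = ExperimentScheduler([("iris", "svm"), ("iris", "knn")], epsilon=0.0)
sched.record(("iris", "knn"), 0.5)
sched.record(("iris", "knn"), 0.7)
print(sched.next_experiment())  # ('iris', 'knn'); with epsilon=0 it always exploits
```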
11
Recommender Module
Objective: Develop means of handling multi-criteria objectives in metalearning, and provide a mechanism for obtaining data from the user and returning a concrete DM process to run.
- System’s window to the world
- Multi-criteria advice (e.g., accuracy, comprehensibility, speed, etc.)
- Ranking of workflows
- Parameter optimization
- Optional download of scripts, executables, etc.
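A simple way to combine several criteria into one ranking is to min-max normalize each criterion, flip those where lower is better (such as training time), and sort by a user-weighted sum. The sketch below shows this weighted-sum scheme. It is one of many possible multi-criteria methods and is not necessarily the one used by the Recommender. Workflow names, scores and weights are hypothetical.

```python
def rank_workflows(scores, weights, lower_is_better=()):
    """scores: {workflow: {criterion: value}}; weights: {criterion: weight}."""
    norm = {w: {} for w in scores}
    for c in weights:
        vals = [s[c] for s in scores.values()]
        lo, hi = min(vals), max(vals)
        for w, s in scores.items():
            x = 0.5 if hi == lo else (s[c] - lo) / (hi - lo)   # rescale to [0, 1]
            norm[w][c] = 1 - x if c in lower_is_better else x
    total = {w: sum(weights[c] * norm[w][c] for c in weights) for w in scores}
    return sorted(scores, key=total.get, reverse=True)

scores = {
    "random_forest": {"accuracy": 0.91, "train_s": 120, "comprehensibility": 0.2},
    "decision_tree": {"accuracy": 0.86, "train_s": 5, "comprehensibility": 0.9},
    "naive_bayes": {"accuracy": 0.80, "train_s": 1, "comprehensibility": 0.6},
}
print(rank_workflows(scores, {"accuracy": 1.0}))
# ['random_forest', 'decision_tree', 'naive_bayes']
print(rank_workflows(scores, {"accuracy": 1.0, "train_s": 1.0}, lower_is_better={"train_s"})[0])
# decision_tree
```

Changing the weights changes the advice, so the user's priorities determine which workflow is recommended.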
12
Proof of Concept
Current features:
- Easily extensible Django front end (admin portal only)
- Algorithm and parameter selection only
- Convenient wrapper classes for the database (classes that map directly to tables in our database)
- Parallelized to run experiments in the background
Standard Celery flow diagram courtesy of:
13
Potential Benefits
System Benefits:
- Efficiently identify appropriate workflows for diverse ML/DM tasks
- Lower the cost of entry for businesses into the ML/DM sphere
- Empower decision makers by placing the technology under their more direct control
- Move human expertise up the value chain by focusing it on uniquely human activities
Research Benefits:
- Characterize datasets based on meta-level attributes
- Develop objective criteria for differentiating model behavior on distinct ML/DM problem types
- Explore applications of metadata to workflow creation (e.g., jump-starting Bayesian selection or informing ensemble creation)
- Etc.
14
D3M Dissemination
- AutoML Workshop proposal has been submitted to ICML 2017 (Sydney)
- D3M Workshop proposal has been submitted to ICDM 2017 (New Orleans)
- NIPS anyone? (Long Beach)