
1 Evolutionary Learning: Tackling Machine Learning Problems via Evolutionary Optimization
Yang Yu (俞扬), National Key Laboratory for Novel Software Technology, Institute of Machine Learning and Data Mining, Nanjing University. University of Birmingham.

2 Evolutionary algorithms
Genetic Algorithms [J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.]
Evolution Strategies [I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart, 1973.]
Evolutionary Programming [L. J. Fogel, A. J. Owens, M. J. Walsh. Artificial Intelligence through Simulated Evolution. John Wiley, 1966.]
and many other nature-inspired algorithms ...
The shared loop: random initialization → an archive of solutions → problem-independent reproduction of new solutions → evaluation & selection back into the archive. Reproduction for a binary vector: mutation [1,0,0,1,0] → [1,1,0,1,0]; crossover [1,0,0,1,0] + [0,1,1,1,0] → [0,1,0,1,0] + [1,0,1,1,0]. For a real vector: mutation by adding small random (e.g. Gaussian) perturbations. The key point: the algorithm only needs to evaluate solutions, i.e., calculate f(x).
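As a rough illustration of this loop (a minimal sketch, not any particular published EA; the function names and parameter choices are illustrative assumptions), a simple EA over bit strings using the above operators can be written as:

    import random

    def simple_ea(fitness, n_bits, pop_size=20, generations=200):
        """Maximize `fitness` over bit strings via crossover, mutation, and selection."""
        pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(generations):
            # problem-independent reproduction: one-point crossover + bit-flip mutation
            a, b = random.sample(pop, 2)
            point = random.randrange(1, n_bits)
            child = a[:point] + b[point:]
            child = [bit ^ (random.random() < 1.0 / n_bits) for bit in child]
            # evaluation & selection: the child replaces the worst solution if it is no worse
            worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
            if fitness(child) >= fitness(pop[worst]):
                pop[worst] = child
        return max(pop, key=fitness)

    # the only problem-specific part is evaluating f(x); e.g. OneMax (count the 1-bits):
    best = simple_ea(fitness=sum, n_bits=10)

Note that the loop never inspects the structure of f; it only queries f(x) on candidate solutions.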

11 Applications
Series N700 vs. Series 700: "this nose ... has been newly developed ... using the latest analytical technique (i.e. genetic algorithms)"; "N700 cars save 19% energy ... % increase in the output ... This is a result of adopting the ... nose shape".

12 Applications
Evolved antennas reach 93% efficiency, compared with 38% efficiency for human-designed QHAs.

13 Evolutionary optimization vs. machine learning
Alan Turing [Computing machinery and intelligence. Mind, 49:433-460, 1950.]: "We cannot expect to find a good child machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications: Structure of the child machine = Hereditary material; Changes of the child machine = Mutations; Judgment of the experimenter = Natural selection." D. E. Goldberg and J. H. Holland [Genetic algorithms and machine learning. Machine Learning, 3:95-99, 1988.]. Leslie Valiant (ACM Turing Award) [Evolvability. Journal of the ACM, 56(1), 2009.].

14 Evolutionary optimization vs. machine learning
Selective ensemble [Z.-H. Zhou, J. Wu, and W. Tang. Ensembling neural networks: Many could be better than all. Artificial Intelligence, 2002]: the GASEN approach minimizes the estimated generalization error directly by a genetic algorithm; results on training and test data, for both regression and classification (figures from [Zhou et al., AIJ'02]).
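To make the GASEN idea concrete, here is a simplified sketch (not the paper's algorithm or code; the evolutionary operators, population handling, and all names are illustrative assumptions): a genetic-style loop evolves a weight vector over the ensemble members, using the validation error of the weighted ensemble as an estimate of the generalization error, and finally keeps the members whose evolved weight exceeds the average 1/N.

    import numpy as np

    def gasen_select(member_preds, y_val, pop_size=20, generations=200, seed=None):
        """member_preds: (N, m) array of N members' predictions on m validation points."""
        rng = np.random.default_rng(seed)
        N = member_preds.shape[0]

        def estimated_error(w):                  # fitness: validation MSE of the weighted ensemble
            w = w / w.sum()
            return np.mean((w @ member_preds - y_val) ** 2)

        pop = rng.random((pop_size, N))          # population of random weight vectors
        for _ in range(generations):
            child = pop[rng.integers(pop_size)].copy()
            child = np.clip(child + rng.normal(0.0, 0.05, N), 1e-6, None)   # Gaussian mutation
            worst = max(range(pop_size), key=lambda i: estimated_error(pop[i]))
            if estimated_error(child) < estimated_error(pop[worst]):
                pop[worst] = child               # replace the worst weight vector
        best = min(pop, key=estimated_error)
        best = best / best.sum()
        return np.where(best > 1.0 / N)[0]       # members with above-average weight are kept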

15 Fundamental problems unsolved
When do we get the result? How good is the result? Which operators affect the result? ...

16 A summary of our work
When do we get the result: we developed a general tool for first hitting time analysis [AIJ'08] and disclosed a general performance bound. How good is the result: we derived a general approximation performance bound [AIJ'12] and disclosed that EAs can be the best-so-far algorithm with practical advantages. Which operators affect the result: we proved the usefulness of crossover for multi-objective optimization [AIJ'13], on synthetic as well as NP-hard problems. And more: optimization under noise [Qian et al., PPSN'14], class-wise analysis [Qian, Yu and Zhou, PPSN'12], a statistical view [Yu and Qian, CEC'14] ...

17 Outline
Approximation ability; Pareto vs. Penalty; Evolutionary learning: Pareto Ensemble Pruning, Pareto Sparse Regression; Conclusion

18 In applications...
"... roughly a fourfold improvement ..." "... save 19% energy ... % increase in the output ..." "... 38% efficiency ... resulted in 93% efficiency ..." Maximum Matching: find the largest possible number of non-adjacent edges. A simple EA takes exponential time (in the worst case) to find an optimal matching, but only polynomial time to find a (1+ε)-approximate matching [Giel and Wegener, STACS'03].

19 Exact v.s. approximate approximate optimization: obtain good enough solutions x* f(x) x with a close-to-opt. objective value measure of the goodness: (for minimization) is called the approximation ratio of x x is an r-approximate solution

20 Previous studies
On Minimum Vertex Cover (MVC, an NP-hard problem): a simple single-objective EA can be arbitrarily bad, whereas Pareto optimization, i.e. a multi-objective reformulation solved by a multi-objective evolutionary algorithm running the same loop, is good [Friedrich et al., ECJ'10]. (Figure: a price-vs-performance plane showing the optimal Pareto front and example solutions A, B, C to illustrate dominance between solutions.) Is this a special case or a general principle?
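To make the reformulation explicit (notation of my own, following the standard bi-objective treatment of such problems rather than copied from the slide): a vertex subset is encoded by its indicator vector x, and the two objectives are minimized together,

\[
\min_{x \in \{0,1\}^n} \Bigl( |x|,\; \bigl|\{\, e \in E : e \text{ not covered by } x \,\}\bigr| \Bigr),
\]

so the feasible vertex covers are exactly the solutions whose second objective is zero, and the EA maintains a non-dominated archive over both objectives.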

21 Our work [Y. Yu, X. Yao, and Z.-H. Zhou. On the approximation ability of evolutionary optimization with application to minimum set cover. Artificial Intelligence, 2012.] We propose the SEIP framework for analysis approximation ratios isolation function: isolates the competition among solutions Only one isolation ⇒ the bad EA Properly configured isolation ⇒ the multi-objective reformulation Partial ratio: measures infeasible solutions infeasible feasible

22 Our work
[Y. Yu, X. Yao, and Z.-H. Zhou. On the approximation ability of evolutionary optimization with application to minimum set cover. Artificial Intelligence, 2012.] Theorem: SEIP finds approximate solutions whose ratio is governed by the best conditional partial ratio c within the isolations, in expected time that grows with the number of isolations q and the size of an isolation. Bad EAs ⇒ only one isolation ⇒ c is very large; multi-objective EAs ⇒ balance q and c.

23 Our work
[Y. Yu, X. Yao, and Z.-H. Zhou. On the approximation ability of evolutionary optimization with application to minimum set cover. Artificial Intelligence, 2012.] On the minimum set cover problem, a typical NP-hard problem for approximation studies: n elements in E, m weighted sets in C, and k the size of the largest set. For the minimum k-set cover problem, SEIP finds H_k-approximate solutions in expected polynomial time; for the unbounded minimum set cover problem, SEIP finds H_n-approximate solutions in expected polynomial time (H_k = 1 + 1/2 + ... + 1/k is the harmonic number). ⇒ EAs can be the best-so-far approximation algorithm.

24 Our work [Y. Yu, X. Yao, and Z.-H. Zhou. On the approximation ability of evolutionary optimization with application to minimum set cover. Artificial Intelligence, 2012.] The greedy algorithm can be bad: on some instances it achieves nothing better than its worst-case ratio. SEIP behaves as an anytime algorithm: given more running time, it reaches better approximation ratios. ⇒ EAs can be the best-so-far approximation algorithm, with practical advantages.

25 Outline
Approximation ability; Pareto vs. Penalty; Evolutionary learning: Pareto Ensemble Pruning, Pareto Sparse Regression; Conclusion

26 For constrained optimization
Constrained optimization: minimize an objective f(x) subject to constraints. Penalty method: fold the constraint violation into the objective as a weighted penalty term and minimize the single resulting function. Pareto method (multi-objective reformulation): minimize the objective and the constraint violation simultaneously, then return the best feasible solution from the non-dominated archive. When is the Pareto method better?
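In symbols (a sketch of the standard formulations with notation of my own; g(x) ≥ 0 denotes the constraint violation and λ a penalty coefficient):

\[
\text{constrained:}\ \min_{x} f(x)\ \text{s.t.}\ g(x)=0, \qquad
\text{penalty:}\ \min_{x}\, f(x) + \lambda\, g(x), \qquad
\text{Pareto:}\ \min_{x}\, \bigl(f(x),\, g(x)\bigr).
\]

The penalty method optimizes one weighted sum, whereas the Pareto method keeps the whole non-dominated trade-off front and finally returns the best solution with g(x) = 0.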

27 Problem Class 1: Minimum Matroid Problem
[C. Qian, Y. Yu and Z.-H. Zhou. On Constrained Boolean Pareto Optimization. IJCAI'15] Given a matroid (U, S) with rank function r, let x be the subset indicator vector of a subset of U. The minimum matroid optimization problem is to minimize the size (cost) of x subject to x reaching the full rank, r(x) = r(U). Examples: minimum spanning tree, maximum bipartite matching.

28 Problem Class 1: Minimum Matroid Problem
[C. Qian, Y. Yu and Z.-H. Zhou. On Constrained Boolean Pareto Optimization. IJCAI'15] The worst-case (over the problem class) expected running time to find optimal solutions is derived for the penalty function method and for the Pareto optimization method.

29 Problem Class 2: Minimum Cost Coverage
[C. Qian, Y. Yu and Z.-H. Zhou. On Constrained Boolean Pareto Optimization. IJCAI'15] Given a ground set U, let x be the subset indicator vector of a subset of U; given a monotone submodular function f and a threshold value q, the minimum cost coverage problem is to minimize the size (cost) of x subject to f(x) ≥ q. Examples: minimum submodular cover, minimum set cover.

30 Problem Class 2: Minimum Cost Coverage
[C. Qian, Y. Yu and Z.-H. Zhou. On Constrained Boolean Pareto Optimization. IJCAI'15] The worst-case (over the problem class) expected running time to find H_q-approximate solutions is derived for the penalty function method and for the Pareto optimization method.

31 Outline
Approximation ability; Pareto vs. Penalty; Evolutionary learning: Pareto Ensemble Pruning, Pareto Sparse Regression; Conclusion

32 Previous approaches
Ordering-based methods (OEP): error minimization [Margineantu and Dietterich, ICML'97]; diversity-like criterion maximization [Banfield et al., Info Fusion'05] [Martínez-Muñoz, Hernández-Lobato, and Suárez, TPAMI'09]; combined criterion [Li, Yu, and Zhou, ECML'12]. Optimization-based methods (SEP): semi-definite programming [Zhang, Burer and Street, JMLR'06]; quadratic programming [Li and Zhou, MCS'09]; genetic algorithms [Zhou, Wu and Tang, AIJ'02]; artificial immune algorithms [Castro et al., ICARIS'05].

33 Back to selective ensemble
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Ensemble Pruning. AAAI'15] Multi-objective reformulation: selective ensemble can be divided into two goals, reduce error and reduce size. Pareto Ensemble Pruning (PEP): 1. randomly generate a pruned ensemble and put it into the archive; 2. loop: pick an ensemble randomly from the archive, randomly change it to make a new one, and if the new one is not dominated, put it and its good neighbors into the archive; 3. when the loop terminates, select an ensemble from the archive.

34 Back to selective ensemble
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Ensemble Pruning. AAAI'15] The PEP procedure above instantiates the general EA loop: random initialization, reproduction (randomly changing an ensemble picked from the archive), and evaluation & selection of non-dominated solutions back into the archive; a sketch is given below.
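A minimal runnable sketch of this Pareto-optimization loop (assuming 0/1 predictions, majority voting, and bit-wise mutation; the "good neighbors" local-search step is omitted and all names are illustrative, not taken from the AAAI'15 implementation):

    import numpy as np

    def pep_sketch(member_preds, y, iters=5000, seed=None):
        """member_preds: (N, m) 0/1 predictions of N base learners on m validation points."""
        rng = np.random.default_rng(seed)
        N = member_preds.shape[0]

        def objectives(s):                     # s: 0/1 membership vector of a pruned ensemble
            if s.sum() == 0:
                return (np.inf, 0)             # the empty ensemble has infinitely bad error
            vote = member_preds[s == 1].mean(axis=0) >= 0.5
            return (float(np.mean(vote != y)), int(s.sum()))   # (error, size), both minimized

        def dominates(a, b):                   # a is no worse than b in both objectives, not equal
            return a[0] <= b[0] and a[1] <= b[1] and a != b

        archive = [rng.integers(0, 2, N)]      # 1. a random pruned ensemble
        for _ in range(iters):                 # 2. the loop
            s = archive[rng.integers(len(archive))].copy()
            s[rng.random(N) < 1.0 / N] ^= 1    # randomly change it (bit-wise mutation)
            obj = objectives(s)
            if not any(dominates(objectives(t), obj) for t in archive):
                archive = [t for t in archive if not dominates(obj, objectives(t))]
                archive.append(s)              # keep only non-dominated ensembles
        return min(archive, key=lambda t: objectives(t)[0])    # 3. final pick: smallest error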

35 Theoretical advantages
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Ensemble Pruning. AAAI’15] Can we have theoretical comparisons now?

39 Theoretical advantages
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Ensemble Pruning. AAAI'15] Can we have theoretical comparisons now? PEP is at least as good as ordering-based methods; PEP can be better than ordering-based methods; PEP and ordering-based methods can be better than the direct use of heuristic search. These comparisons are established for the first time.

40 Empirical comparison
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Ensemble Pruning. AAAI'15] Pruning bagging ensembles of 100 base learners.

43 Empirical comparison
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Ensemble Pruning. AAAI'15] Pruning bagging ensembles of 100 base learners: comparisons on both error and size.

44 Empirical comparison
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Ensemble Pruning. AAAI'15] On pruning bagging ensembles of different sizes.

45 Application: mobile human activity recognition
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Ensemble Pruning. AAAI'15] ... 3 times more than the runner-up; saves more than 20% storage and testing time compared with the runner-up. Compared with the previously reported overall accuracy of 89.3% [Anguita et al., IWAAL'12], we achieve 90.2%.

46 Outline
Approximation ability; Pareto vs. Penalty; Evolutionary learning: Pareto Ensemble Pruning, Pareto Sparse Regression; Conclusion

47 Sparse regression
Regression: find a weight vector w minimizing the mean squared error of the linear model. Sparse regression (sparsity k): minimize the mean squared error subject to ||w||_0 ≤ k, where ||w||_0 denotes the number of non-zero elements in w. Previous methods: greedy methods [Gilbert et al., 2003; Tropp, 2004] such as Forward Regression (FR), Forward-Backward (FoBa), and Orthogonal Matching Pursuit (OMP); convex relaxation methods [Tibshirani, 1996; Zou & Hastie, 2005]. The current best approximation ratio on R² is achieved by FR [Das and Kempe, ICML'11]; a greedy sketch follows below.
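For reference, a minimal sketch of greedy forward selection (the FR idea of repeatedly adding the single feature that most improves the least-squares fit; the function and variable names are mine, not the paper's):

    import numpy as np

    def forward_regression(X, y, k):
        """Greedily select k column indices of X (n x d) that best fit y by least squares."""
        n, d = X.shape
        selected = []
        for _ in range(k):
            best_j, best_mse = None, np.inf
            for j in range(d):
                if j in selected:
                    continue
                cols = selected + [j]
                w, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)   # refit on the candidate subset
                mse = np.mean((X[:, cols] @ w - y) ** 2)
                if mse < best_mse:
                    best_j, best_mse = j, mse
            selected.append(best_j)                                  # keep the single best addition
        return selected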

48 Our approach
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Optimization for Subset Selection. NIPS'15] Multi-objective reformulation: sparse regression can be divided into two goals, reduce MSE and reduce size.

49 Our approach
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Optimization for Subset Selection. NIPS'15] The resulting POSS procedure follows the same EA loop: random initialization, reproduction of new solutions, and evaluation & selection of non-dominated solutions into the archive; see the sketch below.
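A minimal sketch of the resulting POSS-style loop for sparse regression (bi-objective: mean squared error and subset size; the size cap, mutation rate, and all names below are illustrative assumptions, not the NIPS'15 implementation):

    import numpy as np

    def poss_sketch(X, y, k, iters=5000, seed=None):
        """Pareto optimization for subset selection: minimize (MSE, |subset|) over columns of X."""
        rng = np.random.default_rng(seed)
        n, d = X.shape

        def mse(s):
            size = int(s.sum())
            if size == 0 or size >= 2 * k:     # treat empty or oversized subsets as invalid
                return np.inf
            cols = np.flatnonzero(s)
            w, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            return float(np.mean((X[:, cols] @ w - y) ** 2))

        def dominates(a, b):                   # both objectives are minimized
            return a[0] <= b[0] and a[1] <= b[1] and a != b

        archive = [np.zeros(d, dtype=int)]     # start from the empty subset
        for _ in range(iters):
            s = archive[rng.integers(len(archive))].copy()
            s[rng.random(d) < 1.0 / d] ^= 1    # bit-wise mutation
            obj = (mse(s), int(s.sum()))
            if not any(dominates((mse(t), int(t.sum())), obj) for t in archive):
                archive = [t for t in archive if not dominates(obj, (mse(t), int(t.sum())))]
                archive.append(s)
        feasible = [t for t in archive if 0 < t.sum() <= k]
        return min(feasible, key=mse, default=None)   # the best subset within the sparsity budget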

50 Theoretical advantages
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Optimization for Subset Selection. NIPS'15] Is POSS as good as the previously best method (FR)? Yes: POSS achieves the same approximation ratio. Can POSS be better? Yes: POSS can find exact (optimal) solutions on problem subclasses where FR cannot.

51 Empirical comparison [C. Qian, Y. Yu and Z.-H. Zhou. Pareto Optimization for Subset Selection. NIPS'15] Select 8 features and report R² (the larger the better), averaged over 100 runs.

52 Empirical comparison [C. Qian, Y. Yu and Z.-H. Zhou. Pareto Optimization for Subset Selection. NIPS'15] Comparison of optimization performance with different sparsities.

53 Empirical comparison
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Optimization for Subset Selection. NIPS'15] Comparison of test error with different sparsities, with L2 regularization.

54 Empirical comparison
[C. Qian, Y. Yu and Z.-H. Zhou. Pareto Optimization for Subset Selection. NIPS'15] POSS running time vs. performance, marking the best greedy performance and POSS's theoretical running time.

55 Outline
Approximation ability; Pareto vs. Penalty; Evolutionary learning: Pareto Ensemble Pruning, Pareto Sparse Regression; Conclusion

56 Conclusion
Motivated by the hard optimization problems in machine learning, we studied the theoretical foundation of evolutionary algorithms. With the leveraged optimization power, we can solve learning problems better: convex formulations are limited, and non-convex formulations are rising.

57 Thank you! yuy@nju.edu.cn http://cs.nju.edu.cn/yuy
Collaborators: Chao Qian (钱超), Prof. Xin Yao, Prof. Zhi-Hua Zhou (周志华).

