Resource-bounded dimension and learning. Elvira Mayordomo, U. Zaragoza. CIRM, 2009. Joint work with Ricard Gavaldà, María López-Valdés, and Vinodchandran N. Variyam.

Contents: 1. Resource-bounded dimension 2. Learning models 3. A few results on the size of learnable classes 4. Consequences. (Work in progress.)

Effective dimension. Effective dimension is based on a characterization of Hausdorff dimension on Σ^∞ given by Lutz (2000). The characterization is a very clever way of dealing with coverings by means of a single gambling strategy.

Hausdorff dimension (Lutz characterization). Let s ∈ (0,1). An s-gale is a function d: Σ* → [0,∞) such that d(w) = |Σ|^(−s) · Σ_{a∈Σ} d(wa) for every w ∈ Σ*. It is the capital corresponding to a fixed betting strategy, with the house taking a fraction of the capital at each step. d is an s-gale iff |Σ|^((1−s)|w|) d(w) is a martingale.
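A small worked example (mine, not on the slide), for the binary alphabet: the martingale that never bets, d_1(w) = 1 for all w, corresponds to the s-gale below, whose capital shrinks by the house's cut at every step; for s < 1 it tends to 0 and so succeeds nowhere.

```latex
% Binary alphabet; non-betting martingale d_1(w) = 1. The associated s-gale is
d_s(w) = 2^{(s-1)|w|},
\qquad\text{and indeed}\qquad
2^{-s}\bigl(d_s(w0) + d_s(w1)\bigr)
  = 2^{-s}\cdot 2\cdot 2^{(s-1)(|w|+1)}
  = 2^{(s-1)|w|}
  = d_s(w),
% while 2^{(1-s)|w|} d_s(w) = 1 = d_1(w) is a martingale, as the slide's criterion requires.
```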

Hausdorff dimension (Lutz characterization). An s-gale d succeeds on x ∈ Σ^∞ if limsup_{i→∞} d(x[0..i-1]) = ∞. d succeeds on A ⊆ Σ^∞ if d succeeds on each x ∈ A. dim_H(A) = inf {s | there is an s-gale that succeeds on A}. The smaller the s, the harder it is to succeed.
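The last remark can be made precise with a one-line calculation (a standard observation, added here as a reasoning step, not from the slide): success only gets easier as s grows, so the set of exponents s admitting a successful s-gale is closed upwards and the infimum behaves as a dimension should.

```latex
% If d is an s-gale and s' >= s, define d'(w) := |\Sigma|^{(s'-s)|w|} d(w). Then
|\Sigma|^{-s'} \sum_{a\in\Sigma} d'(wa)
  = |\Sigma|^{(s'-s)(|w|+1) - s'} \sum_{a\in\Sigma} d(wa)
  = |\Sigma|^{(s'-s)(|w|+1) - s' + s}\, d(w)
  = |\Sigma|^{(s'-s)|w|}\, d(w)
  = d'(w),
% using the s-gale condition \sum_{a} d(wa) = |\Sigma|^{s} d(w). Since d' \ge d pointwise,
% d' succeeds wherever d does: larger s can only make success easier.
```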

Effectivizing Hausdorff dimension. We restrict to constructive or effective gales and get the corresponding "dimensions" that are meaningful in the subsets of Σ^∞ we are interested in.

Constructive dimension. If we restrict to constructive gales we get constructive dimension (dim). The characterization you are used to: for each x ∈ Σ^∞, dim(x) = liminf_{n→∞} K(x[1..n]) / (n log|Σ|). For each A ⊆ Σ^∞, dim(A) = sup_{x∈A} dim(x).
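K is uncomputable, so the formula cannot be evaluated directly. Purely to illustrate the "asymptotic compression rate" reading of dim(x) (my own toy example; zlib is only a crude heuristic stand-in and is not Kolmogorov complexity), one can look at compressed bits per symbol along prefixes of a binary sequence:

```python
import random
import zlib


def bits_per_symbol(prefix: str) -> float:
    """Crude stand-in for K(x[1..n]) / (n log|Sigma|) with Sigma = {0,1}:
    compressed size in bits divided by the number of symbols n.
    zlib is NOT Kolmogorov complexity; this only illustrates the
    compression-rate reading of constructive dimension."""
    return 8 * len(zlib.compress(prefix.encode("ascii"), 9)) / len(prefix)


regular = "01" * 50_000                                        # highly regular prefix
noisy = "".join(random.choice("01") for _ in range(100_000))   # looks incompressible to zlib
print(round(bits_per_symbol(regular), 3))   # well below 1
print(round(bits_per_symbol(noisy), 3))     # close to 1
```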

Resource-bounded dimensions. Restricting to effectively computable gales we have: – gales computable in polynomial time: dim_p – gales computable in quasi-polynomial time: dim_{p_2} – gales computable in polynomial space: dim_pspace. Each of these effective dimensions is "the right one" for a set of sequences (a complexity class).
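Since every polynomial-time computable gale is also quasi-polynomial-time computable and polynomial-space computable, and every computable gale is constructive, restricting the class of admissible gales can only raise the infimum. As orientation (a standard consequence of the definitions, not stated on the slide), for every A ⊆ Σ^∞:

```latex
% Fewer admissible gales => larger infimum:
\dim_H(A) \;\le\; \dim(A) \;\le\; \dim_{p_2}(A) \;\le\; \dim_p(A),
\qquad
\dim(A) \;\le\; \dim_{\mathrm{pspace}}(A) \;\le\; \dim_p(A).
```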

In Computational Complexity. A complexity class is a set of languages (i.e., a set of infinite sequences): P, NP, PSPACE, E = DTIME(2^{O(n)}), EXP = DTIME(2^{p(n)}). dim_p(E) = 1 and dim_{p_2}(EXP) = 1.

What for? We use dim_p to estimate the size of subclasses of E (and call it dimension in E). Important: every set has a dimension. Notice that dim_p(X) < 1 implies X ≠ E. The same holds for dim_{p_2} inside EXP (dimension in EXP), etc. I will also mention a dimension to be used inside PSPACE.

My goal today. I will use resource-bounded dimension to estimate the size of interesting subclasses of E, EXP and PSPACE. If I show that X, a subclass of E, has dimension 0 (or dimension < 1) in E, this means: – X is much smaller than E (most elements of E are outside of X) – it is easy to construct an element outside of X (I can even combine this with other dimension-0 properties). Today I will be looking at learnable subclasses.

My goal today. We want to use dimension to compare the power of different learning models. We also want to estimate the number of languages that can be learned.

Contents: 1. Resource-bounded dimension 2. Learning models 3. A few results on the size of learnable classes 4. Consequences

Learning algorithms. The teacher has a finite set T ⊆ {0,1}^n in mind, the concept. The learner's goal is to identify T exactly, by asking queries to the teacher or by making guesses about T. The teacher is faithful but adversarial. Learner = algorithm, with limited resources.

Learning… Learning algorithms are extensively used in practical applications. Learning is also quite interesting as an alternative formalism for information content.

Two learning models: the online mistake-bound model (Littlestone) and PAC-learning (Valiant).

Littlestone model (online mistake-bound model). Let the concept be T ⊆ {0,1}^n. The learner receives a series of cases x_1, x_2, ... from {0,1}^n. For each of them the learner guesses whether it belongs to T. After guessing on case x_i the learner receives the correct answer.

Littlestone model ("online mistake-bound model"). The following are restricted: – the maximum number of mistakes – the time to guess case x_i, in terms of n and i.
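As a rough illustration only (not from the slides), here is a minimal sketch of the mistake-bound interaction loop. The learner interface, the toy MemorizingLearner, and the case sequence are hypothetical placeholders; the resource restrictions above (mistake bound, per-guess time) are not enforced in the sketch.

```python
from typing import Iterable, Set


def mistake_bound_run(learner, concept: Set[str], cases: Iterable[str]) -> int:
    """Run the online mistake-bound (Littlestone) protocol.

    `learner` is assumed to provide guess(x) -> bool and update(x, label) -> None.
    Returns the number of mistakes made on the given case sequence.
    """
    mistakes = 0
    for x in cases:
        guess = learner.guess(x)       # learner commits to a prediction for case x
        label = x in concept           # teacher reveals the correct answer
        if guess != label:
            mistakes += 1
        learner.update(x, label)       # learner may revise its hypothesis
    return mistakes


class MemorizingLearner:
    """Toy learner: predicts 'not in T' until it has seen x labelled positive.
    It makes at most |T| mistakes, one per positive example."""

    def __init__(self) -> None:
        self.positives: Set[str] = set()

    def guess(self, x: str) -> bool:
        return x in self.positives

    def update(self, x: str, label: bool) -> None:
        if label:
            self.positives.add(x)
```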

PAC-learning. A PAC-learner is a polynomial-time probabilistic algorithm A that, given n, ε, and δ, produces a list of random membership queries q_1, …, q_t to the concept T ⊆ {0,1}^n and from the answers computes a hypothesis A(n, ε, δ) that is "ε-close to the concept with probability 1−δ". Membership query q: is q in the concept?

PAC-learning. An algorithm A PAC-learns a class C if: – A is a probabilistic algorithm running in polynomial time – for every L in C and for every n (the concept is T = L^{=n}) – for every ε > 0 and every δ > 0 – A outputs a concept A_L(n, r, ε, δ) with Pr( ||A_L(n, r, ε, δ) △ L^{=n}|| ≤ ε·2^n ) ≥ 1−δ. (* r is the size of the representation of L^{=n})
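For intuition only — this is the textbook finite-hypothesis-class argument under the uniform distribution, not the construction from the talk: a learner that draws m ≥ (1/ε)(ln|H| + ln(1/δ)) uniform random points, queries their membership in the concept, and returns any consistent hypothesis is ε-close to the concept with probability at least 1−δ (assuming the concept lies in the class). A minimal sketch, with `hypotheses` and `concept` as hypothetical inputs:

```python
import math
import random
from typing import Callable, Sequence

Hypothesis = Callable[[str], bool]


def pac_learn_finite(hypotheses: Sequence[Hypothesis], concept: Hypothesis,
                     n: int, eps: float, delta: float) -> Hypothesis:
    """Realizable PAC learning of a finite class under the uniform distribution
    on {0,1}^n, using the standard sample bound m >= (1/eps)(ln|H| + ln(1/delta))."""
    m = math.ceil((math.log(len(hypotheses)) + math.log(1 / delta)) / eps)
    sample = [format(random.getrandbits(n), f"0{n}b") for _ in range(m)]
    labels = [concept(x) for x in sample]          # membership queries on random points
    for h in hypotheses:                           # return any consistent hypothesis
        if all(h(x) == y for x, y in zip(sample, labels)):
            return h
    raise ValueError("no consistent hypothesis: the class is not realizable")
```

The bound comes from a union bound: each hypothesis that is ε-far from the concept survives all m samples with probability at most (1−ε)^m, so the failure probability is at most |H|(1−ε)^m ≤ δ for this choice of m.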

What can be PAC-learned? AC^0 can. Everything can be PAC^NP-learned (i.e., PAC-learned with an NP oracle). Note: we are especially interested in learning parts of P/poly = languages that have a polynomial-size representation.

Related work. Lindner, Schuler, and Watanabe (2000) studied the size of PAC-learnable classes using resource-bounded measure. Hitchcock (2000) looked at the online mistake-bound model in a particular case (a sublinear number of mistakes).

Contents: 1. Resource-bounded dimension 2. Learning models 3. A few results on the size of learnable classes 4. Consequences

Our result. Theorem: If EXP ≠ MA then every PAC-learnable subclass of P/poly has dimension 0 in EXP. In other words: if weak pseudorandom generators exist then every PAC-learnable class (with polynomial representations) has dimension 0 in EXP.

Immediate consequences. From [Regan et al.]: if strong pseudorandom generators exist then P/poly has dimension 1 in EXP. So under this hypothesis most of P/poly cannot be PAC-learned.

Further results. Every class that can be PAC-learned with polylog space has dimension 0 in PSPACE.

Littlestone model. Theorem: For each a ≤ 1/2, every class that is Littlestone-learnable with at most a·2^n mistakes has dimension ≤ H(a) in E, where H(a) = −a log a − (1−a) log(1−a) and E = DTIME(2^{O(n)}).
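For a sense of scale (my own worked example; logarithms to base 2): a class learnable with at most 2^n/4 mistakes gets dimension at most H(1/4) ≈ 0.811 in E, strictly below 1.

```latex
H\!\left(\tfrac14\right)
  = -\tfrac14\log_2\tfrac14 - \tfrac34\log_2\tfrac34
  = \tfrac14\cdot 2 + \tfrac34\log_2\tfrac43
  \approx 0.5 + 0.75\cdot 0.415
  \approx 0.811
```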

Can we Littlestone-learn P/poly? We mentioned, from [Regan et al.], that if strong pseudorandom generators exist then P/poly has dimension 1 in EXP.

Can we Littlestone-learn P/poly? If strong pseudorandom generators exist then (for every ε > 0) P/poly is not learnable with fewer than (1−ε)·2^{n−1} mistakes in the Littlestone model.

Both results. For every a < 1/2, a class that can be Littlestone-learned with at most a·2^n mistakes has dimension < 1 in E. If weak pseudorandom generators exist then every PAC-learnable class (with polynomial representations) has dimension 0 in EXP.

Comparison. It is not clear how to go from PAC to Littlestone (or vice versa). We can go: – from equivalence queries to PAC – from equivalence queries to Littlestone.

Directions. Look at other models for exact learning (membership queries, equivalence queries). Find quantitative results that separate them.