CS344: Introduction to Artificial Intelligence (associated lab: CS386) Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 36, 37: Hardness of training feedforward neural nets.

CS344: Introduction to Artificial Intelligence (associated lab: CS386) Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 36, 37: Hardness of training feedforward neural nets. 11th and 12th April, 2011

Backpropagation for hidden layers [Figure: a layered network with an input layer (n i/p neurons), hidden layers, and an output layer (m o/p neurons); neuron i feeds hidden neuron j, which feeds neuron k in the next layer; the δ's are computed first for the outermost layer and then propagated back]
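For reference, the hidden-layer update summarized in the figure can be written in the standard form for sigmoid units (a reconstruction in the usual notation; the exact symbols on the slide are not recoverable from the transcript):

\[
\Delta w_{ji} = \eta\,\delta_j\,o_i,
\qquad
\delta_j =
\begin{cases}
(t_j - o_j)\,o_j(1-o_j) & \text{for the outermost (output) layer},\\[4pt]
o_j(1-o_j)\displaystyle\sum_{k \in \text{next layer}} \delta_k\, w_{kj} & \text{for hidden layers},
\end{cases}
\]

where o_i is the output of the neuron feeding into neuron j, t_j is the target, and η is the learning rate.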

Local Minima Due to the greedy nature of BP, it can get stuck in a local minimum m and will never be able to reach the global minimum g, since the error can only decrease with each weight change.

Momentum factor Introduce a momentum factor: it accelerates the movement out of the trough and dampens oscillation inside the trough. Choosing β: if β is large, we may jump over the minimum.

[Figure: error E versus weight w, showing a region of positive Δ and a region of oscillating Δ]

Momentum factor If the momentum factor is very large, the contributions of past gradients form a GP series in β. η is the learning rate (lies between 0 and 1); β is the momentum factor (lies between 0 and 1).
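A sketch of the weight update with momentum in this notation (η the learning rate, β the momentum factor); unrolling it shows the geometric-progression behaviour in β that the slide alludes to:

\[
\Delta w(t) = -\eta\,\frac{\partial E}{\partial w}\Big|_{t} + \beta\,\Delta w(t-1)
            = -\eta \sum_{k=0}^{t} \beta^{k}\,\frac{\partial E}{\partial w}\Big|_{t-k}.
\]

With β close to 1 the old gradients keep contributing (the trough is crossed faster, but the minimum may be overshot), while β close to 0 reduces to plain gradient descent.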

Symmetry breaking If the mapping demands different weights but we start with the same weights everywhere, then BP will never converge. [Figure: XOR network with weights w1 = 1, w2 = 1 and threshold θ = 0.5, inputs x1 and x2] XOR n/w: if we started with identical weights everywhere, BP will not converge.

Symmetry breaking: simplest case If all the weights are the same initially, they will remain the same over iterations.
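A minimal numerical illustration of this (a sketch, not part of the original slides): a 2-2-1 sigmoid network for XOR initialized with identical weights everywhere. The two hidden units receive identical gradients at every step, so they remain identical forever and the network can never realize XOR.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR training data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# the pathological initialization: identical weights everywhere
W1 = np.full((2, 2), 0.5); b1 = np.full((1, 2), 0.5)   # input  -> hidden
W2 = np.full((2, 1), 0.5); b2 = np.full((1, 1), 0.5)   # hidden -> output

eta = 0.5
for _ in range(1000):
    h = sigmoid(X @ W1 + b1)                  # forward pass
    o = sigmoid(h @ W2 + b2)
    delta_o = (o - y) * o * (1 - o)           # backprop for squared error
    delta_h = (delta_o @ W2.T) * h * (1 - h)
    W2 -= eta * h.T @ delta_o;  b2 -= eta * delta_o.sum(0, keepdims=True)
    W1 -= eta * X.T @ delta_h;  b1 -= eta * delta_h.sum(0, keepdims=True)

# the columns of W1 (the two hidden units) are still identical: symmetry never broke
print(np.allclose(W1[:, 0], W1[:, 1]))   # True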

Symmetry Breaking: general case [Figure: a general feedforward network with input layer i0 … in, hidden layers h0 … hk, and output layer O0 … Om]

[Figure: the same network with the input layer i0 … im and the output layer O0 … Om highlighted] The output layer and the input layer form an axis; around this axis the symmetry needs to be broken.

Training of FF NN takes time! The BP + FFNN combination has been applied to many problems from diverse disciplines. Consistent observation: the training takes longer as the problem size increases. Is there a hardness hidden somewhere?

Hardness of Training Feedforward NN NP-completeness result: Avrim Blum, Ronald L. Rivest: Training a 3-node neural network is NP-complete. Neural Networks 5(1), 1992. Showed that the loading problem is hard: as the number of training examples increases, so does the training time, EXPONENTIALLY.

A primer on NP-completeness theory

Turing Machine [Figure: a finite-state head reading an infinite tape with cells w1, w2, w3, …, wn]

Formal Definition (vide: Hopcroft and Ullman, 1978) A Turing machine is a 7-tuple (Q, Γ, b, Σ, δ, q0, F), where Q is a finite set of states; Γ is a finite set of tape symbols; b ∈ Γ is the blank symbol (the only symbol allowed to occur on the tape infinitely often at any step during the computation); Σ, a subset of Γ not including b, is the set of input symbols; δ : Q × Γ → Q × Γ × {L, R} is a partial function called the transition function, where L is left shift and R is right shift; q0 ∈ Q is the initial state; F ⊆ Q is the set of final or accepting states.
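A minimal sketch of the 7-tuple in code. The concrete machine below is an illustrative toy, not one from the lecture: it flips every bit of its input and accepts when it reaches the blank.

BLANK = 'b'

# delta: (state, symbol) -> (new_state, write_symbol, move), move in {'L', 'R'}
delta = {
    ('q0', '0'): ('q0', '1', 'R'),
    ('q0', '1'): ('q0', '0', 'R'),
    ('q0', BLANK): ('qf', BLANK, 'R'),   # hit the blank: accept
}

def run_dtm(tape_input, q0='q0', finals={'qf'}):
    tape = dict(enumerate(tape_input))       # sparse tape, blank everywhere else
    state, head = q0, 0
    while (state, tape.get(head, BLANK)) in delta:
        state, write, move = delta[(state, tape.get(head, BLANK))]
        tape[head] = write
        head += 1 if move == 'R' else -1
    out = ''.join(tape[i] for i in sorted(tape) if tape[i] != BLANK)
    return state in finals, out

print(run_dtm('0110'))   # (True, '1001')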

Non-deterministic and Deterministic Turing Machines If δ maps to a number of possibilities, δ : Q × Γ → subsets of (Q × Γ × {L, R}), then the TM is an NDTM; else it is a DTM.

Decision problems Problems whose answer is yes/no. For example, Hamilton Circuit: does an undirected graph have a cycle that visits every node exactly once and comes back to the starting node? Subset Sum: given a finite set of integers, is there a non-empty subset of them that sums to 0?
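A toy "guess and check" view of the Subset Sum example above (illustrative code, not from the slides): checking one guessed subset takes polynomial time, while the naive deterministic search below enumerates exponentially many guesses.

from itertools import combinations

def subset_sum_zero(nums):
    for r in range(1, len(nums) + 1):
        for guess in combinations(nums, r):   # the "guess" phase, done here by brute force
            if sum(guess) == 0:               # the polynomial-time "check" phase
                return True, guess
    return False, None

print(subset_sum_zero([3, -9, 2, 7]))    # (True, (-9, 2, 7))
print(subset_sum_zero([1, 2, 4]))        # (False, None)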

The sets NP and P Suppose for a decision problem, an NDTM is found that takes time polynomial in the length of the input, then we say that the said problem is in NP If, however, a DTM is found that takes time polynomial in the length of the input, then we say that the said problem is in P

Relation between P and NP Clearly, P is a subset of NP Is P a proper subset of NP? That is the P = NP question

The concept of NP-completeness (informal definition) A problem is said to be NP-complete if it is in NP, and a known NP-complete problem is reducible TO it. The 'first' NP-complete problem is satisfiability: given a Boolean formula in Conjunctive Normal Form (CNF), does it have a satisfying assignment, i.e., a set of 0-1 values for the constituent variables that makes the formula evaluate to 1? (Even the restricted version of this problem, 3-SAT, is NP-complete.)

Example of 3-SAT (x1 + x2 + x3')(x1' + x3) is satisfiable: x2 = 1 and x3 = 1. x1(x2 + x3)x1' is not satisfiable. {xi' means complement of xi}
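A brute-force check of these two formulas (illustrative code, not from the slides; a literal is encoded as +i for xi and -i for xi'). Checking a single assignment is polynomial; the enumeration over all 2^n assignments is what blows up.

from itertools import product

def satisfiable(clauses, n_vars):
    # try every 0-1 assignment; a clause is a list of literals, +i for x_i, -i for x_i'
    for bits in product([0, 1], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (1 if l > 0 else 0) for l in clause)
               for clause in clauses):
            return True, bits
    return False, None

print(satisfiable([[1, 2, -3], [-1, 3]], 3))   # (True, ...) e.g. the all-zero assignment works
print(satisfiable([[1], [2, 3], [-1]], 3))     # (False, None)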

Numerous problems have been proven to be NP-complete The procedure is always the same: Take an instance of a known NP-complete problem; let this be p. Show a polynomial time Reduction of p TO an instance q of the problem whose status is being investigated. Show that the answer to q is yes, if and only if the answer to p is yes.

Clarifying the notion of Reduction Convex Hull problem: given a set of points on the two-dimensional plane, find the convex hull of the points. [Figure: eight points p1 … p8 in the plane] p1, p4, p5, p6 and p8 are on the convex hull.

Complexity of the convex hull finding problem We will show that this is Ω(n log n), i.e., it cannot be done faster than sorting. The method used is Reduction. The most important first step: choose the right problem. We take sorting, whose worst-case complexity in the comparison model is known to be Θ(n log n).

Reduce Sorting to Convex Hull (caution: NOT THE OTHER WAY) Take n numbers a1, a2, a3, …, an which are to be sorted. This is an instance of a sorting problem. From this obtain an instance of a convex hull problem: find the convex hull of the set of points (a1, 0), (a2, 0), …, (an, 0), (0, 1). This transformation takes linear time in the length of the input.

Pictorially… [Figure: the points (a1, 0), (a2, 0), …, (an, 0) on the x-axis together with (0, 1); traversing the convex hull effectively sorts the numbers]
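A runnable sketch of the reduction (not from the slides): here the numbers are lifted onto the parabola y = x², a common variant of the construction shown above, so that every point is a hull vertex and reading the lower hull left to right yields the sorted order. The hull routine stands in for any black-box convex hull algorithm.

def sort_via_convex_hull(numbers):
    # linear-time transformation: one point per (distinct) number, lifted onto y = x^2
    points = sorted((a, a * a) for a in set(numbers))   # the hull algorithm's own ordering step

    # lower convex hull via Andrew's monotone chain
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop the last point if it does not make a counter-clockwise (left) turn
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return [x for x, _ in hull]   # x-coordinates of the lower hull = the numbers in sorted order

print(sort_via_convex_hull([5, -2, 9, 3, 0]))   # [-2, 0, 3, 5, 9]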

Convex hull finding is Ω(n log n) If its complexity were lower, sorting too would have lower complexity, because by the linear-time procedure shown, ANY instance of the sorting problem can be converted to an instance of the CH problem and solved. This is not possible (sorting needs Ω(n log n) comparisons). Hence CH is Ω(n log n), and with the known O(n log n) algorithms it is Θ(n log n).

Important remarks on reduction

Problem, Input, Situation, Algorithm [Figure: a problem comes with inputs i1, i2, i3, …, im and algorithms A1, A2, …, An] A case scenario is algorithm Ap running on input iq. What matters is the worst case of the best algorithm.

Case scenario: algorithm Ap on input iq, the worst case of the best algorithm. Time complexity O(|i↓|), where |i↓| is the length of the worst-case input i↓.

Example: Sorting (ascending). Inputs: the sequences to be sorted; Algorithms: Bubble Sort, Heap Sort, Merge Sort, etc. Best algorithm: Quicksort. Worst case: an already sorted sequence.

Transformations [Figure: (a) an instance of P1 transformed into an instance of P2, giving an algorithm for solving P1 with less complexity than the worst-case complexity of P1; (b) an instance of P2 transformed into an instance of P1, giving an algorithm for solving P2 with less complexity than the worst-case complexity of P2]

P1: 1. Sorting, 2. Set Splitting. P2: 1. Convex Hull, 2. Linear Confinement. We know the worst-case complexity of P1; the reduction transfers that worst-case complexity to P2. P2 → P1: a new algorithm for solving P2.

For any problem: Situation A: when an algorithm is discovered and its worst-case complexity calculated, the effort will continuously be to find a better algorithm, that is, to improve upon the worst-case complexity. Situation B: find a problem P1 whose worst-case complexity is known and transform it, in lower complexity, to the unknown problem. That puts a seal on how much improvement can be done on the worst-case complexity.

Example: Sorting, the trinity of complexity theory. Worst-case complexity of the best algorithm: O(n log n). Worst-case complexity of bubble sort: O(n²).

Training of a 1-hidden-layer, 2-neuron feedforward NN is NP-complete

Numerous problems have been proven to be NP-complete The procedure is always the same: Take an instance of a known NP-complete problem; let this be p. Show a polynomial time Reduction of p TO an instance q of the problem whose status is being investigated. Show that the answer to q is yes, if and only if the answer to p is yes.

Training of NN Training of a Neural Network is NP-hard. This can be proved using NP-completeness theory. Question: can a set of examples be loaded onto a Feed Forward Neural Network efficiently?

Architecture We study a special architecture: training the feedforward network called the 3-node neural network. ALL the neurons are 0-1 threshold neurons.

Architecture h1 and h2 are hidden neurons. They set up hyperplanes in the (n+1)-dimensional space.

Confinement Problem Can two hyperplanes be set up which confine ALL and only the positive points? The Positive Linear Confinement problem is NP-complete. Training on positive and negative points needs solving the CONFINEMENT PROBLEM.
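A minimal sketch (illustrative names, not from the slides) of what the special architecture computes: each hidden unit is a 0-1 threshold neuron, i.e., a hyperplane, and the output unit fires only when both hidden units fire, so the positive region is the intersection of two half-spaces. "Loading" the examples means finding weights for which every positive point is inside and every negative point is outside; that search is the NP-complete part.

import numpy as np

def threshold_fire(w, theta, x):
    # 0-1 threshold neuron: fires iff w . x exceeds its threshold
    return 1 if np.dot(w, x) > theta else 0

def three_node_net(w1, t1, w2, t2, x):
    h1 = threshold_fire(w1, t1, x)        # hidden hyperplane 1
    h2 = threshold_fire(w2, t2, x)        # hidden hyperplane 2
    return 1 if h1 + h2 > 1.5 else 0      # output unit: AND of the two hidden units

def loads(w1, t1, w2, t2, positives, negatives):
    """True iff the two hyperplanes confine ALL and only the positive points."""
    return (all(three_node_net(w1, t1, w2, t2, x) == 1 for x in positives) and
            all(three_node_net(w1, t1, w2, t2, x) == 0 for x in negatives))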

Solving with the Set Splitting Problem Set Splitting Problem statement: given a set S of n elements e1, e2, …, en and a set of subsets of S, called concepts, denoted by c1, c2, …, cm, does there exist a splitting of S, i.e., are there two sets S1 ⊆ S and S2 ⊆ S, with S1 ∪ S2 = S, such that none of c1, c2, …, cm is a subset of S1 or S2?

Set Splitting Problem: example S = {s1, s2, s3}; c1 = {s1, s2}, c2 = {s2, s3}. Splitting exists: S1 = {s1, s3}, S2 = {s2}.

Transformation For the n elements in S, set up an n-dimensional space. Corresponding to each element, mark a negative point at unit distance on its axis. Mark the origin as positive. For each concept, mark as positive the point whose coordinates are 1 for the elements in the concept and 0 elsewhere.

Transformation S = {s1, s2, s3}; c1 = {s1, s2}, c2 = {s2, s3}. In the (x1, x2, x3) space: (0,0,0) +ve; (0,0,1) -ve; (0,1,0) -ve; (1,0,0) -ve; (1,1,0) +ve; (0,1,1) +ve.
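The transformation above as a small runnable sketch (hypothetical helper names; elements are indexed from 0) that reproduces exactly the six points listed on this slide.

def splitting_to_points(n, concepts):
    positives = [tuple(0 for _ in range(n))]                    # the origin
    for c in concepts:                                           # one +ve point per concept
        positives.append(tuple(1 if i in c else 0 for i in range(n)))
    negatives = [tuple(1 if i == j else 0 for i in range(n))     # unit points on the axes
                 for j in range(n)]
    return positives, negatives

# S = {s1, s2, s3}, c1 = {s1, s2}, c2 = {s2, s3}
pos, neg = splitting_to_points(3, [{0, 1}, {1, 2}])
print(pos)   # [(0, 0, 0), (1, 1, 0), (0, 1, 1)]
print(neg)   # [(1, 0, 0), (0, 1, 0), (0, 0, 1)]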

Proving the transformation Statement: the set-splitting problem has a solution if and only if the positive linear confinement problem has a solution. Proof in two parts: the if part and the only-if part. If part: given that the set-splitting problem has a solution, to show that the constructed Positive Linear Confinement (PLC) problem has a solution, i.e., to show that since S1 and S2 exist, P1 and P2 exist which confine the positive points.

Proof, if part: P1 and P2 are as follows. P1: a1 x1 + a2 x2 + … + an xn = -1/2 (Eqn A); P2: b1 x1 + b2 x2 + … + bn xn = -1/2 (Eqn B), where ai = -1 if si ∈ S1 and ai = n otherwise; bi = -1 if si ∈ S2 and bi = n otherwise.

Representative Diagram

Proof (if part), positive points: For the origin (a +ve point), plugging x1 = x2 = … = xn = 0 into P1 we get 0 > -1/2. For the other +ve points, which correspond to the ci's: suppose ci contains elements {s1^i, s2^i, …, s_mi^i}; then at least one of the s_j^i cannot be in S1, ∴ the coefficient of the corresponding x_j^i is n (the remaining coefficients are at least -1 and there are at most n-1 of them), ∴ LHS > -1/2. Thus the +ve point for each ci lies on the same side of P1 as the origin. Similarly for P2.

Proof (if part), negative points: The -ve points are the unit-distance points on the axes; they have only one bit as 1. Elements in S1 give rise to m1 -ve points, and elements in S2 give rise to m2 -ve points. For the -ve points corresponding to S1: if si ∈ S1 then xi in P1 has coefficient -1, ∴ LHS = -1 < -1/2.

What has been proved The origin (a +ve point) is on one side of P1; the +ve points corresponding to the ci's are on the same side as the origin; the -ve points corresponding to S1 are on the opposite side of P1.

Illustrative Example S = {s1, s2, s3}; c1 = {s1, s2}, c2 = {s2, s3}; splitting: S1 = {s1, s3}, S2 = {s2}. +ve points: ((0,0,0), +), ((1,1,0), +), ((0,1,1), +). -ve points: ((1,0,0), -), ((0,1,0), -), ((0,0,1), -).

Example (contd.) The constructed planes are: P1: a1 x1 + a2 x2 + a3 x3 = -1/2, i.e., -x1 + 3x2 - x3 = -1/2; P2: b1 x1 + b2 x2 + b3 x3 = -1/2, i.e., 3x1 - x2 + 3x3 = -1/2.

Example (contd.) P1: -x1 + 3x2 - x3 = -1/2. (0,0,0): LHS = 0 > -1/2, ∴ (0,0,0) is a +ve pt (similarly, (1,1,0) and (0,1,1) are classified as +ve). (1,0,0): LHS = -1 < -1/2, ∴ (1,0,0) is a -ve pt. (0,0,1): LHS = -1 < -1/2, ∴ (0,0,1) is a -ve pt. But (0,1,0) is classified as +ve, i.e., P1 cannot classify the point of S2.

Example (contd.) P2: 3x1 - x2 + 3x3 = -1/2. (0,0,0): LHS = 0 > -1/2, ∴ (0,0,0) is a +ve pt. (1,1,0): LHS = 2 > -1/2, ∴ (1,1,0) is a +ve pt. (0,1,1): LHS = 2 > -1/2, ∴ (0,1,1) is a +ve pt. (0,1,0): LHS = -1 < -1/2, ∴ (0,1,0) is a -ve pt.
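A quick mechanical check of the worked example (illustrative code; the numbers are the ones constructed above). Every +ve point falls on the origin side of both planes, while each -ve point falls on the far side of at least one plane, P1 taking care of S1 = {s1, s3} and P2 of S2 = {s2}.

def side(coeffs, theta, point):
    lhs = sum(a * x for a, x in zip(coeffs, point))
    return '+' if lhs > theta else '-'          # '+' = same side as the origin

P1 = ((-1, 3, -1), -0.5)    # -x1 + 3x2 - x3 = -1/2   (from S1 = {s1, s3})
P2 = ((3, -1, 3), -0.5)     #  3x1 - x2 + 3x3 = -1/2  (from S2 = {s2})

positives = [(0, 0, 0), (1, 1, 0), (0, 1, 1)]
negatives = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

for p in positives:
    print(p, side(*P1, p), side(*P2, p))    # every +ve point: '+' for both planes
for p in negatives:
    print(p, side(*P1, p), side(*P2, p))    # each -ve point: '-' for at least one plane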

Graphic for Example [Figure: the planes P1 and P2 with the +ve points confined between them and the -ve points of S1 and S2 on the far sides of P1 and P2 respectively]

Proof, only-if part: Given the +ve and -ve points constructed from the set-splitting problem, two hyperplanes P1 and P2 have been found which achieve positive linear confinement. To show that S can be split into S1 and S2.

Proof, only-if part (contd.): Let the two planes be P1: a1 x1 + a2 x2 + … + an xn = θ1 and P2: b1 x1 + b2 x2 + … + bn xn = θ2. Then S1 = {elements corresponding to -ve points separated by P1} and S2 = {elements corresponding to -ve points separated by P2}.

Proof, only-if part (contd.): Since P1 and P2 take care of all the -ve points, the union of S1 and S2 is equal to S (proof obvious). To show: no ci is a subset of S1 or S2, i.e., each ci has at least one element ∉ S1 (and, symmetrically, at least one element ∉ S2) -- Statement (A).

Proof, only-if part (contd.): Suppose ci ⊂ S1; then every element in ci is contained in S1. Let e1^i, e2^i, …, e_mi^i be the elements of ci. Evaluating the coefficient for each element we get a1 < θ1, a2 < θ1, …, a_mi < θ1 -- (1). But a1 + a2 + … + a_mi > θ1 -- (2) (the +ve point for ci lies on the origin side), and 0 > θ1 -- (3) (the origin is a +ve point). From (1) and (3), each coefficient is below θ1 < 0, so their sum is also below θ1, contradicting (2). CONTRADICTION.

What has been shown Positive Linear Confinement is NP-complete. Confinement of any set of points of one kind is NP-complete (easy to show). The architecture is special: only one hidden layer with two nodes. The neurons are special: 0-1 threshold neurons, NOT sigmoid. Hence, can we generalize and say that FF NN training is NP-complete? Not rigorously, perhaps; but strongly indicated.

Summing up Some milestones covered: A* Search; Predicate Calculus, Resolution, Prolog; HMM, Inferencing, Training; Perceptron, Backpropagation, NP-completeness of NN Training. Lab: to reinforce understanding of lectures. Important topics left out: Planning, IR (advanced course next sem). Seminars: breadth and exposure. Lectures: foundation and depth.

Language Processing & Understanding Information Extraction: Part of Speech Tagging, Named Entity Recognition, Shallow Parsing, Summarization. Machine Learning: Semantic Role Labeling, Sentiment Analysis, Text Entailment (Web 2.0 applications), using graphical models, support vector machines, neural networks. IR: Cross Lingual Search, Crawling, Indexing, Multilingual Relevance Feedback. Machine Translation: Statistical, Interlingua Based, English → Indian languages, Indian languages → Indian languages, IndoWordNet. Resources. Publications. Linguistics is the eye and computation the body.