Approximation and Generalization in Neural Networks

CMS 165 Lecture 8: Approximation and Generalization in Neural Networks

Recall from previous lecture:
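Hypothesis class: $h \in \mathcal{H}$, $h: \mathcal{X} \to \mathcal{Y}$
A loss function: $\ell(Y, h(X)) \in \mathbb{R}$, e.g., $\mathbb{I}(Y \neq h(X))$
Expected risk: $L(h) := \mathbb{E}_P[\ell(Y, h(X))]$, where $(X, Y) \sim P$
Expected risk minimizer: $h^* \in \arg\min_{h \in \mathcal{H}} L(h)$
Given a set of samples: $\{(x_i, y_i)\}_{i=1}^{n}$
Empirical risk: $\hat{L}(h) := \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, h(x_i))$
Empirical risk minimizer: $\hat{h} \in \arg\min_{h \in \mathcal{H}} \hat{L}(h)$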
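To make these definitions concrete, here is a minimal Python sketch of empirical risk minimization under the 0-1 loss. Everything specific is an assumption for illustration: the toy distribution $P$ (X uniform on [0, 1], label $\mathbb{I}(X \ge 0.3)$ flipped with probability 0.1), the hypothesis class of thresholds $h_t(x) = \mathbb{I}(x \ge t)$ on a grid of 101 values, and the helper names (`sample_from_p`, `erm`, and so on). For this toy $P$ the expected risk has the closed form $L(h_t) = 0.1 + 0.8\,|t - 0.3|$, so $h^* = h_{0.3}$ and the sketch can compare $L(\hat h)$ with $L(h^*)$.

```python
import random


def zero_one_loss(y: int, y_pred: int) -> float:
    """l(y, h(x)) = 1[y != h(x)]."""
    return float(y != y_pred)


def threshold_classifier(t: float):
    """Hypothetical hypothesis h_t(x) = 1[x >= t]; H is the set of h_t for t on a grid."""
    return lambda x: int(x >= t)


def sample_from_p(n: int, rng: random.Random, t_true: float = 0.3, noise: float = 0.1):
    """Assumed toy P: X ~ U[0, 1], Y = 1[X >= t_true], flipped with probability `noise`."""
    samples = []
    for _ in range(n):
        x = rng.random()
        y = int(x >= t_true)
        if rng.random() < noise:
            y = 1 - y
        samples.append((x, y))
    return samples


def empirical_risk(h, samples) -> float:
    """hat L(h) = (1/n) * sum_i l(y_i, h(x_i))."""
    return sum(zero_one_loss(y, h(x)) for x, y in samples) / len(samples)


def expected_risk(t: float, t_true: float = 0.3, noise: float = 0.1) -> float:
    """Closed form of L(h_t) under the toy P, valid for t in [0, 1]."""
    return noise + (1 - 2 * noise) * abs(t - t_true)


def erm(thresholds, samples) -> float:
    """hat h in argmin_{h in H} hat L(h); returns the minimizing threshold."""
    return min(thresholds, key=lambda t: empirical_risk(threshold_classifier(t), samples))


rng = random.Random(0)
data = sample_from_p(2000, rng)
grid = [i / 100 for i in range(101)]
t_hat = erm(grid, data)
print("ERM threshold:", t_hat)
print("empirical risk of ERM:", empirical_risk(threshold_classifier(t_hat), data))
print("L(h_hat):", expected_risk(t_hat), "vs L(h*):", expected_risk(0.3))
```

Because $\hat h$ minimizes $\hat L$ rather than $L$, its expected risk can only be at least $L(h^*) = 0.1$; the gap $L(\hat h) - L(h^*)$ is the quantity that complexity measures are used to bound.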

Measures of Complexity
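$\mathbb{E}\left[\sup_{h \in \mathcal{H}} \big(L(h) - \hat{L}(h)\big)\right] \le 2 R_n(\mathcal{H}, \ell)$, where $R_n(\mathcal{H}, \ell) = \mathbb{E}\left[\sup_{h \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i \, \ell(Y_i, h(X_i))\right]$ and the $\sigma_i$ are independent Rademacher random variables, uniform on $\{-1, +1\}$
With probability at least $1 - \delta$ (loss bounded in $[0, 1]$): $L(\hat{h}) - L(h^*) \le 4 R_n(\mathcal{H}, \ell) + \sqrt{\frac{2 \log(2/\delta)}{n}}$
VC dimension: $R_n \le \sqrt{\frac{2\, \mathrm{VC}(\mathcal{H}) (\log n + 1)}{n}}$
Linear class: $R_n \le O\left(\sqrt{\frac{d \log(2d)}{n}}\right)$
Bounded linear class (e.g., $\|w\|_1 \le \beta$, $\|x\|_\infty \le 1$): $R_n \le O\left(\beta \sqrt{\frac{\log(2d)}{n}}\right)$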
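As a rough numerical check of the bounded linear class, the sketch below estimates by Monte Carlo the empirical Rademacher complexity of the function class $\{x \mapsto \langle w, x\rangle : \|w\|_1 \le \beta\}$ on one fixed sample (the hypothesis class itself, not composed with a loss), and compares it with the standard bound $\beta\sqrt{2\log(2d)/n}$, which holds when every coordinate satisfies $|x_{ij}| \le 1$. It relies on the fact that the supremum of a linear function over the $\ell_1$ ball of radius $\beta$ equals $\beta$ times the $\ell_\infty$ norm. The sample (uniform on $[-1, 1]^d$), the sizes, and the function names are assumptions chosen for illustration.

```python
import math
import random


def rademacher_l1_linear(xs, beta: float, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the empirical Rademacher complexity of
    {x -> <w, x> : ||w||_1 <= beta} on the fixed sample xs.

    For each draw of sigma, sup over the l1 ball of (1/n) sum_i sigma_i <w, x_i>
    equals beta * ||(1/n) sum_i sigma_i x_i||_inf (l_inf is the dual norm of l1).
    """
    n, d = len(xs), len(xs[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # Rademacher signs
        avg = [sum(s * x[j] for s, x in zip(sigma, xs)) / n for j in range(d)]
        total += beta * max(abs(v) for v in avg)
    return total / trials


def massart_bound(n: int, d: int, beta: float) -> float:
    """beta * sqrt(2 log(2d) / n), valid when every |x_ij| <= 1."""
    return beta * math.sqrt(2 * math.log(2 * d) / n)


rng = random.Random(0)
n, d, beta = 100, 10, 1.0
xs = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
est = rademacher_l1_linear(xs, beta, trials=500, rng=rng)
print(f"Monte Carlo estimate: {est:.4f}  bound: {massart_bound(n, d, beta):.4f}")
```

The bound depends on $d$ only through $\log(2d)$, which is why norm constraints on $w$ give much milder dimension dependence than counting parameters.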

Rademacher complexity of NN
From the lecture notes of Percy Liang

Decomposition of Errors
Derivation for linear regression
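The derivation on the slide is not recoverable from this transcript. As a hedged reference point, one standard way to decompose the error is shown below, assuming $L^\star$ denotes the smallest expected risk over all measurable predictors (the Bayes risk); this is not necessarily the exact form derived in the lecture.

```latex
% Excess risk of the empirical risk minimizer \hat{h} (any loss):
L(\hat{h}) - L^\star
  = \underbrace{L(\hat{h}) - L(h^*)}_{\text{estimation error}}
  + \underbrace{L(h^*) - L^\star}_{\text{approximation error}},
\qquad
L(\hat{h}) - L(h^*) \le 2 \sup_{h \in \mathcal{H}} \bigl| L(h) - \hat{L}(h) \bigr|.

% Linear regression with squared loss, \mathcal{H} = \{ x \mapsto w^\top x \},
% \Sigma = \mathbb{E}[X X^\top], and h^*(x) = w_*^\top x:
L^\star = \mathbb{E}\bigl[\operatorname{Var}(Y \mid X)\bigr], \qquad
L(h^*) - L^\star = \mathbb{E}\bigl[(w_*^\top X - \mathbb{E}[Y \mid X])^2\bigr], \qquad
L(\hat{h}) - L(h^*) = (\hat{w} - w_*)^\top \Sigma \, (\hat{w} - w_*).
```

The inequality follows from $\hat L(\hat h) \le \hat L(h^*)$; the three expressions in the second part use the fact that squared loss is minimized by $\mathbb{E}[Y \mid X]$ and is quadratic in $w$.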

Universality of NN

Approximation in Shallow NN
The universality proof is loose: it needs a number of units exponential in the input dimension. Is there a better bound? A better basis? How does the bound improve for various classes of functions?
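The "exponential number of units" can be seen in a one-dimensional constructive sketch: a one-hidden-layer ReLU network with $k$ units can linearly interpolate a target on $k + 1$ equally spaced knots in $[0, 1]$, and for a smooth target the sup-norm error shrinks roughly like $1/k^2$. This is a common textbook construction, not necessarily the proof used in the lecture; the target $\sin(2\pi x)$ and the function names are illustrative assumptions. A grid-based construction with the same resolution in $d$ dimensions needs roughly $k^d$ units, which is where the exponential dependence on the input dimension comes from.

```python
import math


def relu(z: float) -> float:
    return max(0.0, z)


def shallow_relu_interpolant(f, k: int):
    """One-hidden-layer ReLU net with k units that linearly interpolates f at knots i/k on [0, 1].

    f_hat(x) = f(0) + sum_i c_i * relu(x - t_i), where c_i is the change in slope at knot t_i.
    """
    knots = [i / k for i in range(k + 1)]
    values = [f(t) for t in knots]
    slopes = [(values[i + 1] - values[i]) * k for i in range(k)]
    # Unit i switches on at knot t_i and adds the slope change there.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, k)]

    def f_hat(x: float) -> float:
        return values[0] + sum(c * relu(x - t) for c, t in zip(coeffs, knots[:k]))

    return f_hat


def sup_error(f, g, grid: int = 2000) -> float:
    """Approximate sup-norm distance on [0, 1] using a uniform evaluation grid."""
    return max(abs(f(i / grid) - g(i / grid)) for i in range(grid + 1))


def target(x: float) -> float:
    return math.sin(2 * math.pi * x)


errors = {}
for k in (4, 8, 16, 32, 64):
    errors[k] = sup_error(target, shallow_relu_interpolant(target, k))
    print(f"k={k:3d} units  sup error={errors[k]:.5f}")

# Hypothetical count for a tensor-product grid with 16 knots per axis in d dimensions.
for dim in (1, 2, 5, 10):
    print(f"d={dim:2d}: about {16 ** dim} units")
```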

Deep vs. Shallow Networks
What is the advantage of deep networks? Compositionality: some compositional functions that a deep network represents compactly require an exponential number of units in a shallow network.
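A classic illustration of this compositionality argument (due to Telgarsky) uses the tent map $t(x) = 2\,\mathrm{ReLU}(x) - 4\,\mathrm{ReLU}(x - 1/2)$ on $[0, 1]$. Composing it $m$ times gives a depth-$m$ network with only 2 ReLU units per layer whose graph has $2^m$ linear pieces, while a one-hidden-layer ReLU network with $k$ units is piecewise linear with at most $k + 1$ pieces (each unit adds at most one breakpoint), so matching it needs at least $2^m - 1$ units. The sketch below counts the pieces numerically; the helper names are illustrative, and the counting trick assumes every breakpoint lies on a dyadic grid, which holds for this map.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def tent(x: float) -> float:
    """Tent map on [0, 1] from two ReLU units: 2x on [0, 1/2], 2 - 2x on [1/2, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_sawtooth(x: float, depth: int) -> float:
    """Compose the tent map `depth` times: a depth-`depth` network of width 2."""
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_pieces(f, grid_size: int) -> int:
    """Count linear pieces of f on [0, 1] by detecting slope changes on a uniform grid.

    Assumes every breakpoint of f lies on the grid (true for the sawtooth with a
    power-of-two grid_size); dyadic points keep the float arithmetic exact here.
    """
    xs = [i / grid_size for i in range(grid_size + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) * grid_size for i in range(grid_size)]
    return 1 + sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)


for depth in range(1, 7):
    pieces = count_linear_pieces(lambda x: deep_sawtooth(x, depth), 2 ** (depth + 3))
    deep_units = 2 * depth
    shallow_units_needed = pieces - 1  # k units give at most k + 1 pieces
    print(f"depth={depth}: pieces={pieces}, deep units={deep_units}, "
          f"shallow units needed >= {shallow_units_needed}")
```

Exact dyadic grid points keep the floating-point arithmetic exact here, so comparing slopes with `!=` is safe; for a general function one would compare slopes with a tolerance.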

Classical NN theory

Modern Neural Networks
From Belkin et al., "Reconciling modern machine-learning practice and the classical bias-variance trade-off"
