Stat 6601 Project: Neural Networks (V&R 6.3)
Group Members: Xu Yang, Haiou Wang, Jing Wu
December 4, 2018
Chapter 8.11: Neural Networks
Definition
A neural network (NN) is a broad class of models that mimic functioning inside the human brain. There are various classes of NN models, which differ from each other in:
(1) Problem type: prediction, classification, clustering
(2) Structure of the model
(3) Model-building algorithm
We will focus on the feed-forward neural network.
A bit of biology . . .
The most important functional unit in the human brain is a class of cells called NEURONS.
Dendrites – receive information
Cell body – processes information
Axon – carries processed information to other neurons
Synapse – junction between the axon end of one neuron and the dendrites of other neurons
[Figure: a neuron, labeled with dendrites, cell body, axon, and synapse]
An Artificial Neuron . I V = f(I) f X1 w1 X2 w2 Receives Inputs X1 X2 … Xp from other neurons or environment Inputs fed-in through connections with ‘weights’ Total Input = Weighted sum of inputs from all sources Transfer function (Activation function) converts the input to output Output goes to other neurons or environment f X1 X2 Xp I I = w1X1 + w2X2 + w3X3 +… + wpXp V = f(I) w1 w2 . wp Dendrites Cell Body Axon Direction of flow of Information 12/4/2018
Simplest but most common form: one hidden layer
[Figure: feed-forward network with a single hidden layer]
Choice of activation function
Tanh (hyperbolic tangent): f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Logistic: f(x) = e^x / (1 + e^x)
Threshold: f(x) = 0 if x < 0, and f(x) = 1 if x >= 0
[Figure: graphs of the three activation functions]
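All three activations can be written directly in R; a small sketch (the function names are ours):

Code (sketch):
tanh_act      <- function(x) (exp(x) - exp(-x)) / (exp(x) + exp(-x))  # identical to tanh(x)
logistic_act  <- function(x) exp(x) / (1 + exp(x))                    # same as plogis(x)
threshold_act <- function(x) ifelse(x < 0, 0, 1)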
A collection of neurons forms a layer.
Input layer – each neuron gets ONLY one input, directly from outside
Hidden layer(s) – connect the input and output layers
Output layer – the output of each neuron goes directly to the outside
[Figure: network with inputs x1–x4, weights wij, one hidden layer, and outputs]
More general format: skip-layer connections
[Figure: network with skip-layer connections linking the input layer directly to the outputs, in addition to the hidden layer]
Fitting criteria
Least squares
Maximum likelihood / log-likelihood
One way to ensure f is smooth is to penalize the fitting criterion: minimize E + λC(f), where E is the fitting criterion (e.g. the sum of squared errors), C(f) is a roughness penalty, and λ is the decay parameter. In nnet, C(f) is the sum of squared weights (weight decay).
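A sketch of the penalized criterion, assuming least-squares E and taking C(f) to be the sum of squared weights as nnet's decay penalty does (the function name is ours):

Code (sketch):
penalized_E <- function(y, yhat, wts, lambda)
    sum((y - yhat)^2) + lambda * sum(wts^2)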
Usage of nnet in R
nnet.formula(formula, data = NULL, weights, ..., subset, na.action = na.fail, contrasts = NULL)
formula: a formula of the form 'class ~ x1 + x2 + ...'
weights: (case) weights for each example; if missing, defaults to 1
size: number of units in the hidden layer; can be zero if there are skip-layer units
Wts: initial parameter vector; if missing, chosen at random
rang: if Wts is missing, use random weights from runif(n, -rang, rang)
linout: switch for linear output units; default is logistic output units
entropy: switch for entropy (= maximum conditional likelihood) fitting; default is least squares
softmax: switch for softmax (log-linear model) and maximum conditional likelihood fitting
skip: logical for links from inputs to outputs (skip-layer connections)
decay: parameter λ for weight decay
maxit: maximum number of iterations for the optimizer
Hess: should the Hessian matrix at the solution be returned?
trace: logical for output from the optimizer
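For illustration, a small classification call using these arguments; a hedged sketch on the built-in iris data, not from the text (with a factor response, nnet switches to entropy or softmax fitting automatically):

Code (sketch):
library(nnet)
set.seed(1)   # initial weights are random, drawn from runif(n, -rang, rang)
ir.nn <- nnet(Species ~ ., data = iris, size = 2, rang = 0.1,
              decay = 5e-4, maxit = 200)
table(iris$Species, predict(ir.nn, iris, type = "class"))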
An Example
Code:
library(MASS)
library(nnet)
attach(rock)
area1 <- area/10000; peri1 <- peri/10000   # rescale the inputs
rock1 <- data.frame(perm, area = area1, peri = peri1, shape)
rock.nn <- nnet(log(perm) ~ area + peri + shape, rock1, size = 3,
                decay = 1e-3, linout = T, skip = T, maxit = 1000, Hess = T)
summary(rock.nn)
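Note that nnet starts from random initial weights, so repeated runs give slightly different fits; to reproduce the output below exactly, you would fix the random seed before fitting, e.g.:

Code (sketch):
set.seed(123)   # any fixed seed; the value 123 is arbitrary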
Output
# weights:  19
initial  value 1712.850737
iter  10 value 34.726352
iter  20 value 32.725356
iter  30 value 30.677100
iter  40 value 29.430856
.............................
iter 140 value 13.658571
iter 150 value 13.248229
iter 160 value 12.941181
iter 170 value 12.913059
iter 180 value 12.904267
iter 190 value 12.901672
iter 200 value 12.900292
iter 210 value 12.899496
final  value 12.899400
converged
> summary(rock.nn)
a 3-3-1 network with 19 weights
options were - skip-layer connections  linear output units  decay=0.001
 b->h1 i1->h1 i2->h1 i3->h1
  9.48  -7.39 -14.60   6.94
 b->h2 i1->h2 i2->h2 i3->h2
  1.92 -11.87  -2.88   7.36
 b->h3 i1->h3 i2->h3 i3->h3
 -0.03 -11.12  15.61   4.62
  b->o  h1->o  h2->o  h3->o  i1->o  i2->o  i3->o
  2.64   3.89  11.90 -17.76  -0.06   4.73  -0.38
> sum((log(perm) - predict(rock.nn))^2)
[1] 11.39573
Use the same method as in the previous section to view the fitted surface.
Code:
Xp <- expand.grid(area = seq(0.1, 1.2, 0.05),
                  peri = seq(0, 0.5, 0.02), shape = 0.2)
rock.grid <- cbind(Xp, fit = predict(rock.nn, Xp))
## S: Trellis 3D plot
trellis.device()
wireframe(fit ~ area + peri, rock.grid,
          screen = list(z = 160, x = -60), aspect = c(1, 0.5), drape = T)
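In R (rather than S-PLUS), the Trellis functions used above come from the lattice package, so the plotting step would be preceded by:

Code (sketch):
library(lattice)   # provides trellis.device() and wireframe() in R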
Output
[Figure: wireframe plot of the fitted surface, fit as a function of area and peri]
Experiment: the number of hidden units (size) is a key factor affecting the degree of fit.
attach(cpus)
cpus3 <- data.frame(syct = syct-2, mmin = mmin-3, mmax = mmax-4,
                    cach = cach/256, chmin = chmin/100,
                    chmax = chmax/100, perf = perf)
detach()
test.cpus <- function(fit)
    sqrt(sum((log10(cpus3$perf) - predict(fit, cpus3))^2)/109)
cpus.nn1 <- nnet(log10(perf) ~ ., cpus3, linout = T, skip = T, size = 0)
test.cpus(cpus.nn1)
[1] 0.271962
cpus.nn2 <- nnet(log10(perf) ~ ., cpus3, linout = T, skip = T, size = 4, decay = 0.01, maxit = 1000)
test.cpus(cpus.nn2)
[1] 0.2130121
cpus.nn3 <- nnet(log10(perf) ~ ., cpus3, linout = T, skip = T, size = 10, decay = 0.01, maxit = 1000)
test.cpus(cpus.nn3)
[1] 0.1960365
cpus.nn4 <- nnet(log10(perf) ~ ., cpus3, linout = T, skip = T, size = 25, decay = 0.01, maxit = 1000)
test.cpus(cpus.nn4)
[1] 0.1675305
The RMS error decreases steadily as the number of hidden units grows.
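The same comparison can be written more compactly as a loop; a sketch using the test.cpus function above (trace = FALSE just suppresses the iteration log):

Code (sketch):
for (sz in c(0, 4, 10, 25)) {
    fit <- nnet(log10(perf) ~ ., cpus3, linout = T, skip = T,
                size = sz, decay = 0.01, maxit = 1000, trace = FALSE)
    cat("size =", sz, " RMS error =", test.cpus(fit), "\n")
}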