Chapter 11 – Neural Nets Data Mining for Business Analytics


Chapter 11 – Neural Nets
Data Mining for Business Analytics
Shmueli, Patel & Bruce

Neural Net in the Human Brain
10^11 = number of neurons
Each neuron is connected to about 10,000 other neurons
10^15 = number of synaptic connections

Basic Idea
Combine input information in a complex and flexible neural net "model."
Model "coefficients" are continually tweaked in an iterative process.
The network's interim performance in classification and prediction informs successive tweaks.

Network Structure
Multiple layers: input layer (raw observations), hidden layers, output layer.
Nodes within each layer.
Weights (like coefficients, subject to iterative adjustment).
Bias values (also like coefficients, but not subject to iterative adjustment).

Schematic Diagram

Example – Using fat & salt content to predict consumer acceptance of cheese

Example - Data (initial weights)

Input layer to hidden layer:
                      Node 3    Node 4    Node 5
Bias θj               -0.3       0.02      0.05
Node 1 (Fat)  w1j     -0.01
Node 2 (Salt) w2j      0.01      0.03

Hidden layer to output layer:
                      Node 6    Node 7
Bias θk               -0.04    -0.015
Node 3  w3k           -0.02     0.01
Node 4  w4k           -0.03     0.05
Node 5  w5k            0.015

Example - Data

Moving Through the Network for Obs #1

Obs    Fat    Salt    Acceptance
1      0.2    0.9

The Input Layer
For the input layer, input = output.
E.g., for record #1: Fat input = output = 0.2; Salt input = output = 0.9.
The output of the input layer is the input into the hidden layer.

The Hidden Layer
In this example, the hidden layer has 3 nodes.
Each node receives as input the output of all input nodes.
The output of each hidden node is a function of the weighted sum of its inputs.
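In symbols, a sketch of the computation each hidden node performs, using the notation of the weight table above (θj is the bias of hidden node j, wij is the weight from input node i to node j, and g is the activation function):

\[
\text{output}_j = g\!\left(\theta_j + \sum_i w_{ij}\, x_i\right)
\]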

The Weights
The weights θ (theta) and w are typically initialized to random values in the range -0.05 to +0.05.
This is equivalent to a model with random prediction (in other words, no predictive value).
These initial weights are used in the first round of training.

Output of Node 3, if g is a Logistic Function
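As a sketch, using the logistic function g(s) = 1 / (1 + e^(-s)) and the recovered weights above for record #1 (Fat = 0.2, Salt = 0.9); the assignment w13 = -0.01 and w23 = 0.01 is an assumption based on the partially recovered table:

\[
\text{output}_3 = \frac{1}{1 + e^{-(\theta_3 + w_{13} x_1 + w_{23} x_2)}}
              = \frac{1}{1 + e^{-(-0.3 \; - \; 0.01 \cdot 0.2 \; + \; 0.01 \cdot 0.9)}} \approx 0.43
\]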

Output Layer
The output of the last hidden layer becomes the input for the output layer.
It uses the same function as above, i.e., a function g of the weighted sum of its inputs.

Relation to Linear Regression
A net with a single output node and no hidden layers, where g is the identity function, takes the same form as a linear regression model.
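Concretely, in that special case the network computes

\[
\hat{y} = \theta + w_1 x_1 + w_2 x_2 + \cdots + w_p x_p ,
\]

which is exactly the form of a linear regression model with intercept θ and coefficients w1, ..., wp.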

Back Propagation ("back-prop")
The output from output node k, and the error associated with that node (a sketch of the error term follows below).
Note: this error is like the ordinary error, multiplied by a correction factor.
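A hedged reconstruction of the error term for a logistic output node k, where the "correction factor" is ŷk(1 − ŷk):

\[
\text{err}_k = \hat{y}_k \,(1 - \hat{y}_k)\,(y_k - \hat{y}_k)
\]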

Error Is Used to Update Weights
ℓ = a constant between 0 and 1 that reflects the "learning rate" (also called the "weight decay parameter"); the update rule is sketched below.
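A sketch of one common form of the back-prop update rule (stated as an assumption, using the notation above, where outputj is the output of the node feeding weight wjk):

\[
w_{jk}^{\text{new}} = w_{jk}^{\text{old}} + \ell \cdot \text{err}_k \cdot \text{output}_j ,
\qquad
\theta_k^{\text{new}} = \theta_k^{\text{old}} + \ell \cdot \text{err}_k
\]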

Case Updating
Weights are updated after each record is run through the network.
Completion of all records through the network is one epoch (also called a sweep or iteration).
After one epoch is completed, return to the first record and repeat the process.

Batch Updating
All records in the training set are fed to the network before updating takes place.
In this case, the error used for updating is the sum of the errors from all records.
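A minimal sketch of the difference in update timing. A made-up single-weight linear model stands in for a full network here; the data values and learning rate are hypothetical, and the point is only when the weight gets updated.

```python
records = [(0.2, 0.9), (0.1, 0.1), (0.4, 0.4)]  # (x, y) pairs, hypothetical values
lr = 0.5                                        # learning rate (the constant ell)

# Case updating: the weight changes after every record; one full pass = one epoch.
w = 0.0
for epoch in range(10):
    for x, y in records:
        err = y - w * x
        w += lr * err * x            # update immediately after this record

# Batch updating: errors are accumulated over all records, then one update per epoch.
w_batch = 0.0
for epoch in range(10):
    total = sum((y - w_batch * x) * x for x, y in records)
    w_batch += lr * total            # single update after the full pass

print(round(w, 3), round(w_batch, 3))
```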

Why It Works
Big errors lead to big changes in weights; small errors leave weights relatively unchanged.
Over thousands of updates, a given weight keeps changing until the error associated with it is negligible, at which point weights change little.

Initial Pass Through the Network
Goal: find weights that yield the best predictions.
The process described above is repeated for all records.
At each record, the prediction is compared to the actual value; the difference is the error for the output node.
The error is propagated back and distributed to all the hidden nodes and used to update their weights.

Common Criteria to Stop the Updating
When weights change very little from one iteration to the next.
When the misclassification rate reaches a required threshold.
When a limit on the number of runs is reached.

Training the Model

Preprocessing Steps
Scale variables to 0-1.
Categorical variables: if categories are equidistant, map to equidistant interval points in the 0-1 range; otherwise, create dummy variables.
Transform (e.g., log) skewed variables.

Preprocessing Steps: Scaling
Scale variables to 0-1: Xnorm = (X - a) / (b - a), where [a, b] is the range of X values.
If a and b are unknown, estimate them from the data.

Preprocessing Steps: Categorical Variables
If binary, create dummy variables.
If ordinal with equidistant categories, map to equidistant interval points in the 0-1 range, such as [0, 0.25, 0.5, 0.75, 1].
If nominal, create dummy variables.

Preprocessing Steps: Highly Skewed Predictors
Transform skewed variables before normalizing.
Log transformation, then normalize to the [0, 1] scale.
TANH transformation ("Symmetric" in XLMiner), then normalize to the [-1, 1] scale.
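A minimal sketch of these preprocessing steps in Python with pandas; the column names and values ("fat", "salt", "region", "income") are hypothetical, not the book's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "fat":    [0.2, 0.1, 0.4],
    "salt":   [0.9, 0.1, 0.5],
    "region": ["N", "S", "N"],        # nominal -> dummy variables
    "income": [30000, 250000, 45000]  # highly skewed -> log, then scale
})

# Log-transform the skewed predictor before normalizing
df["income"] = np.log(df["income"])

# Nominal categorical variable -> dummy variables
df = pd.get_dummies(df, columns=["region"])

# Scale each numeric column to [0, 1]: x_norm = (x - a) / (b - a)
num_cols = ["fat", "salt", "income"]
df[num_cols] = (df[num_cols] - df[num_cols].min()) / (df[num_cols].max() - df[num_cols].min())

print(df)
```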

XLMiner Output: Final Weights
Note: XLMiner uses two output nodes (P[1] and P[0]); the diagrams show just one output node (P[1]).

XLMiner: Final Classifications

Avoiding Overfitting
With sufficient iterations, a neural net can easily overfit the data.
To avoid overfitting: track the error on validation data, limit the number of iterations, and limit the complexity of the network.

User Inputs

Specify Network Architecture
Number of hidden layers: one hidden layer is the most popular choice.
Number of nodes in the hidden layer(s): more nodes capture complexity, but increase the chance of overfitting.
Number of output nodes: for classification, one node per class (in the binary case, one node can also be used); for numerical prediction, use one.

Network Architecture, cont.
"Learning rate" (ℓ): low values "downweight" the new information from errors at each iteration; this slows learning, but reduces the tendency to overfit to local structure.
"Momentum": high values keep weights changing in the same direction as the previous iteration; likewise, this helps avoid overfitting to local structure, but also slows learning.
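A minimal sketch of these architecture choices using scikit-learn's MLPClassifier as a stand-in for XLMiner; the data values and parameter settings are illustrative assumptions, not the book's results.

```python
from sklearn.neural_network import MLPClassifier

X = [[0.2, 0.9], [0.1, 0.1], [0.4, 0.5], [0.5, 0.4]]  # fat, salt (already scaled to 0-1)
y = [1, 0, 1, 0]                                      # consumer acceptance (like = 1)

net = MLPClassifier(
    hidden_layer_sizes=(3,),   # one hidden layer with three nodes, as in the example
    activation="logistic",     # g = logistic function
    solver="sgd",              # stochastic gradient descent (case updating)
    learning_rate_init=0.1,    # learning rate: low values slow learning, reduce overfitting
    momentum=0.9,              # keeps weight changes moving in the same direction
    max_iter=1000,             # limit on runs (one of the stopping criteria)
    random_state=1,
)
net.fit(X, y)
print(net.predict([[0.3, 0.8]]))
```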

Automation
Some software automates the optimal selection of input parameters.
XLMiner automates the number of hidden layers and the number of nodes.

Advantages
Good predictive ability.
Can capture complex relationships.
No need to specify a model.

Disadvantages
Considered a "black box" prediction machine, with no insight into the relationships between predictors and outcome.
No variable-selection mechanism, so you have to exercise care in selecting variables.
Heavy computational requirements if there are many variables (additional variables dramatically increase the number of weights to calculate; a quick count is sketched below).
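A quick count of the weights in a network with p input variables, a single hidden layer of h nodes, and m output nodes, counting bias values as weights (the cheese example is plugged in as an illustration):

\[
\#\text{weights} = (p+1)\,h + (h+1)\,m
\]

For the cheese example (p = 2 inputs, h = 3 hidden nodes, m = 2 output nodes in XLMiner): (2+1)·3 + (3+1)·2 = 17 weights and biases; each additional input variable adds h more weights.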

Summary
Neural networks can be used for classification and prediction.
They can capture a very flexible/complicated relationship between the outcome and a set of predictors.
The network "learns" and updates its model iteratively as more data are fed into it.
Major danger: overfitting.
Requires large amounts of data.
Good predictive performance, yet "black box" in nature.