Notes 20110125 CIS 6050 Neural Networks
From SnOwy - Ed's Wiki Notebook
Contents |
Why Back Propagation
- memorization vs. generalization
- storing information is simple
- generalization is difficult -- extrapolation
- a memory does not generalize
- networks can memorize the training data
- this is not desirable (does not require a neural network)
- error correction algorithm -- desired output vs observed output; touches weights
- error propagates backwards through the network
- forward pass -- testing pass
- backward pass -- training
Types of Generalization
- linear interpolation
- non-linear interpolation -- spline-based interpolation
Data Partition
- training data
- testing data measures ability to generalize
- testing set ≤ training set
- testing set error > training set error
The Momentum Term
- a
Characteristics of BP
- learns offline in discrete time
- is not capable of simultaneously learning and training at the same time
- hetero-associative -- learns pattern pairs (pairs of vectors -- inputs and outputs)
- stores arbitrary analog spatial patterns (can also learn binary)
- called a multi-layer perceptron
Convergence -- finding a solution
- BP not guaranteed to find the global error minimum
- will find a local minimum
- can be slow to converge -- uses only local information (one datapoint)
- limited view of the problem surface
- must make many small steps using all training data
- if we knew the shape of the curve, we could solve it in one big step -- but we don't
- small steps take a long time
- large steps tend to oscillate -- too large to move down the error surface
- factors affecting convergence
- number of nodes in hidden layer
- size of learning rate
- amount of data
- complexity of the data
- noise in the data
Other things about a backprop network
A Theoretical Proof
- there's a proof that shows a three-layer BP ANN can approximate all differentiable continuous functions
Encoding
- use encodings (input and output patterns) that don't conceal the meaning of the data
- don't combine features into a single input
- two inputs with the ranges (1) [1,5]; (2) [0.1,0.9] -- don't encode these as a single value
- don't combine -- two inputs as (1) [1,5]; (2) [0.1-0.9]
- don't binary encode unless we're actually working with binary patterns
Credit Assignment Problem (blame assignment)
- find weight values for a non-linear network
- simplest non-linear problem -- XOR