Notes 20110120 CIS 6050 Neural Networks
From SnOwy - Ed's Wiki Notebook
Administrivia
- Assignment 1 -- due soon
- Lit review -- due soon
Architectures
- k-NN -- the representations are actually transparent and human interpretable
- the stored examples act like an associative array from input patterns to their labels
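To show why the k-NN representation is transparent, here is a minimal Python sketch in which the "model" is nothing more than a list of stored (pattern, label) pairs. The Euclidean distance, the majority vote, the name `knn_classify`, and the toy data are assumptions for illustration; the notes do not specify them.

```python
import math
from collections import Counter


def knn_classify(memory, query, k=3):
    """Classify `query` by majority vote among its k nearest stored patterns.

    `memory` is a list of (pattern, label) pairs; the stored examples are
    the whole model, so they can be read directly.
    """
    # Sort the stored patterns by Euclidean distance to the query
    nearest = sorted(memory, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k closest patterns
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


memory = [
    ((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
    ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B"),
]
print(knn_classify(memory, (0.15, 0.1)))
print(knn_classify(memory, (0.95, 1.0)))
```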
Competitive learning
- Self organizing maps
- Adaptive resonance theory
- output nodes compete to become active
- usually only one node can remain active at a time
- each output is responsible for a specific group of patterns
- a normal feed-forward structure with the addition of lateral inhibitory connections
- interlayer connections are excitatory
- intralayer connections are inhibitory
- the winning node has the highest activation
- losing nodes have their activations set to zero
- learning occurs only on the weights attached to the winning node k
- $\Delta w_{kj} = \begin{cases} \eta\,(x_j - w_{kj}) & \text{if } k \text{ is the winner} \\ 0 & \text{if } k \text{ is a loser} \end{cases}$
- over time, the weight vector $\mathbf{w}_k$ moves toward the input pattern $\mathbf{x}$
- the weights can then be inspected directly: they literally describe the input patterns each node has learned
- unlike back-propagation networks, whose weights are hard to interpret
- this may not work well if the input vectors differ dramatically
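The winner-take-all rule above can be sketched in Python as follows. The notes do not say how a node's activation is computed, so this assumes it is the negative Euclidean distance between the node's weight vector and the input (the closest node wins); lateral inhibition is modelled simply by letting only the winner update. The function names, the initial weights, and the toy clusters are hypothetical.

```python
import math
import random


def competitive_step(weights, x, eta):
    """Present one input x; only the winning node's weights change."""
    # Assumption: activation of node k is -||x - w_k||, so the node whose
    # weight vector is closest to x has the highest activation and wins.
    activations = [-math.dist(x, w) for w in weights]
    winner = max(range(len(weights)), key=lambda k: activations[k])
    # Winner: delta w_kj = eta * (x_j - w_kj); losers: delta = 0
    w = weights[winner]
    for j in range(len(w)):
        w[j] += eta * (x[j] - w[j])
    return winner


def train_competitive(patterns, init_weights, eta=0.1, epochs=50, seed=0):
    rng = random.Random(seed)
    weights = [list(w) for w in init_weights]  # copy; caller's data untouched
    for _ in range(epochs):
        # Present the patterns in a fresh random order each epoch
        for x in rng.sample(patterns, len(patterns)):
            competitive_step(weights, x, eta)
    return weights


patterns = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # cluster near the origin
            (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]   # cluster near (1, 1)
weights = train_competitive(patterns, init_weights=[(0.2, 0.2), (0.8, 0.8)])
print([[round(v, 2) for v in w] for w in weights])
```

Each trained weight vector ends up inside the cluster its node wins, so printing `weights` shows the two cluster prototypes directly, which is the interpretability point made above.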
Boltzmann Learning
- recurrent structure
- binary nodes -- nodes are on and off
- each network state has an energy value
- $E = -\frac{1}{2}\sum_j \sum_{k \ne j} w_{kj}\, x_k x_j$
- $x_k$, $x_j$ -- node states (activations)
- $w_{kj}$ -- weight connecting nodes k and j
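- nodes are flipped at random
- changes are kept if they reduce E (a zero-temperature simplification; the full Boltzmann machine accepts a flip stochastically, with a probability that depends on the energy change and a temperature parameter)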
- the difference between desired and actual behaviour is used to adjust weights
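The energy function and the flip-and-keep rule can be sketched in Python. This assumes the usual sign convention $E = -\frac{1}{2}\sum_j \sum_{k \ne j} w_{kj} x_k x_j$ with symmetric weights, binary states in {0, 1}, and the deterministic "keep the flip only if E drops" rule (no temperature). It shows only how the states settle, not the weight-learning phase; `energy`, `settle`, and the 3-node weight matrix are illustrative.

```python
import random


def energy(weights, state):
    """E = -1/2 * sum_j sum_k w_kj * x_k * x_j, over j != k."""
    n = len(state)
    return -0.5 * sum(weights[k][j] * state[k] * state[j]
                      for j in range(n) for k in range(n) if j != k)


def settle(weights, state, n_flips=200, seed=0):
    """Flip randomly chosen nodes; keep a flip only if it lowers E."""
    rng = random.Random(seed)
    state = list(state)
    e = energy(weights, state)
    for _ in range(n_flips):
        k = rng.randrange(len(state))
        state[k] = 1 - state[k]          # binary node: on <-> off
        e_new = energy(weights, state)
        if e_new < e:
            e = e_new                    # keep the change
        else:
            state[k] = 1 - state[k]      # undo the flip
    return state, e


# Nodes 0 and 1 excite each other; node 2 inhibits both (symmetric, zero diagonal)
W = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]
print(settle(W, [1, 0, 0]))
print(settle(W, [0, 0, 1]))
```

Starting from `[0, 0, 1]` nothing changes, because no single flip lowers E; this kind of local minimum is what the stochastic, temperature-based acceptance in a real Boltzmann machine is meant to escape.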
Back Propagation
- at least three layers of nodes are needed
- i.e. two weight layers are needed
- input layer FA (F stands for field)
- hidden layer FB
- output layer FC
- adjacent layers are normally fully connected
- input patterns are presented to the network at FA, each paired with a desired output
- activations are created in FB
- outputs are created in FC
- weights are adjusted to associate input patterns with output patterns
- requires many iterations
Encoding Patterns into Weights
- initialization
- assign random values in [-1, +1] to all weights between FA and FB ($v_{hi}$) and between FB and FC ($w_{ij}$)
- each node in FB and FC also has a threshold value, $\Theta_i$ and $\Gamma_j$ respectively (also randomized)
- notation: the weight layers are named V and W; the thresholds are named Θ and Γ
- input pattern given to FA
- call it $A_k$
- calculate the FB activations
- $b_i = f\left(\sum_{h=1}^{n} a_h v_{hi} + \Theta_i\right)$
- $b_i$ -- activation of FB node i
- $a_h$ -- activation of input node h (the h-th component of $A_k$)
- $v_{hi}$ -- weights between FA and FB
- $\Theta_i$ -- threshold of FB node i
- f() is the sigmoid function
- it bounds the values to the range (0, 1)
- $f(x) = (1 + e^{-x})^{-1}$
- i.e. the logistic sigmoid function
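- calculate the output activation
- pass the FB activations through $w_{ij}$ to FC
- $c_j = f\left(\sum_{i=1}^{m} b_i w_{ij} + \Gamma_j\right)$
- this forward pass on its own is what is used for inference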
- learning starts
- calculate the error at the FC layer
- the difference between the observed output $c_j$ and the desired output $c_j^k$
- $d_j = c_j(1 - c_j)(c_j^k - c_j)$
- calculate the errors in the hidden layer
- $e_i = b_i(1 - b_i)\sum_{j=1}^{q} w_{ij} d_j$
- adjust the weights $w_{ij}$ between FB and FC
- $\Delta w_{ij} = \alpha\, b_i d_j$
- where hidden node i (activation $b_i$) connects through $w_{ij}$ to output node j (error $d_j$)
- α is the learning rate (learning constant), a small positive number
- adjust the weights $v_{hi}$ between FA and FB
- $\Delta v_{hi} = \beta\, a_h e_i$
- β is a learning constant (often equal to α)
- adjust the thresholds
- $\Delta\Theta_i = \beta e_i$
- $\Delta\Gamma_j = \alpha d_j$
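- repeat steps 2 to 8 (from presenting an input pattern through adjusting the thresholds) for each training pair until the error is acceptably low, or until training fails, in which case restart with new random weights
- for recall, do only steps 2 and 3 (the forward pass through FB and FC); this generates activation in FC without changing the weights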
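The whole procedure, including recall, can be sketched in plain Python. It follows the notes' notation (V, W, Θ, Γ, α, β, the logistic f) and their step order, so the hidden error $e_i$ is computed with the FB to FC weights before those weights are adjusted. The dictionary layout, the function names `init_net`, `forward` and `train_step`, the XOR data, the epoch limit, and the 0.01 stopping threshold are all assumptions made for illustration.

```python
import math
import random


def f(x):
    """Logistic sigmoid f(x) = (1 + e^-x)^-1; bounds values to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_net(n, m, q, seed=0):
    """Random values in [-1, +1] for V (FA->FB), W (FB->FC), Theta (FB), Gamma (FC)."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-1.0, 1.0)

    return {
        "V": [[rand() for _ in range(m)] for _ in range(n)],  # V[h][i]
        "W": [[rand() for _ in range(q)] for _ in range(m)],  # W[i][j]
        "Theta": [rand() for _ in range(m)],
        "Gamma": [rand() for _ in range(q)],
    }


def forward(net, a):
    """Recall: propagate input a through FB to FC without changing any weights."""
    V, W, theta, gamma = net["V"], net["W"], net["Theta"], net["Gamma"]
    b = [f(sum(a[h] * V[h][i] for h in range(len(a))) + theta[i])
         for i in range(len(theta))]
    c = [f(sum(b[i] * W[i][j] for i in range(len(b))) + gamma[j])
         for j in range(len(gamma))]
    return b, c


def train_step(net, a, target, alpha=0.5, beta=0.5):
    """One forward pass followed by one round of error calculation and adjustment."""
    V, W = net["V"], net["W"]
    b, c = forward(net, a)
    # Output error; c_j (1 - c_j) is the derivative of f at the output node
    d = [c[j] * (1 - c[j]) * (target[j] - c[j]) for j in range(len(c))]
    # Hidden error, computed with W before W is adjusted
    e = [b[i] * (1 - b[i]) * sum(W[i][j] * d[j] for j in range(len(d)))
         for i in range(len(b))]
    for i in range(len(b)):           # delta w_ij = alpha * b_i * d_j
        for j in range(len(d)):
            W[i][j] += alpha * b[i] * d[j]
    for h in range(len(a)):           # delta v_hi = beta * a_h * e_i
        for i in range(len(e)):
            V[h][i] += beta * a[h] * e[i]
    for i in range(len(e)):           # delta Theta_i = beta * e_i
        net["Theta"][i] += beta * e[i]
    for j in range(len(d)):           # delta Gamma_j = alpha * d_j
        net["Gamma"][j] += alpha * d[j]
    # Squared error of the output seen *before* this adjustment
    return sum((t - y) ** 2 for t, y in zip(target, c))


# Usage: XOR as a hypothetical training set
xor = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
net = init_net(n=2, m=3, q=1, seed=1)
for _ in range(5000):
    total_error = sum(train_step(net, a, t) for a, t in xor)
    if total_error < 0.01:  # illustrative stand-in for "arbitrarily low"
        break
for a, t in xor:
    print(a, t, round(forward(net, a)[1][0], 3))
```

From some random starts XOR settles into a poor local minimum; that is the "failure, whereupon you restart" case, handled here by trying another `seed`.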