Notes 20110113 CIS 6050 Neural Networks
From SnOwy - Ed's Wiki Notebook
Introduction to Neural Networks
- overview -- what are they used for?
- classification
- clustering
- modelling real neurons (not in this class -- connectionism)
- non-linear problems
Clustering
- input data x → ANN → a map
- a neural network builds a map so that input data is clustered -- a topologically consistent formation
- similar things are placed closer together, so we can tell when things are different
Classification
- associates a label or a name with an input
- input data x → ANN → output label
- a very simple example: inputs are temperature in °C → ANN → state of water
- classifying sequences
- an interesting example: < 45, 46, 50, 53 > → ANN → increasing
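The temperature example can be sketched with two hand-set threshold units. This is only an illustration of the input → label mapping, not a trained network: the weights, the biases (transition points at 0 °C and 100 °C, assuming standard pressure) and the names `classify_water`, `BIASES` and `LABELS` are all assumptions made for this sketch.

```python
def step(x):
    """Threshold unit: outputs 1 when its net input is >= 0, else 0."""
    return 1 if x >= 0 else 0

# Hypothetical hand-set parameters: every unit has weight 1.0 and a bias
# equal to minus a transition temperature (melting at 0, boiling at 100).
WEIGHT = 1.0
BIASES = [0.0, -100.0]
LABELS = ["solid", "liquid", "gas"]

def classify_water(temp_c):
    """Map a temperature in degrees C to a state-of-water label."""
    # Count how many threshold units fire, then use the count as an index.
    fired = sum(step(WEIGHT * temp_c + bias) for bias in BIASES)
    return LABELS[fired]

for t in (-5, 25, 120):
    print(t, "->", classify_water(t))
```

Because the step fires at exactly zero, this sketch labels 0 °C as liquid and 100 °C as gas; that boundary choice is arbitrary.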
Components of ANN
- architecture: weights and processing units
- rule to ...
- set the weight values
- store and retrieve information
Weights (connections, synapses)
- connect two processing units
- change values during training
- often a two-dimensional array
Processing Elements (nodes, units, PE's, neurons)
- receive inputs from the weights
- often a one-dimensional array
A single node
- and associated weights
- Inputs < I1, I2, I3 > → < Wk1, Wk2, Wk3 > → node (with bias w) → vk → f(vk) → ok (output)
- the inputs Ii and weights
- the node with bias w
- the net activation, transfer function and output ok
The Inputs and Weights
- I_1 w_k1, I_2 w_k2, ... = IW
- where I is a 1D array, and W is a 2D array
- in some cases, the calculation is a branching operation: choose the maximum of the input and weight
The Processing Node
- the products in IW are summed to give the net activation v_k
- v_k = Σ_i I_i w_ki
- bias value is also added
- the bias is normally a threshold value
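The weighted sum plus bias described above might look like the following in code. This is a minimal sketch using plain Python lists; the function names, the example numbers, and the convention that row k of the 2D weight array holds the weights of node k are assumptions for illustration.

```python
def net_activation(inputs, weights, bias=0.0):
    """Net activation of one node: v_k = sum_i I_i * w_ki + bias."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(i * w for i, w in zip(inputs, weights)) + bias

def layer_net_activations(inputs, weight_rows, biases):
    """Net activations for a whole layer; weight_rows[k] belongs to node k."""
    return [net_activation(inputs, row, b)
            for row, b in zip(weight_rows, biases)]

# Arbitrary example values: three inputs feeding two nodes.
example_inputs = [1.0, 2.0, 3.0]
example_weights = [[0.5, -1.0, 2.0],
                   [1.0, 1.0, 1.0]]
example_biases = [0.5, -6.0]

print(layer_net_activations(example_inputs, example_weights, example_biases))
# prints [5.0, 0.0]
```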
The Activation Function (threshold function)
- scales the net activation to something useable by limiting the range
- without this, the output magnitude could be large
- normally bounded by [0, 1] or [-1, 1]
- generally non-linear
- o_k = f(Σ_i I_i w_ki)
Typical Activation Functions
Threshold Function (step, bipolar)
- f(x) = 0j for x < 0
- f(x) = 1j for x ≥ 0
- where j is a real number
Piecewise Linear (non-linear ramp)
- f(x) = 0j for x < -1i
- f(x) = x for -1i ≤ x ≤ 1i
- f(x) = 1j for x > 1i
- where i, j are real numbers
- a simplification of the sigmoid function
Sigmoid Function (e.g. logistic, e.g. tanh)
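- S-shaped, bounded, monotonic, non-decreasing
- monotonic stipulates that the function never reverses direction as the input increases
- non-decreasing stipulates that the output never falls as the input rises, although the same range value may repeat (logistic and tanh are in fact strictly increasing)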
- logistic: (0, 1) -- S(x) = (1 + e^(-x))^(-1)
- tanh: (-1, 1) -- S(x) = tanh(x)
- not all networks use a non-linear activation function -- some use sums only
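The activation functions listed above can be compared side by side with a short sketch. The function names are chosen for this example. The `ramp` here is an assumed variant that clamps x to [-1, 1] so that the function is continuous; it is not a literal transcription of the piecewise definition in these notes.

```python
import math

def step(x):
    """Threshold function: 0 below zero, 1 at or above zero."""
    return 1.0 if x >= 0 else 0.0

def ramp(x, lo=-1.0, hi=1.0):
    """Piecewise linear: identity between lo and hi, saturated outside.
    Assumption: symmetric clamp, chosen so the function is continuous."""
    return max(lo, min(hi, x))

def logistic(x):
    """Logistic sigmoid, S(x) = 1 / (1 + e^(-x)), range (0, 1).
    Split on the sign of x so math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# However large the net activation, every output stays in a bounded range.
for v in (-100.0, -1.0, 0.0, 1.0, 100.0):
    print(v, step(v), ramp(v), logistic(v), math.tanh(v))
```

In floating point the logistic and tanh outputs can round to exactly 0, 1 or -1 for extreme inputs, even though the mathematical ranges are open intervals.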
Node Organization
- organized in layers
- input layer -- receives signals from the environment
- output layer -- sends signals or results out to the environment
- hidden layer -- receives signals from a neural layer, sends signals to a neural layer; no direct connection with environment
- generally, all processing is completed in one layer before the results are passed to the next layer
- less often, nodes are considered as a bunch -- i.e. not in layers
- a wild tangle of nodes connected to one another
- the Boltzmann machine
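For the layered case, the idea that one layer finishes processing before its results are passed on can be sketched as a loop over layers. This is a minimal sketch, assuming logistic activations everywhere and an input layer that simply passes its signals through; the names `layer_forward` and `network_forward` and all weight values are made up for the example.

```python
import math

def logistic(x):
    """Logistic sigmoid; adequate for the moderate inputs used here."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weight_rows, biases):
    """Compute the output of every node in one layer.
    weight_rows[k] holds the weights into node k."""
    return [
        logistic(sum(i * w for i, w in zip(inputs, row)) + b)
        for row, b in zip(weight_rows, biases)
    ]

def network_forward(inputs, layers):
    """layers is a list of (weight_rows, biases) pairs, input side first.
    Each layer is fully evaluated before the next one sees its results."""
    signal = inputs
    for weight_rows, biases in layers:
        signal = layer_forward(signal, weight_rows, biases)
    return signal

# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
hidden_layer = ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0])
output_layer = ([[1.0, -1.0]], [0.0])

print(network_forward([1.0, 1.0], [hidden_layer, output_layer]))
```

A non-layered network such as the Boltzmann machine would not fit this loop, since its nodes have no fixed evaluation order.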
Setting the Weights
- the credit assignment problem
- a difficult problem
Interconnection Schemes
- the patterns in which nodes are connected by weights
- intrafield (lateral) connections
- between nodes of the same layer
- often just called lateral
- interfield connections
- between nodes of different layers
- often just called connections (default)
- can be one- or two-directional (can have different weights or two sets of weights)
- completely connected: when all nodes in one layer are connected to all nodes in the next layer (bipartite)
- recurrent connections
- connects a node to itself
- intrafield (lateral) connections
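One way to make the connection types above concrete is to identify each node by a (layer, index) pair and classify a connection by its two endpoints. This is a rough sketch; the representation, the function names and the labels returned are assumptions, not a standard API.

```python
from itertools import product

def completely_connected(n_from, n_to):
    """All (source index, destination index) pairs between two layers,
    i.e. every node in one layer connected to every node in the next."""
    return list(product(range(n_from), range(n_to)))

def connection_kind(src, dst):
    """Classify a connection between nodes given as (layer, index) pairs."""
    if src == dst:
        return "recurrent (self)"      # a node connected to itself
    if src[0] == dst[0]:
        return "intrafield (lateral)"  # different nodes, same layer
    return "interfield"                # nodes in different layers

# A 3-node layer completely connected to a 2-node layer needs 3 * 2 weights,
# which is why the weights are often stored as a two-dimensional array.
print(len(completely_connected(3, 2)))
print(connection_kind((1, 0), (1, 0)))
print(connection_kind((1, 0), (1, 2)))
print(connection_kind((0, 1), (1, 0)))
```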