Notes 20110317 CIS 6050 Neural Networks
From SnOwy - Ed's Wiki Notebook
Radial Basis Function (RBF)
- how do we smooth out a problem that is complicated and non-linear in a low-dimensional space?
- map it into a larger (higher-dimensional) but simpler space
- imagine stretching out a crumpled piece of paper back into a plane
- the data might be solvable by linear functions now
- a three layer network
- unlike a back-propagation network
- must be exactly three layers of nodes
- uses a statistical measure to fit a curve to the data
- measures the quality of the fit
- this is a classifier
- back propagation does not measure the quality of a fit -- stochastic approximation
- the hidden unit activation ...
- determined by distance between input vector and prototype
- two stages of training
- parameters for basis function (hidden units) determined using unsupervised learning
- using only input data
- loosely, we are just clustering the data
- calculate the second weight layer
- create linear mapping from hidden layer to output
- both steps are relatively fast
- rationale:
- two processing steps
- first is non-linear
- second is linear
- the rationale is attributed to Cover (1965)
- this work states that a pattern classification problem cast non-linearly into a high-dimensional space is more likely to be linearly separable than in a low-dimensional space
- the number of hidden units in an RBF is the dimensionality of the space into which we transform the input patterns
- this is why RBFs often have large numbers of hidden units
- the number of hidden units is also related to the ability of the network to approximate a smooth input→output mapping
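The two training stages above can be sketched as follows. This is only a minimal illustration under assumptions that are not in the notes: plain k-means stands in for "clustering the data", a single shared width `sigma` is used for all Gaussian basis functions, and the second stage is solved with ordinary least squares. The function names (`kmeans`, `design_matrix`, `train_rbf`, `predict`) are hypothetical.

```python
import numpy as np

def kmeans(X, M, n_iter=50, seed=0):
    """Stage 1 helper: plain k-means on the inputs only (no targets used)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=M, replace=False)].copy()
    for _ in range(n_iter):
        # squared distance from every point to every centre, shape (N, M)
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(M):
            if np.any(labels == j):          # leave empty clusters where they are
                centres[j] = X[labels == j].mean(axis=0)
    return centres

def design_matrix(X, centres, sigma):
    """Gaussian hidden-unit activations, shape (N, M)."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_rbf(X, T, M, sigma):
    centres = kmeans(X, M)                        # stage 1: unsupervised
    Phi = design_matrix(X, centres, sigma)
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)   # stage 2: linear mapping
    return centres, W

def predict(X, centres, sigma, W):
    return design_matrix(X, centres, sigma) @ W

# Toy usage: fit one period of a sine curve with 8 hidden units.
X = np.linspace(0.0, 1.0, 40)[:, None]
T = np.sin(2 * np.pi * X)
centres, W = train_rbf(X, T, M=8, sigma=0.15)
pred = predict(X, centres, 0.15, W)
```

Neither stage needs iterative gradient descent through the whole network, which is one way to read the note that both steps are relatively fast.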
Architecture
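Input => Hidden => Output
- the hidden layer is drawn with more nodes than the input and output layers
- first layer of weights (input to hidden): non-linear mapping
- second layer of weights (hidden to output): linear mapping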
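As a rough illustration of a non-linear mapping followed by a linear one, the sketch below runs the XOR problem through two Gaussian hidden units. The centres, the unit width (2σ² = 1) and the threshold 0.9 are hand-picked assumptions for this example and do not come from the notes.

```python
import numpy as np

# XOR is not linearly separable in the original 2-D input space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 0])

# Assumption: two hand-picked centres, Gaussian basis with 2*sigma^2 = 1.
centres = np.array([[0.0, 0.0], [1.0, 1.0]])

def hidden(X, centres):
    """Non-linear mapping: distance to each centre passed through a Gaussian."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2)

H = hidden(X, centres)   # shape (4, 2): the patterns in hidden-unit space

# Linear mapping: in hidden space a single straight line separates the classes.
# The inputs (0,0) and (1,1) give h1 + h2 of about 1.135; the other two about 0.736.
pred = (H.sum(axis=1) < 0.9).astype(int)
```

After the hidden layer, a single linear threshold is enough, even though no straight line separates the classes in the input space.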
- exact interpolation
- technique for performing interpolation of a set of data points
- exact interpolation means every input point is mapped exactly onto its target
- every input point must appear as part of the system
- there is no smoothing
- exact interpolation is not desired in an RBF
- we actually want to build a network that can smooth out the data and correctly model variance
- it's expensive to create a new basis function for each data point
- we would like to have fewer basis functions in total that will represent a smoothed curve
- with exact interpolation, there are as many basis functions as there are data points (one per point)
- the activation function is
- $h(x) = \sum_{n} w_n \Phi(\|x - x_n\|)$
- the sum is a linear combination of basis functions, with weights $w_n$
- $\Phi$ is the basis function
- $\|x - x_n\|$ is the distance between input vector $x$ and training pattern $x_n$
- $\Phi$ is often a Gaussian:
- $\Phi(x) = \exp\left(-\frac{x^2}{2\sigma^2}\right)$
- in this example, there is one global smoothing parameter $\sigma$
- for other data (e.g. noisy data), different smoothing parameters are used
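A minimal sketch of exact interpolation with the Gaussian basis above: one basis function is centred on each training point, and the weights are found by solving the square linear system Φw = t. The toy data, the value of `sigma`, and the function names `fit_exact` and `h` are assumptions made for illustration.

```python
import numpy as np

def gaussian(r, sigma):
    """Phi(r) = exp(-r^2 / (2 sigma^2)) with one global width sigma."""
    return np.exp(-r ** 2 / (2.0 * sigma ** 2))

def fit_exact(X, t, sigma):
    """One basis function per training point; solve Phi w = t exactly."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # (N, N) distances
    return np.linalg.solve(gaussian(r, sigma), t)

def h(x, X, w, sigma):
    """h(x) = sum_n w_n Phi(||x - x_n||) for each row of x."""
    r = np.linalg.norm(x[:, None, :] - X[None, :, :], axis=2)
    return gaussian(r, sigma) @ w

# Toy data (assumption): 8 samples of one period of a sine curve.
X = np.linspace(0.0, 1.0, 8)[:, None]
t = np.sin(2 * np.pi * X[:, 0])
w = fit_exact(X, t, sigma=0.1)
fitted = h(X, X, w, sigma=0.1)   # passes through every training target
```

Because the system has N equations and N weights, the fitted function reproduces every target, which is why there is no smoothing.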
RBFs
- exact interpolation is not desirable in an RBF
- noisy data produces oscillations in exact interpolation (EI) function (not desirable)
- generally better if noise is smoothed (averaged)
- also want to reduce the number of basis functions
- ideally fewer than the number of input patterns
- much fewer than N
- with exact interpolation (one basis function per point):
- points are interpolated
- the function oscillates between centres
- with 2 basis functions:
- points are not interpolated
- no oscillation between basis functions
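The effect of using fewer basis functions than data points can be sketched with a least-squares fit on noisy data. The noise level, the evenly spaced centres, and the values of `M` and `sigma` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, sigma = 30, 5, 0.2          # M basis functions, with M much smaller than N

# Noisy toy data (assumption): a sine curve plus Gaussian noise.
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

# Assumption: centres spread evenly rather than placed on the data points.
centres = np.linspace(0.0, 1.0, M)
Phi = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * sigma ** 2))  # (N, M)

# N equations but only M unknowns, so solve in the least-squares sense.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
fit = Phi @ w
residual = t - fit                # non-zero: the points are not interpolated
```

With only M weights the curve cannot chase each noisy target, so the noise is averaged out instead of being reproduced.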
Changes to exact interpolation that lead to an RBF
We started with exact interpolation -- let's add a few things to make it into an RBF.
- the number of basis functions M is much less than N (the number of input patterns)
- the centres of the basis functions are not constrained by the input vectors
- determining centres becomes part of the training
- instead of using a common smoothing width parameter σ...
- each basis function has its own σj
- determined during training
- some centres will have larger width than others (variable area)
- a bias parameter is included in the linear sum
- compensates for the difference between the average of the basis function activations and the average of the targets
- the equation for the RBF activation becomes
- $y_k(x) = \sum_{j=1}^{M} w_{kj} \Phi_j(x) + w_{k0}$
- $\Phi_j(x)$ is different for every basis function
- $w_{k0}$ is the bias
- $\Phi_j(x) = \exp\left(-\frac{\|x - \mu_j\|^2}{2\sigma_j^2}\right)$
- $x$ is the $d$-dimensional input vector with elements $x_i$
- $\mu_j$ is the vector determining the centre of the basis function $\Phi_j$
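The activation equation above can be written as a short forward pass. This is only a sketch: the function name `rbf_forward`, the array shapes, and the toy parameter values are assumptions chosen for illustration.

```python
import numpy as np

def rbf_forward(X, mu, sigma, W, w0):
    """y_k(x) = sum_j w_kj * Phi_j(x) + w_k0, with a width sigma_j per basis function.

    X: (N, d) inputs, mu: (M, d) centres, sigma: (M,) widths,
    W: (K, M) output weights, w0: (K,) biases. Returns (N, K).
    """
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # ||x - mu_j||^2
    Phi = np.exp(-d2 / (2.0 * sigma[None, :] ** 2))            # (N, M)
    return Phi @ W.T + w0

# Toy parameters (assumption): two centres with different widths, one output.
mu = np.array([[0.0, 0.0], [10.0, 10.0]])
sigma = np.array([1.0, 2.0])
W = np.array([[2.0, 3.0]])
w0 = np.array([0.5])

# At the first centre Phi_1 = 1 and Phi_2 is about exp(-25), so y is close to 2.5.
y = rbf_forward(np.array([[0.0, 0.0]]), mu, sigma, W, w0)
```

Each column of `Phi` uses its own centre and width, so some basis functions cover a larger area than others, and the bias is added after the linear sum.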