Notes 20110331 CIS 6050 Neural Networks
From SnOwy - Ed's Wiki Notebook
Radial Basis Functions Continued
Getting the RBF Centres
- previous methods ...
- random
- start with all data and remove data until performance is affected
- orthogonal least squares -- start with no data and add data until performance does not improve
- since the hidden layer is just clustering, why don't we use a clustering method?
- SVM 1st layer training algorithms
- find a set of clusters which represent the distribution
- k-means clustering algorithm
- k-means provides k clusters, but we don't know what 'k' is to start
- minimizes the sum of squared distances between each point and its nearest centre
- alternative to k-means is to use an ANN which performs a similar function
- Kohonen SOM -- generates a set of prototype vectors which represent cluster centres
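To make the k-means idea concrete, here is a minimal NumPy sketch of the standard iteration (assign each point to its nearest centre, then move each centre to the mean of its points). The function name `kmeans_centres`, the random initialisation from the data, and the toy two-blob data set are assumptions for illustration, not something from the lecture; `k` still has to be chosen by hand.

```python
import numpy as np

def nearest_centre(X, centres):
    """Index of the closest centre (squared Euclidean distance) for each point."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans_centres(X, k, n_iter=100, seed=0):
    """Plain k-means; returns the k centres and the final point labels."""
    rng = np.random.default_rng(seed)
    # Assumed initialisation: k distinct data points picked at random.
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_centre(X, centres)
        new_centres = centres.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                new_centres[j] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break  # centres stopped moving
        centres = new_centres
    return centres, nearest_centre(X, centres)

# Toy data (hypothetical): two well-separated blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
               rng.normal(5.0, 0.1, size=(50, 2))])
centres, labels = kmeans_centres(X, k=2)
```

The returned `centres` would then serve as the RBF prototype vectors for the hidden layer.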
- Gaussian Mixture Models
- problem of density estimation
- basis functions are components of density model
- vectors are optimized by maximum likelihood
- addressing the problem as a purely statistical method
- the linear combination of the basis functions
- density is modelled using ...
- $p(x) = \sum_{j=1}^{M} P(j)\,\phi_j(x)$
- where P(j) is the prior probability of a data point having been generated from the jth component of the mixture
- $\phi_j(x)$ are the basis functions
- the likelihood function is ...
- $L = \prod_{n} p(x_n)$
- which is maximized with respect to P(j) (equivalently, the negative log-likelihood $-\ln L$ is minimized)
- and the parameters of $\phi_j(x)$
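The notes do not say which optimizer is used for the likelihood. One common choice is the EM algorithm, sketched below for spherical Gaussian basis functions. The spherical form, the function names, the starting values, and the toy data are all assumptions for illustration only.

```python
import numpy as np

def gaussian(X, mu, var):
    """Spherical Gaussian density phi_j(x) with mean mu and variance var."""
    d = X.shape[1]
    sq = ((X - mu) ** 2).sum(axis=1)
    return np.exp(-sq / (2.0 * var)) / (2.0 * np.pi * var) ** (d / 2.0)

def joint(X, priors, mus, variances):
    """Matrix of P(j) * phi_j(x_n), shape (N, M)."""
    return np.stack([P * gaussian(X, mu, v)
                     for P, mu, v in zip(priors, mus, variances)], axis=1)

def log_likelihood(X, priors, mus, variances):
    # p(x_n) = sum_j P(j) phi_j(x_n);  ln L = sum_n ln p(x_n)
    return np.log(joint(X, priors, mus, variances).sum(axis=1)).sum()

def em_step(X, priors, mus, variances):
    """One EM update of the priors P(j), the means and the variances."""
    d = X.shape[1]
    # E-step: responsibilities P(j | x_n)
    jt = joint(X, priors, mus, variances)
    resp = jt / jt.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibility-weighted data
    Nj = resp.sum(axis=0)
    new_mus = (resp.T @ X) / Nj[:, None]
    sq = ((X[:, None, :] - new_mus[None, :, :]) ** 2).sum(axis=2)
    new_vars = (resp * sq).sum(axis=0) / (d * Nj)
    return Nj / len(X), new_mus, new_vars

# Toy data (hypothetical): two blobs; M = 2 components.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
               rng.normal(3.0, 0.5, size=(100, 2))])
priors = np.array([0.5, 0.5])
mus = np.array([[0.5, 0.5], [2.5, 2.5]])
variances = np.array([1.0, 1.0])

lls = [log_likelihood(X, priors, mus, variances)]
for _ in range(30):
    priors, mus, variances = em_step(X, priors, mus, variances)
    lls.append(log_likelihood(X, priors, mus, variances))
```

Each EM step should leave the log-likelihood in `lls` no lower than before. The fitted `mus` can then be used as RBF centres and `variances` as a guide to the basis-function widths.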
RBFs and BP (multi-layer perceptron)
- both approximate arbitrary non-linear functional mappings of multidimensional space
- mappings are created through combining multiple functions
- hidden units (MLP) -- weighted sum of inputs transformed through a threshold function
- RBF -- distance to a prototype followed by a local transformation
- activation is actually a function of the similarity between the input and the prototype
- MLP: distributed representation -- many hidden units contribute to the system
- training process is highly non-linear
- there are problems with local solutions
- can have slow convergence
- typically many hidden units have significant activations for a given input
- can have multiple layers of weights
- single shared training method that determines all of the weight values
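- RBF: local representation -- for a given input only one (or a few) hidden units contribute to the solution
- non-linear training only in the first layer; the second layer is a linear problem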
- local solutions not a problem in the same manner
- faster convergence
- only one (or a few) hidden units have significant activation for a given input
- always has two layers of weights
- two-stage training with different training methods for each weight layer
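The two-stage RBF training can be sketched as follows. Stage 1 fixes the centres; here they are a random subset of the data, one of the simple methods listed earlier, though k-means or a mixture model would slot in the same way. Stage 2 solves for the output weights by linear least squares. The Gaussian basis function, the fixed `width`, the bias column, and the sine-fitting toy problem are assumptions for illustration.

```python
import numpy as np

def rbf_design(X, centres, width):
    """Hidden-layer activations: Gaussian of the distance to each prototype."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([Phi, np.ones((len(X), 1))])  # extra column for the bias

def fit_rbf(X, y, n_centres, width, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1 (no targets used): pick the centres from the data at random.
    centres = X[rng.choice(len(X), size=n_centres, replace=False)]
    # Stage 2 (supervised, linear): least-squares fit of the output weights.
    Phi = rbf_design(X, centres, width)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centres, w

def predict_rbf(X, centres, width, w):
    return rbf_design(X, centres, width) @ w

# Toy problem (hypothetical): fit sin(x) on [0, 2*pi].
X = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
y = np.sin(X[:, 0])
centres, w = fit_rbf(X, y, n_centres=15, width=1.0)
rmse = np.sqrt(np.mean((predict_rbf(X, centres, 1.0, w) - y) ** 2))
```

Because stage 2 is a linear problem it has a single global solution, which is why local solutions and slow convergence are less of a concern than in backpropagation training of an MLP.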