Improving Neural Network Performance

From SnOwy - Ed's Wiki Notebook


This piece covers a number of algorithms and features that I've used or intend to use, each designed to improve neural network performance in one way or another. It is a collection of the ideas I've come across in my studies, and I hope that analogies from them can help in other areas of study as well.

Items marked "novel" are things that I've discovered. Methods that aren't cited are general knowledge in the field.


Improving Network Convergence Probability

stub

Blindspot (Novel)

stub

Improving Network Function Accuracy

stub

Balancing

Balancing a training set consists of ensuring, as far as possible, an even distribution over the range of the training set (the target outputs). While an evenly distributed range matters most, an attempt should also be made to even out the domain (the inputs) where possible, so that an inductive learning machine has a broad enough collection of exemplars to work with. Where this isn’t possible, a stochastic treatment is better than no treatment (i.e. pick out several combinations randomly).

In the abstract, balancing reduces the probability that a neural network appears to work only because of a chance distribution of range elements. In an extreme case, one could test the robustness of a method by reversing the balance that naturally occurs in the training set.
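As a rough illustration of balancing the range, here is a minimal sketch that downsamples a labelled training set so that every label appears equally often. The function name `balance_by_label`, the `(features, label)` pair layout, and the choice of downsampling (rather than duplicating rare exemplars) are all my assumptions for the example; the random pick within each label stands in for the "stochastic treatment" mentioned above.

```python
import random
from collections import defaultdict


def balance_by_label(examples, seed=0):
    """Downsample so that every label appears equally often.

    `examples` is a list of (features, label) pairs (a hypothetical layout).
    """
    rng = random.Random(seed)

    # Group the exemplars by their label (the "range" of the training set).
    groups = defaultdict(list)
    for features, label in examples:
        groups[label].append((features, label))

    # The rarest label sets the per-label quota.
    quota = min(len(items) for items in groups.values())

    balanced = []
    for label in sorted(groups):
        # Random selection within a label: the stochastic treatment.
        balanced.extend(rng.sample(groups[label], quota))

    rng.shuffle(balanced)
    return balanced


# A 90/10 split becomes a 10/10 split.
data = [([i], 0) for i in range(90)] + [([i], 1) for i in range(10)]
balanced = balance_by_label(data)
```

Downsampling throws data away; duplicating the rare exemplars instead keeps everything but repeats some points, so which one fits depends on how much data there is to spare.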

Boosting

Boosting a dataset refers to a change in the training algorithm. Under normal circumstances, a neural network is trained with the same static sequence of exemplars every epoch; a boosted algorithm uses a dynamic treatment instead. In this treatment, the amount of exposure an exemplar gets during training is inversely proportional to the accuracy of the prediction made for it: an exemplar on which the network performs poorly is shown to the network more often.

In contrast with balancing, boosting increases system bias. Care must be taken in each epoch that the algorithm used does not overwhelm the system with data points that are known to be flaky (unreliable).
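The resampling idea described here can be sketched as follows. This is only an illustration of error-weighted exposure, not AdaBoost or any other published boosting algorithm. The names `exposure_weights` and `draw_epoch`, and the `floor` and `cap` parameters, are assumptions of mine; the cap is one simple way to keep flaky points from overwhelming an epoch.

```python
import random


def exposure_weights(errors, floor=0.05, cap=5.0):
    """Turn per-exemplar errors into sampling weights.

    Each weight is the error relative to the mean error, clamped to
    [floor, cap]. The floor keeps well-learned exemplars in circulation;
    the cap (an assumed safeguard) limits how often one bad point recurs.
    """
    mean_error = sum(errors) / len(errors) or 1.0  # avoid dividing by zero
    return [min(max(e / mean_error, floor), cap) for e in errors]


def draw_epoch(exemplars, errors, rng, size=None):
    """Sample one epoch's training sequence, favouring high-error exemplars."""
    size = len(exemplars) if size is None else size
    return rng.choices(exemplars, weights=exposure_weights(errors), k=size)


# Hypothetical errors from the previous epoch: "d" is predicted badly.
exemplars = ["a", "b", "c", "d"]
errors = [0.0, 0.1, 0.1, 10.0]
weights = exposure_weights(errors)
epoch = draw_epoch(exemplars, errors, random.Random(0), size=1000)
```

In a real training loop the errors would be recomputed after every epoch, so the sequence shown to the network keeps shifting toward whatever it currently gets wrong.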

Stochastic Gradient Descent

A minibatch is a subset of the training data. In stochastic gradient descent, the gradient for each weight update is estimated from a minibatch rather than from the entire training set, so a single epoch consists of many small updates.
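To make the minibatch idea concrete, here is a minimal sketch that fits a straight line by minibatch gradient descent instead of training a full network. The toy data, the function names `minibatches` and `sgd_fit_line`, and the hyperparameters are all assumptions chosen for the example.

```python
import random


def minibatches(data, batch_size, rng):
    """Yield shuffled, non-overlapping subsets that cover `data` once."""
    order = list(data)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


def sgd_fit_line(data, epochs=200, batch_size=4, lr=0.05, seed=0):
    """Fit y = w*x + b by minibatch SGD on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for batch in minibatches(data, batch_size, rng):
            # Gradient of the batch's mean squared error: a cheap
            # estimate of the gradient over the whole training set.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Noise-free toy data drawn from y = 3x + 1.
points = [(x / 10, 3.0 * (x / 10) + 1.0) for x in range(20)]
w, b = sgd_fit_line(points)
```

With 20 points and a batch size of 4, each epoch performs five weight updates rather than one.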

I first encountered the notion of minibatches when reading about restricted Boltzmann machines.

Special Architectures

Long Short-Term Memory Cells (LSTM)

stub, cite.

Different Training Algorithms

This is more of a summary of things that other people have tried. First, the gradient descent backpropagation (GDBP) algorithm isn't the only training method available for neural networks. Broadly, GDBP belongs to a class of algorithms that use the first derivative (the gradient of the error) to decide how to tune the weights to reduce error in the next epoch; other methods use second derivatives and innovations based on Newton's method.
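The difference between the two families can be seen on a one-dimensional toy problem. The sketch below assumes a made-up error surface E(w) = (w - 2)^2 + 1 with a single weight; the function names and the learning rate are mine, and a real network would have a gradient vector and a Hessian matrix instead of two scalars.

```python
def gradient_step(w, grad, lr):
    """First-order update: move against the gradient by a fixed rate."""
    return w - lr * grad(w)


def newton_step(w, grad, hess):
    """Second-order update: scale the gradient by the inverse curvature."""
    return w - grad(w) / hess(w)


def grad(w):
    """First derivative of the assumed error E(w) = (w - 2)**2 + 1."""
    return 2.0 * (w - 2.0)


def hess(w):
    """Second derivative of the same error; constant for a quadratic."""
    return 2.0


# Five first-order steps from w = 10 still leave a visible gap to w = 2.
w_gd = 10.0
for _ in range(5):
    w_gd = gradient_step(w_gd, grad, lr=0.1)

# One Newton step from the same start lands on the minimum, because
# the error surface is exactly quadratic.
w_newton = newton_step(10.0, grad, hess)
```

On a real error surface the curvature is neither constant nor cheap to compute, which is the gap that approximations such as BFGS and Levenberg–Marquardt try to fill.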

BFGS Algorithm (Broyden–Fletcher–Goldfarb–Shanno)

stub, cite.

LMA (Levenberg–Marquardt algorithm)

stub, cite.

Parallelism

stub

Odd-Even Pairing

stub
