NGN Generator Instructions
From SnOwy - Ed's Wiki Notebook
This page documents how to operate the NGN generator script. It applies to NGN Generator versions 1.01 to 3.00.
There may be inaccuracies or omissions, but I've tried to be as comprehensive as possible.
Building the NGN Source
Run main.py. Ignore its 'usage' line, which is misleading and incorrect.
The first argument is the location of the grammar file. In general, that would be:
grammars/something.bnf
Where 'something' is the name of the file.
The second argument is the directory in which you want the built source to be placed.
So, here's an example that builds the source for the default 'smiles' executable and creates a directory called 'test' where the sources will be compiled.
main.py grammars/smiles.bnf test
Compiling the NGN Source
To compile the NGN source, go into the directory that you made in the previous step and run make with no arguments.
Continuing from the previous example, that would be:
cd test
make
The output executable has the same name as the directory. In this example, this produces the binary:
test/test
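The two steps above could be wrapped in a small script. This is only a sketch: the helper names and the python parameter (the interpreter used to run main.py) are my own assumptions, not part of the NGN generator.

```python
import subprocess
from pathlib import Path


def built_binary_path(build_dir):
    """Return the path of the executable that make leaves in build_dir."""
    build_dir = Path(build_dir)
    # The executable has the same name as the directory, e.g. test/test.
    return build_dir / build_dir.name


def build_and_compile(grammar, build_dir, generator="main.py", python="python"):
    """Generate the NGN source from a grammar file, then compile it."""
    # Build step: main.py <grammar file> <output directory>
    # (the interpreter name is an assumption; use one the generator runs on)
    subprocess.run([python, generator, grammar, str(build_dir)], check=True)
    # Compile step: run make with no arguments inside the new directory.
    subprocess.run(["make"], cwd=str(build_dir), check=True)
    return built_binary_path(build_dir)
```

For the example above, build_and_compile("grammars/smiles.bnf", "test") would run both steps and return the path test/test.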
SharcNet and Python 2.3 -- important notes
SharcNet has Python 2.3 installed on every machine on every node, which is too old to run the generator. This means you cannot perform the Build step above on SharcNet. A workaround is to run Build locally, then run Compile on SharcNet.
What now?
So now that you have a working binary, it would be useful to know what arguments it takes and what format it expects as input...
Note: There's some really messy code here because the argument handling is admittedly terrible, but if you follow these instructions, I'm sure you'll be OK.
Different modes to run the NGN executable
A generated NGN binary can run in two categories of modes: training mode and recognition mode. In training mode, a suite of input vectors and a suite of output vectors are specified, and a weight file is produced that encodes the transformations needed to map one suite onto the other. In recognition mode, a previously saved weight file is loaded and the network's responses to a suite of input vectors are reported.
Running in any mode...
Because of the ad hoc nature of the code, the command must take this form in every mode:
echo "0" | ./binary <args>
Where "0" is any integer to be used as a srand or srandom seed modifier, and binary is any executable generated by the NGN generator. The <args> item specifies runtime parameters.
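If you drive the binary from a script rather than a shell, the same form can be reproduced by writing the seed to the binary's stdin. A minimal sketch, assuming Python 3.7 or later; the function names are hypothetical.

```python
import subprocess


def ngn_invocation(binary, args, seed=0):
    """Return (argv, stdin_text) equivalent to: echo "<seed>" | ./binary <args>"""
    argv = [binary] + list(args)
    # echo writes the seed followed by a newline, so do the same here.
    stdin_text = "%d\n" % seed
    return argv, stdin_text


def run_ngn(binary, args, seed=0):
    """Run the NGN binary with the seed on stdin and capture its output."""
    argv, stdin_text = ngn_invocation(binary, args, seed)
    return subprocess.run(argv, input=stdin_text, capture_output=True,
                          text=True, check=True)
```

For example, run_ngn("./binary", ["-h"]).stdout would hold whatever the binary prints in help mode.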
Running the NGN in Help Mode
Running the NGN in help mode just dumps out the possible ways to run the NGN...
echo "0" | ./binary -h
... and yields the following output to stdout...
Seed=625129656, ==== help ====
This executable can run in one of the following modes:
---
- for completely uncached training (very disk access heavy),
use "-t" "-loud"?* <inputFile> <inResponse> <outWeights**>
- for training mode with single file read,
use "-tdiskonce" "-loud"? <inputFile> <inResponse> <outWeights**>
- for training mode with single parse (very memory heavy),
use "-tparseonce" "-loud"? <inputFile> <inResponse> <outWeights**>
- to load a weights file and continue training, use "-l" (ell)
followed by "-loud"? <inputFile> <inResponse> <inWeights> <outWeights**>
- loading defaults to uncached training,
use "-ldiskonce" or "-lparseonce" respectively.
- for loud recognition mode,
use "-r -loud" <inputFile> <outResponse***> <inWeights>
the network responses are reported to console.
- for quiet recognition mode,
use "-r" <inputFile> <outResponse***> <inWeights> <inResponse>
the network responses are reported to outResponse. a final RMSE is reported to console.
- for interactive mode, use "-i" <inWeights>
- to display the NGN structure and internals, use "-idebug" <inWeights>
- for a light weight interactive mode without a prompt suitable for piping,
use "-ipipe" <inWeights>.
---
Notes:
---
* if "-loud" is specified, the progress of training is reported to console.
if it is not specified, only the final Epoch, RMSE and Convergence
are reported to console.
** if outWeights exists, it is overwritten.
*** if outResponse exists, it is overwritten.
- "inputFile" is the name of the file containing strings to parse (read-only).
- "inResponse" is the name of the file containing response values (read-only).
- "outResponse" is the name of a file to write response values into (overwrite).
- "inWeights" is the name of a saved weights file (read-only).
- "outWeights" is the name of a saved weights file (overwrite).
Running the NGN in Training Mode
I'll indicate each of the training modes, and explain each; I'll then indicate which mode is most useful...
Uncached Training
- for completely uncached training (very disk access heavy), use "-t" "-loud"?* <inputFile> <inResponse> <outWeights**>
Use Uncached Training on very large datasets. This is the most memory-economical mode, but it thrashes the hard disk because each input and output vector pair is read again every time the NGN considers it. The optional -loud switch makes the NGN print periodic progress statements to console; it is not recommended when a wrapping script manages the binary automatically.
<inputFile> is a file containing \n-delimited input vectors.
<inResponse> is a file containing \n-delimited output vectors.
<outWeights**> is the name of the file to write the weights to.
Troubleshooting:
Q: What do I do if the NGN binary exits with an error saying that there aren't the same number of input and output vectors?
A: Check to make sure that the number of input vectors is the same as the number of output vectors (count the lines in those files).
A: Check to make sure that the number of output vector units specified in the grammar is the same as the number of values in the inResponse (output vector) file.
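The first check can be scripted. This sketch only counts lines; it assumes blank lines do not count as vectors, and it does not cover the second check, because the separator between values on a line isn't documented here. The function names are my own.

```python
def count_vectors(path):
    """Count the \\n-delimited vectors in a file (blank lines are skipped)."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def vector_counts_match(input_file, response_file):
    """Return (input count, response count, whether the two counts agree)."""
    n_inputs = count_vectors(input_file)
    n_responses = count_vectors(response_file)
    return n_inputs, n_responses, n_inputs == n_responses
```

If the third value is False, the two files disagree and the counts show by how much.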
RECOMMENDED MODE: Disk Once Training
- for training mode with single file read, use "-tdiskonce" "-loud"? <inputFile> <inResponse> <outWeights**>
Everything for this mode is identical to Uncached Training, but the internal behaviour is different: the input and output vectors are read from disk once and cached as two matched arrays of character strings.
Parse Once Training
- for training mode with single parse (very memory heavy), use "-tparseonce" "-loud"? <inputFile> <inResponse> <outWeights**>
Everything is again the same as in Uncached Training, but the internal behaviour is different: the input and output vectors are not cached; instead, the parsed NGN structures themselves are. This mode consumes the most memory, and it has not yet been determined how much of that memory goes to swap on a particular system. Even so, this is the fastest way to run the NGN software, operating approximately five times faster given sufficient system memory.
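Since the three training modes differ only in their first flag, a wrapping script can build the argument list from one helper. The flags and argument order are taken from the help output above; the helper and the mode names used as dictionary keys are hypothetical.

```python
# Flags as listed in the help output; the dictionary keys are my own labels.
TRAINING_FLAGS = {
    "uncached": "-t",
    "diskonce": "-tdiskonce",
    "parseonce": "-tparseonce",
}


def training_args(mode, input_file, in_response, out_weights, loud=False):
    """Build the <args> part of a training run for the given mode."""
    if mode not in TRAINING_FLAGS:
        raise ValueError("unknown training mode: %r" % (mode,))
    args = [TRAINING_FLAGS[mode]]
    if loud:
        # Optional: report training progress to console.
        args.append("-loud")
    # Note: out_weights is overwritten if it already exists.
    args += [input_file, in_response, out_weights]
    return args
```

The resulting list is what follows the binary name on the command line, still with the seed piped in on stdin.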
Load and continue Uncached Training
- to load a weights file and continue training, use "-l" (ell) followed by "-loud"? <inputFile> <inResponse> <inWeights> <outWeights**>
No text yet.
Load and continue Cached Training (Disk Once OR Parse Once)
- loading defaults to uncached training, use "-ldiskonce" or "-lparseonce" respectively.
No text yet.
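Going only by the help output, the load-and-continue argument list could be built as below. This is a sketch: the helper is hypothetical, and I am assuming that "-ldiskonce" and "-lparseonce" take the same arguments in the same order as "-l", which the help text suggests but does not state outright.

```python
# Flags as listed in the help output; the dictionary keys are my own labels.
LOAD_FLAGS = {
    "uncached": "-l",
    "diskonce": "-ldiskonce",
    "parseonce": "-lparseonce",
}


def continue_training_args(mode, input_file, in_response, in_weights,
                           out_weights, loud=False):
    """Build the <args> part of a load-and-continue training run."""
    if mode not in LOAD_FLAGS:
        raise ValueError("unknown training mode: %r" % (mode,))
    args = [LOAD_FLAGS[mode]]
    if loud:
        args.append("-loud")
    # in_weights is read-only; out_weights is overwritten if it exists.
    args += [input_file, in_response, in_weights, out_weights]
    return args
```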
Running the NGN in Recognition or Testing Mode
I'll give the recognition modes the same treatment here...
Loud Recognition Mode
- for loud recognition mode, use "-r -loud" <inputFile> <outResponse***> <inWeights>
the network responses are reported to console.
Not recommended, no text yet.
RECOMMENDED: Quiet Recognition Mode
- for quiet recognition mode, use "-r" <inputFile> <outResponse***> <inWeights> <inResponse>
the network responses are reported to outResponse. a final RMSE is reported to console.
<inputFile> is the file that contains the \n-delimited input strings to be tested.
<outResponse***> is the name of the file to write produced \n-delimited output vectors.
<inWeights> is a file of trained weights created during a previous training step.
<inResponse> is the file that contains the \n-delimited expected output vectors to be compared with the outputs the network produces. It is only used for calculating the RMSE. If you don't have one available, just specify a \n-delimited file filled with 0.0 (the reported RMSE is then meaningless).
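When no expected outputs are available, the placeholder file can be generated from the input file. A sketch under two assumptions of mine: one 0.0 value per non-blank input line is enough (that is, the grammar specifies a single output unit), and the helper names are hypothetical.

```python
def write_placeholder_responses(input_file, placeholder_file, value="0.0"):
    """Write one placeholder value per non-blank input line; return the count."""
    with open(input_file) as src:
        count = sum(1 for line in src if line.strip())
    with open(placeholder_file, "w") as dst:
        for _ in range(count):
            dst.write(value + "\n")
    return count


def quiet_recognition_args(input_file, out_response, in_weights, in_response):
    """Build the <args> part of a quiet recognition run."""
    # out_response is overwritten if it already exists.
    return ["-r", input_file, out_response, in_weights, in_response]
```

The placeholder file is then passed as the last argument, in the <inResponse> position.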
Running the NGN in Piped Recognition Mode
Ditto...
- for interactive mode, use "-i" <inWeights>
- to display the NGN structure and internals, use "-idebug" <inWeights>
- for a light weight interactive mode without a prompt suitable for piping, use "-ipipe" <inWeights>.
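For scripted use of "-ipipe", something like the following might work. It is an untested sketch resting on an assumption of mine: that the binary reads the seed from stdin first, as in every other mode, and then reads one input string per line. The helper name and the example strings are made up.

```python
def ipipe_invocation(binary, in_weights, strings, seed=0):
    """Return (argv, stdin_text) for feeding input strings to -ipipe mode."""
    argv = [binary, "-ipipe", in_weights]
    # Assumption: the seed line comes first, then one input string per line.
    stdin_text = "%d\n" % seed + "".join(s + "\n" for s in strings)
    return argv, stdin_text
```

For example, ipipe_invocation("./test", "weights.out", ["CCO"]) gives the argument list and the text to write to the binary's stdin.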