Well, the convergence tests are well underway on SharcNet (the whale serial cluster) for the new datasets (two Aqueous Solubility, one Melting Point). I’m currently running them only for SMILES because of a problem with one of the Aqueous Solubility sets. I’ve decided to use a “ridiculously high” number of parameters (12 hidden units per hidden layer) and a “ridiculously low” learning rate and momentum (0.3 each).
A convergence test basically checks whether the NGN can learn the chosen data with the chosen parameters at all. For me, the test set and training set are identical in convergence testing, so the network is only asked to reproduce data it has already seen.
I’ll probably rerun this with a yet lower learning rate (learning rate = 0.15, momentum = 0.45). Why? Because I have a feeling that the large number of hidden units gives a greater chance of accidentally falling into a close region of the search space, while the minute learning parameters allow these accidents to be gently brushed into being.
I’m presently running all the regression data I’ve ever run in the past, this time with both classic normalization and rank-normalization. One of the Aqueous Solubility datasets came with poorly formed SMILES: they can’t be parsed by the grammars that I wrote, and they can’t be parsed by OpenBabel either, so conversion to InChI wasn’t possible in this round, let alone processing by my system. I’m going to presume they’re broken and skip those entries, which still leaves well over a thousand exemplars in that set. This set will have to be rerun with the correct data soon.
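To make the difference between the two normalizations concrete, here is a minimal sketch. It assumes “classic” means min-max scaling into [0, 1] and that rank-normalization replaces each value with its scaled rank (ties sharing the average rank); both are my guesses at a reasonable definition rather than a description of the actual NGN code, and the function names are made up for illustration.

```python
def minmax_normalize(values):
    """Scale values linearly into [0, 1] (assumed meaning of "classic")."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: nothing to scale, map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_normalize(values):
    """Replace each value by its rank scaled into [0, 1]; ties share the average rank."""
    n = len(values)
    if n == 0:
        return []
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of equal values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return [r / (n - 1) for r in ranks]


if __name__ == "__main__":
    toy_targets = [-3.0, -1.0, -2.0, 5.0]  # made-up values, not real data
    print(minmax_normalize(toy_targets))
    print(rank_normalize(toy_targets))
```

With an outlier like the last toy value, min-max scaling squeezes the other targets into a narrow band near zero, while the ranks stay evenly spaced. That is the main reason to try both.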
Convergence RMSE has been set to 1%, meaning that I have a stricter idea of what counts as a deviation from correct. When this is done, I want to plot the results as a “target on actual” residuals line plot to give me an idea of how these parameters worked on self-comparison.
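As a rough sketch of that check: the code below reads “1%” as an RMSE of 0.01 on targets normalized to [0, 1], which is an assumption on my part, and it reads the residuals plot as (output minus target) against target, sorted by target. The names `rmse`, `has_converged`, and `residual_series` are hypothetical helpers, not part of the NGN.

```python
import math


def rmse(targets, outputs):
    """Root-mean-square error between target values and network outputs."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and equal in length")
    squared = sum((t - o) ** 2 for t, o in zip(targets, outputs))
    return math.sqrt(squared / len(targets))


def has_converged(targets, outputs, threshold=0.01):
    """True when the RMSE is at or below the threshold (0.01 assumed to mean 1%)."""
    return rmse(targets, outputs) <= threshold


def residual_series(targets, outputs):
    """(target, output - target) pairs sorted by target, ready for a line plot."""
    return sorted((t, o - t) for t, o in zip(targets, outputs))


if __name__ == "__main__":
    toy_targets = [0.0, 0.5, 1.0]     # made-up normalized targets
    toy_outputs = [0.01, 0.49, 1.0]   # made-up network outputs
    print(rmse(toy_targets, toy_outputs))
    print(has_converged(toy_targets, toy_outputs))
    for target, residual in residual_series(toy_targets, toy_outputs):
        print(target, residual)
```

Because the training and test sets are identical here, a flat residual line near zero only says the network can memorize the data, not that it generalizes.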
Of course, everything here is subject to editing, as I figure out exactly how to implement and deploy the previously mentioned changes to how regression experiments are defined (for convergence, for boosting/balancing/verification).
Most immediate next steps:
- Select OK Aqueous SMILES and run.
- Rerun all with InChI-NGN.
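
The first step, selecting the parseable Aqueous Solubility SMILES, might look something like the sketch below. The `parse` argument stands for whatever parser is actually used (my grammars or OpenBabel); `toy_parse` is only a crude stand-in that checks parenthesis balance so the example runs on its own, and it is not a real SMILES validator.

```python
def split_parseable(records, parse):
    """Split (smiles, value) records into those that parse and those that do not.

    `parse` is any callable that returns None or raises on a bad string.
    """
    kept, skipped = [], []
    for smiles, value in records:
        try:
            ok = parse(smiles) is not None
        except Exception:
            ok = False
        (kept if ok else skipped).append((smiles, value))
    return kept, skipped


def toy_parse(smiles):
    """Stand-in parser: accepts non-empty strings with balanced parentheses only."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None
    return smiles if smiles and depth == 0 else None


if __name__ == "__main__":
    toy_records = [("CCO", 0.0), ("CC(C", 1.0)]  # placeholder values, not real data
    kept, skipped = split_parseable(toy_records, toy_parse)
    print(len(kept), "kept;", len(skipped), "skipped")
```

Keeping the skipped entries in their own list makes it easy to rerun them once the corrected data turns up.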