Ed's Big Plans

Computing for Science and Awesome

Archive for the ‘SharcNet’ tag


without comments

Brief: I forgot all about sqkillall.py! It’s a convenience script for killing all of the SharcNet jobs belonging to you! (More about it; Source code).

Eddie Ma

July 28th, 2009 at 1:02 pm

Meeting with Chris

without comments

Brief: Met with Chris last week. Chris finished with the convergence tests and some cross validation sets on his descriptors and recommended his own design for 80/20 prediction tests… Meanwhile, I’ve updated the InChI grammar used for the NGN to work with the new data, and have set up experiments to run convergence tests using the SMILES-NGN and InChI-NGN on the eight possible QSAR datasets on SharcNet (16 processes total)… Next on the list– create a script to evaluate his preliminary cross validation experiments (based on Neural Network predicted vs. target values) and provide instructions for running the convergence tests with my NGN software… Will need to pull up an old nugget.py to wrap the convergence test (current one doesn’t halt and always runs 100 trials).

Soon: Port everything to Ubuntu Linux so that we can maintain compatibility without further porting care of Sun Virtualbox VM… Meeting again tomorrow…

Convergence Tests

without comments

Well, the convergence tests are well underway running on SharcNet (whale serial cluster) for the new datasets (2 Aqueous Solubility, 1 Melting Point)– I’m currently only running them for SMILES due to a problem with one of the Aqueous Solubility sets. I’ve decided to use a “ridiculously high” number of parameters (12 hidden units per hidden layer), and a “ridiculously low” learning rate / momentum (0.3 each).

A convergence test is basically used to see if the data and parameters chosen can be operated on by the NGN. For me, the test set and training sets are identical in convergence testing.

I’ll probably rerun this with yet lower learning (train = 0.15 / momentum = 0.45). Why? Because I have a feeling that the number of hidden units actually enables a greater chance of accidentally falling into the close search space, while the minute math arguments allow these accidents to be gently brushed into being.

I’m presently running all the regression data I’ve ever run in past– this time in both classic normalization and rank-normalization. Finally, one of the Aqueous Solubility datasets came with poorly formed SMILES– they can’t be parsed by the grammars that I made up, and they can’t be parsed by OpenBabel so conversion to InChI wasn’t possible in this round, let alone operation by my system. I’m going to presume they’re broken and skip those entries which still leaves well over a thousand exemplars in that set — this set will have to be rerun with the correct data soon.

Convergence rmse has been set to 1%, meaning that I have a stricter idea of what I think is a deviation from correct. I want to plot the results as a “target on actual” residuals line plot when this is done to give me an idea of how these parameters worked on self-comparison.

Of course, everything here is subject to editing, as I figure exactly how to implement and deploy the previously mentioned changes to how regression experiments are defined (for convergence, for boosting/balancing/verification).

Most immediate next steps:

  • Select OK Aqueous SMILES and run.
  • Rerun all with InChI-NGN

Eddie Ma

May 31st, 2009 at 9:52 am