Convergences Detected!
Good News
Chris’ project has come back to the forefront. After I defend my thesis on Wednesday, it will certainly have all of my attention.
We will still meet on Monday, though, to discuss what can be done in the interim.
Convergence Tests Went Fine
We decided that it would be good for Chris to run a few convergence tests on the datasets he put together, across each of the available descriptor sets. So far, many have converged, which suggests it is reasonable to proceed. I have two concerns. First, do we want to melt together only the converged descriptors, or all of the descriptors regardless of convergence? Second, if we don’t melt them now, can we do it after the fact and argue that neural network convergence is a good indicator of which descriptors are correlated with the results we care about?
Melting Descriptors
To clarify, by “melting” I mean concatenating real-valued vectors: we splice a few linear arrays of numbers together into a new, longer array that is still fixed-length.
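A minimal sketch of “melting” in plain Python, assuming each descriptor set is stored as a list of per-molecule vectors with the molecules in the same order in every set. The function name `melt` and the toy descriptor values are hypothetical. It also rejects ragged input, since the point is that the melted vector stays fixed-length.

```python
from typing import Sequence


def melt(*blocks: Sequence[Sequence[float]]) -> list[list[float]]:
    """Concatenate ("melt") descriptor blocks molecule by molecule.

    Each block is a list of per-molecule descriptor vectors, and row k of
    every block must describe the same molecule. Each output row is the
    concatenation of that molecule's vectors, so it is still fixed-length.
    """
    # Every block must cover the same number of molecules.
    n_molecules = {len(block) for block in blocks}
    if len(n_molecules) != 1:
        raise ValueError(f"blocks disagree on molecule count: {sorted(n_molecules)}")
    # Within a block, every vector must have the same length.
    for i, block in enumerate(blocks):
        widths = {len(row) for row in block}
        if len(widths) > 1:
            raise ValueError(f"block {i} has ragged rows: widths {sorted(widths)}")
    # zip(*blocks) yields one tuple of vectors per molecule; flatten each tuple.
    return [[float(x) for row in rows for x in row] for rows in zip(*blocks)]


# Toy descriptor sets for two molecules (values are made up).
topological = [[0.1, 3.4, 1.0], [0.2, 2.9, 0.0]]
electronic = [[-0.5, 2.2], [0.7, 1.1]]
X = melt(topological, electronic)
print(X)  # [[0.1, 3.4, 1.0, -0.5, 2.2], [0.2, 2.9, 0.0, 0.7, 1.1]]
```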
The argument in the second question only holds if melting just the converged descriptors turns out to give better predictive power than melting all of the descriptors together. The case is even stronger (and more practical) if the melted descriptors behave better in concert than any particular subset does on its own.
That would be an interesting result. The additional eight or sixteen experiments needed to test that hypothesis are cheap to set up and cheap to run.
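One way the comparison could be laid out, as a hypothetical sketch: every non-empty combination of the converged descriptor sets (each set alone, each pair, and so on up to all of them melted), plus a control that melts every set regardless of convergence. The set names and convergence flags below are placeholders, not Chris’ actual results, and the labeling scheme is an assumption.

```python
from itertools import combinations


def plan_experiments(converged: dict[str, bool]) -> dict[str, tuple[str, ...]]:
    """Hypothetical plan: every non-empty subset of converged sets, plus all sets.

    `converged` maps descriptor-set name -> whether its convergence test passed.
    Each value in the result is the tuple of descriptor sets to melt for that run.
    """
    all_sets = tuple(sorted(converged))
    good = [name for name in all_sets if converged[name]]
    plan: dict[str, tuple[str, ...]] = {}
    # Each converged set alone, every pair, ..., up to all converged sets melted.
    for size in range(1, len(good) + 1):
        for subset in combinations(good, size):
            plan["+".join(subset)] = subset
    # Control: melt everything, regardless of convergence.
    plan["ALL"] = all_sets
    return plan


flags = {"set_a": True, "set_b": False, "set_c": True}
plan = plan_experiments(flags)
for label, sets in plan.items():
    print(label, sets)
```

With k converged sets this gives 2^k - 1 subset runs plus the control; if every set converged, the control duplicates the largest subset and can be dropped.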
Alternatives: A Faster Solution
An alternative approach is to naïvely set aside descriptor space reduction and augmentation for now, and simply create training and test sets (or cross-validation sets). Given the strained timelines, I think this is the wiser objective to knock down first. I’ll make a ticket for myself for both of these. I should also look up how to use the Tanimoto coefficient, which will help in designing “maximum dissimilarity” test sets so that we can properly assess predictive and extrapolation power.
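For reference, the Tanimoto coefficient of two binary fingerprints A and B is |A ∩ B| / |A ∪ B|: the on-bits they share over the on-bits set in either. Below is a sketch of greedy MaxMin picking, one common maximum dissimilarity method and not necessarily the one we will settle on. It assumes fingerprints have already been generated from the SMILES strings (e.g. by a cheminformatics toolkit) and are stored as sets of on-bit indices; the function names and toy fingerprints are hypothetical.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity of two binary fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_min_pick(fps: list[frozenset[int]], n_pick: int, seed: int = 0) -> list[int]:
    """Greedy MaxMin selection of a maximally dissimilar subset of fingerprints.

    Starts from fps[seed], then repeatedly adds the molecule whose highest
    similarity to any already-picked molecule is lowest. Returns picked
    indices in pick order.
    """
    if not 0 < n_pick <= len(fps):
        raise ValueError("n_pick must be between 1 and len(fps)")
    picked = [seed]
    picked_set = {seed}
    # nearest[i]: similarity of molecule i to its most similar picked molecule.
    nearest = [tanimoto(fp, fps[seed]) for fp in fps]
    while len(picked) < n_pick:
        best = min((i for i in range(len(fps)) if i not in picked_set),
                   key=lambda i: nearest[i])
        picked.append(best)
        picked_set.add(best)
        nearest = [max(nearest[i], tanimoto(fps[i], fps[best])) for i in range(len(fps))]
    return picked


# Toy fingerprints (hypothetical on-bit indices, not real molecules).
fps = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8, 9}), frozenset({1, 2, 3, 4})]
test_idx = max_min_pick(fps, 2)
train_idx = [i for i in range(len(fps)) if i not in test_idx]
print(test_idx, train_idx)  # [0, 2] [1, 3]
```

The picked indices would become the test set and the rest the training set. Because each new pick is the molecule least similar to its nearest already-picked neighbor, the test set spreads across the chemical space instead of clustering around the training data.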
And the NGN…
Finally, I need to go back and dig up a working version of the NGN to use with Chris’ data. I don’t think using InChI is possible, but we’ll try anyway. The SMILES strings are already here, so I can at least run a few convergence tests of my own. That works out to eighty runs at worst (8 tests × 10 trials, if every trial fails) and eight runs at best (8 tests × 1 trial, if each converges on the first try). I am going to leave the NGN in Unix-compatible form, because there isn’t enough time to complete the Windows port.
This should be fine, though, since everything will be set up for SharcNet.
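The run-count arithmetic for the convergence tests, as a small sketch. It assumes eight tests, each retried up to ten times and stopped at the first converged trial; `train_once` is a hypothetical stand-in for launching one NGN training run, not the NGN’s actual interface.

```python
from typing import Callable

# Hypothetical stand-in for one NGN training run: takes the trial number,
# returns True if that run converged.
TrainOnce = Callable[[int], bool]


def run_until_converged(train_once: TrainOnce, max_trials: int = 10) -> tuple[int, bool]:
    """Repeat one convergence test until a trial converges or trials run out.

    Returns (runs used, whether any trial converged).
    """
    for trial in range(1, max_trials + 1):
        if train_once(trial):
            return trial, True
    return max_trials, False


def total_runs(tests: list[TrainOnce], max_trials: int = 10) -> int:
    """Total training runs needed across all tests."""
    return sum(run_until_converged(test, max_trials)[0] for test in tests)


best = total_runs([lambda trial: True] * 8)    # every test converges on the first try
worst = total_runs([lambda trial: False] * 8)  # every trial of every test fails
print(best, worst)  # 8 80
```

Returning the converged flag alongside the run count keeps “converged on the tenth trial” distinguishable from “never converged”, which both use ten runs.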
Ed's Big Plans