Archive for May, 2009
The HPC serial cluster “Whale” on SharcNet has Python 2.3 installed on all of its nodes; in fact, I can almost guarantee that Python 2.3 is standard across SharcNet.
This means that there is no subprocess module and no top-level exit() function (use sys.exit() instead). A few iterator constructs and other top-level functions are changed, notably the semantics of zip(), range() and xrange(). Interestingly, on review, I *should* have used enumerate() in these incompatible cases anyway. But then again, this now-legacy code was written long before I even knew about that function.
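As a rough illustration of the two workarounds mentioned above, here is a minimal sketch. The helper names `indexed` and `finish` are hypothetical and are not taken from the NGN code; the index-pairing pattern it replaces is an assumption about what the incompatible loops looked like.

```python
import sys


def indexed(items):
    """Pair each item with its position using enumerate().

    Stands in for index-pairing patterns such as
    zip(range(len(items)), items) (an assumed example of the old style).
    """
    return [(i, item) for i, item in enumerate(items)]


def finish(status=0):
    """Exit via sys.exit() rather than relying on a top-level exit()."""
    sys.exit(status)
```

Calling `indexed(["C", "CC"])` yields position/item pairs without building a separate range, and `finish(1)` raises `SystemExit` just as a top-level exit would on newer interpreters.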
This is actually NOT quite as bad as it sounds, since there aren’t too many features that can’t be backported to work with Python 2.3 (SharcNet) and Python 2.6 (Tin & Pewter) simultaneously. In future, when I switch to Python 3.0, I’ll have to take even more care, since syntax changes and some major revisions have been adopted (for one, “print” the language construct is now print(), a normal top-level function). In reality, I’m likely to have both installed and will treat them as separate languages (a sane way to manage legacy code, I’m told).
Today, I’m working on backporting the NGN to be 100% compatible with SharcNet. The current cycle (compile, zip, upload, decompress, compile, run) takes far too long, and it can be expedited by having just a single compile procedure. So far, the problem can be traced to some issues with iterators implemented early on… I can likely take this opportunity to make some of the code more efficient as well.
UPDATE: The convergence was not as straightforward as I had hoped, so I’ll still rely on two compile steps. As it turns out, there’s a str.partition() step (a method that only arrived in Python 2.5) which is impossible to ignore. It’s possible to do a str.split(), and then rejoin each subsequent segment together, but the amount of testing needed… I’ll need to return to updating the NGN code later. For now, I’m able to at least cleanly break apart the compilation into a “source generation” step, then a “compilation” step.
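The split-and-rejoin idea can be sketched roughly as follows. This is only an illustration of the technique, not the NGN code; the function name `partition_compat` is hypothetical. Passing a maximum split count of 1 to split() avoids the rejoin step, since everything after the first separator stays in one piece.

```python
def partition_compat(text, sep):
    """Emulate str.partition() using only str.split().

    Returns (head, sep, tail) when sep is found,
    and (text, "", "") when it is not.
    """
    if not sep:
        raise ValueError("empty separator")
    # Split at most once so the tail keeps any later separators intact.
    parts = text.split(sep, 1)
    if len(parts) == 1:
        return (text, "", "")
    return (parts[0], sep, parts[1])
```

For instance, `partition_compat("a=b=c", "=")` gives the same three-part result as `"a=b=c".partition("=")` on an interpreter that has the built-in method.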
Well, the convergence tests are well underway on SharcNet (Whale serial cluster) for the new datasets (2 Aqueous Solubility, 1 Melting Point). I’m currently only running them for SMILES due to a problem with one of the Aqueous Solubility sets. I’ve decided to use a “ridiculously high” number of parameters (12 hidden units per hidden layer), and a “ridiculously low” learning rate / momentum (0.3 each).
A convergence test basically checks whether the NGN can operate on the chosen data with the chosen parameters. In my convergence tests, the test set and the training set are identical.
I’ll probably rerun this with a yet lower learning rate (train = 0.15 / momentum = 0.45). Why? Because I have a feeling that the number of hidden units actually enables a greater chance of accidentally falling into the close search space, while the minute math arguments allow these accidents to be gently brushed into being.
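To make the role of the two numbers concrete, here is a minimal sketch of a single weight update with momentum. It assumes the textbook gradient-descent-with-momentum rule; whether the NGN uses exactly this form is an assumption, and the function name `momentum_step` is hypothetical. The defaults are the 0.3 values quoted above.

```python
def momentum_step(weight, gradient, velocity, learning_rate=0.3, momentum=0.3):
    """Apply one gradient-descent update with momentum to a single weight.

    Assumed textbook rule (not confirmed for the NGN):
        velocity = momentum * velocity - learning_rate * gradient
        weight   = weight + velocity
    Returns the new (weight, velocity) pair.
    """
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity
```

A smaller learning rate shrinks each step taken along the gradient, while the momentum term controls how much of the previous step carries over into the next one.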
I’m presently running all the regression data I’ve ever run in the past, this time in both classic normalization and rank-normalization. Finally, one of the Aqueous Solubility datasets came with poorly formed SMILES: they can’t be parsed by the grammars that I made up, and they can’t be parsed by OpenBabel, so conversion to InChI wasn’t possible in this round, let alone operation by my system. I’m going to presume they’re broken and skip those entries, which still leaves well over a thousand exemplars in that set. This set will have to be rerun with the correct data soon.
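For readers unfamiliar with the two target scalings, here is a minimal sketch. It assumes that “classic normalization” means min-max scaling to [0, 1] and that rank-normalization replaces each value by its scaled rank (ties sharing an average rank); both are assumptions about the setup, and the function names are hypothetical.

```python
def minmax_normalize(values):
    """Scale values linearly to [0, 1] (assumed meaning of 'classic')."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    span = float(hi - lo)
    return [(v - lo) / span for v in values]


def rank_normalize(values):
    """Replace each value by its rank scaled to [0, 1]; ties share the mean rank."""
    n = len(values)
    if n == 0:
        return []
    if n == 1:
        return [0.0]
    # Indices of the values in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank / (n - 1)
        pos = end + 1
    return ranks
```

Rank scaling discards the spacing between values and keeps only their order, so a single extreme solubility value no longer compresses the rest of the range.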
Convergence RMSE has been set to 1%, meaning that I have a stricter idea of what I think is a deviation from correct. I want to plot the results as a “target on actual” residuals line plot when this is done to give me an idea of how these parameters worked on self-comparison.
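The convergence criterion and the residuals for the plot could be computed along these lines. This sketch assumes targets and outputs are already normalized to [0, 1], so that the 1% threshold corresponds to an RMSE of 0.01; that interpretation, and all function names here, are assumptions rather than the NGN’s actual definitions.

```python
import math


def rmse(targets, outputs):
    """Root-mean-square error between paired targets and outputs."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and equal in length")
    total = sum((t - o) ** 2 for t, o in zip(targets, outputs))
    return math.sqrt(total / len(targets))


def has_converged(targets, outputs, threshold=0.01):
    """True when the RMSE is at or below the threshold (assumed 1% of a [0, 1] range)."""
    return rmse(targets, outputs) <= threshold


def residuals(targets, outputs):
    """Per-exemplar differences (target minus output) for a residuals plot."""
    return [t - o for t, o in zip(targets, outputs)]
```

Because the training and test sets are identical in a convergence test, the same list of targets serves for both the stopping check and the residuals.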
Of course, everything here is subject to editing, as I figure out exactly how to implement and deploy the previously mentioned changes to how regression experiments are defined (for convergence, for boosting/balancing/verification).
Most immediate next steps:
- Select OK Aqueous SMILES and run.
- Rerun all with InChI-NGN
Brief: Just finished making a backup of the present state of my thesis and Tin. Zinc is in the mail!
Brief: Taking a step back, it’s time to focus on the Master’s thesis extensions again.
With so many exciting projects becoming available, organized time slicing is key.
I met with Andre today; we spent most of the time idly discussing programming languages.
Before that, he disclosed that many in the modeling team were building their own independent components of the Recombinatron project that were perfectly isolated from one another.
Most of the effort, however, went into modeling the problem so that each of us has a good working idea of what we need to consider, what could be tough and what could be easy.
I recommended to everyone that we all share our little bubble projects so we can critique which techniques and approaches we like best. Apparently there was a meeting after I left between Andre and everyone else, wherein a poll was taken on which programming language and which versioning system to use.
I’m looking forward to seeing everyone else’s code snippets and the language and versioning system chosen, as well as what API we’ll bang out and delegate to each person on the team.
My prototype of the project (incomplete).
I might put notes up in my wiki for this item, time depending.
I’ve decided that this WordPress blog will now be the frontend of my site, removing the need for a splash page and thus making everything else the buttend. This seems to have been a productive move so far, but I’ll probably want to re-organize my links to make the transition clearer.
This move presented only minor technical challenges; actually, it was very smooth, since I’m moving within the same virtual host and the same domain name. In general, though, two settings must be altered within WordPress administration when performing this kind of move. First, the URL of the WordPress installation must be changed to reflect the new address (found under Settings/General). Second, the URL of media uploads must be changed too, so that pictures and other linked items start working again (found under Settings/Miscellaneous).
And that’s all!