Ed's Big Plans

Computing for Science and Awesome

Archive for the ‘Neural Grammar Network’ tag

NNcmk: A Neural Network (Win32 & OSX)

Okay, I managed to finish that 3-layer neural network implementation a while ago, but I was too busy to post about it. It’s a pretty standard network, but I’m proud to say it’s small and works on OSX and Win32. I still have to put in a few #define directives to make it work on Linux as well.

I will have to document it too when I get a chance. I made a brand new executable (instead of using the source from my previous projects) because I needed something that takes launch-time parameters, so that it doesn’t need to be recompiled each time someone uses the binary on a new dataset with a different number of inputs. Right now, barely any parameters are hard-coded; almost everything can be set at launch time.
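
As a rough illustration of the launch-time idea, here is a minimal sketch in Python (NNcmk itself is C, and its real parameters aren’t listed in this post). The flag names `--mode`, `--inputs`, `--hidden`, `--outputs` and `--rate` are hypothetical; the point is only that the layer sizes come from the command line instead of being compiled in.

```python
import argparse

def parse_args(argv=None):
    """Parse hypothetical launch-time parameters for a 3-layer network."""
    parser = argparse.ArgumentParser(description="Toy launch-time configuration")
    parser.add_argument("--mode", choices=["train", "test"], required=True)
    parser.add_argument("--inputs", type=int, required=True,
                        help="number of input units (depends on the dataset)")
    parser.add_argument("--hidden", type=int, default=8)
    parser.add_argument("--outputs", type=int, default=1)
    parser.add_argument("--rate", type=float, default=0.1,
                        help="learning rate")
    return parser.parse_args(argv)

def layer_shapes(args):
    """Shapes of the two weight layers joining input, hidden and output units."""
    return [(args.inputs, args.hidden), (args.hidden, args.outputs)]

# Same code, different dataset width: only the arguments change.
args = parse_args(["--mode", "train", "--inputs", "12"])
shapes = layer_shapes(args)
```

A batch file like the one mentioned below would then just be two invocations, one with the train mode and one with the test mode.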

The NNcmk (Neural Network – Cameron, Ma, Kremer) package compiles as C, uses the in-house library previously developed for the NGN, and will be available shortly after I’m satisfied that I’ve squashed all the bugs, fixed the output and documented the thing completely. I think Chris has difficulty with it right now mostly because I didn’t specify exactly which parameters do what, although I did at least provide a (DOS) batch file with an example run-in-train-mode / run-in-test-mode sequence…

Back to work on that paper right now though…

Meeting with Chris

Brief: We’ve taken on a new strategy. Chris is building a novel database of LD50 values for a large number of compounds. We’ll be generating descriptors with some free software (JoeLib and CDK). Eventually, the fixed-width descriptor vectors will be fed to neural networks, and their SMILES and InChI counterparts to NGNs. The ultimate goal is one of three designs: a nested neural decision tree whose subtrees are the descriptor network and NGNs; the descriptor network nested inside an NGN; or an expert voting system where each decision system gets to vote on a particular molecule of interest. With the Windows NN software draft and the NGN ready for SharcNet, preliminary trials can start soon.
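
The voting option is the easiest of the three to picture. Here is a minimal Python sketch, assuming each decision system can be treated as a function from a molecule to a class label (say, a toxicity bin); the function names and the tie-breaking rule are my own assumptions, not a settled design.

```python
from collections import Counter

def majority_vote(votes):
    """Return the most common label; ties go to the label that was cast first."""
    if not votes:
        raise ValueError("no votes cast")
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label

def ensemble_predict(models, molecule):
    """Ask every decision system for its label, then take the majority."""
    return majority_vote([model(molecule) for model in models])

# Stand-ins for a descriptor network and two NGNs (SMILES and InChI input).
models = [
    lambda mol: "toxic",
    lambda mol: "safe",
    lambda mol: "toxic",
]
verdict = ensemble_predict(models, "CCO")
```

If LD50 ends up being predicted as a number rather than a class, the vote would presumably become an average or a median instead.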

Written by Eddie Ma

June 12th, 2009 at 4:41 pm

Meeting with Stefan

We sat down and critically evaluated the amount of time left, and what I should aim for with the resources I have. With time ticking away, the earlier idea of completing the software to accommodate different training approaches has been shelved (as in, likely not to see the light of day).

Stefan feels that there is potential for two papers that we can already write. Armed with the reviewer’s notes from our previously submitted work, I’ve been tasked with quickly drafting up a table of contents for our next paper, which will discuss all of the experiments that have already been run. The second paper will be decidedly math oriented, since all of the formalization has been completed; a table of contents for that item is also due.

This approach makes the best use of unpublished material, and also allows us to finally provide the mathematical backing for our NGN approach, something that an earlier reviewer wanted us to have.

Written by Eddie Ma

June 12th, 2009 at 4:19 pm

NGN Upgrade Status

The NGN is being worked on again. In implementing the Bagging/Balancing/Boosting/Verifying activities of the software, I’ve decided to break everything apart into more, smaller modules. That gives me a better picture of progress and keeps me motivated on smaller tasks.
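
To give a flavour of what the Bagging and Balancing modules do, here is a minimal Python sketch of a class-balanced bootstrap sample. It is an illustration under my own assumptions (drawing with replacement, the same number of draws per class) and not the NGN’s actual C code; `balanced_bootstrap` is a made-up name.

```python
import random

def balanced_bootstrap(examples, labels, per_class, rng):
    """Draw per_class examples with replacement from each class, then shuffle."""
    by_class = {}
    for example, label in zip(examples, labels):
        by_class.setdefault(label, []).append(example)
    sample = []
    for label in sorted(by_class):
        pool = by_class[label]
        for _ in range(per_class):
            sample.append((rng.choice(pool), label))
    rng.shuffle(sample)
    return sample

# Three examples of class 0 and one of class 1 become three draws of each.
bag = balanced_bootstrap(["a", "b", "c", "d"], [0, 0, 0, 1], 3, random.Random(0))
```

Bagging would repeat this with different random states and train one network per bag.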

Realistically, only three functions are ever dynamically generated by the BNF generator; these all perform some logic related to the weight layers of the neural network. I think it is feasible now to completely close off and encapsulate the dynamically generated content in its own source files. Linking it in is probably philosophically more correct, and also cleaner to implement, than trying to have the BNF generator pump out all of the template items that do not change between grammars.

In retrospect, I might drop the Verifying activity soon, depending on how similar it is to Balancing; if it’s similar enough to be expressed with many of the same inner functions, then I’ll keep it. Verifying will probably be so weak that it can be replaced by an ancillary test run after training, prior to an actual test run. I’m simply concerned that in the vast majority of cases, verification would mislead the system into convergence or infinite non-convergence. In a good system with plenty of data distributed between training and verification, along with correct tolerances, this is not the case, but I have neither the luxury of time nor copious amounts of data. I’ll leave it in for now, and figure out what I want to do with it later.
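
For reference, the kind of verification loop I have in mind looks roughly like the Python sketch below. It assumes that verification means checking error on a held-out set each epoch and stopping once it no longer improves by more than a tolerance; the names `train_step`, `verify_error`, `tolerance` and `patience` are hypothetical.

```python
def train_with_verification(train_step, verify_error, max_epochs, tolerance, patience):
    """Train until verification error stalls; return (epochs run, best error)."""
    best = float("inf")
    stale = 0
    for epoch in range(1, max_epochs + 1):
        train_step()
        error = verify_error()
        if error < best - tolerance:
            # A real improvement: remember it and reset the stall counter.
            best = error
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch, best
    # The epoch cap guards against never converging at all.
    return max_epochs, best

# Toy run: verification errors are scripted instead of computed.
errors = iter([0.9, 0.5, 0.49, 0.51, 0.50])
result = train_with_verification(lambda: None, lambda: next(errors),
                                 max_epochs=5, tolerance=0.05, patience=2)
```

Both worries show up here: with a small, noisy verification set, a tolerance that is too loose stops training early, and one that is too tight leaves only the epoch cap to end the run.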

In terms of completion, the way I’ve broken down the work has me coding out C modules until at least Monday, whereupon I must fix any API discrepancies between the modules and magically anneal them together. And that’s still before making the hard-coded changes to the NGN generator.

I think making the deadline is still possible, but it is essential to keep every task as small (in time) as possible.

Written by Eddie Ma

June 10th, 2009 at 2:49 pm

Posted in Machine Learning

Meeting with Chris

Chris’ project has grown to roughly three hundred exemplars for each of the mouse and rat data sets; these are the sets that map molecules to some physiological defect, by organ or tissue. I think he’ll be onto his next phase shortly: taking the data and applying some machine learning construct to it.

I’ve recommended four papers for him to read, three of which discuss QSAR in general and compare the performance of different approaches. The last paper explicitly applies neural networks to descriptors for regression of melting points. The use of neural networks or similar technology is something that he’s expressed a lot of interest in, so I think this selection fits in well. I’ve provided him with an adapted version of the melting point dataset where the domain is re-expressed as SMILES and InChI.

I think it might be good to set him up with NGNs for those items as well as NNs for the descriptor vector used in the melting point paper.

Written by Eddie Ma

June 5th, 2009 at 10:27 am

Python 2.3, 2.6 and 3.0

The HPC serial cluster “Whale” on SharcNet has installed Python 2.3 for all of its nodes; in fact, I can almost guarantee that Python 2.3 is standard across SharcNet.

This means that there is no subprocess module and no top-level exit() function (use sys.exit() instead). A few iterator constructs and other top-level functions have changed, notably the semantics of zip(), range() and xrange(). Interestingly, on review, I *should* have used enumerate() in these incompatible cases anyway. But then again, this now-legacy code was written long before I even knew about that function.
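
A minimal sketch of the portable style this pushes me toward is below. The helper names `indexed` and `finish` are made up for illustration; the facts it leans on are that subprocess only arrived in Python 2.4, while enumerate() and sys.exit() are both there in 2.3.

```python
import sys

try:
    import subprocess  # added in Python 2.4
except ImportError:
    subprocess = None  # on Python 2.3, fall back to os.system() or os.popen()

def indexed(items):
    """Pair each item with its index; enumerate() is available from Python 2.3."""
    return list(enumerate(items))

def finish(status):
    """Exit through sys.exit(), which raises SystemExit on every version."""
    sys.exit(status)

pairs = indexed(["C", "N", "O"])
```

Callers then check whether `subprocess` is None before using it, instead of assuming the import worked.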

This is actually NOT quite as bad as it sounds, since there aren’t too many features that can’t be backported to work with Python 2.3 (SharcNet) and Python 2.6 (Tin & Pewter) simultaneously. In future, when I switch to Python 3.0, I’ll have to take even more care, since syntax changes and some major revisions have been adopted (for one, “print” the language construct is now print(), a normal top-level function)… In reality, I’m likely to have both installed and will treat them as separate languages (a sane way to manage legacy code, I’m told).

Today, I’m working on backporting the NGN to be 100% compatible with SharcNet. It takes far too long to compile, zip, upload, decompress, compile and run; the cycle can be expedited by having just a single compile procedure. So far, the problem can be traced to some issues with iterators implemented early on… I can likely take this opportunity to make some of the code more efficient as well.

UPDATE: The convergence was not as straightforward as I had hoped, so I’ll still rely on two compile steps. As it turns out, there’s a String.partition() step which is impossible to ignore. It’s possible to do a String.split() and then rejoin the subsequent segments, but the amount of testing needed… I’ll need to return to updating the NGN code later. For now, I’m able to at least cleanly break apart the compilation into a “source generation” step, then a “compilation” step.
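
For when I do come back to it, the split-based replacement could look something like this minimal sketch. `partition_compat` is a hypothetical name; str.partition() itself only appeared in Python 2.5, whereas split() with a maximum split count is available in 2.3. Splitting at most once also avoids the rejoin step.

```python
def partition_compat(text, sep):
    """Mimic str.partition() using only str.split() with a split limit of one."""
    parts = text.split(sep, 1)
    if len(parts) == 1:
        # Separator not found: same result shape as str.partition().
        return (text, "", "")
    return (parts[0], sep, parts[1])

head, found, tail = partition_compat("rule ::= term | rule term", "::=")
```

The testing burden is still real, since every call site has to be checked for how it treats the “separator not found” case.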

Written by Eddie Ma

May 31st, 2009 at 4:24 pm