Ed's Big Plans

Computing for Science and Awesome


Archive for the ‘Neural Grammar Network’ tag

Back from Conference!


Brief: The BIBM09 conference was the very first conference I have ever attended. I learned a lot from the various speakers and poster sessions.

I thought it was really interesting how the trend now is to study and manipulate large interaction pathways in silico– a recurring theme is the integration of many different data sources (chemical, drug and free text), as well as the connection of physical protein interaction pathways with gene expression pathways. There was even a project that dealt with the alignment of pathway graphs (topology).

Dealing with pathways by hand, especially in the form of a picture, is probably the bane of many biologists’ existence– I think the solutions we’ll see in the next few years will turn this task into simple data-in-data-out software components, much like the kind we already have for dealing with sequence alignments.

And now, back to the real world!

Addendum: My talk went very well 🙂

And here are my slides with a preview below.

Eddie Ma

November 6th, 2009 at 10:36 am

New Diagram for MSc-X3 (math paper)


Brief: I’m particularly happy with this diagram… I had something along these lines in my head for a while, but I could never figure out how to draw it correctly. It hadn’t occurred to me that simplifying it to three easy steps was the smarter thing to do.

Some Assembly Required.

Eddie Ma

August 20th, 2009 at 12:08 pm

Meeting with Chris


Brief: Met with Chris last week. Chris has finished the convergence tests and some cross validation sets on his descriptors, and recommended his own design for 80/20 prediction tests… Meanwhile, I’ve updated the InChI grammar used for the NGN to work with the new data, and have set up experiments to run convergence tests using the SMILES-NGN and InChI-NGN on the eight possible QSAR datasets on SharcNet (16 processes total)… Next on the list– create a script to evaluate his preliminary cross validation experiments (based on Neural Network predicted vs. target values) and provide instructions for running the convergence tests with my NGN software… Will need to pull up an old nugget.py to wrap the convergence test (the current one doesn’t halt and always runs 100 trials).
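As a rough sketch of what that evaluation script would compute, the snippet below scores a plain two-column file of predicted and target values by RMSE and by a hit count within a tolerance. The file layout, argument order and default tolerance are placeholders of mine, not Chris's actual output format.

/* rmse_eval.c -- hedged sketch: score Neural Network predicted vs. target values.
 * Assumes a whitespace-separated file with one "predicted target" pair per line.
 * Compile: cc -o rmse_eval rmse_eval.c -lm
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s predictions.txt [tolerance]\n", argv[0]);
        return 1;
    }

    FILE *fp = fopen(argv[1], "r");
    if (!fp) { perror(argv[1]); return 1; }

    double tol = (argc > 2) ? atof(argv[2]) : 0.5;   /* placeholder default tolerance */
    double predicted, target, sq_sum = 0.0;
    long n = 0, hits = 0;

    while (fscanf(fp, "%lf %lf", &predicted, &target) == 2) {
        double err = predicted - target;
        sq_sum += err * err;
        if (fabs(err) <= tol)
            hits++;
        n++;
    }
    fclose(fp);

    if (n == 0) { fprintf(stderr, "no data read\n"); return 1; }

    printf("n = %ld, RMSE = %f, within %.3f of target: %ld (%.1f%%)\n",
           n, sqrt(sq_sum / n), tol, hits, 100.0 * hits / n);
    return 0;
}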

Soon: Port everything to Ubuntu Linux so that we can maintain compatibility without further porting, courtesy of a Sun VirtualBox VM… Meeting again tomorrow…

NNcmk: A Neural Network (Win32 & OSX)


Okay– I managed to finish that 3-layer neural network implementation the other day– actually, it was a while ago, but I didn’t post about it because I’ve been busy. It’s a pretty standard network, but I’m proud to say it’s small and works on OSX and Win32. I still have to put in a few #define directives to have it work with Linux as well.
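For illustration, the per-platform housekeeping meant here looks something like the following; the macro names are my own placeholders rather than anything from the actual NNcmk source.

/* Hedged sketch of per-platform #define housekeeping; the Linux branch is
 * the part still to be filled in properly. */
#include <stdio.h>

#if defined(_WIN32)
  #define PLATFORM_NAME "Win32"
  #define PATH_SEPARATOR '\\'
#elif defined(__APPLE__)
  #define PLATFORM_NAME "OSX"
  #define PATH_SEPARATOR '/'
#elif defined(__linux__)
  #define PLATFORM_NAME "Linux"
  #define PATH_SEPARATOR '/'
#else
  #error "unsupported platform"
#endif

int main(void)
{
    printf("built for %s, path separator '%c'\n", PLATFORM_NAME, PATH_SEPARATOR);
    return 0;
}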

I will have to document it too when I get a chance. The reason I made a brand new executable (instead of reusing the source from my previous projects) is that I needed something that takes launch-time parameters, so it doesn’t need to be recompiled each time someone uses the binary on a new dataset with a different number of inputs. Right now, nearly every parameter can be set at launch time.
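A minimal sketch of that launch-time-parameter idea, assuming a flat argv interface; the flag names and defaults below are hypothetical, not the real NNcmk options.

/* Hedged sketch: read the network dimensions and training settings from argv
 * so the same binary handles datasets with different input widths. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int    n_inputs;     /* width of each input vector  */
    int    n_hidden;     /* hidden layer size           */
    int    n_outputs;    /* width of each target vector */
    double learn_rate;   /* backprop learning rate      */
    int    train_mode;   /* 1 = train, 0 = test         */
} nn_params;

static nn_params parse_args(int argc, char **argv)
{
    nn_params p = { 8, 4, 1, 0.1, 1 };  /* placeholder defaults */
    for (int i = 1; i < argc; i++) {
        if      (strcmp(argv[i], "-in")   == 0 && i + 1 < argc) p.n_inputs   = atoi(argv[++i]);
        else if (strcmp(argv[i], "-hid")  == 0 && i + 1 < argc) p.n_hidden   = atoi(argv[++i]);
        else if (strcmp(argv[i], "-out")  == 0 && i + 1 < argc) p.n_outputs  = atoi(argv[++i]);
        else if (strcmp(argv[i], "-rate") == 0 && i + 1 < argc) p.learn_rate = atof(argv[++i]);
        else if (strcmp(argv[i], "-test") == 0)                 p.train_mode = 0;
    }
    return p;
}

int main(int argc, char **argv)
{
    nn_params p = parse_args(argc, argv);
    printf("inputs=%d hidden=%d outputs=%d rate=%g mode=%s\n",
           p.n_inputs, p.n_hidden, p.n_outputs, p.learn_rate,
           p.train_mode ? "train" : "test");
    /* ...the real program would allocate its weight layers from these sizes
       and then read the dataset named on the command line... */
    return 0;
}

Under those assumptions, a train-then-test sequence like the one in the batch file is just two invocations of the same binary, with and without the -test flag.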

The NNcmk (Neural Network – Cameron, Ma, Kremer) package compiles as plain C, uses the previously developed in-house library for the NGN, and will be available shortly after I’m satisfied that I’ve squashed all the bugs, fixed the output and documented the thing completely. I think Chris has difficulty with it right now mostly because I didn’t specify exactly which parameters do what– I did at least provide a (DOS) batch file with an example run-in-train-mode / run-in-test-mode sequence…

Back to work on that paper right now though…

Meeting with Chris


Brief: We’ve taken on a new strategy– Chris is building a novel database of LD50 values for many, many compounds. We’ll be generating descriptors with free software (JoeLib and CDK). Eventually, the fixed-width descriptor vectors will be used, along with their SMILES and InChI counterparts, in Neural Networks and NGNs respectively; the ultimate goal is either a nested neural decision tree whose subtrees are the descriptor network and the NGNs… OR the nesting of the descriptor network inside an NGN… OR an expert voting system where each decision system gets to vote on a particular molecule of interest. With the Windows NN software drafted and the NGN ready for SharcNet, preliminary trials can start soon.
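To make that last option concrete, here is a toy sketch of the voting idea, assuming each decision system emits one numeric prediction (or one class label) per molecule; none of these names come from the actual codebase.

/* Hedged sketch: combine the descriptor network, SMILES-NGN and InChI-NGN
 * outputs for one molecule by averaging (regression) or majority (classes). */
#include <stdio.h>

#define N_EXPERTS 3

/* Average the experts' regression outputs (e.g. a predicted LD50). */
static double vote_average(const double pred[N_EXPERTS])
{
    double sum = 0.0;
    for (int i = 0; i < N_EXPERTS; i++)
        sum += pred[i];
    return sum / N_EXPERTS;
}

/* Majority vote over binary class labels (e.g. toxic / non-toxic). */
static int vote_majority(const int label[N_EXPERTS])
{
    int yes = 0;
    for (int i = 0; i < N_EXPERTS; i++)
        yes += label[i];
    return (yes * 2 > N_EXPERTS) ? 1 : 0;
}

int main(void)
{
    double pred[N_EXPERTS]  = { 2.1, 1.8, 2.4 };  /* placeholder outputs */
    int    label[N_EXPERTS] = { 1, 0, 1 };        /* placeholder classes */

    printf("averaged prediction: %f\n", vote_average(pred));
    printf("majority class:      %d\n", vote_majority(label));
    return 0;
}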

NGN Upgrade Status


The NGN is being worked on again. In implementing the Bagging/Balancing/Boosting/Verifying activity of the software, I’ve decided to break everything apart into more, smaller modules– that gives me a better picture of progress while letting me stay motivated on smaller tasks.
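As a reminder of what the Bagging step amounts to, here is a minimal sketch of drawing a bootstrap sample (with replacement) of training pattern indices for one committee member; this is the textbook procedure, not the module’s actual code.

/* Hedged sketch of bagging: each committee member trains on a bootstrap
 * sample of pattern indices drawn with replacement.  rand() keeps the
 * sketch short; the real module may use a different generator. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static void bootstrap_sample(int n_patterns, int *sample)
{
    for (int i = 0; i < n_patterns; i++)
        sample[i] = rand() % n_patterns;   /* any pattern, repeats allowed */
}

int main(void)
{
    int n_patterns = 10;                   /* placeholder training set size */
    int sample[10];

    srand((unsigned) time(NULL));
    bootstrap_sample(n_patterns, sample);

    for (int i = 0; i < n_patterns; i++)
        printf("%d ", sample[i]);          /* indices into the training set */
    printf("\n");
    return 0;
}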

Realistically, there are only three functions that are ever dynamically generated by the BNF generator; these all perform some logic related to the weight layers of the neural network– I think it is feasible now to completely close off and encapsulate the dynamically generated content in its own source files. Linking it in is probably philosophically more correct and also implementationally cleaner than trying to have the BNF generator pump out all of the template items that do not change between grammars.
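Roughly, the separation I have in mind would look like the following, where the three prototypes stand in for the grammar-specific weight-layer functions; all names here are hypothetical, and only the second file would ever be rewritten by the BNF generator.

/* ngn_generated.h -- hedged sketch of a fixed interface for the generated
 * code; the three prototypes are stand-ins for the grammar-specific
 * weight-layer routines. */
#ifndef NGN_GENERATED_H
#define NGN_GENERATED_H

void ngn_forward_weights(const double *in, double *out);
void ngn_backprop_weights(const double *delta_out, double *delta_in);
void ngn_update_weights(double learn_rate);

#endif /* NGN_GENERATED_H */

/* ngn_generated.c -- the only file the BNF generator would emit per grammar;
 * the hand-written template code is compiled once and simply linked against it. */
#include "ngn_generated.h"

void ngn_forward_weights(const double *in, double *out)              { /* generated body */ }
void ngn_backprop_weights(const double *delta_out, double *delta_in) { /* generated body */ }
void ngn_update_weights(double learn_rate)                           { /* generated body */ }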

On second thought, I might drop the Verifying activity soon, depending on how similar or different it is to Balancing– if it’s similar enough that it can be expressed with many of the same inner functions, then I’ll keep it. Verifying will probably be so weak that it could be replaced by an ancillary test run after training, prior to the actual test run. I’m simply concerned that in the vast majority of cases, verification would mislead the system into premature convergence or infinite non-convergence. In a good system, with plenty of data distributed between training and verification and the right tolerances, this isn’t a problem– but I have neither the luxury of time nor copious amounts of data. I’ll leave it in for now and figure out what I want to do with it later.

In terms of completion, the way I’ve broken down the work has me coding out C modules until at least Monday, whereupon I must fix any API discrepancies between the modules and magically anneal them together. And that’s still before making the hard-coded changes to the NGN generator.

I think making the deadline is still possible, but keeping each task as small (time-wise) as possible is essential.

Eddie Ma

June 10th, 2009 at 2:49 pm

Posted in Machine Learning

