Ed's Big Plans

Computing for Science and Awesome

Archive for the ‘Chris Cameron’ tag

Sun Virtualbox VM * Ubuntu: Good for us.

with 2 comments

Brief: Sun Virtualbox VM with Ubuntu (or really, any agreed-upon free Linux) is an excellent way to create a consistent environment between collaborators. Everyone knows the headaches of transitioning code between Windows and Mac– let’s just transition out of the transitional mindset altogether… Anyway, I mention these two products because they helped Chris and me a lot.

Wow, a nearly all black boot screen... Compile and run, nncmk!

Sun Virtualbox VM (a free version exists! … as in beer.)

Ubuntu (still free, as in a bird– and as in beer.)

Jason Ernst says...

I use the same thing for running my network simulations inside of Ubuntu now (I got too lazy to do a dual boot setup and just run the VM inside Windows). I definitely <3 the Sun Virtualbox. It's also good for testing hobby OS stuff without rebooting the computer each time, if you are interested at all in that type of thing.

Eddie Ma says...

Ya… As great as it is– I keep managing to find defects, though none of them are life-ruining.

The most annoying one only happens on a Mac. In a few of the dialog windows where one adjusts settings, configurations etc., the software always draws the window so that the title bar sits above the top edge of the screen, making some of the controls unreachable. I finally figured out that I could grab onto a few empty pixels and drag the thing down, but those empty pixels aren’t supposed to work that way either.

ANYWAY– If this happened in Windows, I’d probably use alt+space and navigate down to “move window” with my arrow keys, which is actually an elegant solution (to a problem that should not exist). On a Mac, the philosophy of having a single shared application menubar rendered as an OS visual element means that there’s no logical way to give each window its own menu with a “move window” command. Oops!

So I gather– the most meaningful conclusion is that such a flaw shouldn’t exist in any software for any OS… I figure they’ll fix it eventually.

Written by Eddie Ma

August 19th, 2009 at 9:48 am

sqkillall.py

without comments

Brief: I forgot all about sqkillall.py! It’s a convenience script for killing all of the SharcNet jobs belonging to you! (More about it; Source code).

Written by Eddie Ma

July 28th, 2009 at 1:02 pm

Meeting with Chris

without comments

Brief: Met with Chris last week. Chris finished with the convergence tests and some cross validation sets on his descriptors and recommended his own design for 80/20 prediction tests… Meanwhile, I’ve updated the InChI grammar used for the NGN to work with the new data, and have set up experiments to run convergence tests using the SMILES-NGN and InChI-NGN on the eight possible QSAR datasets on SharcNet (16 processes total)… Next on the list– create a script to evaluate his preliminary cross validation experiments (based on Neural Network predicted vs. target values) and provide instructions for running the convergence tests with my NGN software… Will need to pull up an old nugget.py to wrap the convergence test (current one doesn’t halt and always runs 100 trials).
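A script for scoring predicted vs. target values could start out something like the sketch below. The post doesn't say which measures will be used, so root-mean-square error and the Pearson correlation are assumptions here, and the function names and sample numbers are made up for illustration.

```python
import math

def rmse(predicted, target):
    """Root-mean-square error between predicted and target values."""
    if len(predicted) != len(target) or not target:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(target))

def pearson_r(predicted, target):
    """Pearson correlation; returns NaN if either list is constant."""
    if len(predicted) != len(target) or not target:
        raise ValueError("need two non-empty lists of equal length")
    n = len(target)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        return float("nan")
    return cov / math.sqrt(var_p * var_t)

# Made-up numbers standing in for one cross validation fold.
fold_predicted = [0.9, 2.1, 2.9, 4.2]
fold_target = [1.0, 2.0, 3.0, 4.0]
print("RMSE:", rmse(fold_predicted, fold_target))
print("r:", pearson_r(fold_predicted, fold_target))
```

Running this once per fold and averaging the scores would give one number per descriptor set to compare.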

Soon: Port everything to Ubuntu Linux, courtesy of Sun Virtualbox VM, so that we can maintain compatibility without further porting… Meeting again tomorrow…

Written by Eddie Ma

July 27th, 2009 at 12:35 am

Near and Far Goals

without comments

Okay okay okay– so the defense was about four days ago, I need to get myself organized… these are the projects I can afford to participate in OR are the projects that will yield the most return in terms of enjoyment or some other intangible value.

Immediate Focus

  • Get married– with the wedding now only two weeks away, I need to pull everything I need together and really support Cara in the remaining tasks. As far as I understand, almost everything is in order already and it’s down to ballroom dance practicing etc. and getting our lines memorized. Well, there’s probably a lot more in terms of communicating with the flower girl and ring bearer– and my family– so hopefully we’ll magically finish just on time. Plus there’s a week in August where we’ll be at Disney World for the honeymoon, so I’ll certainly find myself in a bubble away from anything else.

Long Term Focus

  • The PhD– I’ve already written to Liz and Brendan about the summer, or actually– about the changes to this summer’s plans. Earlier on, we had agreed it would be good if I got a head start on the PhD project in summer. At the very least, I’d refine the problem space and be able to formalize my interests. It looks like the most rational thing to do now is to do a normal clean start in Fall since there are a few outstanding things I need to tackle back at Guelph.

Important Intermediate Projects (In Randomly Generated Order)

  • Graduating and all its caveats (Must Finish)– I need to fix the thesis, which means I need to get the examiners’ notes back. I also need to finish some paperwork: I have a stack of signed documents that I shouldn’t lose that has to be handed in with the final thesis, and another stack of signed documents that has to be handed in with the department keys. This item doesn’t stress me out as much as it probably should…
  • iGEM (Would Be Nice)– I’ve been delegated nothing right now, so I actually should chase down Andre next week when we morph from developers to end users. Yes, that’s right– we shall suffer the glory of using our own software creation. I really ought to check out the repository to see if I can understand the code logic after everyone’s touched it (the modules are more or less mature at this point). Reading the code would be a start :D After that, the modeling team will probably break apart and merge into other teams in UWiGEM. Plus there’s mathematical modeling, planning for next year etc. The iGEM project has the potential to be paper-worthy if we manage to get some decent results… it’ll be a feat of an interdisciplinary team, which makes me happy to be a part of it all.
  • Chris’ Project (Must Finish)– So, this item is always running in the background since I’m not the primary owner of this project. Chris has done an excellent job with the science so far, which motivates me to offer him the support he needs to finish… this one’s another potential paper– but again, we need to get results and actually have something interesting to say. He’s actually doing five-fold and leave-one-out cross validation schemes right now, but that’s a post for another time.
  • MSc-X3 (Would Be Nice)– The math paper! Stefan and I decided a long time ago that a math paper would be good to complete the story of the NGN. This would actually be slightly more comprehensive than the thesis in explaining the math, including things like run-time costs and analysis, plus all of the equations that got lost in translation.
  • Andre’s Mystery MSc-??? (Would Be Nice)– I have little detail about this, but it’s another gene management system with the additional feature that fragments can be checked out and added to end users’ own databases. I’m really curious to learn more. The last I heard, Andre managed to churn out code that didn’t have any bugs– panicked at the lack of bugs– then found some bugs– and was relieved.

Other Concerns

  • I’m going to stop short of listing three additional projects I had been working on previously– I have realized that I don’t have time, and the organizations (and one person) to whom these projects belong are probably aware that I managed to get buried in work… I really want to revisit these items in future, but I am unsure when time will become available again.

Written by Eddie Ma

July 18th, 2009 at 12:26 pm

Convergences Detected!

without comments

Good News

Chris’ project has come back to the forefront– after I defend my thesis on Wednesday, it’ll certainly have all of my attention.

We will at least be meeting on Monday though to discuss what can be done in the interim.

Convergence Tests Went Fine

We decided that it would be good for Chris to run a few convergence tests on the datasets he put together across each of the available descriptor sets. So far, many have come back converged, meaning that it would be good to proceed. There are two concerns I have. First, do we want to melt only the converged descriptors together, or do we want to melt all of the descriptors together regardless of convergence? Second, if we don’t melt them now– can we do it after the fact and argue that neural network convergence is a good indicator of which descriptors are correlated with the results we care about?

Melting Descriptors

To clarify– I mean “concatenating” real value vectors when I say “melting”. This means that we splice together a few linear arrays of numbers and come up with a new longer array that’s still fixed length.
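As a minimal sketch of what “melting” means here (the function name and the descriptor values are made up for illustration, not taken from Chris’ data):

```python
def melt(*descriptor_vectors):
    """Concatenate fixed-length real-valued vectors into one longer vector."""
    melted = []
    for vector in descriptor_vectors:
        melted.extend(float(value) for value in vector)
    return melted

# Hypothetical descriptor sets for a single molecule (values are invented).
set_a = [0.12, 3.40]
set_b = [1.00, 0.00, 7.25]

melted = melt(set_a, set_b)
print(melted)       # one array, still fixed length for every molecule
print(len(melted))  # 5
```

As long as every molecule is melted from the same descriptor sets in the same order, the result always has the same length, so it can feed a network with a fixed number of inputs.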

The argument in the second question only holds if it turns out that the selected, melted, converged descriptors have better predictive power than all of the descriptors melted together– it’s an even stronger case (and more practical) if it turns out that the descriptors behave better in concert than any particular subset on its own.

That would be an interesting case. Running an additional eight or sixteen experiments to test that hypothesis is cheap to set up and cheap to do.

Alternatives– A Faster Solution

An alternative approach is to naïvely forget about descriptor space reduction / augmentation for now, and just go on and create training and test sets– or cross validation sets– I think with the strained timelines, this would be the wiser objective to knock down first. I’ll make a ticket for myself for both of these. I should also look up how to use the Tanimoto coefficient– that will assist in the design of “maximum dissimilarity” test sets to ensure we have good predictive / extrapolation power.
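For reference, here is a rough sketch of the idea, assuming each molecule is represented as a binary fingerprint stored as a set of “on” bit positions (the post doesn’t say which representation will be used). The Tanimoto coefficient is then the size of the intersection over the size of the union. The greedy picker is one simple way to get a dissimilar subset, not necessarily the design that will end up being used, and all names and data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 1.0  # assumption: two empty fingerprints count as identical
    return len(a & b) / union

def pick_dissimilar(fingerprints, k):
    """Greedily pick k indices, starting from index 0.

    Each step adds the molecule whose highest similarity to anything
    already picked is the lowest among the remaining candidates.
    """
    if k <= 0 or not fingerprints:
        return []
    picked = [0]
    while len(picked) < min(k, len(fingerprints)):
        best_index, best_score = None, None
        for i in range(len(fingerprints)):
            if i in picked:
                continue
            score = max(tanimoto(fingerprints[i], fingerprints[j]) for j in picked)
            if best_score is None or score < best_score:
                best_index, best_score = i, score
        picked.append(best_index)
    return picked

# Made-up fingerprints for four molecules.
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
print(tanimoto(fps[0], fps[1]))  # 0.5
print(pick_dissimilar(fps, 2))   # [0, 2]
```

The picked indices would form the test set, with everything else left for training.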

And the NGN…

Finally, I need to go back and uncover a working version of the NGN to use with Chris’ data– I don’t think that InChI is possible, but we’ll try anyway. The SMILES strings are already here, so I can certainly at least run a few convergence tests of my own. That constitutes eighty runs at worst (8 datasets * 10 trials, all failing) and eight runs at best (8 datasets * 1 trial, each converging on the first try). I am going to leave this in Unix compatible form because there isn’t enough time to complete the Windows port of the NGN.
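The run counts work out as follows, assuming each of the eight datasets gets up to ten trials and stops at the first trial that converges. This is only a sketch: `train_for` is a hypothetical stand-in for launching the NGN, and the dataset names are placeholders.

```python
def convergence_test(train_once, max_trials=10):
    """Run trials until one converges; return (converged, runs_used)."""
    for trial in range(1, max_trials + 1):
        if train_once(trial):
            return True, trial
    return False, max_trials

def total_runs(datasets, train_for, max_trials=10):
    """Total number of training runs used across all datasets."""
    total = 0
    for name in datasets:
        _, runs = convergence_test(lambda trial: train_for(name, trial), max_trials)
        total += runs
    return total

# Placeholder names for the eight datasets.
datasets = ["dataset_%d" % i for i in range(1, 9)]

best = total_runs(datasets, lambda name, trial: True)    # always converges
worst = total_runs(datasets, lambda name, trial: False)  # never converges
print(best, worst)  # 8 80
```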

This should be OK though since everything will be set up for SharcNet.

Written by Eddie Ma

July 11th, 2009 at 7:00 pm

NNcmk: A Neural Network (Win32 & OSX)

without comments

Okay– I managed to finish that 3-layer neural network implementation the other day– actually, it was a while ago but I didn’t post about it because I was busy. It’s a pretty standard network, but I’m proud to say it’s small and works for OSX and Win32. I still have to put in a few #define directives to have it work with Linux as well.

I will have to document it too when I get a chance. The reason I made a brand new executable (instead of using the source from my previous projects) is that I needed something that would take in launch-time parameters, so that it doesn’t need to be recompiled each time someone decides to use the binary on a new dataset with a different number of inputs. Right now, the thing has barely any hard-coded parameters that can’t be set at launch-time.

The NNcmk (Neural Network – Cameron, Ma, Kremer) package is C compilable, uses the previously developed in-house library for the NGN and will be available shortly after I’m satisfied that I’ve squashed all the bugs, fixed the output and have documented the thing completely. I think Chris has difficulty with it right now mostly because I didn’t specify exactly what parameters do what– I did at least provide a (DOS) batch file with an example run-in-train-mode / run-in-test-mode sequence…

Back to work on that paper right now though…