Ed's Big Plans

Computing for Science and Awesome

QSAR Descriptors – Chris’ Project

Wow! Go Chris!

With Chris’ dataset collection finished, it’s time to convert every single one into a nice suite of descriptor sets. Chris has gone and learned the CDK inside and out; what started as a complicated and frustrating struggle against Java on Windows and the technicalities of class paths etc. has turned into a fruitful and promising endeavour. While he was working on that, I stumbled onto Bioclipse, which uses CDK internally for some of its molecule data conversions. Interestingly, its QSAR feature is not yet complete, or is at least experimental. It looks promising for the future, but I received an e-mail from Chris telling me he had everything figured out with CDK before I could figure out Bioclipse’s QSAR feature.

The Windows NN port that I’ve been working on is almost done. I’d say I’ll want two more days: today to finish the port and test it on a Windows box, and tomorrow to figure out how to chain everything together with a Windows port of GNU make and some Windows variant of GCC.

We’ve decided that the format to express the molecules will be as follows…

  • Each descriptor set has its own file
  • Each file corresponds to a target species and a target organ
  • Each row in each file corresponds to a molecule
  • Each row has the columns <Comma Delimited QSAR Descriptor Elements>; Range (LD50); Species; Organ
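
To make the row layout above concrete, here is a rough Python sketch of how one row might be read and written back. This is only my reading of the format: I'm assuming the descriptor elements are comma-delimited numbers and that a semicolon separates them from the Range (LD50), Species and Organ columns. The names MoleculeRow, parse_row and format_row are made up for illustration, and the sample values in the usage note are invented, not real data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MoleculeRow:
    """One molecule, i.e. one row of a descriptor-set file (assumed layout)."""
    descriptors: List[float]  # comma-delimited QSAR descriptor elements
    ld50_range: str           # Range (LD50), kept as an opaque label
    species: str
    organ: str


def parse_row(line: str) -> MoleculeRow:
    """Split a row into its four ';'-separated columns.

    Assumption: descriptors are numeric and ',' separated, and ';'
    never appears inside a column.
    """
    fields = [field.strip() for field in line.strip().split(";")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 ';'-separated columns, got {len(fields)}")
    descriptor_text, ld50_range, species, organ = fields
    descriptors = [float(value) for value in descriptor_text.split(",")]
    return MoleculeRow(descriptors, ld50_range, species, organ)


def format_row(row: MoleculeRow) -> str:
    """Write a row back out in the same assumed layout."""
    descriptor_text = ",".join(repr(value) for value in row.descriptors)
    return f"{descriptor_text}; {row.ld50_range}; {row.species}; {row.organ}"
```

For example, parse_row("0.5,1.25,3.0; 2; rat; liver") would give a MoleculeRow with three descriptor values, and format_row turns it back into the same line. If the real files end up using a different separator, only the two split/join calls need to change.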

If we keep on track, we’ll have the first experiments running on SharcNet by early next week.

Eddie Ma

June 25th, 2009 at 3:45 pm