Ed's Big Plans

Computing for Science and Awesome

Archive for the ‘Academic Life’ Category

Academics

without comments

Brief: Taking a step back, it’s time to focus on the Master’s thesis extensions again.

With so many exciting projects becoming available, organized time slicing is key.

Written by Eddie Ma

May 28th, 2009 at 8:58 am

Posted in Academic Life

Publication Options

without comments

Options, options. One of the problems I’m encountering now is that I’ve hit the edge of what Stefan and I know. Probing the edges of cheminformatics has made me realize I’d probably need some help. If performance is good enough in the next experiments, then I think we would both benefit from bringing in a third person with a chemistry background to help author the paper. Basically, we’ve got experience writing for the machine learning and computer science crowd, but I think a potential paper is most viable with the chemistry and biomedical crowd. Just look at the number of ACS publications with the phrase “QSAR” in the tagline. Someone who has written chemistry papers in the past and can figure out what looks best in that culture would improve the odds of an accepted paper… This of course opens the door to gearing a more general publication toward the comp sci crowd once a different problem area has been formalized.

Just a thought.

Written by Eddie Ma

May 22nd, 2009 at 4:21 pm

Meeting with Stefan

without comments

The thesis document is relatively mature. Release-Candidate 2 is going to be the final major revision. A few minor changes remain now for the paper, and after that, it’s off to defend.

What do you know? It turns out Cara was right all along. Check out the diagram below of a random-search network; you don’t need to know exactly what a random-search network is, just that there are fat arrows connecting long rectangles.

Random Search Network a la Hochreiter (Original)

The problem with this diagram is that the arrow coming out of the Input Layer and going into the Output Layer is placed underneath the Hidden Layer. Cara immediately understood that this was a problem (the visual occlusion makes it confusing), but I didn’t quite “see” it. Stefan added his vote as well, so a clarified diagram (below) will replace this item.

Random Search Network a la Hochreiter - Clarified

This of course is not the only change needed, but I thought it was a relatively cute and visual one. All diagrams relating to the NGN, however, need to be redrawn so that each node layer is represented by a single node icon (circle or triangle) inside a layer (rectangle), with an integer stating the number of nodes. This is more abstract than the current version, where each node gets its own icon and no integers are used. The new version of RC-2, along with thesis-style formatting, is due out on Friday. I dub it V1.0.

(A part of me still wants to include diagrams describing work from Walsh and Mohr… This may be possible if time permits)

A few requirements go along with the thesis. For one, I need to sign a licensing agreement allowing the University some rein and privileges over the document (that’s fair), but I retain rights to it as well. I think it might make sense to break it up into logical segments and post the contents in my wiki. In particular, the diagrams that will be changed will be missed (I find one-bubble-per-node more aesthetically pleasing, but Stefan and I have compromised by putting the integers inside each node bubble icon instead), so they will likely make it into the wiki version along with the revised diagrams.

This is it… the home stretch…

Written by Eddie Ma

May 20th, 2009 at 11:01 am

Posted in Academic Life

NGN Experiment Dataset Treatment

without comments

Finally done sorting the data for the next experiment in regression. Checking for NGN compatibility next, then moving onto trials. These sets relate to aqueous solubility (2) and melting point (1).

Written by Eddie Ma

May 19th, 2009 at 11:07 pm

Posted in Academic Life

Meeting with Danielle

without comments

Met with Danielle Nash, coordinator at iGEM Waterloo, today. There are three branches of projects this year: the completion of last year’s project and one new project in two parts.

Last year’s project consists of a delivery system, wherein one introduces bacteria that have been modified so that all genomic DNA is lost. The bacteria thus function as subcellular-sized vehicles that are broken down, so that some arbitrary payload is released into a patient. The focus of the team working on this subproject is its completion: elucidating and finally characterizing the system’s behaviour.

This year’s project consists first of a “foundational advance” submission which formally defines a consistent means to create cassettes for the exchange of genetic material between some vector and a target bacterial chromosome. This involves the definition of a new mutant strain with the homologous recombination sites, suitable for the integrase used; a well-defined and consistent cassette, which one would use to enclose the biobricks or other genes of interest; and a short integrase plasmid, likely with just the gene and a promoter of some arbitrary strength. The second part is an extension to this project, which defines a different chassis (target organism); this time a plant, likely Arabidopsis.

David Monje Johnston immediately comes to mind; he oversaw iGEM Guelph last year, in his plant agriculture lab. His advice could likely benefit the team.

What am I doing?

Well, I’ve contacted Andre Masella, who currently lives and reigns at Laurier. He’s the head of modeling this year for Waterloo and has summarized the objective as the creation of some software that will anticipate the best sequence of sites needed to ensure the highest probability of consistent exchange between the cassette and the chassis chromosome.

We’ve all settled on the idea that I would be assisting Andre, with the likelihood of coaching an undergraduate in bioinformatics software design, deployment and utility all the while.

I’m very interested in seeing the background work on this, since I have little idea of what the problem constraints are. What makes a given sequence a good sequence (high in stability, high in predictability), and what makes a given sequence bad (high in variance, low likelihood of working)? I’ve written back to ask about related papers he’s worked with, seen or written.

This should be a BLAST.

Written by Eddie Ma

May 15th, 2009 at 9:14 am

Meeting with Chris Cameron

without comments

Chris is a student working on a summer project towards starting a Master’s degree soon. We’ve been meeting semi-regularly so that we can discuss his project ideas and so that I can offer assistance with whatever cheminformatics / bioinformatics software, datasets and any other technologies that have come up in his work.

Today’s meeting was kind of interesting. It involved revisiting the idea that the NGN can be integrated in a larger expert decision system, and also that the NGN can itself operate as a molecular kernel whose output can then be fed into some other decision machine. Dr. Kremer had a few ideas, but I didn’t quite grasp what was being conveyed. It can, however, be narrowed down to three specific designs.

  • The NGN output can be used as features in a larger molecular descriptor vector space.
  • A different NGN tuned toward a specific kind of problem is selected from a panel of NGNs trained at different problems; this selection is done by an expert (decision tree, or some other state machine).
  • The NGN can output an x-dimensional array; for instance, a 3D grid of outputs whose three axes are labeled ‘species’, ‘LD50 class’ (lethal dose for 50% of the population), and ‘target organ or tissue’.
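
To make the three designs a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: `toy_ngn` is a made-up placeholder and not the real NGN, the task labels are invented, and the "expert" is reduced to a plain dictionary lookup rather than an actual decision tree or state machine.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in type for a trained NGN: molecule string in, numbers out.
NGN = Callable[[str], List[float]]


def toy_ngn(scale: float) -> NGN:
    """Build a placeholder 'NGN' that returns two made-up numbers (not a real model)."""
    def predict(smiles: str) -> List[float]:
        return [scale * len(smiles), scale * smiles.count("O")]
    return predict


def extend_descriptors(descriptors: List[float], ngn: NGN, smiles: str) -> List[float]:
    """Design 1: append the NGN output to an existing molecular descriptor vector."""
    return list(descriptors) + ngn(smiles)


def select_ngn(panel: Dict[str, NGN], task: str) -> NGN:
    """Design 2: an 'expert' picks one NGN from a panel (here just a lookup by task label)."""
    return panel[task]


def reshape_grid(flat: List[float], n_species: int, n_classes: int,
                 n_organs: int) -> List[List[List[float]]]:
    """Design 3: view a flat output vector as grid[species][ld50_class][organ]."""
    if len(flat) != n_species * n_classes * n_organs:
        raise ValueError("output length does not match the grid dimensions")
    return [
        [
            [flat[(s * n_classes + c) * n_organs + o] for o in range(n_organs)]
            for c in range(n_classes)
        ]
        for s in range(n_species)
    ]


if __name__ == "__main__":
    panel = {"solubility": toy_ngn(1.0), "toxicity": toy_ngn(2.0)}
    chosen = select_ngn(panel, "toxicity")
    print(extend_descriptors([0.5], chosen, "CCO"))
    print(reshape_grid([float(i) for i in range(12)], 2, 3, 2)[1][2][1])
```

The three functions are independent of each other, so any one design could be tried without committing to the others.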

As a side note, we decided it might be good to look at the traditional QSAR task with traditional descriptors and the good ol’ feedforward neural network. I’ll prepare the general neural network software for Chris that I used in Experimental Design. The first version I sent in had too much stuff in it, so I’ll basically strip it down to “just operational” so he can actually understand the thing.
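
For a sense of what “just operational” could look like, here is a minimal sketch of a one-hidden-layer feedforward network doing regression on a descriptor vector. This is not the Experimental Design software; the function names, the layer sizes, the learning rate and the toy data are all assumptions made up for the example.

```python
import math
import random


def init_network(n_in, n_hidden, seed=0):
    """Create small random weights for one tanh hidden layer and a linear output."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return (hidden activations, scalar output) for one descriptor vector x."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w1"], net["b1"])
    ]
    output = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return hidden, output


def train_step(net, x, target, lr=0.05):
    """One stochastic gradient step on the loss 0.5 * (output - target)^2."""
    hidden, output = forward(net, x)
    error = output - target
    for j, h in enumerate(hidden):
        # Gradient at hidden unit j, computed before w2[j] is updated.
        grad_hidden = error * net["w2"][j] * (1.0 - h * h)
        net["w2"][j] -= lr * error * h
        net["b1"][j] -= lr * grad_hidden
        for i, xi in enumerate(x):
            net["w1"][j][i] -= lr * grad_hidden * xi
    net["b2"] -= lr * error
    return 0.5 * error * error


def total_loss(net, data):
    """Sum of squared-error losses over a list of (descriptors, target) pairs."""
    return sum(0.5 * (forward(net, x)[1] - y) ** 2 for x, y in data)


if __name__ == "__main__":
    # Made-up data: two invented descriptors, target = second minus first.
    data = [([0.0, 1.0], 1.0), ([1.0, 0.0], -1.0), ([1.0, 1.0], 0.0), ([0.0, 0.0], 0.0)]
    net = init_network(n_in=2, n_hidden=4)
    print("loss before:", total_loss(net, data))
    for _ in range(500):
        for x, y in data:
            train_step(net, x, y)
    print("loss after:", total_loss(net, data))
```

In a real QSAR run the inputs would be actual molecular descriptors and the targets measured properties, with held-out data to check generalization.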