Archive for the ‘Stefan C. Kremer’ tag
DNA … Knots and Lambdas
A long time ago, one Andre led a team of students on a journey of mathematical and computational modeling; at the very least, we have reached some useful insights from our tidy trip, albeit at a distance from the solution.
Presented here is a very jumbled, very abridged account of the activities of the modeling team this summer and the eventual realization that brings us to now.
The Problem Revisited
So we have a sequence. Actually, two sequences. Actually, we have two loops. Two loops of DNA that will contain a specific sequence used for cassette exchange. The problem is the design of these two loops. We want to design them so that we can predictably exchange specific objects between them. We use a recombination enzyme that is sensitive to specific sites to perform the exchanges.
The above paragraph is an abstract-abstract of the UW iGEM Project.
The Top Down Approach
What I eventually labeled in my mind as the top-down approach is called that in analogy to parsing. In parsing, we build a tree. We can do this conceptually from the bottom-up, or from the top-down. From the bottom-up, we know everything we need to know to build the tree… we know as much as we want to know, we even know whether no tree exists for this particular string of tokens. From the top-down, we’d have to use some magical induction to chain tokens together by determining a structure that the tokens will find pleasing.
The magical induction of the top-down approach is none other than brute force. There is no magic, just an exponential explosion. The base of this power is the length of the string and the exponent of the power alludes to the complexity and depth of the grammar.
We don’t parse for the sequence problem: that is, we assume the grammar to be irrelevant, that a flat degenerate chain is a sufficient tree; we operate on sequences with our enzyme instead.
For our sequence problem, we pick three loops. We see if the first two loops add together with respect to the enzyme to make the third loop. By hand, one is tempted to use various heuristics of deductive logic, but this became complicated and soon overflowed the dozen or so objects a human brain may accommodate per instant. The machine was dragged in, and the three loops were shown to it using Python.
We presented three loops of one logical suite of tokens. It ran to completion and to no surprise, this was not our solution. We did this again for all three-loops where each loop is one logical suite. That ran to completion and again, no solution– again to be expected; not yet long enough to accommodate the anticipated length of the solution.
One logical block became two, became three… and at each step, the base of the exponent to our magical induction grew.
Four logical blocks… we halted the experiment; the machine would’ve taken a month to finish that block.
The exponential explosion was real, and our bid that the solution may be just short enough to fit therein was proven false.
The Bottom Up Approach
Months passed, various members went on various summer excursions… and many have returned now. We discuss many theoretical approaches. We resample the problem, sniffing for hints. Actually, it’s been Andre, Jordan and me … we haven’t discussed this with the remaining modeling team yet because of just how vague our new lines of intrigue are. I will revise my opinion if the thought that more individuals means faster solution finding crosses my mind again.
I’ve had a few conversations, one with my MSc advisor, Stefan; one with a friend, Andrew Baker; and another with my undergraduate project advisor, Bettina. So far, no one’s seen this specific problem before or can point to an approach, technology or research that they’ve seen…
We reformalize the problem with the following constraints.
- Must deal with circularity of DNA, hence be circular-shift invariant
- Must accommodate or encapsulate reverse complementation
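One way to picture these two constraints is a canonical form for a loop: among all rotations of the sequence and of its reverse complement, pick a single representative. The sketch below is only an illustration under stated assumptions (plain `ACGT` strings, a naive quadratic search, and the hypothetical helper names `canonical_loop` and `same_loop`); it is not the representation the team settled on.

```python
# Assumes loops are plain strings over the alphabet A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(seq):
    """Return every circular shift of seq."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_loop(seq):
    """Pick one representative that is the same for every rotation
    of the loop and for every rotation of its reverse complement."""
    seq = seq.upper()
    if not seq:
        return ""
    candidates = rotations(seq) + rotations(reverse_complement(seq))
    return min(candidates)  # lexicographically smallest candidate


def same_loop(a, b):
    """True if a and b describe the same circular, double-stranded loop."""
    return len(a) == len(b) and canonical_loop(a) == canonical_loop(b)
```

With this, `same_loop("ACGTT", "GTTAC")` compares two rotations of one loop, and `same_loop("ACGTT", "AACGT")` compares a loop with its reverse complement; both pairs reduce to the same canonical string.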
Intrigue
Several lines of intrigue we visit now.
First, Knot Theory provides a representation for knots as real-valued vectors; distinct shapes may, however, map to the same (degenerate) vector. Knots allow us to take our loop of DNA and place the putative recombinatory hotspots one on top of another. Missing from this item is precisely how to dope the vectors with our own sequence data.
Second, Lambda Calculus and Logic Programming provide a language and a method respectively to map vectors from left to right. The form of the abductive equations for this problem is yet to be discovered, however. We’re thinking about this method because we suspect that the recombinase enzyme activity can be completely expressed as a mathematical construct on our doped knot vectors. We hope that this construct can be expressed with abductive statements.
Third, Recombinatory Calculus (the idea is usually known as combinatory logic, with the S and K combinators as the two atoms). Actually, this item is in stark competition with Logic Programming as the functional crux of the model. Recombinatory Calculus, which is fairly distant from Recombinatorics, mind, is a math that has shown every computable function can be constructed from just two atoms. If it turns out that the final representation of a DNA loop looks more like arguments for these two atoms, then we may pursue this; but at present, it seems to be losing against Logic Programming. The allure of the two atoms subsided as we realized the complexity of even the addition function for integers.
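The "two atoms" idea can be made concrete with a tiny sketch, assuming the atoms meant here are the S and K combinators of combinatory logic (that identification is an assumption, not something the team's notes spell out). Everything else is built by applying these two to each other, and even simple helpers such as identity and function composition already need several atoms, which hints at why integer addition gets unwieldy.

```python
# The two atoms, written as curried Python functions.
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x

# Identity built only from S and K: I a = K a (K a) = a
I = S(K)(K)

# Function composition built only from S and K: B f g a = f (g a)
B = S(K(S))(K)

# Ordinary Python functions used purely to observe the results.
increment = lambda n: n + 1
double = lambda n: n * 2

identity_result = I(42)
composed_result = B(increment)(double)(5)  # increment(double(5))
```

Here `identity_result` is 42 and `composed_result` is 11, since `double(5)` is 10 and `increment(10)` is 11.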
Direction
Luckily… roughly a dozen papers have been recovered from various repositories that discuss knot math and how to hack it sufficiently to kindly represent DNA loops. We continue to read and discuss these papers until we feel it reasonable to raise it with the entire modeling group… that is, when the science is done and the engineering begins anew.
New Diagram for MSc-X3 (math paper)
Brief: I’m particularly happy with this diagram… I had something along these lines in my head for a while, but I never could figure out how to draw it correctly. I never thought that simplifying it to three easy steps was the smarter thing to do.

Near and Far Goals
Okay okay okay– so the defense was about four days ago, I need to get myself organized… these are the projects I can afford to participate in OR are the projects that will yield the most return in terms of enjoyment or some other intangible value.
Immediate Focus
- Get married– with the wedding now only two weeks away, I need to pull everything I need together and really support Cara in the remaining tasks. As far as I understand, almost everything is in order already and it’s down to ballroom dance practicing etc. and getting our lines memorized. Well, there’s probably a lot more in terms of communicating with the flower girl and ring bearer– and my family– so hopefully we’ll magically finish just on time. Plus there’s a week in August where we’ll be at Disney World for the honeymoon, so I’ll certainly find myself in a bubble away from anything else.
Long Term Focus
- The PhD– I’ve already written to Liz and Brendan about the summer, or actually– the changes about this summer. Earlier on, we had agreed it would be good if I got a head start on the PhD project in summer. At the very least, I’d refine the problem space and be able to formalize my interests. It looks like the most rational thing to do now is to do a normal clean start in Fall since there are a few outstanding things I need to tackle back at Guelph.
Important Intermediate Projects (In Randomly Generated Order)
- Graduating and all its caveats (Must Finish)– I need to fix the thesis, which means I need to get the examiners’ notes back. I also need to finish some paperwork: I have a stack of signed documents that I shouldn’t lose that has to be handed in with the final thesis, and another stack of signed documents that has to be handed in with the department keys. This item doesn’t stress me out as much as it probably should…
- iGEM (Would Be Nice)– I’ve been delegated nothing right now, so I actually should chase down Andre next week when we morph from developers to end users. Yes, that’s right– we shall suffer the glory of using our own software creation. I really ought to check out the repository to see if I can understand the code logic after everyone’s touched it (the modules are more or less mature at this point). Reading the code would be a start. After that, the modeling team will probably break apart and merge into other teams in UWiGEM. Plus there’s mathematical modeling, planning for next year etc. The iGEM project has the potential to be paper worthy if we manage to get some decent results… it’ll be a feat of an interdisciplinary team which makes me happy to be a part of it all.
- Chris’ Project (Must Finish)– So, this item is always running in the background since I’m not the primary owner of this project. Chris has done an excellent job with the science so far, which motivates me to offer him the support he needs to finish… this one’s another potential paper– but again, we need to get results and actually have something interesting to say. He’s actually doing five-fold and leave-one-out cross validation schemes right now, but that’s a post for another time.
- MSc-X3 (Would Be Nice)– The math paper! Stefan and I decided a long time ago that a math paper would be good to complete the story of the NGN. This would actually be slightly more comprehensive than the thesis in explaining the math, including things like run time costs and analysis plus all of the equations that got lost in the translation.
- Andre’s Mystery MSc-??? (Would Be Nice)– I have little detail about this, but it’s another gene management system with the additional feature that fragments can be checked out and added to end users’ own databases. I’m really curious to learn more. The last I heard, Andre managed to churn out code that didn’t have any bugs– panicked at the lack of bugs– then found some bugs– and was relieved.
Other Concerns
- I’m going to stop short of listing three additional projects I had been working on previously– I have realized that I don’t have time, and the organizations (and one person) to whom these projects belong are probably aware that I managed to get buried in work… I really want to revisit these items in future, but I am unsure when time will become available again.
End Game — A Thesis Defence
Brief: I just finished my master’s defence today! I won!
– Apparently I was a bit dry.
Congratulations Eddie!
Conference Paper Submitted
Brief: That IEEE November conference paper Stefan and I had been working on was finally submitted in the wee hours of last night. The newborn weighed in at the IEEE specified 6 pages, single spaced.
Just for fun, here’s a really zoomed out view of pages 1, 4 and 6.
NNcmk: A Neural Network (Win32 & OSX)
Okay– I managed to finish that 3-layer neural network implementation the other day– actually, it was a while ago but I didn’t post about it because I was busy. It’s a pretty standard network, but I’m proud to say it’s small and works for OSX and Win32. I still have to put in a few #define directives to have it work with Linux as well.
I will have to document it too when I get a chance. I made a brand new executable (instead of using the source from my previous projects) because I needed something that would take in launch-time parameters so that it didn’t need to be recompiled each time someone decides to use the binary on a new dataset with a different number of inputs. Right now, the thing has barely any hard-coded parameters that can’t be set at launch-time.
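To illustrate the launch-time idea (this is not NNcmk itself, which is written in C and whose real parameters aren't listed here), a minimal Python sketch might read the layer sizes from the command line and build the network from them. The flag names `--inputs`, `--hidden`, `--outputs` and `--seed`, the random weights and the all-zero sample are all assumptions made for the example.

```python
import argparse
import math
import random


def build_parser():
    # Hypothetical flag names; NNcmk's real parameters are not shown here.
    parser = argparse.ArgumentParser(description="Toy 3-layer network sketch")
    parser.add_argument("--inputs", type=int, required=True)
    parser.add_argument("--hidden", type=int, default=4)
    parser.add_argument("--outputs", type=int, default=1)
    parser.add_argument("--seed", type=int, default=0)
    return parser


def make_layer(n_in, n_out, rng):
    # One weight row per unit; the extra last weight is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(layer, values):
    # Append a constant 1.0 so the last weight in each row acts as the bias.
    padded = list(values) + [1.0]
    return [sigmoid(sum(w * v for w, v in zip(row, padded))) for row in layer]


def run(argv):
    """Build an untrained input-hidden-output network sized by argv
    and push one all-zero sample through it."""
    args = build_parser().parse_args(argv)
    rng = random.Random(args.seed)
    hidden = make_layer(args.inputs, args.hidden, rng)
    output = make_layer(args.hidden, args.outputs, rng)
    sample = [0.0] * args.inputs
    return forward(output, forward(hidden, sample))
```

Calling `run(["--inputs", "3", "--outputs", "2"])` builds a network for a three-input dataset without touching the source, which is the point of taking the sizes at launch rather than at compile time.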
The NNcmk (Neural Network – Cameron, Ma, Kremer) package is C compilable, uses the previously developed in-house library for the NGN and will be available shortly after I’m satisfied that I’ve squashed all the bugs, fixed the output and have documented the thing completely. I think Chris has difficulty with it right now mostly because I didn’t specify exactly what parameters do what– I did at least provide a (DOS) batch file with an example run-in-train-mode / run-in-test-mode sequence…
Back to work on that paper right now though…
Ed's Big Plans