Archive for the ‘Python’ tag
Yesterday and Today on UWiGem Recombinatron
So, I’ve been assigned the Recombinatron DNA submodule and spent a good part of yesterday morning and afternoon working on it at the UWiGem office. I’ve brainstormed the features the submodule should have and have finished a sizeable chunk of it.
While doing so, I learned how to use Python’s yield keyword (along with raise StopIteration). I also learned all about Python’s special methods, the “abstract functions” behind expressions like “len()”, “some_collection[5:7]” and “some_object >= another_object”, and how they work under the hood.
Basically, the DNAClass submodule will be the atomic type passed between the larger modules of the project. Each DNAClass instance (a DNAObject) wraps a read-only string. That string can be iterated around its loop either forward or backward, and it can be sliced like a string. All of this must be transparent to the user.
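Here is a minimal sketch of what such a read-only sequence type could look like in modern Python. The class body is illustrative, not the committed code. Using lexicographic order for >= is also an assumption made for the example.

```python
class DNAObject(object):
    """Hypothetical read-only DNA wrapper (a sketch, not the committed code)."""

    def __init__(self, seq):
        # Normalise once; no method ever mutates self._seq afterwards.
        self._seq = str(seq).upper()

    def __len__(self):                 # len(dna)
        return len(self._seq)

    def __getitem__(self, key):        # dna[5] and dna[5:7]
        if isinstance(key, slice):
            # Slicing returns another read-only DNAObject.
            return DNAObject(self._seq[key])
        return self._seq[key]

    def __iter__(self):                # for base in dna: ...
        return iter(self._seq)

    def __reversed__(self):            # for base in reversed(dna): ...
        return reversed(self._seq)

    def __eq__(self, other):
        if not isinstance(other, DNAObject):
            return NotImplemented
        return self._seq == other._seq

    def __ge__(self, other):           # dna >= other (lexicographic, by assumption)
        if not isinstance(other, DNAObject):
            return NotImplemented
        return self._seq >= other._seq

    def __hash__(self):
        return hash(self._seq)

    def __repr__(self):
        return "DNAObject(%r)" % (self._seq,)
```

The class defines no __setitem__, so an assignment like dna[0] = "C" raises TypeError. That is enough to keep the string read-only from the user's point of view.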
I might go into detail later. For now, here are two good resources: Python.org / Emulating Sequence Types (hard to find and hard to google), and the Ordered Dictionary [odict] class by Nicola Larosa and Michael Foord. The odict source is actually an excellent primer. Its many useful comments really helped me figure out iterators, the slice object and special methods.
Finally, I have only two mandatory items left: fixing up slices when None is used as a bound, and then the iterator-iterators…
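The first item is mostly covered by the slice object itself. slice.indices(length), available since Python 2.3, replaces None bounds with concrete defaults and clips negative or out-of-range values. The helper names normalize() and take() below are hypothetical.

```python
def normalize(key, length):
    # slice.indices() fills in None bounds and clips negative or
    # out-of-range values against the sequence length.
    return key.indices(length)


def take(seq, key):
    # Walk the concrete indices explicitly, e.g. to map them onto
    # some other underlying storage.
    start, stop, step = normalize(key, len(seq))
    return "".join([seq[i] for i in range(start, stop, step)])
```

For example, normalize(slice(None, 5), 7) gives (0, 5, 1), so dna[:5] never has to special-case None.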
The iterator-iterators (a terrible idea, actually) would be a list of iterators, each starting at a different position in the DNA loop. Every one of those starting positions holds the same token.
I’m now thinking I should replace them with an accessor that returns a list of non-negative integer positions of the tokens of interest. This can still be transparent to the user AND has the benefit of not being unwieldy to implement. Nested yield statements are just asking for trouble.
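A minimal sketch of that accessor follows, using the hypothetical names positions() and rotation(). The accessor hands back plain indices. A caller who wants to read the loop from one of those indices just rotates the string, so no generator ever has to nest inside another.

```python
def positions(seq, token):
    # Plain, non-negative indices of every occurrence of a one-letter token.
    return [i for i, base in enumerate(seq) if base == token]


def rotation(seq, start):
    # The circular loop of DNA read from the given start position.
    return seq[start:] + seq[:start]


loop = "GATGCA"
# One rotation per matching position, instead of a list of live iterators.
views = [rotation(loop, p) for p in positions(loop, "G")]
```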
Update: Done. An initial version has been committed to the repository.
Modeling Meeting

Modeling Team Selection with Flush();
A modeling meeting took place on Wednesday. Andre led the discussion and revisited the entire program layout in a nice chalkboard cartoon. Unfortunately, Andre generally doesn’t press hard enough or draw wide enough lines with the chalk to give the blackboard enough contrast for a photograph (i.e. faint drawing => no photos, sorry).
The discussion formalized the programming problem and divided it into three distinct software components:
- Genetic Fragment Operators
- Genetic Fragment Filters
- Overall Program Logic
Genetic Fragment Operators
These are the functions that represent reverse-complementation, enzyme activity, etc.
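To illustrate a Genetic Fragment Operator, here is a minimal reverse-complement function over plain strings. It assumes the four-letter A/C/G/T alphabet only, so any other letter raises KeyError. The names are hypothetical.

```python
# Hypothetical single-letter alphabet: A, C, G, T only.
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    # Complement each base, then read the strand back to front.
    return "".join([_COMPLEMENT[b] for b in reversed(seq.upper())])
```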
Genetic Fragment Filters
These are functions that remove uninteresting, ‘inert’, undesirable and fatal fragments of DNA. This definition will become more precise once we’ve worked on the project a bit and better understand the philosophical correctness of each of these notions.
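A filter could be modelled as a predicate that says whether to keep a fragment. The two criteria below, a minimum length and a forbidden motif, are hypothetical stand-ins, since the real notions of inert, undesirable and fatal are still to be pinned down.

```python
def min_length(n):
    """Hypothetical filter: keep fragments at least n bases long."""
    def keep(fragment):
        return len(fragment) >= n
    return keep


def lacks_motif(motif):
    """Hypothetical filter: keep fragments that do not contain motif."""
    def keep(fragment):
        return motif not in fragment
    return keep


def apply_filters(fragments, filters):
    # A fragment survives only if every filter agrees to keep it.
    return [f for f in fragments if all(keep(f) for keep in filters)]
```

Each filter is built by a small factory, so new criteria can be added without touching apply_filters().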
Overall Program Logic
The overall program logic will consist of some structure that represents a Big Bag of DNA (as opposed to a cell), the communication between this Big Bag, the Operator module and the Filter module, and, of course, our main program loop.
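Here is one possible shape for that main loop, sketched under several assumptions. The Big Bag is a set of strings, so copy number is ignored. Each operator maps one fragment to an iterable of products, and filters are keep/discard predicates. The loop stops when a round produces nothing new, or after max_rounds. All names are hypothetical.

```python
def run(bag, operators, filters, max_rounds=10):
    """Hypothetical main loop: grow a Big Bag of DNA to a fixed point.

    bag       -- iterable of DNA strings (stored as a set; copy number ignored)
    operators -- callables mapping one fragment to an iterable of products
    filters   -- predicates returning True for fragments worth keeping
    """
    bag = set(bag)
    for _ in range(max_rounds):
        products = set()
        for fragment in bag:
            for op in operators:
                products.update(op(fragment))
        # Only newly produced fragments are filtered; the seed bag is kept as-is.
        kept = {p for p in products if all(keep(p) for keep in filters)}
        new = kept - bag
        if not new:
            break  # nothing new this round: stop early
        bag |= new
    return bag
```

Using a set makes the stopping test trivial. If copy number turns out to matter, the bag would need to become a dict of counts instead.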
What I’m doing…
I’ve been tasked with producing a universal representation of DNA. It includes a circular iterator over a loop of DNA that can begin at an arbitrary starting position, which is easy enough in Python with the ‘yield’ keyword. I will be borrowing from Jordan’s, Brendan’s and my own previous ideas for this representation. We want a simple single-letter-token system, and for the moment we’re happy with the single-byte space ASCII has to offer.
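Here is a minimal sketch of such a circular iterator as a generator function, written for modern Python. The name circular() and its parameters are hypothetical. It yields exactly one lap of the loop, starting anywhere and walking in either direction.

```python
def circular(seq, start=0, reverse=False):
    """Yield one full lap around a circular DNA loop.

    Starts at an arbitrary position (wrapped modulo the length) and walks
    forward or backward, wrapping around the end of the underlying string.
    """
    n = len(seq)
    if n == 0:
        # A bare return ends the generator. On Python 3.7+, raising
        # StopIteration inside a generator becomes RuntimeError (PEP 479).
        return
    step = -1 if reverse else 1
    for k in range(n):
        yield seq[(start + step * k) % n]
```

For example, "".join(circular("GATTACA", 4)) reads the loop from position 4 and gives "ACAGATT".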
It’s Python for UWiGem Modeling
Brief: The University of Waterloo iGem team has officially designated Python as the language of choice for the modeling project. As far as I understand, we’ll be banging out two main components next week: a logical decomposition of the problem, and an API in which to express the decomposed modules. I’m curious to know whether we’re talking Python 2.x or 3.0. I’m also curious whether Andre has further formalized two things. The first is the promiscuous cases of enzyme-to-dinucleotide binding. The second is the glorious exponential explosion of the search space, which is critical because it is the raison d’être of the project.
Python 2.4 is installed on the iGEM server, so that’s what stays.
I have not formalized the enzyme-to-dinucleotide binding any further. I was unaware further formalization was needed.
Oh right! We’re sticking to our six non-promiscuous enzyme operator cases. I keep forgetting that whenever someone mentions the exponential explosion.
Python 2.3, 2.6 and 3.0
The HPC serial cluster “Whale” on SharcNet has Python 2.3 installed on all of its nodes; in fact, I can almost guarantee that Python 2.3 is standard across SharcNet.
This means there is no subprocess module and no top-level exit() function (use sys.exit() instead). A few iterator constructs and other top-level functions also behave differently, notably zip(), range() and xrange(). Interestingly, on review, I *should* have used enumerate() in these incompatible cases anyway. But then again, this now-legacy code was written long before I even knew about that function.
This is actually NOT quite as bad as it sounds, since there aren’t many features that can’t be backported to work with Python 2.3 (SharcNet) and Python 2.6 (Tin & Pewter) simultaneously. When I eventually switch to Python 3.0, I’ll have to take even more care. It adopts syntax changes and some major revisions; for one, the print statement has become print(), an ordinary built-in function. In reality, I’m likely to have both installed and treat them as separate languages (a sane way to manage legacy code, I’m told).
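Here are a few idioms that should behave the same on Python 2.3 (SharcNet), 2.6 and 3.0. They are sketched around a hypothetical gc_positions() helper:

- enumerate() instead of range(len(...))
- sys.stdout.write() instead of the print statement or function
- an exit code handed to sys.exit() instead of a call to exit()

The code deliberately avoids newer syntax such as conditional expressions and `with`.

```python
import sys


def gc_positions(seq):
    # enumerate() (new in Python 2.3) replaces "for i in range(len(seq))".
    hits = []
    for i, base in enumerate(seq):
        if base == "G" or base == "C":
            hits.append(i)
    return hits


def main(argv):
    if len(argv) < 2:
        # sys.stderr.write() works the same with or without print().
        sys.stderr.write("usage: %s SEQUENCE\n" % (argv[0],))
        return 1
    sys.stdout.write("%s\n" % (gc_positions(argv[1]),))
    return 0
```

A script would end with sys.exit(main(sys.argv)). sys.exit() exists on every version, whereas on 2.3 the exit() shortcut is not a callable function.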
Today, I’m working on backporting the NGN to be 100% compatible with SharcNet. The current cycle (compile, zip, upload, decompress, compile, run) takes far too long, and it could be expedited with a single compile procedure. So far, the problem can be traced to some issues with iterators implemented early on… I can likely take this opportunity to make some of the code more efficient as well.
UPDATE: The convergence was not as straightforward as I had hoped, so I’ll still rely on two compile steps. As it turns out, there’s a String.partition() step (str.partition() only arrived in Python 2.5) which is impossible to ignore. It’s possible to do a String.split() and then rejoin the subsequent segments, but the amount of testing needed… I’ll need to return to updating the NGN code later. For now, I can at least cleanly break the compilation into a “source generation” step followed by a “compilation” step.
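Here is a minimal 2.3-compatible stand-in for str.partition(); the name partition() is hypothetical. It leans on split() with a maxsplit of 1, which returns at most two pieces, so nothing has to be rejoined. It can be checked against the built-in on any 2.5+ interpreter.

```python
def partition(s, sep):
    """Hypothetical backport of str.partition() for Python 2.3/2.4."""
    if not sep:
        # Matches the built-in, which rejects an empty separator.
        raise ValueError("empty separator")
    # maxsplit=1 splits on the first occurrence only.
    parts = s.split(sep, 1)
    if len(parts) == 1:
        return (s, "", "")  # separator not found
    return (parts[0], sep, parts[1])
```

Comparing it with the built-in over a handful of edge cases covers most of the testing burden. Useful cases are the empty string, the separator at either end, a repeated separator and a missing one.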
Integrase Problem Introduction
In a meeting with iGemmers @ Waterloo today, specifically the Modeling team headed by Andre and supplemented by core members Sheena and John, we discussed this year’s modeling project. We’re currently interested in creating a solver that will yield an arrangement of attX sites on a chassis bacterial host chromosome that can accommodate several rounds of deterministic recombination.
Put plainly, we need to write software whose solution is a DNA sequence. That sequence must be arranged so that specific sites acted on by the enzyme integrase can accept several loops of artificial DNA for recombination. In this design, we’re interested in three things: a sequence for the host chassis, a sequence for the artificial loops, and another loop from which integrase is produced at some arbitrary tonic level inside the cell.
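To make the problem statement concrete, here is a toy string model of a single integration event. Everything about it is an assumption for illustration:

- att sites are single tokens: 'B' = attB, 'P' = attP, 'L' = attL, 'R' = attR
- site orientation and the actual site sequences are ignored
- only the first attB/attP pair is used

```python
def integrate(host, loop):
    """Toy model: insert a circular loop into a host at an attB site.

    Hypothetical tokens: 'B' = attB, 'P' = attP, 'L' = attL, 'R' = attR.
    Returns the new host string, or None if no attB/attP pair exists.
    """
    b = host.find("B")
    p = loop.find("P")
    if b < 0 or p < 0:
        return None  # no compatible site pair: nothing happens
    # Open the circular loop at attP so it reads from just after the site.
    insert = loop[p + 1:] + loop[:p]
    # attB x attP -> attL + attR, flanking the inserted loop. Neither product
    # is attB or attP, so (assuming a directional integrase) the pair is used up.
    return host[:b] + "L" + insert + "R" + host[b + 1:]
```

Several rounds of deterministic recombination would then correspond to several distinct, non-cross-reacting site pairs, one per loop.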
The first step is to formalize the problem mathematically. Along with that, we need some working particles of software that successfully model the problem space. The solver is a yet more abstract piece of software that will use these particles in its solution. This is similar to defining integers and arithmetic operators before using those components to solve algebra.
This description is very coarse– I’ll refine it in a later post after I’ve had some time to analyze the problem constraints and what software particles are important to set down on paper.
Ed's Big Plans