Archive for the ‘Python’ tag
Python’s List Multiply — Use List Comprehension Instead
Brief: I ran into a snag parsing a CSV today — I have lines upon lines of 27 integers each. Each line represents a cube of values — each cube has dimensions 3×3×3. Here’s an example line and the corresponding cube it represents.
A line …
0, 0, 0, 0, 0, 0, 0, 50, 0, 2, 22, 0, 0, 4, 0, 5, 0, 17, 0, 26, 24, 0, 0, 0, 0, 0, 0,
The corresponding cube …
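| Depth 0 | Depth 1 | Depth 2 |
| 0 0 0 | 2 22 0 | 0 26 24 |
| 0 0 0 | 0 4 0 | 0 0 0 |
| 0 50 0 | 5 0 17 | 0 0 0 |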
… Where the three major squares represent three levels of depth; each major square has three rows and three columns, and the values in each depth are organized row-major.
So let’s say the lines in a file are read from stdin thanks to the magic of POSIX pipes.
We might be able to use a construct like this to store all of our lines in the variable cases as below …
import sys
cases = []
for line in sys.stdin:
    entry = [int(i) for i in line[:-1].split(",") if i != '']
    current_cases = [[[0] * 3] * 3] * 3 # Doesn't work ...
    for ii, i in enumerate(entry):
        # ii is the index (of enumeration), i is the value (from the 'entry')
        current_cases[ii//9%3][ii//3%3][ii%3] = i
    cases.append(current_cases)
Let’s break apart the line that assigns entry first — it’s equal to the below …
entry = line[:-1]
# get rid of the trailing newline character
entry = entry.split(",")
# break apart the line by comma character
entry = [int(i) for i in entry if i != '']
# convert each element into an integer
# except for the trailing empty string
Now let’s break apart the line that assigns current_cases — here’s the logic behind getting each element of the 27-element cube …
[ i x x i x x i x x i x x i x x i x x i x x i x x i x x ] - column index
[ i i i x x x x x x i i i x x x x x x i i i x x x x x x ] - row index
[ i i i i i i i i i x x x x x x x x x x x x x x x x x x ] - depth index
The columns are indexed by [index (mod 3)]: every third item falls in the same column. The rows are indexed by [index / 3 (mod 3)], where / is integer division: each run of three items within a group of nine belongs to the same row. The levels of depth are indexed by [index / 9 (mod 3)]: each run of nine items out of the 27 elements belongs to the same depth.
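The index arithmetic above can be checked with a small sketch. The helper name cube_position is my own (it does not appear in the original listing), and divmod is used here only as an equivalent way of writing the same integer division and remainder.

```python
def cube_position(index):
    """Map a flat index 0..26 to (depth, row, column), row-major."""
    # Each depth holds 9 values; each row within a depth holds 3.
    depth, within_depth = divmod(index, 9)
    row, column = divmod(within_depth, 3)
    return depth, row, column


# Agrees with the modular expressions described in the text.
for index in range(27):
    assert cube_position(index) == (index // 9 % 3, index // 3 % 3, index % 3)

print(cube_position(7))   # (0, 2, 1)
print(cube_position(17))  # (1, 2, 2)
```

In the example line, index 7 holds the value 50, so it lands in depth 0, row 2, column 1.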
So here’s the line that doesn’t work …
current_cases = [[[0] * 3] * 3] * 3
In current_cases as defined above, only three integer slots are allocated in memory, not 27. The above saves cubes where the depth index and the row index don’t matter; only the innermost column index has any meaning. The strange thing is, this wasn’t immediately intuitive to me: I expected the list multiply operation to create nested lists, but each of the two enclosing levels of lists just creates additional references to the same list.
This is the line we must use instead to make the nested lists correctly save the cubes …
current_cases = [[[0 for i in range(3)] for i in range(3)] for i in range(3)]
# or ...
current_cases = [[[0] * 3 for i in range(3)] for i in range(3)]
# or ...
current_cases = [[[0, 0, 0] for i in range(3)] for i in range(3)]
Here, the comprehension iteratively creates the correct 27 memory locations needed for each cube. Each nested comprehension is responsible for creating a unique list at the correct depth.
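The difference between the two constructions is easy to see directly. The following is a minimal sketch; the variable names (shared, fresh and so on) are mine and are only there for illustration.

```python
# Built with list multiplication: the outer levels repeat references.
shared = [[[0] * 3] * 3] * 3

# Every "row" at every "depth" is the very same list object.
same_row = shared[0][0] is shared[2][1]

# One write through a single position shows up at all nine (depth, row) pairs.
shared[0][0][1] = 7
aliased_hits = sum(row[1] == 7 for depth in shared for row in depth)

# Built with comprehensions: every level is a freshly created list.
fresh = [[[0 for _ in range(3)] for _ in range(3)] for _ in range(3)]
fresh[0][0][1] = 7
fresh_hits = sum(row[1] == 7 for depth in fresh for row in depth)

print(same_row, aliased_hits, fresh_hits)  # True 9 1
```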
Putting it together, the correct listing for this task is …
import sys
cases = []
for line in sys.stdin:
    entry = [int(i) for i in line[:-1].split(",") if i != '']
    current_cases = [[[0 for i in range(3)] for i in range(3)] for i in range(3)]
    for ii, i in enumerate(entry):
        current_cases[ii//9%3][ii//3%3][ii%3] = i
    cases.append(current_cases)
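For testing without a pipe, the same logic can be wrapped in a function and run against the example line from the top of this post. This is only a sketch: the name parse_cube and the length check are my additions, not part of the listing above.

```python
def parse_cube(line):
    """Parse one line of 27 comma-separated integers into a 3x3x3 nested list."""
    # Skip empty fields, such as the one left by the trailing comma.
    values = [int(field) for field in line.strip().split(",") if field.strip()]
    if len(values) != 27:
        raise ValueError("expected 27 integers, got %d" % len(values))
    cube = [[[0 for _ in range(3)] for _ in range(3)] for _ in range(3)]
    for index, value in enumerate(values):
        cube[index // 9][index // 3 % 3][index % 3] = value
    return cube


sample = ("0, 0, 0, 0, 0, 0, 0, 50, 0, 2, 22, 0, 0, 4, 0, 5, 0, 17, "
          "0, 26, 24, 0, 0, 0, 0, 0, 0,\n")
cube = parse_cube(sample)
print(cube[0][2][1])  # 50
print(cube[1][0])     # [2, 22, 0]
print(cube[2][0])     # [0, 26, 24]
```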
I’ve written this solution down this time because I seem to rediscover it every time I encounter it. Hopefully, this will save you some work too.
The Null Coalescing Operator (C#, Ruby, JS, Python)
Null coalescence allows you to specify what a statement should evaluate to instead of evaluating to null. It is useful because it allows you to specify that an expression should be substituted with some semantic default instead of defaulting on some semantic null (such as null, None or nil). Here is the syntax and behaviour in four languages I use often — C#, Ruby, JavaScript and Python.
C#
Null coalescing in C# is very straightforward, since the operator only accepts a left operand that can actually be null (a reference type or a nullable value type) and a right operand of a compatible type. This restriction is enforced at compile time: the code will refuse to compile if the left operand is a non-nullable primitive, or if the operands are objects of differing types (unless they’re properly cast).
Syntax:
<expression> ?? <expression>
(The usual rules apply regarding nesting expressions, the use of semi-colons in complete statements, etc.)
A few examples:
DummyNode a = null;
DummyNode b = new DummyNode();
DummyNode c = new DummyNode();
return a ?? b; // returns b
return b ?? a; // still returns b
DummyNode z = a ?? b; // z gets b
return a ?? new DummyNode(); // returns a new dummy node
return null ?? a ?? null; // this code has no choice but to return null
return a ?? b ?? c; // returns b -- the first item in the chain that wasn't null
No, you’d never really have a bunch of return statements in a row like that — they’re only there to demonstrate what you should expect.
Ruby, Python and Javascript
These languages are less straightforward (i.e. they have picky nuances), since they are happy to evaluate objects of any class with their coalescing operators (including emulated primitives). They disagree, however, about what counts as null when it comes to numbers, strings, booleans and empty collections, which adds to the importance of testing your code!
Syntax for Ruby, Javascript:
<expression> || <expression>
Syntax for Ruby, Python:
<expression> or <expression>
(Ruby is operator greedy.)
The use of null coalescence in these languages is the same as it is in C#, in that you may nest coalescing expressions as function arguments, use them in return statements, chain them together, put in an object constructor as a right-operand expression, etc.; the difference is in what Ruby, Python or Javascript will coalesce given a left-expression operand. The below table summarizes which left-expression operands will cause the statement to coalesce into the right-expression operand (i.e. what the language considers to be ‘null’-ish in this use).
| Expression as a left-operand | Does this coalesce in Ruby? | Does this coalesce in Python? | Does this coalesce in JavaScript? |
| nil / None / null | Yes | Yes | Yes |
| [] | No | Yes | No |
| {} | No | Yes | n/a* |
| 0 | No | Yes | Yes |
| 0.0 | No | Yes | Yes |
| "" | No | Yes | Yes |
| '' | No | Yes | Yes |
| false / False / false | Yes | Yes | Yes |
*Note that in JavaScript, you’d probably want to use an Object instance as an associative array (hash) so that the field names are the keys and the field values are the associated values — doing so means that you can never have a null associative array.
Contrast the above table to what C# will coalesce: strictly “null” objects only.
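If you want the strict C#-style behaviour in Python, one option is to test against None explicitly instead of relying on `or`. Here is a minimal sketch; the helper name coalesce is hypothetical (it is not a built-in), and the variable names are made up for the demonstration.

```python
def coalesce(*candidates):
    """Return the first argument that is not None, or None if all are None."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None


retries = 0  # a legitimate value that happens to be falsy

print(retries or 5)                      # 5  ('or' treats 0 as null-ish)
print(coalesce(retries, 5))              # 0  (only None is skipped)
print([] or "empty")                     # empty
print(coalesce([], "empty"))             # []
print(coalesce(None, None, "fallback"))  # fallback
```

Note that, unlike `or`, a helper function evaluates all of its arguments before it is called, so the right-hand expressions are not short-circuited.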
The null coalescing operator makes me happy. Hopefully it’ll make you happy too.
Python Crash Course – Lesson 1
The first lecture of my Python Crash Course went really well! I ran it two evenings ago in the Dean’s Conference Room.
Because I geared the very first lecture toward absolute beginners, I had very little that catered to BIC (Bioinformatics Club) members. I did, however, take the opportunity to talk with them about the SOLVER group (more on that later); many of them seemed interested.
Overall there were roughly a dozen people that turned out, including Ariana, my TA partner from last term. There were about four iGEM members and six BIC members.
I also took the opportunity to poll for the kinds of things that students wanted to learn. Here are my findings.
- Object Orientation is something everyone wants to know, especially the people coming in with a JavaScript, Perl, C, C++ and Scheme background; I was surprised that the C++ people didn’t get exposure to thinking in objects earlier.
- The beginners came in two groups. First, there are the ones who are happy to learn anything as long as it can be applied later.
- The second group of beginners want to data crunch PDBs, SDFs, FASTAs, Nucleotides etc.
In week two, we’ll take care of object orientation and in week three, we’ll take care of everything anyone ever needs to know about input output in order to do data crunching. I have added a link in the navigation of this blog for the Python Crash Courseware which will eventually include all the PDFs, code modules and examples used in class.
Oh right, I don’t know if I’ll get around to it– but I am missing instructions for setting environment variables in Windows. Perhaps I will add it later when I have time.
(iGEM attendees were John Heil, Danielle Nash, Tiffany and Lina; BIC members included Fiona, James and about four others whose names I have forgotten.)
Edit: Direct link to Python Crash Courseware; Direct link to Week 1: A Mad Mad Introduction, PDF.
Practical Scripting for Biologists (Python)
Draft Syllabus Here (public Google doc).
I’m currently putting together the course materials for my Python Crash Course for Biologists… all that’s left is to fasten the lesson ideas into slide shows, example code and exercise materials and I’ll be ready to indicate a launch date.
I’ve heard some interest from iGEM members, lab mates here at the Meiering lab and also the Waterloo Bioinformatics Club… I want to run this soon.
Andre may be doing something similar on a different topic– so it’s been very nice to have another set of eyes evaluate the syllabus.
sqkillall.py
Brief: I forgot all about sqkillall.py! It’s a convenience script for killing all of the SharcNet jobs belonging to you! (More about it; Source code).
iGEM: Freedom Unhashed
An iGEM modeling meeting was held yesterday wherein Andre revealed his big plans for switching the team into enduserhood. Unfortunately, I didn’t follow along as well as I could have this time around and can really only document and comment on the bottom line.
We’ve again self-organized into two to three teams based on task. The first team is charged with creating a hashing function which creates a sequence of integrase usable tokens from an integer. The second (and third?) team is responsible for creating a check to ensure that a given product corresponds correctly to a given pair of reactant sequences. Finally, the dangling task of creating an even bigger external harness along with modifications to the present main.py program logic is likely being handled by the latter team.
The Hashing Task is kind of interesting because it essentially calls for unhashing an integer into a meaningful sequence rather than hashing a meaningful sequence into a unique integer. Since the reactant strings can themselves be lexicographically sequenced, then the task quickly becomes an enumeration or counting problem whereupon we find the most efficient way to count through the possible permutations of reactant tokens until we reach the integer that we want. The backward task (what we’re doing) may end up being implemented as the forward task with a sequential search.
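To make the counting idea concrete, here is a rough sketch of both directions. It is not the team's actual hashing function: the token names are invented, and I am assuming the sequences are ordered by length first and then lexicographically, with repeats allowed. unrank_by_search is the "forward task with a sequential search", and unrank computes the same answer directly.

```python
import itertools

# Hypothetical token alphabet; the real integrase-usable tokens are not given here.
TOKENS = ("A", "B", "C")


def unrank_by_search(n, tokens):
    """Count through sequences (shortest first, then lexicographic) until index n."""
    count = 0
    for length in itertools.count(1):
        for sequence in itertools.product(tokens, repeat=length):
            if count == n:
                return sequence
            count += 1


def unrank(n, tokens):
    """Compute the n-th sequence directly, without enumerating the earlier ones."""
    k = len(tokens)
    n += 1  # the digit arithmetic below is 1-based
    reversed_sequence = []
    while n > 0:
        n, digit = divmod(n - 1, k)
        reversed_sequence.append(tokens[digit])
    return tuple(reversed(reversed_sequence))


print(unrank(0, TOKENS))  # ('A',)
print(unrank(3, TOKENS))  # ('A', 'A')
print(all(unrank(n, TOKENS) == unrank_by_search(n, TOKENS) for n in range(200)))  # True
```

The search version takes time proportional to the integer itself, while the direct version takes time proportional to the length of the resulting sequence.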
The hashing subteam is headed by Jordan, the modeling head from last year and is joined by myself and Wylee– I honestly don’t see this as a task that can’t be completed by one person in a single bout of insanity– so it’s likely that I’ll hop over to Andre’s reactant-product verification team whenever this finishes.
We’ve planned another meeting for Tuesday 5pm next week to pull whatever we have together and to tackle any nascent problems.
Reactant-Product Verification is, I think, the more straightforward item, at least to explain. It is likely more technically challenging. Basically, we make the reaction go forward, and if the product matches what we wanted, then we favour the persistence of the product. … Err, at least that’s how I understood it… I’ll probably need to pop in and ask about it on Thursday before the big oGEM Skype meeting.
Side note: oddly, both Shira and John were present at this meeting, which probably means we’re expecting progress.
I’d actually like to see a bit more “current product verification” — that is, verifying that the code we currently have actually works — before moving on to the distributed-computing-and-madness realm.
That aside, I’m glad you figured out the hashing stuff. Just out of curiosity, what exactly is an open-form, lexicographically sequenced, permuted, time-amortized, mathematical expression that falls under counting problems, anyway?
Okay, I accept your challenge: It is exactly as it sounds. Although I’m certain it didn’t sound *that* terrible when I said it
Current product verification? Of course.
Ed's Big Plans