Ed's Big Plans

Computing for Science and Awesome

  • Page 1 of 2
  • 1
  • 2
  • >

Archive for the ‘Python’ tag

Python’s List Multiply — Use List Comprehension Instead

without comments

Brief: I ran into a snag parsing a CSV today — I have lines upon lines of 27 integers each. Each line represents a cube of values — each cube has dimensions 3×3×3. Here’s an example line and the corresponding cube it represents.

A line …

0, 0, 0, 0, 0, 0, 0, 50, 0, 2, 22, 0, 0, 4, 0, 5, 0, 17, 0, 26, 24, 0, 0, 0, 0, 0, 0,

The corresponding cube …

0 0 0
0 0 0
0 50 0

2 22 0
0 4 0
5 0 17

0 26 24
0 0 0
0 0 0

… Where the three major squares represent three levels of depth; each major square has three rows and three columns, and the values in each depth are organized row-major.

So let’s say the lines in a file are read from stdin thanks to the magic of POSIX pipes.

We might be able to use a construct like this to store all of our lines in the variable cases as below …

import sys
cases = []
for line in sys.stdin:
    entry = [int(i) for i in line[:-1].split(",") if i != '']
    current_cases = [[[0] * 3] * 3] * 3 # Doesn't work ...
    for ii, i in enumerate(entry):
        # '//' is integer (floor) division
        current_cases[ii//9%3][ii//3%3][ii%3] = i
        # ii is the index (of enumeration), i is the value (from the 'entry')
    cases.append(current_cases)

Let’s break apart the line that assigns entry first — it’s equal to the below …

entry = line[:-1]
# get rid of the trailing newline character

entry = entry.split(",")
# break apart the line by comma character

entry = [int(i) for i in entry if i != '']
# convert each element into an integer
# except for the trailing empty string
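
As a quick check of the three steps above, here is a small sketch that runs them on a shortened, made-up line. The names sample, step1, step2 and step3 are only for illustration and are not part of the original script.

```python
# A shortened, hypothetical input line with a trailing comma and newline.
sample = "0, 50, 0, 2,\n"

step1 = sample[:-1]                          # drop the trailing newline
step2 = step1.split(",")                     # break the line apart at commas
step3 = [int(i) for i in step2 if i != '']   # convert, skipping the empty tail

print(step2)  # ['0', ' 50', ' 0', ' 2', '']
print(step3)  # [0, 50, 0, 2]
```

Note that int() tolerates the leading space in ' 50', which is why only the empty string needs to be filtered out.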

Now let’s break apart the line that assigns current_cases — here’s the logic behind getting each element of the 27-element cube …

[ i x x i x x i x x i x x i x x i x x i x x i x x i x x ] - column index
[ i i i x x x x x x i i i x x x x x x i i i x x x x x x ] - row index
[ i i i i i i i i i x x x x x x x x x x x x x x x x x x ] - depth index

The columns are indexed by [index (mod 3)]: every third item falls in the same column. The rows are indexed by [index / 3 (mod 3)], using integer division: every run of three items out of nine is part of the same row. The levels of depth are indexed by [index / 9 (mod 3)], again using integer division: every run of nine items out of the 27 elements is part of the same depth.
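
The index arithmetic described here can be sketched with divmod. The helper name cube_index is hypothetical; it is just one way to express the same mapping and assumes a row-major 3×3×3 cube as described.

```python
def cube_index(ii):
    """Map a flat index 0..26 to (depth, row, column) for a row-major 3x3x3 cube."""
    depth, rest = divmod(ii, 9)    # nine items per level of depth
    row, column = divmod(rest, 3)  # three items per row
    return depth, row, column

# In the example line, index 7 holds the value 50 and index 10 holds 22.
print(cube_index(7))   # (0, 2, 1)
print(cube_index(10))  # (1, 0, 1)
```

For indices 0 to 26 this gives the same triple as (ii // 9 % 3, ii // 3 % 3, ii % 3).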

So here’s the line that doesn’t work

current_cases = [[[0] * 3] * 3] * 3

In current_cases as defined above, only three integers are allocated in memory, not 27. The above saves cubes where the depth index and the row index don’t matter; only the inner-most column index has any meaning. The strange thing is, this wasn’t immediately intuitive to me. I expected the list multiply operation to create independent nested lists; instead, each of the two enclosing levels of lists just creates additional references to the same inner list.
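
The aliasing is easy to see in a short experiment. This is a minimal sketch using a three-level cube built with list multiply; the variable name shared is only for illustration.

```python
shared = [[[0] * 3] * 3] * 3   # list multiply: the inner lists are aliased

shared[0][0][0] = 7            # write to what looks like a single cell

print(shared[2][2][0])               # 7, the write shows up at every depth and row
print(shared[0] is shared[1])        # True, the same middle-level list
print(shared[0][0] is shared[2][1])  # True, the same inner-most list
```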

This is the line we must use instead to make the nested lists correctly save the cubes …

current_cases = [[[0 for i in range(3)] for i in range(3)] for i in range(3)]
# or ...
current_cases = [[[0] * 3 for i in range(3)] for i in range(3)]
# or ...
current_cases = [[[0, 0, 0] for i in range(3)] for i in range(3)]

Here, the comprehension iteratively creates the correct 27 memory locations needed for each cube. Each nested comprehension is responsible for creating a unique list at the correct depth.
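
A small check, sketched under the same assumptions as the text, confirms that the comprehension version really does build nine separate inner-most lists. The names cube and innermost are only for illustration.

```python
cube = [[[0 for _ in range(3)] for _ in range(3)] for _ in range(3)]

cube[0][0][0] = 7      # write to a single cell
print(cube[2][2][0])   # 0, other cells are unaffected

# Count the distinct inner-most list objects across the whole cube.
innermost = [id(row) for level in cube for row in level]
print(len(set(innermost)))  # 9
```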

Putting it together, the correct listing for this task is …

import sys
cases = []
for line in sys.stdin:
    entry = [int(i) for i in line[:-1].split(",") if i != '']
    current_cases = [[[0 for i in range(3)] for i in range(3)] for i in range(3)]
    for ii, i in enumerate(entry):
        # '//' is integer (floor) division
        current_cases[ii//9%3][ii//3%3][ii%3] = i
    cases.append(current_cases)
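
As an aside, the same cube can be built without pre-allocating anything, by slicing the flat list. This is an alternative sketch rather than the approach used above; the function name parse_cube and the length check are my own additions.

```python
def parse_cube(line):
    """Parse one comma-separated line of 27 integers into a 3x3x3 nested list."""
    entry = [int(i) for i in line.split(",") if i.strip() != ""]
    if len(entry) != 27:
        raise ValueError("expected 27 integers, got %d" % len(entry))
    # Each slice creates a new list, so no inner list is shared.
    return [[entry[d * 9 + r * 3: d * 9 + r * 3 + 3] for r in range(3)]
            for d in range(3)]

line = "0, 0, 0, 0, 0, 0, 0, 50, 0, 2, 22, 0, 0, 4, 0, 5, 0, 17, 0, 26, 24, 0, 0, 0, 0, 0, 0,\n"
cube = parse_cube(line)
print(cube[0][2])  # [0, 50, 0]
print(cube[1][2])  # [5, 0, 17]
```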

I’ve written this solution down this time because I seem to rediscover it every time I encounter it. Hopefully, this will save you some work too 😀

Eddie Ma

March 5th, 2011 at 11:15 pm

The Null Coalescing Operator (C#, Ruby, JS, Python)

without comments

Null coalescence allows you to specify what a statement should evaluate to instead of evaluating to null. It is useful because it allows you to specify that an expression should be substituted with some semantic default instead of defaulting on some semantic null (such as null, None or nil). Here is the syntax and behaviour in four languages I use often — C#, Ruby, JavaScript and Python.

C#

Null coalescing in C# is very straightforward since it only accepts reference types or nullable value types (or null) as its left operand, and the two operands must have compatible types. This restriction is enforced at compile time; the compiler will refuse to compile if it is asked to coalesce non-nullable primitives, or objects of differing types (unless they’re properly cast).

Syntax:

<expression> ?? <expression>

(The usual rules apply regarding nesting expressions, the use of semi-colons in complete statements, etc.)

A few examples:

DummyNode a = null;
DummyNode b = new DummyNode();
DummyNode c = new DummyNode();

return a ?? b; // returns b
return b ?? a; // still returns b
DummyNode z = a ?? b; // z gets b
return a ?? new DummyNode(); // returns a new dummy node
return null ?? a ?? null; // this code has no choice but to return null
return a ?? b ?? c; // returns b -- the first item in the chain that wasn't null

No, you’d never really have a bunch of return statements in a row like that — they’re only there to demonstrate what you should expect.

Ruby, Python and JavaScript

These languages are less straightforward (i.e. possess picky nuances) since they are happy to evaluate any objects of any class with their coalescing operators (including emulated primitives). These languages, however, disagree about what the notion of null should be when it comes to numbers, strings, booleans and empty collections, which adds to the importance of testing your code!

Syntax for Ruby, JavaScript:

<expression> || <expression>

Syntax for Ruby, Python:

<expression> or <expression>

(Ruby is operator greedy :P.)

Null coalescence is used in these languages the same way as in C#: you may nest coalescing expressions as function arguments, use them in return statements, chain them together, put in an object constructor as a right-operand expression, etc. The difference is in what Ruby, Python or JavaScript will coalesce given a left-expression operand. The table below summarizes which left-expression operands cause the statement to coalesce into the right-expression operand (i.e. what the language considers to be ‘null’-ish in this use).

Expression as a left-operand | Does this coalesce in Ruby? | Does this coalesce in Python? | Does this coalesce in JavaScript?
nil / None / null            | Yes                         | Yes                           | Yes
[]                           | No                          | Yes                           | No
{}                           | No                          | Yes                           | n/a*
0                            | No                          | Yes                           | Yes
0.0                          | No                          | Yes                           | Yes
""                           | No                          | Yes                           | Yes
                             | No                          | Yes                           | Yes
false / False / false        | Yes                         | Yes                           | Yes

*Note that in JavaScript, you’d probably want to use an Object instance as an associative array (hash) so that the field names are the keys and the field values are the associated values — doing so means that you can never have a null associative array.

Contrast the above table to what C# will coalesce: strictly “null” objects only.
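
If you want the stricter C#-style behaviour in Python, where only None counts as null, one option is a small helper. This is a minimal sketch; the name coalesce is hypothetical and not a built-in.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all of them are."""
    for value in values:
        if value is not None:
            return value
    return None

print(0 or 10)                  # 10, because 0 is falsy in Python
print(coalesce(0, 10))          # 0, only None is treated as null
print(coalesce(None, [], "x"))  # []
```

One design caveat: unlike the or operator, a function call evaluates all of its arguments up front, so there is no short-circuiting. For a single fallback, the conditional expression a if a is not None else b keeps the lazy behaviour.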

The null coalescing operator makes me happy. Hopefully it’ll make you happy too.

Eddie Ma

July 7th, 2010 at 11:00 am

Python Crash Course – Lesson 1

without comments

The first lecture of my Python Crash Course went really well! I ran it two evenings ago in the Dean’s Conference Room.

In gearing the very first lecture toward absolute beginners, I had very little that catered to BIC (Bioinformatics Club) members. I did, however, take the opportunity to discuss the SOLVER group with them (more on that later); many of them seemed interested.

Overall there were roughly a dozen people that turned out, including Ariana, my TA partner from last term. There were about four iGEM members and six BIC members.

I also took the opportunity to poll for the kinds of things that students wanted to learn. Here are my findings.

  • Object orientation is something everyone wants to know, especially the people coming in with a JavaScript, Perl, C, C++ and Scheme background; I was surprised that the C++ people hadn’t been exposed to thinking in objects earlier.
  • The beginners came in two groups. First, there are the ones who are happy to learn anything as long as it can be applied later.
  • The second group of beginners want to data crunch PDBs, SDFs, FASTAs, Nucleotides etc.

In week two, we’ll take care of object orientation, and in week three, we’ll take care of everything anyone ever needs to know about input/output in order to do data crunching. I have added a link in the navigation of this blog for the Python Crash Courseware, which will eventually include all the PDFs, code modules and examples used in class.

Oh right, I don’t know if I’ll get around to it, but I am missing instructions for setting environment variables in Windows. Perhaps I will add them later when I have time.

(iGEM attendees were John Heil, Danielle Nash, Tiffany and Lina; BIC members included Fiona, James and about four others whose names I have forgotten.)

Edit: Direct link to Python Crash Courseware; Direct link to Week 1: A Mad Mad Introduction, PDF.

Eddie Ma

January 9th, 2010 at 9:00 pm

sqkillall.py

without comments

Brief: I forgot all about sqkillall.py! It’s a convenience script for killing all of the SharcNet jobs belonging to you! (More about it; Source code).

Eddie Ma

July 28th, 2009 at 1:02 pm

iGEM: Freedom Unhashed

with 2 comments

An iGEM modeling meeting was held yesterday wherein Andre revealed his big plans for switching the team into enduserhood. Unfortunately, I didn’t follow along as well as I could have this time around and can really only document and comment on the bottom line.

We’ve again self-organized into two to three teams based on task. The first team is charged with creating a hashing function which creates a sequence of integrase usable tokens from an integer. The second (and third?) team is responsible for creating a check to ensure that a given product corresponds correctly to a given pair of reactant sequences. Finally, the dangling task of creating an even bigger external harness along with modifications to the present main.py program logic is likely being handled by the latter team.

The Hashing Task is kind of interesting because it essentially calls for unhashing an integer into a meaningful sequence rather than hashing a meaningful sequence into a unique integer. Since the reactant strings can themselves be lexicographically sequenced, then the task quickly becomes an enumeration or counting problem whereupon we find the most efficient way to count through the possible permutations of reactant tokens until we reach the integer that we want. The backward task (what we’re doing) may end up being implemented as the forward task with a sequential search.
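
To make the counting idea concrete, here is a rough sketch of unhashing an integer into a token sequence. It rests on assumptions of mine that the post does not state: the token alphabet TOKENS is made up, every sequence over it is treated as valid, and sequences are ordered by length and then lexicographically. The function names are hypothetical too, and none of this is the team's actual code.

```python
from itertools import count, product

TOKENS = ["A", "B", "C"]  # hypothetical token alphabet, not the real iGEM set

def unhash(n, tokens=TOKENS):
    """Return the n-th token sequence (n >= 0), shortest sequences first.

    n = 0 is the empty sequence; this is bijective base-k numeration.
    """
    k = len(tokens)
    sequence = []
    while n > 0:
        n, digit = divmod(n - 1, k)
        sequence.append(tokens[digit])
    return sequence[::-1]

def unhash_by_search(n, tokens=TOKENS):
    """Same result, found by counting through every sequence in order."""
    seen = 0
    for length in count(0):
        for candidate in product(tokens, repeat=length):
            if seen == n:
                return list(candidate)
            seen += 1

print(unhash(5))            # ['A', 'B']
print(unhash_by_search(5))  # ['A', 'B']
```

The second function is the "forward task with a sequential search" flavour: simple, but its running time grows with n, whereas the direct version only needs about log n steps.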

The hashing subteam is headed by Jordan, the modeling head from last year, and is joined by myself and Wylee. I honestly see this as a task that one person could complete in a single bout of insanity, so it’s likely that I’ll hop over to Andre’s reactant-product verification team whenever this finishes.

We’ve planned another meeting for Tuesday 5pm next week to pull whatever we have together and to tackle any nascent problems.

Reactant-Product Verification is, I think, the more straightforward item, at least to explain. It is likely more technically challenging. Basically, we make the reaction go forward, and if the product matches what we wanted, then we favour the persistence of the product. … Err, at least that’s how I understood it… I’ll probably need to pop in and ask about it on Thursday before the big oGEM Skype meeting.

Side note: oddly, both Shira and John were present at this meeting, which probably means we’re expecting progress 😀

Eddie Ma

July 22nd, 2009 at 5:36 pm

NNcmk: A Neural Network (Win32 & OSX)

without comments

Okay, I managed to finish that 3-layer neural network implementation the other day. Actually, it was a while ago, but I didn’t post about it because I was busy. It’s a pretty standard network, but I’m proud to say it’s small and works on OSX and Win32. I still have to put in a few #define directives to have it work with Linux as well.

I will have to document it too when I get a chance. The reason I made a brand new executable (instead of using the source from my previous projects) is that I needed something that would take in launch-time parameters, so that it doesn’t need to be recompiled each time someone decides to use the binary on a new dataset with a different number of inputs. Right now, there are hardly any fixed parameters that can’t be set at launch time.

The NNcmk (Neural Network – Cameron, Ma, Kremer) package is C compilable, uses the previously developed in-house library for the NGN and will be available shortly after I’m satisfied that I’ve squashed all the bugs, fixed the output and documented the thing completely. I think Chris has difficulty with it right now mostly because I didn’t specify exactly which parameters do what. I did at least provide a (DOS) batch file with an example run-in-train-mode / run-in-test-mode sequence…

Back to work on that paper right now though…
