Archive for July, 2010
Frequent typos of mine
Brief: There are a few typos that I consistently make. I have concluded that these things have been trained into my brain some how. No matter how much I want to correct them, they just keep showing up. Even conscious efforts to “not type it wrongly this time” are only partially successful. It’s … like a speech impediment for my fingers.
Part of me wants to conjecture about motion planning, the cerebellum, buffer overruns and the QWERTY layout — but a larger part would rather not.
Here’s a list of words that get these automatic typos — Yes, I know I use four fingers on my left hand plus three fingers on my right hand to type — I think this scheme was inherited from an obsession with the DooM series of first person shooters when I was a primordial computer user.
- total → totoal — redundantly drummed ‘o’ with right middle finger.
- schematics → schemaitcs — incorrect priority given to right middle finger ‘i’ over left index ‘t’.
- blast → blasy — incorrectly increased reaching distance between left middle finger ‘s’ to left index ‘y’.
- desktop → dekstop — incorrect priority given to right middle finger ‘k’ over left index ‘s’.
- people → poeple — incorrect priority given to right middle finger ‘o’ over left middle ‘e’.
If you have a patch for my brain, please let me know.
Bioinformatics: Edit Strings (Ruby, JavaScript)
>>> Attached: ( editStringAdder.rb — in Ruby | editStrings.js — in JavaScript ) <<<
I wanted to share something neat that I’ve been using in the phylogeny analyzing software I’ve been developing. In the MUSCLE3 BMC Bioinformatics paper (Robert Edgar, 2004), Edgar describes a way to describe all the changes required to get from one raw sequence to how it appears somewhere in an internal node, riddled with gaps. He calls the data structures Edit Strings (e-strings).
I started using these things in my visualizer because it is a convenient way to describe how to render a profile of sequences at internal nodes of my phylogenetic trees. Rather than storing a copy of the sequence with gaps at each of its ancestors, e-strings allow you to store a discrete pathway of the changes your alignment algorithm made to a sequence at each node.
You can chain e-strings together with a multiply operation so that the profile of any node in the tree can be created out of just the original primary sequences of the dataset (the taxa) and the set of corresponding e-strings for the given tree. You can choose to store the alphabet-distribution profiles this way at each node too, but I decided to use the e-strings only for my visualizer (where seeing each sequence within a profile is important).
Edit strings take the form of a sequence of alternating positive and negative integers; positive integers mean “retain a substring of this length” while negative integers mean “insert gaps of this length” — here’s a few example applications of edit strings on the sequence “ABCDEFGHI“:
<10>("ABCDEFGHIJ") = "ABCDEFGHIJ"
<10 -3>("ABCDEFGHIJ") = "ABCDEFGHIJ---"
<4, -2, 6>("ABCDEFGHIJ") = "ABCD--EFGHIJ"
<2, -1, 5, -4, 3>("ABCDEFGHIJ") = "AB-CDEFG----HIJ"
A few particulars should be mentioned. An application of an e-string onto a sequence is only defined when the sum of the positive integers equals the length of the sequence. The total number of negative integers is arbitrary, and refers to any number of inserted gaps. Finally, a digit zero is meaningless. Edit strings can be multiplied together. If you apply two edit strings in succession onto a sequence, the result is the same as if you had applied the product of two edit strings onto that same sequence. Here are some examples of edit strings multipled together.
<10> * <10> = <10> <10> * <7, -1, 2> = <7, -1, 2> <6, -1, 4> * <7, -1, 3, -2, 1> = <6, -2, 3, -2, 1> <6, -1, 4> * <3, -1, 3, -1, 3, -1, 2> = <3, -1, 3, -2, 2, -1, 2> <6, -1, 4> * <2, -1, 6, -5, 3> = <2, -1, 4, -1, 1, -5, 3>
The multiply function is a bit tricky to implement at first, but one can work backward from the result to get the intermediate step that’s performed mentally. The last two examples above can manually calculated if we imagine an intermediate step as follows. We convert the first e-string to its application on a sequence of ‘+’ symbols– in both the below cases, this results in ten ‘+’ symbols with one ‘-’ (gap) inserted after the sixth position. We then apply the second edit string to the result of the first application. Finally, we write out the total changes as an e-string on the original sequence of ten ‘+’ symbols. Don’t worry, we don’t really do this in real life software — it’s just a good way for human brains to comprehend the overall operation.
Second last example above — expanded out…
<6, -1, 4> * <3, -1, 3, -1, 3, -1, 2> = ... ++++++-++++ * <3, -1, 3, -1, 3, -1, 2> = ... +++-+++--++-++ = <3, -1, 3, -2, 2, -1, 2>
Last example above — expanded out…
<6, -1, 4> * <2, -1, 6, -5, 3> = ... ++++++-++++ * <2, -1, 6, -5, 3> = ... ++-++++-+-----+++ = <2, -1, 4, -1, 1, -5, 3>
Here’s the code in Ruby for my take on the multiply function. I derived my version of the function based on the examples from Edgar’s BMC paper (mentioned before). When you think about it, the runtime is the sum of the number of elements of the two e-strings being multiplied. This becomes obvious when you realize that the only computation that really occurs is when a positive integer is encountered in the second e-string. The function is called estring_product(), it takes two e-strings as arguments (u, v) and returns a single e-string (w). This function internally calls an estring_collapse() function because we intermediately create an e-string that may have several positive integers or several negative integers in a row (when this happens, it’s usually just two in a row). Consecutive same-signed integers of an e-string are added together.
def estring_product(u, v)
w = []
uu_replacement = nil
ui = 0
vi = 0
v.each do |vv|
if vv < 0
w << vv
elsif vv > 0
vv_left = vv
uu = uu_replacement != nil ? uu_replacement : u[ui]
uu_replacement = nil
until vv_left == 0
if vv_left >= uu.abs
w << uu
vv_left -= uu.abs
ui += 1
uu = u[ui]
else
if uu > 0
w << vv_left
uu -= vv_left
elsif uu < 0
w << -vv_left
uu += vv_left
end
vv_left = 0
uu_replacement = uu
end
end
end
vi += 1
end
return estring_collapse(w)
end
If you aren’t familiar with Ruby, the “<<” operator means “append the right value to the left collection”. Everything else should be pretty self-explanatory (leave a comment if it’s not).
def estring_collapse(u)
v = []
u.each do |uu|
if v[-1] == nil
v << uu
elsif (v[-1] <=> 0) == (uu <=> 0)
v[-1] += uu
else
v << uu
end
end
return v
end
The “<=>” lovingly referred to as the spaceship operator compares the right value to the left value; if the right value is greater, then the result is 1.0; if the left value is greater, then the result is -1.0; if the left and right values are the same, then the result is 0.0.
A Few Notes on the Attached Files (Ruby, Javascript)
The Ruby source has a lot of comments in it that should help you understand when and where to use the included two functions. The Javascript is actually used in a visualizer I’ve deployed now so it has a few more practical functions in it. A function called sequence_estring() is included that takes a $sequence and applies an $estring, then returns a new gapped sequence. A utility signum() function is included which takes the place of the spaceship operator in the Ruby version. A diagnostic function, p_uvw() is included that just uses document.write() to print out whatever you give as (estring) $u, (estring) $v and (estring) $w. Finally, the JavaScript functions sequence_estring() and estring_product() will print out error messages with document.write() when the total sequence length specified in the first argument is not the same as the total sequence length implied by the second argument. Remember, the sum of the absolute values of the first argument describes the total length of the sequence– each of these tokens must be accounted for by the positive values of the second argument, the estring that will operate on it.
Updates to this post:
- Two hyphens in a row were displayed as a single dash — this has been fixed.
- Code listing plug-in uses literal “<” and “>” instead of the entity codes “<” and “>” — fixed.
My first TA Evaluations!
Brief: I got my TA evaluations from BIOL 208 back last week (which I taught with Ariana in fall 2009) and it looks the students liked my instruction. The overall positive response is encouraging but I’m concerned that they’ve actually been too kind. The whole thing was something of a learning experience for me as I’ve never given tutorials before. The written remarks were very informative too. There are two big things the students wanted more of: first, I should increase the depth of my background in the course; second, I should ensure there’s time to take up quiz and workbook questions. The first item is a bit difficult to do actively mostly because it’s hard to proactively decide on what kinds of questions a student will have cooked up based on the readings available for a given week. It looks like it’s a self-repairing problem however — I simply have to TA more in and around the same topic area until the background information is second hand (or at least until all the keywords are loaded into my brain along with hints toward appropriate literature). The second point is important. The amount of time needed to take up questions can be built into the lesson plan — I think the best way to approach this is to reduce the amount that we try to cover in the tutorial slide show. Besides, there’s little advantage to repeating all of the same things as the instructor (particularly if we might say it differently, or explain it in a way that’s even more confusing or worse, disagree with the instructor). It’s thus better to focus on giving the background for the workbook questions. I figure that an average of 25% to 33% less material covered will enable us to focus in on the workbook and allow us to discuss quiz questions (and spurious questions) etc. with sufficient time.
Overall, this was a very enjoyable and instructional experience
.
The Null Coalescing Operator (C#, Ruby, JS, Python)
Null coalescence allows you to specify what a statement should evaluate to instead of evaluating to null. It is useful because it allows you to specify that an expression should be substituted with some semantic default instead of defaulting on some semantic null (such as null, None or nil). Here is the syntax and behaviour in four languages I use often — C#, Ruby, JavaScript and Python.
C#
Null coalescencing in C# is very straight forward since it will only ever accept first class objects of the same type (or null) as its operator’s arguments. This restriction is one that exists at compile time; it will refuse to compile if it is asked to compare primitives, or objects of differing types (unless they’re properly cast).
Syntax:
<expression> ?? <expression>
(The usual rules apply regarding nesting expressions, the use of semi-colons in complete statements etc..)
A few examples:
DummyNode a = null; DummyNode b = new DummyNode(); DummyNode c = new DummyNode(); return a ?? b; // returns b return b ?? a; // still returns b DummyNode z = a ?? b; // z gets b return a ?? new DummyNode(); // returns a new dummy node return null ?? a ?? null; // this code has no choice but to return null return a ?? b ?? c; // returns b -- the first item in the chain that wasn't null
No, you’d never really have a bunch of return statements in a row like that — they’re only there to demonstrate what you should expect.
Ruby, Python and Javascript
These languages are less straight forward (i.e. possess picky nuances) since they are happy to evaluate any objects of any class with their coalescing operators (including emulated primitives). These languages however disagree about what the notion of null should be when it comes to numbers, strings, booleans and empty collections; adding to the importance of testing your code!
Syntax for Ruby, Javascript:
<expression> || <expression>
Syntax for Ruby, Python:
<expression> or <expression>
(Ruby is operator greedy
.)
The use of null coalescence in these languages are the same as they are in C# in that you may nest coalescing expressions as function arguments, use them in return statements, you may chain them together, put in an object constructor as a right-operand expression etc.; the difference is in what Ruby, Python or Javascript will coalesce given a left-expression operand. The below table summarizes what left-expression operand will cause the statement to coalesce into the right-expression operand (i.e. what the language considers to be ‘null’-ish in this use).
| Expression as a left-operand | Does this coalesce in Ruby? | Does this coalesce in Python? | Does this coalesce in JavaScript? |
| nil / None / null | Yes | Yes | Yes |
| [] |
No | Yes | No |
| {} | No | Yes | n/a* |
| 0 | No | Yes | Yes |
| 0.0 | No | Yes | Yes |
| “” | No | Yes | Yes |
| ” | No | Yes | Yes |
| false / False / false |
Yes | Yes | Yes |
*Note that in JavaScript, you’d probably want to use an Object instance as an associative array (hash) so that the field names are the keys and the field values are the associated values — doing so means that you can never have a null associative array.
Contrast the above table to what C# will coalesce: strictly “null” objects only.
The null coalescing operator makes me happy. Hopefully it’ll make you happy too.
Ed's Big Plans