Archive for November, 2009
Practical Scripting for Biologists (Python)
Draft Syllabus Here (public Google doc).
I’m currently putting together the course materials for my Python Crash Course for Biologists… all that’s left is to turn the lesson ideas into slide shows, example code and exercises, and then I’ll be ready to announce a launch date.
I’ve heard some interest from iGEM members, lab mates here at the Meiering lab and the Waterloo Bioinformatics Club… I want to run this soon.
Andre may be doing something similar on a different topic– so it’s been very nice to have another set of eyes evaluate the syllabus.
WordPress Inline Comments
Thanks to flisterz for this tip!
I added inline comments to this blog. It involves editing index.php, something I had generally avoided doing, but when I peered into the code I found it’s as ordinary as any other PHP, so all of those silly worries were washed aside. The WordPress Codex does an excellent job of… well, it’s an incomplete API reference, so at the very least it lists the functions that exist, even if it doesn’t truly explain them all.
Anyway, I didn’t copy the hint from flisterz one-to-one because I wanted different functionality, but it did offer an example that made my changes much easier. Because I don’t get very many comments, I don’t mind having complete comments show up inline on the main page instead of just an excerpt. I also wanted nice title bars to separate the comments. Finally, while I was mucking around in the code, I moved the post tags up to sit right beneath the title of each post instead of at the very bottom.
The changes go inside index.php, just before <?php endwhile; ?>, as in flisterz’s tip. Here’s my version:
<div class="recent-comment">
<?php // Fetch only the approved comments for the current post in the Loop ?>
<?php $comment_array = get_approved_comments($wp_query->post->ID); ?>
<?php if ($comment_array) { ?>
  <?php foreach ($comment_array as $comment) { ?>
    <br />
    <?php // Title bar naming the commenter ?>
    <div style="background-color: #DDD9C6; border: 1px #F2EFE5 solid;">
      <em><?php comment_author_link(); ?> says...</em>
    </div>
    <?php // Full comment text, not just an excerpt ?>
    <div>
      <?php comment_text(); ?>
    </div>
  <?php } ?>
<?php } ?>
</div>
In the future, I’ll probably clean up the markup a bit by moving the inline styles into this theme’s CSS. Oh, while I’m at it, I should also thank Digital Nature for this awesome theme (Arclite).
fridgelib — Andre’s C Library
Brief: Fridge Library (fridgelib) is a lightweight C library that’s basically the toolkit every C programmer eventually ends up with. Fridgelib contains implementations of a queue, a stack and a trie. They’re cleaner than mine, so I’ve absorbed them into my code where necessary.
There is, however, one item that Andre’s told me I can add to fridgelib if I ever tidy it up, and that’s my evil linked-array. It’s basically a doubly linked list whose elements are arrays of a fixed size: traversal is cut down by a factor of that array size, while indexing within an array is still constant time, using the index modulo the array size. For example, with 64-element arrays, reaching index 1000 means following 1000 / 64 = 15 links and then reading slot 1000 % 64 = 40. My evil implementation is a double-headed stack/queue/array that supported Python-like negative indexing, slicing and iteration… actually, that’s when I decided it had grown too grotesque and left it to sit in my repository…
I’ll probably return to it later, to remove slicing and to more cleanly define the semantics of iteration.
Hmm, the only things really missing from fridgelib are a hash table and either a nice red-black tree or a treap…
Yo! Holy crap, you have a blog. Interesting concept, a linked array. Do the allocations increase exponentially? You’d have an interesting problem keeping access constant-time, because even if your allocations increased exponentially, for a large index you would still have to traverse the small nodes first.
No, every allocation is the same constant size.
The modulus shortcut only works that way. This pseudocode should clear it up; I haven’t checked it for off-by-one errors, but it’ll give you a sense of what I’m doing.
Pseudocode:
function get(linkedArray this, int index):
    int which_page = index / this -> page_length
    int which_index_in_page = index % this -> page_length
    internalArray this_page = this -> start
    for (int i = 0; i < which_page; i++):
        this_page = this_page -> next
    return this_page -> array[which_index_in_page]

struct linkedArray:
    int page_length            // length of one internal array
    internalArray start

struct internalArray:
    generic array
    internalArray next
Note that I haven’t added anything above to convert Python-style negative indices. I also had an optimization that traverses the list of arrays from the last internal array instead of the first when the requested index falls on a ‘page’ past the midpoint of the list.
Edit: Getting pseudocode in HTML to work well in WordPress comments is a pain…
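For anyone who wants something compilable, here is a minimal C sketch of the pseudocode above, extended with the two features just mentioned: Python-style negative indices and walking from the tail when the page lies past the midpoint. It is not the original repository code or anything from fridgelib. The names (LinkedArray, Page, la_push_back, la_get, PAGE_LEN) are hypothetical, elements are assumed to be ints, and it only appends at the back; a truly double-headed version would also need to track an offset into the first page.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of a constant-page-size linked array:
 * a doubly linked list of fixed-size pages holding ints. */
#define PAGE_LEN 4

typedef struct Page {
    int items[PAGE_LEN];
    struct Page *prev, *next;
} Page;

typedef struct {
    Page *first, *last;
    size_t num_pages;
    size_t size;          /* number of elements stored */
} LinkedArray;

/* Append at the back, allocating a new page when the last one is full. */
static int la_push_back(LinkedArray *a, int value) {
    if (a->size == a->num_pages * PAGE_LEN) {
        Page *p = calloc(1, sizeof *p);
        if (!p) return -1;
        p->prev = a->last;
        if (a->last) a->last->next = p; else a->first = p;
        a->last = p;
        a->num_pages++;
    }
    /* All earlier pages are full, so the free slot is in the last page. */
    a->last->items[a->size % PAGE_LEN] = value;
    a->size++;
    return 0;
}

/* Read element `index`; negative values count from the end, Python-style. */
static int la_get(const LinkedArray *a, long index) {
    if (index < 0) index += (long)a->size;      /* -1 -> last element */
    assert(index >= 0 && (size_t)index < a->size);

    size_t which_page = (size_t)index / PAGE_LEN;
    size_t which_slot = (size_t)index % PAGE_LEN;
    const Page *p;
    if (which_page < a->num_pages / 2) {        /* nearer the front */
        p = a->first;
        for (size_t i = 0; i < which_page; i++) p = p->next;
    } else {                                    /* nearer the back */
        p = a->last;
        for (size_t i = a->num_pages - 1; i > which_page; i--) p = p->prev;
    }
    return p->items[which_slot];
}
```

With PAGE_LEN at 4 and ten elements pushed, la_get(&a, -1) and la_get(&a, 9) both return the last element. The midpoint check means a lookup never follows more than about half of the page links.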
Hmm, if you double the array size with each new page and start at a size of 1, the capacity after numPages pages is 1 + 2 + 4 + … + 2^(numPages-1) = 2^numPages - 1.
You could do an O(lg(maxIndex)) access (using bit shifting) if you don’t store where each page starts, or an O(1) access if you do. That costs an extra 4 bytes per page, plus the occasional expansion of the index, which should happen rarely. That way inserts are no longer a worst-case O(numberInserted / numberPerArray) but O(lg(numberInserted)).
It’s heavier on memory but saves big on write time. I like this idea, though; I’d never heard of a linked array. Arbitrary inserts aren’t so bad because you can expand just one array rather than all of them. That’s so smart!
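The doubling idea from this comment can also be sketched in C. This is a hypothetical illustration, not code from the post: page p holds 2^p ints, so pages 0 through p-1 hold 2^p - 1 elements, and index i lives on page floor(log2(i + 1)) at offset (i + 1) - 2^p. It keeps a table of page pointers (the “store where all the pages start” option), so no list traversal is needed; finding the page number itself uses a shift loop, which is O(lg i). The names DoublingArray, da_push, da_get, page_of, offset_in_page and the MAX_PAGES cap are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: page p holds 2^p ints (1, 2, 4, ...), so pages
 * 0..p-1 hold 2^p - 1 elements in total. */
#define MAX_PAGES 32

typedef struct {
    int *pages[MAX_PAGES];  /* where each page starts: O(1) page lookup */
    int num_pages;
    size_t size;            /* number of elements stored */
} DoublingArray;

/* Page holding index i: the largest p with 2^p - 1 <= i,
 * i.e. floor(log2(i + 1)), found with O(lg i) right shifts. */
static int page_of(size_t i) {
    size_t v = i + 1;
    int p = 0;
    while (v >>= 1) p++;
    return p;
}

/* Position of index i inside page p. */
static size_t offset_in_page(size_t i, int p) {
    return (i + 1) - ((size_t)1 << p);
}

/* Append a value, allocating a page twice the previous size when full. */
static int da_push(DoublingArray *a, int value) {
    int p = page_of(a->size);
    if (p == a->num_pages) {
        if (p >= MAX_PAGES) return -1;
        a->pages[p] = malloc(((size_t)1 << p) * sizeof(int));
        if (!a->pages[p]) return -1;
        a->num_pages++;
    }
    a->pages[p][offset_in_page(a->size, p)] = value;
    a->size++;
    return 0;
}

/* Read element i: compute its page, then index that page directly. */
static int da_get(const DoublingArray *a, size_t i) {
    assert(i < a->size);
    int p = page_of(i);
    return a->pages[p][offset_in_page(i, p)];
}
```

For example, index 6 is the last slot of page 2 (which covers indices 3 to 6), and index 7 starts page 3. Freeing the pages is omitted for brevity.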
I suppose though, if you knew that the inserts were constant, then your data structure would be better. It really does depend on application.
Java Classpaths are Evil
While working with the Phylogenetic Analysis Library (PAL) on an alignment problem, I had to specify the classpath to a jar file… it should have been straightforward enough…
Java classpaths are a pain.
Here are a few observations I’ve made about how to specify them on the command line.
- an Item can either be a directory that contains .class and .java files OR
- an Item can be a .jar file.
- to specify more than one Item.jar in Unix, use:
javac -classpath .:Item1.jar:Item2.jar
- note that you cannot put a space between the colons
- note that you must include an extra Item ‘.’ to specify the current working directory
- note that in Windows, you must use ‘;’ instead of ‘:’
- note that after compiling with javac, the same -classpath and its arguments must then be applied with java
Nuisance? Yes! Necessary Evil? No!
The compiled Java class certainly could have carried some metadata, such as a copy of the last known classpath string… Why is there a disparity between the separators used on Unix and Windows? Why aren’t spaces allowed? Why does one have to specify the current working directory?
Evil.
A side effect of not being able to put spaces between the colons of several paths is that one can’t just put in a backslash to escape a newline; the next path would have to start at the very beginning of the next line, which is just ugly.
This is one of Java’s many screw-ups. Classes were thought to be the units of a program, so there was never a mechanism to aggregate classes into libraries. JARs became the standard, but they have no metadata to express dependencies or versioning (classes do this individually). So Java’s dynamic linker works at the class level, not the “module” level; hence, you have to feed it every place classes might be. This was done much better in .NET, where DLLs work like normal libraries, expressing module-level dependencies, and the dynamic linker can deal with libraries and then with the classes inside them. There is a proposal for a Java module format like this, but I haven’t heard about it in a while.
Sure– that seems rational– I bet Sun would just glue on a module format so that it didn’t interfere with existing functionality. Honestly, why is it so hard to slap a sunset clause onto the Java 1, 2, 5, 6 specification, or offer temporary dual support during a phase out (e.g. Python 2.x, 3.x)?
1:1 String Comparison Tool
link (science.uwaterloo.ca) | link (eddiema.ca)
Watching one of my labmates painstakingly move two index fingers across two printed pages, scanning letter by letter for a single point mutation in a nucleotide sequence, motivated me to write this very simple tool. It’s 100% JavaScript and runs client-side.
It basically does what he did… it scans two strings (plain strings, not aligned sequences) letter by letter, looking for single point mutations.
No alignments are done, and nothing more sophisticated. Just … single … point … mutations … only.
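The comparison really is just a loop. Here is a minimal C sketch of the same idea; the actual tool is JavaScript and this is not its source. It compares position i of one string with position i of the other and records each position where they differ. The names point_mismatches and print_mismatches are hypothetical. Only the overlapping length of the two strings is compared, so a length difference is not reported, and an insertion or deletion shows up as a run of mismatches rather than being detected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: position-by-position comparison, no alignment.
 * Stores up to max_positions 0-based mismatch positions and returns
 * the total number of mismatches found in the overlapping prefix. */
static size_t point_mismatches(const char *a, const char *b,
                               size_t *positions, size_t max_positions) {
    assert(a != NULL && b != NULL);
    size_t len_a = strlen(a), len_b = strlen(b);
    size_t n = len_a < len_b ? len_a : len_b;  /* overlapping prefix only */
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            if (count < max_positions) positions[count] = i;
            count++;
        }
    }
    return count;
}

/* Print mismatches with 1-based positions, as counted on a printed page. */
static void print_mismatches(const char *a, const char *b) {
    size_t pos[64];
    size_t count = point_mismatches(a, b, pos, 64);
    for (size_t k = 0; k < count && k < 64; k++)
        printf("position %zu: %c -> %c\n", pos[k] + 1, a[pos[k]], b[pos[k]]);
}
```

For instance, print_mismatches("ACGTACGT", "ACGAACGT") prints `position 4: T -> A`.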
Licensing information: Do anything you want, it’s just a loop.
Back from Conference!
Brief: The BIBM09 conference was the very first conference I have ever attended. I learned a lot from the various speakers and poster sessions–
I found it really interesting that the trend is now to study and manipulate large interaction pathways in silico. One theme is integrating many different data sources (chemical, drug and free-text data), as well as connecting physical protein interaction pathways with gene expression pathways. There was even a project on aligning pathway graphs (topology).
Dealing with pathways, especially by hand and in the form of a picture, is probably the bane of many biologists’ existence. I think the solutions we’ll see in the next few years will turn this task into simple data-in-data-out software components, much like the ones we already have for sequence alignments.
And now, back to the real world!
Addendum: My talk went very well
And here are my slides with a preview below.



Ed's Big Plans