Ed's Big Plans

Computing for Science and Awesome

Python Crash Course & CS[64]83 Next Directions

without comments

Python Crash Course

Having completed teaching that Python Crash Course before reading week, I can say that I’m probably not going to teach it again any time soon for its time commitment! It was a good experience for me since I’ve figured out how to balance content for time and also how to talk to a crowd of heterogeneous experience levels in programming.

There’s going to be a PERL one-off course offered by BIC on this week that I think I’ll attend to figure out what the general style is for one-session courses– it was previously offered by Edgar and is offered this time by Anna and James.

I definitely need to clean up the course materials, but Andre thinks we should have my course packaged along with the Biology for Engineers course he offered. These two packages will eventually be shipped off to oGEM & iGEM HQ as per their recent interest in local workshops.

Link: Python Crash Course Materials

CS [64]83 Structural Bioinformatics

I threw up a hints page some time two weeks ago to coordinate help for assignment three for the structural bioinformatics course I’ve been auditing. I’ll probably take it down from the main menu when it becomes completely irrelevant. For the mean time, I’ll leave it up or replace it with assignment four hints when that gets underway.

I think I’ll eventually consolidate all of my one-off pages as a single page linked in the menu to reduce clutter. I’ll probably put all of my slide shows in there too.

Link: Structural Bioinformatics A3 Hints

Eddie Ma

February 28th, 2010 at 4:18 pm

Generic Functions in C# and Java

without comments

>>> Attached: ( Main.java — in Java | Main.cs — in C# ) <<<

Updated: (1) Made code more readable. (2) Removed unnecessary package (Java) and namespace (C#) and added a function that returns a generic type as well. (3) Attached compilable demo source code in separate files.

The most fun and productive concept in object oriented programming is generics — for me anyway. In C, one could deploy generics hazardously with code that casts the contents of memory addresses with a putative struct. The first field gives away what that chunk of memory is supposed to be at run time (usually, it’s a typdef int or an enum). I still do that when it’s called for, but it’s quite delicate and often leads to insidious bugs that don’t crash immediately. At least one would know what code to suspect when crashes do happen.

In C# and Java, two languages that derive from C — we find full safe support of generics. Generic classes (the things that collections are made of) are interesting, and I’m sure most who have used either of these languages have already played with them and have found them useful. One of the things that don’t receive a healthy dose of spotlight is Generic Functions (“Generic Methods” if you like).

I’ll compare two segments of code, one in C# and one in Java that do exactly the same thing — demonstrate two trivial functions printArrayList() and getElement(). The function printArrayList() prints out the contents of an ArrayList (Java) or a List (C#). The function getElement() retrieves an element from a list. This shows how single generic functions can operate on collections, each with a different defined type without the need for unsafe casting. The only assumption the code makes is that each object in a list implements the toString() method (needed for the printing function).

Note naming convention: In Java, methods are just members of an object, so they are named in lowercase. In C#, methods are capitalized. We will refer to methods by the Java convention here to keep things consistent.

Setting Up in Main…

Let’s declare and fill a few lists for this demonstration. Three generic list objects, cow, dog and elephant are constructed in a for loop. Each gets ten elements. Each list contains objects of a particular type; cow contains integers, dog contains doubles and elephant contains strings.

Java Code C# Code
ArrayList<Integer> cow = new ArrayList();
ArrayList<Double> dog = new ArrayList();
ArrayList<String> elephant = new ArrayList();
List<int> cow = new List();
List<double> dog = new List();
List<string> elephant = new List();

Notice that Java does not autobox the type in the angel brackets so you can’t give it the primitives int and double. In C#, this is allowed plus string is also a primitive. Remember: In both Java and C#, primitives are emulated — they are first class objects that are only different from other objects in that they are pass-by-value rather than pass-by-reference.

Appending ten items to each list. Shown below is the Java version — in C#, change “add()” to “Add()”.

for(int i = 0; i < 10; i ++) {
    cow.add(3 * i);
    dog.add(0.25 * i);
    if(i % 2 == 0)
        elephant.add("Even");
    else
        elephant.add("Odd");
}

The below is the code we want to make work — We’ll call printArrayList() to print out all of the elements in each list, then we’ll call getElement() to return a specific element from each list. Notice that this is the Java version below — in C#, we capitalize method names and use Console.Writeline() instead of System.out.println().

System.out.println("== Generic List Printer ==");

printArrayList(cow);
printArrayList(dog);
printArrayList(elephant);

System.out.println();

System.out.println("== Generic Element Accessing ==");

int    cow_at_7      = getElement(cow, 7);
double dog_at_2      = getElement(dog, 2);
String elephant_at_4 = getElement(elephant, 4);

System.out.println("Cow at 7      = " + cow_at_7);
System.out.println("Dog at 2      = " + dog_at_2);
System.out.println("Elephant at 4 = " + elephant_at_4);

Note that in C#, we may use the keyword “var” instead of typing out the types for cow_at_7, dog_at_2, and elephant_at_4 — the compiler infers the type for us. This is different from unsafely casting with “Object”, as the compiler infers the narrowest possible type and substitutes in that correct type.

Onto the methods …

Below is the Java version of printArrayList().

static <A> void printArrayList(ArrayList<A> animalList) {
    for(A a : animalList)
        System.out.print(a + "\t");
    System.out.println();
}

Below is the C# version of PrintArrayList().

static void PrintArrayList<A>(List<A> animalList) {
    foreach(A a in animalList)
        Console.Write(a + "\t");
    Console.WriteLine();
}

Notice that printArrayList() is a method that specifies a generic type <A>, but only in its argument list. In Java, <A> appears before the function’s type and in C#, this appears after the function name. In this case, it’s obvious what we would do if we have functions that return specific types — we just substitute the type where the keyword “void” is. So what happens when we want to return the generic type? That’s what getElement() will demonstrate.

Below is the Java version of getElement().

static <A> A getElement(ArrayList<A> what, int which) {
    return what.get(which);

Below is the C# version of GetElement().

static A GetElement<A>(List<A> what, int which) {
    return what[which];

Yes, these are both trivial functions, as you could have easily called ArrayList.get() in Java and List[] in C# respectively — but it does the job in this demonstration. In the Java version, the generic type <A> is placed before the type of the function, A. Don’t let that confuse you, just recall how we specified the return type when it wasn’t the generic type. In C#, we place the generic type <A> after the function name just as before.

Below is the output you should expect if you run the main function.

C# Output

== Generic List Printer ==
0	3	6	9	12	15	18	21	24	27
0	0.25	0.5	0.75	1	1.25	1.5	1.75	2	2.25
Even	Odd	Even	Odd	Even	Odd	Even	Odd	Even	Odd	

== Generic Element Accessing ==
Cow at 7      = 21
Dog at 2      = 0.5
Elephant at 4 = Even

The Java output is the same, except the values that are doubles are always printed with a trailing “.0” even if it is numerically equal to an integer.

Eddie Ma

January 29th, 2010 at 11:41 am

fsMSA Algorithm Context

without comments

What started as a meeting between me and my advisors ended up being a ball of unresolved questions about the cultural context of multiple sequence alignment and phylogenetic trees. While I had a good idea of what the field and its researchers had looked into and developed, I hadn’t a grasp of how far along we were. The result is the presentation I’ve just finished. In it, I discuss what I consider to be a representative sampling of the alignment and phylogenetic tree building algorithms available right now, at this very instant.

(PDF not posted, contact me if interested.)

Eddie Ma

January 13th, 2010 at 12:12 pm

Python Crash Course – Lesson 1

without comments

The first lecture of my Python Crash Course went really well! I ran it two evenings ago in the Dean’s Conference Room.

In gearing the very first lecture for absolute beginners, I had very little to cater to BIC (Bioinformatics Club) members. I however took the opportunity to discuss with them about the SOLVER group (more on that later); many of which seemed interested.

Overall there were roughly a dozen people that turned out, including Ariana, my TA partner from last term. There were about four iGEM members and six BIC members.

I also took the opportunity to poll for the kinds of things that students wanted to learn. Here are my findings.

  • Object Orientation is something everyone wants to know– especially the people coming in with a Javascript, PERL, C, C++ and Scheme background; I was surprised that the C++ people didn’t get exposure to thinking in objects earlier.
  • The beginners came in two groups. First, there are the ones who are happy to learn anything as long as it can be applied later.
  • The second group of beginners want to data crunch PDBs, SDFs, FASTAs, Nucleotides etc.

In week two, we’ll take care of object orientation and in week three, we’ll take care of everything anyone ever needs to know about input output in order to do data crunching. I have added a link in the navigation of this blog for the Python Crash Courseware which will eventually include all the PDFs, code modules and examples used in class.

Oh right, I don’t know if I’ll get around to it– but I am missing instructions for setting environment variables in Windows. Perhaps I will add it later when I have time.

(iGEM attendees were John Heil, Danielle Nash, Tiffany and Lina; BIC members included Fiona, James and about four others whose names I have forgotten.)

Edit: Direct link to Python Crash Courseware; Direct link to Week 1: A Mad Mad Introduction, PDF.

Eddie Ma

January 9th, 2010 at 9:00 pm

Apache Optimized Finally! (Firebug, YSlow)

without comments

I didn’t realize I hadn’t added the mod_expires.c and mod_deflate.c items to my httpd.conf file in Apache yet– Andre clued me in!

Andre noticed my blog was taking a while to load, even when the browser cache should have significantly dented the page weight. He used Firebug and Yahoo’s YSlow to make a diagnosis and told me to do the same– this page ended up taking a whopping 17 seconds to load which is … very … sad. After I added these lines to my httpd.conf file, things were looking better (roughly 1.5 seconds — not perfect, but it’s far better).

The mod_expires.c chunk specifies that files displayed on a webpage ought to live in the browser cache. The caching information is sent as part of the file header by Apache to the client browser. Without this, files were apparently expiring instantly meaning that each refresh required downloading every single file again including including the images comprising this theme’s background.

The mod_deflate.c chunk specifies that file data should be gzipped before transmitting– this is again handled by Apache. The trade off between compressing a few text files (even dynamically generated ones) versus sending uncompressed text is more than fair.

<IfModule mod_expires.c>
    FileETag MTime Size
    ExpiresActive On
    ExpiresByType image/gif "access plus 2 months"
    ExpiresByType image/png "access plus 2 months"
    ExpiresByType image/jpeg "access plus 2 months"
    ExpiresByType text/css "access plus 2 months"
    ExpiresByType application/js "access plus 2 months"
    ExpiresByType application/javascript "access plus 2 months"
    ExpiresByType application/x-javascript "access plus 2 months"
</IfModule>

<IfModule mod_deflate.c>
    # these are known to be safe with MSIE 6
    AddOutputFilterByType DEFLATE text/html text/plain text/xml

    # everything else may cause problems with MSIE 6
    AddOutputFilterByType DEFLATE text/css
    AddOutputFilterByType DEFLATE application/x-javascript
    AddOutputFilterByType DEFLATE application/javascript
    AddOutputFilterByType DEFLATE application/ecmascript
    AddOutputFilterByType DEFLATE application/rss+xml
</IfModule>

I’ve also removed the custom truetype font files specified in the CSS… they aren’t handled correctly for whatever reason– even after I added ‘font/ttf’ entries to the mod_expires.c chunk above. Finally, I tried completely removing background images from the site and restoring them again– it doesn’t make things any faster after images have been cached (correctly, finally).

I am very happy.

Eddie Ma

December 24th, 2009 at 5:19 pm

fsMSA Algorithm… Monkeys in a β-Barrel…

without comments

I’ve finally finished documenting my foil sensitive protein sequence algorithm… This is part of Monkeys in a β-Barrel — a work in progress, this time continuing more on Andrew’s half of the problem rather than Aron’s.

I’ve decided on using the word “foil” to mean “internal repeat” since it’s easier to say and less awkward in written sentences. Andre suggested it after “trefoil” and “cinquefoil”, the plant.

Thumbnails below (if you are curious about the full slide show, contact me :D).