Ed's Big Plans

Computing for Science and Awesome

Brief Hints: C# Nullable Types and Arrays, Special Double Values

Brief Hints: I wanted to show you three things in C# that I’ve been using a lot lately.

Nullable Types are a convenient language construct in C# that lets one assign null to a primitive (value) type such as double…

double? someval = null;
// declares a nullable double called 'someval'.

The question mark suffixing the keyword double makes the variable someval nullable; double? is shorthand for Nullable<double>. A common use is reading database columns (for example through LINQ to SQL), where SQL distinguishes NULL from every actual value and a plain double cannot. This could be thought of as yet another construct to make wrapping primitives more intuitive and more entrenched in the language.

I use nullable types when I need a special ‘unassigned’ value for “find the greatest” or “find the least” kinds of loops.
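Here is a minimal sketch of that “find the greatest” pattern, written in Java to match the Java code elsewhere on this page. Java has no double?, so the boxed type Double stands in for the nullable double; the class and method names are made up for illustration.

```java
final class NullableMaxSketch {
    // Returns the greatest value, or null when there is nothing to compare.
    static Double greatest(double[] values) {
        Double best = null;                    // null means "nothing seen yet"
        for (double v : values) {
            if (best == null || v > best) {    // 'best' is unboxed for the comparison
                best = v;                      // autoboxed back into a Double
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(greatest(new double[] {-3.5, -1.25, -8.0}));  // -1.25
        System.out.println(greatest(new double[] {}));                   // null
    }
}
```

Because the starting value is null rather than 0.0, a list of all-negative numbers still gives the right answer, and an empty list is reported as “no answer” instead of a made-up number.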

When we apply Nullable Types to Arrays, we get an array of nullable primitives (the array itself is already nullable, being a reference type).

double?[] somevals = new double?[10];
// declares an array called 'somevals' of ten nullable doubles.

An array of primitives is initialized to default values (0.0 for double), whereas an array of primitive?s is initialized with nulls.
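The same default-value behaviour can be seen in Java, where Double[] plays the role of double?[]. This is only an analogy under that assumption, and the names below are hypothetical.

```java
final class ArrayDefaultsSketch {
    // A primitive array: every slot starts at 0.0.
    static double[] plain(int n) {
        return new double[n];
    }

    // A boxed array: every slot starts at null.
    static Double[] boxed(int n) {
        return new Double[n];
    }

    public static void main(String[] args) {
        System.out.println(plain(10)[0]);   // 0.0
        System.out.println(boxed(10)[0]);   // null
    }
}
```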

Special Double Values are also something that I’ve started using a lot. There are many algorithms I’ve been coming across that use “magic numbers” corresponding to arbitrarily high and arbitrarily low numbers. Instead of using evil magical quantities, I’ve been using Double.PositiveInfinity and Double.NegativeInfinity. C# makes it easy to assign three more special quantities: Double.NaN, Double.MinValue and Double.MaxValue.
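As a sketch of how an infinity replaces a magic number, here is a “find the least” loop in Java. The Java spellings are Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN and Double.MAX_VALUE; the class and method names are invented for this example.

```java
final class SpecialDoublesSketch {
    // Smallest element, starting from +infinity so that any real value replaces it.
    static double least(double[] values) {
        double best = Double.POSITIVE_INFINITY;
        for (double v : values) {
            if (v < best) {
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(least(new double[] {4.0, -2.5, 9.0}));  // -2.5
        System.out.println(least(new double[] {}));                // Infinity
        System.out.println(Double.NaN == Double.NaN);              // false
        System.out.println(Double.isNaN(0.0 / 0.0));               // true
        // Unlike C#'s Double.MinValue, Java's Double.MIN_VALUE is the smallest
        // positive double; the most negative finite double is -Double.MAX_VALUE.
        System.out.println(Double.MIN_VALUE > 0);                  // true
    }
}
```

One thing to watch when porting between the two languages is that last point: Double.MinValue in C# and Double.MIN_VALUE in Java do not mean the same thing.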

Edit: I forgot to mention why I kept italicizing the word primitive. C# doesn’t really have primitives in the Java sense: int, double and friends are value types that still derive from System.Object, so they have methods and are boxed only when needed. This just makes nullable types all the more logical.

Eddie Ma

March 16th, 2010 at 12:01 pm

An Old Physiology Project — Operation Spinny Chair :D

I discovered this ancient report in my repository about three months ago– I’ve finally decided to put it up because it made my day reading the abstract again. This is definitely one of my prouder albeit sillier projects from the days of my undergrad.

Independent Research Project: Acute centripetal acceleration is correlated with increased heart rate and R-wave amplitude

Matthew Boyle, Bryan Chung, Eddie Ma

Abstract

In the present study, we set out to discover the correlation between the exposure of acute centripetal acceleration in human subjects and cardiovascular function across the following three dimensions: Heart rate, R-wave amplitude and QRS interval. This was accomplished by measuring the above properties via Lead II Bipolar ECG trace, after having spun the subject at 0.8 revolutions per second in an office chair for successively, 30, 60 and 90 seconds. It was determined that heart rate showed strong positive correlation (n = 3, average increase between trials of increasing duration, 3.2 beats per minute, s = 1.8). R-wave amplitude showed positive correlation in all subjects up until and including the 60 second trial. There was no systemic correlation between duration of spin and the length of the QRS interval in any of the subjects. The heart is therefore an important effector in response to centripetal acceleration in the human model.

Key Words: electrocardiogram, QRS interval, centripetal force, R-wave amplitude, spinning office chair.

[ Download this | PDF ].

Thumbnails of pages 5, 7 and 11.

Eddie Ma

March 12th, 2010 at 5:07 pm

Idea: Delaunay Simplex Graph Grammar

The Structural Bioinformatics course I’m auditing comes with an independent project for graduate students. I’ve decided to see how feasible and meaningful it is to create a graph rewriting grammar for proteins that have been re-expressed as a Delaunay Tessellation.

I was first introduced to the Delaunay Tessellation about half a year ago. Such a tessellation is composed of irregular three-dimensional tetrahedrons where each vertex corresponds to an amino acid. The defining property is that the sphere passing through the four vertices of any tetrahedron (its circumsphere) contains no other vertex of the tessellation in its interior.
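The sphere condition can be checked directly. The sketch below, in Java, finds the centre of the sphere through the four vertices of a tetrahedron by solving a 3x3 linear system with Cramer's rule, then tests whether a fifth point lies strictly inside. It is a plain floating-point illustration, not a robust geometric predicate, and all names are hypothetical.

```java
final class CircumsphereSketch {
    static double[] sub(double[] p, double[] q) {
        return new double[] {p[0] - q[0], p[1] - q[1], p[2] - q[2]};
    }

    static double dot(double[] p, double[] q) {
        return p[0] * q[0] + p[1] * q[1] + p[2] * q[2];
    }

    // Determinant of the 3x3 matrix whose rows are r0, r1, r2.
    static double det3(double[] r0, double[] r1, double[] r2) {
        return r0[0] * (r1[1] * r2[2] - r1[2] * r2[1])
             - r0[1] * (r1[0] * r2[2] - r1[2] * r2[0])
             + r0[2] * (r1[0] * r2[1] - r1[1] * r2[0]);
    }

    // True when p lies strictly inside the sphere through a, b, c and d.
    static boolean insideCircumsphere(double[] a, double[] b, double[] c,
                                      double[] d, double[] p) {
        // Work relative to a: the centre x satisfies u.x = |u|^2/2, and so on.
        double[] u = sub(b, a), v = sub(c, a), w = sub(d, a);
        double det = det3(u, v, w);
        if (det == 0.0) {
            throw new IllegalArgumentException("flat (degenerate) tetrahedron");
        }
        double ru = dot(u, u) / 2, rv = dot(v, v) / 2, rw = dot(w, w) / 2;
        // Cramer's rule: swap one column at a time for the right-hand side.
        double[] centre = {
            det3(new double[] {ru, u[1], u[2]}, new double[] {rv, v[1], v[2]},
                 new double[] {rw, w[1], w[2]}) / det,
            det3(new double[] {u[0], ru, u[2]}, new double[] {v[0], rv, v[2]},
                 new double[] {w[0], rw, w[2]}) / det,
            det3(new double[] {u[0], u[1], ru}, new double[] {v[0], v[1], rv},
                 new double[] {w[0], w[1], rw}) / det
        };
        double[] offset = sub(sub(p, a), centre);
        // The radius squared equals |centre|^2 because a sits at the origin here.
        return dot(offset, offset) < dot(centre, centre);
    }

    public static void main(String[] args) {
        double[] a = {0, 0, 0}, b = {1, 0, 0}, c = {0, 1, 0}, d = {0, 0, 1};
        System.out.println(insideCircumsphere(a, b, c, d, new double[] {0.5, 0.5, 0.5})); // true
        System.out.println(insideCircumsphere(a, b, c, d, new double[] {2, 2, 2}));       // false
    }
}
```

A tetrahedron belongs to the tessellation only if this test is false for every other amino acid position in the protein.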

An alphabet in formal languages is a finite set of irreducible tokens from which the inputs of a language are composed. In this project, I want to see if I can discover a grammar for the language of Delaunay protein simplex graphs. Graph rewriting is likened to the collapse of neighbouring tetrahedrons. The tetrahedrons selected are either functionally important, important for stability, or have a strangely high probability of occurrence. This definition is recursively applied so that previously collapsed points are subject to further collapse in future passes of the algorithm.

When a subgraph is rewritten, two things happen. Some meaning is lost from the original representation of the protein, but that same meaning is captured on a stack of the changes made to the representation. In this way, the protein graph is iteratively simplified, while a stack that records the simplifications indicates all of the salient grammatical productions that have been used.
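To make the “simplify the graph, remember the change” idea concrete, here is a toy sketch in Java. It is not the project's algorithm: the graph is a plain adjacency map, a rewrite simply merges two adjacent vertices, and the production is recorded as a string. Every name in it is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class RewriteStackSketch {
    // Adds an undirected edge between a and b.
    static void link(Map<String, Set<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        graph.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // Merges vertices u and v into one new vertex and pushes a record of the merge.
    static String collapse(Map<String, Set<String>> graph, String u, String v,
                           Deque<String> history) {
        String merged = "(" + u + "+" + v + ")";
        Set<String> neighbours = new TreeSet<>(graph.remove(u));
        neighbours.addAll(graph.remove(v));
        neighbours.remove(u);
        neighbours.remove(v);
        for (String n : neighbours) {          // re-point old edges at the new vertex
            Set<String> adjacent = graph.get(n);
            adjacent.remove(u);
            adjacent.remove(v);
            adjacent.add(merged);
        }
        graph.put(merged, neighbours);
        history.push(merged + " <- " + u + " " + v);   // what the graph no longer shows
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new TreeMap<>();
        link(graph, "A", "B");
        link(graph, "B", "C");
        link(graph, "C", "D");
        Deque<String> history = new ArrayDeque<>();
        String ab = collapse(graph, "A", "B", history);
        collapse(graph, ab, "C", history);     // a collapsed vertex can be collapsed again
        System.out.println(graph);
        System.out.println(history);
    }
}
```

The graph shrinks with each call while the stack grows, so between them they still hold everything the original graph did.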

This stack is what my project is really after. Can a stack based on grammatical production rules for frequency of occurrence render any real information, or is it just noise? I can’t even create a solid angle to drive my hypothesis at this point. … “Yes … ?” …

I’ve seen a lot of weird machine learning algorithms in my line of work… and I attest that it’s hard for a novice to look at a description and decide whether or not it derives anything useful. Keep in mind that the literature is chock-full of things that DO work, and contains none of the things that didn’t make it. I conjecture that this representation has made me optimistically biased.

This method however IS feasible to deploy on short notice in the scope of an independent project 😀

Generic Functions in C# and Java

>>> Attached: ( Main.java — in Java | Main.cs — in C# ) <<<

Updated: (1) Made code more readable. (2) Removed unnecessary package (Java) and namespace (C#) and added a function that returns a generic type as well. (3) Attached compilable demo source code in separate files.

The most fun and productive concept in object oriented programming is generics, for me anyway. In C, one could deploy generics hazardously with code that casts the contents of memory addresses to a putative struct. The first field gives away what that chunk of memory is supposed to be at run time (usually, it’s a typedef’d int or an enum). I still do that when it’s called for, but it’s quite delicate and often leads to insidious bugs that don’t crash immediately. At least one would know what code to suspect when crashes do happen.

In C# and Java, two languages that derive from C, we find full, safe support for generics. Generic classes (the things that collections are made of) are interesting, and I’m sure most who have used either of these languages have already played with them and have found them useful. One thing that doesn’t receive a healthy dose of spotlight is Generic Functions (“Generic Methods” if you like).

I’ll compare two segments of code, one in C# and one in Java that do exactly the same thing — demonstrate two trivial functions printArrayList() and getElement(). The function printArrayList() prints out the contents of an ArrayList (Java) or a List (C#). The function getElement() retrieves an element from a list. This shows how single generic functions can operate on collections, each with a different defined type without the need for unsafe casting. The only assumption the code makes is that each object in a list implements the toString() method (needed for the printing function).

Note naming convention: In Java, methods are just members of an object, so their names start in lowercase (lowerCamelCase). In C#, method names are capitalized (PascalCase). We will refer to methods by the Java convention here to keep things consistent.

Setting Up in Main…

Let’s declare and fill a few lists for this demonstration. Three generic list objects, cow, dog and elephant are constructed in a for loop. Each gets ten elements. Each list contains objects of a particular type; cow contains integers, dog contains doubles and elephant contains strings.

Java Code

ArrayList<Integer> cow = new ArrayList<Integer>();
ArrayList<Double> dog = new ArrayList<Double>();
ArrayList<String> elephant = new ArrayList<String>();

C# Code

List<int> cow = new List<int>();
List<double> dog = new List<double>();
List<string> elephant = new List<string>();

Notice that Java generics only accept reference types, so you can’t put the primitives int and double in the angle brackets; you use the wrapper classes Integer and Double instead, and Java autoboxes the values as they are added. In C#, int and double are allowed directly, and string works the same way even though it is a reference type rather than a value type. Remember: value types are copied when they are passed or assigned, whereas reference types share a single underlying object.

Appending ten items to each list. Shown below is the Java version — in C#, change “add()” to “Add()”.

for(int i = 0; i < 10; i ++) {
    cow.add(3 * i);
    dog.add(0.25 * i);
    if(i % 2 == 0)
        elephant.add("Even");
    else
        elephant.add("Odd");
}

Below is the code we want to make work. We’ll call printArrayList() to print out all of the elements in each list, then we’ll call getElement() to return a specific element from each list. Notice that this is the Java version; in C#, we capitalize method names and use Console.WriteLine() instead of System.out.println().

System.out.println("== Generic List Printer ==");

printArrayList(cow);
printArrayList(dog);
printArrayList(elephant);

System.out.println();

System.out.println("== Generic Element Accessing ==");

int    cow_at_7      = getElement(cow, 7);
double dog_at_2      = getElement(dog, 2);
String elephant_at_4 = getElement(elephant, 4);

System.out.println("Cow at 7      = " + cow_at_7);
System.out.println("Dog at 2      = " + dog_at_2);
System.out.println("Elephant at 4 = " + elephant_at_4);

Note that in C#, we may use the keyword “var” instead of typing out the types for cow_at_7, dog_at_2, and elephant_at_4; the compiler infers the type for us. This is different from unsafely casting with “Object”, as the compiler takes the static type of the initializing expression and substitutes in that exact type.

Onto the methods …

Below is the Java version of printArrayList().

static <A> void printArrayList(ArrayList<A> animalList) {
    for(A a : animalList)
        System.out.print(a + "\t");
    System.out.println();
}

Below is the C# version of PrintArrayList().

static void PrintArrayList<A>(List<A> animalList) {
    foreach(A a in animalList)
        Console.Write(a + "\t");
    Console.WriteLine();
}

Notice that printArrayList() is a method that specifies a generic type <A>, but only in its argument list. In Java, <A> appears before the function’s type and in C#, this appears after the function name. In this case, it’s obvious what we would do if we have functions that return specific types — we just substitute the type where the keyword “void” is. So what happens when we want to return the generic type? That’s what getElement() will demonstrate.

Below is the Java version of getElement().

static <A> A getElement(ArrayList<A> what, int which) {
    return what.get(which);
}

Below is the C# version of GetElement().

static A GetElement<A>(List<A> what, int which) {
    return what[which];
}

Yes, these are both trivial functions, as you could have easily called ArrayList.get() in Java and List[] in C# respectively — but it does the job in this demonstration. In the Java version, the generic type <A> is placed before the type of the function, A. Don’t let that confuse you, just recall how we specified the return type when it wasn’t the generic type. In C#, we place the generic type <A> after the function name just as before.

Below is the output you should expect if you run the main function.

C# Output

== Generic List Printer ==
0	3	6	9	12	15	18	21	24	27
0	0.25	0.5	0.75	1	1.25	1.5	1.75	2	2.25
Even	Odd	Even	Odd	Even	Odd	Even	Odd	Even	Odd	

== Generic Element Accessing ==
Cow at 7      = 21
Dog at 2      = 0.5
Elephant at 4 = Even

The Java output is the same, except that doubles are always printed with a trailing “.0” when they are numerically equal to an integer (for example 1.0 rather than 1).
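The attached Main.java is not reproduced on this page, so here is one way the Java fragments from this post could be assembled into a single compilable file. The class name GenericsDemo is an assumption for illustration; the two generic methods and the list contents follow the post.

```java
import java.util.ArrayList;

final class GenericsDemo {
    // Generic only in its argument list: prints any ArrayList, tab separated.
    static <A> void printArrayList(ArrayList<A> animalList) {
        for (A a : animalList)
            System.out.print(a + "\t");
        System.out.println();
    }

    // Generic in its return type as well: the caller gets back an A, no cast needed.
    static <A> A getElement(ArrayList<A> what, int which) {
        return what.get(which);
    }

    public static void main(String[] args) {
        ArrayList<Integer> cow = new ArrayList<Integer>();
        ArrayList<Double> dog = new ArrayList<Double>();
        ArrayList<String> elephant = new ArrayList<String>();

        for (int i = 0; i < 10; i++) {
            cow.add(3 * i);                          // int is autoboxed to Integer
            dog.add(0.25 * i);                       // double is autoboxed to Double
            elephant.add(i % 2 == 0 ? "Even" : "Odd");
        }

        System.out.println("== Generic List Printer ==");
        printArrayList(cow);
        printArrayList(dog);
        printArrayList(elephant);
        System.out.println();

        System.out.println("== Generic Element Accessing ==");
        int cow_at_7 = getElement(cow, 7);           // Integer is unboxed to int
        double dog_at_2 = getElement(dog, 2);
        String elephant_at_4 = getElement(elephant, 4);
        System.out.println("Cow at 7      = " + cow_at_7);
        System.out.println("Dog at 2      = " + dog_at_2);
        System.out.println("Elephant at 4 = " + elephant_at_4);
    }
}
```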

Eddie Ma

January 29th, 2010 at 11:41 am

fsMSA Algorithm Context

What started as a meeting between me and my advisors ended up being a ball of unresolved questions about the cultural context of multiple sequence alignment and phylogenetic trees. While I had a good idea of what the field and its researchers had looked into and developed, I hadn’t a grasp of how far along we were. The result is the presentation I’ve just finished. In it, I discuss what I consider to be a representative sampling of the alignment and phylogenetic tree building algorithms available right now, at this very instant.

(PDF not posted, contact me if interested.)

Eddie Ma

January 13th, 2010 at 12:12 pm

Apache Optimized Finally! (Firebug, YSlow)

I didn’t realize I hadn’t added the mod_expires.c and mod_deflate.c items to my httpd.conf file in Apache yet– Andre clued me in!

Andre noticed my blog was taking a while to load, even when the browser cache should have significantly dented the page weight. He used Firebug and Yahoo’s YSlow to make a diagnosis and told me to do the same– this page ended up taking a whopping 17 seconds to load which is … very … sad. After I added these lines to my httpd.conf file, things were looking better (roughly 1.5 seconds — not perfect, but it’s far better).

The mod_expires.c chunk specifies how long files displayed on a webpage ought to live in the browser cache. The caching information is sent by Apache to the client browser as part of the HTTP response headers. Without this, files were apparently expiring instantly, meaning that each refresh required downloading every single file again, including the images comprising this theme’s background.

The mod_deflate.c chunk specifies that file data should be gzipped before transmitting– this is again handled by Apache. The trade off between compressing a few text files (even dynamically generated ones) versus sending uncompressed text is more than fair.

<IfModule mod_expires.c>
    FileETag MTime Size
    ExpiresActive On
    ExpiresByType image/gif "access plus 2 months"
    ExpiresByType image/png "access plus 2 months"
    ExpiresByType image/jpeg "access plus 2 months"
    ExpiresByType text/css "access plus 2 months"
    ExpiresByType application/js "access plus 2 months"
    ExpiresByType application/javascript "access plus 2 months"
    ExpiresByType application/x-javascript "access plus 2 months"
</IfModule>

<IfModule mod_deflate.c>
    # these are known to be safe with MSIE 6
    AddOutputFilterByType DEFLATE text/html text/plain text/xml

    # everything else may cause problems with MSIE 6
    AddOutputFilterByType DEFLATE text/css
    AddOutputFilterByType DEFLATE application/x-javascript
    AddOutputFilterByType DEFLATE application/javascript
    AddOutputFilterByType DEFLATE application/ecmascript
    AddOutputFilterByType DEFLATE application/rss+xml
</IfModule>
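To get a feel for why gzipping text is worth the CPU time, the sketch below compresses a chunk of repetitive markup with Java's standard GZIPOutputStream and prints both sizes. It only illustrates the gzip format that mod_deflate applies; it does not talk to Apache, and the sample markup and names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipSizeSketch {
    // Returns the gzip-compressed form of the given bytes.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            html.append("<li class=\"post\">entry ").append(i).append("</li>\n");
        }
        byte[] raw = html.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println(raw.length + " bytes raw, " + packed.length + " bytes gzipped");
    }
}
```

HTML, CSS and JavaScript are full of repeated tags and keywords, which is why they shrink so well; already-compressed formats such as PNG and JPEG gain little, which fits the config above only listing text types for DEFLATE.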

I’ve also removed the custom truetype font files specified in the CSS… they aren’t handled correctly for whatever reason– even after I added ‘font/ttf’ entries to the mod_expires.c chunk above. Finally, I tried completely removing background images from the site and restoring them again– it doesn’t make things any faster after images have been cached (correctly, finally).

I am very happy.

Eddie Ma

December 24th, 2009 at 5:19 pm