Ed's Big Plans

Computing for Science and Awesome

Archive for the ‘linkedin’ tag

Idea: Delaunay Simplex Graph Grammar

with 2 comments

The Structural Bioinformatics course I’m auditing comes with an independent project for graduate students. I’ve decided to see how feasible and meaningful it is to create a graph rewriting grammar for proteins that have been re-expressed as a Delaunay tessellation.

I was first introduced to the Delaunay tessellation about half a year ago. Such a tessellation is composed of irregular three-dimensional tetrahedra where each vertex corresponds to an amino acid. Its defining property is that the sphere passing through the four vertices of any such tetrahedron contains no other vertex of the tessellation in its interior.
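The empty-sphere property of a Delaunay tetrahedron (no other vertex lies inside the sphere through its four vertices) can be checked with a little linear algebra. The following is only a sketch: the class and method names are made up for illustration, points are plain coordinate arrays rather than amino acid records, and the exact zero test for coplanar points is a simplification.

```java
/** Sketch of the empty-sphere test for one tetrahedron (hypothetical names). */
class EmptySphereCheck {

    /** Determinant of a 3x3 matrix by cofactor expansion along the first row. */
    static double det3(double[][] m) {
        return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    }

    /** Squared distance between two 3D points. */
    static double dist2(double[] p, double[] q) {
        double dx = p[0] - q[0], dy = p[1] - q[1], dz = p[2] - q[2];
        return dx * dx + dy * dy + dz * dz;
    }

    /** Centre of the sphere through a, b, c and d, solved with Cramer's rule. */
    static double[] circumcenter(double[] a, double[] b, double[] c, double[] d) {
        double[] origin = {0, 0, 0};
        double[][] others = {b, c, d};
        double[][] m = new double[3][3];
        double[] rhs = new double[3];
        for (int i = 0; i < 3; i++) {
            // |x - a|^2 = |x - p|^2 rearranges to 2 (p - a) . x = |p|^2 - |a|^2
            for (int j = 0; j < 3; j++) {
                m[i][j] = 2 * (others[i][j] - a[j]);
            }
            rhs[i] = dist2(others[i], origin) - dist2(a, origin);
        }
        double det = det3(m);
        if (det == 0) { // simplification: a real implementation would use a tolerance
            throw new IllegalArgumentException("the four points are coplanar");
        }
        double[] center = new double[3];
        for (int col = 0; col < 3; col++) {
            // Replace one column of m with the right-hand side, as Cramer's rule requires.
            double[][] replaced = new double[3][3];
            for (int i = 0; i < 3; i++) {
                for (int j = 0; j < 3; j++) {
                    replaced[i][j] = (j == col) ? rhs[i] : m[i][j];
                }
            }
            center[col] = det3(replaced) / det;
        }
        return center;
    }

    /** True if p lies strictly inside the sphere through a, b, c and d. */
    static boolean insideCircumsphere(double[] a, double[] b, double[] c,
                                      double[] d, double[] p) {
        double[] center = circumcenter(a, b, c, d);
        return dist2(center, p) < dist2(center, a);
    }

    public static void main(String[] args) {
        double[] a = {0, 0, 0};
        double[] b = {1, 0, 0};
        double[] c = {0, 1, 0};
        double[] d = {0, 0, 1};
        System.out.println(insideCircumsphere(a, b, c, d, new double[] {2, 2, 2}));
    }
}
```

A tetrahedron would belong to the tessellation only if insideCircumsphere() is false for every other vertex of the protein.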

An alphabet in formal languages is a finite set of tokens, treated as irreducible, from which the inputs of a language are composed. In this project, I want to see if I can discover a grammar for the language of Delaunay protein simplex graphs. Graph rewriting is likened to the collapse of neighbouring tetrahedra. The tetrahedra selected are either functionally important, important for stability, or have an unusually high probability of occurrence. This definition is recursively applied so that previously collapsed points are subject to further collapse in future passes of the algorithm.

When a subgraph is rewritten, two things happen. Some meaning is lost from the original representation of the protein, but that same meaning is captured on a stack of the changes made to the representation. In this way, the protein graph is iteratively simplified, while a stack that records the simplifications indicates all of the salient grammatical productions that have been used.
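To make the rewrite-and-record idea concrete, here is a rough sketch of one collapse step. Everything in it is an assumption for illustration: the graph is a plain adjacency map over integer vertex ids, the caller decides which vertices to collapse and what id the replacement gets, and the Production record is a made-up stand-in for a real grammatical production.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: collapse a group of vertices into one and record the change on a stack. */
class CollapseSketch {

    /** Hypothetical record of one rewrite: which vertices became which new vertex. */
    static final class Production {
        final Set<Integer> collapsed;
        final int replacement;

        Production(Set<Integer> collapsed, int replacement) {
            this.collapsed = collapsed;
            this.replacement = replacement;
        }
    }

    /** Adds an undirected edge between u and v. */
    static void addEdge(Map<Integer, Set<Integer>> graph, int u, int v) {
        graph.computeIfAbsent(u, k -> new TreeSet<>()).add(v);
        graph.computeIfAbsent(v, k -> new TreeSet<>()).add(u);
    }

    /** Replaces every vertex in group (all assumed present) with the single vertex newId. */
    static void collapse(Map<Integer, Set<Integer>> graph, Set<Integer> group,
                         int newId, Deque<Production> history) {
        // Gather everything the group touched, then drop the group itself.
        Set<Integer> neighbours = new TreeSet<>();
        for (int v : group) {
            neighbours.addAll(graph.remove(v));
        }
        neighbours.removeAll(group);

        // Rewire the outside neighbours to point at the replacement vertex.
        for (int n : neighbours) {
            Set<Integer> adjacent = graph.get(n);
            adjacent.removeAll(group);
            adjacent.add(newId);
        }
        graph.put(newId, neighbours);

        // The detail lost from the graph is kept on the stack.
        history.push(new Production(new TreeSet<>(group), newId));
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> graph = new HashMap<>();
        int[][] edges = {{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, {4, 5}};
        for (int[] e : edges) {
            addEdge(graph, e[0], e[1]);
        }
        Deque<Production> history = new ArrayDeque<>();
        collapse(graph, new TreeSet<>(Arrays.asList(1, 2, 3, 4)), 100, history);
        System.out.println(graph + " after " + history.size() + " production(s)");
    }
}
```

Repeated calls to collapse() simplify the graph further while the stack grows, which is the object the project is really interested in.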

This stack is what my project is really after. Can a stack based on grammatical production rules for frequency of occurrence render any real information, or is it just noise? I can’t even create a solid angle to drive my hypothesis at this point. … “Yes … ?” …

I’ve seen a lot of weird machine learning algorithms in my line of work… and I attest that it’s hard for a novice to look at a description and decide whether or not it derives anything useful. Keep in mind that the literature is chock-full of things that DO work, and contains none of the things that didn’t make it. I conjecture that this skewed representation has made me optimistically biased.

This method however IS feasible to deploy on short notice in the scope of an independent project :D

Jordan Lapointe says...

This sounds really neat. I want to make sure I have your idea straight:

a) a Delaunay tessellation of a protein defines/is equivalent to a simplex graph

b) such simplex graphs are the tokens of your language

c) collapsing tetrahedra in the Delaunay tessellation/performing the equivalent operation on the simplex graph are your grammatical production rules

d) you generate a sequence of tessellations/graphs (each of which is, by definition, a token) by using your grammatical production rules
d.1) as a restriction on your production rules, you only allow collapsing of tetrahedra that meet certain criteria (functionality, stability, or frequency)

e) such a sequence forms a “word” in your language

f) you capture this word via the stack of operations performed on the simplex graph

g) Ultimately you want to be able to look at the stack (“word”) produced by your grammar and have it tell you something about the protein

Wow, this post turned out to be far longer than I meant it to be :) .

Eddie Ma says...

That was very succinct. I probably should have written it in the form you gave me to start. I’m currently dissecting a few proteins using a tessellation script Dr. Burkowski put together– apparently, work has already been done compatible with Chimera which cuts short my work. The present dissection actually sees evaluation of two things: which combinations of contact surfaces have potential for reduction; and which tetrahedron labels (as four-tuples) occur most frequently as tokens. To be honest, it swings between more daunting and less daunting as I progress. I’ll update this post when I have some nice pictures :D

Arclite Theme – Invisible Text in Submenu Fix

without comments

Brief: The Arclite WordPress Theme is great, but the default CSS for the submenu causes the text to be almost completely invisible when moused over. The fix is an easy edit in style.css.

ul#nav ul a:hover, ul#nav ul a:hover span,
ul#nav a.active ul a:hover span,
ul#nav li.current_page_item ul a:hover span,
ul#nav li.current_page_ancestor ul a:hover span,
ul#nav ul li.current_page_parent a:hover span,
ul#nav ul li.current_page_item a:hover span,
ul#nav ul li.current_page_parent li.current_page_item a:hover span{
  /* color: #fff; */
  color: #2d83d5;
  background: #CCCCFF;
}

To make the text look like a deep blue on light blue, I changed the default white colour for a:hover in the sub-menus section to match the text colour of the blue in the rest of the theme plus a lighter blue as its background.

Written by Eddie Ma

March 5th, 2010 at 7:51 pm

Posted in Web Programming


iGEM*BIC — An Awesome Meeting

without comments

About two weeks ago (Feb. 11th), we had an iGEM*BIC meeting where five iGEM members and roughly a dozen BIC members showed up. I expected a few more from iGEM, but they ended up with illnesses or midterm exams that week.

We started the meeting with a nice description of BIC from Anna, followed by a nice description of iGEM at large then iGEM at home from Andre. I then finished with a collaborative projects presentation.

I’ve attached the slides I presented (actually, I’ve updated them since then)– just like the very last set of slides for the Python Crash Course, I ended up using iWork Pages this time around instead of NeoOffice.

Download: Updated slides [pdf].

The meeting was designed to go for half an hour because of its proximity to midterms. We ended up spending about two hours discussing all of the projects we wanted to try this term: everything from the now defunct Bunny Buddy to BactoBones to BactoHouseMD.

Anna had remarked earlier that iGEM isn’t well marketed to CS students or BIC, so this will certainly be a recurring thing at the beginning of each semester. This is particularly important this summer because the next stream of BIC students is returning from co-op.

The general consensus is that everyone was interested in doing *something* in iGEM which is a real bonus. This Wednesday, we’re going to have a modeling meeting that’s punctuated with John’s mini-project talk. I’m hoping that many BIC members will show up to carry over their interest.

My original assessment that most BIC students would want to do in silico modeling and software development was far off. As it turns out, BIC students showed interest in every facet of iGEM from wet lab to software development to outreach and public relations.

A direct consequence of the iGEM*BIC meeting is that we had a much larger design meeting the next day (Feb. 12th)– thirteen showed up.

Finally, the feeling of the group is that the next crossover meeting should be more social. I think that’s something we can shoot for, for early summer.

Written by Eddie Ma

February 21st, 2010 at 3:54 pm

Python Crash Course — 4/5ths done!

without comments

This week is going to be crowded enough for me that I’m going to cancel this week’s class. On the bright side, the classes have gone better than I thought they would. We will continue on February 9th.

The very first class ended up being too short, with the advanced students feeling that it moved too slowly. The second and third classes ended up being just the right speed– with the exception that the example fill-in-the-blank script from the third class was too difficult.

The difficulty rose when I too quickly introduced dictionaries whose values are lists.

The fourth class held last week was excellent– I completely ditched slides that week and produced five fill-in-the-blank scripts that were just the right tempo for everyone. I had a good mix of BIC (Bioinformatics Club), iGEM and chemistry graduate students– all who attended got something out of the hour which was my objective.

We only had time for four out of the five scripts with the remaining script as a bonus that everyone could take home and try.

Now, it’s back to Structural Bioinformatics homework… It’s quite a daunting assignment to be honest (having just formally shaken hands with Singular Value Decomposition), but the parts that are Python (particularly the bonus question) are familiar enough for comfort.

Written by Eddie Ma

January 31st, 2010 at 5:03 pm

Generic Functions in C# and Java

without comments

>>> Attached: ( Main.java — in Java | Main.cs — in C# ) <<<

Updated: (1) Made code more readable. (2) Removed unnecessary package (Java) and namespace (C#) and added a function that returns a generic type as well. (3) Attached compilable demo source code in separate files.

The most fun and productive concept in object oriented programming is generics, for me anyway. In C, one could deploy generics hazardously with code that casts the contents of memory addresses to a putative struct. The first field gives away what that chunk of memory is supposed to be at run time (usually, it’s a typedef int or an enum). I still do that when it’s called for, but it’s quite delicate and often leads to insidious bugs that don’t crash immediately. At least one would know what code to suspect when crashes do happen.

In C# and Java, two languages that derive from C, we find full, safe support for generics. Generic classes (the things that collections are made of) are interesting, and I’m sure most who have used either of these languages have already played with them and have found them useful. One of the things that doesn’t receive a healthy dose of spotlight is generic functions (“generic methods” if you like).
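To see what that safety buys, here is a small sketch contrasting a list of plain Objects that needs a cast on the way out with a typed generic list. The class and method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: casting from Object fails at run time, generics fail at compile time. */
class CastVersusGenerics {

    /** Returns true if the cast blew up at run time, which it does here. */
    static boolean unsafeReadFails() {
        List<Object> untyped = new ArrayList<Object>();
        untyped.add("not a number");
        try {
            // The compiler accepts this cast; the mistake only surfaces when it runs.
            Integer n = (Integer) untyped.get(0);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    /** With a type argument, no cast is needed and wrong types are rejected early. */
    static int safeRead() {
        List<Integer> typed = new ArrayList<Integer>();
        typed.add(42);
        // typed.add("not a number");  // would not compile
        return typed.get(0);
    }

    public static void main(String[] args) {
        System.out.println("Unsafe read failed: " + unsafeReadFails());
        System.out.println("Safe read returned: " + safeRead());
    }
}
```

The failure mode is gentler than the C trick, since the bad cast throws immediately instead of corrupting memory, but the generic version moves the error to compile time.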

I’ll compare two segments of code, one in C# and one in Java that do exactly the same thing — demonstrate two trivial functions printArrayList() and getElement(). The function printArrayList() prints out the contents of an ArrayList (Java) or a List (C#). The function getElement() retrieves an element from a list. This shows how single generic functions can operate on collections, each with a different defined type without the need for unsafe casting. The only assumption the code makes is that each object in a list implements the toString() method (needed for the printing function).

Note on naming conventions: In Java, method names conventionally start with a lowercase letter (camelCase). In C#, method names are capitalized (PascalCase). We will refer to methods by the Java convention here to keep things consistent.

Setting Up in Main…

Let’s declare and fill a few lists for this demonstration. Three generic list objects, cow, dog and elephant, are declared and then filled in a for loop. Each gets ten elements. Each list contains objects of a particular type: cow contains integers, dog contains doubles and elephant contains strings.

Java Code

ArrayList<Integer> cow = new ArrayList<Integer>();
ArrayList<Double> dog = new ArrayList<Double>();
ArrayList<String> elephant = new ArrayList<String>();

C# Code

List<int> cow = new List<int>();
List<double> dog = new List<double>();
List<string> elephant = new List<string>();

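Notice that Java’s type arguments must be reference types, so you can’t put the primitives int and double in the angle brackets. Use the wrapper classes Integer and Double instead; autoboxing then converts the values as they are added. In C#, int and double are allowed directly because generics work with value types, and string is a keyword alias for System.String, which is a reference type rather than a primitive. Remember: Java primitives are not objects at all, whereas C# value types such as int are structs that are copied when assigned or passed as arguments.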
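A quick sketch of what happens to an int on its way in and out of an ArrayList of Integer in Java. The class and method names are invented for this example.

```java
import java.util.ArrayList;

/** Sketch: primitives are boxed into wrapper objects when stored in a generic list. */
class AutoboxingDemo {

    /** Stores an int in a list and reads it back as an int. */
    static int roundTrip(int value) {
        // ArrayList<int> would not compile; the type argument must be a class.
        ArrayList<Integer> cow = new ArrayList<Integer>();
        cow.add(value);               // the int is autoboxed to an Integer
        int back = cow.get(0);        // the Integer is unboxed to an int
        return back;
    }

    /** Reports whether the stored element is an Integer object. */
    static boolean storedAsInteger(int value) {
        ArrayList<Integer> cow = new ArrayList<Integer>();
        cow.add(value);
        Object stored = cow.get(0);
        return stored instanceof Integer;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(21));
        System.out.println(storedAsInteger(21));
    }
}
```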

Appending ten items to each list. Shown below is the Java version — in C#, change “add()” to “Add()”.

for(int i = 0; i < 10; i ++) {
    cow.add(3 * i);
    dog.add(0.25 * i);
    if(i % 2 == 0)
        elephant.add("Even");
    else
        elephant.add("Odd");
}

Below is the code we want to make work. We’ll call printArrayList() to print out all of the elements in each list, then we’ll call getElement() to return a specific element from each list. Notice that this is the Java version; in C#, we capitalize method names and use Console.WriteLine() instead of System.out.println().

System.out.println("== Generic List Printer ==");

printArrayList(cow);
printArrayList(dog);
printArrayList(elephant);

System.out.println();

System.out.println("== Generic Element Accessing ==");

int    cow_at_7      = getElement(cow, 7);
double dog_at_2      = getElement(dog, 2);
String elephant_at_4 = getElement(elephant, 4);

System.out.println("Cow at 7      = " + cow_at_7);
System.out.println("Dog at 2      = " + dog_at_2);
System.out.println("Elephant at 4 = " + elephant_at_4);

Note that in C#, we may use the keyword “var” instead of typing out the types for cow_at_7, dog_at_2, and elephant_at_4; the compiler infers the type for us. This is different from unsafely casting with “Object”, as the compiler substitutes the exact static type of the initializing expression at compile time.

Onto the methods …

Below is the Java version of printArrayList().

static <A> void printArrayList(ArrayList<A> animalList) {
    for(A a : animalList)
        System.out.print(a + "\t");
    System.out.println();
}

Below is the C# version of PrintArrayList().

static void PrintArrayList<A>(List<A> animalList) {
    foreach(A a in animalList)
        Console.Write(a + "\t");
    Console.WriteLine();
}

Notice that printArrayList() is a method that specifies a generic type <A>, but only in its argument list. In Java, <A> appears before the function’s type and in C#, this appears after the function name. In this case, it’s obvious what we would do if we have functions that return specific types — we just substitute the type where the keyword “void” is. So what happens when we want to return the generic type? That’s what getElement() will demonstrate.

Below is the Java version of getElement().

static <A> A getElement(ArrayList<A> what, int which) {
    return what.get(which);
}

Below is the C# version of GetElement().

static A GetElement<A>(List<A> what, int which) {
    return what[which];
}

Yes, these are both trivial functions, as you could have easily called ArrayList.get() in Java and List[] in C# respectively — but it does the job in this demonstration. In the Java version, the generic type <A> is placed before the type of the function, A. Don’t let that confuse you, just recall how we specified the return type when it wasn’t the generic type. In C#, we place the generic type <A> after the function name just as before.

Below is the output you should expect if you run the main function.

C# Output

== Generic List Printer ==
0	3	6	9	12	15	18	21	24	27
0	0.25	0.5	0.75	1	1.25	1.5	1.75	2	2.25
Even	Odd	Even	Odd	Even	Odd	Even	Odd	Even	Odd	

== Generic Element Accessing ==
Cow at 7      = 21
Dog at 2      = 0.5
Elephant at 4 = Even

The Java output is the same, except that doubles numerically equal to an integer are printed with a trailing ".0" (for example, 1.0 instead of 1).
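The difference in how Java prints doubles is easy to check directly. This is a minimal sketch with an invented class name; it relies only on String.valueOf(), which uses the same conversion as System.out.print() and string concatenation.

```java
/** Sketch: how Java renders doubles as text. */
class DoubleFormatting {

    /** Converts a double to text the same way printing it would. */
    static String show(double value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(show(0.25 * 4));  // a whole number keeps a ".0" suffix
        System.out.println(show(0.25 * 1));  // a fraction is printed as-is
    }
}
```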

Written by Eddie Ma

January 29th, 2010 at 11:41 am

Python Crash Course – Lesson 1

without comments

The first lecture of my Python Crash Course went really well! I ran it two evenings ago in the Dean’s Conference Room.

In gearing the very first lecture for absolute beginners, I had very little to cater to BIC (Bioinformatics Club) members. I did, however, take the opportunity to discuss the SOLVER group with them (more on that later); many of them seemed interested.

Overall there were roughly a dozen people who turned out, including Ariana, my TA partner from last term. There were about four iGEM members and six BIC members.

I also took the opportunity to poll for the kinds of things that students wanted to learn. Here are my findings.

  • Object orientation is something everyone wants to know, especially the people coming in with a JavaScript, Perl, C, C++ or Scheme background; I was surprised that the C++ people didn’t get exposure to thinking in objects earlier.
  • The beginners came in two groups. First, there are the ones who are happy to learn anything as long as it can be applied later.
  • The second group of beginners want to data crunch PDBs, SDFs, FASTAs, Nucleotides etc.

In week two, we’ll take care of object orientation and in week three, we’ll take care of everything anyone ever needs to know about input/output in order to do data crunching. I have added a link in the navigation of this blog for the Python Crash Courseware which will eventually include all the PDFs, code modules and examples used in class.

Oh right, I don’t know if I’ll get around to it– but I am missing instructions for setting environment variables in Windows. Perhaps I will add it later when I have time.

(iGEM attendees were John Heil, Danielle Nash, Tiffany and Lina; BIC members included Fiona, James and about four others whose names I have forgotten.)

Edit: Direct link to Python Crash Courseware; Direct link to Week 1: A Mad Mad Introduction, PDF.

Written by Eddie Ma

January 9th, 2010 at 9:00 pm