Ed's Big Plans

Computing for Science and Awesome

Archive for September, 2010

Virus capsids are pretty

Brief: The majority of virus protein coats (capsids) are shaped like an icosahedron, a solid with twenty equilateral triangular faces. The first time I saw this rendered was in a paper by David S. Goodsell, in which he describes proteins with structural symmetries. Four viruses are used as examples: tobacco necrosis virus (2BUK), tomato bushy stunt virus (2TBV), bluetongue virus (3IYK) and simian virus 40 (1SVA), linked here to their RCSB PDB entries.

Pretty, aren’t they? Very pretty.

If you follow the PDB links, you can take a look at how a single tessellation unit appears, how long a chain is and how massive the capsid is.

Notice: The icosahedron (twenty equilateral triangular faces, twelve vertices) must not be confused with the dodecahedron (twelve pentagonal faces, twenty vertices).
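A quick way to keep the two solids apart is to count vertices (V), edges (E) and faces (F) and check them against Euler's polyhedron formula, V - E + F = 2. The counts below are the standard ones for the regular solids. The two are duals of each other, so the vertex and face counts simply swap.

```latex
\begin{aligned}
\text{icosahedron:}  &\quad V = 12,\ E = 30,\ F = 20\ \text{(triangles)}, & 12 - 30 + 20 &= 2 \\
\text{dodecahedron:} &\quad V = 20,\ E = 30,\ F = 12\ \text{(pentagons)}, & 20 - 30 + 12 &= 2
\end{aligned}
```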

Eddie Ma

September 27th, 2010 at 1:00 pm

C# & Bioinformatics: Indexers & Substitution Matrices

I’ve recently come to appreciate the convenience of C# indexers. Indexers let you subscript an object using the familiar bracket notation. I’ve used them for substitution matrices as part of my phylogeny project. Indexers are essentially syntactic sugar that obeys the rules of method overloading: a class can declare several indexers as long as their parameter lists differ. I first describe what I think are useful substitution matrix indexers and then a bare bones substitution matrix class (you could use your own). The indexer implementation is discussed last, so feel free to skip the preamble if you’re able to deduce what I’m doing.

Note: I’ve only discussed accessors (getters) and not mutators (setters) today.
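To make the "rules of method overloading" point concrete, here is a minimal sketch with two one-argument indexers that differ only in parameter type. The class name OverloadDemo and the strings it returns are made up for illustration and are not part of the substitution matrix code.

```csharp
using System;

// Hypothetical toy class (not part of SubMatrix): two indexers, told apart
// by the compile-time type of the argument, exactly like overloaded methods.
public class OverloadDemo {

    // Chosen when the argument is a char, e.g. demo['P'].
    public string this[char token] {
        get {
            return "char indexer: " + token;
        }
    }

    // Chosen when the argument is an int, e.g. demo[3] or demo[(int)'P'].
    public string this[int index] {
        get {
            return "int indexer: " + index;
        }
    }
}
```

With this class, new OverloadDemo()['P'] picks the char indexer, while new OverloadDemo()[(int)'P'] picks the int indexer and sees the character code 80. A char converts implicitly to an int in C#, so if the char indexer were missing, the first call would quietly fall through to the int one. That is worth keeping in mind when a class offers both.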

Some Reasonable Substitution Matrix Indexers

Here is the kind of notation you might want the indexers to support.

// Let there be a class called SubMatrix which contains data from BLOSUM62.

var  sm = new SubMatrix( ... );
     // I'll assume you already have some constructors.

int  index_of_proline = sm['P'];
     // Returns the row or column that corresponds to proline, 14.

char token_at_three = sm[3];
     // Returns the amino acid at position three, aspartate.

int  score_proline_to_aspartate = sm['P', 'D'];
     // Returns the score for a mutation from proline to aspartate, -1.

int  score_aspartate_to_proline = sm[3, 14];
     // Returns the score for a mutation from aspartate to proline, -1.

An Example Bare Bones Substitution Matrix Class

Let’s say you’ve loaded up the BLOSUM62 and are representing it internally in some 2D array…

// We're keying the rows and columns in the order given by BLOSUM62:
// ARNDCQEGHILKMFPSTWYVBZX* (24 rows, 24 columns)

int[,] imatrix;

For convenience, let’s say you’ll also keep a dictionary to map at which array position one finds each amino acid…

// Keys = amino acid letters, Values = row or column index

Dictionary<char, int> indexOfAA;

Finally, we’ll put these two elements into a class and assume that you’ve already written your own constructors that will take care of the above two items — either from accepting arrays and strings as arguments or by reading from raw files. If this isn’t true and you need more help, feel free to leave a comment and I’ll extend this bare bones example.

// Bare bones class ...
using System.Collections.Generic; // for Dictionary

public partial class SubMatrix {

    // Fields ...
    private int[,] imatrix;
    private Dictionary<char, int> indexOfAA;

    // Read-only property ...
    public int Width { // Returns number of rows of the (square) matrix.
        get {
            return imatrix.GetLength(0);
        }
    }

    // Constructors ...
    ...
}

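I’ve added the read-only property “Width” above. Properties are C# members that provide encapsulation: a public face to some arbitrary backend data. You’ve been using these all along when you’ve called “List.Count” or “Array.Length”. (Strictly speaking, an automatic property is the shorthand “{ get; set; }” form, where the compiler generates the backing field. “Width” has a hand-written getter, so it is an ordinary read-only property.)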

Substitution Matrix Indexer Implementation

You can implement the example substitution matrix indexers as follows. Notice the use of the “this” keyword and square[] brackets[] to specify[] arguments[].

//And finally ...
public partial class SubMatrix {

    // Indexers ...
    // Give me the row or column index where I can find the token "aa".
    public int this[char aa] {
        get {
            return this.indexOfAA[aa];
        }
    }
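    // Row and column order, e.g. "ARNDCQEGHILKMFPSTWYVBZX*".
    // Assumed to be filled in by your constructor.
    private string key;

    // Give me the amino acid at the row or column index "index".
    public char this[int index] {
        get {
            return this.key[index];
        }
    }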
    // Give me the score for mutating token "row" to token "column".
    // Returns int to match imatrix and the "int score = sm['P', 'D']" usage.
    public int this[char row, char column] {
        get {
            return this.imatrix[this.indexOfAA[row], this.indexOfAA[column]];
        }
    }
    // Give me the score for mutating token at index "row" to index "column".
    public int this[int row, int column] {
        get {
            return this.imatrix[row, column];
        }
    }
}
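For anyone who wants something that compiles end to end, here is a minimal sketch that puts the pieces together. The class name SubMatrixSketch, the constructor signature, the key field and the Blosum62Corner helper are my assumptions for illustration, not the constructors elided above. The score table is only the top-left 4x4 corner of BLOSUM62 (A, R, N, D), to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: names and constructor signature are assumptions.
public class SubMatrixSketch {

    private readonly string key;                       // row/column order, e.g. "ARND"
    private readonly int[,] imatrix;                   // scores, indexed [row, column]
    private readonly Dictionary<char, int> indexOfAA;  // letter -> row/column index

    // Hypothetical constructor: residue order plus a square score table.
    public SubMatrixSketch(string key, int[,] scores) {
        if (scores.GetLength(0) != key.Length || scores.GetLength(1) != key.Length)
            throw new ArgumentException("scores must be key.Length x key.Length");
        this.key = key;
        this.imatrix = scores;
        this.indexOfAA = new Dictionary<char, int>();
        for (int i = 0; i < key.Length; i++)
            this.indexOfAA.Add(key[i], i); // Add throws if a letter repeats
    }

    public int Width { get { return imatrix.GetLength(0); } }

    // The four indexers from the post, all returning what the usage expects.
    public int this[char aa] { get { return indexOfAA[aa]; } }
    public char this[int index] { get { return key[index]; } }
    public int this[char row, char column] {
        get { return imatrix[indexOfAA[row], indexOfAA[column]]; }
    }
    public int this[int row, int column] { get { return imatrix[row, column]; } }

    // Hypothetical helper: the A, R, N, D corner of BLOSUM62.
    public static SubMatrixSketch Blosum62Corner() {
        return new SubMatrixSketch("ARND", new int[,] {
            {  4, -1, -2, -2 },
            { -1,  5,  0, -2 },
            { -2,  0,  6,  1 },
            { -2, -2,  1,  6 },
        });
    }
}
```

With var sm = SubMatrixSketch.Blosum62Corner(); the four notations behave as described earlier: sm['D'] is 3, sm[2] is 'N', sm['N', 'D'] is 1 and sm[0, 1] is -1. An unknown letter such as sm['P'] throws a KeyNotFoundException in this cut-down version, because the dictionary only knows the four letters in the key.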

Similar constructs are available in Python (the __getitem__ method) and Ruby (defining a [] method), but not in Java. I’ll likely cover those later, along with how to set values.

Eddie Ma

September 1st, 2010 at 11:15 am