Ed's Big Plans

Computing for Science and Awesome

Idea: Delaunay Simplex Graph Grammar

with 2 comments

The Structural Bioinformatics course I’m auditing comes with an independent project for graduate students. I’ve decided to see how feasible and meaningful it is to create a graph rewriting grammar for proteins that have been re-expressed as a Delauney Tessellation.

I was first introduced to the Delauney Tessellation about half a year ago. Such a tessellation is composed of irregular three dimensional tetrahedrons where each vertex corresponds to an amino acid. A hypothetical sphere that is defined by the four points of such a tetrahedron cannot be crossed by a line segment that does not belong to said tetrahedron.

An alphabet in formal languages is a finite set of arbitrarily irreducible tokens that composes the inputs of a language. In this project, I want to see if I can discover a grammar for the language of Delauney protein simplex graphs. Graph rewriting is likened to the collapse of neighbouring tetrahedrons. The tetrahedrons selected are either functionally important, stability important or have a strangely high probability of occurrence. This definition is recursively applied so that previously collapsed points are subject to further collapse in future passes of the algorithm.

When a subgraph is rewritten, two things happen. Some meaning is lost from the original representation of the protein, but that same meaning is captured on a stack of the changes made to the representation. In this way, the protein graph is iteratively simplified, while a stack that records the simplifications indicates all of the salient grammatical productions that have been used.

This stack is what my project is really after. Can a stack based on grammatical production rules for frequency of occurrence render any real information, or is it just noise? I can’t even create a solid angle to drive my hypothesis at this point. … “Yes … ?” …

I’ve seen a lot of weird machine learning algorithms in my line of work… and I attest that it’s hard for a novice to look at a description and decide whether or not it derives anything useful. Keep in mind that the literature is chuck full of things that DO work, and none of the things that didn’t make it. I conjecture that this representation has made me optimistically biased.

This method however IS feasible to deploy on short notice in the scope of an independent project ๐Ÿ˜€

Jordan Lapointe says...

This sounds really neat. I want to make sure I have your idea straight:

a) a Delaunay tessellation of a protein defines/is equivalent to a simplex graph

b) such simplex graphs are the tokens of your language

c) collapsing tetrahedra in the Delaunay tessellation/performing the equivalent operation on the simplex graph are your grammatical production rules

d) you generate a sequence of tessellations/graphs (each of which is, by definition, a token) by using your grammatical production rules
d.1) as a restriction on your production rules, you only allow collapsing of tetrahedra that meet certain criteria (functionality, stability, or frequency)

e) such a sequence forms a “word” in your language

f) you capture this word via the stack of operations performed on the simplex graph

g) Ultimately you want to be able to look at the stack (“word”) produced by your grammar and have it tell you something about the protein

Wow, this post turned out to be far longer than I meant it to be :).

Eddie Ma says...

That was very succinct. I probably should have written it in the form you gave me to start. I’m currently dissecting a few proteins using a tessellation script Dr. Burkowski put together– apparently, work has already been done compatible with Chimera which cuts short my work. The present dissection actually sees evaluation of two things: which combinations of contact surfaces have potential for reduction; and which tetrahedron labels (as four-tuples) occur most frequently as tokens. To be honest, it swings between more daunting and less daunting as I progress. I’ll update this post when I have some nice pictures ๐Ÿ˜€