Ed's Big Plans

Computing for Science and Awesome

Phew! Sun VirtualBox Port Forwarding (on NAT) Solution

I finally found a solution for the port forwarding issue I was having with Sun VirtualBox: NAT provided a weird one-way mapping, so Tin (host) couldn’t dial into TinUbuntu (guest). As far as I understand, this solution is guest operating system neutral, but it must be configured for each guest handled by a particular installation of Sun VirtualBox. That means that if you’re crazy like me and plan to put your virtual hard drives on USB sticks, you’d probably do just as well to put in a makefile that will do your configurations for you too. So here’s the solution, also saved in my wiki for extra truthiness!

Thanks to Evan and his “justwerks software” blog.

The following commands were issued…

cd /Applications/VirtualBox.app/Contents/MacOS

# Rule "guesthttp": protocol to forward
./VBoxManage setextradata "TinUbuntu" \
     "VBoxInternal/Devices/pcnet/0/LUN#0/Config/guesthttp/Protocol" TCP

# Port on the guest (TinUbuntu) that receives the traffic
./VBoxManage setextradata "TinUbuntu" \
     "VBoxInternal/Devices/pcnet/0/LUN#0/Config/guesthttp/GuestPort" 22

# Port on the host (Tin) to connect to
./VBoxManage setextradata "TinUbuntu" \
     "VBoxInternal/Devices/pcnet/0/LUN#0/Config/guesthttp/HostPort" 2222

This forwards TCP port 2222 on Tin (host) to TCP port 22 on TinUbuntu (guest), so a connection to port 2222 on Tin reaches the guest’s SSH server.
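Since the same three keys have to be set again for every guest on every VirtualBox installation, a small script can generate them. The following is only a minimal sketch under stated assumptions: the function name print_forward_rule and the rule name "guestssh" are hypothetical choices for illustration, VBoxManage is assumed to be on the PATH, and the guest is assumed to use the pcnet adapter at slot 0 as in the commands above. It prints the commands instead of running them, so they can be inspected first and then piped to sh.

```shell
#!/bin/sh
# Sketch: print the three setextradata commands that make up one NAT
# port-forwarding rule. Nothing is executed; pipe the output to sh to apply it.
# print_forward_rule is a hypothetical helper, not part of VirtualBox.
# Usage: print_forward_rule VM RULE PROTOCOL GUEST_PORT HOST_PORT
print_forward_rule() {
    vm=$1 rule=$2 proto=$3 guest_port=$4 host_port=$5
    # Assumes the first pcnet adapter, matching the commands above.
    key="VBoxInternal/Devices/pcnet/0/LUN#0/Config/$rule"
    printf 'VBoxManage setextradata "%s" "%s/Protocol" %s\n'  "$vm" "$key" "$proto"
    printf 'VBoxManage setextradata "%s" "%s/GuestPort" %s\n' "$vm" "$key" "$guest_port"
    printf 'VBoxManage setextradata "%s" "%s/HostPort" %s\n'  "$vm" "$key" "$host_port"
}

# Guest TCP 22 (SSH) reachable through host TCP 2222.
print_forward_rule "TinUbuntu" "guestssh" TCP 22 2222
```

Once the rules are applied (the guest may need to be restarted before they take effect), something like ssh -p 2222 user@localhost run on Tin should reach TinUbuntu, where "user" stands for an account on the guest.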

In future, I should remove these rules and re-add them, replacing “guesthttp” with my own name for the rule, since it forwards SSH rather than HTTP. In his blog, Evan used a series of names: “guesthttp”, “guestssh” and “guestsql”. These are actually very good, unambiguous names that I’ll probably end up borrowing.
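Renaming a rule means clearing the old keys first. Calling VBoxManage setextradata with a key and no value should delete that key, so removal can be sketched the same way as setting. The helper name print_remove_rule is hypothetical, and the script again only prints the commands so they can be checked before being piped to sh.

```shell
#!/bin/sh
# Sketch: print the commands that delete one forwarding rule by clearing
# its three keys. print_remove_rule is a hypothetical helper name.
# Usage: print_remove_rule VM RULE
print_remove_rule() {
    vm=$1 rule=$2
    # Assumes the same pcnet/0 key prefix used when the rule was created.
    key="VBoxInternal/Devices/pcnet/0/LUN#0/Config/$rule"
    for field in Protocol GuestPort HostPort; do
        # No value after the key: setextradata then removes the key.
        printf 'VBoxManage setextradata "%s" "%s/%s"\n' "$vm" "$key" "$field"
    done
}

print_remove_rule "TinUbuntu" "guesthttp"
```

Running VBoxManage getextradata "TinUbuntu" enumerate before and after is one way to confirm which keys are actually set.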

The one thing I should really dig around for is the meaning of “LUN#0”, which looks important. The remaining virtual-directory-like objects in the configuration string look less daunting: “pcnet/0” likely refers to the first virtual network adaptor connected to TinUbuntu.

Eddie Ma

October 16th, 2009 at 10:05 am

Posted in Technology

Squishy TIM Barrel Subunits

Again with the TIM Barrel pictures! Here’s some text about it from my notes…

1a5m (a urease) is a really interesting protein. It consists of three subunits. Each subunit consists of three unique domains: a very squashed TIM Barrel, an alpha-alpha-alpha-beta-beta domain and a beta-beta-alpha-beta-beta domain. I’m not yet sure what to call little broken alpha helices that have fewer than two complete turns. The TIM Barrel (though exceedingly asymmetrical) will still be accounted for in the data to be analyzed. The TIM Barrel (566 amino acids) is the alpha subunit of each symmetrical subunit. The remaining two domains are the beta and gamma subunits, though PDB is not clear which is which: they weigh in at 100 and 101 amino acids. 1a5m is one of several solved urease structures in the PDB; the collection {1A5K, 1A5L, 1A5M, 1A5N, 1A5O} was solved by Pearson et al. (1998).

Matthew A. Pearson, Ruth A. Schaller, Linda Overbye Michel, P. Andrew Karplus, and Robert P. Hausinger (1998). Chemical Rescue of Klebsiella aerogenes Urease Variants Lacking the Carbamylated-Lysine Nickel Ligand. Biochemistry. 37(17):6214-20.

Squishy squishy shapes: the giant pink object in the next picture is actually three such TIM Barrels, each of which belongs to one of the three subunits.
Each of the three subunits is shown separately below…

1a5m_1_3 1a5m_2_3 1a5m_3_3

Aesthetically pleasing! These images were captured from the Jmol output available from RCSB PDB.

Eddie Ma

October 14th, 2009 at 9:49 am

T.A.ing — half of the semester done!

Wow, it’s been half the semester already. It’s been a good experience so far. This job has highlighted two key things I should work on though. First, it’s impossible to know everything that can be asked in a given tutorial, so I should figure out what the boundaries for the module are (i.e. what the current module does not cover), and know the most likely places the answers are in the textbook so I can refer to an official explanation when I’ve drawn a blank. Providing answers that go beyond the scope of a course is actually harmful since it affects students’ recall during examinations! Second, I need to reduce the amount of repetition I use to explain something. Specifically, saying the same sentence twice doesn’t help comprehension. If it didn’t work the first time, there’s clearly something incompatible about that sentence.

Overall though, I’m happy with how the tutorials are progressing and how our (Ariana’s and my) introductory slide shows go. The quizzes at the end of class have progressed into being a very smooth transaction too.

In about half an hour, I have to head down to watch them all write a midterm along with the other TAs.

Applications for Winter TA positions are due soon.

It might be nice to do a hands-on laboratory course next, but tutorials are certainly pleasant.

Eddie Ma

October 14th, 2009 at 9:21 am

The Return of Phi C31

I’ve been so out of the loop with iGEM over the last month. I’ll need to figure out how to get back into the swing of things, probably starting with the post mortem meeting on Tuesday. Generally, since no new maths could be put on the table that actually encompassed the problem well, the brute force approach was kicked into high gear with a few more filters to increase the probability of success.

Call these “System Filters” since they aren’t really based on biologically significant concepts; they are really just sanity checks that are conceptually consistent with the project (i.e. we’d run out of hard disk space otherwise…). Significantly, Matthew implemented “Blank Stare”, which destroys reactants that exceed a given length (thus preventing them from hogging the CPU looking for less parsimonious solutions). Less significant were Andre’s “Lone Gunman”, which deletes arbitrary chromosomes with stochastic efficiency, and my “Tag”, which prevents chromosomes from cross-reacting.

(On second thought, “Tag” IS a “Biological Filter” not a “System Filter” because it removes redundancy by implementing the rule that we only admit bacteria that have exactly one chromosome.)

I should mention that “significance” above isn’t about the triviality of the code, it’s about the amount of anticipated efficiency boon we’d gain from an item’s deployment.

Tomorrow’s post mortem will continue the work I’ve started on our iGEM 2009 Wiki Modelling page… We’ll decide what we want to mention, how close we got to our solution and figure out how to precisely characterize the problem space uncovered by our various attempts.

Additionally, we should probably discuss the relevance of John’s attN site cloning and tests to see if the operators show any sign of degeneracy, and which ones in particular.

Finally, I should mention that Brandon has been working on a C++ port of the whole application we wrote in Python, to elucidate how much the virtual machine impacted the performance of our solver. The team is quite divided on this idea, with the larger half (myself included) thinking that the exponential growth due to the algorithm is the greater factor. Brandon may have some answers for us when it’s up and running.

TIM Barrels look like this…

It occurred to me that I didn’t actually post any images in the past few posts. Here are two TIM Barrels that I’ve arbitrarily picked from DATE. Note: the image files are horribly misnamed, so please use the text description underneath.

From left to right, these figures are 1A53 (Beta Strand C-Face), 1A53 (Beta Strand N-Face), 1BG4 (Beta Strand N-Face), 1BG4 (Beta Strand C-Face). 1A53 is called “indole-3-glycerol phosphate synthase” and 1BG4 is called “xylanase”, isolated from Penicillium simplicissimum. I won’t go into the function of each of these enzymes, but they do illustrate what a general beta-alpha TIM barrel looks like. TIM Barrels comprise eight beta-alpha secondary structure elements. Extra helices and sheets may occur but must flank the TIM Barrel-like portion of the protein domain (1A53 has a very prominent extra alpha helix close to the camera in the far left image). The “barrel” name derives from the twisted cylinder enclosed by the parallel beta strands in the middle of the object. TIM barrels can be deformed quite a bit too if they’re a subunit of a larger holoprotein.

The four-character designations (1A53, 1BG4) are RCSB (Research Collaboratory for Structural Bioinformatics) Protein Data Bank identifiers. I’ve found that SCOP frequently links into PDB while PFAM frequently links to UniProtKB (Knowledge Base) and utilizes UniProtKB identifiers. More on that later…

Eddie Ma

October 6th, 2009 at 10:52 am

Protein Databases and Parsability

Brief: Parsability is essential for fast machine-assisted analysis of vast databases… So, I got lucky with SCOP since the entire protein hierarchy is offered in exactly that kind of parsable form… The entries are even linked to ASTRAL PDB-like structure files. Something similar is offered by CATH, but I don’t comprehend it yet.

Aside: I haven’t really given enough credence to PFAM yet– I should spend a little time figuring out how useful it is. From what I understand, it doesn’t classify proteins by structure so it may be more useful in secondary and later analyses.

Aside: Hey look! A big giant page of alignment tools care of ExPASy. Goody. Reinventing the wheel as rarely as possible is certainly a good modus operandi.

Eddie Ma

September 24th, 2009 at 2:36 pm

Posted in Computational Biology
