Ed's Big Plans

Computing for Science and Awesome

Java Classpaths are Evil

with 2 comments

(now also on my wiki)

While working with the Phylogenetic Analysis Library (PAL) on an alignment problem, I ran into the problem of having to specify classpaths to a jar file… it should have been straightforward enough…

Java classpaths are a pain.

Here are a few observations I’ve made about how to specify them in the command line.

  • an Item can either be a directory that contains .class and .java files OR
  • an Item can be a .jar file.
  • to specify more than one Item.jar in Unix, use:
javac -classpath .:Item1.jar:Item2.jar
  • note that you cannot put spaces around the colons
  • note that once you pass -classpath, the current working directory is no longer searched by default, so you must include an extra Item ‘.’ to add it back
  • note that in Windows, you must use ‘;’ instead of ‘:’
  • note that after compiling with javac, the same -classpath and its arguments must then be applied with java
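
To make the rules above concrete, here is a minimal sketch in Python (not Java, and not part of PAL) that builds the classpath string once and reuses it for both javac and java. The function names, Main.java and Main are hypothetical placeholders; os.pathsep is ‘:’ on Unix and ‘;’ on Windows, which matches the separators listed above.

```python
import os


def build_classpath(items, sep=os.pathsep):
    """Join classpath Items with the platform separator, with no spaces around it."""
    # Strip stray whitespace and drop empty entries before joining.
    cleaned = [item.strip() for item in items if item.strip()]
    return sep.join(cleaned)


def javac_and_java_commands(source_file, main_class, items, sep=os.pathsep):
    """Return (compile, run) argument lists that share one classpath string."""
    classpath = build_classpath(items, sep)
    compile_cmd = ["javac", "-classpath", classpath, source_file]
    run_cmd = ["java", "-classpath", classpath, main_class]
    return compile_cmd, run_cmd


if __name__ == "__main__":
    # Hypothetical file and class names, for illustration only.
    compile_cmd, run_cmd = javac_and_java_commands(
        "Main.java", "Main", [".", "Item1.jar", "Item2.jar"]
    )
    print(" ".join(compile_cmd))
    print(" ".join(run_cmd))
```

Either list could be handed to subprocess.run as-is; keeping the Items in a list also sidesteps the spaces-between-colons problem, since the string is only joined at the last moment.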

Nuisance? Yes! Necessary Evil? No!

In the compiled Java class, there certainly could have been some metadata implemented that is a copy of the last known classpath string… why is there a disparity between the symbols used in Unix and Windows? … Why aren’t spaces allowed? Why does one have to specify the current working directory?

Evil.

A side effect of not being able to put spaces between the colons of several paths is that one can’t just use a backslash to continue the list on an indented next line– the next path would have to start at the very beginning of the next line, which is just ugly.

Andre Masella says...

This is one of the many Java screw ups. It was thought that classes were the units of a program, so there was never a mechanism to aggregate classes into libraries. JARs became the standard, but they have no metadata to express dependencies or versioning (classes do this individually). So, Java’s dynamic linker works at the class level, not the “module” level; hence, you have to feed it all the places classes might be. This was done much better in .NET, where there are DLLs that work like normal libraries and express module-level dependencies, and the dynamic linker can deal with libraries and then with the classes inside them. There is a proposal for a Java module format like this, but I haven’t heard about it in a while.

Eddie Ma says...

Sure– that seems rational– I bet Sun would just glue on a module format so that it didn’t interfere with existing functionality. Honestly, why is it so hard to slap a sunset clause onto the Java 1, 2, 5, 6 specification, or offer temporary dual support during a phase out (e.g. Python 2.x, 3.x)?

Written by Eddie Ma

November 23rd, 2009 at 12:28 pm

Posted in Pure Programming


1:1 String Comparison Tool

without comments

link (science.uwaterloo.ca) | link (eddiema.ca)

Watching one of my labmates painstakingly move two index fingers across two printed pages scanning letter by letter for a single point mutation in a nucleotide sequence motivated me to produce this very simple software. It’s 100% Javascript and runs client side.

It basically does what he did… scans two strings (plain strings, not biological sequences) letter by letter, looking for single point mutations.

No alignments are done, and nothing more sophisticated. Just … single … point … mutations … only.
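
For illustration, the loop amounts to something like the sketch below. It is written in Python rather than the tool’s Javascript, it is not the tool’s actual source, and the function name is made up. Reporting a missing character as None when one string is longer is an assumption about how to treat unequal lengths.

```python
from itertools import zip_longest


def point_differences(a, b):
    """Return (index, char_a, char_b) for every position where a and b differ."""
    diffs = []
    # zip_longest pads the shorter string with None, so trailing characters
    # of the longer string are reported as differences too.
    for index, (char_a, char_b) in enumerate(zip_longest(a, b)):
        if char_a != char_b:
            diffs.append((index, char_a, char_b))
    return diffs


if __name__ == "__main__":
    for index, char_a, char_b in point_differences("ACGTAC", "ACCTAC"):
        print(index, char_a, char_b)
```

Positions are counted from zero, and since nothing is aligned, a single insertion or deletion will show up as a mismatch at every position after it.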

Licensing information: Do anything you want, it’s just a loop.

Written by Eddie Ma

November 10th, 2009 at 4:25 pm

Posted in Pure Programming


Back from Conference!

without comments

Brief: The BIBM09 conference was the very first conference I have ever attended. I learned a lot from the various speakers and poster sessions–

I thought it was really interesting how the trend is now to study and manipulate large interaction pathways in silico. One theme is the use of many different data sources, integrating chemical, drug and free-text data; another is the connection of physical protein interaction pathways with gene expression pathways. There was even a project which dealt with the alignment of pathway graphs (topology).

Dealing with pathways, especially by hand and in the form of a picture, is probably the bane of many biologists’ existence– I think that the solutions we’ll see in the next few years will turn this task into simple data-in-data-out software components, much like the kind we have for dealing with sequence alignments.

And now, back to the real world!

Addendum: My talk went very well :)

And here are my slides with a preview below.

Written by Eddie Ma

November 6th, 2009 at 10:36 am

IEEE BIBM09 Conference – Day 1

with 4 comments

Warning: This post is highly non-sequitur– it’s late at night and my brain is full of science.

Wow! I’m learning a lot and having a blast– I’ve met a lot of people and have managed to fill the entire day with workshops and tutorials. The venue is convenient, being right in a hotel (Hyatt).

To be more precise, the conference isn’t really in Washington. It’s in Bethesda.

My talk hasn’t come up yet, but my slides are 100% done– I just have to review them until I start to twitch uncontrollably. Today was the first day filled with talks, so I’m feeling a bit light-headed and in need of sleep to digest all of the information.

One key thing I want to take home to my current project is the use of tetrahedral tessellations, whose cells are called simplices, to express protein topology and structure– there’s something very useful about that when searching for regularities and especially when dealing with stability.

Aside from going to workshops, each attendee is asked to fill in a ballot to indicate which of the posters on display are the best of show. We get to nominate three– and I’ve chosen mine for today. For me, it was an easy choice to pick the top two, while the last nominee was originally tied. An additional poster show will start on Nov. 3rd.

Matt says...

Hey, a new looking blog — very nice. How did you have time to do that before/at a conference?

Anyway, have a great time in Bethesda (and learn some science, of course).

Eddie Ma says...

I’m just using a theme! That’s no secret– look at the copyright text at the bottom of the blog :P — I seriously recommend WordPress for the number of themes and plug-ins… It means I get to develop content, which is what I wanted to do :D

Matt says...

Oh right, I keep forgetting that there are some things that can actually be done without reinventing the wheel.

Eddie Ma says...

Heh… Perhaps Recombinatron v2 will fall under that category.

Written by Eddie Ma

November 2nd, 2009 at 12:52 am

Protein Project Progress…

without comments

Last week, Liz, Aron, Andrew, Brendan and I sat down to discuss the beta-trefoil project. It was a good chance for me to understand the methods used and the kinds of results we are interested in for my own TIM Barrel project.

Continuing on with the structural repeat problem, today I’ll be writing a short finite state automaton (FSA) parser that can handle DSSP or DSS output. Simply put, a very primitive machine will imitate a human’s visual inspection of repeated secondary structural elements in given proteins. This is in line with the work I did manually, staring at structures to get a grasp of how to look at protein models, and also in line with the objective to automate much of this work. Prior to that step, I reduced the probability of doing redundant work by using BLASTCLUST and selecting only a few known structures in each cluster to inspect… a sequence-based alignment for each cluster will inform me of where my manually detected repeat boundaries map to the remaining sequences.
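
As a rough idea of what such a primitive machine could look like, here is a two-state sketch in Python. It is not the parser described above. It assumes the secondary structure has already been reduced to one letter per residue, using the DSSP codes E (strand) and H (alpha helix); reading a real DSSP file is left out. The function names and the min_len cutoff are assumptions made for illustration.

```python
def secondary_structure_runs(ss):
    """Collapse a per-residue string into (code, start, end) runs, end inclusive."""
    runs = []
    start = 0
    for i in range(1, len(ss) + 1):
        # Close the current run at the end of the string or when the code changes.
        if i == len(ss) or ss[i] != ss[start]:
            runs.append((ss[start], start, i - 1))
            start = i
    return runs


def find_beta_alpha_units(ss, min_len=3):
    """Return (start, end) residue indices of each strand-then-helix unit."""
    units = []
    state = "SEEK_STRAND"
    unit_start = None
    for code, start, end in secondary_structure_runs(ss):
        if end - start + 1 < min_len:
            continue  # ignore runs too short to count as an element
        if code == "E":
            # A strand opens (or restarts) a candidate unit.
            unit_start = start
            state = "SEEK_HELIX"
        elif code == "H" and state == "SEEK_HELIX":
            # The first helix after a strand closes the unit.
            units.append((unit_start, end))
            state = "SEEK_STRAND"
        # Any other code, or a helix with no strand before it, is skipped.
    return units


if __name__ == "__main__":
    print(find_beta_alpha_units("CCEEEECCHHHHHHCCEEEECCCHHHHHCC"))
```

A helix that shows up before the first strand is simply skipped in this sketch, and extra helices between units are ignored rather than flagged; a real parser would probably want to report both.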

Oddity: If you BLASTCLUST all the “FULL” (not “SEED”) sets of TIM Barrel sequences for the entire fold from PFAM along with the sequences of known TIM Barrel fold structures of SCOP, you’ll find that cluster fifteen (as of today) has these elements:

1YBE_A A6U5X9 A6WV52 Q2KDT0 Q2YNV6 Q6G0X7 Q6G5H6 Q8UIS9 Q8YEP2 Q92S49 Q98D24 A1UUA2

In the above listing, 1YBE:A (PDB code) is the sole known PDB structure, while the remainder are putative TIM Barrels (UniProt codes) as determined by the HMM from PFAM.

The enzyme 1YBE looks like this…

[Image: 1YBE_A]

It’s an oddity because of the number of alpha-helices inserted within what is usually a hydrophobic beta barrel– the red pieces of ribbon should form a hollow cylinder, but it’s split apart for 1YBE and accommodates a bunch of cyan helices. Labeled in white are helices that break with the beta-alpha repeating secondary structural element (SSE) pattern by occurring before the first repeat. Labeled in green are breaks between beta-alpha SSE patterns.

Reference

Seetharaman, J., Swaminathan, S., Crystal Structure of a Nicotinate phosphoribosyltransferase [To be Published]

Written by Eddie Ma

October 28th, 2009 at 9:35 am

Our Final iGem Wiki (2009)

without comments

Link: University of Waterloo iGem 2009 Team Page

The team’s off to the 2009 iGem conference in Boston soon (not me though due to prior engagements). Our wiki was finally finished about a week ago– the wiki is a required objective that iGem teams must produce.

As luck would have it, through all the hustle and bustle of daily business, I still managed to contribute to the original draft and to the final summary for the poster version of our presentation.

Our wiki pages this year are something to be proud of. They are a marked improvement over last year’s work, and the team effort shows through in the varied though unified writing styles and diagrams. The modeling team worked especially well– I’m impressed with the amount of teamwork just looking through the discussions we had on the Google Group.

There’s a presentation practice today– I wish I could go, but the construction in and around the university is becoming prohibitive… The construction should stop some time around the first snowfall.

Future: Andre has some plans for the modeling team in iGem for the coming year… and I have some interesting ideas for what this team can do in the long run.

Written by Eddie Ma

October 28th, 2009 at 9:04 am

Posted in Academic Life