Ed's Big Plans

Computing for Science and Awesome

Archive for the ‘linkedin’ tag

Add an arbitrary Mediawiki search box anywhere!

without comments

Update: A better solution that doesn’t require a separate “search-redirect.php” file has been posted here.

I’ve been looking for this solution for a long time now and I’m happy to have finally found it. Following hints from Dave Taylor and Peter De Decker, I’ve glued together a solution that doesn’t take too much effort and doesn’t require any additional hacking around in SQL.

The objective was to add a search box on the right-column navigation of this blog that would search my wiki notebook. It wasn’t until I stumbled on the above two blogs that I realized I can just mangle URLs to conduct a search on mediawikis! The specific URL used to search my wiki looks a little like this…

https://eddiema.ca/wiki/index.php?title=Special%3ASearch&search=searchterms&go=Go

It might be a bit different for your installation depending on the version that you installed and a few of your settings– to find out what it looks like, search for something and copy down the URL in the address bar.

The file “search-redirect.php” used by Wikipedia takes in your search terms, and mangles those terms into a URL conforming to the above example. It then redirects you to that constructed URL. You can find this used on the main page of Wikipedia as noted by Dave.

My search-redirect.php based on Peter’s work above is two lines long, and looks like this:

<php?
$redirect_url =
     "https://eddiema.ca/wiki/index.php?title=Special%3ASearch&search=".
     $_GET['search'].
     "&go=Go";
@header( "Location: ".$redirect_url );
?>

You probably want to place your own search-redirect.php in the root directory of your wiki installation– however, it looks to me like it doesn’t really matter since the whole URL is rewritten anyway. I bet you can put this file anywhere on the net that supports PHP. The final thing that’s missing is the search box– anything that uses this formula will work:

<form action="https://eddiema.ca/wiki/search-redirect.php" method="get">
Search Ed's Wiki:
<input type="text" name="search" />
<input type="submit" value="Search" />
</form>

Putting this code on a page with your own wiki URL as a stem instead of mine will allow you to search your own wiki from any other page.

Eddie Ma

June 3rd, 2010 at 9:54 am

The Computational Bio & Chem Group (SOLVER Revisited)

without comments

A Revisitation

In the beginnings of the Winter semester, I had an idea to start up a undergraduate/graduate student group that would provide a scaffold for faster, computer assisted research.

The semester was simply too full for me to try and get it running at that time. I’m tempted to do some work on it later this semester, after I’ve gotten more of my thesis done.

The basic idea is as follows. Faculty and graduate students in the biology and chemistry departments often have the need to analyze data with some elegant computing. Whether this be as commonplace as hacking together an excel sheet to do work or learning some existing toolbox, or something slightly more in depth such as analysis in R, Python or PERL. Unfortunately, these research problems often fall by the wayside as the time commitment to learn a new software package or programming language is not trivial without a stronger computing background. Undergraduate students who are raised in the computer science environment, particularly with a bioinformatics interest have some knowledge of the research problem semantics as well as some knowledge of how to do the above analysis by using and coding software.

The “SOLVER” group would fill the gap by performing a matchmaking and coaching service. In my vision, SOLVER creates working teams of three to four– (1) one or two graduate students or faculty with a research problem, (2) one or two undergraduate students with some knowledge in bio / chem and some talent in computing, (3) an experienced coach that can recommend best practices so that the team has a good shot of solve the problem in a reasonable amount of time. In the end, the researcher gets help and a good chance at a working solution– they might even learn some programming; educational / professional / social connections are made; and the student gets an item to add to their resume. In reality, a particular research step would be executed as week-long blocks– whether this means one block or four blocks (one month) depends on the complexity of the task.

To do…

One stretch of work that I have to do is to determine the needs of the department. I wanted to do this in the middle of Winter semester but didn’t find it a priority. For this group to work, there must be research problems. Similarly, I need to determine the capabilities of potential undergraduate students. I’m learning quickly that the key here is to start small and think big (i.e. start with one group, then two, then learn about the administrative logistics, then deploy some progress tracking mechanism, then four groups and onward).

Retuning the image

The almost idealized description above suffers from a few logistical issues. First, I will now address the issue of publication credits. Because researchers must attribute tangible work (including written code and analyses), some graduate students may be hesitant to participate in the program. I don’t know if this will actually become a barrier however as a “tangible contributor” would never be the first nor the last name on a paper (this is true in all CS, Bio and Chem). Furthermore, a paper with more than one author demonstrates the ability to collaborate in a team, and fostering the experience of another student is part of our culture (e.g. co-op students etc.). That brings me to a second major issue. One of the frames that this group could find itself in is a means to circumvent or short-circuit the co-op student appointment process– a frame that I readily reject. In fact, I should hope that this group becomes a means to introduce new putative co-op students to their future advisers which may otherwise be overlooked for their differing background. Finally, there is the problem of attrition, whereby a group dwindles in size as members drop out. The only contract-based perks or penalties I can think of is really ties to the group itself (e.g. unsatisfactory work naturally results in a time out or withdrawal from the program). It is really the only tangible leverage we can offer at the outset– unsteady leverage at best (difficult to assume a reputation when none has yet been built).

Going forward

More research has to be done in terms of polling and documenting the needs of the department. Furthermore, a deeper understanding about the kinds of students we’d attract and want to appoint is needed– this allows us to understand what time commitment is reasonable (both the lower and upper limits need definition). Finally, the group must from the outset be understood as something beneficial to all parties involved. The solutions named above must be deployed at launch time to ensure minimal friction, and maximum return. First steps were made last year by introducing this idea at the BIC-iGEM meeting– BIC students are excited with this idea, wherein an entire room of a dozen students raised their hands. Furthermore, there are no other groups on campus that attempt this activity, so that SOLVER would provide a unique non-conflicted service.

Of course– this all depends on the amount of work I get done on my own thesis in the first half of the semester.

Making friends

As an aside Isabelle Lam, a student I TA’d last semester in Biol-241 (microbio) has been planning on starting a job / volunteer / coop mine for science students. I should go and bother her and see how far she’s gotten. Her project is called “SPORE”. Last I checked, her team was registering a subdomain with the university.

EDIT: I previously confused Isabelle with Lisa– this has been corrected.

Eddie Ma

April 20th, 2010 at 10:08 am

Brief Hints: C# Nullable Types and Arrays, Special Double Values

with 2 comments

Brief Hints: I wanted to show you three things in C# that I’ve been using a lot lately.

Nullable Types are a convenient language construction in C# that allows one to assign a primitive type with null…

double? someval = null;
// declares a nullable double called 'someval'.

The question mark suffixing the keyword double makes the variable someval nullable. This was originally designed so that one can retrieve values from LINQ to SQL without checking for nulls (SQL inherently makes this distinction). This could be thought of as yet another construct to make autoboxing primitives more intuitive and more entrenched in the language.

I use nullable types when I need a special ‘unassigned’ value for “find the greatest” or “find the least” kinds of loops.

When we apply Nullable Types to Arrays, we get an array of nullable primitives (arrays are already nullable, being first class objects).

double?[] somevals = new double?[10];
// declares an array called 'somevals' of ten nullable doubles.

An array of primitives is initialized with all values 0.0; whereas an array of primitve?s is initialized with nulls.

Special Double Values are also something that I’ve started using a lot. There are many algorithms I’ve been coming across that use “magic numbers” corresponding to arbitrarily high and arbitrarily low numbers. Instead of using evil magical quantities, I’ve been using Double.PositiveInfinity and Double.NegativeInfinity. C# makes it easy to assign three more special quantities: Double.NaN, Double.MinValue and Double.MaxValue.

Edit: I forgot to mention why I kept italicizing the word primitive. C# doesn’t really have primitives that are exposed to the developer– everything actually IS an object, and the illusion of boxing or not isn’t really relevant. This just makes nullable types all the more logical.

Eddie Ma

March 16th, 2010 at 12:01 pm

Idea: Delaunay Simplex Graph Grammar

with 2 comments

The Structural Bioinformatics course I’m auditing comes with an independent project for graduate students. I’ve decided to see how feasible and meaningful it is to create a graph rewriting grammar for proteins that have been re-expressed as a Delauney Tessellation.

I was first introduced to the Delauney Tessellation about half a year ago. Such a tessellation is composed of irregular three dimensional tetrahedrons where each vertex corresponds to an amino acid. A hypothetical sphere that is defined by the four points of such a tetrahedron cannot be crossed by a line segment that does not belong to said tetrahedron.

An alphabet in formal languages is a finite set of arbitrarily irreducible tokens that composes the inputs of a language. In this project, I want to see if I can discover a grammar for the language of Delauney protein simplex graphs. Graph rewriting is likened to the collapse of neighbouring tetrahedrons. The tetrahedrons selected are either functionally important, stability important or have a strangely high probability of occurrence. This definition is recursively applied so that previously collapsed points are subject to further collapse in future passes of the algorithm.

When a subgraph is rewritten, two things happen. Some meaning is lost from the original representation of the protein, but that same meaning is captured on a stack of the changes made to the representation. In this way, the protein graph is iteratively simplified, while a stack that records the simplifications indicates all of the salient grammatical productions that have been used.

This stack is what my project is really after. Can a stack based on grammatical production rules for frequency of occurrence render any real information, or is it just noise? I can’t even create a solid angle to drive my hypothesis at this point. … “Yes … ?” …

I’ve seen a lot of weird machine learning algorithms in my line of work… and I attest that it’s hard for a novice to look at a description and decide whether or not it derives anything useful. Keep in mind that the literature is chuck full of things that DO work, and none of the things that didn’t make it. I conjecture that this representation has made me optimistically biased.

This method however IS feasible to deploy on short notice in the scope of an independent project 😀

Generic Functions in C# and Java

without comments

>>> Attached: ( Main.java — in Java | Main.cs — in C# ) <<<

Updated: (1) Made code more readable. (2) Removed unnecessary package (Java) and namespace (C#) and added a function that returns a generic type as well. (3) Attached compilable demo source code in separate files.

The most fun and productive concept in object oriented programming is generics — for me anyway. In C, one could deploy generics hazardously with code that casts the contents of memory addresses with a putative struct. The first field gives away what that chunk of memory is supposed to be at run time (usually, it’s a typdef int or an enum). I still do that when it’s called for, but it’s quite delicate and often leads to insidious bugs that don’t crash immediately. At least one would know what code to suspect when crashes do happen.

In C# and Java, two languages that derive from C — we find full safe support of generics. Generic classes (the things that collections are made of) are interesting, and I’m sure most who have used either of these languages have already played with them and have found them useful. One of the things that don’t receive a healthy dose of spotlight is Generic Functions (“Generic Methods” if you like).

I’ll compare two segments of code, one in C# and one in Java that do exactly the same thing — demonstrate two trivial functions printArrayList() and getElement(). The function printArrayList() prints out the contents of an ArrayList (Java) or a List (C#). The function getElement() retrieves an element from a list. This shows how single generic functions can operate on collections, each with a different defined type without the need for unsafe casting. The only assumption the code makes is that each object in a list implements the toString() method (needed for the printing function).

Note naming convention: In Java, methods are just members of an object, so they are named in lowercase. In C#, methods are capitalized. We will refer to methods by the Java convention here to keep things consistent.

Setting Up in Main…

Let’s declare and fill a few lists for this demonstration. Three generic list objects, cow, dog and elephant are constructed in a for loop. Each gets ten elements. Each list contains objects of a particular type; cow contains integers, dog contains doubles and elephant contains strings.

Java Code C# Code
ArrayList<Integer> cow = new ArrayList();
ArrayList<Double> dog = new ArrayList();
ArrayList<String> elephant = new ArrayList();
List<int> cow = new List();
List<double> dog = new List();
List<string> elephant = new List();

Notice that Java does not autobox the type in the angel brackets so you can’t give it the primitives int and double. In C#, this is allowed plus string is also a primitive. Remember: In both Java and C#, primitives are emulated — they are first class objects that are only different from other objects in that they are pass-by-value rather than pass-by-reference.

Appending ten items to each list. Shown below is the Java version — in C#, change “add()” to “Add()”.

for(int i = 0; i < 10; i ++) {
    cow.add(3 * i);
    dog.add(0.25 * i);
    if(i % 2 == 0)
        elephant.add("Even");
    else
        elephant.add("Odd");
}

The below is the code we want to make work — We’ll call printArrayList() to print out all of the elements in each list, then we’ll call getElement() to return a specific element from each list. Notice that this is the Java version below — in C#, we capitalize method names and use Console.Writeline() instead of System.out.println().

System.out.println("== Generic List Printer ==");

printArrayList(cow);
printArrayList(dog);
printArrayList(elephant);

System.out.println();

System.out.println("== Generic Element Accessing ==");

int    cow_at_7      = getElement(cow, 7);
double dog_at_2      = getElement(dog, 2);
String elephant_at_4 = getElement(elephant, 4);

System.out.println("Cow at 7      = " + cow_at_7);
System.out.println("Dog at 2      = " + dog_at_2);
System.out.println("Elephant at 4 = " + elephant_at_4);

Note that in C#, we may use the keyword “var” instead of typing out the types for cow_at_7, dog_at_2, and elephant_at_4 — the compiler infers the type for us. This is different from unsafely casting with “Object”, as the compiler infers the narrowest possible type and substitutes in that correct type.

Onto the methods …

Below is the Java version of printArrayList().

static <A> void printArrayList(ArrayList<A> animalList) {
    for(A a : animalList)
        System.out.print(a + "\t");
    System.out.println();
}

Below is the C# version of PrintArrayList().

static void PrintArrayList<A>(List<A> animalList) {
    foreach(A a in animalList)
        Console.Write(a + "\t");
    Console.WriteLine();
}

Notice that printArrayList() is a method that specifies a generic type <A>, but only in its argument list. In Java, <A> appears before the function’s type and in C#, this appears after the function name. In this case, it’s obvious what we would do if we have functions that return specific types — we just substitute the type where the keyword “void” is. So what happens when we want to return the generic type? That’s what getElement() will demonstrate.

Below is the Java version of getElement().

static <A> A getElement(ArrayList<A> what, int which) {
    return what.get(which);

Below is the C# version of GetElement().

static A GetElement<A>(List<A> what, int which) {
    return what[which];

Yes, these are both trivial functions, as you could have easily called ArrayList.get() in Java and List[] in C# respectively — but it does the job in this demonstration. In the Java version, the generic type <A> is placed before the type of the function, A. Don’t let that confuse you, just recall how we specified the return type when it wasn’t the generic type. In C#, we place the generic type <A> after the function name just as before.

Below is the output you should expect if you run the main function.

C# Output

== Generic List Printer ==
0	3	6	9	12	15	18	21	24	27
0	0.25	0.5	0.75	1	1.25	1.5	1.75	2	2.25
Even	Odd	Even	Odd	Even	Odd	Even	Odd	Even	Odd	

== Generic Element Accessing ==
Cow at 7      = 21
Dog at 2      = 0.5
Elephant at 4 = Even

The Java output is the same, except the values that are doubles are always printed with a trailing “.0” even if it is numerically equal to an integer.

Eddie Ma

January 29th, 2010 at 11:41 am