Ed's Big Plans

Computing for Science and Awesome

Rather Hashing

with 2 comments

Andre and Brandon

Andre and Brandon

I’ve been away recently, so this task has apparently been uptaken by Jordan. The reverse-hashing integer-to-sequence function to be done for modeling was formalized a bit more in a meeting about two weeks ago, and I’ve yet to be briefed about what form it’s taken in yesterday’s meeting.

Update: Andre tells me that Jordan did in fact finish the sources, and everything ran overnight– however, the exponential explosion that we were afraid of ended up occurring; the present algorithm (part of which is described below) ran to the fourth iteration and well, would continue to run for a few weeks had it not been stopped… It looks like there’s ever more need for a working bottom-up approach…

Getting to absolute basics, this reverse hashing function converts positive integers to a sequence of arbitrary base strings– actually, I want to describe the general form of this function in case I need it in future. So if we know the size of the alphabet used for each character of a string, we can give each character a unique ordinal value. Two examples follow…

Brandon and Matthew

Brandon and Matthew


#Latin Alphabet (as used in English)
(A, 1), (B, 2), (C, 3), (D, 4) ... (Y, 25), (Z, 26), (null, 0)


#Nucleotides
(A, 1), (C, 2), (G, 3), (T, 4), (null, 0)

If we want to represent a string, we’d simply substitute the integer for the character.

#BADEGG
214577


#AGGCTT
133244

Bredth First Search Sphere

Bredth First Search Sphere

Bredth First Search Sphere [figure]… Andre used this chalkboard diagram to illustrate to Chong that this encoding conveniently increases in sequence length– we could consider this cardinality, or radius, or more importantly, the number of hops on a graph representation– when this problem is considered in lexicographic ordering, the implied algorithm is a bredth-first traversal.

We ran into a problem– this would at most represent one string. Brandon came up with a workaround, the notion of stealing characters from a sequence based on the modulus of its index! So, if we wanted to encode three strings, then every character at (index%3 = 0) belongs to string A. Strings B and C would then occupy (index%3 = 1) and (index%3 = 2) respectively.

Examples:

#Encode BADGE, FEED and AI together
#Pushes together as... BAA AEI DEx GDx Exx
#Where lowercase 'x' is null.
211159450740500

So that null symbols are inserted to preserve the correct location of each symbol.

3D 3Layer Cake

3D 3Layer Cake

3D 3Layer Cake [figure]… An alternative approach to the encoding is to produce a triplet of integers instead of interweaving them; to farm out the sequences to be operated on, one would keep the entire domains of any two of the three sequences, while distributing the third; actually– this can still be done with the present encoding scheme, just farm out a fraction of the sequences based only on certain indexes of each sequence (the indexes satisfying the modulus math statements above). Abstractly, we can think of this as a three-dimensional cake where one dimension is distributed in layers– a three-serial-CPU-farm would consume a three-layered cake.

There’s one other thing I haven’t quite discussed, and that’s sequence sensitivity– Some sequences are masked so that the location of a certain integer implies a different entity depending on their position in their string– I’ll leave that for another post though.

I’ll have to discuss the bottom-up approach we’re going to try next– or not try at all since the modeling team is running very short on time; doing things by hand would either require inspiration, a marvel of human savant-like adduction or incredible luck… Jordan apparently demoed the algorithm by hand at the meeting I missed and showed the normal human mind can’t cope with the growth of the problem either…

Jordan playing a first person shooter...

Jordan playing a first person shooter...

Written by Eddie Ma

August 10th, 2009 at 1:45 pm

2 Responses to 'Rather Hashing'

Subscribe to comments with RSS or TrackBack to 'Rather Hashing'.

  1. Whoa, hold up: we ran the thing already? I have apparently fallen out of the loop.

    I can’t say I’m really surprised that we ran into an exponential explosion. I had a feeling it would turn out this way when I tried to make a more useable UI and kept coming up with exploding test cases.

    It looks like the easy solution’s pretty much killed off at this point. Time to break out the fun stuff?

    Matt

    3 Sep 09 at 00:16

  2. Precisely — it’ll be a while though. In fact, keep your eyes open. If you come across some math that accommodates this problem well, figure out how we can use it.

    Eddie Ma

    15 Sep 09 at 08:30

Leave a Reply