Ed's Big Plans

C & Bioinformatics: ASCII Nucleotide Comparison Circuit

Here’s a function I developed for Andre about a week ago. The C function takes two arguments, both C characters. The first argument is a degenerate nucleotide as in the table below. The second argument is a non-degenerate nucleotide {‘A’, ‘C’, ‘G’, ‘T’} or the any-nucleotide code ‘N’. Treating each argument as a set of nucleotides, the function returns zero if the intersection of the two sets is empty. Otherwise it returns a one-hot binary encoding of the intersection, where {‘A’ = 1, ‘C’ = 2, ‘G’ = 4, ‘T’ = 8} and {‘N’ = 15}; when the second argument is ‘N’, the result is simply the encoding of the first argument. All of this is done by treating the low five bits of each character’s ASCII encoding as the wires of a circuit.

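| Character | ASCII (Low 5 Bits) | Represents | One Hot | Equals |
|---|---|---|---|---|
| A | 00001 | A | 0001 | 1 |
| B | 00010 | CGT | 1110 | 14 |
| C | 00011 | C | 0010 | 2 |
| D | 00100 | AGT | 1101 | 13 |
| G | 00111 | G | 0100 | 4 |
| H | 01000 | ACT | 1011 | 11 |
| K | 01011 | GT | 1100 | 12 |
| M | 01101 | AC | 0011 | 3 |
| N | 01110 | ACGT | 1111 | 15 |
| R | 10010 | AG | 0101 | 5 |
| S | 10011 | GC | 0110 | 6 |
| T | 10100 | T | 1000 | 8 |
| V | 10110 | ACG | 0111 | 7 |
| W | 10111 | AT | 1001 | 9 |
| Y | 11001 | CT | 1010 | 10 |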
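The "wires" idea can be sketched in a few lines of C. The helper names `low5` and `wire` are made up for illustration and are not part of the function below; this is only a minimal sketch of how the second column of the table is obtained.

```c
/* Hypothetical helpers for illustration only. */

/* Keep only the low five bits of an ASCII letter: 'A' -> 1, 'Y' -> 25. */
unsigned low5(char c) {
    return (unsigned char)c & 0x1F;
}

/* Read wire k (0 = least significant bit) of a letter as 0 or 1. */
unsigned wire(char c, unsigned k) {
    return (low5(c) >> k) & 1u;
}
```

For example, `low5('R')` is 18 (binary 10010), so `wire('R', 4)` and `wire('R', 1)` are 1 and the other three wires are 0. Lowercase ASCII letters share the same low five bits as their uppercase forms, so `low5('n')` equals `low5('N')`.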

The premise is that removing all of the logical branching and using only bitwise operators would make things a bit faster. I’m actually not sure that the following solution is faster, because there are twelve variables local to the function scope (the two arguments plus ten single-bit values). On an architecture with few general-purpose registers, we can expect a good share of these to be spilled out of registers and onto the stack. We’d get a moderate speed boost, if any at all.

```
/*
f():
Bitwise comparison circuit that treats nucleotide and degenerate
nucleotide ascii characters as mathematical sets.
The operation performed is set i intersect j.
arg char i:
Primer -- accepts ascii |ABCDG HKMNR STVWY| = 15.
arg char j:
Sequence -- accepts ascii |ACGTN| = 5.
return char (k = 0):
false -- i intersect j is empty.
return char (k > 0):
1 -- the intersection is 'A'
2 -- the intersection is 'C'
4 -- the intersection is 'G'
8 -- the intersection is 'T'
15 -- the intersection is 'N'
other -- j is 'N' and i is degenerate: k is the one-hot mask of i (e.g. 5 for 'R').
return char (undefined value):
? -- if any other characters are placed in i or j.
*/
char f(char i, char j) {

// break apart Primer into bits ...
char p = (i >> 4) &1;
char q = (i >> 3) &1;
char r = (i >> 2) &1;
char s = (i >> 1) &1;
char t =  i       &1;

// break apart Sequence into bits ...
char a = (j >> 4) &1;
char b = (j >> 3) &1;
char c = (j >> 2) &1;
char d = (j >> 1) &1;
char e =  j       &1;

return

( // == PRIMER CIRCUIT ==
( // -- A --
((p|q|r|s  )^1) &         t |
((p|q|  s|t)^1) &     r     |
((p|  r|s|t)^1) &   q       |
((p|    s  )^1) &   q&r&  t |
((p|      t)^1) &   q&r&s   |
((  q|    t)^1) & p&s       |
((  q      )^1) & p&r&s
)
|
( // -- C --
((p|q|r    )^1) &       s   |
((p|  r|s|t)^1) &   q       |
((p|    s  )^1) &   q&r&  t |
((p|      t)^1) &   q&r&s   |
((  q|r    )^1) &       s&t |
((  q|    t)^1) & p  &r&s   |
((    r|s  )^1) & p&q&    t
) << 1
|
( // -- G --
((  q|r|  t)^1) &       s   |
((p|q|  s|t)^1) &     r     |
((p|q      )^1) &     r&s&t |
((p|  r    )^1) &   q&  s&t |
((p|      t)^1) &   q&r&s   |
((  q|r    )^1) & p&    s   |
((  q|    t)^1) & p&    s
) << 2
|
( // -- T --
((p|q|r|  t)^1) &       s   |
((  q|  s|t)^1) &     r     |
((p|  r|s|t)^1) &   q       |
((p|  r    )^1) &   q&  s&t |
((p|      t)^1) &   q&r&s   |
((  q      )^1) & p&  r&s&t |
((    r|s  )^1) & p&q&    t
) << 3
)
&
( // == SEQUENCE CIRCUIT ==
( // -- A --
((a|b|c|d  )^1) &         e |
((a|      e)^1) &   b&c&d
)
|
( // -- C --
((a|b|c    )^1) &       d&e |
((a|      e)^1) &   b&c&d
) << 1
|
( // -- G --
((a|b      )^1) &     c&d&e |
((a|      e)^1) &   b&c&d
) << 2
|
( // -- T --
((a|      e)^1) &   b&c&d   |
((  b|  d|e)^1) & a&  c
) << 3
);
}
```

Andre’s eventual solution was to use a look-up table which very likely proves faster in practice. At the very least, this was a nice refresher and practical example for circuit logic using four sets of minterms (one for each one-hot output wire).
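A minimal sketch of what such a look-up table might look like follows. This is not Andre's actual code; the names `MASK` and `f_lut` are made up for illustration, and the table is simply the "Equals" column above indexed by the low five ASCII bits.

```c
/* One-hot masks (A=1, C=2, G=4, T=8) indexed by the low five ASCII bits.
   Slots for letters that are not nucleotide codes stay 0. */
static const unsigned char MASK[32] = {
    [1]  = 1,   /* A */
    [2]  = 14,  /* B = CGT */
    [3]  = 2,   /* C */
    [4]  = 13,  /* D = AGT */
    [7]  = 4,   /* G */
    [8]  = 11,  /* H = ACT */
    [11] = 12,  /* K = GT */
    [13] = 3,   /* M = AC */
    [14] = 15,  /* N = ACGT */
    [18] = 5,   /* R = AG */
    [19] = 6,   /* S = GC */
    [20] = 8,   /* T */
    [22] = 7,   /* V = ACG */
    [23] = 9,   /* W = AT */
    [25] = 10,  /* Y = CT */
};

/* Hypothetical look-up version of f(): set intersection is a single AND. */
unsigned char f_lut(char i, char j) {
    return MASK[i & 0x1F] & MASK[j & 0x1F];
}
```

With this sketch, `f_lut('R', 'A')` is 1, `f_lut('R', 'C')` is 0 and `f_lut('R', 'N')` is 5. Unlike the circuit, it also accepts a degenerate code in the second argument, and any unrecognized letter behaves like the empty set.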

Should you need this logic to build a fast physical circuit or have some magical architecture with a dozen registers (accessible to the compiler), be my guest 😀

Eddie Ma

February 27th, 2011 at 11:49 pm

The Right Tool (Why I chose Java to do RSA)

Brief: I learned something valuable last week when working on this RSA encryption/decryption assignment for my Computer Security class. It’s important to be versatile when doing computer science — we must ensure we always use the most efficient tool. If we aren’t versatile, we risk taking tremendous amounts of time trying to reimplement something that already exists elsewhere.

So, what tool is the right tool to quickly throw together RSA encryption?

It turns out that Java does an excellent job. Its BigInteger class has all the ingredients you’ll ever need.

```
// This function generates a new probable prime ...
BigInteger p = BigInteger.probablePrime(bits, random);

// This function performs modular inverse ...
BigInteger d = e.modInverse(phi_n);

// These functions can be used to check your work (both should equal one) ...
BigInteger one = d.multiply(e).mod(phi_n);
BigInteger alsoOne = p.gcd(q);

// This function is at the heart of RSA: encrypt with e, decrypt with d ...
BigInteger C = M.modPow(e, n);
BigInteger recovered = C.modPow(d, n); // should equal M
```
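To make the two workhorse calls less of a black box, here is a toy C sketch of what `modPow` and `modInverse` compute. It works on 64-bit words rather than arbitrary precision, so it is only good for textbook-sized numbers and is no substitute for `BigInteger`; the names `mod_pow` and `mod_inverse` are illustrative.

```c
#include <stdint.h>

/* Toy stand-in for modPow: base^exp mod m by repeated squaring.
   Assumes 0 < m < 2^32 so every product below fits in 64 bits. */
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;  /* use this bit of exp */
        base = (base * base) % m;                   /* square for next bit */
        exp >>= 1;
    }
    return result;
}

/* Toy stand-in for modInverse: extended Euclid, same size assumption.
   Returns 0 when a has no inverse modulo m. */
uint64_t mod_inverse(uint64_t a, uint64_t m) {
    int64_t old_r = (int64_t)(a % m), r = (int64_t)m;
    int64_t old_s = 1, s = 0;
    while (r != 0) {
        int64_t q = old_r / r;
        int64_t tmp = old_r - q * r; old_r = r; r = tmp;
        tmp = old_s - q * s; old_s = s; s = tmp;
    }
    if (old_r != 1) return 0;  /* gcd(a, m) != 1, so no inverse exists */
    return (uint64_t)((old_s % (int64_t)m + (int64_t)m) % (int64_t)m);
}
```

With the textbook key p = 61 and q = 53 (so n = 3233 and phi_n = 3120) and e = 17, `mod_inverse(17, 3120)` gives d = 2753. Encrypting M = 65 with `mod_pow(65, 17, 3233)` gives 2790, and `mod_pow(2790, 2753, 3233)` recovers 65.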

Before I looked at the Java documentation, I had plans to do this with Python and some of my classmates had plans with MATLAB. It’s not that these are inherently bad technologies — they’re just not the right tool.

As much as I have gripes with Java, I’ve got to say that this made me and a lot of my class very happy 😀

Eddie Ma

February 24th, 2011 at 6:26 pm

Posted in Pure Programming


C & Math: Sieve of Eratosthenes with Wheel Factorization

In the first assignment of Computer Security, we were to implement the Sieve of Eratosthenes. The instructor gives a student the failing grade of 6/13 for a naive implementation, and as we increase the efficiency of the sieve, we get more marks. There are three standard optimizations: (1) for the current prime being considered, start the indexer at the square of the current prime; (2) consider only odd numbers, since every even number above two is a multiple of two; (3) stop crossing out numbers once the current prime exceeds the square root of the last value of the sieve.
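Before adding a wheel, the three standard optimizations can be sketched on their own. The function below is a hypothetical helper (`count_primes` is not part of the assignment code) that counts the primes up to a limit with an odds-only sieve.

```c
#include <stdlib.h>

/* Hypothetical helper: counts the primes <= limit.
   Returns 0 if the allocation fails (or if limit < 2). */
unsigned count_primes(unsigned limit) {
    if (limit < 2) return 0;

    /* Optimization 2: store odd numbers only; cell k stands for 2k + 1. */
    unsigned cells = (limit + 1) / 2;
    unsigned char *composite = calloc(cells, 1);
    if (!composite) return 0;

    unsigned count = 1;  /* the prime 2, which has no cell of its own */
    for (unsigned p = 3; p <= limit; p += 2) {
        if (composite[p / 2]) continue;  /* already crossed out */
        count++;

        /* Optimization 3: nothing is left to cross out once p * p > limit. */
        unsigned long long square = (unsigned long long)p * p;
        if (square > limit) continue;

        /* Optimization 1: start at p * p; step by 2p to stay on odd numbers. */
        for (unsigned long long m = square; m <= limit; m += 2ULL * p)
            composite[m / 2] = 1;
    }
    free(composite);
    return count;
}
```

As a sanity check, `count_primes(100)` is 25 and `count_primes(1000000)` is 78498. The sketch assumes `limit` is well below `UINT_MAX` so that `p += 2` cannot wrap around.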

Since the assignment has been handed in, I’ve decided to post my solution here as I haven’t seen C code on the web which implements wheel factorization.

We can think of wheel factorization as an extension to skipping all even numbers. Since we know that all even numbers are multiples of two, we can just skip them all and save half the work. By the same token, if we know a pattern of repeating multiples corresponding to the first few primes, then we can skip all of those guaranteed multiples and save some work.

The wheel I implement skips all multiples of 2, 3 and 5. In Melissa O’Neill’s The Genuine Sieve of Eratosthenes, an implementation of the sieve with a priority queue optimization is shown in Haskell while wheel factorization with the primes 2, 3, 5 and 7 is discussed. The implementation of that wheel (and other wheels) is left as an exercise for her readers 😛

But first, let’s take a look at the savings of implementing this wheel. Consider the block of numbers in modulo thirty below corresponding to the wheel for primes 2, 3 and 5 …

 0 **1** 2 3 4 5 6 **7** 8 9 10 **11** 12 **13** 14 15 16 **17** 18 **19** 20 21 22 **23** 24 25 26 27 28 **29**

Only the highlighted numbers need to be checked to be crossed out during sieving since the remaining values are guaranteed to be multiples of 2, 3 or 5. This pattern repeats every thirty numbers which is why I say that it is in modulo thirty. We hence skip 22/30 of all cells by using the wheel of thirty — a savings of 73%. If we implemented the wheel O’Neill mentioned, we would skip 77% of cells using a wheel of 210 (for primes 2, 3, 5 and 7).

(Note that the highlighted numbers in the above block also correspond to the multiplicative identity one and numbers which are coprime to 30.)
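The savings figures can be checked by counting how many residues are coprime to the wheel's modulus. The sketch below does that by brute force; `wheel_spokes` is a hypothetical name used only for this illustration.

```c
/* Greatest common divisor by Euclid's algorithm. */
static unsigned gcd(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: the number of residues in [0, modulus) that are
   coprime to the modulus, i.e. the cells the wheel still has to visit. */
unsigned wheel_spokes(unsigned modulus) {
    unsigned count = 0;
    for (unsigned r = 0; r < modulus; r++)
        if (gcd(r, modulus) == 1) count++;
    return count;
}
```

`wheel_spokes(30)` is 8, so 22 of every 30 cells are skipped (about 73%). `wheel_spokes(210)` is 48, so 162 of every 210 cells are skipped (about 77%).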

Below is the final code that I used.

```
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

const unsigned int SIEVE = 15319000;
const unsigned int PRIME = 990000;

int main(void) {
unsigned char* sieve = calloc(SIEVE + 30, 1); // +30 gives us incr padding
if(!sieve) { perror("calloc"); return 1; } // bail out if the allocation failed
unsigned int thisprime = 7;
unsigned int iprime = 4;

unsigned int sieveroot = (int)sqrt(SIEVE) +1;

// Update: don't need to zero the sieve - using calloc() not malloc()

sieve[7] = 1;

for(; iprime < PRIME; iprime ++) {
// ENHANCEMENT 3: only cross off until square root of |sieve|.
if(thisprime < sieveroot) {
// ENHANCEMENT 1: Increment by 30 -- 4/15 the work.
// ENHANCEMENT 2: start crossing off at prime * prime.
int i = (thisprime * thisprime);
switch (i % 30) { // new squared prime -- get equivalence class; cases fall through on purpose.
case 1:
if(!sieve[i] && !(i % thisprime)) {sieve[i] = 1;}
i += 6;
case 7:
if(!sieve[i] && !(i % thisprime)) {sieve[i] = 1;}
i += 4;
case 11:
if(!sieve[i] && !(i % thisprime)) {sieve[i] = 1;}
i += 2;
case 13:
if(!sieve[i] && !(i % thisprime)) {sieve[i] = 1;}
i += 4;
case 17:
if(!sieve[i] && !(i % thisprime)) {sieve[i] = 1;}
i += 2;
case 19:
if(!sieve[i] && !(i % thisprime)) {sieve[i] = 1;}
i += 4;
case 23:
if(!sieve[i] && !(i % thisprime)) {sieve[i] = 1;}
i += 6;
case 29:
if(!sieve[i] && !(i % thisprime)) {sieve[i] = 1;}
i += 1; // 29 + 1 (mod 30) = 0 -- just in step
}
for(; i < SIEVE; i += 30) {
if(!sieve[i+1] && !((i+1) % thisprime)) sieve[i+1] = 1;
if(!sieve[i+7] && !((i+7) % thisprime)) sieve[i+7] = 1;
if(!sieve[i+11] && !((i+11) % thisprime)) sieve[i+11] = 1;
if(!sieve[i+13] && !((i+13) % thisprime)) sieve[i+13] = 1;
if(!sieve[i+17] && !((i+17) % thisprime)) sieve[i+17] = 1;
if(!sieve[i+19] && !((i+19) % thisprime)) sieve[i+19] = 1;
if(!sieve[i+23] && !((i+23) % thisprime)) sieve[i+23] = 1;
if(!sieve[i+29] && !((i+29) % thisprime)) sieve[i+29] = 1;
}
}

{
int i = thisprime;
switch (i % 30) { // write down the next prime in 'thisprime'.
case 1:
if(!sieve[i]) {thisprime = i; sieve[i] = 1; goto done;}
i += 6;
case 7:
if(!sieve[i]) {thisprime = i; sieve[i] = 1; goto done;}
i += 4;
case 11:
if(!sieve[i]) {thisprime = i; sieve[i] = 1; goto done;}
i += 2;
case 13:
if(!sieve[i]) {thisprime = i; sieve[i] = 1; goto done;}
i += 4;
case 17:
if(!sieve[i]) {thisprime = i; sieve[i] = 1; goto done;}
i += 2;
case 19:
if(!sieve[i]) {thisprime = i; sieve[i] = 1; goto done;}
i += 4;
case 23:
if(!sieve[i]) {thisprime = i; sieve[i] = 1; goto done;}
i += 6;
case 29:
if(!sieve[i]) {thisprime = i; sieve[i] = 1; goto done;}
i += 1;
}
for(; i < SIEVE; i += 30) {
if(!sieve[i+1]) {thisprime = i+1; sieve[i+1] = 1; goto done;}
if(!sieve[i+7]) {thisprime = i+7; sieve[i+7] = 1; goto done;}
if(!sieve[i+11]) {thisprime = i+11; sieve[i+11] = 1; goto done;}
if(!sieve[i+13]) {thisprime = i+13; sieve[i+13] = 1; goto done;}
if(!sieve[i+17]) {thisprime = i+17; sieve[i+17] = 1; goto done;}
if(!sieve[i+19]) {thisprime = i+19; sieve[i+19] = 1; goto done;}
if(!sieve[i+23]) {thisprime = i+23; sieve[i+23] = 1; goto done;}
if(!sieve[i+29]) {thisprime = i+29; sieve[i+29] = 1; goto done;}
}
done:;
}
}
printf("%u\n", thisprime);
free(sieve);
return 0;
}
```

Notice that there is a switch construct. This is necessary because we aren’t guaranteed that the first value to sieve for a new prime (or squared prime) is going to be an exact multiple of thirty. Consider sieving seven, the very first prime to consider. We start at 7² = 49. Notice that 49 is congruent to 19 (mod 30). The switch statement enters at case 19 and, by deliberate fall-through, moves the cursor from the 19th equivalence class to the 23rd, then to the 29th, before pushing it one integer more to 30 (here, from 49 to 53, to 59, to 60). Since 30 (mod 30) is zero, we are able to continue incrementing by thirty from then on in the loop.
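The same walk can be written with a table instead of a switch. This is only a sketch of the idea, not a drop-in replacement for the listing; `SPOKES` and `finish_block` are hypothetical names.

```c
/* Residues modulo 30 that are coprime to 2, 3 and 5. */
static const unsigned SPOKES[8] = {1, 7, 11, 13, 17, 19, 23, 29};

/* Hypothetical helper: writes every wheel position >= i that lies in the
   same block of thirty as i into out[], stores how many were written in
   *count, and returns the start of the next block (a multiple of 30). */
unsigned finish_block(unsigned i, unsigned out[8], unsigned *count) {
    unsigned base = i - i % 30;  /* start of the block containing i */
    *count = 0;
    for (int k = 0; k < 8; k++) {
        unsigned v = base + SPOKES[k];
        if (v >= i) out[(*count)++] = v;
    }
    return base + 30;
}
```

Called with `i = 49`, the sketch visits 49, 53 and 59 and returns 60, matching the path the switch takes for the prime seven. From there a plain loop can step by thirty.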

The code listed is rigged to find the 990 000th prime as per the assignment and uses a sieve of predetermined size. Note that if you want to use my sieve code above to find whichever prime you like, you must also change the size of the sieve. If you take a look at How many primes are there? written by Chris K. Caldwell, you’ll notice a few equations that allow you to highball the nth prime of your choosing, thereby letting you calculate that prime with the overshot sieve size.
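One well-known bound of this kind is Rosser's upper bound on the nth prime, valid for n ≥ 6. As a worked example under that bound (the arithmetic is rounded, so treat the final figure as approximate):

```latex
p_n < n\,(\ln n + \ln\ln n) \qquad (n \ge 6)

% Worked example for n = 990000:
%   \ln 990000 \approx 13.805, \quad \ln\ln 990000 \approx 2.625
p_{990000} < 990000\,(\ln 990000 + \ln\ln 990000) \approx 1.63 \times 10^{7}
```

A sieve of roughly 16.3 million cells would therefore be guaranteed large enough for the 990 000th prime, a little above the hand-tuned 15 319 000 used in the listing.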

Note also that this sieve is not the most efficient. A classmate of mine implemented The Sieve of Atkin which is magnitudes faster than this implementation.

Eddie Ma

February 3rd, 2011 at 11:08 pm