15 02/12
23:16

Printing values from Theano (Python)

CogFault: Tried to circumvent a library’s API to print underlying data when a simpler solution already exists.

Theano is this really pretty package for Python. It operates on top of NumPy to provide shorthand and functionality useful for machine learning algorithms. I bumped into this problem when I was running some demo code available at deeplearning.net. The demo data used by this code is a pickled and gzipped version of the MNIST handwritten digit database. In this scenario, all I wanted to do was print the MNIST data after it had already been loaded by Theano into the demo script. The data is accessible via instances of the theano.TensorVariable class.

The first attempt worked for the rows of data corresponding to the images stored as 2D arrays, since those variables are Theano shared variables, and a shared variable has a get_value() method that lets one grab the data as a familiar 2D NumPy array (it doesn’t appear in the API for theano.TensorVariable, but is mentioned briefly elsewhere). This convenience method is not available for the 1D arrays, since these were not constructed with a shared reference to an underlying NumPy array.

After spending a solid amount of time working away with Python’s dir(), searching for a way to coerce Theano into showing me its innards, I finally came to my senses and realized I should probably see what the canonical solutions are. Being completely new to Theano, and having only recently read about compiled functions, I simply wasn’t well versed enough in the notion of compiling a Theano function. Finally, peering into the loaded MNIST data came down to this …

datasets = load_data(dataset)

train_set_x, train_set_y = datasets[0]
valid_set_x, valid_set_y = datasets[1]
test_set_x , test_set_y  = datasets[2]

##### -- my code begins here ...

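def print_set(data, name):
    # Python 2 print syntax; assumes the "extract/" directory already exists.
    with open("extract/" + name + ".txt", "w+") as handle:
        print >>handle, "# type(python)", type(data)
        print >>handle, "# type(theano)", data.type
        print >>handle, "# dimensions  ", data.ndim
        if data.ndim == 2:
            # get_value() is only available on shared variables (the 2D image data).
            print >>handle, "# rows(numpy) ", len(data.get_value())
            print >>handle, "# cols(numpy) ", len(data.get_value()[0])
            # Compile a zero-argument function and call it to get the values.
            for row in theano.function([], data)():
                for val in row:
                    print >>handle, "%0.3f" % val,
                print >>handle
        elif data.ndim == 1:
            # The 1D label data has no get_value(), so compile and call here too.
            for row in theano.function([], data)():
                print >>handle, "%d" % row
        else:
            return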

print_set(train_set_x, "train_set_x")
print_set(train_set_y, "train_set_y")
print_set(valid_set_x, "valid_set_x")
print_set(valid_set_y, "valid_set_y")
print_set(test_set_x,  "test_set_x" )
print_set(test_set_y,  "test_set_y" )

return

Here, the magic is in the call theano.function([], data), which compiles a function that takes no inputs and returns the value of data (thanks: Olivier Delalleau). The great thing about working through this problem is that I should be able to extend this solution to building actual functions in Theano. The stumbling around did force me to understand more about Theano’s tensors and Theano’s compiled function graphs.

22 12/11
06:22

Ternary vs relational precedence (C)

CogFault: I thought that the relational and equality operators (such as ==) had the lowest precedence; the ternary conditional operator actually has a lower precedence*.

Updated thanks to Wyatt and Devon in the comments.

The assignment below …

int A = B == C ? D : E;

This assigns D to A if B equals C, and E to A if B does not equal C. This is equivalent to …

int A = (B == C) ? D : E; // Correct

An incorrect interpretation is …

int A = B == (C ? D : E); // Incorrect

Thanks to Wyatt for checking this and pointing it out in the comments. I had a typo in my original code which led to an incorrect interpretation. Fittingly for a CogFault, the intro is still true (ternary has lower precedence). This post has been updated to reflect these changes.

Either way, as Devon points out in the comments, when the expression “B == C” gets even slightly more complicated, like “B == C-1”, it’s better to use parentheses.

So, thanks everyone for making this CogFault better.

*Note that the middle expression between “?” and “:” of the ternary conditional operator is always parsed as if it were parenthesized, whatever the precedence of the operators inside it.
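The same precedence rule can be poked at quickly in Python, with the caveat that Python spells the conditional differently (value if condition else other), so this is an analogue rather than a transliteration of the C lines above. The values and variable names are arbitrary, chosen only so that the two readings give different results.

```python
# Python analogue only: the conditional expression binds more loosely than ==.
# Values are arbitrary illustrations, picked so the two readings disagree.
B, C, D, E = 5, 0, 5, 7

# No parentheses: Python decides how to group the expression.
unparenthesized = B == D if C else E

# The two possible groupings, written out explicitly.
loose_reading = (B == D) if C else E   # conditional applied last
tight_reading = B == (D if C else E)   # conditional applied first

print(unparenthesized == loose_reading)  # True
print(tight_reading)                     # False
```

As in C, the comparison groups first and the conditional is applied to the result, so parentheses are only needed when the other grouping is intended.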

15 11/11
14:50

Restore default man pages in manpath

CogFault: Accidentally removed manpath for default man pages (manual pages) in Mac OS X.

CogPatch: add this line to .bash_profile …

export MANPATH=/usr/share/man:$MANPATH

… as /usr/share/man is the default location of the system man pages.

04 07/11
20:20

Progress bar hang (Java thread sync)

CogFault: Created multi-threaded Java code that can hang indefinitely while waiting on the current state of a progress bar (threading courtesy of one of the ExecutorService thread pools; progress bar courtesy of Swing).

Example: several functions execute a statement similar to the update line below.

static final int PRG = 100; // total progress that each thread contributes
static final int TOTAL_PRG = 1000; // total progress for all ten threads
static int progress = 0;

public double calculateSomething(double[] inputs) {
    double output = 0;
    int prev = 0;
    for(int iter = 0; iter < maximum; iter ++) {
        int diff = PRG * iter / maximum % PRG - prev;
        if(diff > 0) {
            progress += diff; // !!! update the progress bar integer !!!
            prev += diff;
        }
        // . . . math or function logic goes here --
        // calculate something with inputs, save in output . . .
    }
    progress += PRG - prev; // finishes this bar
    return output;
}

CogPatch: Enclose the update in a synchronized method, as below.

// static, so the lock is on the class and covers the static progress field
protected static synchronized void updateProgress(int amount) {
    progress += amount;
}

// ... OR -- make a thread safe mutator / accessor combo ...
// -- giving this function 0 will make it behave like a get.

protected static synchronized int updateProgress(int amount) {
    progress += amount;
    return progress;
}

public double calculateSomething(double[] inputs) {
    double output = 0;
    int prev = 0;
    for(int iter = 0; iter < maximum; iter ++) {
        int diff = PRG * iter / maximum % PRG - prev;
        if(diff > 0) {
            updateProgress(diff); // update the progress bar integer (fixed)
            prev += diff;
        }
        // . . . math or function logic goes here --
        // calculate something with inputs, save in output . . .
    }
    updateProgress(PRG - prev); // finishes this bar (fixed)
    return output;
}

Lesson: Always be thread safe! The statement progress += diff is a read-modify-write rather than a single atomic step, so when several cores execute the addition simultaneously, updates can be lost and the integer ends up being updated only once. During debugging, this happened more often on an eight-core machine than on a two-core machine, no matter the thread count (it’s what you’d expect). Since the progress integer never reaches the total, the program hangs and waits indefinitely for completion.
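The same pattern can be sketched outside Java. Below is a minimal Python version of the mutator / accessor combo, assuming ten workers that each contribute PRG units; progress_lock, update_progress and worker are hypothetical names for this illustration, not part of the original program.

```python
import threading

PRG = 100                  # progress contributed by each worker
WORKERS = 10               # number of worker threads (assumed, as in the post)
TOTAL_PRG = PRG * WORKERS  # total progress expected once all workers finish

progress = 0
progress_lock = threading.Lock()  # hypothetical name; guards every access to progress


def update_progress(amount):
    """Add amount to the shared counter and return the new total.

    Passing 0 makes this behave like a thread-safe read.
    """
    global progress
    with progress_lock:
        progress += amount
        return progress


def worker():
    # Contribute one unit at a time so the threads contend for the counter.
    for _ in range(PRG):
        update_progress(1)


threads = [threading.Thread(target=worker) for _ in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock in place, no update is lost and the total is always reached.
print(update_progress(0) == TOTAL_PRG)
```

Because every read and write goes through the lock, a thread waiting for the counter to reach TOTAL_PRG is not left hanging by a lost update.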

26 05/11
13:39

Binary path confusion (OSX)

Posted in CogFault

CogFault: I couldn’t figure out why my OSX 10.6.6 had tar 1.14 instead of a newer version.

Symptoms: Trying to archive 175 MB of data across 7680 files resulted in this …

$ tar -cvzf exp_tin.tgz exp_tin
tar: exp_tin: Cannot savedir: Cannot allocate memory
tar: Error exit delayed from previous errors
Tin:CIS6650_Chicago eddiema$

Delving into this, the last reports of this issue that Google could dig up were from all the way back in 2005, which fits with this information …

$ tar --version
tar (GNU tar) 1.14
Copyright (C) 2004 Free Software Foundation, Inc.
This program comes with NO WARRANTY, to the extent permitted by law.
You may redistribute it under the terms of the GNU General Public License;
see the file named COPYING for details.
Written by John Gilmore and Jay Fenlason.

For about fifteen minutes, I couldn’t figure out how this happened — I resolved to download and compile the latest version available (1.26). I eventually checked if the same memory error existed on my other machine — it didn’t.

Finally, I did this …

$ which tar
/sw/bin/tar

Which made me realize that this version of tar came from an old Fink update.

Solution: I moved /sw/bin/tar to /sw/bin/tar114, and the version of tar that ships with OSX became the default …

$ tar --version
bsdtar 2.6.2 - libarchive 2.6.2

CogPatch: Check the version, then the location, of a suspect binary before doing anything more elaborate.
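The shadowing itself is easy to reproduce. The sketch below uses Python's shutil.which on a POSIX system, with two throwaway directories standing in for /sw/bin and the system's own bin directory; make_fake_binary and the directory names are made up for this illustration, and the fake tar files are just empty scripts.

```python
import os
import shutil
import stat
import tempfile


def make_fake_binary(directory, name):
    """Create a tiny executable file called name inside directory."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


# Two temporary directories stand in for the Fink and system bin directories.
with tempfile.TemporaryDirectory() as fink_bin, \
        tempfile.TemporaryDirectory() as system_bin:
    fink_tar = make_fake_binary(fink_bin, "tar")
    make_fake_binary(system_bin, "tar")

    # The Fink-like directory comes first, so its tar shadows the other one.
    search_path = os.pathsep.join([fink_bin, system_bin])
    found_first = shutil.which("tar", path=search_path)
    shadowed = os.path.samefile(os.path.dirname(found_first), fink_bin)

    # Renaming the shadowing binary, as in the fix above, exposes the other tar.
    os.rename(fink_tar, os.path.join(fink_bin, "tar114"))
    found_after = shutil.which("tar", path=search_path)
    restored = os.path.samefile(os.path.dirname(found_after), system_bin)

print(shadowed, restored)
```

The lookup simply returns the first match along the search path, which is why checking the location of a binary is such a cheap first step.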

09 04/11
15:44

Namespace, directory confusion

Posted in Programming

CogFault: couldn’t figure out why gmcs (mono C# compiler) wasn’t able to find HMOther as the container of the Main() function I wanted

Example:

incorrect …

heat2.exe: heatmap/OtherHeat.cs heatmap/heatmap.cs
    gmcs -r:System.drawing -out:$@ $+ -main:$</HMOther

correct …

heat2.exe: heatmap/OtherHeat.cs heatmap/heatmap.cs
    gmcs -r:System.drawing -out:$@ $+ -main:heatmap.HMOther

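Reason: forgot about namespaces, but kept looking at the directory structure over and over again; in the incorrect rule, $< expands to the first prerequisite (heatmap/OtherHeat.cs), so -main: receives a file path rather than the namespace-qualified class name heatmap.HMOther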

Similar: any time we have dual hierarchical structures, we’ll run into this kind of problem — if there is a logical programmatic nesting that shares a few tokens (e.g. “heatmap”) with a physical directory structure, this confusion can result

CogPatch: properly attribute nesting to either logical (namespaces) or physical (directory) depending on the application — here, makefile and gmcs both require knowing about the directory structure; however, gmcs alone requires additional knowledge of the logical programmatic structure