METHINKS IT IS LIKE A WEASEL

William Dembski, the cdesign proponentist who styles himself as the “Isaac Newton of information theory” seems to be having a bit of trouble with a simple, 22 year old program written in BASIC by biologist Richard Dawkins, and described in his book, The Blind Watchmaker.

Dawkins’ program starts off with a string of 28 randomly chosen characters. This string is copied to a new string, with some chance of random error in the copying process, say, maybe 1 out of a 100 characters will be copied incorrectly, with a random character substituted in place of the correct character. After each copy, the original and new strings are compared against a target string, “METHINKS IT IS LIKE A WEASEL”. Whichever string is closest to the target is kept, the other thrown away, and the process repeated until the target string is reached.

Dembski is making the following claim:”

As you can see, by using the Courier font, one can read up from the target sequence METHINKS*IT*IS*LIKE*A*WEASEL, as it were column by column, over each letter of the target sequence. From this it’s clear that once the right letter in the target sequence is latched on to, it locks on and never changes. In other words, in these examples of Dawkins’ WEASEL program as given in his book THE BLIND WATCHMAKER, it never happens (as far as we can tell) that some intermediate sequences achieves the corresponding letter in the target sequence, then loses it, and in the end regains it.

Dembski is claiming that once a letter is “correct,” it is locked, and mutation on that letter is prevented afterwards.

This is just lame. He failed to notice that the sample output Dawkins included in the book showed only every 10th iteration. What’s more, in advancing this claim, he obviously failed to bother to test it. Any freshman CS student not in danger of flunking out could write their own version of Dawkins’ weasel program in at most few hours. I wrote it in less than 20 minutes, in C, just now, without such ridiculous “locking” of mutations which Dembski is insinuates must be present to explain the program’s effectiveness. My version is 78 lines long, which is absolutely microscopically trivial, as C programs go. Here it is in it’s entirety:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/time.h>

static char *target = "METHINKS IT IS LIKE A WEASEL";
static char *alphabet = " ABCDEFGHIJKLMNOPQRSTUVWXYZ";
static int alphabetlen = 27;
static float mutate_probability = 0.01;

void randomfill(char *str, int len)
{
        int i, r;
        for (i=0;i<len;i++) {
                r = (random() & 0x0ffff) * alphabetlen / 0x0ffff;
                str[i] = alphabet[r];
        }
        str[i] =0;
}

void mutantcopy(char *source, char *dest)
{
        float x;
        int i, r, len = strlen(source);
        for (i=0;i<len;i++) {
                x = (0.0 + random()) / (0.0 + RAND_MAX);
                if (x <= mutate_probability) {
                        r = (random() & 0x0ffff) * alphabetlen / 0x0ffff;
                        dest[i] = alphabet[r];
                } else
                        dest[i] = source[i];
        }
        dest[len] = 0;
}

void compete(char *target, char *str1, char *str2,  char **winner, char **loser)
{
        int i, len, score1 = 0, score2 = 0;
        len = strlen(target);
        for (i=0;i<len;i++) {
                if (str1[i] == target[i])
                        score1++;
                if (str2[i] == target[i])
                        score2++;
        }
        if (score1 >= score2) {
                *winner = str1;
                *loser = str2;
        } else {
                *winner = str2;
                *loser = str1;
        }
}

int main(int argc, char *argv[])
{
        char try1[100], try2[100];
        char *winner, *loser;
        int length, i;
        struct timeval starttime, endtime;

        gettimeofday(&starttime, NULL);
        srand(starttime.tv_usec);
        length = strlen(target);
        randomfill(try1, length);
        winner=try1;
        loser=try2;
        for (i = 0; strcmp(winner, target); i++) {
                mutantcopy(winner, loser);
                compete(target, winner, loser, &winner, &loser);
                if ((i % 100) == 0)
                        printf("%s\n", winner);
        }
        printf("%s\n", winner);
        gettimeofday(&endtime, NULL);
        printf("Target reached in %d generations, %d seconds and %d microseconds elapsed\n", i,
                endtime.tv_sec - starttime.tv_sec, endtime.tv_usec - starttime.tv_usec);
}

It typically converges (with 1% chance of mutation on each character copied) to the solution in less than 30000 generations and less than a tenth of a second of wall clock time.

The funniest part is the cdesign proponentists have their “chief programmer at the Evolutionary Informatics Lab (www.evoinfo.org) is expanding our WEASEL WARE software to model both these possibilities. Stay tuned.”

Stay tuned? For how long? I wrote my program in less than 20 minutes! Panda’s thumb has done the promised comparison themselves.

The inescapable conclusion is that William Dembski, the self-styled “Isaac Newton of Information theory” is either a fucking dumbass, or a liar. Note, I did not say he was a dumbass xor a liar.

H/T to Panda’s Thumb and The almighty Pharyngula.

~ by scaryreasoner on March 28, 2009.

9 Responses to “METHINKS IT IS LIKE A WEASEL”

  1. Howdy, I’m a big programming newbie. I pasted your code in and let it run. It finally got to the target after almost 20 minutes. Is that right? It sounds like it gets there much faster for you. Here was the final output:
    “Target reached in 734141559 generations, 1170 seconds and 62192 microseconds elapsed”

    It’s kind of weird that each time I run it it starts with the same string, “TPHPFGNKYTDILTTEINQO GYBSTQI”. Maybe there is something funny with my system? What do you think? Thanks!

    By the way, did you see this site:
    http://like-a-weasel.blogspot.com/

  2. 20 minutes? Hmm. Takes me typically around 1/10th of a second on a 2.1Ghz AMD athlon (that’s with no compiler optimization), and usually less than 20000 generations, so something’s off.

    I just cut and pasted it from the blog myself to see if there’s something weird about the C-html transformation, but it seems to work ok for me.

    I think somehow something’s wrong with the code you cut and pasted, as it shouldn’t take that many generations. Try cutting and pasting again. (Don’t know what else to say.)

    Also it is weird that it starts with the same string, as I put in that srand(starttime.tv_usec); to initialize the random number generator to somethng different each time the program is run, and for me, this works — it doesn’t start with the same string each time and runs differently each time.
    And no, I hadn’t seen that site. Thanks!

  3. Oh, earlier, there was a problem that the null terminating of the strings got botched by the C-to-html translator, so that

    x = ” would end up as x = ”;
    which doesn’t even compile (so I guess you didn’t hit that).

    I changed such lines to

    x = 0;

    and that made the c-to-html thing happy, and the compiler happy at the same time.

    (Just thinking maybe you copied the earlier version of the code with the html-botched null termination problem, and didn’t fix it quite right. The code as it is now should just compile and run though.)

  4. Well, now see, in the above comment, it seems the problem happens again… you can’t type “backslash zero” apparently. WordPress is apparently eating it. Interesting.

  5. Hi! Thanks for the reply. I didn’t see it right away because for some reason I didn’t get the subscription notification. I played around a bit and found out what was wrong. In line 26 RAND_MAX is being used, but on my system RAND_MAX is set to 32767. So x was getting below the mutate_probability VERY rarely. I just changed RAND_MAX to 2147483647 (ie. 2^31-1). Also the reason I was getting the same string every time was that on my system srand() only seeds rand(). I just changed it to srandom() and it works fine. So my new improved results: “Target reached in 9817 generations, 0 seconds and 16430 microseconds elapsed” Nice!

    I do have a few of questions though:
    1. Why is alphabetlen 27? I know the length is 27, but doesn’t line 28 in the randomfill function generate numbers from 0 to 27? So 27 would be out of bounds no? (Although I think it would require random() to have ffff in the low order bits, so about 1/(ffff+1) numbers would hit 27 I think)
    2. Can you tell me a bit about the “& 0x0ffff” thing? I understand what it’s doing (bitwise operation, etc…) but I’m not sure about the reason. I did some googling and found the following page discussing using the high order bits of rand(). I don’t know if that is an issue with random() or if it’s related to your technique here.
    http://members.cox.net/srice1/random/crandom.html
    3. In lines 46-52 in the compete function, maybe you don’t need both if and else blocks because if score 1 is higher, there is no change in which string is the winner and loser right? (Maybe this doesn’t matter, just curious if you had a particular reason here.)

    Sorry to bug you with more questions. I’m glad you put this code up, because I was very interested in the Dawkins program but I’m still learning C and I wasn’t familiar with the languages used in the other versions posted (QBASIC, python etc.) Thanks again! I learned a lot examining your code.

  6. 1. I think you’re right. So, occasionally, you get a mutation which jams a null in there (not quite out of bounds, due to C’s null terminated strings.) Doesn’t hurt the program, but probably makes it (very very slightly) slower to converge. Still, it is a bug. Thanks for pointing that out.

    2. The 0x0ffff was just a very cheesy way for me not to have to think about overflows.

    I knew I needed a (pseudo)random number between 0 and 27ish and I knew that 16 bits of precision was plenty to get me that (though, depending on the random number generator, the lower 16 bits might not be the “most random” of the bits, but… close enough for my purposes. By cutting it down to 16 bits, overflows are impossible, since (on my machine, and most machines these days) ints are 32-bits — should have used a uint32_t from stdint.h though. If I don’t do this trick, and I happen to get 27*RAND_MAX, it will overflow and if signed, maybe go negative, might wrap around and work anyway… but by doing what I did, I don’t even have to think about what happens on overflow, because there won’t be any overflow — so, to sum up, it was a very lazy hack.

    It was something I originally had done for speed in scaling random numbers, since dividing by 0xffff is very very nearly the same thing as shift right by 16, so if you don’t care about screwing the distribution slightly, which in some apps, maybe you don’t, you can do:

    x = ((randomm() & 0x0ffff) * scale) >> 16;

    and get rid of the divide, and have nearly the same thing as:

    x = ((random() & 0x0ffff) * scale) / 0x0ffff;

    which is nearly the same as:

    x = ((random() * scale) / RAND_MAX;

    (provided scale is small enough) and you again, don’t have to worry about overflow.

    The shifting version is an old-school optimization that’s probably mostly irrelevant on modern processors (though I think I did measure it being 5x as fast (iirc) as the naive implementation on my current machine which is a 2.1Ghz athlon, but it biases things slightly towards zero — that is, zero pops up too often in the distribution. I used that for generating x/y velocities for sparks in explosions in a little 2d video game I made. They look OK despite the bias — though if you know what to look for you can see the artifacts. Hmm, I’m rambling.

    3. I think you’re right.

  7. Thank you once again for taking the time to explain things! I’ll be checking out your video game next.

    One more quick question (sorry!) in the mutantcopy function you have “x = (0.0 + random()) / (0.0 + RAND_MAX);”. Is that identical to “x = ((float)rand())/((float)RAND_MAX);”? Is there any reason to do one over the other?

  8. Hey Greg! Sorry, didn’t see your reply until just now.

    I think 0.0 + whatever will promote things to double, whereas casting to float is (usually) less precise than double (and faster) (I can’t recall if C standard specifies a particular precision for floats and doubles, or if it’s like ints, where the standard doesn’t really say *how* precise just specifies that doubles are *more precise* or at least *same precision* as floats without specifying, e.g.: “floats are 32bit, doubles are 64bit” though in practice, I think nowadays this is the case, generally, floats are 32bit and doubles 64bit. That is true on x86 hardware anyhow.

    But, my intent with promoting to doubles was merely that I was trying to get a random number between zero and 1, and if I left things as some integer type, obviously that wouldn’t work. I added 0.0 to do that without really thinking very hard about the “best” way to do it.

    As for a naive approach, If you did:

    float x;

    x = random() / RAND_MAX;

    I think you’d either get zero or 1, and nothing in between, as I think it would do the division as integer, then cast the result to float, rather than casting numerator and denominator to float, then doing the divide as float, and assigning the result.

    Another way (maybe faster way — well, faster in the old days) would have been to express mutate_probability as an int, and do all the math as ints, something like:

    static int mutate_probability = 10; /* out of 1000 */

    then…

    if ((random() * 1000)/RAND_MAX) <= mutate_probability)

    These days, I think floats are approximately as fast as ints. Doubles probably aren’t as fast though, and floats get easily and transparently promoted to doubles if you aren’t careful (implicit casts are one of the very very few things that C does “behind your back.”)

  9. Note, I did not say he was a dumbass xor a liar.
    That made me laugh

    I wrote it a year or so ago, unfortunately I did “cheat” by locking letters. Maybe I should take the 20 minutes to write it again.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: