Sunday, March 29, 2009

How true it is.

If you belonged to the world, the world would love you as one of its own. But because you do not belong to the world and I have chosen you out of it, the world hates you.
John 15:19

A friend of mine ran smack into this down in the Dominican Republic. What I find "interesting" about this is that I rarely hear about this sort of thing with other faiths, just with Christianity.

Friday, March 27, 2009

Irony: the Alanis Morissette kind

Actually, I find it more funny than ironic, but whatever it is, it is a little annoying.

I was searching for literature on Mandatory Work First ordering and guess what, I'm the top hit for "Mandatory Work First" min max. Lot of help that does me.

It's sort of like being your own sys admin when things go wrong:

Computer: "The system has crashed. Please contact the system administrator"
Me: "Ok self, fix it"

fun with AI's

My connect 4 AI has improved yet again. This time around I just tweaked it so that I compute the full score for each state in the BFS potion of the tree and sort each group. With this improvement, I'm running to a depth of 12 in less than 30 seconds. I've still got a bit of tuning to do (some ad-hoc testing shows that running the BFS to a depth of 3 is several times faster than a depth of 4 on the single core machine I was using)

After that's done, the next bit is to figure out how to do a sliding depth scale to leverage all the remaining time. I expect that the progress vs time plot is going to be rather interesting.

On another front, I am back in the lead; I figured out how to beat the AI while playing second!

Monday, March 23, 2009

My AI is working (mostly)

As I posted a few days ago, I'm working on a connect-4 AI as a class project. It turns out that counting the children works and the whole thing performs well (at least as fast as the single threaded version)

The solution ends up working by keeping a count of how many unpropagated children each node has. When this goes to zero, the nodes value is propagated up the tree another level. The only tricky part is building the tree in the first place. The issue here is that you can't propagate stuff while the tree is being built so it has to happen as an intermediate step before the leaf processing.

Once all that gobly-goop is done the main evaluation runs just fine. However I still haven't figured out how to do any better than evaluating the leafs in the order they fall in (fact is I haven't even coded anything yet, only puzzled over it). That works very well because, most of the time, both threads are working on nodes that share a parent so alpha-beta pruning works correctly. At 2 threads, I suspect that I'm running at almost 100% efficiency but if I stared using more threads (say more than the branching factor) I would expect that to drop a lot.

The next thing I'm planning on trying will be to run my heuristic function on all the nodes and sort each group of children as they are generated. I hope that this will better optimize the alpha beta pruning and might even give an order where I can get something closer to ideal while adjusting the depth based on time renaming.

In asking around for others solutions I was pointed at this paper that mentions something called "mandatory work first" that might mesh well with my stuff if I can figure out how to sort based on it.

Anyway, the code functions and beats me easily. On the other hand I have a few more tweaks I want to try. Fun!

Friday, March 20, 2009

Fun from the code lines.

My current school project is an AI for playing Connect-4 using minimax. The first bit was fun, lost of low level bit twiddling, building bit adders, monster functions that do recursion without recursion, finding bazaar optimizations... all the kinds of stuff crazy people like me enjoy.

After I got it to the point it could beat me, the real fun started!

So far I have, at least twice, found places where I had the AI’s playing the wrong side ("Here, let me help you win!") and then spent the better part of two days trying to figure out where the typo in a fundamentally broken algorithm was.

The thing was that I was looking to speed it up by going multi threaded. The way I was planning on doing this was

  1. Run the first few levels of the min-max tree in a breadth first evaluation

  2. Sort the leafs based on how good they look

  3. Have several threads do the normal depth first bit on each leaf

  4. Stuff the data back up the tree.

The theory here was that I could start running deep and watch the clock to known when and how much I need to back off the depth to use up all my time. However I managed to step on several severe problems with it.

First that form can’t do alpha beta pruning at all, so I drop a lot of performance right there. After being annoyed at that for a minute or two, I dug in to update the min-max tree on the fly after each leaf is evaluated. That shouldn’t be to hard, each time I get a value check to see if it is less/grater-than its parent and if so update and repeat.

The better part of two days later I was frustrated as all get-up and then noticed that this doesn’t work (I told you it was a fundamentally broken algorithm) Cases like this provide a counterexample:

First the 10 goes in, and the left min node and the max node go to 10. Then the 5 goes in and the left min node goes to 5 but the max node stays at 10. Oops...

Anyway, I think by counting children and a bit of fancy footwork I can solve that problem, but the result of that is the alpha beta pruning is going to be really funky. I’ll have to play around how things get sorted and whatnot to see what works.

Right now I’ve got the leaf evaluation and mix-max push working correctly but things go wacky when a win or loss shows up above the leafs. Fun, fun, fun. What I really need is another set of eyes to go over my code. Unfortunately given that it’s homework...

Monday, March 16, 2009

Why is .NET built as GPLs?

The CLI (basically .NET) has shown that Microsoft has what it takes to integrate several different language so tightly that it just works. You could almost roll dice to pick your language for each function and never notice problems. Given that they can make inter opt that easy, why is Microsoft primarily building GPL's * on top of it? Why don't they leverage that interoperability and start building a pile of very small, very special purpose languages? These language could be tuned for very specific cases and, by avoiding corner-cases and compromises, could be made very clean. The theory would be that points in code that transition from one domain to another are good candidates for functions anyway.

I'm thinking of languages like:

A control structures language
This would provide all the ifs, foreachs, switchs and whatnot.
A math processing language
this would define precise math semantics and advanced math expressions
An systems-of-equations language
Think Maple or EES. This would provide no control structures, or define order of evaluation (not to be confused with order of operations) but would rather compute solutions for a system of equations (math for sure, maybe logical and bit-wise) given a number of inputs
An SQL like language
aside from small cases, why should LINQ even be part of C#?

* Sorry there dosn't seem to be an right now, it redirect to the DSL page :b
Regarding the gunman (or maybe gun-boy) in Waiblingen, Germany:

Searching his bedroom, the police found violent computer games — in which, experts say, players digitally clothe and arm themselves for combat — plus brutal videos and play weapons that fire small yellow pellets, said Siegfried Mahler of the Stuttgart prosecutors' office.

Surce: International Herald Tribune (I found it first in the March 13, New York Times)

My question is: why are they so cagey about saying what they found? Sounds to me like they found the kid plays WoW (or something like it) and Air Soft. If that was the case, why don't they come out and just say so? The only reason I can think of is it wouldn't sound as dangerous.

I guess I should throw in a token "Don't blame the game" Reddit link (By the way, I'm not saying I lean either way on that one.)

Sunday, March 15, 2009

Lock free concurrency is neat.

This topic comes by way of a few interest of mine.

The first is non-locking concurrent solutions, or more actually some of the ways they are modeled and dealt with. A while ago I came across a description of a lock free hash table using a CAS instruction. I fond this to be really cool because it managed to proof correctness of a solution by using only a single concurrent primitive and a state transition diagram. That was my first introduction into lock free solutions and I have been somewhat interested in them from then on.

Another interest stems from my work. At one of the steps of our R&D we had a system that worked by doing computation purely as a pattern-match-and-extend operation. While I was working with this I was also taking a algorithms class so naturally, I tried to combine the two. The most interesting thing to come out of that was from attacking the minimum spanning tree problem. Given that the system in question is only additive, things tend to be cast in terms of classification. For this problem I chose to cast it in terms of edges to be removed so the question becomes how to identify edges in a graph that are not part of the minimum spanning tree? I have never managed to prove that this is correct, but what I can up with was to mark any edge that was the heaviest edge in any simple cycle. As it happens this is easy to define but insanely expensive to compute. I never did implement it but I keep kicking it around.

While that solution is intractable in real hard ware, it's interesting to note that it would work just fine on a system with unlimited concurrency.

  1. Start a thread on every node
  2. Every thread spawns a new thread for every edge from the current node.
  3. each thread now checks to see if the current node is in it's ancestry. If it is, then the heaviest edge between that occurrence and the end is marked and the thread quits.
  4. otherwise continue at step 2.

The next step is to notice that even under limited concurrency, this is a valid (but still impractical) transformation on a graph. Running something like Kruskal's Algorithm on the graph at any point will result in the same result as running it on the original graph. With a little hand waving, I can convince my self that you can even run them at the same time. Even more interestingly, you only need to make very few assumption (I'll get back to that) to show that you don't even need to use locks. Not even hardware locking like you would need for a CAS based algorithm.

This no-locking aspect is a result of the only state that is modified, only ever going in one direction (from "might be in the tree" to "not in the tree") and because Kruskal's algorithm won't care if it gets stale data, it will just take longer to reject it.

The solution is still impractical as finding cycles is not efficient. However, if half of Kruskal's algorithm is stolen and the extra threads just run around and test random edges to see if their endpoint share a sub-tree, I expect that this could be made very practical

Now back to the assumptions: The only assumption I can see that has to be maid to avoids even hardware locking is that writing to that status flag of one edge must not interact with any other status flags. This precludes using a bit array or anything else that places the flags closer than the granularity of the CPU's cache. On the other hand. Even if this can't be met, CPU's need some way to effectively solve this problem so I would expect that it isn't that much of an issue.

Wednesday, March 11, 2009

Sunday, March 8, 2009

I just got a 25% speedup in my code!

I am working on a Connect-4 AI for an AI class. Being me I decided to go all out on the bit twiddling and whatnot to make the thing as compact and as fast as I could make it (there is a 30 seconds per move limit).

The big part turns out being how to test for a win/loss and how to score the leaves of the tree. I chose to go with bits masks for the board as it lets me do both of these with ANDs, shifts and XORs. I shift the bits so that each sequence of 4 spaces stack and then build a set of adders from ANDs and XORs. Checking the 4s bits finds wins and looses and masking with the inverse of the opponents and counting the 3s, 2s and 1s gives a heuristic for the state. The problem with this is that the direct way would end up with about 100 lines of bit ops and piles of goo for variables names. I actually started in on that tack but decided it wasn't a good idea.

What I settled on was a templated struct that handles the shift and adds and another struct to handle the next higher level of the mucking around. The main problem with that is that I end up with about 160 variables (most of which are not needed at the same time) that are locked into struct so I don't expect that the compiler will do a good job of reusing them (I also suspect that many of the methods won't get inlined as aggressively)

I was tossing around the idea of getting the code working and then editing the structs back into the main loop (inlining by hand) to address these issue and it occurred to me that if I turned the struct into templates I can do a named mixin and I don't need to alter the usage code one bit.

As an example, this code:

struct S
int i;
int I()
return i++;

void main()
S s1, s2;
s1.i = 0;
s2.i = 0;
assert(s1.I == 0);
assert(s2.I == 0);
assert(s1.I == 1);
assert(s2.I == 1);

becomes this code:

template S()
int i;
int I()
return i++;

void main()
mixin S!() s1;
mixin S!() s2;
s1.i = 0;
s2.i = 0;
assert(s1.I == 0);
assert(s2.I == 0);
assert(s1.I == 1);
assert(s2.I == 1);
In my actual case this resulted in about a 25% speed boost.

of course this does have a few down side like I can no longer use this or any of it's aliases. I can't use the data as a struct and I can't copy it.

But all in all a 25% speedup for almost no coding is just cool!

Wednesday, March 4, 2009

So I started a blog this afternoon...

I hope I can think of stuff to say...