Sunday, March 15, 2009

Lock free concurrency is neat.

This topic comes by way of a few interest of mine.

The first is non-locking concurrent solutions, or more actually some of the ways they are modeled and dealt with. A while ago I came across a description of a lock free hash table using a CAS instruction. I fond this to be really cool because it managed to proof correctness of a solution by using only a single concurrent primitive and a state transition diagram. That was my first introduction into lock free solutions and I have been somewhat interested in them from then on.

Another interest stems from my work. At one of the steps of our R&D we had a system that worked by doing computation purely as a pattern-match-and-extend operation. While I was working with this I was also taking a algorithms class so naturally, I tried to combine the two. The most interesting thing to come out of that was from attacking the minimum spanning tree problem. Given that the system in question is only additive, things tend to be cast in terms of classification. For this problem I chose to cast it in terms of edges to be removed so the question becomes how to identify edges in a graph that are not part of the minimum spanning tree? I have never managed to prove that this is correct, but what I can up with was to mark any edge that was the heaviest edge in any simple cycle. As it happens this is easy to define but insanely expensive to compute. I never did implement it but I keep kicking it around.

While that solution is intractable in real hard ware, it's interesting to note that it would work just fine on a system with unlimited concurrency.

  1. Start a thread on every node
  2. Every thread spawns a new thread for every edge from the current node.
  3. each thread now checks to see if the current node is in it's ancestry. If it is, then the heaviest edge between that occurrence and the end is marked and the thread quits.
  4. otherwise continue at step 2.

The next step is to notice that even under limited concurrency, this is a valid (but still impractical) transformation on a graph. Running something like Kruskal's Algorithm on the graph at any point will result in the same result as running it on the original graph. With a little hand waving, I can convince my self that you can even run them at the same time. Even more interestingly, you only need to make very few assumption (I'll get back to that) to show that you don't even need to use locks. Not even hardware locking like you would need for a CAS based algorithm.

This no-locking aspect is a result of the only state that is modified, only ever going in one direction (from "might be in the tree" to "not in the tree") and because Kruskal's algorithm won't care if it gets stale data, it will just take longer to reject it.

The solution is still impractical as finding cycles is not efficient. However, if half of Kruskal's algorithm is stolen and the extra threads just run around and test random edges to see if their endpoint share a sub-tree, I expect that this could be made very practical

Now back to the assumptions: The only assumption I can see that has to be maid to avoids even hardware locking is that writing to that status flag of one edge must not interact with any other status flags. This precludes using a bit array or anything else that places the flags closer than the granularity of the CPU's cache. On the other hand. Even if this can't be met, CPU's need some way to effectively solve this problem so I would expect that it isn't that much of an issue.

No comments:

Post a Comment