As I posted a few days ago, I'm working on a connect-4 AI as a class project. It turns out that counting the children works and the whole thing performs well (at least as fast as the single threaded version)
The solution ends up working by keeping a count of how many unpropagated children each node has. When this goes to zero, the nodes value is propagated up the tree another level. The only tricky part is building the tree in the first place. The issue here is that you can't propagate stuff while the tree is being built so it has to happen as an intermediate step before the leaf processing.
Once all that gobly-goop is done the main evaluation runs just fine. However I still haven't figured out how to do any better than evaluating the leafs in the order they fall in (fact is I haven't even coded anything yet, only puzzled over it). That works very well because, most of the time, both threads are working on nodes that share a parent so alpha-beta pruning works correctly. At 2 threads, I suspect that I'm running at almost 100% efficiency but if I stared using more threads (say more than the branching factor) I would expect that to drop a lot.
The next thing I'm planning on trying will be to run my heuristic function on all the nodes and sort each group of children as they are generated. I hope that this will better optimize the alpha beta pruning and might even give an order where I can get something closer to ideal while adjusting the depth based on time renaming.
In asking around for others solutions I was pointed at this paper that mentions something called "mandatory work first" that might mesh well with my stuff if I can figure out how to sort based on it.
Anyway, the code functions and beats me easily. On the other hand I have a few more tweaks I want to try. Fun!