Friday, June 26, 2009

Beating CAPTCHA can be a good thing.

Anyone who has been around for a while has run across CAPTCHAs. They are one of the current state-of-the-art weapons in the bot-detection arms race. CAPTCHAs generally boil down to finding problems that people solve easily and computers are really bad at, then asking your user to solve one. One of the interesting things about this arms race is that a CAPTCHA can be designed so that successfully breaking it requires creating a better program for solving a problem that is valuable in the real world.

One example of this is reCAPTCHA. They take the best OCR they can get their hands on, find text it can’t read, make the text even harder to read and then hand it out for bots to do their best with. For someone to beat this, they would have to make a better OCR program. This has two interesting effects. First, if the bot gets into general circulation (and it will sooner or later), reCAPTCHA can just start using it in their OCR system and be back on top. Second, it furthers the state of the art in OCR, and that is valuable in its own right.

This thought has some interesting implications. More generally, what a CAPTCHA does is use problems that are hard to solve but easy to check in order to make automated access to a resource too expensive. How about doing this more directly? How about finding commercially valuable computational problems that can easily be broken up into chunks that have this attribute, where each chunk can be solved in a few seconds and checked in microseconds? Then the bots would need to expend a few CPU seconds per page load to access a site.
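
To make the "slow to solve, fast to check" asymmetry concrete, here is a minimal Python sketch. It uses a hashcash-style hash puzzle purely as a stand-in: that puzzle has no commercial value, which is exactly the part a real system would swap out for useful work. The function names (make_challenge, solve, check) and the difficulty parameter are made up for illustration and are not part of any existing service.

```python
import hashlib
import os


def make_challenge() -> bytes:
    """Site side: hand the visitor a fresh random challenge (hypothetical helper)."""
    return os.urandom(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty: int) -> int:
    """Visitor side: brute-force search, about 2**difficulty hashes on average."""
    nonce = 0
    while leading_zero_bits(_digest(challenge, nonce)) < difficulty:
        nonce += 1
    return nonce


def check(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Site side: a single hash, no matter how long the search took."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty


if __name__ == "__main__":
    challenge = make_challenge()
    # Assumption: 16 bits is only a demo setting; a real site would tune
    # this until solving costs the visitor a few CPU seconds.
    nonce = solve(challenge, 16)
    print(nonce, check(challenge, nonce, 16))
```

Each extra bit of difficulty roughly doubles the solver's work while the check stays at one hash, so the site can set the cost of a page load without raising its own.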

One implementation of this could be a browser plug-in that allows you to bypass the CAPTCHA on a site. The publisher of the plug-in would push out code packets that the plug-in would be required to run. Sites that run the CAPTCHA could even get paid to use it (some fraction of what the central server gets after expenses). To make things fun, the whole thing would be open, so that anyone who wants can try to write better solutions to the problem. One neat trick would be to set the price paid by the central guy so that if you can improve the solver enough, you can make money by getting your own account and just solving problem after problem. Or if that doesn’t make enough money fast enough for you, just sell your solution to the central guy.
