It appears that spammers have found a way of automatically creating Hotmail and Yahoo email accounts, having already created more than 15,000 bogus Hotmail accounts, according to security company
BitDefender.
Both Microsoft and Yahoo use “captcha” systems to stop email accounts from being automatically generated; accounts aren’t created until a new user correctly identifies letters depicted in an image. CAPTCHA systems are designed to ensure that the letters are not easily recognized by machines.
I came across a very good article which talks about the ways to break CAPTCHA by
Greg M and Jitendra M
SummaryThis is the homepage of the Shape Contexts based approach to break Gimpy, the
CAPTCHA test used at Yahoo! to screen out bots. Our method can successfully pass that test 92% of the time. See
EZ-Gimpy in action at Yahoo! The approach we take uses general purpose algorithms that have been designed for generic object recognition. The same basic ideas have been applied to finding people in images, matching handwritten digits, and recognizing 3D objects.
BackgroundA CAPTCHA is a program that can generate and grade tests that: Most humans can pass, but
Current computer programs can't pass.CAPTCHA stands for "Completely Automated Public Turing test to Tell Computers and Humans Apart". See the
CAPTCHA site for more details. The concept of a CAPTCHA is motivated by real-world problems faced by internet companies such as Yahoo! and AltaVista. These companies offer free email accounts, intended for use by humans. However, they found that many online vendors were using "bots", computer programs that would sign up for thousands of email accounts, from which they could send out masses of junk email. By requiring the user to solve a CAPTCHA, in the case of Yahoo the word-based one called EZ-GIMPY shown above, the "bots" could be screened out.
EZ-Gimpy and Gimpy, the CAPTCHAs that we have broken, are examples of word-based CAPTCHAs. In EZ-Gimpy, the CATPCHA used by Yahoo! the user is presented with an image of a single word. This image has been distorted, and a cluttered, textured background has been added. The distortion and clutter is sufficient to confuse current OCR (optical character recognition) software. However, using our computer vision techniques we are able to correctly identify the word 92% of the time.
Gimpy is a more difficult variant of a word-based CAPTCHA. Ten words are presented in distortion and clutter similar to EZ-Gimpy. The words are also overlapped, providing a CAPTCHA test that can be challenging for humans in some cases. The user is required to name 3 of the 10 words in the image in order to pass the test. Our algorithm can pass this more difficult test 33% of the time.
Our ApproachThe fundamental ideas behind our approach to solving Gimpy are the same as those we are using to solve generic object recognition problems. Our solution to the Gimpy CAPTCHA is just an application of a general framework that we have used to compare images of everyday objects and even find and track people in video sequences.
The essences of these problems are similar. Finding the letters "T", "A", "M", "E" in an image and connecting them to read the word "TAME" is akin to finding hands, feet, elbows, and faces and connecting them up to find a human. Real images of people and objects contain large amounts of clutter. Learning to deal with the adversarial clutter present in Gimpy has helped us in understanding generic object recognition problems.
Our related work on finding people and generic objects.
A high-level description of our method can be found here.If you would like more details, see our
paper from CVPR 2003.
ResultsFor EZ-Gimpy we did experiments using 191 images. We were able to correctly identify the word in 176 of these images: a success rate of 92%! Our algorithm takes only a few seconds to process one image. If your would like to see our results on all 191 images, please click
here.
The more difficult version of the Gimpy CAPTCHA presents an image in which 10 words (some repeated), overlaid in pairs. The test-taker is required to list 3 of the words present in the image in order to pass.
The clutter in these images, real words instead of random background textures, is much more difficult to deal with. In addition, we must find 3 words instead of just one. Our current algorithm can find 3 correct words and pass this Gimpy test 33% of the time. Note that even if we could guess a single word correctly 70% of the time, we would only expect to get 3 words correct approximately 0.7*0.7*0.7 = 34% of the time. Moreover, given our 33% success rate, this CATPCHA would still be ineffective at filtering out "bots" since they can bombard a program with thousands of requests.
The algorithm we use is outlined in detail in our paper linked above. Check out our results on the
the harder version of Gimpy.