FAILs in the making?
This could easily be automated. Two ways:
1. If the audio version is always read by the same voice (these systems often sound as samey as voicemail menus), you need only pattern-match the clip against recorded samples of each letter and digit.
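As a rough sketch of that pattern-matching idea (the templates here are synthetic random signals standing in for real recorded letter samples; a real attack would slice them out of the CAPTCHA audio):

```python
import numpy as np

def best_match(clip, templates):
    """Slide each template over the clip and pick the label whose
    normalized cross-correlation peak is highest."""
    scores = {}
    for label, tmpl in templates.items():
        corr = np.correlate(clip, tmpl, mode="valid")
        denom = np.linalg.norm(clip) * np.linalg.norm(tmpl)
        scores[label] = corr.max() / denom if denom else 0.0
    return max(scores, key=scores.get)

# Stand-in "voice samples": one fixed random waveform per character.
rng = np.random.default_rng(0)
templates = {c: rng.standard_normal(100) for c in "ABC0123"}

# A "spoken" clip: silence, then the character "2", then silence.
clip = np.concatenate([np.zeros(30), templates["2"], np.zeros(30)])
```

Because the same voice produces (nearly) the same waveform every time, the correct template correlates far more strongly than any other, and `best_match(clip, templates)` recovers the character.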
2. If it is an animated GIF with the text to type in a different colour, you need only analyse a few frames of the image to work out which part is the static background. Discard that and you are left with the wobbling text. Then filter the wobbling text for the parts that are non-black (in case red is only one of the possible colours). That leaves you with various frames containing just the wobbling code characters. Run pattern recognition on a few frames where the text sits near the centre of the image, and stop when three or so return the same result.

Isolating the characters is actually remarkably easy: you simply step through the GIF until you find the frame where the characters are most separated, and use that to clip out each one. I did this manually, but it could be done in software fairly easily. Again, by comparing previous frames to see which bits move relative to the others, it shouldn't be too challenging to identify individual characters. I clipped them out by hand and passed them as 300dpi TIFFs to my lame scanning OCR software. It couldn't cope with uneven characters set at different angles to each other, but when passed one by one, it returned the code GPA from the image [no link, it's really long!]
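The background-subtraction step above can be sketched in a few lines. This is a minimal illustration, assuming you've already decoded the GIF into greyscale numpy arrays (PIL can supply those from a real file); the tiny frames here are synthetic:

```python
import numpy as np

def find_static_background(frames, tol=0):
    # Pixels whose value never varies across frames belong to the background.
    stack = np.stack(frames)                 # shape: (n_frames, height, width)
    return np.ptp(stack, axis=0) <= tol      # True where the pixel is static

def isolate_text(frame, background_mask, black_thresh=32):
    # Keep only pixels that both move between frames and are non-black,
    # i.e. the wobbling code characters.
    keep = ~background_mask & (frame > black_thresh)
    return np.where(keep, frame, 0)

# Synthetic demo: three 4x6 frames with a static bright border row
# and one bright "character" pixel that wobbles left to right.
frames = [np.zeros((4, 6), dtype=np.uint8) for _ in range(3)]
for f in frames:
    f[0, :] = 200                            # never changes -> background
frames[0][2, 1] = 255
frames[1][2, 2] = 255
frames[2][2, 3] = 255

bg = find_static_background(frames)
text = isolate_text(frames[1], bg)           # only the wobbling pixel survives
```

From there, the per-character clipping would amount to finding the frame where the columns of `text` have the widest gaps of zeros between runs of bright pixels, then slicing at those gaps.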
I bet that, given this, somebody way smarter than me could throw together some code to break every one of the demo "nu"captchas in an afternoon or two. At least we could say it would be helping to end slave labour...