Correcting the Numbers
By selecting only from applications that access both personal data and the internet, they're overstating the significance of their study by about 3x. Furthermore, their summaries blur this distinction unnecessarily.
Specifically, their FAQ says "We studied just over 8% of the top 50 popular free applications in each category that had access to privacy sensitive information in order to get a sense of the behaviors of these applications." Since there were 22 categories at the time they did the study, that would imply 88 applications (22 * 50 = 1,100, and 8% of 1,100 is 88). However, they actually tested only 30, because of the 1,100 top-50 applications only (from the PDF) "roughly a third of the applications (358 of the 1,100 applications) require Internet permissions along with permissions to access either location, camera, or audio data." The 30 tested are just over 8% of those 358, not of the 1,100, and the other 742 apps don't have the permissions they would need to behave badly.
The clause "...that had access to privacy sensitive information in order to get a sense of the behaviors of these applications." from the FAQ is grammatically ambiguous in this case (it may refer to "applications" or "category"), and not specific enough to indicate that over 2/3 of the applications are (relatively) safe by dint of not having the necessary permissions.
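To make the sample-size arithmetic above easy to check, here is a minimal sketch in Python. The figures are the ones quoted from the FAQ and the PDF; the variable names are my own and purely illustrative.

```python
# Figures quoted from the study's FAQ and PDF; names are illustrative only.
categories = 22            # categories at the time of the study
top_per_category = 50      # "top 50 popular free applications in each category"
tested = 30                # applications actually tested
with_permissions = 358     # apps with Internet plus location, camera, or audio access

top_apps = categories * top_per_category             # 1,100 apps in total
implied_by_faq = round(top_apps * 0.08)              # what "8% of the top 50" seems to say
without_permissions = top_apps - with_permissions    # apps lacking the needed permissions

share_of_permissioned = tested / with_permissions    # the 8% the FAQ actually means
share_of_all = tested / top_apps                     # the share of all top apps tested

print(f"Top apps overall:            {top_apps}")
print(f"Apps implied by a loose read: {implied_by_faq}")
print(f"Apps without permissions:    {without_permissions}")
print(f"Tested / permissioned apps:  {share_of_permissioned:.1%}")
print(f"Tested / all top apps:       {share_of_all:.1%}")
```

Run as written, this puts the tested share at roughly 8.4% of the 358 permissioned apps but under 3% of the full 1,100, which is the gap the FAQ wording hides.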
They also left apps from 10 of the 22 categories out of their study, but they don't explain whether that was due to a) there not being any or enough applications in those categories that required internet and personal data permissions, b) a conscious choice to focus on the other 12 categories, or c) the results of random selection (in which case they should explain why they did not use a stratified sample).
Once you factor back in the applications they ignored, the numbers don't look quite so bad. Assuming their sample was representative, 2/3 of the 358, or about 239 applications of the top 1,100 of the time, use personal data suspiciously. That's about 21.7% or just over 1 in 5 -- still significant, but a far cry from 2 out of 3. In fact, the worst case is 358 of 1,100, or just under 1 in 3 (32.55%), because, as mentioned above, those are the only ones that actually acquire the permissions necessary to do anything "suspicious".
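The correction can be reproduced in a few lines. This is only a sketch under the same assumption made above, namely that the tested sample is representative of all 358 permissioned apps; the constant names are hypothetical.

```python
from fractions import Fraction

TOP_APPS = 1100                    # 22 categories * 50 apps
WITH_PERMISSIONS = 358             # apps that could misuse private data at all
SUSPICIOUS_RATE = Fraction(2, 3)   # the "2 out of 3" headline rate among tested apps

# Assumption: the headline rate holds for every permissioned app.
estimated_suspicious = SUSPICIOUS_RATE * WITH_PERMISSIONS
estimated_share = estimated_suspicious / TOP_APPS

# Upper bound: every permissioned app misbehaves.
worst_case_share = Fraction(WITH_PERMISSIONS, TOP_APPS)

print(f"Estimated suspicious apps: {float(estimated_suspicious):.0f}")
print(f"Share of all top apps:     {float(estimated_share):.1%}")
print(f"Worst-case share:          {float(worst_case_share):.2%}")
```

Using Fraction keeps the 2/3 exact until the final formatting step, so the rounding happens only once.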
I understand why both the researchers and the reporter used the 2/3 figure -- you all believe you have to sell the point as hard as possible*. But the real story is that, if the sample is representative, roughly 1 in 5 of the top Android apps use private data "suspiciously" -- and that number is still high enough to cause concern and to justify the further use of tools like TaintDroid. It's a pity you didn't trust the facts enough to avoid the unnecessary sensationalism.
*I am assuming, here, that Mr. Goodin did actually read and digest the paper as I did, rather than simply picking out the figures from the study, the FAQ, or a press release.