Inside the 1TB ImageNet data set used to train the world's AI: Naked kids, drunken frat parties, porno stars, and more

Crazy Operations Guy

If only there were companies that had already solved those problems...

There are dozens upon dozens of companies sitting on top of literally billions of photos that have already been catalogued and tagged, and where the people pictured have given full consent for their images to be reused in diverse contexts.

It would have been fairly cheap for ImageNet to just get in touch with the likes of Getty Images or Shutterstock and negotiate an "academic redistribution" licence, then feed those images to Mechanical Turk to produce the bounding boxes and normalize the tags.

As for the CSA images, those should be split out into a separate dataset that is strictly controlled. Such a library could be held by a law enforcement agency like the FBI, which would acquire the images by having victims (or parents of victims) sign a contract allowing the images to be used in this kind of research, much as they do when getting permission to reuse confiscated material in sting operations. Maybe pair it with a library of equivalent images staged entirely with adults, so as to remove as many variables as possible when learning to differentiate between legal and illegal images (like avoiding that "skin cancer" AI that was making decisions based on whether there was a ruler in the image rather than the appearance of the skin blemish).
