Reply to post: Re: Trust of black box systems is overrated

Inside the 1TB ImageNet data set used to train the world's AI: Naked kids, drunken frat parties, porno stars, and more

jmch Silver badge

Re: Trust of black box systems is overrated

"The issue being flagged up here is that without the original data sets, it'll be impossible to recreate the AI systems which were trained on them"

I don't think that follows at all; in fact, quite the opposite. If instead of a million 'bikini' pictures gathered without the individuals' consent, I use a different data set of another million 'bikini' pictures gathered with the individuals' consent, all other things being equal* I would expect an AI trained on the second data set to behave substantially the same as one trained on the first data set.

If the results of the AI are so strongly linked to the exactness of some or all images** in the data set, that AI is frankly not fit for purpose. As you mention, the real issue here is time (which ultimately boils down to money) and money. Big AI-slingers will happily charge top dollar for the final product but are unwilling to pay for proper (and properly sourced) data sets.

*i.e. the range of variation in skin tones, bikini colours, angle and distance of shot, and lighting is broadly the same across the two data sets

**could be images, videos, data records etc. depending on the AI; the principle doesn't change
