
Jail
Just drag those bastards out and throw them in jail.
People that think they can do whatever they please without consequences need to be disposed of properly.
The guillotine seems a good way to get their attention.
A massive public dataset that served as training data for a number of AI image generators has been found to contain thousands of instances of child sexual abuse material (CSAM). In a study published today, the Stanford Internet Observatory (SIO) said it pored over more than 32 million data points in the LAION-5B dataset and …
It is like sitting a child in front of a dozen screens with unlimited access to all channels: the result is something rather nasty.
Training on randomly selected data will always reinforce whatever biases currently exist (there is a toy sketch of this at the end of this comment).
The solution is to generate your own dataset.
If you want to accurately recognise images of human faces, then take photographs of every type of face that you want to recognise.
It is going to be expensive. Get used to it.
If you want to recognise the subject matter of pictures in general, then take photos of everything.
It is going to be very expensive. Get used to it.
If you want to make good predictions of the next word, then write everything down.
It is going to take a lot of work that you will need to pay for. Still more expense. Get used to it.
If you want medical diagnostic AI to perform cheaper, more efficient and more reliable diagnosis that is not going to be racially or culturally biased, then find everyone that might have any disease, get their permission to gather all of their information, apply every diagnostic method, regardless of cost, make sure that your samples for each and every separate parameter are representative of every combination of other parameters ... ***out of body error*** ***redo universe from start***
Everyone wants cheap AI so they use any crap they can scrape up for free.
We will have the AI that we pay for. We are all going to die.
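The point about random selection can be reduced to a toy sketch. Nothing below is anyone's real training pipeline; the "A"/"B" labels, the 90/10 split and the frequency-counting "model" are all made up purely to show that whatever skew goes into a scraped sample comes straight back out, and that the fix is deliberate (expensive) curation.

```python
# Toy illustration (not anyone's actual pipeline): a "model" that just learns
# label frequencies from its training set. The point is only that whatever
# imbalance exists in the scraped sample comes straight back out at
# inference time; curating a balanced sample is the (expensive) fix.
import random
from collections import Counter

random.seed(0)

# Pretend the "internet" is 90% label A and 10% label B.
internet = ["A"] * 9000 + ["B"] * 1000

# Cheap option: a random scrape of whatever is out there.
scraped = random.sample(internet, 1000)

# Expensive option: deliberately gather equal numbers of each label.
curated = ["A"] * 500 + ["B"] * 500

def train(labels):
    """'Train' by memorising label frequencies."""
    freq = Counter(labels)
    total = sum(freq.values())
    return {k: v / total for k, v in freq.items()}

print("scraped model:", train(scraped))   # roughly {'A': 0.9, 'B': 0.1} -- bias reproduced
print("curated model:", train(curated))   # exactly {'A': 0.5, 'B': 0.5} -- bias removed, at a cost
```

Swap the labels for faces, diseases or next words and the arithmetic is the same; the only variable is how much you are willing to pay to build the curated column.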
> LAION didn't respond to our questions on the matter, but founder Christoph Schuhmann did tell Bloomberg earlier this year that he was unaware of any CSAM present in LAION-5B, while also admitting "he did not review the data in great depth."

Isn't the whole point of a training dataset to have been reviewed and curated in great depth and detail, entirely by humans, and verified, to then be used in AI training?
Otherwise what's the point? May as well just randomly scrape images off random sources.
"May as well just randomly scrape images off random sources."
Yes, that's the plan. Then, if you can be bothered, hire cheap labor to filter out some of the worst stuff. Then just train on the remaining mass. Those are the models we have now. They contain stuff nobody wants in there, they contain illegal versions of works that the AI companies don't want to pay for, they contain complete gibberish, they contain personal information, and the AI companies are fine with it because they still look sort of authoritative when they make something up.
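For what it's worth, that scrape-then-filter approach is easy to caricature in a few lines. The sketch below is not LAION's or anyone else's actual tooling; `is_probably_unsafe()` is a stand-in for whatever keyword list, small classifier or outsourced review step a given pipeline uses, and the URLs and captions are placeholders. The structural problem is visible even at this size: anything the filter fails to flag goes straight into the training set, and nobody looks at it again.

```python
# Caricature of "scrape everything, filter a bit, train on the rest".
from typing import Iterable, List, Tuple

Sample = Tuple[str, str]  # (image_url, caption)

def is_probably_unsafe(sample: Sample) -> bool:
    """Placeholder for an imperfect safety filter (keyword lists, a small
    classifier, low-paid human review). By construction it has false negatives."""
    url, caption = sample
    banned = ("nsfw", "gore")  # hopelessly incomplete, on purpose
    return any(word in caption.lower() for word in banned)

def build_training_set(scraped: Iterable[Sample]) -> List[Sample]:
    # Keep everything the filter doesn't flag; no second pass, no human review.
    return [s for s in scraped if not is_probably_unsafe(s)]

scraped = [
    ("http://example.com/1.jpg", "a cat on a sofa"),
    ("http://example.com/2.jpg", "nsfw do not open"),
    ("http://example.com/3.jpg", "holiday photo"),  # could be anything, really
]
print(build_training_set(scraped))
```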
...or even slightly surprised, but no one ever said that humans directly employed by StabilityAI reviewed every bit of potential training data. Who in their right mind would say "I'll take all the money you are willing to give me so that I may become intimately familiar with the worst of the worst of the worst content the Internet has to offer?"
Then again, who in their right mind would train AIs with uncured data sets? Yes, not just curated, but cured, like an Xmas ham.
To be fair and fully disclose relevant info, I use Stable Diffusion a few times a week and I am glad that I do not own any of the stock. These things need guard rails for their guard rails.
> "encode a range of social and cultural biases when generating images of activities, events and objects."
> An audit of LAION-400M itself "uncovered a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes."
It's almost as if these things are part of the social and cultural landscape...
> "LAION has a zero tolerance policy for illegal content,"
Evidently not. You put it in there without looking, and when you were forced to notice that it's there, left it up and running to continue making you money. Sounds like your tolerance has a dollar value on it.
...humanity shocked to find itself awful.
Honestly, there's no way that anyone should be surprised by this. You train an AI - which, let's remember, is just a massive statistical inference model - on a huge sample of noisy data to see what falls out. Inevitably what will eventually fall out is a bunch of stuff you don't like. For me this has been the hilarious aspect of the whole rise of large language models and other statistical AI type things. Everyone keeps acting all shocked that they produce horrible content because they don't want to grapple with the idea that, viewed totally objectively, humanity is regularly really fucking horrible.
"Our AI should accurately respond to it's training data!" "Our AI has been trained on the largest dataset we could get our hands on!"
shortly followed by
"Our AI should never represent children in this way!"
Really? Shouldn't it? Because you seem to have forgotten how it works. The AI has no conscience. It has no moral filter. It doesn't have the concept of "not saying the quiet part out loud." If it's horribly sexist, or racist, or abusive, then perhaps - just fucking perhaps - that's because humans often are, and our collected works, on which this thing was trained, reflect that?
If it produces sexually explicit imagery maybe that's because utterly - unimaginably - vast quantities of the imagery on the internet is sexually explicit. If it does things - as it has done here - that are actually straight up illegal then that's because those things happen and are recorded as happening!
It doesn't know that this is bad. It doesn't have a concept of good or bad. It's just returning to us what was put in, in all its brutal, ugly reality.
People getting all upset about AI doing things they don't like need to take a good hard look at the world and maybe do something about that. It's only showing us what we showed it.
It's a continual source of surprise to me that anyone thinks that data scraped from the internet is going to represent some sanitised version of humanity. Besides the outright illegal content, the internet is going to reflect all of the inequalities and inequities around us - from the dominant use in the West through to poor representation of minorities. Pointing out that AI trained on large data sets is racist and misogynistic is like pointing out the sky is blue.
They probably didn't, since they were referring to geographic differences. There is a lot more traffic going through countries like the UK, US, and Australia than there is in others. That traffic will represent the inhabitants of those countries more than those of others, regardless of the ethnic background of the users concerned. There is, for example, more likely to be data from people of African ancestry now living in those countries than those of African ancestry living in various countries in Africa where internet access is limited to a small subset of the population, even though the size of the latter group may be higher than the former. Similarly, the traffic generated on the African internet is likely to be biased towards countries like Nigeria and Kenya with a lot of internet infrastructure rather than countries like Chad or Eritrea which are quite lacking. This is a pattern that an AI trained on the internet will be repeating, along with many other patterns. Depending on what you want the model to do, these patterns, either that one or something else, may be desirable or undesirable, but ignoring them and expecting the AI to bypass them is a fool's errand.
An offence is being committed.
If they take an image and copy it to their systems, technically that could come under making an indecent image (downloading or saving in any way is classed as making).
If they then create a new image, that is production of an indecent image.
If that is then passed onto a third party, then that becomes distribution.
The UK law is very clear: these images DO NOT have to be real. They are covered under pseudo-imagery; even hand-drawn cartoons fall under this.
The big question is, who is ultimately responsible?
There would have to be an element of intent or recklessness, I would have thought.
Otherwise a lot of classical artwork depicting an infant Cupid (Eros), as well as Renaissance works depicting putti and Cupid, could fall foul of this legislation. As the late Frankie Howerd* often reminded us, any obscenity is in our minds.
One painting of the Virgin surrounded by putti has two of those putti more or less facing each other with legs intertwined at their groins, in a way that might be misinterpreted as a juvenile couple tribbing. All in the mind, inasmuch as I seriously doubt angelic creatures are of any sex (gender) at all, given there is no evidence of, or necessity for, angelic reproduction; thus one would assume they would also lack the requisite tackle.
*“I don’t mind being vulgar; that’s all right. Vulgarity laughs at itself. Filth is self-indulgent, if you see what I mean”