GIGO: the modern version
> these models often exhibit the same biases found in their training data.
What else would anyone expect?
Bias in, bias out. Sad but inevitable. And the fault lies in the data, not the AI.
US hardware startup Cerebras claims to have trained the largest AI model on a single device, powered by the world's largest chip, the plate-sized Wafer Scale Engine 2. "Using the Cerebras Software Platform (CSoft), our customers can easily train state-of-the-art GPT language models (such as GPT-3 and GPT-J) with up to 20 …
It's not AI.
It's just a transform from input to output. Potentially complex from a human perspective, but just a transform. It models and represents only aspects of the training data.
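A minimal sketch of that point, with entirely made-up toy data and numbers (not any real system's pipeline): fit a plain logistic regression to labels that were generated with a group bias baked in, and the learned transform reproduces that bias in its weights.

```python
# Toy illustration: a model is just a learned transform, so a bias present in
# the training labels reappears in its output. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two inputs: a genuinely relevant "skill" score and an irrelevant group flag.
skill = rng.normal(0.0, 1.0, n)
group = rng.integers(0, 2, n)  # 0 or 1; should not matter

# Biased historical labels: group 1 was favoured regardless of skill.
logits = 1.5 * skill + 1.0 * group - 0.5
labels = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(float)

# Plain logistic regression by gradient descent: just an input -> output transform.
X = np.column_stack([skill, group, np.ones(n)])
w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - labels) / n

print("learned weights [skill, group, intercept]:", np.round(w, 2))
```

Run it and the weight on the irrelevant `group` column comes out clearly non-zero: the transform has faithfully learned the bias it was shown, nothing more and nothing less.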
Personally I wonder about this: the researchers should understand the provenance, validity and scope of their data. If they're just using someone else's scripts to train a model (e.g. lots of 'researchers' use YOLO for image recognition because it has easy-to-use training scripts, but don't really know how it works), then that is poor.
So a researcher doesn't know exactly what they're doing, identifies biases, and thinks that is indicative of racism/sexism...
No, this is quite poor and indicates a lack of understanding of machine learning and statistics.
Interesting: I myself am just an input -> output transformer (okay, with a bit more interactive feedback, maybe). I model and represent aspects of my training data (which I prefer to call "life experience") on the basis of my design (which I prefer to call "evolution"). And, try as I might, no doubt my output reflects biases in my training data. I accept this may be a design flaw.