It's statistics darling
It's a machine learning algorithm, it learns translations from a corpus of texts. Its gender bias represents the bias of the corpus.
Google Translate is used by over 200 million people daily and, according to boffins from Brazil, its AI-powered tongue twisting tends to deliver sexist results. In a research paper distributed through pre-print service ArXiv, "Assessing Gender Bias in Machine Translation – A Case Study with Google Translate," Marcelo Prates …
If you use the terminology machine learning algorithm, it's not a surprise. When it gets labelled artificial intelligence there's somehow an increased tendency to assume, erroneously, that the results will be free of human bias. There does seem to be an enthusiasm right now to over-sell the capabilities of machine learning.
will be free of human bias
Why?
Loosely speaking, we expect AI to aspire to human intelligence. Now you're saying we expect it to avoid - very specifically avoid - one aspect of human intelligence.
I expect the "bias" comes straight from the corpus: it's a statistical best choice. The observation about adjectives looks like pretty strong evidence of that.
It's a machine learning algorithm, it learns translations from a corpus of texts. Its gender bias represents the bias of the corpus.
------------------------------------------------------------------------------------------------------------------
'Bias' is often used as a pejorative term
Better to say:
It's a machine learning algorithm, it learns translations from a corpus of texts.
Its assignment of gender in cases where this is not explicitly known represents an approximation determined by the circumstantially appropriate distribution of genders in the available samples.
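For anyone who wants that spelled out, here is a minimal sketch of "the most frequent choice in the samples wins". The corpus, the function names and the "they" fallback are all invented for illustration; this is not how Google Translate is actually built, just the statistical idea in miniature.

```python
from collections import Counter, defaultdict

# Toy "corpus" of (occupation, pronoun) observations. These pairs are invented
# for illustration; a real system learns from millions of sentences.
CORPUS = [
    ("engineer", "he"), ("engineer", "he"), ("engineer", "she"),
    ("nurse", "she"), ("nurse", "she"), ("nurse", "he"),
]


def pronoun_counts(corpus):
    """Tally how often each pronoun appears with each occupation."""
    counts = defaultdict(Counter)
    for occupation, pronoun in corpus:
        counts[occupation][pronoun] += 1
    return counts


def pick_pronoun(occupation, counts, default="they"):
    """Return the pronoun seen most often with the occupation.

    Falls back to `default` (an assumption of this sketch) when the
    occupation never appears in the corpus.
    """
    seen = counts.get(occupation)
    if not seen:
        return default
    return seen.most_common(1)[0][0]


counts = pronoun_counts(CORPUS)
for job in ("engineer", "nurse", "astronaut"):
    print(job, "->", pick_pronoun(job, counts))
```

Change the counts in the toy corpus and the "bias" changes with them, which is the whole point.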
This is hardly news. Anyone who's used any machine translation system to translate from highly-gendered languages over the last 12 years will have seen this in the first 30 seconds.
It's easy to find many old articles for both technical and lay audiences covering this issue e.g. https://www.fastcompany.com/3010223/google-translates-gender-problem-and-bing-translates-and-systrans
Should the correct translation from Chinese, Hungarian, etc. be something like "This person is an engineer."? Is the non-gendered pronoun only/generally used to refer to a person who has already been identified in the text or conversation? Is the non-gendered pronoun the equivalent of the English word 'it', but without the implication of being a thing instead of a person, or is the word only used for people?
This is something I often encounter as a native speaker of Finnish, another language with gender-neutral pronouns. The solution depends on the context, so it requires some understanding of what is being said, which is why current machine translators perform badly. The problem often has no ideal solution, because choosing "he" or "she" may require information that the source text simply does not contain. Usually one defaults to "he", as "he/she" is too clumsy.
Going the other way also sometimes requires rephrasing the text. "He said, she said" does not translate directly into Finnish. The closest one can get is to use the nouns for "man" and "woman", but it is not quite the same.
Waitaminnit... it’s almost as if you’re not fully sharing the outrage here.
Well I'm certainly not "sharing the outrage". The study, or perhaps its conclusions, are based on ignorance. If there's no overt clue, then the pronoun *should* be translated as "he" / "him" / "his", but the "other" versions of those words.
Yes, there are two versions of "he" (etc.):
* "he" that means "male person" (or occasionally male non-human animal).
* "he" that means "person of unspecified or unknown sex".
The second one is somewhat falling into disuse because people seem hell-bent on confusing the two and concluding that the speaker means the first when he(1) meant the second.
(1) In the second meaning, thanks.
(Partial irrelevance.) In French, the "person of unspecified or unknown sex" pronoun is "elle" = "she" because "personne" = "person" is grammatically feminine.
I think you need to look a little wider, because it's a bit more complicated than you seem to think. Languages change (unless perhaps you are a prescriptivist).
Gender was commonly used as a synonym for sex up until maybe a century ago, when its dominant usage shifted towards the grammatical meaning before moving back a bit several decades ago. Note also that nouns described as gendered are not necessarily based on being feminine, neuter or masculine in languages outside of the Indo-European group.
Oddly, the PC crowd is not satisfied with breaking the grammar one way when they can break it two ways.
Consider the outrage about referring to an engineer of unknown sex as 'he' (the form that means he or she) while insisting that actors and actresses both be referred to as 'actors'.
This leads to the inevitable failures to communicate clearly, which I noted once again while looking at a picture of an actor, or actress. I still don't know which, as the name was ambiguous with respect to sex, gender, nationality, etc. There were two people in the central area of the photo, one apparently female (lipstick, dress, long hair, big breasts) and one apparently male (suit, tie, short hair, full beard). The caption gave me a name... but I am still in the dark as to whom the caption referred.
Mission achieved, PC advocates, almost zero information transmitted.
This sort of thing led, more years ago than I care to remember, to the following correction in the Guardian:
A rigid application of the Guardian style guide caused us to say of Carlo Ponti in his obituary, page 34, January 11, that in his early career he was "already a man with a good eye for pretty actors…". This was one of those occasions when the word "actresses" might have been used.
>"He said, she said" does not translate directly into Finnish.
Aye, but it translates in real life. I dated a Finnish girl once and would still be if it wasn't for your awful language that adds suffixes to proper names.
Even Hungarian is distancing itself from 'Suomen'.
Even the Sami disown you, that's why they play football in ConIFA not UEFA.
English "they" doesn't communicate whether you mean singular or plural whereas s/he implies singular*
Singular:
I was speaking with a former colleague. "They" couldn't deal with that stupid manglement for another day.
Plural:
Those school kids on the train were so noisy. Why can't "they" stare at their mobile screens quietly like other normal people.
Notice how my second sentence doesn't on its own explain whether I mean one or many? So you've fixed one problem and introduced a new one.
In many cases, you don't need the additional gendered information, either because it has already been communicated and is therefore redundant or because whilst not communicated, it bears no relevance to your point.
*Doubtless someone will find some sentence which breaks my point.
"They then ran the sentences through Google Translate, via API, to see how Google's language model assigned gendered pronouns"
You'd think the university undergrads would be more tolerant of colour-diverse companies. Then I saw how these bigots only considered male, female and neuter options. Gender-fluids didn't get a seat at the table and there was not a single two-spirit to be seen.
No need to call the new ghost-busters. The beams are already crossed.
Google translate:
> Laisse tomber ta culotte Sir William, je ne peux pas attendre l'heure du déjeuner
Deep L:
> Lâchez votre culotte, Sir William, je ne peux pas attendre le déjeuner.
Which brings me to a very important question. When going from a language with no/few familiar/formal pronouns forms, like English, to a language that is picky about them like French (or, worse, Korean), should the computer assume familiarity or not?
Sir William implies a certain formalism, IMHO, whereas drop your panties implies familiarity.
Don't forget 它 for "it".
More on point,
他们 -> they
她们 -> they
它们 -> they
Hey, we're cool as all three he/she/it third-person plurals go to the same place. We're sexism free, right?
But then
they -> 他们
Oops, defaults to the 'he' variant. But what would *you* translate bare ambiguous words to? Remember, your reputation is on the line!
If I talk about nurses in general, I will use feminine pronouns on the grounds that the vast majority of nurses are female. Is this wrong?
If I talk about firemen in general, I will consequently use masculine pronouns, for the same reason.
One hundred years ago, there were almost no female doctors; now, I wouldn't be surprised to learn that women are in the majority in the medical profession. I certainly now regard the stereotypical doctor as female (except The Doctor, who was, and shall remain, Tom Baker). The omnipresent masculine default pronoun hasn't held them back in this field.
Maybe young women just don't want to become engineers.
Are the authors of the study suggesting that one randomly chooses masculine and feminine pronouns when talking in general about professions and trades?
When talking about firewomen in general, I will use the feminine pronoun. I don't know if I prefer 'firemen' becoming gender-neutral or saying 'firemen and -women'. Is Penny in 'Fireman Sam' a fireman or firewoman? Firefighter, perhaps? I'd rather though that the whole cartoon was called 'Fireman Penny' and have her as the central character.
In the German-speaking countries, it has become common in print to see jobs in the feminine plural (with a capital I in the middle to denote the common plural). This is represented positively though. Thieves and other miscreants don't get this treatment.
"Are the authors of the study suggesting that one randomly chooses masculine and feminine pronouns when talking in general about professions and trades?"
They do.
(I've been using the singular they for so long I'm astonished people are still holding onto their "he" things)
"Since when has "they" become singular?"
Since about the 14th century; it has been in continuous use ever since and is accepted by pretty much every modern style guide. It is always good for a laugh watching wannabe grammar nazis fail at the English language while trying to correct others though.
Wow! Who is voting down the historically accurate singular use of they / them / their / theirs / themself?
Is it grammar nazis or SJW nazis? Both seek to deny the historic use of 'they' as a non-gender singular term.
It is easily seen in Chaucer's Canterbury Tales (c 1400):
And whoso fyndeth hym out of swich blame,
They wol come up and offre on Goddes name
And whoever finds himself out of such blame,
They will come up and offer in God's name
Wrong: this is not a singular use. "whoever finds himself" in this context delivers a collective result - meaning all the people in this category, ranging from 0 to infinity, hence the use of "They".
This is the equivalent of an SQL "select where X is guilty" which delivers an unknown number of matches.
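To make the analogy concrete, here is a small sketch using Python's built-in sqlite3 module. The table, column and pilgrim names are made up for the example; the only point is that the query hands back a collection whose size you don't know in advance, which is why a plural reads naturally.

```python
import sqlite3

# In-memory database with a hypothetical table of pilgrims.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pilgrims (name TEXT, blameless INTEGER)")
conn.executemany(
    "INSERT INTO pilgrims VALUES (?, ?)",
    [("Knight", 1), ("Miller", 0), ("Prioress", 1)],
)


def blameless(conn):
    """Return the names of everyone matching the condition.

    The result is always a list: it may hold zero, one or many names.
    """
    rows = conn.execute(
        "SELECT name FROM pilgrims WHERE blameless = 1 ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


print(blameless(conn))
```

Empty the table and the same query still "succeeds", it just returns nobody.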
To refer to an unknown person: for ages. "Would the owner of car registration number AA123AAA please go to the car park as they have left their lights on." "I saw someone on the hilltop and waved to them but they didn't wave back"
To refer to a known individual, much more recently: "Jo can't afford a Goth costume to annoy their parents with, so they are identifying as non-binary for a bit."
Why not, if you care about gender equality, just use `she/her` instead of `he/him` on occasion? Perhaps 51% of the time, according to demographics. 36% of the time as prescribed by BLS. 99% of the time because you feel like it. That's what I do, 1/4-1/3 of the time.
No one is going to step on your toes for doing so, you can easily call them out as a sexist for insisting that a generic customer/computer user automatically be assumed to be a man. And, IMHO, you are indeed calling into question why that assumption should be codified in the first place.
Instead of insisting on pretend neutrality why not go all the way and just promote female presence?
Were your CIO to chastise you for it, you could easily tell her off.
Case solved, without me having to mangle my own speech in a use that seems mandated by people who like to tell us what to think and how to say it. I respect, somewhat, the intent behind using `they` but find its actual use awkward at best and symptomatic of people who doth protest too much.
Are the authors of the study suggesting that one randomly chooses masculine and feminine pronouns when talking in general about professions and trades?
----------------------------------------------------------------------------------------------------------
Why not simplify all our lives and just use 'it' when 'he' or 'she' is not clearly correct.
That way we won't have to go through it all again when AIs start taking up these professions.
Erm what?
1. On one hand the article seems to imply that Google Translate should be aware of job fields with respect to gender ratios whilst simultaneously complaining that STEM fields are male by default. What if there are more men in a given field?
2. Surely it would be more sexist to stereotype certain jobs as being male or female?
3. If I gave you a sentence using gender neutral pronouns, would you be able to tell me whether to use he or she? No! How do you expect the computer to be able to?
4. There are fewer female software developers than male developers. By far less than 30%. If Google has 30% female developers, they're doing very well.
5. If Google Translate had shown a bias towards female pronouns, would anyone have accused Google of sexism? Why is it automatically problematic to default to male pronouns but not female pronouns?
6. Relating to (5), why do some activist/far-left types seem to hate men so much? What did my penis ever do to you?
@paper - spot on for your points. I would add...
Google reported 30% female staff, not 30% female developers. Given that areas such as finance, HR, and marketing tend to be at least 50% female, I guess that females are less than 30% of their developers.
"Why is it automatically problematic to default to male pronouns but not female pronouns?"
The reasoning behind this is that an individual or group with more power has to have more responsibility in use of that power, and conversely a 'weaker' individual or group is allowed more leeway. The idea itself is actually well-accepted all across modern life. For example, if a small local bookshop runs a sale to undercut Amazon then it's not problematic, but if a giant company such as Amazon undercuts local bookshops to drive them out of business it's monopolist behaviour. A journalist attacking a government is acceptable while governments using their power to harass, intimidate or imprison journalists is not.
Ultra-feminists / hard lefties want to extend this principle to male-female relationships with the argument that there is an inherent male-female imbalance that needs to be corrected*, hence defaulting to female is OK while defaulting to male is sexist. While I see some merit in this reasoning and I think it's useful to apply in limited scope**, it is insane to take it to the extreme that males (or whites etc) are oppressing females (other races etc) simply by being male (white etc).
*see also - affirmative action
**eg the Rooney rule
Many languages have gender built in, and not necessarily in a balanced way (ahem). This argument would struggle to exist in some non-English speaking scenarios.
The API is also "trained" by going through lots of existing texts, which may or may not be overtly sexist. More than likely they are talking about males in engineering because you cannot change the past... This will however create some perceivable bias, just like talking to real people of a certain age.
Also it's a bl**dy software tool, not a person. People moderate and adjust their language to the context and target audience. It is simply not possible to expect that from software.
Overall, behaving as expected, and unlikely to be changed without completely borking it.
Personally, when I ask for a translation, I want to know what the original says. I don't much want to know what statistically an ambiguous pronoun is more likely to mean, I can work that out for myself. Maybe I am unusual in that. Do other people want a translation that reads easily even if there is a good chance it could be wrong?
That might mean an awkward construction. So what, if the meaning is clear? Machine translation should not be for unedited presentation to the public.
This is part of a bigger problem. Google comes up with a single translation even if there are many possibilities, and almost never confesses to failure. Just as Google Maps will always come up with a definite destination even if the input is ambiguous. They hate to make it appear that they might be wrong.
The Google apps are so useful that we've learnt to accept this as a foible, but it would be a lot better if they could admit to their lack of perfection - which is actually very obvious to all frequent users.
Google comes up with a single translation even if there are many possibilities,
Actually it's better than that. Click on the translation (or individual words or phrases) and it will offer alternatives. Seems like a pretty powerful tool to me, if you care enough to use it.
Languages with only two genders (e.g. French) may be mistranslated into English using a gendered pronoun - he or she, when a native English speaker would use 'it'. I come across this both in Google translate and in use by native French speakers. It makes it obvious that the person or machine doing the translation isn't fully fluent, but it hardly gets in the way of making sense of what is being said.
If you're translating from French (or German, Italian or Spanish), do yourself a favour and use DeepL... If DeepL doesn't have your language, use Bing Translator in preference to Google. Google is a really dumb pattern matching engine, whereas those two use more sophisticated linguistic analysis (DeepL is in a class above Bing, though)
Google Translate was cool a decade ago, but is a very poor tool by today's standards. Sadly, its ubiquity makes people think it's in some way representative of modern machine translation, an error this paper's authors have helped to perpetuate.
There aren't any. The languages referred to might not have gendered pronouns but that is not the same as not having a concept of gender. You could make an argument that the results are yet more evidence of cultural imperialism but I suspect the effects, when compared with sites like YouTube, are negligible. But more importantly, the study seems to be ignoring why people are looking for translations.
There is no concept of "gender" in Japanese. Obviously it has words for "man", "girl" and so on, but nouns and pronouns do not have grammatical gender and the word for "driver" or "bookseller" is the same whether the person in question is male, female or one of those strange things you find in universities these days.
There is no concept of "gender" in Japanese
I know nothing about Japanese grammar but Wikipedia has this to say: "In the modern Japanese, kare (彼) is the male and kanojo (彼女) the female third-person pronouns", and even that particular speech patterns are considered male or female.
Linguistically the important thing is that the grammatical role of gender has little or no ideological relevance, ie. in the real world.
What's worse about this is that Chinese doesn't have gender-neutral pronouns. When spoken, there's no differentiation between He, She and They/It - but when written, the characters used are different for each. That means that whatever research they did on this was either based on a false premise or not explained correctly.
No need to go back to Latin. Modern gendered languages like German, too.
Modern Italian (probably the nearest thing to a direct descendant of Latin) has a different issue. Pronouns are gendered, but take the gender of the object, not the subject. So English "His House" would take a feminine pronoun to match the feminine "La Casa" - the house, whereas "Her Apartment" is masculine.
"Modern Italian (probably the nearest thing to a direct descendant of Latin) has a different issue. Pronouns are gendered, but take the gender of the object, not the subject. So English "His House" would take a feminine pronoun to match the feminine "La Casa" - the house, whereas "Her Apartment" is masculine."
I think most do that, no? Whenever I translate sentences into other languages in my head, I always get worried. And in this case I cannot use Google Translate to check! I think it works that way in German, French and Spanish.
I think it works that way in German, French and Spanish.
Nope, German has the differentiation between his (sein/e) and hers (ihr/e) and agreement with the gender of the object. Gendered pronouns are about specificity: whose ball is that? If the pronoun doesn't provide the information then something else in the context will have to. Or the audience will ask. My Swedish teacher says that in Finnish (she's a Finnish Swede), which is famously ungendered, it is common to ask whether a man or a woman is the subject, because it is often not clear from the context.
Back to Romance languages: the absence of gender of the possessive third-person pronoun can be set against the use of gender in the third-person plural personal pronoun, at least in French: ils (they, male), elles (they, female). Then again, personal pronouns are less important in Romance languages than they are in Germanic ones, French being an odd mixture of the two.
Are these researchers assuming a person's pronouns? What about all the others for the gender confused (or is it fluid?)? For some amusement-
https://www.youtube.com/watch?v=IzNGkwGYE4E
Of course if they don't like the English bias of Google Translate they could go and make their own translator that does whatever the hell they want. Naa, didn't think so.
"Google Translate typically uses English as a lingua franca to translate between other languages."
Clearly not designed by linguists, who would use etymons and flags for case, number, and gender. Really, the tool should be called "Google Transliterate", because that's all it's doing.
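A rough sketch of why pivoting through English matters for pronouns, assuming a deliberately dumb word-for-word lookup. The dictionaries, the function names and the Finnish to Spanish pairing are my own toy choices, nothing like the real system. The gender-neutral "hän" has to become something in English, and whatever gets picked there is what the final language inherits.

```python
# Toy word-for-word dictionaries, invented for illustration only.
# The Finnish pronoun "hän" is gender-neutral; this table arbitrarily maps it
# to "he", standing in for whatever choice the pivot step makes.
FI_TO_EN = {"hän": "he", "on": "is", "insinööri": "engineer"}
EN_TO_ES = {"he": "él", "she": "ella", "is": "es", "engineer": "ingeniero"}


def translate(sentence, table):
    """Word-for-word lookup; unknown words pass through unchanged."""
    return " ".join(table.get(word, word) for word in sentence.split())


def pivot_translate(sentence, to_pivot, from_pivot):
    """Translate via an intermediate (pivot) language in two hops."""
    return translate(translate(sentence, to_pivot), from_pivot)


source = "hän on insinööri"
print(translate(source, FI_TO_EN))                   # English pivot text
print(pivot_translate(source, FI_TO_EN, EN_TO_ES))   # final Spanish text
```

By the second hop there is no trace left that the source pronoun was neutral, so the target side cannot recover it however clever it is.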
Consider a simple English <-> Spanish example that vexed students in an adult language class in Spain that I attended. There are often no gender neutral ways to make simple declarations about gendered subjects in a non-awkward fashion.
I have three children/sons -> Tengo tres hijos
I have three daughters -> Tengo tres hijas
Performing a reverse translation in Google Translate
Tengo tres hijos -> I have three children
- the only alternate translation offered is "three kids" and not "three sons", potentially losing some detail
I have three children but no sons -> Tengo tres hijos pero no hijos
- which is rather confusing as it's the same as "I have three sons, but no sons", so you'd have to say something like
I have three children - only daughters -> Tengo tres hijos, solo hijas
And don't get me started on gendered professions