Machine translation

Google Translate versus gender bias

How does Google Translate handle gender-ambiguous input? With difficulty.

Google has been trying to “solve” gender bias in machine translation since as far back as 2018. That’s when they published their first blog post on the subject, and that’s when they first unveiled a feature in Google Translate which gives you not one but two translations for some queries. You can see it in action for yourself if you go to Google Translate and ask it to translate “I am a doctor” from English into Spanish. It will give you two translations, one with “male doctor” and one with “female doctor”. They call these gender-specific translations, and the feature is available in a handful of language pairs, both from English (notably English to Spanish) and into English (e.g. Turkish to English).

Gender-specific translation in Google Translate.

As you may know, Fairslator has a feature like this too, although in different language pairs (currently: English to French, German and Czech). So the interesting question is: how do the two compare? What’s the same, what’s different, and which is better? In this article I’m going to discuss how Google is “doing” gender-specific translations and talk about the differences between their approach and Fairslator’s.

Difference 1: how it works inside

How does Google’s gender-specific translation feature work inside? Fortunately, Google is not too secretive (unlike DeepL): they have published blog posts and academic papers which explain in broad terms how they have done it.

They have obviously experimented with different strategies and even changed course completely at least once. Their original solution from 2018 was based on the idea that you “pre-edit” your training data by inserting invisible gender tokens into the source-language side, and then you machine-learn your translator from that. The translator is then able to take an input, insert the same invisible gender tokens into it, and produce separate “masculine” and “feminine” translations.
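Just to make the idea concrete, here is a rough sketch of what such pre-editing could look like in code. This is purely my own illustration: the token names and the Python are mine, not Google’s.

    # A toy sketch of "gender token" pre-editing. The <2MASC>/<2FEM> tokens
    # are made up for illustration; Google's actual tokens and pipeline differ.

    def add_gender_token(source_sentence: str, gender: str) -> str:
        """Prefix the source sentence with an artificial gender token."""
        token = "<2FEM>" if gender == "feminine" else "<2MASC>"
        return f"{token} {source_sentence}"

    # At training time each sentence pair is labelled with the gender expressed
    # in its target side; at translation time the same input is run twice,
    # once with each token, producing two different outputs.
    for gender in ("masculine", "feminine"):
        print(add_gender_token("I am a doctor", gender))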

Then, sometime around the year 2020, they changed their mind and adopted a different strategy. They’re no longer pre-editing the training data. Instead, they allow their machine-learned models to produce gender-biased translations as before, and then they rewrite them. Rewriting is an additional step which happens after the translation is produced but before it is shown to the human user. It takes a target-language text which is gender-biased in one way and biases it the other way (for example, from male doctor to female doctor).
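In code, the rewriting step boils down to a function from target-language text to target-language text. The fragment below is only a toy version with a hard-coded Spanish word list of my own; Google’s real rewriter is machine-learned and handles agreement across the whole sentence.

    # A toy rewriting step: take a masculine-biased Spanish translation and
    # bias it the other way. Real rewriters must fix agreement everywhere.

    MASC_TO_FEM = {"médico": "médica", "el": "la", "un": "una"}

    def rewrite_as_feminine(translation: str) -> str:
        """Swap known masculine forms for their feminine counterparts."""
        return " ".join(MASC_TO_FEM.get(word, word) for word in translation.split())

    biased = "Soy médico"                 # what the base translator produced
    print(rewrite_as_feminine(biased))    # Soy médica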

If you know anything about Fairslator you will immediately notice that this, Google’s post-2020 strategy, is similar to how Fairslator does it. Fairslator, too, works by rewriting the output of machine translation. In fact, that’s how Fairslator is able to work with any machine translator (not just Google but also DeepL and others): it treats the translator as a black box and only looks at its output, scanning it for evidence of bias and rewriting it if necessary.

Reinflecting a translation with Fairslator.
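Sketched as a pipeline, the black-box approach looks something like this. The function names and the toy Spanish stand-ins below are mine, for illustration only; this is not Fairslator’s actual code.

    # The black-box pipeline: translate, scan the output for a gender choice,
    # and if one is found, offer the rewritten alternative as well.

    def translate(source: str) -> str:
        """Stand-in for any third-party MT engine (Google, DeepL, ...)."""
        return {"I am a doctor": "Soy médico"}.get(source, source)

    def find_gendered_word(target: str):
        """Scan the output for a word that carries an arbitrary gender choice."""
        return next((w for w in target.split() if w in {"médico", "médica"}), None)

    def reinflect(target: str) -> str:
        """Flip the gender of the flagged word (toy version)."""
        swap = {"médico": "médica", "médica": "médico"}
        return " ".join(swap.get(w, w) for w in target.split())

    output = translate("I am a doctor")
    if find_gendered_word(output):
        print([output, reinflect(output)])   # ['Soy médico', 'Soy médica']
    else:
        print([output])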

So yes, both Google and Fairslator have the same strategy: rewriting. In Fairslator I call it reinflection, but it’s more or less the same thing. The biggest difference, though, is that Google’s rewriting algorithms are machine-learned and probabilistic, while mine (Fairslator’s) are hand-coded and rule-based. I am not going to argue which is better or which produces better results, but consider this: Google’s approach probably requires large amounts of training data, a powerful machine-learning infrastructure and a big team of people (just look at the long lists of acknowledgements in their blog posts), while I was able to produce comparable results just by spending a few afternoons hand-coding reinflection rules on my laptop. I suspect that rewriting/reinflection is a task easily solvable by old-fashioned, rule-based, lexicon-powered tech, and that throwing machine learning at it is overkill. But, of course, this is just my personal hunch and I could be wrong. We won’t know until somebody has done a proper comparative evaluation.

I have noticed one important difference between Google and Fairslator, though. Google’s gender algorithms seem to spring into action only on simple, textbook-like sentences such as “I am a doctor”. If you give it a more complex real-world sentence where the gender-ambiguous word is somewhere deep down the syntax tree, it seems to give up and returns only a single (biased) translation. Fairslator doesn’t give up on complex sentences. Try it with something like this: “The nurse kept on trying to save the patient’s life even after the doctor gave up.”

Difference 2: the user experience

When Google Translate gives you two translations instead of one, the translations are shown side by side and labelled “feminine” and “masculine”. I wouldn’t be so sure that everybody always understands what these labels mean. These are technical terms from linguistics. In a sentence such as “How do you like your new doctor?”, is it clear that “feminine” means “use this translation if you mean a female doctor”? A linguistically naive user (that is, most people) might not be sure.

Another potential problem I see is that, for somebody who doesn’t speak or read the target language, it is not obvious who the “masculine” and “feminine” labels apply to. The doctor, the ‘you’, or perhaps myself when I say the sentence? All that the label is saying is that “there is something masculine” or “there is something feminine” about this sentence – but what?

Fairslator asks human-understandable disambiguation questions.

Fairslator is more, let’s say, informative about these things. Fairslator helps you choose the right translation by asking disambiguation questions which reuse words from the source-language sentence, such as “Who is the doctor? A man or a woman?” With questions like these, we are telling the human user exactly who the gender distinction applies to, and we are saying it with everyday words the user is likely to understand: “man” and “woman”, not “masculine” and “feminine”.
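The mechanics of this can be illustrated with a few lines of code. The question wording mirrors Fairslator’s, but the code below is only my own sketch, not the actual implementation.

    # A toy illustration of turning a detected ambiguity into a plain-language
    # disambiguation question that reuses a word from the source sentence.

    def disambiguation_question(ambiguous_word: str) -> str:
        """Phrase the gender question in everyday words, not linguistic jargon."""
        return f"Who is the {ambiguous_word}? A man or a woman?"

    source = "How do you like your new doctor?"
    # Suppose analysis of the source sentence has identified "doctor" as the
    # word whose translation depends on an arbitrary gender choice:
    print(disambiguation_question("doctor"))   # Who is the doctor? A man or a woman?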

I don’t know why Google has decided to give their users such a spartan, uninformative user experience. But maybe they haven’t exactly decided that; maybe it’s a consequence of the technology they have under the hood. Their algorithms for detecting and rewriting biased translations have probably been trained to know that there is something feminine or masculine about the sentence, but they don’t know what it is. To be able to machine-learn such things would require a training dataset of a kind that doesn’t exist. Fairslator, on the other hand, is not dependent on training data. Fairslator consists of hand-coded rules which are able to discover exactly the information that Google is missing: who the gender distinction applies to.

Conclusion

I know what you’re thinking: the Fairslator guy has written an article comparing Google Translate and Fairslator, and surprise surprise, Fairslator comes out as the winner. So let me summarize what I’m claiming and what I’m not.

I’m not claiming that Fairslator’s rule-based algorithm has better coverage, better precision or better whatever metric, compared to Google Translate’s machine-learning approach. All I’m saying is that Fairslator’s performance seems comparable to Google’s on simple sentences, and seems to exceed Google’s on complex sentences. The final verdict would have to come from a comprehensive evaluation, which I haven’t done (yet).

What I am claiming is that Fairslator provides a better user experience. It puts easy-to-understand disambiguation questions to the users, guiding them towards the correct translation even if they don’t speak the target language. This is a subjective judgement, but I really do think Google has underestimated the importance of the user experience here.

Google is obviously very interested in gender bias, if their publicly visible research output is anything to go by. They talk about it often at NLP conferences and they’ve published quite a few papers on the subject. But the tech they’ve developed so far is not fully baked yet.


What next?

Read more about bias and ambiguity in machine translation.
Download the Fairslator whitepaper: We need to talk about bias in machine translation.

Fairslator blog

| Infographic
How gender rewriting works in machine translation
This is how Fairslator deals with gender-biased translations.
| Announcement
Introducing the Fairslator API
Like what Fairslator does? Want to have something similar in your own application? There's an API for that!
| Gender-fair language
Can gender-inclusive writing be automated?
Just scatter gender stars everywhere and you’re done? Far from it. Writing in a gender-fair way takes creativity above all.
| Oh là là
Three reasons why you shouldn’t use machine translation for French
But if you must, at least run it through Fairslator.
| From English to Irish
‘You’ singular, ‘you’ plural, and machine translation from English
Let’s turn our backs on the misconception that there is only one translation for everything.
| Machine translation
Finally, an Irish translation app that knows the difference between ‘tú’ and ‘sibh’
It asks you how you want to translate ‘you’.
| Forms of address
Why machine translation has a problem with ‘you’
This innocent-looking English pronoun is surprisingly difficult to translate into other languages.
| Male and female
10 things you should know about gender bias in machine translation
Machine translation is getting better all the time, but the problem of gender bias remains. Read these ten questions and answers if you want to understand all about it.
| Machine translation in Czech
Finally, a translation app that knows the difference between Czech ‘ty’ and ‘vy’!
Wouldn’t it be nice if machine translation asked how you want to translate ‘you’?
| Machine translation
Imagine you’re a machine that translates
Why do translation apps never ask us what we mean?
| Gender bias in machine translation
Gender versus Czech
In Czech we don’t say ‘I am happy’, we say ‘I as a man am happy’ or ‘I as a woman am happy’.
| Machine translation
Imagine you’re DeepL
Why doesn’t the translator actually ask what I mean?
| German machine translation
Finally, a translation app that knows the difference between German ‘du’ and ‘Sie’!
Wouldn’t it be nice if machine translation asked how you want to translate ‘you’?

Fairslator timeline

Coming up: October 2024 — We will be talking about bias in machine translation at a Translating Europe Workshop organised by the European Commission in Prague as part of Jeronýmovy dny, a series of public lectures and seminars on translation and interpreting.
September 2024 — We are presenting a half-day tutorial on bias in machine translation at this year's biennial conference of AMTA, the Association for Machine Translation in the Americas.
December 2023 — Fairslator presented a workshop on bias in machine translation at the European Commission's Directorate-General for Translation, attended by translation-related staff from all EU institutions.
November 2023 — Fairslator went to Translating and the Computer, an annual conference on translation technology in Luxembourg, to present its brand new API.
November 2023 — We talked about gender bias, gender rewriting and Fairslator at the EAFT Summit in Barcelona, where we also launched an exciting spin-off project: Genderbase, a multilingual database of gender-sensitive terminology.
November 2023 — English–French language pair added to the Fairslator API.
July 2023 — The Fairslator API was launched. Explore the API or read the announcement: Introducing the Fairslator API »
February 2023 — We spoke to machinetranslation.com about bias in machine translation, about Fairslator, and about our vision for “human-assisted machine translation”. Read the interview here: Creating an Inclusive AI Future: The Importance of Non-Binary Representation »
October 2022 — We presented Fairslator at the Translating and the Computer (TC44) conference, Europe's main annual event for computer-aided translation, in Luxembourg. Proceedings from this conference are here; the paper that describes Fairslator starts on page 90. Read our impressions from TC44 in this thread on Twitter and Mastodon.
September 2022 — In her article Error sources in machine translation: How the algorithm reproduces unwanted gender roles (German: Fehlerquellen der maschinellen Übersetzung: Wie der Algorithmus ungewollte Rollenbilder reproduziert), Jasmin Nesbigall of oneword GmbH talks about bias in machine translation and recommends Fairslator as a step towards more gender fairness.
September 2022 — Fairslator was presented at the Text, Speech and Dialogue (TSD) conference in Brno.
August 2022 — Translations in London are talking about Fairslator in their blog post Overcoming gender bias in MT. They think the technology behind Fairslator could be useful in the translation industry for faster post-editing of machine-translated texts.
August 2022 — A fourth language pair released: English → French.
July 2022 — Germany's Goethe-Institut interviewed us for the website of their project Artificially Correct. Read the interview in German: Wenn die Maschine den Menschen fragt, or in English: When the machine asks the human, or see this short video on Twitter.
May 2022 — Slator.com, a website for the translation industry, asked us for a guest post and of course we didn't say no. Read What You Need to Know About Bias in Machine Translation »
April 2022 — A third language pair added: English → Irish.
February 2022 — Fairslator launched with two language pairs: English → German, English → Czech.