Forms of address

Why machine translation has a problem with ‘you’

This innocent-looking English pronoun is surprisingly difficult to translate into other languages.

Have you ever stopped to think how weird it is that the English pronoun ‘you’ can be both singular and plural? In sentences such as ‘you are here’ or ‘are you happy?’ it can refer either to one person or to several people, and it isn’t clear which meaning is intended: you simply have to be there while it’s being said, otherwise you don’t know.

This is unusual. Most other English words that refer to people or things have separate singular and plural forms. Practically all nouns do (‘friend’ and ‘friends’, ‘house’ and ‘houses’) with only a few exceptions (don’t ask me why the plural of ‘fish’ is ‘fish’). All other pronouns do too: we have ‘I’ versus ‘we’ in the first person and ‘he’/‘she’ versus ‘they’ in the third person – but in the second person we only have ‘you’.

This is one of the ways in which English is the odd one out among European languages. Most other languages have separate pronouns for singular and plural ‘you’. French has singular ‘tu’ and plural ‘vous’, German ‘du’ and ‘ihr’, Czech ‘ty’ and ‘vy’, Irish ‘tú’ and ‘sibh’, and so on.

Who are you to me?

And that’s not the end of it. Another thing that is very common in European languages is that, when addressing people with second-person pronouns, you have a choice between two levels of formality: a casual level and a polite level. In French, ‘tu’ is for addressing a single person, but only if you’re addressing him or her casually, such as when it’s a friend of yours. If it’s, let’s say, a random stranger you’ve just bumped into in the street or a customer in your restaurant, then you need to address them using the polite version of ‘tu’, which is... wait for it... ‘vous’! Yes, ‘vous’ actually has two meanings in French: in the singular it’s a polite version of ‘tu’, while in the plural it’s just the plural. Complicated? Let’s summarise it in a table.

French    singular    plural
casual    tu          vous
polite    vous        vous

There is a metaphor behind all this. If you’re talking to somebody where the polite form of address is needed, you do it as if you were talking to a group of people. Many European languages have this metaphor built in, including Romance languages such as French and Italian, and Slavic languages such as Czech and Russian. (Historical interlude: English used to have this metaphor too: the singular casual pronoun was ‘thou’ and the plural-or-polite one was ‘you’. Somehow ‘thou’ fell out of use and ‘you’ expanded to take its place.) An exception is German, which has a different metaphor: talking to somebody politely is like talking about a group of people in the third person (‘they’). So German has not two but three words for ‘you’: two casual ones (singular ‘du’, plural ‘ihr’) and one polite one (‘Sie’ for both singular and plural, always written with a capital S). OK, this is getting complicated again, so let’s summarise it in another table.

German    singular    plural
casual    du          ihr
polite    Sie         Sie
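
If you like to think in code, the two tables boil down to a lookup keyed by number and formality. Here is a minimal Python sketch of that idea. The names `PRONOUNS` and `pronoun_for` are made up for this illustration and have nothing to do with how any real translation system stores its data.

```python
# Second-person pronouns from the French and German tables,
# keyed by (number, formality). Layout and names are illustrative only.
PRONOUNS = {
    "fr": {
        ("singular", "casual"): "tu",
        ("singular", "polite"): "vous",
        ("plural", "casual"): "vous",
        ("plural", "polite"): "vous",
    },
    "de": {
        ("singular", "casual"): "du",
        ("singular", "polite"): "Sie",
        ("plural", "casual"): "ihr",
        ("plural", "polite"): "Sie",
    },
}


def pronoun_for(language: str, number: str, formality: str) -> str:
    """Return the equivalent of 'you' for a language, number and formality."""
    return PRONOUNS[language][(number, formality)]


print(pronoun_for("fr", "singular", "polite"))  # vous
print(pronoun_for("de", "plural", "casual"))    # ihr
```

Notice that the function cannot return anything until it has been told both the number and the formality. English ‘you’ supplies neither.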

Context and where you might find it

All this means that translating any sentence with ‘you’ in it into another language is not as straightforward as people might think. You need to know who you’re talking to and what your mutual relationship is. In other words, you need to have some context.

Well, context... what is it, anyway? In translation studies, context is a fancy word for “information you need to know in order to understand something which was not said explicitly”. Context can come in two forms: it can be inside the text or outside the text.

An example of context inside the text is the sentence ‘you know it yourself’. Here the word ‘yourself’ is giving you the information you need to figure out that ‘you’ is singular (if it were plural it would be ‘yourselves’). Similarly, in the sentence ‘dude what have you done?’ the word ‘dude’ is informal (and singular) and that’s how you can figure out that the casual form of address is appropriate here: ‘you’ should be translated as ‘tu’ rather than ‘vous’, as ‘du’ rather than ‘Sie’. All these clues are examples of context being available right there inside the text.
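
To make the idea of in-text clues concrete, here is a toy Python sketch that looks for the three clue words from the examples above. The `CLUES` list and the `find_clues` function are assumptions invented for this illustration: a real system would learn such signals from data rather than from a hand-written word list.

```python
import re

# Words that give away number or formality. This tiny list only covers
# the examples from the text; it is an assumption, not a real lexicon.
CLUES = {
    "yourself": {"number": "singular"},
    "yourselves": {"number": "plural"},
    "dude": {"number": "singular", "formality": "casual"},
}


def find_clues(sentence: str) -> dict:
    """Collect whatever the sentence itself reveals about 'you'."""
    found = {}
    for word in re.findall(r"[a-z']+", sentence.lower()):
        found.update(CLUES.get(word, {}))
    return found


print(find_clues("you know it yourself"))      # {'number': 'singular'}
print(find_clues("dude what have you done?"))  # {'number': 'singular', 'formality': 'casual'}
print(find_clues("you are here"))              # {}
```

The last call comes back empty: the sentence contains no clue words at all.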

But consider this: what if there are no such clues in the text? Take a sentence such as ‘you are here’. There are no clues inside the sentence to help you figure out whether the ‘you’ should be translated as singular or plural, as casual or polite. That’s when you need to look for context outside the text: you need to know who is saying it to whom, look at the people, understand what their relationship is, and if you don’t know, ask. That’s why good translators ask follow-up questions: they ask ‘how exactly do you mean that?’ before they give you a translation.

Good translators, bad translators

That’s what human translators do. But what about machine translation? Machine translators such as Google Translate and DeepL are getting better all the time at understanding context when it’s inside the text. So, in a sentence such as ‘you know it yourself’, the presence of ‘yourself’ is usually enough to tip a machine translator towards the singular reading of ‘you’, and ‘yourselves’ towards the plural. Machines weren’t always capable of picking up on hints from context like this, but technology is making fast progress and now machines are almost as good at it as humans.

The final frontier is context outside the text. Machine translators don’t have eyes and ears: they can’t really see what’s out there, who the people are, what their relationships are. Type a sentence like ‘you are here’ into Google Translate or DeepL: the text you have typed is all the machine has. How then is the machine supposed to know whether the ‘you’ is singular or plural, casual or polite? There are absolutely no clues inside the text, and whatever clues may exist outside the text are inaccessible. In a situation like this, machines usually make an assumption, based on what they have seen more frequently in their training data. But that assumption is biased: the machine has no good reason to believe it knows what you mean by ‘you’, because you haven’t told it and it hasn’t asked you.

Can this final frontier be breached? Not with conventional methods. No AI, however intelligent, can ever guess the unguessable. So, over here in the Fairslator project, we’re doing it differently: we’re developing translation technology which knows when to ask for outside context. If you type a sentence such as ‘you are here’ into Fairslator and if you ask for a translation into a language where there are multiple equivalents of ‘you’, it will ask: who is this ‘you’? Is it one person or many? Are we addressing them casually or politely? Based on your answers Fairslator will reinflect the translation with the correct pronoun. Because why assume if you can ask?
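
The ask-instead-of-assume idea can be sketched in a few lines of Python. To be clear, this is not how Fairslator is actually implemented: `resolve_you`, `QUESTIONS`, `GERMAN_YOU` and `scripted_user` are all hypothetical names invented for this illustration, and the sketch only picks a German pronoun instead of reinflecting a whole sentence.

```python
from typing import Callable

# Questions to put to the user when the text itself doesn't settle the matter.
QUESTIONS = {
    "number": ("Is 'you' one person or many?", ("singular", "plural")),
    "formality": ("Are we addressing them casually or politely?", ("casual", "polite")),
}

# German equivalents of 'you', as in the German table earlier in the article.
GERMAN_YOU = {
    ("singular", "casual"): "du",
    ("plural", "casual"): "ihr",
    ("singular", "polite"): "Sie",
    ("plural", "polite"): "Sie",
}


def resolve_you(clues: dict, ask: Callable[[str, tuple], str]) -> str:
    """Pick the German pronoun, asking only about what the clues leave open."""
    facts = dict(clues)
    for feature, (question, options) in QUESTIONS.items():
        if feature not in facts:
            answer = ask(question, options)
            if answer not in options:
                raise ValueError(f"expected one of {options}, got {answer!r}")
            facts[feature] = answer
    return GERMAN_YOU[(facts["number"], facts["formality"])]


# A stand-in for a real user who always means one person, politely.
def scripted_user(question: str, options: tuple) -> str:
    return "singular" if "one person" in question else "polite"


# No clues in the text: both questions get asked.
print(resolve_you({}, scripted_user))  # Sie
# The text already settles everything: nothing is asked.
print(resolve_you({"number": "plural", "formality": "casual"}, scripted_user))  # ihr
```

The design choice worth noticing is that the function never falls back on a default. Whatever the text leaves open becomes a question for the human.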


What next?

Read more about bias and ambiguity in machine translation.
We need to talk about bias in machine translation: The Fairslator whitepaper

Fairslator blog

How gender rewriting works in machine translation
This is how Fairslator deals with gender-biased translations.
Introducing the Fairslator API
Like what Fairslator does? Want to have something similar in your own application? There's an API for that!
Google Translate versus gender bias
How does Google Translate handle gender-ambiguous input? With difficulty.
Kann man das Gendern automatisieren?
Überall Gendersternchen verstreuen und fertig? Von wegen. Geschlechtergerecht zu texten, das braucht vor allem Kreativität.
Three reasons why you shouldn’t use machine translation for French
But if you must, at least run it through Fairslator.
Tusa, sibhse agus an meaisínaistriúchán ó Bhéarla
Tugaimis droim láimhe leis an mhíthuiscint nach bhfuil ach aon aistriúchán amháin ar gach rud.
Finally, an Irish translation app that knows the difference between ‘tú’ and ‘sibh’
It asks you how you want to translate ‘you’.
10 things you should know about gender bias in machine translation
Machine translation is getting better all the time, but the problem of gender bias remains. Read these ten questions and answers if you want to understand all about it.
Finally, a translation app that knows the difference between Czech ‘ty’ and ‘vy’!
Wouldn’t it be nice if machine translation asked how you want to translate ‘you’?
Finally, a translation app that knows the difference between German ‘du’ and ‘Sie’!
Wouldn’t it be nice if machine translation asked how you want to translate ‘you’?
Gender versus Czech
In Czech we don’t say ‘I am happy’, we say ‘I as a man am happy’ or ‘I as a woman am happy’.
Představ si, že jseš stroj, který překládá
Proč se překladače nikdy neptají, jak to myslíme?
Stell dir vor, du bist DeepL
Warum fragt der Übersetzer eigentlich nicht, was ich meine?

Fairslator timeline

October 2024 — We talked about bias in machine translation at a Translating Europe Workshop organised by the European Commission in Prague as part of Jeronýmovy dny, a series of public lectures and seminars on translation and interpreting. Video here »
September 2024 — We presented a half-day tutorial on bias in machine translation at the 2024 biennial conference of AMTA, the Association for Machine Translation in the Americas.
December 2023 — Fairslator presented a workshop on bias in machine translation at the European Commission's Directorate-General for Translation, attended by translation-related staff from all EU institutions.
November 2023 — Fairslator went to Translating and the Computer, an annual conference on translation technology in Luxembourg, to present its brand new API. Proceedings from this conference are here; our paper starts on page 98.
November 2023 — We talked about gender bias, gender rewriting and Fairslator at the EAFT Summit in Barcelona, where we also launched an exciting spin-off project: Genderbase, a multilingual database of gender-sensitive terminology.
November 2023 — English–French language pair added to the Fairslator API.
July 2023 — The Fairslator API was launched. Explore the API or read the announcement: Introducing the Fairslator API »
February 2023 — We spoke to machinetranslation.com about bias in machine translation, about Fairslator, and about our vision for “human-assisted machine translation”. Read the interview here: Creating an Inclusive AI Future: The Importance of Non-Binary Representation »
October 2022 — We presented Fairslator at the Translating and the Computer (TC44) conference, Europe's main annual event for computer-aided translation, in Luxembourg. Proceedings from this conference are here; the paper that describes Fairslator starts on page 90. Read our impressions from TC44 in this thread on Twitter and Mastodon.
September 2022 — In her article Error sources in machine translation: How the algorithm reproduces unwanted gender roles (German: Fehlerquellen der maschinellen Übersetzung: Wie der Algorithmus ungewollte Rollenbilder reproduziert), Jasmin Nesbigall of oneword GmbH talks about bias in machine translation and recommends Fairslator as a step towards more gender fairness.
September 2022 — Fairslator was presented at the Text, Speech and Dialogue (TSD) conference in Brno.
August 2022 — Translations in London are talking about Fairslator in their blog post Overcoming gender bias in MT. They think the technology behind Fairslator could be useful in the translation industry for faster post-editing of machine-translated texts.
August 2022 — A fourth language pair released: English → French.
July 2022 — Germany's Goethe-Institut interviewed us for the website of their project Artificially Correct. Read the interview in German: Wenn die Maschine den Menschen fragt or in English: When the machine asks the human, or see this short video on Twitter.
May 2022 — Slator.com, a website for the translation industry, asked us for a guest post and of course we didn't say no. Read What You Need to Know About Bias in Machine Translation »
April 2022 — A third language pair added: English → Irish.
February 2022 — Fairslator launched with two language pairs: English → German, English → Czech.