Languages - how to decide which ones and how many?

You’re developing a cool new technology product, and you want it to reach users in emerging markets, or perhaps you’re targeting a specific region of the world. Everyone understands that the product needs to support languages other than English, but the question is: which ones, and how many? In this chapter, we will look at how to analyze which languages you need.
Shop signs, Addis Ababa, Ethiopia.
Ethiopia is a highly multilingual country, but only Amharic and English are used for writing purposes.

Languages are one of the most complex and fascinating topics in dealing with emerging markets. Indeed, the language landscape and the complicated interaction between languages in many African and Asian countries are profoundly different from those in the West (luckily, Latin America is far simpler from a language perspective).

There is no easy or generic answer to these questions. It mostly depends on the dynamics in each specific target country, on the target demographic of the product (e.g. urban vs. rural, middle class vs. low income), and, very importantly, on whether the product relies on written text (for example, a website) or spoken language (for example, an Interactive Voice Response (IVR) system).

Let’s start with the basics: according to Ethnologue, the world has over 6,000 living languages. About 300 of them are large, with over 1M native speakers each. Some countries are immensely multilingual, typically those whose geography historically supported a large number of reasonably isolated communities. The most notable examples are DR Congo, with 300 languages; Indonesia, with about 700; and Papua New Guinea, whose 7 million people speak 750 languages between them. These are extreme cases, but most African and Asian countries have 3-10 languages or more.

In my travels to over 60 countries, I have noticed a few recurring themes related to languages that I think are important to understand:

(1) Disconnected remote areas are a thing of the past. Well over 99% of the world’s population, even those living in remote areas of poor countries with little infrastructure, are culturally and linguistically connected to the country’s core, usually meaning the capital city. A couple of years ago I was visiting a remote village in Uganda for a local celebration. By chance, I met an Israeli teenager who was spending her school holiday with her father, who works in Uganda. When I asked her about her first impression of Africa, her response stunned me: “this place is so strange... there is no running water, no electricity, but there is Internet everywhere!” It stunned me because it was spot-on, and yet it had never occurred to me that this was strange. It takes surprisingly little time to get used to checking the news on one’s smartphone, or IMing with colleagues on the other side of the planet, while the phone is charging from a stand-alone car battery in a village (i.e. just a battery without a car). The telecommunication infrastructure, mostly in the form of Edge (2.5G) connectivity, reaches far beyond paved roads or electricity. In fact, it is becoming increasingly hard to find any significant human settlement anywhere in the world today that doesn’t have at least Edge connectivity, and where gaps exist, they are closing fast.

According to the International Telecommunication Union (ITU), in 2011, 90% of the world’s population had 2G phone coverage at home (which almost always includes Edge or at least GPRS), and half of them had 3G coverage. The numbers today are surely even higher. So for almost everyone on the planet, a phone is never too far away for catching up with relatives in the city. Satellite TV may not be installed in most rural homes (no electricity...), but in all likelihood it can be found in the village bar or a local restaurant, bringing programs in the national language to every corner of the country. Relatives coming home to the village from the city for national holidays and family events bring with them knowledge about smartphones, VoIP, social networks and mobile money. A small but not insignificant number of young people in rural areas already use the Internet, either at Internet cafes when visiting the nearby upcountry town or on their phones.

In other words, it is increasingly hard to find anyone in the world who lacks frequent contact with people, such as friends and relatives, in their country’s capital city.

This frequent access to the country’s cultural center, and in particular access to national TV, brings fluency in the national language. Indeed, in remote rural areas in many countries in Africa and Asia, one can see a very clear generational gap between the older generation that is monolingual in their ethnic language, and the younger generation that is fluent in the national language in addition to their mother tongue.

(2) In multilingual countries, most people are multilingual. A few months ago I was negotiating the purchase of some traditional ceremonial items at a market in Goma, in the rebel-infested region of eastern Congo, with the generous help of a Rwandan friend. The Congolese seller was effortlessly switching, within a single sentence, between four languages (French, English, Kiswahili and Kinyarwanda), trying to negotiate a common language for the three of us. I speak lousy French and very basic Kiswahili; my friend speaks Kiswahili, Kinyarwanda and English; and the seller speaks French and Kiswahili and very basic English and Kinyarwanda. It was fascinating to see how the conversation evolved by picking a language, almost on a word-by-word basis, that would be intelligible to all three of us: try the Kiswahili word; if that didn’t work, switch to the French word, and so on. The negotiation took more than half an hour, partly because I was thoroughly enjoying the linguistic side of the discussion!

This type of conversation is the norm rather than the exception in Africa. When people of different ethnic groups meet, they first try to identify a bridge language, and usually the national language of the country will do. If that doesn’t work, for example because, as in our case, the speakers are from different countries, each person draws on the many languages they can speak at least somewhat to find a common vocabulary. It helps that most Africans and many Asians (most notably Indians) are highly multilingual. I rarely meet an African or an Indian who doesn’t speak at least 3 or 4 languages, at least at a basic level, and often as many as 6 or 8. This includes poor rural people.

(3) The language of instruction chosen for public high-school education becomes the primary written language. The choice of the language of instruction is a contested political topic in some countries, touching on the relationship between the various ethnic groups, the effort to build a unified national identity that supersedes ethnic identities, class (or caste) related perceptions of the various languages, as well as pragmatic considerations such as the availability of textbooks and the desire to be prepared for the age of globalization (an argument for English). For example, in South Africa in the 1970s, a dispute over the language of instruction in schools for Black students (the government ordered it to be Afrikaans, while the people wanted it to be English) triggered the Soweto uprising of 1976, which was very significant in the struggle against Apartheid.

As it turns out, the language of instruction in high school becomes the primary written language of its graduates. By contrast, the language of instruction in primary school, as well as the graduates’ preferred spoken language (usually their mother tongue), has almost no influence on their choice of primary written language.

Some countries (not many) have public high schools that teach in different languages to serve different demographics. For example, some countries offer different high-school languages of instruction to different ethnic groups (as in many multi-ethnic Central Asian countries, as well as Israel). In India, socioeconomic status is the main factor that determines the language of instruction: middle-class Indians tend to prefer schools in which English is the language of instruction, whereas working-class and rural Indians tend to prefer schools that teach in the regional language. In these countries, different populations end up with different primary written languages. For example, in my home country, Israel, the Arab minority mostly prefers schools in which Arabic is the language of instruction, and ends up with Arabic as its primary written language, while the majority of the population uses Hebrew as its primary written language.

Newspaper vendor in Antananarivo, Madagascar.
Newspapers exist in both French and Malagasy, reflecting the country’s shifting policy on the language of instruction.

The bottom line is that knowing the language of instruction for public high schools, and how it varies across demographics within the country, is the surest way to guess which language(s) the country uses for writing, which may be very different from the languages used for speaking. This is a critical piece of information for products based on written text, such as most Internet services.

(4) The false attractiveness of large languages. A language having a large number of native speakers doesn’t make it an important target language for a technology product. For example, Zulu in South Africa, Yoruba in Nigeria, Lingala in DR Congo and Javanese in Indonesia all have many millions of speakers. So one might be forgiven for assuming that it would be prudent to support these languages in order to reach these populations.

This is false, at least for products based on written text.

I once asked a Muganda friend (i.e. a member of the Baganda, the largest ethnic group in Uganda, after whom the country is named), whose native and everyday language is Luganda but who was schooled in English, to read aloud to me an article from Bukedde, a Luganda-language Ugandan newspaper. She struggled, reading the words letter by letter, as she was not used to seeing her language in written form. So this woman, despite being a native Luganda speaker, would definitely prefer to use any technology product in English rather than Luganda.

A bookstore in Blantyre, Malawi - a very small Chichewa-language section in an otherwise all-English bookstore. Chichewa, the most widely spoken language in Malawi, is not its primary written language (English is)

How can we make sense of this? These are oral languages: when their speakers write, they write in a different language, namely the language of instruction in their high-school system (which happens to be mostly English in South Africa, Nigeria and Uganda, French in DR Congo and Indonesian in Indonesia). While there may be some ethnic “nationalists” who make a point of writing in their native language, they are few and far between. Most people are pragmatic and will use the language they were taught at school, which is also the most useful for communication, since everyone else was taught in it too.

This explains why, among native speakers of Zulu, Yoruba, Lingala or Javanese, only a tiny subset will prefer to use your product in their native language, while the vast majority will likely opt for the national language.

(5) Languages as registers of speech. On my very first visit to Nairobi many years ago, I was sitting in a restaurant and noticed that the waiters were talking among themselves in Kiswahili, except with one guy, a Kenyan just like them, with whom they were conversing in English. This was very puzzling to me... surely this guy was fluent in Kiswahili too? When I enquired, I found out that he was their manager. So in that specific culture (working-class employees at a nice restaurant in downtown Nairobi), English was perceived as the appropriate language for formal conversations, such as talking to one’s boss, while Kiswahili was the language for informal conversations with friends or colleagues of equal rank.

All languages have registers of speech: different words and phrases one would use to address people of different social statuses, or in different circumstances. Some languages, such as Thai and Khmer, have formalized registers: one learns in school the different verbs and phrases used to address lower-class (or younger) people, equal-class people, monks and royalty. Interestingly, in multilingual societies, languages themselves can play the role of speech registers, as in the example above (English as a formal register, Kiswahili as an informal register).

Perhaps an extreme example of languages playing the role of registers of speech is Mauritius. The native language of all Mauritians, regardless of race, is Mauritian Creole, a language that reflects the history of the nation: a mix of people who originated from India, Africa, Europe and Madagascar. Linguistically, Mauritian Creole is as good as any other language. Socially, however, middle-class Mauritians feel that Creole is appropriate for casual conversations, but in a formal setting the appropriate language is Standard French, and speaking Creole would be inappropriate and convey low social status. Furthermore, when writing an official letter, many Mauritians will opt for English, which is perceived as more official than French. So, with some simplification, one could say that in Mauritius, Creole is the informal register, French is the formal register and English is the official register.

Why does all this matter? Because in a multilingual environment, the choice of language for your product conveys brand value. For example, is this product “official” and “formal”, designed for professional purposes? Or is it “casual”, designed for social purposes?

(6) Languages as conveyors of social status. I remember a conversation I had with an Indian friend in Pune. This guy grew up in poverty and, through sheer talent, hard work and luck, managed to make his way up, get a top education and land a nice job at a multinational firm. I was discussing with him ways of getting technology to poor Indians, who make up the large majority of the Indian population and most of whom do not speak English. I was really confused by my friend’s strong resistance to localizing cool technology products into local languages; it sounded completely irrational, and he was getting increasingly emotional about it. Reading between the lines, it emerged that he perceived being fluent in English and tech-savvy as a hard-earned social status, akin to a poor person who achieves success and spends every last penny on a Porsche, hoping to assert a new social status. Essentially, my friend intuitively felt that making his hard-earned achievements available to the masses would devalue his social assets. The last thing our Porsche owner wants is for his less successful friends in the neighborhood to suddenly have the same car.

While this friend was particularly vocal in his resistance, perhaps a testimony to his own insecurity about the permanence of his hard-earned social status, I have noticed the same attitude in subtler forms across broad segments of the affluent and middle-class demographics. Essentially, people use language, sometimes without being aware of it, to exclude people of lower social status. This is one of the reasons why, in many South Asian and African countries, publications that wish to convey prestige and signal the type of audience they target, whether online or offline (including spoken media such as certain TV programs), are in English or French rather than the more widely spoken national language.

Being aware of this subtlety is important, because in certain countries your choice of language will convey brand value: is your product “exclusive” and “aspirational”? Or is it “friendly”, “unpretentious”, “meant for everyone”?

So, how many languages do I need?

Fortunately for us product managers, with very few exceptions, countries base their education systems on their national or regional official language(s). When you recognize that, of the 193 UN member states, 104 have English, French or Spanish as one of their national languages, you might realize that only a small fraction of the world’s 6,000 or so languages are used for writing; the rest are used only for speaking.

If your product is essentially textual and you’re targeting the literate population (most Internet services fall into this category), then supporting the world’s 80 or so primary written languages is all you need.

There are some special cases though:

If your product is essentially speech-based, and especially if it targets the rural population or the urban poor, you might need to support regional languages. Still, it is worth selecting these languages wisely and being aware of the population’s multilingualism, which presents an opportunity to reduce the number of supported languages. For example, DR Congo has 300 languages, more or less corresponding to the ethnic groups that make up the country. The official language, French, is generally spoken by urban people but not by poor rural people. However, almost all Congolese, no matter how poor, are fluent, in addition to their mother tongue, in one of the four regional bridge languages: Lingala, Kikongo, Kiswahili or Chiluba. So for a rural-targeting speech-based product to cover Congo’s population, it is sufficient to support just four languages, not 300. Nevertheless, if your product targets a very specific region or ethnic group, investing in more specific languages might pay dividends in the form of increased adoption and loyalty among the targeted ethnic groups. People often feel warm and fuzzy towards a technology in their native language, even if they are fluent in the national language or the regional bridge language, in particular because it is extremely rare for technology products to support these languages.
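The reduction from 300 languages to four is essentially a coverage problem: choose as few languages as possible such that every community speaks at least one of them. If you can obtain or estimate which languages each community in your target country is fluent in, a greedy set-cover heuristic is one simple way to explore this. The Python sketch below is a minimal illustration under that assumption, not a method described in this chapter: the function name `pick_languages`, the group labels and the fluency data are all hypothetical placeholders (only the four bridge-language names come from the DR Congo example above), and greedy selection finds a small set of languages, not necessarily the smallest one.

```python
from typing import Dict, List, Set


def pick_languages(fluency: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover (hypothetical helper): repeatedly pick the language
    that serves the most not-yet-covered communities, until every community
    speaks at least one chosen language.

    `fluency` maps a community label to the set of languages its members
    speak fluently (mother tongue plus any bridge languages).
    """
    uncovered = set(fluency)
    chosen: List[str] = []
    # Sorted so that ties are broken deterministically (alphabetically).
    all_languages = sorted(set().union(*fluency.values()))
    while uncovered:
        # Count how many still-uncovered communities each language would serve.
        best = max(
            all_languages,
            key=lambda lang: sum(1 for c in uncovered if lang in fluency[c]),
            default=None,
        )
        newly_covered = {c for c in uncovered if best in fluency[c]}
        if not newly_covered:
            # Some community shares no language with any candidate.
            raise ValueError(f"cannot cover: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= newly_covered
    return chosen


# Hypothetical data: each group speaks its own local language plus one
# regional bridge language (group_7 also speaks French).
example = {
    "group_1": {"local_1", "Lingala"},
    "group_2": {"local_2", "Lingala"},
    "group_3": {"local_3", "Kikongo"},
    "group_4": {"local_4", "Kiswahili"},
    "group_5": {"local_5", "Kiswahili"},
    "group_6": {"local_6", "Chiluba"},
    "group_7": {"local_7", "Lingala", "French"},
}

if __name__ == "__main__":
    print(pick_languages(example))
```

On this made-up data, the sketch selects the four bridge languages rather than seven local ones. A real analysis would also want to weight communities by population and account for partial fluency; the sketch ignores both.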

An interesting technical problem for products targeting only the national language or a regional bridge language is that many people (in some countries, most people) speak these languages poorly. They will regularly use simplified, nonstandard grammar and many loanwords from other languages, and of course will have an accent reflecting their mother tongue. As in the example of the Congolese vendor earlier in this chapter, sentences mixing several languages are, in many cases, the norm rather than the exception.

In this chapter, I have touched on just the tip of the iceberg of the complexity of languages. Here is the key take-away:

For text-based applications targeting the literate population, it is sufficient to support only the Primary Written Languages of the target countries.


