The Challenges of Handling Proverbs in Malay-English Machine Translation – a research paper

This paper was presented at the 14th International Conference on Translation 2013 (27 – 29 August 2013), Universiti Sains Malaysia, Penang.
The PDF version is here:
The test data is here:
Khirulnizam Abd Rahman,
Faculty of Information Science & Technology, KUIS

Abstract: Proverbs are a distinctive feature of the Malay language, in which a message or piece of advice is communicated indirectly through metaphoric phrases. However, the use of proverbs can cause confusion or misinterpretation if one is not familiar with the phrases, since they cannot be translated literally. This paper discusses the automated filtering of Malay proverbs in Malay text, followed by their translation. In machine translation, the biggest challenge related to proverbs is ambiguity. In Malay itself, there are proverbs that have several meanings, and some can even be read literally. This is also discussed in the paper. The objective of this review is to identify the issues in handling Malay proverbs in machine translation and how previous researchers have addressed them.
An experiment was also conducted to test the effectiveness of several online machine translators. The researchers gathered 200 proverb entries from the Kamus Peribahasa by Abdul Aziz Abdul Rahman, 2012 (a Malay proverb dictionary). Two online machine translation services were tested, one of which was Google Translate. In the experiment, between 34 and 52 percent of the 200 Malay proverbs were correctly translated, depending on the service. The rest were translated either wrongly or literally. This paper is part of the research and development of a Malay proverb filterer, a system that will recognise any proverb present in a Malay text and convert it into plain Malay before the text is translated into English by the machine translators.
Keywords: proverb translation, Malay-English translation, machine translation, proverbs, idioms.
1.1 Idioms in Malay
Bahasa Malaysia (Malay) is a language spoken mostly by the Malays in Malaysia (Aiti et al., 2009b), and also by non-Malays. It is the national language of Malaysia and Brunei (Clynes & Deterding, 2011) and one of the official languages of Singapore, and it is also used for communication in Southern Thailand (Pattani) (Fontong, 2007).
Several researchers agree that an idiom is a phrase combining two or more words whose overall meaning differs from the meanings of its constituent words (Zoltan and Peter, 1996). Idioms in Malay are categorised into proverbs and non-proverbs (Ainon & Abdullah, 2010).
Figure 1: Hierarchy of idioms in Malay, adapted from Ainon & Abdullah (2010).
1.2 Descriptions of Malay Proverbs
Peribahasa, or Malay proverb, is a group of words in a fixed order that has a particular meaning, different from the meanings of each word understood on its own (Abdullah & Ainon, 2011). Malay proverbs usually come in a fixed word order; however, according to Abdullah (2010), different words are sometimes used. For example, the proverb “kera kena belacan” can also take the form “monyet dapat belacan”. The proverb means restless. Kera can also be called monyet; both mean monkey.
Examples: bagaikan aur dengan tebing, air dicincang tidak akan putus, sekilas ikan di air sudah ku tahu jantan dan betina, anak di rumah kelaparan kera di hutan disusukan.
There are four categories of Malay proverbs, as described by Abdullah & Ainon (2011): simpulan bahasa, perumpamaan, bidalan and pepatah.
Simpulan bahasa – normally consists of two words (sometimes three). The literal meaning of the word combination differs from the actual meaning of the ‘simpulan bahasa’.
Example: Langkah kanan; literally means right footstep, yet the actual meaning is lucky.
Perumpamaan – phrases that start with seolah-olah, ibarat, bak, seperti, macam, bagai or laksana. Translated into English, these words correspond to as or like, expressing resemblance.
Example: bagaikan pinang dibelah dua; literally means like a betel nut split evenly in two, yet the actual meaning is a well-matched, equally beautiful and handsome newly married bride and bridegroom.
Pepatah – proverbs that contain advice or teachings.
Example: Adat berperang, yang kalah jadi abu, menang jadi arang; literally means in war, the loser becomes ashes and the winner becomes charcoal, yet the actual meaning is that in war the defeated and the winner are both losers.
Bidalan – a phrase (pepatah) that contains jangan, biar or ingat.
Example: Kalau kail panjang sejengkal, lautan dalam jangan diduga; literally means if you have a short hook, do not attempt to fish in the deep sea, yet the actual meaning is if you have little knowledge, do not dare to dream big.
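The category descriptions above suggest a rough surface heuristic, sketched below in Python. This is only an illustration under stated assumptions, not a classifier used in this paper: the function name guess_category is hypothetical, the marker lists are copied from the descriptions above (with bagaikan added as an assumed variant of bagai, since the example uses it), and the word-count threshold follows the "two words, sometimes three" description of simpulan bahasa. Because the bidalan example above has jangan in the middle of the phrase, the sketch checks whether a marker occurs anywhere in the phrase.

```python
import re

# Marker words taken from the category descriptions above.
# "bagaikan" is an assumed variant of "bagai" (it appears in the example).
PERUMPAMAAN_MARKERS = ("seolah-olah", "ibarat", "bak", "seperti", "macam",
                       "bagai", "bagaikan", "laksana")
BIDALAN_MARKERS = ("jangan", "biar", "ingat")


def guess_category(proverb: str) -> str:
    """Guess the proverb category from surface cues only (hypothetical helper)."""
    words = re.findall(r"[a-z\-]+", proverb.lower())
    if not words:
        return "unknown"
    # Perumpamaan: the phrase opens with a comparison word.
    if words[0] in PERUMPAMAAN_MARKERS:
        return "perumpamaan"
    # Bidalan: a prohibition or reminder word occurs somewhere in the phrase.
    if any(word in BIDALAN_MARKERS for word in words):
        return "bidalan"
    # Simpulan bahasa: normally two words, sometimes three.
    if len(words) <= 3:
        return "simpulan bahasa"
    # Anything longer without a marker is treated as a pepatah.
    return "pepatah"


print(guess_category("Langkah kanan"))
print(guess_category("bagaikan pinang dibelah dua"))
```

A heuristic like this cannot tell a proverb from an ordinary sentence; it only sorts phrases that are already known to be proverbs.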
Machine translation (MT) is sometimes defined as automated translation or machine-aided translation. It is the process by which computer software is used to translate a text from one natural language, for example Malay, into another, such as English (Systrans, 2012).
Although Malay is an important language in Malaysia, serious study of Malay in machine translation only began in 1984 with the establishment of Unit Terjemahan Melalui Komputer (UTMK) at USM (Chuah & Zaharin, 2002). This was followed by research funded by MIMOS on automatic English-to-Malay translation using a Bilingual Knowledge Base bank (Suhaimi & Normaziah, 2004). However, these works did not discuss proverb handling in Malay-English machine translation in detail. The authors believe that the treatment of proverbs is important because the Malay language is rich in proverbs (Koh Boon, 1992; Lim, 2003).
Proverbs (peribahasa) in Malay are beautiful elements for delivering advice, Malay teachings, moral values and comparisons through metaphoric phrases (Susana, 2010). They are normally short, generally known sentences of the folk which contain wisdom, truth, morals and traditional views in a metaphorical, fixed and memorisable form and which are handed down from generation to generation (Mieder, 1993). Although proverbs beautify Malay literature, they pose a challenge to machine translation, since a proverb cannot be translated literally but must be translated by meaning (Dimitra, 2010).
2.1 Multiword Expressions (MWE)
An MWE is a combination of several words that forms another meaning. As in other languages, Malay contains many MWEs. Although some MWEs can be isolated in the tokenization process and then analysed as a single cluster, most of them cannot (Arvi, 2008). Aiti Aw et al. (2009a), recognising the importance of noun phrases in Malay sentence structure, decided to study this issue. They proposed a translation approach that uses a parallel bilingual corpus to obtain a large set of bilingual terms, which are then used to train a statistical engine. In other research, Rais et al. (2011) indexed Malay MWEs using a combination of a query translation approach and weighting schemes. The researchers noted that a dictionary is crucial in multiword detection.

2.2 Why are proverbs a type of MWE that is more difficult to translate?
Proverbs and idioms are a part of MWE (Arvi, 2008; Sharma & Goyal, 2011). The main problem with proverbs and idioms in MT is that they cannot be translated literally, but only by meaning (Dimitra, 2010). Thus the translation machine needs to know the definite meaning of the proverbs.
The most challenging issue in interpreting natural language texts is the ambiguity problem (Kiyavitskaya et al., 2007; Hejab et al., 2008). Proverbs are one part of the ambiguity issue. Proverbs normally come with a fixed sequence of words; however, the meaning is not based on the words directly (Abdullah & Ainon, 2011). Since a proverb is translated by meaning, the machine translation algorithm needs to know the semantics (real meaning) of the phrase. On top of this, certain Malay proverbs are ambiguous (have more than one meaning), an issue whose solution has not been addressed in existing proverb treatments (Noah & Ismail, 2008; Dimitra, 2010; Brahmaleen et al., 2010).
This paper discusses two major challenges in the treatment of proverbs in machine translation. The first challenge lies in the detection/filtering phase; the second is determining the correct semantic definition of the proverb (based on the context of the sentence).
3.1 Malay proverbs automated detection
Several challenges are encountered in the process of detecting/filtering Malay proverbs:
Words with affixes – Some proverbs appear with affixes. Example: “Kembang sayap” = “Mengembangkan sayap”. In this early research phase, the authors propose applying a stemming process (Muhamad Taufik et al., 2009). Stemming is a process that removes word affixes.
Another word in between (stop word)
Example: “berpijak di bumi nyata” or sometimes “berpijak di bumi yang nyata”, which means “do not daydream”. To overcome this issue, the researchers propose stop word removal, a process that lists and removes all the words considered unimportant (carrying little meaning). In this case, stop words include yang, itu, macam, seperti, bagai, laksana, ibarat, umpama, ini, begitu, begini and so on. This process produces better accuracy in text processing (Agus, 2009).
Different combinations of words used to represent the same proverbial meaning. Malay proverbs usually come in a fixed word order; however, according to Abdullah (2010), different words are sometimes used.
Example: “Ada angin, ada pokoknya”, which can also appear as “Ada angin, ada pohonnya”, meaning that anything that happens has its cause. Other examples are “dapat dihitung (dibilang) dengan jari” and “bagai kera (beruk/monyet) kena belacan”.
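The three detection challenges above can be sketched as a single normalisation step applied to both the proverb list and the input sentence. The Python below is a minimal illustration under several assumptions, not the authors' filterer: STEMS is a hypothetical lookup table standing in for a real Malay stemmer such as the one cited, VARIANTS maps the interchangeable words from the examples above to one canonical form, and the names normalise and find_proverbs are invented for this sketch.

```python
import re

# Stop words listed in the text above.
STOP_WORDS = {"yang", "itu", "macam", "seperti", "bagai", "laksana",
              "ibarat", "umpama", "ini", "begitu", "begini"}

# Hypothetical lookup standing in for a real Malay stemmer.
STEMS = {"mengembangkan": "kembang"}

# Interchangeable words from the examples, mapped to one canonical form.
VARIANTS = {"monyet": "kera", "beruk": "kera",
            "pohonnya": "pokoknya", "dibilang": "dihitung"}


def normalise(text):
    """Lowercase, drop stop words, stem, and canonicalise variant words."""
    tokens = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOP_WORDS:
            continue
        token = STEMS.get(token, token)
        tokens.append(VARIANTS.get(token, token))
    return tokens


# A tiny illustrative proverb list built from the examples in this section.
PROVERBS = ["kembang sayap", "berpijak di bumi nyata",
            "ada angin, ada pokoknya", "bagai kera kena belacan"]

# Index each proverb by its normalised token sequence.
INDEX = {tuple(normalise(p)): p for p in PROVERBS}


def find_proverbs(sentence):
    """Return the listed proverbs whose normalised form occurs in the sentence."""
    tokens = normalise(sentence)
    found = []
    for key, proverb in INDEX.items():
        n = len(key)
        # Slide a window of the proverb's length over the sentence tokens.
        if any(tuple(tokens[i:i + n]) == key for i in range(len(tokens) - n + 1)):
            found.append(proverb)
    return found


print(find_proverbs("Syarikat itu mahu mengembangkan sayap ke luar negara"))
print(find_proverbs("Kita mesti berpijak di bumi yang nyata"))
```

Normalising both sides the same way means an affixed form, an inserted stop word, or a variant word all reduce to the same key. The trade-off is that removing stop words such as seperti and bagai also discards the comparison markers, so the category of the proverb would have to be stored with the entry rather than recovered from the match.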
3.2 Malay proverb translation
The most difficult aspect of translating proverbs is ambiguity. Ambiguity in language study is defined as uncertainty or inexactness of meaning. From the authors’ observation, there are two categories of ambiguity in Malay proverbs. In the first category, the phrase can have both a literal and an idiomatic meaning. In the second category, the proverb has more than one idiomatic meaning (Abdullah & Ainon, 2011; Abdullah, 2010; Peribahasa Melayu, 2012).
3.2.1 Several phrases with literal and idiomatic meaning.
Example of Malay Proverb | Literal meaning | Figurative meaning
Angkat senjata | Lifting a weapon | Going to war
Cuci tangan | Washing hands | 1. Repenting of a crime; 2. Refusing to take responsibility
Cuci mata | Washing eyes | Sightseeing
Gosok belakang | Rubbing the back | To console
Ibu ayam / bapa ayam | Hen / cock | Pimp
Kena tembak | Shot | Cheated
Kena tikam | Stabbed | Betrayed
Kena tikam dari belakang | Stabbed from the back | Betrayed by someone we trust
Kena tendang | Kicked | Fired
Kerak nasi | Rice crust | Easily deterred
Kipas angin | An electrical device that blows air (a fan) | Provoking others to fight
Geleng kepala | Shaking one’s head | Refusal
Angkat kaki | Lifting a leg | To run away
Patah kaki | Broken leg | 1. A person we hoped would help us is not around; 2. No vehicle to move around
Garu kepala | Scratching one’s head | Confused
Hari hendak hujan | It is going to rain | Teasing: a person is about to cry
Bulan penuh | Full moon | Beautiful face
Gigit jari | Biting the finger | Failed to achieve something
Gigit bibir | Biting the lips | Furious
Mengurut dada | Stroking the chest | To be patient
Menjolok sarang tebuan | Poking a hornets’ nest | Intentionally doing something dangerous
Meriam buluh | A cannon made of bamboo that makes a loud sound but does not shoot any cannonball | Bragging about things he/she did not do

3.2.2 Examples of proverbs that have more than one idiomatic meaning.
Example of Malay Proverb | Meaning | Other meaning
Mata air | Underground water source | Lover
Orang putih | European people | Pious man
Air muka | Face | Pride
Bawa diri | Travelling or being independent | Sulking or running away
Seperti cicak makan kapur | Ashamed of one’s own offence | Pleased
Ada air adalah ikan | There must be people in a country | Fortune is everywhere
Abu di atas tunggul | Easily forgotten | Not safe
Ada hati | Desire beyond one’s competency | Having feelings for somebody
Ada tangga, hendak memanjat tiang | Doing things against the norm/regulation | Choosing the hard way to do an easy thing
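The two ambiguity categories described in section 3.2 can be represented in a small lexicon structure, sketched below in Python. This is an assumed data model for illustration only: the class ProverbEntry, the function ambiguity_types and the label strings are hypothetical, and a literal value of None simply means that no literal reading is recorded for that entry.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ProverbEntry:
    """One lexicon entry (hypothetical structure for illustration)."""
    phrase: str
    idiomatic: List[str]
    literal: Optional[str] = None  # None when no literal reading is recorded


def ambiguity_types(entry: ProverbEntry) -> Set[str]:
    """Report which of the two ambiguity categories apply to an entry."""
    kinds = set()
    # Category 1: the phrase can be read literally or idiomatically.
    if entry.literal is not None and entry.idiomatic:
        kinds.add("literal_or_idiomatic")
    # Category 2: the phrase has more than one idiomatic meaning.
    if len(entry.idiomatic) > 1:
        kinds.add("multiple_idiomatic")
    return kinds


# Entries taken from the two tables above.
angkat_kaki = ProverbEntry("angkat kaki", ["to run away"], literal="lifting a leg")
patah_kaki = ProverbEntry(
    "patah kaki",
    ["a person we hoped would help us is not around", "no vehicle to move around"],
    literal="broken leg",
)
abu = ProverbEntry("abu di atas tunggul", ["easily forgotten", "not safe"])

for entry in (angkat_kaki, patah_kaki, abu):
    print(entry.phrase, sorted(ambiguity_types(entry)))
```

Keeping every sense in the entry leaves the choice between them to a later context-dependent step, which this sketch does not attempt.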

This experiment analyses the current state of Malay proverb translation using three commercially available machine translation services, one of which is Google Translate. The study analyses 200 Malay proverbs extracted from several Malay proverb dictionaries used by secondary school students. The results are the percentage of proverbs correctly translated by the automated translators, together with a discussion of the translation issues.
5.1 The Online Automated Translators
There are several online machine translators, including Google Translate, Bing Translator and Babel Fish. Bing Translator provides Indonesian-English but not Malay-English translation, and Babel Fish has neither Malay-English nor Indonesian-English. Google Translate and two other services are the three online machine translators capable of automatically translating Malay sentences into English. These services provide basic translation for free.
Google Translate is a free translation service that provides instant online translation between 64 different languages (Google Translate, 2013), including from Malay into any of the other 63. Google Translate has implemented the statistical machine translation approach since 2007, meaning that the translation system is highly dependent on translation examples analysed from thousands of translated documents.
The second service belongs to a new company (established in 2007) that mainly provides English-Malay-English automated translation. Several other language pairs are supported, such as English-Chinese-English and English-Indonesian. The authors could not find any information on which approach this translation service implements: rule-based, statistical or example-based.
The third is another interesting automated translation service, which implements the statistical translation approach by combining several other services such as Google Translate, Bing Translator, Systran and Worldlingo. This service also benefits greatly from users’ contributions, which enrich its translation examples. The website claims that its service is capable of translating Malay into 150 other languages, including English. However, at the end of the study we found that the translation results of this tool are very similar to the output of Google Translate. Hence, we decided to exclude it from the discussion.
Gaule & Singh (2012) claim that Google Translate does not filter idioms or proverbs in English-to-Hindi translation. As a result, the proverbs are translated literally instead of semantically. From an early observation, the authors found that these two systems are sometimes capable of detecting Malay proverbs in the source language, but they also sometimes fail. For example, the phrase “berat hati” is translated into “reluctantly”, which is correct. However, the proverb “kera sumbang” (a person who lives in seclusion, or a recluse) is translated literally into “monkey incest”. Though the translation services
5.2 Experiment Objective
The main purpose of this study is to roughly estimate the correctness of the automated translation tools in translating Malay sentences that contain proverbs.
Example (translations were done on 2nd January 2013):
Malay proverb | Meaning (in Malay) | Correct English translation (manual) | Translation by Google Translate | Translation by the second service | Translation by the third service
Ada hati | Keinginan / menyimpan perasaan | Having a feeling / crush / ambition / desire | Some heart (translated literally) | There is heart (translated literally) | Have heart (translated literally)
Mencari sesuap nasi | Bekerja | Make ends meet | Make ends meet (correct) | Seeking for a mouthful of rice (translated literally) | Make ends meet
Bagai langit dengan bumi | Sangat berbeza | Absolutely different | Like heaven and earth (translated literally) | Like sky with earth (translated literally) | Like heaven and earth (translated literally)

5.3 Study and Findings
The authors randomly collected 200 Malay proverbs (mostly categorised as simpulan bahasa) from two Malay proverb dictionaries (Abdul Aziz, 2012; Zanariah, 2012). The dictionaries provide sample sentences showing how to use the proverbs in Malay sentences. These sentences were then automatically translated into English using Google Translate and the second service. Each translated sentence was evaluated by a language expert to determine whether the translation was correct. Although Bing Translator provides Indonesian-to-English translation, the authors decided not to include it in the study since it targets the Indonesian language.
The table below gives the results of the study. It shows the percentage of proverbs that were 1) correctly translated by each translator, 2) literally translated (word-by-word or direct translation), and 3) translated into similar idioms available in English. Of the 200 proverb entries tested, the second service scored better, with 52% of proverbs correctly translated, compared with 34% for Google Translate. As for literal translation, Google Translate produced 55% incorrect literal translations while the second service produced 44.5%. Overall, the researchers concluded that the second service does a better job of handling Malay proverb translation than Google Translate.
Criteria | Google Translate | Second service
% correctly translated | 34.0 | 52.0
% literally translated but wrong | 55.0 | 44.5
% translated into similar idioms in English | 9.6 | 11.0
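Percentages of this kind can be derived by tallying the expert's judgement for each translated sentence. The Python below is a minimal sketch of that tally, not the authors' evaluation script: the function name summarise and the label strings are hypothetical, and the label counts are illustrative values chosen so that 68 and 110 out of 200 reproduce the 34.0% and 55.0% reported for Google Translate. The "other" bucket is an assumption and not a figure from the paper.

```python
from collections import Counter


def summarise(labels):
    """Return the percentage of each judgement label, rounded to one decimal."""
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Illustrative judgements for 200 sentences (hypothetical, see the note above).
judgements = ["correct"] * 68 + ["literal"] * 110 + ["other"] * 22
print(summarise(judgements))
```

Because each sentence receives exactly one label here, the percentages sum to 100. A scheme in which one category is a subset of another, such as similar idioms counted within correct translations, would need a separate flag per sentence instead of a single label.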

The correctly translated criterion covers translations that provide either the meaning or a similar idiom in English. For example, the Malay proverb “ada gula ada semut” is translated into “where bees, there is honey”. These two proverbs have almost similar meanings:
“ada gula ada semut”: People will go to any place where they can earn their living (Shamsuddin, 2006).
“where bees, there is honey”: Where there are industrious persons, there is wealth, for the hand of the diligent maketh rich. (Simpson & Speake, 2003)
Note: The complete data set of the sentences containing the Malay proverbs, together with the translations by Google Translate and the second service, can be obtained from the corresponding author’s website.

The main objective of this paper is to discuss the challenges of detecting and translating Malay proverbs in machine translation. From the experiment conducted using a couple of online machine translation services, we conclude that these services perform quite poorly in handling proverbs. However, the second service did a better job than Google Translate in handling Malay proverbs. From our observation, neither system caters for proverbial phrases with more than one meaning.
In the next phase, the researchers will propose a Malay proverb filterer, to be developed as a complementary tool that processes the Malay text before it is passed to these machine translation tools.
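The idea of converting proverbs into plain Malay before machine translation can be illustrated with a short Python sketch. It is not the proposed filterer itself, only an exact-match substitution under stated assumptions: the function name paraphrase_proverbs and the PLAIN_MALAY lexicon are hypothetical, the two entries reuse the Malay meanings given in the example table in section 5.2, and proverbs with more than one meaning are not handled.

```python
import re

# Plain-Malay meanings taken from the example table in section 5.2.
PLAIN_MALAY = {
    "mencari sesuap nasi": "bekerja",
    "bagai langit dengan bumi": "sangat berbeza",
}


def paraphrase_proverbs(text, lexicon=PLAIN_MALAY):
    """Replace each known proverb in the text with its plain-Malay meaning."""
    # Longest phrases first, so a longer proverb wins over a shorter one it contains.
    for phrase in sorted(lexicon, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        meaning = lexicon[phrase]
        # A function replacement avoids any special treatment of backslashes.
        text = pattern.sub(lambda match, meaning=meaning: meaning, text)
    return text


print(paraphrase_proverbs("Dia mencari sesuap nasi di bandar."))
```

The rewritten sentence, rather than the original, would then be sent to the machine translation service, so the translator only ever sees the plain wording.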
  1. Abdul Aziz Abdul Rahman, 2012. Kamus Peribahasa. Pan Asia Publications Sdn. Bhd.
  2. Abdullah Hassan and Ainon Mohd. 2011. Kamus Peribahasa Kontemporari, Edisi Ketiga. PTS Professional Publishing.
  3. Agus T. Kwee, Flora S. Tsai, Wenyin Tang, 2009. Sentence-Level Novelty Detection in English and Malay. Advances in Knowledge Discovery and Data Mining Lecture Notes in Computer Science Volume 5476, 2009, pp 40-51.
  4. Aiti Aw, Sharifah Mahani Aljunied and Haizhou Li. 2009. Malay Multi-word Expression Translation. Second Workshop on Technologies and Corpora for Asia-Pacific Speech Translation (TCAST 2009), Suntec, Singapore.
  5. Aiti Aw, Sharifah Mahani Aljunied, Lianhau Lee and Haizhou Li. 2009. Piramid: Bahasa Indonesia and Bahasa Malaysia Translation System Enhanced through Comparable Corpora. Second Workshop on Technologies and Corpora for Asia-Pacific Speech Translation (TCAST 2009), Suntec, Singapore.
  6. Arvi Hurskainen. 2008. “Multiword Expressions and Machine Translation”. Technical Reports in Language Technology Report No 1, 2008.
  7. Brahmaleen K. Sidhu, Arjan Singh and Vishal Goyal. 2010. Identification of Proverbs in Hindi Text Corpus and their Translation into Punjabi. Journal of Computer Science and Engineering, Vol. 2, Issue 1, July 2010.
  8. Choy-Kim Chuah and Zaharin Yusoff. 2002. Computational Linguistics at Universiti Sains Malaysia. LREC 2002 Third International Conference On Language Resources And Evaluation. The University of Las Palmas de Gran Canaria, Canary Islands – Spain, 29 May - 31 May 2002.
  9., 2013. About CitCat. Accessed on 7 January 2013.
  10. Clynes, Adrian and Deterding, David, 2011. Standard Malay (Brunei). Journal of Phonetic Association, Vol. 42 Issue 2, 2011, pp.
  11. Darwis Harahap. 1992. Sejarah Pertumbuhan Bahasa Melayu. Penerbit Universiti Sains Malaysia, Pulau Pinang.
  12. Dimitra Anastasiou. 2010. Idiom Treatment Experiments in Machine Translation. PhD Thesis, Universität des Saarlandes. Unpublished.
  13. Fontong Raine Boonlong, 2007. The Language Rights of the Malay Minority in Thailand. Asia Pacific Journal of Human Rights and the Law, Vol 1, 2007, pp. 47-63.
  14. Gaule, Monika and Singh, Gurpreet Josan, 2012. Machine Translation of Idioms from English to Hindi. International Journal of Computational Engineering Research, Vol2, Issue 6, pp. 50-54.
  15. Google Translate, - Accessed on 4th January 2013.
  16. Hejab Ma’azer Al Fawareh, Shaidah Jusoh and Wan Rozaini Sheikh Osman. 2008. Ambiguity in Text Mining. Proceedings of the International Conference on Computer and Communication Engineering 2008. May 13-15, 2008 Kuala Lumpur, Malaysia
  17. Lim, Kim Hui, 2003. BUDI as a Malay Mind: A Philosophical Study of Malay Ways of Reasoning and Emotion in Peribahasa. PhD Thesis, University of Hamburg. Unpublished.
  18. Lim, L. T. and Hussein, N. 2006. Fast Prototyping of a Malay WordNet System. In: Proceedings of the Language, Artificial Intelligence and Computer Science for Natural Language Processing (LAICS-NLP) Summer School Workshop. Bangkok, Thailand, 2006, pp. 13–16.
  19. Mieder, Wolfgang. 2003. Proverbs are Never out of Season. Popular Wisdom in the Modern Age. New York: Oxford University Press.
  20. Mohd Noor, Yusnita and Jamaludin, Zulikha and Jusoh, Shaidah. 2010. A Retrospective View On The Promise On Machine Translation For Bahasa Melayu-English. In: Found in Translation : International Conference on Translation and Multiculturalism, 23-25 July 2010, University of Malaya.
  21. Mosleh H. Al-Adhaileh and Tang Enya Kong. 1999. Example-Based Machine Translation Based on the Synchronous SSTC Annotation Schema, Machine Translation Summit VII, 1999.
  22. Muhamad Taufik Abdullah, Fatimah Ahmad, Ramlan Mahmod and Tengku Mohd Tengku Sembok, 2009. Rules Frequency Order Stemmer for Malay Language. IJCSNS International Journal of Computer Science and Network Security, Vol.9 No.2, pp 433-438
  23. Nadzeya Kiyavitskaya, Nicola Zeni, Luisa Mich and Daniel M. Berry, 2007. Requirements for Tools for Ambiguity Identification and Measurement in Natural Language Requirements Specifications. WER 2007, pp 197-206.
  24. Nik Safiah Karim, Farid M Onn and Hashim Haji Musa. 2008. Tatabahasa Dewan Edisi Ketiga. Dewan Bahasa dan Pustaka, Kuala Lumpur.
  25. N.H. Rais, M.T. Abdullah & R.A Kadir, 2011. Multiword Phrases Indexing for Malay-English Cross Language Information Retrieval, Information Technology Journal, 2011.
  26. Peribahasa Melayu. 2012. Institut Tamadun dan Alam Melayu, UKM. 6 January 2012.
  27. Princeton University "About WordNet." WordNet. Princeton University. Accessed on 6 January 2012.
  28. S.A. Noah and F. Ismail. 2008. Automatic Classifications of Malay Proverbs Using Naive Bayesian Algorithm. Information Technology Journal, 2008.
  29. Shamsuddin Ahmad. 2006. Kamus Peribahasa Melayu-Inggeris, PTS Publication.
  30. Sharma, Monika and Goyal, Vishal, 2011. Extracting Proverbs in Machine Translation from Hindi to Punjabi using Relational Data Approach. International Journal of Computer Science and Communication Vol. 2, No. 2, July-December 2011, pp. 611-613.
  31. Silva C. and B. Ribeiro. 2003. The Importance of Stop Words Removal on Recall Values in Text Categorization. Proceedings of the International Joint Conference on Neural Networks, 20-24 July 2003.pp 1661-1666.
  32. Simpson, J.A. and Speake, J.. 2003. The Concise Oxford Dictionary of Proverbs. Oxford University Press.
  33. Suhaimi Ab. Rahman, Noorhayati Ahmad, Hafizullah Amin Hashim, Abdul Wahab Dahalan. 2006. Real Time On-Line English-Malay Machine Translation (MT) System. Third Real-Time Technology And Applications Symposium 2006, UPM, Malaysia.
  34. Suhaimi Ab. Rahman, Normaziah Abd Aziz, 2004. “Improving Word Alignment in an English-Malay Parallel Corpus for Machine Translation”. In Pre-LREC 2004 Workshop on Amazing Utility of Parallel and Comparable Corpora, Lisbon, Portugal, May 2004.
  35. Suhaimi Ab. Rahman, Normaziah Abdul Aziz and Badariah Solemon. 2008. An English-Malay Translation Memory System, CITWorkshops, pp.619-624, 2008 IEEE 8th International Conference on Computer and Information Technology Workshops, 2008.
  36. Supyan Hussin, Ding Choo Ming, Afendi Hamat & Arba’eyah Abdul Rahman. 2004. Kamus Peribahasa Melayu Digital yang Pertama. Sari 22 (2004), 49-61.
  37. Susana Widyastuti. 2010. Peribahasa: Cerminan Kepribadian Budaya Lokal Dan Penerapannya Di masa Kini. Proceeding of Seminar Nasional UTY 3 Juli 2010.
  38. Systrans. 2004.
  39. Zanariah Abdul, 2012. Peribahasa WATAFA (Wajib Tahu dan Faham). Pelangi Book Publishing, Kuala Lumpur.
  40. Zoltán Kövecses and Péter Szabó, 1996. Idioms: A View from Cognitive Semantics. Applied Linguistics, Vol 17(3), pp 326-355.
Khirulnizam Abd Rahman 2013. THE CHALLENGES OF HANDLING PROVERBS IN MALAY-ENGLISH MACHINE TRANSLATION. International Conference on Translation 2013. Universiti Sains Malaysia, Penang.
