This is a problem I’ve been thinking about a good deal lately without coming to any real conclusion, maybe because there is no right answer. The issue can be summed up easily, but it betrays a multitude of issues that lie beneath.
How do you handle non-English terms in your code?
I imagine most programmers write their code in English. This simply makes the most sense, and I’m sure most people will agree. The vast majority of existing code (libraries, APIs) is in English. The literature and environments that programmers work in or with are in English. Sometimes there are even technical reasons, such as toolchains that can’t handle more complex character sets. While the issue is fairly banal, it touches on core issues in programming: semiotics and the encoding of meaning in the written word.
Programming is solving problems in some domain, be it mathematics, the local post office or a ticket seating system for a stadium, and as such you deal in the terms found in that domain. But when a domain term isn’t English, how do you handle it? I’m sure almost every non-English CS student has come across this issue, as universities tend to value the native language highly (and rightly so), so even very technical terms have local translations.
The problems of encoding and translation are not new. The image depicts the top of the Rosetta Stone, showing the then-lost Egyptian hieroglyphs.
Indblikskode versus Access code
To keep this from being completely abstract, let’s take an example. At work, we have a massive EDM system. Part of its security model is a term called “indblikskode” (literally something like “insight code”). This denotes a code that a user holds in the system, and which allows the user to view (and otherwise interact with) documents. So, do we translate this Danish term, or do we use it as is?
Indblikskode is an exact term with a specific meaning in the context of our EDM system. If I translate it into, for example, “access code”, which may well mean the same thing, it loses all the meaning embedded in its connection to the system. If we forgo a translation, we risk ending up with pidgin English such as getIndblikskode(). Making myself, and anyone else who has to work with the code, play translator isn’t very productive either. And herein lies the issue.
How do you balance the loss of information as you move away from the terms of the problem domain against the problems of mixing languages?
As always, the answer is probably “it depends”. Who will look at the code after you’re done with it? How well can the terms actually be translated? Can only some of them be translated? The trouble is that this problem is most often handled by programmers sitting in their editors, naming variables. It’s unrealistic to expect such decisions to go before a committee, so most programmers simply have to go with their gut feeling.