LPI 101-500 – 107.3: Localization and Internationalization
- Character codes, iconv
This chapter is about localization and internationalization, in other words about language settings. Let's first clarify what is meant by localization. Localization means that a Linux system is adapted to the conventions of a particular country or culture. Of course, the first thing to mention here is the language. But there are other things too: for example the date format, currencies, time formats, sometimes even colors in graphical programs, and so on. In principle, internationalization is the broader term: it means designing software so that it can be localized in the first place. A Linux system is designed so that localization is not a major problem.
Linux thrives on its worldwide community, so even for the most exotic languages there are volunteer translators who adapt the corresponding distributions. Character encoding is very important for the internationalization and localization of programs. This means nothing more than that the system must be able to represent the characters that are to be displayed in the respective language. In the past, the ASCII character code was the absolute standard. ASCII stands for American Standard Code for Information Interchange and was intended for American English. The ASCII character code can represent 128 different characters.
This naturally includes uppercase letters, lowercase letters and digits, but also special characters, the space, and so-called control characters such as tab, et cetera. ASCII was later superseded by the ISO 8859 character sets, which use 8 bits and can therefore represent 256 instead of 128 characters. ISO 8859 is divided into several parts. ISO 8859-1, for example, was used for most Western European languages such as German, French, Italian, Portuguese, Spanish or Swedish. ISO 8859-5 was used for Slavic languages with a Cyrillic alphabet, for example Russian, Bulgarian, Serbian, Ukrainian and so on.
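The difference between the ISO 8859 parts can be made visible with a small sketch. It assumes that the iconv tool (covered later in this chapter) is installed with support for both encodings, and it uses the byte value 0xE4 purely as a sample: the same byte is decoded once as ISO 8859-1 and once as ISO 8859-5.

```shell
# Assumption: iconv with ISO-8859-1 and ISO-8859-5 support is installed.
# \344 is the octal escape for the single byte 0xE4.
latin1=$(printf '\344' | iconv -f ISO-8859-1 -t UTF-8)
cyrillic=$(printf '\344' | iconv -f ISO-8859-5 -t UTF-8)

echo "0xE4 read as ISO-8859-1: $latin1"
echo "0xE4 read as ISO-8859-5: $cyrillic"
```

On a UTF-8 terminal the first line should show the German umlaut ä and the second the Cyrillic letter ф, which is why a text file in one of these encodings is only readable if you know which part was used.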
There are 16 numbered ISO 8859 character sets, from ISO 8859-1 through ISO 8859-16 (part 12 was never published). It makes perfect sense to take a look at the list, but the LPIC exam will certainly not ask questions such as which ISO version is used for Greek. You should just know that this character set exists, that it was the successor to ASCII, and that it comprises 256 characters. The Unicode character set was developed in parallel with ISO 10646. The aim was to combine all alphabets in a single character set. Since Unicode and ISO 10646 were almost completely the same, the two were harmonized.
Compared to ASCII or ISO 8859, the Unicode character set is not just a table of characters. Individual characters also come with rules, for example for sorting or for so-called bidirectional writing, which is important for languages such as Arabic or Hebrew. Unicode, or ISO 10646, also contains Japanese and Chinese characters, so-called ideograms, as well as mathematical symbols. Today's standard encoding is UTF-8, which in principle uses the Unicode character set but, thanks to its variable-length encoding, usually needs less memory. Let's see how this looks in practice. Let's create a text file and simply insert a short text. I will just call the file characterset.
Now I just add the text test one, two, three, then save and exit. We can use the file command to display which character set this text file uses: file characterset. We see that it is ASCII in this case. So let's open the file again and add a German ä. Now we look again with file characterset and see that the character set is now UTF-8, because ASCII cannot represent an ä. The first 128 characters of ASCII and UTF-8 are completely identical. It is therefore quite likely that UTF-8 was already used for the first version of this text file.
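The experiment with file can be sketched as follows. The file names and the sample text are illustrative assumptions, and the umlaut is written as explicit UTF-8 bytes so that the sketch does not depend on the terminal's own encoding.

```shell
# Work in a throwaway directory; the file names are illustrative.
tmpdir=$(mktemp -d)
printf 'test 123\n' > "$tmpdir/ascii.txt"
# \303\244 are the two UTF-8 bytes of the German umlaut a (U+00E4).
printf 'test 123 \303\244\n' > "$tmpdir/utf8.txt"

# file guesses the encoding from the bytes it finds in each file.
file "$tmpdir/ascii.txt" "$tmpdir/utf8.txt"

# The umlaut is one character but occupies two bytes in UTF-8.
ascii_bytes=$(wc -c < "$tmpdir/ascii.txt")
utf8_bytes=$(wc -c < "$tmpdir/utf8.txt")
echo "ascii.txt: $ascii_bytes bytes, utf8.txt: $utf8_bytes bytes"

rm -r "$tmpdir"
```

The byte counts hint at why file cannot tell ASCII and UTF-8 apart as long as only the first 128 characters occur: both encodings store them as the same single bytes.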
The file tool simply interpreted that first version differently, because at that point the file contained only characters that also exist in ASCII. With the iconv command we can convert text files from one character encoding into another. iconv with the option -l shows the known character sets: iconv -l. We see that there are a lot of them, which doesn't necessarily mean that you can convert any character set to any other character set. Let's try something out using the text file we just created. The conversion looks like this, and I will explain it in a moment: iconv -f UTF-8 -t ASCII characterset > characterset.ascii. So what does this mean? As I just said, iconv can convert the character encoding of an existing file. The option -f stands for from.
So it converts from UTF-8. The option -t stands for to, so from UTF-8 to ASCII. We specify characterset as the source file and, with the greater-than sign, characterset.ascii as the target file, which is created at the same time. So let's try it out. We get an error message here. Why is that? Because our text file contains a special character: the German ä. This kind of special character is called an umlaut, and it cannot be represented in ASCII. Accordingly, iconv refuses to do its job. So I open the file characterset again and delete the last character, the umlaut.
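The refused conversion, and the case where it goes through, can be reproduced with a sketch like the following. The file names are illustrative assumptions, and the refusal shown in the video is the behaviour of glibc's iconv; other implementations may substitute a placeholder character instead.

```shell
tmpdir=$(mktemp -d)
printf 'test 123 \303\244\n' > "$tmpdir/with_umlaut.txt"   # contains a UTF-8 umlaut
printf 'test 123\n' > "$tmpdir/plain.txt"                  # ASCII characters only

# -f names the source encoding, -t the target encoding.
if iconv -f UTF-8 -t ASCII "$tmpdir/with_umlaut.txt" > "$tmpdir/with_umlaut.ascii" 2>/dev/null; then
    echo "with_umlaut.txt: converted"
else
    echo "with_umlaut.txt: conversion refused"
fi

if iconv -f UTF-8 -t ASCII "$tmpdir/plain.txt" > "$tmpdir/plain.ascii"; then
    echo "plain.txt: converted"
fi

converted=$(cat "$tmpdir/plain.ascii")
rm -r "$tmpdir"
```

Checking the exit status, as the if statements do here, is a convenient way to detect a failed conversion in a script, since the error message itself goes to standard error.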
Now I try the conversion again, and this time it obviously worked. With file characterset.ascii we see ASCII text, so the conversion from UTF-8 to ASCII worked. Theoretically, one could also use another option to simply ignore the characters that ASCII does not know. This time we don't create a text file; we pass the text directly on the console. For this we use echo and a pipe. So let's echo something that contains three special characters, the three umlauts, and pipe it into the iconv command from UTF-8 to ASCII. As we saw in the previous example, this would not work because of the special characters that are not included in ASCII.
But now we append //IGNORE to the target encoding, so these three special characters will be ignored. Let's try it out. As a result we see that the letters a, o and u were converted, but the three special characters were left out. In addition we get an error message, namely: illegal input sequence at position ten. That looks familiar from the example a few minutes ago. This means that the ignore option has ensured that the characters which ASCII does not recognize, these three here, are simply dropped.
Instead of //IGNORE, we can also use the //TRANSLIT option. In this case, iconv tries to find other, similar characters. So let's try it out. We do it exactly as before and just use TRANSLIT instead of IGNORE. Here we see that TRANSLIT has converted the German ä into a, the ö into o, and the ü into u. That is not quite correct, because normally ä equals ae, ö equals oe, and so on, but that doesn't matter. TRANSLIT tries to find something similar, and it has found something similar. And we no longer get an error message.
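Both suffixes can be compared side by side with a sketch like this one. The sample string is an assumption (the exact text typed in the video is not recoverable), the suffixes are those of glibc's iconv, and the replacement characters chosen by //TRANSLIT depend on the active locale, so no particular output is promised for that line.

```shell
# Sample input: "a o u" followed by the UTF-8 bytes of the umlauts a, o, u.
sample=$(printf 'a o u \303\244 \303\266 \303\274')

# //IGNORE drops characters the target encoding cannot represent.
# glibc still reports the skipped input on stderr, discarded here.
ignored=$(printf '%s\n' "$sample" | iconv -f UTF-8 -t ASCII//IGNORE 2>/dev/null)
echo "IGNORE:   $ignored"

# //TRANSLIT substitutes a similar replacement where it knows one.
translit=$(printf '%s\n' "$sample" | iconv -f UTF-8 -t ASCII//TRANSLIT 2>/dev/null)
echo "TRANSLIT: $translit"
```

Which of the two is preferable depends on the data: //IGNORE silently loses characters, while //TRANSLIT keeps the text readable at the cost of changing its spelling.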
- locales, LANG, LC_*
This video is about the language and country settings. When installing a Linux system you can of course select the appropriate language, but that does not mean that the selected language is fixed for all time. You can also change the language afterwards via the console. The following command would suffice for this: sudo dpkg-reconfigure locales. Here you would choose your appropriate language and so on. I will not go into this further at this point, as it is not relevant for the exam and the rest explains itself anyway. You just choose your language, press OK, and that is basically it.
So I leave this dialog again: with Tab I can switch to Cancel, press Enter, and I am out. The environment variable LANG shows how the system and my current session are configured. LANG is of course the short form of language. Let's take a look at it with echo $LANG. Here we see en_US.UTF-8. The lowercase en stands for the language, English. The capitalized US stands for the country. So in this case it is very clear. With German you would see de_DE here: a lowercase de, an underscore, and an uppercase DE. The lowercase de stands for Deutsch, the language, and the uppercase DE for Deutschland, the country. Because of that, it is not obvious at first glance which one is the language and which one is the country.
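The structure of such a value can be taken apart with plain shell parameter expansion. This is only a sketch: the sample value is the one shown in the video, and the variable names are made up for illustration. Besides language and country it also splits off the part after the dot, which names the character encoding.

```shell
# Split a locale name of the form language_TERRITORY.CODESET.
locale_name="en_US.UTF-8"

language=${locale_name%%_*}   # text before the first underscore
rest=${locale_name#*_}        # text after the first underscore
territory=${rest%%.*}         # text before the dot
codeset=${locale_name#*.}     # text after the dot

echo "language:  $language"   # en
echo "territory: $territory"  # US
echo "codeset:   $codeset"    # UTF-8
```

Values such as C or POSIX do not follow this pattern, so a real script would have to check for the underscore and the dot before splitting.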
But in this case it is very clear: en is the language, US is the country. After the country code, separated by a dot, we also find the encoding used by default, in this case UTF-8. We can of course change the LANG variable. This does not suddenly switch the entire operating system to a different language, but some things in the console do change. However, the variable LANG alone is not sufficient; there are many more variables that play a role here, for example the LC_TIME variable. Let's look at a small example with date. Without options, date just outputs the current date and time, in this case in German notation.
Let's take a quick look at the man page, because I want to show you a special option. There are several format options here, and the one I want to show you is %x, the locale's date representation. With this option we output only the date. With date you always have to put a plus sign first and then the format in quotation marks: date +"%x". Here we see a completely normal date in US notation: first the month, then the day, then the year. Now we change the variable LC_TIME, which is responsible for time and date, to German. When we look at this variable with echo $LC_TIME, we see en_US here as well. So let's switch to German: LC_TIME=de_DE.UTF-8.
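The comparison being set up here can also be run without changing the session at all, by setting the variable only for a single command. The following is a sketch under some assumptions: GNU date is used for the -d option, the sample date is arbitrary, and the named locales must be generated on the system, otherwise date quietly falls back to the C locale's format.

```shell
# A fixed sample date keeps the output reproducible (GNU date syntax).
sample="2024-12-31"

# LC_ALL is emptied so that it cannot override LC_TIME for this command.
us_style=$(LC_ALL= LC_TIME=en_US.UTF-8 date -d "$sample" +"%x")
de_style=$(LC_ALL= LC_TIME=de_DE.UTF-8 date -d "$sample" +"%x")
c_style=$(LC_ALL=C date -d "$sample" +"%x")

echo "en_US: $us_style"
echo "de_DE: $de_style"
echo "C:     $c_style"
```

A variable assignment placed in front of a command applies to that command only, so nothing has to be switched back afterwards.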
Now we have de_DE here, and when we use the date command again we see the date in German notation: first the day, then the month, then the year, separated by dots and not by the slashes of the US notation. I now change the LC_TIME variable back and test again. Now everything is back to normal. The date command has many other options, and it is definitely worth taking another look at the man page. I do not believe that the exam will ask for complicated options here, but you should always take a look. We now come to the so-called LC environment variables, also called locales. These locales contain important information for the system with regard to the various localization settings.
It must be specified somewhere whether the system uses 24-hour time or only 12-hour time with a.m. and p.m. It also has to be specified somewhere what, for example, the abbreviations of the weekdays look like in the respective national language, and so on. This is where the locales come into play. Their variables all start with LC and an underscore, LC for locale. With the command locale you can list the corresponding locale variables and see at the same time how they are set. By the way, the locale program is located in /usr/bin/locale. I mention this explicitly because it is listed separately in LPI's exam objectives.
So it is possible that you will be asked about it in the exam. The result is of course exactly the same: if we call /usr/bin/locale, we see the same output. With the option -a you can display all locales available on the system, and we see quite a few here. These are not all that exist, but enough that we can use them without having to install anything. I call up locale again, and we go through the output together. Right at the top we see LANG, which we just talked about. The variable LANG also belongs to the locale settings. Below it we see the LANGUAGE variable, which is currently empty here.
The LANGUAGE variable is very similar to the LANG variable, but we can specify several different languages in it. Programs then use this variable to display their messages in one of these languages. The languages are separated from each other with a colon. So you would write, for example, LANGUAGE="de_DE.UTF-8:en_US.UTF-8". The details are not important here; it is only important to know what the LANGUAGE variable is for. With this value we are telling the system that it should use German first. If the program has no German translation, then English, the second language, should be used.
If there is a third language in the list, for example French, the system uses French when the first two languages are not available, and so on. Next up is LC_CTYPE. This is where the characters are specified: which letters are used and which uppercase and lowercase letters belong together. This is followed by LC_NUMERIC. It determines whether a dot or a comma is used as the decimal separator, whether a dot is used as a thousands separator to make numbers more readable, and of course various other numeric conventions. Next we find LC_TIME, which we have already discussed in detail. It specifies the date and time formats. LC_COLLATE defines how words are sorted alphabetically.
LC_MONETARY defines which currency is used and how amounts are formatted. LC_MESSAGES is used so that programs can output their messages in different languages. LC_PAPER indicates the paper format. LC_NAME specifies how personal names should be formatted. LC_ADDRESS contains address and location information and how it should be formatted. LC_TELEPHONE specifies how telephone numbers should be formatted. LC_MEASUREMENT contains the units of measurement: should meters, centimeters and kilometers be used, or rather inches and miles? LC_IDENTIFICATION contains information about the locale itself, for example its source, contact email addresses and so on.
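The categories just walked through can be listed by name with a one-line sketch. It assumes the locale command from glibc; LC_ALL=C is set only to keep the output independent of the current session.

```shell
# Print just the variable names that locale reports, one per line.
categories=$(LC_ALL=C locale | cut -d= -f1)
echo "$categories"
```

Each name in the output corresponds to one of the variables described above, with LANG and LANGUAGE at the top and LC_ALL at the end.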
LC_ALL is empty in this case. LC_ALL is there to switch the whole system to one locale. If LC_ALL is set to German, then all other LC variables are also treated as German, regardless of how they were previously configured. If you do a little research on this, you will find that one website says LANG overrides the LC_ALL setting, while another says that LC_ALL overrides the LANG setting. So the statements contradict each other, and obviously there is not much agreement. Maybe that changed over time, I don't know. But I have noticed it over and over again, and that is why I thought we would just try it out.
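Before the reboot experiment that follows, the precedence can also be probed in a single shell session. The sketch below is an aside under some assumptions: it uses glibc's locale command, which reports the settings from the environment even if the named locales are not generated, and effective_lc_time is a hypothetical helper name.

```shell
# Report the LC_TIME line of locale, with the quotes stripped.
effective_lc_time() {
    locale 2>/dev/null | grep '^LC_TIME=' | tr -d '"'
}

# Case 1: only LANG is set, so LC_TIME falls back to it.
only_lang=$( unset LC_ALL LC_TIME; export LANG=en_US.UTF-8; effective_lc_time )

# Case 2: LC_TIME is set explicitly and wins over LANG.
with_lc_time=$( unset LC_ALL; export LANG=en_US.UTF-8 LC_TIME=de_DE.UTF-8; effective_lc_time )

# Case 3: LC_ALL is set and wins over both LC_TIME and LANG.
with_lc_all=$( export LANG=en_US.UTF-8 LC_TIME=en_US.UTF-8 LC_ALL=de_DE.UTF-8; effective_lc_time )

echo "LANG only:        $only_lang"
echo "LANG + LC_TIME:   $with_lc_time"
echo "LANG + LC_ALL:    $with_lc_all"
```

Each case runs in its own subshell, so the current session keeps its settings. The order that shows up is LANG as the default, the individual LC variables on top of it, and LC_ALL above everything.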
The file /etc/default/locale is important here, because the system-wide default is stored in it. We take a look at that file with sudo vi /etc/default/locale and see some locale settings: LANG is English, while LC_NUMERIC, LC_TIME, LC_MONETARY, LC_PAPER, LC_NAME and so on are German. This is because I selected German for the keyboard during the installation, so these were set automatically. And now let's try it out. We set the variable LC_ALL, and I choose de_DE.UTF-8 here as well. Save and exit. At first nothing happens, because we have to restart the system. I will do that now, pause the video in the meantime, and continue when the system is up again.
So the system is up again, I have logged in, and otherwise I didn't do anything. We can already see this window here. I am informed that I have logged in with a new language and that I can automatically rename my standard folders to the corresponding language: from Desktop to Schreibtisch, which is the German word for desktop, from Templates to Vorlagen, from Documents to Dokumente, from Music to Musik, from Pictures to Bilder, and so on. So I just choose Update Names. Now I have the German names, and I open my terminal again. Let's take another look at the locale output: we see that LANG continues to be English and everything else is de_DE.
So everything was changed to German. This proves that the LC_ALL variable overrides everything else and is therefore the most important variable in this context. The LC_ALL variable definitely has priority over the LANG variable. I would like to have my system in English again, so I remove the corresponding line from the file with sudo vi /etc/default/locale. I just remove this line, save, and restart the system again. Now I get the same message again and can change back to the English folder names, which I do with Update Names. I open my terminal window again, and here everything has been changed back. Not everything: as before, several entries are German, but LC_CTYPE is English again and LANGUAGE is English.
LANGUAGE was English before as well. But we no longer have an LC_ALL value; that one is gone. This means that the system now pays attention to LANG again, and all the other variables have returned to their previous states. In this overview we basically only see the names of the locale variables and which language and encoding are set for each of them. But we cannot see any real content here. How can we see more? For example, I just mentioned that LC_IDENTIFICATION contains information such as the source and an email address. How can we look at this? You might think that it should work with echo, since these are variables, but that doesn't work for the locale contents.
We can try it out: echo $LC_IDENTIFICATION. We just see what the locale command already showed us, namely de_DE.UTF-8. Instead we use the locale command again and pass it the category: locale LC_IDENTIFICATION. Now we get a result, but it looks a bit confusing. To make it clearer we use the -k option. The k stands for keyword name, and we see the difference right away: locale -k LC_IDENTIFICATION. That looks much cleaner, and the output makes more sense this way. You can of course look at all the other categories too, LC_TIME for example. Here you can see the abbreviations of the days, of the months and so on, and the day names Sonntag, Montag, Dienstag, the German words for Sunday, Monday, Tuesday and so on.
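The lookup of locale contents can be sketched as follows. It assumes glibc's locale command and uses the built-in C locale so that the result does not depend on which locales are generated; with a German LC_TIME the same keywords would carry the German names.

```shell
# -k prints every keyword of a category together with its value.
LC_ALL=C locale -k LC_TIME | head -n 5

# A single keyword can also be queried directly by name.
abday=$(LC_ALL=C locale abday)
echo "abbreviated weekdays: $abday"
```

Querying one keyword at a time is handy in scripts, because the output then consists of the value alone.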
- Time zones
The final topic of this chapter is the time zone setting. As everyone probably knows, different places in the world have different time zones. If it is 3:00 p.m. in this country, it is only 9:00 a.m. in the USA. The time zone is selected according to the location during the operating system installation. Normally we will never have to change this time zone again unless we emigrate, but we could. The time zone is specified in the /etc/timezone file. So let's take a look: cat /etc/timezone. In my case we see Europe/Berlin here. If you wanted to change the defined time zone, you could look in the directory /usr/share/zoneinfo to see which valid time zones exist: ll /usr/share/zoneinfo. Here we find various directories.
For example Indian, Mexico and Europe. So let's go one level deeper with ll /usr/share/zoneinfo/Europe, and here you can see the individual cities: Berlin, Brussels, Dublin, Istanbul, London and so on. We can set any of these as the time zone, and it would have to be entered accordingly in the /etc/timezone file. The /etc/timezone file just shows us Europe/Berlin, as we have just seen, and we have learned where to find this entry, namely in /usr/share/zoneinfo/Europe/Berlin. The file /etc/localtime was created to keep this path a little shorter. We cannot display this file with cat /etc/localtime; that doesn't work.
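The relationship between the /etc/timezone entry and the zoneinfo directory can be sketched like this. It assumes a Debian-style system as in the video; zoneinfo_path is a hypothetical helper, and Europe/Berlin is used as a sample value if /etc/timezone does not exist.

```shell
# Assumption: zone files live under the usual zoneinfo directory.
zoneinfo_dir=/usr/share/zoneinfo

# Hypothetical helper: build the path of the file behind a zone name.
zoneinfo_path() {
    printf '%s/%s\n' "$zoneinfo_dir" "$1"
}

# Use the configured zone if /etc/timezone exists, else a sample value.
if [ -r /etc/timezone ]; then
    zone=$(cat /etc/timezone)
else
    zone="Europe/Berlin"
fi

path=$(zoneinfo_path "$zone")
if [ -e "$path" ]; then
    echo "$zone is backed by $path"
else
    echo "$zone has no file under $zoneinfo_dir"
fi
```

A check like this is a cheap way to validate a zone name before writing it into a configuration file.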
But if we check what kind of file it is with file /etc/localtime, we see that it is nothing more than a symbolic link to the file /usr/share/zoneinfo/Europe/Berlin. If you wanted to change the time zone, you would have to adjust the two files /etc/timezone and /etc/localtime by hand. It is a little more convenient with the tzselect tool. tzselect is short for time zone select. The time zone selected via tzselect is not saved system-wide; it is only valid for the corresponding user, so for my user. We can now easily select our time zone. I choose 7 for Europe. Please select a country whose clocks agree with yours: I choose 16 for Germany. Please select one of the following time zones: Swiss time or Germany.
I choose Germany. Therefore TZ='Europe/Berlin' will be used. Is the above information OK? Yes, it is. In the output we are advised to save the corresponding time zone in the TZ variable. TZ naturally stands for time zone. As expected, the variable is currently empty. We can check that with echo $TZ: there is nothing in it. Theoretically we could set it as shown in the output and then store it in the .profile file, so that it would be permanently active regardless of what we chose during the installation. Of course we don't have to do that now, because the time zone is already set system-wide, which is why we don't need the variable at all.
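The effect of the TZ variable can be tried out for a single command without storing anything. This is a sketch under some assumptions: GNU date is used for the -d option, epoch second 0 serves as an arbitrary fixed instant, and the Berlin line only shows local time if the zoneinfo data is installed.

```shell
# TZ overrides the system-wide time zone for the command it precedes.
utc_time=$(TZ=UTC date -d @0 +"%Y-%m-%d %H:%M %Z")
berlin_time=$(TZ=Europe/Berlin date -d @0 +"%Y-%m-%d %H:%M %Z")

echo "UTC:    $utc_time"
echo "Berlin: $berlin_time"

# To keep the choice for one user, a line such as the following could be
# appended to ~/.profile (shown as a comment, not executed here):
#   TZ='Europe/Berlin'; export TZ
```

Because the variable lives in the user's environment, other users and the system-wide setting remain untouched.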
Of course we could choose another time zone here, but I think it is clear what the variable stands for. If you want to change the time zone system-wide, you would use the following command: sudo dpkg-reconfigure tzdata. Attention: tzdata, not tzdate. Now we actually get a dialog, and here I choose Europe, Berlin, and that's it. I haven't changed anything either. The last command in this chapter is the timedatectl command. If executed without an option, it shows various time information: timedatectl. First of all the system displays the local time, then the universal time, that is Coordinated Universal Time, then the RTC time. RTC stands for real-time clock and shows the time of the hardware clock.
This is followed by the configured time zone, by the information whether the system clock has been synchronized, and so on. With timedatectl you can also change the time zone or the time itself. For example, the following command lists all known time zones: timedatectl list-timezones. Here we have all the time zones that the system knows and that we could select. If we want to change the time zone, we could enter the following command: timedatectl set-timezone Europe/Amsterdam, for the Netherlands. Let's check the /etc/timezone file, and we see that the time zone has now been changed to Europe/Amsterdam.
We change it back again, and I set Berlin again: timedatectl set-timezone Europe/Berlin. Now the /etc/timezone file is back to Berlin. The time could be changed with the command timedatectl set-time followed by the time, of course. I won't do that now. At the end I definitely recommend taking another look at the man page for timedatectl. There are really many options that you can choose from. Here we see set-time, including the format in which the time has to be given, and for example set-timezone and list-timezones, which we just discussed, and so on. So I recommend taking another look. Good, that's it for chapter 107. See you again in chapter 108.