LINGO LANGUAGE FILES

This document describes
  1. how the program Lingo uses language files;
  2. how language files can be enhanced or new ones created;
  3. how Ergane language files can be imported into Lingo;
  4. how the Lingo language files have been compressed.


1. Structure of Lingo Language Files
------------------------------------

The Lingo language files are ordinary database files, created with the
DATA application. The names of the files follow the ISO 639 convention.
Each file has two columns: the first column, k, is a 5-digit number;
the second column, n, is a 28-character text field.

Here is an excerpt of some of these files:

EO             ¦ EN             ¦ FR             ¦ DE
k     n        ¦ k     n        ¦ k     n        ¦ k     n
    1          ¦     1          ¦     1          ¦     1
...            ¦ ...            ¦ ...            ¦ ...
11800 facila   ¦ 11800 easy     ¦ 11800 facile   ¦ 11800 leicht
               ¦ 11800 facile   ¦                ¦
               ¦                ¦                ¦
16125 hela     ¦ 16125 bright   ¦ 16125 clair    ¦ 16125 hell
               ¦ 16125 clear    ¦                ¦ 16125 licht
               ¦ 16125 light    ¦                ¦ 16125 lichtvoll
               ¦                ¦                ¦
24047 leghera  ¦ 24047 light    ¦ 24047 léger    ¦ 24047 leicht
               ¦                ¦                ¦
24799 luma     ¦ 24799 bright   ¦ 24799 clair    ¦ 24799 hell
               ¦ 24799 light    ¦ 24799 lumineux ¦
               ¦                ¦                ¦
24837 lumo     ¦ 24837 light    ¦ 24837 lumière  ¦ 24837 Licht
               ¦                ¦                ¦
25991 malpeza  ¦ 25991 light    ¦ 25991 léger    ¦
               ¦                ¦                ¦
...            ¦ ...            ¦ ...            ¦ ...
59999          ¦ 59999          ¦ 59999          ¦ 59999

The excerpt shows that the English word 'light' has several different
meanings. It also shows that an Esperanto word has only one meaning,
even though several Esperanto words may mean the same thing (such as
'hela' = 'luma'). Because Esperanto words are (almost) never ambiguous,
Lingo (like its parent, Ergane) is based on this language.

An example shows the benefit: suppose that I want to translate 'leicht'
from German to French. I see that 'leicht' has two meanings in
Esperanto, 'facila' and 'leghera'. The corresponding words in French
are 'facile' and 'léger'.
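
The 'leicht' example amounts to matching rows of two language files on
the number column k. The following is a minimal sketch in Python, not
Lingo's actual code: the function name translate and the in-memory
lists are assumptions made for illustration, and the rows are only
those shown in the excerpt.

```python
# Excerpt rows (k, n) copied from the DE and FR columns of the table.
# Real language files contain far more entries.
DE = [(11800, "leicht"), (16125, "hell"), (16125, "licht"),
      (16125, "lichtvoll"), (24047, "leicht"), (24799, "hell"),
      (24837, "Licht")]
FR = [(11800, "facile"), (16125, "clair"), (24047, "léger"),
      (24799, "clair"), (24799, "lumineux"), (24837, "lumière"),
      (25991, "léger")]


def translate(word, source, target):
    """Return the target words that share a number k with `word`."""
    # Collect every number k under which the word appears in the source.
    keys = {k for k, n in source if n == word}
    # Collect the target words filed under those numbers, without repeats.
    result = []
    for k, n in target:
        if k in keys and n not in result:
            result.append(n)
    return result


print(translate("leicht", DE, FR))  # ['facile', 'léger']
```

Because the match is made on k alone, 'lumière' (k=24837) is never
offered for 'leicht', whose numbers are 11800 and 24047.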
Had I used English as the auxiliary language, I would have obtained
some erroneous translations: 'leicht' can be 'light' in English, and
'light' has several more translations in French, such as 'lumière'
(which is a noun, not an adjective).

Lingo does not actually consult the Esperanto file when translating. It
simply uses the number column k in both the input and the output
language to join the words (for every number k, there is a unique word
in the Esperanto file). In more detail: given an input word (or a
pattern with wildcards), the first step of the search algorithm builds
a list of the numbers k that match the input word. In the second step,
the program looks up in the output language all the words corresponding
to one of the numbers in the list; it then forms translation pairs and
eliminates duplicate pairs.

Lingo expects the k column in the language files to be sorted in
ascending order; thus, when reading through a file in search of word
number x, the search stops as soon as a word with a number greater
than x is read.


2. Editing (or Creating) Lingo Language Files
---------------------------------------------

Since Lingo language files are compressed (see below), they can no
longer be edited with the DATA application. Instead, a new
(uncompressed) DATA file based on the provided template A419 must be
used. To copy from an existing language file to the new one:
  1. open the existing language file with DATA;
  2. use ExportAsText;
  3. open the template A419;
  4. SaveAs a new name;
  5. ImportFromText.

When editing the new language file, some rules must be respected.

Rule 1: if a new language is created, its file must have a two-letter
name, because Lingo (for simplicity) only displays languages with such
names.

Rule 2: the language file must have the correct structure of 2 columns
as described above. When new files are based on the template A419, this
is assured.

Rule 3: if a synonym is added, it must be put at the place with the
correct meaning.
Example: if we know that the word 'light' can also translate to
'leuchtend' in German, then we can add this word at position k=24799 in
the German language file, but certainly not at k=24047. Knowing the
basic rules of Esperanto helps in finding the right place.

Rule 4: if a missing word in one language is added, again the correct
place must be located. Example: in the excerpt above, it seems that a
new entry (k=25991, n=leicht) could be added to the German language
file.

Rule 5: if a pair of translations is added, then a new place must be
defined in both (or more) languages. Example: suppose you know that the
English word 'foobar' translates to 'Dingbat' in German. Neither of
these words is in the language files yet; thus you can add
(k=99999, n=foobar) in EN and (k=99999, n=Dingbat) in DE. The number k
can be freely chosen, but should be greater than 60000 (as the
'official' Lingo language files will use k in the range 1 .. 60000).

Rule 6: due to the way Lingo searches the language files, the words
must be physically in ascending order of column k! When words are added
in DATA, they are always appended at the end of the file. When you ask
DATA to sort on k, the file is DISPLAYED in the correct order, but
physically the last entries are still at the end of the file. To fix
this problem, proceed as follows: after some words have been added, ask
the DATA application to re-sort on column k, then export the whole
database to a text file, then create an empty language file, and
finally reimport the data from the text file. In this way, the modified
language file is OK for Lingo.


3. Importing files from Ergane
------------------------------

The language files on the Ergane website (www.travlang.com/Ergane) are
downloaded as ZIP files. Each ZIP file contains one Access file (with
extension mdb). In this file, there is a table with the same name as
the language.
This table has several columns; the only ones of interest to us are
EspKey and XEntry: they correspond exactly to the columns k and n of
Lingo databases. First, the table must be sorted in ascending order on
column EspKey. Then the two columns must be cut and pasted into a text
file. It is suggested to format the text file like this:

   "1","word or expression1"
   "2","word2"
   "4","another word"
   ...

This file is transferred without conversion to the EPOC machine. There,
the Data application is started with an empty Lingo language file based
on the template A419 (see above), and the text file is imported with
ImportFromText.

For some Ergane files (using non-Latin character sets), the process
described here is not possible, since the MDB file is empty and the
words are stored in different files. The conversion from these files is
more complicated, and it must still be settled (using special fonts
with Lingo) how these languages could be used on the EPOC machine.


4. Compression of the language files
------------------------------------

In order to save precious memory on the EPOC machines, the DATA files
have been compressed using the utility program DATAZIP included in the
Symbian SDKs. This tool, from the aleppo package, was originally meant
to compress help files, but it can be applied to any DATA file. As a
result, a language file can be reduced to almost half its size!

This compression must not be confused with COMPACT: while DATA files
are edited, their size increases with every change; periodically, when
the file is reopened, it is compacted to recover idle space. This
operation, which can also be requested explicitly (with the COMPACT
function in OPL, for example), recovers little space compared to the
ZIP-like compression applied here.

- - - -
Patrick Hahn
phahn@vo.lu