Webdict Compilation
This text documents the compilation process of the web dictionaries.
For the moment, we use the Apertium dictionary format.
The files
The online files themselves are stored in the relevant catalogue for displaying online files on victorio. They should be identical to the files in $GTHOME/apps/dicts/apertiumdict/, so keep the two catalogues in sync. The victorio files are served on the net, and the ones in apps/dicts are for offline use.
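As a minimal sketch of how the two catalogues could be checked against each other, the function below compares two directories recursively. The function name dicts_in_sync is hypothetical, and since the location of the victorio catalogue is not given in this text, both paths have to be supplied by the caller.

```shell
# Compare two dictionary catalogues recursively.
# Prints "in sync" if every file is identical, "out of sync" otherwise.
# Neither path is hard-coded: the victorio catalogue path is not stated here.
dicts_in_sync() {
    offline_dir=$1
    online_dir=$2
    if diff -r "$offline_dir" "$online_dir" >/dev/null 2>&1; then
        echo "in sync"
    else
        echo "out of sync"
        return 1
    fi
}
```

It could be called as dicts_in_sync "$GTHOME/apps/dicts/apertiumdict" PATH_TO_ONLINE_CATALOGUE, with the second argument pointing at a local copy or mount of the victorio catalogue.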
The conversion procedure
The files are converted from the original dictionary files in $GTHOME/words/dicts/LANG1LANG2/ via the following procedure:
- Collect all relevant entries in a single file (this script does only that).
  - In $GTHOME/words/dicts/scripts/, issue the following command (where INPUT_DIR is ../LANG1LANG2/src/):
    - java -Xmx2048m -Dfile.encoding=UTF8 net.sf.saxon.Transform -it main scripts/collect-dict-parts.xsl inDir=INPUT_DIR/ > PATH_TO_OUTPUT_FILE
  - Different language pairs may need different filters.
- Convert the gt_format into the accepted apertium_xml format.
  - In $GTHOME/words/dicts/scripts/, issue the command:
    - java -Xmx2048m -Dfile.encoding=UTF8 net.sf.saxon.Transform -it main gtdict2simple-apertiumdix.xsl inFile=INPUT_FILE outDir=PATH_TO_OUTPUT_DIR
- Compile the file using the Apertium tools, with this command:
  - ./apertium-dixtools dix2trie INPUT_FILE lr LANG1-LANG2-lr-trie.xml
- Copy the compiled file from $GTHOME/apps/dicts/apertiumdict/ to the relevant catalogue for displaying online files on victorio.
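The three commands above can be wrapped in small shell functions so that the placeholders become arguments. This is only a sketch: the function names are hypothetical, it assumes Saxon is already on the Java classpath, and because this text does not say what file name the conversion step writes into its output directory, the input to the compile step is passed explicitly rather than guessed.

```shell
# Name of the compiled trie file for a language pair,
# following the LANG1-LANG2-lr-trie.xml pattern used in the text.
trie_name() {
    echo "$1-$2-lr-trie.xml"
}

# Step 1: collect all relevant entries in a single file.
# Usage: collect_entries INPUT_DIR OUTPUT_FILE
collect_entries() {
    java -Xmx2048m -Dfile.encoding=UTF8 net.sf.saxon.Transform -it main \
        scripts/collect-dict-parts.xsl inDir="$1/" > "$2"
}

# Step 2: convert the gt_format into the apertium_xml format.
# Usage: convert_to_dix INPUT_FILE OUTPUT_DIR
convert_to_dix() {
    java -Xmx2048m -Dfile.encoding=UTF8 net.sf.saxon.Transform -it main \
        gtdict2simple-apertiumdix.xsl inFile="$1" outDir="$2"
}

# Step 3: compile the converted file into a trie.
# Must be run from the directory that contains apertium-dixtools.
# Usage: compile_trie INPUT_FILE LANG1 LANG2
compile_trie() {
    ./apertium-dixtools dix2trie "$1" lr "$(trie_name "$2" "$3")"
}

# Example call sequence (not executed here; all file names are placeholders):
#   collect_entries ../LANG1LANG2/src PATH_TO_OUTPUT_FILE
#   convert_to_dix PATH_TO_OUTPUT_FILE PATH_TO_OUTPUT_DIR
#   compile_trie INPUT_FILE LANG1 LANG2
```

Keeping each step in its own function makes it possible to rerun a single step, for example when only the filters for one language pair have changed.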

