Webdict Compilation

This text documents the compilation process of the web dictionaries.

For the moment, we use the Apertium dictionary format.

The files

The online files are stored on victorio, in the relevant directory for displaying online files. They should be identical to the files in $GTHOME/apps/dicts/apertiumdict/, so keep the two directories in sync. The victorio files are served on the web, and the ones in apps/dicts are for offline use.
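A quick way to check that the two directories hold identical files is a recursive diff. The following is only a minimal sketch: the function name dirs_in_sync and the variable ONLINE_DIR are hypothetical, and it assumes both directories are reachable as local paths (for instance when run on victorio itself).

```shell
# Sketch: succeed only if two directories contain identical files.
# dirs_in_sync is a hypothetical helper name, not an existing project script.
dirs_in_sync() {
    # diff -r compares the trees recursively; -q reports only whether files differ.
    diff -rq "$1" "$2" >/dev/null
}
```

It could be called as dirs_in_sync "$GTHOME/apps/dicts/apertiumdict" "$ONLINE_DIR" || echo "directories differ", where ONLINE_DIR stands for the online directory on victorio.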

The conversion procedure

The files are converted from the original dictionary files in $GTHOME/words/dicts/LANG1LANG2/ via the following procedure:

  • collect all relevant entries in a single file (the script used in this step does nothing else)
    • in $GTHOME/words/dicts/scripts/, issue the following command (where INPUT_DIR is ../LANG1LANG2/src/):
    • java -Xmx2048m -Dfile.encoding=UTF8 net.sf.saxon.Transform -it main scripts/collect-dict-parts.xsl inDir=INPUT_DIR/ > PATH_TO_OUTPUT_FILE
    • different language pairs may need different filters
  • convert the gt_format into the accepted apertium_xml format
    • in $GTHOME/words/dicts/scripts/ issue the command:
    • java -Xmx2048m -Dfile.encoding=UTF8 net.sf.saxon.Transform -it main gtdict2simple-apertiumdix.xsl inFile=INPUT_FILE outDir=PATH_TO_OUTPUT_DIR
  • compile the file using the apertium tools, with this command:
    • ./apertium-dixtools dix2trie INPUT_FILE lr LANG1-LANG2-lr-trie.xml
  • copy the updated file from $GTHOME/apps/dicts/apertiumdict/ into the relevant directory for displaying online files on victorio.
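
The three conversion commands above can be tied together for one language pair, roughly as in the sketch below. It is a dry run that only prints the commands so they can be reviewed before being issued by hand. The values sme and nob, the WORK_DIR location, the function name webdict_commands, and the intermediate file names are all assumptions made for illustration. In particular, the name of the file that gtdict2simple-apertiumdix.xsl writes into outDir is not documented here and should be checked.

```shell
#!/bin/sh
# Dry-run sketch: print the conversion commands for one language pair.
# All values below are hypothetical placeholders, not project defaults.
LANG1="${LANG1:-sme}"
LANG2="${LANG2:-nob}"
WORK_DIR="${WORK_DIR:-/tmp/webdict}"

PAIR="${LANG1}${LANG2}"
COLLECTED="${WORK_DIR}/${PAIR}-all.xml"   # hypothetical name for the collected file
DIX_FILE="${WORK_DIR}/${PAIR}.dix"        # hypothetical; check what the stylesheet writes to outDir
SAXON="java -Xmx2048m -Dfile.encoding=UTF8 net.sf.saxon.Transform -it main"

webdict_commands() {
    # Step 1: collect all relevant entries in a single file.
    echo "${SAXON} scripts/collect-dict-parts.xsl inDir=../${PAIR}/src/ > ${COLLECTED}"
    # Step 2: convert the gt_format into the apertium_xml format.
    echo "${SAXON} gtdict2simple-apertiumdix.xsl inFile=${COLLECTED} outDir=${WORK_DIR}"
    # Step 3: compile the result with the apertium tools.
    echo "./apertium-dixtools dix2trie ${DIX_FILE} lr ${LANG1}-${LANG2}-lr-trie.xml"
}

webdict_commands
```

Printing rather than executing keeps the sketch safe to run anywhere. The final copy to victorio is left out because the target directory is not named in this text.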