Building and testing hunspell and hunspell dictionaries
by Børre Gaup
Building and installing hunspell
- Get hunspell, preferably version 1.1.10
- Unpack it
- Run ./configure
- Run make
- Run make install
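The steps above can be collected into one small shell function. This is a sketch, assuming the source tarball is named like hunspell-1.1.10.tar.gz and unpacks into a directory with the same name minus the .tar.gz suffix. The function name build_hunspell is hypothetical. With the default configure prefix (/usr/local), make install usually needs root privileges.

```shell
#!/bin/sh
# build_hunspell TARBALL
# Hypothetical helper: unpack a hunspell source tarball, then run the
# usual configure / make / make install sequence inside the unpacked tree.
# Assumption: TARBALL (e.g. hunspell-1.1.10.tar.gz) unpacks into a directory
# named like the tarball without its .tar.gz suffix.
build_hunspell() {
    tarball=$1
    srcdir=$(basename "$tarball" .tar.gz)

    tar xzf "$tarball" || return 1

    # Run the build in a subshell so the caller's working directory is kept.
    # Note: make install may need root privileges for the default prefix.
    (
        cd "$srcdir" || exit 1
        ./configure && make && make install
    )
}
```

For example: build_hunspell hunspell-1.1.10.tar.gz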
Building and testing hunspell dictionaries
To build hunspell dictionaries, we first have to build and install a hunspell-specific transducer, and then build and run the Java program that generates the dictionaries.
Building and installing the transducer
- Go to gt and run cvs up
- Build the hunspell transducer: make TARGET=sme i-hunspellfst M4FLAGS=-DHUNSPELL
- Copy the resulting transducer to /opt/sme/bin: sudo cp sme/bin/i-hunspell-sme.fst /opt/sme/bin
Building the Java program
- Go to gt/src/lexc2xspell
- Build the program: ant
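The transducer and Java build steps above can be sketched as one shell function. It is only a sketch: the function name prepare_sme and its argument (the path to the gt checkout) are hypothetical, and the mkdir -p line is an added assumption that makes sure /opt/sme/bin exists before the transducer is copied into it.

```shell
#!/bin/sh
# prepare_sme GT_DIR
# Hypothetical helper: update the gt checkout, build and copy the
# hunspell-specific sme transducer, then build the Lexc2hunspell Java program.
prepare_sme() {
    gt=$1
    (
        cd "$gt" || exit 1
        cvs up || exit 1

        # Build the transducer; M4FLAGS=-DHUNSPELL presumably defines the
        # HUNSPELL macro for m4 so the hunspell-specific variant is built.
        make TARGET=sme i-hunspellfst M4FLAGS=-DHUNSPELL || exit 1

        # Added step (assumption): create the target directory if missing.
        sudo mkdir -p /opt/sme/bin || exit 1
        sudo cp sme/bin/i-hunspell-sme.fst /opt/sme/bin || exit 1

        # Build the Java program with ant.
        cd src/lexc2xspell || exit 1
        ant
    )
}
```

For example: prepare_sme "$HOME/gt"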
Generating dictionaries and debugging output
While generating the dictionaries, the program can also produce debugging output: the full paradigms of the words it generates for hunspell. With the --debug option, this output is written to gt/src/lexc2xspell, in the files debug-$POS-$LANG.txt. The dictionary itself is written to sme.dic and sme.aff.
- Generate the hunspell dictionaries and debugging output (here for nouns): java -cp build Lexc2hunspell --debug ../../sme/src/noun-sme-lex.txt
Testing the dictionaries
To test the quality of the generated forms, build one POS (part of speech) at a time, then do the following for each:
- List the forms hunspell rejects: hunspell -d ./sme -l debug-$POS-$LANG.txt > feil-$POS-$LANG.txt
- Find the error percentage:
  - wc -l feil-$POS-$LANG.txt
  - wc -l sme.dic
  - Divide the first number by the second and multiply by 100
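The per-POS check, including the error percentage, can be written as a short shell sketch. The helper names error_rate and check_pos are hypothetical, and the POS name passed to check_pos (for example noun) is assumed to match the one in the debug file name. The script uses lang rather than LANG for the language code, because LANG is also the shell's locale variable. Note that wc -l also counts the first line of sme.dic, which in hunspell's format holds the approximate number of entries, so the percentage comes out very slightly low.

```shell
#!/bin/sh
# error_rate ERRORS_FILE DIC_FILE
# Print 100 * (lines in ERRORS_FILE) / (lines in DIC_FILE) with two decimals.
error_rate() {
    errors=$(wc -l < "$1") || return 1
    entries=$(wc -l < "$2") || return 1
    if [ "$entries" -eq 0 ]; then
        echo "error_rate: $2 is empty" >&2
        return 1
    fi
    # awk does the floating-point division that plain sh arithmetic lacks.
    awk -v e="$errors" -v n="$entries" 'BEGIN { printf "%.2f\n", 100 * e / n }'
}

# check_pos POS LANG_CODE
# Hypothetical wrapper: run hunspell on the debug paradigms for one POS,
# save the rejected forms to feil-POS-LANG_CODE.txt, and print the
# error percentage. Assumes LANG_CODE.dic, LANG_CODE.aff and
# debug-POS-LANG_CODE.txt are in the current directory.
check_pos() {
    pos=$1
    lang=$2
    hunspell -d "./$lang" -l "debug-$pos-$lang.txt" > "feil-$pos-$lang.txt"
    error_rate "feil-$pos-$lang.txt" "$lang.dic"
}
```

For example, after a noun run that produced debug-noun-sme.txt, check_pos noun sme prints the error percentage for nouns.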
To test the speller on real text, build a dictionary covering all the parts of speech, then run: hunspell -d ./sme -l <test-text>
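When testing on real text, it can help to see which rejected words occur most often, since frequent ones tend to point at systematic gaps in the dictionary. A minimal sketch with standard Unix tools; the function name top_rejected and its arguments are hypothetical, and the dictionary is again assumed to be ./sme.

```shell
#!/bin/sh
# top_rejected TEXT_FILE [N]
# List the N (default 20) most frequent words that hunspell rejects in
# TEXT_FILE, most frequent first, each prefixed by its count.
top_rejected() {
    text=$1
    n=${2:-20}
    # hunspell -l prints one rejected word per line; count and rank them.
    hunspell -d ./sme -l "$text" | sort | uniq -c | sort -rn | head -n "$n"
}
```

For example: top_rejected <test-text> 50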
Last changed: $Id: hunspell.xml 21489 2008-11-05 17:52:54Z boerre $