Hyphenation testing documentation
Hyphenation testing uses the same automatic test bench setup as the spellers. The following tests will soon be available, depending on the availability of the command line hyphenator, which should be ready for download within days:
- regression test
- baseform comparison test
- correct test
We may also be able to provide hyphenation for OpenOffice, but nothing has been decided yet.
Regression testing
Regression testing takes as input a set of known problematic words, i.e. words that we have seen hyphenated incorrectly. These words are sent through the hyphenator, and the result is compared to the expected hyphenation pattern. Finally, a test report is generated and transformed to Forrest XML, so that it can be viewed directly in a web browser if Forrest is running.
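The comparison step of a regression test can be sketched as follows. This is a minimal illustration, not the actual test bench: the function names, the use of "-" as the hyphenation mark, and the toy lookup-table hyphenator with its invented English sample words are all assumptions made for the example. Report generation and the Forrest XML transformation are left out.

```python
def run_regression(expected, hyphenate):
    """Return (word, expected, actual) for every word whose output differs."""
    failures = []
    for word, want in expected.items():
        got = hyphenate(word)
        if got != want:
            failures.append((word, want, got))
    return failures


def toy_hyphenate(word):
    """Hypothetical stand-in for a real engine: a lookup table that
    returns the word unchanged when it has no entry."""
    table = {"example": "ex-am-ple", "testing": "tes-ting"}
    return table.get(word, word)


# Known problematic words mapped to their expected hyphenation (invented data).
regression_cases = {"example": "ex-am-ple", "testing": "test-ing"}
regression_failures = run_regression(regression_cases, toy_hyphenate)

for word, want, got in regression_failures:
    print(f"{word}: expected {want}, got {got}")
```

Passing the hyphenator in as a function keeps the comparison independent of which engine produced the output.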
Baseform comparison testing
The point of this test is to compare the hyphenation output from our Polderland hyphenators (InDesign, MS Office) with that of our hyphenating transducer. The input is all baseforms in our lexicon (i.e. no dynamic compounds), which are sent through both our hyphenating transducer and the command line hyphenator from Polderland.
The results are compared, and all differences are reported as bugs in one or the other hyphenator. This test report is also XML, transformed to Forrest XML, which is then directly viewable in your web browser of choice.
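A minimal sketch of the comparison, assuming each hyphenator can be wrapped as a function from a word to its hyphenated form. The two toy engines below are hypothetical stand-ins for the transducer and the Polderland command line hyphenator, and the sample words are invented. The sketch only collects the differences and does not produce the XML report.

```python
def compare_engines(baseforms, reference, candidate):
    """Run every baseform through both engines and collect the disagreements."""
    diffs = []
    for word in baseforms:
        ref_out = reference(word)
        cand_out = candidate(word)
        if ref_out != cand_out:
            diffs.append({"word": word, "reference": ref_out, "candidate": cand_out})
    return diffs


def toy_transducer(word):
    """Hypothetical stand-in for the hyphenating transducer."""
    return {"example": "ex-am-ple", "window": "win-dow"}.get(word, word)


def toy_command_line(word):
    """Hypothetical stand-in for the command line hyphenator."""
    return {"example": "ex-am-ple", "window": "wind-ow"}.get(word, word)


engine_diffs = compare_engines(
    ["example", "window", "cat"], toy_transducer, toy_command_line
)

for diff in engine_diffs:
    print(diff["word"], diff["reference"], diff["candidate"])
```

A difference only shows that the engines disagree. Deciding which of the two is wrong still requires a human check.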
Correct testing
The input here is a list of manually, correctly hyphenated words. The words are sent through the hyphenator with the manually added hyphenation points removed, and the result is compared to the answer key, i.e. the manually hyphenated original. The test report produced will highlight any differences and give a quality score.
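The procedure can be sketched as below. Several details are assumptions made for illustration: "-" is taken as the hyphenation mark, words are assumed not to contain literal hyphens of their own, and the quality score is taken to be the fraction of words hyphenated exactly as in the manual list, since the scoring formula is not specified here. The toy hyphenator and sample words are invented.

```python
def correct_test(gold_words, hyphenate, mark="-"):
    """Strip the manual hyphenation points, re-hyphenate each word, and
    compare the result with the manually hyphenated original.

    Returns (score, differences); score is the fraction of exact matches.
    """
    differences = []
    for gold in gold_words:
        plain = gold.replace(mark, "")  # remove the manual hyphenation points
        got = hyphenate(plain)
        if got != gold:
            differences.append((gold, got))
    total = len(gold_words)
    score = (total - len(differences)) / total if total else 1.0
    return score, differences


def toy_hyphenate(word):
    """Hypothetical stand-in for a real hyphenation engine."""
    table = {"example": "ex-am-ple", "testing": "tes-ting", "hyphen": "hy-phen"}
    return table.get(word, word)


gold = ["ex-am-ple", "test-ing", "word", "hy-phen"]
score, differences = correct_test(gold, toy_hyphenate)

print(f"score: {score:.2f}")
for want, got in differences:
    print(f"expected {want}, got {got}")
```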
Hyphenation engine
As with the spellers, one can test different hyphenator engines. At present we have only one (soon, that is): the Polderland command line hyphenator. In the future, expect to see other implementations as well, such as ones for OpenOffice.org. The engine can be selected with a make command option, as follows:
make hyp-regr-test TARGET=sme TOOL=pl
The above command will run the regression test for North Sámi, using the Polderland command line hyphenator as the hyphenation engine.
The hyphenating transducers are not really hyphenator engines, but are used as a reference point to compare the output of the other engines against. Still, there is of course a chance that they also produce wrong results, and both the correct test and the regression test can be used to ensure that the hyphenating transducers behave correctly, at least for the sample given in the test data. The TOOL code for the transducer is tx.
Last modified: $Date: 2008-11-05 18:52:54 +0100 (ons, 05 nov 2008) $, by $Author: boerre $




