Corpus maintenance
This document keeps track of measures to improve the corpus conversion process.
See also the sentence alignment page, which covers that specific part of corpus maintenance.
Corpus improvement project meetings
Spring 2011
Autumn 2011
OCR and conversion errors left over from spring 2011
- OCR error overview, May 2011 (open issues remain here)
- Conversion errors (open issues here?)
Task list, autumn 2011
Corpus conversion targets
As a reminder, this is what we are aiming for:
- Sentence-aligned bilingual corpus for CAT
- Analysed mono- and bilingual corpus
- Lemmatised and word-aligned for terminology
- Fully analysed and presented for linguistic work and terminology
- Among other things, a one-click corpus like Europarl

