Meeting_2008-06-30
- Meeting setup
- Agenda
- Opening, agenda review, participants
- Updated task status since last meeting
- Pedagogical software online
- Documentation
- Corpus gathering
- Future plans, directions and ideas
- Infrastructure
- Linguistics
- Name lexicon/risten.no infrastructure
- Proofing tools
- Other
- Next meeting, closing
- Appendix - task lists for the next five days
Meeting setup
- Date: 30.6.2008
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat
Agenda
Cf. one of the following, depending on context:
- the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
- the TOC in Forrest-rendered output, like HTML and PDF
Opening, agenda review, participants
Opened at 11:50.
Present: Børre, Jovsset, Sjur, Trond
Absent: Per-Eric, Thomas, Tomi
Agenda accepted as is.
Updated task status since last meeting
Børre
- update svn user documentation as needed
- try to repair G5 accounts for iCal Server
- make a test-all target that runs all tests we have
- define and document testing routines
- fix the remaining hunspell conversion bugs
- send new svn e-mail 23.6 as a reminder
- change license on hunspell distros to GPL2+
- fix bugs!
Jovsset
- follow up on sma corpus texts
- translate leaflet text
- Talk to John Marcus Kuhmunen about layout, pictures for the leaflet.
- Talk to John Marcus Kuhmunen printing
- Presentation on sami publisher meeting
- distribute CD version through the library bus, the language centres and common sami centres in all of Sápmi
Lene
- get the ped content ready
- Work on test routine with Trond and Sjur
Per-Eric
- follow-up on the smj texts from Kurt Tore ( Per-Eric )
- follow-up contracts from Nord-Salten avis and Lena Davidsson
- Work with missing list same_dutkama_pgr.txt
- Plan a smj pr tour for our tools
- fix bugs!
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
- implement the ped UI and functionality
Sjur
- update svn user documentation as needed
- follow up on sma corpus texts
- name db/risten.no
- update the Changes document
- follow-up on some Polderland-related bugs: 621, 630, 652
- InDesign documentation
- make a test-all target that runs all tests we have
- define and document testing routines
- write leaflet text
- fix bugs!
Thomas
On vacation
Tomi
On vacation
Trond
- Set up Jabber for Lene
- make a test-all target that runs all tests we have
- define and document testing routines
- Dictionaries
- update svn user documentation as needed
- prepare external users: Meänkieli and Greenlandic groups, Jack Ruether
- fix bugs!.
Pedagogical software online
Meeting memos can be found at http://giellatekno.uit.no/ped/index.html#Meeting+memos
TODO:
- get the content ready ( Lene, Biret, Trond )
- implement the UI and functionality ( Saara )
- get an easy-to-remember URL ( UiT/IT )
- More thorough skin, layout, ... ( External person within the Ped team , Internal forrest expert ) This we will postponed until later
- Make a pedagogical speller ( Tomi when finished with his MA thesis)
- Turn off peripheral compounds (numbers, acros, perhaps names)
- Increase editing distance by one for suggestions? Only possible with limited compounding
Documentation
TODO:
- start to reorganise the documentation ( Børre, Sjur, Trond )
Corpus gathering
TODO:
- follow up on sma corpus texts ( Sjur, Jovsset )
- follow-up on the smj texts from Kurt Tore ( Per-Eric )
- other contacts: Nord-Salten avis ( Børge Strandskog ), Lena Davidsson daughter to Lars-Matto Tuolja
- Ulf Stefan Winka has a lot of smj texts ( Thomas )
- plan a smj pr tour for our tools ( Per-Eric, Thomas )
- meet with the Sámi publishers; main topics: ( Jovsset )
- present the present project (Divvun II)
- discuss inclusion of our contract in theirs (DG and Čálliid Lágádus already positive)
- present the electronic dictionary sme-nob, with "inflectional intelligence"
- also regular contracts, and how to make faster progress with them, especially for the south sami and julev sami publishers since we have few texts in these two varieties.
- prepare meeting with Børre ( Jovsset )
- make leaflet to inform about the project
- write text ( Sjur )
- translate text ( Thomas, Jovsset, Maaren )
- talk to John Marcus Kuhmunen about layout, pictures ( Jovsset )
- talk to John Marcus Kuhmunen about printing ( Jovsset )
- distribute CD version through the library bus, the language centres and common sami centres in all of Sápmi. Gaaltije in Östersund for example. ( Leif Åge, Jovsset, Sjur )
Future plans, directions and ideas
See a separate document in plan/strat/5year.jspwiki .
Infrastructure
To accomodate future enhancements in different directions (in rough order of importance):
- test bench for all parts of our language technology efforts
- migrate to svn
- merge gt, kt and st into one, probably after the svn move
- more modularised make / build infra (prepare for smn, sms, sjd, others)
- close certain parts of the code repository (requires svn)
- set up the Leopard Server features for collaborative support:
- permanent chat rooms
- stored (and indexed) chat transcripts of the chat rooms
- iCal server / group calendars
- wiki
- wiki? (is part of Leopard Server) or other web-based documentation
- improve Forrest stability and i18n support
- reorganise the documentation content:
- differ between target groups
- get better grouping
- decide what to write in forrest and what in wiki (cf. Apertium for a similar split)
- update/add missing parts
- migrate lexicons to XML, splitting the task
- Name lexica (the Name project)
- Dictionaries (already in XML, task is to integrate them)
- Open POSes (Komi as a test case)
- change the look of the documentation web
- sfst? Both as replacement for xfst and for hunspell/open-source proofing tools
- investigate the NSIS installer, potentially replacing the InstallShield package from Polderland
- corpus content moved to Max Planck repositories?
SVN issues:
- root/prooftools not available - fixed (somehow)
- http access not yet available
- read access to the whole repo is working, BUT:
- gt/smX/polderland should be protected
- everything will be google-able by default if the repo URL is posted
- plan should be protected?
TODO:
- make a test-all target that runs all tests we have ( Børre, Sjur, Trond )
- define and document testing routines ( Børre, Sjur, Trond )
- add Jabber account in iChat for Lene ( Trond )
- follow-up migration to svn ( Børre, Sjur, Trond )
- update user documentation as needed ( Børre, Sjur, Trond )
- rewrite bashrc aliases geared towards cvs, if needed ( Trond )
- prepare and discuss with external users: Meänkieli group, Greenlandic group, Jack Ruether ( Trond )
- try to repair G5 accounts for iCal Server ( Børre )
- update the OS at the same time
Linguistics
North Sámi
(nothing new, see proofing bugs below)
Lule Sámi
(nothing new, see proofing bugs below)
TODO:
- sme->smj lexicon conversion to build bilingual lexicon resources, and increase smj coverage ( Trond, Svenne ).
- Add the words when all words are ready.
South Sámi
Jovsset will ask the authors whether we can get a copy of the Verbh manuscript in electronic version, with the usual corpus contract.
Name lexicon/risten.no infrastructure
TODO:
- fix i18n bug in risten.no/G5 (so they will work without the proper locale request) ( Sjur )
- fix bugs in lexc2xml; add comments to the log element ( Saara )
- finish first version of the editing ( Sjur )
- test editing of the xml files. If ok, then: ( Sjur, Thomas, Trond )
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) ( Sjur, Saara )
- convert propernoun-($lang)-lex.txt to a derived file from common xml files ( Sjur, Tomi, Saara )
- implement data synchronisation between risten.noand the cvs repo, and possibly other servers (ie the G5 as an alternative server to the public risten.no - it might be faster and better suited than the official one; also local installations could be treated the same way)
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) ( Thomas, Maaren, linguists )
- merge placenames which are errouneously in different entries: e.g. Helsinki, Helsingfors, Helsset ( linguists )
- publish the name lexicon on risten.no ( Sjur )
- add missing parallel names for placenames ( linguists )
- add informative links between first names like Niillas and Nils ( linguists )
Dictionaries
TODO:
- clean up and generalise the make infrastructure
- make Linux and Windows local/integrated versions
- make simple installer applications
- make a public release
- Make a homepage with instructions for dictionary use: xtdoc/gtuit/src/documentation/content/xdocs/dict.eng.xml
- Clarify the difference between local and online dictionaries:
- Plugin for Firefox and Internet Explorer (online dictionaries)
Proofing tools
Hunspell
TODO:
- change license on distros to GPL2+ ( Børre )
- QA README and installation docs ( Trond )
- fix the remaining conversion bugs ( Børre, Tomi )
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not, added to lexicon or not) ( Saara )
- test new and nested error markup ( Sjur )
Speller bugs
List of bugs returned from Polderland: 621, 630, 652, 656, 676.
Open issues based on test results:
sme
Version: Davvisámi, version 1.0.1, 2008-06-02
- 426 - comp words from Divvun.no - guoktedássásaš accepted - still OPEN
- 435 - roman numbers - inflection of single letter numbers rejected, as well as some complex numbers (but is ok in smj ) - still OPEN
- we should pregenerate all numbers once and for all, and store them in a separate lexicon file
- 595 - prefix+name wihtout hyphen ( ovdaLot instead of ovda-Lot ) - still OPEN
- 600 - gen+hyph compound sámi-dáru - still OPEN
- 603 - suomabealdi accepted - still OPEN
- 606 - speller accepts VUOHTA compound - still OPEN
- 611 - double hyphen sugg still accepted - still OPEN
- 613 - short gen. as second compound part - still OPEN
- 619 - numerals and pronouns to NAMÁK and SASJ fails - still OPEN
- 627 - prefix + hyhpen does not get accepted - FIXED
- 629 - a taking part in compounding without hyphen - still OPEN
- 633 - REGRESSION: double hyphens accepted
- 634 - PropGen+hyph+PropGen - still OPEN
- 641 - numeral+noun compounds - still OPEN
- 642 - noun/adj/proper + hyphen + ain - still OPEN
- 644 - cased numeral+numeral compund - still OPEN
- 646 - adverb + hyphen + noun - still OPEN
- 647 - numerals+NOUN - still OPEN
- 648 - unmotivated suggestions with numeral+noun - still OPEN
- 649 - name + adj compound without hyphen - still OPEN
- 654 - speller does not recognize ordinals on -nuppelogát - still OPEN
- 655 - pron + nai - still OPEN
- 658 - Suggestion saame - still OPEN - won't fix
- 666 - guovtte- and njealje- - FIXED
- 676 - triple-hyphen - FIXED , but double hyphen is still accepted
- other regressions:
- skuvlajagin now accepted
- skierranis now accepted
smj
Version: Julevsáme, version 1.0.1, 2008-06-02
- 435 - roman number - single letter numbers now recognised
- we should pregenerate all numbers once and for all, and store them in a separate lexicon file
- please note that inflection of single letter numerals is fine in smj , as opposed to sme
- 595 - prefix+name wihtout hyphen ( tsåhkeLot instead of tsåhke-Lot ) - still OPEN
- 599 - REGRESSION: numeral attr:s on lot
- 600 - gen+hyph compound sáme-dáro - still OPEN
- 616 - Bispadime-me-ráden - still OPEN , try to find an acro or abbr me
- 619 - numerals and pronouns to NAMÁK and SASJ fails - still OPEN
- 629 - a taking part in compound - still OPEN
- 634 - rop gen + hyphen + Prop gen - still OPEN
- 641 - numeral+noun compounds - still OPEN
- 644 - cased numeral+numeral compund - still OPEN
- 647 - numerals+NOUN - still OPEN
- 648 - unmotivated suggestions with numeral+noun - still OPEN
- 649 - name + adj compound without hyphen - still OPEN
- 650 - noun prefix+name compound without hyphen - still OPEN
- 658 - Suggestion saame - still OPEN , won't fix
- 692 - NEW: - numeral-variants
- other regressions:
- gus NOT accepted anymore
TODO:
- compile new speller lexicons ( Tomi )
- document how compounding is controlled in the PLX conversion ( Tomi )
Hyphenator bugs
Open issues based on test results :
sme
Lexicon version: Davvisámi, version 1.0.1, 2008-04-01
- 468 - REGRESSION: Márkomeanu
- 547 - REGRESSION: hyphen in front of vowel: Lotnolasealáhusas
- 548 - REGRESSION: mid syllable hyphenation: Háliidivččen
- 549 - REGRESSION: division without hyph: Váccedettiin
- 673 - adj-derivations: guovttenuppelotčoarvvagiin (the word is not rec.)
- 677 - NEW: Wrongly hyphenated ending -danidja - invalid
smj
Lexicon version: Julevsáme, version 1.0.1, 2008-04-01
- 545 - REGRESSION: bad hyphenation in compounds: åhpadusorganisásjåvnån (not recognised)
- 546 - REGRESSION: obligatory hyph rules seem to work in facultative manner: organisásjåvnån (not recognised)
- 547 - REGRESSION: hyphen in front of vowel: Jienastimnjuolgadusá and Orgánajs
TODO:
- fix hyphenator errors ( Tomi )
InDesign tools
Nothing new.
Releases
TODO:
- update the Changes document ( Sjur )
- InDesign documentation ( Sjur )
- Norwegian translation received from Davvi Girji
Other
Corpus contracts + open source
Now decided to wait until we have changed from cvs to svn .
Summer vacations
| Who | When |
|---|---|
| Børre | 30/6-6/7, 21/7-3/8, 11/8-17/8 |
| Jovsset | 7-11/7, 21/7-8/8 |
| Per-Eric | 11/6 - 30/6 |
| Sjur | 7/7 - 1/8 |
| Tomi | 16/6 - 4/8 |
| Thomas | 23/6 - 18/7 |
| Trond | 30/6 - 18/7, 28/7 - 1/8 |
Next meeting, closing
The next meeting is 11.8.2008, 9.30 Norwegian time.
The meeting was closed at 13:18.
Appendix - task lists for the next five days
Boerre
- update svn user documentation as needed
- try to repair G5 accounts for iCal Server
- make a test-all target that runs all tests we have
- define and document testing routines
- fix the remaining hunspell conversion bugs
- send new svn e-mail 23.6 as a reminder
- change license on hunspell distros to GPL2+
- fix bugs!
Jovsset
- follow up on sma corpus texts
- translate leaflet text
- Talk to John Marcus Kuhmunen about layout, pictures for the leaflet.
- Talk to John Marcus Kuhmunen printing
- Presentation on sami publisher meeting
- distribute CD version through the library bus, the language centres and common sami centres in all of Sápmi
Lene
- get the ped content ready
- Work on test routine with Trond and Sjur
Per-Eric
- follow-up on the smj texts from Kurt Tore ( Per-Eric )
- follow-up contracts from Nord-Salten avis and Lena Davidsson
- Work with missing list same_dutkama_pgr.txt
- Plan a smj pr tour for our tools
- fix bugs!
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
- implement the ped UI and functionality
Sjur
- update svn user documentation as needed
- follow up on sma corpus texts
- name db/risten.no
- update the Changes document
- follow-up on some Polderland-related bugs: 621, 630, 652
- InDesign documentation
- make a test-all target that runs all tests we have
- define and document testing routines
- write leaflet text
- fix bugs!
Thomas
On vacation
Tomi
On vacation
Trond
- Set up Jabber for Lene
- make a test-all target that runs all tests we have
- define and document testing routines
- Dictionaries
- update svn user documentation as needed
- prepare external users: Meänkieli and Greenlandic groups, Jack Ruether
- fix bugs!.

