General information

by Trond Trosterud

General information for new users

Here you will find the technical documentation of the Sámi language technology project (also called Giellatekno), conducted at the University of Tromsø, and of its twin project Divvun (making Sámi spellers), conducted at the Norwegian Sámi Parliament.

In the thin menu line above you see:

  • The Project tab that gives some general information about both projects.
  • The Meetings tab that contains the minutes of our meetings, both from our weekly (virtual) meetings and from the other meetings we have held during the project.
  • The Infrastructure tab that contains information on how to set up your computer with the tools and programs needed to work with our source code and documentation on your own machine.
  • The Language tab that is divided into one subpart for each language treated. It contains detailed documentation on the source files for each language.
  • The Linguistics tab that contains language-independent documentation on how to
    • convert texts to our corpus format
    • preprocess files
    • use the morphological parsers
    • disambiguate Sámi texts
  • The Ped tab that documents our pedagogical project.
  • The Tools tab that gives an introduction to using the tools described under the Infrastructure tab.
  • The Bugzilla tab that links directly to our bug database.

Documentation and source code

The source code and the documentation of these projects are under version control, using the program Subversion (svn).

svn makes it possible for everyone to fetch our source code and documentation to their own machine, and tinker with it in peace and quiet. There are two methods of fetching the files to your own machine.

  • Those who have an account on our svn server can both fetch files (called check out) and write back changed files (called check in) on this server. (Instructions)
  • Others who don't have an account on our server can only fetch files and cannot write back changes. (Instructions) Note that these instructions still refer to our obsolete cvs system; they will be rewritten shortly.

Note
If you want to compile our source code, you must have the Xerox finite-state tools, which are the building blocks of these projects. This is proprietary software that is bundled with the book Finite State Morphology. For syntactic disambiguation, you will need the vislcg3 program.
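
The two kinds of access described above (check out for everyone, check in for account holders) can be sketched as the svn commands a user would run. This is a minimal illustration only: the repository URL is a placeholder, since the real address is given in the linked instructions and not on this page, and the helper names checkout_command and commit_command are made up for this sketch.

```python
import shlex

# Placeholder URL (an assumption for illustration): the real repository
# address is given in the linked instructions, not on this page.
REPO_URL = "https://svn.example.org/repos/gt"


def checkout_command(repo_url, target_dir="gt"):
    """Return the argument list for fetching a working copy (check out)."""
    return ["svn", "checkout", repo_url, target_dir]


def commit_command(message, working_copy="gt"):
    """Return the argument list for writing changes back (check in).

    Only users with an account on the server can run this successfully.
    """
    return ["svn", "commit", "-m", message, working_copy]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    print(shlex.join(checkout_command(REPO_URL)))
    print(shlex.join(commit_command("Fix typo in documentation")))
```

The functions only build the command lines; passing the returned list to subprocess.run would execute them once the real URL is filled in.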

Directory structure for the source code

The project-specific documentation is in the xtdoc module; the Divvun project's files are in the sd directory, and the university project's files are in the gtuit directory. The common documentation and source code of these two projects are in the gt svn module.

The project is located in the directory gt/ (an acronym for giellateknologiija, language technology). These are the subdirectories (the abbreviations for the different languages are in accordance with the ISO 639-2 standard for language codes):

  • doc/ = Documentation directory, which contains this documentation, and has the following subdirectories:
    • admin/ = Contains files related to project administration
    • infra/ = Documents how to set up users and machines and contains information on how the servers are set up.
    • lang/ = Contains documentation on the lexica and language files for the Sámi languages.
    • ling/ = Contains documentation on topics common to all languages.
    • tools/ = Howtos for the tools used by the project.
  • script/ = Script files, some at the top level and others in the subdirectories cgi-bin, emacs and testing.
  • smi/ = Files relevant to all the languages, e.g. proper names.
  • sme/ = Northern Sámi
  • smj/ = Lule Sámi
  • sma/ = Southern Sámi
  • smn/ = Inari Sámi
  • sms/ = Skolt Sámi
  • sjd/ = Kildin Sámi
  • www/ = Directory for web-related material.
  • tmp/ = Directory for temporary storage of script files during compilation.

Each language directory has the following subdirectories:

  • bin/ = The program files, generated automatically by the make command.
  • dev/ = Developer's files (store your own notes here if needed).
  • corp/ = Corpus files (cf. the README file in the corp/ directory).
  • int/ = Intermediate binary files, of little relevance to the final parsers.
  • src/ = The source files, our crown jewels.
  • testing/ = Files for morphology testing (how to conduct such testing is explained on the Testing tools page).
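
The layout described above can be summarised as a small lookup table. The following is a sketch under stated assumptions, not part of the project's own tooling: the names LANGUAGE_DIRS, LANGUAGE_SUBDIRS and language_paths are hypothetical, and the code only computes path names without touching the file system.

```python
from pathlib import PurePosixPath

# Language directories under gt/, as listed in the directory overview.
LANGUAGE_DIRS = {
    "sme": "Northern Sámi",
    "smj": "Lule Sámi",
    "sma": "Southern Sámi",
    "smn": "Inari Sámi",
    "sms": "Skolt Sámi",
    "sjd": "Kildin Sámi",
}

# Subdirectories that each language directory contains.
LANGUAGE_SUBDIRS = ("bin", "dev", "corp", "int", "src", "testing")


def language_paths(code, root="gt"):
    """Map each subdirectory name to its path for one language code."""
    if code not in LANGUAGE_DIRS:
        raise KeyError(f"unknown language directory: {code}")
    base = PurePosixPath(root) / code
    return {sub: base / sub for sub in LANGUAGE_SUBDIRS}


if __name__ == "__main__":
    # List the expected paths for every language directory.
    for code, name in LANGUAGE_DIRS.items():
        print(name)
        for path in language_paths(code).values():
            print("  ", path)
```

For example, language_paths("sme")["src"] is the path gt/sme/src, where the Northern Sámi source files live.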

The svn program copies the gt/ directory to the home directory of each user.

Last changed: $Id: intro.xml,v 1.8 2005/06/21 12:13:53 boerre Exp $

Copyright © 2004-2010 Norgga sámediggi · The Norwegian Sámi Parliament.