<?xml version="1.0" encoding="ISO-8859-1"?><!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.3//EN" "document-v13.dtd">
<document xmlns:xi="http://www.w3.org/2001/XInclude" xml:lang="en">
  <header>
    <title>Localisation</title>

    <authors>
      <person email="trond.trosterud@hum.uit.no" name="Trond Trosterud"/>
    </authors>
  </header>

  <body>
    <section>
      <title>Localisation</title>

      <p>All our projects have used UTF-8 since the spring of 2005.</p>

      <p>There is one exception, and it is not in-house: the Oslo web interface Glossa
      handles only 8-bit code tables. Our texts in Glossa are encoded in
      <link href="http://www.hum.uit.no/a/trond/197.jpg">ISO-IR 197</link>.</p>

      <section>
        <title>Northern Sámi</title>

        <p>The Northern Sámi parser uses UTF-8, as it should. Our web interface
        has an input filter that lets users type <code>c1</code>, <code>d1</code>,
        <code>s1</code>, etc. instead of the correct Sámi letters. Output to the
        user is in UTF-8.</p>
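
        <p>As a rough illustration of such an input filter, the sketch below
        expands ASCII digraphs into Sámi letters. It is not the filter used in the
        web interface: the function name <code>expand_digraphs</code> is
        hypothetical, the table covers only the three digraphs named above, and
        the target letters (&#269;, &#273;, &#353;) are an assumption. The letters
        are written as escapes so that the code stays pure ASCII.</p>

        <source>
```python
# Hypothetical digraph table. Only c1, d1 and s1 are named in the text;
# the target letters are an assumption and the table is incomplete.
DIGRAPHS = {
    "c1": "\u010d",  # c with caron
    "d1": "\u0111",  # d with stroke
    "s1": "\u0161",  # s with caron
    "C1": "\u010c",
    "D1": "\u0110",
    "S1": "\u0160",
}


def expand_digraphs(text):
    """Replace every ASCII digraph in text with its assumed Sami letter."""
    for ascii_form, letter in DIGRAPHS.items():
        text = text.replace(ascii_form, letter)
    return text
```
        </source>

        <p>A plain string replacement like this also rewrites any genuine
        letter-plus-digit sequence in the input, so a real filter would probably
        need to be more careful.</p>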
      </section>

      <section>
        <title>Lule Sámi</title>

        <p>Lule Sámi can be represented adequately in an 8-bit encoding, Latin 1
        (Lule Sámi files can use <em>ñ</em> for n-acute). In order to have
        only one localisation interface for the languages we work on, Lule
        Sámi has been moved to UTF-8 as well. A script,
        <code>spell-relax.regex</code>, makes it possible to use any of ñ,
        &#324; and &#331;.</p>
      </section>

      <section>
        <title>Southern Sámi</title>

        <p>Southern Sámi can also be represented in an 8-bit encoding, Latin 1,
        but just as for Lule Sámi, it is stored in UTF-8. The script
        <code>spell-relax.regex</code> is used for Southern Sámi as well, but
        with a slightly different purpose: it accepts the widespread
        sloppy use of <em>i</em> for <em>ï</em>.</p>
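
        <p>The idea of this one-way relaxation can be sketched in Python. This is
        not the content of <code>spell-relax.regex</code>, only an illustration of
        the principle under our own assumptions: the function names are
        hypothetical, and the word used in the test is just an example. A typed
        <em>i</em> is accepted where the lexicon has <em>ï</em>, but not the other
        way round.</p>

        <source>
```python
import re

I_DIAERESIS = "\u00ef"  # i with diaeresis, escaped to keep the code ASCII


def relaxed_pattern(lexicon_form):
    """Build a regex that accepts plain i wherever the lexicon has i-diaeresis."""
    parts = []
    for ch in lexicon_form:
        if ch == I_DIAERESIS:
            parts.append("[i" + I_DIAERESIS + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def accepts(lexicon_form, typed):
    """Return True if the typed form matches the relaxed lexicon form."""
    return relaxed_pattern(lexicon_form).fullmatch(typed) is not None
```
        </source>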
      </section>

      <section>
        <title>Inari, Skolt and Kildin Sámi</title>

        <p>Inari, Skolt and Kildin Sámi are represented in Unicode. For Kildin
        Sámi, note that the Latin-looking letters we use are actually Cyrillic ones.</p>
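
        <p>Because such letters look the same as their Latin counterparts, a mixed-up
        character is hard to spot by eye. The sketch below shows one possible
        check, using Python's standard <code>unicodedata</code> module to read the
        script from each character's Unicode name. The function names are
        hypothetical, and the letter U+0458 (Cyrillic je) in the comment is chosen
        only for illustration.</p>

        <source>
```python
import unicodedata


def script_of(ch):
    """Return the first word of the Unicode name, e.g. LATIN or CYRILLIC."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def non_cyrillic_letters(word):
    """List the letters in word that are not Cyrillic.

    For instance, U+0458 looks like Latin j but is named
    CYRILLIC SMALL LETTER JE, so it passes; an ASCII j is reported.
    """
    return [ch for ch in word if ch.isalpha() and script_of(ch) != "CYRILLIC"]
```
        </source>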
      </section>


      <section>
        <title>Our other languages</title>

        <p>All our other languages are in UTF-8 as well.</p>
        <p>Note that for Iñupiaq, we do <em>not</em> use the widespread 8-bit
        <em>Interactive IñupiaQ Dictionary</em> encoding, because it places the Iñupiaq
        characters in the ASCII range. There is a converter on the Iñupiaq page.</p>
      </section>
      
      <p class="last_modified">Last modified: $Date: 2009-05-22 18:17:14 +0200 (fre, 22 mai 2009) $, by
      $Author: trond $</p>
    </section>
  </body>
</document>
