<?xml version="1.0" encoding="ISO-8859-1"?><!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.3//EN" "document-v13.dtd">
<document xmlns:xi="http://www.w3.org/2001/XInclude" xml:lang="en">
  <header>
    <title>Development tools</title>
    <authors>
      <person email="trond.trosterud@hum.uit.no" name="Trond Trosterud"/>
      <person email="borre.gaup@samediggi.no" name="Børre Gaup"/>
    </authors>
  </header>

  <body>
    <section>
      <title>Development tools</title>

      <p>The project works with text in many ways, much of it organized in
      lexicons. This page lists the tools we use for that work.</p>

      <section>
        <title>Editors</title>

        <p>To edit our source files we need a text editor that supports
        UTF-8 and can save the edited result as plain text. The tools also have
        to be <link href="utf-8-setup.html">set up</link> to handle the UTF-8
        encoding. You may use <link href="docu-emacs.html">emacs</link> and its
        <link href="docu-emacs-modes.html">modes</link>. On a Mac you may use
        <link href="subethaedit.html">SubEthaEdit</link>.</p>
      </section>

      <section>
        <title>Documentation tools</title>

        <p>We publish our documentation with <link href="../infra/forrest-howto.html">forrest</link>. 
        To edit these documents you need at least a text
        editor, and we recommend <link href="xml-mind.html">XMLMind</link>.</p>
      </section>

      <section>
        <title>Morphological analysis</title>

        <p>The project uses the following Xerox tools: <strong>twolc</strong>
        (for morphophonology), <strong>lexc</strong> (for morphology),
        <strong>xfst</strong> (for compiling the final transducer), and
        <strong>lookup</strong> (for analysis and generation).</p>

        <p>The list below links to the Xerox documentation pages for
        these tools. These and other links are found <link href="http://www.xrce.xerox.com/competencies/content-analysis/fssoft/">here</link>:</p>

        <ol>
          <li><link href="http://www.xrce.xerox.com/competencies/content-analysis/fssoft/docs/twolc-92/twolc92.html">twolc</link>,
          for phonological and morphophonological rules</li>

          <li><link href="http://www.xrce.xerox.com/competencies/content-analysis/fssoft/docs/lexc-93/lexc93.html">lexc</link>,
          for representing the Sami stems and the affix lexica</li>

          <li><link href="http://www.xrce.xerox.com/competencies/content-analysis/fssoft/docs/fst-97/xfst97.html">xfst</link>,
          the finite-state transducer tool, for integrating the different parts
          of the program, and for compiling the preprocessor</li>

          <li><link href="http://www.xrce.xerox.com/competencies/content-analysis/fssoft/docs/tokenize-97/tokenize97.html">tokenize</link>,
          for tokenization and processing (note that at the moment we use
          Perl, not tokenize, for preprocessing)</li>

          <li><link href="http://www.xrce.xerox.com/competencies/content-analysis/fssoft/docs/lookup-97/lookup97.html">lookup</link>,
          an interface to the morphological analyser. NB: see also our <link href="docu-lookup.html">lookup notes</link>.</li>
        </ol>

        <p>The programs are started by typing e.g. <code>lexc</code> and
        then pressing the enter key. The tools are documented in Beesley and
        Karttunen's <link href="http://www.fsmbook.com">Finite-State Morphology:
        Xerox Tools and Techniques</link>. The tools may also be
        installed on your own machine, be it on Mac OS X, Linux or Windows. One
        version of the software is found on the CD accompanying the book. For
        the latest version, ask Trond for a reference.</p>
      </section>

      <section>
        <title>Disambiguation tools</title>

        <ol>
          <li><link href="../ling/docu-disambiguation.html">Morphological
          disambiguation</link></li>

          <li><link href="docu-lookup2cg.html">lookup2cg</link>, a script to
          transform Xerox output to CG input</li>
        </ol>
      </section>

      <section>
        <title>Analysis and testing</title>

        <p>The easiest and most effective way to analyse and test text
        (although a little scary at first) is to use command-line tools. We
        have written a <link href="docu-unix.html">short introduction</link> in
        English and a longer <link href="docu-unix-nno.html">document</link> in
        Norwegian on this topic. The <link href="docu-sme-manual.html">introduction</link>
        to using our parser also shows well how to combine the individual
        tools.</p>
      </section>

      <section>
        <title>Our home-made tools, and adjustments of public tools</title>

        <ol>
          <li><link href="../infra/docu-cgi-bin.html">The cgi-bin setup for making the
          parsers accessible on the web</link></li>

          <li><link href="paradigm_presentation.html">How the generated paradigms should be presented on the web</link></li>

          <li><link href="../infra/docu-webinterface.html">The web interface to our web
          demo</link></li>

          <li><link href="docu-conversionscripts.html">Conversion
          scripts</link></li>

          <li><link href="../ling/docu-testing.html">Testing tools</link></li>

          <li><link href="docu-tools-emacs.html">Emacs for lexicon
          expansion</link></li>

          <li><link href="docu-emacs-modes.html">Special emacs
          modes</link></li>
          <li><link href="autshumato.html">Autshumato CAT platform</link></li>
        </ol>
      </section>
      <section>
        <title>Other tools</title>

        <ol>
          <li><link href="../tools/tca2.html">tca2</link>, the corpus alignment program.</li>

          <li><link href="salignment.html">Evaluating other sentence alignment programs</link>.</li>
        </ol>
      </section>

    </section>
  </body>
</document>
