
 


Le Nouveau Corpus d'Amsterdam

The original corpus: The Amsterdam Corpus of Old French Literary Texts was compiled at the beginning of the 1980s by a group of scholars directed by Anthonij Dees and resulted in the Atlas des formes linguistiques des textes littéraires de l'ancien français (1987). The electronic version of the texts was provided by Piet van Reenen (Free University of Amsterdam). It contains about 200 different texts, written between the beginning of the 11th and the end of the 14th century, some of them in several versions, which adds up to a total of almost 300 text samples with more than three million words (tokens).
These forms had been manually annotated by Dees' team with a set of 225 numeric tags encoding part of speech and other morphological categories (e.g. "566" for verb, future tense, 3rd person, plural). Some of the texts are electronic versions of existing editions (e.g. the Miracles de Notre Dame de Chartres by Jean le Marchant, edited by P. Kunstmann, Chartres/Ottawa, 1973), others are transcriptions of manuscripts made especially for this corpus. The original texts were not lemmatized. They are nevertheless a precious resource, which enabled us to extract a lexicon of more than 130,000 Old French inflected forms and to train the part-of-speech tagger.

"Le Nouveau Corpus d'Amsterdam" (NCA): The new version of the corpus, edited (revised, lemmatized, XML-formatted) by Pierre Kunstmann and Achim Stein, was presented at the Lauterbad Workshop in February 2006 (see Kunstmann/Stein 2007 below). The corpus, the lexical resources used for the annotation, and the documentation are available free of charge for non-commercial, non-profit research purposes to registered users who have sent the signed license agreement (PDF) to this address: Prof. Dr. Achim Stein, Institut für Linguistik/Romanistik, Universität Stuttgart, Keplerstraße 17, D-70174 Stuttgart, Germany.

History:

  • 19.03.2011: v3 installed, with updated bibliography
  • 22.03.2010: v2 is also provided for TIGERSearch
  • 2008: v2 for download, queries online (with TWIC and TWICweb)
  • 2006: v1 for download

Please quote the corpus versions as indicated below. If you refer specifically to the bibliographical information, please refer to:

[Glessgen/Vachon:2010] Gleßgen, Martin-Dietrich & Vachon, Claire (2010): Répertoire bibliographique du Nouveau Corpus d'Amsterdam, établi par Anthonij Dees et Piet Van Reenen (Amsterdam 1987), revu et élargi par M.-D.G. et C.V., 3. ed., Stuttgart: Institut für Linguistik/Romanistik. 

For more information, see:

[Stein 2010] Stein (2010): Outils et méthodes pour l'annotation des textes médiévaux. (Slides of a talk given at the Sorbonne, Paris, March 2010). [PDF]

[Glessgen/Vachon:to-appear] Gleßgen, Martin-Dietrich & Vachon, Claire (to appear): "L'étude philologique et scriptologique du Nouveau Corpus d'Amsterdam" - Casanova, Emili / Calvo, Cesáreo (éds.): Actes du XXVI CILPR, València 6-11 septembre 2010, Berlin: De Gruyter. [PDF]

[Glessgen/Gouvert:2007] Gleßgen, Martin-Dietrich & Gouvert, Xavier (2007): "La base textuelle du Nouveau Corpus d'Amsterdam: ancrage diasystématique et évaluation philologique" - Kunstmann, Pierre & Stein, Achim (ed.): Le Nouveau Corpus d'Amsterdam. Actes de l'atelier de Lauterbad, 23-26 février 2006, Stuttgart: Steiner, 51-84.

[Kunstmann/Stein:2007a] Kunstmann, Pierre & Stein, Achim (2007): "Le Nouveau Corpus d'Amsterdam" - Kunstmann, Pierre & Stein, Achim (ed.): Le Nouveau Corpus d'Amsterdam. Actes de l'atelier de Lauterbad, 23-26 février 2006, Stuttgart: Steiner, 9-27 (ISBN 978-3-515-08997-5).


For registered NCA users

Most of the links in the following section will require a user name and password. Access is free, but requires a license agreement (PDF).


Version 3.0 (2011)
 

1. TWIC Online Search for the NCA:

Open TWIC in a new window.

2. Download the NCA corpus with TIGERSearch for local installation

TIGERSearch is much easier to install than TWIC. It provides a graphical user interface and is available for Windows, Mac OS X, Linux and Solaris. Please follow these steps:

  • On a Windows computer:
    • Download TIGERSearch including the NCA corpus: tigersearch-win-nca3.zip (ca 88MB)
    • Unpack the downloaded zip file: it contains a folder "TIGERSearch"
    • Copy this folder into the top folder of your system, the C:\ drive.
    • In the folder "C:\TIGERSearch", go to the subfolder "bin" and click on the "TIGERSearch.exe" program file (not on the TIGERSearch icon, even if it looks nicer). You may want to create shortcuts for this program file in your start menu or on your desktop.
  • On a Mac:
    • Download TIGERSearch including the NCA corpus: tigersearch-mac-nca3.dmg (ca 68MB)
    • In your Home directory, create a folder "Applications"  (exactly this name, uppercase A)
    • Open the downloaded dmg file: it contains a folder "TIGERSearch"
    • Copy this folder into the "Applications" folder you just created (you can now delete the dmg file).
    • In the "TIGERSearch" folder, go to the subfolder "lib" and click on "runTS.command".
  • If TIGERSearch is already installed on your computer:
    • Download the corpus only: nca3-for-tiger.zip (ca 41MB)
    • Unpack the downloaded zip file: it contains a folder "NCA3"
    • Copy this folder into the "TIGERCorpora" folder of your TIGERSearch installation. Start TIGERSearch, and the corpus will appear.

TIGERSearch will start up. You will find a quick start guide for using TIGERSearch with the NCA on my homepage, in the section Resources (or use this direct link). TIGERSearch includes a help function and a pdf manual (see section III for a general description of the query language).
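For the corpus-only case above (TIGERSearch already installed), the unpack-and-copy steps could be scripted roughly as follows. This is only a sketch: the function name `install_corpus` and the staging folder are illustrative, and the location of your "TIGERCorpora" folder depends on where TIGERSearch was installed, so both paths are assumptions you must supply yourself.

```python
import shutil
import zipfile
from pathlib import Path


def install_corpus(archive: Path, tiger_corpora: Path, corpus_name: str = "NCA3") -> Path:
    """Unpack `archive` and copy its `corpus_name` folder into `tiger_corpora`.

    `archive` is the downloaded zip (e.g. nca3-for-tiger.zip) and
    `tiger_corpora` is the "TIGERCorpora" folder of an existing
    TIGERSearch installation; both locations are supplied by the caller.
    """
    if not tiger_corpora.is_dir():
        raise FileNotFoundError(f"TIGERCorpora folder not found: {tiger_corpora}")

    target = tiger_corpora / corpus_name
    if target.exists():
        raise FileExistsError(f"Corpus folder already present: {target}")

    # Unpack next to the archive first, so a failed unpack never leaves
    # a half-copied corpus inside TIGERCorpora.
    staging = archive.parent / (archive.stem + "-unpacked")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(staging)

    source = staging / corpus_name
    if not source.is_dir():
        raise FileNotFoundError(f"Archive does not contain a '{corpus_name}' folder")

    shutil.copytree(source, target)
    shutil.rmtree(staging)  # remove the temporary unpacked copy
    return target


# Hypothetical usage; adjust both paths to your own system:
# install_corpus(Path("Downloads/nca3-for-tiger.zip"), Path("C:/TIGERSearch/TIGERCorpora"))
```

After the copy, start TIGERSearch as described above and the corpus should appear in the corpus list.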

3. Download the NCA corpus with TWIC for local installation

Download this ZIP Archive to install TWIC on your computer (Windows, Linux, Mac OS X). It includes the TWIC Perl programme, documentation for the different operating systems, and a sample corpus taken from the NCA. Open the ZIP Archive and read the included PDF document "TWIC installation".

Once TWIC is installed, you can replace the sample corpus with the entire NCA corpus (download nca3.xml.gz, 2.5 MB). You can also install and search your own corpora: read the section about the configuration file.
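Before replacing the sample corpus, it can be useful to check that the download is intact. The following sketch reads the gzip-compressed file without unpacking it to disk and counts how often each XML element name occurs. It assumes only that the decompressed file is well-formed XML; it makes no assumption about the NCA's element names, and the function name `count_elements` is illustrative.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(path: str) -> Counter:
    """Count XML element names in a gzip-compressed XML file."""
    counts = Counter()
    with gzip.open(path, "rb") as handle:
        # iterparse streams the document, so the whole corpus
        # never has to be held in memory at once.
        for _event, elem in ET.iterparse(handle, events=("end",)):
            counts[elem.tag] += 1
            elem.clear()  # release the content of elements already counted
    return counts


# Hypothetical usage, with the file in the current directory:
# for tag, n in count_elements("nca3.xml.gz").most_common(10):
#     print(tag, n)
```

A truncated or corrupted download will typically raise an error during reading rather than return counts.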

4. Documentation and Bibliography for this version

Please follow the link for the online query above. On the query form, click on "corpus information window", where you will find links to the bibliography in various formats.

Please refer to this version as:

Stein, Achim et al. (ed.): Nouveau Corpus d'Amsterdam. Corpus informatique de textes littéraires d'ancien français (ca 1150-1350), établi par Anthonij Dees (Amsterdam 1987), remanié par Achim Stein, Pierre Kunstmann et Martin-D. Gleßgen, Stuttgart: Institut für Linguistik/Romanistik, version 3, 2011.

Changes:

  • The bibliography has been revised considerably (by the Zurich group: Martin-D. Gleßgen and Claire Vachon).

Version 2.0 (2008, updated 2010)
 

1. TWIC Online Search for the NCA:

Open TWIC in a new window.

2. Download the NCA corpus with TWIC for local installation

Download this ZIP Archive to install TWIC on your computer (Windows, Linux, Mac OS X). It includes the TWIC Perl programme, documentation for the different operating systems, and a sample corpus taken from the NCA. Open the ZIP Archive and read the included PDF document "TWIC installation".

Once TWIC is installed, you can replace the sample corpus with the entire NCA corpus (download nca2.xml.gz, 2.5 MB). You can also install and search your own corpora: read the section about the configuration file.

3. Download the NCA corpus with TIGERSearch for local installation

Note that TIGERSearch is probably easier to install than TWIC (since it does not require the installation of a web server). It provides a graphical user interface and is available for Windows, Mac OS X, Linux and Solaris. Please follow these steps:

  1. Download TIGERSearch from the TIGERSearch Download page (IMS, Stuttgart)
  2. Download one of the corpus files in TIGER-XML format:
  3. Follow the instructions in the document The NCA for TIGERSearch (PDF)

4. Documentation and Bibliography for this version

Please refer to this version as:

Stein, Achim et al. (ed.): Nouveau Corpus d'Amsterdam. Corpus informatique de textes littéraires d'ancien français (ca 1150-1350), établi par Anthonij Dees (Amsterdam 1987), remanié par Achim Stein, Pierre Kunstmann et Martin-D. Gleßgen, Stuttgart: Institut für Linguistik/Romanistik, version 2, 2008.

Changes:

  • The text La passion des jongleurs (id=jong) was updated: in version 1, the last word of each line was missing. (Thanks to Yves-Charles Morin for pointing out this error.)
  • The bibliography has been revised considerably (by the Zurich group: Martin-D. Gleßgen and his staff). The first entry of the bibliography (see links above) is a "comment entry" which briefly explains the meaning of the descriptors. These descriptors are values of the XML element "subcorpus" (using TWIC, you can therefore restrict your search to texts corresponding to these values, e.g. date ranges, regions, quality of the manuscript etc.).

Note that the bibliography is still a work in progress. Updates will be published here.


Version 1.0 (2006)

Download the original distribution of the corpus:

The first version (1.0) of the corpus was presented on a CD-ROM at the Lauterbad Workshop, February 2006. To reproduce it,

  1. create a directory on your local disk drive, e.g. "nca"
  2. download the files 00readme.txt, 00license.txt to "nca" (see below)
  3. download the following zip archives to "nca"
    • twic.zip, 27MB, (new corpus, TWIC search tool)
    • sofa.zip, 10MB, (documentation, original corpus, frequency lists...)
    • perl.zip, 13MB, (Active State Perl for Windows, required if you use TWIC, also available at www.activestate.com)
    • xaira.zip, 382MB, (corpus formatted for Xaira, Xaira for Windows, not required if you use TWIC)
    • tagger.zip, 4MB, (TreeTagger, parameters for Old French)
  4. unpack the archives (preserve the directory structure)
  5. follow the Installation Guide in 00readme.txt
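Step 4 above can be sketched in a few lines, assuming all downloaded archives sit directly in the "nca" directory. The function name `unpack_archives` is illustrative; `ZipFile.extractall` recreates the folders stored in each archive, which is what preserves the directory structure.

```python
import zipfile
from pathlib import Path


def unpack_archives(directory: Path) -> list[str]:
    """Unpack every .zip file in `directory` in place and return their names."""
    unpacked = []
    for archive in sorted(directory.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # extractall keeps the folder hierarchy stored in the archive
            zf.extractall(directory)
        unpacked.append(archive.name)
    return unpacked


# Hypothetical usage, run from the parent of the "nca" directory:
# print(unpack_archives(Path("nca")))
```

The installation itself still follows the guide in 00readme.txt; this only replaces the manual unpacking.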

Browse the documentation online:

The SOFA directory (Sources et Outils pour le français ancien): documentation, material and resources for the Nouveau Corpus d'Amsterdam.

Quote this version as:

Stein, Achim et al. (ed.): Nouveau Corpus d'Amsterdam. Corpus informatique de textes littéraires d'ancien français (ca 1150-1350), établi par Anthonij Dees (Amsterdam 1987), remanié par Achim Stein, Pierre Kunstmann et Martin-D. Gleßgen, Stuttgart: Institut für Linguistik/Romanistik, 2006.


Les chartes de l'Aube

  • Printed version: Pieter van Reenen, avec le concours de Evert Wattel et Margôt van Mulken: Champagne 1270-1300, Chartes en langue française conservées aux Archives de l'Aube, Orléans: Paradigme 2006.
  • Electronic version, provided by Piet van Reenen with the permission of the publisher: Zip archive, 145k