Events

What is NooJ?

Smart

Sophisticated linguistic engine

Versatile

The software handles more than 20 languages

Cross-Platform

More than 150 file formats on Windows, Mac and Linux.

Flexible

Made for development and fully customizable

Presentation

NooJ is a linguistic development environment developed by Max Silberztein. Using mathematical rules, it provides linguistic tools that help customers formalize natural languages and build software able to automatically process texts written in natural language (Natural Language Processing, or NLP).

Applications of this descriptive linguistics work include spell-checkers, intelligent search engines, information extractors and annotators, automatic summary producers, automatic translators, and more.


Analysis of the televised debate between Macron and Marine le Pen on May 3, 2017

The Author

Max Silberztein

Foundation

I constructed the first package of Finite State tools for Natural Language Processing, as well as the French DELAC-DELACF dictionaries for compound words, for my PhD research from 1986 to 1989 at the LADL (University of Paris 7-CNRS), under the supervision of Prof. Maurice Gross. The thesis was later published as: Max Silberztein, 1993. Dictionnaires électroniques et analyse automatique de textes : le système INTEX. Masson Ed.: Paris
In 1992 I started to work on INTEX, a linguistic development environment based on my PhD research. It includes large-coverage dictionaries and grammars, and parses texts of several million words in real time. I developed it with the UTBM and the MSHE (Maison des sciences de l'homme et de l'environnement Claude-Nicolas Ledoux). I stopped the development of INTEX in 2002.
Since 2002 I have been working on NooJ, which is based on what I developed for my PhD and for INTEX.

The Technology

NooJ runs on MS-Windows, Mac OS X, Linux and BSD Unix.
NooJ processes texts and corpora (i.e. sets of text files) at the Orthographical, Lexical, Morphological, Syntactic and Semantic levels. All linguistic information (at any level) is represented by annotations that are stored in the Text Annotation Structure (TAS).
Annotations are typically added to the TAS in cascade, without destroying the original text. Annotations can describe units inside word forms (for contracted words, e.g. "cannot", and for agglutinative languages), simple forms (e.g. "table"), multiword units (e.g. "round table") as well as discontinuous expressions (e.g. "turn ... off").
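To make the idea of non-destructive annotation concrete, here is a minimal Python sketch. It is not NooJ's actual TAS implementation or file format: the class names `Annotation` and `AnnotatedText`, the `annotate` and `surface` helpers, and the labels are all hypothetical, chosen only for illustration. Each annotation stores character offsets into the unchanged text, and a discontinuous expression simply carries more than one span.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Annotation:
    """A label attached to one or more (start, end) character spans."""
    label: str
    spans: Tuple[Tuple[int, int], ...]


@dataclass
class AnnotatedText:
    """Hypothetical stand-off store: the text itself is never modified."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, label: str, *pieces: str) -> Annotation:
        """Annotate the given pieces, located left to right in the text."""
        spans = []
        pos = 0
        for piece in pieces:
            start = self.text.index(piece, pos)  # ValueError if not found
            pos = start + len(piece)
            spans.append((start, pos))
        annotation = Annotation(label, tuple(spans))
        self.annotations.append(annotation)
        return annotation

    def surface(self, annotation: Annotation) -> List[str]:
        """Return the stretches of text covered by an annotation."""
        return [self.text[start:end] for start, end in annotation.spans]


doc = AnnotatedText("Please turn the radio off near the round table")
simple = doc.annotate("N", "table")                      # simple form
multiword = doc.annotate("N+multiword", "round table")   # multiword unit
phrasal = doc.annotate("V+discontinuous", "turn", "off") # two separate spans
```

Because annotations only point into the text, several layers (lexical, syntactic, semantic) can be stacked over the same characters without conflict.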
NooJ offers the four types of grammars/machines of the Chomsky hierarchy:
NooJ contains several tools to process Finite-State machines and Regular grammars.
NooJ processes Context-Free Grammars and Push Down Automata. Note that in most cases, NooJ can "flatten out" sets of recursively embedded graphs, removing the recursion and turning Context-Free Grammars into Regular grammars.
NooJ processes Context-Sensitive Grammars in two steps: the first step is performed by a Push Down Automaton (or even a Finite-State Machine when the grammar is flattened out), the second step is performed by computing the variables' values and testing the constraints of the Grammar (in O(n)).
NooJ can perform Z. Harris's transformations in cascade, giving NooJ the power of a Turing Machine. The morphological and the syntactic engines are integrated: this makes it possible to perform morphological operations on words while performing a syntactic transformation.
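The two-step treatment of context-sensitive grammars described above can be illustrated with the classic language a^n b^n c^n, which no context-free grammar recognizes. The sketch below is plain Python, not NooJ grammar syntax, and the names `SHAPE` and `recognizes` are hypothetical. Python's `re` module stands in for the finite-state step: the pattern is a plain regular expression that checks the overall shape and captures three variables. The second step then tests a constraint on those variables.

```python
import re

# Step 1 (finite-state): check the shape a+ b+ c+ and capture three variables.
SHAPE = re.compile(r"(?P<A>a+)(?P<B>b+)(?P<C>c+)")


def recognizes(sequence: str) -> bool:
    """Return True if the sequence has the form a^n b^n c^n with n >= 1."""
    match = SHAPE.fullmatch(sequence)
    if match is None:
        return False
    # Step 2 (constraints): compare the lengths of the captured variables.
    return len(match.group("A")) == len(match.group("B")) == len(match.group("C"))
```

For instance, `recognizes("aabbcc")` passes both steps, while `recognizes("aabbc")` passes the shape check but fails the length constraint.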
NooJ can process texts written in over 20 languages, including some Romance, Germanic, Slavic, Semitic and Asian languages, as well as Hungarian. All NooJ grammars/machines are compatible, i.e. one can insert parts of a Regular Grammar in a Context-Free Grammar or in a Context-Sensitive Grammar, and use them in a loop to simulate a Turing Machine.
NooJ dictionaries are extremely simple objects and can describe orthographical and synonymous variants, as well as inflectional and derivational forms. NooJ includes tools to check, debug, adapt, maintain, and share dictionaries and grammars.

The Book

I wrote a book that provides the theoretical and methodological framework needed to create a successful linguistic project. If you are a teacher, please contact max.silberztein@univ-fcomte.fr to get the solutions to the exercises proposed at the end of each chapter.

Errata

Page Number | In the context | Should Be
43 | UTF uses either one 1 byte | is coded 27
48 | The letter s is coded 19 | UTF uses either one byte
83 | 250,000 | 350,000
85 | Member of the working classis not a … | working class is not a
117 | ..using anparser… | …using a parser…
132 | "G3=G1 | G1= eat | eats" | "G3=G1 | G2 = eat | eats"
134 | …the following language LGN | …the following language LNP
149 | [MES 08] | [MES 08b]
149 | The following grammar... | The grammar shown in Figure 6.18…
150 | (Grammar recognizes the text “Have evening”) | (In the graph, replace node "evening" with “a nice evening”)
162 | In Figure 7.2: Main = :NP :VG :NP. | (Replace the full stop “.” character with a semi-colon “;”) Main = :NP :VG :NP ;
163 | (The first sentence on this page ends with “above:”) | Replace with “which is presented in Figure 7.2.”
163 | Thehighlighted nodes… (One cannot see which nodes are highlighted because of the poor contrast of the print) | The NP, VG and NP nodes...
164 | In figure 7.4. the graph VG recognizes the texts ‘be going’ and ‘be not going’ | Replace with (only conjugated forms of the verb to be).
165 | We can also produce the sequence ‘abab’...’ | We can also produce the sequence 'aabb',
168 | Missing reference [MOO 88] | Moore, Robert C. (May 2000). "Removing Left Recursion from Context-Free Grammars" (PDF). In 6th Applied Natural Language Processing Conference: 249–255.
174 | Missing reference [DON 13] | Donnelly Ch. Stallman R. The Bison Manual. https://jdcqivvcr.updog.co/amRjcWl2dmNyMTg4MjExNDIzWA.pdf
176 | …: does not mean that context-free languages are themselves ‘smaller’ than context-free languages. | … does not mean that context-free languages are themselves ‘smaller’ than context-sensitive languages.

Thanks

I wish to express many thanks to my colleagues and students, as well as to all the INTEX users who have contributed to help enhance INTEX, and now NooJ, with their patience, criticisms, creative ideas and ambitious expectations.