
What is NooJ?


Sophisticated linguistic engine


The software supports more than 20 languages


More than 150 file formats on Windows, Mac and Linux.


Made for development and fully customizable


NooJ is a linguistic development environment developed by Max Silberztein. Using mathematical rules, it provides linguistic tools that help customers formalize natural languages and build software able to automatically process texts written in natural language (Natural Language Processing, or NLP).

Such applications of descriptive linguistics include spell-checkers, intelligent search engines, information extractors and annotators, automatic summary producers, automatic translators, and more.

Analysis of the televised debate between Macron and Marine Le Pen on May 3, 2017


Tutorials and Linguistic Resources

Serbian module

The Serbian NooJ module (SrpNooJ) was produced in the scope of the EU-funded CESAR project. It consists of a set of resources that use both the Cyrillic and the Latin alphabets. Each subset (Cyrillic and Latin) consists of:

-- the dictionary properties’ definition file (metadata).

* sr\Lexical Analysis\SrpNooj_properties.def
-- the sample text – the novel “Dva carstva” (Two Empires) by the Serbian author Branimir Ćosić, comprising 106684 tokens.

* sr\Projects\cirDvaCarstva.txt (Serbian Cyrillic, plain text)
* sr\Projects\cirDvaCarstva.not (Serbian Cyrillic, NooJ Annotated Text)
* sr\Projects\latDvaCarstva.txt (Serbian Latin, plain text)
* sr\Projects\latDvaCarstva.not (Serbian Latin, NooJ Annotated Text)
-- the sample dictionary in readable form with 35 lemmas that belong to 9 grammatical classes, with examples of multiword units and derivational morphology.

* sr\Lexical Analysis\Serbian_Sample_cir.dic (Serbian Cyrillic)
* sr\Lexical Analysis\Serbian_Sample_lat.dic (Serbian Latin)
-- the sample of morphological grammars (used in the sample dictionary) – three for simple nouns, two for adjectives, two for verbs, and one for a multiunit noun.

* sr\Lexical Analysis\Serbian_Sample_cir.nof (Serbian Cyrillic)
* sr\Lexical Analysis\Serbian_Sample_lat.nof (Serbian Latin)
-- the full compiled dictionary (divided into three files: nouns, verbs, and other).
It comprises 85868 entries: nouns (40886), adjectives (25558), verbs (15366), and other (4058).
* sr\Lexical Analysis\cirNooJDict-gl-reduced.nod (Serbian Cyrillic, verbs)
* sr\Lexical Analysis\cirNooJDict-im-reduced.nod (Serbian Cyrillic, nouns)
* sr\Lexical Analysis\cirNooJDict-os-reduced.nod (Serbian Cyrillic, other)
* sr\Lexical Analysis\latNooJDict-gl-reduced.nod (Serbian Latin, verbs)
* sr\Lexical Analysis\latNooJDict-im-reduced.nod (Serbian Latin, nouns)
* sr\Lexical Analysis\latNooJDict-os-reduced.nod (Serbian Latin, other)
-- the syntactic grammar for recognition of one class of named entities – full personal names with their roles or functions.

* sr\Syntactic Analysis\Im_Prez_FM_all_Cir.nog (Serbian Cyrillic)
* sr\Syntactic Analysis\Im_Prez_FM_all_Lat.nog (Serbian Latin)
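
The file layout listed above can be checked after installation with a short script. The following is only a minimal sketch in Python: the relative paths are copied from the list, while the function names (module_files, missing_files) and the NooJ root folder passed to them are hypothetical and not part of NooJ itself.

```python
from pathlib import Path

LEXICAL = Path("sr") / "Lexical Analysis"
PROJECTS = Path("sr") / "Projects"
SYNTACTIC = Path("sr") / "Syntactic Analysis"


def module_files(script):
    """Return the relative paths listed above for one script: 'cir' or 'lat'."""
    if script not in ("cir", "lat"):
        raise ValueError("script must be 'cir' (Cyrillic) or 'lat' (Latin)")
    cap = script.capitalize()  # the .nog file names use 'Cir' / 'Lat'
    return [
        LEXICAL / "SrpNooj_properties.def",           # dictionary properties
        PROJECTS / f"{script}DvaCarstva.txt",         # sample text, plain
        PROJECTS / f"{script}DvaCarstva.not",         # sample text, annotated
        LEXICAL / f"Serbian_Sample_{script}.dic",     # sample dictionary
        LEXICAL / f"Serbian_Sample_{script}.nof",     # sample morphological grammars
        LEXICAL / f"{script}NooJDict-gl-reduced.nod",  # compiled dictionary, verbs
        LEXICAL / f"{script}NooJDict-im-reduced.nod",  # compiled dictionary, nouns
        LEXICAL / f"{script}NooJDict-os-reduced.nod",  # compiled dictionary, other
        SYNTACTIC / f"Im_Prez_FM_all_{cap}.nog",      # named-entity grammar
    ]


def missing_files(root, script):
    """List the module files that are not present under the given root folder."""
    return [p for p in module_files(script) if not (Path(root) / p).is_file()]


if __name__ == "__main__":
    # "NooJ" is an assumed location of the personal NooJ folder; adjust as needed.
    for script in ("cir", "lat"):
        for path in missing_files("NooJ", script):
            print(f"missing ({script}): {path}")
```

Running the script prints one line per file that is absent for each alphabet, and prints nothing when the module is complete under the chosen root folder.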

Contact Persons:
Cvetana Krstev (cvetana at matf bg ac rs)
Duško Vitas (vitas at matf bg ac rs)
