
What is NooJ?


Sophisticated linguistic engine


More than 20 languages supported by the software


More than 150 file formats supported, on Windows, Mac and Linux


Made for development and fully customizable


NooJ is a linguistic development environment developed by Max Silberztein. Using mathematical rules, it provides linguistic tools that help users formalize natural languages and build software able to automatically process texts written in natural language (Natural Language Processing, or NLP).

Applications of this descriptive-linguistics approach include spell-checkers, intelligent search engines, information extractors and annotators, automatic summarizers, automatic translators, and more.

Tutorials and Linguistic Resources

Arabic module

One full dictionary named EL-DICAR (ELectronic DICtionary for ARabic) including more than 52000 lexical entries:
1/ 19504 nouns (N)
2/ 10375 verbs (V)
3/ 5816 adjectives (ADJ)
4/ 1236 particles (PREP, ADV, REL, DEM)
5/ 3686 locations (N+LOC)
6/ 11860 first names (N+Prenom)
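As a quick consistency check on the "more than 52000" figure, the per-category counts above can be summed. This minimal sketch assumes the six categories are disjoint (that place names and first names are not also counted among the 19504 nouns); the dictionary name `el_dicar_counts` is purely illustrative and not part of NooJ.

```python
# Entry counts per category, copied from the EL-DICAR listing above.
# Assumption: the categories are disjoint, so they can simply be added.
el_dicar_counts = {
    "N (nouns)": 19504,
    "V (verbs)": 10375,
    "ADJ (adjectives)": 5816,
    "PREP/ADV/REL/DEM (particles)": 1236,
    "N+LOC (locations)": 3686,
    "N+Prenom (first names)": 11860,
}

# Sum all categories to get the overall dictionary size.
total = sum(el_dicar_counts.values())
print(f"Total lexical entries: {total}")  # 52477

# Show each category's share of the dictionary.
for category, count in el_dicar_counts.items():
    print(f"{category:30s} {count:6d}  ({100 * count / total:4.1f}%)")
```

Under that assumption the total is 52477 entries, which matches the "more than 52000" claim.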

-- one sample dictionary (_Example.dic) with its corresponding inflectional grammar (_Example.nof)
-- a file _properties.def that includes a listing of the dictionary codes, inflections, morphology and derivations
-- three morphological grammars:
1/ REQUIRED : Graph_Morpho.nom : a tokenization grammar to identify and annotate morphemes in agglutinated forms (this grammar has to be checked in Info > Preferences)
2/ OPTIONAL : Graph_Morpho_AlifToHamza.nom : a spelling correction grammar that converts Alif to Hamza (if used, this grammar should be given a low priority)
3/ OPTIONAL : Graph_Morpho_HamzaToAlif.nom : a spelling correction grammar that converts Hamza to Alif (if used, this grammar should be given a low priority)
-- one syntactic grammar: Graph_Timex.nog : a local grammar to recognize temporal expressions (date, hour, age, period, ...)
-- two texts:
1/ declaration_rights.not : The Universal Declaration of the Rights of Man and of the Citizen
2/ example.not : a collection of samples from newspapers that include temporal expressions

