Serbian module
The Serbian NooJ module (SrpNooJ) was produced within the EU-funded CESAR project. It consists of a set of resources in both the Cyrillic and the Latin alphabet. Each subset (Cyrillic and Latin) consists of:
-- the definition file for dictionary properties (metadata).
File:
* sr\Lexical Analysis\SrpNooj_properties.def
-- the sample text: the novel “Dva carstva” (Two Empires) by the Serbian author Branimir Ćosić, comprising 106,684 tokens.
Files:
* sr\Projects\cirDvaCarstva.txt (Serbian Cyrillic, plain text)
* sr\Projects\cirDvaCarstva.not (Serbian Cyrillic, NooJ Annotated Text)
* sr\Projects\latDvaCarstva.txt (Serbian Latin, plain text)
* sr\Projects\latDvaCarstva.not (Serbian Latin, NooJ Annotated Text)
-- the sample dictionary in readable form, with 35 lemmas belonging to 9 grammatical classes, including examples of multiword units and derivational morphology.
Files:
* sr\Lexical Analysis\Serbian_Sample_cir.dic (Serbian Cyrillic)
* sr\Lexical Analysis\Serbian_Sample_lat.dic (Serbian Latin)
-- sample morphological grammars (used by the sample dictionary): three for simple nouns, two for adjectives, two for verbs, and one for a multiword noun.
Files:
* sr\Lexical Analysis\Serbian_Sample_cir.nof (Serbian Cyrillic)
* sr\Lexical Analysis\Serbian_Sample_lat.nof (Serbian Latin)
-- the full compiled dictionary, divided into three files per alphabet (nouns, verbs, and other).
It comprises 85,868 entries: nouns (40,886), adjectives (25,558), verbs (15,366), and other (4,058).
Files:
* sr\Lexical Analysis\cirNooJDict-gl-reduced.nod (Serbian Cyrillic, verbs)
* sr\Lexical Analysis\cirNooJDict-im-reduced.nod (Serbian Cyrillic, nouns)
* sr\Lexical Analysis\cirNooJDict-os-reduced.nod (Serbian Cyrillic, other)
* sr\Lexical Analysis\latNooJDict-gl-reduced.nod (Serbian Latin, verbs)
* sr\Lexical Analysis\latNooJDict-im-reduced.nod (Serbian Latin, nouns)
* sr\Lexical Analysis\latNooJDict-os-reduced.nod (Serbian Latin, other)
-- the syntactic grammar for recognizing one class of named entities: full personal names together with their roles or functions.
Files:
* sr\Syntactic Analysis\Im_Prez_FM_all_Cir.nog (Serbian Cyrillic)
* sr\Syntactic Analysis\Im_Prez_FM_all_Lat.nog (Serbian Latin)
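The file names listed above follow a regular pattern: an alphabet marker (a `cir`/`lat` prefix or a `_cir`/`_Cir`/`_lat`/`_Lat` suffix), a category code in the compiled dictionaries (`gl` for verbs, `im` for nouns, `os` for other, as given in the list), and an extension that identifies the resource type. As a minimal sketch, assuming the names stay exactly as listed, a hypothetical Python helper `describe` could classify any path from the archive. It also checks that the per-class entry counts quoted for the compiled dictionary add up to the stated total.

```python
import re
from pathlib import PureWindowsPath

# Extension -> resource type, taken from the SrpNooJ file list.
RESOURCE_TYPES = {
    ".def": "dictionary properties definition",
    ".dic": "sample dictionary (readable)",
    ".nof": "morphological grammar",
    ".nod": "compiled dictionary",
    ".nog": "syntactic grammar",
    ".txt": "plain text",
    ".not": "NooJ annotated text",
}

# Category codes used in the compiled-dictionary file names, per the list.
POS_CODES = {"gl": "verbs", "im": "nouns", "os": "other"}

# Alphabet marker: "cir"/"lat" at the start of the stem, or "_cir"/"_lat"
# at its end (case-insensitive, so "_Cir" and "_Lat" also match).
ALPHABET_RE = re.compile(r"^(cir|lat)|_(cir|lat)$", re.IGNORECASE)
POS_RE = re.compile(r"-(gl|im|os)-")

# Entry counts quoted for the full compiled dictionary.
ENTRY_COUNTS = {"nouns": 40886, "adjectives": 25558, "verbs": 15366, "other": 4058}
assert sum(ENTRY_COUNTS.values()) == 85868  # matches the stated total


def describe(path: str) -> dict:
    """Classify a SrpNooJ file path (hypothetical helper, not part of NooJ)."""
    # PureWindowsPath parses the backslash-separated paths on any OS.
    p = PureWindowsPath(path)
    stem = p.stem
    alphabet = None  # None means the file is shared by both subsets
    m = ALPHABET_RE.search(stem)
    if m:
        code = (m.group(1) or m.group(2)).lower()
        alphabet = "Cyrillic" if code == "cir" else "Latin"
    pos = POS_RE.search(stem)
    return {
        "file": p.name,
        "type": RESOURCE_TYPES.get(p.suffix.lower(), "unknown"),
        "alphabet": alphabet,
        "category": POS_CODES[pos.group(1)] if pos else None,
    }
```

For example, `describe(r"sr\Lexical Analysis\cirNooJDict-gl-reduced.nod")` reports a Cyrillic compiled dictionary of verbs, while the properties file `SrpNooj_properties.def` carries no alphabet marker and is reported with `alphabet` set to `None`.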
Contact Persons:
Cvetana Krstev (cvetana at matf bg ac rs)
Duško Vitas (vitas at matf bg ac rs)
Download attachments: sr.zip