This document describes the morphology module of the Lexicon Model for Ontologies (OntoLex-Morph), which results from the work of the Ontology Lexicon community group. The module is targeted at the representation of linguistic morphology in dictionaries and other linguistic resources, as well as the formalization of rules for word formation and inflection as employed in computational morphology and in the grammatical appendices frequently provided as part of bilingual dictionaries.
This module operates in combination with the lemon core module and extends it with support for two distinct views on linguistic morphology:
OntoLex-Morph allows lexical entries and individual forms to be enriched with information about the morphological units that they consist of (descriptive morphology). This improves the capability of OntoLex-Lemon to encode, preserve and document the structure of morphologically complex forms or lexical entries.
OntoLex-Morph allows the formalization of morphological rules that can be used to produce complex lexical entries and inflected forms from their component morphs or, respectively, their base forms (generative morphology). This extends OntoLex-Lemon resources with a framework that describes how to produce and analyze complex lexical entries or inflected forms.
OntoLex-Morph has been designed with the aim of making OntoLex-Lemon applicable to morphologically rich languages of any type, supporting both fusional and agglutinating morphology, and thereby contributing to a truly multilingual web.
The RDF file with the OntoLex lemon morphology module can be found at http://www.w3.org/ns/lemon/morph
This document is an official report of the OntoLex community group. It does not represent the view of single individuals but reflects the consensus and agreement reached as part of the regular group discussions. The report should be regarded as the official specification of lemon.
If you wish to make comments regarding this document, please send them to public-ontolex@w3.org (subscribe, archives).
Morphology is a vital and, in many languages, very sophisticated part of language, and as such it has been an important part of the work of lexicographers. In the traditional print form, morphological information is provided in brief abbreviated terms that can only be deciphered with significant knowledge of the language. With the transformation of the dictionary into an electronic resource, however, a re-imagining of the morphological information in a dictionary is due.
The morphology module aims at fulfilling two modelling purposes:
Morphological decomposition on the lexical entry level.
scope: The kinds of elements of which a lexical entry can consist should be as unrestricted as possible, i.e. the decomposition of lexical entries encompasses lexical entries, components, derivational affixes, inflectional affixes, stems, roots and zero morphs. However, a lexical entry can NEVER be composed of a form!
Morphological decomposition on the form level.
scope: Elements of which a form can consist include roots, stems, inflectional affixes and zero morphs.
A fine-grained description of phonological and morphophonological processes that are involved in any kind of stem or word formation on the phoneme level is excluded and not representable with this Morphology Module. Only the elements between the lexical entry and the morph levels will be covered. It is possible, however, that such information may be addressed in future OntoLex modules.
The OntoLex-Morph module aims to be adequate for both traditional dictionary content (which contains only abbreviated information about morphological rules and paradigms, often organized in appendices) and structured computational data (morphological dictionaries) as used in Language Technology, with the goal of making resources from one community more accessible to the other.
OntoLex-Morph is designed to account for
OntoLex-Morph was intended for (but is not limited to) the following primary use cases:
At its core, OntoLex-Morph operates with three main classes:
They are related with each other and with OntoLex in the following way:
ontolex:Form that results from the application of these to a base form can have a grammatical meaning.
Individual morphological processes (derivation, compounding, inflection) and their relation to lexical entries and forms are represented by designated subclasses of morph:Rule, as described below.
Limitations: OntoLex-Morph is designed with a focus on deep morphology. Morphophonological rules can be modelled with OntoLex-Morph to a certain extent, but we expect phenomena such as assimilation, dissimilation and morphological “Level-2” rules to be more adequately handled by a separate vocabulary specialized in surface generation (transcription, text-to-speech, morphophonology).
Morph (Class)
URI: http://www.w3.org/ns/lemon/morph#Morph
Class morph:Morph is a subclass of ontolex:LexicalEntry that represents any element of morphological analysis below the word level.
lexinfo:termElement (for what?)
ontolex:Affix is defined as a subclass of morph:Morph.
consistsOf (ObjectProperty)
URI: http://www.w3.org/ns/lemon/morph#consistsOf
Property morph:consistsOf states into which Morph resources a Form resource can be segmented.
Domain: ontolex:Form
Range: morph:Morph
We still have no way to encode the order of morphemes. We can model forms and morphs as an aggregate (here: rdf:List?).
GrammaticalMeaning (Class)
URI: http://www.w3.org/ns/lemon/morph#GrammaticalMeaning
morph:GrammaticalMeaning can be used to represent (bundles of) values of different morpho-syntactic or morpho-semantic features expressed by a form, morph or rule (e.g., value ‘nominative’ for feature ‘case’, value ‘singular’ for feature ‘number’, etc.; or the feature bundle composed by the latter two values, in a fusional language where they are expressed cumulatively, e.g. Latin)
rdfs:label
grammaticalMeaning (ObjectProperty)
URI: http://www.w3.org/ns/lemon/morph#grammaticalMeaning
Property morph:grammaticalMeaning assigns a grammatical meaning to a morph, form, or rule.
Domain: ontolex:Form or morph:Morph or morph:Rule
Range: morph:GrammaticalMeaning
For instance, the segmentation into morphs of the English plural form cats, and the assignment of grammatical meaning to the form and to the corresponding plural morph, can be expressed in this way.
morph:grammaticalMeaning lexinfo:plural, but I don’t think this should be valid
In this case we create a blank node for the grammatical meaning that corresponds to a single feature in Lexinfo. In practice, it might be better to define instances for common morphological meanings and reuse these objects.
For example, in the Latin form lupus, nominative case and singular number are expressed cumulatively by the affix -us. Because this is a common combination, an instance of morph:GrammaticalMeaning is introduced for that feature bundle. This time we use Lexinfo vocabulary alongside Paralex vocabulary: even though Lexinfo is the preferred way to represent grammatical features in OntoLex, there is no restriction on this.
Discussion/History:
- morph:meaning in a comparable function, but with Morph being a subclass of LexicalEntry, this role is taken over by ontolex:sense.
- morph:InflectionRule as a short-hand for morph:involves/morph:grammaticalMeaning.
- morph:Rule. For circumstances in which no explicit morph can be provided (but only a rule), e.g., because a resource comes without an explicit notion of morph(eme)s, there would not be a way to express the meaning or function of that morpheme, otherwise.
- morph:InflectionType? This would be useful to express that a certain “slot” contains information of a particular kind, e.g., morphological gender or morphological number. Right now, this information is implicit (in the inflection rules assigned to a particular inflection type).
baseConstraint (ObjectProperty)
URI: http://www.w3.org/ns/lemon/morph#baseConstraint
morph:baseConstraint defines the grammatical characteristics of the stem or base that a derivational or inflectional morpheme can be combined with.
Domain: morph:Morph or morph:Rule
Range: morph:GrammaticalMeaning
For example, an element for nominal inflection can only be applied to nouns, and derivational affixes can have similar constraints. Note that such information is not applicable to an ontolex:Form because this describes only the result of the application of a rule or the addition of a particular form.
As a concrete example, the fact that the English affix -s expresses plural number if attached to nouns, and 3rd person singular agreement if attached to verbs, can be coded as follows using morph:baseConstraint.
Discussion/History:
- CC 2022-10-24: by analogy with morph:grammaticalMeaning, this property should also be applicable to rules to specify necessary preconditions.
baseForm (ObjectProperty)
URI: http://www.w3.org/ns/lemon/morph#baseForm
morph:baseForm is a subproperty of ontolex:lexicalForm that indicates the form that is taken as base for the application of inflection or derivation rules to generate other forms.
Domain: ontolex:Word (not lexical entry!)
Range: ontolex:Form
This property is necessary in cases in which inflection or derivation relations do not take the canonical form as their basis, but a different one. One example is German verbal inflection (e.g., for gehen “to go”), where the canonical form (gehen, infinitive) is derived from the base form (geh-, stem) by means of a suffix (-en, infinitive marker), like other inflected forms (geh, gehst, geht “I/you go; he/she/it goes”).
Rule (Class)
URI: http://www.w3.org/ns/lemon/morph#Rule
morph:Rule represents the formal operation applied to a base form to obtain another form (inflectionally or derivationally related to it). It must contain either morph:example or morph:replacement (or both). The “tabular” value of a morpheme must be stored in rdfs:label (e.g. “-s”@en for the usual plural in English). One rule applies exactly one morphological transformation, i.e. adds one Morph.
example (DatatypeProperty)
URI: http://www.w3.org/ns/lemon/morph#example
morph:example: A single form that demonstrates a class of forms that can be generated by a single rule with no allomorphy.
Domain: morph:Rule
Range: string literal
This property provides an example of a class of forms that share a morphological process. It is necessary in cases where the way the form is generated is not specified but we still want to represent a morphological transformation. This is a common case for retrodigitised dictionaries.
replacement (DatatypeProperty)
URI: http://www.w3.org/ns/lemon/morph#replacement
morph:replacement states the replacement pattern that is involved in a morphological rule for the generation of a form
Domain: morph:Rule
Range: any URI, cf. in doc/wrapup/minutes-2025-06-64
This property points to an object that describes the morphological transformation required to produce a valid form according to the rule. The Morph module does not limit the exact way to represent these transformations, since there are many common ways to do this; therefore, there are no properties in the module to represent them. However, we provide a non-normative option, replacement with regular expressions, which will be used in the examples in the subsequent sections.
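As a minimal sketch of the regular-expression option, the Python snippet below applies a search pattern and a replacement string to a base form. The function name `apply_replacement` and the parameter names `source` and `target` are assumptions made for this illustration; the module does not prescribe any implementation. The input is decomposed to Unicode NFD first, so that a diacritic can be added or matched separately from its base character.

```python
import re
import unicodedata


def apply_replacement(base: str, source: str, target: str) -> str:
    """Apply one regex-based replacement to a base form.

    `source` is a regular expression and `target` the replacement string;
    both names are illustrative and not defined by the module.
    """
    # Decompose precomposed characters (NFD) so that a diacritic and its
    # base character can be addressed independently by the pattern.
    decomposed = unicodedata.normalize("NFD", base)
    # Replace only the first match: one rule performs one transformation.
    result = re.sub(source, target, decomposed, count=1)
    # Recompose (NFC) for storage or display.
    return unicodedata.normalize("NFC", result)


# Latin genitive singular of "lupus": replace final "-us" with "-i".
genitive = apply_replacement("lupus", r"us$", "i")
# German plural "Väter" from "Vater": add a combining diaeresis (U+0308)
# to the first "a" instead of listing every precomposed vowel.
plural = apply_replacement("Vater", "a", "a\u0308")
```

Whether results are recomposed to NFC, as done here, is a design choice of this sketch and depends on how the surrounding resource stores its written representations.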
morph:source and morph:target, so that diacritics are separated from the base character as combining characters. This is a best practice that simplifies the writing of rules in many cases, as diacritic and base character can be manipulated independently from each other.
involves (ObjectProperty)
URI: http://www.w3.org/ns/lemon/morph#involves
morph:involves links a Rule to the Morph that is involved in the process.
Domain: morph:Rule
Range: morph:Morph
Note that this does not encode order.
MI: Each rule corresponds to exactly one Morph, so there is no need for ordering.
InflectionClass (Class)
URI: http://www.w3.org/ns/lemon/morph#InflectionClass
morph:InflectionClass represents the inflection class to which a LexicalEntry belongs/is assigned – e.g., the declension of a noun, or the conjugation of a verb.
It may contain metadata information about this type of declension.
The link between inflection classes and lexical entries is not defined in OntoLex-Morph, but modelled using ontolex:morphologicalPattern.
inflectionClass (ObjectProperty)
URI: http://www.w3.org/ns/lemon/morph#inflectionClass
morph:inflectionClass links an inflection rule to the inflection class it pertains to.
Domain: morph:InflectionRule
Range: morph:InflectionClass
In the case of fusional morphology — languages like Greek, Latin or English — there is usually only one morph attached to a form that carries information about inflection. The situation is different for languages with agglutination, where each inflectional value is represented by its own morph. In order to represent this, the model has another class.
InflectionSlot (Class)
URI: http://www.w3.org/ns/lemon/morph#InflectionSlot
morph:InflectionSlot represents a single slot that can be filled with a morph corresponding to a grammatical category. Since one rule can introduce only one morph, inflection slots are necessary when we need to represent forms that are generated by several independent morphological processes.
For agglutinative languages such as the Finno-Ugric and Turkic languages, and many more, each grammatical value that is encoded with a morph (e.g. number and case for Finnish nouns) is associated with a single slot. Accordingly, there should be two separate rules for adding number and case to form an inflected Finnish noun form.
inflectionSlot (ObjectProperty)
URI: http://www.w3.org/ns/lemon/morph#inflectionSlot
morph:inflectionSlot links an inflection rule to the slot it pertains to
Domain: morph:InflectionRule
Range: morph:InflectionSlot
In order to set the order of morphs and also simplify the process of form generation, the property morph:next points from one InflectionSlot to the next.
next (ObjectProperty)
URI: http://www.w3.org/ns/lemon/morph#next
morph:next links two consecutive inflection slots, e.g. number and case in Finnish.
Domain: morph:InflectionSlot
Range: morph:InflectionSlot
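The ordering that morph:next induces can be recovered with a few lines of code. The following Python sketch is only an illustration: the function `order_slots`, the dictionary representation of morph:next statements and the slot names are assumptions, not part of the module. It starts from the slot that never occurs as the object of a morph:next statement and follows the chain from there.

```python
def order_slots(next_links: dict) -> list:
    """Return slots in order, given {slot: following_slot} pairs.

    Each key/value pair stands for one hypothetical morph:next statement.
    """
    slots = set(next_links) | set(next_links.values())
    # The first slot is the one that is never the object of "next".
    starts = slots - set(next_links.values())
    if len(starts) != 1:
        raise ValueError("expected exactly one initial slot")
    ordered = [starts.pop()]
    # Follow the chain until a slot has no successor.
    while ordered[-1] in next_links:
        following = next_links[ordered[-1]]
        if following in ordered:
            raise ValueError("cycle in slot chain")
        ordered.append(following)
    return ordered


# Finnish nouns: the number slot precedes the case slot.
finnish_slots = order_slots({"number_slot": "case_slot"})
```

The sketch assumes a single linear chain of slots; branching or disconnected chains are rejected or ignored rather than resolved.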
InflectionRule (Class)
URI: http://www.w3.org/ns/lemon/morph#InflectionRule
morph:InflectionRule represents the formal operation applied to a base form of a LexicalEntry to obtain another inflected form of that LexicalEntry.
morph:inflectionRule provides information on how to generate inflected forms and, in case of a dataset with pre-generated forms, links these forms to InflectionRules that were used to generate them. If inflection slots were used, forms might have several rules attached to them.
Domain: ontolex:Form
Range: morph:InflectionRule
The example below illustrates the modelling of inflection classes and rules for the generation of the genitive singular of lupus in Latin.
In a fusional language like Latin, there is no need to have different inflection slots: a single inflection rule (specific for the inflection class to which the lexical entry is assigned) allows for the generation of the genitive singular form as follows:
On the other hand, in an agglutinative language like Turkish, it is useful to define separate inflection slots for each morphosyntactic feature, and separate inflection rules for each inflection slot, as illustrated in the example below.
When software compatible with the specifications runs on this data to generate forms of the entry :adam, it first extracts all the rules associated with the corresponding morphological pattern, namely sg_rule, pl_rule, and acc_rule. Next, it establishes the order of inflection slots mentioned in the rules (by looking for the slot that is not used as an object in a morph:next property).
Then, for the first inflection slot the correct form is chosen. If there is a morph:baseType specified in the rule, the corresponding form is chosen. Otherwise the canonical form is used. Finally, for each inflection slot, the transformation is applied. For the first slot the initial form is used; after that, the output of one transformation is used as the input for the next.
With each transformation, all the properties in the grammatical meaning associated with the rule are copied to a newly created grammatical meaning. After all the transformations have been applied, the form is created with the constructed grammatical meaning. The initial form and the morphs are added as objects for the morph:consistsOf statements.
It is also possible to create Morph elements during generation in case they are not present in the data.
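The generation procedure described above can be sketched in Python as follows. Everything in the snippet is an assumption made for illustration: the in-memory dictionaries stand in for the RDF data, the slot names and replacement patterns are invented, and the slot order is taken as already established. Allomorphy such as Turkish vowel harmony is ignored, so the patterns only fit consonant-final back-vowel stems like adam.

```python
import re

# Hypothetical stand-ins for the RDF data; slot names, rule contents and
# replacement patterns are assumptions made for this sketch.
SLOT_ORDER = ["number_slot", "case_slot"]  # order as given by morph:next
RULES = {
    "sg_rule": {"slot": "number_slot", "source": "$", "target": "",
                "meaning": {"number": "singular"}},
    "pl_rule": {"slot": "number_slot", "source": "$", "target": "lar",
                "meaning": {"number": "plural"}},
    "acc_rule": {"slot": "case_slot", "source": "$", "target": "ı",
                 "meaning": {"case": "accusative"}},
}


def generate(base, rule_names):
    """Apply at most one rule per slot, in slot order, to a base form."""
    by_slot = {RULES[name]["slot"]: RULES[name] for name in rule_names}
    form, meaning = base, {}
    for slot in SLOT_ORDER:
        rule = by_slot.get(slot)
        if rule is None:
            continue  # no rule selected for this slot
        # The output of one transformation is the input of the next.
        form = re.sub(rule["source"], rule["target"], form, count=1)
        # Copy the rule's grammatical features into the new meaning.
        meaning.update(rule["meaning"])
    return form, meaning


# Accusative plural of "adam": the rules are applied in slot order,
# regardless of the order in which they are listed.
form, meaning = generate("adam", ["acc_rule", "pl_rule"])
```

The sketch leaves out the creation of morph:consistsOf statements and of Morph resources, which a full implementation would add after the last transformation.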
In the case of the example above, the successive application of the two appropriate rules for accusative and plural formation – in the order established by the use of the morph:next property – allows for the generation of the accusative plural form as follows:
baseType (DatatypeProperty)
URI: http://www.w3.org/ns/lemon/morph#baseType
morph:baseType is used for coindexing a base form, an inflection rule and the forms generated by the rule from the respective base in cases in which the inflectional paradigm of a single lexical entry involves different bases, e.g., stems.
Domain: ontolex:Form or morph:InflectionRule (or morph:Rule? MP)
Range: literal
For instance, for Latin verbs, in addition to the citation form, dictionaries also record “principal parts” – i.e., a set of forms from which the full paradigm of a lexeme can be inferred. E.g., the entry for rumpo in the Lewis and Short dictionary lists the forms:
- rumpo, displaying the present stem rump-, from which other forms displaying the present stem can be inferred;
- rupi, displaying the perfect stem rup-, from which other forms displaying the perfect stem can be inferred;
- ruptum, displaying the so-called third stem rupt-, from which other forms displaying the third stem can be inferred.
This can be modelled with OntoLex-Morph as follows:
Note that the inflection rules operating on the perfect and third stem are not only connected to the inflection class of rumpo, but also other ones, as they are valid across conjugations. By applying these rules, the following forms can be generated:
MP: as it has been shown that derivation can also be based on a form other than the canonical one (e.g. Latin deverbal conversions from the Third Stem, like capio (Third Stem capt-) > capt-o), shouldn’t this hold also for WordFormationRule?
For an inflection rule with morph:baseType defined: If the lexical entry to which it is applied features a(n object of) morph:baseForm or (if these are not defined) an ontolex:canonicalForm with identical morph:baseType, apply the rule to this form only. For a (generated) form, morph:baseType can be used to indicate from which form or with which rule it was generated. morph:baseType can also be used to mark stem classes in resources for which no explicit inflection rules are given.
This was introduced for modelling stem alternations. In this definition, we assume that we have one lexical entry for each stem variant, so that an inflection rule whose baseType doesn’t match that of its lexical entry doesn’t fire.
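The selection of a base by morph:baseType can be illustrated with a short Python sketch. The data below are hypothetical: the baseType labels, the dictionary layout and the single perfect-tense rule are assumptions for this illustration and are not taken from an existing resource. A rule with a baseType is applied only to the base form carrying the same label (or to the canonical form if no base forms are defined); a rule without a baseType uses the canonical form.

```python
import re

# Hypothetical data for Latin "rumpo"; labels and layout are assumptions.
ENTRY = {
    "canonicalForm": {"rep": "rumpo", "baseType": None},
    "baseForms": [
        {"rep": "rump", "baseType": "present"},
        {"rep": "rup", "baseType": "perfect"},
        {"rep": "rupt", "baseType": "third"},
    ],
}
# Illustrative rule: second person singular perfect, attached to the
# perfect stem.
PERFECT_2SG = {"baseType": "perfect", "source": "$", "target": "isti"}


def select_base(entry, rule):
    """Pick the form a rule applies to, or None if the rule does not fire."""
    wanted = rule.get("baseType")
    if wanted is None:
        return entry["canonicalForm"]["rep"]
    # Consult base forms first; use the canonical form only if no base
    # forms are defined for the entry.
    candidates = entry["baseForms"] or [entry["canonicalForm"]]
    for form in candidates:
        if form["baseType"] == wanted:
            return form["rep"]
    return None


def apply_rule(entry, rule):
    """Apply a rule to the matching base, or return None if none matches."""
    base = select_base(entry, rule)
    if base is None:
        return None
    return re.sub(rule["source"], rule["target"], base, count=1)


perfect_2sg = apply_rule(ENTRY, PERFECT_2SG)
```

A rule whose baseType matches none of the available forms simply yields no output here, which mirrors the statement that such a rule does not fire.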
WordFormationRule (Class)
URI: http://www.w3.org/ns/lemon/morph#WordFormationRule
morph:WordFormationRule represents the formal operation applied to a base form of a source LexicalEntry to obtain another, target LexicalEntry.
It describes the general pattern of how words are formed. For the analysis of a specific compound or derivation, use morph:WordFormationRelation.
Note: updated according to telco April 21, 2022.
generates (ObjectProperty)
URI: http://www.w3.org/ns/lemon/morph#generates
morph:generates connects a word formation rule to the lexical entries that are generated from it
Domain: morph:WordFormationRule
Range: ontolex:LexicalEntry
MP: given the parallelism between the inflection and derivation subcomponents of the generation component, I would expect InflectionRule to generate something too – namely, ontolex:Forms. Should we change the domain and range accordingly?
subclasses CompoundRule and DerivationRule. Normally, a derivation rule will involve one specific morpheme or one allomorphic variant [MP: but what about parasynthesis?]. A compound rule can involve an interfix or another morphophonological process.
DerivationRule (Class)
URI: http://www.w3.org/ns/lemon/morph#DerivationRule
morph:DerivationRule refers to rules that take one LexicalEntry as input and generate another LexicalEntry as output through the addition of one [or possibly more than one] derivational affix.
morph:CompoundingRule refers to rules that take more than one LexicalEntry as input to generate the output LexicalEntry.
WordFormationRelation (Class)
URI: http://www.w3.org/ns/lemon/morph#WordFormationRelation
morph:WordFormationRelation is a subclass of vartrans:LexicalRelation that relates two lexical entries that are derivationally related, with the vartrans:target representing the resulting lexical entry, and the vartrans:source representing the morphological base (in derivation) or head and other constituents (in compounding).
morph:wordFormationRule relates a word formation relation to the word formation rule that is applied to the source lexical entry in order to obtain the target lexical entry.
Domain: morph:WordFormationRelation
Range: morph:WordFormationRule
Accordingly, the morphological derivation of German Schönheit “beauty” can be encoded as:
CompoundRelation (Class)
URI: http://www.w3.org/ns/lemon/morph#CompoundRelation
morph:CompoundRelation is a morph:WordFormationRelation that connects a (lexical entry representing a) morphological constituent of a compound with the (lexical entry representing the) compound. This is a reification of decomp:subTerm: a compound relation entails that the constituent is a subterm of the compound.
TODO: text describing compound head
CompoundHead (Class)
URI: http://www.w3.org/ns/lemon/morph#CompoundHead
morph:CompoundHead is a morph:WordFormationRelation that connects the (lexical entry representing the) morphological head of a compound with the (lexical entry representing the) compound.
These are questions we decided to postpone until finalization of the module. Don’t use that for on-going discussions, that’s what minutes are for.
morph:Morph, this could be abused in unforeseen ways