This document describes the morphology module of the Lexicon Model for Ontologies (OntoLex-Morph), which results from the work of the Ontology Lexicon community group. The module targets the representation of linguistic morphology in dictionaries and other linguistic resources, as well as the formalization of rules for word formation and inflection as used in computational morphology and in the grammatical appendices frequently provided with bilingual dictionaries.

This module operates in combination with the lemon core module and extends it with support for two distinct views on linguistic morphology:

  1. OntoLex-Morph allows lexical entries and individual forms to be enriched with information about the morphological units that they consist of (descriptive morphology). This improves the capability of OntoLex-Lemon to encode, preserve and document the structure of morphologically complex forms or lexical entries.

  2. OntoLex-Morph allows the formalization of morphological rules that can be used to produce complex lexical entries from their component morphs, and inflected forms from their base forms (generative morphology). This extends OntoLex-Lemon resources with a framework that describes how to produce and analyze complex lexical entries or inflected forms.

OntoLex-Morph has been designed to make OntoLex-Lemon applicable to morphologically rich languages of any type, supporting both fusional and agglutinating morphology, and thereby to contribute to a truly multilingual web.

The RDF file with the OntoLex-Lemon morphology module can be found at http://www.w3.org/ns/lemon/morph

This document is an official report of the OntoLex community group. It does not represent the view of single individuals but reflects the consensus and agreement reached as part of the regular group discussions. The report should be regarded as the official specification of lemon.

If you wish to make comments regarding this document, please send them to public-ontolex@w3.org (subscribe, archives).

Introduction

Morphology is a vital and, in many languages, very sophisticated part of language, and as such it has been an important part of the work of lexicographers. In the traditional print form, morphological information is provided in brief abbreviated terms that can only be deciphered with significant knowledge of the language. With the transformation of the dictionary into an electronic resource, however, a re-imagining of the morphological information in a dictionary is due.

The morphology module aims at fulfilling two modelling purposes:

  1. Stating elements that are involved in the decomposition of lexical entries and forms.
  2. Enabling the representation of building patterns that are involved in the formation of lexical entries and forms.

A fine-grained description of phonological and morphophonological processes that are involved in any kind of stem or word formation on the phoneme level is excluded and not representable with this Morphology Module. Only the elements between the lexical entry and the morph levels will be covered. It is possible, however, that such information may be addressed in future OntoLex modules.

The OntoLex-Morph module aims to be adequate for both traditional dictionary content (which contains only abbreviated information about morphological rules and paradigms, often organized in appendices) and structured computational data (morphological dictionaries) as used in Language Technology, with the goal of making resources from one community more accessible to the other.

Overall structure

OntoLex-Morph is designed to account for

OntoLex-Morph was intended for (but is not limited to) the following primary use cases:

At its core, OntoLex-Morph operates with three main classes:

They are related with each other and with OntoLex in the following way:

Individual morphological processes (derivation, compounding, inflection) and their relation to lexical entries and forms are represented by designated subclasses of ontolex:Rule as described below.

Limitations: OntoLex-Morph is designed with a focus on deep morphology. Morphophonological rules can be modelled with OntoLex-Morph to a certain extent, but we expect phenomena such as assimilation, dissimilation and morphological “Level-2” rules to be more adequately handled by a separate vocabulary specialized in surface generation (transcription, text-to-speech, morphophonology).

Morphological Segments

Morphs

Morph (class)

URI: http://www.w3.org/ns/lemon/morph#Morph

Class morph:Morph is a subclass of ontolex:LexicalEntry that represents any element of morphological analysis below the word level.

  • can carry lexinfo:termElement (for what?)
  • can consist of other morphs [MP: not in the last version of the diagram; is that intended?] [MI: true, this is no longer the case, but LexicalForms can be, using decomp, so I think we cannot restrict it]
  • the model is agnostic as to whether this represents a morpheme or one of its allomorphs, but as a lexical entry
  • grammaticalMeaning: glossing information associated with the morph
  • baseConstraint: (for affixes) constraints on the elements that this morph can be applied to
  • ontolex:Affix is defined as a subclass of morph:Morph.
  • other types of morph (roots, stems, transfix, etc.) are not defined in the module, but should be defined in Lexinfo

consistsOf (ObjectProperty)

URI: http://www.w3.org/ns/lemon/morph#consistsOf

Property morph:consistsOf states into which Morph resources a Form resource can be segmented.

Domain: ontolex:Form

Range: morph:Morph

We still have no way to encode the order of morphemes. We can model forms and morphs as an aggregate (here: rdf:List ?).

Grammatical Meanings

GrammaticalMeaning (Class)

URI: http://www.w3.org/ns/lemon/morph#GrammaticalMeaning

morph:GrammaticalMeaning can be used to represent (bundles of) values of different morpho-syntactic or morpho-semantic features expressed by a form, morph or rule (e.g., value ‘nominative’ for feature ‘case’, value ‘singular’ for feature ‘number’, etc.; or the feature bundle composed by the latter two values, in a fusional language where they are expressed cumulatively, e.g. Latin)

  • should use lexinfo resources or instances with rdfs:label
  • can represent either an individual feature or a feature bundle

grammaticalMeaning (ObjectProperty)

URI: http://www.w3.org/ns/lemon/morph#grammaticalMeaning

Property morph:grammaticalMeaning assigns a grammatical meaning to a morph, form, or rule.

Domain: ontolex:Form or morph:Morph or morph:Rule

Range: morph:GrammaticalMeaning

For instance, the segmentation into morphs of the English plural form cats, and the assignment of grammatical meaning to the form and to the corresponding plural morph, can be expressed in this way.

MI: This was morph:grammaticalMeaning lexinfo:plural, but I don’t think this should be valid

In this case we create a blank node for the grammatical meaning that corresponds to a single feature in Lexinfo. In practice, it might be better to define instances for common morphological meanings and reuse these objects.

For example, in the Latin form lupus, nominative case and singular number are expressed cumulatively by the affix -us. Because this is a common combination, an instance of morph:GrammaticalMeaning is introduced for that feature bundle. This time we use the Lexinfo vocabulary alongside the Paralex vocabulary: even though Lexinfo is the preferred way to represent grammatical features in OntoLex, there is no restriction on this.

MI: I changed this part a bit to use lexinfo first and only then paralex

Discussion/History:

baseConstraint (ObjectProperty)

URI: http://www.w3.org/ns/lemon/morph#baseConstraint

morph:baseConstraint defines the grammatical characteristics of the stem or base that a derivational or inflectional morpheme can be combined with

Domain: morph:Morph or morph:Rule

Range: morph:GrammaticalMeaning

For example, an element for nominal inflection can only be applied to nouns, and derivational affixes can have similar constraints. Note that such information is not applicable to an ontolex:Form because this describes only the result of the application of a rule or the addition of a particular form.

As a concrete example, the fact that the English affix -s expresses plural number if attached to nouns, and 3rd person singular agreement if attached to verbs, can be coded as follows using morph:baseConstraint.
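Outside of RDF, the logic of such a constraint can also be illustrated with a minimal Python sketch. It assumes two separate morph records for -s, one per reading; the dictionary keys, feature names and the function name are hypothetical and not part of the module.

```python
# Hypothetical records standing in for two morph:Morph resources for English "-s".
# "base_constraint" mirrors morph:baseConstraint, "grammatical_meaning" mirrors
# morph:grammaticalMeaning; the feature names are illustrative assumptions.
MORPHS = [
    {
        "label": "-s",
        "base_constraint": {"partOfSpeech": "noun"},
        "grammatical_meaning": {"number": "plural"},
    },
    {
        "label": "-s",
        "base_constraint": {"partOfSpeech": "verb"},
        "grammatical_meaning": {"person": "third", "number": "singular"},
    },
]


def applicable_meanings(base_features, morphs=MORPHS):
    """Return the meanings of all morphs whose base constraint the base satisfies."""
    return [
        morph["grammatical_meaning"]
        for morph in morphs
        if all(
            base_features.get(feature) == value
            for feature, value in morph["base_constraint"].items()
        )
    ]
```

Called with `{"partOfSpeech": "noun"}`, the function returns only the plural reading; a base that satisfies neither constraint yields an empty list.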

Discussion/History: - CC 2022-10-24: by analogy with morph:grammaticalMeaning, this property should also be applicable to rules to specify necessary preconditions.

Base Forms

baseForm (ObjectProperty)

URI: http://www.w3.org/ns/lemon/morph#baseForm

baseForm is a subproperty of ontolex:lexicalForm that indicates the form that is taken as base for the application of inflection or derivation rules to generate other forms.

Domain: ontolex:Word (not lexical entry!)

Range: ontolex:Form

This property is necessary in cases in which inflection or derivation relations do not take the canonical form as their basis, but a different one. One example is German verbal inflection (e.g., for gehen “to go”), where the canonical form (gehen, infinitive) is derived from the base form (geh-, stem) by means of a suffix (-en, infinitive marker), like other inflected forms (geh, gehst, geht “I/you go; he/she/it goes”).

Morphological Rules

Rule (Class)

URI: http://www.w3.org/ns/lemon/morph#Rule

morph:Rule represents the formal operation applied to a base form to obtain another form (inflectionally or derivationally related to it). It must contain either morph:example or morph:replacement (or both). The “tabular” value of a morpheme must be stored in rdfs:label (e.g. “-s”@en for the regular plural in English). One rule applies exactly one morphological transformation, i.e. it adds one Morph.

Examples

example (DatatypeProperty)

URI: http://www.w3.org/ns/lemon/morph#example

morph:example: A single form that demonstrates a class of forms that can be generated by a single rule with no allomorphy.

Domain: morph:Rule

Range: string literal

This property provides an example of a class of forms that share a morphological process. It is necessary in cases where the way the form is generated is not specified but we still want to represent a morphological transformation. This is a common case for retrodigitised dictionaries.

Replacement

replacement (DatatypeProperty)

URI: http://www.w3.org/ns/lemon/morph#replacement

morph:replacement states the replacement pattern that is involved in a morphological rule for the generation of a form

Domain: morph:Rule

Range: any URI, cf. in doc/wrapup/minutes-2025-06-64

This property points to an object that describes the morphological transformation required to produce a valid form according to the rule. The Morph module does not prescribe how these transformations are represented, since there are many common ways to do so; it therefore defines no properties for this purpose. However, we provide a non-normative option, replacement with regular expressions, which is used in the examples in the subsequent sections.

Unless specified otherwise (in the documentation of a resource), implementations SHOULD provide NFD-normalized Unicode strings for morph:source and morph:target, so that diacritics are separated from the base character as combining characters. This is a best practice that simplifies the writing of rules in many cases, as diacritic and base character can be manipulated independently from each other.
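The benefit of NFD normalization for regular-expression replacements can be sketched in Python using only the standard library. The function name and the concrete patterns below are assumptions made for illustration; the `source` and `target` arguments merely stand in for the values of morph:source and morph:target.

```python
import re
import unicodedata


def apply_replacement(form, source, target):
    """Apply a regex replacement to the NFD-normalized form.

    `source` and `target` are assumed to be written in NFD already, so that
    base characters and combining diacritics can be addressed separately.
    """
    decomposed = unicodedata.normalize("NFD", form)
    return re.sub(source, target, decomposed)


# Suffix replacement: Latin lupus -> lupi (illustrative pattern).
genitive = apply_replacement("lupus", r"us$", "i")

# Adding a diacritic: insert COMBINING DIAERESIS (U+0308) after the "a".
umlauted = apply_replacement("Vater", r"a(t)", "a\u0308\\1")

# Removing a diacritic: the precomposed "ä" is decomposed first, so the
# combining mark can be deleted without touching the base character.
plain = apply_replacement("V\u00e4ter", "\u0308", "")
```

The results stay in NFD; a consumer that needs precomposed characters can call `unicodedata.normalize("NFC", ...)` on the output.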

Involves

involves (ObjectProperty)

URI: http://www.w3.org/ns/lemon/morph#involves

morph:involves links a Rule to the Morph that is involved in the process.

Domain: morph:Rule

Range: morph:Morph

Note that this does not encode order.
MI: Each rule corresponds to exactly one Morph, so there is no need for ordering

Inflection

InflectionClass (Class)

URI: http://www.w3.org/ns/lemon/morph#InflectionClass

morph:InflectionClass represents the inflection class to which a LexicalEntry belongs/is assigned – e.g., the declension of a noun, or the conjugation of a verb.

It may contain metadata information about this type of declension.

The link between inflection classes and lexical entries is not defined in OntoLex-Morph, but modelled using ontolex:morphologicalPattern.

inflectionClass (ObjectProperty)

URI: http://www.w3.org/ns/lemon/morph#inflectionClass

morph:inflectionClass links an inflection rule to the inflection class it pertains to.

Domain: morph:InflectionRule

Range: morph:InflectionClass

In the case of fusional morphology — languages like Greek, Latin or English — there is usually only one morph attached to a form that carries information about inflection. The situation is different for languages with agglutination, where each inflectional value is represented by its own morph. In order to represent this, the model has another class.

InflectionSlot (Class)

URI: http://www.w3.org/ns/lemon/morph#InflectionSlot

morph:InflectionSlot represents a single slot that can be filled with a morph corresponding to a grammatical category. Since one rule can introduce only one morph, inflection slots are necessary when we need to represent forms that are generated by several independent morphological processes.

For agglutinative languages such as the Finno-Ugric and Turkic languages, and many more, each grammatical value that is encoded with a morph (e.g. number and case for Finnish nouns) is associated with a single slot. Consequently, two separate rules are needed to add number and case when forming an inflected Finnish noun form.

inflectionSlot (ObjectProperty)

URI: http://www.w3.org/ns/lemon/morph#inflectionSlot

morph:inflectionSlot links an inflection rule to the slot it pertains to

Domain: morph:InflectionRule

Range: morph:InflectionSlot

In order to set the order of morphs and also simplify the process of form generation, the property morph:next points from one InflectionSlot to the next.

next (ObjectProperty)

URI: http://www.w3.org/ns/lemon/morph#next

morph:next links two consecutive inflection slots, e.g. number and case in Finnish.

Domain: morph:InflectionSlot

Range: morph:InflectionSlot

Inflection Rules

InflectionRule (Class)

URI: http://www.w3.org/ns/lemon/morph#InflectionRule

morph:InflectionRule represents the formal operation applied to a base form of a LexicalEntry to obtain another inflected form of that LexicalEntry.

morph:inflectionRule provides information on how to generate inflected forms and, in case of a dataset with pre-generated forms, links these forms to InflectionRules that were used to generate them. If inflection slots were used, forms might have several rules attached to them.

Domain: ontolex:Form

Range: morph:InflectionRule

The example below illustrates the modelling of inflection classes and rules for the generation of the genitive singular of lupus in Latin.

In a fusional language like Latin, there is no need to have different inflection slots: a single inflection rule (specific for the inflection class to which the lexical entry is assigned) allows for the generation of the genitive singular form as follows:

On the other hand, in an agglutinative language like Turkish, it is useful to define separate inflection slots for each morphosyntactic feature, and separate inflection rules for each inflection slot, as illustrated in the example below.

When software compatible with this specification processes this data to generate forms of the entry :adam, it first extracts all the rules associated with the corresponding morphological pattern, namely sg_rule, pl_rule, and acc_rule. Next, it establishes the order of the inflection slots mentioned in the rules; the first slot is the one that is not used as an object of a morph:next property.

Then, for the first inflection slot the correct form is chosen. If there is a morph:baseType specified in the rule, the corresponding form is chosen. Otherwise the canonical form is used. Finally, for each inflection slot, the transformation is applied. For the first slot the initial form is used, after that, the output of one transformation is used as an input for the next.

With each transformation, all the properties in the grammatical meaning associated with the rule are copied to a newly created grammatical meaning. After all the transformations have been applied, the form is created with the constructed grammatical meaning. The initial form and the morphs are added as objects for the morph:consistsOf statements.

It is also possible to create Morph elements during generation in case they are not present in the data.

If there are no inflection slots in the rules, the generation proceeds without using them.
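The generation procedure described above can be sketched in Python as follows. This is a simplified illustration, not a reference implementation: the rule names sg_rule, pl_rule and acc_rule come from the text, while the slot names, the replacement patterns, the dictionary layout and the function names are assumptions. Base form selection via morph:baseType and the creation of morph:consistsOf statements are omitted.

```python
import re

# Hypothetical stand-ins for the RDF data: slot order (morph:next) and rules.
NEXT_SLOT = {"number_slot": "case_slot"}
RULES = {
    "sg_rule": {"slot": "number_slot", "source": "$", "target": "",
                "meaning": {"number": "singular"}},
    "pl_rule": {"slot": "number_slot", "source": "$", "target": "lar",
                "meaning": {"number": "plural"}},
    "acc_rule": {"slot": "case_slot", "source": "$", "target": "ı",
                 "meaning": {"case": "accusative"}},
}


def ordered_slots(rules, next_slot):
    """Order slots by following morph:next from the slot that is never an object."""
    slots = {rule["slot"] for rule in rules.values()}
    first = [slot for slot in slots if slot not in next_slot.values()]
    if len(first) != 1:
        raise ValueError("expected exactly one initial slot")
    order = first
    while order[-1] in next_slot:
        order.append(next_slot[order[-1]])
    return order


def generate(base_form, rule_names):
    """Apply one rule per slot in slot order and merge the grammatical meanings."""
    chosen = {RULES[name]["slot"]: RULES[name] for name in rule_names}
    form, meaning = base_form, {}
    for slot in ordered_slots(RULES, NEXT_SLOT):
        rule = chosen.get(slot)
        if rule is None:
            continue  # no rule selected for this slot
        # The output of one transformation is the input of the next.
        form = re.sub(rule["source"], rule["target"], form, count=1)
        meaning.update(rule["meaning"])
    return form, meaning
```

Under these assumptions, `generate("adam", ["acc_rule", "pl_rule"])` applies the plural rule before the accusative rule regardless of the order in which the rules are listed, because the order is taken from the slots.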

In the case of the example above, the successive application of the two appropriate rules for accusative and plural formation – in the order established by the use of the morph:next property – allows for the generation of the accusative plural form as follows:

Base Types

baseType (DatatypeProperty)

URI: http://www.w3.org/ns/lemon/morph#baseType

morph:baseType is used for coindexing a base form, an inflection rule and the forms generated by the rule from the respective base in cases in which the inflectional paradigm of a single lexical entry involves different bases, e.g., stems.

Domain: ontolex:Form or morph:InflectionRule (or morph:Rule? MP)

Range: literal

For instance, for Latin verbs, in addition to the citation form, dictionaries also record “principal parts”, i.e., a set of forms from which the full paradigm of a lexeme can be inferred. E.g., the entry for rumpo in the Lewis and Short dictionary lists the forms:

  • rumpo, displaying the present stem rump-, from which other forms displaying the present stem can be inferred;
  • rupi, displaying the perfect stem rup-, from which other forms displaying the perfect stem can be inferred;
  • ruptum, displaying the so-called third stem rupt-, from which other forms displaying the third stem can be inferred.

This can be modelled with OntoLex-Morph as follows:

Note that the inflection rules operating on the perfect and third stem are not only connected to the inflection class of rumpo, but also other ones, as they are valid across conjugations. By applying these rules, the following forms can be generated:

MP: as it has been shown that derivation can also be based on a form different from the canonical one (e.g. Latin deverbal conversions from the Third Stem, like capio (Third Stem capt-) > capt-o), shouldn’t this also hold for WordFormationRule?

For an inflection rule with morph:baseType defined: If the lexical entry to which it is applied features a(n object of) morph:baseForm or (if these are not defined) an ontolex:canonicalForm with identical morph:baseType, apply the rule to this form only. For a (generated) form, morph:baseType can be used to indicate from which form or with which rule it was generated. morph:baseType can also be used to mark stem classes in resources for which no explicit inflection rules are given.

This was introduced for modelling stem alternations. In this definition, we assume that we have one lexical entry for each stem variant, so that an inflection rule whose baseType doesn’t match that of its lexical entry doesn’t fire.
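The selection of a base by morph:baseType can be sketched in Python as follows. The stems rump-, rup- and rupt- and the forms rupi and ruptum are taken from the rumpo example; the base type labels, the dictionary layout, the replacement patterns and the function names are assumptions for illustration only.

```python
import re

# Hypothetical stand-in for the entry rumpo with its base forms (stems).
RUMPO = {
    "canonical_form": {"rep": "rumpo"},
    "base_forms": [
        {"rep": "rump", "base_type": "present"},
        {"rep": "rup", "base_type": "perfect"},
        {"rep": "rupt", "base_type": "third"},
    ],
}


def select_base(entry, rule):
    """Pick the form a rule applies to, or None if the rule does not fire."""
    wanted = rule.get("base_type")
    if wanted is None:
        # Rules without a base type operate on the canonical form.
        return entry["canonical_form"]["rep"]
    # Base forms take precedence; the canonical form is only considered
    # when the entry defines no base forms at all.
    candidates = entry.get("base_forms") or [entry["canonical_form"]]
    for form in candidates:
        if form.get("base_type") == wanted:
            return form["rep"]
    return None


def apply_rule(entry, rule):
    """Apply the rule's replacement to the selected base, if any."""
    base = select_base(entry, rule)
    if base is None:
        return None
    return re.sub(rule["source"], rule["target"], base, count=1)


perfect_rule = {"base_type": "perfect", "source": "$", "target": "i"}
third_stem_rule = {"base_type": "third", "source": "$", "target": "um"}
```

With these records, `apply_rule(RUMPO, perfect_rule)` operates on the perfect stem and `apply_rule(RUMPO, third_stem_rule)` on the third stem, while a rule whose base type matches none of the entry's forms returns `None`.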

Word Formation

Word Formation Rules

WordFormationRule (Class)

URI: http://www.w3.org/ns/lemon/morph#WordFormationRule

morph:WordFormationRule represents the formal operation applied to a base form of a source LexicalEntry to obtain another, target LexicalEntry.

It describes the general pattern by which words are formed. For the analysis of a specific compound or derivation, use morph:WordFormationRelation.

Note: updated according to telco April 21, 2022.

generates (ObjectProperty)

URI: http://www.w3.org/ns/lemon/morph#generates

morph:generates connects a word formation rule to the lexical entries that are generated from it

Domain: morph:WordFormationRule

Range: ontolex:LexicalEntry

MP: given the parallelism between the inflection and derivation subcomponents of the generation component, I would expect InflectionRule to generate something too – namely, ontolex:Forms. Should we change the domain and range accordingly?

morph:WordFormationRule has the subclasses CompoundRule and DerivationRule. Normally, a derivation rule will involve one specific morpheme or one allomorphic variant [MP: but what about parasynthesis?]. A compound rule can involve an interfix or another morphophonological process.

DerivationRule (Class)

URI: http://www.w3.org/ns/lemon/morph#DerivationRule

morph:DerivationRule refers to rules that take one LexicalEntry as input and generate another LexicalEntry as output through the addition of one [or possibly more than one] derivational affix.

morph:CompoundingRule refers to rules that take more than one LexicalEntry as input to generate the output LexicalEntry.

Word Formation Relations

WordFormationRelation (Class)

URI: http://www.w3.org/ns/lemon/morph#WordFormationRelation

morph:WordFormationRelation is a subclass of vartrans:LexicalRelation that relates two lexical entries that are derivationally related, with the vartrans:target representing the resulting lexical entry, and the vartrans:source representing the morphological base (in derivation) or head and other constituents (in compounding).

morph:wordFormationRule relates a word formation relation to the word formation rule that is applied to the source lexical entry in order to obtain the target lexical entry.

Domain: morph:WordFormationRelation

Range: morph:WordFormationRule

Accordingly, the morphological derivation of German Schönheit “beauty” can be encoded as:

CompoundRelation (Class)

URI: http://www.w3.org/ns/lemon/morph#CompoundRelation

morph:CompoundRelation is a morph:WordFormationRelation that connects a (lexical entry representing a) morphological constituent of a compound with the (lexical entry representing the) compound. This is a reification of decomp:subTerm: A compound relation entails that the constituent is a subterm of the compound.

TODO: text describing compound head

CompoundHead (Class)

URI: http://www.w3.org/ns/lemon/morph#CompoundHead

morph:CompoundHead is a morph:WordFormationRelation that connects the (lexical entry representing the) morphological head of a compound with the (lexical entry representing the) compound.

Open questions

These are questions we decided to postpone until finalization of the module. Don’t use that for on-going discussions, that’s what minutes are for.