This document describes the morphology module of the Lexicon Model for Ontologies as a result of the work of the Ontology Lexicon community group (OntoLex-Morph). The module is targeted at the representation of linguistic morphology in dictionaries and other linguistic resources, as well as the formalization of rules for word formation and inflection as employed in computational morphology and grammatical appendices as frequently provided as part of bilingual dictionaries.
This module operates in combination with the lemon core module and extends it with support for two distinct views on linguistic morphology:
OntoLex-Morph enables the enrichment of lexical entries and individual forms with information about the morphological units that they consist of (descriptive morphology). This improves the capability of OntoLex-Lemon to encode, preserve and document the structure of morphologically complex forms or lexical entries.
OntoLex-Morph allows the formalization of morphological rules that can be used to produce complex lexical entries and inflected forms from their component morphs and their base forms (generative morphology). This extends OntoLex-Lemon resources with a framework that describes how to produce and analyze complex lexical entries or inflected forms.
OntoLex-Morph has been designed to make OntoLex-Lemon applicable to morphologically rich languages of any type, supporting both fusional and agglutinating morphology, and thereby to contribute to a truly multilingual web.
The RDF file for the OntoLex-Lemon morphology module can be found at http://www.w3.org/ns/lemon/morph
This document is an official report of the OntoLex community group. It does not represent the view of single individuals but reflects the consensus and agreement reached as part of the regular group discussions. The report should be regarded as the official specification of lemon.
If you wish to make comments regarding this document, please send them to public-ontolex@w3.org.
Morphology is a vital and, in many languages, very sophisticated part of language, and as such it has been an important part of the work of lexicographers. In the traditional print form, morphological information is provided in brief abbreviated terms that can only be deciphered with significant knowledge of the language. Digital dictionaries, however, are capable of representing this information in a more structured and machine-readable way. The OntoLex-Morph module is designed to represent morphological information in a structured way that is compatible with the OntoLex-Lemon model.
The morphology module aims at fulfilling two modelling purposes:
Morphological decomposition on the lexical entry level.
The kinds of elements of which a lexical entry can consist should be as unrestricted as possible: the decomposition of lexical entries encompasses lexical entries, components, derivational affixes, inflectional affixes, stems, roots and zero morphs. However, a lexical entry can NEVER be composed of a form!
Morphological decomposition on the form level.
Elements of which a form can consist include roots, stems, inflectional affixes and zero morphs.
While the model is not capable of providing fine-grained descriptions of phonological and morphophonological processes, the module is capable of representing the morphological processes that are involved in the formation of lexical entries and forms. The module covers use cases typically found in the representation of dictionary data, such as the representation of inflectional and derivational paradigms, as well as the representation of morphological rules that are used in computational morphology. In order to model more complex morphological representations, users can employ the MMoOn Core ontology — The Multilingual Morpheme Ontology.
The OntoLex-Morph module aims to represent both traditional dictionary content (which contains only abbreviated information about morphological rules and paradigms, often organized in appendices) and structured computational data (morphological dictionaries) as used in language technology, with the goal of making resources from one community more accessible to the other.
OntoLex-Morph is an extension of the OntoLex model designed to represent morphological processes—such as inflection, derivation, and compounding—across languages, supporting both structural and semantic aspects of morphemes, rules, and grammatical constraints. It enables the formalization of morphological data from various sources, facilitates interoperability between computational morphologies, and supports use cases like generating linguistic labels for ontologies or modelling morphology as a knowledge graph. Central to the model are three classes—morph, rule, and grammatical meaning—which interact to describe how base forms are transformed into surface forms while capturing their morphosyntactic and semantic features.
Morphs are the basic building blocks within the module that represent a single indivisible unit of meaning or grammatical function. They can be roots, stems, affixes, or even zero morphs. Morphs represent a single version (allomorph) of a morpheme, which is a more abstract concept. Morphs can be either bound morphs, which cannot stand alone, or free morphs, which can stand alone as a word. A free morph may be a single concept such as “tea” or “pot” in “teapots”, while the plural suffix “-s” is a bound morph.
The class morph:Morph provides a way to represent sub-word elements and attach grammatical information to them.
URI: http://www.w3.org/ns/lemon/morph#Morph
morph:Morph represents any element of morphological analysis below the word level.
The property morph:consistsOf relates a form with the morphs from which it is constructed.
URI: http://www.w3.org/ns/lemon/morph#consistsOf
Property morph:consistsOf states into which Morph resources a Form resource can be segmented.
Here is a simple example of a segmentation of the English plural form cats:
This representation does not give the order of the morphs within the word, which may be useful for many applications. To give this information, the morphs can be modelled as an rdf:Seq by means of the rdf:_1, rdf:_2, etc. properties.
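A minimal Python sketch of how the order can be recovered from such a sequence, with triples modelled as plain tuples rather than real RDF. The identifiers :cats_seq, :cat_morph and :s_morph and the helper ordered_members are hypothetical; only the rdf:_1, rdf:_2 membership properties come from the text.

```python
# Triples as (subject, predicate, object) tuples.
# All ":..." identifiers are hypothetical names chosen for illustration.
triples = {
    (":cats_seq", "rdf:_2", ":s_morph"),
    (":cats_seq", "rdf:_1", ":cat_morph"),
}

PREFIX = "rdf:_"


def ordered_members(seq, graph):
    """Return the members of an rdf:Seq sorted by their rdf:_n index."""
    indexed = [
        (int(p[len(PREFIX):]), o)
        for s, p, o in graph
        if s == seq and p.startswith(PREFIX) and p[len(PREFIX):].isdigit()
    ]
    return [o for _, o in sorted(indexed)]


print(ordered_members(":cats_seq", triples))
```

Because a set of triples has no inherent order, the numeric suffix of the membership property is the only thing that places the stem before the suffix here.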
The class morph:GrammaticalMeaning is used to gloss information associated with the morph. This can be either a single element or a node which bundles together several grammatical meanings, e.g. first person and singular. Typically, the bundles will be expressed as blank nodes. The recommended vocabulary to use for the meanings is LexInfo.
URI: http://www.w3.org/ns/lemon/morph#GrammaticalMeaning
morph:GrammaticalMeaning can be used to represent (bundles of) values of different morpho-syntactic or morpho-semantic features expressed by a form, morph or rule (e.g., value ‘nominative’ for feature ‘case’, value ‘singular’ for feature ‘number’, etc.; or the feature bundle composed by the latter two values, in a fusional language where they are expressed cumulatively, e.g. Latin)
The property morph:grammaticalMeaning relates an instance of the class morph:Morph to an instance of the class morph:GrammaticalMeaning. In addition to morphs, the subject of this property can be an ontolex:Form, as an aggregate of morphs, or a morph:Rule, i.e. a rule stating how the form was formed. More details on the rules can be found in the corresponding section.
URI: http://www.w3.org/ns/lemon/morph#grammaticalMeaning
The property morph:grammaticalMeaning assigns a grammatical meaning to a morph, form, or rule.
For instance, we can update the previous example of the English plural form cats, and add the assignment of grammatical meaning to the form and to the corresponding plural morph, which can be expressed in this way.
In this case, we create a blank node for the grammatical meaning that corresponds to a single feature in LexInfo. In practice, it might be better to define instances for common morphological meanings and reuse these objects.
For example, in the Latin form lupus, nominative case and singular number are expressed cumulatively by the affix -us. Since this is a common combination, an instance of morph:GrammaticalMeaning is introduced for that feature bundle. This time we use the LexInfo vocabulary alongside the Paralex vocabulary: even though LexInfo is the preferred way to represent grammatical features in OntoLex, there is no restriction on this.
Constraints may be specified on a morph to indicate which grammatical features it can be combined with. This is important for the generation of inflected forms, as it allows us to specify which morphs can be used together in a particular context. For example, in the case of the English plural morpheme ‘-s’, it can only be used with nouns. This information can be encoded using the property morph:baseConstraint, which links a morph to its constraints.
The property morph:baseConstraint is used to encode information about morphosyntactic constraints for a certain morph, i.e. which grammatical characteristics it requires.
URI: http://www.w3.org/ns/lemon/morph#baseConstraint
morph:baseConstraint defines the grammatical characteristics of the stem or base that a derivational or inflectional morpheme can be combined with
For example, an element for nominal inflection can only be applied to nouns, and derivational affixes can have similar constraints. Note that such information is not applicable to an ontolex:Form because this describes only the result of the application of a rule or the addition of a particular form.
As a concrete example, the fact that the English affix -s expresses plural number if attached to nouns, and 3rd person singular agreement if attached to verbs, can be coded as follows using morph:baseConstraint.
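The effect of such constraints can be sketched in Python, under the assumption that morphs are modelled as plain dictionaries. The record layout, the feature values and the helper compatible_morphs are all hypothetical illustrations, not part of the module or of LexInfo.

```python
# Two hypothetical morph records for English "-s".
# Keys mirror the property names; the values are informal labels.
MORPHS = [
    {
        "label": "-s",
        "baseConstraint": {"partOfSpeech": "noun"},
        "grammaticalMeaning": {"number": "plural"},
    },
    {
        "label": "-s",
        "baseConstraint": {"partOfSpeech": "verb"},
        "grammaticalMeaning": {"person": "thirdPerson", "number": "singular"},
    },
]


def compatible_morphs(base_features, morphs=MORPHS):
    """Keep only the morphs whose every constraint is satisfied by the base."""
    return [
        m
        for m in morphs
        if all(base_features.get(k) == v for k, v in m["baseConstraint"].items())
    ]


for morph in compatible_morphs({"partOfSpeech": "noun"}):
    print(morph["label"], morph["grammaticalMeaning"])
```

A base that satisfies neither constraint, such as an adjective in this toy data, selects no morph at all.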
The property morph:baseForm is used when some of the derived or inflected forms are formed from a non-canonical form of a lexical entry. This property is necessary both to represent this information for manual consumption and to be used together with generation rules to provide input data for generating inflected or derived forms.
URI: http://www.w3.org/ns/lemon/morph#baseForm
baseForm is a subproperty of ontolex:lexicalForm that indicates the form that is taken as base for the application of inflection or derivation rules to generate other forms.
One example is German verbal inflection (e.g., for gehen “to go”), where the canonical form (gehen, infinitive) is derived from the base form (geh-, stem) by means of a suffix (-en, infinitive marker), like other inflected forms (gehe, gehst, geht “I/you go; he/she/it goes”).
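A minimal Python sketch of the idea, assuming plain suffixation onto the stem. The suffix table and its informal labels are hypothetical; they are not LexInfo identifiers and are not taken from the module.

```python
STEM = "geh"  # the base form (stem) of gehen, as named in the text

# Hypothetical suffix table with informal labels.
SUFFIXES = {
    "infinitive": "en",
    "2sg.present": "st",
    "3sg.present": "t",
}


def inflect(stem, suffixes):
    """Build each form by attaching its suffix to the shared base form."""
    return {label: stem + suffix for label, suffix in suffixes.items()}


forms = inflect(STEM, SUFFIXES)
print(forms)
```

The canonical form gehen falls out of the same operation as the finite forms, which is the reason for recording the stem as a separate base form.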
Morphology in a dictionary is often given in short forms, which the reader must interpret to generate the full forms. In the context of a digital resource, this process can be automated. The OntoLex-Morph module provides classes and properties that allow the specification of rules that can be used to generate derived lexical entries or inflected forms based on existing lexical entries. The application of these rules is intended to be equivalent across different implementations of the model.
Here are the 4 most common situations regarding when and how the generation happens:
In order to keep the model from becoming too complex, one rule is associated with exactly one morph and is used to describe the production of exactly one form (in case of inflection) or exactly one lexical entry (in case of derivation).
URI: http://www.w3.org/ns/lemon/morph#Rule
morph:Rule represents the formal operation applied to a base form to obtain another form (inflectionally or derivationally related to it). It must contain either morph:example or morph:replacement (or both). The “tabular” value of a morpheme must be stored in rdfs:label (e.g. “-s”@en for the usual plural in English). One rule applies exactly one morphological transformation, i.e. adds one Morph.
The property morph:example provides a way to link a rule to an example of a class of forms that share a morphological process. It is necessary in cases where the way the form is generated is not specified but we still want to represent a morphological transformation. This is a common case for retrodigitised dictionaries.
URI: http://www.w3.org/ns/lemon/morph#example
morph:example: A single form that demonstrates a class of forms that can be generated by a single rule with no allomorphy.
Replacements are used to describe transformations of stems by the replacement of zero or more characters by other characters.
URI: http://www.w3.org/ns/lemon/morph#Replacement
morph:Replacement is a class that can be used to represent the morphological transformation that is applied to a base form to obtain another form (inflectionally or derivationally related to it).
The property morph:replacement relates a rule with an object that describes the morphological transformation required to produce a valid form according to the rule.
The Morph module does not prescribe a single way of representing these transformations, since many have been developed and used in computational morphology and beyond: finite-state automata and the regular expressions equivalent to them, as well as morphology-specific formalisms such as KIMMO for two-level morphology. As part of the model, we provide one such way, replacement with regular expressions, which is used in the examples in the subsequent sections.
URI: http://www.w3.org/ns/lemon/morph#replacement
morph:replacement states the replacement pattern that is involved in a morphological rule for the generation of a form
The class morph:RegexReplacement is used to describe a morphological transformation using a regular expression. The specific syntax to use is the XPath syntax for compatibility with SPARQL.
URI: http://www.w3.org/ns/lemon/morph#RegexReplacement
morph:RegexReplacement can be used to represent the regular expression-based substitution that produces an inflected or derived surface form
The source and the target for the substitution are expressed with the properties morph:source and morph:target, respectively.
URI: http://www.w3.org/ns/lemon/morph#source
morph:source: A string which is used as a basis for the substitution
URI: http://www.w3.org/ns/lemon/morph#target
morph:target: A string template that denotes a target for the substitution
The target can use backreferences (\1) to refer to the captured groups in the source string.
In the above example, the source string ^(.*)en$ captures the stem of the verb, which is then used in the target string ge\1t to form the past participle used in the perfect tense: the prefix ge is added before the stem and the suffix t after it, giving, for example, gemacht “done” from machen “to do”.
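The substitution can be tried out with a short Python sketch. It uses Python's re module rather than an XPath regular expression engine, on the assumption that the two behave alike for a pattern this simple. The helper name apply_replacement is hypothetical.

```python
import re


def apply_replacement(form, source, target):
    """Apply a source/target regex pair to a form.

    Returns None when the source pattern does not match, so that a rule
    which is not applicable is distinguishable from one that changes nothing.
    """
    if re.search(source, form) is None:
        return None
    return re.sub(source, target, form, count=1)


# Source and target strings as given in the text.
SOURCE = r"^(.*)en$"
TARGET = r"ge\1t"

print(apply_replacement("machen", SOURCE, TARGET))
```

A form that does not end in -en is left without an output, which is one simple way an implementation might signal that the rule does not apply.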
It is often desirable to preserve information about which rules were used for a form or an entry to be generated. The property morph:involves provides a way to do exactly that. We recommend adding this property to generated items in any implementation of the generation process.
URI: http://www.w3.org/ns/lemon/morph#involves
morph:involves links a Rule to the Morph that is involved in the process.
TO DISCUSS: This doesn’t work!
Inflection rules are used to represent operations that create an inflected form from a base form. This is a specific subclass of morph:Rule for the purpose of modelling inflection.
URI: http://www.w3.org/ns/lemon/morph#InflectionRule
morph:InflectionRule represents the formal operation applied to a base form of a LexicalEntry to obtain another inflected form of that LexicalEntry.
morph:inflectionRule provides information on how to generate inflected forms and, in the case of a dataset with pre-generated forms, links these forms to InflectionRules that were used to generate them. If inflection slots were used, forms might have several rules attached to them.
Domain: ontolex:Form
Range: morph:InflectionRule
URI: http://www.w3.org/ns/lemon/morph#InflectionClass
morph:InflectionClass represents the inflection class to which a LexicalEntry belongs/is assigned – e.g., the declension of a noun, or the conjugation of a verb.
It may contain metadata information about this type of declension.
The link between inflection classes and lexical entries is not defined in OntoLex-Morph, but modelled using ontolex:morphologicalPattern.
URI: http://www.w3.org/ns/lemon/morph#inflectionClass
morph:inflectionClass links an inflection rule to the inflection class it pertains to.
In the case of fusional morphology — languages like Greek, Latin or English — there is usually only one morph attached to a form that carries information about inflection. The situation is different for languages with agglutination, where each inflectional value is represented by its own morph. In order to represent this, the model has another class.
In a fusional language like Latin, there is no need to have different inflection slots: a single inflection rule (specific for the inflection class to which the lexical entry is assigned) allows for the generation of the genitive singular form as follows:
Inflection slots are used in agglutinative languages to represent the different grammatical categories that can be expressed by a single morph. In this case, the inflection rule is used to generate a form that is composed of several morphs, each of which corresponds to a different grammatical category. The inflection slots are used to specify the order in which the morphs are applied to the base form.
URI: http://www.w3.org/ns/lemon/morph#InflectionSlot
morph:InflectionSlot represents a single slot that can be filled with a morph of a corresponding grammatical category. Since one rule can introduce only one morph, inflection slots are necessary when we need to represent forms that are generated by several independent morphological processes.
For agglutinative languages such as the Finno-Ugric and Turkic languages and many more, each grammatical value that is encoded with a morph (e.g. number and case for Finnish nouns) is associated with a single slot. This way, there should be two separate rules for adding number and case morphs to form an inflected Finnish noun form.
URI: http://www.w3.org/ns/lemon/morph#inflectionSlot
morph:inflectionSlot links an inflection rule to the slot it pertains to
In order to set the order of morphs and also simplify the process of form generation, the property morph:next points from one morph:InflectionRule to the next.
URI: http://www.w3.org/ns/lemon/morph#next
morph:next links two consecutive inflection rules. The object rule can be applied after the first rule has been applied.
The example below illustrates the modelling of inflection classes and rules for the generation of the genitive singular of lupus in Latin.
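As a rough Python sketch of the same idea, a single rule tied to an inflection class can produce the genitive singular directly. The class label second_declension_us, the dictionary layout and the helper apply_rule are hypothetical; the regular expression is an assumption about how such a rule could be written, not the module's own example.

```python
import re

# Hypothetical rule: genitive singular for a class whose citation forms end in -us.
GEN_SG_RULE = {
    "inflectionClass": "second_declension_us",
    "source": r"us$",
    "target": "i",
}

# Hypothetical entry; "morphologicalPattern" stands in for ontolex:morphologicalPattern.
LUPUS = {"canonicalForm": "lupus", "morphologicalPattern": "second_declension_us"}


def apply_rule(entry, rule):
    """Apply the rule only if the entry belongs to the rule's inflection class."""
    if entry["morphologicalPattern"] != rule["inflectionClass"]:
        return None
    return re.sub(rule["source"], rule["target"], entry["canonicalForm"], count=1)


print(apply_rule(LUPUS, GEN_SG_RULE))
```

No slots or rule chaining are needed here, because one fused ending carries both case and number.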
On the other hand, in an agglutinative language like Turkish, it is useful to define separate inflection slots for each morphosyntactic feature, and separate inflection rules for each inflection slot, as illustrated in the example below.
TODO: Revise text based on if slots are removed
In order to generate forms of the entry :adam, all the rules associated with the corresponding morphological pattern must first be extracted, namely sg_rule, pl_rule, and acc_rule. Next, the order of the inflection slots mentioned in the rules is established (by looking for the slot that is not used as the object of a morph:next property).
Then, for the first inflection slot, the correct form is chosen. If there is a morph:baseType specified in the rule, the corresponding form is chosen. Otherwise, the canonical form is used. Finally, for each inflection slot, the transformation is applied. For the first slot the initial form is used; after that, the output of one transformation is used as the input for the next.
With each transformation, all the properties in the grammatical meaning associated with the rule are copied to a newly created grammatical meaning. After all the transformations have been applied, the form is created with the constructed grammatical meaning. The initial form and the morphs are added as objects for the morph:consistsOf statements.
It is also possible to create Morph elements during generation in case they are not present in the data.
In the case of the example above, the successive application of the two appropriate rules for accusative and plural formation – in the order established by the use of the morph:next property – allows for the generation of the accusative plural form as follows:
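The generation procedure described above can be approximated with a short Python sketch. The rule contents are hypothetical simplifications: they append fixed suffixes and ignore vowel harmony, and the NEXT mapping between slot names is only a stand-in for morph:next. The rule names sg_rule, pl_rule and acc_rule follow the text; everything else is assumed for illustration.

```python
import re

# Hypothetical, simplified rules for Turkish "adam"; vowel harmony is ignored.
RULES = {
    "sg_rule": {"slot": "number", "source": r"$", "target": "",
                "meaning": {"number": "singular"}},
    "pl_rule": {"slot": "number", "source": r"$", "target": "lar",
                "meaning": {"number": "plural"}},
    "acc_rule": {"slot": "case", "source": r"$", "target": "ı",
                 "meaning": {"case": "accusative"}},
}
NEXT = {"number": "case"}  # stand-in for morph:next


def slot_order(next_map):
    """Start at the slot that is never the object of a next link, then follow the chain."""
    slots = set(next_map) | set(next_map.values())
    current = (slots - set(next_map.values())).pop()
    order = [current]
    while current in next_map:
        current = next_map[current]
        order.append(current)
    return order


def generate(base, rule_names):
    """Apply one rule per slot, in slot order, accumulating the grammatical meaning."""
    by_slot = {RULES[name]["slot"]: RULES[name] for name in rule_names}
    form, meaning = base, {}
    for slot in slot_order(NEXT):
        rule = by_slot[slot]
        form = re.sub(rule["source"], rule["target"], form, count=1)
        meaning.update(rule["meaning"])
    return form, meaning


print(generate("adam", ["pl_rule", "acc_rule"]))
```

The output of the number slot becomes the input of the case slot, so the plural suffix always precedes the case suffix regardless of the order in which the rules are listed.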
In many cases, the inflectional paradigm of a single lexical entry involves different bases, e.g., stems. In these cases, it is useful to be able to coindex a base form, an inflection rule and the forms generated by the rule from the respective base. The property morph:baseType is used for this purpose.
URI: http://www.w3.org/ns/lemon/morph#baseType
morph:baseType is used for coindexing a base form, an inflection rule and the forms generated by the rule from the respective base in cases in which the inflectional paradigm of a single lexical entry involves different bases, e.g., stems.
For instance, for Latin verbs, in addition to the citation form, dictionaries also record “principal parts” – i.e., a set of forms from which the full paradigm of a lexeme can be inferred. For example, the entry for rumpo in the Lewis and Short dictionary lists the forms:
This can be modelled with OntoLex-Morph as follows:
Note that the inflection rules operating on the perfect and third stem are not only connected to the inflection class of rumpo, but also other ones, as they are valid across conjugations. By applying these rules, the following forms can be generated:
For an inflection rule with morph:baseType defined: if the lexical entry to which it is applied features a morph:baseForm or (if these are not defined) an ontolex:canonicalForm with an identical morph:baseType, apply the rule only to this form. For a (generated) form, morph:baseType can be used to indicate from which form or with which rule it was generated. morph:baseType can also be used to mark stem classes in resources for which no explicit inflection rules are given.
This was introduced for modelling stem alternations. In this definition, we assume that we have one lexical entry for each stem variant, so that an inflection rule whose baseType doesn’t match that of its lexical entry doesn’t fire.
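The selection logic can be sketched in Python as follows. The entry layout, the baseType labels perfect_stem and third_stem, and the helper select_base are hypothetical; the stems rup- and rupt- are the perfect and third stems of Latin rumpo.

```python
# Hypothetical entry for Latin rumpo: each base form carries a baseType label.
RUMPO = {
    "canonicalForm": {"writtenRep": "rumpo", "baseType": None},
    "baseForms": [
        {"writtenRep": "rup", "baseType": "perfect_stem"},
        {"writtenRep": "rupt", "baseType": "third_stem"},
    ],
}


def select_base(entry, rule_base_type):
    """Pick the form a rule applies to; None means the rule does not fire."""
    if rule_base_type is None:
        # No baseType on the rule: fall back to the canonical form.
        return entry["canonicalForm"]["writtenRep"]
    # Prefer explicit base forms; use the canonical form only if none are defined.
    candidates = entry["baseForms"] or [entry["canonicalForm"]]
    for form in candidates:
        if form["baseType"] == rule_base_type:
            return form["writtenRep"]
    return None


print(select_base(RUMPO, "perfect_stem") + "i")   # first person singular perfect
print(select_base(RUMPO, "third_stem") + "us")    # perfect participle
```

Returning None for an unmatched baseType corresponds to the rule not firing for that entry.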
Another important component of morphological structure is word formation. While inflection is concerned with morphologically related forms of the same lexeme(s), word formation is concerned with morphologically related lexemes, focusing on the specific relationships between them on the one hand, and on the processes by which derivatives can be obtained from their bases (or from each other) on the other hand. Accordingly, at its core, the modelling of word formation in OntoLex-Morph operates with two main components:
In the following (sub)sections, these two components are described in detail and exemplified.
A piece of information regarding word formation that is often provided for both traditional dictionaries and digital morphological resources is which lexical entries are morphologically related: for instance, dictionaries often record the base of morphologically complex lexical entries, as illustrated below in the entry for the Italian noun trattamento ‘treatment’, derived from the verb trattare ‘to treat’ in the online Treccani dictionary.
trattaménto s. m. [der. di trattare]
To be able to not only encode this descriptive information but also possibly further specify it by expressing additional details, in OntoLex-Morph word formation relations are reified in a dedicated class, morph:WordFormationRelation. Since word formation relations are relations between different lexical entries, this class is defined as a subclass of the class introduced in the vartrans module of OntoLex for such relations – namely, vartrans:LexicalRelation. As a consequence, vartrans properties are also used to link lexical entries to the relations holding between them: specifically, each word formation relation is linked through vartrans:source to its base(s) and through vartrans:target to the derivative.
URI: http://www.w3.org/ns/lemon/morph#WordFormationRelation
morph:WordFormationRelation is a subclass of vartrans:LexicalRelation that relates two lexical entries that are morphologically related, with the property vartrans:target linking the relation to the resulting lexical entry, and the property vartrans:source linking it to the morphological base (in derivation) or head and other constituents (in compounding).
Accordingly, the morphological derivation of German Schönheit ‘beauty’ can be encoded as follows:
The same kind of modelling can be applied to compounds – i.e., lexemes that are morphologically related to two or more bases; e.g. English wallpaper.
It should be noted that another OntoLex module, decomp, was also envisaged to be usable for compounding; it was devised for decomposing complex lexical entries (like multi-word expressions) into their parts. However, in OntoLex-decomp the relationship between complex lexical entries and their parts is not reified, as there is no dedicated class, unlike word formation relations in OntoLex-Morph.
As a consequence, to allow for a parallel treatment of different word formation processes (derivation and compounding), a subclass of morph:WordFormationRelation is introduced for compounding – namely, morph:CompoundRelation. This can be considered as a reification of the property decomp:subTerm, which can be used to decompose lexical entries into other lexical entries: hence, the existence of a compound relation entails that the source lexical entry is a subterm of the compound. Since, by definition, compounds have more than one base, there will also be more than one compound relation: one relation with the target compound should be introduced for each of the constituents of the compound.
URI: http://www.w3.org/ns/lemon/morph#CompoundRelation
morph:CompoundRelation is a morph:WordFormationRelation that connects a (lexical entry representing a) morphological constituent of a compound with the (lexical entry representing the) compound.
Furthermore, compounds can have a head – i.e., a constituent that imposes its morphosyntactic and semantic properties on the whole word. For instance, It. capo-stazione ‘station master’ inherits the fact of being a masculine noun denoting a person from its head capo ‘chief’, rather than from the other constituent stazione ‘station’. In morph, a subclass of compound relations is introduced to express this information – namely, morph:CompoundHead.
URI: http://www.w3.org/ns/lemon/morph#CompoundHead
morph:CompoundHead is a morph:WordFormationRelation that connects the (lexical entry representing the) morphological head of a compound with the (lexical entry representing the) compound.
Accordingly, the morphological derivation of Italian capostazione ‘station master’ (from capo ‘head’ + stazione ‘station’) can be encoded as follows:
In addition to relations between morphologically related lexemes, one can be interested in expressing the formal instructions needed to generate derived lexemes from their bases. To do that, another sub-class of morph:Rule is introduced, alongside morph:InflectionRule, namely morph:WordFormationRule. Like inflection rules, word formation rules can take as input either the canonical form of the input lexical entry, or another form that is used as the base form, and they can involve specific morphs.
URI: http://www.w3.org/ns/lemon/morph#WordFormationRule
morph:WordFormationRule represents the formal operation applied to a base form of a source LexicalEntry to obtain another, target LexicalEntry.
Word formation rules can also be related to the word formation relations existing between the lexical entries involved through the property morph:wordFormationRule.
URI: http://www.w3.org/ns/lemon/morph#wordFormationRule
morph:wordFormationRule relates a word formation relation to the word formation rule that is applied to the source lexical entry in order to obtain the target lexical entry.
Unlike inflection rules, word formation rules generate lexical entries rather than forms – this can be expressed through the property morph:generates.
URI: http://www.w3.org/ns/lemon/morph#generates
morph:generates connects a word formation rule to the lexical entries that are generated from it
Accordingly, if one wanted to express the formal operation involved in the morphological derivation of German Schönheit ‘beauty’, this can be done as follows:
Two sub-classes of morph:WordFormationRule are introduced corresponding to the traditional division of the realm of word formation into derivation and compounding. In derivation rules, lexemes are obtained from a single base through the addition of one (or possibly more than one, as in the case of parasynthesis) derivational affixes.
URI: http://www.w3.org/ns/lemon/morph#DerivationRule
morph:DerivationRule refers to rules that take one LexicalEntry as input and generate another LexicalEntry as output through the addition of one or more derivational affix(es).
In compounding rules, two different bases are combined to obtain a new lexeme, possibly also involving an interfix or linking element.
URI: http://www.w3.org/ns/lemon/morph#CompoundingRule
morph:CompoundingRule refers to rules that take more than one LexicalEntry as input to generate the output LexicalEntry.
To illustrate the usage of morph:DerivationRule, the reader is referred to the example given above for word formation rules: indeed, the rule used there can be assigned to the more specific class for derivation rules, with every other assertion remaining unchanged as this class is a sub-class of morph:WordFormationRule.
As for compounding, the example below illustrates the modelling of a rule involving a linking element for Dutch schaapskop ‘sheep head’.
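On the surface, such a rule amounts to joining the bases with the linking element between them. The Python sketch below shows only that string operation; the function name compound and its linker parameter are hypothetical and say nothing about how the rule is encoded in RDF.

```python
def compound(bases, linker=""):
    """Join base forms, inserting the linking element between consecutive bases."""
    return linker.join(bases)


# Dutch schaap + linking -s- + kop, as in the text.
print(compound(["schaap", "kop"], linker="s"))
```

With an empty linker the same function covers plain concatenation, as in English wallpaper from wall and paper.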