
Conceptual Lexicon

by Adil KABBAJ

Previous: We suggest reading the Introduction first, followed by the Ontology document (and its associated document, Conceptual Structures).

Introduction

In the Ontology document, we define the Amine ontology as a conceptual, language-free ontology: it is composed of Conceptual Structures (CSs), and the description of a CS is made not of identifiers but of references to CSs. However, to communicate with language-based systems, a conceptual ontology must be augmented with a linguistic interface, i.e. a lexicon that records the associations between words (identifiers) and CSs (especially Type and Individual CSs). In general, humans take the identifiers they use from a subset of a natural language lexicon: a French speaker will use French words as identifiers, an English speaker will use English words, etc. Amine Platform supports multilingual ontologies: several lexicons (one per language) can be associated with one ontology, and several identifiers in the same lexicon and/or in other lexicons can refer to the same CS. Such identifiers are synonyms.

Remark: To avoid any confusion, we stress that we use a conceptual lexicon, not a linguistic lexicon: a conceptual lexicon records only the associations between identifiers and CSs; no other linguistic information is given or considered. By contrast, a linguistic lexicon used in a typical natural language application contains morphological, lexical, syntactic, semantic and pragmatic information. A conceptual lexicon is a simple, abstract view of a linguistic lexicon.

Conceptual Lexicon

Definition: A conceptual lexicon is a lexicon that records, for each identifier of a specific language, its associated Conceptual Structure (CS). All these CSs are contained in the ontology associated with the lexicon. Thus the two main attributes of a conceptual lexicon are its language and the ontology that supports it. A conceptual lexicon is the "lexical gate" to the conceptual ontology.

Amine Platform supports multilingual ontologies: several conceptual lexicons (one per language: French, English, Spanish, Arabic, etc.) can be associated with one ontology, and several identifiers in the same lexicon and/or in other lexicons can refer to the same CS.

Definition: An identifier in a conceptual lexicon can play the role of a concept type identifier, an individual identifier or a relation type identifier. Concept type and relation type identifiers are associated with Type and RelationType CSs respectively; an individual identifier is associated with an Individual CS. Several synonyms, in the same language and/or in different languages, can be associated with the same Type, RelationType or Individual CS.
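Under this definition, synonymy needs no separate table: two identifiers are synonyms exactly when they are mapped to the same CS, whether they sit in the same lexicon or in lexicons for different languages. The Java sketch below illustrates this reading under simplifying assumptions: `LangLexicon`, `identifiersFor()`, `synonymsIn()` and `areSynonyms()` are hypothetical names (not Amine's API), identifiers are plain Strings, and a CS is any shared Java object compared by identity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-language lexicon: identifier -> CS (CS objects are shared across lexicons). */
final class LangLexicon {
    final String language;
    final Map<String, Object> ident2CS = new HashMap<>();

    LangLexicon(String language) {
        this.language = language;
    }

    /** Identifiers of this lexicon that refer to the given CS (compared by identity). */
    List<String> identifiersFor(Object cs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Object> e : ident2CS.entrySet()) {
            if (e.getValue() == cs) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Synonyms of 'ident' (taken from this lexicon) found in 'target', possibly another language. */
    List<String> synonymsIn(String ident, LangLexicon target) {
        Object cs = ident2CS.get(ident);
        if (cs == null) {
            return new ArrayList<>();
        }
        List<String> result = target.identifiersFor(cs);
        if (target == this) {
            result.remove(ident); // an identifier is not listed as its own synonym
        }
        return result;
    }

    /** Two identifiers (each in its own lexicon) are synonyms iff they refer to the same CS. */
    static boolean areSynonyms(LangLexicon lexA, String a, LangLexicon lexB, String b) {
        Object cs = lexA.ident2CS.get(a);
        return cs != null && cs == lexB.ident2CS.get(b);
    }
}
```

For instance, with Human and Person in an English lexicon and Personne and Humain in a French lexicon, all mapped to the same CS object, `english.synonymsIn("Person", french)` returns Personne and Humain (in no particular order). A real lexicon would avoid the linear scan in identifiersFor() by keeping a CS-to-identifiers map, as described later in this document for lexCS2Idents.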

Identifier API

In general, an identifier (word) is declared in the lexicon of a specific language (English, French, Arabic, etc.) that is associated with an ontology: identifiers are associated with CSs to form the entries of that language's lexicon. An identifier has a name: a string that should start with an alphabetic character (the underscore '_' is considered an alphabetic character), followed by zero or more alphabetic or digit characters. In particular, identifiers should not contain spaces.
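The naming rule above can be checked with a few lines of Java. This is a minimal sketch, not Amine's own validation code: the class `IdentifierNames` and its method `isValidName()` are hypothetical, and we assume that "alphabetic" means any Unicode letter (so French or Arabic words qualify), with '_' treated as a letter.

```java
/** Hypothetical helper (not part of Amine) checking the identifier naming rule. */
final class IdentifierNames {
    private IdentifierNames() {}

    // '_' counts as alphabetic; we assume any Unicode letter is alphabetic (char-based check).
    private static boolean isAlpha(char c) {
        return c == '_' || Character.isLetter(c);
    }

    /** True if name = alphabetic char followed by zero or more alphabetic or digit chars. */
    static boolean isValidName(String name) {
        if (name == null || name.isEmpty() || !isAlpha(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!isAlpha(c) && !Character.isDigit(c)) {
                return false; // rejects spaces, '-', '.', etc.
            }
        }
        return true;
    }
}
```

With this sketch, "Human" and "_tmp2" are accepted, while "2cars" (leading digit) and "Big Car" (space) are rejected.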

A static boolean attribute, ignoreCase, common to all identifiers, is defined in the class Identifier to switch case sensitivity on or off. Comparison methods such as equals() and hashCode() are overridden to support this option. To ignore case for identifiers throughout an application, the user assigns true to ignoreCase by calling Identifier.setIgnoreCase(true); to make comparisons case-sensitive, the user calls Identifier.setIgnoreCase(false). By default, ignoreCase is true. The method toString() returns the name of the identifier, and the method wrap() can be used to "wrap a string in an Identifier" in order to search for an identifier in a structure or in a lexicon without creating a new identifier.
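The interplay between a static ignoreCase flag, equals() and hashCode() can be sketched as follows. This is a simplified, hypothetical class (`CaseAwareIdentifier`), not Amine's Identifier, and it does not model wrap(); it only shows why both methods must fold case in exactly the same way, so that identifiers differing only in case land in the same HashMap bucket when ignoreCase is true.

```java
import java.util.Locale;

/** Hypothetical stand-in for an identifier whose comparison can ignore case. */
final class CaseAwareIdentifier {
    // Shared by all identifiers, like the ignoreCase attribute described above; true by default.
    private static boolean ignoreCase = true;

    private final String name;

    CaseAwareIdentifier(String name) {
        this.name = name;
    }

    static void setIgnoreCase(boolean value) {
        ignoreCase = value;
    }

    // Single comparison key used by both equals() and hashCode(), keeping them consistent.
    private String key() {
        return ignoreCase ? name.toLowerCase(Locale.ROOT) : name;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CaseAwareIdentifier)) {
            return false;
        }
        return key().equals(((CaseAwareIdentifier) other).key());
    }

    @Override
    public int hashCode() {
        return key().hashCode();
    }

    @Override
    public String toString() {
        return name; // the name as originally written
    }
}
```

A consequence of this design, at least in the sketch: toggling ignoreCase after identifiers have been stored as HashMap keys changes their hash codes, so the flag is best set once, before any lexicon is filled.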

Conceptual Lexicon API

We first consider the implementation of the structural part of a Lexicon, then we present the API.

The implementation of Lexicon meets the goal of a bidirectional Lexicon <Identifier, CS>: for some processes, the Identifier should be the key (and the CS the value), and for others, the CS should be the key (and the Identifiers the value). The first case occurs, for instance, in parsing (from an identifier to the associated CS description) and the second in generation (from a CS description to the associated identifier). This bidirectional lexicon is implemented as two lexicons (two HashMaps): the first (lexIdent2CS, from identifier to CS) gives, for each identifier, its associated CS, and the second (lexCS2Idents, from CS to identifiers) gives, for each CS, its associated ArrayList of Identifiers. The two HashMaps are two views of the same structure (the abstract lexicon). This two-map implementation is hidden from the user: Amine provides a class called Lexicon whose methods are defined as operating on one (abstract) lexicon.
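The two-map design can be sketched in Java as below. The field names lexIdent2CS and lexCS2Idents come from the text; the class `BiLexicon`, its type parameter, and the method bodies are illustrative assumptions, not Amine's source. Identifiers are plain Strings here and the CS type is left generic. The point of the sketch is that every update touches both maps, so the two views never diverge.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical bidirectional <Identifier, CS> lexicon built on two HashMaps. */
final class BiLexicon<CS> {
    // Parsing view: identifier -> its CS.
    private final Map<String, CS> lexIdent2CS = new HashMap<>();
    // Generation view: CS -> all identifiers (synonyms) that refer to it.
    private final Map<CS, List<String>> lexCS2Idents = new HashMap<>();

    /** Adds an entry to both views; returns false if the identifier is already known. */
    boolean addEntry(String ident, CS cs) {
        Objects.requireNonNull(cs, "cs");
        if (lexIdent2CS.containsKey(ident)) {
            return false;
        }
        lexIdent2CS.put(ident, cs);
        lexCS2Idents.computeIfAbsent(cs, k -> new ArrayList<>()).add(ident);
        return true;
    }

    CS getCS(String ident) {
        return lexIdent2CS.get(ident);
    }

    List<String> getIdentifiers(CS cs) {
        List<String> idents = lexCS2Idents.get(cs);
        return idents == null ? Collections.<String>emptyList() : Collections.unmodifiableList(idents);
    }

    /** Removes an entry from both views, dropping the CS key once no identifier refers to it. */
    boolean removeEntry(String ident) {
        CS cs = lexIdent2CS.remove(ident);
        if (cs == null) {
            return false;
        }
        List<String> idents = lexCS2Idents.get(cs);
        idents.remove(ident);
        if (idents.isEmpty()) {
            lexCS2Idents.remove(cs);
        }
        return true;
    }
}
```

With Human and Person both added for the same CS object, getCS("Person") serves the parsing direction and getIdentifiers(cs) returns [Human, Person] for the generation direction.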

Conceptual Lexicon API

Several methods are provided to get, add, change or remove entries of the current lexicon. Other methods add, get or remove synonyms in the same language/lexicon and/or in other languages/lexicons, get the identifier of a specific CS, etc. The class Lexicon also provides a "lexical" version of methods that are defined in the ontology package: several methods link the CSs associated with the specified identifiers (like linkSubTypeToType(), linkIndividualsToType(), ...), get the superType (or subType) identifiers of a given type identifier, etc. Logically speaking, these methods should be defined in the ontology package only, since they concern the creation (and consultation) of links between CSs, which are basic ontology operations. However, for practical purposes (communication with humans should be done with identifiers, not with abstract CSs), Amine offers two versions of these methods: a) an 'identifier free' version, defined in classes of the ontology package, where only CSs are involved; b) an 'identifier based' version, defined in the Lexicon class.
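The split between the 'identifier free' and 'identifier based' versions can be sketched as a thin delegation: the lexicon resolves identifiers to CSs and then calls the CS-level operation. All classes and signatures below (`TypeCS`, `MiniOntology`, `MiniLexicon`) are hypothetical simplifications, not Amine's actual ontology or Lexicon classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Type CS: knows its direct super types, but no names at all. */
final class TypeCS {
    final List<TypeCS> superTypes = new ArrayList<>();
}

/** 'Identifier free' version: operates on CSs only. */
final class MiniOntology {
    void linkSubTypeToType(TypeCS subType, TypeCS type) {
        if (!subType.superTypes.contains(type)) {
            subType.superTypes.add(type);
        }
    }
}

/** 'Identifier based' version: resolves identifiers, then delegates to the ontology. */
final class MiniLexicon {
    private final MiniOntology ontology;
    private final Map<String, TypeCS> ident2CS = new HashMap<>();

    MiniLexicon(MiniOntology ontology) {
        this.ontology = ontology;
    }

    TypeCS addTypeEntry(String ident) {
        return ident2CS.computeIfAbsent(ident, k -> new TypeCS());
    }

    TypeCS getCS(String ident) {
        return ident2CS.get(ident);
    }

    void linkSubTypeToType(String subIdent, String typeIdent) {
        TypeCS sub = ident2CS.get(subIdent);
        TypeCS type = ident2CS.get(typeIdent);
        if (sub == null || type == null) {
            throw new IllegalArgumentException("Unknown identifier: " + (sub == null ? subIdent : typeIdent));
        }
        ontology.linkSubTypeToType(sub, type); // the link itself is made between CSs
    }
}
```

The design keeps the ontology free of names: the only thing the lexicon adds is the identifier-to-CS lookup, so the same CS-level operation serves every language.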

Let us consider now the operations of Lexicon:

- constructors and destructor (finalize() and clear()),

- getters: getLanguage() to get the language of the current lexicon, getOntology() to get the ontology that supports the current lexicon, getAllIdentifiers() to get all the identifiers in the current lexicon, getAllSynonyms() to get all the synonyms of an identifier in all the lexicons, getSynonyms() to get the synonyms of the specified identifier in the current lexicon, getIdentifier() to get the identifier associated with the specified CS, getIdentifiers() to get all the identifiers in the current lexicon for the specified CS, getCS() to get the CS of the specified identifier, getDirectSuperTypes() to get the direct superType identifiers of the specified type, getSubTypes() to get the subtype identifiers of the specified type, getMaxComSubType() to get the maximal common subtype identifier of the two specified types, getIndividualsOfType() to get the individual identifiers of the specified type, etc.,

- checkers: isMainLexicon() to check if the current lexicon is the main lexicon, canRemoveIdentifier() to check if the specified identifier can be removed from the current lexicon, isIdentifierKnown() to check if the specified identifier is already contained in the current lexicon, isTypeIdentifier() to check if the specified identifier is a Type identifier, isSubType() to check if the first type is a subtype of the second type, isIndividualOfType() to check if the specified individual identifier is an individual of the type identified by the specified identifier, areSynonyms() to check if two identifiers are synonyms, etc.,

- adders: addEntry() to add a new entry to the current lexicon for a new identifier and its CS, addEntries() to add several entries to the current lexicon, addTypeEntry() to add an entry for a Type identifier, addRelationTypeEntry() to add an entry for a RelationType identifier, addIndividualEntry() to add an entry for an Individual identifier, addSynonym() to add a synonym for an identifier, addSynonyms() to add several synonyms for an identifier, etc.,

- linkers: linkTypeToSuperTypes() to link a type to its superTypes, linkTypeToSuperType() to link a type to its superType, linkSubTypesToType() to link subTypes to their type, linkSubTypeToType() to link a subType to its type, linkIndividualsToType() to link individuals to their type, etc.,

- removers: removeEntry() to remove an entry from the current lexicon, removeEntries() to remove several entries from the current lexicon, removeAllEntries() to remove all the entries that concern the specified CS, etc.,

- replacers: replaceIdentifierName() to replace the name of the specified identifier in the current lexicon, replaceLanguageName() to replace the name of the specified language, etc.
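Several of these operations reduce to traversals of the type hierarchy. As an illustration, here is a hypothetical sketch of isSubType() and getSubTypes() over identifiers, assuming the hierarchy is stored as a map from each type identifier to its direct supertype identifiers (multiple inheritance allowed). The class `TypeHierarchy` is not Amine code, and we assume strict subtyping (a type is not its own subtype); Amine's exact semantics may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical type hierarchy over identifiers: type -> direct supertypes. */
final class TypeHierarchy {
    private final Map<String, List<String>> directSuperTypes = new HashMap<>();

    void linkTypeToSuperTypes(String type, String... superTypes) {
        directSuperTypes.computeIfAbsent(type, k -> new ArrayList<>())
                        .addAll(Arrays.asList(superTypes));
    }

    List<String> getDirectSuperTypes(String type) {
        return directSuperTypes.getOrDefault(type, Collections.emptyList());
    }

    /** True if 'type' is a strict (direct or indirect) subtype of 'superType'. */
    boolean isSubType(String type, String superType) {
        Set<String> visited = new HashSet<>();
        Deque<String> toVisit = new ArrayDeque<>(getDirectSuperTypes(type));
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            if (current.equals(superType)) {
                return true;
            }
            if (visited.add(current)) {          // explore each supertype once
                toVisit.addAll(getDirectSuperTypes(current));
            }
        }
        return false;
    }

    /** All strict subtypes of 'type', direct or indirect. */
    Set<String> getSubTypes(String type) {
        Set<String> result = new HashSet<>();
        // Any subtype has at least one supertype link, so it appears as a key.
        for (String candidate : directSuperTypes.keySet()) {
            if (isSubType(candidate, type)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

With Man under Human, Human under Animal and Animal under Entity, isSubType("Man", "Entity") is true and getSubTypes("Animal") yields Human and Man.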

For more details, see the API specification. Readers can also consult the source code for further implementation details; we have made a great effort to keep the code easy to read and very close to the conceptual and specification descriptions.

Remark: A reader may ask why lexicons, rather than languages, are associated with an ontology. In a previous model, we did have a class called Language with Lexicon as a component (the inverse of the current design). We opted to make Lexicon the main notion and language a secondary one because identifier operations (getting an entry from the lexicon, adding a new entry to the lexicon, removing an entry from the lexicon, etc.) logically operate on a lexicon, not on a language.

Examples

Example #1:

Our first example shows steps 11 to 14 of the example introduced in the Ontology document. The first ten steps were concerned with creating the ontology as a simple type hierarchy, with English as the main language. Here, the aim is to show how synonyms are specified and how several lexicon operations are used.


11. Add the synonym Person to Human in the EnglishLexicon

// add Synonyms

Identifier Person = new Identifier("Person");

12. Add the identifiers Personne and Humain to the lexicon for French, as synonyms of Human (from the EnglishLexicon)

Identifier Personne = new Identifier("Personne");

Identifier Humain = new Identifier("Humain");

objs = new Object[] {Personne, Humain};

Identifier french = new Identifier("french");

The method addSynonyms() creates a new lexicon for the new language French (since French was not known before) and adds it as another lexicon of the current ontology. The method then adds two entries to frenchLexicon, for Personne and Humain. These two French identifiers are associated with the same Type CS as Human in englishLexicon; in this way, Human, Personne and Humain are synonyms, i.e. they refer to the same Type CS (as does Person, added in step 11).
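The behaviour described for addSynonyms(), creating the lexicon of a language the first time that language is mentioned, is essentially a get-or-create lookup keyed by language. The sketch below assumes an ontology that keeps its lexicons in a map keyed by language name; `SynOntology`, `SynLexicon` and this `addSynonyms` signature are hypothetical and differ from Amine's actual API (which, in the step above, takes Identifier objects).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lexicon: one language, identifier -> CS. */
final class SynLexicon {
    final String language;
    final Map<String, Object> ident2CS = new HashMap<>();

    SynLexicon(String language) {
        this.language = language;
    }
}

/** Hypothetical ontology owning one lexicon per language. */
final class SynOntology {
    private final Map<String, SynLexicon> lexicons = new HashMap<>();
    private final SynLexicon mainLexicon;

    SynOntology(String mainLanguage) {
        mainLexicon = getOrCreateLexicon(mainLanguage);
    }

    SynLexicon getMainLexicon() {
        return mainLexicon;
    }

    SynLexicon getLexicon(String language) {
        return lexicons.get(language); // null if the language is unknown
    }

    private SynLexicon getOrCreateLexicon(String language) {
        // A lexicon is created the first time its language is mentioned.
        return lexicons.computeIfAbsent(language, SynLexicon::new);
    }

    /** Maps each synonym, in the lexicon of 'language', to the CS of 'ident' in the main lexicon. */
    void addSynonyms(String ident, String language, String... synonyms) {
        Object cs = mainLexicon.ident2CS.get(ident);
        if (cs == null) {
            throw new IllegalArgumentException("Unknown identifier: " + ident);
        }
        SynLexicon target = getOrCreateLexicon(language);
        for (String synonym : synonyms) {
            target.ident2CS.putIfAbsent(synonym, cs); // same CS, hence synonyms
        }
    }
}
```

Calling `addSynonyms("Human", "french", "Personne", "Humain")` on an ontology whose main language is English creates the French lexicon on the fly and maps both words to Human's CS.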

13. Add the identifier Eau to the frenchLexicon as a synonym of the English identifier Water.

Identifier Eau = new Identifier("Eau");

14. Get the synonyms of Person in French

System.out.println("Synonyms of " + Person + " in " + french + ":\n");
if (enumSynonyms != null) {
    while (enumSynonyms.hasMoreElements()) {
        typeIdent = (Identifier) enumSynonyms.nextElement();
        System.out.println(typeIdent + "\n");
    }
} else {
    System.out.println("none \n");
}

Result of the code above:

Synonyms of Person in french:

Personne

Humain

Example #2

This example complements the previous one: it presents parts of the code of a Java class, LexiconSynonymsTest, which creates a simple type hierarchy ontology with an emphasis on the use of lexicons. The basic parts of the code of this class are given here with comments. The example is also an opportunity to show how easily some methods of the Lexicon API can be used.

// Creation of the new Ontology

// Get the main lexicon to add, update or remove entries
Lexicon mainLexicon = ontology.getMainLexicon();

// Attempt to remove the identifier root from the lexicon

Note: The method removeEntry() removes an identifier (its argument) from a lexicon. However, the identifier cannot be removed if it has children or if it is the root identifier or the relation root identifier.
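The removal rule stated in this note can be sketched as a guard that runs before an entry is deleted. This is a hypothetical simplification (`GuardedLexicon` is not Amine code): children are modelled as direct subtypes only, the hierarchy is a map from each type identifier to its direct subtypes, and the root and relation root identifiers are passed in as parameters because their names are not specified here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lexicon fragment enforcing the removeEntry() preconditions. */
final class GuardedLexicon {
    private final String rootIdent;
    private final String relationRootIdent;
    // type identifier -> identifiers of its direct subtypes (its "children")
    private final Map<String, List<String>> subTypes = new HashMap<>();

    GuardedLexicon(String rootIdent, String relationRootIdent) {
        this.rootIdent = rootIdent;
        this.relationRootIdent = relationRootIdent;
        subTypes.put(rootIdent, new ArrayList<>());
        subTypes.put(relationRootIdent, new ArrayList<>());
    }

    void addSubType(String subType, String superType) {
        subTypes.computeIfAbsent(superType, k -> new ArrayList<>()).add(subType);
        subTypes.computeIfAbsent(subType, k -> new ArrayList<>());
    }

    boolean canRemoveIdentifier(String ident) {
        if (ident.equals(rootIdent) || ident.equals(relationRootIdent)) {
            return false; // the roots are never removable
        }
        List<String> children = subTypes.get(ident);
        return children != null && children.isEmpty(); // known and childless
    }

    /** Removes 'ident' if allowed; returns false otherwise. */
    boolean removeEntry(String ident) {
        if (!canRemoveIdentifier(ident)) {
            return false;
        }
        subTypes.remove(ident);
        for (List<String> children : subTypes.values()) {
            children.remove(ident); // unlink it from its super types
        }
        return true;
    }
}
```

In this sketch, removing a type that still has subtypes fails until its subtypes have been removed first, and the roots can never be removed.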

// Create a Lexicon for a new language, French, and associate it to the
// current ontology

Note: The association of the new lexicon with an ontology is done by the constructor of Lexicon, which establishes a double reference between the new lexicon and the specified ontology (i.e. an ontology refers to its lexicons and each lexicon refers to its associated ontology).
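Such a double reference can be established entirely inside the lexicon constructor, so that a lexicon never exists without being registered with its ontology. The sketch below is hypothetical (`RefOntology` and `RefLexicon` are illustrative names, not Amine's classes); passing `this` out of the constructor is safe here only because both fields are assigned before the call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical ontology that refers to its lexicons. */
final class RefOntology {
    private final List<RefLexicon> lexicons = new ArrayList<>();

    // Package-private: intended to be called only from RefLexicon's constructor.
    void register(RefLexicon lexicon) {
        lexicons.add(lexicon);
    }

    List<RefLexicon> getLexicons() {
        return Collections.unmodifiableList(lexicons);
    }
}

/** Hypothetical lexicon that refers to its ontology. */
final class RefLexicon {
    private final String language;
    private final RefOntology ontology;

    RefLexicon(String language, RefOntology ontology) {
        this.language = language;
        this.ontology = ontology;  // lexicon -> ontology
        ontology.register(this);   // ontology -> lexicon
    }

    String getLanguage() {
        return language;
    }

    RefOntology getOntology() {
        return ontology;
    }
}
```

After `new RefLexicon("french", ontology)`, both directions hold: the lexicon's getOntology() returns the ontology, and the ontology's getLexicons() contains the new lexicon.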

Next: we suggest reading the Tests (related to Ontology and Lexicons), LexiconsOntology GUI, Samples and APIs specification documents, and then moving on to the higher levels of Amine Platform (Amine structures and operations, etc.).