aminePlatform.util.parserGenerator
Class TextTokenizer
java.lang.Object
java.io.StreamTokenizer
aminePlatform.util.parserGenerator.TextTokenizer
- All Implemented Interfaces:
- AmineConstants
- public class TextTokenizer
- extends java.io.StreamTokenizer
- implements AmineConstants
Title : parserGenerator.TextTokenizer Class
Description : TextTokenizer is responsible for the first step in "text" analysis:
the lexical analysis of an Amine Object, a CG, or a Prolog+CG program.
It scans the text and tokenizes it; the resulting sequence of
tokens (with their types) is stored in the vector vctTokenTokenType, which is
used by the second step (syntactic analysis).
Copyright : Copyright (c) Adil KABBAJ 2004-2009
Fields inherited from class java.io.StreamTokenizer |
nval, sval, TT_EOF, TT_EOL, TT_NUMBER, TT_WORD |
Fields inherited from interface aminePlatform.util.AmineConstants |
ANALOGY, B_ASSIGN, B_DSPLY_WT_DELAY, B_DSPLY_WTT_DELAY, B_TRIGGER, B_WTT_DSPLY, BLOCK_BACKWARD_PROPAGATION, BLOCK_FORWARD_PROPAGATION, CANON, CGIF, CGRAPHIC, CHECK_PRECONDITIONS, COMPARE, COMPOSED_GOAL, CONCEPT_TYPE_IDENT, CONTEXT, COVERED_BY, CPLTE_CONTRACT, DEFINITION, EQ_OR_MORE_SPCFQ, EQUAL, EXPAND, FALSE_FOCUS_LIST, FUNCTIONAL, GENERALISE, GENERALIZE, HAVE_AN_INTERSECTION, ID_ADD, ID_DIV, ID_EQ, ID_INF, ID_IS, ID_MESSAGE, ID_MUL, ID_NOT, ID_NULL, ID_OPER_AND, ID_OPER_OR, ID_SUB, ID_SUP, IN_ACTIVATION, IN_MODE, IN_MODE2, INDIVIDUAL, INDIVIDUAL_IDENT, INTEGRATED, IS_CANONIC, KEY_GLOBAL_RULE, LC_ADD, LC_AMINE_BOOLEAN, LC_AMINE_DOUBLE, LC_AMINE_INTEGER, LC_AND, LC_BOOLEAN, LC_CG, LC_CLOSE_BRKT, LC_CLOSE_PARENT, LC_CLOSE_SET, LC_COMMA, LC_COMMA_SEMI, LC_CONCEPT, LC_CONSTRUCTOR, LC_CS, LC_CUT, LC_DIFF, LC_DIV, LC_DOUBLE, LC_DSBL_BKWRD_PRPGTN, LC_DSBL_FRWRD_PRPGTN, LC_EOF, LC_EQ, LC_FOUR_POINTS, LC_IDENTIFIER, LC_IF, LC_INF, LC_INTEGER, LC_INTEROG, LC_IS, LC_JAVA_OBJECT, LC_LEFT_ARROW, LC_LIST, LC_NULL, LC_OPEN_BRKT, LC_OPEN_PARENT, LC_OPEN_SET, LC_OPER_AND, LC_OPER_OR, LC_POINT, LC_RELATION, LC_RGHT_ARROW, LC_SEMI_COMMA, LC_SET, LC_STAR, LC_STATE, LC_STRING, LC_SUB, LC_SUP, LC_TERM, LC_TWO_POINTS, LC_VAR_LIST_CONSTRUCTOR, LC_VARIABLE, LF, MAXIMAL_JOIN, MORE_GENERAL, MORE_SPECIFIC, NOTHING_TO_INTEGRATE, OPERS_WITH_RSLT, OUT_MODE, OUT_MODE2, PARTIAL_CONTRACT, PARTIAL_SUBSUME, PRJCT_OPERS, PROJECT, READ, READ_SENTENCE, RELATION_TYPE_IDENT, S_AND, S_BOOLEAN, S_CG, S_CLOSE_BRKT, S_CLOSE_PARENT, S_CLOSE_SET, S_COMMA, S_CONCEPT, S_CONSTRUCTOR, S_CUT, S_DIFF, S_DOUBLE, S_EOF, S_EQUAL, S_EXPAND, S_FALSE, S_FOUR_POINTS, S_GENERALISE, S_GENERALIZE, S_IDENTIFIER, S_IF, S_INTEGER, S_INTEROG, S_IS, S_IS_CANONIC, S_LEFT_ARROW, S_LIST, S_MAXIMAL_JOIN, S_OPEN_BRKT, S_OPEN_PARENT, S_OPEN_SET, S_POINT, S_RGHT_ARROW, S_SEMI_COMMA, S_SOURCE, S_SPECIALIZE, S_STATE, S_STRING, S_SUBSUME, S_SUBSUME_WITH_RESULT, S_SUPER, S_TARGET, S_TERM, S_THIS, S_TRUE, S_TWO_POINTS, S_UNIFY, S_VARIABLE, 
SITUATION, SPECIALIZE, STEADY, SUBSUME, SUBSUME_WITH_RSLT, TRIGGER, UNCOMPARABLE, UNIFY, VAR_SUPER, WAIT_ASSIGNMENT, WAIT_END_OF_ASSIGNMENT, WAIT_PRECONDITIONS, WAIT_VALUE |
Constructor Summary |
TextTokenizer(java.lang.String s)
The constructor creates a tokenizer and tokenizes the text given as
argument. |
Method Summary |
void |
back(int i)
Moves back i elements in the vector of token/tokenType couples. |
void |
changeToken(byte newTokenType)
|
boolean |
endOfStream()
Test if the next token is the end of the stream; the end of the vector of
token/tokenType couples. |
void |
finalize()
|
int |
getCursor()
Get the value of the Cursor; the index in the vector of the token/tokenType couples |
int |
getIndexOfCurrentToken()
|
java.lang.String |
getTextToParse()
|
java.lang.String |
getToken()
Get the current token |
java.lang.String |
getTokenAhead(int n)
|
byte |
getTokenType()
Get the token type |
java.lang.String |
getVctTokenTokenType()
|
(package private) boolean |
isTokenTowChars()
|
static boolean |
isVariableIdentifier(java.lang.String token)
A variable identifier should start with a letter followed optionally by
a digit or underscore. |
void |
lexicalAnalysis()
This method performs the lexical analysis of the current text and corresponds
to a loop that calls the method nxtToken1() at each iteration.
|
byte |
lookAhead(int n)
lookAhead(n) is used by syntactic analysis to look at the type of the n-th token,
starting from the current one. |
java.lang.String |
nameOfTknType(byte tknType)
Return the name of the specified token type, given as a byte. |
int |
numberOfTokens()
Get the number of tokens in the vector of token/tokenType couples |
void |
nxtToken()
This method assumes that the lexical analysis was done and that the tokens
and their types are now stored in the vector of token/tokenType couples.
|
(package private) void |
nxtToken1()
This method calls the method nextToken() to read the next token, and determines
its type (i.e. its lexical category). |
(package private) void |
recognizeIdentifier()
The current token begins with a letter or an underscore '_'.
|
void |
recognizeToken(byte pTokenType)
Like the method nxtToken(), this method assumes that the lexical analysis
was done and that the tokens and their types are now stored in the vector
of token/tokenType couples.
|
void |
recognizeToken(byte pTokenType1,
byte pTokenType2)
Like the method nxtToken(), this method assumes that the lexical analysis
was done and that the tokens and their types are now stored in the vector
of token/tokenType couples.
|
void |
setCursor(int i)
Set the cursor to position i in the vector of token/tokenType couples. |
int |
tokenLength()
Get the length of the current token |
Methods inherited from class java.io.StreamTokenizer |
commentChar, eolIsSignificant, lineno, lowerCaseMode, nextToken, ordinaryChar, ordinaryChars, parseNumbers, pushBack, quoteChar, resetSyntax, slashSlashComments, slashStarComments, toString, whitespaceChars, wordChars |
Methods inherited from class java.lang.Object |
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
TextTokenizer
public TextTokenizer(java.lang.String s)
- The constructor creates a tokenizer and tokenizes the text given as
argument. The tokenizer is initialized as follows:
- the parameter s is the string to tokenize
- underscore '_' is treated as alphabetic, i.e. as part of an identifier
- Java-style one-line comments are recognized
- Java-style multi-line comments are recognized
- Prolog-style comments, which start with '%', are recognized too
- Characters '.', '/' and '-' are treated as ordinary characters
- Parameters:
s - the string to tokenize
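The initialization listed above maps onto standard java.io.StreamTokenizer calls. The following is a minimal sketch of that configuration, not the Amine source: the class TokenizerSetupSketch and its helpers configure and tokenize are hypothetical names, and the order of the calls is an assumption.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: configures a plain StreamTokenizer as the constructor
// description lists. It is not the actual TextTokenizer implementation.
class TokenizerSetupSketch {

    static StreamTokenizer configure(String s) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(s));
        st.wordChars('_', '_');        // '_' counts as alphabetic
        st.slashSlashComments(true);   // Java-style one-line comments
        st.slashStarComments(true);    // Java-style multi-line comments
        st.commentChar('%');           // Prolog-style comments
        st.ordinaryChar('.');          // '.', '/' and '-' become
        st.ordinaryChar('/');          // single-character tokens
        st.ordinaryChar('-');
        return st;
    }

    // Collects every token as a string, in reading order.
    static List<String> tokenize(String s) {
        StreamTokenizer st = configure(s);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With this setup, comments of all three styles are skipped, while '.', '/' and '-' each come back as a token of their own instead of being absorbed into a word or number.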
getVctTokenTokenType
public java.lang.String getVctTokenTokenType()
finalize
public void finalize()
getToken
public java.lang.String getToken()
- Get the current token
- Returns:
- the current token
getTokenAhead
public java.lang.String getTokenAhead(int n)
getTextToParse
public java.lang.String getTextToParse()
getTokenType
public byte getTokenType()
- Get the token type
- Returns:
- the current token type
getIndexOfCurrentToken
public int getIndexOfCurrentToken()
setCursor
public void setCursor(int i)
- Set the cursor to position i in the vector of token/tokenType couples.
- Parameters:
i - the new value of the cursor associated with the vector of
token/tokenType couples
getCursor
public int getCursor()
- Get the value of the Cursor; the index in the vector of the token/tokenType couples
- Returns:
- the value of the Cursor
back
public void back(int i)
- Moves back i elements in the vector of token/tokenType couples.
numberOfTokens
public int numberOfTokens()
- Get the number of tokens in the vector of token/tokenType couples
- Returns:
- the number of tokens in the vector of token/tokenType couples
nameOfTknType
public java.lang.String nameOfTknType(byte tknType)
- Return the name of the specified token type, given as a byte. Here is the
list of the recognized token types:
S_BOOLEAN, S_IDENTIFIER, S_VARIABLE, S_STRING, S_INTEGER, S_DOUBLE,
S_LIST, S_TERM, S_SUP, S_INF, S_COMMA, S_SEMI_COMMA, S_TWO_POINTS, S_CUT, S_INTEROG,
S_EQ, S_OPEN_BRKT, S_CLOSE_BRKT, S_OPEN_SET, S_CLOSE_SET, S_OPEN_PARENT, S_CLOSE_PARENT, S_CONSTRUCTOR,
S_POINT, S_ADD, S_SUB, S_STAR, S_DIV, S_DIFF, S_IF, S_FOUR_POINTS, S_RGHT_ARROW,
S_LEFT_ARROW, S_EOF.
These constants are defined in the interface util.AmineConstants.
- Parameters:
tknType - the token type, as a byte
- Returns:
- the name of the specified token type
lexicalAnalysis
public void lexicalAnalysis()
throws ParsingException
- This method performs the lexical analysis of the current text. It is
a loop that calls the method nxtToken1() at each iteration.
nxtToken1() calls the method nextToken() to read the next token, and determines
its type (i.e. its lexical category).
- Throws:
ParsingException
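The loop described for lexicalAnalysis() can be pictured with a plain StreamTokenizer. This is only a sketch under stated assumptions: the class LexicalLoopSketch, the Couple holder, the helper readOne (standing in for nxtToken1()) and the three type codes are hypothetical; the real type codes come from AmineConstants and the real classification is richer.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a lexical-analysis loop that fills a list of
// token/tokenType couples. Not the actual TextTokenizer code.
class LexicalLoopSketch {
    // Hypothetical type codes, chosen only for this example.
    static final byte T_WORD = 1, T_NUMBER = 2, T_SYMBOL = 3;

    // One token/tokenType couple.
    static final class Couple {
        final String token;
        final byte type;
        Couple(String token, byte type) { this.token = token; this.type = type; }
    }

    // Stands in for nxtToken1(): reads one token and appends its couple.
    // Returns false once the end of the text is reached.
    private static boolean readOne(StreamTokenizer st, List<Couple> out) throws IOException {
        int t = st.nextToken();
        if (t == StreamTokenizer.TT_EOF) return false;
        if (t == StreamTokenizer.TT_WORD) {
            out.add(new Couple(st.sval, T_WORD));
        } else if (t == StreamTokenizer.TT_NUMBER) {
            out.add(new Couple(String.valueOf(st.nval), T_NUMBER));
        } else {
            // Any other character is kept as a one-character symbol;
            // quoted strings are not treated specially in this sketch.
            out.add(new Couple(String.valueOf((char) t), T_SYMBOL));
        }
        return true;
    }

    // Stands in for lexicalAnalysis(): loops until the text is exhausted.
    static List<Couple> analyze(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<Couple> couples = new ArrayList<>();
        try {
            while (readOne(st, couples)) {
                // each iteration stores one token/tokenType couple
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return couples;
    }
}
```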
nxtToken1
void nxtToken1()
throws java.io.IOException
- This method calls the method nextToken() to read the next token, and determines
its type (i.e. its lexical category).
- Throws:
java.io.IOException
nxtToken
public void nxtToken()
- This method assumes that the lexical analysis was done and that the tokens
and their types are now stored in the vector of token/tokenType couples.
This method reads the current element from this vector and assigns the
current token/tokenType couple to the two attributes token and tokenType.
This method and the two attributes (token/getToken() and tokenType/getTokenType())
are used by the syntactic analysis process and constitute the main interface
between the lexical analysis, done by this class, and the syntactic analysis (of a CG
or a Prolog+CG program).
changeToken
public void changeToken(byte newTokenType)
endOfStream
public boolean endOfStream()
- Test if the next token is the end of the stream; the end of the vector of
token/tokenType couples.
lookAhead
public byte lookAhead(int n)
- lookAhead(n) is used by syntactic analysis to look at the type of the n-th token,
starting from the current one. The lookup is done in the vector of token/tokenType
couples.
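The cursor-based access described for nxtToken(), back(i), endOfStream() and lookAhead(n) can be sketched as follows. This is an illustration only: the class TokenCursorSketch, the two arrays standing in for the vector, and the T_EOF code are hypothetical. It also assumes that nxtToken() advances the cursor after reading and that lookAhead(0) denotes the current token; the source does not state either convention.

```java
// Hypothetical sketch of cursor-based access to the token/tokenType couples.
// Not the actual TextTokenizer code.
class TokenCursorSketch {
    static final byte T_EOF = -1;     // hypothetical "no token there" code

    private final String[] tokens;    // stand in for the vector of couples
    private final byte[] types;
    private int cursor = 0;           // index of the next couple to read
    private String token;             // current token, set by nxtToken()
    private byte tokenType;           // current token type, set by nxtToken()

    TokenCursorSketch(String[] tokens, byte[] types) {
        this.tokens = tokens;
        this.types = types;
    }

    // Copy the couple under the cursor into token/tokenType, then advance.
    void nxtToken() {
        token = tokens[cursor];
        tokenType = types[cursor];
        cursor++;
    }

    // Move the cursor back by i couples.
    void back(int i) { cursor -= i; }

    // True when no couple is left to read.
    boolean endOfStream() { return cursor >= tokens.length; }

    // Type of the n-th couple counted from the current one (n = 0 is the
    // current token); T_EOF when that position lies outside the vector.
    byte lookAhead(int n) {
        int index = cursor - 1 + n;
        return (index >= 0 && index < types.length) ? types[index] : T_EOF;
    }

    String getToken() { return token; }
    byte getTokenType() { return tokenType; }
    int getCursor() { return cursor; }
}
```

A parser built on such a cursor can peek with lookAhead(n) to choose a grammar rule, and call back(i) to retry from an earlier position when a rule fails.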
recognizeIdentifier
void recognizeIdentifier()
- The current token begins with a letter or an underscore '_'.
An identifier is either a variable identifier, a boolean or a constant identifier.
isVariableIdentifier
public static boolean isVariableIdentifier(java.lang.String token)
- A variable identifier should start with a letter followed optionally by
a digit or underscore. After the digit or the underscore, a variable can have
any sequence of characters. A variable identifier can also begin with an underscore
followed optionally by any sequence of characters.
There are also special cases: the identifiers "super", "this", "x_source", and
"y_target" are considered variables.
- Returns:
- true if the current token is a variable, and false otherwise
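One possible reading of the rule above, written as a predicate. This is a sketch, not the Amine implementation: the class VariableIdentifierSketch and the method looksLikeVariable are hypothetical names, and the reading assumes that a single letter alone is a variable while a letter followed directly by another letter is not.

```java
// Hypothetical sketch of the variable-identifier rule; one reading of the
// description, not the actual isVariableIdentifier code.
class VariableIdentifierSketch {

    static boolean looksLikeVariable(String token) {
        if (token == null || token.isEmpty()) return false;
        // Special cases named in the description. "x_source" and "y_target"
        // already satisfy the general rule below.
        if (token.equals("super") || token.equals("this")) return true;

        char first = token.charAt(0);
        // An underscore may be followed by any sequence of characters.
        if (first == '_') return true;
        if (!Character.isLetter(first)) return false;
        // A single letter on its own.
        if (token.length() == 1) return true;
        // A letter followed by a digit or underscore, then anything.
        char second = token.charAt(1);
        return Character.isDigit(second) || second == '_';
    }
}
```

Under this reading, "x", "x1", "x_source" and "_tmp" are variables, while "cat" is treated as a constant identifier.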
isTokenTowChars
boolean isTokenTowChars()
- Returns:
- true if the current token is composed of two characters
tokenLength
public int tokenLength()
- Get the length of the current token
- Returns:
- the length of the current token
recognizeToken
public void recognizeToken(byte pTokenType)
throws ParsingException
- Like the method nxtToken(), this method assumes that the lexical analysis
was done and that the tokens and their types are now stored in the vector
of token/tokenType couples.
The method reads the next token and determines its type, from the vector of
token/tokenType couples, and checks that the tokenType is equal to the
specified tokenType.
- Parameters:
pTokenType - the tokenType that should be recognized
- Throws:
ParsingException - if the tokenType of the current token is not
identical to the specified tokenType
recognizeToken
public void recognizeToken(byte pTokenType1,
byte pTokenType2)
throws ParsingException
- Like the method nxtToken(), this method assumes that the lexical analysis
was done and that the tokens and their types are now stored in the vector
of token/tokenType couples.
The method reads the next token and determines its type, from the vector of
token/tokenType couples, and checks that the tokenType is equal to the
specified tokenTypes : pTokenType1 or pTokenType2.
- Parameters:
pTokenType1 - a tokenType
pTokenType2 - a tokenType
- Throws:
ParsingException - if the tokenType of the current token is not
identical to the specified tokenTypes: pTokenType1 or pTokenType2
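The two recognizeToken overloads follow the usual "expect" pattern of a hand-written parser: read the next couple and fail if its type is not the expected one. A minimal sketch, under stated assumptions: the class RecognizeTokenSketch, the type codes and SketchParsingException are hypothetical stand-ins (the real method throws the Amine ParsingException), and the exception here is unchecked only to keep the example short.

```java
// Hypothetical sketch of the "expect a token type" pattern. Not the actual
// TextTokenizer code.
class RecognizeTokenSketch {
    // Hypothetical type codes, chosen only for this example.
    static final byte T_OPEN = 1, T_IDENT = 2, T_CLOSE = 3;

    // Stand-in for the Amine ParsingException.
    static class SketchParsingException extends RuntimeException {
        SketchParsingException(String message) { super(message); }
    }

    private final byte[] types;   // token types produced by lexical analysis
    private int cursor = 0;       // index of the next type to read

    RecognizeTokenSketch(byte[] types) { this.types = types; }

    // Read the next token type, failing if the vector is exhausted.
    private byte next() {
        if (cursor >= types.length) {
            throw new SketchParsingException("unexpected end of tokens");
        }
        return types[cursor++];
    }

    // Accept the next token only if it has the expected type.
    void recognizeToken(byte expected) {
        byte actual = next();
        if (actual != expected) {
            throw new SketchParsingException(
                "expected type " + expected + " but found " + actual);
        }
    }

    // Accept the next token if it has either of the two expected types.
    void recognizeToken(byte expected1, byte expected2) {
        byte actual = next();
        if (actual != expected1 && actual != expected2) {
            throw new SketchParsingException(
                "expected type " + expected1 + " or " + expected2
                + " but found " + actual);
        }
    }
}
```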