aminePlatform.util.parserGenerator
Class TextTokenizer

java.lang.Object
  extended by java.io.StreamTokenizer
      extended by aminePlatform.util.parserGenerator.TextTokenizer
All Implemented Interfaces:
AmineConstants

public class TextTokenizer
extends java.io.StreamTokenizer
implements AmineConstants

Title : parserGenerator.TextTokenizer Class

Description : TextTokenizer is responsible for the first step of "text" analysis: the lexical analysis of an Amine Object, a CG, or a Prolog+CG program. It scans the text in order to tokenize it; the resulting sequence of tokens (with their types) is stored in the vector vctTokenTokenType, which is used by the second step (syntactic analysis).

Copyright : Copyright (c) Adil KABBAJ 2004-2009


Field Summary
 
Fields inherited from class java.io.StreamTokenizer
nval, sval, TT_EOF, TT_EOL, TT_NUMBER, TT_WORD
 
Fields inherited from interface aminePlatform.util.AmineConstants
ANALOGY, B_ASSIGN, B_DSPLY_WT_DELAY, B_DSPLY_WTT_DELAY, B_TRIGGER, B_WTT_DSPLY, BLOCK_BACKWARD_PROPAGATION, BLOCK_FORWARD_PROPAGATION, CANON, CGIF, CGRAPHIC, CHECK_PRECONDITIONS, COMPARE, COMPOSED_GOAL, CONCEPT_TYPE_IDENT, CONTEXT, COVERED_BY, CPLTE_CONTRACT, DEFINITION, EQ_OR_MORE_SPCFQ, EQUAL, EXPAND, FALSE_FOCUS_LIST, FUNCTIONAL, GENERALISE, GENERALIZE, HAVE_AN_INTERSECTION, ID_ADD, ID_DIV, ID_EQ, ID_INF, ID_IS, ID_MESSAGE, ID_MUL, ID_NOT, ID_NULL, ID_OPER_AND, ID_OPER_OR, ID_SUB, ID_SUP, IN_ACTIVATION, IN_MODE, IN_MODE2, INDIVIDUAL, INDIVIDUAL_IDENT, INTEGRATED, IS_CANONIC, KEY_GLOBAL_RULE, LC_ADD, LC_AMINE_BOOLEAN, LC_AMINE_DOUBLE, LC_AMINE_INTEGER, LC_AND, LC_BOOLEAN, LC_CG, LC_CLOSE_BRKT, LC_CLOSE_PARENT, LC_CLOSE_SET, LC_COMMA, LC_COMMA_SEMI, LC_CONCEPT, LC_CONSTRUCTOR, LC_CS, LC_CUT, LC_DIFF, LC_DIV, LC_DOUBLE, LC_DSBL_BKWRD_PRPGTN, LC_DSBL_FRWRD_PRPGTN, LC_EOF, LC_EQ, LC_FOUR_POINTS, LC_IDENTIFIER, LC_IF, LC_INF, LC_INTEGER, LC_INTEROG, LC_IS, LC_JAVA_OBJECT, LC_LEFT_ARROW, LC_LIST, LC_NULL, LC_OPEN_BRKT, LC_OPEN_PARENT, LC_OPEN_SET, LC_OPER_AND, LC_OPER_OR, LC_POINT, LC_RELATION, LC_RGHT_ARROW, LC_SEMI_COMMA, LC_SET, LC_STAR, LC_STATE, LC_STRING, LC_SUB, LC_SUP, LC_TERM, LC_TWO_POINTS, LC_VAR_LIST_CONSTRUCTOR, LC_VARIABLE, LF, MAXIMAL_JOIN, MORE_GENERAL, MORE_SPECIFIC, NOTHING_TO_INTEGRATE, OPERS_WITH_RSLT, OUT_MODE, OUT_MODE2, PARTIAL_CONTRACT, PARTIAL_SUBSUME, PRJCT_OPERS, PROJECT, READ, READ_SENTENCE, RELATION_TYPE_IDENT, S_AND, S_BOOLEAN, S_CG, S_CLOSE_BRKT, S_CLOSE_PARENT, S_CLOSE_SET, S_COMMA, S_CONCEPT, S_CONSTRUCTOR, S_CUT, S_DIFF, S_DOUBLE, S_EOF, S_EQUAL, S_EXPAND, S_FALSE, S_FOUR_POINTS, S_GENERALISE, S_GENERALIZE, S_IDENTIFIER, S_IF, S_INTEGER, S_INTEROG, S_IS, S_IS_CANONIC, S_LEFT_ARROW, S_LIST, S_MAXIMAL_JOIN, S_OPEN_BRKT, S_OPEN_PARENT, S_OPEN_SET, S_POINT, S_RGHT_ARROW, S_SEMI_COMMA, S_SOURCE, S_SPECIALIZE, S_STATE, S_STRING, S_SUBSUME, S_SUBSUME_WITH_RESULT, S_SUPER, S_TARGET, S_TERM, S_THIS, S_TRUE, S_TWO_POINTS, S_UNIFY, S_VARIABLE, 
SITUATION, SPECIALIZE, STEADY, SUBSUME, SUBSUME_WITH_RSLT, TRIGGER, UNCOMPARABLE, UNIFY, VAR_SUPER, WAIT_ASSIGNMENT, WAIT_END_OF_ASSIGNMENT, WAIT_PRECONDITIONS, WAIT_VALUE
 
Constructor Summary
TextTokenizer(java.lang.String s)
          The constructor creates a tokenizer and proceeds to tokenize the text passed as argument.
 
Method Summary
 void back(int i)
          Move back i elements in the vector of token/tokenType couples.
 void changeToken(byte newTokenType)
           
 boolean endOfStream()
          Test whether the next token is the end of the stream, i.e. the end of the vector of token/tokenType couples.
 void finalize()
           
 int getCursor()
          Get the value of the cursor, i.e. the index in the vector of token/tokenType couples.
 int getIndexOfCurrentToken()
           
 java.lang.String getTextToParse()
           
 java.lang.String getToken()
          Get the current token
 java.lang.String getTokenAhead(int n)
           
 byte getTokenType()
          Get the token type
 java.lang.String getVctTokenTokenType()
           
(package private)  boolean isTokenTowChars()
           
static boolean isVariableIdentifier(java.lang.String token)
          A variable identifier should start with a letter followed optionally by a digit or underscore.
 void lexicalAnalysis()
          This method performs the lexical analysis of the current text and corresponds to a loop that calls the method nxtToken1() at each iteration.
 byte lookAhead(int n)
          lookAhead(n) is used by syntactic analysis to look at the type of the n-th token, starting from the current one.
 java.lang.String nameOfTknType(byte tknType)
          Return the name of the specified token type, given as a byte.
 int numberOfTokens()
          Get the number of tokens in the vector of token/tokenType couples
 void nxtToken()
          This method assumes that the lexical analysis was done and that the tokens and their types are now stored in the vector of token/tokenType couples.
(package private)  void nxtToken1()
          This method calls the method nextToken() to read the next token, and determines its type (i.e. its lexical category).
(package private)  void recognizeIdentifier()
          The current token begins with a letter or an underscore '_'.
 void recognizeToken(byte pTokenType)
          Like the method nxtToken(), this method assumes that the lexical analysis was done and that the tokens and their types are now stored in the vector of token/tokenType couples.
 void recognizeToken(byte pTokenType1, byte pTokenType2)
          Like the method nxtToken(), this method assumes that the lexical analysis was done and that the tokens and their types are now stored in the vector of token/tokenType couples.
 void setCursor(int i)
          Set the cursor to position i in the vector of token/tokenType couples.
 int tokenLength()
          Get the length of the current token
 
Methods inherited from class java.io.StreamTokenizer
commentChar, eolIsSignificant, lineno, lowerCaseMode, nextToken, ordinaryChar, ordinaryChars, parseNumbers, pushBack, quoteChar, resetSyntax, slashSlashComments, slashStarComments, toString, whitespaceChars, wordChars
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TextTokenizer

public TextTokenizer(java.lang.String s)
The constructor creates a tokenizer and proceeds to tokenize the text passed as argument. The tokenizer is initialized as follows:

- the parameter s is the string to tokenize,

- the underscore '_' is treated as alphabetic, i.e. as part of an identifier,

- Java-style one-line comments are recognized,

- Java-style multi-line comments are recognized,

- Prolog-style comments, which start with '%', are recognized too,

- the characters '.', '/' and '-' are treated as ordinary characters.

Parameters:
s - the string to tokenize
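
The settings listed above map naturally onto methods inherited from java.io.StreamTokenizer. The following is only a minimal sketch of such a configuration on a plain StreamTokenizer; the class name TokenizerSetupSketch and its two helper methods are hypothetical, and the actual TextTokenizer constructor may set things up differently.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of the Amine Platform.
class TokenizerSetupSketch {

    // Configures a plain StreamTokenizer along the lines listed above.
    static StreamTokenizer configure(String s) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(s));
        st.wordChars('_', '_');       // underscore is part of an identifier
        st.slashSlashComments(true);  // Java-style one-line comments
        st.slashStarComments(true);   // Java-style multi-line comments
        st.commentChar('%');          // Prolog-style comments
        st.ordinaryChar('.');         // '.', '/' and '-' are ordinary characters
        st.ordinaryChar('/');
        st.ordinaryChar('-');
        return st;
    }

    // Collects the raw tokens as strings until the end of the input.
    static List<String> tokenize(String s) {
        StreamTokenizer st = configure(s);
        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD
                        || st.ttype == '"' || st.ttype == '\'') {
                    tokens.add(st.sval);                 // word or quoted string
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    tokens.add(String.valueOf(st.nval)); // numeric token
                } else {
                    tokens.add(String.valueOf((char) st.ttype)); // ordinary character
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("cat_1 % a Prolog comment\n// a Java comment\n- dog."));
    }
}
```

The sketch only collects raw tokens; mapping them to the lexical categories of AmineConstants is left out.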
Method Detail

getVctTokenTokenType

public java.lang.String getVctTokenTokenType()

finalize

public void finalize()

getToken

public java.lang.String getToken()
Get the current token

Returns:
the current token

getTokenAhead

public java.lang.String getTokenAhead(int n)

getTextToParse

public java.lang.String getTextToParse()

getTokenType

public byte getTokenType()
Get the token type

Returns:
the current token type

getIndexOfCurrentToken

public int getIndexOfCurrentToken()

setCursor

public void setCursor(int i)
Set the cursor to position i in the vector of token/tokenType couples.

Parameters:
i - the new value of the cursor associated with the vector of token/tokenType couples

getCursor

public int getCursor()
Get the value of the cursor, i.e. the index in the vector of token/tokenType couples.

Returns:
the value of the Cursor

back

public void back(int i)
Move back i elements in the vector of token/tokenType couples.


numberOfTokens

public int numberOfTokens()
Get the number of tokens in the vector of token/tokenType couples

Returns:
the number of tokens in the vector of token/tokenType couples

nameOfTknType

public java.lang.String nameOfTknType(byte tknType)
Return the name of the specified token type, given as a byte. The recognized token types are: S_BOOLEAN, S_IDENTIFIER, S_VARIABLE, S_STRING, S_INTEGER, S_DOUBLE, S_LIST, S_TERM, S_SUP, S_INF, S_COMMA, S_SEMI_COMMA, S_TWO_POINTS, S_CUT, S_INTEROG, S_EQ, S_OPEN_BRKT, S_CLOSE_BRKT, S_OPEN_SET, S_CLOSE_SET, S_OPEN_PARENT, S_CLOSE_PARENT, S_CONSTRUCTOR, S_POINT, S_ADD, S_SUB, S_STAR, S_DIV, S_DIFF, S_IF, S_FOUR_POINTS, S_RGHT_ARROW, S_LEFT_ARROW, S_EOF. These constants are defined in the interface util.AmineConstants.

Parameters:
tknType - the token type as a byte
Returns:
the name of the specified token type, given as a byte

lexicalAnalysis

public void lexicalAnalysis()
                     throws ParsingException
This method performs the lexical analysis of the current text and corresponds to a loop that calls the method nxtToken1() at each iteration. nxtToken1() calls the method nextToken() to read the next token, and determines its type (i.e. its lexical category).

Throws:
ParsingException

nxtToken1

void nxtToken1()
         throws java.io.IOException
This method calls the method nextToken() to read the next token, and determines its type (i.e. its lexical category).

Throws:
java.io.IOException

nxtToken

public void nxtToken()
This method assumes that the lexical analysis has been done and that the tokens and their types are stored in the vector of token/tokenType couples. It reads the current element from this vector and assigns the current token/tokenType couple to the two attributes token and tokenType. This method and the two attributes (token/getToken() and tokenType/getTokenType()) are used by the syntactic analysis process and constitute the main interface between the lexical analysis, done by this class, and the syntactic analysis (of a CG or a Prolog+CG program).
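
The cursor-based interface described here can be illustrated with a small standalone model. This is a hedged sketch only: the class TokenCursorSketch, its add() helper, and the byte codes WORD and COMMA are hypothetical, and details such as whether the cursor advances before or after the read are assumptions, not statements about the real TextTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a vector of token/tokenType couples with a cursor.
class TokenCursorSketch {
    static final byte WORD = 1;   // illustrative type codes, not AmineConstants values
    static final byte COMMA = 2;

    private final List<String> tokens = new ArrayList<>();
    private final List<Byte> types = new ArrayList<>();
    private int cursor = 0;       // index of the next couple to read
    private String token;         // current token
    private byte tokenType;       // current token type

    // Stands in for the result of the lexical analysis.
    void add(String tok, byte type) {
        tokens.add(tok);
        types.add(type);
    }

    boolean endOfStream() { return cursor >= tokens.size(); }

    // Reads the couple under the cursor into token/tokenType, then advances.
    void nxtToken() {
        token = tokens.get(cursor);
        tokenType = types.get(cursor);
        cursor++;
    }

    // Moves the cursor back by i couples (never below zero).
    void back(int i) { cursor = Math.max(0, cursor - i); }

    String getToken() { return token; }
    byte getTokenType() { return tokenType; }
    int getCursor() { return cursor; }

    public static void main(String[] args) {
        TokenCursorSketch c = new TokenCursorSketch();
        c.add("cat", WORD);
        c.add(",", COMMA);
        c.add("mat", WORD);
        // A parser would typically drive the cursor with a loop like this one.
        while (!c.endOfStream()) {
            c.nxtToken();
            System.out.println(c.getToken() + " : " + c.getTokenType());
        }
    }
}
```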


changeToken

public void changeToken(byte newTokenType)

endOfStream

public boolean endOfStream()
Test whether the next token is the end of the stream, i.e. the end of the vector of token/tokenType couples.


lookAhead

public byte lookAhead(int n)
lookAhead(n) is used by the syntactic analysis to look at the type of the n-th token, starting from the current one. The lookup is performed in the vector of token/tokenType couples.


recognizeIdentifier

void recognizeIdentifier()
The current token begins with a letter or an underscore '_'. An identifier is either a variable identifier, a boolean or a constant identifier.


isVariableIdentifier

public static boolean isVariableIdentifier(java.lang.String token)
A variable identifier should start with a letter followed optionally by a digit or an underscore. After the digit or the underscore, a variable can have any sequence of characters. A variable identifier can also begin with an underscore followed optionally by any sequence of characters. There are also special cases of identifiers, "super", "this", "x_source", and "y_target", that are considered as variables.

Returns:
true if the current token is a variable, and false otherwise
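
One possible reading of the rule above, written as a standalone predicate. This is a sketch under assumptions: the class name VariableIdentifierSketch is hypothetical, and the interpretation (a single letter, or a letter followed by a digit or underscore and then anything, or a leading underscore) is inferred from the description, not taken from the actual source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the variable-identifier rule described above.
class VariableIdentifierSketch {
    // Special identifiers that are always treated as variables.
    private static final Set<String> SPECIAL =
            new HashSet<>(Arrays.asList("super", "this", "x_source", "y_target"));

    static boolean isVariableIdentifier(String token) {
        if (token == null || token.isEmpty()) {
            return false;
        }
        if (SPECIAL.contains(token)) {
            return true;
        }
        char first = token.charAt(0);
        if (first == '_') {
            return true;                 // underscore, then any sequence
        }
        if (!Character.isLetter(first)) {
            return false;
        }
        if (token.length() == 1) {
            return true;                 // a single letter
        }
        char second = token.charAt(1);
        // letter, then a digit or underscore, then any sequence
        return Character.isDigit(second) || second == '_';
    }

    public static void main(String[] args) {
        for (String s : new String[] {"x", "x1", "_tail", "Man", "this"}) {
            System.out.println(s + " -> " + isVariableIdentifier(s));
        }
    }
}
```

Under this reading, an identifier such as "Man" (a letter followed by another letter) is not a variable.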

isTokenTowChars

boolean isTokenTowChars()
Returns:
true if the current token is composed of two characters

tokenLength

public int tokenLength()
Get the length of the current token

Returns:
the length of the current token

recognizeToken

public void recognizeToken(byte pTokenType)
                    throws ParsingException
Like the method nxtToken(), this method assumes that the lexical analysis has been done and that the tokens and their types are stored in the vector of token/tokenType couples. The method reads the next token and its type from the vector of token/tokenType couples, and checks that the tokenType is equal to the specified tokenType.

Parameters:
pTokenType - the tokenType that should be recognized
Throws:
ParsingException - if the tokenType of the current token is not identical to the specified tokenType

recognizeToken

public void recognizeToken(byte pTokenType1,
                           byte pTokenType2)
                    throws ParsingException
Like the method nxtToken(), this method assumes that the lexical analysis has been done and that the tokens and their types are stored in the vector of token/tokenType couples. The method reads the next token and its type from the vector of token/tokenType couples, and checks that the tokenType is equal to one of the specified tokenTypes: pTokenType1 or pTokenType2.

Parameters:
pTokenType1 - the first accepted tokenType
pTokenType2 - the second accepted tokenType
Throws:
ParsingException - if the tokenType of the current token is identical to neither pTokenType1 nor pTokenType2
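
The "read the next token and check its type" pattern used by both recognizeToken methods can be sketched as follows. Everything here is hypothetical: the class RecognizeTokenSketch, the exception SketchParsingException (a stand-in for ParsingException, whose constructors are not documented on this page), and the numeric type codes.

```java
// Hypothetical sketch of the expect-a-token-type pattern.
class RecognizeTokenSketch {

    // Stand-in for the platform's ParsingException.
    static class SketchParsingException extends Exception {
        SketchParsingException(String message) {
            super(message);
        }
    }

    private final byte[] types;   // token types produced by the lexical analysis
    private int cursor = 0;       // index of the next type to read

    RecognizeTokenSketch(byte... types) {
        this.types = types.clone();
    }

    // Single expected type: delegates to the two-type variant.
    void recognizeToken(byte expected) throws SketchParsingException {
        recognizeToken(expected, expected);
    }

    // Reads the next type and fails unless it equals one of the two expected types.
    void recognizeToken(byte expected1, byte expected2) throws SketchParsingException {
        if (cursor >= types.length) {
            throw new SketchParsingException("unexpected end of tokens");
        }
        byte actual = types[cursor++];
        if (actual != expected1 && actual != expected2) {
            throw new SketchParsingException(
                    "expected type " + expected1 + " or " + expected2
                    + " but found " + actual);
        }
    }

    int getCursor() {
        return cursor;
    }
}
```

A parser built on this pattern calls recognizeToken wherever the grammar requires a specific token, and lets the exception propagate as the syntax error report.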