java.lang.Object | +--HTMLTokenizer
The HTMLTokenizer class takes an input stream and
parses it into "tokens", allowing the tokens to be
read one at a time. The HTML tokenizer can recognize tags,
entities, and raw text.
Whitespace that is ignored by browsers is also ignored by instances of this class.
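Each instance has two flags, set with entityMode and lowerCaseMode. These flags indicate whether HTML entities are treated as normal text and whether all tokens are returned in lower case.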
A typical application first constructs an instance of this class and
then repeatedly loops, calling the nextToken method in each
iteration of the loop until it returns the value TT_EOF.
Note: TT_EOF will also be returned in the case of invalid HTML code.
See Also: java.io.StreamTokenizer
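The construct-then-loop pattern described above can be sketched as follows. Because HTMLTokenizer itself is not available here, the sketch uses MiniHtmlTokenizer, a hypothetical and heavily simplified stand-in that borrows only the documented names nextToken(), getToken() and the TT_* constants. The constant values, the parsing rules, and the form of the returned token text (tags with their angle brackets) are assumptions for illustration, not the behavior of the real class.

```java
import java.util.ArrayList;
import java.util.List;

class TokenLoopSketch {
    // The typical loop: construct a tokenizer, then call nextToken() until TT_EOF.
    static List<String> describe(String html) {
        MiniHtmlTokenizer t = new MiniHtmlTokenizer(html);
        List<String> out = new ArrayList<>();
        int type;
        while ((type = t.nextToken()) != MiniHtmlTokenizer.TT_EOF) {
            String kind = type == MiniHtmlTokenizer.TT_TAG ? "TAG"
                        : type == MiniHtmlTokenizer.TT_ENTITY ? "ENTITY" : "TEXT";
            out.add(kind + ":" + t.getToken());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe("<p>Fish &amp; chips</p>")) {
            System.out.println(line);
        }
    }
}

// Hypothetical stand-in, NOT the real HTMLTokenizer. Constant values are assumed.
class MiniHtmlTokenizer {
    static final int TT_EOF = -1;
    static final int TT_TAG = 1;
    static final int TT_ENTITY = 2;
    static final int TT_TEXT = 3;

    private final String src;
    private int pos = 0;
    private String token = null;

    MiniHtmlTokenizer(String src) { this.src = src; }

    int nextToken() {
        // Skip whitespace between tokens.
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        if (pos >= src.length()) { token = null; return TT_EOF; }
        char c = src.charAt(pos);
        if (c == '<') return take(src.indexOf('>', pos), TT_TAG);
        if (c == '&') return take(src.indexOf(';', pos), TT_ENTITY);
        // Raw text runs up to the next tag or entity start.
        int end = pos;
        while (end < src.length() && src.charAt(end) != '<' && src.charAt(end) != '&') end++;
        token = src.substring(pos, end).trim();
        pos = end;
        return TT_TEXT;
    }

    // Consumes through the closing delimiter. A missing delimiter is treated as
    // invalid input and reported as TT_EOF, mirroring the note above.
    private int take(int close, int type) {
        if (close < 0) { pos = src.length(); token = null; return TT_EOF; }
        token = src.substring(pos, close + 1);
        pos = close + 1;
        return type;
    }

    String getToken() { return token; }
}
```

Since TT_EOF is documented as covering both the end of the stream and invalid HTML, a caller that needs to tell the two apart would have to track that separately; the loop alone cannot distinguish them.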
Field Summary | |
static int |
TT_ENTITY
A constant indicating that an HTML entity has been read. |
static int |
TT_EOF
A constant indicating that the end of the stream has been read. |
static int |
TT_TAG
A constant indicating that an HTML tag has been read. |
static int |
TT_TEXT
A constant indicating that raw text has been read. |
Constructor Summary | |
HTMLTokenizer(java.io.Reader r)
Creates an HTMLTokenizer for a Reader input stream. |
|
HTMLTokenizer(java.lang.String url)
Creates an HTMLTokenizer for a file with a given URL. |
Method Summary | |
void |
entityMode(boolean flag)
entityMode(false) indicates that html entities should be treated as normal text. |
java.lang.String |
getToken()
If the type is TT_TEXT, TT_TAG, or TT_ENTITY, the corresponding text will be returned. |
void |
lowerCaseMode(boolean flag)
lowerCaseMode(true) indicates that all tokens should be returned as lower-case. |
boolean |
nextEntityMatch(java.lang.String entity)
nextEntityMatch repeatedly calls nextToken() until EOF is reached or an exact match is found between the specified entity and a type TT_ENTITY token. |
java.lang.String |
nextEntitySubstring(java.lang.String entity)
nextEntitySubstring repeatedly calls nextToken() until EOF is reached or the specified entity is found as a substring of an html entity. |
java.lang.String |
nextEntityToken()
nextEntityToken repeatedly calls nextToken() until EOF is reached or an entity type token is found. |
boolean |
nextTagMatch(java.lang.String tag)
nextTagMatch repeatedly calls nextToken() until EOF is reached or an exact match is found between the specified tag and a type TT_TAG token. |
java.lang.String |
nextTagSubstring(java.lang.String tag)
nextTagSubstring repeatedly calls nextToken() until EOF is reached or the specified tag is found as a substring of an html tag. |
java.lang.String |
nextTagToken()
nextTagToken repeatedly calls nextToken() until EOF is reached or a tag type token is found. |
boolean |
nextTextMatch(java.lang.String phrase)
nextTextMatch repeatedly calls nextToken() until EOF is reached or an exact match is found between the specified phrase and a type TT_TEXT token. |
java.lang.String |
nextTextSubstring(java.lang.String phrase)
nextTextSubstring repeatedly calls nextToken() until EOF is reached or the specified phrase is found in the html text. |
java.lang.String |
nextTextToken()
nextTextToken repeatedly calls nextToken() until EOF is reached or a text type token is found. |
int |
nextToken()
nextToken() returns the type of the token read. |
void |
pushBack()
The pushBack() method allows you to 'unread' the last token so that the next call to nextToken() will return the same value. |
java.lang.String |
toString()
The method toString() returns a string representation of the current token. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
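The search methods summarized above are all described as loops over nextToken(). A minimal sketch of how nextTagMatch and nextTagSubstring could be layered on nextToken() and getToken() is shown below. TokenSource and canned are hypothetical helpers introduced only so the sketch can run; the constant values, the null return at EOF, and the use of IllegalArgumentException for a null argument are assumptions, and the IOException declared by the real methods is omitted for brevity.

```java
class TagSearchSketch {
    // Hypothetical minimal view of a tokenizer: only the two calls used here.
    interface TokenSource {
        int nextToken();
        String getToken();
    }

    // Assumed constant values; the real ones are not given in this page.
    static final int TT_EOF = -1;
    static final int TT_TAG = 1;
    static final int TT_TEXT = 3;

    // Advances until a TT_TAG token equals `tag` exactly, or until EOF.
    static boolean nextTagMatch(TokenSource t, String tag) {
        if (tag == null) throw new IllegalArgumentException("tag must not be null");
        int type;
        while ((type = t.nextToken()) != TT_EOF) {
            if (type == TT_TAG && tag.equals(t.getToken())) return true;
        }
        return false;
    }

    // Advances until a TT_TAG token contains `tag`; returns that token,
    // or null (an assumption) if EOF is reached first.
    static String nextTagSubstring(TokenSource t, String tag) {
        if (tag == null) throw new IllegalArgumentException("tag must not be null");
        int type;
        while ((type = t.nextToken()) != TT_EOF) {
            String token = t.getToken();
            if (type == TT_TAG && token != null && token.contains(tag)) return token;
        }
        return null;
    }

    // Test double that replays a fixed list of (type, text) pairs, then TT_EOF.
    static TokenSource canned(int[] types, String[] texts) {
        return new TokenSource() {
            private int i = -1;
            public int nextToken() {
                if (i < types.length) i++;
                return i < types.length ? types[i] : TT_EOF;
            }
            public String getToken() {
                return i >= 0 && i < texts.length ? texts[i] : null;
            }
        };
    }

    public static void main(String[] args) {
        TokenSource src = canned(
            new int[] { TT_TAG, TT_TEXT, TT_TAG, TT_TAG },
            new String[] { "<html>", "hi", "<a href=\"x\">", "</html>" });
        System.out.println(nextTagSubstring(src, "href"));
        System.out.println(nextTagMatch(src, "</html>"));
    }
}
```

Note that each search consumes every token it skips, so a failed search leaves the tokenizer at EOF. The text and entity variants would follow the same shape with TT_TEXT or TT_ENTITY in place of TT_TAG.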
Field Detail |
public static final int TT_EOF
public static final int TT_TAG
public static final int TT_ENTITY
public static final int TT_TEXT
Constructor Detail |
public HTMLTokenizer(java.io.Reader r)
public HTMLTokenizer(java.lang.String url) throws java.io.FileNotFoundException, java.net.MalformedURLException, java.io.IOException
Method Detail |
public void entityMode(boolean flag)
public void lowerCaseMode(boolean flag)
public java.lang.String getToken()
public int nextToken() throws java.io.IOException
public java.lang.String nextTextToken() throws java.io.IOException
public java.lang.String nextTagToken() throws java.io.IOException
public java.lang.String nextEntityToken() throws java.lang.IllegalStateException, java.io.IOException
public boolean nextTextMatch(java.lang.String phrase) throws java.io.IOException, java.lang.IllegalArgumentException
public java.lang.String nextTextSubstring(java.lang.String phrase) throws java.io.IOException, java.lang.IllegalArgumentException
public boolean nextTagMatch(java.lang.String tag) throws java.io.IOException, java.lang.IllegalArgumentException
public java.lang.String nextTagSubstring(java.lang.String tag) throws java.io.IOException, java.lang.IllegalArgumentException
public boolean nextEntityMatch(java.lang.String entity) throws java.io.IOException, java.lang.IllegalArgumentException, java.lang.IllegalStateException
public java.lang.String nextEntitySubstring(java.lang.String entity) throws java.io.IOException, java.lang.IllegalArgumentException, java.lang.IllegalStateException
public void pushBack()
public java.lang.String toString()
Overrides: toString in class java.lang.Object
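The 'unread' behavior described for pushBack() matches that of java.io.StreamTokenizer, which this page lists as a related class. The sketch below demonstrates the idea with StreamTokenizer rather than HTMLTokenizer, since only the former is available in the standard library. It is an analogy under that assumption, not a statement about how HTMLTokenizer is implemented; PushBackSketch and peekThenRead are illustrative names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class PushBackSketch {
    // Reads a word, pushes it back, reads it again, then reads the following word.
    static String[] peekThenRead(String input) {
        try {
            StreamTokenizer st = new StreamTokenizer(new StringReader(input));
            st.nextToken();
            String peeked = st.sval;     // first word
            st.pushBack();               // 'unread' it
            st.nextToken();
            String reread = st.sval;     // same word again
            st.nextToken();
            String following = st.sval;  // the word after it
            return new String[] { peeked, reread, following };
        } catch (IOException e) {
            // Not expected for an in-memory StringReader.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints "alpha,alpha,beta"
        System.out.println(String.join(",", peekThenRead("alpha beta")));
    }
}
```

This one-token lookahead is useful when a caller needs to inspect the next token before deciding which routine should consume it.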