java.lang.Object
  edu.harvard.hul.ois.jhove.module.pdf.Tokenizer
public abstract class Tokenizer
Tokenizer for PDF files. This is used in conjunction with the Parser, which assembles Tokens into higher-level constructs.
| Field Summary | |
|---|---|
| protected int | _ch - Character code of current character. |
| protected java.io.RandomAccessFile | _file - Source from which to read bytes. |
| static char[] | PDFDOCENCODING - Mapping between PDFDocEncoding and Unicode code points. |
| Constructor Summary | |
|---|---|
| Tokenizer() - Constructor. | |
| Method Summary | |
|---|---|
| void | addLanguageCode(java.lang.String langCode) - Add a string to the language codes. |
| abstract void | backupChar() - Back up a byte so it will be read again. |
| java.util.Set<java.lang.String> | getLanguageCodes() - Return the set of language codes. |
| Token | getNext() - Parses out and returns a token from the input file. |
| Token | getNext(long max) - Parses out and returns a token from the input file. |
| long | getOffset() - Return the current offset into the file. |
| boolean | getPDFACompliant() - Returns the value of the pdfACompliant flag, which indicates that the tokenizer hasn't detected non-compliance. |
| java.lang.String | getWSString() - Returns the value of the last white space string read by the tokenizer. |
| protected abstract void | initStream(Stream token) - Initialization code for Stream object. |
| abstract int | readChar() - Get a character from the file or stream, using a buffer. |
| int | readChar1(boolean utf16) - Read a character in one-byte or 2-byte format, as requested. |
| void | scanMode(boolean flag) - If true, do not attempt to parse non-whitespace delimited tokens, e.g., literal and hexadecimal strings. |
| abstract void | seek(long offset) - Set the Tokenizer to a new position in the file. |
| protected void | seekReset(long offset) - Reset after a seek. |
| void | setEncrypted(boolean encrypted) - Tell this object that the file is or isn't encrypted. |
| void | setPDFACompliant(boolean pdfACompliant) - Set the value of the pdfACompliant flag. |
| protected abstract void | setStreamOffset(Stream token) - Sets the offset of a Stream to the current file position. |
| Methods inherited from class java.lang.Object | 
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait | 
| Field Detail | 
|---|
public static char[] PDFDOCENCODING
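A lookup table of this kind is typically indexed by the unsigned byte value. The following is a minimal sketch of that idea, not the JHOVE implementation: the class `PdfDocEncodingSketch`, its `TABLE`, and `decode` are hypothetical names, the table is assumed to have 256 entries, and it is an identity placeholder with only two illustrative overrides (bullet at 0x80 and euro sign at 0xA0, as I understand the PDF specification). The real `PDFDOCENCODING` array should be consulted for the actual values.

```java
// Hypothetical stand-in for a byte-to-Unicode lookup table; not the JHOVE table.
final class PdfDocEncodingSketch {

    // Assumed shape: one char per possible byte value (256 entries).
    static final char[] TABLE = buildTable();

    private static char[] buildTable() {
        char[] t = new char[256];
        for (int i = 0; i < t.length; i++) {
            t[i] = (char) i; // placeholder: identity mapping for every byte
        }
        // Two illustrative overrides where PDFDocEncoding differs from Latin-1.
        t[0x80] = '\u2022'; // bullet
        t[0xA0] = '\u20AC'; // euro sign
        return t;
    }

    // Decodes a byte string by looking up each unsigned byte in the table.
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append(TABLE[b & 0xFF]); // mask to avoid negative indices
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = { 'P', 'D', 'F', ' ', (byte) 0x80 };
        System.out.println(decode(raw));
    }
}
```

The `& 0xFF` mask matters because Java bytes are signed; without it, any byte at or above 0x80 would produce a negative array index.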
protected java.io.RandomAccessFile _file
protected int _ch
| Constructor Detail | 
|---|
public Tokenizer()
| Method Detail | 
|---|
public Token getNext()
              throws java.io.IOException,
                     PdfException
Throws:
java.io.IOException
PdfException
public Token getNext(long max)
              throws java.io.IOException,
                     PdfException
Parameters:
max - Maximum allowable size of the token
Throws:
java.io.IOException
PdfException
public long getOffset()
public java.util.Set<java.lang.String> getLanguageCodes()
public void setEncrypted(boolean encrypted)
public boolean getPDFACompliant()
A value of true is no guarantee that the file is compliant.
public void setPDFACompliant(boolean pdfACompliant)
public java.lang.String getWSString()
public abstract void seek(long offset)
                   throws java.io.IOException,
                          PdfException
Parameters:
offset - The offset in bytes from the start of the file.
Throws:
java.io.IOException
PdfException
protected void seekReset(long offset)
public abstract int readChar()
                      throws java.io.IOException
Throws:
java.io.IOException
public int readChar1(boolean utf16)
              throws java.io.IOException
Throws:
java.io.IOException
public abstract void backupChar()
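The interplay of readChar(), backupChar(), and readChar1(boolean) can be illustrated with a small in-memory sketch. This is not the JHOVE code: `PushbackReaderSketch` is a hypothetical class that reads from a byte array instead of a file, returns -1 at end of input (the real methods may signal end of file differently), and assumes big-endian byte order for the two-byte case.

```java
// Hypothetical in-memory illustration of one-byte pushback; not the JHOVE class.
final class PushbackReaderSketch {
    private final byte[] data;
    private int pos = 0;

    PushbackReaderSketch(byte[] data) {
        this.data = data;
    }

    // Returns the next byte as a value in 0..255, or -1 at end of input (assumed convention).
    int readChar() {
        return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }

    // Steps back one byte so the next readChar() returns it again.
    // Intended to be called only after a successful readChar().
    void backupChar() {
        if (pos > 0) {
            pos--;
        }
    }

    // Reads one byte, or two bytes combined high byte first when utf16 is true.
    int readChar1(boolean utf16) {
        int hi = readChar();
        if (!utf16 || hi < 0) {
            return hi;
        }
        int lo = readChar();
        return lo < 0 ? -1 : (hi << 8) | lo;
    }

    // Current offset into the input, in bytes.
    long getOffset() {
        return pos;
    }

    public static void main(String[] args) {
        PushbackReaderSketch r =
                new PushbackReaderSketch(new byte[] { '/', 'N', 0x00, 0x41 });
        int first = r.readChar();    // consume '/'
        int peeked = r.readChar();   // look at 'N'
        r.backupChar();              // put it back
        int again = r.readChar();    // same byte is read again
        int wide = r.readChar1(true); // 0x00 0x41 combined into one code unit
        System.out.println((char) first + " " + (peeked == again) + " " + (char) wide);
    }
}
```

A tokenizer typically needs this single-byte pushback because it only learns that a token has ended by reading the delimiter that follows it.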
public void addLanguageCode(java.lang.String langCode)
public void scanMode(boolean flag)
Parameters:
flag - Scan mode flag
protected abstract void initStream(Stream token)
                            throws java.io.IOException,
                                   PdfException
Throws:
java.io.IOException
PdfException
protected abstract void setStreamOffset(Stream token)
                                 throws java.io.IOException,
                                        PdfException
Throws:
java.io.IOException
PdfException