Scanner
The scanner walks over the input data and gives the tokenizer methods to read, peek at, and consume characters, so it can react appropriately to each one.
Table of Contents
Constants
- CHARS_ALNUM = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890'
- CHARS_ALPHA = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
- CHARS_HEX = 'abcdefABCDEF01234567890'
Properties
- $errors : mixed
- Parse errors.
- $char : mixed
- The current integer byte position in $data.
- $data : mixed
- The string data we're parsing.
- $EOF : mixed
- Length of $data; when $char === $EOF, we are at the end of the file.
Methods
- __construct() : mixed
- Create a new Scanner.
- charsUntil() : mixed
- Read chars until something in the mask is encountered.
- charsWhile() : int
- Read chars as long as the mask matches.
- columnOffset() : int
- Returns the current column of the current line that the tokenizer is at.
- consume() : mixed
- Silently consume N chars.
- current() : string
- Get the current character.
- currentLine() : int
- Returns the current line that is being consumed.
- getAsciiAlpha() : string
- Get the next group of characters that are ASCII Alpha characters.
- getAsciiAlphaNum() : string
- Get the next group of characters that are ASCII Alpha characters and numbers.
- getHex() : string
- Get the next group of characters that are hex digits.
- getNumeric() : string
- Get the next group of numbers.
- next() : string
- Get the next character.
- peek() : string
- Take a peek at the next character in the data.
- position() : int
- Get the current position.
- remainingChars() : int
- Get all characters until EOF.
- sequenceMatches() : bool
- Check if upcoming chars match the given sequence.
- unconsume() : mixed
- Unconsume some of the data.
- whitespace() : int
- Consume whitespace.
- doCharsUntil() : mixed
- Read to a particular match (or until $max bytes are consumed).
- doCharsWhile() : string
- Return the matched substring for as long as the bytes in $bytes match.
- replaceLinefeeds() : string
- Replace linefeed characters according to the spec.
Constants
CHARS_ALNUM
public
mixed
CHARS_ALNUM
= 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890'
CHARS_ALPHA
public
mixed
CHARS_ALPHA
= 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
CHARS_HEX
public
mixed
CHARS_HEX
= 'abcdefABCDEF01234567890'
Properties
$errors
Parse errors.
public
mixed
$errors
= array()
$char
The current integer byte position in $data.
private
mixed
$char
$data
The string data we're parsing.
private
mixed
$data
$EOF
Length of $data; when $char === $EOF, we are at the end of the file.
private
mixed
$EOF
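The three private properties work together as a simple byte cursor: $data holds the input, $char points into it, and $EOF marks where it ends. A minimal PHP sketch of that state, assuming the input is already a UTF-8 string (the encoding conversion done by the real constructor is left out); `MiniScanner` and `atEof()` are hypothetical names, not part of the documented class:

```php
<?php
// Hypothetical sketch of the scanner's cursor state, not the library's code.
// Assumes $data is already UTF-8; encoding conversion is omitted.
final class MiniScanner
{
    public string $data;
    public int $char = 0; // byte offset of the current character
    public int $EOF;      // length of $data in bytes

    public function __construct(string $data)
    {
        $this->data = $data;
        $this->EOF = strlen($data);
    }

    // True once the pointer has reached (or moved past) the end of the data.
    public function atEof(): bool
    {
        return $this->char >= $this->EOF;
    }
}

$s = new MiniScanner('<p>');
var_dump($s->EOF);     // int(3)
var_dump($s->atEof()); // bool(false)
```

Comparing against an integer length, rather than testing for a sentinel character, keeps the end-of-file check cheap and unambiguous even when the data contains NUL bytes.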
Methods
__construct()
Create a new Scanner.
public
__construct(string $data[, string $encoding = 'UTF-8' ]) : mixed
Parameters
- $data : string
  Data to parse.
- $encoding : string = 'UTF-8'
  The encoding to use for the data.
charsUntil()
Read chars until something in the mask is encountered.
public
charsUntil(string $mask) : mixed
Parameters
- $mask : string
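A minimal PHP sketch of "read until something in the mask is encountered", assuming the scanner keeps an integer byte offset into a string. `charsUntilSketch()` is a hypothetical helper, and returning false when nothing is left to read is an assumption of the sketch:

```php
<?php
// Hypothetical helper illustrating charsUntil(); not the library's code.
function charsUntilSketch(string $data, int &$pos, string $mask)
{
    if ($pos >= strlen($data)) {
        return false; // assumption: nothing left to read
    }
    // strcspn() counts the bytes from $pos that are NOT in $mask.
    $len = strcspn($data, $mask, $pos);
    $chunk = substr($data, $pos, $len);
    $pos += $len; // the pointer stops on the first mask byte
    return $chunk;
}

$pos = 0;
echo charsUntilSketch('title>rest', $pos, '>'), "\n"; // title
echo $pos, "\n";                                       // 5
```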
charsWhile()
Read chars as long as the mask matches.
public
charsWhile(string $mask) : int
Parameters
- $mask : string
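The mirror operation reads while bytes are in the mask. A minimal PHP sketch, assuming a byte offset and an optional byte limit like the $max parameter documented for doCharsWhile(); `charsWhileSketch()` is hypothetical, and it returns the matched substring (as doCharsWhile() is documented to do) rather than a count:

```php
<?php
// Hypothetical helper illustrating charsWhile()/doCharsWhile(); not the library's code.
function charsWhileSketch(string $data, int &$pos, string $mask, ?int $max = null)
{
    if ($pos >= strlen($data)) {
        return false; // assumption: nothing left to read
    }
    // strspn() counts the bytes from $pos that ARE in $mask (at most $max if given).
    $len = $max === null
        ? strspn($data, $mask, $pos)
        : strspn($data, $mask, $pos, $max);
    $chunk = substr($data, $pos, $len);
    $pos += $len;
    return $chunk;
}

$pos = 0;
echo charsWhileSketch('123abc', $pos, '0123456789'), "\n"; // 123
echo charsWhileSketch('123abc', $pos, 'abc', 2), "\n";     // ab
```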
Return values
int
columnOffset()
Returns the current column of the current line that the tokenizer is at.
public
columnOffset() : int
Newlines are column 0. The first char after a newline is column 1.
Return values
int —The column number.
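One way to read the rule above is that the column is the number of characters consumed since the last newline before the pointer: right after a newline it is 0, and after the next character it is 1. A minimal PHP sketch of that interpretation; `columnOffsetSketch()` is hypothetical, and counting code points with a `u`-flagged regex (instead of bytes) is an assumption about how multi-byte text should be treated:

```php
<?php
// Hypothetical helper for one interpretation of columnOffset(); not the library's code.
function columnOffsetSketch(string $data, int $pos): int
{
    $consumed = substr($data, 0, $pos);
    $nl = strrpos($consumed, "\n");
    $line = $nl === false ? $consumed : substr($consumed, $nl + 1);
    // Count UTF-8 code points rather than bytes ("/./su" matches one code point).
    return preg_match_all('/./su', $line);
}

echo columnOffsetSketch("ab\ncd", 3), "\n"; // 0 (the newline was just consumed)
echo columnOffsetSketch("ab\ncd", 4), "\n"; // 1 (first char after the newline)
```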
consume()
Silently consume N chars.
public
consume([int $count = 1 ]) : mixed
Parameters
- $count : int = 1
current()
Get the current character.
public
current() : string
Note, this does not advance the pointer.
Return values
string —The current character.
currentLine()
Returns the current line that is being consumed.
public
currentLine() : int
Return values
int —The current line number.
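A minimal PHP sketch of line counting, assuming the data has already been through newline normalization (see replaceLinefeeds()), so every line break is a single "\n". `currentLineSketch()` is a hypothetical helper:

```php
<?php
// Hypothetical helper illustrating currentLine(); not the library's code.
// Lines are 1-based: one more than the number of "\n" bytes already consumed.
function currentLineSketch(string $data, int $pos): int
{
    return substr_count(substr($data, 0, $pos), "\n") + 1;
}

echo currentLineSketch("a\nb\nc", 0), "\n"; // 1
echo currentLineSketch("a\nb\nc", 4), "\n"; // 3
```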
getAsciiAlpha()
Get the next group of characters that are ASCII Alpha characters.
public
getAsciiAlpha() : string
Note: this also advances the pointer past the returned characters.
Return values
string —The next group of ASCII alpha characters.
getAsciiAlphaNum()
Get the next group of characters that are ASCII Alpha characters and numbers.
public
getAsciiAlphaNum() : string
Note: this also advances the pointer past the returned characters.
Return values
string —The next group of ASCII alpha characters and numbers.
getHex()
Get the next group of characters that are hex digits.
public
getHex() : string
Note: this also advances the pointer past the returned characters.
Return values
string —The next group of hex characters.
getNumeric()
Get the next group of numbers.
public
getNumeric() : string
Note: this also advances the pointer past the returned characters.
Return values
string —The next group of numbers.
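The four getXxx() helpers above can be read as "consume the longest run of bytes drawn from one mask". A minimal PHP sketch of that idea, reusing the documented CHARS_ALPHA and CHARS_HEX values; `takeRun()` is a hypothetical helper, and the `DIGITS` mask used for getNumeric() is an assumption:

```php
<?php
// Masks copied from the documented constants; DIGITS is an assumption.
const CHARS_ALPHA = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
const CHARS_HEX = 'abcdefABCDEF01234567890';
const DIGITS = '0123456789';

// Hypothetical helper: consume the longest run of bytes from $mask.
function takeRun(string $data, int &$pos, string $mask): string
{
    $len = strspn($data, $mask, $pos);
    $run = substr($data, $pos, $len);
    $pos += $len; // like the getXxx() helpers, this advances the pointer
    return $run;
}

// For example, the digits of a hex character reference such as "&#x1F600;".
$data = '#x1F600;';
$pos = 2;
echo takeRun($data, $pos, CHARS_HEX), "\n"; // 1F600
echo $pos, "\n";                             // 7
```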
next()
Get the next character.
public
next() : string
Note: This advances the pointer.
Return values
string —The next character.
peek()
Take a peek at the next character in the data.
public
peek() : string
Return values
string —The next character.
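The difference between current(), next(), and peek() is only whether and when the pointer moves. A minimal PHP sketch over a byte pointer; `CharCursor` is a hypothetical class, it works one byte at a time (matching the byte-position model above), and returning false past the end of the data is an assumption:

```php
<?php
// Hypothetical class illustrating current()/next()/peek(); not the library's code.
final class CharCursor
{
    private string $data;
    private int $char = 0; // byte offset of the current character

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    // Character under the pointer; the pointer does not move.
    public function current()
    {
        return $this->char < strlen($this->data) ? $this->data[$this->char] : false;
    }

    // Move the pointer forward one byte, then return the new current character.
    public function next()
    {
        $this->char++;
        return $this->current();
    }

    // Character one byte after the pointer; the pointer does not move.
    public function peek()
    {
        $i = $this->char + 1;
        return $i < strlen($this->data) ? $this->data[$i] : false;
    }
}

$c = new CharCursor('<a>');
echo $c->current(), $c->peek(), $c->current(), "\n"; // <a<
echo $c->next(), "\n";                                // a
```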
position()
Get the current position.
public
position() : int
Return values
int —The current integer byte position.
remainingChars()
Get all characters until EOF.
public
remainingChars() : int
This consumes characters until the EOF.
Return values
int —The number of characters remaining.
sequenceMatches()
Check if upcoming chars match the given sequence.
public
sequenceMatches(string $sequence[, bool $caseSensitive = true ]) : bool
This reads ahead in the stream and returns true if the next characters are $sequence, false otherwise. Because any characters it reads are unconsumed again, the pointer does not move: the caller still needs to consume the sequence itself, even when this returns true.
Example: $this->scanner->sequenceMatches('</script>') will see if the input stream is at the start of a '</script>' string.
Parameters
- $sequence : string
- $caseSensitive : bool = true
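Since the pointer must not move, a match can be checked by comparing a slice of the data without touching the offset at all. A minimal PHP sketch; `sequenceMatchesSketch()` is hypothetical, and using `strcasecmp()` (an ASCII case-insensitive comparison) for `$caseSensitive = false` is an assumption:

```php
<?php
// Hypothetical helper illustrating sequenceMatches(); not the library's code.
// $pos is passed by value, so the caller's pointer can never move.
function sequenceMatchesSketch(string $data, int $pos, string $sequence, bool $caseSensitive = true): bool
{
    $portion = substr($data, $pos, strlen($sequence));
    return $caseSensitive
        ? $portion === $sequence
        : strcasecmp($portion, $sequence) === 0; // ASCII case-insensitive
}

$html = 'x</SCRIPT>';
var_dump(sequenceMatchesSketch($html, 1, '</script>'));        // bool(false)
var_dump(sequenceMatchesSketch($html, 1, '</script>', false)); // bool(true)
```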
Return values
bool
unconsume()
Unconsume some of the data.
public
unconsume([int $howMany = 1 ]) : mixed
This moves the data pointer backwards.
Parameters
- $howMany : int = 1
  The number of characters to move the pointer back.
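consume() and unconsume() are the forward and backward halves of the same pointer arithmetic. A minimal PHP sketch; `Pointer` is a hypothetical class, and ignoring an unconsume() that would move before offset 0 is an assumption of the sketch:

```php
<?php
// Hypothetical class illustrating consume()/unconsume(); not the library's code.
final class Pointer
{
    public int $char = 0; // byte offset

    // Silently skip $count bytes.
    public function consume(int $count = 1): void
    {
        $this->char += $count;
    }

    // Move back $howMany bytes; assumption: refuse to go before the start.
    public function unconsume(int $howMany = 1): void
    {
        if ($this->char - $howMany >= 0) {
            $this->char -= $howMany;
        }
    }
}

$p = new Pointer();
$p->consume(3);
$p->unconsume();
echo $p->char, "\n"; // 2
```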
whitespace()
Consume whitespace.
public
whitespace() : int
Whitespace in HTML5 is: formfeed, tab, newline, space.
Return values
int —The length of the matched whitespaces.
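Because the whitespace set is a fixed list of single bytes, it can be skipped with one `strspn()` call. A minimal PHP sketch; `whitespaceSketch()` is a hypothetical helper that returns the number of bytes skipped, matching the documented return value:

```php
<?php
// Hypothetical helper illustrating whitespace(); not the library's code.
// The mask is form feed, tab, newline, and space.
function whitespaceSketch(string $data, int &$pos): int
{
    $len = strspn($data, "\f\t\n ", $pos);
    $pos += $len;
    return $len;
}

$pos = 0;
echo whitespaceSketch("  \n<p>", $pos), "\n"; // 3
```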
doCharsUntil()
Read to a particular match (or until $max bytes are consumed).
private
doCharsUntil(string $bytes[, int $max = null ]) : mixed
This operates on byte sequences, not characters.
Reads as far as possible until one of the given bytes is reached, and returns the matched substring.
Parameters
- $bytes : string
  Bytes to match.
- $max : int = null
  Maximum number of bytes to scan.
Return values
mixed —The matched substring, or false if nothing can be read. Use strict comparison (===) when checking the result, since the substring may be empty or "0", both of which are falsy.
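A minimal PHP sketch of reading up to a stop byte with an optional scan limit, returning the matched substring as the description above states. `doCharsUntilSketch()` is hypothetical, and returning false at the end of the data is an assumption:

```php
<?php
// Hypothetical helper illustrating doCharsUntil(); not the library's code.
function doCharsUntilSketch(string $data, int &$pos, string $bytes, ?int $max = null)
{
    if ($pos >= strlen($data)) {
        return false; // assumption: nothing to read at the end of the data
    }
    // strcspn() counts bytes NOT in $bytes, scanning at most $max bytes if given.
    $len = $max === null
        ? strcspn($data, $bytes, $pos)
        : strcspn($data, $bytes, $pos, $max);
    $chunk = substr($data, $pos, $len);
    $pos += $len;
    return $chunk;
}

$pos = 0;
var_dump(doCharsUntilSketch('abcdef<', $pos, '<', 3)); // string(3) "abc"
```

An immediate stop byte yields an empty string, which is why callers should compare the result with `=== false` rather than relying on truthiness.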
doCharsWhile()
Returns the string so long as $bytes matches.
private
doCharsWhile(string $bytes[, int $max = null ]) : string
Matches as far as possible with a certain set of bytes and returns the matched substring.
Parameters
- $bytes : string
  A mask of bytes to match. If ANY byte in this mask matches the current char, the pointer advances and the char is part of the substring.
- $max : int = null
  The max number of chars to read.
Return values
string
replaceLinefeeds()
Replace linefeed characters according to the spec.
private
replaceLinefeeds(mixed $data) : string
Parameters
- $data : mixed
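The HTML5 input-stream rules normalize newlines: every CR LF pair and every lone CR becomes a single LF, so later stages only ever see "\n". A minimal PHP sketch of that normalization; `replaceLinefeedsSketch()` is hypothetical and covers only the CR/LF rule:

```php
<?php
// Hypothetical helper illustrating newline normalization; not the library's code.
// strtr() with an array tries longer keys first, so "\r\n" is replaced before "\r".
function replaceLinefeedsSketch(string $data): string
{
    return strtr($data, ["\r\n" => "\n", "\r" => "\n"]);
}

echo strlen(replaceLinefeedsSketch("a\r\nb\rc\n")), "\n"; // 6
```

Doing this once up front lets currentLine() and columnOffset() look for "\n" alone.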