Inheritance diagram for HTMLPurifier_Lexer:

Public Member Functions
	extractBody ($html)
	Takes a string of HTML (fragment or document) and returns the content.

	normalize ($html, $config, $context)
	Normalizes a piece of HTML by converting entities, fixing the encoding, extracting the body content, and applying related cleanup.

	parseAttr ($string, $config)

	parseData ($string, $is_attr, $config)
	Parses special entities into the proper characters.

	parseText ($string, $config)

	tokenizeHTML ($string, $config, $context)
	Lexes an HTML string into tokens.

Static Public Member Functions
static	create ($config)
	Retrieves or sets the default Lexer as a Prototype Factory.

Public Attributes
	$tracksLineNumbers = false
	Whether or not this lexer implements line-number/column-number tracking.

Static Protected Member Functions
static	CDATACallback ($matches)
	Callback function for escapeCDATA() that does the work.

static	escapeCDATA ($string)
	Translates CDATA sections into regular sections (through escaping).

static	escapeCommentedCDATA ($string)
	Handles the special, especially convoluted CDATA case for <script>.

Protected Attributes
	$_special_entity2str
	Most common entity to raw value conversion table for special entities.

Member Function Documentation

◆ CDATACallback()

static HTMLPurifier_Lexer::CDATACallback ( $matches )

static protected

Callback function for escapeCDATA() that does the work.

Warning: Though this is public in order to let the callback happen, calling it directly is not recommended.

Parameters

array $matches PCRE matches array, with index 0 the entire match and 1 the inside of the CDATA section.

Return values

string Escaped internals of the CDATA section.

◆ create()

static HTMLPurifier_Lexer::create ( $config )

static

Retrieves or sets the default Lexer as a Prototype Factory.

By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.

Note: The behavior of this class has changed: rather than accepting a prototype object, it now accepts a configuration object. To specify your own prototype, set Core.LexerImpl to it. This change in behavior de-singletonizes the lexer object.

Parameters

HTMLPurifier_Config $config

Return values

HTMLPurifier_Lexer

Exceptions

HTMLPurifier_Exception
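
The factory behavior described above can be illustrated with a small, self-contained sketch. It is not HTML Purifier's code: the class names (SketchLexer, SketchDomLexer, SketchDirectLexer), the plain array used as a configuration, and the 'LexerImpl' key are all hypothetical stand-ins, and the exception type is an assumption chosen for the sketch.

```php
<?php
// Hypothetical sketch of a config-driven prototype factory. The class and
// key names are illustrative only and are not HTML Purifier's API.

class SketchLexer
{
    /**
     * @param array $config Hypothetical config map; 'LexerImpl' may be
     *                      absent, a class name, or a ready-made lexer object.
     */
    public static function create(array $config): SketchLexer
    {
        $impl = $config['LexerImpl'] ?? null;

        if ($impl instanceof SketchLexer) {
            // A caller-supplied prototype object is returned as is.
            return $impl;
        }
        if ($impl === null) {
            // Assumed default, mirroring "DOMLex is returned by default".
            return new SketchDomLexer();
        }
        if (is_string($impl) && is_subclass_of($impl, SketchLexer::class)) {
            // A class name yields a fresh instance on every call.
            return new $impl();
        }
        throw new InvalidArgumentException('Unrecognized lexer implementation');
    }
}

class SketchDomLexer extends SketchLexer {}
class SketchDirectLexer extends SketchLexer {}

$default = SketchLexer::create([]);
$direct  = SketchLexer::create(['LexerImpl' => SketchDirectLexer::class]);
echo get_class($default), ' ', get_class($direct), "\n";
```

Because the factory reads its choice from the configuration on each call rather than caching one shared object, two calls with the same configuration return distinct instances, which is one way to read "de-singletonizes" in the note above.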

◆ escapeCDATA()

static HTMLPurifier_Lexer::escapeCDATA ( $string )

static protected

Translates CDATA sections into regular sections (through escaping).

Parameters

string $string HTML string to process.

Return values

string HTML with CDATA sections escaped.
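
As a rough illustration of how escapeCDATA() and CDATACallback() fit together, the sketch below pairs a regular-expression replace with a callback that escapes the inside of each CDATA section. The function names, the exact pattern, and the choice of htmlspecialchars() flags are assumptions made for this sketch, not a copy of the library's implementation.

```php
<?php
// Hypothetical stand-ins for escapeCDATA() and CDATACallback().

/**
 * Callback: receives the PCRE matches array (index 0 is the whole match,
 * index 1 the inside of the CDATA section) and returns the escaped inside.
 */
function sketch_cdata_callback(array $matches): string
{
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

/**
 * Replaces every <![CDATA[...]]> section with its escaped contents.
 */
function sketch_escape_cdata(string $html): string
{
    // Assumed pattern: non-greedy match; /s lets "." span newlines.
    $result = preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/s',
        'sketch_cdata_callback',
        $html
    );
    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}

echo sketch_escape_cdata('<p><![CDATA[a < b & c]]></p>'), "\n";
// prints: <p>a &lt; b &amp; c</p>
```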

◆ escapeCommentedCDATA()

static HTMLPurifier_Lexer::escapeCommentedCDATA ( $string )

static protected

Handles the special, especially convoluted CDATA case for <script>.

Parameters

string $string HTML string to process.

Return values

string HTML with CDATA sections escaped.

◆ extractBody()

HTMLPurifier_Lexer::extractBody ( $html )

Takes a string of HTML (fragment or document) and returns the content.

Todo: Consider making protected
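
A minimal sketch of the idea, assuming that "the content" means whatever sits inside the body element when one is present and the unchanged input otherwise. The function name and the pattern are hypothetical; the library's own method may apply additional checks.

```php
<?php
// Hypothetical stand-in for extractBody(); not the library's code.

function sketch_extract_body(string $html): string
{
    // /i: case-insensitive tag names; /s: "." also matches newlines.
    if (preg_match('~<body[^>]*>(.*)</body>~is', $html, $matches) === 1) {
        return $matches[1];
    }
    // No body element found: treat the input as a fragment and return it unchanged.
    return $html;
}

echo sketch_extract_body('<html><body class="x"><p>Hi</p></body></html>'), "\n";
// prints: <p>Hi</p>
```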

◆ normalize()

HTMLPurifier_Lexer::normalize	(	$html,
		$config,
		$context )

Normalizes a piece of HTML by converting entities, fixing the encoding, extracting the body content, and applying related cleanup.

Parameters

string	$html	HTML.
HTMLPurifier_Config	$config
HTMLPurifier_Context	$context

Return values

string

Todo: Consider making protected

◆ parseData()

HTMLPurifier_Lexer::parseData	(	$string,
		$is_attr,
		$config )

Parses special entities into the proper characters.

This method translates escaped versions of the special characters into their raw equivalents.

Parameters

string $string String character data to be parsed.

Return values

string Parsed character data.
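
The special-entity translation can be sketched with strtr() and the conversion table documented under $_special_entity2str. This covers only that one step; the constant and function names are hypothetical, and the real method also takes $is_attr and $config, which this sketch ignores.

```php
<?php
// Hypothetical sketch of the special-entity step only; not the library's code.

const SKETCH_SPECIAL_ENTITY2STR = [
    '&quot;' => '"',
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&#39;'  => "'",
    '&#039;' => "'",
    '&#x27;' => "'",
];

function sketch_parse_special_entities(string $string): string
{
    // Fast path: without an ampersand there is nothing to translate.
    if (strpos($string, '&') === false) {
        return $string;
    }
    // strtr() with an array replaces in a single pass, so the "&" produced
    // from "&amp;" is never re-scanned and double-decoded.
    return strtr($string, SKETCH_SPECIAL_ENTITY2STR);
}

echo sketch_parse_special_entities('Tom &amp; Jerry &lt;3'), "\n";
// prints: Tom & Jerry <3
```

The single-pass property matters for input such as "&amp;lt;", which should become the literal text "&lt;" rather than "<".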

◆ tokenizeHTML()

HTMLPurifier_Lexer::tokenizeHTML	(	$string,
		$config,
		$context )

Lexes an HTML string into tokens.

Parameters

string	$string	HTML string.
HTMLPurifier_Config	$config
HTMLPurifier_Context	$context

Return values

HTMLPurifier_Token[] Array representation of the HTML.

Reimplemented in HTMLPurifier_Lexer_DirectLex, HTMLPurifier_Lexer_DOMLex, and HTMLPurifier_Lexer_PH5P.

Member Data Documentation

◆ $_special_entity2str

HTMLPurifier_Lexer::$_special_entity2str

protected

Initial value:

=
        array(
            '&quot;' => '"',
            '&amp;' => '&',
            '&lt;' => '<',
            '&gt;' => '>',
            '&#39;' => "'",
            '&#039;' => "'",
            '&#x27;' => "'"
        )

Most common entity to raw value conversion table for special entities.

Type: array

◆ $tracksLineNumbers

HTMLPurifier_Lexer::$tracksLineNumbers = false

Whether or not this lexer implements line-number/column-number tracking.

If it does, set to true.

The documentation for this class was generated from the following file:

lib/htmlpurifier/HTMLPurifier/Lexer.php
