Moodle PHP Documentation 4.5
Moodle 4.5dev (Build: 20240606) (d3ae1391abe)
PhpXmlRpc\Helper\Charset Class Reference

Public Member Functions

 encodeEntities ($data, $srcEncoding='', $destEncoding='')
 Convert a string to the correct XML representation in a target charset.
 
 getEntities ($charset)
 Used only for backwards compatibility (the .inc shims).
 
 isValidCharset ($encoding, $validList)
 Checks if a given charset encoding is present in a list of encodings or if it is a valid subset of any encoding in the list.
 
 knownCharsets ()
 

Static Public Member Functions

static instance ()
 This class is a singleton for performance reasons.
 

Protected Member Functions

 __construct ()
 Force usage as a singleton.
 
 buildConversionTable ($tableName)
 

Protected Attributes

 $charset_supersets
 
 $xml_iso88591_Entities = array("in" => array(), "out" => array())
 

Static Protected Attributes

static Charset $instance = null
 

Member Function Documentation

◆ buildConversionTable()

PhpXmlRpc\Helper\Charset::buildConversionTable ( $tableName)
protected
Parameters
string $tableName
Return values
void
Exceptions
ValueErrorException for unsupported $tableName
Todo

add support for cp1252 as well as latin-2 .. latin-10.

Optimization creep: instead of building all those tables on load, keep them as ready-made PHP files which are not even included until needed.

should we add to the latin-1 table the characters from the cp1252 range, i.e. 128 to 159? Those will NOT be present in true ISO-8859-1, but adding them would save the unwary Windows user from sending junk (though no luck when receiving them...). Note also that, apparently, while 'ISO/IEC 8859-1' has no characters defined for bytes 128 to 159, IANA ISO-8859-1 does have well-defined 'C1' control codes for those. Wikipedia's page on latin-1 says: "ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429." Check what mbstring/iconv do by default with those?
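For illustration, a table with the shape held in `$xml_iso88591_Entities` (an "in" list and a parallel "out" list) could be built and applied roughly as below. This is a minimal sketch, not the library's code: the function names are hypothetical, and the choice to map only Latin-1 bytes 160 to 255 (leaving the 128 to 159 range discussed above untouched, and not escaping XML special characters) is an assumption.

```php
<?php
// Hypothetical helper: builds parallel "in"/"out" arrays mapping raw
// ISO-8859-1 bytes to decimal numeric character references.
// Assumption: only bytes 160..255 are mapped; 128..159 are left alone.
function buildLatin1EntityTable(): array
{
    $table = ['in' => [], 'out' => []];
    for ($byte = 160; $byte <= 255; $byte++) {
        $table['in'][] = chr($byte);          // a single Latin-1 byte
        $table['out'][] = '&#' . $byte . ';'; // in Latin-1, byte value == Unicode code point
    }
    return $table;
}

// Hypothetical helper: replaces each mapped byte with its entity.
// The replacements are pure ASCII, so no replaced text is ever re-matched.
function latin1ToAsciiEntities(string $latin1, array $table): string
{
    return str_replace($table['in'], $table['out'], $latin1);
}

$table = buildLatin1EntityTable();
echo latin1ToAsciiEntities("caf\xE9", $table), "\n"; // caf&#233;
```

Building the table once and reusing it is what makes caching it in a long-lived (singleton) object worthwhile.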

◆ encodeEntities()

PhpXmlRpc\Helper\Charset::encodeEntities ( $data,
$srcEncoding = '',
$destEncoding = '' )

Convert a string to the correct XML representation in a target charset.

This involves:

  • character transformation for all characters which have a different representation in source and dest charsets
  • using 'charset entity' representation for all characters which are outside the target charset

To ensure that non-ASCII characters inside strings are communicated correctly, regardless of the charset used when sending requests, parsing them, sending responses and parsing responses, one option is to convert every non-ASCII character in the message into its equivalent 'charset entity' (a numeric character reference such as &#233;). Entities written this way are independent of the charset encoding used to transmit them, and all conforming XML parsers must understand them.

Note that when no charset parameter is sent along with the MIME type in the HTTP headers, RFC 3023 requires 'text/xml' payloads to be strict US-ASCII (but RFC 7303, which obsoletes RFC 3023, seems to have changed the rules and should be reviewed).
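As an illustration of the 'charset entity' approach, the sketch below converts a UTF-8 string into pure US-ASCII XML text: the five XML special characters are escaped, and every non-ASCII character becomes a decimal numeric character reference. It is a minimal, standalone sketch assuming UTF-8 input and US-ASCII output; the function names are hypothetical and it does not reproduce the internals of encodeEntities(). It decodes UTF-8 by hand so that it does not depend on mbstring or iconv.

```php
<?php
// Hypothetical helper: returns the Unicode code point of one UTF-8 encoded character.
function utf8CodePoint(string $ch): int
{
    $b = array_values(unpack('C*', $ch)); // byte values, re-indexed from 0
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical helper: UTF-8 in, US-ASCII XML text out.
function utf8ToAsciiXml(string $utf8): string
{
    // Split into characters; with the /u modifier preg_split returns false on invalid UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $xmlSpecials = ['&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'];
    $out = '';
    foreach ($chars as $ch) {
        $cp = utf8CodePoint($ch);
        if ($cp < 0x80) {
            $out .= $xmlSpecials[$ch] ?? $ch; // ASCII: escape only the XML specials
        } else {
            $out .= '&#' . $cp . ';';         // non-ASCII: numeric character reference
        }
    }
    return $out;
}

echo utf8ToAsciiXml("caf\u{E9} <\u{20AC}>"), "\n"; // caf&#233; &lt;&#8364;&gt;
```

The output contains only ASCII bytes, so it can be sent as 'text/xml' without a charset parameter and still decode to the original characters.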

Parameters
string $data
string $srcEncoding
string $destEncoding
Return values
string
Todo

do a bit of basic benchmarking: strtr vs. str_replace, str_replace vs. htmlspecialchars, hand-coded conversion vs. mbstring when that is enabled

make use of iconv when it is available and mbstring is not

support aliases for charset names, e.g. ASCII, LATIN1, ISO-88591 (see e.g. polyfill-iconv for a list), but then take those into account in other methods as well, i.e. isValidCharset

when converting to ASCII, allow choosing whether to escape the range 0-31, 127 (non-printable chars) or not

allow picking different strategies to deal with invalid chars? e.g. source in latin-1 and chars 128-159

add support for escaping using CDATA sections? (add CDATA start and end tokens, replace only ']]>' with ']]]]><![CDATA[>')
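The CDATA idea in the last item is a well-known technique rather than something the class currently does: wrap the text in a CDATA section and split any embedded ']]>' across two adjacent sections. A minimal sketch, with a hypothetical function name:

```php
<?php
// Hypothetical helper: wraps arbitrary text in a CDATA section.
// A literal "]]>" would end the section early, so each occurrence is replaced:
// "]]" stays in the current section, which is then closed, and a new
// section is opened that starts with ">".
function wrapInCdata(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

echo wrapInCdata('a]]>b'), "\n"; // <![CDATA[a]]]]><![CDATA[>b]]>
```

Note that CDATA does not help with charset problems: character references are not recognized inside a CDATA section, so characters that cannot be represented in the target encoding still cannot be carried there.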

◆ getEntities()

PhpXmlRpc\Helper\Charset::getEntities ( $charset)

Used only for backwards compatibility (the .inc shims).

Deprecated
Parameters
string $charset
Return values
array
Exceptions
ValueErrorException for unknown/unsupported charsets

◆ instance()

static PhpXmlRpc\Helper\Charset::instance ( )
static

This class is a singleton for performance reasons.

Return values
Charset
Todo
should we just make $xml_iso88591_Entities a static variable instead?
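The singleton arrangement described here (a protected constructor plus a static instance() accessor backed by a static $instance property) can be sketched as follows. The class name and the constructor counter are hypothetical, added only to show that the expensive setup runs once; this is not the library's source.

```php
<?php
// Hypothetical class illustrating the lazily created singleton pattern
// documented for Charset.
class CharsetLikeSingleton
{
    /** @var CharsetLikeSingleton|null the single shared instance */
    protected static $instance = null;

    /** @var int counts constructor calls, to show setup happens only once */
    public static $constructed = 0;

    // Non-public constructor: code outside the class must go through instance().
    protected function __construct()
    {
        self::$constructed++;
        // Expensive setup (e.g. building conversion tables) would go here.
    }

    public static function instance(): self
    {
        if (self::$instance === null) {
            self::$instance = new self(); // created on first use only
        }
        return self::$instance;
    }
}

$a = CharsetLikeSingleton::instance();
$b = CharsetLikeSingleton::instance();
var_dump($a === $b); // bool(true)
```

Calling `new CharsetLikeSingleton()` from outside the class throws an Error because the constructor is protected, so every caller shares the same, already-built state.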

◆ isValidCharset()

PhpXmlRpc\Helper\Charset::isValidCharset ( $encoding,
$validList )

Checks if a given charset encoding is present in a list of encodings or if it is a valid subset of any encoding in the list.

Deprecated
kept around for backwards compatibility (BC), as it is not used by the library itself
Parameters
string $encoding charset to be tested
string | array $validList comma-separated list of valid charsets (or array of charsets)
Return values
bool
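The check described above (exact match, or the tested encoding being a subset of a listed one) can be sketched as below. This is a hypothetical standalone function, not the deprecated method itself: its superset table is an abbreviated excerpt of the $charset_supersets attribute documented below, and the trimming and case-insensitive comparison are assumptions made for the sketch.

```php
<?php
// Abbreviated excerpt of $charset_supersets: encoding => list of its supersets.
const CHARSET_SUPERSETS = [
    'US-ASCII' => ['ISO-8859-1', 'ISO-8859-15', 'UTF-8'],
];

// Hypothetical sketch of the isValidCharset() logic.
function isValidCharsetSketch(string $encoding, $validList): bool
{
    // Accept either a comma-separated string or an array of charset names.
    if (is_string($validList)) {
        $validList = explode(',', $validList);
    }
    // Assumption: names are trimmed and compared case-insensitively.
    $encoding = strtoupper(trim($encoding));
    foreach ($validList as $allowed) {
        $allowed = strtoupper(trim($allowed));
        if ($allowed === $encoding) {
            return true; // exact match
        }
        // $encoding is a valid subset if $allowed is one of its known supersets.
        if (in_array($allowed, CHARSET_SUPERSETS[$encoding] ?? [], true)) {
            return true;
        }
    }
    return false;
}

var_dump(isValidCharsetSketch('US-ASCII', 'ISO-8859-1, UTF-8')); // bool(true)
```

Under this sketch, US-ASCII content is accepted wherever UTF-8 is, but not the other way round.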

◆ knownCharsets()

PhpXmlRpc\Helper\Charset::knownCharsets ( )
Return values
string[]

Member Data Documentation

◆ $charset_supersets

PhpXmlRpc\Helper\Charset::$charset_supersets
protected
Initial value:
= array(
'US-ASCII' => array('ISO-8859-1', 'ISO-8859-2', 'ISO-8859-3', 'ISO-8859-4',
'ISO-8859-5', 'ISO-8859-6', 'ISO-8859-7', 'ISO-8859-8',
'ISO-8859-9', 'ISO-8859-10', 'ISO-8859-11', 'ISO-8859-12',
'ISO-8859-13', 'ISO-8859-14', 'ISO-8859-15', 'UTF-8',
'EUC-JP', 'EUC-', 'EUC-KR', 'EUC-CN',),
)

The documentation for this class was generated from the following file: