|
| encodeEntities ($data, $srcEncoding='', $destEncoding='') |
| Convert a string to the correct XML representation in a target charset.
|
|
| getEntities ($charset) |
| Used only for backwards compatibility.
|
|
| isValidCharset ($encoding, $validList) |
| Checks if a given charset encoding is present in a list of encodings or if it is a valid subset of any encoding in the list.
|
|
|
static | instance () |
| This class is a singleton for performance reasons.
|
|
|
static Charset | $instance = null |
| $instance
|
|
◆ buildConversionTable()
PhpXmlRpc\Helper\Charset::buildConversionTable |
( |
| $tableName | ) |
|
|
protected |
- Parameters
-
- Exceptions
-
- Todo
add support for cp1252 as well as latin-2 .. latin-10
optimization creep: instead of building all those tables on load, keep them as ready-made PHP files which are not even included until needed
should we add to the latin-1 table the characters from the cp1252 range, i.e. 128 to 159? Those are NOT present in true ISO-8859-1, but adding them would save the unwary Windows user from sending junk (though it would not help when receiving them). Note also that, apparently, while 'ISO/IEC 8859-1' defines no characters for bytes 128 to 159, IANA ISO-8859-1 does have well-defined 'C1' control codes for those. Wikipedia's page on latin-1 says: "ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429." Check what mbstring/iconv do by default with those.
◆ encodeEntities()
PhpXmlRpc\Helper\Charset::encodeEntities |
( |
| $data, |
|
|
| $srcEncoding = '', |
|
|
| $destEncoding = '' ) |
Convert a string to the correct XML representation in a target charset.
This involves:
- character transformation for all characters which have a different representation in source and dest charsets
- using 'charset entity' representation for all characters which are outside of the target charset
To help correct communication of non-ascii chars inside strings, regardless of the charset used when sending requests, parsing them, sending responses and parsing responses, an option is to convert all non-ascii chars present in the message into their equivalent 'charset entity'. Charset entities enumerated this way are independent of the charset encoding used to transmit them, and all XML parsers are bound to understand them.
Note that when not sending a charset encoding mime type along with http headers, we are bound by RFC 3023 to emit strict us-ascii for 'text/xml' payloads (but we should review RFC 7303, which seems to have changed the rules...)
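As an illustration of the entity-based approach described above, here is a minimal sketch that turns a UTF-8 string into US-ASCII XML text by replacing every non-ASCII character with a numeric character reference. The function name `utf8ToAsciiXml` is hypothetical and not part of PhpXmlRpc. The sketch assumes valid UTF-8 input and handles only the UTF-8 to US-ASCII case, so it is not a description of how `encodeEntities()` is actually implemented.

```php
<?php
// Hypothetical helper (not part of PhpXmlRpc): escape a UTF-8 string as US-ASCII XML text.
function utf8ToAsciiXml(string $data): string
{
    // Escape the XML special characters first, so that the '&' of the
    // character references added below is not escaped a second time.
    $escaped = str_replace(
        array('&', '"', "'", '<', '>'),
        array('&amp;', '&quot;', '&apos;', '&lt;', '&gt;'),
        $data
    );

    // Replace every non-ASCII character with its decimal character reference.
    $out = preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        $bytes = array_values(unpack('C*', $m[0]));
        $n = count($bytes);
        // The lead byte carries 5, 4 or 3 payload bits for 2-, 3- and 4-byte sequences.
        $code = $bytes[0] & (0xFF >> ($n + 1));
        // Each continuation byte contributes its low 6 bits.
        for ($i = 1; $i < $n; $i++) {
            $code = ($code << 6) | ($bytes[$i] & 0x3F);
        }
        return '&#' . $code . ';';
    }, $escaped);

    // With the /u modifier, PCRE reports invalid UTF-8 by returning null.
    if ($out === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $out;
}

echo utf8ToAsciiXml("caf\u{e9} < \u{20ac}5"), "\n";
```

Because the result contains only ASCII bytes, it reads the same whether the receiver assumes US-ASCII, ISO-8859-1 or UTF-8.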
- Todo
do a bit of basic benchmarking (strtr vs. str_replace)
make use of iconv() or the mbstring extension where available
support aliases for charset names, e.g. ASCII, LATIN1, ISO-88591 (see e.g. polyfill-iconv for a list), but then take those into account in other methods as well, i.e. isValidCharset
when converting to ASCII, allow choosing whether to escape the range 0-31,127 (non-print chars) or not
allow picking different strategies to deal with invalid chars? e.g. source in latin-1 and chars 128-159
add support for escaping using CDATA sections? (add cdata start and end tokens, replace only ']]>' with ']]]]><![CDATA[>')
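The CDATA idea in the last item can be sketched as follows. This is only an illustration of the standard XML technique. The function name `wrapInCdata` is hypothetical and the library does not necessarily offer such a feature. Since a CDATA section cannot contain the sequence `]]>`, each occurrence is split across two adjacent sections.

```php
<?php
// Hypothetical helper (not part of PhpXmlRpc): wrap text in CDATA section(s).
function wrapInCdata(string $data): string
{
    // ']]>' would end the section early, so close the section after ']]'
    // and open a new one before '>'.
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $data) . ']]>';
}

echo wrapInCdata('a < b && c ]]> d'), "\n";
```

Note that a CDATA section only avoids escaping markup characters. Characters outside the target charset still cannot be written as character references inside it, so this would complement charset conversion, not replace it.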
- Parameters
-
string | $data | |
string | $srcEncoding | |
string | $destEncoding | |
- Return values
-
◆ getEntities()
PhpXmlRpc\Helper\Charset::getEntities |
( |
| $charset | ) |
|
Used only for backwards compatibility.
- Deprecated
- Parameters
-
- Return values
-
- Exceptions
-
◆ instance()
static PhpXmlRpc\Helper\Charset::instance |
( |
| ) |
|
|
static |
This class is a singleton for performance reasons.
- Todo
- should we just make $xml_iso88591_Entities a static variable instead?
- Return values
-
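For readers unfamiliar with the pattern, the following is a minimal sketch of a singleton accessor with lazily built, cached lookup tables. The class `CharsetHelperSketch`, its `table()` method and the `$builds` counter are hypothetical names used for illustration only. Caching conversion tables is assumed here to be the performance benefit, and the real class may be organized differently.

```php
<?php
// Hypothetical class illustrating the singleton pattern; not the library's code.
class CharsetHelperSketch
{
    /** @var CharsetHelperSketch|null the single shared instance */
    protected static $instance = null;

    /** @var array lookup tables, built at most once per name */
    protected $tables = array();

    /** @var int number of tables actually built, kept only for demonstration */
    public $builds = 0;

    // A protected constructor prevents callers from creating extra instances.
    protected function __construct()
    {
    }

    public static function instance()
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    public function table($name)
    {
        if (!isset($this->tables[$name])) {
            // Stand-in for an expensive table construction.
            $this->tables[$name] = array('name' => $name);
            $this->builds++;
        }
        return $this->tables[$name];
    }
}

$helper = CharsetHelperSketch::instance();
$helper->table('demo');
$helper->table('demo'); // served from the cache, no rebuild
```

Every caller that goes through `instance()` shares the same object, so a table built for one call is reused by later calls.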
◆ isValidCharset()
PhpXmlRpc\Helper\Charset::isValidCharset |
( |
| $encoding, |
|
|
| $validList ) |
Checks if a given charset encoding is present in a list of encodings or if it is a valid subset of any encoding in the list.
- Parameters
-
string | $encoding | charset to be tested |
string | array | $validList | comma separated list of valid charsets (or array of charsets) |
- Return values
-
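The check described above can be sketched as a standalone function. The name `isCharsetAccepted` and the explicit `$supersets` argument are hypothetical. The sketch assumes a superset table shaped like the `$charset_supersets` member documented on this page, and it compares names case-sensitively, which may differ from the real method.

```php
<?php
// Hypothetical standalone sketch; the real method may differ in details.
function isCharsetAccepted(string $encoding, $validList, array $supersets): bool
{
    // Accept either a comma separated string or an array of charset names.
    if (is_string($validList)) {
        $validList = explode(',', $validList);
    }
    $validList = array_map('trim', $validList);

    // Direct match: the encoding itself is in the list.
    if (in_array($encoding, $validList, true)) {
        return true;
    }

    // Subset match: some listed charset is a known superset of the encoding.
    if (isset($supersets[$encoding])) {
        foreach ($validList as $candidate) {
            if (in_array($candidate, $supersets[$encoding], true)) {
                return true;
            }
        }
    }
    return false;
}

// Assumed, shortened superset table for the demonstration.
$supersets = array('US-ASCII' => array('ISO-8859-1', 'UTF-8'));

var_dump(isCharsetAccepted('US-ASCII', 'ISO-8859-1, UTF-8', $supersets));
var_dump(isCharsetAccepted('UTF-8', array('US-ASCII'), $supersets));
```

The relation is one-directional: US-ASCII text is valid wherever UTF-8 is accepted, but UTF-8 text is not valid where only US-ASCII is accepted.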
◆ $charset_supersets
PhpXmlRpc\Helper\Charset::$charset_supersets |
|
protected |
Initial value:
array(
    'US-ASCII' => array(
        'ISO-8859-1', 'ISO-8859-2', 'ISO-8859-3', 'ISO-8859-4',
        'ISO-8859-5', 'ISO-8859-6', 'ISO-8859-7', 'ISO-8859-8',
        'ISO-8859-9', 'ISO-8859-10', 'ISO-8859-11', 'ISO-8859-12',
        'ISO-8859-13', 'ISO-8859-14', 'ISO-8859-15', 'UTF-8',
        'EUC-JP', 'EUC-', 'EUC-KR', 'EUC-CN',
    ),
)
The documentation for this class was generated from the following file:
- lib/phpxmlrpc/Helper/Charset.php