Defines string APIs for manipulating strings.

Static Public Member Functions
static	code2utf8 ($num)
	Returns the UTF-8 string corresponding to the Unicode value (from php.net, courtesy of romans@void.lv).

static	convert ($text, $fromCS, $toCS='utf-8')
	Converts the text between different encodings.

static	encode_mimeheader ($text, $charset='utf-8')
	Generate a correct base64 encoded header to be used in MIME mail messages.

static	entities_to_utf8 ($str, $htmlent=true)
	Converts all the numeric entities &#nnnn; or &#xnnn; to UTF-8. Original from laurynas dot butkus at gmail at http://php.net/manual/en/function.html-entity-decode.php#75153, with some custom modifications to provide more functionality.

static	get_encodings ()
	Returns encoding options for select boxes, utf-8 and platform encoding first.

static	is_charset_supported (string $charset)
	Check whether the charset is supported by mbstring.

static	parse_charset ($charset)
	Standardise charset name.

static	remove_unicode_non_characters ($value)
	There are a number of Unicode non-characters including the byte-order mark (which may appear multiple times in a string) and also other ranges.

static	reset_caches ()
	Reset internal textlib caches.

static	specialtoascii ($text, $charset='utf-8')
	Tries to convert upper Unicode characters to plain ASCII; the returned string may contain unconverted Unicode characters.

static	str_max_bytes ($string, $bytes)
	Truncates a string to no more than a certain number of bytes in a multi-byte safe manner.

static	strlen ($text, $charset='utf-8')
	Multibyte safe strlen() function, uses mbstring or iconv.

static	strpos ($haystack, $needle, $offset=0)
	Find the position of the first occurrence of a substring in a string.

static	strrchr ($haystack, $needle, $part=false)
	Finds the last occurrence of a character in a string within another.

static	strrev ($str)
	Reverses a UTF-8 multibyte string (used for RTL languages). This exists only because there is no mb_strrev or iconv_strrev.

static	strrpos ($haystack, $needle)
	Find the position of the last occurrence of a substring in a string. UTF-8 ONLY safe strrpos(), uses mbstring.

static	strtolower ($text, $charset='utf-8')
	Multibyte safe strtolower() function, uses mbstring.

static	strtotitle ($text)
	Makes first letter of each word capital - words must be separated by spaces.

static	strtoupper ($text, $charset='utf-8')
	Multibyte safe strtoupper() function, uses mbstring.

static	substr ($text, $start, $len=null, $charset='utf-8')
	Multibyte safe substr() function, uses mbstring or iconv.

static	trim_ctrl_chars (string $text)
	Trims control characters out of a string.

static	trim_utf8_bom ($str)
	Removes the BOM from a Unicode string.

static	utf8_to_entities ($str, $dec=false, $nonnum=false)
	Converts all Unicode chars > 127 to numeric entities &#nnnn; or &#xnnn;.

static	utf8ord ($utf8char)
	Returns the code of the given UTF-8 character.

Public Attributes
string const	UTF8_BOM = "\xef\xbb\xbf"
	Byte order mark for UTF-8.

Static Protected Member Functions
static	get_entities_table ()
	Returns HTML entity transliteration table.

Static Protected Attributes
static string[]	$noncharacters
	Array of strings representing Unicode non-characters.

Detailed Description

Defines string APIs for manipulating strings.

This class is used to manipulate strings under Moodle 1.6 and later. As UTF-8 text became mandatory, a pool of functions that are safe under this encoding became necessary. The method names are exactly the same as their PHP originals.

This class was previously based on Typo3, which has now been removed; it now uses native functions.

Copyright: 1999 onwards Martin Dougiamas

License: http://www.gnu.org/copyleft/gpl.html GNU GPL v3 or later

Member Function Documentation

◆ code2utf8()

static core_text::code2utf8 ( $num )

static

Returns the UTF-8 string corresponding to the Unicode value (from php.net, courtesy of romans@void.lv).

Parameters

int $num one unicode value

Return values

string the UTF-8 char corresponding to the unicode value
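
To illustrate the mapping this method describes, here is a minimal sketch that encodes a single Unicode code point as UTF-8 by hand. The name `sketch_code2utf8` is hypothetical, and this is not Moodle's implementation; returning an empty string for out-of-range input is an assumption made for the example.

```php
<?php
// Hypothetical helper, not Moodle's code: encode one Unicode code point as UTF-8.
function sketch_code2utf8(int $num): string {
    if ($num < 0) {
        return '';                                   // Assumption: reject negative input.
    }
    if ($num < 0x80) {
        return chr($num);                            // 1 byte: plain ASCII.
    }
    if ($num < 0x800) {
        return chr(0xC0 | ($num >> 6))               // 2 bytes.
            . chr(0x80 | ($num & 0x3F));
    }
    if ($num < 0x10000) {
        return chr(0xE0 | ($num >> 12))              // 3 bytes.
            . chr(0x80 | (($num >> 6) & 0x3F))
            . chr(0x80 | ($num & 0x3F));
    }
    if ($num < 0x110000) {
        return chr(0xF0 | ($num >> 18))              // 4 bytes.
            . chr(0x80 | (($num >> 12) & 0x3F))
            . chr(0x80 | (($num >> 6) & 0x3F))
            . chr(0x80 | ($num & 0x3F));
    }
    return '';                                       // Assumption: outside the Unicode range.
}
```

For example, `sketch_code2utf8(0x20AC)` yields the three bytes `E2 82 AC`, the UTF-8 form of the euro sign.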

◆ convert()

static core_text::convert	(	$text,
		$fromCS,
		$toCS = 'utf-8' )

static

Converts the text between different encodings.

It uses iconv extension with //TRANSLIT parameter. If both source and target are utf-8 it tries to fix invalid characters only.

Parameters

string	$text
string	$fromCS	source encoding
string	$toCS	result encoding

Return values

string|bool converted string or false on error
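
The description mentions iconv with the //TRANSLIT parameter. A minimal sketch of that kind of call follows, assuming the iconv extension is installed. The name `sketch_convert` is hypothetical, and the sketch leaves out the special handling of utf-8 to utf-8 conversion.

```php
<?php
// Hypothetical helper, not Moodle's code: convert $text between encodings with iconv.
// The //TRANSLIT suffix asks iconv to approximate characters the target cannot represent.
function sketch_convert(string $text, string $from, string $to = 'utf-8') {
    // iconv() returns false on failure; @ silences the accompanying notice.
    return @iconv($from, $to . '//TRANSLIT', $text);
}

$latin1 = "caf\xE9";                           // "café" encoded as ISO-8859-1.
$utf8 = sketch_convert($latin1, 'iso-8859-1'); // The same word as UTF-8 bytes.
```

After the call, `$utf8` holds `"caf\xC3\xA9"`: the single Latin-1 byte for é becomes a two-byte UTF-8 sequence.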

◆ encode_mimeheader()

static core_text::encode_mimeheader	(		$text,
			$charset = 'utf-8' )

static

Generate a correct base64 encoded header to be used in MIME mail messages.

This function seems to be 100% compliant with RFC1342. Credits go to: paravoid (http://www.php.net/manual/en/function.mb-encode-mimeheader.php#60283).

Parameters

string	$text	input string
string	$charset	encoding of the text

Return values

string base64 encoded header
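
For orientation, a base64 "encoded word" in a MIME header has the form `=?charset?B?data?=`. The sketch below builds one for a short string. The name `sketch_encode_word` is hypothetical, and the real method may do more (for example, splitting long headers across lines), which is not shown.

```php
<?php
// Hypothetical helper, not Moodle's code: wrap a short string as one base64 encoded word.
function sketch_encode_word(string $text, string $charset = 'utf-8'): string {
    return '=?' . $charset . '?B?' . base64_encode($text) . '?=';
}

$header = sketch_encode_word("\xC3\xA9"); // "é" as UTF-8.
// $header is "=?utf-8?B?w6k=?="
```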

◆ entities_to_utf8()

static core_text::entities_to_utf8	(		$str,
			$htmlent = true )

static

Converts all the numeric entities &#nnnn; or &#xnnn; to UTF-8. Original from laurynas dot butkus at gmail at http://php.net/manual/en/function.html-entity-decode.php#75153, with some custom modifications to provide more functionality.

Parameters

string	$str	input string
boolean	$htmlent	convert also html entities (defaults to true)

Return values

string encoded UTF-8 string
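
A minimal sketch of the numeric-entity part of this conversion, assuming mbstring is available (PHP 7.2 or later for `mb_chr`). The name `sketch_numeric_entities_to_utf8` is hypothetical; named HTML entities (the `$htmlent` option) are not handled here.

```php
<?php
// Hypothetical helper, not Moodle's code: decode &#nnnn; and &#xnnn; entities to UTF-8.
function sketch_numeric_entities_to_utf8(string $str): string {
    // Hexadecimal entities, e.g. &#x20AC;
    $str = preg_replace_callback('/&#x([0-9a-f]{1,6});/i', function (array $m): string {
        return (string) mb_chr((int) hexdec($m[1]), 'UTF-8');
    }, $str);
    // Decimal entities, e.g. &#8364;
    return preg_replace_callback('/&#([0-9]{1,7});/', function (array $m): string {
        return (string) mb_chr((int) $m[1], 'UTF-8');
    }, $str);
}
```

With this sketch, both `&#8364;` and `&#x20AC;` decode to the euro sign; code points that `mb_chr` rejects decode to an empty string.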

◆ get_encodings()

static core_text::get_encodings ( )

static

Returns encoding options for select boxes, utf-8 and platform encoding first.

Return values

array encodings

◆ get_entities_table()

static core_text::get_entities_table ( )

static protected

Returns HTML entity transliteration table.

Return values

array with (html entity => utf-8) elements

◆ is_charset_supported()

static core_text::is_charset_supported ( string $charset )

static

Check whether the charset is supported by mbstring.

Parameters

string $charset Normalised charset

Return values

bool

◆ parse_charset()

static core_text::parse_charset ( $charset )

static

Standardise charset name.

Please note it does not mean the returned charset is actually supported.

Parameters

string $charset raw charset name

Return values

string normalised lowercase charset name

◆ remove_unicode_non_characters()

static core_text::remove_unicode_non_characters ( $value )

static

There are a number of Unicode non-characters including the byte-order mark (which may appear multiple times in a string) and also other ranges.

These can cause problems for some processing.

This function removes the characters using string replace, so that the rest of the string remains unchanged.

Parameters

string $value Input string

Return values

string Cleaned string value

Since: Moodle 3.5
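
The string-replace approach described above can be sketched as follows. The list used here (U+FFFE, U+FFFF and the range U+FDD0 to U+FDEF) is an illustrative subset chosen for the example, not the list Moodle keeps in `$noncharacters`, and the name `sketch_remove_non_characters` is hypothetical.

```php
<?php
// Hypothetical helper, not Moodle's code: strip a few Unicode non-characters.
function sketch_remove_non_characters(string $value): string {
    // U+FFFE and U+FFFF, written as raw UTF-8 bytes.
    $noncharacters = ["\xEF\xBF\xBE", "\xEF\xBF\xBF"];
    // U+FDD0..U+FDEF encode as EF B7 90 .. EF B7 AF.
    for ($byte = 0x90; $byte <= 0xAF; $byte++) {
        $noncharacters[] = "\xEF\xB7" . chr($byte);
    }
    // str_replace leaves every other byte of the string untouched.
    return str_replace($noncharacters, '', $value);
}
```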

◆ reset_caches()

static core_text::reset_caches ( )

static

Reset internal textlib caches.

Deprecated: since Moodle 4.0. See MDL-53544.

Todo: To be removed in Moodle 4.4 - MDL-71748

◆ specialtoascii()

static core_text::specialtoascii	(		$text,
			$charset = 'utf-8' )

static

Tries to convert upper Unicode characters to plain ASCII; the returned string may contain unconverted Unicode characters.

With the removal of Typo3, iconv conversion was found to be the best alternative to Typo3's function. However, the standard iconv call iconv($charset, 'ASCII//TRANSLIT//IGNORE', (string) $text); produced invalid strings for special characters from Russian and Japanese. To solve this, the transliterator was used, but it produced empty strings for certain strings in our tests. It was decided to use a combination of the two to cover all cases. Refer to MDL-53544 for further information.

Parameters

string	$text	input string
string	$charset	encoding of the text

Return values

string converted ascii string

◆ str_max_bytes()

static core_text::str_max_bytes	(		$string,
			$bytes )

static

Truncates a string to no more than a certain number of bytes in a multi-byte safe manner.

UTF-8 only!

Parameters

string	$string	String to truncate
int	$bytes	Maximum length of bytes in the result

Return values

string Portion of string specified by $bytes

Since: Moodle 3.1
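
One way to get this behaviour with native functions is `mb_strcut`, which cuts by byte count without splitting a multi-byte character. The sketch below assumes mbstring is available; the name `sketch_str_max_bytes` is hypothetical and the sketch is not necessarily how Moodle implements the method.

```php
<?php
// Hypothetical helper, not Moodle's code: keep at most $bytes bytes of a UTF-8 string.
function sketch_str_max_bytes(string $string, int $bytes): string {
    // mb_strcut moves the cut back to a character boundary when it would
    // otherwise land in the middle of a multi-byte sequence.
    return mb_strcut($string, 0, $bytes, 'UTF-8');
}

$text = "h\xC3\xA9llo";                    // "héllo": 5 characters, 6 bytes.
$two = sketch_str_max_bytes($text, 2);     // "h"  (é would not fit whole)
$three = sketch_str_max_bytes($text, 3);   // "hé"
```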

◆ strlen()

static core_text::strlen	(		$text,
			$charset = 'utf-8' )

static

Multibyte safe strlen() function, uses mbstring or iconv.

Parameters

string	$text	input string
string	$charset	encoding of the text

Return values

int	number of characters
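
One plausible reading of "uses mbstring or iconv" is a fallback between the two extensions. The sketch below shows that pattern; the name `sketch_strlen` is hypothetical and the fallback order is an assumption.

```php
<?php
// Hypothetical helper, not Moodle's code: count characters rather than bytes.
function sketch_strlen(string $text, string $charset = 'utf-8') {
    if (function_exists('mb_strlen')) {
        return mb_strlen($text, $charset);
    }
    return iconv_strlen($text, $charset);  // Assumed fallback when mbstring is missing.
}

$text = "h\xC3\xA9llo";          // "héllo"
$chars = sketch_strlen($text);   // 5 characters
$bytes = strlen($text);          // 6 bytes with PHP's native strlen()
```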

◆ strpos()

static core_text::strpos	(	$haystack,
		$needle,
		$offset = 0 )

static

Find the position of the first occurrence of a substring in a string.

UTF-8 ONLY safe strpos(), uses mbstring

Parameters

string	$haystack	the string to search in
string	$needle	one or more characters to search for
int	$offset	offset from beginning of string

Return values

int	the numeric position of the first occurrence of needle in haystack.
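
The difference between a character position and a byte position is the reason a multibyte-safe variant exists. The following sketch compares the native `mb_strpos` and `strpos` on the same UTF-8 string; it assumes mbstring is available and does not call Moodle itself.

```php
<?php
// "żółw": three two-byte characters followed by an ASCII "w".
$haystack = "\u{17C}\u{F3}\u{142}w";
$needle = "\u{142}";                                  // "ł"

$charpos = mb_strpos($haystack, $needle, 0, 'UTF-8'); // 2: counted in characters
$bytepos = strpos($haystack, $needle);                // 4: counted in bytes
```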

◆ strrchr()

static core_text::strrchr	(	$haystack,
		$needle,
		$part = false )

static

Finds the last occurrence of a character in a string within another.

UTF-8 ONLY safe mb_strrchr().

Parameters

string	$haystack	The string from which to get the last occurrence of needle.
string	$needle	The string to find in haystack.
boolean	$part	If true, returns the portion before needle; otherwise returns the portion after (including needle).

Return values

string|false False when not found.

Since: Moodle 2.4.6, 2.5.2, 2.6
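
The effect of `$part` is easiest to see on a small input. This sketch calls PHP's native `mb_strrchr`, which the description names, rather than Moodle itself; it assumes mbstring is available.

```php
<?php
$path = "\u{17C}/\u{F3}/\u{142}";                   // "ż/ó/ł"

$after = mb_strrchr($path, '/', false, 'UTF-8');    // "/ł": from the last "/" to the end
$before = mb_strrchr($path, '/', true, 'UTF-8');    // "ż/ó": everything before the last "/"
$missing = mb_strrchr($path, '#', false, 'UTF-8');  // false: needle not found
```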

◆ strrev()

static core_text::strrev ( $str )

static

Reverses a UTF-8 multibyte string (used for RTL languages). This exists only because there is no mb_strrev or iconv_strrev.

Parameters

string $str the multibyte string to reverse

Return values

string the reversed multi byte string
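
Since PHP has no built-in multibyte reverse, one common approach is to split the string into characters with a Unicode-aware regular expression and reverse the resulting array. The sketch below shows that approach; the name `sketch_strrev` is hypothetical and this is not necessarily Moodle's implementation.

```php
<?php
// Hypothetical helper, not Moodle's code: reverse a UTF-8 string character by character.
function sketch_strrev(string $str): string {
    // The /u modifier makes "." match whole code points; /s lets it match newlines too.
    preg_match_all('/./us', $str, $matches);
    return implode('', array_reverse($matches[0]));
}

$reversed = sketch_strrev("ab\u{142}");   // "łba"
```

Note that this reverses code points, so a base letter followed by a combining mark would end up in the wrong order.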

◆ strrpos()

static core_text::strrpos	(		$haystack,
			$needle )

static

Find the position of the last occurrence of a substring in a string. UTF-8 ONLY safe strrpos(), uses mbstring.

Parameters

string	$haystack	the string to search in
string	$needle	one or more characters to search for

Return values

int	the numeric position of the last occurrence of needle in haystack

◆ strtolower()

static core_text::strtolower	(		$text,
			$charset = 'utf-8' )

static

Multibyte safe strtolower() function, uses mbstring.

Parameters

string	$text	input string
string	$charset	encoding of the text (may not work for all encodings)

Return values

string lower case text

◆ strtotitle()

static core_text::strtotitle ( $text )

static

Makes first letter of each word capital - words must be separated by spaces.

Use with care, this function does not work properly in many locales!!!

Parameters

string $text input string

Return values

string

◆ strtoupper()

static core_text::strtoupper	(		$text,
			$charset = 'utf-8' )

static

Multibyte safe strtoupper() function, uses mbstring.

Parameters

string	$text	input string
string	$charset	encoding of the text (may not work for all encodings)

Return values

string upper case text

◆ substr()

static core_text::substr	(	$text,
		$start,
		$len = null,
		$charset = 'utf-8' )

static

Multibyte safe substr() function, uses mbstring or iconv.

Parameters

string	$text	string to truncate
int	$start	negative value means from end
int	$len	maximum length of characters beginning from start
string	$charset	encoding of the text

Return values

string portion of string specified by the $start and $len
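
The parameters mirror PHP's native `mb_substr`, so a short sketch with that function shows how `$start` and `$len` count characters, not bytes. It assumes mbstring is available and does not call Moodle itself.

```php
<?php
$text = "h\xC3\xA9llo";                          // "héllo"

$middle = mb_substr($text, 1, 2, 'UTF-8');       // "él": 2 characters starting at index 1
$tail = mb_substr($text, -2, null, 'UTF-8');     // "lo": negative start counts from the end
$broken = substr($text, 1, 1);                   // Native substr() returns half of "é"
```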

◆ trim_ctrl_chars()

static core_text::trim_ctrl_chars ( string $text )

static

Trims control characters out of a string.

Example: (\x00-\x1f) and (\x7f)

Parameters

string $text Input string

Return values

string Cleaned string value
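
A minimal sketch of stripping the two ranges given in the example, assuming that every occurrence is removed and not only leading or trailing ones. The name `sketch_trim_ctrl_chars` is hypothetical.

```php
<?php
// Hypothetical helper, not Moodle's code: remove ASCII control characters.
function sketch_trim_ctrl_chars(string $text): string {
    // Bytes 0x00-0x1F and 0x7F never occur inside a UTF-8 multi-byte sequence,
    // so a byte-level pattern is safe here.
    return (string) preg_replace('/[\x00-\x1f\x7f]/', '', $text);
}

$clean = sketch_trim_ctrl_chars("a\x00b\x1fc\x7fd");   // "abcd"
```

Be aware that this range includes tab (`\x09`) and newline (`\x0a`), which are removed as well.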

◆ trim_utf8_bom()

static core_text::trim_utf8_bom ( $str )

static

Removes the BOM from a Unicode string.

Parameters

string $str input string

Return values

string
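
The UTF-8 byte order mark is the three-byte sequence given by `UTF8_BOM` (`"\xef\xbb\xbf"`). A minimal sketch of removing it from the start of a string follows; the name `sketch_trim_utf8_bom` is hypothetical, and removing only a leading BOM is an assumption.

```php
<?php
// Hypothetical helper, not Moodle's code: drop a leading UTF-8 byte order mark.
function sketch_trim_utf8_bom(string $str): string {
    $bom = "\xef\xbb\xbf";
    if (strncmp($str, $bom, 3) === 0) {
        return (string) substr($str, 3);
    }
    return $str;
}

$clean = sketch_trim_utf8_bom("\xef\xbb\xbfname,email");   // "name,email"
```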

◆ utf8_to_entities()

static core_text::utf8_to_entities	(	$str,
		$dec = false,
		$nonnum = false )

static

Converts all Unicode chars > 127 to numeric entities &#nnnn; or &#xnnn;.

Parameters

string	$str	input string
boolean	$dec	output decimal-only numeric entities
boolean	$nonnum	remove all non-numeric entities

Return values

string converted string
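
A minimal sketch of the conversion, assuming mbstring is available (PHP 7.2 or later for `mb_ord`). The name `sketch_utf8_to_entities` is hypothetical, the lowercase hexadecimal output is an assumption, and the `$nonnum` option is not covered.

```php
<?php
// Hypothetical helper, not Moodle's code: replace non-ASCII characters with numeric entities.
function sketch_utf8_to_entities(string $str, bool $dec = false) {
    // With /u the class matches whole code points above 127.
    return preg_replace_callback('/[^\x00-\x7F]/u', function (array $m) use ($dec): string {
        $code = mb_ord($m[0], 'UTF-8');
        return $dec ? '&#' . $code . ';' : '&#x' . dechex($code) . ';';
    }, $str);
}

$decimal = sketch_utf8_to_entities("caf\xC3\xA9", true);   // "caf&#233;"
$hex = sketch_utf8_to_entities("caf\xC3\xA9");             // "caf&#xe9;"
```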

◆ utf8ord()

static core_text::utf8ord ( $utf8char )

static

Returns the code of the given UTF-8 character.

Parameters

string $utf8char one UTF-8 character

Return values

int	the code of the given character

The documentation for this class was generated from the following file:

lib/classes/text.php
