Module unicode

Drupal Unicode helpers.
See Also:
Drupy Homepage, Drupal Homepage
Notes:

Author: Brendon Crawford

Copyright: 2008 Brendon Crawford

Contact: message144 at users dot sourceforge dot net

Version: 0.1

Functions

check()

_unicode_check()
Perform checks about Unicode support in PHP, and set the right settings if needed.

requirements()
Return Unicode library status and errors.
 
drupal_xml_parser_create(data)
Prepare a new XML parser.

drupal_convert_to_utf8(data, encoding)
Convert data to UTF-8. Requires the iconv, GNU recode or mbstring PHP extension.

drupal_truncate_bytes(string_, len_)
Truncate a UTF-8-encoded string safely to a number of bytes.

truncate_utf8(string_, len_, wordsafe=False, dots=False)
Truncate a UTF-8-encoded string safely to a number of characters.
 
mime_header_encode(string_)
Encodes MIME/HTTP header values that contain non-ASCII, UTF-8 encoded characters.

mime_header_decode(header_)
Complement to mime_header_encode().

_mime_header_decode(matches)
Helper function for mime_header_decode().

decode_entities(text, exclude=[])
Decode all HTML entities (including numerical ones) to regular UTF-8 bytes.

_decode_entities(prefix, codepoint, original, table, exclude)
Helper function for decode_entities...
 
drupal_strlen(text)
Count the number of characters in a UTF-8 string. This is less than or equal to the byte count.

drupal_strtoupper(text)
Uppercase a UTF-8 string.

drupal_strtolower(text)
Lowercase a UTF-8 string.

_caseflip(matches)
Helper function for case conversion of Latin-1.

drupal_ucfirst(text)
Capitalize the first letter of a UTF-8 string.

drupal_substr(text, start, length=None)
Cut off a piece of a string based on character indices and counts. Follows the same behavior as PHP's own substr() function.
Variables
  __version__ = '$Revision: 1 $'
  UNICODE_ERROR = -1
  UNICODE_SINGLEBYTE = 0
  UNICODE_MULTIBYTE = 1
Function Details

check()

Wrapper around _unicode_check().

_unicode_check()

Perform checks about Unicode support in PHP, and set the right settings if needed. Because Drupal needs to be able to handle text in various encodings, we do not support mbstring function overloading. HTTP input/output conversion must be disabled for similar reasons.
Parameters:
  • errors - Whether to report any fatal errors with form_set_error().

drupal_xml_parser_create(data)

Prepare a new XML parser. This is a wrapper around xml_parser_create() which extracts the encoding from the XML data first and sets the output encoding to UTF-8. This function should be used instead of xml_parser_create(), because PHP 4's XML parser doesn't check the input encoding itself. "Starting from PHP 5, the input encoding is automatically detected, so that the encoding parameter specifies only the output encoding." This is also where unsupported encodings will be converted. Callers should take this into account: data might have been changed after the call.
Parameters:
  • &data - The XML data which will be parsed later.
Returns:
An XML parser object or FALSE on error.
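
The encoding-extraction step described above could look roughly like the following sketch. The helper name sniff_xml_encoding, its regular expression, and the UTF-8 default are illustrative assumptions and are not taken from this module's source.

```python
import re

# Hypothetical helper (not part of this module): find the encoding named
# in an XML declaration such as <?xml version="1.0" encoding="ISO-8859-1"?>.
_XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def sniff_xml_encoding(data, default='utf-8'):
    """Return the declared encoding of XML bytes, or `default` if none is declared."""
    match = _XML_DECL.match(data.lstrip())
    if match is None:
        return default
    return match.group(1).decode('ascii').lower()
```

A caller could use the result to decide whether the data needs converting to UTF-8 before it is handed to a parser.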

drupal_convert_to_utf8(data, encoding)

Convert data to UTF-8. Requires the iconv, GNU recode or mbstring PHP extension.
Parameters:
  • data - The data to be converted.
  • encoding - The encoding that the data is in.
Returns:
Converted data or False.
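
As a rough illustration of the contract (converted data on success, False on failure), here is a minimal sketch that uses Python's built-in codecs in place of iconv, recode or mbstring. The function name convert_to_utf8 and the choice of exceptions to catch are assumptions, not this module's implementation.

```python
def convert_to_utf8(data, encoding):
    """Re-encode bytes from `encoding` to UTF-8; return False on failure."""
    try:
        return data.decode(encoding).encode('utf-8')
    except (LookupError, UnicodeDecodeError):
        # LookupError: unknown encoding name.
        # UnicodeDecodeError: the bytes are not valid in that encoding.
        return False
```

Because the failure value is False rather than an exception, callers need an explicit `is False` check; an empty result is also falsy but is a valid conversion.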

drupal_truncate_bytes(string_, len_)

Truncate a UTF-8-encoded string safely to a number of bytes. If the end position is in the middle of a UTF-8 sequence, it scans backwards until the beginning of the byte sequence. Use this function whenever you want to chop off a string at an unsure location. On the other hand, if you're sure that you're splitting on a character boundary (e.g. after using strpos() or similar), you can safely use substr() instead.
Parameters:
  • string_ - The string to truncate.
  • len_ - An upper limit on the returned string length.
Returns:
The truncated string.
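
The backward scan described above can be sketched as follows. This is an illustration working on a Python bytes object, with the hypothetical name truncate_bytes; it is not the module's own code.

```python
def truncate_bytes(data, limit):
    """Cut UTF-8 bytes to at most `limit` bytes without splitting a character."""
    if len(data) <= limit:
        return data
    end = max(limit, 0)
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the cut
    # would land on one, step back to the first byte of that character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]
```

The result can be shorter than the limit: a character that does not fit entirely is dropped rather than split.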

truncate_utf8(string_, len_, wordsafe=False, dots=False)

Truncate a UTF-8-encoded string safely to a number of characters.
Parameters:
  • string_ - The string to truncate.
  • len_ - An upper limit on the returned string length.
  • wordsafe - Flag to truncate at last space within the upper limit. Defaults to False.
  • dots - Flag to add trailing dots. Defaults to False.
Returns:
The truncated string.
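
One plausible reading of the wordsafe and dots flags is sketched below on a Python str, where slicing already works on characters. The name truncate_chars, the ' ...' suffix, and the decision to count the suffix toward the limit are all assumptions; this page does not specify them.

```python
def truncate_chars(text, limit, wordsafe=False, dots=False):
    """Truncate `text` to at most `limit` characters (sketch, see assumptions)."""
    if len(text) <= limit:
        return text
    # Assumption: the trailing dots are ' ...' and count toward the limit.
    suffix = ' ...' if dots else ''
    room = max(limit - len(suffix), 0)
    cut = text[:room]
    if wordsafe:
        # Back up to the last space inside the allowed range, if there is one.
        space = cut.rfind(' ')
        if space > 0:
            cut = cut[:space]
    return cut + suffix
```

With a limit shorter than the suffix, this sketch still returns the suffix, so very small limits are not strictly honoured when dots is set.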

mime_header_encode(string_)

Encodes MIME/HTTP header values that contain non-ASCII, UTF-8 encoded characters. For example, mime_header_encode('tést.txt'), where the 'e' carries an acute accent, returns "=?UTF-8?B?dMOpc3QudHh0?=". See http://www.rfc-editor.org/rfc/rfc2047.txt for more information.
Notes:
  • Only encode strings that contain non-ASCII characters.
  • We progressively cut off a chunk with truncate_utf8(). This is to ensure each chunk starts and ends on a character boundary.
  • Using \n as the chunk separator may cause problems on some systems and may have to be changed to \r\n or \r.
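
The example value can be reproduced with a short sketch of a single RFC 2047 base64 encoded-word. The helper name encode_word is hypothetical, and the sketch deliberately leaves out the chunking of long values that the notes describe.

```python
import base64

def encode_word(text):
    """Wrap non-ASCII text as one RFC 2047 base64 encoded-word."""
    # Pure ASCII values are passed through unchanged.
    if text.isascii():
        return text
    payload = base64.b64encode(text.encode('utf-8')).decode('ascii')
    return '=?UTF-8?B?' + payload + '?='
```

For instance, encode_word('t\u00e9st.txt') yields '=?UTF-8?B?dMOpc3QudHh0?=', while encode_word('test.txt') returns its input untouched.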

decode_entities(text, exclude=[])

Decode all HTML entities (including numerical ones) to regular UTF-8 bytes. Double-escaped entities will only be decoded once ("&amp;lt;" becomes "&lt;", not "<").
Parameters:
  • text - The text to decode entities in.
  • exclude - An array of characters which should not be decoded. For example, array('<', '&', '"'). This affects both named and numerical entities.
DRUPY(BC): This function heavily modified.
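
The single-pass decoding and the exclude list can be illustrated with the sketch below, built on Python's standard html.unescape. The name decode_once and the entity pattern are assumptions; the module's own implementation is described as heavily modified and may behave differently in edge cases.

```python
import html
import re

# Matches decimal, hexadecimal and named entities that end with a semicolon.
_ENTITY = re.compile(r'&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);')

def decode_once(text, exclude=()):
    """Decode each entity in `text` exactly once, skipping excluded characters."""
    def replace(match):
        entity = match.group(0)
        decoded = html.unescape(entity)
        # Keep the entity as written when it is unknown or excluded.
        if decoded == entity or decoded in exclude:
            return entity
        return decoded
    # re.sub never rescans its own output, so "&amp;lt;" stops at "&lt;".
    return _ENTITY.sub(replace, text)
```

Excluding '<', '&' and '"' keeps markup-significant characters escaped while still decoding everything else.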

_decode_entities(prefix, codepoint, original, table, exclude)

Helper function for decode_entities().
DRUPY(BC): This function heavily modified.

_caseflip(matches)

Helper function for case conversion of Latin-1. Used for flipping U+C0-U+DE to U+E0-U+FD and back.
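
In Latin-1, each accented capital sits 0x20 below its lowercase form, so the flip amounts to toggling one bit. The sketch below works on a single character rather than on regex matches. The name flip_latin1_case, the upper bound of U+FE, and the special handling of the two non-letter code points are assumptions, not the module's behaviour.

```python
def flip_latin1_case(char):
    """Swap the case of one Latin-1 accented letter; return others unchanged."""
    code = ord(char)
    # Assumption: U+00D7 (multiplication sign) and U+00F7 (division sign)
    # fall inside the ranges but are not letters, so they are left alone.
    if code in (0xD7, 0xF7):
        return char
    if 0xC0 <= code <= 0xDE or 0xE0 <= code <= 0xFE:
        # Upper and lower forms differ only in bit 0x20.
        return chr(code ^ 0x20)
    return char
```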

drupal_substr(text, start, length=None)

Cut off a piece of a string based on character indices and counts. Follows the same behavior as PHP's own substr() function. Note that for cutting off a string at a known character/substring location, the usage of PHP's normal strpos/substr is safe and much faster.
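
PHP's substr() conventions for negative start and length values can be sketched on a Python str as shown below. The name php_substr is hypothetical, and one simplification is assumed: an out-of-range start yields an empty string here, whereas some PHP versions return FALSE in that case.

```python
def php_substr(text, start, length=None):
    """Character-based substring following PHP substr() index conventions."""
    size = len(text)
    # A negative start counts back from the end of the string.
    begin = max(size + start, 0) if start < 0 else min(start, size)
    if length is None:
        end = size
    elif length < 0:
        # A negative length drops that many characters from the end.
        end = max(size + length, begin)
    else:
        end = min(begin + length, size)
    return text[begin:end]
```

For example, php_substr('abcdef', -3, 1) gives 'd' and php_substr('abcdef', 1, -1) gives 'bcde'.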