Use Only mb_convert_encoding in StringHelper sanitizeUTF8 (#2994)

* Test if UConverter Exists Without Autoload

Fix #2982. That issue is actually closed, but it did expose a problem. Our test environments all enable php-intl, but that extension isn't a formal requirement for PhpSpreadsheet. Perhaps it ought to be. Nevertheless ...

Using UConverter for string translation solved some problems for us. However, it is only available when php-intl is enabled. The code tests whether the class exists before using it, so this should be no big deal ... except that the people reporting the issue apparently not only lacked php-intl but also had their own autoloader, which throws an exception when a class isn't found. By default, the existence test attempts to autoload UConverter if it is not already loaded. So, on a system without php-intl but with such a custom autoloader, the test itself fails. The code is changed to suppress autoloading when testing for UConverter's existence.

Pending this fix, the workaround for this issue is to enable php-intl.
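
To illustrate the failure mode described above, here is a minimal sketch, not the library's code. The throwing autoloader, the helper probeThrows, and the class name Hypothetical\Missing\Converter are all made up for the example; only class_exists and spl_autoload_register are real PHP functions.

```php
<?php
// Hypothetical strict autoloader, standing in for the reporters' setup:
// it throws instead of silently failing when a class cannot be found.
spl_autoload_register(function (string $class): void {
    throw new RuntimeException("Class not found: {$class}");
});

// Hypothetical helper: reports whether probing a class with autoload
// enabled (the default) lets the autoloader's exception escape.
function probeThrows(string $class): bool
{
    try {
        class_exists($class); // second argument defaults to true (autoload)
        return false;
    } catch (RuntimeException $e) {
        return true;
    }
}

// Passing false as the second argument skips the autoloaders entirely,
// so this check is safe whether or not php-intl is installed.
$hasUConverter = class_exists('UConverter', false);

// A made-up class name that no extension defines.
$threw = probeThrows('Hypothetical\\Missing\\Converter');
```

With autoloading suppressed, the check can only see classes that are already defined, which is enough for a class supplied by an extension.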

* Minor Improvement

Make mb_convert_encoding use same substitution character as UConverter, ensuring consistent results whatever the user's environment.

* And Now That I Figured That Out

Since mb_convert_encoding can now return the same output as UConverter, we don't need UConverter (or iconv) after all in sanitizeUTF8.
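
The mbstring-only approach can be sketched as follows. This is an illustration under assumptions, not the committed method: the function name sanitizeUtf8Sketch is hypothetical, and the try/finally is a choice made for this sketch so that the global setting is restored even if the conversion throws.

```php
<?php
// Hypothetical helper name; not the library's actual method.
function sanitizeUtf8Sketch(string $text): string
{
    $previous = mb_substitute_character(); // remember the caller's setting
    mb_substitute_character(0xFFFD);       // U+FFFD REPLACEMENT CHARACTER (65533)
    try {
        // Converting UTF-8 to UTF-8 replaces ill-formed byte sequences
        // with the substitution character set above.
        return mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    } finally {
        mb_substitute_character($previous); // always restore the global setting
    }
}

// "\xFF" can never appear in well-formed UTF-8.
$clean = sanitizeUtf8Sketch("abc\xFFdef");
```

Because mb_substitute_character changes a process-wide setting, saving and restoring it keeps the helper from affecting other mbstring calls in the same request.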
oleibman 2022-08-12 18:59:28 -07:00 committed by GitHub
parent d13b07ba6e
commit 0492ea6d8a
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 3 additions and 17 deletions


@@ -3,7 +3,6 @@
 namespace PhpOffice\PhpSpreadsheet\Shared;
 use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
-use UConverter;
 class StringHelper
 {
@@ -334,26 +333,13 @@ class StringHelper
     public static function sanitizeUTF8(string $textValue): string
     {
         $textValue = str_replace(["\xef\xbf\xbe", "\xef\xbf\xbf"], "\xef\xbf\xbd", $textValue);
-        if (class_exists(UConverter::class)) {
-            $returnValue = UConverter::transcode($textValue, 'UTF-8', 'UTF-8');
-            if ($returnValue !== false) {
-                return $returnValue;
-            }
-        }
-        // @codeCoverageIgnoreStart
-        // I don't think any of the code below should ever be executed.
-        if (self::getIsIconvEnabled()) {
-            $returnValue = @iconv('UTF-8', 'UTF-8', $textValue);
-            if ($returnValue !== false) {
-                return $returnValue;
-            }
-        }
+        $subst = mb_substitute_character(); // default is question mark
+        mb_substitute_character(65533); // Unicode substitution character
         // Phpstan does not think this can return false.
         $returnValue = mb_convert_encoding($textValue, 'UTF-8', 'UTF-8');
+        mb_substitute_character($subst);
         return $returnValue;
-        // @codeCoverageIgnoreEnd
     }
     /**