Fix #2897. We have been relying on iconv/mb_convert_encoding to detect invalid UTF-8, but all techniques designed to validate UTF-8 seem to accept U+FFFE and U+FFFF. This PR explicitly converts those characters to U+FFFD (the Unicode replacement character) before validating the rest of the string. It also substitutes one or more U+FFFD wherever it detects an invalid UTF-8 byte sequence. A comment in the code being changed stated that it doesn't handle surrogates. It is right not to do so: the only case where we should see surrogates is when reading UTF-16. Additional tests are added to an existing test that reads a UTF-16 CSV, to demonstrate that surrogates are handled correctly and that U+FFFE/U+FFFF are handled reasonably.
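
The two-step approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the function name `sanitizeUtf8Sketch` is hypothetical, and the sketch relies only on the standard mbstring functions `mb_check_encoding`, `mb_substitute_character`, and `mb_convert_encoding`. How many U+FFFD characters replace a given invalid sequence may vary with the PHP version.

```php
<?php

/**
 * Hypothetical helper (not part of the library): replace U+FFFE/U+FFFF
 * and any invalid UTF-8 byte sequences with U+FFFD.
 */
function sanitizeUtf8Sketch(string $text): string
{
    // Step 1: U+FFFE (EF BF BE) and U+FFFF (EF BF BF) pass common UTF-8
    // validators, so map them to U+FFFD (EF BF BD) explicitly.
    $text = str_replace(["\xEF\xBF\xBE", "\xEF\xBF\xBF"], "\xEF\xBF\xBD", $text);

    // Step 2: if what remains is valid UTF-8, nothing more to do.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Otherwise let mbstring substitute U+FFFD for invalid sequences,
    // restoring the previous substitute character afterwards.
    $previous = mb_substitute_character();
    mb_substitute_character(0xFFFD);
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);

    return $clean;
}
```

For example, `sanitizeUtf8Sketch("a\xEF\xBF\xBEb")` yields `a`, U+FFFD, `b`, while a string containing a stray byte such as `"\xFF"` comes back as valid UTF-8 with at least one U+FFFD in place of the bad byte.
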
| | | |
|---|---|---|
| Trend | ||
| CodePageTest.php | ||
| DateTest.php | ||
| DrawingTest.php | ||
| FileTest.php | ||
| FontTest.php | ||
| OLEReadTest.php | ||
| PasswordHasherTest.php | ||
| PasswordReloadTest.php | ||
| StringHelperInvalidCharTest.php | ||
| StringHelperTest.php | ||
| TimeZoneTest.php | ||