PhpSpreadsheet/tests/PhpSpreadsheetTests/Reader/Html
oleibman 5de82981d8
Html Reader Not Handling non-ASCII Data Correctly (#2943)
* Html Reader Not Handling non-ASCII Data Correctly

Fix #2942. The code was changed by #2894 because PHP 8.2 will deprecate the approach it previously used; see the linked issue for more details. DOMDocument::loadHTML assumes ISO-8859-1 in the absence of a charset attribute or equivalent, and there is no way to override that assumption. Sigh. The suggested replacements are each unsuitable in one way or another. I think this will work with minimal disruption: replace ampersand, less-than, and greater-than with entities representing illegal characters, then run htmlentities, then restore ampersand, less-than, and greater-than.

* Better Implementation

Use a regular expression to escape non-ASCII characters. Less kludgey, and less reliant on the vagaries of the PHP maintainers.
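
A minimal sketch of the regexp idea, assuming UTF-8 input and the mbstring extension. The function name escapeNonAscii is hypothetical and this is not the code in the Html reader itself; it only illustrates turning each non-ASCII character into a numeric character reference so that the ISO-8859-1 assumption no longer matters.

```php
<?php

// Hypothetical helper for illustration only; not the library's actual method.
// Replaces every character above U+007F with a decimal numeric character
// reference, leaving ASCII (including existing entities such as &amp;) alone.
function escapeNonAscii(string $html): string
{
    $escaped = preg_replace_callback(
        '/[^\x00-\x7F]/u', // the u modifier makes the pattern match whole UTF-8 characters
        static function (array $matches): string {
            return '&#' . mb_ord($matches[0], 'UTF-8') . ';';
        },
        $html
    );

    // preg_replace_callback returns null on error (for example, malformed UTF-8);
    // this sketch falls back to the unmodified input in that case.
    return $escaped ?? $html;
}

echo escapeNonAscii('<td>é</td>'), PHP_EOL; // prints <td>&#233;</td>
```

The escaped string is plain ASCII, so it can then be handed to DOMDocument::loadHTML without depending on which charset the parser assumes.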

* Additional Tests

Test non-ASCII outside of cell contents: sheet title, image alt attribute.

* Apply Same Change in Second Location

Forgot to change loadFromString.

* Additional Test

Confirm escaped ampersand is handled correctly.
2022-07-16 22:08:44 -07:00
HtmlBorderTest.php Coverage Improvements (#2859) 2022-06-01 08:29:56 -07:00
HtmlHelper.php Reduce PHPStan error in tests 2021-04-12 11:10:23 +09:00
HtmlImageTest.php Html Reader Not Handling non-ASCII Data Correctly (#2943) 2022-07-16 22:08:44 -07:00
HtmlLoadStringTest.php Avoid Duplicate Titles When Reading Multiple HTML Files (#1829) 2021-02-27 15:10:04 +01:00
HtmlTagsTest.php Improve Coverage for HTML Reader 2020-06-25 22:42:38 -07:00
HtmlTest.php Coverage Improvements (#2859) 2022-06-01 08:29:56 -07:00
Issue2029Test.php Fix for Issue 2029 (Invalid Cell Coordinate A-1) (#2032) 2021-04-29 22:59:01 +02:00
Issue2810Test.php Html Reader Converting Cell Containing 0 to Null String (#2813) 2022-05-10 07:33:45 -07:00
Issue2942Test.php Html Reader Not Handling non-ASCII Data Correctly (#2943) 2022-07-16 22:08:44 -07:00