PhpSpreadsheet/tests/data/Reader/HTML
oleibman 5de82981d8
Html Reader Not Handling non-ASCII Data Correctly (#2943)
* Html Reader Not Handling non-ASCII Data Correctly

Fix #2942. The code was changed by #2894 because PHP 8.2 will deprecate the way it was previously done; see the linked issue for more details. DOMDocument::loadHTML assumes ISO-8859-1 in the absence of a charset attribute or equivalent, and there is no way to override that assumption. Sigh. The suggested replacements are each unsuitable in one way or another. I think this will work with minimal disruption: replace ampersand, less-than, and greater-than with entities representing illegal characters, then use htmlentities, then restore ampersand, less-than, and greater-than.

* Better Implementation

Use a regular expression to escape non-ASCII characters. Less kludgy, less reliant on the vagaries of the PHP maintainers.

* Additional Tests

Test non-ASCII outside of cell contents: sheet title, image alt attribute.

* Apply Same Change in Second Location

Forgot to change loadFromString.

* Additional Test

Confirm escaped ampersand is handled correctly.
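
The regexp approach described above can be sketched roughly as follows. This is a Python illustration of the general technique, not the PHP code from this commit; the function name `escape_non_ascii` and the sample string are hypothetical. It assumes the input is already decoded text and rewrites every non-ASCII character as a decimal numeric character reference, so a parser that defaults to ISO-8859-1 only ever sees ASCII bytes.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def escape_non_ascii(html: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference.

    ASCII characters, including '&', '<', and '>', are left untouched,
    so existing markup and entities such as '&amp;' pass through unchanged.
    """
    return NON_ASCII.sub(lambda m: f"&#{ord(m.group(0))};", html)


# Hypothetical sample: a table cell with accented text and an escaped ampersand.
sample = "<td>R\u00e9sum\u00e9 &amp; caf\u00e9</td>"
escaped = escape_non_ascii(sample)
print(escaped)  # <td>R&#233;sum&#233; &amp; caf&#233;</td>
```

Because only characters above U+007F are rewritten, an already escaped ampersand stays as it is, which is the behaviour the additional test is meant to confirm.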
2022-07-16 22:08:44 -07:00
badhtml.html Improve Coverage for HTML Reader 2020-06-25 22:42:38 -07:00
csv_with_angle_bracket.csv Make HTML checks more strict 2016-11-16 22:21:30 +09:00
image.jpg [Feature] Html reader improvements (#884) 2019-02-16 23:11:16 +01:00
rowspan.html Best effort to support invalid colspan values in HTML reader 2019-07-27 23:31:23 -07:00
utf8chars.html Html Reader Not Handling non-ASCII Data Correctly (#2943) 2022-07-16 22:08:44 -07:00