PhpSpreadsheet

Commit Graph

Author	SHA1	Message	Date
oleibman	5de82981d8	Html Reader Not Handling non-ASCII Data Correctly (#2943) * Html Reader Not Handling non-ASCII Data Correctly Fix #2942. Code was changed by #2894 because PHP8.2 will deprecate how it was being done. See linked issue for more details. Dom loadhtml assumes ISO-8859-1 in the absence of a charset attribute or equivalent, and there is no way to override that assumption. Sigh. The suggested replacements are unsuitable in one way or another. I think this will work with minimal disruption (replace ampersand, less than, and greater than with entities representing illegal characters, then use htmlentities, then restore ampersand, less than, and greater than). * Better Implementation Use regexp to escape non-ASCII. Less kludgey, less reliant on the vagaries of the PHP maintainers. * Additional Tests Test non-ASCII outside of cell contents: sheet title, image alt attribute. * Apply Same Change in Second Location Forgot to change loadFromString. * Additional Test Confirm escaped ampersand is handled correctly.	2022-07-16 22:08:44 -07:00
oleibman	c936f1d9f8	Coverage Improvements (#2859) Mostly new tests, some code annotations, some minor code changes: - RichText clone logic is wrong - TextElement doesn't have object properties, doesn't need clone	2022-06-01 08:29:56 -07:00
oleibman	070bc68514	Html Reader Converting Cell Containing 0 to Null String (#2813) Fix #2810. Repairing some Phpstan diagnostics, used `?:` rather than `??` in a few places. 2 different Html modules are affected. Also, Ods Reader, but its problem is with sheet title rather than cell contents. And, as it turns out, Ods Reader was already not handling sheets with a title of `0` correctly - it made a truthy test before setting sheet title. That is now changed to truthy or numeric. Other readers are not susceptible to this problem. Tests are added.	2022-05-10 07:33:45 -07:00
Mark Baker	05466e99ce	Html import dimension conversions (#2152) Allows basic column width conversion when importing from Html that includes UoM... while not overly sophisticated in converting units to MS Excel's column width units, it should allow import without errors. Also provides a general conversion helper class, and allows column width getters/setters to specify a UoM for easier usage	2021-06-11 17:29:49 +02:00
oleibman	cc5c0205d5	Fix for Issue 2029 (Invalid Cell Coordinate A-1) (#2032) * Fix for Issue 2029 (Invalid Cell Coordinate A-1) Fix for #2021. When Html Reader encounters an embedded table, it tries to shift it up a row. It obviously should not attempt to shift it above row 1. @danmodini reported the problem, and suggests the correct solution. This PR implements that and adds a test case. Performing some additional testing, I found that Html Reader cannot handle inline column width or row height set in points rather than pixels (and HTML writer with useInlineCss generates these values in points). It also doesn't handle border style when the border width (which it ignores) is omitted. Fixed and added tests.	2021-04-29 22:59:01 +02:00
Adrien Crivelli	49f87de165	Reduce PHPStan error in tests	2021-04-12 11:10:23 +09:00
oleibman	cb23cca3ec	Avoid Duplicate Titles When Reading Multiple HTML Files (#1829) This issue arose while researching issue #1823. The issue was not a bug; it just required clarification to the author of how to use the software. But, while researching, I discovered that loading html into 2 sheets of a spreadsheet has a problem if the html title tag is the same for the 2 sheets. PhpSpreadsheet would be able to save the resulting file, but Excel would not be able to read it properly because of the duplicate title. The worksheet setTitle method allows for disambiguation in such a circumstance. The html reader passed a parameter indicating "don't disambiguate", but I can't see any harm in changing that to "disambiguate". An extremely simple fix, with tests to back it up.	2021-02-27 15:10:04 +01:00
Adrien Crivelli	6a41381c1d	PSR12 code style	2020-07-26 14:13:11 +09:00
Adrien Crivelli	4739f8b2e7	Merge branch 'readhtml'	2020-07-26 13:11:15 +09:00
Owen Leibman	752a0a5a6c	Scrutinizer Recommendations Two unneeded assignments in tests, one unused parameter in source code.	2020-06-25 23:11:30 -07:00
Owen Leibman	6080c4561d	Improve Coverage for HTML Reader Reader/Html is now covered except for 1 statement. There is some coverage of RichText when you know in advance that the html will expand into a single cell. It is a tougher nut, one that I have not yet cracked, to try to handle rich text while converting unknown html to multiple cells. The original author left this as a TODO, and so for now must I. It made sense to restructure some of the code. There are some changes. - Issue #1532 is fixed (links are now saved when using rowspan). - Colors can now be specified as html color name. To accomplish this, Helper/Html function colourNameLookup was changed from protected to public, and changed to static. - Superfluous empty lines were eliminated in a number of places, e.g. <ul><li>A</li><li>B</li><li>C</li></ul> had formerly caused a wrapped cell to be created with 2 empty lines followed by A, B, and C on separate lines; it will now just have the 3 A/B/C lines, which seems like a more sensible interpretation. - Img alt tag, which had been cast to float, is now used as a string. Private member "encoding" is not used. Functions getEncoding and setEncoding have therefore been marked deprecated. In fact, I was unable to get SecurityScanner to pass any html which is not UTF-8. There are possibly ways of getting around this (in Reader/Html - I have no intention of messing with Security Scanner), as can be seen in my companion pull request for Excel2003 Xml Reader. Doing this would be easier for ASCII-compatible character sets (like ISO-8859-1), than for non-compatible charsets (like UTF-16). I am not convinced that the effort is worth it, but am willing to investigate further. I added a number of tests, creating an Html directory, and moving HtmlTest to that directory.	2020-06-25 22:42:38 -07:00
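
The 5de82981d8 entry says the final approach was to "use regexp to escape non-ASCII" before the markup reaches Dom loadhtml, which otherwise assumes ISO-8859-1. The following is a minimal sketch of that general idea, not the code from the commit: the helper name `escapeNonAscii` is hypothetical, and the sketch assumes the input is valid UTF-8 and that the mbstring and dom extensions are available.

```php
<?php
// Hypothetical helper for illustration only; not PhpSpreadsheet's actual implementation.
// Replaces every non-ASCII code point with a decimal numeric character reference,
// so a parser that assumes ISO-8859-1 sees only ASCII bytes.
function escapeNonAscii(string $html): string
{
    $escaped = preg_replace_callback(
        '/[^\x00-\x7F]/u', // the u flag makes the pattern match whole UTF-8 code points
        static fn (array $m): string => '&#' . mb_ord($m[0], 'UTF-8') . ';',
        $html
    );

    // preg_replace_callback returns null on failure (e.g. invalid UTF-8);
    // fall back to the unmodified input in that case.
    return $escaped ?? $html;
}

// Usage: escape first, then hand the ASCII-only markup to DOMDocument.
$html = '<table><tr><td>café &amp; thé</td></tr></table>';
$dom = new DOMDocument();
$dom->loadHTML(escapeNonAscii($html));
echo $dom->getElementsByTagName('td')->item(0)->textContent, PHP_EOL;
```

Because only code points above 0x7F are rewritten, existing markup and entities such as `&amp;` pass through unchanged, which is the property the commit's last bullet ("Confirm escaped ampersand is handled correctly") tests for.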

11 Commits
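
The 070bc68514 entry traces the "cell containing 0" bug to `?:` being used where `??` was intended. The snippet below is a small illustration of the difference between the two PHP operators, using made-up variable names rather than code from the reader; the last line sketches one way to write a "truthy or numeric" test like the one the entry mentions for the Ods sheet title.

```php
<?php
// Illustrative values only; variable names are not taken from PhpSpreadsheet.
$cellValue = '0';

// Short ternary: falls back whenever the left side is falsy, and '0' is falsy in PHP.
$withShortTernary = $cellValue ?: '';

// Null coalescing: falls back only when the left side is null or unset.
$withNullCoalesce = $cellValue ?? '';

var_dump($withShortTernary); // string(0) ""
var_dump($withNullCoalesce); // string(1) "0"

// One possible "truthy or numeric" check: accepts '0' but still rejects ''.
$title = '0';
$shouldSetTitle = $title || is_numeric($title);
var_dump($shouldSetTitle); // bool(true)
```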