Shared/Font is hardly covered in unit tests (as opposed to Style/Font which is completely covered). And it presented some good opportunities for code optimization. I wrote and tested the new unit tests first, then optimized the code and confirmed that everything still works.
There is still a bit of a gap with "exact" measurements. I had tests ready, but had to withdraw them when I discovered they weren't quite portable (see https://github.com/php/php-src/issues/9073).
* Fix Some Pdf Problems
Fix#1747. No support for text rotation in Pdf. That issue actually has a decent workaround, but PhpSpreadsheet should handle it on its own. Mpdf requires the proprietary text-rotate css attribute; Html and Dompdf will use the CSS3 attribute transform:rotate.
Fix#1713. Some paper-size values in PhpSpreadsheet are strings, some are 2-element float arrays. Dompdf accepts strings or 4-element float arrays, where the first 2 elements are always 0. Convert the PhpSpreadsheet array accordingly before passing it to Dompdf.
Some tests had been disabled when Dompdf and Tcpdf were slow to achieve PHP8 compliance. They achieved it some time ago. Re-enable the tests.
* Remove Tcpdf From One Test
No problem with the other tests I added it in for.
* Fix Spreadsheet Copy, Disable Clone, Improve Coverage
This PR was supposed to be merely to increase coverage in Spreadsheet. However, in doing so, I discovered that neither clone nor copy worked correctly. Neither had been covered in the test suite. Copy not only did not work, it broke the source spreadsheet as well. I tried to debug and got nowhere; I even tried using myclabs/deep-copy which is already in use in the test suite, but it failed as well. However, write and reload ought to work just fine for copy. It can't be used for clone; however, since copy does what clone ought to do, there's no reason why clone needs to be used, so __clone is changed to throw an exception if attempted.
One other source change was needed, an obvious bug where an if condition uses 'or' when it should use 'and'. Also, one docblock declaration needed a change. Aside from that, the rest of this PR is test cases, and overall coverage passes 89% for the first time.
* Clone is Okay After All
But copy wasn't, changing it to just return clone. Perhaps save and reload will be needed instead at some point, but not yet.
* An Error I Cannot Reproduce
PHP8.1 unit test says error because GdImage can't be serialized. I can't reproduce this error on any of my test systems. I have no idea why GdImage is even involved. Using try/catch to see if it helps.
* Weird Failures in Github
I thought restoring clone was a good idea. That left me in a state where, after one change, copy/clone no longer worked on Github (unable to reproduce on any of my test systems). After a second change, copy worked but clone didn't, again unable to reproduce. So, reverting to original version - copy does save and reload, clone throws exception.
Fix#2257. Fix#2929. Fix#2935 (probably in a way that will not satisfy the requester). 2257 and 2929 requested changes that ultimately affect the same section of code, so it's appropriate to deal with them together. 2257 requests the ability to make the chart background transparent (so that the Excel gridlines are visible beneath the chart), and the ability to hide an Axis. 2929 requests the ability to set a gradient background on the chart.
* Big Memory Leak in One Test
Many tests leak a little by not issuing disconnectWorksheets at the end. CellMatcherTest leaks a lot by opening a spreadsheet many times. Phpunit reported a high watermark of 390MB; fixing CellMatcherTest brings that figure down to 242MB. I have also changed it to use a formal assertion in many cases where markTestSkipped was (IMO inappropriately) used.
* Another Leak
Issue2301Test lacks a disconnect. Adding one reduces HWM from 242MB to 224.
Fix#2705. Add to Axis class, Reader Xlsx Chart, and Writer Xlsx Chart. Add feature to an existing 32* sample, to an existing 33* sample, and a formal unit test.
Fix#2934. Null is passed to StringHelper::strtolower which expects string. Same problem appears to be applicable to HLOOKUP.
I noted the following problem in the code, but will document it here as well. Excel's results are not consistent when a non-numeric string is passed as the third parameter. For example, if cell Z1 contains `xyz`, Excel will return a REF error for function `VLOOKUP(whatever,whatever,Z1)`, but it returns a VALUE error for function `VLOOKUP(whatever,whatever,"xyz")`. I don't think PhpSpreadsheet can match both behaviors. For now, it will return VALUE for both, with similar results for other errors.
While studying the returned errors, I realized there is something that needs to be deprecated. `ExcelError::$errorCodes` is a public static array. This means that a user can change its value, which should not be allowed. It is replaced by a constant. Since the original is public, I think it needs to stay, but with a deprecation notice; users can reference and change it, but it will be unused in the rest of the code. I suppose this might be considered a break in functionality (that should not have been allowed in the first place).
Fix#2931. If surface charts are written with c:grouping tag, Excel will treat them as corrupt. Also, some 3D-ish Xml tags are required when 2D surface charts are written, else they will be treated as 3D. Also eliminate a duplicate line identified by #2932.
* Html Reader Not Handling non-ASCII Data Correctly
Fix#2942. Code was changed by #2894 because PHP8.2 will deprecate how it was being done. See linked issue for more details. Dom loadhtml assumes ISO-8859-1 in the absence of a charset attribute or equivalent, and there is no way to override that assumption. Sigh. The suggested replacements are unsuitable in one way or another. I think this will work with minimal disruption (replace ampersand, less than, and greater than with entities representing illegal characters, then use htmlentities, then restore ampersand, less than, and greater than).
* Better Implementation
Use regexp to escape non-ASCII. Less kludgey, less reliant on the vagaries of the PHP maintainers.
* Additional Tests
Test non-ASCII outside of cell contents: sheet title, image alt attribute.
* Apply Same Change in Second Location
Forgot to change loadFromString.
* Additional Test
Confirm escaped ampersand is handled correctly.
This was supposed to be mopping up some longstanding chart issues. But one of the sample files exposed a memory leak in Xlsx Writer, unrelated to charts. Since that is my best sample file for this problem, I would like to fix both problems at the same time.
Xlsx Writer for Worksheets calls getRowDimension for all rows on the sheet. As it happens, the sample file had data in the last rows after a huge gap of rows without any data. It correctly did not write anything for the unused rows. However, the call to getRowDimension actually creates a new RowDimension object if it doesn't already exist, and so it wound up creating over a million totally unneeded objects. This caused it to run out of memory when I tried to make a copy of the 8K input file. The logic is changed to call getRowDimension if and only if (there is data in the row or the RowDimension object already exists). It still has to loop through a million rows, but it no longer allocates the unneeded storage.
As for the Chart problems - fix#1797. This is where the file that caused the memory leak originated. Many of its problems were already resolved by the earlier large set of changes to Charts. However, there were a few new properties that needed to be added to Layout to make things complete - numberFormat code and source-linked, and dLblPos (position for labels); and autoTitleDeleted needs to be added to Charts.
Also fix#2077, by allowing the format to be specified in the Layout rather than the DataSeriesValues constructor.
* Move Gridlines from Chart to Axis
This could, I hope, be my last major change to Chart for a while. When I first noticed this problem, I thought it would be a breaking change. However, although this change establishes some deprecations, I don't think it breaks anything. Major and minor gridlines had only been settable by the Chart constructor. This PR moves them where they belong, to Axis (eexisting Chart constructor code will still work). This allows them to be specified from both X and Y axis.
Chart is now entirely covered except for 2 statements, one deprecated and one that I just can't figure out. 99.71% for Charts, 88.96% overall. All references to the Chart directory in Phpstan baseline are eliminated.
* Minor Fixes, Unit Tests
Line style color type should default to null not prstClr.
Chart X-axis and Y-axis should alway be Axis, never null.
Add some unit tests.
* More Tests, Some Improvements
Make it easier to change line styles, adding an alternate method besides a setter function with at least a dozen parameters.
* Charts Additional Support for Layout and DataSeriesValues
The dLbls tag in more or less the Xml equivalent of the Layout class. It is currently read and written only for the Chart as a whole. It can, however, also be applied to DataSeriesValues. Further it has properties which are currently ignored, namely label fill, border, and font colors. All of these omissions are handled by this PR. There are other properties which can be applied to the labels, but, for now, only the 3 colors are added.
DataSeriesValues can have effects (like glow). Since DSV now descends from Properties, these are already supported, but support needs to be added to the Reader and Writer to handle them. This PR adds the support.
* Add Unit Tests
Based on new samples.
* Minor Improvements
Slight increase to coverage.
* Keep Calculated String Results Below 32K
This is the result of an investigation into issue #2884 (see also PR #2913). It is, unfortunately, not a fix for the original problem; see the discussion in that PR for why I don't think there is a practical fix for that specific problem at this time.
Excel limits strings to 32,767 characters. We already truncate strings to that length when added to the spreadsheet. However, we have been able to exceed that length as a result of the concatenation operator (Excel truncates); as a result of the CONCATENATE or TEXTJOIN functions (Excel returns #CALC!); or as a result of the REPLACE, REPT, SUBSTITUTE functions (Excel returns #VALUE!). This PR changes PhpSpreadsheet to return the same value as Excel in these cases. Note that Excel2003 truncates in all those cases; I don't think there is a way to differentiate that behavior in PhpSpreadsheet.
However, LibreOffice and Gnumeric do not have that limit; if they have a limit at all, it is much higher. It would be fairly easy to use existing settings to differentiate between Excel and LibreOffice/Gnumeric in this respect. I have not done so in this PR because I am not sure how useful that is, and I can easily see it leading to problems (read in a LibreOffice spreadsheet with a 33K cell and then output to an Excel spreadsheet). Perhaps it should be handled with an additional opt-in setting.
I changed the maximum size from a literal to a constant in the one place where it was already being enforced (Cell/DataType). I am not sure that is the best place for it to be defined; I am open to suggestions.
* Implement Some Suggestions
... from @MarkBaker.
Fix#2897. We have been relying on iconv/mb_convert_encoding to detect invalid UTF-8, but all techniques designed to validate UTF-8 seem to accept FFFE and FFFF. This PR explicitly converts those characters to FFFD (Unicode substitution character) before validating the rest of the string. It also substitutes one or more FFFD when it detects invalid UTF-8 character sequences.
A comment in the code being change stated that it doesn't handle surrogates. It is right not to do so. The only case where we should see surrogates is reading UTF-16. Additional tests are added to an existing test reading a UTF-16 Csv to demonstrate that surrogates are handled correctly, and that FFFE/FFFF are handled reasonably.