* Html Reader Not Handling non-ASCII Data Correctly
Fix #2942. Code was changed by #2894 because PHP 8.2 will deprecate how it was being done. See the linked issue for more details. DOM loadHTML assumes ISO-8859-1 in the absence of a charset attribute or equivalent, and there is no way to override that assumption. Sigh. The suggested replacements are unsuitable in one way or another. I think this will work with minimal disruption: replace ampersand, less than, and greater than with entities representing illegal characters, then use htmlentities, then restore ampersand, less than, and greater than.
* Better Implementation
Use regexp to escape non-ASCII. Less kludgey, less reliant on the vagaries of the PHP maintainers.
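The idea behind the regexp approach can be sketched in Python, purely for illustration (the real change is PHP, and the pattern and function name below are assumptions). Every non-ASCII character becomes a decimal numeric character reference. HTML parsers resolve these to Unicode code points whatever charset they assume, and markup characters (`&`, `<`, `>`) are ASCII, so they pass through untouched.

```python
import re

# Any single character outside the 7-bit ASCII range (hypothetical pattern).
_NON_ASCII = re.compile(r"[^\x00-\x7f]")


def escape_non_ascii(html: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference.

    ASCII markup (&, <, >) and any existing entities are left as they are.
    """
    return _NON_ASCII.sub(lambda m: f"&#{ord(m.group(0))};", html)
```

For example, `escape_non_ascii("<td>Müller €</td>")` yields `<td>M&#252;ller &#8364;</td>`. That input is now pure ASCII and means the same thing under ISO-8859-1 or UTF-8.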
* Additional Tests
Test non-ASCII outside of cell contents: sheet title, image alt attribute.
* Apply Same Change in Second Location
Forgot to change loadFromString.
* Additional Test
Confirm escaped ampersand is handled correctly.
This was supposed to be mopping up some longstanding chart issues. But one of the sample files exposed a memory leak in Xlsx Writer, unrelated to charts. Since that is my best sample file for this problem, I would like to fix both problems at the same time.
Xlsx Writer for Worksheets calls getRowDimension for all rows on the sheet. As it happens, the sample file had data in the last rows after a huge gap of rows without any data. It correctly did not write anything for the unused rows. However, the call to getRowDimension actually creates a new RowDimension object if it doesn't already exist, and so it wound up creating over a million totally unneeded objects. This caused it to run out of memory when I tried to make a copy of the 8K input file. The logic is changed to call getRowDimension if and only if (there is data in the row or the RowDimension object already exists). It still has to loop through a million rows, but it no longer allocates the unneeded storage.
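The change in the row loop can be illustrated with a small Python model (not PhpSpreadsheet code; `Sheet`, `RowDimension`, and `rows_to_visit` are hypothetical names). The getter still creates missing objects on demand, but the loop only calls it when a row has data or already has a dimension object. Rows in the empty gap therefore cost an iteration but no allocation.

```python
class RowDimension:
    """Stand-in for a per-row settings object (height, visibility, ...)."""

    def __init__(self, row: int) -> None:
        self.row = row


class Sheet:
    def __init__(self) -> None:
        self.cells = {}           # row number -> {column letter: value}
        self.row_dimensions = {}  # row number -> RowDimension, created lazily

    def get_row_dimension(self, row: int) -> RowDimension:
        # Like the getter described above: a missing object is created on access.
        if row not in self.row_dimensions:
            self.row_dimensions[row] = RowDimension(row)
        return self.row_dimensions[row]

    def rows_to_visit(self, highest_row: int) -> list:
        visited = []
        for row in range(1, highest_row + 1):
            # Old behaviour: call get_row_dimension(row) for every row.
            # New behaviour: call it only when it cannot allocate a throwaway object.
            if row in self.cells or row in self.row_dimensions:
                self.get_row_dimension(row)
                visited.append(row)
        return visited
```

With data only in rows 1 and 100000, the loop still runs 100000 times but creates just two `RowDimension` objects.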
As for the Chart problems, fix #1797. This is where the file that caused the memory leak originated. Many of its problems were already resolved by the earlier large set of changes to Charts. However, a few new properties needed to be added to Layout to make things complete (numberFormat code and source-linked, and dLblPos, the position for labels), and autoTitleDeleted needed to be added to Charts.
Also fix #2077 by allowing the format to be specified in the Layout rather than in the DataSeriesValues constructor.
* Move Gridlines from Chart to Axis
This could, I hope, be my last major change to Chart for a while. When I first noticed this problem, I thought it would be a breaking change. However, although this change establishes some deprecations, I don't think it breaks anything. Major and minor gridlines had only been settable by the Chart constructor. This PR moves them where they belong, to Axis (existing Chart constructor code will still work). This allows them to be specified for both the X and Y axes.
Chart is now entirely covered except for 2 statements, one deprecated and one that I just can't figure out. 99.71% for Charts, 88.96% overall. All references to the Chart directory in Phpstan baseline are eliminated.
* Minor Fixes, Unit Tests
Line style color type should default to null not prstClr.
Chart X-axis and Y-axis should always be Axis, never null.
Add some unit tests.
* More Tests, Some Improvements
Make it easier to change line styles, adding an alternate method besides a setter function with at least a dozen parameters.
* Charts Additional Support for Layout and DataSeriesValues
The dLbls tag is more or less the Xml equivalent of the Layout class. It is currently read and written only for the Chart as a whole. It can, however, also be applied to DataSeriesValues. Further, it has properties which are currently ignored, namely label fill, border, and font colors. All of these omissions are handled by this PR. There are other properties which can be applied to the labels, but, for now, only the 3 colors are added.
DataSeriesValues can have effects (like glow). Since DSV now descends from Properties, these are already supported, but support needs to be added to the Reader and Writer to handle them. This PR adds the support.
* Add Unit Tests
Based on new samples.
* Minor Improvements
Slight increase to coverage.
* Keep Calculated String Results Below 32K
This is the result of an investigation into issue #2884 (see also PR #2913). It is, unfortunately, not a fix for the original problem; see the discussion in that PR for why I don't think there is a practical fix for that specific problem at this time.
Excel limits strings to 32,767 characters. We already truncate strings to that length when added to the spreadsheet. However, we have been able to exceed that length as a result of the concatenation operator (Excel truncates); as a result of the CONCATENATE or TEXTJOIN functions (Excel returns #CALC!); or as a result of the REPLACE, REPT, SUBSTITUTE functions (Excel returns #VALUE!). This PR changes PhpSpreadsheet to return the same value as Excel in these cases. Note that Excel2003 truncates in all those cases; I don't think there is a way to differentiate that behavior in PhpSpreadsheet.
However, LibreOffice and Gnumeric do not have that limit; if they have a limit at all, it is much higher. It would be fairly easy to use existing settings to differentiate between Excel and LibreOffice/Gnumeric in this respect. I have not done so in this PR because I am not sure how useful that is, and I can easily see it leading to problems (read in a LibreOffice spreadsheet with a 33K cell and then output to an Excel spreadsheet). Perhaps it should be handled with an additional opt-in setting.
I changed the maximum size from a literal to a constant in the one place where it was already being enforced (Cell/DataType). I am not sure that is the best place for it to be defined; I am open to suggestions.
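The three Excel behaviours listed above (truncate, `#CALC!`, `#VALUE!`) can be summarized in a Python sketch. The function names are hypothetical stand-ins for the operator and worksheet functions, and the sketch is not the PhpSpreadsheet implementation. One assumption to note: Python's `len` counts code points, which may differ from Excel's count for characters outside the Basic Multilingual Plane.

```python
MAX_STRING_LENGTH = 32767  # Excel's string limit, as stated above


def concat_operator(*parts: str) -> str:
    """The & operator: Excel silently truncates the result."""
    return "".join(parts)[:MAX_STRING_LENGTH]


def concatenate(*parts: str) -> str:
    """CONCATENATE (TEXTJOIN behaves the same way): Excel returns #CALC!."""
    result = "".join(parts)
    return "#CALC!" if len(result) > MAX_STRING_LENGTH else result


def rept(text: str, times: int) -> str:
    """REPT (likewise REPLACE and SUBSTITUTE): Excel returns #VALUE!."""
    result = text * times
    return "#VALUE!" if len(result) > MAX_STRING_LENGTH else result
```

For instance, `rept("ab", 16384)` would produce 32,768 characters, one over the limit, so it returns `#VALUE!`.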
* Implement Some Suggestions
... from @MarkBaker.
Fix #2897. We have been relying on iconv/mb_convert_encoding to detect invalid UTF-8, but all techniques designed to validate UTF-8 seem to accept FFFE and FFFF. This PR explicitly converts those characters to FFFD (Unicode substitution character) before validating the rest of the string. It also substitutes one or more FFFD when it detects invalid UTF-8 character sequences.
A comment in the code being changed stated that it doesn't handle surrogates. It is right not to do so. The only case where we should see surrogates is when reading UTF-16. Additional tests are added to an existing test reading a UTF-16 Csv to demonstrate that surrogates are handled correctly, and that FFFE/FFFF are handled reasonably.
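The same two rules can be shown in Python as an illustration (`sanitize_utf8` is a hypothetical name). Python's UTF-8 decoder also accepts U+FFFE and U+FFFF because they are well-formed, so they are mapped to U+FFFD explicitly. Malformed byte sequences become U+FFFD through the `replace` error handler. The order here is the reverse of the PR's, which is harmless because decoding never alters the two noncharacters.

```python
def sanitize_utf8(data: bytes) -> str:
    # Invalid or truncated byte sequences become U+FFFD.
    text = data.decode("utf-8", errors="replace")
    # U+FFFE and U+FFFF (bytes EF BF BE / EF BF BF) decode without error,
    # so substitute them by hand.
    return text.replace("\ufffe", "\ufffd").replace("\uffff", "\ufffd")
```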
* Additional Support for Chart DataSeriesValues
Fix #2863. DataSeriesValues now extends Properties, allowing it to share code in common with Axis and Gridlines. This causes some minor breakages. In particular, line width is now initialized to null instead of Excel's default value, and it is specified in points, as the user would expect from Excel, rather than as the value stored in Xml.
This change:
- adds support for 1 or 2 marker colors.
- adds support for `smoothLine` to DataSeriesValues.
- will determine `catAx` or `valAx` for Axis based on what is read from the Xml when available, rather than guessing based on format. (Another minor break.)
- reads `formatCode` and `sourceLinked` for Axis.
- corrects 2 uses of `$plotSeriesRef` to `$plotSeriesIndex` in Writer/Xlsx/Chart.
- pushes coverage over 90% for Chart (88.70% overall).
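For the line-width change, the conversion between points and the stored value can be sketched in Python. This assumes the Xml value is the DrawingML line width (`w` on `a:ln`), which is expressed in EMUs: 914,400 per inch and 72 points per inch give 12,700 EMU per point. The function names are hypothetical.

```python
EMU_PER_POINT = 12700  # 914400 EMU per inch / 72 points per inch


def points_to_emu(points: float) -> int:
    """User-facing width in points -> integer EMU value for the Xml."""
    return round(points * EMU_PER_POINT)


def emu_to_points(emu: int) -> float:
    """EMU value read from the Xml -> width in points."""
    return emu / EMU_PER_POINT
```

So a 0.75 pt line is stored as 9525, and a stored 28575 reads back as 2.25 pt.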
* Update Change Log
I had updated previously but forgot to stage the member.
* Adopt Some Suggestions
Incorporate some changes suggested by @bridgeplayr.
* Use ChartColor for DSV Fill And Font Text
DataSeriesValues Fill could be a scalar or an array, so I saved it till last.
* Some Final Cleanup
No code changes.
Illustrate even more of the new features in existing sample files.
Deprecate *_ARGB in Properties/ChartColors in favor of *_RGB, because those values use only 6 hex digits. The alpha value is stored separately.
Fix #2908. When support for two-cell anchors was added for drawings, we neglected to adjust the second cell address when rows or columns are added or deleted. It also appears that "twoCell" and "oneCell" were introduced as lower-case literals when support for the editAs attribute was subsequently introduced.
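The fix amounts to shifting both corners of a two-cell anchor instead of only the first. Here is a minimal Python sketch for row insertion. The dict layout and the names `shift_row` and `insert_rows` are hypothetical, and deleting rows that overlap the drawing would need extra handling that is not shown.

```python
import re

_CELL = re.compile(r"^([A-Z]+)(\d+)$")


def shift_row(address: str, before_row: int, count: int) -> str:
    """Move a simple cell address down `count` rows if it is at or below `before_row`."""
    m = _CELL.match(address)
    if m is None:
        raise ValueError(f"not a simple cell address: {address}")
    column, row = m.group(1), int(m.group(2))
    if row >= before_row:
        row += count
    return f"{column}{row}"


def insert_rows(anchor: dict, before_row: int, count: int) -> dict:
    """Adjust a drawing anchor after inserting `count` rows before `before_row`."""
    adjusted = dict(anchor)
    adjusted["from"] = shift_row(anchor["from"], before_row, count)
    if "to" in anchor:  # only two-cell anchors have a second corner
        adjusted["to"] = shift_row(anchor["to"], before_row, count)
    return adjusted
```

Inserting 3 rows before row 5 leaves a `B2:E10` anchor as `B2:E13`. The first corner stays put and the second moves.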
* Handling of #REF! Errors in Subtotal, and More
This PR derives from, and supersedes, PR #2870, submitted by @ndench. The problem reported in the original is that SUBTOTAL does not handle #REF! errors in its arguments properly; however, my investigation has enlarged the scope.
The main problem is in Calculation, and it has a simple fix. When the calculation engine finds a reference to an uninitialized cell, it uses `null` as the value. This is appropriate when the cell belongs to a defined sheet; however, for an undefined sheet, #REF! is more appropriate.
With that fix in place, SUBTOTAL still needs a small fix of its own. It tries to parse its cell reference arguments into an array, but, if the reference does not match the expected format (as #REF! will not), this results in referencing undefined array indexes, with attendant messages. That assignment is changed to be more flexible, eliminating the problem and the messages.
Those 2 fixes are sufficient to ensure that the original problem is resolved. It also resolves a similar problem with some other functions (e.g. SUM). However, it does not resolve it for all functions. Or, to be more particular, many functions will return #VALUE! rather than #REF! if this arises, and the same is true for other errors in the function arguments, e.g. #DIV/0!. This PR does not attempt to address all functions; I need to think of a systematic way to pursue that. However, at least for most MathTrig functions, which validate their arguments using a common method, it is relatively easy to get the function to propagate the proper error result.
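The two rules described above can be modelled in a few lines of Python. The first is `null` for an empty cell on an existing sheet but `#REF!` for a sheet that does not exist. The second is that a SUM-like function passes an error argument through instead of converting it to `#VALUE!`. The workbook-as-dict representation and both function names are illustrative only.

```python
# Excel's standard error values.
ERROR_VALUES = {"#NULL!", "#DIV/0!", "#VALUE!", "#REF!", "#NAME?", "#NUM!", "#N/A"}


def resolve_reference(workbook: dict, sheet: str, cell: str):
    """Value for a cell reference, following the rule described above."""
    if sheet not in workbook:
        return "#REF!"                # the referenced sheet does not exist
    return workbook[sheet].get(cell)  # None for an empty cell on a real sheet


def sum_values(*values):
    """Hypothetical SUM-like function that propagates the first error it meets."""
    total = 0.0
    for value in values:
        if isinstance(value, str) and value in ERROR_VALUES:
            return value
        if value is None:
            continue  # empty cells are ignored
        total += value
    return total
```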
* Arrange Array The Way call_user_func_array Wants
Problem with Php 8.0+: the array passed to call_user_func_array must have int keys before string keys; otherwise Php thinks we are passing positional parameters after keyword parameters.
7 other functions use flattenArrayIndexed, but Subtotal is the only one which uses that result to subsequently pass arguments to call_user_func_array. So the others should not require a change. A specific test is added for SUM to validate that conclusion.
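The required arrangement can be expressed in Python using an insertion-ordered dict as a stand-in for a Php array. Integer keys move to the front, and each group keeps its original relative order. The keys such as `"1.A.1"` are purely illustrative, and `order_for_call` / `is_call_safe` are hypothetical names.

```python
def order_for_call(args: dict) -> dict:
    """Copy with integer keys first, preserving order within each group."""
    int_keyed = {k: v for k, v in args.items() if isinstance(k, int)}
    others = {k: v for k, v in args.items() if not isinstance(k, int)}
    return {**int_keyed, **others}


def is_call_safe(args: dict) -> bool:
    """True if no integer (positional) key follows a string (named) key."""
    seen_string = False
    for key in args:
        if isinstance(key, str):
            seen_string = True
        elif seen_string:
            return False
    return True
```

For example, `{"1.A.1": 10, 0: 5, "1.A.2": 20, 1: 7}` is not call-safe. After `order_for_call` its keys are `[0, 1, "1.A.1", "1.A.2"]`.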
* Change Needed for Hidden Row Filter
Same as change made to Formula Args filter.
This one class consumes a lot of space in Phpstan baseline. The problem is that it is an interface to Jpgraph, which is not maintained in Composer. This means that we have to disable tests involving this module, since we are dealing with very old code in our test suite. This means that we are very unlikely to do any work on this member, so the code error reports are more of a distraction than anything else. Remove them for now, restoring them if we ever solve this problem.
* New Class ChartColor and Refactoring
Chart colors are written to Xml in a different manner than font colors, and there are several variations. It will simplify things to create a new class for them. This PR will make use of the new class in Property/Axis/Gridline glow, shadow, and line colors; in Axis fill color; and in Font underline color (used only in charts). It will be used elsewhere in future; in particular, DataSeriesValues, which I will tackle next, will use it for at least one existing and two new properties. This PR is a refactoring; no functionality is added. Some public functions are moved from Properties to ChartColor, but all of these have been introduced after the last release 1.23, so there isn't really any compatibility break. No tests needed to be revised as a result of the source changes.
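To show why a dedicated class helps, here is a Python sketch of serializing one chart color in each of the three forms mentioned in this log (`srgbClr`, `schemeClr`, `prstClr`), with an optional alpha child. `chart_color_xml` is hypothetical and does not mirror the ChartColor API. It also assumes alpha is given as an opacity percentage, which DrawingML stores in thousandths of a percent (so 50 becomes 50000).

```python
from typing import Optional
from xml.sax.saxutils import quoteattr

COLOR_TYPES = ("srgbClr", "schemeClr", "prstClr")


def chart_color_xml(value: str, color_type: str = "srgbClr", alpha: Optional[int] = None) -> str:
    """Serialize one DrawingML color element, with an optional alpha child."""
    if color_type not in COLOR_TYPES:
        raise ValueError(f"unsupported color type: {color_type}")
    if alpha is None:
        return f"<a:{color_type} val={quoteattr(value)}/>"
    return (
        f"<a:{color_type} val={quoteattr(value)}>"
        f'<a:alpha val="{alpha * 1000}"/>'
        f"</a:{color_type}>"
    )
```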
* Simplify Logic in Xlsx/Writer/Chart
Minor change.
Unit testing now results in 100% coverage for Axis and Properties. All the properties and methods in Gridlines were more or less duplicated in Axis, and these duplications are moved to the common ancestor Properties. So there isn't anything left in Gridlines. PhpSpreadsheet Chart is now over 85% covered (it was below 35% until recently).
Properties are in many cases set to default to null/null-string, rather than the default values they receive from Excel, and are not written to Xml if unchanged. This is consistent with how Excel behaves. A new property `crossBetween` is added to Axis, and, with support for that added to Xlsx Reader and Writer, some minor Sample peculiarities are corrected, in particular, the charts were sometimes slightly truncated on the left and right edges.
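The "write only if changed" rule can be illustrated in Python. A property left at `None` produces no element, so Excel applies its own default, while an explicitly set value such as `crossBetween` is written. The helper name and the dict input are hypothetical.

```python
from typing import Dict, Optional
from xml.sax.saxutils import quoteattr


def optional_elements(props: Dict[str, Optional[str]]) -> str:
    """Write <c:NAME val="..."/> only for properties that were explicitly set."""
    return "".join(
        f"<c:{name} val={quoteattr(value)}/>"
        for name, value in props.items()
        if value is not None  # None means "leave Excel's default in effect"
    )
```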
* Php8.2 Deprecation in Reader/Xlsx
Using `${var}` will be deprecated, with the suggested resolution being to use `{$var}`. This appears to be the only place in PhpSpreadsheet which does this. Some vendor packages will need to change for 8.2 for this and other reasons.
* mb_convert_encoding and HTML_ENTITIES
Also scheduled to be deprecated with 8.2. It appears to have not been needed in PhpSpreadsheet in the first place.
* Ignore square-$-brackets prefix in format string
* Test for square-$-brackets prefix in format string issue fixed
* Fix for phpstan compliance
* Additional assert for checking number format of tested source cell
* Expand Chart Support for schemeClr and prstClr
Fix #2219. Address some, but not all, issues mentioned in #2863.
For Pie Charts, fill color is stored in XML as part of dPt node, which had been ignored by Reader/Xlsx/Chart. Add support for it, including when specified as schemeClr or prstClr rather than srgbClr. Add support for prstClr in other cases where schemeClr is supported.
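Reading a per-point fill regardless of which color form is used can be sketched in Python with ElementTree. The namespaces are the standard DrawingML chart and main namespaces. The `c:dPt/c:spPr/a:solidFill` path follows the structure described above, and `data_point_fills` is a hypothetical name, not the Reader/Xlsx/Chart code.

```python
import xml.etree.ElementTree as ET

NS = {
    "c": "http://schemas.openxmlformats.org/drawingml/2006/chart",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}


def data_point_fills(series_xml: str) -> dict:
    """Map each dPt index to (color element name, val) from its solid fill."""
    root = ET.fromstring(series_xml)
    fills = {}
    for dpt in root.findall("c:dPt", NS):
        idx = int(dpt.find("c:idx", NS).get("val"))
        solid = dpt.find("c:spPr/a:solidFill", NS)
        if solid is None or len(solid) == 0:
            continue  # no solid fill on this data point
        color = solid[0]  # srgbClr, schemeClr, or prstClr
        local_name = color.tag.split("}", 1)[1]
        fills[idx] = (local_name, color.get("val"))
    return fills
```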
* Update Change Log
Add this PR.
File from https://www.rondebruin.nl/win/s2/win003.htm. I have been in conversation with the author, who has no objection to its use. I have not actually opened the file in Excel (at least not with macros enabled); I am using it merely to demonstrate that the ribbon data is read and written correctly. Test added; no source code changed. This should slightly increase coverage for Reader/Xlsx (moderate), Writer/Xlsx (slight), and Spreadsheet (substantial). Note that this file has no Ribbon Bin objects, so some coverage is still lacking.