* R1C1 Format and Internationalization, plus Relative Offsets
Fix#1704, albeit imperfectly. Excel's implementation of this feature makes it impossible to fix perfectly. I don't know why it was necessary to internationalize R1C1 in the first place - the benefits are so minimal,and the result is worksheets that break when opened in different locales. Ugh. I can't even find complete documentation about the format in different languages; I am using https://answers.microsoft.com/en-us/officeinsider/forum/all/indirect-function-is-broken-at-least-for-excel-in/1fcbcf20-a103-4172-abf1-2c0dfe848e60 as my definitive reference.
This fix concentrates on the original report, using the INDIRECT function; there may be other areas similarly affected. As with ambiguous date formats, PhpSpreadsheet will do a little better than Excel itself when reading spreadsheets with internationalized R1C1 by trying all possibilities before giving up. When it does give up, it will now return `#REF!`, as Excel does, rather than throwing an exception, which is certainly friendlier. Although read now works better, when writing it will use whatever the user specified, so spreadsheets breaking in the wrong locale will still happen.
There were some bugs that turned up as I added test cases, all of them concerning relative addressing in R1C1 format, e.g. `R[+1]C[-1]`. The regexp for validating the format allowed for minus signs, but not plus signs. Also, the relevant functions did not allow for passing the current cell address, which made relative addressing impossible. The code now allows these, and suitable test cases are added.
* Use Locale for Formats, but Not for XML
Implementing a suggestion from @MarkBaker to use the system locale for determining R1C1 format rather than looping through a set of regexes and accepting any that work. This is closer to how Excel itself operates. The assumption we are making is to use the first character of the translated ROW and COLUMN functions. This will not work for Russian or Bulgarian, where each starts with the same letter, but it appears that Russian, at least, still uses R1C1. So our algorithm will not use non-ASCII characters, nor characters where ROW and COLUMN start with the same letter, falling back to R/C in those cases. Turkish falls into that category. Czech uses an accented character for one of the functions, and I'm guessing to use the unaccented character in that case. Polish COLUMN function is NR.KOLUMNY, and I'm guessing to use K in that case.
The function that converts R1C1 references is also used by the XML reader *where the format is always R1C1*, not locale-based (confirmed by successfully opening in Excel an XML spreadsheet when my language is set to French). The conversion code now handles that distinction through the use of an extra parameter. Xml Reader Load Test is duplicated to confirm that spreadsheet is loaded properly whether the locale is English or French. (No, I did not add an INDIRECT function to the Xml spreadsheet.)
Tests CsvIssue2232Test and TranslationTest both changed locale without resetting it when done. That omission was exposed by the new code, and both are now corrected.
* OpenOffice and Gnumeric
OpenOffice and Gnumeric make it much easier to test with other languages - they can be handled with an environment variable. Sensibly, they require R and C as the characters for R1C1 notation regardless of the language. Change code to recognize this difference from Excel.
* Handle Output of ADDRESS Function
One other function has to deal with R1C1 format as a string. Unlike INDIRECT, which receives the string on input, ADDRESS generates the string on output. Ensure that the ADDRESS output is consistent with the INDIRECT input.
ADDRESS expects its 4th arg to be bool, but it can also accept int, and many examples on the net supply it as an int. This had not been handled properly, but is now corrected.
* More Structured Test
I earlier introduced a new test for relative R1C1 addressing. Rewrite it to be clearer.
* Add Row for This to Locale Spreadsheet
It took a while for me to figure out how it all works. I have added a new row (with English value `*RC`) to Translations.xlsx, in the "Lookup and Reference" section of sheet "Excel Functions". By starting the "function name" with an asterisk, it will not be confused with a "real" function (confirmed by a new test). This approach also gives us the flexibility to do something similar if another surprise case occurs in future; in particular, I think this is more flexible than adding this as another option on the "Excel Localisation" sheet. It also means that any errors or omissions in the list below will be handled as with any other translation problem, by updating the spreadsheet without needing to touch any code.
The spreadsheet has the following entries in the *RC row:
- first letter of ROW/COLUMN functions for da, de, es, fi, fr, hu, nl, nb, pt, pt_br, sv
- no value for locales where ROW/COLUMN functions start with same letter - bg, ru, tr
- no value for locales with a multi-part name for ROW and/or COLUMN - it, pl (I had not previously noted Italian as an exception)
- no value for locales where ROW and/or COLUMN starts with a non-ASCII character - cs (this would also apply to bg and ru which are already included under "same letter")
- it does nothing for locales which are defined on the "Excel Localisation" sheet but have no entries yet on the "Excel Functions" sheet (e.g. eu)
Note that all but the first bullet item will continue to use R/C, which leaves them no worse off than they were before this change.
Fix#2934. Null is passed to StringHelper::strtolower which expects string. Same problem appears to be applicable to HLOOKUP.
I noted the following problem in the code, but will document it here as well. Excel's results are not consistent when a non-numeric string is passed as the third parameter. For example, if cell Z1 contains `xyz`, Excel will return a REF error for function `VLOOKUP(whatever,whatever,Z1)`, but it returns a VALUE error for function `VLOOKUP(whatever,whatever,"xyz")`. I don't think PhpSpreadsheet can match both behaviors. For now, it will return VALUE for both, with similar results for other errors.
While studying the returned errors, I realized there is something that needs to be deprecated. `ExcelError::$errorCodes` is a public static array. This means that a user can change its value, which should not be allowed. It is replaced by a constant. Since the original is public, I think it needs to stay, but with a deprecation notice; users can reference and change it, but it will be unused in the rest of the code. I suppose this might be considered a break in functionality (that should not have been allowed in the first place).
* Keep Calculated String Results Below 32K
This is the result of an investigation into issue #2884 (see also PR #2913). It is, unfortunately, not a fix for the original problem; see the discussion in that PR for why I don't think there is a practical fix for that specific problem at this time.
Excel limits strings to 32,767 characters. We already truncate strings to that length when added to the spreadsheet. However, we have been able to exceed that length as a result of the concatenation operator (Excel truncates); as a result of the CONCATENATE or TEXTJOIN functions (Excel returns #CALC!); or as a result of the REPLACE, REPT, SUBSTITUTE functions (Excel returns #VALUE!). This PR changes PhpSpreadsheet to return the same value as Excel in these cases. Note that Excel2003 truncates in all those cases; I don't think there is a way to differentiate that behavior in PhpSpreadsheet.
However, LibreOffice and Gnumeric do not have that limit; if they have a limit at all, it is much higher. It would be fairly easy to use existing settings to differentiate between Excel and LibreOffice/Gnumeric in this respect. I have not done so in this PR because I am not sure how useful that is, and I can easily see it leading to problems (read in a LibreOffice spreadsheet with a 33K cell and then output to an Excel spreadsheet). Perhaps it should be handled with an additional opt-in setting.
I changed the maximum size from a literal to a constant in the one place where it was already being enforced (Cell/DataType). I am not sure that is the best place for it to be defined; I am open to suggestions.
* Implement Some Suggestions
... from @MarkBaker.
These changes have already been implemented twice, and been regressed twice. I'll try once more (with a different approach), then give up ...
As configured, Phpstan running under Php7 reports no errors. However, running under Php8, it reports 100 (!) errors. The vast majority of these are due to two reasons:
- renaming parameters in Php builtin functions in preparation for named parameters.
- using the new class GdImage rather than type resource as the argument type for many image-based functions.
Regardless of the cause, this will be a problem sooner or later. This PR is an attempt to get ahead of that problem. For source members, it mostly adds annotations or updates doc-blocks. Only 2 members have changes to executable code, and these are very minor - BitWise and Writer/Xlsx. For test members, all baseline errors are deleted and the code is fixed. Php7 and Php8 both report no errors with this configuration.
Note that this method is used when translating Excel functions between en and other locale languages, as well as when converting formulae between different spreadsheet formats (e.g. Ods to Excel)
Nor is this a perfect solution, as there may still be issues when function calls have array arguments that themselves contain function calls; but it's still better than the current logic
Implement Array-enabled for ERROR.TYPE() function
Extract ERROR.TYPE() function tests into separate test file
Extract error function tests into separate test files
And thus complete the implemented Information functions
* First steps toward array-enabling the information functions
Also includes moving unit tests out from Functions and into a separate, dedicated Information folder
* Resolve issue with IF(), branch pruning and calculation cache (ensure that we don't convert the if condition to a bool before we've tested to see if it evaluates to an error)
More refactoring
* Start work on array-enabling the Lookup and Reference functions
Requires a new method (`evaluateArrayArgumentsSubsetFrom()`) in the `ArrayEnabled` Trait to handle functions where the arguments that need special array handling are trailing rather than leading arguments
* Split Information functions into a dedicated class and namespace and categorise as Value or Error
* Refactor all error functions into the new ExcelError class
* Add unit test for erroneous translations from Russian to English, and a quick/dirty fix
* Additional translation unit tests with accented characters from Spanish, Bulgarian, Czech and Turkish
* Update Change Log
* Implementation of the SEQUENCE() Excel365 function
Note that the Calculation Engine does not yet support the Spill operator, or spilling functions
* Handle the use-case of step = 0; and tests for exception handling for invalid arguments
* Update Change Log
* Refinement for XIRR
Fix#2469. The algorithm used for XIRR is known not to converge in some cases, some of which are because the value is legitimately unsolvable; for others, using a different guess might help.
The algorithm uses continual guesses at a rate to hopefully converge on the solution. The code in Python package xirr (https://github.com/tarioch/xirr/) suggests a refinement when this rate falls below -1. Adopting this refinement solves the problem for the data in issue 2469 without any adverse effect on the existing tests. My thanks to @tarioch for that refinement.
The data from 2469 is, of course, added to the test cases. The user also mentions that an initial guess equal to the actual result doesn't converge either. A test is also added to confirm that that case now works.
The test cases are changed to run in the context of a spreadsheet rather than by direct calls to XIRR calculation routine. This revealed some data validation errors which are also cleaned up with this PR. This suggests that other financial tests might benefit from the same change; I will look into that.
* More Unit Tests
From https://github.com/RayDeCampo/java-xirr/blob/master/src/test/java/org/decampo/xirr/XirrTest.javahttps://github.com/tarioch/xirr/blob/master/tests/test_math.py
Note that there are some cases where the PHP tests do not converge, but the non-PHP tests do. I have confirmed in each of those cases that Excel does not converge, so the PhpSpreadsheet results are good, at least for now. The discrepancies are noted in comments in the test member.
Replace mock tests with real ones when possible. The original tests are all still present; they just take place in a more representative scenario.
After this, there will be 4 remaining uses of mocking. Of these, 3 are needed for scenarios which are otherwise hard to test - WebServiceTest, CellsTest, and SampleCoverageTest. For the other one, AutoFilterTest, I just can't figure out what it's trying to accomplish, so have left it alone.
This change is almost entirely restricted to tests. There is a one-line change in src. When the first argument passed to OFFSET is null or nullstring, the returned value is currently 0. However, according to the documentation for Excel, it should be `#VALUE!`. The code is changed accordingly.
* Handle a wildcard match that contains a forward slash in the pattern by adding / to the delimiter list of preg_quote
* Fix SUMIF doing a wildcard match on empty cells (NULL)
* Fix compare logic to return false when value is an empty string or NULL (Verified against LibreOffice SUMIF and MATCH handling of empty cells)
See issue #2123. HLOOKUP needs to do some conversions between column numbers and letters which it had not been doing.
HLOOKUP tests were performed using direct calls to the function in question rather than in the context of a spreadsheet. This contributed to keeping this error obscured even though there were, in theory, sufficient test cases. The tests are changed to perform in spreadsheet context. For the most part, the test cases are unchanged. One of the expected results was wrong; it has been changed, and a new case added to cover the case it was supposed to be testing.
After getting the HLOOKUP tests in order, it turned out that a test using literal arrays which had been succeeding now failed. The array constructed by the literals are considerably different than those constructed using spreadsheet cells; additional code was added to handle this situation.
* isFormula Referencing Sheet With Space in Title
See issue #2304. User sets a cell to `ISFORMULA(cell)`, where `cell` exists on a sheet whose title contains a space, and receives an error. Coordinates are not being passed correctly to Functions::isFormula; in particular, the sheet name is not enclosed in apostrophes, e.g. `Sheet Name!A1` rather than `'Sheet Name'!A1`. (Note that sheet name was not specified in Cell; PhpSpreadsheet adds it before calling isFormula.) Sheets without embedded spaces (or other non-word characters) are handled correctly with or without apostrophes, but spaces require surrounding apostrophes.
As part of this investigation, I determined that Excel handles defined names and cell ranges in ISFORMULA (subject to spills), and that PhpSpreadsheet does not. It is changed to handle them. In the absence of spill support, it will use only the first cell in the range.
Existing tests for ISFORMULA used mocking unneccesarily. They are moved to a separate test member, and mocking is no longer used.
* PhpUnit and Jpgraph
35_Char_render.php had previously been a problem only for PHP8+. It is now a problem for PHP7.4, and will therefore be skipped all the time.
Per suggestion from @MarkBaker.
WildcardMatch did not handle double tilde correctly. It has been changed to do so and its logic simplified (and commented).
Existing AutoFilter test covered this situation, but I added a test for MATCH as well.
* Improve Identification of Samples in Coverage Report
The Phpunit coverage report currently contains bullet items like `PhpOffice\PhpSpreadsheetTests\Helper\SampleTest\testSample with data set "49"`. This extremely simple change takes advantage of Phpunit's ability to accept an array with keys which are either strings or integers, by using the sample filenames as the array keys rather than sequential but otherwise meaningless integers (e.g. `49` in the earlier cited item). The bullet item will now read `PhpOffice\PhpSpreadsheetTests\Helper\SampleTest\testSample with data set "Basic/38_Clone_worksheet.php"`.
* Fix for Issue 2158 (AverageIf Calculation Problem)
Issue #2158 reports an error calculating AverageIf because a function returns null rather than a string. There turn out to be several components to this problem:
- The nominal fix to the problem is to add some null-to-nullstring coercion in DatabaseAbstract.
- This fixes the error, but does not necessarily lead to the correct result because buildQuery treats values of null and null-string identically, whereas Excel does not. So change that to treat null-string as any other string.
- But that doesn't lead to the correct result either. That's because Functions/ifCondition recognizes a null string, but then continues to (over-)process it until it returns the wrong result. Fix this problem in conjunction with the other two, and we finally get the correct result.
A new unit test is added for AVERAGEIF, and new test cases are added for SUMIF. In each case, there are complementary tests for conditions of null and null-string, and the results agree with Excel. There may or may not be value in adding new tests to other functions, and I will be glad to do so for any functions which you care to identify, but no existing tests broke as a result of these changes.
* PHP8.1 Deprecation Passing Null to String Function
For each of the files in this PR, one or more statements can pass a null to string functions like strlower. This is deprecated in PHP8.1, and, when deprecated messages are enabled, causes many tests to error out. In every case, use coercion to pass null string rather than null.
* TextData - Minor Changes, Test Coverage
Per agreement on a previous push, I looked into standardizing the initialization of the TextData functions (like Engineering and MathTrig), with particular regard for avoiding multiple later null coercions. This simplifies the code quite a bit. This PR also increases coverage to 100% for all TextData modules. All entries in Phpstan baseline for non-deprecated TEXTDATA functions are removed. There were some minor bugfixes.
Whereas Excel (and Gnumeric) treat booleans when supplied as strings as 'TRUE' or 'FALSE', ODS treats them as '1' or '0'. Unlike Excel, ODS generally does not allow bool for int arguments; it does, however, allow them for FIND and SEARCH. ODS allows boolean for into for SUBSTITUTE even though Excel doesn't. ODS allows bool for string for NUMBERVALUE and VALUE even though Excel doesn't. ODS accepts 0 as an argument for CHAR; Excel doesn't. Most of this seems like random decisions on the part of the developers; I've done my best to follow the products in each case. There is a new test member devoted to ODS tests.
Gnumeric has an anomaly vis-a-vis the others - if length is supplied to LEFT/MID/RIGHT as null, Gnumeric treats it as 0 rather than 1.
All tests now take place in the context of a spreadsheet ...
Except for RETURNSTRING, which is not the implementation of an Excel function, and is referred to in the rest of PhpSpreadsheet only in the unit tests for itself. It should probably be deprecated, but that is not part of this PR, just in case there is some reason for it that I couldn't discern.
I have tried to make the first line of each doc block identify the Excel function name rather than its name in PhpSpreadsheet. I think it makes things more comprehensible.
Some tests call Settings::setLocale, but there was no Settings::getLocale. At the end of the tests which do it, they invoke setLocale('EN-US'), which, in a practical sense, is sufficient. However, in theory it would be better for them to get the current locale before changing it, then changing it back to the original when the time came. I have added getLocale and made the appropriate testing change.
The CHAR function took an interesting turn. One can set the value of a cell to, say, CHAR(2), the ASCII/UTF-8 representation of a control character, which is not legal in certain contexts. The only Reader/Writer that could handle this without problems is Xls, which deals with binary data all the time. However, if you tried to write it to Xlsx, Excel would not be able to open the resulting file because of what it considers an illegal character. I changed the Xlsx writer to escape such characters when writing the value of a string function. I did not make any other changes to the Xlsx writer - it seems to me that setting a cell to CHAR(2) is legitimate, but setting it to say `"\x02"` seems less likely to be legitimate, so the latter will still fail (although `="\x02"` should work). The Xlsx reader already supports the escape mechanism that I added to the writer.
CHAR control character and Ods - not supported by either Reader or Writer. I did not attempt to add this now. There is lots still missing from ODS, and this item just can't be a high priority amongst all of those.
CHAR control character and Csv - it is supported by reader and writer if the file has a csv extension. However, trying to guess the mime type without an extension - the control character makes mime_get_type guess application/octet-stream, and PhpSpreadsheet therefore thinks that Csv can't read it.
CHAR control character and Html. Actual use of the control character in the file is subject to the same problems as Xml (i.e. Xlsx and Ods). It wasn't terribly difficult to get the Html Writer to change `"\x02"` to "``". I believe that this is technically legal; however, DOMDocument.loadHTML rejects it as an illegal entity, and I am not convinced that it is wrong to do so, so I haven't changed the Html writer.
* Scrutinizer
Correct 3 minor errors.
* Initia work on differentiating between empty arguments and null arguments passed to Excel functions
Previously we always passed a null value for an empty argument (i.e. where there was an argument separator in the function call without an argument.... PHP doesn't support empty arguments, so we needed to provide some value but then it wasn't possible to differentiate between a genuine null argument (either a literal null, or a null cell value) and the null that we were passing to represent an empty argument value.
This change evaluates empty arguments within the calculation engine, and instead of passing a null, it reads the signature of the required Excel function, and passes the default value for that argument; so now a null argument really does mean a null value argument.
* If the Excel function implementation doesn't accept any arguments; or once we reach a variadic argument, or try to pass more arguments than the method supports in its signature, then there's no point in checking for defaults, and to do so will lead to PHP errors, so break out of the default replacement loop
See issue #2116. Code for handling end of month (method couponFirstPeriodDate) needed a fix. Fixed it, confirmed it covered the reported issue with no regression problems. Then added some extra similar tests to all the callers of couponFirstPeriodDate, and ...
One new test, in COUPDAYSNC, does not agree with Excel. It also does not agree with LibreOffice. It does, however, agree with Gnumeric, and with my (hardly guaranteed) hand calculation of what the result should be. So, I'm going with it (and have added an appropriate comment to the test data). I'm glad to discuss the matter with anyone more familiar than I with how this is supposed to work - those 360-day years are killers.
* Additional unit tests for VLOOKUP() and HLOOKUP()
* Additional unit tests for CHOOSE()
* Unit tests for HYPERLINK() function
* Fix CHOOSE() test for spillage