* R1C1 Format and Internationalization, plus Relative Offsets
Fix#1704, albeit imperfectly. Excel's implementation of this feature makes it impossible to fix perfectly. I don't know why it was necessary to internationalize R1C1 in the first place - the benefits are so minimal,and the result is worksheets that break when opened in different locales. Ugh. I can't even find complete documentation about the format in different languages; I am using https://answers.microsoft.com/en-us/officeinsider/forum/all/indirect-function-is-broken-at-least-for-excel-in/1fcbcf20-a103-4172-abf1-2c0dfe848e60 as my definitive reference.
This fix concentrates on the original report, using the INDIRECT function; there may be other areas similarly affected. As with ambiguous date formats, PhpSpreadsheet will do a little better than Excel itself when reading spreadsheets with internationalized R1C1 by trying all possibilities before giving up. When it does give up, it will now return `#REF!`, as Excel does, rather than throwing an exception, which is certainly friendlier. Although read now works better, when writing it will use whatever the user specified, so spreadsheets breaking in the wrong locale will still happen.
There were some bugs that turned up as I added test cases, all of them concerning relative addressing in R1C1 format, e.g. `R[+1]C[-1]`. The regexp for validating the format allowed for minus signs, but not plus signs. Also, the relevant functions did not allow for passing the current cell address, which made relative addressing impossible. The code now allows these, and suitable test cases are added.
* Use Locale for Formats, but Not for XML
Implementing a suggestion from @MarkBaker to use the system locale for determining R1C1 format rather than looping through a set of regexes and accepting any that work. This is closer to how Excel itself operates. The assumption we are making is to use the first character of the translated ROW and COLUMN functions. This will not work for Russian or Bulgarian, where each starts with the same letter, but it appears that Russian, at least, still uses R1C1. So our algorithm will not use non-ASCII characters, nor characters where ROW and COLUMN start with the same letter, falling back to R/C in those cases. Turkish falls into that category. Czech uses an accented character for one of the functions, and I'm guessing to use the unaccented character in that case. Polish COLUMN function is NR.KOLUMNY, and I'm guessing to use K in that case.
The function that converts R1C1 references is also used by the XML reader *where the format is always R1C1*, not locale-based (confirmed by successfully opening in Excel an XML spreadsheet when my language is set to French). The conversion code now handles that distinction through the use of an extra parameter. Xml Reader Load Test is duplicated to confirm that spreadsheet is loaded properly whether the locale is English or French. (No, I did not add an INDIRECT function to the Xml spreadsheet.)
Tests CsvIssue2232Test and TranslationTest both changed locale without resetting it when done. That omission was exposed by the new code, and both are now corrected.
* OpenOffice and Gnumeric
OpenOffice and Gnumeric make it much easier to test with other languages - they can be handled with an environment variable. Sensibly, they require R and C as the characters for R1C1 notation regardless of the language. Change code to recognize this difference from Excel.
* Handle Output of ADDRESS Function
One other function has to deal with R1C1 format as a string. Unlike INDIRECT, which receives the string on input, ADDRESS generates the string on output. Ensure that the ADDRESS output is consistent with the INDIRECT input.
ADDRESS expects its 4th arg to be bool, but it can also accept int, and many examples on the net supply it as an int. This had not been handled properly, but is now corrected.
* More Structured Test
I earlier introduced a new test for relative R1C1 addressing. Rewrite it to be clearer.
* Add Row for This to Locale Spreadsheet
It took a while for me to figure out how it all works. I have added a new row (with English value `*RC`) to Translations.xlsx, in the "Lookup and Reference" section of sheet "Excel Functions". By starting the "function name" with an asterisk, it will not be confused with a "real" function (confirmed by a new test). This approach also gives us the flexibility to do something similar if another surprise case occurs in future; in particular, I think this is more flexible than adding this as another option on the "Excel Localisation" sheet. It also means that any errors or omissions in the list below will be handled as with any other translation problem, by updating the spreadsheet without needing to touch any code.
The spreadsheet has the following entries in the *RC row:
- first letter of ROW/COLUMN functions for da, de, es, fi, fr, hu, nl, nb, pt, pt_br, sv
- no value for locales where ROW/COLUMN functions start with same letter - bg, ru, tr
- no value for locales with a multi-part name for ROW and/or COLUMN - it, pl (I had not previously noted Italian as an exception)
- no value for locales where ROW and/or COLUMN starts with a non-ASCII character - cs (this would also apply to bg and ru which are already included under "same letter")
- it does nothing for locales which are defined on the "Excel Localisation" sheet but have no entries yet on the "Excel Functions" sheet (e.g. eu)
Note that all but the first bullet item will continue to use R/C, which leaves them no worse off than they were before this change.
Fix#2934. Null is passed to StringHelper::strtolower which expects string. Same problem appears to be applicable to HLOOKUP.
I noted the following problem in the code, but will document it here as well. Excel's results are not consistent when a non-numeric string is passed as the third parameter. For example, if cell Z1 contains `xyz`, Excel will return a REF error for function `VLOOKUP(whatever,whatever,Z1)`, but it returns a VALUE error for function `VLOOKUP(whatever,whatever,"xyz")`. I don't think PhpSpreadsheet can match both behaviors. For now, it will return VALUE for both, with similar results for other errors.
While studying the returned errors, I realized there is something that needs to be deprecated. `ExcelError::$errorCodes` is a public static array. This means that a user can change its value, which should not be allowed. It is replaced by a constant. Since the original is public, I think it needs to stay, but with a deprecation notice; users can reference and change it, but it will be unused in the rest of the code. I suppose this might be considered a break in functionality (that should not have been allowed in the first place).
The code could stil do with some cleaning up, and better optimisation for memory usage; but all tests are passing... that's for full multi-level sorting (including direction), and allowing for correct sorting of sting/numeric datatypes.
* Initial work on implementing Array-enabled for the HLOOKUP() and VLOOKUP() functions
* In the MATCH() function, we should also use `evaluateArrayArgumentsIgnore()` because the lookupvalue and matchType arguments can be array arguments, but lookupArray is always a dataset matrix
* Start work on array-enabling the Lookup and Reference functions
Requires a new method (`evaluateArrayArgumentsSubsetFrom()`) in the `ArrayEnabled` Trait to handle functions where the arguments that need special array handling are trailing rather than leading arguments
* Start work on array-enabling the Lookup and Reference functions
Requires a new method (`evaluateArrayArgumentsSubsetFrom()`) in the `ArrayEnabled` Trait to handle functions where the arguments that need special array handling are trailing rather than leading arguments
* Split Information functions into a dedicated class and namespace and categorise as Value or Error
* Refactor all error functions into the new ExcelError class
Replace mock tests with real ones when possible. The original tests are all still present; they just take place in a more representative scenario.
After this, there will be 4 remaining uses of mocking. Of these, 3 are needed for scenarios which are otherwise hard to test - WebServiceTest, CellsTest, and SampleCoverageTest. For the other one, AutoFilterTest, I just can't figure out what it's trying to accomplish, so have left it alone.
This change is almost entirely restricted to tests. There is a one-line change in src. When the first argument passed to OFFSET is null or nullstring, the returned value is currently 0. However, according to the documentation for Excel, it should be `#VALUE!`. The code is changed accordingly.
See issue #2123. HLOOKUP needs to do some conversions between column numbers and letters which it had not been doing.
HLOOKUP tests were performed using direct calls to the function in question rather than in the context of a spreadsheet. This contributed to keeping this error obscured even though there were, in theory, sufficient test cases. The tests are changed to perform in spreadsheet context. For the most part, the test cases are unchanged. One of the expected results was wrong; it has been changed, and a new case added to cover the case it was supposed to be testing.
After getting the HLOOKUP tests in order, it turned out that a test using literal arrays which had been succeeding now failed. The array constructed by the literals are considerably different than those constructed using spreadsheet cells; additional code was added to handle this situation.
* Additional unit tests for VLOOKUP() and HLOOKUP()
* Additional unit tests for CHOOSE()
* Unit tests for HYPERLINK() function
* Fix CHOOSE() test for spillage
* Improved Support for INDIRECT, ROW, and COLUMN Functions
This should address issues #1913 and #1993. INDIRECT had heretofore not supported an optional parameter intended to support addresses in R1C1 format which was introduced with Excel 2010. It also had not supported the use of defined names as an argument. This PR is a replacement for #1995, which is currently in draft status and which I will close in a day or two.
The ROW and COLUMN functions also should support defined names. I have added that, and test cases, with the latest push. ROWS and COLUMNS already supported it correctly, but there had been no test cases. Because ROW and COLUMN can return arrays, and PhpSpreadsheet does not support dynamic arrays, I left the existing direct-call tests unchanged to demonstrate those capabilities.
The unit tests for INDIRECT had used mocking, and were sorely lacking (tested only error conditions). They have been replaced with normal, and hopefully adequate, tests. This includes testing globally defined names, as well as locally defined names, both in and out of scope.
The test case in 1913 was too complicated for me to add as a unit test. The main impediments to it are now removed, and its complex situation will, I hope, be corrected with this fix.
INDIRECT can also support a reference of the form Sheetname!localName when localName on its own would be out of scope. That functionality is added. It is also added, in theory, for ROW and COLUMN, however such a construction is rejected by the Calculation engine before passing control to ROW or COLUMN. It might be possible to change the engine to allow this, and I may want to look into that later, but it seems much too risky, and not nearly useful enough, to attempt to address that as part of this change.
Several unusual test cases (leading equals sign, not-quite-as-expected name definition in file, complex indirection involving concatenation and a dropdown list) were suggested by @MarkBaker and are included in this request.
* First step extracting INDIRECT() and OFFSET() to their own classes
* Start building unit tests for OFFSET() and INDEX()
* Named ranges should be handled by the Calculation Engine, not by the implementation of the Excel INDIRECT() function
* When calling the calculation engine to get the range of cells to return, INDIRECT() and OFFSET() should use the instance of the calculation engine for the current workbook to benefit from cached results in that range
There's a couple of minor bugfixes in here; but it's basically just refactoring of the INDIRECT() and OFFSET() Excel functions into their own classes - still needs a lot of work on unit testing; and there's a lot more that could be improved in the code itself (including handling of the a1 flag for R1C1 format in INDIRECT()
* Extract LookupRef\INDEX() into index() method of LookupRef\Matrix class
Additional tests
* Bugfix for returning a column using INDEX()
* Some improvements to ROW() and COLUMN()
* Simplify some of the INDEX() logic, eliminating redundant code
* Start refactoring the Lookup and Reference functions
- COLUMN(), COLUMNS(), ROW() and ROWS()
- LOOKUP(), VLOOKUP() and HLOOKUP()
- Refactor TRANSPOSE() and ADDRESS() functions into their own classes
* Additional unit tests
- LOOKUP()
- TRANSPOSE()
- ADDRESS()