PhpSpreadsheet

Commit Graph

Author	SHA1	Message	Date
oleibman	90422bf1d2	R1C1 Format and Internationalization, plus Relative Offsets (#3052 ) * R1C1 Format and Internationalization, plus Relative Offsets Fix #1704, albeit imperfectly. Excel's implementation of this feature makes it impossible to fix perfectly. I don't know why it was necessary to internationalize R1C1 in the first place - the benefits are so minimal,and the result is worksheets that break when opened in different locales. Ugh. I can't even find complete documentation about the format in different languages; I am using https://answers.microsoft.com/en-us/officeinsider/forum/all/indirect-function-is-broken-at-least-for-excel-in/1fcbcf20-a103-4172-abf1-2c0dfe848e60 as my definitive reference. This fix concentrates on the original report, using the INDIRECT function; there may be other areas similarly affected. As with ambiguous date formats, PhpSpreadsheet will do a little better than Excel itself when reading spreadsheets with internationalized R1C1 by trying all possibilities before giving up. When it does give up, it will now return `#REF!`, as Excel does, rather than throwing an exception, which is certainly friendlier. Although read now works better, when writing it will use whatever the user specified, so spreadsheets breaking in the wrong locale will still happen. There were some bugs that turned up as I added test cases, all of them concerning relative addressing in R1C1 format, e.g. `R[+1]C[-1]`. The regexp for validating the format allowed for minus signs, but not plus signs. Also, the relevant functions did not allow for passing the current cell address, which made relative addressing impossible. The code now allows these, and suitable test cases are added. * Use Locale for Formats, but Not for XML Implementing a suggestion from @MarkBaker to use the system locale for determining R1C1 format rather than looping through a set of regexes and accepting any that work. This is closer to how Excel itself operates. The assumption we are making is to use the first character of the translated ROW and COLUMN functions. This will not work for Russian or Bulgarian, where each starts with the same letter, but it appears that Russian, at least, still uses R1C1. So our algorithm will not use non-ASCII characters, nor characters where ROW and COLUMN start with the same letter, falling back to R/C in those cases. Turkish falls into that category. Czech uses an accented character for one of the functions, and I'm guessing to use the unaccented character in that case. Polish COLUMN function is NR.KOLUMNY, and I'm guessing to use K in that case. The function that converts R1C1 references is also used by the XML reader where the format is always R1C1, not locale-based (confirmed by successfully opening in Excel an XML spreadsheet when my language is set to French). The conversion code now handles that distinction through the use of an extra parameter. Xml Reader Load Test is duplicated to confirm that spreadsheet is loaded properly whether the locale is English or French. (No, I did not add an INDIRECT function to the Xml spreadsheet.) Tests CsvIssue2232Test and TranslationTest both changed locale without resetting it when done. That omission was exposed by the new code, and both are now corrected. * OpenOffice and Gnumeric OpenOffice and Gnumeric make it much easier to test with other languages - they can be handled with an environment variable. Sensibly, they require R and C as the characters for R1C1 notation regardless of the language. Change code to recognize this difference from Excel. * Handle Output of ADDRESS Function One other function has to deal with R1C1 format as a string. Unlike INDIRECT, which receives the string on input, ADDRESS generates the string on output. Ensure that the ADDRESS output is consistent with the INDIRECT input. ADDRESS expects its 4th arg to be bool, but it can also accept int, and many examples on the net supply it as an int. This had not been handled properly, but is now corrected. * More Structured Test I earlier introduced a new test for relative R1C1 addressing. Rewrite it to be clearer. * Add Row for This to Locale Spreadsheet It took a while for me to figure out how it all works. I have added a new row (with English value `RC`) to Translations.xlsx, in the "Lookup and Reference" section of sheet "Excel Functions". By starting the "function name" with an asterisk, it will not be confused with a "real" function (confirmed by a new test). This approach also gives us the flexibility to do something similar if another surprise case occurs in future; in particular, I think this is more flexible than adding this as another option on the "Excel Localisation" sheet. It also means that any errors or omissions in the list below will be handled as with any other translation problem, by updating the spreadsheet without needing to touch any code. The spreadsheet has the following entries in the RC row: - first letter of ROW/COLUMN functions for da, de, es, fi, fr, hu, nl, nb, pt, pt_br, sv - no value for locales where ROW/COLUMN functions start with same letter - bg, ru, tr - no value for locales with a multi-part name for ROW and/or COLUMN - it, pl (I had not previously noted Italian as an exception) - no value for locales where ROW and/or COLUMN starts with a non-ASCII character - cs (this would also apply to bg and ru which are already included under "same letter") - it does nothing for locales which are defined on the "Excel Localisation" sheet but have no entries yet on the "Excel Functions" sheet (e.g. eu) Note that all but the first bullet item will continue to use R/C, which leaves them no worse off than they were before this change.	2022-09-16 08:25:26 -07:00
MarkBaker	4089aede0a	Resolve default values when a null argument is passed for HLOOKUP(), VLOOKUP() and ADDRESS() functions	2021-05-27 12:02:38 +02:00
Mark Baker	70f372d88c	Start refactoring the Lookup and Reference functions (#1912 ) * Start refactoring the Lookup and Reference functions - COLUMN(), COLUMNS(), ROW() and ROWS() - LOOKUP(), VLOOKUP() and HLOOKUP() - Refactor TRANSPOSE() and ADDRESS() functions into their own classes * Additional unit tests - LOOKUP() - TRANSPOSE() - ADDRESS()	2021-03-10 21:18:33 +01:00

Author

SHA1

Message

Date

oleibman

90422bf1d2

R1C1 Format and Internationalization, plus Relative Offsets (#3052 )

* R1C1 Format and Internationalization, plus Relative Offsets

Fix #1704, albeit imperfectly. Excel's implementation of this feature makes it impossible to fix perfectly. I don't know why it was necessary to internationalize R1C1 in the first place - the benefits are so minimal,and the result is worksheets that break when opened in different locales. Ugh. I can't even find complete documentation about the format in different languages; I am using https://answers.microsoft.com/en-us/officeinsider/forum/all/indirect-function-is-broken-at-least-for-excel-in/1fcbcf20-a103-4172-abf1-2c0dfe848e60 as my definitive reference.

This fix concentrates on the original report, using the INDIRECT function; there may be other areas similarly affected. As with ambiguous date formats, PhpSpreadsheet will do a little better than Excel itself when reading spreadsheets with internationalized R1C1 by trying all possibilities before giving up. When it does give up, it will now return `#REF!`, as Excel does, rather than throwing an exception, which is certainly friendlier. Although read now works better, when writing it will use whatever the user specified, so spreadsheets breaking in the wrong locale will still happen.

There were some bugs that turned up as I added test cases, all of them concerning relative addressing in R1C1 format, e.g. `R[+1]C[-1]`. The regexp for validating the format allowed for minus signs, but not plus signs. Also, the relevant functions did not allow for passing the current cell address, which made relative addressing impossible. The code now allows these, and suitable test cases are added.

* Use Locale for Formats, but Not for XML

Implementing a suggestion from @MarkBaker to use the system locale for determining R1C1 format rather than looping through a set of regexes and accepting any that work. This is closer to how Excel itself operates. The assumption we are making is to use the first character of the translated ROW and COLUMN functions. This will not work for Russian or Bulgarian, where each starts with the same letter, but it appears that Russian, at least, still uses R1C1. So our algorithm will not use non-ASCII characters, nor characters where ROW and COLUMN start with the same letter, falling back to R/C in those cases. Turkish falls into that category. Czech uses an accented character for one of the functions, and I'm guessing to use the unaccented character in that case. Polish COLUMN function is NR.KOLUMNY, and I'm guessing to use K in that case.

The function that converts R1C1 references is also used by the XML reader *where the format is always R1C1*, not locale-based (confirmed by successfully opening in Excel an XML spreadsheet when my language is set to French). The conversion code now handles that distinction through the use of an extra parameter. Xml Reader Load Test is duplicated to confirm that spreadsheet is loaded properly whether the locale is English or French. (No, I did not add an INDIRECT function to the Xml spreadsheet.)

Tests CsvIssue2232Test and TranslationTest both changed locale without resetting it when done. That omission was exposed by the new code, and both are now corrected.

* OpenOffice and Gnumeric

OpenOffice and Gnumeric make it much easier to test with other languages - they can be handled with an environment variable. Sensibly, they require R and C as the characters for R1C1 notation regardless of the language. Change code to recognize this difference from Excel.

* Handle Output of ADDRESS Function

One other function has to deal with R1C1 format as a string. Unlike INDIRECT, which receives the string on input, ADDRESS generates the string on output. Ensure that the ADDRESS output is consistent with the INDIRECT input.

ADDRESS expects its 4th arg to be bool, but it can also accept int, and many examples on the net supply it as an int. This had not been handled properly, but is now corrected.

* More Structured Test

I earlier introduced a new test for relative R1C1 addressing. Rewrite it to be clearer.

* Add Row for This to Locale Spreadsheet

It took a while for me to figure out how it all works. I have added a new row (with English value `*RC`) to Translations.xlsx, in the "Lookup and Reference" section of sheet "Excel Functions". By starting the "function name" with an asterisk, it will not be confused with a "real" function (confirmed by a new test). This approach also gives us the flexibility to do something similar if another surprise case occurs in future; in particular, I think this is more flexible than adding this as another option on the "Excel Localisation" sheet. It also means that any errors or omissions in the list below will be handled as with any other translation problem, by updating the spreadsheet without needing to touch any code.

The spreadsheet has the following entries in the *RC row:
- first letter of ROW/COLUMN functions for da, de, es, fi, fr, hu, nl, nb, pt, pt_br, sv
- no value for locales where ROW/COLUMN functions start with same letter - bg, ru, tr
- no value for locales with a multi-part name for ROW and/or COLUMN - it, pl (I had not previously noted Italian as an exception)
- no value for locales where ROW and/or COLUMN starts with a non-ASCII character - cs (this would also apply to bg and ru which are already included under "same letter")
- it does nothing for locales which are defined on the "Excel Localisation" sheet but have no entries yet on the "Excel Functions" sheet (e.g. eu)

Note that all but the first bullet item will continue to use R/C, which leaves them no worse off than they were before this change.

2022-09-16 08:25:26 -07:00

MarkBaker

4089aede0a

Resolve default values when a null argument is passed for HLOOKUP(), VLOOKUP() and ADDRESS() functions

2021-05-27 12:02:38 +02:00

Mark Baker

70f372d88c

Start refactoring the Lookup and Reference functions (#1912 )

* Start refactoring the Lookup and Reference functions
 - COLUMN(), COLUMNS(), ROW() and ROWS()
 - LOOKUP(), VLOOKUP() and HLOOKUP()
 - Refactor TRANSPOSE() and ADDRESS() functions into their own classes

* Additional unit tests
 - LOOKUP()
 - TRANSPOSE()
 - ADDRESS()

2021-03-10 21:18:33 +01:00

3 Commits