diff --git a/CHANGELOG.md b/CHANGELOG.md index c2acd357..d68fc896 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,13 +9,23 @@ and this project adheres to [Semantic Versioning](https://semver.org). ### Added -- Implementation of the ISREF() information function +- Implementation of the ISREF() information function. +- Added support for reading "formatted" numeric values from Csv files; although the default behaviour of reading these values as strings is preserved. + + (i.e. a value of "12,345.67" will be read as numeric `12345.67`, not as a string `"12,345.67"`.) + + This functionality is locale-aware, using the server's locale settings to identify the thousands and decimal separators. ### Changed - Gnumeric Reader now loads number formatting for cells. - Gnumeric Reader now correctly identifies selected worksheet. - Some Refactoring of the Ods Reader, moving all formula and address translation from Ods to Excel into a separate class to eliminate code duplication and ensure consistency. +- Make Boolean Conversion in Csv Reader locale-aware when using the String Value Binder. + + This is determined by the Calculation Engine locale setting. + + (i.e. `"Vrai"` will be converted to a boolean `true` if the Locale is set to `fr`.) ### Deprecated @@ -27,7 +37,7 @@ and this project adheres to [Semantic Versioning](https://semver.org). ### Fixed -- Fixed behaviour of XLSX font style vertical align settings +- Fixed behaviour of XLSX font style vertical align settings. - Resolved formula translations to handle separators (row and column) for array functions as well as for function argument separators; and cleanly handle nesting levels. Note that this method is used when translating Excel functions between en and other locale languages, as well as when converting formulae between different spreadsheet formats (e.g. Ods to Excel). 
diff --git a/docs/topics/accessing-cells.md b/docs/topics/accessing-cells.md index 346e5858..9d3fb1cb 100644 --- a/docs/topics/accessing-cells.md +++ b/docs/topics/accessing-cells.md @@ -37,9 +37,7 @@ $spreadsheet->getActiveSheet() ### Creating a new Cell If you make a call to `getCell()`, and the cell doesn't already exist, then -PhpSpreadsheet will (by default) create the cell for you. If you don't want -to create a new cell, then you can pass a second argument of false, and then -`getCell()` will return a null if the cell doesn't exist. +PhpSpreadsheet will create that cell for you. ### BEWARE: Cells assigned to variables as a Detached Reference @@ -532,7 +530,7 @@ types of entered data using a cell's `setValue()` method (the Optionally, the default behaviour of PhpSpreadsheet can be modified, allowing easier data entry. For example, a `\PhpOffice\PhpSpreadsheet\Cell\AdvancedValueBinder` class is available. -It automatically converts percentages, number in scientific format, and +It automatically converts percentages, numbers in scientific format, and dates entered as strings to the correct format, also setting the cell's style information. The following example demonstrates how to set the value binder in PhpSpreadsheet: @@ -577,7 +575,9 @@ $stringValueBinder->setNumericConversion(false) \PhpOffice\PhpSpreadsheet\Cell\Cell::setValueBinder( $stringValueBinder ); ``` -**Creating your own value binder is relatively straightforward.** When more specialised +### Creating your own value binder + +Creating your own value binder is relatively straightforward. 
When more specialised value binding is required, you can implement the `\PhpOffice\PhpSpreadsheet\Cell\IValueBinder` interface or extend the existing `\PhpOffice\PhpSpreadsheet\Cell\DefaultValueBinder` or diff --git a/docs/topics/reading-files.md b/docs/topics/reading-files.md index 6c30f266..38428166 100644 --- a/docs/topics/reading-files.md +++ b/docs/topics/reading-files.md @@ -560,6 +560,44 @@ Xlsx | NO | Xls | NO | Xml | NO | Ods | NO | SYLK | NO | Gnumeric | NO | CSV | YES | HTML | NO + +### Reading formatted Numbers from a CSV File + +Unfortunately, numbers in a CSV file may be formatted as strings. +If that number is a simple integer or float (with a decimal `.` separator) without any thousands separator, then it will be treated as a number. +However, if the value has a thousands separator (e.g. `12,345`), or a decimal separator that isn't a `.` (e.g. `123,45` for a European locale), then it will be loaded as a string with that formatting. +If you want the Csv Reader to convert that value to a numeric when it loads the file, then you need to tell it to do so. The `castFormattedNumberToNumeric()` method lets you do this. + +(Assuming that our server is configured with German locale settings: otherwise it may be necessary to call `setlocale()` before loading the file.) +```php +$inputFileType = 'Csv'; +$inputFileName = './sampleData/example1.de.csv'; + +/** It may be necessary to call setlocale() first if this is not your default locale */ +// setlocale(LC_ALL, 'de_DE.UTF-8', 'deu_deu'); + +/** Create a new Reader of the type defined in $inputFileType **/ +$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader($inputFileType); +/** Enable loading numeric values formatted with German , decimal separator and . 
thousands separator **/ +$reader->castFormattedNumberToNumeric(true); + +/** Load the file to a Spreadsheet Object **/ +$spreadsheet = $reader->load($inputFileName); +``` +This will attempt to load those formatted numeric values as numbers, based on the server's locale settings. + +If you want to load those values as numbers, but also to retain the formatting as a number format mask, then you can pass a boolean `true` as a second argument to the `castFormattedNumberToNumeric()` method to tell the Reader to identify the format masking to use for that value. This option does have an arbitrary limit of 6 decimal places. + +If your Csv file includes other formats for numbers (currencies, scientific format, etc); then you should probably also use the Advanced Value Binder to handle these cases. + +Applies to: + +Reader | Y/N |Reader | Y/N |Reader | Y/N | +----------|:---:|--------|:---:|--------------|:---:| +Xlsx | NO | Xls | NO | Xml | NO | +Ods | NO | SYLK | NO | Gnumeric | NO | +CSV | YES | HTML | NO + ### A Brief Word about the Advanced Value Binder When loading data from a file that contains no formatting information, diff --git a/src/PhpSpreadsheet/Reader/Csv.php b/src/PhpSpreadsheet/Reader/Csv.php index e79f7942..e894e9a4 100644 --- a/src/PhpSpreadsheet/Reader/Csv.php +++ b/src/PhpSpreadsheet/Reader/Csv.php @@ -2,12 +2,14 @@ namespace PhpOffice\PhpSpreadsheet\Reader; +use PhpOffice\PhpSpreadsheet\Calculation\Calculation; use PhpOffice\PhpSpreadsheet\Cell\Cell; use PhpOffice\PhpSpreadsheet\Cell\Coordinate; use PhpOffice\PhpSpreadsheet\Reader\Csv\Delimiter; use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException; use PhpOffice\PhpSpreadsheet\Shared\StringHelper; use PhpOffice\PhpSpreadsheet\Spreadsheet; +use PhpOffice\PhpSpreadsheet\Style\NumberFormat; class Csv extends BaseReader { @@ -91,6 +93,16 @@ class Csv extends BaseReader */ private $testAutodetect = true; + /** + * @var bool + */ + protected $castFormattedNumberToNumeric = false; + + /** + * @var 
bool + */ + protected $preserveNumericFormatting = false; + /** * Create a new CSV Reader instance. */ @@ -294,6 +306,14 @@ class Csv extends BaseReader return $retVal; } + public function castFormattedNumberToNumeric( + bool $castFormattedNumberToNumeric, + bool $preserveNumericFormatting = false + ): void { + $this->castFormattedNumberToNumeric = $castFormattedNumberToNumeric; + $this->preserveNumericFormatting = $preserveNumericFormatting; + } + /** * Loads PhpSpreadsheet from file into PhpSpreadsheet instance. */ @@ -330,6 +350,7 @@ class Csv extends BaseReader $columnLetter = 'A'; foreach ($rowData as $rowDatum) { $this->convertBoolean($rowDatum, $preserveBooleanString); + $numberFormatMask = $this->convertFormattedNumber($rowDatum); if ($rowDatum !== '' && $this->readFilter->readCell($columnLetter, $currentRow)) { if ($this->contiguous) { if ($noOutputYet) { @@ -339,6 +360,10 @@ class Csv extends BaseReader } else { $outRow = $currentRow; } + // Set basic styling for the value (Note that this could be overloaded by styling in a value binder) + $sheet->getCell($columnLetter . $outRow)->getStyle() + ->getNumberFormat() + ->setFormatCode($numberFormatMask); // Set cell value $sheet->getCell($columnLetter . $outRow)->setValue($rowDatum); } @@ -365,9 +390,9 @@ class Csv extends BaseReader private function convertBoolean(&$rowDatum, bool $preserveBooleanString): void { if (is_string($rowDatum) && !$preserveBooleanString) { - if (strcasecmp('true', $rowDatum) === 0) { + if (strcasecmp(Calculation::getTRUE(), $rowDatum) === 0 || strcasecmp('true', $rowDatum) === 0) { $rowDatum = true; - } elseif (strcasecmp('false', $rowDatum) === 0) { + } elseif (strcasecmp(Calculation::getFALSE(), $rowDatum) === 0 || strcasecmp('false', $rowDatum) === 0) { $rowDatum = false; } } elseif ($rowDatum === null) { @@ -375,6 +400,39 @@ class Csv extends BaseReader } } + /** + * Convert numeric strings to int or float values. 
+ * + * @param mixed $rowDatum + */ + private function convertFormattedNumber(&$rowDatum): string + { + $numberFormatMask = NumberFormat::FORMAT_GENERAL; + if ($this->castFormattedNumberToNumeric === true && is_string($rowDatum)) { + $numeric = str_replace( + [StringHelper::getThousandsSeparator(), StringHelper::getDecimalSeparator()], + ['', '.'], + $rowDatum + ); + + if (is_numeric($numeric)) { + $decimalPos = strpos($rowDatum, StringHelper::getDecimalSeparator()); + if ($this->preserveNumericFormatting === true) { + $numberFormatMask = (strpos($rowDatum, StringHelper::getThousandsSeparator()) !== false) + ? '#,##0' : '0'; + if ($decimalPos !== false) { + $decimals = strlen($rowDatum) - $decimalPos - 1; + $numberFormatMask .= '.' . str_repeat('0', min($decimals, 6)); + } + } + + $rowDatum = ($decimalPos !== false) ? (float) $numeric : (int) $numeric; + } + } + + return $numberFormatMask; + } + public function getDelimiter(): ?string { return $this->delimiter; diff --git a/tests/PhpSpreadsheetTests/Reader/Csv/CsvIssue2232Test.php b/tests/PhpSpreadsheetTests/Reader/Csv/CsvIssue2232Test.php index c463c271..f9321102 100644 --- a/tests/PhpSpreadsheetTests/Reader/Csv/CsvIssue2232Test.php +++ b/tests/PhpSpreadsheetTests/Reader/Csv/CsvIssue2232Test.php @@ -2,6 +2,7 @@ namespace PhpOffice\PhpSpreadsheetTests\Reader\Csv; +use PhpOffice\PhpSpreadsheet\Calculation\Calculation; use PhpOffice\PhpSpreadsheet\Cell\Cell; use PhpOffice\PhpSpreadsheet\Cell\IValueBinder; use PhpOffice\PhpSpreadsheet\Cell\StringValueBinder; @@ -31,7 +32,7 @@ class CsvIssue2232Test extends TestCase * @param mixed $b2Value * @param mixed $b3Value */ - public function testEncodings(bool $useStringBinder, ?bool $preserveBoolString, $b2Value, $b3Value): void + public function testBooleanConversions(bool $useStringBinder, ?bool $preserveBoolString, $b2Value, $b3Value): void { if ($useStringBinder) { $binder = new StringValueBinder(); @@ -60,4 +61,41 @@ class CsvIssue2232Test extends TestCase [true, true, 
'FaLSe', 'tRUE'], ]; } + + /** + * @dataProvider providerIssue2232locale + * + * @param mixed $b4Value + * @param mixed $b5Value + */ + public function testBooleanConversionsLocaleAware(bool $useStringBinder, ?bool $preserveBoolString, $b4Value, $b5Value): void + { + if ($useStringBinder) { + $binder = new StringValueBinder(); + if (is_bool($preserveBoolString)) { + $binder->setBooleanConversion($preserveBoolString); + } + Cell::setValueBinder($binder); + } + + Calculation::getInstance()->setLocale('fr'); + + $reader = new Csv(); + $filename = 'tests/data/Reader/CSV/issue.2232.csv'; + $spreadsheet = $reader->load($filename); + $sheet = $spreadsheet->getActiveSheet(); + self::assertSame($b4Value, $sheet->getCell('B4')->getValue()); + self::assertSame($b5Value, $sheet->getCell('B5')->getValue()); + $spreadsheet->disconnectWorksheets(); + } + + public function providerIssue2232locale(): array + { + return [ + [true, true, 'Faux', 'Vrai'], + [true, true, 'Faux', 'Vrai'], + [false, false, false, true], + [false, false, false, true], + ]; + } } diff --git a/tests/PhpSpreadsheetTests/Reader/Csv/CsvNumberFormatLocaleTest.php b/tests/PhpSpreadsheetTests/Reader/Csv/CsvNumberFormatLocaleTest.php new file mode 100644 index 00000000..1ac093c4 --- /dev/null +++ b/tests/PhpSpreadsheetTests/Reader/Csv/CsvNumberFormatLocaleTest.php @@ -0,0 +1,145 @@ +currentLocale = setlocale(LC_ALL, '0'); + + if (!setlocale(LC_ALL, 'de_DE.UTF-8', 'deu_deu')) { + $this->localeAdjusted = false; + + return; + } + + $this->localeAdjusted = true; + + $this->filename = 'tests/data/Reader/CSV/NumberFormatTest.de.csv'; + $this->csvReader = new Csv(); + } + + protected function tearDown(): void + { + if ($this->localeAdjusted && is_string($this->currentLocale)) { + setlocale(LC_ALL, $this->currentLocale); + } + } + + /** + * @dataProvider providerNumberFormatNoConversionTest + * + * @param mixed $expectedValue + */ + public function testNumberFormatNoConversion($expectedValue, string $expectedFormat, 
string $cellAddress): void + { + if (!$this->localeAdjusted) { + self::markTestSkipped('Unable to set locale for testing.'); + } + + $spreadsheet = $this->csvReader->load($this->filename); + $worksheet = $spreadsheet->getActiveSheet(); + + $cell = $worksheet->getCell($cellAddress); + + self::assertSame($expectedValue, $cell->getValue(), 'Expected value check'); + self::assertSame($expectedFormat, $cell->getFormattedValue(), 'Format mask check'); + } + + public function providerNumberFormatNoConversionTest(): array + { + return [ + [ + -123, + '-123', + 'A1', + ], + [ + '12.345,67', + '12.345,67', + 'C1', + ], + [ + '-1.234,567', + '-1.234,567', + 'A3', + ], + ]; + } + + /** + * @dataProvider providerNumberValueConversionTest + * + * @param mixed $expectedValue + */ + public function testNumberValueConversion($expectedValue, string $cellAddress): void + { + if (!$this->localeAdjusted) { + self::markTestSkipped('Unable to set locale for testing.'); + } + + $this->csvReader->castFormattedNumberToNumeric(true); + $spreadsheet = $this->csvReader->load($this->filename); + $worksheet = $spreadsheet->getActiveSheet(); + + $cell = $worksheet->getCell($cellAddress); + + self::assertSame(DataType::TYPE_NUMERIC, $cell->getDataType(), 'Datatype check'); + self::assertSame($expectedValue, $cell->getValue(), 'Expected value check'); + } + + public function providerNumberValueConversionTest(): array + { + return [ + 'A1' => [ + -123, + 'A1', + ], + 'B1' => [ + 1234, + 'B1', + ], + 'C1' => [ + 12345.67, + 'C1', + ], + 'A2' => [ + 123.4567, + 'A2', + ], + 'B2' => [ + 123.456789012, + 'B2', + ], + 'A3' => [ + -1234.567, + 'A3', + ], + ]; + } +} diff --git a/tests/PhpSpreadsheetTests/Reader/Csv/CsvNumberFormatTest.php b/tests/PhpSpreadsheetTests/Reader/Csv/CsvNumberFormatTest.php new file mode 100644 index 00000000..c4c59d01 --- /dev/null +++ b/tests/PhpSpreadsheetTests/Reader/Csv/CsvNumberFormatTest.php @@ -0,0 +1,173 @@ +filename = 'tests/data/Reader/CSV/NumberFormatTest.csv'; + 
$this->csvReader = new Csv(); + } + + /** + * @dataProvider providerNumberFormatNoConversionTest + * + * @param mixed $expectedValue + */ + public function testNumberFormatNoConversion($expectedValue, string $expectedFormat, string $cellAddress): void + { + $spreadsheet = $this->csvReader->load($this->filename); + $worksheet = $spreadsheet->getActiveSheet(); + + $cell = $worksheet->getCell($cellAddress); + + self::assertSame($expectedValue, $cell->getValue(), 'Expected value check'); + self::assertSame($expectedFormat, $cell->getFormattedValue(), 'Format mask check'); + } + + public function providerNumberFormatNoConversionTest(): array + { + return [ + [ + -123, + '-123', + 'A1', + ], + [ + '12,345.67', + '12,345.67', + 'C1', + ], + [ + '-1,234.567', + '-1,234.567', + 'A3', + ], + ]; + } + + /** + * @dataProvider providerNumberValueConversionTest + * + * @param mixed $expectedValue + */ + public function testNumberValueConversion($expectedValue, string $cellAddress): void + { + $this->csvReader->castFormattedNumberToNumeric(true); + $spreadsheet = $this->csvReader->load($this->filename); + $worksheet = $spreadsheet->getActiveSheet(); + + $cell = $worksheet->getCell($cellAddress); + + self::assertSame(DataType::TYPE_NUMERIC, $cell->getDataType(), 'Datatype check'); + self::assertSame($expectedValue, $cell->getValue(), 'Expected value check'); + } + + public function providerNumberValueConversionTest(): array + { + return [ + 'A1' => [ + -123, + 'A1', + ], + 'B1' => [ + 1234, + 'B1', + ], + 'C1' => [ + 12345.67, + 'C1', + ], + 'A2' => [ + 123.4567, + 'A2', + ], + 'B2' => [ + 123.456789012, + 'B2', + ], + 'A3' => [ + -1234.567, + 'A3', + ], + 'B3' => [ + 1234.567, + 'B3', + ], + ]; + } + + /** + * @dataProvider providerNumberFormatConversionTest + * + * @param mixed $expectedValue + */ + public function testNumberFormatConversion($expectedValue, string $expectedFormat, string $cellAddress): void + { + $this->csvReader->castFormattedNumberToNumeric(true, true); + 
$spreadsheet = $this->csvReader->load($this->filename); + $worksheet = $spreadsheet->getActiveSheet(); + + $cell = $worksheet->getCell($cellAddress); + + self::assertSame(DataType::TYPE_NUMERIC, $cell->getDataType(), 'Datatype check'); + self::assertSame($expectedValue, $cell->getValue(), 'Expected value check'); + self::assertSame($expectedFormat, $cell->getFormattedValue(), 'Format mask check'); + } + + public function providerNumberFormatConversionTest(): array + { + return [ + 'A1' => [ + -123, + '-123', + 'A1', + ], + 'B1' => [ + 1234, + '1,234', + 'B1', + ], + 'C1' => [ + 12345.67, + '12,345.67', + 'C1', + ], + 'A2' => [ + 123.4567, + '123.4567', + 'A2', + ], + 'B2' => [ + 123.456789012, + '123.456789', + 'B2', + ], + 'A3' => [ + -1234.567, + '-1,234.567', + 'A3', + ], + 'B3' => [ + 1234.567, + '1234.567', + 'B3', + ], + ]; + } +} diff --git a/tests/data/Reader/CSV/NumberFormatTest.csv b/tests/data/Reader/CSV/NumberFormatTest.csv new file mode 100644 index 00000000..d2ba90a4 --- /dev/null +++ b/tests/data/Reader/CSV/NumberFormatTest.csv @@ -0,0 +1,3 @@ +"-123","1,234","12,345.67" +"123.4567","123.456789012" +"-1,234.567",1234.567 diff --git a/tests/data/Reader/CSV/NumberFormatTest.de.csv b/tests/data/Reader/CSV/NumberFormatTest.de.csv new file mode 100644 index 00000000..47c28453 --- /dev/null +++ b/tests/data/Reader/CSV/NumberFormatTest.de.csv @@ -0,0 +1,3 @@ +"-123","1.234","12.345,67" +"123,4567","123,456789012" +"-1.234,567" diff --git a/tests/data/Reader/CSV/issue.2232.csv b/tests/data/Reader/CSV/issue.2232.csv index 626d0255..aa83ee0c 100644 --- a/tests/data/Reader/CSV/issue.2232.csv +++ b/tests/data/Reader/CSV/issue.2232.csv @@ -1,3 +1,5 @@ 1,2,3 a,FaLSe,b cc,tRUE,cc +dd,Faux,ee +ff,Vrai,gg