Commit Graph

15 Commits

Author SHA1 Message Date
oleibman 252474c1bd
Scrutinizer Clean Up Tests (#3061)
* Scrutinizer Clean Up Tests

No source code involved.

* Scrutinizer Whack-a-mole

Fixed 17, added 10. Trying again.

* Simplify Some Tests

Eliminate some null assertions.

* Dead Code

Remove 2 statements.
2022-09-14 07:11:20 -07:00
oleibman c3f53854b6
Php/iconv Should Not Treat FFFE/FFFF as Valid (#2910)
Fix #2897. We have been relying on iconv/mb_convert_encoding to detect invalid UTF-8, but all techniques designed to validate UTF-8 seem to accept FFFE and FFFF. This PR explicitly converts those characters to FFFD (Unicode substitution character) before validating the rest of the string. It also substitutes one or more FFFD when it detects invalid UTF-8 character sequences.

A comment in the code being change stated that it doesn't handle surrogates. It is right not to do so. The only case where we should see surrogates is reading UTF-16. Additional tests are added to an existing test reading a UTF-16 Csv to demonstrate that surrogates are handled correctly, and that FFFE/FFFF are handled reasonably.
2022-07-02 08:53:39 -07:00
oleibman 567b9601b0
Allow Csv Reader to Store Null String in Spreadsheet (#2842)
Fix #2840 (and also #2839 but that's Q&A, not an issue). Csv Reader does not populate cells which contain null string. This PR provides an option for the reader to store null strings as it does with any other string.
2022-05-21 07:21:14 -07:00
oleibman 35530502d2
Allow Csv Reader to Treat String as Contents of File (#2792)
See #1285, which went stale and was closed, but recently received some positive feeback. It seems easy to implement, and the only other plaintext file format, Html, allows loading from a string. Those two reasons combined suggest that we should do it.
2022-05-07 08:24:53 -07:00
oleibman 68158c8120
Phpstan Differences from Php7 to Php8, Again (#2665)
These changes have already been implemented twice, and been regressed twice. I'll try once more (with a different approach), then give up ...

As configured, Phpstan running under Php7 reports no errors. However, running under Php8, it reports 100 (!) errors. The vast majority of these are due to two reasons:
- renaming parameters in Php builtin functions in preparation for named parameters.
- using the new class GdImage rather than type resource as the argument type for many image-based functions.

Regardless of the cause, this will be a problem sooner or later. This PR is an attempt to get ahead of that problem. For source members, it mostly adds annotations or updates doc-blocks. Only 2 members have changes to executable code, and these are very minor - BitWise and Writer/Xlsx. For test members, all baseline errors are deleted and the code is fixed. Php7 and Php8 both report no errors with this configuration.
2022-03-11 23:28:30 -08:00
MarkBaker aad91f1d25 Merge branch 'master' into CSV-Reader-Dataype-casting-improvements
# Conflicts:
#	src/PhpSpreadsheet/Reader/Csv.php
2022-03-02 12:43:19 +01:00
MarkBaker f3d5028518 Work on setting up locale-aware formatted number conversion for the Csv Reader
Unit tests for locale-aware boolean conversion for Csv Reader
2022-03-02 08:53:29 +01:00
oleibman f575d2b8b2
Add Ability to Suppress Mac Line Ending Check for CSV Reader (#2623)
With the deprecation of `auto_detect_line_endings` in Php8.1, there have been some tickets (issue #2609 and PR #2438). Although the deprecation message is suppressed, users with a homegrown error handler may still see it. I am not very concerned about that symptom, but I imagine that there will be more similar tickets in future. This PR adds a new property/method to Reader/CSV to allow the user to avoid the deprecated code, at the negligible cost of being unable to read a CSV with Mac line endings even on a Php version that could support it.
2022-03-01 02:01:37 -08:00
oleibman 07271c83aa
Rename Two Test Files (#2459)
* Rename Two Test Files

When I run unit tests only for Reader/Xlsx, phpunit is issuing a deprecation message because the names of 2 files have an extra dot in them and thus don't match the class name in the file. I do not see these warnings when I run the entire test suite.

* Remove Phpstan Annotations

It was a bit difficult to handle a cast from mixed to string.

* Fix Same Phpstan Problem in One Other Test

This is the only other test case that tries to cast mixed to string.
2021-12-25 09:05:54 -08:00
oleibman 2f1f3a19b8
Csv, Boolean, and StringValueBinder (#2374)
See the discussion in PR #2232 which came about 3 months after it was merged. It caused a problem in an unusual situation which did not come to light until the change was part of the new release version. The original PR changed PhpSpreadsheet's behavior to match Excel's for (not case sensitive) strings `TRUE` and `FALSE`. Excel treats the values as boolean, and now so does PhpSpreadsheet.

When StringValueBinder is used, this becomes a tricky situation. The user wants the original strings preserved, including the case of all the letters. This PR changes the behavior of CSV reader as follows:
- If StringValueBinder is not in effect, convert to boolean.
- If StringValueBinder (actually any binder with method getBooleanConversion) is in effect, and the result of getBooleanConversion is true (which is the default in StringValueBinder), leave the value coming out of Csv Reader as the unchanged string.
- Otherwise, convert to boolean.

This should mean that there are no regression problems with StringValueBinder, while allowing PhpSpreadsheet to continue to match Excel in the default situation. No new settings are required.
2021-11-12 00:04:08 -08:00
Adrien Crivelli 69f633420b
Merge branch 'master' into PHP8-Sane-Property-Names 2021-10-31 15:25:01 +09:00
oleibman cc14a48604
Permit CSV Delimiter to be Set to Null (#2288)
* Permit CSV Delimiter to be Set to Null

See issue #2287. A 1-character change. The delimiter variable is defined as nullable, and getDelimiter can return null; setDelimiter should follow suit.

* Scrutinizer Inanity

Are you sure the test always returns null?????
Yes, I'm sure, that's why it's part of the test.
Let's see if we can recode it and miss this "problem".
2021-09-15 12:40:03 -07:00
oleibman 0cd20f3099
Csv Handling of Booleans (and an 8.1 Deprecation) (#2232)
* Csv Handling of Booleans (and an 8.1 Deprecation)

PhpSpreadsheet writes boolean values to a Csv as null-string/1, and treats input values of 'true' and 'false' as if they were strings. On the other hand, Excel writes boolean values to a Csv as TRUE/FALSE, and case-insensitively treats a matching string as boolean on read. This PR changes PhpSpreadsheet to match Excel.

A side-effect of this change is that it fixes behavior incorrectly reported as a bug in PR #2048. That issue was closed, correctly, as user error. The user had altered Csv Writer, including adding ```declare(strict_types=1);```; that declaration was the cause of the error. The "offending" statements, calls to strpbrk and str_replace, will now work correctly whether or not strict_types is in use.

And, just as I was getting ready to push this, the dailies for PHP 8.1 introduced a change deprecating auto_detect_line_endings. Csv Reader uses that setting; it allows it to process a Csv with Mac line endings, which happens to be something that Excel can do. As they say in https://wiki.php.net/rfc/deprecations_php_8_1, where the proposal passed without a single dissenting vote, "These newlines were used by “Classic” Mac OS, a system which has been discontinued in 2001, nearly two decades ago. Interoperability with such systems is no longer relevant." I tend to agree, but I don't know that we're ready to pull the plug yet. I don't see an easy way to emulate that functionality. For now, I have silenced the deprecation notices with at signs. I have also added a test case which will fail when support for that setting is pulled; this will give time to consider alternatives.

* Scrutinizer: Handling ini_set

This could be interesting. It doesn't like not handling an error condition for ini_set. Let's see if this satisfies it.
2021-08-04 07:00:17 -07:00
oleibman 294933c9e5
Update CsvContiguousTest.php 2021-05-16 11:48:12 -07:00
Owen Leibman ae80c12ef0 CSV Reader Enhancements
This PR came about as I pondered how feasible it was to change the default escape character from backslash to null string, since the latter emulates Excel's own actions. Also, surveying issues relating to CSV, it seems that people are often in a situation where the current defaults aren't optimal for them (e.g. they are in a region where semicolon rather than comma is a better default delimiter). My case and that case can both be handled by methods after a reader is constructed. However, the issues also show that many use `IOFactory::load` rather than `new Csv()`, and the methods to affect the defaults are not available in that case.

Adding a static callback that can be invoked by the constructor addresses all these problems. This can be set as part of the user application's normal initialization, and no special attention needs to be paid to CSV loads thereafter, no matter how they are invoked.

This also makes it feasible to use 'guess' as inputEncoding, by providing a new setFallbackEncoding (default CP1252) method to use if none of the heuristic tests pass. There was already the ability to guess the encoding before `$reader->load()`, but not before `IOFactory::load`.

Almost all typehints in Reader/Csv and Reader/Csv/Delimiter are now part of the function signature rather than in the DocBlock. The exceptions are one method in Delimiter which uses a `resource` parameter, and the `canRead` and `load` methods, which must match the signature in IOFactory. I will look into changing those later.

The Csv Reader tests are moved into their own directory. All Phpstan baseline entries involving Csv Reader are eliminated.
2021-05-16 06:05:02 -07:00