PhpSpreadsheet/tests/data/Reader/CSV
oleibman e768cb0f19
CSV - Guess Encoding, Handle Null-string Escape (#1717)

This is in response to issue #1647 (detect CSV character encoding).
First, my tests indicate that mb_detect_encoding doesn't work well
enough for this purpose; regardless, users can always call it
themselves if they deem it useful.
Rolling my own detection is also troublesome, but I can at least:
a. Check for a BOM (UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE).
b. Do some heuristic tests for each of the above encodings.
c. Fall back to a user-specified encoding (default CP1252)
  if a and b don't yield a result.
I think this is probably useful enough to include, and relatively
easy to expand if other potential encodings should be considered.
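
The three steps above (BOM check, heuristics, fallback) can be sketched roughly as follows. This is an illustration in Python, not the code added to Reader/CSV: the function name `guess_encoding`, the helper names, and the specific heuristics (an aligned wide-character newline for UTF-16/UTF-32, a strict decode of non-ASCII bytes for UTF-8) are all assumptions made for the example.

```python
import codecs

# Step a: known BOMs. UTF-32 is listed before UTF-16 because the
# UTF-32LE BOM starts with the same two bytes as the UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]

# Step b (assumed heuristic): how a newline looks in each wide encoding.
WIDE_NEWLINES = [
    ("UTF-32BE", b"\x00\x00\x00\n"),
    ("UTF-32LE", b"\n\x00\x00\x00"),
    ("UTF-16BE", b"\x00\n"),
    ("UTF-16LE", b"\n\x00"),
]


def _has_aligned(data: bytes, pattern: bytes) -> bool:
    """True if pattern occurs at an offset that is a multiple of its width."""
    width = len(pattern)
    return any(
        data[i:i + width] == pattern
        for i in range(0, len(data) - width + 1, width)
    )


def _decodes(data: bytes, encoding: str) -> bool:
    """True if data is valid in the given encoding (strict decode)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def guess_encoding(data: bytes, fallback: str = "CP1252") -> str:
    # a. A BOM, when present, settles the question.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # b. Heuristics: an aligned wide newline plus a clean strict decode.
    for name, newline in WIDE_NEWLINES:
        if _has_aligned(data, newline) and _decodes(data, name):
            return name
    # Non-ASCII bytes that form valid UTF-8 are unlikely to be CP1252 text.
    if any(b >= 0x80 for b in data) and _decodes(data, "UTF-8"):
        return "UTF-8"
    # c. Nothing matched: use the caller-specified fallback.
    return fallback
```

Under these assumptions, pure-ASCII input falls through to the fallback, which is harmless because ASCII bytes decode identically in CP1252 and UTF-8.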

Starting with PHP 7.4, fgetcsv accepts the null (empty) string as its
escape character. This is a much better choice than the PHP
(and PhpSpreadsheet) default of backslash in that it handles the file
in the same manner as Excel does. There is one statement in Reader/CSV
which would be adversely affected if the caller so specified (building
a regular expression under the assumption that escape character is
a single character). Fix that statement appropriately and add tests.
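
To illustrate the kind of guard this implies, here is a hedged sketch in Python rather than PHP. It is not the statement from Reader/CSV: the function name `enclosure_pattern` and the use of a negative lookbehind are assumptions chosen to show why a pattern built around a one-character escape needs a separate branch when the escape is the empty string.

```python
import re


def enclosure_pattern(enclosure='"', escape="\\"):
    """Build a regex matching enclosure characters that are not escaped.

    Hypothetical helper: with an empty escape string there is nothing to
    look behind for, so the lookbehind is omitted entirely.
    """
    enc = re.escape(enclosure)
    if escape == "":
        # No escape mechanism: every enclosure character counts.
        return re.compile(enc)
    # Single-character escape: skip enclosures preceded by it.
    return re.compile("(?<!" + re.escape(escape) + ")" + enc)


line = 'a,"b\\"c",d'  # the middle quote is preceded by a backslash
unescaped_with_backslash = len(enclosure_pattern('"', "\\").findall(line))
unescaped_with_no_escape = len(enclosure_pattern('"', "").findall(line))
```

With a backslash escape the middle quote is skipped, so two enclosure characters are found; with the empty escape all three are counted.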
2020-12-25 17:47:29 +01:00
backslash.csv Allow CSV escape character to be set 2018-05-23 10:31:41 +09:00
contains_html.csv Could not open CSV file containing HTML fragment 2018-06-25 11:12:27 +09:00
csv_without_extension Could not open CSV file containing HTML fragment 2018-06-25 11:12:27 +09:00
empty.csv Check for MIME type to know if CSV reader can read a file 2018-02-05 21:33:23 +09:00
enclosure.csv Better auto-detection of CSV separators 2017-12-28 12:25:37 +09:00
encoding.iso88591.csv Improve Coverage for CSV (#1475) 2020-05-17 18:15:18 +09:00
encoding.utf8.csv Improve Coverage for CSV (#1475) 2020-05-17 18:15:18 +09:00
encoding.utf8bom.csv Improve Coverage for CSV (#1475) 2020-05-17 18:15:18 +09:00
encoding.utf16be.csv Improve Coverage for CSV (#1475) 2020-05-17 18:15:18 +09:00
encoding.utf16le.csv Improve Coverage for CSV (#1475) 2020-05-17 18:15:18 +09:00
encoding.utf32be.csv Improve Coverage for CSV (#1475) 2020-05-17 18:15:18 +09:00
encoding.utf32le.csv Improve Coverage for CSV (#1475) 2020-05-17 18:15:18 +09:00
escape.csv CSV - Guess Encoding, Handle Null-string Escape (#1717) 2020-12-25 17:47:29 +01:00
line_break_escaped_32le.csv CSV Sample File Was Miscoded (#1489) 2020-05-24 19:57:39 +09:00
line_break_in_enclosure.csv Fix CSV delimiter detection on line breaks 2018-10-21 18:23:55 +11:00
line_break_in_enclosure_with_escaped_quotes.csv CSV Sample File Was Miscoded (#1489) 2020-05-24 19:57:39 +09:00
no_delimiter.csv Csv reader avoid notice when the file is empty 2018-10-28 14:16:53 +11:00
premiere.utf8.csv CSV - Guess Encoding, Handle Null-string Escape (#1717) 2020-12-25 17:47:29 +01:00
premiere.utf8bom.csv CSV - Guess Encoding, Handle Null-string Escape (#1717) 2020-12-25 17:47:29 +01:00
premiere.utf16be.csv CSV - Guess Encoding, Handle Null-string Escape (#1717) 2020-12-25 17:47:29 +01:00
premiere.utf16bebom.csv CSV - Guess Encoding, Handle Null-string Escape (#1717) 2020-12-25 17:47:29 +01:00
premiere.utf16le.csv CSV - Guess Encoding, Handle Null-string Escape (#1717) 2020-12-25 17:47:29 +01:00
premiere.utf16lebom.csv CSV - Guess Encoding, Handle Null-string Escape (#1717) 2020-12-25 17:47:29 +01:00
premiere.utf32be.csv CSV - Guess Encoding, Handle Null-string Escape (#1717) 2020-12-25 17:47:29 +01:00
premiere.utf32bebom.csv CSV - Guess Encoding, Handle Null-string Escape (#1717) 2020-12-25 17:47:29 +01:00
premiere.utf32le.csv CSV - Guess Encoding, Handle Null-string Escape (#1717) 2020-12-25 17:47:29 +01:00
premiere.utf32lebom.csv CSV - Guess Encoding, Handle Null-string Escape (#1717) 2020-12-25 17:47:29 +01:00
premiere.win1252.csv CSV - Guess Encoding, Handle Null-string Escape (#1717) 2020-12-25 17:47:29 +01:00
semicolon_separated.csv Infer CSV delimiter if it hasn't been set explicitly 2017-04-20 17:02:03 +09:00
sep.csv Improve Coverage for CSV (#1475) 2020-05-17 18:15:18 +09:00
utf16be.line_break_in_enclosure.csv Improve Coverage for CSV (#1475) 2020-05-17 18:15:18 +09:00