Commit Graph

40 Commits

Author SHA1 Message Date
oleibman a2b2984104
Document Charset Restriction for Html/Xml Reader (#3068)
Fix #1681, although probably not to the originator's satisfaction. Html and Xml readers will handle documents only with a charset of UTF-8. This PR documents that restriction. No change to source code; see the original issue for explanation.
2022-09-19 06:49:01 -07:00
oleibman 567b9601b0
Allow Csv Reader to Store Null String in Spreadsheet (#2842)
Fix #2840 (and also #2839 but that's Q&A, not an issue). Csv Reader does not populate cells which contain null string. This PR provides an option for the reader to store null strings as it does with any other string.
2022-05-21 07:21:14 -07:00
oleibman 35530502d2
Allow Csv Reader to Treat String as Contents of File (#2792)
See #1285, which went stale and was closed, but recently received some positive feeback. It seems easy to implement, and the only other plaintext file format, Html, allows loading from a string. Those two reasons combined suggest that we should do it.
2022-05-07 08:24:53 -07:00
oleibman f575d2b8b2
Add Ability to Suppress Mac Line Ending Check for CSV Reader (#2623)
With the deprecation of `auto_detect_line_endings` in Php8.1, there have been some tickets (issue #2609 and PR #2438). Although the deprecation message is suppressed, users with a homegrown error handler may still see it. I am not very concerned about that symptom, but I imagine that there will be more similar tickets in future. This PR adds a new property/method to Reader/CSV to allow the user to avoid the deprecated code, at the negligible cost of being unable to read a CSV with Mac line endings even on a Php version that could support it.
2022-03-01 02:01:37 -08:00
JRK d0c9ea340e
Fix documentation, instantiation example (#2564) 2022-02-07 17:37:07 +01:00
oleibman 580c741c8e
Improve PDF Support for Page Size and Orientation (#2410)
Fix #1691. PhpSpreadsheet allows the setting of different page size and orientation on each worksheet. It also allows the setting of page size and orientation on the PDF writer. It isn't clear which is supposed to prevail when the two are in conflict. In the cited issue, the user expects the PDF writer setting to prevail, and I tend to agree. Code is changed to do this, and handling things in this manner is now explicitly documented.

PhpSpreadsheet uses a default paper size of Letter, and a default orientation of Default (which Excel treats as Portrait). New static routines are added to change the default for sheets created subsequent to such calls. This could allow users to configure these defaults better for their environments. The new functions are added to the documentation.
2021-12-01 23:00:02 -08:00
Adrien Crivelli 69f633420b
Merge branch 'master' into PHP8-Sane-Property-Names 2021-10-31 15:25:01 +09:00
Mark Baker 19724e3217
Reader writer flags (#2136)
* Use of passing flags with Readers to identify whether speacial features such as loading charts should be enabled; no need to instantiate a reader and manually enable it before loading any more.

This is in preparation for supporting new "boolean" Reaer/Writer features, such as pivot tables

* Use of passing flags with Writers to identify whether speacial features such as loading charts should be enabled; no need to instantiate a writer and manually enable it before loading any more.

* Update documentation with details of changes to the StringValueBinder
2021-06-04 13:45:32 +02:00
Mark Baker 781b2470f6
Documentation updates (#2131)
* Update notes in documentation for memory sizing on 32-bit and 64-bit PHP versions

* Additional notes on the fact that PHPSpreadsheet does not change cell addresses when loading a spreadsheet using a Read Filter
2021-05-30 12:27:56 +02:00
oleibman d21d943d99
Merge branch 'master' into csvdflts 2021-05-29 23:38:53 -07:00
Athena Metis e400b35122 Update reading-and-writing-to-file.md
Added a note about formulas still being calculated where column autosizing is turned on, even if pre-calculation is set to false. This is true at least for the Xlsx writer but probably others to if they use calculateColumnWidths from Worksheet/Worksheet.php
2021-05-26 15:33:01 -07:00
Owen Leibman ae80c12ef0 CSV Reader Enhancements
This PR came about as I pondered how feasible it was to change the default escape character from backslash to null string, since the latter emulates Excel's own actions. Also, surveying issues relating to CSV, it seems that people are often in a situation where the current defaults aren't optimal for them (e.g. they are in a region where semicolon rather than comma is a better default delimiter). My case and that case can both be handled by methods after a reader is constructed. However, the issues also show that many use `IOFactory::load` rather than `new Csv()`, and the methods to affect the defaults are not available in that case.

Adding a static callback that can be invoked by the constructor addresses all these problems. This can be set as part of the user application's normal initialization, and no special attention needs to be paid to CSV loads thereafter, no matter how they are invoked.

This also makes it feasible to use 'guess' as inputEncoding, by providing a new setFallbackEncoding (default CP1252) method to use if none of the heuristic tests pass. There was already the ability to guess the encoding before `$reader->load()`, but not before `IOFactory::load`.

Almost all typehints in Reader/Csv and Reader/Csv/Delimiter are now part of the function signature rather than in the DocBlock. The exceptions are one method in Delimiter which uses a `resource` parameter, and the `canRead` and `load` methods, which must match the signature in IOFactory. I will look into changing those later.

The Csv Reader tests are moved into their own directory. All Phpstan baseline entries involving Csv Reader are eliminated.
2021-05-16 06:05:02 -07:00
Akira Taniguchi 3da51de1a9
Update reading-and-writing-to-file.md 2021-04-06 01:26:10 +09:00
Akira Taniguchi e13fcf27fd
Update reading-and-writing-to-file.md 2021-04-06 01:22:22 +09:00
MarkBaker 11522afee0 Merge branch 'master' into PHP8-Sane-Property-Names
# Conflicts:
#	CHANGELOG.md
#	src/PhpSpreadsheet/Shared/Drawing.php
#	src/PhpSpreadsheet/Spreadsheet.php
#	src/PhpSpreadsheet/Style/Conditional.php
2020-12-27 15:07:50 +01:00
oleibman e768cb0f19
CSV - Guess Encoding, Handle Null-string Escape (#1717)
* CSV - Guess Encoding, Handle Null-string Escape

This is in response to issue #1647 (detect CSV character encoding).
First, my tests with mb_detect_encoding indicate that it doesn't work
well enough; regardless, users can always do that on their own
if they deem it useful.
Rolling my own is also troublesome, but I can at least:
a. Check for BOM (UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE).
b. Do some heuristic tests for each of the above encodings.
c. Fallback to a user-specified encoding (default CP1252)
  if a and b don't yield result.
I think this is probably useful enough to include, and relatively
easy to expand if other potential encodings should be considered.

Starting with PHP7.4, fgetcsv allows specification of null string as
escape character in fgetcsv. This is a much better choice than the PHP
(and PhpSpreadsheet) default of backslash in that it handles the file
in the same manner as Excel does. There is one statement in Reader/CSV
which would be adversely affected if the caller so specified (building
a regular expression under the assumption that escape character is
a single character). Fix that statement appropriately and add tests.
2020-12-25 17:47:29 +01:00
MarkBaker 02e88971e8 Update todocumentation 2020-11-19 16:45:01 +01:00
Adrien Crivelli a90bf863ab
Merge pull request #1499 from oleibman/htmledit
Add ability to save edited Html/Pdf
2020-06-28 17:46:56 +09:00
oleibman ce6ac1f040
Fix For #1509 (#1518)
* Fix For #1509

User expected no CSV enclosures after $writer->setEnclosure(''),
which had been changed to be consistent with $reader->setEnclosure('').
Writer will now omit enclosures after code above; no change to Reader.
Tests have been added for this condition.

* Add Option to Write CSV Enclosure Only When Required

Allowing the user to specify no enclosure when writing a CSV can lead to
a situation where PhpSpreadsheet (likewise Excel) will not read the
resulting file as intended, e.g. if any cell contains a delimiter character.
This is demonstrated in new test TestBadReread.
No existing setting will rectify this situation.

A better choice would be to add an option to write the enclosure
only when it is needed, which is what Excel does. The RFC4180 spec at
https://tools.ietf.org/html/rfc4180
states when it is needed - when the cell contains the delimiter,
or the enclosure, or a newline.
New test TestGoodReread demonstrates that the file is read as intended.

The documentation has been updated to describe the new function,
and to change the write example where the enclosure is set to null.

* Scrutinizer Suggestions

3 minor changes, all in tests.
2020-06-19 20:28:57 +02:00
oleibman 360c8d8284
Merge branch 'master' into htmledit 2020-06-09 00:39:52 -07:00
Owen Leibman c47b407e39 Different Example for Callback
Replace default gridlines with different style. Usable in PDF
as well as HTML.

Documentation mentioned use of setUseBOM with Html, but that method
does not exist, and there is no real reason to support it.
Removed it from documentation.
2020-06-09 00:22:22 -07:00
Adrien Crivelli dcf3b9860d
Code highlight in docs for PhpStorm 2020-05-31 22:44:25 +09:00
Owen Leibman edc411e6dd Add ability to save edited Html/Pdf
We give users the ability to edit Html/Pdf, but it's a little cumbersome
to use the edited Html for an Html file, and difficult to use it
for a Pdf. I believe we could make it fairly painless in both cases
by allowing the user to set a callback to edit the generated Html.
This can be accomplished with fewer than a dozen lines of very simple code.
I think this would be easier than grabbing the Html in pieces,
editing it, and reassembling it. I think it would also be simpler
than an alternative I considered, namely the addition of a new method
(e.g. saveEditedHtml) to each of the Html and Pdf writers.

One edit that users might like to make when editing html is to add
fallback fonts, something that is not currently available in
PhpSpreadsheet, and might be difficult to add. A natural extension to
that idea would be the use of webfonts, something which is guaranteed
difficult to add. See samples/Basic/17b_Html for an example of this.

None of the PDF writers support webfonts yet. That doesn't mean they
won't do so in future, but, for now, samples/Pdf/21a_Pdf is a prosaic
example of something you could do with this callback. In fact, this
opens the door to letting the user replace the entire body with data
of their choosing, effectively allowing PhpSpreadsheet (where you can
set things like paper size and orientation) to be used as a front-end to
the Pdf processor without the user having to be be overly familiar with
the vagaries of the PDF processor. I think this is actually a pretty
nice idea. YMMV. See samples/Basic/21b_Pdf for an example.
2020-05-30 21:27:35 -07:00
oleibman 97a80f383c
Improve HTML Writer (#1464)
There are a number of situations where HTML write was producing
HTML which could not be validated. These include:

  - inconsistent use of backslash terminating META, IMG, and COL tags
  - @page style tags in body rather than header. Aside from being
    non-standard, HTML Reader treats those as spreadsheet data.
  - <div style="page-break-before:always" />, a construct which is
    usually better handled through css anyhow.
  - no alt tag for images (drawings and charts)

Other problems:

  - Windows file names not handled correctly for images
  - Memory drawings not handled in extendRowsForChartsAndImages
  - No handling of different values for showing gridlines
    for screen and print
  - Mpdf and Dompdf do not require the use of inline css.
    Tcpdf remains a holdout in the use of this inferior approach.
  - no need to chunk base64 encoding of embedded images
  - support for colors in number format was buggy (html tags
    run through htmlspecialchars)

Code has been refactored when practical to reduce the number of
very large functions.

Coverage is now 100% for the entire HTML Writer module,
from 75% lines and 39% methods beforehand.

All functions dealing only with charts
are bypassed for coverage because the version of Jpgraph available in
Composer is not suitable for PHP7. The code will, nevertheless,
run successfully, but with warning messages. I have confirmed that
the code is entirely covered, without warnings, when the current
version of Jpgraph is used in lieu of the one available in Composer.
I will be glad to revisit this when the Jpgraph problem is resolved.

Directory PhpSpreadsheetTests/Writer/Html was created to house
the new tests. It seemed logical to move HtmlCommentsTest to
the new directory from PhpSpreadsheetTests/Functional.

A function to generate all the HTML is useful, especially for testing,
but also in lieu of the multiple other generate* functions. I have
added and documented generateHTMLAll.

The documentation for the generate* functions (a) produces invalid html,
(b) produces html which cannot be handled correctly by HTML reader,
and (c) even if those were correct, does not actually affect
the display of the spreadsheet. The documentation has been replaced
by a valid, and more instructive, example.

The (undocumented) useEmbeddedCss property, and the functions
to test and set it are no longer needed. Rather than breaking
existing code by deleting them, I marked the functions deprecated.

This change borrows a change to LocaleFloatsTest from
pull request 1456, submitted a little over a week before this one.


## Improve NumberFormat Support

First phase of this change included correcting NumberFormat handling
in HTML Writer. Certain complex formats could not be handled without
changes to Style/NumberFormat, and I did not wish to combine those changes.

Once the original change had been pushed, I took this part of it back up.
HTML Writer can now handle conditions in formats like:
[Blue][>=3000.5]$#,##0.00;[Red][<0]$#,##0.00;$#,##0.00
In testing, I discovered several errors and omissions
in handling of some other formats.
These are now corrected, and tests added.
2020-05-18 12:43:18 +09:00
oleibman 7517cdd008
Improve Coverage for CSV (#1475)
I believe that both CSV Reader and Writer are 100% covered now.

There were some errors uncovered during development.

The reader specifically permits encodings other than UTF-8 to be used.
However, fgetcsv will not properly handle other encodings.
I tried replacing it with fgets/iconv/strgetcsv, but that could not
handle line breaks within a cell, even for UTF-8.
This is, I'm sure, a very rare use case.
I eventually handled it by using php://memory to hold the translated
file contents for non-UTF8. There were no tests for this situation,
and now there are (probably too many).

"Contiguous" read was not handle correctly. There is a file
in samples which uses it. It was designed to read a large sheet,
and split it into three. The first sheet was corrrect, but the
second and third were almost entirely empty. This has been corrected,
and the sample code was adapted into a formal test with assertions
to confirm that it works as designed.

I made a minor documentation change. Unlike HTML, where you never
need a BOM because you can declare the encoding in the file,
a CSV with non-ASCII characters must explicitly include a BOM
for Excel to handle it correctly. This was explained in the Reading CSV
section, but was glossed over in the Writing CSV section, which I
have updated.
2020-05-17 18:15:18 +09:00
Nathanael d. Noblet 22bf54ca11 Allow Html Reader to write into existing spreadsheet
Sometimes you may want to read html into multiple worksheets within one
spreadsheet. Allowing the passing of a spreadsheet in makes this possible.
2019-11-17 21:17:56 +01:00
Claudio Galdiolo 74bc0b826d reword repeated text 2019-08-17 13:49:27 -07:00
Nathanael Noblet 95c8bb9918
Allow HTML Reader to load from string
We often want to export a table as an excel sheet. The system renders the
html and it seems like a waste of time to write it to the file system to
use the reader. This allows us to render the html and then just pass it to
a reader

Closes #1136
2019-08-17 12:54:22 -07:00
Jon Dufresne 5b3870c508
Prefer https:// URLs when available in docs & comments
Fixes #737
2018-10-28 13:55:00 +11:00
Adrien Crivelli ab541b9b5f
Emphasis the point being discussed 2018-06-04 14:09:21 +09:00
Paul Klimov eb612157dd array short syntax in documentaiton (#373) 2018-02-25 13:16:04 +01:00
Gabriel Caruso ae5fadff56 Trailing whitespaces
Signed-off-by: Gabriel Caruso <carusogabriel34@gmail.com>
2017-12-31 22:04:01 +09:00
Adrien Crivelli c46008b2be
Quote class names in docs 2017-12-30 19:44:32 +09:00
Adrien Crivelli 341e82f6eb
Line ending is detected automatically since commit f2310e05d
Closes #293
2017-12-17 16:49:37 +09:00
Maxim Bulygin 442e612202
Support custom PDF library instances or configurations
This allow to create and configure the standard instance of the
external PDF libary, before returning it to the standard writer.

Or, more powerful, this allow to provide a custom implementation
of the external PDF library, allowing for custom behaviors. An
example of that would something like: https://tcpdf.org/examples/example_003/

Closes #266
2017-11-04 16:01:09 +09:00
Adrien Crivelli 25ff914aa6
Simplify IOFactory to rely on autoloading 2017-10-22 01:54:14 +09:00
Adrien Crivelli 79ab852bf5
Expose PDF writer to be used directly
We used to have some kind of wrapper that didn't do much except
forward methods to the real instance. That unnecessary complexity
made it harder to work with the real writer instance.
2017-10-14 14:57:44 +09:00
Markus Lanthaler 3ee9cc5ce6
Infer CSV delimiter if it hasn't been set explicitly
Closes #141
2017-04-20 17:02:03 +09:00
Adrien Crivelli 8e8d0e4a30
Improve doc formatting with backticks 2017-03-13 14:57:37 +09:00
Adrien Crivelli b8adec4938
Update documentation with composer instructions
And split first page into several topics to improve sidebar navigation

Closes #114
2017-03-13 12:30:26 +09:00