Drawings in an Xlsx file are stored in such a way that Php can read their contents using the `zip:` protocol. This does not, however, work when the file is read by PhpSpreadsheet and then saved as Html or Pdf, since the browser will not recognize that protocol even if the file is available. Such drawings need to be saved in the html as embedded images in order for the copy to display them properly. This is true even when the writer is set to not embed images (default).
An additional problem arises when an Html file with an embedded image is read, because `Worksheet\Drawing::setPath` attempts to validate the path, which it cannot do for the `data:image` Url which embedded images use.
And yet another problem. Writer/Html writes out a MemoryDrawing as a png using the imagepng function; but then declares it as jpeg in the Html. This is now corrected.
And a fourth problem. Writer/Html ignores the last row if it contains nothing but a Memory Drawing, which can be true when copying an Xls file.
These changes are testable (it's how I discovered the second part of this parlay). I think it is also useful to add a sample to see the results of this type of copy.
* Add editAs Property for 2-cell Anchor Drawings
This change builds on PR #2532 (@naotake51 as PR #2532), using ideas from PR #2237 (@AdamGaskins), which has had changes requested for several months. It covers a lot of the same ground as 2532. In Excel, two-cell anchor drawings can be edited as "twocell", "onecell", or "absolute". This PR adds support for those options, with a sample file that demonstrates the difference in addition to unit tests. Several other tests are added to improve the spotty coverage for Drawings.
There have been several other tickets referencing two cell anchors, including issue #1159 and PR #1160 (@sgarwood, who also added support for editAs), PR #643, and issue #126, all now closed but not necessarily entirely resolved. I will try to ensure that those tickets are addressed with this one.
And, in trying to make sure 1160 is covered, I stumbled upon a bug. If you use the same image resource to create two+ memory drawings, the MemoryDrawing destructor for the first will cause the rest to generate a very long warning message. This is not a problem for Php8+, only for Php7-. I have suppressed the message in the MemoryDrawing constructor. 1160 went stale due to an unresolved test error, but I don't think this was the problem. At any rate, its test works now.
* Scrutinizer
It reported 1 minor issue (fixed normally), and 2 major. One is fixed with a kludge. The other is a case where Scrutinizer's analysis is just wrong, and I can't figure out a kludge. But I was able to add an annotation (the first time I've managed to get one past phpcs/php-cs-fixer). We'll see.
Sample39 was not adjusted when Defined Name support was changed to include both relative and absolute addresses. As a result, the Xlsx output is currently incorrect. The Xls output is in even worse shape, since PhpSpreadsheet 'Cannot yet write formulae with defined names to Xls' (see Writer/Xls/Parser.php). (Working around that restriction was the reason for my first contribution to this project.) The sample is changed to define its named ranges properly, and to produce only Xlsx output.
* Fixed handling of inline strings
* Added an example for the "inline string" datatype for which the generated XML was fixed in the previous commit.
* Handle inline strings exactly the same way as regular strings in the XLS writer, because this is only relevant in the XLSX format.
* Fill Pattern Start and End Colors
Fix#2441. The Fill constructor sets start color to white and end color to black and the Xlsx writer writes these values to the output file. This appears to be the wrong setting for all 7 LIGHT* pattern types, 2 of the 7 DARK* patterns (DARKGRAY and DARKTRELLIS), and 1 of the 3 GRAY patterns (GRAY0625). When the wrong colors are written at save time, those patterns are not as expected. Xls writer does not appear to have the same problem.
The XML does not require either a start or end color, and the omission of these colors in the file being read was responsible for the problem. The code is changed to mimic that behavior by omitting the color tags at write time if they have not changed from when they were created by the Fill constructor (they will be written for gradient or solid patterns regardless).
This is another change which is easier to confirm via samples rather than tests. There are separate samples for Xlsx and Xls; as Excel will be quick to warn you, Xls is not as fully functional as Xlsx with respect to fill patterns. The samples do include a cell where one of the cells (LightGrid in C11) explicitly specifies the "default" colors.
* Scrutinizer
It somehow ascribed to me a problem in code which was unchanged by this PR. Correct it anyhow, along with some Phpstan fixes (errors now ignored because of change).
* Added Tests
Also corrected some docBlock problems with Style/*/parent and getSharedComponent.
* Create 2 Abstract Methods
Scrutinizer complained that 2 methods found in all Supervisor sub-types were not defined in Supervisor. Add abstract methods to satisfy it.
* Scrutinizer Ignoring Typehints
Try this instead.
* Slight Improvement
Better handling of Style->getParent().
* Special Characters in Image File Name
Fix#2415. Fix#1470. If path name of image contains anything other than ASCII, or if it contains # or space or probably other exceptions, PhpSpreadsheet creates a file that Excel cannot, for whatever reason, read (it is valid xml). When adding an image to a spreadsheet, Excel does not retain the original path name; PhpSpreadsheet does, but probably shouldn't. It is changed to save the image file in the zip as the MD5 hash of the original path name. This produces a file that Excel can read. In addition, it ensures that, if the image is used in multiple places, it is saved in the Excel file only once.
Because this error becomes evident only when opening the file in Excel, it is difficult to write a test case. I have instead duplicated sample Basic/05... using image files whose names match the reported error conditions.
* Scrutinizer Minor Error
Remove some newly dead code.
Specifically, the default for these two functions has been changed from `ENT_COMPAT` to `ENT_QUOTES | ENT_SUBSTITUTE`
This PR configures the argument used for those functions in Settings, and then explicitly applies it everywhere they are used in the codebase.
* Document Properties - Coverage and 32-bit-safe Timestamps
While researching an issue, I noticed that coverage of Document/Properties was poor. Further, the use of int timestamps will eventually lead to problems for 32-bit PHP (see issue #1826).
Coverage Changes:
- Many property types with no special handling are enumerated but not tested. These are removed, but will continue to function as before.
- Existing code theoretically allows property to be set to an object, but there is no means to read or write such a property, and, even if there were, I don't believe Excel supports it. Setting a property to an object will now be changed to a no-op (can throw an exception if preferred).
- Since the Properties object now has no members which are themselves objects, there is no need for a deep clone. The untested __clone method is removed.
- Large switch statements are replaced with associative arrays. Scrutinizer will like that.
- Coverage is now 100%.
<!-- end of coverage changes list -->
Timestamp Changes:
- Timestamps will be stored as int if possible, or float if not. This is, or will soon be, needed for 32-bit systems. Tests have been added for beyond-epoch dates, and run successfully with 32-bit.
- LibreOffice doesn't quite get the Created/Modified properties correct. These are written to the file as a string which includes offset from UTC, but LibreOffice ignores the offset portion when displaying them. Code had been generating these in UTC, but now generates them in default timezone, which should meet user's expectations.
<!-- end of timestamp changes list -->
Other Changes:
- Custom properties added to ODS Writer.
- Samples had not been generating any ODS files. One is now generated.
- Ods uses a single 'keywords' property rather than multiple 'keyword' properties.
- Breaking change - default company is changed to null string from Microsoft Corporation.
- Breaking change of sorts - PropertiesTest incorrectly tested a custom date property against a string, Reader/XlsxTest correctly tested against a timestamp converted to a string. PropertiesTest was defective, and will no longer work as coded; anyone using it as a model will likewise have a problem.
- PHP8.1 has been complaining for weeks about a time zone conversion test. I have now downloaded a version, and changed the code so that it will work in 8.1 as well as prior releases. (It is still likely that the existing code should work in 8.1, but I haven't yet figured out how to file a bug report.) In the course of testing, 3 additional 8.1 problems were reported (all along the lines of "can't pass null to strpos"), and are fixed with null coercion.
- Two Calculation tests failed because of large results on 32-bit system. These are corrected by allowing the functions involved to return float|int rather than int. I suspect that there are other functions with this problem, and will investigate as a follow-up activity.
- See issue #2090. I believe that changes between 17.1 and master will merely cause the problematic spreadsheet to fail in a different way. I believe that enclosing in quotes some variables passed to Document/Properties by Reader/Xlsx will eliminate the problem, but, in the absence of an example file, cannot say for sure.
- Properties tests are now separated out from Reader/XlsxTest and Reader/OdsTest, and now test both Read and Write (via reload).
<!-- end of other changes list -->
Miscellaneous Notes:
- There remains no support for Custom Properties in Xls Reader or Writer.
- We now have default timezones for all of PHP itself, Shared/Date, and Shared/Timezone. That is least one too many. I was unable to disentangle the latter two for this change, but will look into deprecating one or the other in future.
* Phpstan
6 baseline deletions, 2 docblock changes
* Scrutinizer's Turn
3 minor errors that hadn't blocked the request.
19_NamedRange.php was not changed to use absolute addressing when that was introduced to Named Ranges. Consequently, the output from this sample has been wrong ever since, for both Xls and Xlsx.
There was an additional problem with Xls. It appears that the Xls Writer Parser does not parse multiple concatenations using the ampersand operator correctly. So, `=B1+" "+B2` was parsed as `=B1+" "`. I believe that this is due to ampersand being treated as a condition rather than an operator; `A1>A2>A3` isn't valid, but `A1&A2&A3` is. My original PR (#1992, which I will now close) only partially resolved this, but I think moving ampersand handling from `condition` to `expression` is fully successful.
There are already more than ample tests for Named Ranges, so I did not add a new one for that purpose. However, I did add a new test for the Xls parser problem.
As issue #2042 documents, SUM behaves differently with invalid strings depending on whether they come from a cell or are used as literals in the formula. SUM is not alone in this regard; COUNTA is another function within this behavior, and the solution to this one is modeled on COUNTA. New tests are added for SUM, and the resulting tests are duplicated to confirm correct behavior for both cells and literals.
Samples 16 (CSV), 17 (Html), and 21 (PDF) were adversely affected by this problem. 17 and 21 were immediately fixed, but 16 had another problem - Excel was not interpreting the UTF8 currency symbols correctly, even though the file was saved with a BOM. After some experimenting, it appears that the `sep=;` line generated by setExcelCompatibility(true) causes Excel to mis-handle the file. This seems like a bug - there is apparently no way to save a UTF-8 CSV with non-ASCII characters which specifies a non-standard separator which Excel will open correctly. I don't know if this is a recent change or if it is just the case that nobody noticed this problem till now. So, I changed Sample 16 to use setUseBom rather than setExcelCompatibility, which solved its problem. I then added new tests for setExcelCompatibility, with documentation of this problem.
* Stacked Alignment - Use Class Constant Rather than Literal
PR #1580 defined constants for "stacked" alignment in cells.
Using those constants outside of Style/Alignment was beyond the
scope of the original PR, but I said I would get to it.
This PR replaces all uses of literal -165, and appropriate uses of
literal 255, with the named constants, and adds tests to make sure
that the changed code is covered in the test suite.
* Improving Coverage for Excel2003 XML Reader
Reader/Xml is now 100% covered.
File templates/Excel2003XMLTest.xml, used in some tests, is *not*
readable by a current version of Excel. I have substituted a new file
excel2003.xml to be used in its place. I have not deleted the original
in case someone in future (possibly me) wants to see what it needs to
make it usable.
There are minimal code changes.
- Unused protected functions pixel2WidthUnits and widthUnits2Pixel
are deleted.
- One regex looking to convert hex characters is changed from a-z to a-f,
and made case insensitive.
- No calculation performed for "error" cell (previously calculation
was attempted and threw exception).
- Empty relative row/cell is now handled correctly.
- Style applied to empty cell when appropriate.
- Support added for textRotation.
- Support added for border styles.
- Support added for diagonal borders.
- Support added for superscript and subscript.
- Support added for fill patterns.
In theory, encodings other than UTF-8 were supported.
In fact, I was unable to get SecurityScanner to pass *any* xml which is
not UTF-8. Eliminating the assumption that strings might not be UTF-8
allowed much of the code to be greatly simplified.
After that, I added some code that would permit the use of
some ASCII-compatible encodings (there is a test of ISO-8859-1).
It would be more difficult to handle other encodings (such as UTF-16).
I am not convinced that even the ISO-8859 effort is worth it,
but am willing to investigate either expanding or eliminating
non-UTF8 support.
I added a number of tests, creating an Xml directory, and moving
XmlTest to that directory.
Pull Request had problems reading old invalid sample in the code
coverage phase, not in any of the other test phases, and not in
the code coverage phase on my local machine.
As it turns out, aside from being invalid, the sample
is much larger than any of the other samples. Tests have been
adjusted accordingly.
* Smaller Test File
Should eliminate need to avoid test during xml coverage.
* Break Up Style Test into Multiple Tests
Per suggestion from Mark Baker.
* Integrate AddressHelper Change
The introduction of AddressHelper introduced a conflict which needed to
be resolved. I wanted to test it locally before resolving. This required
me to add (unchanged) AddressHelper to my local copy. I hope this is
an okay manner of resolving the conflict.
* Weird Travis Error
XmlOddTest works just fine on my local machine, but Travis failed it.
Even worse, the lines which Travis flags don't even make any sense
(one was the empty line between two methods!).
This test is not essential to the rest of the change. I am removing
it from the package, and will attempt to re-add it when I have a chance
to sync up my fork with the main project.
Reader/Html is now covered except for 1 statement.
There is some coverage of RichText when you know in advance that the
html will expand into a single cell.
It is a tougher nut, one that I have not yet cracked,
to try to handle rich text while converting unkown html to multiple cells.
The original author left this as a TODO, and so for now must I.
It made sense to restructure some of the code. There are some changes.
- Issue #1532 is fixed (links are now saved when using rowspan).
- Colors can now be specified as html color name. To accomplish this,
Helper/Html function colourNameLookup was changed from protected
to public, and changed to static.
- Superfluous empty lines were eliminated in a number of places, e.g.
<ul><li>A</li><li>B</li><li>C</li></ul>
had formerly caused a wrapped cell to be created with 2 empty lines
followed by A, B, and C on separate lines; it will now just have the
3 A/B/C lines, which seems like a more sensible interpretation.
- Img alt tag, which had been cast to float, is now used as a string.
Private member "encoding" is not used. Functions getEncoding and setEncoding
have therefore been marked deprecated. In fact, I was unable to get
SecurityScanner to pass *any* html which is not UTF-8. There are
possibly ways of getting around this (in Reader/Html - I have no
intention of messing with Security Scanner), as can be seen in my
companion pull request for Excel2003 Xml Reader. Doing this would be
easier for ASCII-compatible character sets (like ISO-8859-1),
than for non-compatible charsets (like UTF-16). I am not
convinced that the effort is worth it, but am willing to investigate
further.
I added a number of tests, creating an Html directory, and moving
HtmlTest to that directory.
Replace default gridlines with different style. Usable in PDF
as well as HTML.
Documentation mentioned use of setUseBOM with Html, but that method
does not exist, and there is no real reason to support it.
Removed it from documentation.
We give users the ability to edit Html/Pdf, but it's a little cumbersome
to use the edited Html for an Html file, and difficult to use it
for a Pdf. I believe we could make it fairly painless in both cases
by allowing the user to set a callback to edit the generated Html.
This can be accomplished with fewer than a dozen lines of very simple code.
I think this would be easier than grabbing the Html in pieces,
editing it, and reassembling it. I think it would also be simpler
than an alternative I considered, namely the addition of a new method
(e.g. saveEditedHtml) to each of the Html and Pdf writers.
One edit that users might like to make when editing html is to add
fallback fonts, something that is not currently available in
PhpSpreadsheet, and might be difficult to add. A natural extension to
that idea would be the use of webfonts, something which is guaranteed
difficult to add. See samples/Basic/17b_Html for an example of this.
None of the PDF writers support webfonts yet. That doesn't mean they
won't do so in future, but, for now, samples/Pdf/21a_Pdf is a prosaic
example of something you could do with this callback. In fact, this
opens the door to letting the user replace the entire body with data
of their choosing, effectively allowing PhpSpreadsheet (where you can
set things like paper size and orientation) to be used as a front-end to
the Pdf processor without the user having to be be overly familiar with
the vagaries of the PDF processor. I think this is actually a pretty
nice idea. YMMV. See samples/Basic/21b_Pdf for an example.
No code changes. The tests in all of these scripts write to at least
one temporary file, which is then read and not used again. The file
should be deleted to avoid filling up the disk system.
There are a number of situations where HTML write was producing
HTML which could not be validated. These include:
- inconsistent use of backslash terminating META, IMG, and COL tags
- @page style tags in body rather than header. Aside from being
non-standard, HTML Reader treats those as spreadsheet data.
- <div style="page-break-before:always" />, a construct which is
usually better handled through css anyhow.
- no alt tag for images (drawings and charts)
Other problems:
- Windows file names not handled correctly for images
- Memory drawings not handled in extendRowsForChartsAndImages
- No handling of different values for showing gridlines
for screen and print
- Mpdf and Dompdf do not require the use of inline css.
Tcpdf remains a holdout in the use of this inferior approach.
- no need to chunk base64 encoding of embedded images
- support for colors in number format was buggy (html tags
run through htmlspecialchars)
Code has been refactored when practical to reduce the number of
very large functions.
Coverage is now 100% for the entire HTML Writer module,
from 75% lines and 39% methods beforehand.
All functions dealing only with charts
are bypassed for coverage because the version of Jpgraph available in
Composer is not suitable for PHP7. The code will, nevertheless,
run successfully, but with warning messages. I have confirmed that
the code is entirely covered, without warnings, when the current
version of Jpgraph is used in lieu of the one available in Composer.
I will be glad to revisit this when the Jpgraph problem is resolved.
Directory PhpSpreadsheetTests/Writer/Html was created to house
the new tests. It seemed logical to move HtmlCommentsTest to
the new directory from PhpSpreadsheetTests/Functional.
A function to generate all the HTML is useful, especially for testing,
but also in lieu of the multiple other generate* functions. I have
added and documented generateHTMLAll.
The documentation for the generate* functions (a) produces invalid html,
(b) produces html which cannot be handled correctly by HTML reader,
and (c) even if those were correct, does not actually affect
the display of the spreadsheet. The documentation has been replaced
by a valid, and more instructive, example.
The (undocumented) useEmbeddedCss property, and the functions
to test and set it are no longer needed. Rather than breaking
existing code by deleting them, I marked the functions deprecated.
This change borrows a change to LocaleFloatsTest from
pull request 1456, submitted a little over a week before this one.
## Improve NumberFormat Support
First phase of this change included correcting NumberFormat handling
in HTML Writer. Certain complex formats could not be handled without
changes to Style/NumberFormat, and I did not wish to combine those changes.
Once the original change had been pushed, I took this part of it back up.
HTML Writer can now handle conditions in formats like:
[Blue][>=3000.5]$#,##0.00;[Red][<0]$#,##0.00;$#,##0.00
In testing, I discovered several errors and omissions
in handling of some other formats.
These are now corrected, and tests added.
I believe that both CSV Reader and Writer are 100% covered now.
There were some errors uncovered during development.
The reader specifically permits encodings other than UTF-8 to be used.
However, fgetcsv will not properly handle other encodings.
I tried replacing it with fgets/iconv/strgetcsv, but that could not
handle line breaks within a cell, even for UTF-8.
This is, I'm sure, a very rare use case.
I eventually handled it by using php://memory to hold the translated
file contents for non-UTF8. There were no tests for this situation,
and now there are (probably too many).
"Contiguous" read was not handle correctly. There is a file
in samples which uses it. It was designed to read a large sheet,
and split it into three. The first sheet was corrrect, but the
second and third were almost entirely empty. This has been corrected,
and the sample code was adapted into a formal test with assertions
to confirm that it works as designed.
I made a minor documentation change. Unlike HTML, where you never
need a BOM because you can declare the encoding in the file,
a CSV with non-ASCII characters must explicitly include a BOM
for Excel to handle it correctly. This was explained in the Reading CSV
section, but was glossed over in the Writing CSV section, which I
have updated.
Column indexes are always based on 1 everywhere in PhpSpreadsheet.
This is consistent with rows starting at 1, as well as Excel
function `COLUMN()`. It should also make it easier to reason about
columns and rows and remove any doubts whether a specific method is
expecting 0 based or 1 based indexes.
Fixes#273
Fixes https://github.com/PHPOffice/PHPExcel/issues/307
Fixes https://github.com/PHPOffice/PHPExcel/issues/476
We used to have some kind of wrapper that didn't do much except
forward methods to the real instance. That unnecessary complexity
made it harder to work with the real writer instance.