Commit Graph

2 Commits

Author SHA1 Message Date
oleibman ad5532e2f4
Namespacing Phase 2 - Styles (#2471)
* WIP Namespacing Phase 2 - Styles

This is part 2 of a several-phase process to permit PhpSpreadsheet to handle input Xlsx files which use unexpected namespacing. The first phase, introduced as part of release 1.19.0, essentially handled the reading of data. This phase handles the reading of styles. More phases are planned.

It is my intention to leave this in draft status for at least a month. This will give time for additional testing, by me and, I hope, others who might be interested.

This fixes the same problem addressed by PR #2458, which may reach mergeable status before I am ready to take this out of draft status. I do not anticipate any difficult merge conflicts if the other change is merged first.

This change is more difficult than I'd hoped. I can't get xpath to work properly with the namespaced style file, even though I don't have difficulties with others. Normally we expect:
```xml
<stylesheet xmlns="http://whatever" ...
```
In the namespaced files, we typically see:
```xml
<x:stylesheet xmlns:x="http://whatever" ...
```

`simplexml_load_file`, when given a namespace, handles the two situations identically, as expected. But, for some reason that I cannot figure out, there are significant differences when xpath processes the result. I can work around this by manipulating the XML; I'm not proud of doing that, and will gladly accept any suggestions. In the meantime, it seems to work.
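
As an illustration only, the sketch below uses Python's standard-library `xml.etree.ElementTree` (not PhpSpreadsheet or SimpleXML) to show why the two forms above are expected to be equivalent: a namespace-aware parser resolves both to the same qualified names. The `fonts` child element and the placeholder URI are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = "http://whatever"  # placeholder URI, as in the snippets above

# Same document written with a default namespace and with an "x:" prefix.
# The <fonts> child is a made-up example element.
default_form = f'<stylesheet xmlns="{NS}"><fonts count="1"/></stylesheet>'
prefixed_form = f'<x:stylesheet xmlns:x="{NS}"><x:fonts count="1"/></x:stylesheet>'


def qualified_names(xml_text):
    """Return the namespace-qualified tag of every element, in document order."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


default_names = qualified_names(default_form)
prefixed_names = qualified_names(prefixed_form)

# The prefix is only a local alias; the qualified names are identical.
print(default_names == prefixed_names)
```

The prefix disappears once the parser has resolved it to the namespace URI, which is why any difference in later processing is surprising.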

My major non-standard unit test file had disabled any style-related tests when phase 1 was installed. These are now all enabled.

* Scrutinizer

Its analysis is wrong, but the "errors" it pointed out are easy to fix.

* Eliminate XML Source Manipulation

Original solution required XML manipulation to overcome what appears to be an xpath problem. This version replaces xpath with iteration, eliminating the need to manipulate the XML.
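
A minimal sketch of the iteration idea, again in Python's `xml.etree.ElementTree` rather than the actual PhpSpreadsheet code: children are walked directly and matched on namespace URI plus local name, so no path expression is involved and the prefix used in the file does not matter. The helper names, the placeholder URI, and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://whatever"  # placeholder URI for the example


def first_child(parent, local_name):
    """Return the first direct child with the given local name in NS, or None."""
    wanted = f"{{{NS}}}{local_name}"
    for child in parent:
        if child.tag == wanted:
            return child
    return None


def fill_colors(xml_text):
    """Collect fgColor rgb attributes by plain iteration over the fills section."""
    root = ET.fromstring(xml_text)
    fills = first_child(root, "fills")
    if fills is None:
        # No fills section at all: return an empty result instead of failing.
        return []
    colors = []
    for fill in fills:
        for pattern in fill:
            fg_color = first_child(pattern, "fgColor")
            if fg_color is not None:
                colors.append(fg_color.get("rgb"))
    return colors


prefixed = (
    f'<x:stylesheet xmlns:x="{NS}"><x:fills><x:fill><x:patternFill>'
    '<x:fgColor rgb="FFBFBFBF"/></x:patternFill></x:fill></x:fills></x:stylesheet>'
)
without_fills = f'<stylesheet xmlns="{NS}"/>'

print(fill_colors(prefixed))
print(fill_colors(without_fills))
```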

* Handle Some Edge Cases

For example, Style file without a Fills section.

* Restore RGB/ARGB Interchangeability

Fix #2494. Apparently EPPlus outputs fill colors as `<fgColor rgb="BFBFBF">`, while most other software outputs them as `<fgColor rgb="FFBFBFBF">`. EPPlus actually makes more sense. Regardless, validating the length of rgb/argb values is a recent development for PhpSpreadsheet, made under the assumption that an incorrect length is a user error. This issue invalidates that assumption, so restore the previous behavior.

In addition, a comment in Colors.php says that the supplied color is "the ARGB value for the colour, or named colour". However, although named colors are accepted, nothing sensible is done with them - they are passed unchanged to the ARGB value, where Excel treats them as black. The routine should either reject the named color, or convert it to the appropriate ARGB value. This change implements the latter.
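
The behavior described above could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code in this change: the function name, the three-entry name table, and the choice to raise on unrecognized input are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny name table; the real list is not shown here.
NAMED_COLORS = {
    "black": "FF000000",
    "white": "FFFFFFFF",
    "red": "FFFF0000",
}


def normalize_argb(value):
    """Return an 8-digit ARGB string for a 6-digit RGB, 8-digit ARGB, or known name."""
    text = value.strip()
    if re.fullmatch(r"[0-9A-Fa-f]{8}", text):
        return text.upper()  # already ARGB
    if re.fullmatch(r"[0-9A-Fa-f]{6}", text):
        return "FF" + text.upper()  # RGB: assume a fully opaque alpha channel
    named = NAMED_COLORS.get(text.lower())
    if named is not None:
        return named  # named color converted to its ARGB value
    # Assumption for this sketch only: anything else is rejected.
    raise ValueError(f"unrecognized color: {value!r}")


print(normalize_argb("BFBFBF"))
print(normalize_argb("FFBFBFBF"))
print(normalize_argb("red"))
```

With this shape, the EPPlus-style `BFBFBF` and the more common `FFBFBFBF` resolve to the same value, and a named color is no longer passed through unchanged.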
2022-02-11 06:42:04 -08:00
oleibman cd84020693
Xlsx Reader Better Namespace Handling Phase 1 Try2 (#2173)
* Xlsx Reader Better Namespace Handling Phase 1 Try2

This is a replacement for #2088, which has run into merge conflicts. I will close that PR in the near future; however, the comments in that PR may prove useful for this one. While that PR has been in draft status all along, I am marking this one as ready. I will gladly add any additional tests (and, of course, make any code changes) that anyone suggests, but, with my most recent test files, which I will describe in a separate comment, I have no further ideas on useful additions.

As mentioned in the earlier ticket, this is a risky change. But, as has been demonstrated, delaying it comes with its own set of risks. It would be helpful to have a temporary moratorium on changes to Reader/Xlsx until this change is merged.

The original commit message follows.

There have been a number of issues concerning the handling of legitimate but unexpected namespace prefixes in Xlsx spreadsheets created by software other than Excel and PhpSpreadsheet/PhpExcel. I have studied them, but, till now, have not had a good idea on how to act on them. A recent comment https://github.com/PHPOffice/PhpSpreadsheet/issues/860#issuecomment-824926224 in issue #860 by @IMSoP has triggered an idea about how to proceed.

Gnumeric Reader was recently changed to handle namespaces better. Using that as a model, this PR begins the process of doing the same for Xlsx. Xlsx is much larger and more complicated than Gnumeric, hence the need to tackle it in multiple phases. I believe that this PR handles all of:
- listWorkSheetNames
- listWorkSheetInfo. Note that there was a bug in this function which would cause it to count only used columns rather than all columns. That bug is corrected.
- active sheet
- selected cell and top left cell
- cell content (formulas, numbers, text)
- hyperlinks
- comments (partial - see below)

This PR does not address:
- styles
- images and charts
- VBA and ribbons
- many other items, I'm sure

The issue for non-standard namespacing till now has been the use of unexpected prefixes. While I was working on this change, @Lambik opened issue #2067 and PR #2068, which raised a completely different problem - the use of unexpected URLs. That PR and the issue associated with it were quite well documented, including a test file and tests for it. I asked if I could take a look to see if it could be integrated with my change, and the result seems to be yes, so those changes are also part of this PR.

While adding a comment to my test file, I discovered that Microsoft had added "threaded comments" as a new feature. I believe these are not yet supported by PhpSpreadsheet, and I am not going to add support for them, at least not now. I believe that, among other things, this will make identifying the author of a comment more difficult.

Although there are a number of Phpstan baseline changes as part of this PR, I did not attempt to resolve all Phpstan reports for Reader/Xlsx. Nor did I do anything to increase coverage. This change is already large and complex enough without those efforts.
2021-06-25 09:05:49 +02:00