Commit Graph

240 Commits

Author SHA1 Message Date
MarkBaker 8629337101 Retrieving print/page setup for the Xml Reader 2020-07-05 16:22:35 +02:00
Owen Leibman 752a0a5a6c Scrutinizer Recommendations
Two unneeded assignments in tests, one unused parameter in source code.
2020-06-25 23:11:30 -07:00
Owen Leibman 6080c4561d Improve Coverage for HTML Reader
Reader/Html is now covered except for 1 statement.
There is some coverage of RichText when you know in advance that the
html will expand into a single cell.
It is a tougher nut, one that I have not yet cracked,
to try to handle rich text while converting unknown html to multiple cells.
The original author left this as a TODO, and so for now must I.

It made sense to restructure some of the code. There are some changes.
- Issue #1532 is fixed (links are now saved when using rowspan).
- Colors can now be specified as html color name. To accomplish this,
  Helper/Html function colourNameLookup was changed from protected
  to public, and changed to static.
- Superfluous empty lines were eliminated in a number of places, e.g.
  <ul><li>A</li><li>B</li><li>C</li></ul>
  had formerly caused a wrapped cell to be created with 2 empty lines
  followed by A, B, and C on separate lines; it will now just have the
  3 A/B/C lines, which seems like a more sensible interpretation.
- Img alt tag, which had been cast to float, is now used as a string.

Private member "encoding" is not used. Functions getEncoding and setEncoding
have therefore been marked deprecated. In fact, I was unable to get
SecurityScanner to pass *any* html which is not UTF-8. There are
possibly ways of getting around this (in Reader/Html - I have no
intention of messing with Security Scanner), as can be seen in my
companion pull request for Excel2003 Xml Reader. Doing this would be
easier for ASCII-compatible character sets (like ISO-8859-1),
than for non-compatible charsets (like UTF-16). I am not
convinced that the effort is worth it, but am willing to investigate
further.

I added a number of tests, creating an Html directory, and moving
HtmlTest to that directory.
2020-06-25 22:42:38 -07:00
oleibman 38fab4e632
Fix for #1505 (#1525)
This problem is the same as #1238, which was resolved by #1239.
For that issue, the fix was to check in one place whether
$this->mapCellXfIndex[$xfIndex] was set before using it.
The sample spreadsheet supplied as a description for this
problem had exactly the same problem in 2 other places in the code.
In addition, there were 7 other places in the code where that
particular item was used unchecked. This fix corrects all 9 locations.
The spreadsheet supplied with the problem is used as the basis
for some new tests, which particularly test column dimensions
and styles, the problems involved in this case.
2020-06-19 21:01:18 +02:00
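The guard described in 38fab4e632 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `resolveXfIndex`, the sample map, and the fallback value are assumptions. Only the idea of checking `mapCellXfIndex[$xfIndex]` before using it comes from the message.

```php
<?php
// Hypothetical helper: look up a mapped style index only if it exists.
function resolveXfIndex(array $mapCellXfIndex, int $xfIndex, int $fallback = 0): int
{
    // Check that the key is set before using it, as the message describes.
    if (isset($mapCellXfIndex[$xfIndex])) {
        return $mapCellXfIndex[$xfIndex];
    }

    // Assumed fallback; the real reader may handle a missing entry differently.
    return $fallback;
}

$map = [15 => 0, 16 => 1];           // illustrative data only
$known = resolveXfIndex($map, 16);   // mapped entry is returned
$missing = resolveXfIndex($map, 99); // fallback, with no undefined-offset notice
```

Centralising the check in one place is only one way to cover many call sites; the commit message says the check was applied at each of the 9 locations.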
oleibman 262896086a
Improve Coverage for Sylk (#1514)
* Improve Coverage for Sylk

I believe that both BaseReader and Sylk Reader are now 100% covered.

Documentation available for this format is sparse.
It was always incomplete, and in some cases inaccurate.
My goal was to use PhpSpreadsheet to load the test file,
save it as Xlsx, and visually compare the two, then add a test
loaded with assertions. Cell values and calculated values,
and border styles were generally handled pretty well without changes.
Other types of styling were not handled so well. I added a few cells
to exercise some previously uncovered code.

Sylk files must be ASCII. I have deprecated the use of the
setEncoding and getEncoding functions, which had no test cases.
2020-06-19 20:35:44 +02:00
oleibman 73379cdfb1
Improve Coverage for Gnumeric (#1517)
* Improve Coverage for Gnumeric

I believe that both BaseReader and Gnumeric Reader are now 100% covered.

My goal was to use PhpSpreadsheet to load the test file,
save it as Xlsx, and visually compare the two, then add a test
loaded with assertions. Results were generally pretty good,
but there were no tests with assertions. I added a few cells
to exercise some previously uncovered code. Code was extensively
refactored; logic changes are noted below.

Code allowed for specifying document properties in an old format.
I considered removing that, but I found the original spec at
http://www.jfree.org/jworkbook/download/gnumeric-xml.pdf
This allowed me to create an old file, which was not handled
correctly because of namespace differences. The code was corrected
to allow for this difference.

Added support for textRotation.

Mapping of fill types was not correct.

* PHP7.2 Error

One assertion failed under PHP7.2. Apparently there was some change in
the handling of SimpleXMLElement between 7.2 and 7.3. Casting to string
before use eliminates the problem.

* Scrutinizer Recommendations

All minor, solved (hopefully) mostly by casts.

* One Last Scrutinizer Fix

... I hope.
2020-06-19 20:34:02 +02:00
oleibman 585409a949
Testing - Delete Temp Files When No Longer Needed (#1488)
No code changes. The tests in all of these scripts write to at least
one temporary file, which is then read and not used again. The file
should be deleted to avoid filling up the disk system.
2020-05-24 20:03:07 +09:00
oleibman 41b95c1542
CSV Sample File Was Miscoded (#1489)
File author erroneously assumed that backslash was used to escape
quotes in CSV; in fact, doubling the quote is used for escape.
The test still worked, but mainly because the content of the cell
with the escape wasn't tested. The file is now fixed, and
a new test added.
2020-05-24 19:57:39 +09:00
Adrien Crivelli 137268d61a
Remove undesired annotations 2020-05-18 15:49:29 +09:00
Adrien Crivelli fcd9f10663
Update PHP-CS-Fixer rules 2020-05-18 13:49:57 +09:00
Adrien Crivelli e868e58d20
Allow to run an entire folder of tests
We now can do something like:

```sh
./vendor/bin/phpunit tests/PhpSpreadsheetTests/Reader/
```
2020-05-17 18:35:55 +09:00
oleibman 7517cdd008
Improve Coverage for CSV (#1475)
I believe that both CSV Reader and Writer are 100% covered now.

There were some errors uncovered during development.

The reader specifically permits encodings other than UTF-8 to be used.
However, fgetcsv will not properly handle other encodings.
I tried replacing it with fgets/iconv/str_getcsv, but that could not
handle line breaks within a cell, even for UTF-8.
This is, I'm sure, a very rare use case.
I eventually handled it by using php://memory to hold the translated
file contents for non-UTF8. There were no tests for this situation,
and now there are (probably too many).

"Contiguous" read was not handled correctly. There is a file
in samples which uses it. It was designed to read a large sheet,
and split it into three. The first sheet was correct, but the
second and third were almost entirely empty. This has been corrected,
and the sample code was adapted into a formal test with assertions
to confirm that it works as designed.

I made a minor documentation change. Unlike HTML, where you never
need a BOM because you can declare the encoding in the file,
a CSV with non-ASCII characters must explicitly include a BOM
for Excel to handle it correctly. This was explained in the Reading CSV
section, but was glossed over in the Writing CSV section, which I
have updated.
2020-05-17 18:15:18 +09:00
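The `php://memory` approach mentioned in 7517cdd008 might look roughly like the sketch below. It assumes ISO-8859-1 input and hard-coded sample bytes; the variable names are illustrative and this is not the reader's actual code. The point is that the content is translated to UTF-8 as a whole and then parsed by `fgetcsv` from an in-memory stream, so a line break inside a quoted cell survives.

```php
<?php
// Sample ISO-8859-1 bytes: "café" plus a quoted cell containing a line break.
$latin1 = "caf\xE9,\"line one\nline two\"\n";

// Translate the whole content to UTF-8 first (iconv is named in the message).
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

// Hold the translated content in memory instead of a temporary file.
$handle = fopen('php://memory', 'r+b');
fwrite($handle, $utf8);
rewind($handle);

// fgetcsv now reads UTF-8 and still handles the embedded line break.
$row = fgetcsv($handle, 0, ',', '"', '\\');
fclose($handle);
```

Here `$row` holds two cells: the UTF-8 string "café" and a second cell that keeps its internal newline.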
Adrien Crivelli f1a019e492
Upgrade PHP deps 2020-04-27 19:29:45 +09:00
bbinotto e2f87e8b7a
Load with styles should not default to black fill color
Fixes #1353
Closes #1361
2020-04-26 22:33:30 +09:00
Matthijs Alles 87f71e1930 Support whitespaces in CSS style in Xlsx
Indentation in the xml leaves spaces in style string even after
replacing newlines. Replacing the spaces ensures no spaces in keys
of the resulting style-array

Fixes #1347
2020-04-05 19:50:57 +09:00
Stronati Andrea 9f5a472426 Fix XLSX file loading with autofilter containing '$'
The `setRange` method of the `Xlsx/AutoFilter` class expects a filter
range format like "A1:E10". The returned value from
`$this->worksheetXml->autoFilter['ref']` could contain "$" and returning
a value like "$A$1:$E$10".

Fixes #687
Fixes #1325
Closes #1326
2020-03-02 18:43:27 +07:00
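One way to picture the normalisation described in 9f5a472426 is to strip the absolute-reference markers before the range is handed to `setRange`. The function name `stripAbsoluteMarkers` is hypothetical, and the actual fix may be implemented differently; this is only a sketch of the idea.

```php
<?php
// Hypothetical normaliser: drop the "$" markers from an autofilter reference.
function stripAbsoluteMarkers(string $ref): string
{
    return str_replace('$', '', $ref);
}

// Single quotes keep PHP from treating "$A" as a variable.
$range = stripAbsoluteMarkers('$A$1:$E$10');
```

After the call, `$range` is in the plain "A1:E10" form that the message says `setRange` expects.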
oleibman 082266aacd Conditionals - Extend Support for (NOT)CONTAINSBLANKS (#1278)
Support for the CONTAINSBLANKS conditional style was added a while ago.
However, that support was on write only; any cells which used
CONTAINSBLANKS on a file being read would drop that style.

I am also adding support for NOTCONTAINSBLANKS, on read and write.
2020-01-04 18:50:04 +01:00
oleibman afd070a756 Handle ConditionalStyle NumberFormat When Reading Xlsx File (#1296)
* Handle ConditionalStyle NumberFormat When Reading Xlsx File

ReadStyle in Reader/Xlsx/Styles.php expects numberFormat to be a string.
However, when reading conditional style in Xlsx file, NumberFormat
   is actually a SimpleXMLElement, so is not handled correctly.
While testing this change, it turned out that reader always expects
   that there is a "SharedString" portion of the XML, which is not
   true for spreadsheets with no string data, which causes a
   run-time message.
Likewise, when conditional number format is not one of the built-in
   formats, a run-time message is issued because 'isset' is used
   to determine existence rather than 'array_key_exists'.
The new workbook added to the testing data demonstrates both those
   problems (prior to the code changes).

* Move Comment to Resolve Conflict

Github reports conflict involving placement of one comment statement.

* Respond to Scrutinizer Style Suggestion

Change detection for empty SimpleXMLElement.
2020-01-04 00:10:41 +01:00
coolhub 86fa5424a6
Correct column style even when using rowspan
Closes #1249
2019-11-30 15:40:42 +01:00
Nathanael d. Noblet 22bf54ca11 Allow Html Reader to write into existing spreadsheet
Sometimes you may want to read html into multiple worksheets within one
spreadsheet. Allowing the passing of a spreadsheet in makes this possible.
2019-11-17 21:17:56 +01:00
Nathanael Noblet 95c8bb9918
Allow HTML Reader to load from string
We often want to export a table as an excel sheet. The system renders the
html and it seems like a waste of time to write it to the file system to
use the reader. This allows us to render the html and then just pass it to
a reader

Closes #1136
2019-08-17 12:54:22 -07:00
Mahmoud Abdo 785705b712
Best effort to support invalid colspan values in HTML reader
Closes #878
2019-07-27 23:31:23 -07:00
Adrien Crivelli fa54ca79a3
Migrate away from deprecated PHPUnit asserts 2019-07-25 10:15:53 -07:00
Mark Baker bf59cf0cbc
Html cellwrapping (#1075)
* When <br> appears in a table cell, set the cell to wrap.

If the cell is not set to wrap, it appears as a single line when first
displayed in Excel, although editing the cell will cause Excel to wrap
it.

* fix whitespace

Upstream has a coding standard that includes whitespace

* Add Unit tests for cell wrapping

* Update changelog
2019-07-12 07:52:03 +02:00
Mark Baker d8047b071b
Basic unit test and fix for loading data validations from xlsx file (#1063) 2019-07-08 19:55:14 +02:00
rtek 6ab969e9cc Allow XmlScanner to correctly restore libxml entity_loader setting (#1050)
XmlScanner was not restoring libxml_disable_entity_loader since
destruct was not being called until script shutdown. This is because
the shutdown handler required an XmlScanner instance.

Also fix an unrelated bug where the UTF-8 encoding test was
case sensitive.
2019-07-03 09:53:43 +02:00
Mark Baker 0e6238c69e
CVE-2019-12331 (#1041)
* Detect doubly-encoded xml to hide XXE attacks
Correct use of LibXml_Disable_Entity_Loader

* New test for double-encoded xml in security scanner
2019-07-01 00:55:25 +02:00
Mark Baker 1e711541f1
Refactoring xlsx reader (#1033)
Start work on breaking up monolithic Reader and Writer classes into dedicated subclasses to make maintenance work easier
2019-06-30 23:42:25 +02:00
Mark Baker 6c25b6f422
Refactor Xlsx Properties Reader code into a separate class (#1001)
* Unit tests for refactoring Spreadsheet properties
* Refactor Xlsx Properties Reader code into a separate class
2019-06-10 16:44:55 +02:00
MarkBaker d6018a273e Codestyle fixes in tests.... spawn of the devil 2019-05-30 12:23:25 +02:00
MarkBaker 9ba96efc97 Still test against 5.6, but with allowed failures, and skip tests explicitly for features that require PHP >= 7.0.0 2019-05-30 12:11:49 +02:00
kraser 906bdc613c Fix failure when parsing xlsx with drawing having double (redefined) … (#945)
* Fix failure when parsing xlsx with drawing having double (redefined) attributes
2019-05-30 11:42:00 +02:00
AlexPravdin ebc0b56959 Fix #853 when loading and saving XLSX file with empty drawing cause c… (#882)
* Fix #853, where loading and saving an XLSX file with an empty drawing caused a corrupted output file. Store the empty drawing as an unparsed entity and save it as is when saving the file.

* Fix code style
2019-05-30 10:38:03 +02:00
Mark Baker 9b004b1e6a
Ignore escaped enclosures within an enclosure when inferring csv separator (#906) 2019-02-25 23:20:50 +01:00
Patrick Brouwers 1c99f4999c [Feature] Html reader improvements (#884)
* Extract character set, so we can convert to UTF-8 if required

* Set column width and row height when defined on tr/td

* Parse align and valign on td

* Specify number format of cell via html attribute

* Formatting of b, strong, i and em tags

* Inserting image in cell when using img tag in html

* Add applying inline styles: border, fonts, alignment, dimensions

* Add tests for applying inline styles
2019-02-16 23:11:16 +01:00
Adrien Crivelli d0dea580ad
Fix a few Scrutinizer issues 2019-01-02 15:38:13 +11:00
Mahmoud Abdo 86c635b3f5
Fix color from CSS when reading from HTML
When a Spreadsheet is generated from an html file whose css sets a
text color such as "color:#FF00FF", the text is shown in black,
because the rgb content is rendered with a trailing brace: "FF00FF}".
We fix it by adding the missing bracket "{".

Closes #831
2019-01-02 11:57:30 +11:00
Philipp Kolesnikov 8918888e7c
libxml_disable_entity_loader() changes global state so it should be used as local as possible
Fixes #801
Closes #802
Closes #803
2019-01-01 17:25:24 +11:00
Dennis Birkholz e56fbe2745
Fix column names if read filter calls in XLSX reader skip columns
Fixes #777
Closes #778
2018-12-10 20:00:26 +11:00
MarkBaker 3abb7ccb35 CS Complaining about not using $this->assertInternalType('object', $scanner); 2018-11-25 14:41:11 +01:00
MarkBaker 14159d985c Coding standards 2018-11-25 14:33:01 +01:00
MarkBaker 41bcf9a21c Support for additional callback in XML Security Scanner 2018-11-25 14:00:35 +01:00
MarkBaker c708411529 Refactor scanner into base reader class 2018-11-25 12:14:54 +01:00
MarkBaker abad49d426 Use factory for XmlScanner 2018-11-23 23:05:17 +01:00
MarkBaker 5854ce3738 phpcs cleanup 2018-11-20 08:18:35 +01:00
MarkBaker 7a06d71e1c Add UTF-7 XXE Unit test data 2018-11-19 23:22:59 +01:00
MarkBaker a4d97ba896 Clean handle charset in XXE scanner 2018-11-19 22:47:34 +01:00
Laurent 79d86ef5cc
Csv reader avoid notice when the file is empty
Fixes #337
2018-10-28 14:16:53 +11:00
Jon Dufresne 5b3870c508
Prefer https:// URLs when available in docs & comments
Fixes #737
2018-10-28 13:55:00 +11:00
Paul Barton 813855b2b2
Fix CSV delimiter detection on line breaks
The CSV Reader can now correctly ignore line breaks inside
enclosures which allows it to determine the delimiter
correctly.

Fixes #716
Fixes #717
2018-10-21 18:23:55 +11:00
bayzhanov 08b4456641
Xls file threw exception during open by Xls reader
Ignore some exception in property, if stream is empty

Fixes #402
Fixes #659
2018-10-07 18:49:01 +11:00
Adrien Crivelli 9fdcaabe3c
Could not open CSV file containing HTML fragment
We now always trust the file extension to avoid false positive of mime
detection for most simple cases. But we still try to guess the mime type
if the file extension does not match or is missing.

Fixes #564
2018-06-25 11:12:27 +09:00
Robin D'Arcy c723833d6f Allow CSV escape character to be set
Fixes #492
Closes #510
2018-05-23 10:31:41 +09:00
Adrien Crivelli e31878ceb1
Check for MIME type to know if CSV reader can read a file
CSV reader used to accept any file without any kind of check. That made
users incorrectly believe that things were ok, even though there is no
way for CSV reader to read anything else than plain text files.

Fixes #167
2018-02-05 21:33:23 +09:00
Adrien Crivelli c96e2dae02
Update to PHP-CS-Fixer 2.10 2018-01-28 15:59:38 +09:00
Adrien Crivelli 481fc4a7c6
Support XML file without styles
Closes #331
Closes https://github.com/PHPOffice/PHPExcel/pull/559
Fixes https://github.com/PHPOffice/PHPExcel/issues/558
2018-01-14 17:08:50 +09:00
Adrien Crivelli 4dd486fb94
Clean up very obsolete links 2017-12-30 19:07:22 +09:00
Adrien Crivelli 139d85d874
Better auto-detection of CSV separators
Closes #305
2017-12-28 12:25:37 +09:00
Adrien Crivelli 32a55a3f13
Introduce identical functional tests across several formats 2017-12-17 16:35:20 +09:00
Adrien Cohen 11b055b29f
Able to set the `topLeftCell` in freeze panes
Fixes #260
Closes #261
2017-12-17 13:32:16 +09:00
Adrien Crivelli 962367c95f
Can read very small HTML files
Fixes #194
2017-12-11 11:09:25 +09:00
Gabriel Caruso aed27a0bed Use PHPUnit\Framework\TestCase instead of PHPUnit_Framework_TestCase (#271)
Use the `PHPUnit\Framework\TestCase` notation instead of `PHPUnit_Framework_TestCase` while extending our TestCases. This will help us migrate to PHPUnit 6, which [no longer supports snake case class names](https://github.com/sebastianbergmann/phpunit/blob/master/ChangeLog-6.0.md#changed-1).
2017-11-09 00:48:01 +09:00
Adrien Crivelli 40efcd2fdd
Rename tests according to the class they are testing 2017-11-03 12:47:19 +09:00
Adrien Crivelli 557e80dc03
Rename classes to keep them in their related namespaces 2017-10-29 17:39:42 +09:00
Adrien Crivelli 4fd8e742e7
Upgrade to PHP-CS-Fixer 2.7 2017-10-01 20:07:04 +09:00
GreatHumorist 2abe56b946 Support missing attribute `r` in `c` node when reading xlsx
When describing a cell, the cell reference (r="A1") is optional.
When not present, we should just increment the index of the last processed row.

Fixes #201 
Closes #225
2017-09-22 14:49:38 +09:00
GreatHumorist 7aa6233185
Added xml reader hyperlink support
Closes #223
2017-09-22 14:40:47 +09:00
Adrien Crivelli aef4d711f5
Use `self::assert*()` instead of `$this->assert*()`
Because even if it doesn't make a difference in practice, it is
technically more correct to call static methods statically. It
also better advertises that those methods can be used from any context.
2017-09-22 14:22:44 +09:00
GreatHumorist 0477e6fcfe In Xml reader throw exception in case of invalid XML (#222)
When the xml file is not a standard xml file, the `simplexml_load_string` will return false, this will cause an error on "$xml->getNamespaces(true);" . So instead of showing the error, we throw an exception.
2017-09-20 14:20:12 +09:00
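The check described in 0477e6fcfe can be sketched as below. The wrapper name `loadXmlOrFail` and the choice of `InvalidArgumentException` are assumptions for illustration; the reader's own exception class is not stated in the message. What the sketch shows is testing the return value of `simplexml_load_string` for `false` before any method such as `getNamespaces` is called on it.

```php
<?php
// Hypothetical wrapper: turn a failed parse into an exception.
function loadXmlOrFail(string $content): SimpleXMLElement
{
    // Collect parser errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // simplexml_load_string returns false when the content cannot be parsed.
    if ($xml === false) {
        throw new InvalidArgumentException('Content is not valid XML');
    }

    return $xml;
}

$xml = loadXmlOrFail('<root xmlns:x="urn:x"><x:a/></root>');
$namespaces = $xml->getNamespaces(true); // safe: $xml is a SimpleXMLElement here
```

Calling `loadXmlOrFail('<root><a></root>')` throws instead of failing later on the `getNamespaces` call.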
Adrien Crivelli 31daed0048
Fix class name case 2017-08-02 23:13:08 +02:00
Zharikov Viktor 07455d24f6
Make global usage of `use` instead of FQCN
Closes #78
Closes #147
2017-05-18 00:10:16 +02:00
Markus Lanthaler 3ee9cc5ce6
Infer CSV delimiter if it hasn't been set explicitly
Closes #141
2017-04-20 17:02:03 +09:00
Adrien Crivelli fd9c925a7b
Refactor CachedObjectStorage to PSR-16
This drops a lot of non-core features code and delegates their maintenance
to third parties. Also it opens the door to any missing implementation
out of the box, such as Redis for the moment.

Finally this consistently enforces a constraint where there can be one and
only one active cell at any point in time in code. This used to be true for
non-default implementation of cache, but it was not true for default
implementation where all cells were kept in-memory and thus were never
detached from their worksheet and thus were all kept functional at any
point in time.

This inconsistency of behavior between in-memory and off-memory could
lead to bugs when changing cache system if the end-user code was badly
written. Now end-user will never be able to write buggy code in the first
place, avoiding future headache when introducing caching.

Closes #3
2017-04-14 16:56:27 +09:00
Kurounin b01671213a
Removed double un-escaping when reading CSV
Removed "unescape enclosure functionality", since the unescaping is already handled by fgetcsv,
and performing the unescaping again would actually result in the text from the cell being read wrong.

As an example try parsing the following CSV:

```
"<img alt="""" src=""http://example.com/image.jpg"" />"
```

With the additional unescaping it would have ended up as:

```
<img alt=" src="http://example.com/image.jpg" />
```

instead of the correct:
```
<img alt="" src="http://example.com/image.jpg" />
```

Fixes https://github.com/PHPOffice/PHPExcel/pull/1171
2017-04-03 11:57:10 +09:00
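The effect described in b01671213a can be reproduced with a short sketch. It uses `str_getcsv` on a single line for brevity where the message names `fgetcsv`, and the manual `str_replace` pass is an assumed stand-in for the removed "unescape enclosure" step, not the original code.

```php
<?php
// The sample line from the commit message.
$line = '"<img alt="""" src=""http://example.com/image.jpg"" />"';

// The CSV parser already turns each doubled quote into a single quote.
$parsed = str_getcsv($line, ',', '"', '\\')[0];

// A second, manual unescape pass collapses the legitimate "" in alt="".
$twice = str_replace('""', '"', $parsed);
```

`$parsed` matches the correct value quoted in the message, while `$twice` matches the corrupted one with `alt="` followed directly by ` src=`.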
Adrien Crivelli 93e2204774
Document ODS supported features
This should be completed in the future.
2017-03-06 14:40:27 +09:00
Paolo Agostinetto 9785f926c1 php-cs run: fixed code style for new/changed files 2017-02-20 21:05:25 +01:00
Paolo Agostinetto a0321fd6fd Removed PhpStorm comment on top of file 2017-02-20 21:02:49 +01:00
Paolo Agostinetto c954eddf57 Ods reader: fix sheet count and added a test for sheet names 2017-02-20 21:02:04 +01:00
Paolo Agostinetto 6d6353c0f1 Ods reader: fix reading of cells with hyperlinks 2017-02-18 20:59:14 +01:00
Paolo Agostinetto 1dba2d1766 Ods reader: tests for repeated spaces and rich text 2017-02-18 20:49:48 +01:00
Paolo Agostinetto bcd1bc364c Ods reader: test loading of Worksheets 2017-02-18 13:55:22 +01:00
Paolo Agostinetto 3c7b2e23da Added unit tests for Ods reader 2017-02-18 13:36:08 +01:00
Adrien Crivelli 031af1e9d2
Standardize writers and readers name to be the format most common extension in CamelCase 2017-01-22 17:39:23 +09:00
Adrien Crivelli 8c66afe39a
Upgrade to PHP-CS-Fixer 2.0 2016-12-22 23:46:26 +09:00
Alexander Kurilo 408da0c17a Make HTML checks more strict 2016-11-16 22:21:30 +09:00
Alexander Kurilo 928b592c14 Add readable labels for data provider samples 2016-11-16 22:21:30 +09:00
Alexander Kurilo 2809cce298 Remove unused data from test provider data set 2016-11-16 22:21:30 +09:00
Alexander Kurilo edb3974a0d Move XEEE test data to add data for other readers 2016-11-16 22:21:30 +09:00
Adrien Crivelli 47cde0dadc
Introduce vendor prefix `PhpOffice` to namespace 2016-09-01 02:20:47 +09:00
Adrien Crivelli 29bdbd4e0b
Respect PSR-0 with matching folder name and namespace `PhpSpreadsheetTests` 2016-08-25 13:53:15 +09:00