Introducing ZIP Foundation
Tuesday 04. July 2017 16:28:30 UTC+02:00 -
The Cocoa developer community already provides several useful and well-maintained ZIP libraries. But while working on my next app, I realized that I have some requirements that were not met by the existing libraries I found. So I set out to create my own with the following design goals in mind:
- High performance
- Random access to specific entries within an archive
- Deterministic memory consumption
- Stream based API to read from or write to archives
- Low-maintainance package dependencies (ideally the ZIP library should not pull in any 3rd party code at all)
- Usable on the server (Linux)
I recently open sourced my results on GitHub. The project is called ZIP Foundation and it is a Swift framework to create, read and update ZIP archive files. It uses Apple's libcompression to handle de- and encoding of compressed archives.
Unlike most other Cocoa ZIP framework authors, I decided against wrapping an existing C implementation (like the excellent
minizip
) and implemented the
ZIP specification
by myself. By doing that, I learned a lot about the format, which later allowed me to write a minimal and modern implementation of the specification.
Writing everything from scratch also has the nice side effect that ZIP Foundation has no 3rd party dependencies and that it was relatively easy to port it to Linux.
Another design goal was a modern, Swift-style API. The API should provide easy to use high-level methods on one hand and fine grained low-level access to ZIP entries on the other.
The high level API is implemented as extension to FileManager
and only consists of two methods: zipItem
and unzipItem
.
It allows you to zip and unzip files and directories with a single line of code:
try fileManager.zipItem(at: sourceURL, to: destinationURL)
and
try fileManager.unzipItem(at: sourceURL, to: destinationURL)
To access individual entries, you can use the Archive
class. Archive
conforms to Sequence
and therefore supports subscripting. To access an arbitrary entry within an archive you can simply write:
guard let entry = archive["file.txt"] else {
return
}
Archive
also provides a closure-based API to access and modify ZIP files. This allows you to perform ZIP operations on-the-fly, without round trips to the file system.
To learn more about that, please refer to the
README on GitHub
or the method documentation in Xcode.
Performance
High performance was very high on my requirements list and so I designed the framework for it right from the beginning.
With macOS 10.11/iOS 9.0, Apple introduced a new compression library: libcompression. It comes with several compression encoders and decoders, including a highly optimized version of the zlib decoding algorithm. Zlib is available on virtually all platforms and its deflate implementation is the most common compression method used for ZIP archives. Besides its wide availability, zlib also provides a good balance between compression ratio and encoding/decoding speed. It is open source and has been constantly improved since its initial release in 1995. Apple somehow managed to speed-up the
decoding part of zlib by 60%
while still maintaining compatibility. Given that impressive decoding speed, using libcompression for ZIP Foundation was an obvious choice. The library is used on all Apple platforms (iOS 9.0+ / macOS 10.11+ / tvOS 9.0+ / watchOS 2.0+). On Linux, ZIP Foundation falls back to zlib.
For the following performance tests, I wrote a small command line utility that performs extract, add and remove operations with various entry sizes. The baseline performance is established by the default zip/unzip command line tools on macOS. Each operation is performed 10 times for each test entry size. The final result is the average of the ten runs in milliseconds. ZIP Foundation was compiled with -Ofast
and -whole-module-optimization
. The benchmark was executed on macOS Sierra 10.12.5. You can download the utility
here
.
During extraction, ZIP Foundation heavily benefits from the fast zlib decoding in libcompression. When taking into account how common zlib decoding is in everyday computing (not only for ZIP files but also for ubiquitous formats like PNG), it soon becomes clear that Apple's engineers really helped us to save a lot of time and battery with that optimization.
In WWDC Session 712/2015 , the libcompression engineers mention that they only optimized the decoding part of zlib. The graphs for addition and removal clearly show that. The small gains were all achieved by tweaking IO and compression buffer sizes.
Code Quality
As ZIP is often used as a container format for proprietary file formats, reliability and performance should be a high priority for every ZIP framework. ZIP Foundation is my first non-trivial project that is fully covered by unit and performance tests.
I don't think that achieving 100% coverage will instantly guarantee a certain level of quality. However, I found that while aiming for it, I had to permanently question program flow, API design and type choices. That lead to a lot of refactoring which in the end made the framework more robust and easier to use. Another advantage of 100% coverage is, that it uncovers dead code. If certain code paths are not exercised after refactoring, they are either dead or the refactoring has introduced flawed logic. Both aspects are also noticeable in code bases with less than 100% coverage, but the root causes would not be as easy to spot. (To easily find uncovered code, it helps to have
Code Coverage Visualization turned on
.)
Non-Apple Platform Support
When Apple open sourced Swift, they also made an initial push to bring the language to non-Apple platforms. As ZIP Foundation does not use any platform specific dependencies, I decided to support Linux right from the beginning. Apple provides a
Swift implementation of Foundation
, so it was possible to get a buildable framework with very little effort. I used Ole Begemann's excellent tips to get a
Linux test environment for Swift projects
using Docker. While getting the project into a compilable state was not too difficult, I soon ran into a wall when trying to execute the project's test suite. The main issues all stemmed from the fact that Apple's open source implementation of Foundation.framework is far from feature complete at the moment (The current Swift development version at the time of writing is 4.0). For instance fundamental methods like FileManager replaceItem
are missing. Another class that currently only offers a subset of the
real
Foundation.framework was (NS)URL, where things like retrieving file metadata via resourceValues(forKeys keys: [URLResourceKey])
is not possible. Luckily Swift has built-in interoperability with C APIs and so I was able to use POSIX counterparts of the aforementioned methods to work around most methods that threw NSUnimplemented
.
Conclusion
ZIP Foundation was my first 100%-Swift project and I have learned a lot about ZIP, Swift and framework development along the road.
I used Swift since its introduction, but mostly for high level UI code and app development. During the course of this project, it also turned out as good fit for lower level tasks such as large file processing and compression.