It is even slower than you say
> The catch is that it takes one hundred per cent longer to do so.
Actually, the web site says it is ~100x slower, not the same thing. 100% slower would be only twice as slow.
Google has open sourced a new compression algorithm called Zopfli that it says is a slower-but-stronger data squasher than the likes of zlib. The product of Googler Lode Vandevenne's 20 per cent time, the day a week Google allows its staff to work on side projects, Zopfli is said to reduce files to sizes 3.7–8.3 per cent …
Indeed.
Also this: 3.7–8.3 per cent smaller than its rivals (PDF).
So... it's 100x slower, for only an 8% decrease in size. Hardly seems worth it, really, given storage is hardly a big issue these days, even on phones.
Maybe he should direct his obvious skills into some new forms of real-time encryption/decryption, which would be more relevant today?
Actually, I understood it fully. But maybe I could have made my point clearer.
Given the meagre 3-8% reduction in size, the disproportionate increase in CPU usage (and thus battery drain) needed to obtain it will not mean any significant advantage for the consumer.
It might make the carriers happy, but even then 3% (even if *every* device used this new algo) will hardly dent their year-on-year data increases, and 8% probably only applies in specific scenarios.
The consumer will hardly notice the benefit on local storage, given the "growth" of the phone's OS and application footprints will probably wipe out any benefits there.
So given real-world usage, chances are all the user will see is a massive increase in battery drain, with no real benefit in *either* local storage or data transmission performance.
Far better to improve security on the devices, as this will have immediate benefits to the consumer, given it's likely the smartphone will become a significant attack vector for hackers in the coming years. Preferably one with minimum CPU requirements.
You just showed you didn't understand it fully - the intention is that this compression would be used only on the server side, and the decompression is no more taxing than with existing algorithms (it is the same, in fact), so unless you're running your servers on batteries, this would not affect battery life in any way.
@Silverburn,
The client enjoys fewer CPU cycles and less bandwidth - it's the server/compressor that has to do way too much work for little benefit.
As for storage - bzip2/xz is the way to go if you wish to save space (bzip2 is 20% better than Zopfli on the enwik8 test... and orders of magnitude faster).
On a completely arbitrary and wholly inappropriate dataset, using 7z, lzma2 did 2.8% better than bz2.
Neither of them will decompress with zlib, and there's no realistic chance of updating enough of the world's browsers with alternatives in a hurry - I imagine Microsoft would resist supporting bzip2 with a passion! We have universal zlib because PNG requires it (or a compatible implementation), and the permissive licence makes using real zlib a no-brainer. Even for Microsoft ;)
There's no killer application to drive bzip, lzma2 or any other advanced compression. I expect someone to improve the compression speed pretty rapidly so it probably doesn't matter anyway.
> Neither of them will decompress with zlib
@Paul,
I quite realise that (having first-hand experience porting zlib to Java; Java lacks Z_SYNC_FLUSH support - as lame as it gets). However, the idea is that even if only one browser starts supporting a better algorithm and there is a module for Apache/Tomcat/nginx/whatever, it will be automatically available.
The use of compression is quite flexible with HTTP: the user agent (browser) can announce any encodings it supports, and it's up to the server to decide. It could be a good selling point, and it doesn't actually require a lot of work.
Also, the LZMA SDK is placed in the public domain, so there is no issue with the licensing terms.
Btw, I posted results for lzma (xz): the compression is noticeably better than bzip2's, but the CPU cycles increase by an order of magnitude too - still a lot better than the "new compression".
@stanimir: I suspect that without IE support it would be as successful as WebM/WebP, JPEG 2000 and countless other badly supported file format changes have been :(
It would be nice to see exhaustive search applied to more compressors, to see how far they converge on the same 'perfect' compression ratio. Or, more immediately, all the PNG files on the web recompressed, though the streamability requirement might limit the gain.
The idea is that the compression works with the deflate/inflate that's readily available in every browser, so there is no need to change anything on the client side.
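To make that concrete, here is a minimal sketch (my own illustration, not from the article): Python's zlib at level 1 and level 9 stands in for "two different deflate encoders", and a Zopfli binding would simply be a third, slower producer of the same bitstream. The decoder on the client never changes.

```python
import zlib

payload = b"the quick brown fox jumps over the lazy dog " * 200

# Two different deflate encoders; Zopfli would just be another (much slower)
# encoder emitting the same format.
fast = zlib.compress(payload, 1)
best = zlib.compress(payload, 9)

# One and the same decoder handles both - nothing changes on the client.
assert zlib.decompress(fast) == payload
assert zlib.decompress(best) == payload
print(len(fast), len(best))
```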
On a wimpy Opteron 2372 (@2.1 GHz) -- a benchmark of output size and speed:
corpus: enwik8, 100,000,000 bytes
gzip -9: 36,445,248 bytes (enwik8.gz, ~10 s) - matches the PDF results
bzip2: 29,008,758 bytes (enwik8.bz2, ~17 s)
xz: 26,375,764 bytes (enwik8.xz, 123 s) [lzma -- the algorithm is just better]
Zopfli: 34,995,756 bytes (didn't wish to wait 15+ minutes for it to finish; figure taken from the linked PDF)
(A rough way to reproduce the first three numbers with nothing but a standard library is sketched below.)
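A rough sketch of that comparison using only Python's standard library (the enwik8 path and the xz preset are assumptions; Zopfli itself needs an external binding or binary, so it is left out):

```python
import bz2
import lzma
import time
import zlib

data = open("enwik8", "rb").read()  # 100,000,000 bytes of Wikipedia text

for name, compress in [
    ("gzip -9 (zlib level 9)", lambda d: zlib.compress(d, 9)),
    ("bzip2 -9",               lambda d: bz2.compress(d, 9)),
    ("xz (lzma, preset 6)",    lambda d: lzma.compress(d, preset=6)),
]:
    start = time.perf_counter()
    out = compress(data)
    print(f"{name:24s} {len(out):>12,d} bytes  {time.perf_counter() - start:6.1f} s")
```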
There are no stupid questions.
Only moronic commentards who have nothing better to do than fill up storage space, CPU cycles and bandwidth with content-free comments trying to indicate how much cleverer they are than the other commentators.
No testing of bzip2 or xz in the benchmarks...
Because. They. Are. Not. Deflate. Compressors.
Good lord. If this wasn't clear enough in the article, it's frickin' crystal in the (very short) paper.
The point of this exercise is to produce a Deflate-compatible output that's smaller than what the best current Deflate compressors produce.
And really, there's very little point in testing against bzip2, other than its relative popularity in the obscure-compressor sweepstakes. The Burrows-Wheeler Transform, while clever, is really nothing more than a roundabout way of constructing a Hidden Markov Model, and consequently a BWT-based compressor is just a less-successful variant of better HMM-based compressors, such as those implementing some version of PPM. The reigning lossless compressors, for minimal output size on English text corpora, seem to be PPM or hybrid ones that use multiple predictive models.
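As a toy illustration of what the transform does (my own sketch, nothing to do with the paper): a naive BWT sorts all rotations of the input and keeps the last column, which clusters symbols that are followed by similar context - the same kind of statistical structure a context-modelling (PPM/HMM-style) compressor exploits directly.

```python
def bwt(s: bytes) -> bytes:
    """Naive Burrows-Wheeler Transform: sort all rotations, keep the last column."""
    t = s + b"\x00"  # sentinel marks the end (assumes the input has no NUL bytes)
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return bytes(r[-1] for r in rotations)

print(bwt(b"banana"))  # symbols followed by similar context end up adjacent
```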
By default, deflate (the zlib compression) uses neither maximum compression nor maximum memory for its dictionaries; -9 gives max compression but not max memory.
Also, gzip adds a CRC32 and an extra header that are practically unnecessary over TCP, so they should have tested with plain deflate (the framing difference is sketched below).
80 times slower, as cited in the PDF, is just too slow for a few per cent on files megabytes in size. The deflate algorithm is essentially 16-bit as well (its window is only 32 KB). When transferring such files they would be compressed with better algorithms not necessarily compatible with inflate on the client side. And "web content" like images and video that is large enough is already compressed internally, so there is nothing much to gain.
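A minimal sketch of that framing difference using Python's zlib (the payload is arbitrary; the point is only that the gzip wrapper adds a header plus CRC32/length trailer around the same deflate stream):

```python
import zlib

payload = b"some response body worth compressing " * 500

def deflate(data: bytes, wbits: int) -> bytes:
    c = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()

raw_deflate = deflate(payload, -15)  # bare deflate stream, no framing
zlib_framed = deflate(payload, 15)   # 2-byte header + Adler-32 trailer
gzip_framed = deflate(payload, 31)   # 10-byte header + CRC32 + length trailer

print(len(raw_deflate), len(zlib_framed), len(gzip_framed))  # gzip is the fattest
```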
@jamie jones
bzip is not "inflate" compatible on the decompression end.
This from the company wanting to push all sorts of advertising on their mobile platform. D'you think in-app adverts and such don't consume part of the allocated bandwidth?
Always follow the money trail and see who gains from this. Either Google just want to make more room for the ads, or they are trying to develop their own compression algorithms so that they don't have to pay royalties to anyone. Perhaps Google are going to serve all Google searches compressed in this format, and mobile phone makers will have to pay Google to use the decompression algorithm? Either way, I'm sure Google are not doing this for the benefit of the users.
The fact that this "new" algorithm is 100x slower than other compression methods may explain why the approach is not used by other compressors. Either that, or the chocolate factory is following the advice of Mark Weatherford and has employed people who do not understand the problems of combinatorial explosion in search functions.
Could be useful.
I see a market for a device that sits between a webserver and the Internet and applies Zopfli compression to HTTP replies transparently (it's gzip-compatible, and gzipped web pages are extremely common as a bandwidth saver), and caches common results just like a Squid reverse-proxy or similar. Make it an embedded device that does nothing but Zopfli compression, implement the algorithm in an ASIC or similar embedded chip (so the speed difference would hardly begin to matter), and there's no reason it can't lop another 8% off your data transfer for practically zero cost if you're large enough.
No changes required to your viewers - gzip-compressed content is bog-standard - no changes required to your webservers, and an 8% saving in traffic sent externally for the cost of a set of Zopfli boxes on the outgoing lines (a compress-once sketch of the same idea follows this comment). I can see where someone like Google, Facebook etc. could make a large profit on their investment there, if they have big enough lines.
(Strangely, the Google homepage does not seem to use gzip compression, but BBC and Facebook appear to).
Probably not for the likes of us mere mortals, though.
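In software rather than silicon, the compress-once economics looks roughly like this (a sketch, not anyone's actual setup: the static/ path and *.js glob are placeholders, and plain zlib-based gzip stands in where a Zopfli binding would be dropped in, since either way the output is an ordinary gzip stream):

```python
import gzip
import pathlib

def compress_once(path: pathlib.Path) -> None:
    # zlib-based gzip here; a Zopfli binding could be swapped in for the extra
    # few per cent, since its output is still a valid gzip/deflate stream.
    compressed = gzip.compress(path.read_bytes(), compresslevel=9)
    (path.parent / (path.name + ".gz")).write_bytes(compressed)

for asset in pathlib.Path("static").rglob("*.js"):  # placeholder asset tree
    compress_once(asset)  # pay the CPU cost once, serve the .gz on every request
```

A web server that can serve pre-built .gz files directly (nginx's gzip_static module, for instance) then never pays the compression cost at request time.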
> and an 8% saving in traffic sent externally for the cost of a set of Zopfli boxes on the outgoing lines
The quoted 8% would be applicable to large enough static text files that are actually cacheable.
If the file is cacheable, the client would likely fetch it only once anyway. So there you have it - if you have a huge, huge JavaScript file, Zopfli can be of tiny use (and you can just compress it yourself anyway instead of using a proxy). "Tiny" because the clients will use their own cached version instead of downloading it on each request.
Facebook and the BBC would have mostly dynamic content, so they need online compression. The Zopfli algorithm is way, way too slow for online compression. Deflate is fast enough, though.
Side note: sites should use 'deflate', not 'gzip' - gzip comes with a CRC32 and an extra header.
For reference, compared to the LZMA algorithm Zopfli is still ~25% worse in size and still slower to process.
To save bandwidth, some browsers should start supporting LZMA and announce it in the Accept-Encoding header - the server would pick the best of gzip/deflate/lzma and serve the content without wasting tons of cycles and bandwidth on a dated 16-bit compression scheme.
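A minimal sketch of that negotiation (my own illustration: 'xz' and 'bzip2' are not content-codings browsers actually send today, which is precisely the point - the mechanism is already there, only the support is missing):

```python
import bz2
import gzip
import lzma

# Server-side table, ordered from best ratio to most widely supported.
ENCODERS = {
    "xz":    lzma.compress,
    "bzip2": bz2.compress,
    "gzip":  gzip.compress,
}

def negotiate(accept_encoding: str, body: bytes):
    """Pick the best content-coding the client advertises; fall back to identity."""
    offered = {token.split(";")[0].strip() for token in accept_encoding.split(",")}
    for name, encode in ENCODERS.items():
        if name in offered:
            return name, encode(body)
    return "identity", body

# A browser that also spoke xz would automatically get the better encoding:
print(negotiate("gzip, deflate, xz", b"hello world " * 1000)[0])  # -> "xz"
```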
>But using a file format - on the internet - where the header is at the end of the file, is really dumb.
Depends on the content and the intended use. Remember the checksum is always at the end, so only when the recipient has received the entire communication will they know whether the transfer was corrupt or not. Hence for system updates, for example, it might be wise to wait for the entire transmission before starting to process it...
Oh, the ninnies are out in force today.
7zip supports a variety of compression algorithms. 7zip with deflate produces output larger than Zopfli on the tested corpora. 7zip with LZMA is not compatible with Deflate, which is the whole goddamned point.
Incidentally, for large English-text corpora, there are encoders which produce smaller output than LZMA. Do a little research.
But has anybody ever managed to recreate the neutron packing density Novell achieved on their 3.5-inch driver disk for NetWare 3.12? You could put that thing in, leave the top floor of the World Trade Center (because it was still there back then), walk down to the corner, sit at the counter, order a cup of coffee and a Danish, finish both, and then come back as the disk was just about done unpacking.
How big are the JavaScript files on YouTube? Say 100 KB? So saving at least 3 KB on millions (billions?) of transfers per day seems a small trade for a one-off cost of 100x compression time. Any extra power/time used will be saved many times over daily, not just for Google but across the internet's routers. The PDF says it's deflate-compatible, so it sounds like the only change needed is server-side, to use the new compression. The server would need to be smart enough to know what's static content and what's not, which is why I used JS as my example - an easy target to implement :)
I already mentioned it: the JavaScript would be downloaded only once, as it'd be cached by the user agent.
Simple rule: if a resource can be served with that algorithm, it'd be fetched only once - and the algorithm will not be so "good" on smaller resources.
Compared to the video sizes, 8 KB (the optimistic ratio on your quoted 100 KB) saved once is less than a drop in the sea.
Took a quick browse through the code, and it looks like Zopfli is pretty much stock Deflate, augmented with more testing to determine the best places to split blocks and recalculate the dynamic tree. So it's pretty much exactly what you'd do if you wanted to improve the compression ratio of Deflate: put more work into figuring out the optimal Deflate representation of the input.
So not groundbreaking, but a perfectly solid piece of incremental improvement (a crude analogue of the idea is sketched after this comment).
Still, it's fun to see yet another bit of performance squeezed out of Lempel-Ziv '77. That algorithm is older than most of the people using it.
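A crude analogue of that idea, for illustration only - nothing to do with Zopfli's actual code, which works at a much finer grain (block boundaries, Huffman trees). This just burns CPU trying every zlib level and strategy and keeps the smallest output, which any stock inflater still reads; the input file name is a placeholder.

```python
import zlib

STRATEGIES = [
    zlib.Z_DEFAULT_STRATEGY,
    zlib.Z_FILTERED,
    zlib.Z_HUFFMAN_ONLY,
]

def smallest_deflate(data: bytes) -> bytes:
    """Spend more CPU searching encoder settings; output stays plain zlib/deflate."""
    best = None
    for level in range(1, 10):
        for strategy in STRATEGIES:
            c = zlib.compressobj(level, zlib.DEFLATED, 15, 9, strategy)
            candidate = c.compress(data) + c.flush()
            if best is None or len(candidate) < len(best):
                best = candidate
    return best

blob = open("some_static_asset.js", "rb").read()  # placeholder input file
squeezed = smallest_deflate(blob)
print(len(zlib.compress(blob, 9)), len(squeezed))
assert zlib.decompress(squeezed) == blob  # an ordinary inflate still reads it
```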