Google wants to bring to life the HBO series Silicon Valley: it's pitching a new open source compression algorithm into the world, with the hope that it can eventually end-of-life the venerable Deflate. Brotli (“small bread” in Swiss German, apparently) follows on from Zopfli (“little braid,” also bread-themed), but with …
PKWare patented it in 1990, and first used it in PKZip 2.
It was soon found that the algorithm could be done just as effectively without using the patented methods, which is why it was used so widely - hence the RFC.
I'll happily admit I had to go and check Wikipedia for the details, but I did so because I was sure it predated 1996 due to the PKZip connection - I'm pretty sure I had PKZip 2 in 1993ish, on my trusty 8086-based computer. Not that I used it to create many archives: LHA was often better at compressing, and I soon found ARJ, which spanked everything else comprehensively until the arrival of the early DOS-based RAR archiver and solid archiving. It was a good time for compression - plenty of different solutions, lots of competition...
Ahem. Sorry - got off topic there. Anyway, Deflate - older than you think, by around six years...
… that adding a very large static dictionary could reduce output size at the cost of increasing the size of the (de)compressor?
I have a new algorithm that works much better than this one. If the file is a single 'one' byte, it decompresses to the King James Bible. If it's 'two', it decompresses to a DVD rip of Deep Throat. If...
Why yes, the program is rather large, but look at the compression ratios!
This is why many compressor comparisons include the size of the compressor (including any static data) in the results - your compressor would instantly rank as the worst ever!
For Google's specific use on the web it's a no-brainer, and particularly useful for small payloads.
It seems to offer fractional improvements in speed and compression ratio at the cost of incompatibility and a larger program. I'd say it won't stand out (we all still use .zip archives even though alternatives are better), but maybe Google can push it simply because they already own your browser.
In tests … claimed Brotli running at 3.381:1 compression ratio could compress at 98.3 MB/s
Deflate could run at … 2.913:1 compression ratio was 93.5 MB/s.
Is it just me or do those numbers seem underwhelming to others, considering that compatibility is broken? Compression ratio goes up a respectable 16% and speed around 5%. I would have thought dedicated silicon might deliver better results without changing the format.
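Working the quoted figures through (3.381:1 at 98.3 MB/s versus 2.913:1 at 93.5 MB/s) gives the relative gains directly:

```python
# Quick arithmetic on the figures quoted above: Brotli at 3.381:1 and
# 98.3 MB/s versus Deflate at 2.913:1 and 93.5 MB/s.
brotli_ratio, deflate_ratio = 3.381, 2.913
brotli_speed, deflate_speed = 98.3, 93.5

ratio_gain = brotli_ratio / deflate_ratio - 1   # relative ratio improvement
speed_gain = brotli_speed / deflate_speed - 1   # relative speed improvement

print(f"ratio improvement: {ratio_gain:.1%}")   # ~16.1%
print(f"speed improvement: {speed_gain:.1%}")   # ~5.1%
```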
The static dictionary sounds an interesting idea though I shudder to think what's in it!
After running some tests it does look like they carefully cherry-picked the publicised performance: what I saw was a nearly random pattern of relative compression compared to LZMA and Deflate. In one test Deflate beat it by more than 10%!
The LZMA comparison was more interesting: at usable quality settings it was comfortably faster with roughly the same compression ratio. That's pretty surprising given its use of Huffman coding. I'll definitely be trialling it some more - LZMA just isn't fast enough for us (neither is LZHAM), but this might be.
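For what it's worth, the Deflate-versus-LZMA half of that kind of test is easy to reproduce with Python's standard library (Brotli needs a third-party package, so it's left out of this sketch; the sample data here is made up, and relative results vary a lot with the input):

```python
import lzma
import zlib

# Moderately repetitive sample data; substitute your own payloads,
# since relative performance depends heavily on the input.
data = (b"the quick brown fox jumps over the lazy dog. " * 200
        + bytes(range(256)) * 20)

deflated = zlib.compress(data, level=9)   # Deflate (zlib wrapper)
lzma_out = lzma.compress(data, preset=9)  # LZMA (xz container)

print(f"original: {len(data)} bytes")
print(f"deflate:  {len(deflated)} bytes ({len(data)/len(deflated):.2f}:1)")
print(f"lzma:     {len(lzma_out)} bytes ({len(data)/len(lzma_out):.2f}:1)")

# Round-trip check: both must reproduce the input exactly.
assert zlib.decompress(deflated) == data
assert lzma.decompress(lzma_out) == data
```

Timing the two calls with `time.perf_counter()` over realistic payloads gives the speed side of the comparison.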
A future version with arithmetic coding could be very interesting, once the current patent woes on ANS are dealt with.
I don't believe there is any 7z compression algorithm. 7z is a file format.
The 7z compressor supports a variety of compression algorithms. Maybe you're thinking of LZMA, which is probably what 7z is most notable for. 7z also supports Deflate, Bzip2, PPMd, and LZMA2 - the last a moderately enhanced LZMA without any major algorithmic differences.
Deflate is basically a Lempel-Ziv variant combined with Huffman coding - using shorter codes for the more common symbols - with the code tables rebuilt as it runs over the input. (The original Huffman algorithm built one code table for the entire input.)
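You can poke at that combination from Python's stdlib, which wraps a Deflate implementation in the `zlib` module (negative `wbits` just strips the zlib header so you get a raw Deflate stream):

```python
import zlib

data = b"abracadabra " * 100

# wbits=-15 produces a raw Deflate stream (no zlib header/trailer).
comp = zlib.compressobj(level=9, wbits=-15)
raw_deflate = comp.compress(data) + comp.flush()

# The repeated phrase is caught by the LZ77 stage as back-references;
# the Huffman stage then shrinks the literal/length/distance symbols.
print(len(data), "->", len(raw_deflate))

assert zlib.decompress(raw_deflate, wbits=-15) == data
```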
Bzip2, which is based primarily on the Burrows-Wheeler Transform, and PPMd (Prediction by Partial Matching) are both approaches that model the input and compress by predicting what's most likely to come next, though the BWT does it only implicitly, by rearranging the input to clump the information entropy. Ultimately, though, they're both in effect building Markov models of the input stream.
LZMA is another Lempel-Ziv variant, but after the LZ compression it applies a second layer of entropy coding using range coding. Range coding is an older variant of arithmetic coding that is no longer patent-encumbered. Arithmetic coding schemes in effect let you use partial bits, reclaiming some of the inefficiency of whole-bit-codes like Huffman. This second stage apparently (I haven't looked at it closely) does some prediction using Markov chains.
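That "partial bits" point is easy to quantify: a symbol with probability p carries -log2(p) bits of information, but a prefix code like Huffman must spend at least one whole bit per symbol. A small worked example (plain arithmetic, not an actual range coder):

```python
from math import log2

# A very skewed two-symbol source: 'a' with p=0.95, 'b' with p=0.05.
p_a, p_b = 0.95, 0.05

# Shannon entropy: the best achievable average bits per symbol.
entropy = -(p_a * log2(p_a) + p_b * log2(p_b))

# A Huffman code for two symbols assigns 1 bit to each, so it averages
# exactly 1 bit/symbol no matter how skewed the source is.
huffman_avg = 1.0

print(f"entropy: {entropy:.3f} bits/symbol")   # ~0.286
print(f"huffman: {huffman_avg:.3f} bits/symbol")
```

An arithmetic or range coder can approach the entropy figure, which is where the extra compression over a pure Huffman back-end comes from.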
Brotli appears to take the same approach: LZ compression followed by another layer of entropy coding, plus various techniques like the built-in dictionary and tuning of parameters. Discarding the Deflate format also lets it use one that's tuned specifically for it. The authors say it compresses the Canterbury corpus a little better than LZMA does.