Huffman's algorithm can be implemented efficiently using two ordinary queues instead of a priority queue. The first queue contains the original symbols, sorted by frequency. When a composite symbol is created, it is added to the second queue.
This way, the lowest-frequency symbol will always be found at the front of one of the queues. While Huffman codes are optimal as far as prefix-free codes go, there are more efficient ways to encode data beyond prefix coding, such as Arithmetic coding and Asymmetric numeral systems.
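Below is a minimal sketch of the two-queue tree construction. The node layout and function names are illustrative, not the article's actual code; it assumes the leaves are already sorted by increasing frequency and that the nodes array has room for 2n-1 entries.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical node type for this sketch. */
    struct node {
        uint32_t freq;
        int left, right;   /* child indices; -1 for a leaf */
    };

    /* Pop whichever queue front has the lower frequency. */
    static int take_min(const struct node *nodes,
                        size_t *q1, size_t q1_end,
                        size_t *q2, size_t q2_end)
    {
        if (*q1 < q1_end &&
            (*q2 >= q2_end || nodes[*q1].freq <= nodes[*q2].freq)) {
            return (int)(*q1)++;
        }
        return (int)(*q2)++;
    }

    /* Build the tree from n leaves sorted by increasing frequency;
       returns the index of the root node. */
    static int build_tree(struct node *nodes, size_t n)
    {
        size_t q1 = 0, q1_end = n;   /* queue 1: the original symbols */
        size_t q2 = n, q2_end = n;   /* queue 2: composite symbols */

        while ((q1_end - q1) + (q2_end - q2) > 1) {
            int a = take_min(nodes, &q1, q1_end, &q2, q2_end);
            int b = take_min(nodes, &q1, q1_end, &q2, q2_end);

            /* Append the new composite symbol to the second queue. */
            nodes[q2_end].freq  = nodes[a].freq + nodes[b].freq;
            nodes[q2_end].left  = a;
            nodes[q2_end].right = b;
            q2_end++;
        }
        return take_min(nodes, &q1, q1_end, &q2, q2_end); /* the root */
    }

Because composite symbols are created with nondecreasing frequencies, the second queue stays sorted for free, which is what makes the two plain queues sufficient.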
By walking down the tree from the root and using 0 for left branches and 1 for right branches, we end up with a prefix-free code for the symbols. The decision to use 0 for left and 1 for right branches seems arbitrary; if we do the reverse, we get a different but equally good code. In fact, we can label the two edges from a node with 0 and 1 arbitrarily, as long as the labels are different, and still end up with an equivalent code.
This shows that while Huffman's algorithm gives the requisite codeword lengths for a minimum-redundancy prefix-free code, there are many ways of assigning the individual codewords. Given codeword lengths computed by Huffman's algorithm, a Canonical Huffman code assigns codewords to symbols in a specific way.
This is useful because it makes it sufficient to store and transmit the codeword lengths with the compressed data: the decoder can reconstruct the codewords based on the lengths. One could of course also store and transmit the symbol frequencies and run Huffman's algorithm in the decoder, but that would require more work for the decoder and likely more storage space too.
Another very important property is that the structure of canonical codes facilitates efficient decoding. The idea is to assign codewords to the symbols sequentially, one codeword length at a time. The initial codeword is 0. The next codeword of some length is the previous one plus 1. The first codeword of length N is constructed by taking the last codeword of length N-1, adding one to get a new codeword and shifting left one step to increase the length. Viewed in terms of a Huffman tree, codewords are assigned in sequence to the leaves in left-to-right order, one level at a time, shifting left when we move down one level.
The first codeword is 0. That is also the last codeword of length 1. For length 2, we take the 0, add 1 to get the next code, which will be the prefix of the two-bit codes: we shift it left and obtain 10. That is also the last codeword of length 2. To get to length 3, we add one and shift: 110. To get the next one of length 3, we add one: 111. The implementation for generating the canonical codes is shown below.
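Here is a sketch of the standard canonical-code construction (the same algorithm RFC 1951 describes); names and types are illustrative rather than the article's exact code:

    #include <stdint.h>
    #include <stddef.h>

    #define MAX_CODEWORD_LEN 15  /* Deflate's maximum codeword length */

    /* Assign canonical codewords given each symbol's codeword length
       (length 0 means the symbol is unused). */
    static void compute_canonical_code(uint16_t *codewords,
                                       const uint8_t *lengths, size_t n)
    {
        uint16_t count[MAX_CODEWORD_LEN + 1] = {0};
        uint16_t next[MAX_CODEWORD_LEN + 1];
        uint16_t code = 0;
        size_t i;
        int len;

        for (i = 0; i < n; i++) {
            count[lengths[i]]++;
        }
        count[0] = 0;  /* unused symbols get no codeword */

        /* The first codeword of each length is the previous length's
           first codeword plus its codeword count, shifted left once. */
        for (len = 1; len <= MAX_CODEWORD_LEN; len++) {
            code = (uint16_t)((code + count[len - 1]) << 1);
            next[len] = code;
        }

        /* Hand out codewords to the symbols in order. */
        for (i = 0; i < n; i++) {
            if (lengths[i] != 0) {
                codewords[i] = next[lengths[i]]++;
            }
        }
    }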
Note that the Deflate algorithm expects codewords to be emitted LSB-first, that is, the first bit of a codeword should be stored in the least significant bit. This means we have to reverse the bits, which can be done using a lookup table.
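For illustration, here is a bit reversal written as a plain loop; the lookup-table version the text mentions would precompute this for all byte values and combine two table lookups:

    #include <stdint.h>

    /* Reverse the num_bits lowest bits of x so the codeword can be
       emitted LSB-first. */
    static uint16_t reverse_bits(uint16_t x, int num_bits)
    {
        uint16_t rev = 0;
        int i;
        for (i = 0; i < num_bits; i++) {
            rev = (uint16_t)((rev << 1) | ((x >> i) & 1));
        }
        return rev;
    }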
The most basic way of doing Huffman decoding is to walk the Huffman tree from the root, reading one bit of input at a time to decide whether to take the next left or right branch. Once a leaf node is reached, that is the decoded symbol.
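A sketch of that bit-by-bit walk, with an assumed node layout and input callback (neither is the article's actual interface):

    /* symbol >= 0 only at leaf nodes. */
    struct tree_node { int left, right, symbol; };

    /* Walk from the root, consuming one input bit per step, until
       a leaf is reached. */
    static int decode_symbol(const struct tree_node *nodes, int root,
                             int (*next_bit)(void *ctx), void *ctx)
    {
        int i = root;
        while (nodes[i].symbol < 0) {
            i = next_bit(ctx) ? nodes[i].right : nodes[i].left;
        }
        return nodes[i].symbol;
    }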
This bit-by-bit method is often taught at universities and in textbooks. It is simple and elegant, but processing one bit at a time is relatively slow. A very fast way of decoding is to use a lookup table. For the code above, where the max codeword length is three bits, the table maps each possible three-bit input to a symbol and a codeword length: inputs 000 through 011 map to the 1-bit codeword 0, inputs 100 and 101 map to the 2-bit codeword 10, and inputs 110 and 111 map to the 3-bit codewords 110 and 111. Although there are only four symbols, the table needs to have eight entries to cover all possible three-bit inputs.
Symbols with codewords shorter than three bits have multiple entries in the table. For example, the 10 codeword has been "padded" to 100 and 101 to cover all three-bit inputs starting with 10. To perform decoding using this method, one would index into the table using the next three bits of input, and immediately find the corresponding symbol and its codeword length.
The length is important, because even though we looked at the next three bits, we should only consume as many input bits as the actual codeword length. The lookup table approach is very fast, but there is a downside: the table size doubles with each extra bit of codeword length. This means that building the table becomes exponentially slower, and using it may also become slower if it no longer fits in the CPU's cache.
Because of this, a lookup table is typically only used for codewords up to a certain length, and some other approach is used for longer codewords. As Huffman coding assigns shorter codewords to more frequent symbols, using a lookup table for short codewords is a great optimization for the common case. The method used by zlib is to have multiple levels of lookup tables. If a codeword is too long for the first table, the table entry will point to a secondary table, to be indexed with the remaining bits.
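Here is a sketch of the single-level table lookup for our example code; the entry layout and names are illustrative:

    #include <stdint.h>

    struct table_entry {
        uint8_t symbol;   /* decoded symbol */
        uint8_t len;      /* actual codeword length in bits */
    };

    /* For the code 0, 10, 110, 111 with max length 3: short codewords
       occupy multiple ("padded") entries. */
    static const struct table_entry table[8] = {
        /* 000..011 -> the 1-bit codeword 0   */
        {0, 1}, {0, 1}, {0, 1}, {0, 1},
        /* 100..101 -> the 2-bit codeword 10  */
        {1, 2}, {1, 2},
        /* 110, 111 -> the 3-bit codewords    */
        {2, 3}, {3, 3},
    };

    /* Look up the symbol for the next three input bits (first bit in
       the most significant position); *len receives the number of
       bits actually consumed. */
    static int decode_table_lookup(int next3bits, int *len)
    {
        *len = table[next3bits].len;
        return table[next3bits].symbol;
    }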
However, there is another very elegant method based on the properties of canonical Huffman codes. Consider the codewords from our canonical code above: 0, 10, 110, and 111. We will keep track of the first codeword of each length, and where in the sequence of assigned codewords it is, the "symbol index": for our code, the first codeword of length 1 is 0 at symbol index 0, the first of length 2 is 10 at symbol index 1, and the first of length 3 is 110 at symbol index 2. Because the codewords are assigned sequentially, once we know how many bits of input to consider, this table lets us find out what symbol index those bits represent.
For example, for the 3-bit input 111, we see that this is at offset 1 from the first codeword of that length, 110. The first symbol index of that length is 2, and the offset of 1 takes us to symbol index 3. Another table maps the symbol index to the symbol. As a small optimization, instead of storing the first symbol index and first codeword separately, we can store the first symbol index minus the first codeword in a table (for our code: 0, -1, and -4 for lengths 1, 2, and 3).
To determine how many bits of input to consider, we again use the sequential property of the code. In our example code, the valid 1-bit codewords are all strictly less than 1, the 2-bit codewords are strictly less than 11, and the 3-bit codewords are strictly less than 1000 (trivially true for all 3-bit values). In other words, a valid N-bit codeword must be strictly less than the first N-bit codeword plus the number of N-bit codewords.
What is even more exciting is that we can left-shift those limits so that they are all 3 bits wide. Let us call them the sentinel bits for each codeword length: 100 for length 1, 110 for length 2, and 1000 for length 3. This means we can look at three bits of input and compare against the sentinel bits to figure out how long our codeword is. Once that is done, we shift the input bits so as to only consider the right number of them, and then find the symbol index as shown above. The time complexity of this is linear in the number of codeword bits, but it is space efficient, requires only a load and comparison per step, and since shorter codewords are more frequent, it optimizes for the common case.
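A sketch of that decoding loop for the example code, with the tables worked out above hard-coded (a real decoder builds them from the codeword lengths; the names here are illustrative):

    #define EX_MAX_LEN 3  /* max codeword length in the example code */

    /* sentinel[len]: first length-len codeword plus the number of
       length-len codewords, left-shifted to EX_MAX_LEN bits. */
    static const int sentinel[EX_MAX_LEN + 1] = { 0, 0x4 /*100*/,
                                                  0x6 /*110*/,
                                                  0x8 /*1000*/ };
    /* offset_tbl[len]: first symbol index minus first codeword. */
    static const int offset_tbl[EX_MAX_LEN + 1] = { 0, 0, -1, -4 };
    /* sym_tbl: symbol index -> symbol (identity in this example). */
    static const int sym_tbl[4] = { 0, 1, 2, 3 };

    /* Decode one symbol from the next EX_MAX_LEN input bits (first
       bit in the most significant position). */
    static int decode_canonical(int bits, int *len_out)
    {
        int len;
        for (len = 1; len <= EX_MAX_LEN; len++) {
            if (bits < sentinel[len]) {
                /* Keep only the len first bits, then map the
                   codeword to its symbol index. */
                int codeword = bits >> (EX_MAX_LEN - len);
                *len_out = len;
                return sym_tbl[codeword + offset_tbl[len]];
            }
        }
        return -1; /* no codeword matched: invalid input */
    }

For instance, the input 101 fails the length-1 test (5 is not less than 100), passes the length-2 test (5 is less than 110), is shifted down to the codeword 10, and maps to symbol index 1.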
Deflate, introduced with PKZip 2.04c, is the compression method used in modern Zip files. It is also the compression method used in gzip, PNG, and many other file formats. It uses LZ77 compression and Huffman coding in a combination which will be described and implemented in this section. Although the earlier Zip compression methods are rarely seen in use today, they were still in use some time after the introduction of Deflate, since they required less memory. Those legacy methods are covered in a follow-up article. Deflate stores Huffman codewords in a least-significant-bit-first (LSB-first) bitstream, meaning that the first bit of the stream is stored in the least significant bit of the first byte.
For example, consider the bit stream 10011000 (read left-to-right): when stored LSB-first in a byte, the byte's value becomes 0b00011001 in binary, or 0x19 in hexadecimal. This might seem backwards (in a sense, it is), but one advantage is that it makes it easy to get the first N bits from a computer word: just mask off the N lowest bits. The following routines are from the bitstream module. For our Huffman decoder, we want to look at the next bits in the stream (enough bits for the longest possible codeword), and then advance the stream by the number of bits used by the decoded symbol.
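A minimal sketch of such an input bitstream; the article's actual bitstream code differs in details (it reads wider words at a time), and the names here are illustrative:

    #include <stdint.h>
    #include <stddef.h>

    struct istream {
        const uint8_t *src;     /* the compressed data */
        size_t len;             /* length in bytes */
        size_t bitpos;          /* number of bits consumed so far */
    };

    /* Peek at the next bits of input, with the first unread bit in
       the least significant position; bits past the end read as 0. */
    static uint32_t istream_bits(const struct istream *is)
    {
        size_t byte = is->bitpos / 8;
        size_t avail = byte < is->len ? is->len - byte : 0;
        size_t n = avail < 4 ? avail : 4;
        uint32_t b = 0;
        size_t i;

        for (i = 0; i < n; i++) {
            b |= (uint32_t)is->src[byte + i] << (i * 8);
        }
        return b >> (is->bitpos % 8);  /* caller masks the low bits */
    }

    /* Advance the stream once a symbol has been decoded. */
    static void istream_advance(struct istream *is, size_t n)
    {
        is->bitpos += n;
    }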
For the output bitstream, we write bits using a read-modify-write sequence. In the fast case, a bit write can be done with a word-sized read, some bit operations, and a word-sized write. We also want an efficient way of writing bytes to the stream. One could of course perform repeated 8-bit writes, but using memcpy is much faster.
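The sketch below works a byte at a time for simplicity (the fast path described above would use wider reads and writes); the structure and names are illustrative:

    #include <stdint.h>
    #include <string.h>
    #include <stddef.h>

    struct ostream {
        uint8_t *dst;       /* output buffer */
        size_t bitpos;      /* number of bits written so far */
    };

    /* Write the n least significant bits of bits, LSB-first, using
       read-modify-write on the current byte. */
    static void ostream_write(struct ostream *os, uint32_t bits, int n)
    {
        while (n > 0) {
            size_t byte = os->bitpos / 8;
            int used = (int)(os->bitpos % 8);
            int room = 8 - used;
            int k = n < room ? n : room;
            uint8_t mask = (uint8_t)(((1u << k) - 1) << used);

            os->dst[byte] = (uint8_t)((os->dst[byte] & ~mask) |
                                      ((bits & ((1u << k) - 1)) << used));
            bits >>= k;
            os->bitpos += (size_t)k;
            n -= k;
        }
    }

    /* Byte-aligned writes can skip the bit fiddling entirely. */
    static void ostream_write_bytes(struct ostream *os,
                                    const uint8_t *src, size_t n)
    {
        /* Assumes the stream is at a byte boundary. */
        memcpy(&os->dst[os->bitpos / 8], src, n);
        os->bitpos += n * 8;
    }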
Since the compression algorithm is called Deflate (to let the air out of something), the decompression process is sometimes referred to as Inflation. Studying this process first will give us an understanding of how the format works. The code is available in the first part of deflate.c. Deflate-compressed data is stored as a series of blocks. Each block starts with a 3-bit header where the first (least significant) bit is set if this is the final block of the series, and the other two bits indicate the block type. There are three block types: uncompressed (0), compressed with fixed Huffman codes (1), and compressed with "dynamic" Huffman codes (2).
The following code drives the decompression, relying on helper functions for the different block types which will be implemented further below.
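A sketch of that driver loop, assuming the istream helpers from the previous section and one (hypothetical) helper per block type:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    struct istream;  /* the input bitstream from the previous section */

    /* Assumed helpers, one per block type, implemented below. */
    bool inf_stored_block(struct istream *is, uint8_t *dst,
                          size_t cap, size_t *used);
    bool inf_fixed_block(struct istream *is, uint8_t *dst,
                         size_t cap, size_t *used);
    bool inf_dyn_block(struct istream *is, uint8_t *dst,
                       size_t cap, size_t *used);
    uint32_t istream_bits(const struct istream *is);
    void istream_advance(struct istream *is, size_t n);

    /* Read each block header and dispatch on the block type. */
    bool inflate_blocks(struct istream *is, uint8_t *dst,
                        size_t cap, size_t *used)
    {
        bool bfinal, ok;
        *used = 0;
        do {
            /* 3-bit header: 1 "final" bit, then 2 block-type bits. */
            uint32_t bits = istream_bits(is);
            uint32_t btype = (bits >> 1) & 3;
            bfinal = bits & 1;
            istream_advance(is, 3);

            switch (btype) {
            case 0:  ok = inf_stored_block(is, dst, cap, used); break;
            case 1:  ok = inf_fixed_block(is, dst, cap, used);  break;
            case 2:  ok = inf_dyn_block(is, dst, cap, used);    break;
            default: ok = false; /* block type 3 is reserved */  break;
            }
            if (!ok) {
                return false;
            }
        } while (!bfinal);
        return true;
    }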
The simplest block type is the non-compressed or "stored" block. It begins at the next 8-bit boundary of the bitstream, with a 16-bit word len indicating the length of the block, followed by another 16-bit word nlen, which is the ones' complement (all bits inverted) of len. The idea is presumably that nlen acts as a simple checksum of len: if the file is corrupted, it is likely that the values are no longer each other's complements, and the program can detect the error.
After len and nlen follows the non-compressed data. Because the block length is a 16-bit value, it is limited to 65,535 bytes.
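A sketch of reading a stored block; the byte-align and copy helpers are assumed, not the article's exact API:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    struct istream;  /* as before */
    void istream_byte_align(struct istream *is);        /* assumed */
    uint32_t istream_bits(const struct istream *is);
    void istream_advance(struct istream *is, size_t n);
    bool istream_copy_bytes(struct istream *is, uint8_t *dst, size_t n);

    static bool inf_stored_block(struct istream *is, uint8_t *dst,
                                 size_t cap, size_t *used)
    {
        uint16_t len, nlen;

        istream_byte_align(is);  /* stored data starts on a byte boundary */

        len = (uint16_t)istream_bits(is);
        istream_advance(is, 16);
        nlen = (uint16_t)istream_bits(is);
        istream_advance(is, 16);

        /* nlen acts as a checksum: it must be the complement of len. */
        if (nlen != (uint16_t)~len) {
            return false;
        }
        if (*used + len > cap ||
            !istream_copy_bytes(is, &dst[*used], len)) {
            return false;
        }
        *used += len;
        return true;
    }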
Compressed Deflate blocks use Huffman codes to represent a sequence of LZ77 literals and back references, terminated by an end-of-block marker. One Huffman code, the litlen code, is used for literals, back reference lengths, and the end-of-block marker. A second code, the dist code, is used for back reference distances. The litlen code encodes values between 0 and 285. Values 0 through 255 represent literal bytes, 256 is the end-of-block marker, and values 257 through 285 represent back reference lengths.
Back references are between 3 and 258 bytes long. The litlen value determines a base length to which zero or more extra bits from the stream are added to get the full length, according to the table below.
For example, a litlen value of 269 indicates a base length of 19 and two extra bits. Adding the next two bits from the stream yields a final length between 19 and 22. Note that litlen value 284 plus five extra bits could actually represent lengths 227 through 258, but the specification indicates that 258, the maximum back reference length, should be represented using a separate litlen value (285).
This is presumably to allow for a shorter encoding in cases where the maximum length is common. The decompressor uses a table that maps from litlen value minus 257 to base length and extra bits.
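The table's contents follow from the Deflate specification (RFC 1951); the struct layout is illustrative:

    #include <stdint.h>

    /* Base length and number of extra bits for litlen values
       257..285, indexed by litlen value minus 257. */
    static const struct {
        uint16_t base;   /* base back reference length */
        uint8_t  ebits;  /* number of extra bits */
    } litlen_tbl[29] = {
        {   3, 0 }, {   4, 0 }, {   5, 0 }, {   6, 0 },
        {   7, 0 }, {   8, 0 }, {   9, 0 }, {  10, 0 },
        {  11, 1 }, {  13, 1 }, {  15, 1 }, {  17, 1 },
        {  19, 2 }, {  23, 2 }, {  27, 2 }, {  31, 2 },
        {  35, 3 }, {  43, 3 }, {  51, 3 }, {  59, 3 },
        {  67, 4 }, {  83, 4 }, {  99, 4 }, { 115, 4 },
        { 131, 5 }, { 163, 5 }, { 195, 5 }, { 227, 5 },
        { 258, 0 }
    };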
The fixed litlen Huffman code is a canonical code using the following codeword lengths (286 and 287 are not valid litlen values, but they participate in the code construction): values 0 through 143 use 8 bits, 144 through 255 use 9 bits, 256 through 279 use 7 bits, and 280 through 287 use 8 bits. Back reference distances, ranging from 1 to 32,768, are encoded using a scheme similar to the one for lengths. The dist Huffman code encodes values between 0 and 29, each corresponding to a base distance to which a number of extra bits are added to get the final distance. The fixed dist code is a canonical Huffman code where all codewords are 5 bits long.
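Setting up those fixed codeword lengths (per RFC 1951) is straightforward; building the actual codes from them uses the canonical-code construction shown earlier:

    #include <stdint.h>
    #include <string.h>

    /* Fill in the codeword lengths for the fixed litlen and dist
       codes. */
    static void fixed_code_lengths(uint8_t litlen_lens[288],
                                   uint8_t dist_lens[32])
    {
        int i;
        for (i = 0;   i <= 143; i++) litlen_lens[i] = 8;
        for (i = 144; i <= 255; i++) litlen_lens[i] = 9;
        for (i = 256; i <= 279; i++) litlen_lens[i] = 7;
        for (i = 280; i <= 287; i++) litlen_lens[i] = 8;
        memset(dist_lens, 5, 32);  /* all dist codewords are 5 bits */
    }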
Note that, as an optimization, when there is enough room in the output buffer, we output back references using the routine below, which copies 64 bits at a time. In fact, short back references will now all be handled by a single iteration, which is great for branch prediction.
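A sketch of such a copy routine; it assumes the output buffer has enough slack that overshooting the copy by up to 7 bytes is safe, and falls back to byte copies when the source and destination overlap within 8 bytes:

    #include <stdint.h>
    #include <string.h>
    #include <stddef.h>

    /* Copy a back reference of len bytes from dist bytes back in the
       output, 8 bytes (64 bits) at a time where possible. */
    static void copy_backref(uint8_t *dst, size_t pos,
                             size_t dist, size_t len)
    {
        uint8_t *d = &dst[pos];
        const uint8_t *s = &dst[pos - dist];
        size_t i;

        if (dist >= 8) {
            /* 8-byte chunks never overlap; may write past len bytes,
               which the slack in the buffer allows. */
            for (i = 0; i < len; i += 8) {
                memcpy(&d[i], &s[i], 8);
            }
        } else {
            /* Overlapping copy: bytes must be copied in order. */
            for (i = 0; i < len; i++) {
                d[i] = s[i];
            }
        }
    }

With this, any back reference of up to 8 bytes is handled by a single loop iteration.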
Deflate blocks using dynamic Huffman codes work similarly to the blocks described above, but instead of using pre-determined Huffman codes for the litlen and dist codes, they use codes that are stored in the Deflate stream itself, at the start of the block.
The name is perhaps unfortunate, since dynamic Huffman codes can also refer to codes that change during the coding process, sometimes called adaptive Huffman coding. The codes described here have nothing to do with that; they are only dynamic in the sense that different blocks can use different codes.
The litlen and dist codes for a dynamic Deflate block are stored as a series of codeword lengths. Those codeword lengths are themselves encoded using a third Huffman code, which we will call the codelen code.
Did I mention it was intricate? At the beginning of the dynamic block are 14 bits that define the number of litlen, dist, and codelen codeword lengths that should be read from the block: 5 bits for the number of litlen lengths (257 is added to the stored value), 5 bits for the number of dist lengths (1 is added), and 4 bits for the number of codelen lengths (4 is added). After those bits follow the codeword lengths for the codelen code, each stored in 3 bits.
The lengths are stored in a special order (16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15) to increase the chance that the latter lengths will all be zero and do not have to be stored in the block. With the codelen decoder set up, we can proceed to read the litlen and dist codeword lengths from the stream. Lengths 16, 17, and 18 are not real lengths; they indicate that the previous length should be repeated 3 to 6 times (using 2 extra bits), that a zero length should be repeated 3 to 10 times (3 extra bits), or that a zero length should be repeated 11 to 138 times (7 extra bits), respectively.
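A sketch of both steps, assuming a decode_codelen_sym helper (a Huffman decoder for the codelen code) and a read_bits helper; the order table matches RFC 1951, the rest is illustrative:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    struct istream;                                  /* as before */
    int decode_codelen_sym(struct istream *is);      /* assumed */
    uint32_t read_bits(struct istream *is, int n);   /* assumed */

    /* The order in which the codelen codeword lengths are stored;
       lengths more likely to be zero come last. */
    const int codelen_order[19] = {
        16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
    };

    /* Read the 3-bit codelen codeword lengths; lengths not present
       in the block are zero. */
    static void read_codelen_lens(struct istream *is, uint8_t lens[19],
                                  size_t num_codelen_lens)
    {
        size_t i;
        for (i = 0; i < 19; i++) lens[i] = 0;
        for (i = 0; i < num_codelen_lens; i++) {
            lens[codelen_order[i]] = (uint8_t)read_bits(is, 3);
        }
    }

    /* Read num_lens litlen plus dist codeword lengths in one pass,
       expanding the run-length symbols 16, 17 and 18. */
    static bool read_code_lengths(struct istream *is, uint8_t *lens,
                                  size_t num_lens)
    {
        size_t i = 0;
        while (i < num_lens) {
            int sym = decode_codelen_sym(is);
            if (sym >= 0 && sym <= 15) {
                lens[i++] = (uint8_t)sym;  /* an actual length */
            } else if (sym == 16 && i > 0) {
                uint8_t prev = lens[i - 1];
                int n = 3 + (int)read_bits(is, 2); /* repeat prev 3-6x */
                while (n-- > 0 && i < num_lens) lens[i++] = prev;
            } else if (sym == 17) {
                int n = 3 + (int)read_bits(is, 3); /* zeros, 3-10x */
                while (n-- > 0 && i < num_lens) lens[i++] = 0;
            } else if (sym == 18) {
                int n = 11 + (int)read_bits(is, 7); /* zeros, 11-138x */
                while (n-- > 0 && i < num_lens) lens[i++] = 0;
            } else {
                return false;  /* invalid codelen symbol */
            }
        }
        return true;
    }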
The litlen and dist lengths cannot be read separately, because code length runs can carry over from the last litlen lengths to the first dist lengths. With the codeword lengths ready for use, we can set up the Huffman decoders and return to the task of decoding literals and back references.

From the sections above, we have all the tools needed for Deflate compression: Lempel-Ziv, Huffman coding, bitstreams, and the description of the three Deflate block types.
This section puts the pieces together to finally perform Deflate compression. Lempel-Ziv compression parses the source data into a sequence of back references and literals. This sequence needs to be divided and encoded into Deflate blocks as described in the previous section. Choosing how to do this division is sometimes referred to as block splitting. On the one hand, each new block carries some overhead which varies depending on block type and contents, so fewer blocks means less overhead.
On the other hand, the overhead from starting a new block might be worth it, for example if the characteristics of the data lead to a more efficient Huffman encoding in the new block and smaller output overall. Block splitting is a difficult optimization problem. Some compressors, such as Zopfli, try harder than others, but most just use a greedy approach: output a block once a certain size has been reached. To be able to freely choose any of the three block types for a block, we limit the block size so it fits in a stored block: at most 65,535 bytes.
We use a structure to keep track of the output bitstream and the contents of the current block during deflation. The interesting part is of course writing the blocks. Writing an uncompressed block is straightforward. To write a static Huffman block, we first generate canonical Huffman codes based on the fixed codeword lengths for the litlen and dist codes. Then we iterate through the block, writing the symbols using those codes.
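For illustration, here is a sketch of the uncompressed case, reusing the ostream helpers sketched earlier plus an assumed byte-align helper:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    struct ostream;  /* the output bitstream from earlier */
    void ostream_write(struct ostream *os, uint32_t bits, int n);
    void ostream_byte_align(struct ostream *os);          /* assumed */
    void ostream_write_bytes(struct ostream *os,
                             const uint8_t *src, size_t n);

    /* Write one stored block: 3-bit header, alignment, len, its
       complement nlen, then the raw bytes. */
    static void write_stored_block(struct ostream *os,
                                   const uint8_t *src, uint16_t len,
                                   bool bfinal)
    {
        ostream_write(os, bfinal ? 1 : 0, 1);  /* final-block flag */
        ostream_write(os, 0, 2);               /* block type 0: stored */
        ostream_byte_align(os);
        ostream_write(os, len, 16);
        ostream_write(os, (uint16_t)~len, 16);
        ostream_write_bytes(os, src, len);
    }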
Dynamic Huffman blocks are of course the trickiest to write, since they include the intricate encoding of the litlen and dist codes. We use a struct to represent their encoding. First, we drop trailing zero litlen and dist codeword lengths, and copy them into a common array for encoding. We cannot drop all trailing zeros: it is not possible to encode a Deflate block with fewer than one dist code.
It is also not possible to have fewer than 257 litlen codes, but since there is always an end-of-block marker, there will always be a non-zero codeword length for symbol 256. Once the code lengths are in a single array, we perform the encoding, using special symbols for runs of identical code lengths. The symbols used in the encoding will in turn get written using a Huffman code, the "codelen code". The codeword lengths of the codelen code are written to the block in a certain order, with lengths more likely to be zero coming last.
A function is used to count how many of the lengths need to be written. Assuming we have the litlen and dist codes set up, the encoding of their codeword lengths, and the code for that encoding, we can write the dynamic Huffman block.
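A sketch of writing the block's code description, mirroring the decompression side; the helper names are illustrative and codelen_order is the order table from the decompression sketch:

    #include <stdint.h>
    #include <stddef.h>

    struct ostream;                                       /* as before */
    void ostream_write(struct ostream *os, uint32_t bits, int n);
    extern const int codelen_order[19];  /* from the earlier sketch */

    /* Write the dynamic block's code description: the three counts,
       then the 3-bit codelen codeword lengths in the special order.
       num_litlen_lens >= 257 and num_dist_lens >= 1 as required. */
    static void write_dyn_code_desc(struct ostream *os,
                                    size_t num_litlen_lens,
                                    size_t num_dist_lens,
                                    size_t num_codelen_lens,
                                    const uint8_t codelen_lens[19])
    {
        size_t i;

        ostream_write(os, (uint32_t)(num_litlen_lens - 257), 5);
        ostream_write(os, (uint32_t)(num_dist_lens - 1), 5);
        ostream_write(os, (uint32_t)(num_codelen_lens - 4), 4);

        /* Trailing zero lengths (in this order) need not be written. */
        for (i = 0; i < num_codelen_lens; i++) {
            ostream_write(os, codelen_lens[codelen_order[i]], 3);
        }
    }

After this header, the encoded litlen and dist codeword lengths are written using the codelen code, followed by the block's literals and back references, which completes the dynamic block.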