From: Leandro Lucarella Date: Mon, 14 Jun 2004 03:43:32 +0000 (+0000) Subject: Se agrega documentación y una implementación de prueba de PPMC. X-Git-Tag: svn_import~170 X-Git-Url: https://git.llucax.com/z.facultad/75.06/jacu.git/commitdiff_plain/0f7b83372a77836665757c3b4037f63204e349f0?hp=36b1973b672029dbb51c14fdd3e99d485f326e0a Se agrega documentación y una implementación de prueba de PPMC. --- diff --git a/doc/ppmc/ac_ppmc.html b/doc/ppmc/ac_ppmc.html new file mode 100644 index 0000000..6e154a3 --- /dev/null +++ b/doc/ppmc/ac_ppmc.html @@ -0,0 +1,3134 @@ + + + + + + + + + Implementing ppmc with hash tables by Arturo Campos + + + +
"Implementing ppmc with hash tables" +
by +
Arturo San Emeterio Campos
+ +


+
+
+
+
+
+
+

Disclaimer +
Copyright (c) Arturo San Emeterio Campos 2000. All rights reserved. +Permission is granted to make verbatim copies of this document for private +use only. +
  +

Download +
Download the whole article zipped. It +includes the C source code as well as compiled binaries for Dos. +
  +

Table of contents +

  • +Introduction
  • + +
  • +Ppmc, history review and data structures used
  • + +
  • +Some terminology and a brief look to a model
  • + +
  • +Cumulative probabilities and how to +manage them
  • + +
  • +Escape method C, how to compute and use it
  • + +
  • +Linked lists for the counts of the bytes
  • + +
  • +Coding and decoding with +linked lists
  • + +
  • +Hash tables for contexts
  • + +
  • +Global variables
  • + +
  • +Functions nomenclature and use
  • + +
  • +The main part
  • + +
  • +Order-(-1)
  • + +
  • +Order-0 and order-1
  • + +
  • +Order-2
  • + +
  • +Order-3 and order-4
  • + +
  • +Memory requirements and management
  • + +
  • +How to implement the fast order-3 hashed +version
  • + +
  • +Adding full exclusion to the already done functions
  • + +
  • +Compression and speed results of the different +implementations
  • + +
  • +References
  • + +
  • +Closing words
  • + +
  • +Contacting the author
  • + +
      +

      +
      +
      +
      +
      +
      +
      +
      +

    Introduction +

    This article is about ppmc, a lossless compression +algorithm which achieves compression higher than some commercial products. +This articles shows in deep how to implement ppmc using hash tables and +linked lists. Different versions of it are explained, ppmc with lazy exclusions, +ppmc with full exclusions (slower than the previous but achieving more compression) +and a tricky version of ppmc, ppmc order-3 +hashed, the only ppm implementation which is practical in the compression/speed/memory +tradeoff. Ppmc can be used as an standalone compressor or with other algorithms +like lzp for coding of the literals [14], +due to his high compression level. You are encouraged to see the +results +in speed and compression to see if this is the algorithm that you need. +Understanding ppmc is the key to understand ppmz, the state of art in text +compression [3]. The explanations follow the code +that is released along with the article. This article presupposes a theory +which is explained in my article about finite context modeling +[6]. +

    The article is distributed in the following way: +Some historical review is done. After that, some terminology +is explained. First I talk about +cumulative +probabilities, linked lists, the probability +of the escape code, and hash +tables for contexts. These sections describe all the algorithms and +code used for managing the model and encoder which are later needed. Then +we see how to put everything together. Some global +variables and data structures are discussed, the initialization and +some other code is explained too. The next section explains more about +the code and the functions for +coding, decoding and updating. Then the algorithm is commented from the +lowest +order till order-4, once there it's only a matter +of implementing the code explained before. Due to the fact that the data +structures and functions from one context don't depend on others, the idea +is to have order-(-1) working before starting implementing order-0, and +so on. This makes programming very easy, and it helps finding bugs. My +implementation uses bytes as symbols as its goal is to compress files. +At first all the functions will be explained for lazy exclusion. Then there's +a section about how much memory +needs every data structure and every order, and how it's managed. After +this there's a section about full exclusion +which shows how to make ppmc with full exclusions, from the code already +done. Also I talk about a fast order-3 hashed +ppmc implementation, which is as fast as order-2 but achieving more compression +(not as much as real order-3 of course). At the end of the article you +may find the +results of different +ppmc implementations, in both compression and speed. The algorithm is explained +in the way I found easier to understand, the basic order-0 is explained, +then the concept "context" is used order-1. We make order-2 from order-1 +using linked lists for probabilities, and a hash table for contexts. Based +on order-2 we make both order-3 and order-4, which are a natural extension, +using a hash table and linked lists for storing the contexts. Every concept +is based on the previous, and thus I don't add difficulties to the already +complex process of learning. +
      +

    Ppmc, history review and data structures +used +

    Prediction by Partial Matching is the Markovian +model which achieves higher compression. Markovian models are those who +have the Markov property: the probability of the current symbol only depends +on the symbols which appeared before. Due to the increasing power and memory +available in today's machines some ppm implementations are practical. Some +others are not like the actual state of art in lossless text compression: +ppmz2 by Charles Bloom. Other states of art are CTW [13] +and the switching schemes of Paul Volf (though nowadays they are still +hardly know). The main programming difficulty of ppm models is the data +structure that holds all the contexts to be used. The first data structure +used was the context trie in [3] or [4]. +The context trie uses a linked list for every order. In every node there's +a byte and its count, and pointer to the next element in a linked list +containing the rest of the bytes under that context. For traversing the +structure there's a pointer to the next higher order, obviously at the +root of the tree you can find order-0. Optionally (for higher speed) there's +another pointer to the next lower order (used when escaping). This data +structure is used for bounded (limited) orders of a length of 5 or so. +But for unbounded orders (that is, of non previously chosen length), which +are very long, this is not satisfactory due to the high memory usage. +

    The patricia tree and, mainly, the suffix tree are +a solution to this problem. The suffix tree is used in ppmd [5]. +However suffix trees are more difficult to implement than other solutions. +

    In [8] hash tables were used, +I've not checked this paper though. Thus I'll explain how to implement +ppmc with hash tables without following it. The ideas exposed in this article +are based on the implementation of ppmz by Malcolm Taylor. Hash tables are +easy to learn, so they fit quite well this article. Moreover they are +the appropriate structure structure for ppmz, and another of the goals of +this article is to learn the base of ppmz. Just as a little overview of +ppmc with hash tables, every context has its own hash table and linked +lists where the bytes (or symbols) and counts are stored, then it's only +a matter to read the count of a symbol, compute probabilities and code +it. All the theory behind ppm was explained in "Finite context modeling" +[6] +so it's recommended to read it before this one. Along with this article, +I've released my implementation of ppmc in C. I'll not try to focus much +on it, because this would cause problems to the reader interested in implementing +it in other language, anyway I'll follow it. +

    Ppm was invented by John Cleary and Ian Witten back +in 1984. The original paper [9], talked about ppma +and ppmb. The suffix letter ('a' or 'b') denotes the escape code used in +the algorithm, however the rest of the algorithm is roughly the same. In +the book "Text compression" +[2] all of them were explained. +Ppmc was presented by Alistair Moffat in [7]. In the +paper by Moffat also ppmc' was presented, which is ppmc but using lazy +exclusion. Other variants such as ppmd,  ppm* or ppmz fall outside +the scope of this article, however I'll probably talk about them too in +a near future. The difference between ppmc and ppma or ppmb are only the +probability computation for the escape code. +

    As we saw in [6] ppm variants +are finite context models. Practical implementations don't use blending. +They use escape codes instead. They fallback to lower orders when needed. +That is, they start at the highest order initialized, if the byte is present +they code it under the current context, otherwise they emit an escape code, +fallback to (use) a lower context and try to code the byte under that context. +The last context (order-(-1)) contains all the bytes, so we'll surely find +it there if it wasn't present in higher orders. We code the probability +with arithmetic coding, or any variant (like in the case of the source, +the range coder +[11]). Once the byte is coded, it's +only a matter of updating the probability of the symbol which we have just +coded. As we are going to see later, ppmc uses update exclusions, which +does not only make it faster but also achieves slightly better compression. +How to implement update exclusions is explained in the section Global +variables. +

    If we are using escape codes, when we are in a given +context we exclude lower orders from probability computing, because we +just take in account the probabilities of current one,  that's ppmc +using lazy exclusions. If we exclude symbols which have appeared in higher +orders from the probability computation that's called full exclusion. +

    Once we have the probabilities, we code them optimally +(respect to the entropy) with arithmetic coding, which is explained in +[10] and [2]. The range coder, +a variant of it is explained in [11] and the original +source code can be found in [12]. +
      +

    Some terminology and a +brief look to a model +

    This is only a brief look to some terms which may +cause problems, for more theory you are encouraged to read [6] +Context are the symbols or symbol which come before or after current one +(in ppmc only before). Order-n refers to n bytes of context. In a given +message a symbol can appear k times, this is its frequency. From there +the model computes its probability, which in ppmc comes in the form of +probability/total_probability, like 1/4. Ppmc rescales the frequencies, +so they are called counts instead. If we have the message "bacabac" the +ppmc model would look like the following if we don't take the escape code +in account (bytes in order of appearance). +
      + + + + + + + + + + + + +
    Order-3: +
    Context: "bac" list: 'a'-1 +
    Context: "aca" list: 'b'-1 +
    Context: "cab" list: 'a'-1 +
    Context: "aba" list: 'c'-1
    Order-2: +
    Context: "ba" list: 'c'-2 +
    Context: "ac" list: 'a'-1 +
    Context: "ca" list: 'b'-1 +
    Context: "ab" list: 'a'-1
    Order-1: +
    Context: "b" list: 'a'-2 +
    Context: "a" list: 'c'-2  'b'-1 +
    Context: "c" list: 'a'-1
    Order-0: +
    Context: "" list: 'b'-2  'a'-3  'c'-2
    + +

    Under order-0 the probabilities are: 'a' 3/7, 'b' 2/7 and 'c' 2/7. For +more examples, see [6]. +
      +

    Cumulative probabilities +and how to manage them +

    Arithmetic coding optimally encodes a symbol given +its probability (optimal compared with the entropy, which has been proven +to be the minimum, read [1]). In arithmetic coding +we distribute the symbols in a probability line (between 0 and 1), then +to encode a symbol we need to know its cumulative probability and its probability. +The cumulative probability of the first symbol in the probability line +is 0. For any other symbol its cumulative probability is the sum of all +the probabilities of the symbols before it. We do so to assign ranges of +this probability line to every symbol, and that thus we can latter know +which is the encoder symbol, knowing only its cumulative probability. +
      + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Symbol: abcde
    Count: 23012
    Probability:2/83/80/81/82/8
    Cumulative probability: 0 (0.0)2 (0.25)5 (0.625)5 (0.625)6 ( 0,75)
    Range:[0 , 0.25) [0.25 , 0.625) [0.625 , 0.625) [0.625 , 0.75) [0.75 , 1) 
    + +

    In the example above we have in first file the count +of the byte, in some cases that could be the frequency. But as we'll see +later ppmc uses renormalization, and thus it's the count instead. The second +has the probability of the byte, note that 8 is the total count (the sum +of all the counts). Then the cumulative probability is computed for every +symbol as explained above. The last file has the theorical range assigned +for this symbol. The symbol 'c' has an invalid range, however it's supposed +that we'll never try to code it (in a real implementation of ppmc instead +we would have coded an escape code). When implementing ppmc you don't need +to compute the range for the symbols. Arithmetic coding only needs the +count of the symbol, its cumulative probability and the total count (the +sum of all the counts). +

    Though in my implementation and some other you can +find other terms like probability referring to the count, the terms used +above are the correct ones. Probability is used to refer to count just +because count/total_count is the probability. The total count is also called +maximum cumulative probability. Expect different names and misuses not +only in my article but also in others, but don't forget the correct names. +

    In the implementation we'll keep tables (or linked +lists for orders higher than 1) with the count of every byte. Based on +this we have to compute the cumulative probabilities and maximum cumulative +probability. We don't need all the cumulative probabilities, only the cumulative +probability for the symbol that we want to code (the symbol that we are +modeling now). From the explanation of how to compute the cumulative probabilities +we can derive the following pseudo code: +

    +

    This is the code used for order-0 and order-1, where +a byte's probability can be found in counts_table[byte]. The complexity +of this code is O(K) where K is the position of the byte in the table. +It's worth mentioning that this is the greatest overhead of simple a arithmetic +coding implementation, and also an important part (in time execution) of +ppmc. Once the code above is executed and assuming that we have the maximum +cumulative probability (as it will be seen later, that's stored in every +context) we can code the symbol. In this example counts_table[] stores +the counts of the symbols under the current order (which is assumed for +simplicity to be 0). +

    +

    With this information the arithmetic coder can compute +its range: +
    [ symbol_cumulative_probability / maximum_cumulative_probability , +
    ( symbol_cumulative_probability + symbol_probability ) / maximum_cumulative_probability +) +
    And encode the symbol. This is not the point of this article, however +note that arithmetic coding compresses best symbols with higher count because +it can use all the range to represent this number, for example in the table +above 'b' ( [0.25 , 0.625) ) can be represented with 0.25, 0.3, 0.4, 0.5, +0.6, 0.6249999, etc. +

    When decoding you need the maximum cumulative probability +(which we have assumed to be already know), and the arithmetic decoder +will return a cumulative probability. This cumulative probability distinguishes +the symbol coded. We have to search it in our table, and then update the +arithmetic coding state. As stated before we only store the counts, so +we have to compute the cumulative probability on the fly to search our +symbol. Let's say that we have the same table as in the previous example: +
      + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Symbol: 
    Count: 23012
    Cumulative probability: 0
    + +

    We need to read till the next byte of the coded one. +For example with a cumulative probability of 4 we know that the encoded +byte was 'b' (because 4 lands in its range) however to know so we have +to read till the 'c', whose cumulative probability is 5, so we can be sure +that the decoded byte is the right one, the previous. We'll compare the +decoded cumulative probability with the one computed on the fly, when it's +higher we can be sure that the previous byte was the encoded one. But what +about if both have the same value? in this case we can't be sure. Let's +say we have encoded a 'd', the cumulative probability is 5, but when decoding +if we stop comparing when they are the same, we would decode a 'c' which +is wrong. If we however stop only when the computed one is greater, we +would stop in the 'e' and thus correctly decode a 'd' (the previous symbol). +We have to do so because we have to care about both the lower and higher +bound. Having all the previous in mind we do the following decoding routine: +

    +

    In this code we compute the cumulative probability (on_the_fly_cumulative_probability) +only for the current symbol in each step, because we don't need the cumulative +probability of symbols which are in positions after the byte to decode. +Note that this code is the same as for any simple order-0 arithmetic coding +implementation. And that they use a table where the entry of a byte is +the value of the byte itself (the count of 'byte' is in count_table[byte]). +Due to this at the end of this routine we decide that the decoded byte +is counter-1, because counter is the current one and counter-1 is the previous, +note that the symbols are in order. In linked lists this will be different, +but we'll see this later. We already know the decoded byte thus we know +its probability and cumulative probability, so we can update the state +of the arithmetic coder. +

    To save some space in higher order tables we may +want to reduce the size of the count. For lower orders we can use double words +(32 bits) but for higher orders is better to use only a byte (8 bits). +Every time the count is incremented we have to check that it isn't the +maximum, if it is we have to halve all the counts. We divide all of them +by a factor of 2. This is called renormalization or rescaling. In the case +of linked lists we don't want that the count of a byte is 0, so when this +happens, we change its value to 1. Renormalization does not only save space +but also increases compression a little bit. That is because the probabilities +of text change within a file, thus giving more importance to newer frequencies +improve compression, because it reflects the changes in the source (the +probabilities of the file). +
      +

    Escape method C, how to +compute and use it +

    The escape method C was already discussed at [6] +and can be also found in [2]. I'll recall the example +table, where a symbol ('c') had a count of 0. In ppmc we'll suppose that +all the orders are initially empty (unless the order-(-1)), when we want +to code in any order that is totally empty we just don't code anything. +But what happens if there's only one symbol or just a few? how can we transmit +to the decoder that the byte that we want to finally decode is not present +in this table? For doing this we use the escape code. Both compressor and +decompressor assign the same probability to the escape code. There are +different methods for estimating the escape probability. Ppmc uses the +escape method called 'C' (and that's why it's called ppmc) Method C assigns +a count to the escape code equal to the number of different symbols that +have appeared under this context. For practical issues we assume that +the escape code is the last one. We also have to take care of its probability +when coding. Let's see the table that we are using for examples, how it +is without using escape code and with it: +

    + + + + + + + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Symbol: a
    Count: 23012
    Cumulative probability: 0
    + +
    Without escape code (total_cump=8)
    +
      + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Symbol: ceEsc
    Count: 230124
    Cumulative probability: 08
    + +
    With escape code (total_cump=12)
    +
    +

    As I have said before, in every context we'll store +the maximum cumulative probability. Also we'll store the number of different +bytes (which in my implementation is called defined_bytes). Due to this +I call the maximum cumulative probability  "max_cump", but because +we also have to take in account the code space of  the escape code, +I have another variable called "total_cump", which is the maximum cumulative +probability plus the count of the escape code (which for method C is the +number of different bytes), this is the value which must be using when +calling the decoding functions. +

    But how does the escape code affect to coding and +decoding? In encoding before trying to make the cumulative probability +of the current byte and coding it, we have to check that it exists or that +it has a non zero probability. If everything is ok then we can code it, +otherwise the context is present but the byte is not, so we have to code +an escape code. We know the probability (which comes from the count) of +the escape code, it's simply the number of defined bytes in that context. +But what about its cumulative probability? Due to how the cumulative probabilities +are computed and also due to the fact that we assume the escape code to +be the last symbol, the cumulative probability of the escape code is the +maximum cumulative probability. +
    "max_cump" is the total count before adding it the escape code's count. +If you have a look at the table above, you'll see that max_cump is 8, and +that the escape code's cumulative probability is also 8. Taking all those +facts in account we use the following pseudo code: +

    +

    In the case of decoding we have to check if the coded +symbol is an escape code, we do so based on its cumulative probability +(which is stored) otherwise decode as usual: +

    + +


    Linked lists for the counts +of the bytes +

    All the algorithms used above work only if we have +all the symbols in order and in a table where the entry of 'byte' is in +count_table[byte]. This is a good idea for order-0 and order-1 where the +tables may take up to 1,024 and 262,144 bytes respectively (in case we +use double words (32 bits) for the count). But this is not acceptable for +order-2 which would need 67,108,864 bytes to hold all the tables (256*256*256*4). +For higher orders the amount of memory is not available nowadays. Moreover +due to the fact that there are less high context than lower ones, most +of the entries are not used. The solution to both problems is using a linked +list for storing the counts. A linked list is made of different elements, +every element has its own value and a pointer to the next element. We store +on it the count, and also the byte. In the case of tables (which in fact +were hash tables with direct addressing) we knew the value of a byte by +its entry, now we don't, so we have to store the value of the byte too. +The last element in the linked list, has a pointer with a value of 0, meaning +that it's the last one. Using linked list we can dynamically allocate memory +only for the entries that we are going to use. Let's have a look at our +table of example, which now is a linked lists: +
      + + + + + + + + +
    + + + + + + + + + + +
    + + + + + + + + + + + + + + + + +
    Byte Count Pointer 
    'a'2to 'b'
    +
    + + + + + + + + + + + + + + + + +
    Byte Count Pointer 
    'b'3to 'd'
    +
    + + + + + + + + + + + + + + + + +
    Byte Count Pointer 
    'd'1to 'e'
    +
    + + + + + + + + + + + + + + + + +
    Byte CountPointer 
    'e'20
    +
    + +
    Linked list form
    +
    + + + + + + + +

    As you can see in linked list form, there's no entry +for 'c' because it has not appeared, in our example the linked list is +sorted, but in the implementation you'll find them in order of appearance, +the first that appeared will be the first in the linked list. +

    In my implementation I kept a buffer +where new nodes were allocated. Once this buffer was full, another one +was allocated. The pointers to all the allocated buffers were stored in +one array. When compression or decompression is done, all the allocated +buffers are freed.

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Symbol: 
    Count: 23012
    + +
    Table form
    +
    + +

    The data structure of a linked list in C is the following: +
    struct _byte_and_freq +
    { +
    unsigned char byte;    //The value of the byte +
    unsigned char freq;    //The count of the byte +
    struct _byte_and_freq *next;    //Pointer to the next +node in the linked list +
    } +

    In a linked list new nodes are stored at the end. +For example if we had to put 'c' in the list, we should read the linked +list till the end, once at the end, allocate memory for this new buffer. +Then put in the node of 'e' a pointer to this new location. The new element +should have the following values: byte 'c', count 1, pointer 0. Its pointer +is 0 because now it is the last element in the linked list. In our implementation +we don't need to delete nodes of the linked list, so we make it even easier. +
      +

    Coding +and decoding with linked lists +

    For coding we need to know if a given byte is present +in the linked list, if it is we can code it. To check if it's present, +we have to read the whole linked list till we find it or reach the end. +When implementing this we'll already have a pointer to the first element, +so we we'll assume so. Checking all the elements is only a matter of reading +current one, if it's not the one we are looking for, get the pointer to +the next, and so on, till we hit the end of the linked list. The pseudo +code is like the following: +

    +

    For +the readers who don't know the C language, pointer_ll->byte, means accessing +the value called 'byte' in the data structure addressed by the pointer +'pointer_ll'. We also need to know the cumulative probability of the byte, +to do so we need the code explained before. We have to read all the elements +till the byte. Both the code needed for searching a symbol and checking +if it's present and the code for making the cumulative probability do the +same. So why not merging them? In the same loop we have to check if the +byte is present or not and compute its cumulative probability, for the +case that it is present. Later we'll have to update the count of this byte, +and because we want to avoid having to read the linked list again, we store +the pointer in a global variable. But if the byte is not present when we +update we'll have to add a new pointer to the end of the linked list, for +that case we store a pointer to the last element. +

    + + + + + + + +


    Updating +the count of a byte with tables was done just by accessing its entry. For +linked list we don't know its position beforehand, so we have to search +it in the linked list, when found, update its count. But in fact we know +its position: when we code the symbol we need to read the linked list, +so if we keep a pointer to it, we'll not need to read the whole linked +list again. As I've explained before, the linked list nodes are allocated +in a big buffer, to avoid having to call the functions of memory allocation +for every new node. When updating we have to check whatever the encoder +emitted an escape code or a byte. One could think that we could check if +the pointer of the node (o2_ll_node) is 0, in that case this is the last +element and thus the end of the linked list was reached. But this is false. +Imagine that we found the byte but it was in the last position, in that +case the pointer is also 0. The solution is to check if the byte is the +same, that unambiguously distinguishes between the two cases. In all the +examples before we assumed that max_cump and defined_bytes were stored +in the context, now we have to update them aswell. As long as we have coded +the byte (or an escape) with the code above, the following pseudo code +would update the model. +

    +Let's say +we have again our example, in linked list form, and not in order (as in +a real implementation it would). + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + + + +
    Byte Count Pointer 
    'e'2to 'b'
    +
      + + + + + + + + + + + + + + + + +
    Byte Count Pointer 
    'b'3to 'a'
    +
      + + + + + + + + + + + + + + + + +
    Byte Count Pointer 
    'a'2to 'd'
    +
      + + + + + + + + + + + + + + + + +
    Byte Count Pointer 
    'd'10
    +
    +Such a linked list would produce a table like the following: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Symbol: a
    Count:2321
    Cumulative probability: 0
    +

    However as I've explained before we don't need to actually +build this table, only as much as needed. Linked lists are a little bit +more difficult to manage than tables for decoding, mainly in the search +(when we have to get the previous symbol). Also another problem is in updating:  +we can't ensure that the pointer (in case of an escape code) points to +the last element, because the decoding routine doesn't need to read the +linked list to identify an escape code. The latter problem has no other +solution that reading the whole linked list when updating and an escape +code was emitted. However the second has two solutions. Remember that we +have to read till the next byte of the one to decode? at first glance one +could think that like we did with the tables we have to read till the next +symbol. But then how do we get the previous? in that case we should have +a variable which always has the last byte in the linked list. Fortunately +there's a better solution. Why do why need to read the next element? because +we need to know the upper bound of the cumulative probability. For tables +the fastest (and thus it was used) way is reading the next element's cumulative +probability. However for linked lists its better to sum to the current +cumulative probability the count of the byte. That way we also have the +upper bound. Then we can check if we have to stop. And the node pointer +will point to the byte that we have to decode. +

    +

    To avoid problems when using linked lists for cumulative +probabilities, we don't allow any count to be 0. When renormalizing we +check if it's 0, in this case we put the value of 1. +
      +

    Hash tables for contexts +

    Hash tables are arrays used to store and find items. +A hash table which allows direct addressing has one entry for every item. +This is obviously very memory consuming if the number of items is big. +In the cases where direct addressing is not possible, we use chaining for +resolving collisions (collisions are two or more contexts sharing the same +hash entry). The item of the hash table instead is a pointer to a linked +list. This linked list contains the different contexts. The value of the +context itself is stored for checking. Also we store a pointer to the linked +list with the bytes and counts. One hash table is used for every order. +Hash tables and linked list for contexts are only used for orders above +2. +

    Let's say we have a hash table for order-1 which +has only two entries. And we have a hash function which maps the items +in the following way: 'a','b','c' entry 1. 'd', 'e', 'f' entry 2. After +having stored the contexts 'e','b','d','a','c' it would look like the following: +

    Hash table: + + + + + + + + + + + + + + + + + + +
    Hash entry Value 
    1Pointer to linked list, element 'b' 
    2Pointer to linked list, element 'e' 
    + +

    Big memory buffer for linked lists: + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + +
    Byte Pointer 
    'e'to 'd'
    +
      + + + + + + + + + + + + +
    Byte Pointer 
    'b'to 'a'
    +
      + + + + + + + + + + + + +
    Byte Pointer 
    'd'0
    +
      + + + + + + + + + + + + +
    Byte Pointer 
    'a'to 'c'
    +
      + + + + + + + + + + + + +
    Byte Pointer 
    'c'0
    +
    + +

    The structure of a context +node in a linked list is: +
    struct context +
    { +
    struct context *next;    //Next context in the linked +list (corresponding to this hash entry) +
    unsigned long order4321;    //Order-4-3-2-1 (or order-3-2-1 +for order-3), that's the value of the context +
    struct _byte_and_freq *prob;    //Pointer to linked +lists containing bytes and counts of this context +
    unsigned int max_cump;    //Maximum cumulative probability +in this context +
    unsigned int defined_bytes;    //The number of bytes +in this context +
    }; +

    Note that we could reduce the space needed by using +a byte (unsigned char) instead of a double word (unsigned int or unsigned +long, 32 bits) for the +number of bytes in the context, however then we should add 1 every time +we read this value (to make 0 represent 1 instead, and so on). I decided +to not make the code more difficult to read. Thus this is not implemented +in my ppmc implementation. See the section of +fast order-3 hashed for another example. +

    The code for managing the linked lists with contexts +is the same as the code for the counts and bytes. For contexts, the value +that makes every element different is "order4321" instead of "byte". When +we want to read the counts of a context, we have to read the pointer in +its hash entry. If this entry is empty, the context was not initialized +yet, so we fallback to the next lower order without emitting escape code. +Then use this pointer to search current context in the linked list of contexts, +if it's not present we also have to fallback without escape code. In case +we found it, we'll read the pointer to the linked list with the bytes and +counts. Because we already know the maximum cumulative probability, and +the number of defined bytes, we can code the byte as explained before. +But let's have a look at the implementation. +

    The first function called by both encoding and decoding +functions is "ppmc_get_totf_order3". Thus this is the function which is +going to read the linked list with contexts and ensure that the current +one is present. The first thing is to do the hash key. After that we need +to have the context in a variable, in that case because it's order-3, an +unsigned long is ok, if we want higher orders, say 6, we would have order-4-3-2-1 +in an unsigned long, and the rest, order-6-5 in another one. We'll use +this variable for comparing while searching for the context. +

    +

    The first thing to check is that current entry in the +hash table for this order is initialized. The table is initialized to 0, +so if current entry is 0, it's not initialized, otherwise it would have +a pointer. If it isn't we can be sure that there's no context in this entry, +so we have to abort. "ppmc_get_totf_order3" unlike functions for lower +orders returns a value. I decided to make it return a 0 in case that the +hash entry was not initialized or the context not present, and a 1 in case +it is. +

    +

    Now we get the pointer from the entry to the linked +list with contexts and store it in "cntxt_node". And now we have to read +all the linked list till we find current context, if we reach the end before +we find it, the context is not present and thus we have to return a 0, +so the encoding or decoding functions know that they can't code in this +order. +

    +

    Once we know that the context is present we have to +return the total cumulative probability. Note that this is different for +full exclusions. +

    +

    Now encoding and decoding functions would be able to +do their job. After that we have to update the count of current byte. Now +we rely on the fact that "o3_cntxt" and "full_o3_cntxt" are initialized. +While updating four different cases can happen: +

      +
    1. +Hash entry was not initialized.
    2. + +
    3. +Symbol was coded under current context.
    4. + +
    5. +An escape code was emitted.
    6. + +
    7. +Current context was not present in the linked list.
    8. +
    +

    In the first case we have +to create a new node in the linked list of context in the current hash +entry. The way we allocate memory is explained in the section about memory +management. Then we have to initialize this node, and also put pointer +to this new node in the hash table. After that we have to put a new symbol +in the linked list with probabilities, see the section about linked +lists for that matter. The code is like the following: +

    +

    If the byte was coded under current context we just +have to update its count. We can assume that "o3_ll_node" and "o3_context" +are initialized, so the code becomes as easy as that: +

    +

    Once the two first cases have been discarded, it may +happen that an escape occurred or that In the third case an escape code +has to be coded because current context was not present in the linked list +of context. Because the case of an uninitialized hash entry was already +discarded we can be sure that there's at least one context in the linked +list of current hash entry for contexts. The way to know which one of those +cases is to check the context in the pointer, if it's the same as the current +one we can be sure that an escape was coded, because in that case the context +existed, and we already discarded the case that the byte was coded under +this context. +

    +

    Otherwise we know that it's pointing to the last element +in the linked list of contexts, because when looking for current context, +it wasn't found so it reached the end of the linked list, and also updated +the variable "o3_context". But however we know that the context was present +(because there was at least 1 symbol). Thus the only thing to do is to +create a new node in the linked list for bytes and probabilities as shown +in the section of linked +lists. +

    In the fourth case the context was not present in +the linked list corresponding to this hash entry. What we do is reading +till the end of the linked list for context (though I think this is not +really needed) and then put a new node as explained above, +with the only difference that the pointer is stored in the last element +in the linked list instead in the hash entry of the current order. +
      +

    Global variables +

    In some cases (unbounded length contexts) we need +to have the whole file loaded in a buffer. However in this implementation +the maximum context that we need is order-4 (but we could very easily extent +this to order-8, at the cost of speed and memory, though). Thus we store +the contexts in another way. Because C has some restrictions, we store +the contexts in unsigned chars. One unsigned char for every byte, order-1 +is stored in o1_byte, order-2 is stored in o2_byte and so on. Thus is we +have the context "ther", o1_byte = 'r',  o2_byte = 'e', o3_byte = +'h', o4_byte = 't'. If the next byte to code is an 'e', after coding it, +the context would be: o1_byte = 'e', o2_byte = 'r',  o3_byte = 'e', +o4_byte = 'h'. In assembly language we could store the whole order-4 in +a double word (32 bits), and then just read a byte for order-1 and so on. +

    In high orders (3,4) we need to have the hash entry +and also the full context so we can compare it when finding the context +in the linked list. In the case of order-3 we have "o3_cntxt" which contains +the hash key used to index in the order-3 hash table. And "full_o3_cntxt" +which contains all the order-3. In the previous example "o3_cntxt" depends +on the hash function, and "full_o3_cntxt" = "ere". For order-4 only the +number changes, that is "o4_cntxt" instead of "o3_cntxt". +

    For making the code easier to read, the arithmetic +coding routines (range coder) are called with the same variables. They +are: total_cump with the total cumulative probability, symb_cump symbol +cumulative probability, and symb_prob the probability (count) of the symbol. +

    Ppmc uses update exclusion. That is we only update +in the order where the byte was coded and higher ones. In my implementation +I use a variable called "coded_in_order" whose value is where the byte +was coded. If the byte was coded under order 3 the value of "coded_in_order" +will be 3. If it was coded in order-(-1) its value will be 0 though. It +has a 0 value to update the right orders. The code in C used for updating +is this: +

    switch(coded_in_order) +
       { +
       case 0: ppmc_update_order0(); +
       case 1: ppmc_update_order1(); +
       case 2: ppmc_update_order2(); +
       case 3: ppmc_update_order3(); +
       case 4: ppmc_update_order4(); +
       default: break; +
       }; +

    Which means that if the value of  coded_in_order +is 0, all the functions for updating will be called. And if it's 3 then +only the functions for order-3 and order-4 will be called. +

    For memory usage I use "ppmc_out_of_memory". If it's +0 then there's still memory available. If it's 1 there's not. Every function +which needs to allocate memory aborts in the case its value is 1. Once +we have a problem allocating memory, this variable is set to 1. At the +end of the loop we check this variable, and in case it's 0 we flush the +model. This is explained in the section of memory +management. +

    Before starting to model and code the input we have +to reinitialize the model. The main job is to allocate the memory needed +for the hash tables and linked lists. Moreover tables from low orders, +like order-0 and order-1 must be cleared, that is all the counts must have +a value of 0. The hash tables of high orders must be cleared too, that +is we must put a value of 0 in the offsets stored. Apart from that we +have to care about initialising the context variables by reading from the +input, updating the variables and writing directly to the output. Once +this is done we can start with the main loop. Have a look at the source +if you have any question about these steps. +

    Now we'll see how to implement every order, don't +try doing order-0 until order-(-1) doesn't work, and so on. But before +going in that matter we'll see which functions uses every order and what +they do. +

    Note that there are some variables different for +every order which are pointers to linked list of contexts or bytes, and +which are initialized in a function, and then in another function are used. +One example is a pointer to the linked list with bytes of the order-2 which +is initialized +(to whatever it has to be) in the function of encoding, +and which is used later while updating. So it's not a good idea to modify +these kind of variables between those functions. +
      +

    Functions nomenclature +and use +

    Every order uses different functions to manage their +data structures. For increasing readability of the code they follow some +guidelines, all functions that do the same have the same name (the only +difference is in the number of the order). They do the same things, in +the sense that they are called in the same way and achieve the same goals, +though most of them use different code. +

    When decoding we need to know the maximum cumulative +probability for being able to decode the input, so in this case, and also +while encoding, the first function called is "ppmc_get_totf_order2". Which +stands for: get the total frequency, in fact frequency and probability +are different things, as I explained before, unfortunately I perpetuated +this error, but I suggest you to not do so. However the term maximum cumulative +is already used for referring to the sum of the counts of all the bytes +so not using this term for different things can help to avoid misunderstandings. +The function "ppmc_get_totf_order2" does nothing else than returning in +"total_cump" the maximum cumulative probability of the current context. +As explained in the section about the escape method this can be quickly +done by adding the number of defined bytes in the context (the count of +the escape code) and the maximum cumulative probability. Also remember +that "total_cump", "symb_cump" and "symb_prob" are used as the parameters +for arithmetic coding. (in the case of my source a range coder) +

    The function "ppmc_code_byte_order2" (the number +'2' should be changed if we want the function for another order) codes +the byte in the current symbol if it's present, in that case it returns +a 1. It may happen that under this order, the current context is not present, +in that case it doesn't code anything, and returns a 0. It could happen +too that in current order the context was present but the byte we have +to code wasn't, so an escape code must be coded and the function will return +a 0. From this you can see that the main loop of the compressor just have +to call the encoding routine for highest order, and call the next lower +one in case it returns a 0, or stop when it returns a 1. +

    Another job of the ppmc model is to update the count +of the current symbol, this is done by calling "ppmc_update_order2". This +function returns nothing, and the main loop doesn't have to check whatever +it worked or not. At least not till the updating is done. In higher orders +the decoder uses a different function for updating "ppmc_update_dec_order2",  +because some care should be taken with the linked lists. +

    When the count of a byte gets too big for the date +type used for storing it, renormalization (halving counts) must be done. +The function "ppmc_renormalize_order2" does this. It's called from the +routine of updating. +

    For decompression, "ppmc_decode_order2" is called. +It just returns "-1" if the byte was not decoded (because current context +didn't exist or it was an escape code) otherwise it returns the byte in +the variable "byte". The main loop of the decoder would check if the returned +value is "-1" in that case it would try to decode in a lower order. +
      +

    The main part +

    Every order uses its own functions and the only +work of the main part is calling them, this makes the code easier to read +and to expand. The first thing is writing to the output file the length +of the input file, that way the decoder will know when to stop. After that +we start initializing the model and coder: +

    +

    Once this is done we start the main loop, we read a +byte from the input till there are no more bytes. Then we try to code in +the highest order and if failed, in a lower one. In the worst case we'll +code the byte in order-1. After that we have to do the updating, keeping +in mind that we do update exclusion, that is we only update the same order +as the byte was coded in, and higher ones. Then we update the context variables +and check for memory problems. The code is like the following: +
      +

  • +while((byte=fgetc(file_input))!=EOF)
  • + + + + + + + + +Order-(-1) +

    The order-(-1) in the code is called "ordern1" meaning +negative one. As we saw in the article about finite context +modeling order-(-1) is the lowest order whose only job is to be able +to code all the symbols that may appear in the input. Order-1 is not used +for achieving compression, so it's never updated. Because it must give a +non-zero count to all the symbols but it's not updated, we use a flat probability +distribution. That is every symbol has the same probability. For matters +of the implementation every symbol has a 1 count. The alphabet has all +the bytes, from 0-255 in the same positions, and an special code, the 256 +which is used for flushing the model when running out of memory. Thus we +have that the probability of every symbol in the order-(-1) table is 1/257. +

    At first we could think about having a table with +the counts and cumulative probabilities of every symbol, but having a look +at those values we realize that this is not needed. +
      + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Symbol: 256
    Count: 111111
    Cumulative probability: 012n256
    + +

    We see that the cumulative probability of a given +byte is the value of the byte itself. We can take profit of this fact and +avoid using any table. So the first function to implement "ppmc_get_prob_ordern1" +is quite easy, it just has to assign a 257 for the maximum cumulative probability +(no need of escape code), 1 to the probability, and the value of the byte +to the cumulative probability. +

    The function for returning the byte of a given cumulative +probability is very easy too, the byte is the value of the cumulative probability. +

    Because this order has the only job of being able +to code all symbols, it's not update, so we don't need to care about that +matter. +
      +

    Order-0 and order-1 +

    Order-0 and order-1 are not memory hungry, so we +can store them in tables. If you have already implemented any arithmetic +coder, you surely know how does order-0 works. Given this, order-1 is only +a little modification: in order-0 we keep the probabilities in this table +"order0_array[]" and access it directly (that is the probability for "byte" +is in "order0_array[byte]"), order-1 uses instead "order1_array[o1_byte][byte]". +The only difference from an usual implementation is the escape code. Now +it's only a matter of implementing it having in mind the differences pointed +in the section about Cumulative probabilities. +
      +

    Order-2 +

    Order-2 needs more memory than order-1, the probabilities +can't be stored on tables, because it would take too much memory. Thus +the probabilities are stored in linked lists. Order-2 has 2^16 = 65536 +contexts, we can afford to put the contexts in a table. Like a hash table +with direct addressing, obviously the hash key is just the order-2, for +the context "there" the hash key would be "re", thus we would find the +context in "order2_array['re']". Due to the fact that in every entry we +only have one context, we don't need to store the context nor use linked +lists for contexts, so the data structure used is similar to the one +used for higher orders, but with these differences. +

    struct o2_context +
    { +
    struct _byte_and_freq *prob;    //Pointer to linked +lists containing bytes and counts of this context +
    unsigned int max_cump;    //Maximum cumulative probability +in this context +
    unsigned int defined_bytes;    //The number of bytes +in this context +
    }; +

    Putting all together we have that the maximum cumulative probability +of order-2 with a context like "which" is in: +

  • +order2_array['re'].max_cump.
  • + +
    Once this is clear we already know how to encode a byte, first we have +to make the hash key. +
  • +o2_cntxt = ppmc_order2_hash_key(o1_byte,o2_byte);
  • + +

    The hash key as already explained is very simple, +and the only thing it does is putting two bytes, the order-1 and order-2 +bytes, in a variable, which will be used to access the array for order-2. +
    #define ppmc_order2_hash_key(k,j) ((k)+(j<<8)) +

    Now it could happen that the context hasn't been +created yet, so we have to check it. As explained before once a byte in +a linked list in created, it's never deleted, so just by checking the number +of defined bytes we can see if the context has been already created. If +the count is 0 it hasn't, so we have to return without coding anything. +

  • +if(order2_array[o2_cntxt].defined_bytes == 0)  //it's empty
  • + +
  • +return 0;   //byte not coded, nothing done
  • + +

    Now we use the code explained in the section of +Coding and decoding +with linked lists and code an escape code or the byte depending +on if the byte wasn't present or it was. +

    The next function to implement is "ppmc_update_order2", +because we saved in "o2_ll_node" the pointer to the linked list, we don't +need to read the linked list again, and thus we can update the order as +explained in Coding +and decoding with linked lists. +

    Decoding is not much more difficult just follow the +explanations at +Coding +and decoding with linked lists, and remember to check first if context +exists. The function for updating while decoding can't rely on the fact +that "o2_ll_node" points to the last element, so it has to read all the +linked lists, till the end and put new nodes there. This happens when the +byte was not coded under order-2. Check the code if you have any question. +
      +

    Order-3 and order-4 +

    We can't use a table for contexts in order-3 because +it would be too big, it would require 16,777,216 entries and every entry +would take around 12 bytes. We can't afford that, so instead we use linked +lists for storing contexts. Read Hash tables for +contexts. +

    While encoding or decoding the first function called +is "ppmc_get_totf_order3", this one is responsible for checking if current +context is present, and also updates the variable  "o3_context" (or +"o4_context" for order-4), it contains a pointer to the entry, and thus +the functions now are as in order-2 but instead of "order2_array[o2_cntxt]" +we use "o3_context". Apart from some special care while updating, all +the functions are the same, so you only have to change that and you already +have order-3 working. Both order-3 and order-4 are managed in the same +way. So do one, and copy the code for the next one. +
      +

    Memory +requirements and management +
    The basic data structures are the following: +
      + + + + + + + + +
    struct _byte_and_freq{ +
    unsigned char byte; +
    unsigned char freq; +
    struct _byte_and_freq *next; };
    struct context{ +
    struct context *next; +
    unsigned long order4321; +
    struct _byte_and_freq *prob;  +
    unsigned int max_cump; +
    unsigned int defined_bytes; };
    struct context_o2{ +
    struct _byte_and_freq *prob; +
    unsigned int max_cump; +
    unsigned int defined_bytes; };
    + +

    Thus they take: +
      + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Structure Unaligned Aligned 
    _byte_and_freq6 bytes8 bytes
    context20 bytes20 bytes
    context_o212 bytes12 bytes
    + +

    The static memory of the program, memory which is +not allocated while the program runs takes is the showed in the table. +This is the memory that the programs just need to start. +
      + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Order-0 256 
    Order-1 66048 
    Order-2 786432
    Order-3 262144
    Order-4 262144 
    Bytes pool 1000000
    Context pool 1000000
    Total3377024 
    + +

    Order-(-1) doesn't need any array. Order-0 uses an +array with the count (8 bits) for every byte. Order-1 uses the same table +but 256 times, thus 256*256, moreover we need two arrays for every byte +(256 entries thus) for "defined_bytes" and "max_cump". Order-2 has a hash +table with direct addressing, because we'll address two bytes (order-2 +context) we need 256*256=65,536 entries. Every entry uses the data structure +context_o2, thus it takes 65536*12=786,432 bytes. Order-3 instead has a +hash table with pointers to contexts, it has 65,536 entries and is a pointer +which (in my implementation) takes 4 bytes, 65536*4=262,144. Order-4 uses +exactly the same. The pool for bytes and contexts take 2,000,000 bytes, +we'll explain now why. +

    We have to allocate memory for every new element +in the linked list. It wouldn't be a good idea to call the functions for +doing so every time. Also there's another fact that we should be aware of +while thinking about how to allocate memory, an element in the linked list +is never deleted. Due to these facts the solution that I chose for my +implementation was having a big buffer, and put there new nodes there. +Keeping track of the next position and the last one with pointers. Once +this buffer is filled we have to allocate a new one. Given this solution, +there's still two parameters to choose, the length of the buffer, and the +maximum number of buffers to allocate. I decided to give them a value of +1,000,000 and 100 respectively. We can choose to have another value, the +length of new buffers, that way you could allocate a very bug buffer at +the start (suited to your needs) and allocating less memory for the new +buffers, expecting to not need a lot. This could save some memory, but +should be tuned. Though in my implementation I have both definitions, they +have the same value. +

    Again having code clarity in mind I chose to have +two buffers, one for the linked lists containing bytes and probabilities, +and another for the contexts. The code is the same and only the nomenclature +(names assigned) for the variables is different, thus instead of "_bytes_pool_elements" +there's "_context_pool_elements". The variables used for this code are +the following ones: +

    +

    The defined values follow too. We get the length of +any buffer by multiplying it by its size, so the size of a buffer for contexts +is 50,000*20=1,000,000. I decided to make every buffer one mega long, it +seemed like a reasonable value, because it was big enough to not need to +allocate another buffer soon, and that it was not so big that it couldn't +be allocated it today's computers. +

    +

    If we allocate a new element the steps to follow are +clear: first let the routine use the pointer at "_context_pool" for the +new entry, second increment the pointer and then check that this is not +the last entry, if its we should allocate another buffer. We can do another +paranoid check, that is checking that we still have any entry free in the +table where we store the pointers to the big buffers. We could avoid it +by being sure that we have enough buffers (which probably is the case). +

    +

    After allocating memory for the new buffer, we put its +value in the table, and compute the value of the end of the buffer, obviously +it's the pointer plus the maximum number of entries (in case of assembly +you first should multiply this value by the length of every entry). Then +we increment the index tot the table with pointers to the big buffers, +which will be used when freeing memory. +

    When the compressing process is done, or when we +have to flush the encoder, we have to free all the buffers. At the start +of the program we cleared all the entries of the table with pointers to +the big buffers, so we have to check if it's non zero, in case it isn't, +then it's a pointer, and we have to free it. In C we only need to know +the pointer of the buffer to free it, in other programming languages you +might need storing additional information. +

    The +flushing routine has to ensure that everything (variables, tables and buffers) +is like if it was just initialized. First we have to free all the memory. +Then clear all the tables for all orders, that is all the probabilities +of order-0 and order-1 set to zero, and all the pointers of higher orders +to 0. Then the big buffers must be allocated again. In my implementation +I had to set the count of the current byte in order-0 to 1 too. Next we +have to emit the flush code. We have to visit each context and if there's +any probability, emit an escape code in this order. Once we reach order-(-1) +we can emit the flush code, which in my implementation is done with: +

    +

    The decoder checks the symbol decoded under order-(-1) +before writing it in the output, in case it's a flush code, it has to flush +its model, no matter if it still has memory (this could happen if the file +was compressed in a computer with more memory). In my code I do even another +check, in case that "ppmc_out_of_memory" = 1, but we didn't decode a flush +code, that means that compressor had more memory than decompressor, in +that case we just can't decompress the rest of the file correctly, because +we can't keep track of changes in the model (new nodes), so we can only +exit. To check this at the end of the loop we check if the variable "expected_flush"=1 +(it's initialized to 0) in case it's we exit. After that we check if we +run out of memory. If we do, we set "expected_flush" to 1, and thus if +next time we don't decode a flush code, it means that we have to exit, +which will be done by the check explained before. +
      +
      +

    How to implement the +fast order-3 hashed version +

    With hash tables we can do a trick which in most +of the cases is of not use, but in this one resulted in something which +works quite well. The trick is simply not resolving hash collisions. Thus +let different contexts share the same entry. We only need to implement +order-2, once we have it working we change the hash key for order-2 for +the hash key of order-3: +

    #define ppmc_order2_hash_key(k,j) ((k)+(j<<8)) +
    #define ppmc_order3_hash_size 65536 +
    #define ppmc_order3_hash_key(k,j,i) ((k)+(j<<7)+(i<<11)) +& ppmc_order4_hash_size-1 +

    If you are not used to C '&' means the AND mask, +in the case of hash keys we use it for erasing the high bits that we don't +use (because we restricted the length of the hash table). The '<<' +are left shifts. +

    Now we just have to change the main part to use order-3 +instead of order-2, we only have to use the variable "o3_byte" too. And +it's all done. Now in the same hash entry there will be different contexts +sharing the same probability list. This doesn't produce bad results, instead +it works better than order-2, though of course it works worse than real +order-3. However if we extent the same trick for order-4 the results are +bad. These are the results for book2. (time for compression in kbyps) +
      + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Order Bpb Time (c)
    22.8629 246.696
    3h2.3713237.134
    32.2849205.995
    4h2.4934146.716
    + +

    It's worth mentioning that order-3h uses almost as +much memory as order-2, because they use the same hash tables. Order-3 +can take a little bit more of memory in linked lists for bytes and probabilities. +The difference from order-2 and order-3h in compression is big while in +speed and memory usage is very little! This happens because though different +contexts are sharing the same probability list, they aren't quite different +or because there are just a few contexts in the same entry. In the other +hand in order-4 a lot of contexts share the same entry, with may be different +probabilities, so at the end we get worse compression, because the same +context (hash entry in that case) gives a probability distribution which +doesn't reflect current file. +

    The differences in speed can be explained too. They +all come from insertion and updating of nodes in the list of bytes and +their counts. 3h or 4h have to create and update more nodes, and while +coding and decoding there are more symbols in the cumulative probabilities, +so the process is slower. +

    Order-3h is a ppmc implementation well suited for +real world compression because it only takes a little bit more of speed +and memory than order-2, but achieves 0.5bpb more, which ranks it in a +good position. +

    If +you wonder how this works, let's put an example. Let's say we have a binary +message, that is, the alphabet only contains 1 and 0. And our hash key +looks like the following: +
    #define ppmc_order3_hash_key(k,j,i) ((k)+(j<<1)+(i<<1)) +& 11 (binary) +
    And we have a context like "101". If we apply it the hash key the result +is (all the values are in binary): 1+00+10=11. 11 would be the hash entry +of context "101". But this however is also the hash entry of "011" (using +hash key:) 1+10+00=11. So at the end we have that both contexts "101" and +"011" share the same entry. In their entry there's the probability list +for 0 and 1. And there we would store the probabilities of the two contexts. +In our implementation of ppmc the contexts which share the same entry have +for sure the same order-1 byte, so more or less they have the same probability +distribution, and thus it doesn't suffer much. In fact the error in prediction +in book2 results only in 0.0864 bpb. +

    If we analyse it, see that we have 2^3 = 8 possible +input values which are mapped to 2^2 = 4 hash entries, thus every entry +is share by 8/4 = context. In our hash key for ppmc we have 2^24 = 16,777,216 +contexts which are mapped (that is, we assign them a position in the hash +table) to 2^16 =  65,536 (this is the number of entries in the hash +table). Thus we have that in ppmc order-3h every entry holds 256 different +contexts. 256 different probability distributions in every list, however +the impact in compression is little because due to our hash key the contexts +which share the same entry are similar. +
    #define ppmc_order3_hash_key(k,j,i) ((k)+(j<<7)+(i<<11)) +& ppmc_order4_hash_size-1 +
    In our hash key "k" is "o1_byte" and the hash key as worst modify the +highest bit, but the rest remain untouched, so we know for sure that all +the contexts that share the same hash entry have the first seven bits of +order-1 the same. +

    Another thing which I had to check was using a classic +hash key (used for example in lzp [14]) instead of +the current one. Instead of '+' a '^' (xor)  +

            Using ^                                   Using +
    +File    Bpb     Kbyps c   Kbyps d         File    Bpb     Kbyps c   Kbyps d
    +book1   2.5531  218.5143  214.8418        book1   2.5415  236.7239  221.6721
    +book2   2.3997  237.1344  213.9184        book2   2.3859  241.8208  227.4374
    +geo     4.8769  86.9565   82.6446         geo     4.9209  90.9091   86.9565
    +obj2    3.0717  182.5980  162.8576        obj2    3.0635  191.2931  175.9338
    +progc   2.6109  182.4308  182.4308        progc   2.6010  182.4308  182.4308
    +news    2.8417  186.2531  167.2981        news    2.8194  203.2762  186.2531
    +Average 3.0590  182.3145  170.6652        Average 3.0554  191.0756  180.1139
    +

    The use of xor instead of adding resulted in worse compression +and speed for all the files except geo, which is a list of floating point +numbers. The hash key for order-3 works better with "+" for text. The test +files come from +[15]. +
      +

    Adding full exclusion to +the already done functions +

    Full exclusion is when we exclude from probability +computation symbols which appeared in orders which we already used. Obviously +if in an order there's a symbol but we instead emit an escape code, because +it wasn't the symbol to code, it will not be coded, so we shouldn't assign +it any probability. Let's say we have a message with the following probabilities: +
      + + + + + + +
    Order-3 +
    'a'-2 +
    'b'-4 +
    'c'-0 +
    escape-3 
    Order-2 +
    'a'-6 +
    'b'-7 +
    'c'-1 +
    escape-3
    + +

    And we want to code 'c'. In order-3 we don't find +it, so we emit an escape code with a probability of 3/9. Then in order-2 +we find it and emit it with a probability of 1/16 (4 bits). As you see +in order-2 we allocate probability space for 'a' and 'b', though they appeared +in order-3 and we didn't select them, therefore they are not the byte to +code. So we decide to use full exclusion instead of lazy exclusion (which +excludes lower orders) then if we exclude 'a' and 'b', the escape code +has a probability of 1 (the scheme C assigns it a value equal to the number +of defined bytes in the context, which now is only 'c') then we code 'c' +with a probability of 1/2 (1 bit). +

    For full exclusion we use a table with an entry for +every symbol (in the case bytes this means a table with 256 entries, which +is acceptable), this table is accessed like "excluded[byte]", and has a +value of 0 if it's not excluded, and a value of 1 if it is. At the start +of loop (every itineration, of course) we put all the values to 0, because +no order has been visited yet. If a context emits an escape code, we update +this table, by putting the value of the bytes in that order to 1. That +way they'll be excluded in lower orders. Then when computing probabilities, +we don't take in account bytes excluded. For making the coding easier to +read, I decided to use some new variables: "exc_total_cump", "exc_defined_bytes" +and "exc_max_cump" which are self-explanatory. In the case of order-(-1) +(ordern1 in the code) I chose not to use full exclusions, because order-(-1) +is hardly used. Also if you implement full exclusions, you have to be careful +about a problem that can arise. +

    The change over previous versions is mainly done +in the function "ppmc_get_totf_order2", which has to exclude from the probability +computation the excluded bytes, comparing its value in the "excluded" table. +

    +

    We start by clearing both maximum cumulative probability +and defined bytes, then we read till the end of the linked lists, +and check if current byte is excluded, if it isn't we can update the values. +When the end of the linked list is reached (the pointer "next" is 0) we +break the loop, and then we compute the maximum cumulative probability. +

    The function for coding would first call "ppmc_get_totf_order2", +but now it could happen that due to exclusion there was no byte defined +in the context, thus we have to check it and return a value meaning that +we didn't code the byte. +

    +

    After that we have to compute its probability, the code +used is the modification of the code +used with lazy exclusions, in that case we just have to car about excluded +bytes: +

    +

    Then it's only a matter of coding the byte. But in the +case an escape occurs we do not only have to emit it, but we also have +to update the "excluded" table. As explained before we have to read all +the linked list and then put to 1 the value of the bytes present under +that context: +

    +

    There's room +for optimization. If we code the byte under this context we read the linked +list twice, and if we don't we read it three times, which can be of course +improved. It's possible to make the symbol cumulative probability while +making the maximum cumulative probability, and thus we would avoid reading +the linked list again when coding the symbol. Even we should be able to +not read the linked list again the third time, in that case we should update +the table "excluded" while computing the maximum cumulative probability,  +of course we should update the count after having read it. I haven't tested +the idea nor the code, but with both optimizations the code would look +like the following: +

    +

    In this code we exclude all the symbols which appear +in current order, if an escape occurs that was what we had to do, if we +code a byte, it doesn't matter because then we are not going to use "excluded". +Note we can't apply this optimization to the decoding functions, because +we first need to know the maximum cumulative probability for decoding, +then we have to read the linked list for finding the symbol. +

    The function for decoding starts by doing the same +as the encoding function, that is making "exc_max_cump", and then checking +that there's at least one byte defined in the order. In case an escape +occurs we do the same as when coding. The routine for decoding the symbol +is a modification of the one +used for lazy exclusions, which doesn't take in account excluded bytes. +

    +

    I leave as an exercise to the reader adapting the functions +for order-0 and order-1, if you find any problem have a look at the source +code. +

    Warning: the updating +function assumes that if the byte was not coded under order-2 (or higher) +it didn't exist in the linked list, and that it has the pointer "o2_ll_node" +initialized to the last element in the linked list due to the search done +during the encoding process. This is however not true. It may happen that +all the characters are excluded (due to the fact that they appeared in +higher orders) thus "get_totf" would return a 0 in the variable "exc_defined_bytes". +The encoding function would check it and abort (obviously if there's no +byte in the context we can't code anything). Thus the pointer to the linked +list is not initialized. The solution to this problem is just reading till +the end of the linked list while updating. Note that this is not needed +in the highest order, because none of the symbols is excluded because it's +the higher. Therefore in my code I haven't done this change in the function +for updating of order-4. So be careful if you want to put another higher +order. Use the code for order-3 which takes this in account and has linked +list for probabilities and contexts. +

    The literature tends to say that full exclusion is +excluding the symbols which appeared in higher orders, but this is not +true for ppmz, which uses LOE (Local Order Estimation) for selecting the +best order for current symbol. I.e: let's say the maximum order is 8, but +LOE chooses to try to encode current symbol in order-5, an escape occurs, +and then LOE selects order-3, now we'll use full exclusion, but we don't +exclude all the bytes of higher orders, because we don't know them (at +encoding time we do, but while decoding we won't), it could even happen +that our byte is present there. We only exclude the orders which we have +seen which in this case is only order-5. +
      +

    Compression and speed +results of the different implementations +

    A little comparison between different orders and +implementations is shown in the graphics. O-3h is order-3 hashed, and o-4 +(exc) is ppmc order-4 with full exclusions. The compression is measured +in bpb and the speed in kbyps. + + + + + + +
    +
    Bits Per Byte +
    bpb, Bits Per Byte 
    +
    +
    Kylo Bytes Per Second +
    Kbyps, Kylo BYtes Per Second
    +
    +

    The results which follow correspond to the three different +implementations, ppmc with lazy exclusion, ppmc with full exclusion and +ppmc order-3 hashed. Note that the implementation with full exclusions +could be faster. These are results +for the Calgary Corpus, and at the end you can also see  the results +for the large Canterbury corpus +[16]. +

    Full results:
    +
    +Ppmc lazy exclusion, order-4
    +File    Bpb     Kbyps c   Kbyps d
    +book1   2.4757  136.9617  118.1796
    +book2   2.1989  150.3210  126.6680
    +bib     2.0030  139.9831  123.4259
    +geo     5.3678  45.6621   42.3729
    +news    2.8141  107.7190  94.2877
    +obj1    4.0171  95.4545   77.7778
    +obj2    2.7300  91.2990   79.8110
    +paper1  2.5560  139.8309  106.2715
    +paper2  2.5191  134.3654  113.8373
    +pic     1.3235  190.5656  146.0821
    +progc   2.5625  121.6205  105.6178
    +progl   1.8593  163.9959  131.1967
    +progp   1.8602  151.9442  128.5682
    +trans   1.6749  167.7681  139.8068
    +Average 2.5687  131.2494  109.5645
    +
    +
    +Ppmc order-3 hashed (lazy exclusion)
    +File    Bpb     Kbyps c   Kbyps d
    +book1   2.5415  236.7239  221.6721
    +book2   2.3859  241.8208  227.4374
    +bib     2.2053  229.5723  229.5723
    +geo     4.9209  90.9091   86.9565
    +news    2.8194  203.2762  186.2531
    +obj1    4.0532  190.9091  131.2500
    +obj2    3.0635  191.2931  175.9338
    +paper1  2.6161  189.7705  189.7705
    +paper2  2.5515  210.1613  215.6918
    +pic     1.3430  327.5735  294.8162
    +progc   2.6010  182.4308  182.4308
    +progl   1.9551  257.7079  267.2526
    +progp   1.9208  227.9164  227.9164
    +trans   1.9003  236.5961  236.5961
    +Average 2.6341  215.4758  205.2535
    +
    +
    +Ppmc full exclusion, order-4
    +File    Bpb     Kbyps c  Kbyps d
    +book1   2.2723  21.1874  22.5584
    +book2   1.9947  22.5509  22.9227
    +bib     1.8522  22.4630  23.2361
    +geo     4.7744  7.4906   9.0580
    +news    2.3807  20.8546  17.2488
    +obj1    3.8140  8.3004   10.6061
    +obj2    2.5281  16.7498  18.6700
    +paper1  2.3264  21.0023  21.0856
    +paper2  2.3028  18.6704  20.6977
    +pic     1.2678  20.1848  20.0075
    +progc   2.3511  20.2701  19.7708
    +progl   1.7235  25.3187  23.4280
    +progp   1.7267  22.1865  20.3002
    +trans   1.5737  23.3601  21.8138
    +Average 2.3492  19.3278  19.3860
    +
    +
    +Ppmc full exclusion, order-4
    +File       Bpb     Kbyps c  Kbyps d
    +bible.txt  1.7062  29.5428  28.9945
    +e.coli     1.9560  32.8449  31.6604
    +kjv.guten  1.6801  28.8764  28.3149
    +world192   1.6892  28.1551  27.9047
    +Average    1.7579  29.8548  29.2186
    +
    + +


    References +
    [1] Charles Bloom "Kraft Inequality", http://www.cbloom.com +(Compression : News Postings : Kraft Inequality) +
    [2] Bell, T.C., Cleary, J.G. and Witten, I.H. (1990) +"Text compression", Prentice Hall, Englewood Cliffs, NJ. +
    [3] Charles Bloom "Solving the Problems of Context +Modeling", http://www.cbloom.com +(ppmz.ps) +
    [4] Mark Nelson, Jean-Loup Gaily (1996) "The Data +Compression Book", M&T Books. Source available at Mark's page. +
    [5] Ppmd source code by Dmitry +Shkarin available via ftp1 +or ftp2. +
    [6] Arturo Campos (1999) "Finite context modeling" +available at my home page. +
    [7] Moffat, A. (1988) "A note on the PPM data compression +algorithm," Research Report 88/7, Department of Computer Science, University +of Melbourne, Parkville, Victoria, Australia. +
    [8] Raita, T. and Teuhola, J. (1987) "Predictive +text compression by hashing." ACM conference on Information Retrieval, +New Orleans. +
    [9] Clearly, J.G. and Witten, I.H. (1984) "Data +compression using adaptative coding and partial string matching" IEEE Trans. +Communications, COM-32 (4), 396-402, April. +
    [10] Arturo Campos (1999) "Arithmetic coding" +available at my home page. +
    [11] Arturo Campos (1999) "Range coder" available +at my home page. +
    [12] Michael Schindler. Source code of the range +coder. Available via html. +
    [13] The Context-Tree Weighting Project. http://ei0.ei.ele.tue.nl/~frans/ResearchCTW.html +
    [14] Charles Bloom "LZP: a new data compression +algorithm", http://www.cbloom.com (lzp.ps) +
    [15] Calgary corpus set. ftp://ftp.cpsc.ucalgary.ca/pub/projects/text.compression.corpus +
    [16] large Canterbury corpus http://corpus.canterbury.ac.nz +
      +

    Closing words +

    Now you can use your ppmc implementation for an +stand alone compressor of lossless files, or use it for lossless image +compression, or using it for compressing the output of other compression +algorithms (read next paragraph). However those are not the only uses of +ppmc, you can use it for predicting and use it for an operative system +where the reader needs to input text, or as a help for OCR [2]. +If you want better compression than ppmc you should know that using an +order above 4 with ppmc isn't worth because higher orders produce too much +escape codes. For taking profit of the information that high orders provide +you should have a look at other implementations like ppmd or ppmz. Fortunately +ppmc with hash tables is the perfect base for ppmz [3] +an state of art. I'll write an article about it, so keep an eye to my home +page. +

    If you want to use ppmc +for compressing the output of lzp (literals), you have to change update +functions of higher orders, because currently they expect that coding is +always done before updating. However in this case, a lot of literals are +coded with lzp but have to be updated with ppmc too. For orders lowers +than 2 there's no problem, but for order-2 and highers, we have to read +till the end of the linked list of contexts or probabilities when we want +to put a new element, or we have to read till the current one to update +its count (variables like "o2_ll_node" are not initialized because the +coding has not been done). Such changes are easy to do and are left as +homework, in case you have any doubt email +me. +

    If when implementing ppmc with full exclusions you +find any bug, read the warning about updating. +If still you can't find the bug, have a look at the code, or email +me. This article took long to do, and thus I look forward for your comments +about it, so please take the time to tell me what you think of it, just +as I took the (long) time to write it. +

    I want to thank to Malcolm Taylor for letting me see +his ppmz code which used hash tables, unfortunately later he was too busy +to be able to answer my questions, which at the end I solved myself. +
      +

    Contacting the author +

    You can reach me via email at: arturo@arturocampos.com  +See you in the next article! +
      +
      +

    Arturo San Emeterio Campos, Barcelona 21-March-2000
    + +
    +

    This article comes from Arturo Campos home page +at http://www.arturocampos.com +Visit again soon for new and updated compression articles and software. + + diff --git a/doc/ppmc/ppmc-bpb.gif b/doc/ppmc/ppmc-bpb.gif new file mode 100644 index 0000000..8f1d1ee Binary files /dev/null and b/doc/ppmc/ppmc-bpb.gif differ diff --git a/doc/ppmc/ppmc-kbyps.gif b/doc/ppmc/ppmc-kbyps.gif new file mode 100644 index 0000000..da45eb2 Binary files /dev/null and b/doc/ppmc/ppmc-kbyps.gif differ diff --git a/src/ppmc/Makefile b/src/ppmc/Makefile new file mode 100644 index 0000000..887aa40 --- /dev/null +++ b/src/ppmc/Makefile @@ -0,0 +1,18 @@ + +TARGETS=ppmc unppmc +COMMON= ppmc.o ppmcdata.o range.o + +CFLAGS=-O3 -Wall -DNDEBUG + +all: $(TARGETS) + +ppmc: $(COMMON) ppmcmain.c + +unppmc: $(COMMON) unppmc.c + +clean: + + @$(RM) -f *.o $(TARGETS) + +.PHONY: all clean + diff --git a/src/ppmc/README b/src/ppmc/README new file mode 100644 index 0000000..5e9ae11 --- /dev/null +++ b/src/ppmc/README @@ -0,0 +1,118 @@ + + PPMC + + +TABLE OF CONTENTS +-Description +-Compiling +-Files included in this release +-Timing +-Author +-Disclaimer + + +DESCRIPTION +This is the source code of an implementation of ppmc. +The data structures used are hash tables instead of a context trie. + +A file is compressed and decompressed like that: +ppmc inputfile compressedfile +unppmc compressedfile outputfile + +I don't recommend to use this for compressing vital information, because it +hasn't been fully tested, and moreover, the machine where the decompressor is +being run, must have at least as much memory as the encoder had. I recommend +to use this compressor only for researching pourposes. + +For further information read ac_ppmc.html also included in the package, for +the latest version visit http://www.ross.net/arturocampos + + +COMPILING +The source code is in the C programming language, and was successfully +compiled with Djgpp version 2.02, a project was made which included the +following files for the encoder (call this project ppmc): +ppmc.c +ppmcdata.c +ppmcmain.c +range.c + +And for the decoder (call this project unppmc) +ppmc.c +ppmcdata.c +range.c +unppmc.c + +Then you just have to hit F9 and wait. I tried to do makefiles, however there +was something wrong and it didn't worked. If someone has any idea about it, or +how to compile this source for other compilers, please let me know it, so in +the next release I can include makefiles. + + +FILES INCLUDED IN THIS RELEASE +In the zip file ac_ppmc_html.zip you should find: + +readme.txt -> The file you're reading now. +ac_ppmc.html -> article which explains ppmc and the data structures used +ppmc.c -> This is the main file which includes all the routines used + by both the encoder and decoder's model. +range.c -> The encoder and decoder's routines. +ppmcmain.c -> The main routine for the compressor +unppmc.c -> The main routine for the decompressor +ppmcdata.c -> Global data structures. (used mainly by ppmc.c) +ppmc.h -> Declarations of routines +ppmcdata.h -> Declarations of global data and structures +range.h -> Declarations of the routines for the range encoder and decoder +ppmc.exe -> A compiled version of the compressor +unppmc.exe -> A compiled version of the decompressor + +All this files are the implementation of ppmc order-4 using lazy exclusions. +In the /exclusion directory you can find the same files (unless readme) but +for ppmc order-4 using full exclusion. +I thought there was no need to include the files for ppmc-o3h, because they +are nothing else than ppmc with lazy exclusions, using only order-2 (and +lowers) but instead of using the hash key for order-2 it uses the one for +order-3. (you also have to take care about o3_byte, of course) + +The executables were compiled with Djgpp 2.95 using only the switch -O6. + + +TIMING +The standard function to get the +time "time()", has a maximum precision of seconds. This is not enough for +testing the speed of a compressor. Due to this timing was not included in +this release. +If you are interested on compiling it with Djgpp, my original version used the +following code: + + struct time ppmc_time, ppmc_time2; + double _time2, _time; + + // Get current time + gettime(&ppmc_time); + + //Compress file + + + // Print bpb and kbyps + gettime(&ppmc_time2); + printf("%s at %f bpb in ",argv[1],((float)filesize(file_output)/(float)size_file_input)*(float)8); + _time=((ppmc_time.ti_min)*60)+(ppmc_time.ti_sec)+(((double)ppmc_time.ti_hund)/100); + _time2=((ppmc_time2.ti_min)*60)+(ppmc_time2.ti_sec)+(((double)ppmc_time2.ti_hund)/100); + if((_time2-_time)!=0) + printf("%f kbytes/seconds.", ((float)size_file_input/(float)1024)/(_time2-_time)); + + +AUTHOR +This code was made by Arturo San Emeterio Campos, you can find his home page +at: http://www.ross.net/arturocampos +And his email is: arturo-campos@mixmail.com + + +DISCLAIMER +Copyright (c) Arturo San Emeterio Campos 1999. All rights reserved. Permission +is granted to make verbatim copies of this files for private use only. There +is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + + Arturo San Emeterio Campos, Barcelona 04-Jan-2000 diff --git a/src/ppmc/exclusion/COMMENTS b/src/ppmc/exclusion/COMMENTS new file mode 100644 index 0000000..bdfc5ab --- /dev/null +++ b/src/ppmc/exclusion/COMMENTS @@ -0,0 +1,15 @@ +Order-3 and order-4 use almost the same code. With lazy exclusions it was +exactly the same. But for full exclusions is not. + +The problem is that when updating we can't be sure that the stored pointer +points where it should to (as it did with lazy exclusions). So we have to +read the whole linked list. This is done for order-2 and order-3. However +order-4 is the highest order used, so it doesn't use exclusion, therefore +it uses the same code as with lazy exclusions. + +So if you want to use higher orders remember that the code to use should +be the one for order-3 because it updates correctly when exclusions are +being used. and use the order-4 code for the highest order. + + Arturo Campos (arturo-campos@mixmail.com) + http://www.arturocampos.com \ No newline at end of file diff --git a/src/ppmc/exclusion/Makefile b/src/ppmc/exclusion/Makefile new file mode 100644 index 0000000..887aa40 --- /dev/null +++ b/src/ppmc/exclusion/Makefile @@ -0,0 +1,18 @@ + +TARGETS=ppmc unppmc +COMMON= ppmc.o ppmcdata.o range.o + +CFLAGS=-O3 -Wall -DNDEBUG + +all: $(TARGETS) + +ppmc: $(COMMON) ppmcmain.c + +unppmc: $(COMMON) unppmc.c + +clean: + + @$(RM) -f *.o $(TARGETS) + +.PHONY: all clean + diff --git a/src/ppmc/exclusion/ppmc.c b/src/ppmc/exclusion/ppmc.c new file mode 100644 index 0000000..0730e1a --- /dev/null +++ b/src/ppmc/exclusion/ppmc.c @@ -0,0 +1,3418 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "ppmc.c" (exclusion) + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + Part of the ppmc encoder and decoder. + + This module contains the whole ppmc encoder. It uses hash tables for + managing most of the orders. And a maximum order of 4. It codes bytes. + Order-1-0-(-1) are all handled in tables. Order-2 has a table with + direct hashing with pointers to the linked lists. Order-4 and order-3 + both have hash tables with pointers to contexts in a linked lists which + finally have a pointer to the start of the linked list with the + probability distribution. Update exclusion is used, but exclusion is not. + + Please, note that if the machine where the decoder is run doesn't has as + much memory as the computer where the encoder was ran, the decoder will + not be able to properly decode the file, because it will not be able to + keep track of new statistics, in this case it will just exit. + + For applications where the loss of data is not admisible, I suggest you to + limit both encoder and decoder's memory requeriments to a given minimum. + + Using exclusion. It's up to the main encoding routine to clear this table + for every new byte. +*/ + + +#include +#include +#include "range.h" +#include "ppmcdata.h" + + + +// Ruotines used by ppmc. Not including the range coder. +// +// They are for initializing of both encoder and decoder, and unless there +// are two version, both encoder and decoder use the same routines. Like +// "ppmc_initialize_contexts". + + +// This one allocs the memory needed by ppmc, and adjust some pointers used +// for allocating elements in the linked lists. The mempool arrays must be +// initialized now. +void ppmc_alloc_memory(void) +{ + unsigned long counter; + + + // Init mempool buffers + + for(counter=0;counter!=_mempool_max_index;++counter) + { + _bytes_pool_array[counter]=0; + _context_pool_array[counter]=0; + } + + _bytes_pool_index=1; //first entry will be used now + _context_pool_index=1; + + + // Allocate memory for ppmc structures and adjust some variables + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + + //save pointers in the array for freeing + _bytes_pool_array[0]=_bytes_pool; + _context_pool_array[0]=_context_pool; + + + //adjust variables + _bytes_pool_max=_bytes_pool+_bytes_pool_elements; + _context_pool_max=_context_pool+_context_pool_elements; + + ppmc_out_of_memory=0; //we still have memory +} + + +// This routine initializes all the contexts, and all the tables including +// those who care about the number of bytes defined in a context. +void ppmc_initialize_contexts(void) +{ + unsigned long counter, counter2; + + + // Order-0 + for(counter=0;counter!=256;++counter) //clear table + order0_array[counter]=0; + + order0_defined_bytes=0; //adjust variables + order0_max_cump=0; + + + // Order-1 + for(counter=0;counter!=256;++counter) //erase every table of every context + for(counter2=0;counter2!=256;++counter2) + order1_array[counter][counter2]=0; + + for(counter=0;counter!=256;++counter) //adjust variables + { + order1_defined_bytes_array[counter]=0; + order1_max_cump_array[counter]=0; + } + + + // Order-2 + for(counter=0;counter!=65536;++counter) + { + //order2_array[counter].prob=0; //clear pointer to bytes and frequencies + //order2_array[counter].max_cump=0; + order2_array[counter].defined_bytes=0; + } + + + // Order-4-3 + for(counter=0;counter!=65536;++counter) //order-4-3 + { + order4_hasht[counter]=0; + order3_hasht[counter]=0; + } +} + + +// This routine initializes the encode model by outputting as many bytes as +// needed to prepare the models. This should be called before the main loop +// and after the memory has been allocated and tables initialized. +// +// It does not need uses the range coder. It output the first 1 bytes. +void ppmc_encoder_initialize(void) +{ + + // Initialize order-0 and prepare different bytes for orders + fputc((byte=fgetc(file_input)),file_output); + o4_byte=byte; //order-4 + + fputc((byte=fgetc(file_input)),file_output); + o3_byte=byte; //order-3 + + fputc((byte=fgetc(file_input)),file_output); + o2_byte=byte; //order-2 + ppmc_update_order0(); + + fputc((byte=fgetc(file_input)),file_output); + o1_byte=byte; + +} + + +// This routine initializes the decoder model, should be called to do the same +// changes as "ppmc_encoder_initialize()" did. +void ppmc_decoder_initialize(void) +{ + + // Initialize order-0 and context bytes + byte=fgetc(file_input); + o4_byte=byte; //order-4 + fputc(byte,file_output); + + byte=fgetc(file_input); + o3_byte=byte; //order-3 + fputc(byte,file_output); + + byte=fgetc(file_input); + o2_byte=byte; //order-2 + + fputc(byte,file_output); //output first byte + ppmc_update_order0(); + + byte=fgetc(file_input); + o1_byte=byte; //order-1 + fputc(byte,file_output); +} + + +// Once coding or decoding is finished you have to call this routine. +// It must be called when done. +void ppmc_free_memory(void) +{ + unsigned long counter; + + // Free the memory buffers + + for(counter=0;counter!=_mempool_max_index;++counter) + { + if(_bytes_pool_array[counter]!=0) + free(_bytes_pool_array[counter]); + + if(_context_pool_array[counter]!=0) + free(_context_pool_array[counter]); + } + +} + + +// This routine flushes the memory and restarts all the tables of +// probabilities, current order bytes are not modified, this function +// is called when we ran out of memory. We have to output the code +// number 256 which means memory flushing, for doing this we have to go +// to order-(-1) so we have to output an escape code in all the orders +// till we reach order-(-1) where we can code it. Then we free all the +// memory, alloc it again, and reinitialize all the orders. +// +// However we may find the case when the current order is not initialized, +// in this case we don't need to output an escape code. +void ppmc_flush_mem_enc(void) +{ + unsigned long counter; + + + // Clear exclusion table + for(counter=0;counter!=256;++counter) + excluded[counter]=0; + + + // Code an escape code in order-4 + if(ppmc_get_totf_order4()!=0) //if 0 no need of escape code + { + + ppmc_get_escape_prob_order4(); //get prob and cump + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + } + + + // Code an escape code in order-3 + if(ppmc_get_totf_order3()!=0) //if 0 no need of escape code + { + + ppmc_get_escape_prob_order3(); //get prob and cump + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + } + + + // Code an escape code in order-2 + + o2_cntxt=ppmc_order2_hash_key(o1_byte,o2_byte); + + // First check if current order-2 context is empty + if(order2_array[o2_cntxt].defined_bytes!=0) //it's not empty + { + ppmc_get_totf_order2(); + ppmc_get_escape_prob_order2(); + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + } + + + // Code an escape code in order-1 + + // First check if current order-1 table is empty + if(order1_defined_bytes_array[o1_byte]!=0) //it's not empty + { + ppmc_get_totf_order1(); + ppmc_get_escape_prob_order1(); + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + } + + + // Code an escape code in order-0. Order-0 always has at least one symbol + + ppmc_get_totf_order0(); + ppmc_get_escape_prob_order0(); + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + + + // Now we can code the code 256 + + symb_prob=1; + symb_cump=256; + total_cump=257; + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + + // Now that decoder knows the flushing, free memory and reinit + + ppmc_free_memory(); + ppmc_alloc_memory(); + ppmc_initialize_contexts(); + + + // Be sure that order-0 has at least one probability + + order0_array[o1_byte]++; + order0_max_cump++; + order0_defined_bytes++; + +} + + +// When the decoder gets the symbol of flushing, most of the job is done +// because we already got all the escape codes, so we only have to reinit. +void ppmc_flush_mem_dec(void) +{ + + // Free memory and reinit + + ppmc_free_memory(); + ppmc_alloc_memory(); + ppmc_initialize_contexts(); + + + // Be sure that order-0 has at least one probability + + order0_array[o1_byte]++; + order0_max_cump++; + order0_defined_bytes++; + + +} + + + +// ORDER-(-1) functions, also called ordern1 (Negative1) in functions +// +// Because order-(-1) does not need to update its probability tables, it +// has no tables, and relies on the fact that the cump of byte is its own +// value, and the probability is fixed, 1, and the total cump is 257. +// +// The alphabet has the following distribution: 0-255 the bytes. 256 is +// an special symbol which means that we have flushed the encoder tables, +// and thus the encoder must flush its tables too. +// +// The rest of the tables only have 256 symbols, because we have no need +// of assign a symbol to the flush code (which already is the order-(-1) +// table) nor to the escape code. +// +// For order-(-1) we don't use exclusion. + + +// Gets the probability for a given symbol in the order-(-1) (ordern1) +void ppmc_get_prob_ordern1(void) +{ + symb_cump=byte; //its value + symb_prob=1; //flat probability + total_cump=257; //total cump +} + + +// Returns in the variable "total_cump" the current total cump of +// order-(-1) +void ppmc_get_totf_ordern1(void) +{ + total_cump=257; //this is fixed +} + + +// Returns the symbol for a given cump under order-(-1) +unsigned long ppmc_get_symbol_ordern1 (void) +{ + return symb_cump; +} + + + +// ORDER-0 functions +// +// Due to the fact that order-0 has no context, I use an array for all the +// probabilities under order-0, just as you could do in a basic model for +// arithmetic coding. +// +// The main array is: order0_array. Where order0_array[byte] contains the +// probability for a given byte. The same applies to order-1. +// +// To ensure that the updating and coding is done correctly, "byte" can't +// be changed till all the coding and updating is done. +// +// Order-0 uses exclusions. Exclusion values are always prepared in "get_totf" +// so there's no need to get them again. However order-0 doesn't have to +// update exclude table, because order-(-1) will not use it + + +// Returns in the variable "total_cump" the current total cump of +// order-0. We have to read the whole array because we can't +// guarante that all the bytes are used. +void ppmc_get_totf_order0(void) +{ + unsigned long temp_cump, //temp value for the cump + counter; + + exc_defined_bytes=0; + exc_max_cump=0; + + // Read the number of defined bytes by reading the count of every byte + // and if it's present in the exclusion table. + for(counter=0;counter!=256;++counter) + { + if(excluded[counter]==0) //only if it's not excluded + if(order0_array[counter]!=0) //if it has a nonzero count, then it's present + { + ++exc_defined_bytes; + exc_max_cump+=order0_array[counter]; + } + } + + // Total cump is current total cump plus the probability for the escape code + exc_total_cump=exc_max_cump+exc_defined_bytes; +} + + +// Codes a byte under order-0 and returns 1, otherwise it returns a 0 and +// has coded an escape code. In this case further coding is needed. +// +// Returns: 1 in case a byte was coded. 0 in case of escape code. +char ppmc_code_byte_order0(void) +{ + unsigned long counter; + + ppmc_get_totf_order0(); //get total cump + + // It's possible that due to excluding, there's no byte left, in that case + // return. + if(exc_defined_bytes==0) + return 0; + + // See if the byte is present + if(order0_array[byte]==0) //a probability of 0 + { + + // Because it was not present, output an escape code, prepare variables + + symb_cump=exc_max_cump; //obviously its cump is current max_cump + //without escape code's space + + symb_prob=exc_defined_bytes; //the number of defined bytes + + total_cump=exc_total_cump; + + // Code the escape code + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + return 0; //byte not coded + } + else + { + + coded_in_order=0; + + // The symbol is present, code it under order-0 + + symb_prob=order0_array[byte]; //get probability directly + + // Make cump for current symbol + + symb_cump=0; //for first symbol is 0 + for(counter=0; counter!=byte ; ++counter) + { + if(excluded[counter]==0) + symb_cump+=order0_array[counter]; //sum probabilities before our symbol + } + + total_cump=exc_total_cump; + + // Code the symbol + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + return 1; //symbol coded under order-0 + } +} + + +// This functions update order-0 probabilities with current byte, also takes +// care about updating variables, and renormalizing. +void ppmc_update_order0(void) +{ + if(order0_array[byte]==0) + { + // It had a zero probability + order0_array[byte]++; //increment symbol probability + ++order0_defined_bytes; //new byte defined + ++order0_max_cump; //total cump + return; + } + else + { + // It had a non-zero probability + + // Increment its probability + order0_array[byte]++; //increment symbol probability + ++order0_max_cump; //total cump + + // Check to see if its the maximum in this case renormalize + if(order0_array[byte]==255) + ppmc_renormalize_order0(); + + return; + } +} + + +// This functions renormalizes the probabilities at order-0 updating variables +void ppmc_renormalize_order0(void) +{ + unsigned long counter; + + // Initialize variables + order0_defined_bytes=0; //clear them + order0_max_cump=0; + + // Loop over all probabilities, divide them by a factor of 2 and update variables + for(counter=0 ; counter!=256 ; ++counter) + { + if(order0_array[counter]!=0) + { + order0_array[counter]>>=1; //divide by a factor of 2 + if(order0_array[counter]==0) + order0_array[counter]=1; + } + + if(order0_array[counter]!=0) //see if it has a non zero probability + order0_defined_bytes++; + + order0_max_cump+=order0_array[counter]; //sum to the total cump + } +} + + +// Decodes the symbol correspoding to the current order, it returns the byte +// or in case of a escape code it returns -1 +void ppmc_decode_order0(void) +{ + unsigned long current_cump, counter; + + + // Get the total cump needed for decoding symbol + ppmc_get_totf_order0(); //total cump needed for decoding + + // It's possible that due to excluding, there's no byte left, in that case + // return. + if(exc_defined_bytes==0) + { + byte=-1; + return; + } + + symb_cump=range_decoder_decode(&rc_decoder,exc_total_cump); //decode it + + // Now check if it's an escape code + if(symb_cump>=exc_max_cump) //the defined code space for the escape code + { + + // Update coding values + + ppmc_get_escape_prob_order0(); + range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob); + + // Mark as escape code + + byte=-1; + + return; //an escape code + } + else + { + // Now we have to check what symbol it is + + current_cump=0; //cump of the current symbol + + for(counter=0 ; counter!= 256 ; ++counter) + { + if(symb_cump>=1; //divide by a factor of 2 + if(order1_array[o1_byte][counter]==0) + order1_array[o1_byte][counter]=1; //don't let it have a 0 count + } + + if(order1_array[o1_byte][counter]!=0) //see if it has a non zero probability + order1_defined_bytes_array[o1_byte]++; + + order1_max_cump_array[o1_byte]+=order1_array[o1_byte][counter]; //sum to the total cump + } +} + + +// Decodes the symbol correspoding to the current order, it returns the byte +// or in case of an escape code or empty table it returns -1. +// It updates "coded_in_order". +void ppmc_decode_order1(void) +{ + unsigned long current_cump, counter; + + + // First check if current order-1 table is empty + if(order1_defined_bytes_array[o1_byte]==0) //it's empty + { + byte=-1; //byte not coded, nothing done + return; + } + + + // Get the total cump needed for decoding symbol + ppmc_get_totf_order1(); //total cump needed for decoding + + // It's possible that due to excluding, there's no byte left, in that case + // return. + if(exc_defined_bytes==0) + { + byte=-1; + return; + } + + symb_cump=range_decoder_decode(&rc_decoder,exc_total_cump); //decode it + + // Now check if it's an escape code + if(symb_cump>=exc_max_cump) //the defined code space for the escape code + { + + // Update coding values + + ppmc_get_escape_prob_order1(); + range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob); + + + // Now update "exclude" table + for(counter=0;counter!=256;++counter) + if(order1_array[o1_byte][counter]!=0) + excluded[counter]=1; //occurred but was not code, now exclude + + // Mark as escape code (in fact nothing coded) + + byte=-1; + + return; //an escape code + } + else + { + // Now we have to check what symbol it is + + current_cump=0; //cump of the current symbol + + for(counter=0 ; counter!= 256 ; ++counter) + { + if(symb_cumpbyte]==0) + { + exc_defined_bytes++; + exc_max_cump+=node->freq; //add the probability of this byte to the cump + } + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + + // Total cump is current total cump plus the probability for the escape code + exc_total_cump=exc_max_cump+exc_defined_bytes; + +} + + +// Codes a byte under order-2 and returns 1. +// Otherwise it returns a 0. It may be that it has coded an escape code, or +// that current table was empty. +// +// Returns: 1 in case a byte was coded. 0 in case of escape code or empty table. +// In case the byte is coded under this context, coded_in_order=2. +char ppmc_code_byte_order2(void) +{ + unsigned long counter; + struct _byte_and_freq *node; + + + // Initialize o2_cntxt + + o2_cntxt=ppmc_order2_hash_key(o1_byte,o2_byte); + + + // First check if current order-2 context is empty + if(order2_array[o2_cntxt].defined_bytes==0) //it's empty + { + return 0; //byte not coded, nothing done + } + + + // Now try to code this byte under order-2 + + ppmc_get_totf_order2(); //get total cump + + + // It's possible that due to exclusion, there's no byte left, in that case + // return. + if(exc_defined_bytes==0) + return 0; + + + // See if the byte is present and compute its cump at the same time + + node=order2_array[o2_cntxt].prob; //pointer to first element in the linked list + + symb_cump=0; //the first symbol always has a 0 cump + + + // Now search the byte in the linked list + + do{ + if(node->byte==byte) + goto ppmc_o2_byte_found; //bad thing, I know, anyone has a better idea? + if(excluded[node->byte]==0) + symb_cump+=node->freq; //add the probability of this byte to the cump + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + + // If we reach that point, the byte was not found in the linked list + // so we don't need the cump, we have to output an escape code, whose + // probabilities are know using the context structure in the table. + + // Byte was not present in the linked list, current node is the last one, + // and that's the node needed for creating a new node, save it. + + o2_ll_node=node; + + // Now get the probability and cump of the escape code + + symb_cump=exc_max_cump; + symb_prob=exc_defined_bytes; + + // Code the escape code + range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob); + + // Then, update "excluded" table + + node=order2_array[o2_cntxt].prob; + + do{ + excluded[node->byte]=1; + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + return 0; //now exit + + + // That code is executed when the byte is found in the linked list + + ppmc_o2_byte_found: + + + // Everything has been tested, now we can feel free to code the byte, + // the symb_cump is already computed, now get its probability and code + // the byte, also save pointer to this element in the linked lists for + // updating. + + coded_in_order=2; //successfully coded under order-2 + + o2_ll_node=node; //save it for updating + + symb_prob=node->freq; //get the probability of the byte + + // Code it. + + range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob); + + return 1; //byte coded under order-2 +} + + +// This functions update order-2 probabilities with current byte, also takes +// care about updating variables, and renormalizing. +// Of course "o2_ll_node" must point to the element to update or the last one, +// based on the "coded_in_order" and checking the pointer of the element we +// can decide what to do. +// +// This updating is only for encoding. +void ppmc_update_order2(void) +{ + struct _byte_and_freq *node; + + + // First of all check if that's the first byte in this context, in that case + // we have to initialize some variables in the context structure. + + if(order2_array[o2_cntxt].defined_bytes==0) //no byte defined yet + { + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + order2_array[o2_cntxt].defined_bytes=1; + order2_array[o2_cntxt].max_cump=1; + order2_array[o2_cntxt].prob=_bytes_pool; + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + return; //nothing else to do + } + + + // The byte was coded under order two, otherwise it was coded under a + // lower order (never higher ones, remember that we are using update + // exclusion) in this case we have to create a new node in the list. + + if(coded_in_order==2) //coded under order-2 + { + + // Update its count and variables of this context and check for renormalization + + o2_ll_node->freq++; //increment its frequency (rather probability) + + order2_array[o2_cntxt].max_cump++; //total cump + + if(o2_ll_node->freq==255) //do we have to renormalize? + ppmc_renormalize_order2(); //renormalize + + } + else + { + + // Once every paranoid check has been done we are sure that this byte + // did not existed and so we have to create a new node in the linked + // list. Also we have to take care of memory issues. + // + // However due to the use of exclusion, we have to ensure that "o2_ll_node" + // points to the last element in the linked lists of this context + + node=order2_array[o2_cntxt].prob; //get pointer to linked list + + while(1) + { + if(node->next==0) //check for the end of the linked list + break; + node=node->next; //next node + } + + o2_ll_node=node; + + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate mem + + o2_ll_node->next=_bytes_pool; //put it in the next free entry + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + order2_array[o2_cntxt].max_cump++; //total cump + order2_array[o2_cntxt].defined_bytes++; //total cump + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + } + +} + + +// This functions renormalizes the probabilities at order-2 updating context +// variables. +void ppmc_renormalize_order2(void) +{ + unsigned long counter; + struct _byte_and_freq *node; + + // Initialize variables. Defined bytes remain the same. + order2_array[o2_cntxt].max_cump=0; //clear them + + node=order2_array[o2_cntxt].prob; //get pointer to lined lists + + // Divide all the probabilities by 2 and update context variables + while(1) + { + node->freq>>=1; //divide by a factor of 2 + + if(node->freq==0) //don't allow a probability to be 0 + node->freq=1; + + order2_array[o2_cntxt].max_cump+=node->freq; //sum to the total cump + + if(node->next==0) //last element + break; + node=node->next; + } + + + //printf("\nRenormalization, context:%c%c",o2_byte,o1_byte); + +} + + +// Decodes the symbol correspoding to the current order, it returns the byte +// or in case of an escape code or empty table it returns -1. +// It updates "coded_in_order". +// +// If we decode an escape code, we don't modify "o2_ll_node" and thus its +// work of the updating routine to read till the end of the linked list +// (for adding a new element) however if we decode a byte, we save on +// "o2_ll_node" the pointer to its node. (so updating is faster) +void ppmc_decode_order2(void) +{ + unsigned long current_cump, counter; + struct _byte_and_freq *node; + + // Initialize o2_cntxt + + o2_cntxt=ppmc_order2_hash_key(o1_byte,o2_byte); + + + // First check if current order-2 context is empty + if(order2_array[o2_cntxt].defined_bytes==0) //it's empty + { + byte=-1; //byte not coded, nothing done + return; + } + + + // Get the total cump needed for decoding symbol + ppmc_get_totf_order2(); //total cump needed for decoding + + + // It's possible that due to exclusion, there's no byte left, in that case + // return. + if(exc_defined_bytes==0) + { + byte=-1; + return; //byte not coded, nothing done + } + + + symb_cump=range_decoder_decode(&rc_decoder,exc_total_cump); //decode it + + // Now check if it's an escape code + if(symb_cump>=exc_max_cump) //the defined code space for the escape code + { + + // Update coding values + + ppmc_get_escape_prob_order2(); + range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob); + + // Mark as escape code + + byte=-1; + + // Then, update "excluded" table + + node=order2_array[o2_cntxt].prob; + + do{ + excluded[node->byte]=1; + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + return; //an escape code + } + else + { + // Now we have to check what symbol it is + + current_cump=0; //cump of the current symbol + + node=order2_array[o2_cntxt].prob; //get pointer to linked lists + + while(1) + { + if(excluded[node->byte]==0) //only if it's not excluded + current_cump+=node->freq; //update cump + if(symb_cumpnext; //next element + //we have no need to check for the last symbol, we'll never read further + //the end of the linked lists, before we'll found the last byte. + } + + + //read byte value and probability + + symb_prob=node->freq; //get the probability for updating the state + byte=node->byte; //get byte + o2_ll_node=node; //used for updating + + + // Get the cump of byte + + symb_cump=current_cump-symb_prob; + + // Update coding state + + range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob); + + // Update coded_in_order used for update exclusion + + coded_in_order=2; + + return; + } + +} + + +// This is the routine for updating while decoding. We have to search the byte +// in the linked list, if it's present, update its count, otherwise we have +// hitted the end of the linked list, and there we have to create a new node. +// +// Of course if the byte was matched in order-2 we'll have a pointer to it +// in "o2_ll_node" so we don't need to read the linked list. (we already did +// in decoding) +// +// Another case which we also have to specially deal with, this is the case +// when the context has not been initalized yet. +void ppmc_update_dec_order2(void) +{ + struct _byte_and_freq *node; + + + // Handle the case when the context is not initialized + // This code is the same as the one for the encoding. + + if(order2_array[o2_cntxt].defined_bytes==0) //no byte defined yet + { + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + order2_array[o2_cntxt].defined_bytes=1; + order2_array[o2_cntxt].max_cump=1; + order2_array[o2_cntxt].prob=_bytes_pool; + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + + return; //nothing else to do + } + + + // Current context is initalized, proceed + + if(coded_in_order==2) //check if it was decoded under order-2 + { + + // We can be sure that the pointer "o2_ll_node" points to its entry, and + // it has a non 0 probability (otherwise it couldn't be coded) so just + // update its probability and max_cump + + o2_ll_node->freq++; //the probability of the byte + order2_array[o2_cntxt].max_cump++; //the max_cump + + if(o2_ll_node->freq==255) //check for renormalization + ppmc_renormalize_order2(); + + } + else + { + + // An escape code was decoded under order-2, we have to read till the + // end of the linked list so we can add a new node for this new byte. + + node=order2_array[o2_cntxt].prob; //get pointer to linked list + + while(1) + { + if(node->next==0) //check for the end of the linked list + break; + node=node->next; //next node + } + + + // We reached the end of the linked list, add a new node if possible, + // we are using the same code of "ppmc_update_order2()" with the + // difference that the pointer to the linked list is "node" + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate mem + + node->next=_bytes_pool; //put it in the next free entry + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + order2_array[o2_cntxt].max_cump++; //total cump + order2_array[o2_cntxt].defined_bytes++; //total cump + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + return; //we are finished updating + + } + +} + + +// This function returns the probability for the escape codes in the variables +void ppmc_get_escape_prob_order2(void) +{ + // To understand that remember that the escape allocated for the escape code + // is above the current maximum cump and that it has a probability determined + // by the scheme C. + + symb_prob=exc_defined_bytes; + symb_cump=exc_max_cump; +} + + + +// ORDER-3 functions +// +// The difference between order-3 and order-3 are just a few, instead of +// keeping a table with the context structures, we keep a hash table with +// pointers to linked lists with the context, so it's only a matter of +// searching current context in the linked list corresponding to its hash +// entry. This is done in "ppmc_get_totf_order3" because that's the first +// routine that both encoding and decoding routines call. + + +// Returns in the variable "total_cump" the current total cump of +// order-3. Must be called while encoding or decoding before anything else +// because it initializes the pointers to the context structure in +// "o3_context" and o3_cntxt. +// +// If the hash entry is not initialized it returns "o3_context"=0 +// If the context is not present in the linked list of context, "o3_context" +// will point to the last element in the linked list. +// If the context is present "o3_context" will point to the context to use. +// One can distinguish the last two by checking the context value of the +// structure, if it's not the same, is the last element. +// +// The routine returns 0 if the hash entry is not initialized or if the +// the context was not present. Otherwise it returns 1, meaning that we +// have to code under this context. +char ppmc_get_totf_order3(void) +{ + struct context *cntxt_node; + struct _byte_and_freq *node; + + + // First make the hash key for order-3 + + o3_cntxt=ppmc_order3_hash_key(o1_byte,o2_byte,o3_byte); + full_o3_cntxt=(o1_byte)+(o2_byte<<8)+(o3_byte<<16); //order-3 + + + // Now check the hash entry in the table + + if(order3_hasht[o3_cntxt]==0) //if 0, not initialized + { + + o3_context=0; //no hash entry + + return 0; //hash entry not initialized + } + + + // Now read trough the linked list of context searching current one + + cntxt_node=order3_hasht[o3_cntxt]; + + while(1) + { + + if(cntxt_node->order4321==full_o3_cntxt) //compare context + goto ppmc_gtf_cntxt_found; + + if(cntxt_node->next==0) //end of context's linked list + break; + + cntxt_node=cntxt_node->next; //next element + + } + + + // Once there the context was not found + o3_context=cntxt_node; //pointer to last element in the linked list + + return 0; //it was not present + + + // The context is found, so return pointer and cump + + ppmc_gtf_cntxt_found: + + o3_context=cntxt_node; + + // Read the whole linked list for making the values + node=o3_context->prob; + exc_max_cump=0; + exc_defined_bytes=0; + + do{ + if(excluded[node->byte]==0) + { + exc_defined_bytes++; + exc_max_cump+=node->freq; //add the probability of this byte to the cump + } + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + + // Total cump is current total cump plus the probability for the escape code + exc_total_cump=exc_max_cump+exc_defined_bytes; + + + return 1; //context found + +} + + +// Codes a byte under order-3 and returns 1. +// Otherwise it returns a 0. +// +// In case the byte is coded under this context, coded_in_order=3. +char ppmc_code_byte_order3(void) +{ + unsigned long counter; + struct _byte_and_freq *node; + + + // Get current context (if present) and total cump. + + if(ppmc_get_totf_order3()==0) + return 0; + + + // It's possible that due to exclusion, there's no byte left, in that case + // return. + if(exc_defined_bytes==0) + return 0; + + + // See if the byte is present and compute its cump at the same time + + node=o3_context->prob; //pointer to first element in the linked list + + symb_cump=0; //the first symbol always has a 0 cump + + + // Now search the byte in the linked list + + do{ + if(node->byte==byte) + goto ppmc_o3_byte_found; //bad thing, I know, anyone has a better idea? + if(excluded[node->byte]==0) + symb_cump+=node->freq; //add the probability of this byte to the cump + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + + // If we reach that point, the byte was not found in the linked list + // so we don't need the cump, we have to output an escape code, whose + // probabilities are know using the context structure in the table. + + // Byte was not present in the linked list, current node is the last one, + // and that's the node needed for creating a new node, save it. + + o3_ll_node=node; + + // Now get the probability and cump of the escape code + + symb_cump=exc_max_cump; + symb_prob=exc_defined_bytes; + + // Code the escape code + range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob); + + // Then, update "excluded" table + + node=o3_context->prob; + + do{ + excluded[node->byte]=1; + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + return 0; + + + // That code is executed when the byte is found in the linked list + + ppmc_o3_byte_found: + + + // Everything has been tested, now we can feel free to code the byte, + // the symb_cump is already computed, now get its probability and code + // the byte, also save pointer to this element in the linked lists for + // updating. + + coded_in_order=3; //successfully coded under order-3 + + o3_ll_node=node; //save it for updating + + symb_prob=node->freq; //get the probability of the byte + + // Code it. + + range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob); + + return 1; //byte coded under order-3 +} + + +// This functions update order-3 probabilities with current byte, also takes +// care about updating variables, and renormalizing. +// +// "o3_ll_node" must point to the element to update or the last one, +// based on the "coded_in_order" and checking the pointer of the element we +// can decide what to do. Also "o3_context" must be initialized. +// +// This updating is only for encoding. +void ppmc_update_order3(void) +{ + struct _byte_and_freq *node; + + + // First thing first, check if the hash entry is initialized + + if(order3_hasht[o3_cntxt]==0) //no pointer to linked list defined yet + { + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + order3_hasht[o3_cntxt]=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o3_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + return; //nothing else to do + } + + + + + // The byte was coded under order three, otherwise it was coded under a + // lower order (never higher ones, remember that we are using update + // exclusion) in this case we have to create a new node in the list. + + if(coded_in_order==3) //coded under order-3 + { + + // Update its count and variables of this context and check for renormalization + + o3_ll_node->freq++; //increment its frequency (rather probability) + + o3_context->max_cump++; //total cump + + if(o3_ll_node->freq==255) //do we have to renormalize? + ppmc_renormalize_order3(); //renormalize + + } + else + { + + // Now we have two cases, under a given context (which we actually found) + // we coded an escape coded, in that case just create a new node in the + // linked list of bytes and probabilities. Otherwise we didn't find the + // same node so we have to create it in the linked list for context. + // And we can be sure that it at least has one element and that + // "o3_context" points to the last element, so we can put the new element. + + if(o3_context->order4321==full_o3_cntxt) //chech if that's the last + { //element or a context found + + // However due to the use of exclusion, we have to ensure that "o3_ll_node" + // points to the last element in the linked lists of this context + + node=o3_context->prob; //get pointer to linked list + + while(1) + { + if(node->next==0) //check for the end of the linked list + break; + node=node->next; //next node + } + + o3_ll_node=node; + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate mem + + o3_ll_node->next=_bytes_pool; //put it in the next free entry + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + o3_context->max_cump++; //total cump + o3_context->defined_bytes++; //total cump + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + } + else + { + + // Ensure that we are at the end of the linked list of contexts + o3_context=order3_hasht[o3_cntxt]; + + do{ + if(o3_context->next==0) + break; + o3_context=o3_context->next; + }while(1); + + // We have to create a new context node + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + o3_context->next=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o3_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + } + + } + +} + + +// This functions renormalizes the probabilities at order-3 updating context +// variables. +void ppmc_renormalize_order3(void) +{ + unsigned long counter; + struct _byte_and_freq *node; + + + // Initialize variables. Defined bytes remain the same. + o3_context->max_cump=0; //clear them + + node=o3_context->prob; //get pointer to lined lists + + // Divide all the probabilities by 2 and update context variables + while(1) + { + node->freq>>=1; //divide by a factor of 2 + + if(node->freq==0) //don't allow a probability to be 0 + node->freq=1; + + o3_context->max_cump+=node->freq; //sum to the total cump + + if(node->next==0) //last element + break; + node=node->next; + } + + +} + + +// Decodes the symbol correspoding to the current order, it returns the byte +// or in case of an escape code or empty table it returns -1. +// It updates "coded_in_order". +// +// If we decode an escape code, we don't modify "o3_ll_node" and thus its +// work of the updating routine to read till the end of the linked list +// (for adding a new element) however if we decode a byte, we save on +// "o3_ll_node" the pointer to its node. (so updating is faster) +void ppmc_decode_order3(void) +{ + unsigned long current_cump, counter; + struct _byte_and_freq *node; + + + // Get current context (if present) and total cump. + + if(ppmc_get_totf_order3()==0) + { + byte=-1; + return; + } + + // It's possible that due to exclusion, there's no byte left, in that case + // return. + if(exc_defined_bytes==0) + { + byte=-1; + return; + } + + + + // Decode current cump + + symb_cump=range_decoder_decode(&rc_decoder,exc_total_cump); //decode it + + // Now check if it's an escape code + if(symb_cump>=exc_max_cump) //the defined code space for the escape code + { + + // Update coding values + + ppmc_get_escape_prob_order3(); + range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob); + + // Mark as escape code + + byte=-1; + + // Then, update "excluded" table + + node=o3_context->prob; + + do{ + excluded[node->byte]=1; + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + return; //an escape code + } + else + { + // Now we have to check what symbol it is + + current_cump=0; //cump of the current symbol + + node=o3_context->prob; //get pointer to linked lists + + while(1) + { + if(excluded[node->byte]==0) + current_cump+=node->freq; //update cump + if(symb_cumpnext; //next element + //we have no need to check for the last symbol, we'll never read further + //the end of the linked lists, before we'll found the last byte. + } + + + //read byte value and probability + + symb_prob=node->freq; //get the probability for updating the state + byte=node->byte; //get byte + o3_ll_node=node; //used for updating + + + // Get the cump of byte + + symb_cump=current_cump-symb_prob; + + // Update coding state + + range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob); + + // Update coded_in_order used for update exclusion + + coded_in_order=3; + + return; + } + +} + + +// This is the routine for updating while decoding. The only difference with +// the routine for coding is that when an escape code was coded, "o3_ll_node" +// is not initialized so we have to read till the end of the linked list. +// Fortunately "o3_context" will be initialized so we don't need to read its +// linked list. +void ppmc_update_dec_order3(void) +{ + struct _byte_and_freq *node; + + // First thing first, check if the hash entry is initialized + + if(order3_hasht[o3_cntxt]==0) //no pointer to linked list defined yet + { + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + order3_hasht[o3_cntxt]=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o3_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + return; //nothing else to do + } + + + // The byte was coded under order three, otherwise it was coded under a + // lower order (never higher ones, remember that we are using update + // exclusion) in this case we have to create a new node in the list. + + if(coded_in_order==3) //coded under order-3 + { + + // Update its count and variables of this context and check for renormalization + + o3_ll_node->freq++; //increment its frequency (rather probability) + + o3_context->max_cump++; //total cump + + if(o3_ll_node->freq==255) //do we have to renormalize? + ppmc_renormalize_order3(); //renormalize + + } + else + { + + // Now we have two cases, under a given context (which we actually found) + // we coded an escape coded, in that case just create a new node in the + // linked list of bytes and probabilities. Otherwise we didn't find the + // same node so we have to create it in the linked list for context. + // And we can be sure that it at least has one element and that + // "o3_context" points to the last element, so we can put the new element. + + if(o3_context->order4321==full_o3_cntxt) //chech if that's the last + { //element or the a context found + + // Read till the end of the linked list + + node=o3_context->prob; //get pointer to linked list + + while(1) + { + if(node->next==0) //check for the end of the linked list + break; + node=node->next; //next node + } + + // Now add element + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate mem + + node->next=_bytes_pool; //put it in the next free entry + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + o3_context->max_cump++; //total cump + o3_context->defined_bytes++; //total cump + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + } + else + { + + // We have to create a new context node + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + o3_context->next=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o3_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool."); + exit(1); + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + } + + } + +} + + + +// This function returns the probability for the escape codes in the variables +void ppmc_get_escape_prob_order3(void) +{ + // To understand that remember that the escape allocated for the escape code + // is above the current maximum cump and that it has a probability determined + // by the scheme C. + + symb_prob=exc_defined_bytes; + symb_cump=exc_max_cump; +} + + + +// ORDER-4 functions +// +// The routines for order-4 are *equal* to those for order-3, there are a few +// changes like different global variables, and different hash keys. +// +// If you want to go to higher orders, you'd use the same code and data +// structures, with the difference of the context bytes (order4321) +// stored in every context's linked list. + + +// Returns in the variable "total_cump" the current total cump of +// order-4. Must be called while encoding or decoding before anything else +// because it initializes the pointers to the context structure in +// "o4_context" and o4_cntxt. +// +// If the hash entry is not initialized it returns "o4_context"=0 +// If the context is not present in the linked list of context, "o4_context" +// will point to the last element in the linked list. +// If the context is present "o4_context" will point to the context to use. +// One can distinguish the last two by checking the context value of the +// structure, if it's not the same, is the last element. +// +// The routine returns 0 if the hash entry is not initialized or if the +// the context was not present. Otherwise it returns 1, meaning that we +// have to code under this context. +char ppmc_get_totf_order4(void) +{ + struct context *cntxt_node; + struct _byte_and_freq *node; + + + // First make the hash key for order-4 + + o4_cntxt=ppmc_order4_hash_key(o1_byte,o2_byte,o3_byte,o4_byte); + full_o4_cntxt=(o1_byte)+(o2_byte<<8)+(o3_byte<<16)+(o4_byte<<24); //order-4 + + + // Now check the hash entry in the table + + if(order4_hasht[o4_cntxt]==0) //if 0, not initialized + { + + o4_context=0; //no hash entry + + return 0; //hash entry not initialized + } + + + // Now read trough the linked list of context searching current one + + cntxt_node=order4_hasht[o4_cntxt]; + + while(1) + { + + if(cntxt_node->order4321==full_o4_cntxt) //compare context + goto ppmc_gtfo4_cntxt_found; + + if(cntxt_node->next==0) //end of context's linked list + break; + + cntxt_node=cntxt_node->next; //next element + + } + + + // Once there the context was not found + o4_context=cntxt_node; //pointer to last element in the linked list + + return 0; //it was not present + + + // The context is found, so return pointer and cump + + ppmc_gtfo4_cntxt_found: + + o4_context=cntxt_node; + + // Read the whole linked list for making the values + node=o4_context->prob; + exc_max_cump=0; + exc_defined_bytes=0; + + do{ + if(excluded[node->byte]==0) + { + exc_defined_bytes++; + exc_max_cump+=node->freq; //add the probability of this byte to the cump + } + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + + // Total cump is current total cump plus the probability for the escape code + exc_total_cump=exc_max_cump+exc_defined_bytes; + + + return 1; //context found + +} + + +// Codes a byte under order-4 and returns 1. +// Otherwise it returns a 0. +// +// In case the byte is coded under this context, coded_in_order=4. +char ppmc_code_byte_order4(void) +{ + unsigned long counter; + struct _byte_and_freq *node; + + + // Get current context (if present) and total cump. + + if(ppmc_get_totf_order4()==0) + return 0; + + + // It's possible that due to exclusion, there's no byte left, in that case + // return. + if(exc_defined_bytes==0) + return 0; + + + // See if the byte is present and compute its cump at the same time + + node=o4_context->prob; //pointer to first element in the linked list + + symb_cump=0; //the first symbol always has a 0 cump + + + // Now search the byte in the linked list + + do{ + if(node->byte==byte) + goto ppmc_o4_byte_found; //bad thing, I know, anyone has a better idea? + if(excluded[node->byte]==0) + symb_cump+=node->freq; //add the probability of this byte to the cump + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + + // If we reach that point, the byte was not found in the linked list + // so we don't need the cump, we have to output an escape code, whose + // probabilities are know using the context structure in the table. + + // Byte was not present in the linked list, current node is the last one, + // and that's the node needed for creating a new node, save it. + + o4_ll_node=node; + + // Now get the probability and cump of the escape code + + symb_cump=exc_max_cump; + symb_prob=exc_defined_bytes; + + // Code the escape code + range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob); + + // Then, update "excluded" table + + node=o4_context->prob; + + do{ + excluded[node->byte]=1; + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + return 0; + + + // That code is executed when the byte is found in the linked list + + ppmc_o4_byte_found: + + + // Everything has been tested, now we can feel free to code the byte, + // the symb_cump is already computed, now get its probability and code + // the byte, also save pointer to this element in the linked lists for + // updating. + + coded_in_order=4; //successfully coded under order-4 + + o4_ll_node=node; //save it for updating + + symb_prob=node->freq; //get the probability of the byte + + // Code it. + + range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob); + + return 1; //byte coded under order-4 +} + + +// This functions update order-4 probabilities with current byte, also takes +// care about updating variables, and renormalizing. +// +// "o4_ll_node" must point to the element to update or the last one, +// based on the "coded_in_order" and checking the pointer of the element we +// can decide what to do. Also "o4_context" must be initialized. +// +// This updating is only for encoding. +void ppmc_update_order4(void) +{ + + // First thing first, check if the hash entry is initialized + + if(order4_hasht[o4_cntxt]==0) //no pointer to linked list defined yet + { + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + order4_hasht[o4_cntxt]=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o4_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool."); + exit(1); + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + return; //nothing else to do + } + + + + + // The byte was coded under order three, otherwise it was coded under a + // lower order (never higher ones, remember that we are using update + // exclusion) in this case we have to create a new node in the list. + + if(coded_in_order==4) //coded under order-4 + { + + // Update its count and variables of this context and check for renormalization + + o4_ll_node->freq++; //increment its frequency (rather probability) + + o4_context->max_cump++; //total cump + + if(o4_ll_node->freq==255) //do we have to renormalize? + ppmc_renormalize_order4(); //renormalize + + } + else + { + + // Now we have two cases, under a given context (which we actually found) + // we coded an escape coded, in that case just create a new node in the + // linked list of bytes and probabilities. Otherwise we didn't find the + // same node so we have to create it in the linked list for context. + // And we can be sure that it at least has one element and that + // "o4_context" points to the last element, so we can put the new element. + + if(o4_context->order4321==full_o4_cntxt) //chech if that's the last + { //element or the a context found + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate mem + + o4_ll_node->next=_bytes_pool; //put it in the next free entry + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + o4_context->max_cump++; //total cump + o4_context->defined_bytes++; //total cump + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + } + else + { + + // We have to create a new context node + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + o4_context->next=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o4_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool."); + exit(1); + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + } + + } + +} + + +// This functions renormalizes the probabilities at order-4 updating context +// variables. +void ppmc_renormalize_order4(void) +{ + unsigned long counter; + struct _byte_and_freq *node; + + + // Initialize variables. Defined bytes remain the same. + o4_context->max_cump=0; //clear them + + node=o4_context->prob; //get pointer to lined lists + + // Divide all the probabilities by 2 and update context variables + while(1) + { + node->freq>>=1; //divide by a factor of 2 + + if(node->freq==0) //don't allow a probability to be 0 + node->freq=1; + + o4_context->max_cump+=node->freq; //sum to the total cump + + if(node->next==0) //last element + break; + node=node->next; + } + + +} + + +// Decodes the symbol correspoding to the current order, it returns the byte +// or in case of an escape code or empty table it returns -1. +// It updates "coded_in_order". +// +// If we decode an escape code, we don't modify "o4_ll_node" and thus its +// work of the updating routine to read till the end of the linked list +// (for adding a new element) however if we decode a byte, we save on +// "o4_ll_node" the pointer to its node. (so updating is faster) +void ppmc_decode_order4(void) +{ + unsigned long current_cump, counter; + struct _byte_and_freq *node; + + + // Get current context (if present) and total cump. + + if(ppmc_get_totf_order4()==0) + { + byte=-1; + return; + } + + + // It's possible that due to exclusion, there's no byte left, in that case + // return. + if(exc_defined_bytes==0) + { + byte=-1; + return; + } + + + // Decode current cump + + symb_cump=range_decoder_decode(&rc_decoder,exc_total_cump); //decode it + + // Now check if it's an escape code + if(symb_cump>=exc_max_cump) //the defined code space for the escape code + { + + // Update coding values + + ppmc_get_escape_prob_order4(); + range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob); + + // Mark as escape code + + byte=-1; + + // Then, update "excluded" table + + node=o4_context->prob; + + do{ + excluded[node->byte]=1; + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + return; //an escape code + } + else + { + // Now we have to check what symbol it is + + current_cump=0; //cump of the current symbol + + node=o4_context->prob; //get pointer to linked lists + + while(1) + { + if(excluded[node->byte]==0) + current_cump+=node->freq; //update cump + if(symb_cumpnext; //next element + //we have no need to check for the last symbol, we'll never read further + //the end of the linked lists, before we'll found the last byte. + } + + + //read byte value and probability + + symb_prob=node->freq; //get the probability for updating the state + byte=node->byte; //get byte + o4_ll_node=node; //used for updating + + + // Get the cump of byte + + symb_cump=current_cump-symb_prob; + + // Update coding state + + range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob); + + // Update coded_in_order used for update exclusion + + coded_in_order=4; + + return; + } + +} + + +// This is the routine for updating while decoding. The only difference with +// the routine for coding is that when an escape code was coded, "o4_ll_node" +// is not initialized so we have to read till the end of the linked list. +// Fortunately "o4_context" will be initialized so we don't need to read its +// linked list. +void ppmc_update_dec_order4(void) +{ + struct _byte_and_freq *node; + + // First thing first, check if the hash entry is initialized + + if(order4_hasht[o4_cntxt]==0) //no pointer to linked list defined yet + { + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + order4_hasht[o4_cntxt]=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o4_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool."); + exit(1); + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + return; //nothing else to do + } + + + // The byte was coded under order four, otherwise it was coded under a + // lower order (never higher ones, remember that we are using update + // exclusion) in this case we have to create a new node in the list. + + if(coded_in_order==4) //coded under order-4 + { + + // Update its count and variables of this context and check for renormalization + + o4_ll_node->freq++; //increment its frequency (rather probability) + + o4_context->max_cump++; //total cump + + if(o4_ll_node->freq==255) //do we have to renormalize? + ppmc_renormalize_order4(); //renormalize + + } + else + { + + // Now we have two cases, under a given context (which we actually found) + // we coded an escape coded, in that case just create a new node in the + // linked list of bytes and probabilities. Otherwise we didn't find the + // same node so we have to create it in the linked list for context. + // And we can be sure that it at least has one element and that + // "o4_context" points to the last element, so we can put the new element. + + if(o4_context->order4321==full_o4_cntxt) //chech if that's the last + { //element or the a context found + + // Read till the end of the linked list + + node=o4_context->prob; //get pointer to linked list + + while(1) + { + if(node->next==0) //check for the end of the linked list + break; + node=node->next; //next node + } + + // Now add element + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate mem + + node->next=_bytes_pool; //put it in the next free entry + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + o4_context->max_cump++; //total cump + o4_context->defined_bytes++; //total cump + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + } + else + { + + // We have to create a new context node + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + o4_context->next=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o4_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool."); + exit(1); + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + } + + } + +} + + + +// This function returns the probability for the escape codes in the variables +void ppmc_get_escape_prob_order4(void) +{ + // To understand that remember that the escape allocated for the escape code + // is above the current maximum cump and that it has a probability determined + // by the scheme C. + + symb_prob=exc_defined_bytes; + symb_cump=exc_max_cump; +} + diff --git a/src/ppmc/exclusion/ppmc.h b/src/ppmc/exclusion/ppmc.h new file mode 100644 index 0000000..392da20 --- /dev/null +++ b/src/ppmc/exclusion/ppmc.h @@ -0,0 +1,134 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "ppmc.h" (exclusions) + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + Part of the ppmc encoder and decoder. + + This module contains the definitions of different functions and all the + data structures defined by ppmc. Also contains defines. +*/ + +// Definitions + +#define ppmc_order4_hash_size 65536 +#define ppmc_order4_hash_key(k,j,i,l) ( (k)+(j<<8)+(i<<1)+(l<<9) )& ppmc_order4_hash_size-1 +#define ppmc_order3_hash_size 65536 +#define ppmc_order3_hash_key(k,j,i) ((k)+(j<<7)+(i<<11)) & ppmc_order3_hash_size-1 +#define ppmc_order2_hash_key(k,j) ((k)+(j<<8)) +#define _bytes_pool_elements 125000 //this is used the first time + //that we allocate memory, that's + //the number of entries +#define _bytes_pool_elements_inc 125000 //if we need to alloc again, this + //is the number of entries to get +#define _context_pool_elements 50000 +#define _context_pool_elements_inc 50000 + +#define _mempool_max_index 1000 //the number of entries in the array with + //pointers + + +// Data structures + +// This structure contains a single element of a linked lists which contains +// the probability distribution of a given order. This structure takes 6 bytes. +struct _byte_and_freq{ +unsigned char byte; //the byte itself +unsigned char freq; //and the frequency of it +struct _byte_and_freq *next; //pointer to next element in linked list or 0 +}; + + +// This structure is used for both order-3 and order-4. It takes 20 bytes, +// and it can still hold another byte more. (only 19 being used) +// Order 2-1-0-(-1) use different structures for a faster accessing. +struct context{ +struct context *next; //next context in the hash entry +unsigned long order4321; //order-4-3-2-1 (or order-3-2-1 for order-3) +struct _byte_and_freq *prob; //pointer to linked lists containing probability distribution +unsigned int max_cump; //maximum cumulative probability (can't exceed (2^16)-1 ) +unsigned int defined_bytes; //the number of bytes in this context +}; + +// That's the same but for order-2 where there's no hash collisions. +struct context_o2{ +struct _byte_and_freq *prob; //pointer to linked lists containing probability distribution +unsigned int max_cump; //maximum cumulative probability (can't exceed (2^16)-1 ) +unsigned int defined_bytes; //the number of bytes in this context +}; + + +// Declaration of functions + + +// Functions for initializing +void ppmc_alloc_memory(void); +void ppmc_initialize_contexts(void); +void ppmc_encoder_initialize(void); +void ppmc_decoder_initialize(void); +void ppmc_free_memory(void); +void ppmc_flush_mem_enc(void); +void ppmc_flush_mem_dec(void); + +// Functions for order-(-1) +void ppmc_get_prob_ordern1(void); +unsigned long ppmc_get_symbol_ordern1(void); +void ppmc_get_totf_ordern1(void); +void ppmc_renormalize_order1(void); + +// Functions for order-0 +void ppmc_get_totf_order0(void); +char ppmc_code_byte_order0(void); +void ppmc_update_order0(void); +void ppmc_renormalize_order0(void); +void ppmc_decode_order0(void); +void ppmc_get_escape_prob_order0(void); +void ppmc_get_prob_order0(void); + +// Functions for order-1 +void ppmc_get_totf_order1(void); +char ppmc_code_byte_order1(void); +void ppmc_update_order1(void); +void ppmc_renormalize_order1(void); +void ppmc_decode_order1(void); +void ppmc_get_escape_prob_order1(void); +void ppmc_get_prob_order1(void); + + +// Functions for order-2 +void ppmc_get_totf_order2(void); +char ppmc_code_byte_order2(void); +void ppmc_update_order2(void); +void ppmc_renormalize_order2(void); +void ppmc_decode_order2(void); +void ppmc_update_dec_order2(void); +void ppmc_get_escape_prob_order2(void); +void ppmc_get_prob_order2(void); + + +// Functions for order-3 +char ppmc_get_totf_order3(void); +char ppmc_code_byte_order3(void); +void ppmc_update_order3(void); +void ppmc_renormalize_order3(void); +void ppmc_decode_order3(void); +void ppmc_update_dec_order3(void); +void ppmc_get_escape_prob_order3(void); +void ppmc_get_prob_order3(void); + + +// Functions for order-4 +char ppmc_get_totf_order4(void); +char ppmc_code_byte_order4(void); +void ppmc_update_order4(void); +void ppmc_renormalize_order4(void); +void ppmc_decode_order4(void); +void ppmc_update_dec_order4(void); +void ppmc_get_escape_prob_order4(void); +void ppmc_get_prob_order4(void); + + + diff --git a/src/ppmc/exclusion/ppmcdata.c b/src/ppmc/exclusion/ppmcdata.c new file mode 100644 index 0000000..6bec0cf --- /dev/null +++ b/src/ppmc/exclusion/ppmcdata.c @@ -0,0 +1,132 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "ppmcdata.c" (exclusions) + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + Part of the ppmc encoder and decoder. + + This module contains global data. +*/ + +#include "ppmc.h" //defines +#include "range.h" + +// Order-4 uses a hash table which points to the start of a linked list with +// the different context, which has the cump, the number of different symbols +// and a pointer to the linked list with the bytes and frequencies. +// Order-3 is almost the same, both take 262144 bytes. +struct context *order4_hasht[ppmc_order4_hash_size]; + +struct context *order3_hasht[ppmc_order3_hash_size]; + + +// The array for order-2 is different, as we do directly hashing, and thus +// we have no need to do the stuff of linked lists for the context itself, +// so it contains the context used. This takes 1310720 bytes. +struct context_o2 order2_array[65536]; + + +// Those are the multiple arrays for order-1. It takes 65536 bytes. +unsigned char order1_array[256][256]; +unsigned int order1_defined_bytes_array[256]; //the defined bytes in every context +unsigned int order1_max_cump_array[256]; //max cump of every context + + +// This is the array for order-0. It takes 256 bytes. +unsigned char order0_array[256]; +unsigned int order0_defined_bytes; +unsigned int order0_max_cump; + + +// No need of variables for order-(-1), because it's fixed. + + + +// Those are the pointers and variables used for managing the mem pool for +// both context, and bytes and frequencies. +struct _byte_and_freq *_bytes_pool, //pointer to pool containing linked + //lists with bytes and frequencies + *_bytes_pool_max; //the maximum of this buffer +struct context *_context_pool; //pointer to pool containing contexts +struct context *_context_pool_max; //the same as with _bytes_pool + +unsigned long _bytes_pool_index; //index in array of pointers +unsigned long _context_pool_index; + +//the following is an array keeping pointers to different buffers. A new +//buffer is allocated when the current one is full, so we always have a +//buffer for linked lists. (without allocating a buffer for every element) +struct _byte_and_freq *_bytes_pool_array[_mempool_max_index]; +struct context *_context_pool_array[_mempool_max_index]; + +char ppmc_out_of_memory; //0 if we have enough memory, 1 instead, any + //routine that needs to allocate memory must + //quit if that's 1. + + +// Variables which contain current byte to code and order +unsigned long byte, //current byte to code + o1_byte, //order-1 byte + o2_byte, //order-2 byte + o3_byte, //order-3 byte + o4_byte; //order-4 byte + +unsigned long o2_cntxt; //used in the hash key of order-2 +unsigned long o3_cntxt; //use as hash key for order-3 +unsigned long o4_cntxt; //use as hash key for order-4 +unsigned long full_o3_cntxt; //o1_byte, o2_byte and o3_byte together +unsigned long full_o4_cntxt; //order-4-3-2-1 + +unsigned long coded_in_order; //in which order the last byte was coded + //it's for update exclusion + //also used for decoding + +// Variables used for coding + +unsigned long + total_cump, //the total cumulative probability + symb_cump, //the symbol cumulative probability + symb_prob; //the symbol frequency + +rangecoder rc_coder; //state of range coder +rangecoder rc_decoder; //state of range decoder + +// File handles + + FILE *file_input, //file to code + *file_output; //file where the coded data is placed + + +// Pointers to linked lists and context structures used for faster updating +// or creation of new nodes, because instead of reading again all the linked +// list till the end (in the case of creation) we have a pointer to the last +// element. In the case that a byte was present in the linked lists but it +// had a 0 count, we just have to update its probability. And in the case +// that it already was present and we coded it under that context or a lower +// one, we just have to update its probability. + +struct _byte_and_freq *o2_ll_node; //pointer to linked lists under order-2 + //where does it points depends in which + //order the byte was coded. +struct _byte_and_freq *o3_ll_node; //the same but for order-3 +struct _byte_and_freq *o4_ll_node; + +struct context *o3_context; //pointer to current order-3 context +struct context *o4_context; //pointer to current order-3 context + + +// Those are the variables especially used for exclusion. + +// the array for doing exclusion. Every routine from every order can +// use them. +unsigned char excluded[256]; //if 0 it was not present, if it's 1 then + //it appeared in a higher order. + +unsigned long exc_total_cump, //total cump of context (after exclusion) + exc_defined_bytes,//defined bytes in context (after exclusion) + exc_max_cump; + diff --git a/src/ppmc/exclusion/ppmcdata.h b/src/ppmc/exclusion/ppmcdata.h new file mode 100644 index 0000000..3d7a2a9 --- /dev/null +++ b/src/ppmc/exclusion/ppmcdata.h @@ -0,0 +1,128 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "ppmcdata.h" (exclusions) + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + Part of the ppmc encoder and decoder. + + This module contains externs definition for global data. +*/ + +#include "ppmc.h" + +// Order-4 uses a hash table which points to the start of a linked list with +// the different context, which has the cump, the number of different symbols +// and a pointer to the linked list with the bytes and frequencies. +// Order-3 is almost the same, both take 262144 bytes. +extern struct context *order4_hasht[]; + +extern struct context *order3_hasht[]; + + +// The array for order-2 is different, as we do directly hashing, and thus +// we have no need to do the stuff of linked lists for the context itself, +// so it contains the context used. This takes 1310720 bytes. +extern struct context_o2 order2_array[]; + + +// Those are the multiple arrays for order-1. It takes 65536 bytes. +extern unsigned char order1_array[256][256]; +extern unsigned int order1_defined_bytes_array[]; //the defined bytes in every context +extern unsigned int order1_max_cump_array[]; //max cump of every context + + +// This is the array for order-0. It takes 256 bytes. +extern unsigned char order0_array[]; +extern unsigned int order0_defined_bytes; +extern unsigned int order0_max_cump; + + +// Those are the pointers and variables used for managing the mem pool for +// both context, and bytes and frequencies. +extern struct _byte_and_freq *_bytes_pool, //pointer to pool containing linked + //lists with bytes and frequencies + *_bytes_pool_max; //the maximum of this buffer +extern struct context *_context_pool; //pointer to pool containing contexts +extern struct context *_context_pool_max; //the same as with _bytes_pool + +extern unsigned long _bytes_pool_index; //index in array of pointers +extern unsigned long _context_pool_index; + +//the following is an array keeping pointers to different buffers. A new +//buffer is allocated when the current one is full, so we always have a +//buffer for linked lists. (without allocating a buffer for every element) +extern struct _byte_and_freq *_bytes_pool_array[_mempool_max_index]; +extern struct context *_context_pool_array[_mempool_max_index]; + +extern char ppmc_out_of_memory; //0 if we have enough memory, 1 instead, any + //routine that needs to allocate memory must + //quit if that's 1. + + + +// Variables which contain current byte to code and order +extern unsigned long //they are only bytes + byte, //current byte to code + o1_byte, //order-1 byte + o2_byte, //order-2 byte + o3_byte, //order-3 byte + o4_byte; //order-4 byte + +extern unsigned long o2_cntxt; //used in the hash key of order-2 +extern unsigned long o3_cntxt; //use as hash key for order-3 +unsigned long o4_cntxt; //use as hash key for order-4 +extern unsigned long full_o3_cntxt; //o1_byte, o2_byte and o3_byte together +extern unsigned long full_o4_cntxt; //order-4-3-2-1 + +extern unsigned long coded_in_order; //in which order the last byte was coded + //it's for update exclusion + //also used for decoding +// Variables used for coding + +extern unsigned long //no need for negative values + total_cump, //the total cumulative probability + symb_cump, //the symbol cumulative probability + symb_prob; //the symbol frequency + +extern rangecoder rc_coder; //state of range coder +extern rangecoder rc_decoder; //state of range decoder + +// File handles + + FILE *file_input, //file to code + *file_output; //file where the coded data is placed + + + +// Pointers to linked lists and context structures used for faster updating +// or creation of new nodes, because instead of reading again all the linked +// list till the end (in the case of creation) we have a pointer to the last +// element. In the case that a byte was present in the linked lists but it +// had a 0 count, we just have to update its probability. And in the case +// that it already was present and we coded it under that context or a lower +// one, we just have to update its probability. + + +extern struct _byte_and_freq *o2_ll_node;//pointer to linked lists under order-2 + //where does it points depends in which + //order the byte was coded. +extern struct _byte_and_freq *o3_ll_node; //the same but for order-3 +extern struct _byte_and_freq *o4_ll_node; + +extern struct context *o3_context; //pointer to current order-3 context +extern struct context *o4_context; //pointer to current order-3 context + + +// the array for doing exclusion. Every routine from every order can +// use them. +extern unsigned char excluded[256]; //if 0 it was not present, if it's 1 then + //it appeared in a higher order. + +extern unsigned long exc_total_cump, //total cump of context (after exclusion) + exc_defined_bytes,//defined bytes in context (after exclusion) + exc_max_cump; + diff --git a/src/ppmc/exclusion/ppmcmain.c b/src/ppmc/exclusion/ppmcmain.c new file mode 100644 index 0000000..5169de3 --- /dev/null +++ b/src/ppmc/exclusion/ppmcmain.c @@ -0,0 +1,184 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "ppmcmain.c" (exclusions) + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + Part of the ppmc encoder only. Using exclusion. + + This module is the main module and calls the different modules to do + the encoding of a file. When done prints bpb and kbyps. +*/ + + +#include +#include +#include "range.h" //the range coder functions and data +#include "ppmcdata.h" + + +long filesize(FILE *stream); + +unsigned long debug=0; + + +//Main +void main (char argc, char *argv[]) +{ + unsigned long counter, //temporal counter for loops like for or while + counter2, //another temporal counter for sub loops + size_file_input; //the size of the input file + + + // Print title + printf("PPMC using range coder. (with exclusion)\n"); + printf("Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.\n"); + printf("Permission is granted to make verbatim copies of this program for private\n"); + printf("use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.\n"); + + + + // Check for correct number of parameters + if(argc!=3) + { + printf("Bad number of arguments.\n"); + exit(1); + } + + + // Try to open input and output files + if((file_input=fopen(argv[1],"r+b"))==NULL) + { + printf("Couldn't open %s.\n",argv[1]); + exit(1); + } + + if((file_output=fopen(argv[2],"w+b"))==NULL) + { + printf("Couldn't create %s.\n",argv[2]); + exit(1); + } + + + // Check input file length and not accept 0 length files + size_file_input=filesize(file_input); + + if(size_file_input<5) + { + printf("Can't work with files below than 5 bytes!"); + exit(1); + } + + + // First output file length + fwrite(&size_file_input,1,4,file_output); //input length + + + // Initialize ppmc encoder + ppmc_alloc_memory(); //get memory + ppmc_initialize_contexts(); //initialize model + ppmc_encoder_initialize(); + + // Initialize range coder + range_coder_init(&rc_coder,file_output); + + + // Start main loop which codes the file + while((byte=fgetc(file_input))!=EOF) + { + + // Clear exclusion table + for(counter=0;counter!=256;++counter) + excluded[counter]=0; + + + // Try to code current byte under order-4 if possible then go to lower orders + if(ppmc_code_byte_order4()==0) + if(ppmc_code_byte_order3()==0) + if(ppmc_code_byte_order2()==0) + if(ppmc_code_byte_order1()==0) + if(ppmc_code_byte_order0()==0) //else try to code under order-0 + { + // Code under order-(-1) + ppmc_get_prob_ordern1(); + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + coded_in_order=0; //update all the tables (unless order-(-1)) + } + + + // Now do update exclusion + + switch(coded_in_order) + { + case 0: ppmc_update_order0(); //update only order-0 + case 1: ppmc_update_order1(); //update order-0 and order-1 + case 2: ppmc_update_order2(); //update order-2 1 and 0... + case 3: ppmc_update_order3(); + case 4: ppmc_update_order4(); + default: break; + }; + + + // Update order variables + + o4_byte=o3_byte; + o3_byte=o2_byte; + o2_byte=o1_byte; + o1_byte=byte; //current one is next time order-1 + + debug++; + + // Check if we run out of memory, in that case, flush the encoder + + if(ppmc_out_of_memory==1) + { + printf("Flushing memory! Output file might be not decodable.\n"); + ppmc_flush_mem_enc(); + } + + + } + + + // Flush range coder + range_coder_flush(&rc_coder); + + // Free memory + ppmc_free_memory(); + + + // Print bpb and kbyps + printf("%s at %f bpb.\n",argv[1],((float)filesize(file_output)/(float)size_file_input)*(float)8); + + + // Close file handles + fclose(file_input); + fclose(file_output); + + + + // Nicely exit + exit(0); +} + + +// Routines not used by ppmc but rather by main. +// Not including the range coder. + + +// Returns the file size of a given file. +long filesize(FILE *stream) +{ + long curpos, length; + + curpos = ftell(stream); + fseek(stream, 0L, SEEK_END); + length = ftell(stream); + fseek(stream, curpos, SEEK_SET); + return length; +} + + diff --git a/src/ppmc/exclusion/range.c b/src/ppmc/exclusion/range.c new file mode 100644 index 0000000..7be4ce6 --- /dev/null +++ b/src/ppmc/exclusion/range.c @@ -0,0 +1,221 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "range.c" (exclusion) + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + This module contains the routines of both the range coder and decoder. + + The range coder works internally in 32 bits, and uses bytes as symbols. + Also the end of message symbol is used. So z=257. + + Both input and output use rc_file as the file stream. Of course we can't + code and decode at the same time. All the input or output comes from the + same file, no matter what range coder structure are we using. The modules + here provided don't manage the io except for reading and writing, they + don't open nor close the files. The reading and writing is done via + putc and getc. +*/ + +#include "range.h" + +/* + Inits the range coder state. Must be called before encoding any symbol. + It uses a magic number 0xB3 as the first byte outputted. + -rangecoder *rc, the range coder to be used. + + Shoulde be called like that: + range_coder_init(&o0_rc_state,file_output); +*/ +void range_coder_init(rangecoder *rc, FILE *stream) +{ + rc_file=stream; + rc->low=0; //define state + rc->range=0x80000000; + rc->byte_buffer=0xB3; //magic number + rc->help=0; //temp value +} + + +/* + Encodes a symbol. + -rangecoder *rc, the range coder to be used. + -unsigned long tot_f, the maximum cumulative frequency + -unsigned long lt_f, the cumulative probabilty of the symbol + -unsigned long sy_f, the probability of the symbol +*/ +void range_coder_encode(rangecoder *rc,unsigned long tot_f, unsigned long lt_f,unsigned long sy_f) +{ + unsigned long temp, r; + + if(lt_f>tot_f) + { + printf("BUG!!"); + exit(1); + } + +// printf("\nc> %d,%d,%d ",tot_f,lt_f,sy_f); + + range_coder_renormalize(rc); //&rc? + + r=rc->range/tot_f; + temp=r*lt_f; + if(lt_f+sy_frange=r*sy_f; + else + rc->range-=temp; + rc->low+=temp; +} + +/* + Renormalizes the state when coding. + -rangecoder *rc, the range coder to be used. +*/ + +void range_coder_renormalize(rangecoder *rc) +{ + while(rc->range<=(unsigned long)0x00800000) + { + if(rc->low<(unsigned long)0x7F800000) + { + putc(rc->byte_buffer,rc_file); + for(;rc->help;rc->help--) + putc(0xFF,rc_file); + rc->byte_buffer=(unsigned char)(rc->low>>23); + } + else + { + if(rc->low&(unsigned long)0x80000000) + { + putc(rc->byte_buffer+1,rc_file); + for(;rc->help;rc->help--) + putc(0x00,rc_file); + rc->byte_buffer=(unsigned char)(rc->low>>23); + } + else + rc->help++; + } + rc->range<<=8; + rc->low=(rc->low<<8)&(unsigned long)(0x7FFFFFFF); + } +} + + +/* + Flushes the encoder. Must be called when the coding is done. + -rangecoder *rc, the range coder to be used. + + Shoulde be called like that: + range_coder_flush(&o0_rc_state); +*/ +void range_coder_flush(rangecoder *rc) +{ + unsigned long tmp; + + range_coder_renormalize(rc); + tmp = rc->low >> 23; + if (tmp > 0xff) + { + putc(rc->byte_buffer+1,rc_file); + for(; rc->help; rc->help--) + putc(0,rc_file); + } + else + { + putc(rc->byte_buffer,rc_file); + for(; rc->help; rc->help--) + putc(0xff,rc_file); + } + + putc(tmp & 0xff,rc_file); + putc((tmp = rc->low >> (23-8)) & 0xff,rc_file); +} + + +/* + Inits the range decoder state. Also checks for the magic number, and + quits in case it isn't the first, so be careful. + -rangecoder *rc, the range coder to be used. +*/ +void range_decoder_init(rangecoder *rc, FILE *stream) +{ + unsigned int _rd_c; + + rc_file=stream; + if((_rd_c=getc(rc_file))!=0xB3) + { + printf("\nThis is not range coded data. Magic number not found. Exiting."); + exit(1); + } + rc->byte_buffer=getc(rc_file); + rc->low=rc->byte_buffer>>1; + rc->range=0x80; +} + + +/* + Decode a symbol, get its cumulative probability. + Input: + -rangecoder *rc, the range coder to be used. + -unsigned long tot_f, the maximum cumulative probability + Output: + -unsigned long, cumulative probability of the current symbol + Should be called like that: + current_cump=range_decoder_decode(&o0_rc_state,o0_tot_f); +*/ +unsigned long range_decoder_decode(rangecoder *rc, unsigned long tot_f) +{ + unsigned long temp; + + range_decoder_renormalize(rc); + rc->help=rc->range/tot_f; + temp=rc->low/rc->help; + if(temp>=tot_f) + return tot_f-1; + else + return temp; +} + + +/* + Updates the state so next symbol can be decoded. + Input: + -rangecoder *rc, the range coder to be used. + -unsigned long tot_f, the maximum cumulative probability + -unsigned long lt_f, the cumulative probabilty of the symbol + -unsigned long sy_f, the probability of the symbol + +*/ +void range_decoder_update(rangecoder *rc, unsigned long tot_f, unsigned long lt_f,unsigned long sy_f) +{ + unsigned long temp; + +// printf("\nd> %d,%d,%d ",tot_f,lt_f,sy_f); + + temp=rc->help*lt_f; + rc->low-=temp; + if(lt_f+sy_frange=rc->help*sy_f; + else + rc->range-=temp; +} + + +/* + Renormalizes the state while decoding. + -rangecoder *rc, the range coder to be used. +*/ +void range_decoder_renormalize(rangecoder *rc) +{ + while(rc->range<=0x00800000) + { + rc->low=(rc->low<<8)|((rc->byte_buffer<<7)&0xFF); + rc->byte_buffer=getc(rc_file); + rc->low |= rc->byte_buffer >> (1); + rc->range<<=8; + } +} + diff --git a/src/ppmc/exclusion/range.h b/src/ppmc/exclusion/range.h new file mode 100644 index 0000000..23721e4 --- /dev/null +++ b/src/ppmc/exclusion/range.h @@ -0,0 +1,39 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "range.h" (exclusions) + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + Declarations for the coder. +*/ + +#include + +typedef struct{ + unsigned long low, range, help; + unsigned char byte_buffer; +}rangecoder; + +FILE *rc_file; + +void range_coder_init(rangecoder *rc, FILE *stream); //coding routines +void range_coder_encode(rangecoder *rc,unsigned long tot_f, unsigned long lt_f,unsigned long sy_f); +void range_coder_renormalize(rangecoder *rc); +void range_coder_flush(rangecoder *rc); +void range_decoder_init(rangecoder *rc, FILE *stream);//decoding routines +unsigned long range_decoder_decode(rangecoder *rc, unsigned long tot_f); +void range_decoder_update(rangecoder *rc, unsigned long tot_f, unsigned long lt_f,unsigned long sy_f); +void range_decoder_renormalize(rangecoder *rc); + + +typedef unsigned long code_value; +#define CODE_BITS 32 +#define Top_value ((code_value)1 << (CODE_BITS-1)) +#define SHIFT_BITS (CODE_BITS - 9) +#define EXTRA_BITS ((CODE_BITS-2) % 8 + 1) +#define Bottom_value (Top_value >> 8) +#define outbyte(cod,x) putc(x,stdout) +#define inbyte(cod) getc(stdin) diff --git a/src/ppmc/exclusion/unppmc.c b/src/ppmc/exclusion/unppmc.c new file mode 100644 index 0000000..bfb3aaf --- /dev/null +++ b/src/ppmc/exclusion/unppmc.c @@ -0,0 +1,209 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "unppmc.c" (exclusion) + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + Part of the ppmc decoder. + + This module is the main module and calls the different modules to do + the decoding of a file. When done prints kbyps. +*/ + + +// Bibliotecas necesarias +#include +#include +#include "range.h" //the range coder functions and data +#include "ppmcdata.h" + + +// Declaracion de funciones del ppmcmain.c +long filesize(FILE *stream); + + + + +//Main +void main (int argc, char *argv[]) +{ + unsigned long counter, //temporal counter for loops like for or while + counter2, //another temporal counter for sub loops + size_file_output, //the size of the output file + main_counter; //used in main + char expected_flush=0; //used for checking flushing which can't be done + + + // Print title and version. + printf("UNPPMC using range coder.\n"); + printf("Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.\n"); + printf("Permission is granted to make verbatim copies of this program for private\n"); + printf("use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.\n"); + + + // Check for correct number of parameters + if(argc!=3) + { + printf("Bad number of arguments.\n"); + exit(1); + } + + + // Try to open input and output files + if((file_input=fopen(argv[1],"r+b"))==NULL) + { + printf("Couldn't open %s.\n",argv[1]); + exit(1); + } + + if((file_output=fopen(argv[2],"w+b"))==NULL) + { + printf("Couldn't create %s.\n",argv[2]); + exit(1); + } + + + // Get output length + fread(&size_file_output,1,4,file_input); + + + // Initialize ppmc decoder + ppmc_alloc_memory(); + ppmc_initialize_contexts(); + ppmc_decoder_initialize(); + + + + // Initialize decoder + range_decoder_init(&rc_decoder,file_input); + + + // Start main loop which decodes the file + main_counter=size_file_output-4; //take in account the bytes already written + expected_flush=0; //we don't expect a flush yet + + while(main_counter!=0) + { + + // Clear exclusion table + for(counter=0;counter!=256;++counter) + excluded[counter]=0; + +// Try to decode current byte in order-4 if possible, else in lower ones +ppmc_decode_order4(); +if(byte==-1) + ppmc_decode_order3(); + if(byte==-1) + { + ppmc_decode_order2(); + if(byte==-1) + { + ppmc_decode_order1(); + if(byte==-1) + { + ppmc_decode_order0(); + if(byte==-1) //check if it was an escape code + { + // Decode in order-(-1) + ppmc_get_totf_ordern1(); + symb_cump=range_decoder_decode(&rc_decoder,total_cump); + byte=ppmc_get_symbol_ordern1(); + ppmc_get_prob_ordern1(); + range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob); + coded_in_order=0; //update all orders + + // Now see if it's the code of flushing + + if(symb_cump==256) + { + printf("Flushing.\n"); + ppmc_flush_mem_dec(); + expected_flush=0; + continue; //do not output byte nor update + } + + } + } + } + } + + // Output byte and update model + + fputc(byte,file_output); + + switch(coded_in_order) //update exclusion + { + case 0: ppmc_update_order0(); //update only order-0 + case 1: ppmc_update_order1(); //update order-0 and order-1 + case 2: ppmc_update_dec_order2(); //update order-0 1 and 2 + case 3: ppmc_update_dec_order3(); + case 4: ppmc_update_dec_order4(); + default: break; + }; + + + // Check if flushing has to be done and has not been done. + // This is optional, in case you limit the memory usage, you don't + // need to include this + + if(expected_flush==1) // If flushing didn't happen, we can't decode + { + printf("Can't decompress file. Not enough memory.\nTry in a machine with more memory.\n"); + exit(1); + } + if(ppmc_out_of_memory==1) + { + expected_flush=1; // Next code must be a flush code, otherwise we don't + // have enough memory, and therefore we can't decode + } + + + // Update order variables + + o4_byte=o3_byte; + o3_byte=o2_byte; + o2_byte=o1_byte; + o1_byte=byte; //current one, is next time order-1 + + // Byte decoded and model updated, loop + main_counter--; + +//printf("\n%d",size_file_output-main_counter); + + } + + + ppmc_free_memory(); + + // Close file handles and free memory + fclose(file_input); + fclose(file_output); + + + + + // Nicely exit + exit(0); +} + + +// Ruotines not used by ppmc but rather by main. +// Not including the range coder. + + +// Returns the file size of a given file. +long filesize(FILE *stream) +{ + long curpos, length; + + curpos = ftell(stream); + fseek(stream, 0L, SEEK_END); + length = ftell(stream); + fseek(stream, curpos, SEEK_SET); + return length; +} + + diff --git a/src/ppmc/ppmc.c b/src/ppmc/ppmc.c new file mode 100644 index 0000000..cd7f63c --- /dev/null +++ b/src/ppmc/ppmc.c @@ -0,0 +1,3084 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "ppmc.c" + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + Part of the ppmc encoder and decoder. + + This module contains the whole ppmc encoder. It uses hash tables for + managing most of the orders. And a maximum order of 4. It codes bytes. + Order-1-0-(-1) are all handled in tables. Order-2 has a table with + direct hashing with pointers to the linked lists. Order-4 and order-3 + both have hash tables with pointers to contexts in a linked lists which + finally have a pointer to the start of the linked list with the + probability distribution. Update exclusion is used, but exclusion is not. + + Please, note that if the machine where the decoder is run doesn't has as + much memory as the computer where the encoder was ran, the decoder will + not be able to properly decode the file, because it will not be able to + keep track of new statistics, in this case it will just exit. + + For applications where the loss of data is not admisible, I suggest you to + limit both encoder and decoder's memory requeriments to a given minimum + which both machines have. +*/ + + +#include +#include +#include "range.h" +#include "ppmcdata.h" + + + +// Ruotines used by ppmc. Not including the range coder. +// +// They are for initializing of both encoder and decoder, and unless there +// are two version, both encoder and decoder use the same routines. Like +// "ppmc_initialize_contexts". + + +// This one allocs the memory needed by ppmc, and adjust some pointers used +// for allocating elements in the linked lists. The mempool arrays must be +// initialized now. +void ppmc_alloc_memory(void) +{ + unsigned long counter; + + + // Init mempool buffers + + for(counter=0;counter!=_mempool_max_index;++counter) + { + _bytes_pool_array[counter]=0; + _context_pool_array[counter]=0; + } + + _bytes_pool_index=1; //first entry will be used now + _context_pool_index=1; + + + // Allocate memory for ppmc structures and adjust some variables + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + + //save pointers in the array for freeing + _bytes_pool_array[0]=_bytes_pool; + _context_pool_array[0]=_context_pool; + + + //adjust variables + _bytes_pool_max=_bytes_pool+_bytes_pool_elements; + _context_pool_max=_context_pool+_context_pool_elements; + + ppmc_out_of_memory=0; //we still have memory +} + + +// This routine initializes all the contexts, and all the tables including +// those who care about the number of bytes defined in a context. +void ppmc_initialize_contexts(void) +{ + unsigned long counter, counter2; + + + // Order-0 + for(counter=0;counter!=256;++counter) //clear table + order0_array[counter]=0; + + order0_defined_bytes=0; //adjust variables + order0_max_cump=0; + + + // Order-1 + for(counter=0;counter!=256;++counter) //erase every table of every context + for(counter2=0;counter2!=256;++counter2) + order1_array[counter][counter2]=0; + + for(counter=0;counter!=256;++counter) //adjust variables + { + order1_defined_bytes_array[counter]=0; + order1_max_cump_array[counter]=0; + } + + + // Order-2 + for(counter=0;counter!=65536;++counter) + { + //order2_array[counter].prob=0; //clear pointer to bytes and frequencies + //order2_array[counter].max_cump=0; + order2_array[counter].defined_bytes=0; + } + + + // Order-4-3 + for(counter=0;counter!=65536;++counter) //order-4-3 + { + order4_hasht[counter]=0; + order3_hasht[counter]=0; + } +} + + +// This routine initializes the encode model by outputting as many bytes as +// needed to prepare the models. This should be called before the main loop +// and after the memory has been allocated and tables initialized. +// +// It does not need uses the range coder. It output the first 1 bytes. +void ppmc_encoder_initialize(void) +{ + + // Initialize order-0 and prepare different bytes for orders + fputc((byte=fgetc(file_input)),file_output); + o4_byte=byte; //order-4 + + fputc((byte=fgetc(file_input)),file_output); + o3_byte=byte; //order-3 + + fputc((byte=fgetc(file_input)),file_output); + o2_byte=byte; //order-2 + ppmc_update_order0(); + + fputc((byte=fgetc(file_input)),file_output); + o1_byte=byte; + +} + + +// This routine initializes the decoder model, should be called to do the same +// changes as "ppmc_encoder_initialize()" did. +void ppmc_decoder_initialize(void) +{ + + // Initialize order-0 and context bytes + byte=fgetc(file_input); + o4_byte=byte; //order-4 + fputc(byte,file_output); + + byte=fgetc(file_input); + o3_byte=byte; //order-3 + fputc(byte,file_output); + + byte=fgetc(file_input); + o2_byte=byte; //order-2 + + fputc(byte,file_output); //output first byte + ppmc_update_order0(); + + byte=fgetc(file_input); + o1_byte=byte; //order-1 + fputc(byte,file_output); +} + + +// Once coding or decoding is finished you have to call this routine. +// It must be called when done. +void ppmc_free_memory(void) +{ + unsigned long counter; + + // Free the memory buffers + + for(counter=0;counter!=_mempool_max_index;++counter) + { + if(_bytes_pool_array[counter]!=0) + free(_bytes_pool_array[counter]); + + if(_context_pool_array[counter]!=0) + free(_context_pool_array[counter]); + } + +} + + +// This routine flushes the memory and restarts all the tables of +// probabilities, current order bytes are not modified, this function +// is called when we ran out of memory. We have to output the code +// number 256 which means memory flushing, for doing this we have to go +// to order-(-1) so we have to output an escape code in all the orders +// till we reach order-(-1) where we can code it. Then we free all the +// memory, alloc it again, and reinitialize all the orders. +// +// However we may find the case when the current order is not initialized, +// in this case we don't need to output an escape code. +void ppmc_flush_mem_enc(void) +{ + + // Code an escape code in order-4 + if(ppmc_get_totf_order4()!=0) //if 0 no need of escape code + { + + ppmc_get_escape_prob_order4(); //get prob and cump + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + } + + + // Code an escape code in order-3 + if(ppmc_get_totf_order3()!=0) //if 0 no need of escape code + { + + ppmc_get_escape_prob_order3(); //get prob and cump + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + } + + + // Code an escape code in order-2 + + o2_cntxt=ppmc_order2_hash_key(o1_byte,o2_byte); + + // First check if current order-2 context is empty + if(order2_array[o2_cntxt].defined_bytes!=0) //it's not empty + { + ppmc_get_totf_order2(); + ppmc_get_escape_prob_order2(); + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + } + + + // Code an escape code in order-1 + + // First check if current order-1 table is empty + if(order1_defined_bytes_array[o1_byte]!=0) //it's not empty + { + ppmc_get_totf_order1(); + ppmc_get_escape_prob_order1(); + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + } + + + // Code an escape code in order-0. Order-0 always has at least one symbol + + ppmc_get_totf_order0(); + ppmc_get_escape_prob_order0(); + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + + + // Now we can code the code 256 + + symb_prob=1; + symb_cump=256; + total_cump=257; + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + + // Now that decoder knows the flushing, free memory and reinit + + ppmc_free_memory(); + ppmc_alloc_memory(); + ppmc_initialize_contexts(); + + + // Be sure that order-0 has at least one probability + + order0_array[o1_byte]++; + order0_max_cump++; + order0_defined_bytes++; + +} + + +// When the decoder gets the symbol of flushing, most of the job is done +// because we already got all the escape codes, so we only have to reinit. +void ppmc_flush_mem_dec(void) +{ + + // Free memory and reinit + + ppmc_free_memory(); + ppmc_alloc_memory(); + ppmc_initialize_contexts(); + + + // Be sure that order-0 has at least one probability + + order0_array[o1_byte]++; + order0_max_cump++; + order0_defined_bytes++; + + +} + + + +// ORDER-(-1) functions, also called ordern1 (Negative1) in functions +// +// Because order-(-1) does not need to update its probability tables, it +// has no tables, and relies on the fact that the cump of byte is its own +// value, and the probability is fixed, 1, and the total cump is 257. +// +// The alphabet has the following distribution: 0-255 the bytes. 256 is +// an special symbol which means that we have flushed the encoder tables, +// and thus the encoder must flush its tables too. +// +// The rest of the tables only have 256 symbols, because we have no need +// of assign a symbol to the flush code (which already is the order-(-1) +// table) nor to the escape code. + + +// Gets the probability for a given symbol in the order-(-1) (ordern1) +void ppmc_get_prob_ordern1(void) +{ + symb_cump=byte; //its value + symb_prob=1; //flat probability + total_cump=257; //total cump +} + + +// Returns in the variable "total_cump" the current total cump of +// order-(-1) +void ppmc_get_totf_ordern1(void) +{ + total_cump=257; //this is fixed +} + + +// Returns the symbol for a given cump under order-(-1) +unsigned long ppmc_get_symbol_ordern1 (void) +{ + return symb_cump; +} + + + +// ORDER-0 functions +// +// Due to the fact that order-0 has no context, I use an array for all the +// probabilities under order-0, just as you could do in a basic model for +// arithmetic coding. +// +// The main array is: order0_array. Where order0_array[byte] contains the +// probability for a given byte. The same applies to order-1. +// +// To ensure that the updating and coding is done correctly, "byte" can't +// be changed till all the coding and updating is done. + + +// Returns in the variable "total_cump" the current total cump of +// order-0 +void ppmc_get_totf_order0(void) +{ + // Total cump is current total cump plus the escape for the escape code + total_cump=order0_defined_bytes+order0_max_cump; +} + + +// Codes a byte under order-0 and returns 1, otherwise it returns a 0 and +// has coded an escape code. In this case further coding is needed. +// +// Returns: 1 in case a byte was coded. 0 in case of escape code. +char ppmc_code_byte_order0(void) +{ + unsigned long counter; + + ppmc_get_totf_order0(); //get total cump + + // See if the byte is present + if(order0_array[byte]==0) //a probability of 0 + { + + // Because it was not present, output an escape code, prepare variables + + symb_cump=order0_max_cump; //obviously its cump is current max_cump + //without escape code's space + + symb_prob=order0_defined_bytes; //the number of defined bytes + + // Code the escape code + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + return 0; //byte not coded + } + else + { + + coded_in_order=0; + + // The symbol is present, code it under order-0 + + symb_prob=order0_array[byte]; //get probability directly + + // Make cump for current symbol + + symb_cump=0; //for first symbol is 0 + for(counter=0; counter!=byte ; ++counter) + symb_cump+=order0_array[counter]; //sum probabilities before our symbol + + // Code the symbol + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + return 1; //symbol coded under order-0 + } +} + + +// This functions update order-0 probabilities with current byte, also takes +// care about updating variables, and renormalizing. +void ppmc_update_order0(void) +{ + if(order0_array[byte]==0) + { + // It had a zero probability + order0_array[byte]++; //increment symbol probability + ++order0_defined_bytes; //new byte defined + ++order0_max_cump; //total cump + return; + } + else + { + // It had a non-zero probability + + // Increment its probability + order0_array[byte]++; //increment symbol probability + ++order0_max_cump; //total cump + + // Check to see if its the maximum in this case renormalize + if(order0_array[byte]==255) + ppmc_renormalize_order0(); + + return; + } +} + + +// This functions renormalizes the probabilities at order-0 updating variables +void ppmc_renormalize_order0(void) +{ + unsigned long counter; + + // Initialize variables + order0_defined_bytes=0; //clear them + order0_max_cump=0; + + // Loop over all probabilities, divide them by a factor of 2 and update variables + for(counter=0 ; counter!=256 ; ++counter) + { + order0_array[counter]>>=1; //divide by a factor of 2 + + if(order0_array[counter]!=0) //see if it has a non zero probability + order0_defined_bytes++; + + order0_max_cump+=order0_array[counter]; //sum to the total cump + } +} + + +// Decodes the symbol correspoding to the current order, it returns the byte +// or in case of a escape code it returns -1 +void ppmc_decode_order0(void) +{ + unsigned long current_cump, counter; + + + // Get the total cump needed for decoding symbol + ppmc_get_totf_order0(); //total cump needed for decoding + symb_cump=range_decoder_decode(&rc_decoder,total_cump); //decode it + + // Now check if it's an escape code + if(symb_cump>=order0_max_cump) //the defined code space for the escape code + { + + // Update coding values + + ppmc_get_escape_prob_order0(); + range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob); + + // Mark as escape code + + byte=-1; + + return; //an escape code + } + else + { + // Now we have to check what symbol it is + + current_cump=0; //cump of the current symbol + + for(counter=0 ; counter!= 256 ; ++counter) + { + if(symb_cump>=1; //divide by a factor of 2 + + if(order1_array[o1_byte][counter]!=0) //see if it has a non zero probability + order1_defined_bytes_array[o1_byte]++; + + order1_max_cump_array[o1_byte]+=order1_array[o1_byte][counter]; //sum to the total cump + } +} + + +// Decodes the symbol correspoding to the current order, it returns the byte +// or in case of an escape code or empty table it returns -1. +// It updates "coded_in_order". +void ppmc_decode_order1(void) +{ + unsigned long current_cump, counter; + + + // First check if current order-1 table is empty + if(order1_defined_bytes_array[o1_byte]==0) //it's empty + { + byte=-1; //byte not coded, nothing done + return; + } + + + // Get the total cump needed for decoding symbol + ppmc_get_totf_order1(); //total cump needed for decoding + symb_cump=range_decoder_decode(&rc_decoder,total_cump); //decode it + + // Now check if it's an escape code + if(symb_cump>=order1_max_cump_array[o1_byte]) //the defined code space for the escape code + { + + // Update coding values + + ppmc_get_escape_prob_order1(); + range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob); + + // Mark as escape code + + byte=-1; + + return; //an escape code + } + else + { + // Now we have to check what symbol it is + + current_cump=0; //cump of the current symbol + + for(counter=0 ; counter!= 256 ; ++counter) + { + if(symb_cumpbyte==byte) + goto ppmc_o2_byte_found; //bad thing, I know, anyone has a better idea? + symb_cump+=node->freq; //add the probability of this byte to the cump + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + + // If we reach that point, the byte was not found in the linked list + // so we don't need the cump, we have to output an escape code, whose + // probabilities are know using the context structure in the table. + + // Byte was not present in the linked list, current node is the last one, + // and that's the node needed for creating a new node, save it. + + o2_ll_node=node; + + // Now get the probability and cump of the escape code + + symb_cump=order2_array[o2_cntxt].max_cump; + symb_prob=order2_array[o2_cntxt].defined_bytes; + + // Code the escape code + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + return 0; + + + // That code is executed when the byte is found in the linked list + + ppmc_o2_byte_found: + + + // Everything has been tested, now we can feel free to code the byte, + // the symb_cump is already computed, now get its probability and code + // the byte, also save pointer to this element in the linked lists for + // updating. + + coded_in_order=2; //successfully coded under order-2 + + o2_ll_node=node; //save it for updating + + symb_prob=node->freq; //get the probability of the byte + + // Code it. + + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + return 1; //byte coded under order-2 +} + + +// This functions update order-2 probabilities with current byte, also takes +// care about updating variables, and renormalizing. +// Of course "o2_ll_node" must point to the element to update or the last one, +// based on the "coded_in_order" and checking the pointer of the element we +// can decide what to do. +// +// This updating is only for encoding. +void ppmc_update_order2(void) +{ + + // First of all check if that's the first byte in this context, in that case + // we have to initialize some variables in the context structure. + + if(order2_array[o2_cntxt].defined_bytes==0) //no byte defined yet + { + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + order2_array[o2_cntxt].defined_bytes=1; + order2_array[o2_cntxt].max_cump=1; + order2_array[o2_cntxt].prob=_bytes_pool; + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + return; //nothing else to do + } + + + // The byte was coded under order two, otherwise it was coded under a + // lower order (never higher ones, remember that we are using update + // exclusion) in this case we have to create a new node in the list. + + if(coded_in_order==2) //coded under order-2 + { + + // Update its count and variables of this context and check for renormalization + + o2_ll_node->freq++; //increment its frequency (rather probability) + + order2_array[o2_cntxt].max_cump++; //total cump + + if(o2_ll_node->freq==255) //do we have to renormalize? + ppmc_renormalize_order2(); //renormalize + + } + else + { + + // Once every paranoid check has been done we are sure that this byte + // did not existed and so we have to create a new node in the linked + // list. Also we have to take care of memory issues. + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate mem + + o2_ll_node->next=_bytes_pool; //put it in the next free entry + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + order2_array[o2_cntxt].max_cump++; //total cump + order2_array[o2_cntxt].defined_bytes++; //total cump + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + } + +} + + +// This functions renormalizes the probabilities at order-2 updating context +// variables. +void ppmc_renormalize_order2(void) +{ + unsigned long counter; + struct _byte_and_freq *node; + + // Initialize variables. Defined bytes remain the same. + order2_array[o2_cntxt].max_cump=0; //clear them + + node=order2_array[o2_cntxt].prob; //get pointer to lined lists + + // Divide all the probabilities by 2 and update context variables + while(1) + { + node->freq>>=1; //divide by a factor of 2 + + if(node->freq==0) //don't allow a probability to be 0 + node->freq=1; + + order2_array[o2_cntxt].max_cump+=node->freq; //sum to the total cump + + if(node->next==0) //last element + break; + node=node->next; + } + + + //printf("\nRenormalization, context:%c%c",o2_byte,o1_byte); + +} + + +// Decodes the symbol correspoding to the current order, it returns the byte +// or in case of an escape code or empty table it returns -1. +// It updates "coded_in_order". +// +// If we decode an escape code, we don't modify "o2_ll_node" and thus its +// work of the updating routine to read till the end of the linked list +// (for adding a new element) however if we decode a byte, we save on +// "o2_ll_node" the pointer to its node. (so updating is faster) +void ppmc_decode_order2(void) +{ + unsigned long current_cump, counter; + struct _byte_and_freq *node; + + // Initialize o2_cntxt + + o2_cntxt=ppmc_order2_hash_key(o1_byte,o2_byte); + + + // First check if current order-2 context is empty + if(order2_array[o2_cntxt].defined_bytes==0) //it's empty + { + byte=-1; //byte not coded, nothing done + return; + } + + + // Get the total cump needed for decoding symbol + ppmc_get_totf_order2(); //total cump needed for decoding + symb_cump=range_decoder_decode(&rc_decoder,total_cump); //decode it + + // Now check if it's an escape code + if(symb_cump>=order2_array[o2_cntxt].max_cump) //the defined code space for the escape code + { + + // Update coding values + + ppmc_get_escape_prob_order2(); + range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob); + + // Mark as escape code + + byte=-1; + + return; //an escape code + } + else + { + // Now we have to check what symbol it is + + current_cump=0; //cump of the current symbol + + node=order2_array[o2_cntxt].prob; //get pointer to linked lists + + while(1) + { + current_cump+=node->freq; //update cump + if(symb_cumpnext; //next element + //we have no need to check for the last symbol, we'll never read further + //the end of the linked lists, before we'll found the last byte. + } + + + //read byte value and probability + + symb_prob=node->freq; //get the probability for updating the state + byte=node->byte; //get byte + o2_ll_node=node; //used for updating + + + // Get the cump of byte + + symb_cump=current_cump-symb_prob; + + // Update coding state + + range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob); + + // Update coded_in_order used for update exclusion + + coded_in_order=2; + + return; + } + +} + + +// This is the routine for updating while decoding. We have to search the byte +// in the linked list, if it's present, update its count, otherwise we have +// hitted the end of the linked list, and there we have to create a new node. +// +// Of course if the byte was matched in order-2 we'll have a pointer to it +// in "o2_ll_node" so we don't need to read the linked list. (we already did +// in decoding) +// +// Another case which we also have to specially deal with, this is the case +// when the context has not been initalized yet. +void ppmc_update_dec_order2(void) +{ + struct _byte_and_freq *node; + + + // Handle the case when the context is not initialized + // This code is the same as the one for the encoding. + + if(order2_array[o2_cntxt].defined_bytes==0) //no byte defined yet + { + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + order2_array[o2_cntxt].defined_bytes=1; + order2_array[o2_cntxt].max_cump=1; + order2_array[o2_cntxt].prob=_bytes_pool; + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + + return; //nothing else to do + } + + + // Current context is initalized, proceed + + if(coded_in_order==2) //check if it was decoded under order-2 + { + + // We can be sure that the pointer "o2_ll_node" points to its entry, and + // it has a non 0 probability (otherwise it couldn't be coded) so just + // update its probability and max_cump + + o2_ll_node->freq++; //the probability of the byte + order2_array[o2_cntxt].max_cump++; //the max_cump + + if(o2_ll_node->freq==255) //check for renormalization + ppmc_renormalize_order2(); + + } + else + { + + // An escape code was decoded under order-2, we have to read till the + // end of the linked list so we can add a new node for this new byte. + + node=order2_array[o2_cntxt].prob; //get pointer to linked list + + while(1) + { + if(node->next==0) //check for the end of the linked list + break; + node=node->next; //next node + } + + + // We reached the end of the linked list, add a new node if possible, + // we are using the same code of "ppmc_update_order2()" with the + // difference that the pointer to the linked list is "node" + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate mem + + node->next=_bytes_pool; //put it in the next free entry + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + order2_array[o2_cntxt].max_cump++; //total cump + order2_array[o2_cntxt].defined_bytes++; //total cump + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + return; //we are finished updating + + } + +} + + +// This function returns the probability for the escape codes in the variables +void ppmc_get_escape_prob_order2(void) +{ + // To understand that remember that the escape allocated for the escape code + // is above the current maximum cump and that it has a probability determined + // by the scheme C. + + symb_prob=order2_array[o2_cntxt].defined_bytes; + symb_cump=order2_array[o2_cntxt].max_cump; +} + + + +// ORDER-3 functions +// +// The difference between order-3 and order-3 are just a few, instead of +// keeping a table with the context structures, we keep a hash table with +// pointers to linked lists with the context, so it's only a matter of +// searching current context in the linked list corresponding to its hash +// entry. This is done in "ppmc_get_totf_order3" because that's the first +// routine that both encoding and decoding routines call. + + +// Returns in the variable "total_cump" the current total cump of +// order-3. Must be called while encoding or decoding before anything else +// because it initializes the pointers to the context structure in +// "o3_context" and o3_cntxt. +// +// If the hash entry is not initialized it returns "o3_context"=0 +// If the context is not present in the linked list of context, "o3_context" +// will point to the last element in the linked list. +// If the context is present "o3_context" will point to the context to use. +// One can distinguish the last two by checking the context value of the +// structure, if it's not the same, is the last element. +// +// The routine returns 0 if the hash entry is not initialized or if the +// the context was not present. Otherwise it returns 1, meaning that we +// have to code under this context. +char ppmc_get_totf_order3(void) +{ + struct context *cntxt_node; + + + // First make the hash key for order-3 + + o3_cntxt=ppmc_order3_hash_key(o1_byte,o2_byte,o3_byte); + full_o3_cntxt=(o1_byte)+(o2_byte<<8)+(o3_byte<<16); //order-3 + + + // Now check the hash entry in the table + + if(order3_hasht[o3_cntxt]==0) //if 0, not initialized + { + + o3_context=0; //no hash entry + + return 0; //hash entry not initialized + } + + + // Now read trough the linked list of context searching current one + + cntxt_node=order3_hasht[o3_cntxt]; + + while(1) + { + + if(cntxt_node->order4321==full_o3_cntxt) //compare context + goto ppmc_gtf_cntxt_found; + + if(cntxt_node->next==0) //end of context's linked list + break; + + cntxt_node=cntxt_node->next; //next element + + } + + + // Once there the context was not found + o3_context=cntxt_node; //pointer to last element in the linked list + + return 0; //it was not present + + + // The context is found, so return pointer and cump + + ppmc_gtf_cntxt_found: + + o3_context=cntxt_node; + + // Total cump is current total cump plus the escape for the escape code + + total_cump=o3_context->defined_bytes+o3_context->max_cump; + + return 1; //context found + +} + + +// Codes a byte under order-3 and returns 1. +// Otherwise it returns a 0. +// +// In case the byte is coded under this context, coded_in_order=3. +char ppmc_code_byte_order3(void) +{ + unsigned long counter; + struct _byte_and_freq *node; + + + // Get current context (if present) and total cump. + + if(ppmc_get_totf_order3()==0) + return 0; + + + // See if the byte is present and compute its cump at the same time + + node=o3_context->prob; //pointer to first element in the linked list + + symb_cump=0; //the first symbol always has a 0 cump + + + // Now search the byte in the linked list + + do{ + if(node->byte==byte) + goto ppmc_o3_byte_found; //bad thing, I know, anyone has a better idea? + symb_cump+=node->freq; //add the probability of this byte to the cump + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + + // If we reach that point, the byte was not found in the linked list + // so we don't need the cump, we have to output an escape code, whose + // probabilities are know using the context structure in the table. + + // Byte was not present in the linked list, current node is the last one, + // and that's the node needed for creating a new node, save it. + + o3_ll_node=node; + + // Now get the probability and cump of the escape code + + symb_cump=o3_context->max_cump; + symb_prob=o3_context->defined_bytes; + + // Code the escape code + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + return 0; + + + // That code is executed when the byte is found in the linked list + + ppmc_o3_byte_found: + + + // Everything has been tested, now we can feel free to code the byte, + // the symb_cump is already computed, now get its probability and code + // the byte, also save pointer to this element in the linked lists for + // updating. + + coded_in_order=3; //successfully coded under order-3 + + o3_ll_node=node; //save it for updating + + symb_prob=node->freq; //get the probability of the byte + + // Code it. + + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + return 1; //byte coded under order-3 +} + + +// This functions update order-3 probabilities with current byte, also takes +// care about updating variables, and renormalizing. +// +// "o3_ll_node" must point to the element to update or the last one, +// based on the "coded_in_order" and checking the pointer of the element we +// can decide what to do. Also "o3_context" must be initialized. +// +// This updating is only for encoding. +void ppmc_update_order3(void) +{ + + // First thing first, check if the hash entry is initialized + + if(order3_hasht[o3_cntxt]==0) //no pointer to linked list defined yet + { + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + order3_hasht[o3_cntxt]=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o3_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + return; //nothing else to do + } + + + + + // The byte was coded under order three, otherwise it was coded under a + // lower order (never higher ones, remember that we are using update + // exclusion) in this case we have to create a new node in the list. + + if(coded_in_order==3) //coded under order-3 + { + + // Update its count and variables of this context and check for renormalization + + o3_ll_node->freq++; //increment its frequency (rather probability) + + o3_context->max_cump++; //total cump + + if(o3_ll_node->freq==255) //do we have to renormalize? + ppmc_renormalize_order3(); //renormalize + + } + else + { + + // Now we have two cases, under a given context (which we actually found) + // we coded an escape coded, in that case just create a new node in the + // linked list of bytes and probabilities. Otherwise we didn't find the + // same node so we have to create it in the linked list for context. + // And we can be sure that it at least has one element and that + // "o3_context" points to the last element, so we can put the new element. + + if(o3_context->order4321==full_o3_cntxt) //chech if that's the last + { //element or the a context found + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate mem + + o3_ll_node->next=_bytes_pool; //put it in the next free entry + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + o3_context->max_cump++; //total cump + o3_context->defined_bytes++; //total cump + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + } + else + { + + // We have to create a new context node + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + o3_context->next=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o3_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + } + + } + +} + + +// This functions renormalizes the probabilities at order-3 updating context +// variables. +void ppmc_renormalize_order3(void) +{ + unsigned long counter; + struct _byte_and_freq *node; + + + // Initialize variables. Defined bytes remain the same. + o3_context->max_cump=0; //clear them + + node=o3_context->prob; //get pointer to lined lists + + // Divide all the probabilities by 2 and update context variables + while(1) + { + node->freq>>=1; //divide by a factor of 2 + + if(node->freq==0) //don't allow a probability to be 0 + node->freq=1; + + o3_context->max_cump+=node->freq; //sum to the total cump + + if(node->next==0) //last element + break; + node=node->next; + } + + +} + + +// Decodes the symbol correspoding to the current order, it returns the byte +// or in case of an escape code or empty table it returns -1. +// It updates "coded_in_order". +// +// If we decode an escape code, we don't modify "o3_ll_node" and thus its +// work of the updating routine to read till the end of the linked list +// (for adding a new element) however if we decode a byte, we save on +// "o3_ll_node" the pointer to its node. (so updating is faster) +void ppmc_decode_order3(void) +{ + unsigned long current_cump, counter; + struct _byte_and_freq *node; + + + // Get current context (if present) and total cump. + + if(ppmc_get_totf_order3()==0) + { + byte=-1; + return; + } + + + // Decode current cump + + symb_cump=range_decoder_decode(&rc_decoder,total_cump); //decode it + + // Now check if it's an escape code + if(symb_cump>=o3_context->max_cump) //the defined code space for the escape code + { + + // Update coding values + + ppmc_get_escape_prob_order3(); + range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob); + + // Mark as escape code + + byte=-1; + + return; //an escape code + } + else + { + // Now we have to check what symbol it is + + current_cump=0; //cump of the current symbol + + node=o3_context->prob; //get pointer to linked lists + + while(1) + { + current_cump+=node->freq; //update cump + if(symb_cumpnext; //next element + //we have no need to check for the last symbol, we'll never read further + //the end of the linked lists, before we'll found the last byte. + } + + + //read byte value and probability + + symb_prob=node->freq; //get the probability for updating the state + byte=node->byte; //get byte + o3_ll_node=node; //used for updating + + + // Get the cump of byte + + symb_cump=current_cump-symb_prob; + + // Update coding state + + range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob); + + // Update coded_in_order used for update exclusion + + coded_in_order=3; + + return; + } + +} + + +// This is the routine for updating while decoding. The only difference with +// the routine for coding is that when an escape code was coded, "o3_ll_node" +// is not initialized so we have to read till the end of the linked list. +// Fortunately "o3_context" will be initialized so we don't need to read its +// linked list. +void ppmc_update_dec_order3(void) +{ + struct _byte_and_freq *node; + + // First thing first, check if the hash entry is initialized + + if(order3_hasht[o3_cntxt]==0) //no pointer to linked list defined yet + { + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + order3_hasht[o3_cntxt]=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o3_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + return; //nothing else to do + } + + + // The byte was coded under order three, otherwise it was coded under a + // lower order (never higher ones, remember that we are using update + // exclusion) in this case we have to create a new node in the list. + + if(coded_in_order==3) //coded under order-3 + { + + // Update its count and variables of this context and check for renormalization + + o3_ll_node->freq++; //increment its frequency (rather probability) + + o3_context->max_cump++; //total cump + + if(o3_ll_node->freq==255) //do we have to renormalize? + ppmc_renormalize_order3(); //renormalize + + } + else + { + + // Now we have two cases, under a given context (which we actually found) + // we coded an escape coded, in that case just create a new node in the + // linked list of bytes and probabilities. Otherwise we didn't find the + // same node so we have to create it in the linked list for context. + // And we can be sure that it at least has one element and that + // "o3_context" points to the last element, so we can put the new element. + + if(o3_context->order4321==full_o3_cntxt) //chech if that's the last + { //element or the a context found + + // Read till the end of the linked list + + node=o3_context->prob; //get pointer to linked list + + while(1) + { + if(node->next==0) //check for the end of the linked list + break; + node=node->next; //next node + } + + // Now add element + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate mem + + node->next=_bytes_pool; //put it in the next free entry + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + o3_context->max_cump++; //total cump + o3_context->defined_bytes++; //total cump + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + } + else + { + + // We have to create a new context node + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + o3_context->next=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o3_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + } + + } + +} + + + +// This function returns the probability for the escape codes in the variables +void ppmc_get_escape_prob_order3(void) +{ + // To understand that remember that the escape allocated for the escape code + // is above the current maximum cump and that it has a probability determined + // by the scheme C. + + symb_prob=o3_context->defined_bytes; + symb_cump=o3_context->max_cump; +} + + + +// ORDER-4 functions +// +// The routines for order-4 are *equal* to those for order-3, there are a few +// changes like different global variables, and different hash keys. +// +// If you want to go to higher orders, you'd use the same code and data +// structures, with the difference of the context bytes (order4321) +// stored in every context's linked list. + + +// Returns in the variable "total_cump" the current total cump of +// order-4. Must be called while encoding or decoding before anything else +// because it initializes the pointers to the context structure in +// "o4_context" and o4_cntxt. +// +// If the hash entry is not initialized it returns "o4_context"=0 +// If the context is not present in the linked list of context, "o4_context" +// will point to the last element in the linked list. +// If the context is present "o4_context" will point to the context to use. +// One can distinguish the last two by checking the context value of the +// structure, if it's not the same, is the last element. +// +// The routine returns 0 if the hash entry is not initialized or if the +// the context was not present. Otherwise it returns 1, meaning that we +// have to code under this context. +char ppmc_get_totf_order4(void) +{ + struct context *cntxt_node; + + + // First make the hash key for order-4 + + o4_cntxt=ppmc_order4_hash_key(o1_byte,o2_byte,o3_byte,o4_byte); + full_o4_cntxt=(o1_byte)+(o2_byte<<8)+(o3_byte<<16)+(o4_byte<<24); //order-4 + + + // Now check the hash entry in the table + + if(order4_hasht[o4_cntxt]==0) //if 0, not initialized + { + + o4_context=0; //no hash entry + + return 0; //hash entry not initialized + } + + + // Now read trough the linked list of context searching current one + + cntxt_node=order4_hasht[o4_cntxt]; + + while(1) + { + + if(cntxt_node->order4321==full_o4_cntxt) //compare context + goto ppmc_gtfo4_cntxt_found; + + if(cntxt_node->next==0) //end of context's linked list + break; + + cntxt_node=cntxt_node->next; //next element + + } + + + // Once there the context was not found + o4_context=cntxt_node; //pointer to last element in the linked list + + return 0; //it was not present + + + // The context is found, so return pointer and cump + + ppmc_gtfo4_cntxt_found: + + o4_context=cntxt_node; + + // Total cump is current total cump plus the escape for the escape code + + total_cump=o4_context->defined_bytes+o4_context->max_cump; + + return 1; //context found + +} + + +// Codes a byte under order-4 and returns 1. +// Otherwise it returns a 0. +// +// In case the byte is coded under this context, coded_in_order=4. +char ppmc_code_byte_order4(void) +{ + unsigned long counter; + struct _byte_and_freq *node; + + + // Get current context (if present) and total cump. + + if(ppmc_get_totf_order4()==0) + return 0; + + + // See if the byte is present and compute its cump at the same time + + node=o4_context->prob; //pointer to first element in the linked list + + symb_cump=0; //the first symbol always has a 0 cump + + + // Now search the byte in the linked list + + do{ + if(node->byte==byte) + goto ppmc_o4_byte_found; //bad thing, I know, anyone has a better idea? + symb_cump+=node->freq; //add the probability of this byte to the cump + if(node->next==0) + break; + node=node->next; //next element in the linked list + }while(1); + + + // If we reach that point, the byte was not found in the linked list + // so we don't need the cump, we have to output an escape code, whose + // probabilities are know using the context structure in the table. + + // Byte was not present in the linked list, current node is the last one, + // and that's the node needed for creating a new node, save it. + + o4_ll_node=node; + + // Now get the probability and cump of the escape code + + symb_cump=o4_context->max_cump; + symb_prob=o4_context->defined_bytes; + + // Code the escape code + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + return 0; + + + // That code is executed when the byte is found in the linked list + + ppmc_o4_byte_found: + + + // Everything has been tested, now we can feel free to code the byte, + // the symb_cump is already computed, now get its probability and code + // the byte, also save pointer to this element in the linked lists for + // updating. + + coded_in_order=4; //successfully coded under order-4 + + o4_ll_node=node; //save it for updating + + symb_prob=node->freq; //get the probability of the byte + + // Code it. + + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + + return 1; //byte coded under order-4 +} + + +// This functions update order-4 probabilities with current byte, also takes +// care about updating variables, and renormalizing. +// +// "o4_ll_node" must point to the element to update or the last one, +// based on the "coded_in_order" and checking the pointer of the element we +// can decide what to do. Also "o4_context" must be initialized. +// +// This updating is only for encoding. +void ppmc_update_order4(void) +{ + + // First thing first, check if the hash entry is initialized + + if(order4_hasht[o4_cntxt]==0) //no pointer to linked list defined yet + { + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + order4_hasht[o4_cntxt]=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o4_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + return; //nothing else to do + } + + + + + // The byte was coded under order three, otherwise it was coded under a + // lower order (never higher ones, remember that we are using update + // exclusion) in this case we have to create a new node in the list. + + if(coded_in_order==4) //coded under order-4 + { + + // Update its count and variables of this context and check for renormalization + + o4_ll_node->freq++; //increment its frequency (rather probability) + + o4_context->max_cump++; //total cump + + if(o4_ll_node->freq==255) //do we have to renormalize? + ppmc_renormalize_order4(); //renormalize + + } + else + { + + // Now we have two cases, under a given context (which we actually found) + // we coded an escape coded, in that case just create a new node in the + // linked list of bytes and probabilities. Otherwise we didn't find the + // same node so we have to create it in the linked list for context. + // And we can be sure that it at least has one element and that + // "o4_context" points to the last element, so we can put the new element. + + if(o4_context->order4321==full_o4_cntxt) //chech if that's the last + { //element or the a context found + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate mem + + o4_ll_node->next=_bytes_pool; //put it in the next free entry + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + o4_context->max_cump++; //total cump + o4_context->defined_bytes++; //total cump + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + } + else + { + + // We have to create a new context node + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + o4_context->next=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o4_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + } + + } + +} + + +// This functions renormalizes the probabilities at order-4 updating context +// variables. +void ppmc_renormalize_order4(void) +{ + unsigned long counter; + struct _byte_and_freq *node; + + + // Initialize variables. Defined bytes remain the same. + o4_context->max_cump=0; //clear them + + node=o4_context->prob; //get pointer to lined lists + + // Divide all the probabilities by 2 and update context variables + while(1) + { + node->freq>>=1; //divide by a factor of 2 + + if(node->freq==0) //don't allow a probability to be 0 + node->freq=1; + + o4_context->max_cump+=node->freq; //sum to the total cump + + if(node->next==0) //last element + break; + node=node->next; + } + + +} + + +// Decodes the symbol correspoding to the current order, it returns the byte +// or in case of an escape code or empty table it returns -1. +// It updates "coded_in_order". +// +// If we decode an escape code, we don't modify "o4_ll_node" and thus its +// work of the updating routine to read till the end of the linked list +// (for adding a new element) however if we decode a byte, we save on +// "o4_ll_node" the pointer to its node. (so updating is faster) +void ppmc_decode_order4(void) +{ + unsigned long current_cump, counter; + struct _byte_and_freq *node; + + + // Get current context (if present) and total cump. + + if(ppmc_get_totf_order4()==0) + { + byte=-1; + return; + } + + + // Decode current cump + + symb_cump=range_decoder_decode(&rc_decoder,total_cump); //decode it + + // Now check if it's an escape code + if(symb_cump>=o4_context->max_cump) //the defined code space for the escape code + { + + // Update coding values + + ppmc_get_escape_prob_order4(); + range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob); + + // Mark as escape code + + byte=-1; + + return; //an escape code + } + else + { + // Now we have to check what symbol it is + + current_cump=0; //cump of the current symbol + + node=o4_context->prob; //get pointer to linked lists + + while(1) + { + current_cump+=node->freq; //update cump + if(symb_cumpnext; //next element + //we have no need to check for the last symbol, we'll never read further + //the end of the linked lists, before we'll found the last byte. + } + + + //read byte value and probability + + symb_prob=node->freq; //get the probability for updating the state + byte=node->byte; //get byte + o4_ll_node=node; //used for updating + + + // Get the cump of byte + + symb_cump=current_cump-symb_prob; + + // Update coding state + + range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob); + + // Update coded_in_order used for update exclusion + + coded_in_order=4; + + return; + } + +} + + +// This is the routine for updating while decoding. The only difference with +// the routine for coding is that when an escape code was coded, "o4_ll_node" +// is not initialized so we have to read till the end of the linked list. +// Fortunately "o4_context" will be initialized so we don't need to read its +// linked list. +void ppmc_update_dec_order4(void) +{ + struct _byte_and_freq *node; + + // First thing first, check if the hash entry is initialized + + if(order4_hasht[o4_cntxt]==0) //no pointer to linked list defined yet + { + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + order4_hasht[o4_cntxt]=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o4_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + return; //nothing else to do + } + + + // The byte was coded under order four, otherwise it was coded under a + // lower order (never higher ones, remember that we are using update + // exclusion) in this case we have to create a new node in the list. + + if(coded_in_order==4) //coded under order-4 + { + + // Update its count and variables of this context and check for renormalization + + o4_ll_node->freq++; //increment its frequency (rather probability) + + o4_context->max_cump++; //total cump + + if(o4_ll_node->freq==255) //do we have to renormalize? + ppmc_renormalize_order4(); //renormalize + + } + else + { + + // Now we have two cases, under a given context (which we actually found) + // we coded an escape coded, in that case just create a new node in the + // linked list of bytes and probabilities. Otherwise we didn't find the + // same node so we have to create it in the linked list for context. + // And we can be sure that it at least has one element and that + // "o4_context" points to the last element, so we can put the new element. + + if(o4_context->order4321==full_o4_cntxt) //chech if that's the last + { //element or the a context found + + // Read till the end of the linked list + + node=o4_context->prob; //get pointer to linked list + + while(1) + { + if(node->next==0) //check for the end of the linked list + break; + node=node->next; //next node + } + + // Now add element + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate mem + + node->next=_bytes_pool; //put it in the next free entry + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + o4_context->max_cump++; //total cump + o4_context->defined_bytes++; //total cump + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + } + else + { + + // We have to create a new context node + + if(ppmc_out_of_memory==1) + return; //exit this function, we can't allocate memory + + + // First create the context + + o4_context->next=_context_pool; + + _context_pool->next=0; //this is the last element + _context_pool->order4321=full_o4_cntxt; //put context + _context_pool->prob=_bytes_pool; //pointer to linked list + _context_pool->max_cump=1; + _context_pool->defined_bytes=1; + + + // Do update of linked list variables and memory use of contexts + + ++_context_pool; //next time use next entry (this is a pointer) + + if(_context_pool==_context_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_context_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_context_pool=(struct context *)malloc + (sizeof(struct context)*_context_pool_elements_inc))==NULL) + { + printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _context_pool_array[_context_pool_index]=_context_pool; + + _context_pool_max=_context_pool+_context_pool_elements_inc; + + _context_pool_index++; + + } + + // Now care about the first (and last) linked list element + + _bytes_pool->byte=byte; //initialize byte to current one + _bytes_pool->freq=1; //it appeared once + _bytes_pool->next=0; //now this is last element in ll + + // Do update of linked list variables and memory use + + ++_bytes_pool; //next time use next entry (this is a pointer) + + if(_bytes_pool==_bytes_pool_max) //maximum reached + { + + // First check to see that we still have entries in the array + + if(_bytes_pool_index==_mempool_max_index) + { + ppmc_out_of_memory=1; //flush + return; + } + + // Allocate memory for new buffer + + if((_bytes_pool=(struct _byte_and_freq *)malloc + (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL) + { + printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index); + ppmc_out_of_memory=1; //flush + return; + } + + _bytes_pool_array[_bytes_pool_index]=_bytes_pool; + + _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc; + + _bytes_pool_index++; + + } + + } + + } + +} + + + +// This function returns the probability for the escape codes in the variables +void ppmc_get_escape_prob_order4(void) +{ + // To understand that remember that the escape allocated for the escape code + // is above the current maximum cump and that it has a probability determined + // by the scheme C. + + symb_prob=o4_context->defined_bytes; + symb_cump=o4_context->max_cump; +} + diff --git a/src/ppmc/ppmc.h b/src/ppmc/ppmc.h new file mode 100644 index 0000000..e21545e --- /dev/null +++ b/src/ppmc/ppmc.h @@ -0,0 +1,135 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "ppmc.h" + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + Part of the ppmc encoder and decoder. + + This module contains the definitions of different functions and all the + data structures defined by ppmc. Also contains defines. +*/ + +// Definitions + +#define ppmc_order4_hash_size 65536 +#define ppmc_order4_hash_key(k,j,i,l) ( (k)+(j<<8)+(i<<1)+(l<<9) )& ppmc_order4_hash_size-1 +#define ppmc_order3_hash_size 65536 +#define ppmc_order3_hash_key(k,j,i) ((k)+(j<<7)+(i<<11)) & ppmc_order3_hash_size-1 +#define ppmc_order2_hash_key(k,j) ((k)+(j<<8)) +#define _bytes_pool_elements 125000 //this is used the first time + //that we allocate memory, that's + //the number of entries +#define _bytes_pool_elements_inc 125000 //if we need to alloc again, this + //is the number of entries to get +#define _context_pool_elements 50000 +#define _context_pool_elements_inc 50000 + +#define _mempool_max_index 1000 //the number of entries in the array with + //pointers + + +// Data structures + +// This structure contains a single element of a linked lists which contains +// the probability distribution of a given order. This structure takes 6 bytes. +struct _byte_and_freq{ +unsigned char byte; //the byte itself +unsigned char freq; //and the frequency of it +struct _byte_and_freq *next; //pointer to next element in linked list or 0 +}; + + +// This structure is used for both order-3 and order-4. It takes 20 bytes, +// and it can still hold another byte more. (only 19 being used) +// Order 2-1-0-(-1) use different structures for a faster accessing. +struct context{ +struct context *next; //next context in the hash entry +unsigned long order4321; //order-4-3-2-1 (or order-3-2-1 for order-3) +struct _byte_and_freq *prob; //pointer to linked lists containing probability distribution +unsigned int max_cump; //maximum cumulative probability (can't exceed (2^16)-1 ) +unsigned int defined_bytes; //the number of bytes in this context +}; + +// That's the same but for order-2 where there's no hash collisions. +struct context_o2{ +struct _byte_and_freq *prob; //pointer to linked lists containing probability distribution +unsigned int max_cump; //maximum cumulative probability (can't exceed (2^16)-1 ) +unsigned int defined_bytes; //the number of bytes in this context +}; + + +// Declaration of functions + + +// Functions for initializing +void ppmc_alloc_memory(void); +void ppmc_initialize_contexts(void); +void ppmc_encoder_initialize(void); +void ppmc_decoder_initialize(void); +void ppmc_free_memory(void); +void ppmc_flush_mem_enc(void); +void ppmc_flush_mem_dec(void); + +// Functions for order-(-1) +void ppmc_get_prob_ordern1(void); +unsigned long ppmc_get_symbol_ordern1(void); +void ppmc_get_totf_ordern1(void); +void ppmc_renormalize_order1(void); + +// Functions for order-0 +void ppmc_get_totf_order0(void); +char ppmc_code_byte_order0(void); +void ppmc_update_order0(void); +void ppmc_renormalize_order0(void); +void ppmc_decode_order0(void); +void ppmc_get_escape_prob_order0(void); +void ppmc_get_prob_order0(void); + +// Functions for order-1 +void ppmc_get_totf_order1(void); +char ppmc_code_byte_order1(void); +void ppmc_update_order1(void); +void ppmc_renormalize_order1(void); +void ppmc_decode_order1(void); +void ppmc_get_escape_prob_order1(void); +void ppmc_get_prob_order1(void); + + +// Functions for order-2 +void ppmc_get_totf_order2(void); +char ppmc_code_byte_order2(void); +void ppmc_update_order2(void); +void ppmc_renormalize_order2(void); +void ppmc_decode_order2(void); +void ppmc_update_dec_order2(void); +void ppmc_get_escape_prob_order2(void); +void ppmc_get_prob_order2(void); + + +// Functions for order-3 +char ppmc_get_totf_order3(void); +char ppmc_code_byte_order3(void); +void ppmc_update_order3(void); +void ppmc_renormalize_order3(void); +void ppmc_decode_order3(void); +void ppmc_update_dec_order3(void); +void ppmc_get_escape_prob_order3(void); +void ppmc_get_prob_order3(void); + + +// Functions for order-4 +char ppmc_get_totf_order4(void); +char ppmc_code_byte_order4(void); +void ppmc_update_order4(void); +void ppmc_renormalize_order4(void); +void ppmc_decode_order4(void); +void ppmc_update_dec_order4(void); +void ppmc_get_escape_prob_order4(void); +void ppmc_get_prob_order4(void); + + + diff --git a/src/ppmc/ppmcdata.c b/src/ppmc/ppmcdata.c new file mode 100644 index 0000000..8ff5c5c --- /dev/null +++ b/src/ppmc/ppmcdata.c @@ -0,0 +1,119 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "ppmcdata.c" + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + Part of the ppmc encoder and decoder. + + This module contains global data. +*/ + +#include "ppmc.h" //defines +#include "range.h" + +// Order-4 uses a hash table which points to the start of a linked list with +// the different context, which has the cump, the number of different symbols +// and a pointer to the linked list with the bytes and frequencies. +// Order-3 is almost the same, both take 262144 bytes. +struct context *order4_hasht[ppmc_order4_hash_size]; + +struct context *order3_hasht[ppmc_order3_hash_size]; + + +// The array for order-2 is different, as we do directly hashing, and thus +// we have no need to do the stuff of linked lists for the context itself, +// so it contains the context used. This takes 1310720 bytes. +struct context_o2 order2_array[65536]; + + +// Those are the multiple arrays for order-1. It takes 65536 bytes. +unsigned char order1_array[256][256]; +unsigned int order1_defined_bytes_array[256]; //the defined bytes in every context +unsigned int order1_max_cump_array[256]; //max cump of every context + + +// This is the array for order-0. It takes 256 bytes. +unsigned char order0_array[256]; +unsigned int order0_defined_bytes; +unsigned int order0_max_cump; + + +// No need of variables for order-(-1), because it's fixed. + + +// Those are the pointers and variables used for managing the mem pool for +// both context, and bytes and frequencies. +struct _byte_and_freq *_bytes_pool, //pointer to pool containing linked + //lists with bytes and frequencies + *_bytes_pool_max; //the maximum of this buffer +struct context *_context_pool; //pointer to pool containing contexts +struct context *_context_pool_max; //the same as with _bytes_pool + +unsigned long _bytes_pool_index; //index in array of pointers +unsigned long _context_pool_index; + +//the following is an array keeping pointers to different buffers. A new +//buffer is allocated when the current one is full, so we always have a +//buffer for linked lists. (without allocating a buffer for every element) +struct _byte_and_freq *_bytes_pool_array[_mempool_max_index]; +struct context *_context_pool_array[_mempool_max_index]; + +char ppmc_out_of_memory; //0 if we have enough memory, 1 instead, any + //routine that needs to allocate memory must + //quit if that's 1. + + +// Variables which contain current byte to code and order +unsigned long byte, //current byte to code + o1_byte, //order-1 byte + o2_byte, //order-2 byte + o3_byte, //order-3 byte + o4_byte; //order-4 byte + +unsigned long o2_cntxt; //used in the hash key of order-2 +unsigned long o3_cntxt; //use as hash key for order-3 +unsigned long o4_cntxt; //use as hash key for order-4 +unsigned long full_o3_cntxt; //o1_byte, o2_byte and o3_byte together +unsigned long full_o4_cntxt; //order-4-3-2-1 + +unsigned long coded_in_order; //in which order the last byte was coded + //it's for update exclusion + //also used for decoding + +// Variables used for coding + +unsigned long + total_cump, //the total cumulative probability + symb_cump, //the symbol cumulative probability + symb_prob; //the symbol frequency + +rangecoder rc_coder; //state of range coder +rangecoder rc_decoder; //state of range decoder + +// File handles + + FILE *file_input, //file to code + *file_output; //file where the coded data is placed + + +// Pointers to linked lists and context structures used for faster updating +// or creation of new nodes, because instead of reading again all the linked +// list till the end (in the case of creation) we have a pointer to the last +// element. In the case that a byte was present in the linked lists but it +// had a 0 count, we just have to update its probability. And in the case +// that it already was present and we coded it under that context or a lower +// one, we just have to update its probability. + + +struct _byte_and_freq *o2_ll_node; //pointer to linked lists under order-2 + //where does it points depends in which + //order the byte was coded. +struct _byte_and_freq *o3_ll_node; //the same but for order-3 +struct _byte_and_freq *o4_ll_node; + +struct context *o3_context; //pointer to current order-3 context +struct context *o4_context; //pointer to current order-3 context diff --git a/src/ppmc/ppmcdata.h b/src/ppmc/ppmcdata.h new file mode 100644 index 0000000..f4bddc1 --- /dev/null +++ b/src/ppmc/ppmcdata.h @@ -0,0 +1,117 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "ppmcdata.h" + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + Part of the ppmc encoder and decoder. + + This module contains externs definition for global data. +*/ + +#include "ppmc.h" + +// Order-4 uses a hash table which points to the start of a linked list with +// the different context, which has the cump, the number of different symbols +// and a pointer to the linked list with the bytes and frequencies. +// Order-3 is almost the same, both take 262144 bytes. +extern struct context *order4_hasht[]; + +extern struct context *order3_hasht[]; + + +// The array for order-2 is different, as we do directly hashing, and thus +// we have no need to do the stuff of linked lists for the context itself, +// so it contains the context used. This takes 1310720 bytes. +extern struct context_o2 order2_array[]; + + +// Those are the multiple arrays for order-1. It takes 65536 bytes. +extern unsigned char order1_array[256][256]; +extern unsigned int order1_defined_bytes_array[]; //the defined bytes in every context +extern unsigned int order1_max_cump_array[]; //max cump of every context + + +// This is the array for order-0. It takes 256 bytes. +extern unsigned char order0_array[]; +extern unsigned int order0_defined_bytes; +extern unsigned int order0_max_cump; + + +// Those are the pointers and variables used for managing the mem pool for +// both context, and bytes and frequencies. +extern struct _byte_and_freq *_bytes_pool, //pointer to pool containing linked + //lists with bytes and frequencies + *_bytes_pool_max; //the maximum of this buffer +extern struct context *_context_pool; //pointer to pool containing contexts +extern struct context *_context_pool_max; //the same as with _bytes_pool + +extern unsigned long _bytes_pool_index; //index in array of pointers +extern unsigned long _context_pool_index; + +//the following is an array keeping pointers to different buffers. A new +//buffer is allocated when the current one is full, so we always have a +//buffer for linked lists. (without allocating a buffer for every element) +extern struct _byte_and_freq *_bytes_pool_array[_mempool_max_index]; +extern struct context *_context_pool_array[_mempool_max_index]; + +extern char ppmc_out_of_memory; //0 if we have enough memory, 1 instead, any + //routine that needs to allocate memory must + //quit if that's 1. + + + +// Variables which contain current byte to code and order +extern unsigned long //they are only bytes + byte, //current byte to code + o1_byte, //order-1 byte + o2_byte, //order-2 byte + o3_byte, //order-3 byte + o4_byte; //order-4 byte + +extern unsigned long o2_cntxt; //used in the hash key of order-2 +extern unsigned long o3_cntxt; //use as hash key for order-3 +unsigned long o4_cntxt; //use as hash key for order-4 +extern unsigned long full_o3_cntxt; //o1_byte, o2_byte and o3_byte together +extern unsigned long full_o4_cntxt; //order-4-3-2-1 + +extern unsigned long coded_in_order; //in which order the last byte was coded + //it's for update exclusion + //also used for decoding +// Variables used for coding + +extern unsigned long //no need for negative values + total_cump, //the total cumulative probability + symb_cump, //the symbol cumulative probability + symb_prob; //the symbol frequency + +extern rangecoder rc_coder; //state of range coder +extern rangecoder rc_decoder; //state of range decoder + +// File handles + + FILE *file_input, //file to code + *file_output; //file where the coded data is placed + + + +// Pointers to linked lists and context structures used for faster updating +// or creation of new nodes, because instead of reading again all the linked +// list till the end (in the case of creation) we have a pointer to the last +// element. In the case that a byte was present in the linked lists but it +// had a 0 count, we just have to update its probability. And in the case +// that it already was present and we coded it under that context or a lower +// one, we just have to update its probability. + + +extern struct _byte_and_freq *o2_ll_node;//pointer to linked lists under order-2 + //where does it points depends in which + //order the byte was coded. +extern struct _byte_and_freq *o3_ll_node; //the same but for order-3 +extern struct _byte_and_freq *o4_ll_node; + +extern struct context *o3_context; //pointer to current order-3 context +extern struct context *o4_context; //pointer to current order-3 context diff --git a/src/ppmc/ppmcmain.c b/src/ppmc/ppmcmain.c new file mode 100644 index 0000000..3e83e7c --- /dev/null +++ b/src/ppmc/ppmcmain.c @@ -0,0 +1,176 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "ppmcmain.c" + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + Part of the ppmc encoder only. + + This module is the main module and calls the different modules to do + the encoding of a file. When done prints bpb and kbyps. +*/ + + +#include +#include +#include "range.h" //the range coder functions and data +#include "ppmcdata.h" + + +long filesize(FILE *stream); + + + +//Main +int main (char argc, char *argv[]) +{ + unsigned long counter, //temporal counter for loops like for or while + counter2, //another temporal counter for sub loops + size_file_input; //the size of the input file + + + // Print title, version and copyright + printf("PPMC using range coder. (without exclusion)\n"); + printf("Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.\n"); + printf("Permission is granted to make verbatim copies of this program for private\n"); + printf("use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.\n"); + + // Check for correct number of parameters + if(argc!=3) + { + printf("Bad number of arguments.\n"); + exit(1); + } + + + // Try to open input and output files + if((file_input=fopen(argv[1],"r+b"))==NULL) + { + printf("Couldn't open %s.\n",argv[1]); + exit(1); + } + + if((file_output=fopen(argv[2],"w+b"))==NULL) + { + printf("Couldn't create %s.\n",argv[2]); + exit(1); + } + + + // Check input file length and not accept 0 length files + size_file_input=filesize(file_input); + + if(size_file_input<5) + { + printf("Can't work with files below than 5 bytes!"); + exit(1); + } + + + // First output file length + fwrite(&size_file_input,1,4,file_output); //input length + + + // Initialize ppmc encoder + ppmc_alloc_memory(); //get memory + ppmc_initialize_contexts(); //initialize model + ppmc_encoder_initialize(); + + // Initialize range coder + range_coder_init(&rc_coder,file_output); + + + // Start main loop which codes the file + while((byte=fgetc(file_input))!=EOF) + { + + // Try to code current byte under order-4 if possible then go to lower orders + if(ppmc_code_byte_order4()==0) + if(ppmc_code_byte_order3()==0) + if(ppmc_code_byte_order2()==0) + if(ppmc_code_byte_order1()==0) + if(ppmc_code_byte_order0()==0) //else try to code under order-0 + { + // Code under order-(-1) + ppmc_get_prob_ordern1(); + range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob); + coded_in_order=0; //update all the tables (unless order-(-1)) + } + + + // Now do update exclusion + + switch(coded_in_order) + { + case 0: ppmc_update_order0(); //update only order-0 + case 1: ppmc_update_order1(); //update order-0 and order-1 + case 2: ppmc_update_order2(); //update order-2 1 and 0... + case 3: ppmc_update_order3(); + case 4: ppmc_update_order4(); + default: break; + }; + + + + // Update order variables + + o4_byte=o3_byte; + o3_byte=o2_byte; + o2_byte=o1_byte; + o1_byte=byte; //current one is next time order-1 + + + // Check if we run out of memory, in that case, flush the encoder + + if(ppmc_out_of_memory==1) + { + printf("Flushing memory! Output file might be not decodable.\n"); + ppmc_flush_mem_enc(); + } + + + } + + + // Flush range coder + range_coder_flush(&rc_coder); + + // Free memory + ppmc_free_memory(); + + + // Print bpb and kbyps + printf("%s at %f bpb.\n",argv[1],((float)filesize(file_output)/(float)size_file_input)*(float)8); + + + // Close file handles + fclose(file_input); + fclose(file_output); + + + + // Nicely exit + return 0; +} + + +// Routines not used by ppmc but rather by main. +// Not including the range coder. + + +// Returns the file size of a given file. +long filesize(FILE *stream) +{ + long curpos, length; + + curpos = ftell(stream); + fseek(stream, 0L, SEEK_END); + length = ftell(stream); + fseek(stream, curpos, SEEK_SET); + return length; +} + + diff --git a/src/ppmc/range.c b/src/ppmc/range.c new file mode 100644 index 0000000..f734e62 --- /dev/null +++ b/src/ppmc/range.c @@ -0,0 +1,212 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "range.c" + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + This module contains the routines of both the range coder and decoder. + + The range coder works internally in 32 bits, and uses bytes as symbols. + Also the end of message symbol is used. So z=257. + + Both input and output use rc_file as the file stream. Of course we can't + code and decode at the same time. All the input or output comes from the + same file, no matter what range coder structure are we using. The modules + here provided don't manage the io except for reading and writing, they + don't open nor close the files. The reading and writing is done via + putc and getc. +*/ + +#include "range.h" +#include + +/* + Inits the range coder state. Must be called before encoding any symbol. + It uses a magic number 0xB3 as the first byte outputted. + -rangecoder *rc, the range coder to be used. + + Shoulde be called like that: + range_coder_init(&o0_rc_state,file_output); +*/ +void range_coder_init(rangecoder *rc, FILE *stream) +{ + rc_file=stream; + rc->low=0; //define state + rc->range=0x80000000; + rc->byte_buffer=0xB3; //magic number + rc->help=0; //temp value +} + + +/* + Encodes a symbol. + -rangecoder *rc, the range coder to be used. + -unsigned long tot_f, the maximum cumulative frequency + -unsigned long lt_f, the cumulative probabilty of the symbol + -unsigned long sy_f, the probability of the symbol +*/ +void range_coder_encode(rangecoder *rc,unsigned long tot_f, unsigned long lt_f,unsigned long sy_f) +{ + unsigned long temp, r; + + range_coder_renormalize(rc); //&rc? + + r=rc->range/tot_f; + temp=r*lt_f; + if(lt_f+sy_frange=r*sy_f; + else + rc->range-=temp; + rc->low+=temp; +} + +/* + Renormalizes the state when coding. + -rangecoder *rc, the range coder to be used. +*/ + +void range_coder_renormalize(rangecoder *rc) +{ + while(rc->range<=(unsigned long)0x00800000) + { + if(rc->low<(unsigned long)0x7F800000) + { + putc(rc->byte_buffer,rc_file); + for(;rc->help;rc->help--) + putc(0xFF,rc_file); + rc->byte_buffer=(unsigned char)(rc->low>>23); + } + else + { + if(rc->low&(unsigned long)0x80000000) + { + putc(rc->byte_buffer+1,rc_file); + for(;rc->help;rc->help--) + putc(0x00,rc_file); + rc->byte_buffer=(unsigned char)(rc->low>>23); + } + else + rc->help++; + } + rc->range<<=8; + rc->low=(rc->low<<8)&(unsigned long)(0x7FFFFFFF); + } +} + + +/* + Flushes the encoder. Must be called when the coding is done. + -rangecoder *rc, the range coder to be used. + + Shoulde be called like that: + range_coder_flush(&o0_rc_state); +*/ +void range_coder_flush(rangecoder *rc) +{ + unsigned long tmp; + + range_coder_renormalize(rc); + tmp = rc->low >> 23; + if (tmp > 0xff) + { + putc(rc->byte_buffer+1,rc_file); + for(; rc->help; rc->help--) + putc(0,rc_file); + } + else + { + putc(rc->byte_buffer,rc_file); + for(; rc->help; rc->help--) + putc(0xff,rc_file); + } + + putc(tmp & 0xff,rc_file); + putc((tmp = rc->low >> (23-8)) & 0xff,rc_file); +} + + +/* + Inits the range decoder state. Also checks for the magic number, and + quits in case it isn't the first, so be careful. + -rangecoder *rc, the range coder to be used. +*/ +void range_decoder_init(rangecoder *rc, FILE *stream) +{ + unsigned int _rd_c; + + rc_file=stream; + if((_rd_c=getc(rc_file))!=0xB3) + { + printf("\nThis is not range coded data. Magic number not found. Exiting."); + exit(1); + } + rc->byte_buffer=getc(rc_file); + rc->low=rc->byte_buffer>>1; + rc->range=0x80; +} + + +/* + Decode a symbol, get its cumulative probability. + Input: + -rangecoder *rc, the range coder to be used. + -unsigned long tot_f, the maximum cumulative probability + Output: + -unsigned long, cumulative probability of the current symbol + Should be called like that: + current_cump=range_decoder_decode(&o0_rc_state,o0_tot_f); +*/ +unsigned long range_decoder_decode(rangecoder *rc, unsigned long tot_f) +{ + unsigned long temp; + + range_decoder_renormalize(rc); + rc->help=rc->range/tot_f; + temp=rc->low/rc->help; + if(temp>=tot_f) + return tot_f-1; + else + return temp; +} + + +/* + Updates the state so next symbol can be decoded. + Input: + -rangecoder *rc, the range coder to be used. + -unsigned long tot_f, the maximum cumulative probability + -unsigned long lt_f, the cumulative probabilty of the symbol + -unsigned long sy_f, the probability of the symbol + +*/ +void range_decoder_update(rangecoder *rc, unsigned long tot_f, unsigned long lt_f,unsigned long sy_f) +{ + unsigned long temp; + + temp=rc->help*lt_f; + rc->low-=temp; + if(lt_f+sy_frange=rc->help*sy_f; + else + rc->range-=temp; +} + + +/* + Renormalizes the state while decoding. + -rangecoder *rc, the range coder to be used. +*/ +void range_decoder_renormalize(rangecoder *rc) +{ + while(rc->range<=0x00800000) + { + rc->low=(rc->low<<8)|((rc->byte_buffer<<7)&0xFF); + rc->byte_buffer=getc(rc_file); + rc->low |= rc->byte_buffer >> (1); + rc->range<<=8; + } +} + diff --git a/src/ppmc/range.h b/src/ppmc/range.h new file mode 100644 index 0000000..04b6fcc --- /dev/null +++ b/src/ppmc/range.h @@ -0,0 +1,39 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this file for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "range.h" + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + Declarations for the coder. +*/ + +#include + +typedef struct{ + unsigned long low, range, help; + unsigned char byte_buffer; +}rangecoder; + +FILE *rc_file; + +void range_coder_init(rangecoder *rc, FILE *stream); //coding routines +void range_coder_encode(rangecoder *rc,unsigned long tot_f, unsigned long lt_f,unsigned long sy_f); +void range_coder_renormalize(rangecoder *rc); +void range_coder_flush(rangecoder *rc); +void range_decoder_init(rangecoder *rc, FILE *stream);//decoding routines +unsigned long range_decoder_decode(rangecoder *rc, unsigned long tot_f); +void range_decoder_update(rangecoder *rc, unsigned long tot_f, unsigned long lt_f,unsigned long sy_f); +void range_decoder_renormalize(rangecoder *rc); + + +typedef unsigned long code_value; +#define CODE_BITS 32 +#define Top_value ((code_value)1 << (CODE_BITS-1)) +#define SHIFT_BITS (CODE_BITS - 9) +#define EXTRA_BITS ((CODE_BITS-2) % 8 + 1) +#define Bottom_value (Top_value >> 8) +#define outbyte(cod,x) putc(x,stdout) +#define inbyte(cod) getc(stdin) diff --git a/src/ppmc/unppmc.c b/src/ppmc/unppmc.c new file mode 100644 index 0000000..64281dc --- /dev/null +++ b/src/ppmc/unppmc.c @@ -0,0 +1,204 @@ +/* + Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved. + Permission is granted to make verbatim copies of this program for private + use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK. + + This file is: "unppmc.c" + Email: arturo@arturocampos.com + Web: http://www.arturocampos.com + + + Part of the ppmc decoder. + + This module is the main module and calls the different modules to do + the decoding of a file. When done prints kbyps. +*/ + + +// Bibliotecas necesarias +#include +#include +#include "range.h" //the range coder functions and data +#include "ppmcdata.h" + + +// Declaracion de funciones del ppmcmain.c +long filesize(FILE *stream); + + + + +//Main +void main (int argc, char *argv[]) +{ + unsigned long counter, //temporal counter for loops like for or while + counter2, //another temporal counter for sub loops + size_file_output, //the size of the output file + main_counter; //used in main + char expected_flush=0; //used for checking flushing which can't be done + + + // Print title, version and copyright + printf("UNPPMC using range coder.\n"); + printf("Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.\n"); + printf("Permission is granted to make verbatim copies of this program for private\n"); + printf("use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.\n"); + + + + // Check for correct number of parameters + if(argc!=3) + { + printf("Bad number of arguments.\n"); + exit(1); + } + + + // Try to open input and output files + if((file_input=fopen(argv[1],"r+b"))==NULL) + { + printf("Couldn't open %s.\n",argv[1]); + exit(1); + } + + if((file_output=fopen(argv[2],"w+b"))==NULL) + { + printf("Couldn't create %s.\n",argv[2]); + exit(1); + } + + + // Get output length + fread(&size_file_output,1,4,file_input); + + + // Initialize ppmc decoder + ppmc_alloc_memory(); + ppmc_initialize_contexts(); + ppmc_decoder_initialize(); + + + + // Initialize decoder + range_decoder_init(&rc_decoder,file_input); + + + // Start main loop which decodes the file + main_counter=size_file_output-4; //take in account the bytes already written + expected_flush=0; //we don't expect a flush yet + + while(main_counter!=0) + { + +// Try to decode current byte in order-4 if possible, else in lower ones +ppmc_decode_order4(); +if(byte==-1) + ppmc_decode_order3(); + if(byte==-1) + { + ppmc_decode_order2(); + if(byte==-1) + { + ppmc_decode_order1(); + if(byte==-1) + { + ppmc_decode_order0(); + if(byte==-1) //check if it was an escape code + { + // Decode in order-(-1) + ppmc_get_totf_ordern1(); + symb_cump=range_decoder_decode(&rc_decoder,total_cump); + byte=ppmc_get_symbol_ordern1(); + ppmc_get_prob_ordern1(); + range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob); + coded_in_order=0; //update all orders + + // Now see if it's the code of flushing + + if(symb_cump==256) + { + printf("Flushing.\n"); + ppmc_flush_mem_dec(); + expected_flush=0; + continue; //do not output byte nor update + } + + } + } + } + } + + // Output byte and update model + + fputc(byte,file_output); + + switch(coded_in_order) //update exclusion + { + case 0: ppmc_update_order0(); //update only order-0 + case 1: ppmc_update_order1(); //update order-0 and order-1 + case 2: ppmc_update_dec_order2(); //update order-0 1 and 2 + case 3: ppmc_update_dec_order3(); + case 4: ppmc_update_dec_order4(); + default: break; + }; + + + // Check if flushing has to be done and has not been done. + // This is optional, in case you limit the memory usage, you don't + // need to include this + + if(expected_flush==1) // If flushing didn't happen, we can't decode + { + printf("Can't decompress file. Not enough memory.\nTry in a machine with more memory.\n"); + exit(1); + } + if(ppmc_out_of_memory==1) + { + expected_flush=1; // Next code must be a flush code, otherwise we don't + // have enough memory, and therefore we can't decode + } + + + // Update order variables + + o4_byte=o3_byte; + o3_byte=o2_byte; + o2_byte=o1_byte; + o1_byte=byte; //current one, is next time order-1 + + // Byte decoded and model updated, loop + main_counter--; + + + } + + + ppmc_free_memory(); + + // Close file handles and free memory + fclose(file_input); + fclose(file_output); + + + // Nicely exit + exit(0); +} + + +// Ruotines not used by ppmc but rather by main. +// Not including the range coder. + + +// Returns the file size of a given file. +long filesize(FILE *stream) +{ + long curpos, length; + + curpos = ftell(stream); + fseek(stream, 0L, SEEK_END); + length = ftell(stream); + fseek(stream, curpos, SEEK_SET); + return length; +} + +