--- /dev/null
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">\r
+<HTML>\r
+<HEAD>\r
+ <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\r
+ <META NAME="GENERATOR" CONTENT="Mozilla/4.06 [es] (Win95; I) [Netscape]">\r
+ <META NAME="KeyWords" CONTENT="model, arithmetic, coding, modeling, symbol, alphabet, size, order, context, contexts, cump, cumulative, probabilities, frequencies, escape, codes, code, redundancy, zero, problem, arturo, campos, compression, ppmc, PPMC, ppm, modeling, hash, table, linked, list, C, source, code, cump, esc, range">\r
+ <META NAME="Description" CONTENT="Ppmc is implemented with hash tables to access probabilities.">\r
+ <META NAME="Author" CONTENT="Arturo San Emeterio Campos">\r
+ <TITLE>Implementing ppmc with hash tables by Arturo Campos</TITLE>\r
+</HEAD>\r
+<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#800080" ALINK="#3366FF">\r
+\r
+<CENTER><FONT SIZE=+2>"Implementing ppmc with hash tables"</FONT>\r
+<BR><FONT SIZE=+1>by</FONT>\r
+<BR><FONT SIZE=+2>Arturo San Emeterio Campos</FONT></CENTER>\r
+\r
+<P><BR>\r
+<BR>\r
+<BR>\r
+<BR>\r
+<BR>\r
+<BR>\r
+<BR>\r
+<P><FONT SIZE=+2>Disclaimer</FONT>\r
+<BR>Copyright (c) Arturo San Emeterio Campos 2000. All rights reserved.\r
+Permission is granted to make verbatim copies of this document for private\r
+use only.\r
+<BR> \r
+<P><FONT SIZE=+2>Download</FONT>\r
+<BR>Download the <A HREF="ac_ppmc_html.zip">whole article</A> zipped. It\r
+includes the C source code as well as compiled binaries for Dos.\r
+<BR> \r
+<P><FONT SIZE=+2>Table of contents</FONT>\r
+<LI>\r
+<A HREF="#Introduction">Introduction</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Ppmc">Ppmc, history review and data structures used</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Some terminology">Some terminology and a brief look to a model</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Cumulative probabilities">Cumulative probabilities and how to\r
+manage them</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Escape method C">Escape method C, how to compute and use it</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Linked lists">Linked lists for the counts of the bytes</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Coding and decoding with linked lists">Coding and decoding with\r
+linked lists</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Hash tables">Hash tables for contexts</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Global variables">Global variables</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Functions for every order">Functions nomenclature and use</A></LI>\r
+\r
+<LI>\r
+<A HREF="#The main">The main part</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Order-(-1)">Order-(-1)</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Order-0">Order-0 and order-1</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Order-2">Order-2</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Order-3">Order-3 and order-4</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Memory requeriments and management">Memory requirements and management</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Fast order-3 hashed">How to implement the fast order-3 hashed\r
+version</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Full exclusion">Adding full exclusion to the already done functions</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Compression and speed">Compression and speed results of the different\r
+implementations</A></LI>\r
+\r
+<LI>\r
+<A HREF="#References">References</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Closing words">Closing words</A></LI>\r
+\r
+<LI>\r
+<A HREF="#Contacting the author">Contacting the author</A></LI>\r
+\r
+<BR> \r
+<P> \r
+<BR> \r
+<BR> \r
+<BR> \r
+<BR> \r
+<BR> \r
+<BR> \r
+<BR> \r
+<P><A NAME="Introduction"></A><FONT SIZE=+2>Introduction</FONT>\r
+<BR><P ALIGN="JUSTIFY">This article is about ppmc, a lossless compression\r
+algorithm which achieves compression higher than some commercial products.\r
+This articles shows in deep how to implement ppmc using hash tables and\r
+linked lists. Different versions of it are explained, ppmc with lazy exclusions,\r
+ppmc with full exclusions (slower than the previous but achieving more compression)\r
+and a tricky version of ppmc, <A HREF="#Fast order-3 hashed">ppmc order-3\r
+hashed</A>, the only ppm implementation which is practical in the compression/speed/memory\r
+tradeoff. Ppmc can be used as an standalone compressor or with other algorithms\r
+like lzp for <A HREF="#literals-lzp">coding of the literals</A> <A HREF="#r.[14]">[14]</A>,\r
+due to his high compression level. You are encouraged to see the\r
+<A HREF="#Compression and speed">results</A>\r
+in speed and compression to see if this is the algorithm that you need.\r
+Understanding ppmc is the key to understand ppmz, the state of art in text\r
+compression <A HREF="#r.[3]">[3]</A>. The explanations follow the code\r
+that is released along with the article. This article presupposes a theory\r
+which is explained in my article about finite context modeling\r
+<A HREF="#r.[6]">[6]</A>.\r
+<P><P ALIGN="JUSTIFY">The article is distributed in the following way:\r
+Some historical <A HREF="#Ppmc">review</A> is done. After that, some <A HREF="#Some terminology">terminology</A>\r
+is explained. First I talk about\r
+<A HREF="#Cumulative probabilities">cumulative\r
+probabilities</A>, <A HREF="#Linked lists">linked lists</A>, the probability\r
+of the <A HREF="#Escape method C">escape code</A>, and <A HREF="#Hash tables">hash\r
+tables for contexts</A>. These sections describe all the algorithms and\r
+code used for managing the model and encoder which are later needed. Then\r
+we see how to put everything together. Some <A HREF="#Global variables">global\r
+variables</A> and data structures are discussed, the initialization and\r
+some other code is explained too. The next section explains more about\r
+the code and the <A HREF="#Functions for every order">functions</A> for\r
+coding, decoding and updating. Then the algorithm is commented from the\r
+<A HREF="#Order-(-1)">lowest\r
+order</A> till <A HREF="#Order-3">order-4</A>, once there it's only a matter\r
+of implementing the code explained before. Due to the fact that the data\r
+structures and functions from one context don't depend on others, the idea\r
+is to have order-(-1) working before starting implementing order-0, and\r
+so on. This makes programming very easy, and it helps finding bugs. My\r
+implementation uses bytes as symbols as its goal is to compress files.\r
+At first all the functions will be explained for lazy exclusion. Then there's\r
+a section about how much <A HREF="#Memory requeriments and management">memory</A>\r
+needs every data structure and every order, and how it's managed. After\r
+this there's a section about <A HREF="#Full exclusion">full exclusion</A>\r
+which shows how to make ppmc with full exclusions, from the code already\r
+done. Also I talk about a <A HREF="#Fast order-3 hashed">fast order-3 hashed</A>\r
+ppmc implementation, which is as fast as order-2 but achieving more compression\r
+(not as much as real order-3 of course). At the end of the article you\r
+may find the\r
+<A HREF="#Compression and speed">results</A> of different\r
+ppmc implementations, in both compression and speed. The algorithm is explained\r
+in the way I found easier to understand, the basic order-0 is explained,\r
+then the concept "context" is used order-1. We make order-2 from order-1\r
+using linked lists for probabilities, and a hash table for contexts. Based\r
+on order-2 we make both order-3 and order-4, which are a natural extension,\r
+using a hash table and linked lists for storing the contexts. Every concept\r
+is based on the previous, and thus I don't add difficulties to the already\r
+complex process of learning.\r
+<BR> \r
+<P><A NAME="Ppmc"></A><FONT SIZE=+2>Ppmc, history review and data structures\r
+used</FONT>\r
+<BR><P ALIGN="JUSTIFY">Prediction by Partial Matching is the Markovian\r
+model which achieves higher compression. Markovian models are those who\r
+have the Markov property: the probability of the current symbol only depends\r
+on the symbols which appeared before. Due to the increasing power and memory\r
+available in today's machines some ppm implementations are practical. Some\r
+others are not like the actual state of art in lossless text compression:\r
+ppmz2 by Charles Bloom. Other states of art are CTW <A HREF="#r.[13]">[13]</A>\r
+and the switching schemes of Paul Volf (though nowadays they are still\r
+hardly know). The main programming difficulty of ppm models is the data\r
+structure that holds all the contexts to be used. The first data structure\r
+used was the context trie in <A HREF="#r.[3]">[3]</A> or <A HREF="#r.[4]">[4]</A>.\r
+The context trie uses a linked list for every order. In every node there's\r
+a byte and its count, and pointer to the next element in a linked list\r
+containing the rest of the bytes under that context. For traversing the\r
+structure there's a pointer to the next higher order, obviously at the\r
+root of the tree you can find order-0. Optionally (for higher speed) there's\r
+another pointer to the next lower order (used when escaping). This data\r
+structure is used for bounded (limited) orders of a length of 5 or so.\r
+But for unbounded orders (that is, of non previously chosen length), which\r
+are very long, this is not satisfactory due to the high memory usage.\r
+<P><P ALIGN="JUSTIFY">The patricia tree and, mainly, the suffix tree are\r
+a solution to this problem. The suffix tree is used in ppmd <A HREF="#r.[5]">[5]</A>.\r
+However suffix trees are more difficult to implement than other solutions.\r
+<P><P ALIGN="JUSTIFY">In <A HREF="#r.[8]">[8]</A> hash tables were used,\r
+I've not checked this paper though. Thus I'll explain how to implement\r
+ppmc with hash tables without following it. The ideas exposed in this article\r
+are based on the implementation of ppmz by Malcolm Taylor. Hash tables are\r
+easy to learn, so they fit quite well this article. Moreover they are\r
+the appropriate structure structure for ppmz, and another of the goals of\r
+this article is to learn the base of ppmz. Just as a little overview of\r
+ppmc with hash tables, every context has its own hash table and linked\r
+lists where the bytes (or symbols) and counts are stored, then it's only\r
+a matter to read the count of a symbol, compute probabilities and code\r
+it. All the theory behind ppm was explained in "Finite context modeling"\r
+<A HREF="#r.[6]">[6]</A>\r
+so it's recommended to read it before this one. Along with this article,\r
+I've released my implementation of ppmc in C. I'll not try to focus much\r
+on it, because this would cause problems to the reader interested in implementing\r
+it in other language, anyway I'll follow it.\r
+<P><P ALIGN="JUSTIFY">Ppm was invented by John Cleary and Ian Witten back\r
+in 1984. The original paper <A HREF="#r.[9]">[9]</A>, talked about ppma\r
+and ppmb. The suffix letter ('a' or 'b') denotes the escape code used in\r
+the algorithm, however the rest of the algorithm is roughly the same. In\r
+the book "Text compression"\r
+<A HREF="#r.[2]">[2]</A> all of them were explained.\r
+Ppmc was presented by Alistair Moffat in <A HREF="#r.[7]">[7]</A>. In the\r
+paper by Moffat also ppmc' was presented, which is ppmc but using lazy\r
+exclusion. Other variants such as ppmd, ppm* or ppmz fall outside\r
+the scope of this article, however I'll probably talk about them too in\r
+a near future. The difference between ppmc and ppma or ppmb are only the\r
+probability computation for the escape code.\r
+<P><P ALIGN="JUSTIFY">As we saw in <A HREF="#r.[6]">[6]</A> ppm variants\r
+are finite context models. Practical implementations don't use blending.\r
+They use escape codes instead. They fallback to lower orders when needed.\r
+That is, they start at the highest order initialized, if the byte is present\r
+they code it under the current context, otherwise they emit an escape code,\r
+fallback to (use) a lower context and try to code the byte under that context.\r
+The last context (order-(-1)) contains all the bytes, so we'll surely find\r
+it there if it wasn't present in higher orders. We code the probability\r
+with arithmetic coding, or any variant (like in the case of the source,\r
+the range coder\r
+<A HREF="#r.[11]">[11]</A>). Once the byte is coded, it's\r
+only a matter of updating the probability of the symbol which we have just\r
+coded. As we are going to see later, ppmc uses update exclusions, which\r
+does not only make it faster but also achieves slightly better compression.\r
+How to implement update exclusions is explained in the section <A HREF="#Global variables">Global\r
+variables</A>.\r
+<P><P ALIGN="JUSTIFY">If we are using escape codes, when we are in a given\r
+context we exclude lower orders from probability computing, because we\r
+just take in account the probabilities of current one, that's ppmc\r
+using lazy exclusions. If we exclude symbols which have appeared in higher\r
+orders from the probability computation that's called full exclusion.\r
+<P><P ALIGN="JUSTIFY">Once we have the probabilities, we code them optimally\r
+(respect to the entropy) with arithmetic coding, which is explained in\r
+<A HREF="#r.[10]">[10]</A> and <A HREF="#r.[2]">[2]</A>. The range coder,\r
+a variant of it is explained in <A HREF="#r.[11]">[11]</A> and the original\r
+source code can be found in <A HREF="#r.[12]">[12]</A>.\r
+<BR> \r
+<P><A NAME="Some terminology"></A><FONT SIZE=+2>Some terminology and a\r
+brief look to a model</FONT>\r
+<BR><P ALIGN="JUSTIFY">This is only a brief look to some terms which may\r
+cause problems, for more theory you are encouraged to read <A HREF="#r.[6]">[6]</A>\r
+Context are the symbols or symbol which come before or after current one\r
+(in ppmc only before). Order-n refers to n bytes of context. In a given\r
+message a symbol can appear k times, this is its frequency. From there\r
+the model computes its probability, which in ppmc comes in the form of\r
+probability/total_probability, like 1/4. Ppmc rescales the frequencies,\r
+so they are called counts instead. If we have the message "bacabac" the\r
+ppmc model would look like the following if we don't take the escape code\r
+in account (bytes in order of appearance).\r
+<BR> \r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Order-3:\r
+<BR>Context: "bac" list: 'a'-1\r
+<BR>Context: "aca" list: 'b'-1\r
+<BR>Context: "cab" list: 'a'-1\r
+<BR>Context: "aba" list: 'c'-1</TD>\r
+\r
+<TD>Order-2:\r
+<BR>Context: "ba" list: 'c'-2\r
+<BR>Context: "ac" list: 'a'-1\r
+<BR>Context: "ca" list: 'b'-1\r
+<BR>Context: "ab" list: 'a'-1</TD>\r
+</TR>\r
+\r
+<TR VALIGN=TOP>\r
+<TD>Order-1:\r
+<BR>Context: "b" list: 'a'-2\r
+<BR>Context: "a" list: 'c'-2 'b'-1\r
+<BR>Context: "c" list: 'a'-1</TD>\r
+\r
+<TD>Order-0:\r
+<BR>Context: "" list: 'b'-2 'a'-3 'c'-2</TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<P>Under order-0 the probabilities are: 'a' 3/7, 'b' 2/7 and 'c' 2/7. For\r
+more examples, see <A HREF="#r.[6]">[6]</A>.\r
+<BR> \r
+<P><A NAME="Cumulative probabilities"></A><FONT SIZE=+2>Cumulative probabilities\r
+and how to manage them</FONT>\r
+<BR><P ALIGN="JUSTIFY">Arithmetic coding optimally encodes a symbol given\r
+its probability (optimal compared with the entropy, which has been proven\r
+to be the minimum, read <A HREF="#r.[1]">[1]</A>). In arithmetic coding\r
+we distribute the symbols in a probability line (between 0 and 1), then\r
+to encode a symbol we need to know its cumulative probability and its probability.\r
+The cumulative probability of the first symbol in the probability line\r
+is 0. For any other symbol its cumulative probability is the sum of all\r
+the probabilities of the symbols before it. We do so to assign ranges of\r
+this probability line to every symbol, and that thus we can latter know\r
+which is the encoder symbol, knowing only its cumulative probability.\r
+<BR> \r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Symbol: </TD>\r
+\r
+<TD>a</TD>\r
+\r
+<TD>b</TD>\r
+\r
+<TD>c</TD>\r
+\r
+<TD>d</TD>\r
+\r
+<TD>e</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Count: </TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>3</TD>\r
+\r
+<TD>0</TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>2</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Probability:</TD>\r
+\r
+<TD>2/8</TD>\r
+\r
+<TD>3/8</TD>\r
+\r
+<TD>0/8</TD>\r
+\r
+<TD>1/8</TD>\r
+\r
+<TD>2/8</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Cumulative probability: </TD>\r
+\r
+<TD>0 (0.0)</TD>\r
+\r
+<TD>2 (0.25)</TD>\r
+\r
+<TD>5 (0.625)</TD>\r
+\r
+<TD>5 (0.625)</TD>\r
+\r
+<TD>6 ( 0,75)</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Range:</TD>\r
+\r
+<TD>[0 , 0.25) </TD>\r
+\r
+<TD>[0.25 , 0.625) </TD>\r
+\r
+<TD>[0.625 , 0.625) </TD>\r
+\r
+<TD>[0.625 , 0.75) </TD>\r
+\r
+<TD>[0.75 , 1) </TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<P><P ALIGN="JUSTIFY">In the example above we have in first file the count\r
+of the byte, in some cases that could be the frequency. But as we'll see\r
+later ppmc uses renormalization, and thus it's the count instead. The second\r
+has the probability of the byte, note that 8 is the total count (the sum\r
+of all the counts). Then the cumulative probability is computed for every\r
+symbol as explained above. The last file has the theorical range assigned\r
+for this symbol. The symbol 'c' has an invalid range, however it's supposed\r
+that we'll never try to code it (in a real implementation of ppmc instead\r
+we would have coded an escape code). When implementing ppmc you don't need\r
+to compute the range for the symbols. Arithmetic coding only needs the\r
+count of the symbol, its cumulative probability and the total count (the\r
+sum of all the counts).\r
+<P><P ALIGN="JUSTIFY">Though in my implementation and some other you can\r
+find other terms like probability referring to the count, the terms used\r
+above are the correct ones. Probability is used to refer to count just\r
+because count/total_count is the probability. The total count is also called\r
+maximum cumulative probability. Expect different names and misuses not\r
+only in my article but also in others, but don't forget the correct names.\r
+<P><P ALIGN="JUSTIFY">In the implementation we'll keep tables (or linked\r
+lists for orders higher than 1) with the count of every byte. Based on\r
+this we have to compute the cumulative probabilities and maximum cumulative\r
+probability. We don't need all the cumulative probabilities, only the cumulative\r
+probability for the symbol that we want to code (the symbol that we are\r
+modeling now). From the explanation of how to compute the cumulative probabilities\r
+we can derive the following pseudo code:\r
+<UL>\r
+<LI>\r
+Symbol_cumulative_probability = 0 //The cumulative probability\r
+of the first symbol is 0</LI>\r
+\r
+<LI>\r
+For ( counter = 0 ; counter != byte ; ++counter ) //Till\r
+we hit the entry of the byte to code</LI>\r
+\r
+<UL>\r
+<LI>\r
+Symbol_cumulative_probability += counts_table[ counter ] //Add\r
+the count of the previous bytes</LI>\r
+</UL>\r
+</UL>\r
+<P ALIGN="JUSTIFY">This is the code used for order-0 and order-1, where\r
+a byte's probability can be found in counts_table[byte]. The complexity\r
+of this code is O(K) where K is the position of the byte in the table.\r
+It's worth mentioning that this is the greatest overhead of simple a arithmetic\r
+coding implementation, and also an important part (in time execution) of\r
+ppmc. Once the code above is executed and assuming that we have the maximum\r
+cumulative probability (as it will be seen later, that's stored in every\r
+context) we can code the symbol. In this example counts_table[] stores\r
+the counts of the symbols under the current order (which is assumed for\r
+simplicity to be 0).\r
+<UL>\r
+<LI>\r
+symbol_probability = counts_table[ byte ] //Get its count\r
+(here called probability)</LI>\r
+\r
+<LI>\r
+Arithmetic_encode ( symbol_probability , symbol_cumulative_probability\r
+, maximum_cumulative_probability )</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">With this information the arithmetic coder can compute\r
+its range:\r
+<BR>[ symbol_cumulative_probability / maximum_cumulative_probability ,\r
+<BR>( symbol_cumulative_probability + symbol_probability ) / maximum_cumulative_probability\r
+)\r
+<BR>And encode the symbol. This is not the point of this article, however\r
+note that arithmetic coding compresses best symbols with higher count because\r
+it can use all the range to represent this number, for example in the table\r
+above 'b' ( [0.25 , 0.625) ) can be represented with 0.25, 0.3, 0.4, 0.5,\r
+0.6, 0.6249999, etc.\r
+<P><P ALIGN="JUSTIFY">When decoding you need the maximum cumulative probability\r
+(which we have assumed to be already know), and the arithmetic decoder\r
+will return a cumulative probability. This cumulative probability distinguishes\r
+the symbol coded. We have to search it in our table, and then update the\r
+arithmetic coding state. As stated before we only store the counts, so\r
+we have to compute the cumulative probability on the fly to search our\r
+symbol. Let's say that we have the same table as in the previous example:\r
+<BR> \r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Symbol: </TD>\r
+\r
+<TD>a </TD>\r
+\r
+<TD>b </TD>\r
+\r
+<TD>c </TD>\r
+\r
+<TD>d </TD>\r
+\r
+<TD>e </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Count: </TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>3</TD>\r
+\r
+<TD>0</TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>2</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Cumulative probability: </TD>\r
+\r
+<TD>0</TD>\r
+\r
+<TD>2 </TD>\r
+\r
+<TD>5 </TD>\r
+\r
+<TD>5 </TD>\r
+\r
+<TD>6 </TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<P><P ALIGN="JUSTIFY">We need to read till the next byte of the coded one.\r
+For example with a cumulative probability of 4 we know that the encoded\r
+byte was 'b' (because 4 lands in its range) however to know so we have\r
+to read till the 'c', whose cumulative probability is 5, so we can be sure\r
+that the decoded byte is the right one, the previous. We'll compare the\r
+decoded cumulative probability with the one computed on the fly, when it's\r
+higher we can be sure that the previous byte was the encoded one. But what\r
+about if both have the same value? in this case we can't be sure. Let's\r
+say we have encoded a 'd', the cumulative probability is 5, but when decoding\r
+if we stop comparing when they are the same, we would decode a 'c' which\r
+is wrong. If we however stop only when the computed one is greater, we\r
+would stop in the 'e' and thus correctly decode a 'd' (the previous symbol).\r
+We have to do so because we have to care about both the lower and higher\r
+bound. Having all the previous in mind we do the following decoding routine:\r
+<UL>\r
+<LI>\r
+Decoded_cumulative_probability = arithmetic_decode ( maximum_cumulative_probability\r
+) //Decode it</LI>\r
+\r
+<LI>\r
+On_the_fly_cumulative_probability = 0 //First symbol\r
+has a cumulative probability of 0</LI>\r
+\r
+<LI>\r
+For ( counter = 0 ; counter != 256 ; ++counter ) //Search\r
+symbol based on cumulative probability</LI>\r
+\r
+<UL>\r
+<LI>\r
+If ( Decoded_cumulative_probability < on_the_fly_cumulative_probability\r
+) //Check upper bound</LI>\r
+\r
+<UL>\r
+<LI>\r
+break; //We are sure that the encoded byte is the previous</LI>\r
+</UL>\r
+\r
+<LI>\r
+On_the_fly_cumulative_probability += counts_table[ counter ] //Make\r
+cumulative probability</LI>\r
+</UL>\r
+\r
+<LI>\r
+Decoded_byte = counter-1 //Once the loop is breaked we\r
+know that the decoded byte is the previous one</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">In this code we compute the cumulative probability (on_the_fly_cumulative_probability)\r
+only for the current symbol in each step, because we don't need the cumulative\r
+probability of symbols which are in positions after the byte to decode.\r
+Note that this code is the same as for any simple order-0 arithmetic coding\r
+implementation. And that they use a table where the entry of a byte is\r
+the value of the byte itself (the count of 'byte' is in count_table[byte]).\r
+Due to this at the end of this routine we decide that the decoded byte\r
+is counter-1, because counter is the current one and counter-1 is the previous,\r
+note that the symbols are in order. In linked lists this will be different,\r
+but we'll see this later. We already know the decoded byte thus we know\r
+its probability and cumulative probability, so we can update the state\r
+of the arithmetic coder.\r
+<P><P ALIGN="JUSTIFY">To save some space in higher order tables we may\r
+want to reduce the size of the count. For lower orders we can use double words\r
+(32 bits) but for higher orders is better to use only a byte (8 bits).\r
+Every time the count is incremented we have to check that it isn't the\r
+maximum, if it is we have to halve all the counts. We divide all of them\r
+by a factor of 2. This is called renormalization or rescaling. In the case\r
+of linked lists we don't want that the count of a byte is 0, so when this\r
+happens, we change its value to 1. Renormalization does not only save space\r
+but also increases compression a little bit. That is because the probabilities\r
+of text change within a file, thus giving more importance to newer frequencies\r
+improve compression, because it reflects the changes in the source (the\r
+probabilities of the file).\r
+<BR> \r
+<P><A NAME="Escape method C"></A><FONT SIZE=+2>Escape method C, how to\r
+compute and use it</FONT>\r
+<BR><P ALIGN="JUSTIFY">The escape method C was already discussed at <A HREF="#r.[6]">[6]</A>\r
+and can be also found in <A HREF="#r.[2]">[2]</A>. I'll recall the example\r
+table, where a symbol ('c') had a count of 0. In ppmc we'll suppose that\r
+all the orders are initially empty (unless the order-(-1)), when we want\r
+to code in any order that is totally empty we just don't code anything.\r
+But what happens if there's only one symbol or just a few? how can we transmit\r
+to the decoder that the byte that we want to finally decode is not present\r
+in this table? For doing this we use the escape code. Both compressor and\r
+decompressor assign the same probability to the escape code. There are\r
+different methods for estimating the escape probability. Ppmc uses the\r
+escape method called 'C' (and that's why it's called ppmc) Method C assigns\r
+a count to the escape code equal to the number of different symbols that\r
+have appeared under this context. For practical issues we assume that\r
+the escape code is the last one. We also have to take care of its probability\r
+when coding. Let's see the table that we are using for examples, how it\r
+is without using escape code and with it:\r
+<CENTER><TABLE BORDER=0 CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Symbol: </TD>\r
+\r
+<TD>a</TD>\r
+\r
+<TD>b </TD>\r
+\r
+<TD>c </TD>\r
+\r
+<TD>d </TD>\r
+\r
+<TD>e </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Count: </TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>3</TD>\r
+\r
+<TD>0</TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>2</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Cumulative probability: </TD>\r
+\r
+<TD>0</TD>\r
+\r
+<TD>2 </TD>\r
+\r
+<TD>5 </TD>\r
+\r
+<TD>5 </TD>\r
+\r
+<TD>6 </TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<CENTER>Without escape code (total_cump=8)</CENTER>\r
+</TD>\r
+\r
+<TD> </TD>\r
+\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Symbol: </TD>\r
+\r
+<TD>a </TD>\r
+\r
+<TD>b </TD>\r
+\r
+<TD>c</TD>\r
+\r
+<TD>d </TD>\r
+\r
+<TD>e</TD>\r
+\r
+<TD>Esc</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Count: </TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>3</TD>\r
+\r
+<TD>0</TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>4</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Cumulative probability: </TD>\r
+\r
+<TD>0</TD>\r
+\r
+<TD>2 </TD>\r
+\r
+<TD>5 </TD>\r
+\r
+<TD>5 </TD>\r
+\r
+<TD>6 </TD>\r
+\r
+<TD>8</TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<CENTER>With escape code (total_cump=12)</CENTER>\r
+</TD>\r
+</TR>\r
+</TABLE></CENTER>\r
+<P ALIGN="JUSTIFY">As I have said before, in every context we'll store\r
+the maximum cumulative probability. Also we'll store the number of different\r
+bytes (which in my implementation is called defined_bytes). Due to this\r
+I call the maximum cumulative probability "max_cump", but because\r
+we also have to take in account the code space of the escape code,\r
+I have another variable called "total_cump", which is the maximum cumulative\r
+probability plus the count of the escape code (which for method C is the\r
+number of different bytes), this is the value which must be using when\r
+calling the decoding functions.\r
+<P><P ALIGN="JUSTIFY">But how does the escape code affect to coding and\r
+decoding? In encoding before trying to make the cumulative probability\r
+of the current byte and coding it, we have to check that it exists or that\r
+it has a non zero probability. If everything is ok then we can code it,\r
+otherwise the context is present but the byte is not, so we have to code\r
+an escape code. We know the probability (which comes from the count) of\r
+the escape code, it's simply the number of defined bytes in that context.\r
+But what about its cumulative probability? Due to how the cumulative probabilities\r
+are computed and also due to the fact that we assume the escape code to\r
+be the last symbol, the cumulative probability of the escape code is the\r
+maximum cumulative probability.\r
+<BR>"max_cump" is the total count before adding it the escape code's count.\r
+If you have a look at the table above, you'll see that max_cump is 8, and\r
+that the escape code's cumulative probability is also 8. Taking all those\r
+facts in account we use the following pseudo code:\r
+<UL>\r
+<LI>\r
+If ( counts_table[ counter ] != 0 ) //Check if it has\r
+a 0 count, in that case we can't code it</LI>\r
+\r
+<UL>\r
+<LI>\r
+Code byte as usual</LI>\r
+</UL>\r
+\r
+<LI>\r
+Else //Otherwise we have to code an escape code</LI>\r
+\r
+<UL>\r
+<LI>\r
+symbol_probability = defined_bytes //The count of the\r
+escape code is the number of different bytes</LI>\r
+\r
+<LI>\r
+symbol_cumulative_probability = maximum_cumulative_probability</LI>\r
+\r
+<LI>\r
+total_cumulative_probability = maximum_cumulative_probability + symbol_probability\r
+//The total count</LI>\r
+\r
+<LI>\r
+Arithmetic_encode ( symbol_probability , symbol_cumulative_probability\r
+, total_cumulative_probability )</LI>\r
+</UL>\r
+</UL>\r
+<P ALIGN="JUSTIFY">In the case of decoding we have to check if the coded\r
+symbol is an escape code, we do so based on its cumulative probability\r
+(which is stored) otherwise decode as usual:\r
+<UL>\r
+<LI>\r
+Decoded_cumulative_probability = arithmetic_decode ( total_cumulative_probability\r
+) //Decode it</LI>\r
+\r
+<LI>\r
+If ( Decoded_cumulative_probability > maximum_cumulative_probability )</LI>\r
+\r
+<UL>\r
+<LI>\r
+Fallback to lower order //Escape code, byte not present\r
+in current context</LI>\r
+</UL>\r
+\r
+<LI>\r
+Else</LI>\r
+\r
+<UL>\r
+<LI>\r
+Decode byte as usual</LI>\r
+</UL>\r
+</UL>\r
+\r
+<P><BR><A NAME="Linked lists"></A><FONT SIZE=+2>Linked lists for the counts\r
+of the bytes</FONT>\r
+<BR><P ALIGN="JUSTIFY">All the algorithms used above work only if we have\r
+all the symbols in order and in a table where the entry of 'byte' is in\r
+count_table[byte]. This is a good idea for order-0 and order-1 where the\r
+tables may take up to 1,024 and 262,144 bytes respectively (in case we\r
+use double words (32 bits) for the count). But this is not acceptable for\r
+order-2 which would need 67,108,864 bytes to hold all the tables (256*256*256*4).\r
+For higher orders the amount of memory is not available nowadays. Moreover\r
+due to the fact that there are less high context than lower ones, most\r
+of the entries are not used. The solution to both problems is using a linked\r
+list for storing the counts. A linked list is made of different elements,\r
+every element has its own value and a pointer to the next element. We store\r
+on it the count, and also the byte. In the case of tables (which in fact\r
+were hash tables with direct addressing) we knew the value of a byte by\r
+its entry, now we don't, so we have to store the value of the byte too.\r
+The last element in the linked list, has a pointer with a value of 0, meaning\r
+that it's the last one. Using linked list we can dynamically allocate memory\r
+only for the entries that we are going to use. Let's have a look at our\r
+table of example, which now is a linked lists:\r
+<BR> \r
+<TABLE BORDER=0 >\r
+<TR>\r
+<TD>\r
+<TABLE BORDER=0 >\r
+<TR>\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Count </TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'a'</TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>to 'b'</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Count </TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'b'</TD>\r
+\r
+<TD>3</TD>\r
+\r
+<TD>to 'd'</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Count </TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'d'</TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>to 'e'</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Count</TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'e'</TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>0</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<CENTER>Linked list form</CENTER>\r
+</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD></TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<TABLE BORDER=0 >\r
+<TR>\r
+<TD><P ALIGN="JUSTIFY">As you can see in linked list form, there's no entry\r
+for 'c' because it has not appeared, in our example the linked list is\r
+sorted, but in the implementation you'll find them in order of appearance,\r
+the first that appeared will be the first in the linked list.\r
+<P><P ALIGN="JUSTIFY">In my implementation I kept a <A HREF="#Memory requeriments and management">buffer</A>\r
+where new nodes were allocated. Once this buffer was full, another one\r
+was allocated. The pointers to all the allocated buffers were stored in\r
+one array. When compression or decompression is done, all the allocated\r
+buffers are freed.</TD>\r
+\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Symbol: </TD>\r
+\r
+<TD>a </TD>\r
+\r
+<TD>b </TD>\r
+\r
+<TD>c </TD>\r
+\r
+<TD>d </TD>\r
+\r
+<TD>e </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Count: </TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>3</TD>\r
+\r
+<TD>0</TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>2</TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<CENTER>Table form</CENTER>\r
+</TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<P>The data structure of a linked list in C is the following:\r
+<BR>struct _byte_and_freq\r
+<BR>{\r
+<BR>unsigned char byte; //The value of the byte\r
+<BR>unsigned char freq; //The count of the byte\r
+<BR>struct _byte_and_freq *next; //Pointer to the next\r
+node in the linked list\r
+<BR>}\r
+<P><P ALIGN="JUSTIFY">In a linked list new nodes are stored at the end.\r
+For example if we had to put 'c' in the list, we should read the linked\r
+list till the end, once at the end, allocate memory for this new buffer.\r
+Then put in the node of 'e' a pointer to this new location. The new element\r
+should have the following values: byte 'c', count 1, pointer 0. Its pointer\r
+is 0 because now it is the last element in the linked list. In our implementation\r
+we don't need to delete nodes of the linked list, so we make it even easier.\r
+<BR> \r
+<P><A NAME="Coding and decoding with linked lists"></A><FONT SIZE=+2>Coding\r
+and decoding with linked lists</FONT>\r
+<BR><P ALIGN="JUSTIFY">For coding we need to know if a given byte is present\r
+in the linked list, if it is we can code it. To check if it's present,\r
+we have to read the whole linked list till we find it or reach the end.\r
+When implementing this we'll already have a pointer to the first element,\r
+so we we'll assume so. Checking all the elements is only a matter of reading\r
+current one, if it's not the one we are looking for, get the pointer to\r
+the next, and so on, till we hit the end of the linked list. The pseudo\r
+code is like the following:\r
+<UL>\r
+<LI>\r
+Pointer_ll = first_element_in_ll //Initialize pointer\r
+to the first node in the linked list</LI>\r
+\r
+<LI>\r
+While ( 1 ) //Loop break would be done in the body of\r
+the loop</LI>\r
+\r
+<UL>\r
+<LI>\r
+If ( pointer_ll->byte = byte ) //If the byte to code\r
+and byte are equals</LI>\r
+\r
+<UL>\r
+<LI>\r
+Goto _byte_found //Execute the code which we have to</LI>\r
+</UL>\r
+\r
+<LI>\r
+If ( pointer_ll->next = 0 ) //If this is the last element\r
+in the linked list</LI>\r
+\r
+<UL>\r
+<LI>\r
+Goto _byte_not_found //Execute the proper code</LI>\r
+</UL>\r
+</UL>\r
+</UL>\r
+<A NAME="Coding and decoding with linked lists.coding"></A><P ALIGN="JUSTIFY">For\r
+the readers who don't know the C language, pointer_ll->byte, means accessing\r
+the value called 'byte' in the data structure addressed by the pointer\r
+'pointer_ll'. We also need to know the cumulative probability of the byte,\r
+to do so we need the code explained before. We have to read all the elements\r
+till the byte. Both the code needed for searching a symbol and checking\r
+if it's present and the code for making the cumulative probability do the\r
+same. So why not merging them? In the same loop we have to check if the\r
+byte is present or not and compute its cumulative probability, for the\r
+case that it is present. Later we'll have to update the count of this byte,\r
+and because we want to avoid having to read the linked list again, we store\r
+the pointer in a global variable. But if the byte is not present when we\r
+update we'll have to add a new pointer to the end of the linked list, for\r
+that case we store a pointer to the last element.\r
+<UL>\r
+<LI>\r
+Symbol_cumulative_probability = 0 //The cumulative probability\r
+of the first symbol is 0</LI>\r
+\r
+<LI>\r
+Pointer_ll = first_element_in_ll //Initialize pointer\r
+to the first node in the linked list</LI>\r
+</UL>\r
+\r
+<UL>\r
+<LI>\r
+While ( 1 ) //Loop break would be done in the body of\r
+the loop</LI>\r
+\r
+<UL>\r
+<LI>\r
+If ( pointer_ll->byte = byte ) //If the byte to code\r
+and byte are equals</LI>\r
+\r
+<UL>\r
+<LI>\r
+Goto _byte_found //Execute the code which we have to</LI>\r
+</UL>\r
+\r
+<LI>\r
+Symbol_cumulative_probability += pointer_ll->byte //Add\r
+to the cump the count of the byte</LI>\r
+\r
+<LI>\r
+If ( pointer_ll->next = 0 ) //If this is the last element\r
+in the linked list</LI>\r
+\r
+<UL>\r
+<LI>\r
+Goto _byte_not_found //Execute the proper code</LI>\r
+</UL>\r
+</UL>\r
+</UL>\r
+\r
+<UL>\r
+<LI>\r
+_byte_not_found: //Because it was not found, code an\r
+escape code</LI>\r
+\r
+<LI>\r
+symbol_probability = defined_bytes //The count of the\r
+escape code is the number of different bytes</LI>\r
+\r
+<LI>\r
+symbol_cumulative_probability = maximum_cumulative_probability</LI>\r
+\r
+<LI>\r
+total_cumulative_probability = maximum_cumulative_probability + symbol_probability\r
+//The total count</LI>\r
+\r
+<LI>\r
+Arithmetic_encode ( symbol_probability , symbol_cumulative_probability\r
+, total_cumulative_probability )</LI>\r
+\r
+<LI>\r
+o2_ll_node = Pointer_ll Save pointer to last element\r
+in linked list for updating.</LI>\r
+\r
+<LI>\r
+Return</LI>\r
+</UL>\r
+\r
+<UL>\r
+<LI>\r
+_byte_found: //The byte was found in the linked list,\r
+thus we can code it</LI>\r
+\r
+<LI>\r
+symbol_probability = pointer_ll->freq //Get its count,\r
+cump already computed</LI>\r
+\r
+<LI>\r
+Arithmetic_encode ( symbol_probability , symbol_cumulative_probability\r
+, total_cumulative_probability )</LI>\r
+\r
+<LI>\r
+o2_ll_node = Pointer_ll //Save pointer to current element,\r
+when updating its count will be be incremented</LI>\r
+\r
+<LI>\r
+Return</LI>\r
+</UL>\r
+\r
+<P><BR><A NAME="Coding and decoding with linked lists.update"></A><P ALIGN="JUSTIFY">Updating\r
+the count of a byte with tables was done just by accessing its entry. For\r
+linked list we don't know its position beforehand, so we have to search\r
+it in the linked list, when found, update its count. But in fact we know\r
+its position: when we code the symbol we need to read the linked list,\r
+so if we keep a pointer to it, we'll not need to read the whole linked\r
+list again. As I've explained before, the linked list nodes are allocated\r
+in a big buffer, to avoid having to call the functions of memory allocation\r
+for every new node. When updating we have to check whatever the encoder\r
+emitted an escape code or a byte. One could think that we could check if\r
+the pointer of the node (o2_ll_node) is 0, in that case this is the last\r
+element and thus the end of the linked list was reached. But this is false.\r
+Imagine that we found the byte but it was in the last position, in that\r
+case the pointer is also 0. The solution is to check if the byte is the\r
+same, that unambiguously distinguishes between the two cases. In all the\r
+examples before we assumed that max_cump and defined_bytes were stored\r
+in the context, now we have to update them aswell. As long as we have coded\r
+the byte (or an escape) with the code above, the following pseudo code\r
+would update the model.\r
+<UL>\r
+<LI>\r
+If ( o2_ll_node->byte = byte ) //Check if byte\r
+was present</LI>\r
+\r
+<UL>\r
+<LI>\r
+o2_ll_node->freq ++ //Increment its probability. It was\r
+present</LI>\r
+\r
+<LI>\r
+Context->max_cump ++ //Increment maximum cumulative probability\r
+stored in this context</LI>\r
+\r
+<LI>\r
+If ( o2_ll_node->freq = 255 ) <FONT COLOR="#33CCFF"> </FONT>//Check\r
+if its the maximum that a byte can hold</LI>\r
+\r
+<UL>\r
+<LI>\r
+Renormalize ( ) <FONT COLOR="#33CCFF"> </FONT>//In that\r
+case renormalize current order. (in the example it's supposed to be order-2)</LI>\r
+</UL>\r
+\r
+<LI>\r
+Return</LI>\r
+</UL>\r
+\r
+<LI>\r
+Else //Byte was not present, we have to create a new\r
+node in the end of the linked list</LI>\r
+\r
+<UL>\r
+<LI>\r
+o2_ll_node->pointer = allocate_memory_in_buffer //In my implementation\r
+I use big buffers</LI>\r
+\r
+<LI>\r
+update_pointers_to_buffer ( ) //Update memory management\r
+variables</LI>\r
+\r
+<LI>\r
+new_location->byte = byte //'byte' is the new byte to\r
+put</LI>\r
+\r
+<LI>\r
+new_location->freq = 1 //As it is new, its count is 1</LI>\r
+\r
+<LI>\r
+new_location->pointer = 0 //Now this is the last element\r
+in the linked list</LI>\r
+\r
+<LI>\r
+Context->max_cump ++ //Increment maximum cumulative probability\r
+stored in this context</LI>\r
+\r
+<LI>\r
+Context->defined_bytes ++ //A new byte in this context</LI>\r
+\r
+<LI>\r
+Return</LI>\r
+</UL>\r
+</UL>\r
+<A NAME="Coding and decoding with linked lists.decoding"></A>Let's say\r
+we have again our example, in linked list form, and not in order (as in\r
+a real implementation it would).\r
+<TABLE BORDER=0 >\r
+<TR>\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Count </TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'e'</TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>to 'b'</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+\r
+<TD> </TD>\r
+\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Count </TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'b'</TD>\r
+\r
+<TD>3</TD>\r
+\r
+<TD>to 'a'</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+\r
+<TD> </TD>\r
+\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Count </TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'a'</TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>to 'd'</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+\r
+<TD> </TD>\r
+\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Count </TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'d'</TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>0</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+</TR>\r
+</TABLE>\r
+Such a linked list would produce a table like the following:\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Symbol: </TD>\r
+\r
+<TD>e </TD>\r
+\r
+<TD>b </TD>\r
+\r
+<TD>a</TD>\r
+\r
+<TD>d </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Count:</TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>3</TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>1</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Cumulative probability: </TD>\r
+\r
+<TD>0</TD>\r
+\r
+<TD>2 </TD>\r
+\r
+<TD>5 </TD>\r
+\r
+<TD>7 </TD>\r
+</TR>\r
+</TABLE>\r
+<P ALIGN="JUSTIFY">However as I've explained before we don't need to actually\r
+build this table, only as much as needed. Linked lists are a little bit\r
+more difficult to manage than tables for decoding, mainly in the search\r
+(when we have to get the previous symbol). Also another problem is in updating: \r
+we can't ensure that the pointer (in case of an escape code) points to\r
+the last element, because the decoding routine doesn't need to read the\r
+linked list to identify an escape code. The latter problem has no other\r
+solution that reading the whole linked list when updating and an escape\r
+code was emitted. However the second has two solutions. Remember that we\r
+have to read till the next byte of the one to decode? at first glance one\r
+could think that like we did with the tables we have to read till the next\r
+symbol. But then how do we get the previous? in that case we should have\r
+a variable which always has the last byte in the linked list. Fortunately\r
+there's a better solution. Why do why need to read the next element? because\r
+we need to know the upper bound of the cumulative probability. For tables\r
+the fastest (and thus it was used) way is reading the next element's cumulative\r
+probability. However for linked lists its better to sum to the current\r
+cumulative probability the count of the byte. That way we also have the\r
+upper bound. Then we can check if we have to stop. And the node pointer\r
+will point to the byte that we have to decode.\r
+<UL>\r
+<LI>\r
+Decoded_cumulative_probability = arithmetic_decode ( maximum_cumulative_probability\r
+) //Decode it</LI>\r
+\r
+<LI>\r
+Pointer_ll = first_element_in_ll //Initialize pointer\r
+to the first node in the linked list</LI>\r
+\r
+<LI>\r
+On_the_fly_cumulative_probability = 0 //First symbol\r
+has a cumulative probability of 0</LI>\r
+\r
+<LI>\r
+While ( 1 ) //Break condition in the body of the loop</LI>\r
+\r
+<UL>\r
+<LI>\r
+on_the_fly_cumulative_probability += Pointer_ll->freq //Cump,\r
+currently upper bound of current byte's range</LI>\r
+\r
+<LI>\r
+If ( Decoded_cumulative_probability < on_the_fly_cumulative_probability\r
+) //Check upper bound</LI>\r
+\r
+<UL>\r
+<LI>\r
+break; //We are sure that the encoded byte is the current\r
+one</LI>\r
+</UL>\r
+</UL>\r
+\r
+<LI>\r
+Decoded_byte = Pointer_ll->byte //Get current byte</LI>\r
+\r
+<LI>\r
+Symbol_prob = Pointer_ll->freq //The count, probability\r
+for updating the state</LI>\r
+\r
+<LI>\r
+Symbol_cumulative_probability = on_the_fly_cumulative_probability - Symbol_prob\r
+//Lower bound of range</LI>\r
+\r
+<LI>\r
+Arithmetic_decoding_update_state ( )</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">To avoid problems when using linked lists for cumulative\r
+probabilities, we don't allow any count to be 0. When renormalizing we\r
+check if it's 0, in this case we put the value of 1.\r
+<BR> \r
+<P><A NAME="Hash tables"></A><FONT SIZE=+2>Hash tables for contexts</FONT>\r
+<BR><P ALIGN="JUSTIFY">Hash tables are arrays used to store and find items.\r
+A hash table which allows direct addressing has one entry for every item.\r
+This is obviously very memory consuming if the number of items is big.\r
+In the cases where direct addressing is not possible, we use chaining for\r
+resolving collisions (collisions are two or more contexts sharing the same\r
+hash entry). The item of the hash table instead is a pointer to a linked\r
+list. This linked list contains the different contexts. The value of the\r
+context itself is stored for checking. Also we store a pointer to the linked\r
+list with the bytes and counts. One hash table is used for every order.\r
+Hash tables and linked list for contexts are only used for orders above\r
+2.\r
+<P><P ALIGN="JUSTIFY">Let's say we have a hash table for order-1 which\r
+has only two entries. And we have a hash function which maps the items\r
+in the following way: 'a','b','c' entry 1. 'd', 'e', 'f' entry 2. After\r
+having stored the contexts 'e','b','d','a','c' it would look like the following:\r
+<P>Hash table:\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Hash entry </TD>\r
+\r
+<TD>Value </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>1</TD>\r
+\r
+<TD>Pointer to linked list, element 'b' </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>2</TD>\r
+\r
+<TD>Pointer to linked list, element 'e' </TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<P>Big memory buffer for linked lists:\r
+<TABLE BORDER=0 >\r
+<TR>\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'e'</TD>\r
+\r
+<TD>to 'd'</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+\r
+<TD> </TD>\r
+\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'b'</TD>\r
+\r
+<TD>to 'a'</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+\r
+<TD> </TD>\r
+\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'d'</TD>\r
+\r
+<TD>0</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+\r
+<TD> </TD>\r
+\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'a'</TD>\r
+\r
+<TD>to 'c'</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+\r
+<TD> </TD>\r
+\r
+<TD>\r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Byte </TD>\r
+\r
+<TD>Pointer </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>'c'</TD>\r
+\r
+<TD>0</TD>\r
+</TR>\r
+</TABLE>\r
+</TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<P><A NAME="Hash tables.context structure"></A>The structure of a context\r
+node in a linked list is:\r
+<BR>struct context\r
+<BR>{\r
+<BR>struct context *next; //Next context in the linked\r
+list (corresponding to this hash entry)\r
+<BR>unsigned long order4321; //Order-4-3-2-1 (or order-3-2-1\r
+for order-3), that's the value of the context\r
+<BR>struct _byte_and_freq *prob; //Pointer to linked\r
+lists containing bytes and counts of this context\r
+<BR>unsigned int max_cump; //Maximum cumulative probability\r
+in this context\r
+<BR>unsigned int defined_bytes; //The number of bytes\r
+in this context\r
+<BR>};\r
+<P><P ALIGN="JUSTIFY">Note that we could reduce the space needed by using\r
+a byte (unsigned char) instead of a double word (unsigned int or unsigned\r
+long, 32 bits) for the\r
+number of bytes in the context, however then we should add 1 every time\r
+we read this value (to make 0 represent 1 instead, and so on). I decided\r
+to not make the code more difficult to read. Thus this is not implemented\r
+in my ppmc implementation. See the section of<A HREF="#Fast order-3 hashed.hash explanation">\r
+fast order-3 hashed</A> for another example.\r
+<P><P ALIGN="JUSTIFY">The code for managing the linked lists with contexts\r
+is the same as the code for the counts and bytes. For contexts, the value\r
+that makes every element different is "order4321" instead of "byte". When\r
+we want to read the counts of a context, we have to read the pointer in\r
+its hash entry. If this entry is empty, the context was not initialized\r
+yet, so we fallback to the next lower order without emitting escape code.\r
+Then use this pointer to search current context in the linked list of contexts,\r
+if it's not present we also have to fallback without escape code. In case\r
+we found it, we'll read the pointer to the linked list with the bytes and\r
+counts. Because we already know the maximum cumulative probability, and\r
+the number of defined bytes, we can code the byte as explained before.\r
+But let's have a look at the implementation.\r
+<P><P ALIGN="JUSTIFY">The first function called by both encoding and decoding\r
+functions is "ppmc_get_totf_order3". Thus this is the function which is\r
+going to read the linked list with contexts and ensure that the current\r
+one is present. The first thing is to do the hash key. After that we need\r
+to have the context in a variable, in that case because it's order-3, an\r
+unsigned long is ok, if we want higher orders, say 6, we would have order-4-3-2-1\r
+in an unsigned long, and the rest, order-6-5 in another one. We'll use\r
+this variable for comparing while searching for the context.\r
+<UL>\r
+<LI>\r
+o3_cntxt=ppmc_order3_hash_key(o1_byte,o2_byte,o3_byte);</LI>\r
+\r
+<LI>\r
+full_o3_cntxt=(o1_byte)+(o2_byte<<8)+(o3_byte<<16); \r
+//order-3</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">The first thing to check is that current entry in the\r
+hash table for this order is initialized. The table is initialized to 0,\r
+so if current entry is 0, it's not initialized, otherwise it would have\r
+a pointer. If it isn't we can be sure that there's no context in this entry,\r
+so we have to abort. "ppmc_get_totf_order3" unlike functions for lower\r
+orders returns a value. I decided to make it return a 0 in case that the\r
+hash entry was not initialized or the context not present, and a 1 in case\r
+it is.\r
+<UL>\r
+<LI>\r
+ if(order3_hasht[o3_cntxt]==0) //if 0 the entry is not initialized</LI>\r
+\r
+<UL>\r
+<LI>\r
+o3_context=0; //no hash entry</LI>\r
+\r
+<LI>\r
+return 0; //hash entry not initialized</LI>\r
+</UL>\r
+</UL>\r
+<P ALIGN="JUSTIFY">Now we get the pointer from the entry to the linked\r
+list with contexts and store it in "cntxt_node". And now we have to read\r
+all the linked list till we find current context, if we reach the end before\r
+we find it, the context is not present and thus we have to return a 0,\r
+so the encoding or decoding functions know that they can't code in this\r
+order.\r
+<UL>\r
+<LI>\r
+cntxt_node=order3_hasht[o3_cntxt];</LI>\r
+\r
+<LI>\r
+while(1)</LI>\r
+\r
+<UL>\r
+<LI>\r
+if(cntxt_node->order4321==full_o3_cntxt) //compare\r
+context</LI>\r
+\r
+<UL>\r
+<LI>\r
+goto ppmc_gtf_cntxt_found;</LI>\r
+</UL>\r
+\r
+<LI>\r
+if(cntxt_node->next==0) //end of context's\r
+linked list</LI>\r
+\r
+<UL>\r
+<LI>\r
+break;</LI>\r
+</UL>\r
+\r
+<LI>\r
+cntxt_node=cntxt_node->next; //next element</LI>\r
+</UL>\r
+\r
+<LI>\r
+// Once there the context was not found</LI>\r
+\r
+<LI>\r
+o3_context=cntxt_node; //pointer to last element in the\r
+linked list</LI>\r
+\r
+<LI>\r
+return 0; //it was not present</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">Once we know that the context is present we have to\r
+return the total cumulative probability. Note that this is different for\r
+<A HREF="#Full exclusion">full exclusions</A>.\r
+<UL>\r
+<LI>\r
+// The context is found, so return pointer and cump</LI>\r
+\r
+<LI>\r
+ppmc_gtf_cntxt_found:</LI>\r
+\r
+<LI>\r
+o3_context=cntxt_node;</LI>\r
+\r
+<LI>\r
+// Total cump is current total cump plus the escape for the escape code</LI>\r
+\r
+<LI>\r
+total_cump=o3_context->defined_bytes+o3_context->max_cump;</LI>\r
+\r
+<LI>\r
+return 1; //context found</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">Now encoding and decoding functions would be able to\r
+do their job. After that we have to update the count of current byte. Now\r
+we rely on the fact that "o3_cntxt" and "full_o3_cntxt" are initialized.\r
+While updating four different cases can happen:\r
+<OL>\r
+<LI>\r
+Hash entry was not initialized.</LI>\r
+\r
+<LI>\r
+Symbol was coded under current context.</LI>\r
+\r
+<LI>\r
+An escape code was emitted.</LI>\r
+\r
+<LI>\r
+Current context was not present in the linked list.</LI>\r
+</OL>\r
+<A NAME="Hash tables.new"></A><P ALIGN="JUSTIFY">In the first case we have\r
+to create a new node in the linked list of context in the current hash\r
+entry. The way we allocate memory is explained in the section about <A HREF="#Memory requeriments and management">memory\r
+management</A>. Then we have to initialize this node, and also put pointer\r
+to this new node in the hash table. After that we have to put a new symbol\r
+in the linked list with probabilities, see the section about <A HREF="#Coding and decoding with linked lists.update">linked\r
+lists</A> for that matter. The code is like the following:\r
+<UL>\r
+<LI>\r
+order3_hasht[o3_cntxt]=allocate_memory_in_buffer;</LI>\r
+\r
+<LI>\r
+_context_pool->next=0; //this is the\r
+last element</LI>\r
+\r
+<LI>\r
+_context_pool->order4321=full_o3_cntxt; //put\r
+context</LI>\r
+\r
+<LI>\r
+_context_pool->prob=allocate_memory_in_buffer; \r
+//pointer to linked list</LI>\r
+\r
+<LI>\r
+_context_pool->max_cump=1;</LI>\r
+\r
+<LI>\r
+_context_pool->defined_bytes=1;</LI>\r
+\r
+<LI>\r
+//Put new byte and count in the linked list</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">If the byte was coded under current context we just\r
+have to update its count. We can assume that "o3_ll_node" and "o3_context"\r
+are initialized, so the code becomes as easy as that:\r
+<UL>\r
+<LI>\r
+o3_ll_node->freq++; //increment its frequency</LI>\r
+\r
+<LI>\r
+o3_context->max_cump++; //total cump</LI>\r
+\r
+<LI>\r
+if(o3_ll_node->freq==255) //do we have to renormalize?</LI>\r
+\r
+<UL>\r
+<LI>\r
+ppmc_renormalize_order3(); //renormalize</LI>\r
+</UL>\r
+</UL>\r
+<P ALIGN="JUSTIFY">Once the two first cases have been discarded, it may\r
+happen that an escape occurred or that In the third case an escape code\r
+has to be coded because current context was not present in the linked list\r
+of context. Because the case of an uninitialized hash entry was already\r
+discarded we can be sure that there's at least one context in the linked\r
+list of current hash entry for contexts. The way to know which one of those\r
+cases is to check the context in the pointer, if it's the same as the current\r
+one we can be sure that an escape was coded, because in that case the context\r
+existed, and we already discarded the case that the byte was coded under\r
+this context.\r
+<UL>\r
+<LI>\r
+if(o3_context->order4321==full_o3_cntxt) //check if that's the last</LI>\r
+\r
+<UL>\r
+<LI>\r
+do the updating for the case of an escape code</LI>\r
+</UL>\r
+\r
+<LI>\r
+else</LI>\r
+\r
+<UL>\r
+<LI>\r
+do the updating for the case that current context didn't exist</LI>\r
+</UL>\r
+</UL>\r
+<P ALIGN="JUSTIFY">Otherwise we know that it's pointing to the last element\r
+in the linked list of contexts, because when looking for current context,\r
+it wasn't found so it reached the end of the linked list, and also updated\r
+the variable "o3_context". But however we know that the context was present\r
+(because there was at least 1 symbol). Thus the only thing to do is to\r
+create a new node in the linked list for bytes and probabilities as shown\r
+in the section of <A HREF="#Coding and decoding with linked lists.update">linked\r
+lists</A>.\r
+<P><P ALIGN="JUSTIFY">In the fourth case the context was not present in\r
+the linked list corresponding to this hash entry. What we do is reading\r
+till the end of the linked list for context (though I think this is not\r
+really needed) and then put a new node as explained <A HREF="#Hash tables.new">above</A>,\r
+with the only difference that the pointer is stored in the last element\r
+in the linked list instead in the hash entry of the current order.\r
+<BR> \r
+<P><A NAME="Global variables"></A><FONT SIZE=+2>Global variables</FONT>\r
+<BR><P ALIGN="JUSTIFY">In some cases (unbounded length contexts) we need\r
+to have the whole file loaded in a buffer. However in this implementation\r
+the maximum context that we need is order-4 (but we could very easily extent\r
+this to order-8, at the cost of speed and memory, though). Thus we store\r
+the contexts in another way. Because C has some restrictions, we store\r
+the contexts in unsigned chars. One unsigned char for every byte, order-1\r
+is stored in o1_byte, order-2 is stored in o2_byte and so on. Thus is we\r
+have the context "ther", o1_byte = 'r', o2_byte = 'e', o3_byte =\r
+'h', o4_byte = 't'. If the next byte to code is an 'e', after coding it,\r
+the context would be: o1_byte = 'e', o2_byte = 'r', o3_byte = 'e',\r
+o4_byte = 'h'. In assembly language we could store the whole order-4 in\r
+a double word (32 bits), and then just read a byte for order-1 and so on.\r
+<P><P ALIGN="JUSTIFY">In high orders (3,4) we need to have the hash entry\r
+and also the full context so we can compare it when finding the context\r
+in the linked list. In the case of order-3 we have "o3_cntxt" which contains\r
+the hash key used to index in the order-3 hash table. And "full_o3_cntxt"\r
+which contains all the order-3. In the previous example "o3_cntxt" depends\r
+on the hash function, and "full_o3_cntxt" = "ere". For order-4 only the\r
+number changes, that is "o4_cntxt" instead of "o3_cntxt".\r
+<P><P ALIGN="JUSTIFY">For making the code easier to read, the arithmetic\r
+coding routines (range coder) are called with the same variables. They\r
+are: total_cump with the total cumulative probability, symb_cump symbol\r
+cumulative probability, and symb_prob the probability (count) of the symbol.\r
+<P><P ALIGN="JUSTIFY">Ppmc uses update exclusion. That is we only update\r
+in the order where the byte was coded and higher ones. In my implementation\r
+I use a variable called "coded_in_order" whose value is where the byte\r
+was coded. If the byte was coded under order 3 the value of "coded_in_order"\r
+will be 3. If it was coded in order-(-1) its value will be 0 though. It\r
+has a 0 value to update the right orders. The code in C used for updating\r
+is this:\r
+<P>switch(coded_in_order)\r
+<BR> {\r
+<BR> case 0: ppmc_update_order0();\r
+<BR> case 1: ppmc_update_order1();\r
+<BR> case 2: ppmc_update_order2();\r
+<BR> case 3: ppmc_update_order3();\r
+<BR> case 4: ppmc_update_order4();\r
+<BR> default: break;\r
+<BR> };\r
+<P><P ALIGN="JUSTIFY">Which means that if the value of coded_in_order\r
+is 0, all the functions for updating will be called. And if it's 3 then\r
+only the functions for order-3 and order-4 will be called.\r
+<P><P ALIGN="JUSTIFY">For memory usage I use "ppmc_out_of_memory". If it's\r
+0 then there's still memory available. If it's 1 there's not. Every function\r
+which needs to allocate memory aborts in the case its value is 1. Once\r
+we have a problem allocating memory, this variable is set to 1. At the\r
+end of the loop we check this variable, and in case it's 0 we flush the\r
+model. This is explained in the section of <A HREF="#Memory requeriments and management.flushing">memory\r
+management</A>.\r
+<P><P ALIGN="JUSTIFY">Before starting to model and code the input we have\r
+to reinitialize the model. The main job is to allocate the memory needed\r
+for the hash tables and linked lists. Moreover tables from low orders,\r
+like order-0 and order-1 must be cleared, that is all the counts must have\r
+a value of 0. The hash tables of high orders must be cleared too, that\r
+is we must put a value of 0 in the offsets stored. Apart from that we\r
+have to care about initialising the context variables by reading from the\r
+input, updating the variables and writing directly to the output. Once\r
+this is done we can start with the main loop. Have a look at the source\r
+if you have any question about these steps.\r
+<P><P ALIGN="JUSTIFY">Now we'll see how to implement every order, don't\r
+try doing order-0 until order-(-1) doesn't work, and so on. But before\r
+going in that matter we'll see which functions uses every order and what\r
+they do.\r
+<P><P ALIGN="JUSTIFY">Note that there are some variables different for\r
+every order which are pointers to linked list of contexts or bytes, and\r
+which are initialized in a function, and then in another function are used.\r
+One example is a pointer to the linked list with bytes of the order-2 which\r
+is initialized\r
+(to whatever it has to be) in the function of encoding,\r
+and which is used later while updating. So it's not a good idea to modify\r
+these kind of variables between those functions.\r
+<BR> \r
+<P><A NAME="Functions for every order"></A><FONT SIZE=+2>Functions nomenclature\r
+and use</FONT>\r
+<BR><P ALIGN="JUSTIFY">Every order uses different functions to manage their\r
+data structures. For increasing readability of the code they follow some\r
+guidelines, all functions that do the same have the same name (the only\r
+difference is in the number of the order). They do the same things, in\r
+the sense that they are called in the same way and achieve the same goals,\r
+though most of them use different code.\r
+<P><P ALIGN="JUSTIFY">When decoding we need to know the maximum cumulative\r
+probability for being able to decode the input, so in this case, and also\r
+while encoding, the first function called is "ppmc_get_totf_order2". Which\r
+stands for: get the total frequency, in fact frequency and probability\r
+are different things, as I explained before, unfortunately I perpetuated\r
+this error, but I suggest you to not do so. However the term maximum cumulative\r
+is already used for referring to the sum of the counts of all the bytes\r
+so not using this term for different things can help to avoid misunderstandings.\r
+The function "ppmc_get_totf_order2" does nothing else than returning in\r
+"total_cump" the maximum cumulative probability of the current context.\r
+As explained in the section about the escape method this can be quickly\r
+done by adding the number of defined bytes in the context (the count of\r
+the escape code) and the maximum cumulative probability. Also remember\r
+that "total_cump", "symb_cump" and "symb_prob" are used as the parameters\r
+for arithmetic coding. (in the case of my source a range coder)\r
+<P><P ALIGN="JUSTIFY">The function "ppmc_code_byte_order2" (the number\r
+'2' should be changed if we want the function for another order) codes\r
+the byte in the current symbol if it's present, in that case it returns\r
+a 1. It may happen that under this order, the current context is not present,\r
+in that case it doesn't code anything, and returns a 0. It could happen\r
+too that in current order the context was present but the byte we have\r
+to code wasn't, so an escape code must be coded and the function will return\r
+a 0. From this you can see that the main loop of the compressor just have\r
+to call the encoding routine for highest order, and call the next lower\r
+one in case it returns a 0, or stop when it returns a 1.\r
+<P><P ALIGN="JUSTIFY">Another job of the ppmc model is to update the count\r
+of the current symbol, this is done by calling "ppmc_update_order2". This\r
+function returns nothing, and the main loop doesn't have to check whatever\r
+it worked or not. At least not till the updating is done. In higher orders\r
+the decoder uses a different function for updating "ppmc_update_dec_order2", \r
+because some care should be taken with the linked lists.\r
+<P><P ALIGN="JUSTIFY">When the count of a byte gets too big for the date\r
+type used for storing it, renormalization (halving counts) must be done.\r
+The function "ppmc_renormalize_order2" does this. It's called from the\r
+routine of updating.\r
+<P><P ALIGN="JUSTIFY">For decompression, "ppmc_decode_order2" is called.\r
+It just returns "-1" if the byte was not decoded (because current context\r
+didn't exist or it was an escape code) otherwise it returns the byte in\r
+the variable "byte". The main loop of the decoder would check if the returned\r
+value is "-1" in that case it would try to decode in a lower order.\r
+<BR> \r
+<P><A NAME="The main"></A><FONT SIZE=+2>The main part</FONT>\r
+<BR><P ALIGN="JUSTIFY">Every order uses its own functions and the only\r
+work of the main part is calling them, this makes the code easier to read\r
+and to expand. The first thing is writing to the output file the length\r
+of the input file, that way the decoder will know when to stop. After that\r
+we start initializing the model and coder:\r
+<UL>\r
+<LI>\r
+ppmc_alloc_memory();</LI>\r
+\r
+<LI>\r
+ppmc_initialize_contexts();</LI>\r
+\r
+<LI>\r
+ppmc_encoder_initialize();</LI>\r
+\r
+<LI>\r
+range_coder_init(&rc_coder,file_output);</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">Once this is done we start the main loop, we read a\r
+byte from the input till there are no more bytes. Then we try to code in\r
+the highest order and if failed, in a lower one. In the worst case we'll\r
+code the byte in order-1. After that we have to do the updating, keeping\r
+in mind that we do update exclusion, that is we only update the same order\r
+as the byte was coded in, and higher ones. Then we update the context variables\r
+and check for memory problems. The code is like the following:\r
+<BR> \r
+<LI>\r
+while((byte=fgetc(file_input))!=EOF)</LI>\r
+\r
+<UL>\r
+<LI>\r
+if(ppmc_code_byte_order4()==0) //If it doesn't work keep\r
+trying</LI>\r
+\r
+<UL>\r
+<LI>\r
+if(ppmc_code_byte_order3()==0)</LI>\r
+\r
+<UL>\r
+<LI>\r
+if(ppmc_code_byte_order2()==0)</LI>\r
+\r
+<UL>\r
+<LI>\r
+if(ppmc_code_byte_order1()==0)</LI>\r
+\r
+<UL>\r
+<LI>\r
+if(ppmc_code_byte_order0()==0)</LI>\r
+\r
+<UL>\r
+<LI>\r
+ppmc_get_prob_ordern1();</LI>\r
+\r
+<LI>\r
+range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);</LI>\r
+\r
+<LI>\r
+coded_in_order=0;</LI>\r
+</UL>\r
+</UL>\r
+</UL>\r
+</UL>\r
+</UL>\r
+</UL>\r
+\r
+<UL>\r
+<LI>\r
+switch(coded_in_order)</LI>\r
+\r
+<UL>\r
+<LI>\r
+case 0: ppmc_update_order0(); //update only order-0</LI>\r
+\r
+<LI>\r
+case 1: ppmc_update_order1(); //update order-0 and order-1</LI>\r
+\r
+<LI>\r
+case 2: ppmc_update_order2(); //update order-2 1 and 0...</LI>\r
+\r
+<LI>\r
+case 3: ppmc_update_order3();</LI>\r
+\r
+<LI>\r
+case 4: ppmc_update_order4();</LI>\r
+\r
+<LI>\r
+default: break;</LI>\r
+</UL>\r
+</UL>\r
+\r
+<UL>\r
+<LI>\r
+o4_byte=o3_byte;</LI>\r
+\r
+<LI>\r
+o3_byte=o2_byte;</LI>\r
+\r
+<LI>\r
+o2_byte=o1_byte;</LI>\r
+\r
+<LI>\r
+o1_byte=byte; //current one is next time order-1</LI>\r
+</UL>\r
+\r
+<UL>\r
+<LI>\r
+if(ppmc_out_of_memory==1)</LI>\r
+\r
+<UL>\r
+<LI>\r
+ppmc_flush_mem_enc();</LI>\r
+</UL>\r
+</UL>\r
+<A NAME="Order-(-1)"></A><FONT SIZE=+2>Order-(-1)</FONT>\r
+<BR><P ALIGN="JUSTIFY">The order-(-1) in the code is called "ordern1" meaning\r
+negative one. As we saw in the article about <A HREF="#r.[6]">finite context\r
+modeling</A> order-(-1) is the lowest order whose only job is to be able\r
+to code all the symbols that may appear in the input. Order-1 is not used\r
+for achieving compression, so it's never updated. Because it must give a\r
+non-zero count to all the symbols but it's not updated, we use a flat probability\r
+distribution. That is every symbol has the same probability. For matters\r
+of the implementation every symbol has a 1 count. The alphabet has all\r
+the bytes, from 0-255 in the same positions, and an special code, the 256\r
+which is used for flushing the model when running out of memory. Thus we\r
+have that the probability of every symbol in the order-(-1) table is 1/257.\r
+<P><P ALIGN="JUSTIFY">At first we could think about having a table with\r
+the counts and cumulative probabilities of every symbol, but having a look\r
+at those values we realize that this is not needed.\r
+<BR> \r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Symbol: </TD>\r
+\r
+<TD>0 </TD>\r
+\r
+<TD>1 </TD>\r
+\r
+<TD>2 </TD>\r
+\r
+<TD>3 </TD>\r
+\r
+<TD>n </TD>\r
+\r
+<TD>256</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Count: </TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>1</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Cumulative probability: </TD>\r
+\r
+<TD>0</TD>\r
+\r
+<TD>1</TD>\r
+\r
+<TD>2</TD>\r
+\r
+<TD>3 </TD>\r
+\r
+<TD>n</TD>\r
+\r
+<TD>256</TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<P><P ALIGN="JUSTIFY">We see that the cumulative probability of a given\r
+byte is the value of the byte itself. We can take profit of this fact and\r
+avoid using any table. So the first function to implement "ppmc_get_prob_ordern1"\r
+is quite easy, it just has to assign a 257 for the maximum cumulative probability\r
+(no need of escape code), 1 to the probability, and the value of the byte\r
+to the cumulative probability.\r
+<P><P ALIGN="JUSTIFY">The function for returning the byte of a given cumulative\r
+probability is very easy too, the byte is the value of the cumulative probability.\r
+<P><P ALIGN="JUSTIFY">Because this order has the only job of being able\r
+to code all symbols, it's not update, so we don't need to care about that\r
+matter.\r
+<BR> \r
+<P><A NAME="Order-0"></A><FONT SIZE=+2>Order-0 and order-1</FONT>\r
+<BR><P ALIGN="JUSTIFY">Order-0 and order-1 are not memory hungry, so we\r
+can store them in tables. If you have already implemented any arithmetic\r
+coder, you surely know how does order-0 works. Given this, order-1 is only\r
+a little modification: in order-0 we keep the probabilities in this table\r
+"order0_array[]" and access it directly (that is the probability for "byte"\r
+is in "order0_array[byte]"), order-1 uses instead "order1_array[o1_byte][byte]".\r
+The only difference from an usual implementation is the escape code. Now\r
+it's only a matter of implementing it having in mind the differences pointed\r
+in the section about <A HREF="#Cumulative probabilities">Cumulative probabilities</A>.\r
+<BR> \r
+<P><A NAME="Order-2"></A><FONT SIZE=+2>Order-2</FONT>\r
+<BR><P ALIGN="JUSTIFY">Order-2 needs more memory than order-1, the probabilities\r
+can't be stored on tables, because it would take too much memory. Thus\r
+the probabilities are stored in linked lists. Order-2 has 2^16 = 65536\r
+contexts, we can afford to put the contexts in a table. Like a hash table\r
+with direct addressing, obviously the hash key is just the order-2, for\r
+the context "there" the hash key would be "re", thus we would find the\r
+context in "order2_array['re']". Due to the fact that in every entry we\r
+only have one context, we don't need to store the context nor use linked\r
+lists for contexts, so the data structure used is similar to the <A HREF="#Hash tables.context structure">one</A>\r
+used for higher orders, but with these differences.\r
+<P>struct o2_context\r
+<BR>{\r
+<BR>struct _byte_and_freq *prob; //Pointer to linked\r
+lists containing bytes and counts of this context\r
+<BR>unsigned int max_cump; //Maximum cumulative probability\r
+in this context\r
+<BR>unsigned int defined_bytes; //The number of bytes\r
+in this context\r
+<BR>};\r
+<P>Putting all together we have that the maximum cumulative probability\r
+of order-2 with a context like "which" is in:\r
+<LI>\r
+order2_array['re'].max_cump.</LI>\r
+\r
+<BR>Once this is clear we already know how to encode a byte, first we have\r
+to make the hash key.\r
+<LI>\r
+o2_cntxt = ppmc_order2_hash_key(o1_byte,o2_byte);</LI>\r
+\r
+<BR><P ALIGN="JUSTIFY">The hash key as already explained is very simple,\r
+and the only thing it does is putting two bytes, the order-1 and order-2\r
+bytes, in a variable, which will be used to access the array for order-2.\r
+<BR>#define ppmc_order2_hash_key(k,j) ((k)+(j<<8))\r
+<P><P ALIGN="JUSTIFY">Now it could happen that the context hasn't been\r
+created yet, so we have to check it. As explained before once a byte in\r
+a linked list in created, it's never deleted, so just by checking the number\r
+of defined bytes we can see if the context has been already created. If\r
+the count is 0 it hasn't, so we have to return without coding anything.\r
+<LI>\r
+if(order2_array[o2_cntxt].defined_bytes == 0) //it's empty</LI>\r
+\r
+<LI>\r
+return 0; //byte not coded, nothing done</LI>\r
+\r
+<BR><P ALIGN="JUSTIFY">Now we use the code explained in the section of\r
+<A HREF="#Coding and decoding with linked lists.coding">Coding and decoding\r
+with linked lists</A> and code an escape code or the byte depending\r
+on if the byte wasn't present or it was.\r
+<P><P ALIGN="JUSTIFY">The next function to implement is "ppmc_update_order2",\r
+because we saved in "o2_ll_node" the pointer to the linked list, we don't\r
+need to read the linked list again, and thus we can update the order as\r
+explained in <A HREF="#Coding and decoding with linked lists.update">Coding\r
+and decoding with linked lists</A>.\r
+<P><P ALIGN="JUSTIFY">Decoding is not much more difficult just follow the\r
+explanations at\r
+<A HREF="#Coding and decoding with linked lists.decoding">Coding\r
+and decoding with linked lists</A>, and remember to check first if context\r
+exists. The function for updating while decoding can't rely on the fact\r
+that "o2_ll_node" points to the last element, so it has to read all the\r
+linked lists, till the end and put new nodes there. This happens when the\r
+byte was not coded under order-2. Check the code if you have any question.\r
+<BR> \r
+<P><A NAME="Order-3"></A><FONT SIZE=+2>Order-3 and order-4</FONT>\r
+<BR><P ALIGN="JUSTIFY">We can't use a table for contexts in order-3 because\r
+it would be too big, it would require 16,777,216 entries and every entry\r
+would take around 12 bytes. We can't afford that, so instead we use linked\r
+lists for storing contexts. Read <A HREF="#Hash tables">Hash tables for\r
+contexts</A>.\r
+<P><P ALIGN="JUSTIFY">While encoding or decoding the first function called\r
+is "ppmc_get_totf_order3", this one is responsible for checking if current\r
+context is present, and also updates the variable "o3_context" (or\r
+"o4_context" for order-4), it contains a pointer to the entry, and thus\r
+the functions now are as in order-2 but instead of "order2_array[o2_cntxt]"\r
+we use "o3_context". Apart from some special care while updating, all\r
+the functions are the same, so you only have to change that and you already\r
+have order-3 working. Both order-3 and order-4 are managed in the same\r
+way. So do one, and copy the code for the next one.\r
+<BR> \r
+<P><A NAME="Memory requeriments and management"></A><FONT SIZE=+2>Memory\r
+requirements and management</FONT>\r
+<BR>The basic data structures are the following:\r
+<BR> \r
+<TABLE BORDER CELLSPACING=3 CELLPADDING=3 >\r
+<TR>\r
+<TD VALIGN=TOP>struct _byte_and_freq{\r
+<BR>unsigned char byte;\r
+<BR>unsigned char freq;\r
+<BR>struct _byte_and_freq *next; };</TD>\r
+\r
+<TD VALIGN=TOP>struct context{\r
+<BR>struct context *next;\r
+<BR>unsigned long order4321;\r
+<BR>struct _byte_and_freq *prob; \r
+<BR>unsigned int max_cump;\r
+<BR>unsigned int defined_bytes; };</TD>\r
+\r
+<TD VALIGN=TOP>struct context_o2{\r
+<BR>struct _byte_and_freq *prob;\r
+<BR>unsigned int max_cump;\r
+<BR>unsigned int defined_bytes; };</TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<P>Thus they take:\r
+<BR> \r
+<TABLE BORDER CELLSPACING=3 CELLPADDING=3 >\r
+<TR>\r
+<TD>Structure </TD>\r
+\r
+<TD>Unaligned </TD>\r
+\r
+<TD>Aligned </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>_byte_and_freq</TD>\r
+\r
+<TD>6 bytes</TD>\r
+\r
+<TD>8 bytes</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>context</TD>\r
+\r
+<TD>20 bytes</TD>\r
+\r
+<TD>20 bytes</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>context_o2</TD>\r
+\r
+<TD>12 bytes</TD>\r
+\r
+<TD>12 bytes</TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<P><P ALIGN="JUSTIFY">The static memory of the program, memory which is\r
+not allocated while the program runs takes is the showed in the table.\r
+This is the memory that the programs just need to start.\r
+<BR> \r
+<TABLE BORDER CELLSPACING=3 CELLPADDING=3 >\r
+<TR>\r
+<TD>Order-0 </TD>\r
+\r
+<TD>256 </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Order-1 </TD>\r
+\r
+<TD>66048 </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Order-2 </TD>\r
+\r
+<TD>786432</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Order-3 </TD>\r
+\r
+<TD>262144</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Order-4 </TD>\r
+\r
+<TD>262144 </TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Bytes pool </TD>\r
+\r
+<TD>1000000</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Context pool </TD>\r
+\r
+<TD>1000000</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>Total</TD>\r
+\r
+<TD>3377024 </TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<P><P ALIGN="JUSTIFY">Order-(-1) doesn't need any array. Order-0 uses an\r
+array with the count (8 bits) for every byte. Order-1 uses the same table\r
+but 256 times, thus 256*256, moreover we need two arrays for every byte\r
+(256 entries thus) for "defined_bytes" and "max_cump". Order-2 has a hash\r
+table with direct addressing, because we'll address two bytes (order-2\r
+context) we need 256*256=65,536 entries. Every entry uses the data structure\r
+context_o2, thus it takes 65536*12=786,432 bytes. Order-3 instead has a\r
+hash table with pointers to contexts, it has 65,536 entries and is a pointer\r
+which (in my implementation) takes 4 bytes, 65536*4=262,144. Order-4 uses\r
+exactly the same. The pool for bytes and contexts take 2,000,000 bytes,\r
+we'll explain now why.\r
+<P><P ALIGN="JUSTIFY">We have to allocate memory for every new element\r
+in the linked list. It wouldn't be a good idea to call the functions for\r
+doing so every time. Also there's another fact that we should be aware of\r
+while thinking about how to allocate memory, an element in the linked list\r
+is never deleted. Due to these facts the solution that I chose for my\r
+implementation was having a big buffer, and put there new nodes there.\r
+Keeping track of the next position and the last one with pointers. Once\r
+this buffer is filled we have to allocate a new one. Given this solution,\r
+there's still two parameters to choose, the length of the buffer, and the\r
+maximum number of buffers to allocate. I decided to give them a value of\r
+1,000,000 and 100 respectively. We can choose to have another value, the\r
+length of new buffers, that way you could allocate a very bug buffer at\r
+the start (suited to your needs) and allocating less memory for the new\r
+buffers, expecting to not need a lot. This could save some memory, but\r
+should be tuned. Though in my implementation I have both definitions, they\r
+have the same value.\r
+<P><P ALIGN="JUSTIFY">Again having code clarity in mind I chose to have\r
+two buffers, one for the linked lists containing bytes and probabilities,\r
+and another for the contexts. The code is the same and only the nomenclature\r
+(names assigned) for the variables is different, thus instead of "_bytes_pool_elements"\r
+there's "_context_pool_elements". The variables used for this code are\r
+the following ones:\r
+<UL>\r
+<LI>\r
+struct context *_context_pool, this is a pointer to the current buffer,\r
+exactly in the position where is pointing we can allocate a new element.</LI>\r
+\r
+<LI>\r
+struct context *_context_pool_max, this pointer is used to detect the end\r
+of current buffer.</LI>\r
+\r
+<LI>\r
+unsigned long _context_pool_index, this is a value which says which buffer\r
+are we using.</LI>\r
+\r
+<LI>\r
+struct context *_context_pool_array[_mempool_max_index], this is the table\r
+were we save the pointers to the different buffers, we need to keep it\r
+for freeing them.</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">The defined values follow too. We get the length of\r
+any buffer by multiplying it by its size, so the size of a buffer for contexts\r
+is 50,000*20=1,000,000. I decided to make every buffer one mega long, it\r
+seemed like a reasonable value, because it was big enough to not need to\r
+allocate another buffer soon, and that it was not so big that it couldn't\r
+be allocated it today's computers.\r
+<UL>\r
+<LI>\r
+_context_pool_elements (50,000), this is the number of entries which are\r
+allocated for the first buffer (initialized at the very start of the program).</LI>\r
+\r
+<LI>\r
+_context_pool_elements_inc (50,000), this is the number of entries which\r
+would be allocated for the rest of the buffers.</LI>\r
+\r
+<LI>\r
+_bytes_pool_elements (125,000), the same but for the linked list which\r
+holds bytes and probabilities.</LI>\r
+\r
+<LI>\r
+_bytes_pool_elements_inc (125,000), the same but for the linked list which\r
+holds bytes and probabilities.</LI>\r
+\r
+<LI>\r
+_mempool_max_index (1,000), this is the maximum number of buffers, of every\r
+kind, which we can allocate. In my code I choose to use the same value for\r
+both bytes and contexts' tables, but they could be further tuned, though\r
+it's of little importance as long as this number is big enough to not run\r
+out of buffers.</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">If we allocate a new element the steps to follow are\r
+clear: first let the routine use the pointer at "_context_pool" for the\r
+new entry, second increment the pointer and then check that this is not\r
+the last entry, if its we should allocate another buffer. We can do another\r
+paranoid check, that is checking that we still have any entry free in the\r
+table where we store the pointers to the big buffers. We could avoid it\r
+by being sure that we have enough buffers (which probably is the case).\r
+<UL>\r
+<LI>\r
+Use the pointer as needed</LI>\r
+\r
+<LI>\r
+++_context_pool; //increment pointer, next position in\r
+the big buffer</LI>\r
+\r
+<LI>\r
+if(_context_pool==_context_pool_max) //if they are equal,\r
+we reached the maximum</LI>\r
+\r
+<UL>\r
+<LI>\r
+if(_context_pool_index==_mempool_max_index) // First\r
+check to see that we still have entries in the array</LI>\r
+\r
+<UL>\r
+<LI>\r
+ppmc_out_of_memory=1; //At the end of the loop in the\r
+main part, flush</LI>\r
+\r
+<LI>\r
+return; //In case we were updating we can't follow, there\r
+are no entries left</LI>\r
+</UL>\r
+\r
+<LI>\r
+// Allocate memory for new buffer, and return if it's not possible</LI>\r
+\r
+<LI>\r
+if((_context_pool=(struct context *)malloc(sizeof(struct context)*_context_pool_elements_inc))==NULL)</LI>\r
+\r
+<UL>\r
+<LI>\r
+ppmc_out_of_memory=1; //At the end of the loop in the\r
+main part, flush</LI>\r
+\r
+<LI>\r
+return;</LI>\r
+</UL>\r
+\r
+<LI>\r
+_context_pool_array[_context_pool_index]=_context_pool;</LI>\r
+\r
+<LI>\r
+_context_pool_max=_context_pool+_context_pool_elements_inc;</LI>\r
+\r
+<LI>\r
+_context_pool_index++;</LI>\r
+</UL>\r
+</UL>\r
+<P ALIGN="JUSTIFY">After allocating memory for the new buffer, we put its\r
+value in the table, and compute the value of the end of the buffer, obviously\r
+it's the pointer plus the maximum number of entries (in case of assembly\r
+you first should multiply this value by the length of every entry). Then\r
+we increment the index tot the table with pointers to the big buffers,\r
+which will be used when freeing memory.\r
+<P><P ALIGN="JUSTIFY">When the compressing process is done, or when we\r
+have to flush the encoder, we have to free all the buffers. At the start\r
+of the program we cleared all the entries of the table with pointers to\r
+the big buffers, so we have to check if it's non zero, in case it isn't,\r
+then it's a pointer, and we have to free it. In C we only need to know\r
+the pointer of the buffer to free it, in other programming languages you\r
+might need storing additional information.\r
+<P><A NAME="Memory requeriments and management.flushing"></A><P ALIGN="JUSTIFY">The\r
+flushing routine has to ensure that everything (variables, tables and buffers)\r
+is like if it was just initialized. First we have to free all the memory.\r
+Then clear all the tables for all orders, that is all the probabilities\r
+of order-0 and order-1 set to zero, and all the pointers of higher orders\r
+to 0. Then the big buffers must be allocated again. In my implementation\r
+I had to set the count of the current byte in order-0 to 1 too. Next we\r
+have to emit the flush code. We have to visit each context and if there's\r
+any probability, emit an escape code in this order. Once we reach order-(-1)\r
+we can emit the flush code, which in my implementation is done with:\r
+<UL>\r
+<LI>\r
+symb_prob=1;</LI>\r
+\r
+<LI>\r
+symb_cump=256;</LI>\r
+\r
+<LI>\r
+total_cump=257;</LI>\r
+\r
+<LI>\r
+range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">The decoder checks the symbol decoded under order-(-1)\r
+before writing it in the output, in case it's a flush code, it has to flush\r
+its model, no matter if it still has memory (this could happen if the file\r
+was compressed in a computer with more memory). In my code I do even another\r
+check, in case that "ppmc_out_of_memory" = 1, but we didn't decode a flush\r
+code, that means that compressor had more memory than decompressor, in\r
+that case we just can't decompress the rest of the file correctly, because\r
+we can't keep track of changes in the model (new nodes), so we can only\r
+exit. To check this at the end of the loop we check if the variable "expected_flush"=1\r
+(it's initialized to 0) in case it's we exit. After that we check if we\r
+run out of memory. If we do, we set "expected_flush" to 1, and thus if\r
+next time we don't decode a flush code, it means that we have to exit,\r
+which will be done by the check explained before.\r
+<BR> \r
+<BR> \r
+<P><A NAME="Fast order-3 hashed"></A><FONT SIZE=+2>How to implement the\r
+fast order-3 hashed version</FONT>\r
+<BR><P ALIGN="JUSTIFY">With hash tables we can do a trick which in most\r
+of the cases is of not use, but in this one resulted in something which\r
+works quite well. The trick is simply not resolving hash collisions. Thus\r
+let different contexts share the same entry. We only need to implement\r
+order-2, once we have it working we change the hash key for order-2 for\r
+the hash key of order-3:\r
+<P>#define ppmc_order2_hash_key(k,j) ((k)+(j<<8))\r
+<BR>#define ppmc_order3_hash_size 65536\r
+<BR>#define ppmc_order3_hash_key(k,j,i) ((k)+(j<<7)+(i<<11))\r
+& ppmc_order4_hash_size-1\r
+<P><P ALIGN="JUSTIFY">If you are not used to C '&' means the AND mask,\r
+in the case of hash keys we use it for erasing the high bits that we don't\r
+use (because we restricted the length of the hash table). The '<<'\r
+are left shifts.\r
+<P><P ALIGN="JUSTIFY">Now we just have to change the main part to use order-3\r
+instead of order-2, we only have to use the variable "o3_byte" too. And\r
+it's all done. Now in the same hash entry there will be different contexts\r
+sharing the same probability list. This doesn't produce bad results, instead\r
+it works better than order-2, though of course it works worse than real\r
+order-3. However if we extent the same trick for order-4 the results are\r
+bad. These are the results for book2. (time for compression in kbyps)\r
+<BR> \r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Order </TD>\r
+\r
+<TD>Bpb </TD>\r
+\r
+<TD>Time (c)</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>2</TD>\r
+\r
+<TD>2.8629 </TD>\r
+\r
+<TD>246.696</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>3h</TD>\r
+\r
+<TD>2.3713</TD>\r
+\r
+<TD>237.134</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>3</TD>\r
+\r
+<TD>2.2849</TD>\r
+\r
+<TD>205.995</TD>\r
+</TR>\r
+\r
+<TR>\r
+<TD>4h</TD>\r
+\r
+<TD>2.4934</TD>\r
+\r
+<TD>146.716</TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<P><P ALIGN="JUSTIFY">It's worth mentioning that order-3h uses almost as\r
+much memory as order-2, because they use the same hash tables. Order-3\r
+can take a little bit more of memory in linked lists for bytes and probabilities.\r
+The difference from order-2 and order-3h in compression is big while in\r
+speed and memory usage is very little! This happens because though different\r
+contexts are sharing the same probability list, they aren't quite different\r
+or because there are just a few contexts in the same entry. In the other\r
+hand in order-4 a lot of contexts share the same entry, with may be different\r
+probabilities, so at the end we get worse compression, because the same\r
+context (hash entry in that case) gives a probability distribution which\r
+doesn't reflect current file.\r
+<P><P ALIGN="JUSTIFY">The differences in speed can be explained too. They\r
+all come from insertion and updating of nodes in the list of bytes and\r
+their counts. 3h or 4h have to create and update more nodes, and while\r
+coding and decoding there are more symbols in the cumulative probabilities,\r
+so the process is slower.\r
+<P><P ALIGN="JUSTIFY">Order-3h is a ppmc implementation well suited for\r
+real world compression because it only takes a little bit more of speed\r
+and memory than order-2, but achieves 0.5bpb more, which ranks it in a\r
+good position.\r
+<P><A NAME="Fast order-3 hashed.hash explanation"></A><P ALIGN="JUSTIFY">If\r
+you wonder how this works, let's put an example. Let's say we have a binary\r
+message, that is, the alphabet only contains 1 and 0. And our hash key\r
+looks like the following:\r
+<BR>#define ppmc_order3_hash_key(k,j,i) ((k)+(j<<1)+(i<<1))\r
+& 11 (binary)\r
+<BR>And we have a context like "101". If we apply it the hash key the result\r
+is (all the values are in binary): 1+00+10=11. 11 would be the hash entry\r
+of context "101". But this however is also the hash entry of "011" (using\r
+hash key:) 1+10+00=11. So at the end we have that both contexts "101" and\r
+"011" share the same entry. In their entry there's the probability list\r
+for 0 and 1. And there we would store the probabilities of the two contexts.\r
+In our implementation of ppmc the contexts which share the same entry have\r
+for sure the same order-1 byte, so more or less they have the same probability\r
+distribution, and thus it doesn't suffer much. In fact the error in prediction\r
+in book2 results only in 0.0864 bpb.\r
+<P><P ALIGN="JUSTIFY">If we analyse it, see that we have 2^3 = 8 possible\r
+input values which are mapped to 2^2 = 4 hash entries, thus every entry\r
+is share by 8/4 = context. In our hash key for ppmc we have 2^24 = 16,777,216\r
+contexts which are mapped (that is, we assign them a position in the hash\r
+table) to 2^16 = 65,536 (this is the number of entries in the hash\r
+table). Thus we have that in ppmc order-3h every entry holds 256 different\r
+contexts. 256 different probability distributions in every list, however\r
+the impact in compression is little because due to our hash key the contexts\r
+which share the same entry are similar.\r
+<BR>#define ppmc_order3_hash_key(k,j,i) ((k)+(j<<7)+(i<<11))\r
+& ppmc_order4_hash_size-1\r
+<BR>In our hash key "k" is "o1_byte" and the hash key as worst modify the\r
+highest bit, but the rest remain untouched, so we know for sure that all\r
+the contexts that share the same hash entry have the first seven bits of\r
+order-1 the same.\r
+<P><P ALIGN="JUSTIFY">Another thing which I had to check was using a classic\r
+hash key (used for example in lzp <A HREF="#r.[14]">[14]</A>) instead of\r
+the current one. Instead of '+' a '^' (xor) <!-- Plain text. File: instead.txt [spaces instead of tabs] -->\r
+<PRE> Using ^ Using +\r
+File Bpb Kbyps c Kbyps d File Bpb Kbyps c Kbyps d\r
+book1 2.5531 218.5143 214.8418 book1 2.5415 236.7239 221.6721\r
+book2 2.3997 237.1344 213.9184 book2 2.3859 241.8208 227.4374\r
+geo 4.8769 86.9565 82.6446 geo 4.9209 90.9091 86.9565\r
+obj2 3.0717 182.5980 162.8576 obj2 3.0635 191.2931 175.9338\r
+progc 2.6109 182.4308 182.4308 progc 2.6010 182.4308 182.4308\r
+news 2.8417 186.2531 167.2981 news 2.8194 203.2762 186.2531\r
+Average 3.0590 182.3145 170.6652 Average 3.0554 191.0756 180.1139</PRE>\r
+<P ALIGN="JUSTIFY">The use of xor instead of adding resulted in worse compression\r
+and speed for all the files except geo, which is a list of floating point\r
+numbers. The hash key for order-3 works better with "+" for text. The test\r
+files come from\r
+<A HREF="#r.[15]">[15]</A>.\r
+<BR> \r
+<P><A NAME="Full exclusion"></A><FONT SIZE=+2>Adding full exclusion to\r
+the already done functions</FONT>\r
+<BR><P ALIGN="JUSTIFY">Full exclusion is when we exclude from probability\r
+computation symbols which appeared in orders which we already used. Obviously\r
+if in an order there's a symbol but we instead emit an escape code, because\r
+it wasn't the symbol to code, it will not be coded, so we shouldn't assign\r
+it any probability. Let's say we have a message with the following probabilities:\r
+<BR> \r
+<TABLE BORDER CELLSPACING=2 CELLPADDING=2 >\r
+<TR>\r
+<TD>Order-3\r
+<BR>'a'-2\r
+<BR>'b'-4\r
+<BR>'c'-0\r
+<BR>escape-3 </TD>\r
+\r
+<TD>Order-2\r
+<BR>'a'-6\r
+<BR>'b'-7\r
+<BR>'c'-1\r
+<BR>escape-3</TD>\r
+</TR>\r
+</TABLE>\r
+\r
+<P><P ALIGN="JUSTIFY">And we want to code 'c'. In order-3 we don't find\r
+it, so we emit an escape code with a probability of 3/9. Then in order-2\r
+we find it and emit it with a probability of 1/16 (4 bits). As you see\r
+in order-2 we allocate probability space for 'a' and 'b', though they appeared\r
+in order-3 and we didn't select them, therefore they are not the byte to\r
+code. So we decide to use full exclusion instead of lazy exclusion (which\r
+excludes lower orders) then if we exclude 'a' and 'b', the escape code\r
+has a probability of 1 (the scheme C assigns it a value equal to the number\r
+of defined bytes in the context, which now is only 'c') then we code 'c'\r
+with a probability of 1/2 (1 bit).\r
+<P><P ALIGN="JUSTIFY">For full exclusion we use a table with an entry for\r
+every symbol (in the case bytes this means a table with 256 entries, which\r
+is acceptable), this table is accessed like "excluded[byte]", and has a\r
+value of 0 if it's not excluded, and a value of 1 if it is. At the start\r
+of loop (every itineration, of course) we put all the values to 0, because\r
+no order has been visited yet. If a context emits an escape code, we update\r
+this table, by putting the value of the bytes in that order to 1. That\r
+way they'll be excluded in lower orders. Then when computing probabilities,\r
+we don't take in account bytes excluded. For making the coding easier to\r
+read, I decided to use some new variables: "exc_total_cump", "exc_defined_bytes"\r
+and "exc_max_cump" which are self-explanatory. In the case of order-(-1)\r
+(ordern1 in the code) I chose not to use full exclusions, because order-(-1)\r
+is hardly used. Also if you implement full exclusions, you have to be careful\r
+about a <A HREF="#full.warning">problem</A> that can arise.\r
+<P><P ALIGN="JUSTIFY">The change over previous versions is mainly done\r
+in the function "ppmc_get_totf_order2", which has to exclude from the probability\r
+computation the excluded bytes, comparing its value in the "excluded" table.\r
+<UL>\r
+<LI>\r
+node=order2_array[o2_cntxt].prob;</LI>\r
+\r
+<LI>\r
+exc_max_cump=0;</LI>\r
+\r
+<LI>\r
+exc_defined_bytes=0;</LI>\r
+\r
+<LI>\r
+do</LI>\r
+\r
+<UL>\r
+<LI>\r
+if(excluded[node->byte]==0)</LI>\r
+\r
+<UL>\r
+<LI>\r
+exc_defined_bytes++;</LI>\r
+\r
+<LI>\r
+exc_max_cump+=node->freq;</LI>\r
+</UL>\r
+\r
+<LI>\r
+if(node->next==0)</LI>\r
+\r
+<UL>\r
+<LI>\r
+break;</LI>\r
+</UL>\r
+\r
+<LI>\r
+ node=node->next;</LI>\r
+</UL>\r
+\r
+<LI>\r
+while(1);</LI>\r
+\r
+<LI>\r
+exc_total_cump=exc_max_cump+exc_defined_bytes;</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">We start by clearing both maximum cumulative probability\r
+and defined bytes, then we read till the end of the linked lists,\r
+and check if current byte is excluded, if it isn't we can update the values.\r
+When the end of the linked list is reached (the pointer "next" is 0) we\r
+break the loop, and then we compute the maximum cumulative probability.\r
+<P><P ALIGN="JUSTIFY">The function for coding would first call "ppmc_get_totf_order2",\r
+but now it could happen that due to exclusion there was no byte defined\r
+in the context, thus we have to check it and return a value meaning that\r
+we didn't code the byte.\r
+<UL>\r
+<LI>\r
+if(exc_defined_bytes==0) //If there's no byte defined,\r
+return</LI>\r
+\r
+<UL>\r
+<LI>\r
+return 0;</LI>\r
+</UL>\r
+</UL>\r
+<P ALIGN="JUSTIFY">After that we have to compute its probability, the code\r
+used is the modification of the <A HREF="#Coding and decoding with linked lists.coding">code</A>\r
+used with lazy exclusions, in that case we just have to car about excluded\r
+bytes:\r
+<UL>\r
+<LI>\r
+node=order2_array[o2_cntxt].prob;</LI>\r
+\r
+<LI>\r
+symb_cump=0;</LI>\r
+\r
+<LI>\r
+do</LI>\r
+\r
+<UL>\r
+<LI>\r
+if(node->byte==byte)</LI>\r
+\r
+<UL>\r
+<LI>\r
+goto ppmc_o2_byte_found;</LI>\r
+</UL>\r
+\r
+<LI>\r
+if(excluded[node->byte]==0)</LI>\r
+\r
+<UL>\r
+<LI>\r
+symb_cump+=node->freq;</LI>\r
+</UL>\r
+\r
+<LI>\r
+if(node->next==0)</LI>\r
+\r
+<UL>\r
+<LI>\r
+break;</LI>\r
+</UL>\r
+</UL>\r
+\r
+<LI>\r
+node=node->next;</LI>\r
+\r
+<LI>\r
+while(1);</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">Then it's only a matter of coding the byte. But in the\r
+case an escape occurs we do not only have to emit it, but we also have\r
+to update the "excluded" table. As explained before we have to read all\r
+the linked list and then put to 1 the value of the bytes present under\r
+that context:\r
+<UL>\r
+<LI>\r
+node=order2_array[o2_cntxt].prob;</LI>\r
+\r
+<LI>\r
+do</LI>\r
+\r
+<UL>\r
+<LI>\r
+excluded[node->byte]=1;</LI>\r
+\r
+<LI>\r
+if(node->next==0)</LI>\r
+\r
+<UL>\r
+<LI>\r
+break;</LI>\r
+</UL>\r
+\r
+<LI>\r
+node=node->next;</LI>\r
+</UL>\r
+\r
+<LI>\r
+while(1);</LI>\r
+</UL>\r
+<A NAME="Full exclusion.optimization"></A><P ALIGN="JUSTIFY">There's room\r
+for optimization. If we code the byte under this context we read the linked\r
+list twice, and if we don't we read it three times, which can be of course\r
+improved. It's possible to make the symbol cumulative probability while\r
+making the maximum cumulative probability, and thus we would avoid reading\r
+the linked list again when coding the symbol. Even we should be able to\r
+not read the linked list again the third time, in that case we should update\r
+the table "excluded" while computing the maximum cumulative probability, \r
+of course we should update the count after having read it. I haven't tested\r
+the idea nor the code, but with both optimizations the code would look\r
+like the following:\r
+<UL>\r
+<LI>\r
+node=order2_array[o2_cntxt].prob;</LI>\r
+\r
+<LI>\r
+byte_seen=0</LI>\r
+\r
+<LI>\r
+exc_max_cump=0;</LI>\r
+\r
+<LI>\r
+exc_defined_bytes=0;</LI>\r
+\r
+<LI>\r
+do</LI>\r
+\r
+<UL>\r
+<LI>\r
+if(excluded[node->byte]==0)</LI>\r
+\r
+<UL>\r
+<LI>\r
+exc_defined_bytes++;</LI>\r
+\r
+<LI>\r
+exc_max_cump+=node->freq;</LI>\r
+</UL>\r
+\r
+<LI>\r
+if(node->byte==byte) //If it's the same, we don't have\r
+to update "symb_cump" anymore</LI>\r
+\r
+<UL>\r
+<LI>\r
+byte_seen=1</LI>\r
+</UL>\r
+\r
+<LI>\r
+if(byte_seen==0) //If the byte was not seen this itineration\r
+nor the previous ones</LI>\r
+\r
+<UL>\r
+<LI>\r
+symb_cump+=node->freq;</LI>\r
+</UL>\r
+\r
+<LI>\r
+excluded[node->byte]=1 //Exclude byte</LI>\r
+\r
+<LI>\r
+if(node->next==0)</LI>\r
+\r
+<UL>\r
+<LI>\r
+break;</LI>\r
+</UL>\r
+\r
+<LI>\r
+ node=node->next;</LI>\r
+</UL>\r
+\r
+<LI>\r
+while(1);</LI>\r
+\r
+<LI>\r
+exc_total_cump=exc_max_cump+exc_defined_bytes;</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">In this code we exclude all the symbols which appear\r
+in current order, if an escape occurs that was what we had to do, if we\r
+code a byte, it doesn't matter because then we are not going to use "excluded".\r
+Note we can't apply this optimization to the decoding functions, because\r
+we first need to know the maximum cumulative probability for decoding,\r
+then we have to read the linked list for finding the symbol.\r
+<P><P ALIGN="JUSTIFY">The function for decoding starts by doing the same\r
+as the encoding function, that is making "exc_max_cump", and then checking\r
+that there's at least one byte defined in the order. In case an escape\r
+occurs we do the same as when coding. The routine for decoding the symbol\r
+is a modification of the <A HREF="#Coding and decoding with linked lists.decoding">one</A>\r
+used for lazy exclusions, which doesn't take in account excluded bytes.\r
+<UL>\r
+<LI>\r
+node=order2_array[o2_cntxt].prob;</LI>\r
+\r
+<LI>\r
+while(1)</LI>\r
+\r
+<UL>\r
+<LI>\r
+if(excluded[node->byte]==0)</LI>\r
+\r
+<UL>\r
+<LI>\r
+current_cump+=node->freq;</LI>\r
+</UL>\r
+\r
+<LI>\r
+if(symb_cump<current_cump)</LI>\r
+\r
+<UL>\r
+<LI>\r
+break;</LI>\r
+</UL>\r
+\r
+<LI>\r
+node=node->next;</LI>\r
+</UL>\r
+\r
+<LI>\r
+symb_prob=node->freq;</LI>\r
+\r
+<LI>\r
+byte=node->byte;</LI>\r
+\r
+<LI>\r
+o2_ll_node=node;</LI>\r
+\r
+<LI>\r
+symb_cump=current_cump-symb_prob;</LI>\r
+\r
+<LI>\r
+range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob);</LI>\r
+</UL>\r
+<P ALIGN="JUSTIFY">I leave as an exercise to the reader adapting the functions\r
+for order-0 and order-1, if you find any problem have a look at the source\r
+code.\r
+<P><A NAME="full.warning"></A><P ALIGN="JUSTIFY">Warning: the updating\r
+function assumes that if the byte was not coded under order-2 (or higher)\r
+it didn't exist in the linked list, and that it has the pointer "o2_ll_node"\r
+initialized to the last element in the linked list due to the search done\r
+during the encoding process. This is however not true. It may happen that\r
+all the characters are excluded (due to the fact that they appeared in\r
+higher orders) thus "get_totf" would return a 0 in the variable "exc_defined_bytes".\r
+The encoding function would check it and abort (obviously if there's no\r
+byte in the context we can't code anything). Thus the pointer to the linked\r
+list is not initialized. The solution to this problem is just reading till\r
+the end of the linked list while updating. Note that this is not needed\r
+in the highest order, because none of the symbols is excluded because it's\r
+the higher. Therefore in my code I haven't done this change in the function\r
+for updating of order-4. So be careful if you want to put another higher\r
+order. Use the code for order-3 which takes this in account and has linked\r
+list for probabilities and contexts.\r
+<P><P ALIGN="JUSTIFY">The literature tends to say that full exclusion is\r
+excluding the symbols which appeared in higher orders, but this is not\r
+true for ppmz, which uses LOE (Local Order Estimation) for selecting the\r
+best order for current symbol. I.e: let's say the maximum order is 8, but\r
+LOE chooses to try to encode current symbol in order-5, an escape occurs,\r
+and then LOE selects order-3, now we'll use full exclusion, but we don't\r
+exclude all the bytes of higher orders, because we don't know them (at\r
+encoding time we do, but while decoding we won't), it could even happen\r
+that our byte is present there. We only exclude the orders which we have\r
+seen which in this case is only order-5.\r
+<BR> \r
+<P><A NAME="Compression and speed"></A><FONT SIZE=+2>Compression and speed\r
+results of the different implementations</FONT>\r
+<BR><P ALIGN="JUSTIFY">A little comparison between different orders and\r
+implementations is shown in the graphics. O-3h is order-3 hashed, and o-4\r
+(exc) is ppmc order-4 with full exclusions. The compression is measured\r
+in bpb and the speed in kbyps.\r
+<TABLE BORDER=0 >\r
+<TR>\r
+<TD>\r
+<BR><IMG SRC="ppmc-bpb.gif" ALT="Bits Per Byte" HEIGHT=234 WIDTH=377>\r
+<CENTER>bpb, Bits Per Byte </CENTER>\r
+</TD>\r
+\r
+<TD>\r
+<BR><IMG SRC="ppmc-kbyps.gif" ALT="Kylo Bytes Per Second" HEIGHT=234 WIDTH=369>\r
+<CENTER>Kbyps, Kylo BYtes Per Second</CENTER>\r
+</TD>\r
+</TR>\r
+</TABLE>\r
+<P ALIGN="JUSTIFY">The results which follow correspond to the three different\r
+implementations, ppmc with lazy exclusion, ppmc with full exclusion and\r
+ppmc order-3 hashed. Note that the implementation with full exclusions\r
+could be <A HREF="#Full exclusion.optimization">faster</A>. These are results\r
+for the Calgary Corpus, and at the end you can also see the results\r
+for the large Canterbury corpus\r
+<A HREF="#r.[16]">[16]</A>.<!-- Full results. re_full.txt -->\r
+<PRE>Full results:\r
+\r
+Ppmc lazy exclusion, order-4\r
+File Bpb Kbyps c Kbyps d\r
+book1 2.4757 136.9617 118.1796\r
+book2 2.1989 150.3210 126.6680\r
+bib 2.0030 139.9831 123.4259\r
+geo 5.3678 45.6621 42.3729\r
+news 2.8141 107.7190 94.2877\r
+obj1 4.0171 95.4545 77.7778\r
+obj2 2.7300 91.2990 79.8110\r
+paper1 2.5560 139.8309 106.2715\r
+paper2 2.5191 134.3654 113.8373\r
+pic 1.3235 190.5656 146.0821\r
+progc 2.5625 121.6205 105.6178\r
+progl 1.8593 163.9959 131.1967\r
+progp 1.8602 151.9442 128.5682\r
+trans 1.6749 167.7681 139.8068\r
+Average 2.5687 131.2494 109.5645\r
+\r
+\r
+Ppmc order-3 hashed (lazy exclusion)\r
+File Bpb Kbyps c Kbyps d\r
+book1 2.5415 236.7239 221.6721\r
+book2 2.3859 241.8208 227.4374\r
+bib 2.2053 229.5723 229.5723\r
+geo 4.9209 90.9091 86.9565\r
+news 2.8194 203.2762 186.2531\r
+obj1 4.0532 190.9091 131.2500\r
+obj2 3.0635 191.2931 175.9338\r
+paper1 2.6161 189.7705 189.7705\r
+paper2 2.5515 210.1613 215.6918\r
+pic 1.3430 327.5735 294.8162\r
+progc 2.6010 182.4308 182.4308\r
+progl 1.9551 257.7079 267.2526\r
+progp 1.9208 227.9164 227.9164\r
+trans 1.9003 236.5961 236.5961\r
+Average 2.6341 215.4758 205.2535\r
+\r
+\r
+Ppmc full exclusion, order-4\r
+File Bpb Kbyps c Kbyps d\r
+book1 2.2723 21.1874 22.5584\r
+book2 1.9947 22.5509 22.9227\r
+bib 1.8522 22.4630 23.2361\r
+geo 4.7744 7.4906 9.0580\r
+news 2.3807 20.8546 17.2488\r
+obj1 3.8140 8.3004 10.6061\r
+obj2 2.5281 16.7498 18.6700\r
+paper1 2.3264 21.0023 21.0856\r
+paper2 2.3028 18.6704 20.6977\r
+pic 1.2678 20.1848 20.0075\r
+progc 2.3511 20.2701 19.7708\r
+progl 1.7235 25.3187 23.4280\r
+progp 1.7267 22.1865 20.3002\r
+trans 1.5737 23.3601 21.8138\r
+Average 2.3492 19.3278 19.3860\r
+\r
+\r
+Ppmc full exclusion, order-4\r
+File Bpb Kbyps c Kbyps d\r
+bible.txt 1.7062 29.5428 28.9945\r
+e.coli 1.9560 32.8449 31.6604\r
+kjv.guten 1.6801 28.8764 28.3149\r
+world192 1.6892 28.1551 27.9047\r
+Average 1.7579 29.8548 29.2186\r
+</PRE>\r
+\r
+<P><BR><A NAME="References"></A><FONT SIZE=+2>References</FONT>\r
+<BR><A NAME="r.[1]"></A>[1] Charles Bloom "Kraft Inequality"<I>,</I> <A HREF="http://www.cbloom.com">http://www.cbloom.com</A>\r
+(Compression : News Postings : Kraft Inequality)\r
+<BR><A NAME="r.[2]"></A>[2] Bell, T.C., Cleary, J.G. and Witten, I.H. (1990)\r
+"Text compression<I>"</I>, Prentice Hall, Englewood Cliffs, NJ.\r
+<BR><A NAME="r.[3]"></A>[3] Charles Bloom "Solving the Problems of Context\r
+Modeling<I>"</I>, <A HREF="http://www.cbloom.com">http://www.cbloom.com</A>\r
+(ppmz.ps)\r
+<BR><A NAME="r.[4]"></A>[4] Mark Nelson, Jean-Loup Gaily (1996) "The Data\r
+Compression Book<I>"</I>, M&T Books. Source available at Mark's <A HREF="http://www.dogma.net/markn">page</A>.\r
+<BR><A NAME="r.[5]"></A>[5] Ppmd source code by <A HREF="mailto:dmitry.shkarin@mtu-net.ru">Dmitry\r
+Shkarin</A> available via <A HREF="ftp://ftp.elf.stuba.sk/pub/pc/pack/ppmde.rar">ftp1</A>\r
+or <A HREF="ftp://ftp.simtel.net/pub/simtelnet/win95/compress/ppmde.zip">ftp2</A>.\r
+<BR><A NAME="r.[6]"></A>[6] Arturo Campos (1999) "Finite context modeling"\r
+available at my <A HREF="http://www.arturocampos.com">home page</A>.\r
+<BR><A NAME="r.[7]"></A>[7] Moffat, A. (1988) "A note on the PPM data compression\r
+algorithm," Research Report 88/7, Department of Computer Science, University\r
+of Melbourne, Parkville, Victoria, Australia.\r
+<BR><A NAME="r.[8]"></A>[8] Raita, T. and Teuhola, J. (1987) "Predictive\r
+text compression by hashing." ACM conference on Information Retrieval,\r
+New Orleans.\r
+<BR><A NAME="r.[9]"></A>[9] Clearly, J.G. and Witten, I.H. (1984) "Data\r
+compression using adaptative coding and partial string matching" IEEE Trans.\r
+Communications, COM-32 (4), 396-402, April.\r
+<BR><A NAME="r.[10]"></A>[10] Arturo Campos (1999) "Arithmetic coding"\r
+available at my <A HREF="http://www.arturocampos.com">home page</A>.\r
+<BR><A NAME="r.[11]"></A>[11] Arturo Campos (1999) "Range coder" available\r
+at my <A HREF="http://www.arturocampos.com">home page</A>.\r
+<BR><A NAME="r.[12]"></A>[12] Michael Schindler. Source code of the range\r
+coder. Available via <A HREF="http://www.compressconsult.com/rangecoder">html</A>.\r
+<BR><A NAME="r.[13]"></A>[13] The Context-Tree Weighting Project. <A HREF="http://ei0.ei.ele.tue.nl/~frans/ResearchCTW.html">http://ei0.ei.ele.tue.nl/~frans/ResearchCTW.html</A>\r
+<BR><A NAME="r.[14]"></A>[14] Charles Bloom "LZP: a new data compression\r
+algorithm", <A HREF="http://www.cbloom.com">http://www.cbloom.com</A> (lzp.ps)\r
+<BR><A NAME="r.[15]"></A>[15] Calgary corpus set. <A HREF="ftp://ftp.cpsc.ucalgary.ca/pub/projects/text.compression.corpus">ftp://ftp.cpsc.ucalgary.ca/pub/projects/text.compression.corpus</A>\r
+<BR><A NAME="r.[16]"></A>[16] large Canterbury corpus <A HREF="http://corpus.canterbury.ac.nz">http://corpus.canterbury.ac.nz</A>\r
+<BR> \r
+<P><A NAME="Closing words"></A><FONT SIZE=+2>Closing words</FONT>\r
+<BR><P ALIGN="JUSTIFY">Now you can use your ppmc implementation for an\r
+stand alone compressor of lossless files, or use it for lossless image\r
+compression, or using it for compressing the output of other compression\r
+algorithms (read next paragraph). However those are not the only uses of\r
+ppmc, you can use it for predicting and use it for an operative system\r
+where the reader needs to input text, or as a help for OCR <A HREF="#r.[2]">[2]</A>.\r
+If you want better compression than ppmc you should know that using an\r
+order above 4 with ppmc isn't worth because higher orders produce too much\r
+escape codes. For taking profit of the information that high orders provide\r
+you should have a look at other implementations like ppmd or ppmz. Fortunately\r
+ppmc with hash tables is the perfect base for ppmz <A HREF="#r.[3]">[3]</A>\r
+an state of art. I'll write an article about it, so keep an eye to my <A HREF="http://www.arturocampos.com">home\r
+page</A>.\r
+<P><A NAME="literals-lzp"></A><P ALIGN="JUSTIFY">If you want to use ppmc\r
+for compressing the output of lzp (literals), you have to change update\r
+functions of higher orders, because currently they expect that coding is\r
+always done before updating. However in this case, a lot of literals are\r
+coded with lzp but have to be updated with ppmc too. For orders lowers\r
+than 2 there's no problem, but for order-2 and highers, we have to read\r
+till the end of the linked list of contexts or probabilities when we want\r
+to put a new element, or we have to read till the current one to update\r
+its count (variables like "o2_ll_node" are not initialized because the\r
+coding has not been done). Such changes are easy to do and are left as\r
+homework, in case you have any doubt <A HREF="mailto:arturo@arturocampos.com">email</A>\r
+me.\r
+<P><P ALIGN="JUSTIFY">If when implementing ppmc with full exclusions you\r
+find any bug, read the <A HREF="#full.warning">warning</A> about updating.\r
+If still you can't find the bug, have a look at the code, or <A HREF="mailto:arturo@arturocampos.com">email</A>\r
+me. This article took long to do, and thus I look forward for your comments\r
+about it, so please take the time to tell me what you think of it, just\r
+as I took the (long) time to write it.\r
+<P><P ALIGN="JUSTIFY">I want to thank to Malcolm Taylor for letting me see\r
+his ppmz code which used hash tables, unfortunately later he was too busy\r
+to be able to answer my questions, which at the end I solved myself.\r
+<BR> \r
+<P><A NAME="Contacting the author"></A><FONT SIZE=+2>Contacting the author</FONT>\r
+<BR><P ALIGN="JUSTIFY">You can reach me via email at: <A HREF="mailto:arturo@arturocampos.com">arturo@arturocampos.com</A> \r
+See you in the next article!\r
+<BR> \r
+<BR> \r
+<DIV ALIGN=right>Arturo San Emeterio Campos, Barcelona 21-March-2000</DIV>\r
+\r
+<HR WIDTH="100%">\r
+<BR><P ALIGN="JUSTIFY">This article comes from Arturo Campos home page\r
+at <A HREF="http://www.arturocampos.com">http://www.arturocampos.com</A>\r
+Visit again soon for new and updated compression articles and software.\r
+</BODY>\r
+</HTML>\r
--- /dev/null
+
+TARGETS=ppmc unppmc
+COMMON= ppmc.o ppmcdata.o range.o
+
+CFLAGS=-O3 -Wall -DNDEBUG
+
+all: $(TARGETS)
+
+ppmc: $(COMMON) ppmcmain.c
+
+unppmc: $(COMMON) unppmc.c
+
+clean:
+
+ @$(RM) -f *.o $(TARGETS)
+
+.PHONY: all clean
+
--- /dev/null
+
+ PPMC
+
+
+TABLE OF CONTENTS
+-Description
+-Compiling
+-Files included in this release
+-Timing
+-Author
+-Disclaimer
+
+
+DESCRIPTION
+This is the source code of an implementation of ppmc.
+The data structures used are hash tables instead of a context trie.
+
+A file is compressed and decompressed like that:
+ppmc inputfile compressedfile
+unppmc compressedfile outputfile
+
+I don't recommend to use this for compressing vital information, because it
+hasn't been fully tested, and moreover, the machine where the decompressor is
+being run, must have at least as much memory as the encoder had. I recommend
+to use this compressor only for researching pourposes.
+
+For further information read ac_ppmc.html also included in the package, for
+the latest version visit http://www.ross.net/arturocampos
+
+
+COMPILING
+The source code is in the C programming language, and was successfully
+compiled with Djgpp version 2.02, a project was made which included the
+following files for the encoder (call this project ppmc):
+ppmc.c
+ppmcdata.c
+ppmcmain.c
+range.c
+
+And for the decoder (call this project unppmc)
+ppmc.c
+ppmcdata.c
+range.c
+unppmc.c
+
+Then you just have to hit F9 and wait. I tried to do makefiles, however there
+was something wrong and it didn't worked. If someone has any idea about it, or
+how to compile this source for other compilers, please let me know it, so in
+the next release I can include makefiles.
+
+
+FILES INCLUDED IN THIS RELEASE
+In the zip file ac_ppmc_html.zip you should find:
+
+readme.txt -> The file you're reading now.
+ac_ppmc.html -> article which explains ppmc and the data structures used
+ppmc.c -> This is the main file which includes all the routines used
+ by both the encoder and decoder's model.
+range.c -> The encoder and decoder's routines.
+ppmcmain.c -> The main routine for the compressor
+unppmc.c -> The main routine for the decompressor
+ppmcdata.c -> Global data structures. (used mainly by ppmc.c)
+ppmc.h -> Declarations of routines
+ppmcdata.h -> Declarations of global data and structures
+range.h -> Declarations of the routines for the range encoder and decoder
+ppmc.exe -> A compiled version of the compressor
+unppmc.exe -> A compiled version of the decompressor
+
+All this files are the implementation of ppmc order-4 using lazy exclusions.
+In the /exclusion directory you can find the same files (unless readme) but
+for ppmc order-4 using full exclusion.
+I thought there was no need to include the files for ppmc-o3h, because they
+are nothing else than ppmc with lazy exclusions, using only order-2 (and
+lowers) but instead of using the hash key for order-2 it uses the one for
+order-3. (you also have to take care about o3_byte, of course)
+
+The executables were compiled with Djgpp 2.95 using only the switch -O6.
+
+
+TIMING
+The standard function to get the
+time "time()", has a maximum precision of seconds. This is not enough for
+testing the speed of a compressor. Due to this timing was not included in
+this release.
+If you are interested on compiling it with Djgpp, my original version used the
+following code:
+
+ struct time ppmc_time, ppmc_time2;
+ double _time2, _time;
+
+ // Get current time
+ gettime(&ppmc_time);
+
+ //Compress file
+
+
+ // Print bpb and kbyps
+ gettime(&ppmc_time2);
+ printf("%s at %f bpb in ",argv[1],((float)filesize(file_output)/(float)size_file_input)*(float)8);
+ _time=((ppmc_time.ti_min)*60)+(ppmc_time.ti_sec)+(((double)ppmc_time.ti_hund)/100);
+ _time2=((ppmc_time2.ti_min)*60)+(ppmc_time2.ti_sec)+(((double)ppmc_time2.ti_hund)/100);
+ if((_time2-_time)!=0)
+ printf("%f kbytes/seconds.", ((float)size_file_input/(float)1024)/(_time2-_time));
+
+
+AUTHOR
+This code was made by Arturo San Emeterio Campos, you can find his home page
+at: http://www.ross.net/arturocampos
+And his email is: arturo-campos@mixmail.com
+
+
+DISCLAIMER
+Copyright (c) Arturo San Emeterio Campos 1999. All rights reserved. Permission
+is granted to make verbatim copies of this files for private use only. There
+is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+
+ Arturo San Emeterio Campos, Barcelona 04-Jan-2000
--- /dev/null
+Order-3 and order-4 use almost the same code. With lazy exclusions it was
+exactly the same. But for full exclusions is not.
+
+The problem is that when updating we can't be sure that the stored pointer
+points where it should to (as it did with lazy exclusions). So we have to
+read the whole linked list. This is done for order-2 and order-3. However
+order-4 is the highest order used, so it doesn't use exclusion, therefore
+it uses the same code as with lazy exclusions.
+
+So if you want to use higher orders remember that the code to use should
+be the one for order-3 because it updates correctly when exclusions are
+being used. and use the order-4 code for the highest order.
+
+ Arturo Campos (arturo-campos@mixmail.com)
+ http://www.arturocampos.com
\ No newline at end of file
--- /dev/null
+
+TARGETS=ppmc unppmc
+COMMON= ppmc.o ppmcdata.o range.o
+
+CFLAGS=-O3 -Wall -DNDEBUG
+
+all: $(TARGETS)
+
+ppmc: $(COMMON) ppmcmain.c
+
+unppmc: $(COMMON) unppmc.c
+
+clean:
+
+ @$(RM) -f *.o $(TARGETS)
+
+.PHONY: all clean
+
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "ppmc.c" (exclusion)
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ Part of the ppmc encoder and decoder.
+
+ This module contains the whole ppmc encoder. It uses hash tables for
+ managing most of the orders. And a maximum order of 4. It codes bytes.
+ Order-1-0-(-1) are all handled in tables. Order-2 has a table with
+ direct hashing with pointers to the linked lists. Order-4 and order-3
+ both have hash tables with pointers to contexts in a linked lists which
+ finally have a pointer to the start of the linked list with the
+ probability distribution. Update exclusion is used, but exclusion is not.
+
+ Please, note that if the machine where the decoder is run doesn't has as
+ much memory as the computer where the encoder was ran, the decoder will
+ not be able to properly decode the file, because it will not be able to
+ keep track of new statistics, in this case it will just exit.
+
+ For applications where the loss of data is not admisible, I suggest you to
+ limit both encoder and decoder's memory requeriments to a given minimum.
+
+ Using exclusion. It's up to the main encoding routine to clear this table
+ for every new byte.
+*/
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "range.h"
+#include "ppmcdata.h"
+
+
+
+// Ruotines used by ppmc. Not including the range coder.
+//
+// They are for initializing of both encoder and decoder, and unless there
+// are two version, both encoder and decoder use the same routines. Like
+// "ppmc_initialize_contexts".
+
+
+// This one allocs the memory needed by ppmc, and adjust some pointers used
+// for allocating elements in the linked lists. The mempool arrays must be
+// initialized now.
+void ppmc_alloc_memory(void)
+{
+ unsigned long counter;
+
+
+ // Init mempool buffers
+
+ for(counter=0;counter!=_mempool_max_index;++counter)
+ {
+ _bytes_pool_array[counter]=0;
+ _context_pool_array[counter]=0;
+ }
+
+ _bytes_pool_index=1; //first entry will be used now
+ _context_pool_index=1;
+
+
+ // Allocate memory for ppmc structures and adjust some variables
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+
+ //save pointers in the array for freeing
+ _bytes_pool_array[0]=_bytes_pool;
+ _context_pool_array[0]=_context_pool;
+
+
+ //adjust variables
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements;
+ _context_pool_max=_context_pool+_context_pool_elements;
+
+ ppmc_out_of_memory=0; //we still have memory
+}
+
+
+// This routine initializes all the contexts, and all the tables including
+// those who care about the number of bytes defined in a context.
+void ppmc_initialize_contexts(void)
+{
+ unsigned long counter, counter2;
+
+
+ // Order-0
+ for(counter=0;counter!=256;++counter) //clear table
+ order0_array[counter]=0;
+
+ order0_defined_bytes=0; //adjust variables
+ order0_max_cump=0;
+
+
+ // Order-1
+ for(counter=0;counter!=256;++counter) //erase every table of every context
+ for(counter2=0;counter2!=256;++counter2)
+ order1_array[counter][counter2]=0;
+
+ for(counter=0;counter!=256;++counter) //adjust variables
+ {
+ order1_defined_bytes_array[counter]=0;
+ order1_max_cump_array[counter]=0;
+ }
+
+
+ // Order-2
+ for(counter=0;counter!=65536;++counter)
+ {
+ //order2_array[counter].prob=0; //clear pointer to bytes and frequencies
+ //order2_array[counter].max_cump=0;
+ order2_array[counter].defined_bytes=0;
+ }
+
+
+ // Order-4-3
+ for(counter=0;counter!=65536;++counter) //order-4-3
+ {
+ order4_hasht[counter]=0;
+ order3_hasht[counter]=0;
+ }
+}
+
+
+// This routine initializes the encode model by outputting as many bytes as
+// needed to prepare the models. This should be called before the main loop
+// and after the memory has been allocated and tables initialized.
+//
+// It does not need uses the range coder. It output the first 1 bytes.
+void ppmc_encoder_initialize(void)
+{
+
+ // Initialize order-0 and prepare different bytes for orders
+ fputc((byte=fgetc(file_input)),file_output);
+ o4_byte=byte; //order-4
+
+ fputc((byte=fgetc(file_input)),file_output);
+ o3_byte=byte; //order-3
+
+ fputc((byte=fgetc(file_input)),file_output);
+ o2_byte=byte; //order-2
+ ppmc_update_order0();
+
+ fputc((byte=fgetc(file_input)),file_output);
+ o1_byte=byte;
+
+}
+
+
+// This routine initializes the decoder model, should be called to do the same
+// changes as "ppmc_encoder_initialize()" did.
+void ppmc_decoder_initialize(void)
+{
+
+ // Initialize order-0 and context bytes
+ byte=fgetc(file_input);
+ o4_byte=byte; //order-4
+ fputc(byte,file_output);
+
+ byte=fgetc(file_input);
+ o3_byte=byte; //order-3
+ fputc(byte,file_output);
+
+ byte=fgetc(file_input);
+ o2_byte=byte; //order-2
+
+ fputc(byte,file_output); //output first byte
+ ppmc_update_order0();
+
+ byte=fgetc(file_input);
+ o1_byte=byte; //order-1
+ fputc(byte,file_output);
+}
+
+
+// Once coding or decoding is finished you have to call this routine.
+// It must be called when done.
+void ppmc_free_memory(void)
+{
+ unsigned long counter;
+
+ // Free the memory buffers
+
+ for(counter=0;counter!=_mempool_max_index;++counter)
+ {
+ if(_bytes_pool_array[counter]!=0)
+ free(_bytes_pool_array[counter]);
+
+ if(_context_pool_array[counter]!=0)
+ free(_context_pool_array[counter]);
+ }
+
+}
+
+
+// This routine flushes the memory and restarts all the tables of
+// probabilities, current order bytes are not modified, this function
+// is called when we ran out of memory. We have to output the code
+// number 256 which means memory flushing, for doing this we have to go
+// to order-(-1) so we have to output an escape code in all the orders
+// till we reach order-(-1) where we can code it. Then we free all the
+// memory, alloc it again, and reinitialize all the orders.
+//
+// However we may find the case when the current order is not initialized,
+// in this case we don't need to output an escape code.
+void ppmc_flush_mem_enc(void)
+{
+ unsigned long counter;
+
+
+ // Clear exclusion table
+ for(counter=0;counter!=256;++counter)
+ excluded[counter]=0;
+
+
+ // Code an escape code in order-4
+ if(ppmc_get_totf_order4()!=0) //if 0 no need of escape code
+ {
+
+ ppmc_get_escape_prob_order4(); //get prob and cump
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ }
+
+
+ // Code an escape code in order-3
+ if(ppmc_get_totf_order3()!=0) //if 0 no need of escape code
+ {
+
+ ppmc_get_escape_prob_order3(); //get prob and cump
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ }
+
+
+ // Code an escape code in order-2
+
+ o2_cntxt=ppmc_order2_hash_key(o1_byte,o2_byte);
+
+ // First check if current order-2 context is empty
+ if(order2_array[o2_cntxt].defined_bytes!=0) //it's not empty
+ {
+ ppmc_get_totf_order2();
+ ppmc_get_escape_prob_order2();
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+ }
+
+
+ // Code an escape code in order-1
+
+ // First check if current order-1 table is empty
+ if(order1_defined_bytes_array[o1_byte]!=0) //it's not empty
+ {
+ ppmc_get_totf_order1();
+ ppmc_get_escape_prob_order1();
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+ }
+
+
+ // Code an escape code in order-0. Order-0 always has at least one symbol
+
+ ppmc_get_totf_order0();
+ ppmc_get_escape_prob_order0();
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+
+
+ // Now we can code the code 256
+
+ symb_prob=1;
+ symb_cump=256;
+ total_cump=257;
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+
+ // Now that decoder knows the flushing, free memory and reinit
+
+ ppmc_free_memory();
+ ppmc_alloc_memory();
+ ppmc_initialize_contexts();
+
+
+ // Be sure that order-0 has at least one probability
+
+ order0_array[o1_byte]++;
+ order0_max_cump++;
+ order0_defined_bytes++;
+
+}
+
+
+// When the decoder gets the symbol of flushing, most of the job is done
+// because we already got all the escape codes, so we only have to reinit.
+void ppmc_flush_mem_dec(void)
+{
+
+ // Free memory and reinit
+
+ ppmc_free_memory();
+ ppmc_alloc_memory();
+ ppmc_initialize_contexts();
+
+
+ // Be sure that order-0 has at least one probability
+
+ order0_array[o1_byte]++;
+ order0_max_cump++;
+ order0_defined_bytes++;
+
+
+}
+
+
+
+// ORDER-(-1) functions, also called ordern1 (Negative1) in functions
+//
+// Because order-(-1) does not need to update its probability tables, it
+// has no tables, and relies on the fact that the cump of byte is its own
+// value, and the probability is fixed, 1, and the total cump is 257.
+//
+// The alphabet has the following distribution: 0-255 the bytes. 256 is
+// an special symbol which means that we have flushed the encoder tables,
+// and thus the encoder must flush its tables too.
+//
+// The rest of the tables only have 256 symbols, because we have no need
+// of assign a symbol to the flush code (which already is the order-(-1)
+// table) nor to the escape code.
+//
+// For order-(-1) we don't use exclusion.
+
+
+// Gets the probability for a given symbol in the order-(-1) (ordern1)
+void ppmc_get_prob_ordern1(void)
+{
+ symb_cump=byte; //its value
+ symb_prob=1; //flat probability
+ total_cump=257; //total cump
+}
+
+
+// Returns in the variable "total_cump" the current total cump of
+// order-(-1)
+void ppmc_get_totf_ordern1(void)
+{
+ total_cump=257; //this is fixed
+}
+
+
+// Returns the symbol for a given cump under order-(-1)
+unsigned long ppmc_get_symbol_ordern1 (void)
+{
+ return symb_cump;
+}
+
+
+
+// ORDER-0 functions
+//
+// Due to the fact that order-0 has no context, I use an array for all the
+// probabilities under order-0, just as you could do in a basic model for
+// arithmetic coding.
+//
+// The main array is: order0_array. Where order0_array[byte] contains the
+// probability for a given byte. The same applies to order-1.
+//
+// To ensure that the updating and coding is done correctly, "byte" can't
+// be changed till all the coding and updating is done.
+//
+// Order-0 uses exclusions. Exclusion values are always prepared in "get_totf"
+// so there's no need to get them again. However order-0 doesn't have to
+// update exclude table, because order-(-1) will not use it
+
+
+// Returns in the variable "total_cump" the current total cump of
+// order-0. We have to read the whole array because we can't
+// guarante that all the bytes are used.
+void ppmc_get_totf_order0(void)
+{
+ unsigned long temp_cump, //temp value for the cump
+ counter;
+
+ exc_defined_bytes=0;
+ exc_max_cump=0;
+
+ // Read the number of defined bytes by reading the count of every byte
+ // and if it's present in the exclusion table.
+ for(counter=0;counter!=256;++counter)
+ {
+ if(excluded[counter]==0) //only if it's not excluded
+ if(order0_array[counter]!=0) //if it has a nonzero count, then it's present
+ {
+ ++exc_defined_bytes;
+ exc_max_cump+=order0_array[counter];
+ }
+ }
+
+ // Total cump is current total cump plus the probability for the escape code
+ exc_total_cump=exc_max_cump+exc_defined_bytes;
+}
+
+
+// Codes a byte under order-0 and returns 1, otherwise it returns a 0 and
+// has coded an escape code. In this case further coding is needed.
+//
+// Returns: 1 in case a byte was coded. 0 in case of escape code.
+char ppmc_code_byte_order0(void)
+{
+ unsigned long counter;
+
+ ppmc_get_totf_order0(); //get total cump
+
+ // It's possible that due to excluding, there's no byte left, in that case
+ // return.
+ if(exc_defined_bytes==0)
+ return 0;
+
+ // See if the byte is present
+ if(order0_array[byte]==0) //a probability of 0
+ {
+
+ // Because it was not present, output an escape code, prepare variables
+
+ symb_cump=exc_max_cump; //obviously its cump is current max_cump
+ //without escape code's space
+
+ symb_prob=exc_defined_bytes; //the number of defined bytes
+
+ total_cump=exc_total_cump;
+
+ // Code the escape code
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ return 0; //byte not coded
+ }
+ else
+ {
+
+ coded_in_order=0;
+
+ // The symbol is present, code it under order-0
+
+ symb_prob=order0_array[byte]; //get probability directly
+
+ // Make cump for current symbol
+
+ symb_cump=0; //for first symbol is 0
+ for(counter=0; counter!=byte ; ++counter)
+ {
+ if(excluded[counter]==0)
+ symb_cump+=order0_array[counter]; //sum probabilities before our symbol
+ }
+
+ total_cump=exc_total_cump;
+
+ // Code the symbol
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ return 1; //symbol coded under order-0
+ }
+}
+
+
+// This functions update order-0 probabilities with current byte, also takes
+// care about updating variables, and renormalizing.
+void ppmc_update_order0(void)
+{
+ if(order0_array[byte]==0)
+ {
+ // It had a zero probability
+ order0_array[byte]++; //increment symbol probability
+ ++order0_defined_bytes; //new byte defined
+ ++order0_max_cump; //total cump
+ return;
+ }
+ else
+ {
+ // It had a non-zero probability
+
+ // Increment its probability
+ order0_array[byte]++; //increment symbol probability
+ ++order0_max_cump; //total cump
+
+ // Check to see if its the maximum in this case renormalize
+ if(order0_array[byte]==255)
+ ppmc_renormalize_order0();
+
+ return;
+ }
+}
+
+
+// This functions renormalizes the probabilities at order-0 updating variables
+void ppmc_renormalize_order0(void)
+{
+ unsigned long counter;
+
+ // Initialize variables
+ order0_defined_bytes=0; //clear them
+ order0_max_cump=0;
+
+ // Loop over all probabilities, divide them by a factor of 2 and update variables
+ for(counter=0 ; counter!=256 ; ++counter)
+ {
+ if(order0_array[counter]!=0)
+ {
+ order0_array[counter]>>=1; //divide by a factor of 2
+ if(order0_array[counter]==0)
+ order0_array[counter]=1;
+ }
+
+ if(order0_array[counter]!=0) //see if it has a non zero probability
+ order0_defined_bytes++;
+
+ order0_max_cump+=order0_array[counter]; //sum to the total cump
+ }
+}
+
+
+// Decodes the symbol correspoding to the current order, it returns the byte
+// or in case of a escape code it returns -1
+void ppmc_decode_order0(void)
+{
+ unsigned long current_cump, counter;
+
+
+ // Get the total cump needed for decoding symbol
+ ppmc_get_totf_order0(); //total cump needed for decoding
+
+ // It's possible that due to excluding, there's no byte left, in that case
+ // return.
+ if(exc_defined_bytes==0)
+ {
+ byte=-1;
+ return;
+ }
+
+ symb_cump=range_decoder_decode(&rc_decoder,exc_total_cump); //decode it
+
+ // Now check if it's an escape code
+ if(symb_cump>=exc_max_cump) //the defined code space for the escape code
+ {
+
+ // Update coding values
+
+ ppmc_get_escape_prob_order0();
+ range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob);
+
+ // Mark as escape code
+
+ byte=-1;
+
+ return; //an escape code
+ }
+ else
+ {
+ // Now we have to check what symbol it is
+
+ current_cump=0; //cump of the current symbol
+
+ for(counter=0 ; counter!= 256 ; ++counter)
+ {
+ if(symb_cump<current_cump)
+ break; //symbol found, break search loop
+ if(excluded[counter]==0)
+ current_cump+=order0_array[counter]; //update cump
+ }
+
+ counter--; //-1 (see ac_ppmc.html)
+
+ byte=counter; //update the variable with the found symbol
+
+
+ // Get the cump and prob of byte
+
+ symb_prob=order0_array[byte]; //the probabilty
+ symb_cump=current_cump-order0_array[byte]; //using the cump which was
+ //already made
+
+ // Update coding state
+
+ range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob);
+
+ coded_in_order=0; //decoded under order-0
+
+ return;
+ }
+
+}
+
+
+// This function returns the probability for the escape codes in the variables
+void ppmc_get_escape_prob_order0(void)
+{
+ // To understand that remember that the escape allocated for the escape code
+ // is above the current maximum cump and that it has a probability determined
+ // by the scheme C.
+ symb_prob=exc_defined_bytes;
+ symb_cump=exc_max_cump;
+ total_cump=exc_defined_bytes+exc_max_cump;
+}
+
+
+
+// ORDER-1 functions
+//
+// Order-1 uses 256 arrays one for every context, obviously they are accessed
+// like that: order1_array[o1_byte][byte]
+//
+// Also for computing the value of the current escape code in a given context
+// we need to know how many bytes are defined in a given context, they are
+// stored in an array, and you use them like that:
+// order1_defined_bytes_array[o1_byte]
+//
+// The same goes for the maximum cump of a given context:
+// order1_max_cump_array[o1_byte]
+//
+// Read the order-0 explanation.
+//
+// To ensure that the updating and coding is done correctly, "byte" and
+// "o1_byte" can't be changed till all the coding and updating is done.
+//
+// Order-1 uses exclusions. Exclusion values are always prepared in "get_totf"
+// so there's no need to get them again. It does update the exclude table.
+
+
+// Returns in the variable "total_cump" the current total cump of
+// order-1. o1_byte should contain the order-1
+void ppmc_get_totf_order1(void)
+{
+ unsigned long temp_cump, //temp value for the cump
+ counter;
+
+ exc_defined_bytes=0;
+ exc_max_cump=0;
+
+ // Read the number of defined bytes by reading the count of every byte
+ // and if it's present in the exclusion table.
+ for(counter=0;counter!=256;++counter)
+ {
+ if(excluded[counter]==0) //only if it's not excluded
+ if(order1_array[o1_byte][counter]!=0) //if it has a nonzero count, then it's present
+ {
+ ++exc_defined_bytes;
+ exc_max_cump+=order1_array[o1_byte][counter];
+ }
+ }
+
+ // Total cump is current total cump plus the probability for the escape code
+ exc_total_cump=exc_max_cump+exc_defined_bytes;
+}
+
+
+// Codes a byte under order-1 and returns 1.
+// Otherwise it returns a 0 it may be that it has coded an escape code, or
+// that current table was empty, in this case nothing is outputted
+// In both cases further coding is needed.
+//
+// Returns: 1 in case a byte was coded. 0 in case of escape code or empty table.
+// In case the byte is coded under this context, coded_in_order=1.
+char ppmc_code_byte_order1(void)
+{
+ unsigned long counter;
+
+
+ // First check if current order-1 table is empty
+ if(order1_defined_bytes_array[o1_byte]==0) //it's empty
+ {
+ return 0; //byte not coded, nothing done
+ }
+
+
+ // Now try to code this byte under order-1
+
+ ppmc_get_totf_order1(); //get total cump
+
+ // It's possible that due to excluding, there's no byte left, in that case
+ // return.
+ if(exc_defined_bytes==0)
+ return 0;
+
+
+ // See if the byte is present
+ if(order1_array[o1_byte][byte]==0) //a probability of 0
+ {
+
+ // Because it was not present, output an escape code, prepare variables
+
+ symb_cump=exc_max_cump;
+
+ symb_prob=exc_defined_bytes;
+
+ // Code the escape code
+ range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob);
+
+ // Now update "exclude" table
+ for(counter=0;counter!=256;++counter)
+ if(order1_array[o1_byte][counter]!=0)
+ excluded[counter]=1; //occurred but was not code, now exclude
+
+ return 0; //byte not coded, escape coded
+ }
+ else
+ {
+
+ coded_in_order=1; //because we coded it under order-1
+
+ // The symbol is present, code it under order-1
+
+ symb_prob=order1_array[o1_byte][byte]; //get probability directly
+
+ // Make cump for current symbol
+
+ symb_cump=0; //for first symbol is 0
+ for(counter=0; counter!=byte ; ++counter)
+ {
+ if(excluded[counter]==0)
+ symb_cump+=order1_array[o1_byte][counter]; //sum probabilities before our symbol
+ }
+
+ // Code the symbol
+ range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob);
+
+ return 1; //symbol coded under order-1
+ }
+}
+
+
+// This functions update order-1 probabilities with current byte, also takes
+// care about updating variables, and renormalizing. o1_byte must be filled.
+void ppmc_update_order1(void)
+{
+ if(order1_array[o1_byte][byte]==0)
+ {
+ // It had a zero probability
+ order1_array[o1_byte][byte]++; //increment symbol probability
+ ++order1_defined_bytes_array[o1_byte]; //new byte defined
+ ++order1_max_cump_array[o1_byte]; //total cump
+ return;
+ }
+ else
+ {
+ // It had a non-zero probability
+
+ // Increment its probability
+ order1_array[o1_byte][byte]++; //increment symbol probability
+ ++order1_max_cump_array[o1_byte]; //total cump
+
+ // Check to see if its the maximum in this case renormalize
+ if(order1_array[o1_byte][byte]==255)
+ ppmc_renormalize_order1();
+
+ return;
+ }
+}
+
+
+// This functions renormalizes the probabilities at order-1 updating variables
+void ppmc_renormalize_order1(void)
+{
+ unsigned long counter;
+
+ // Initialize variables
+ order1_defined_bytes_array[o1_byte]=0; //clear them
+ order1_max_cump_array[o1_byte]=0;
+
+ // Loop over all probabilities, divide them by a factor of 2 and update variables
+ for(counter=0 ; counter!=256 ; ++counter)
+ {
+ if(order1_array[o1_byte][counter]!=0)
+ {
+ order1_array[o1_byte][counter]>>=1; //divide by a factor of 2
+ if(order1_array[o1_byte][counter]==0)
+ order1_array[o1_byte][counter]=1; //don't let it have a 0 count
+ }
+
+ if(order1_array[o1_byte][counter]!=0) //see if it has a non zero probability
+ order1_defined_bytes_array[o1_byte]++;
+
+ order1_max_cump_array[o1_byte]+=order1_array[o1_byte][counter]; //sum to the total cump
+ }
+}
+
+
+// Decodes the symbol correspoding to the current order, it returns the byte
+// or in case of an escape code or empty table it returns -1.
+// It updates "coded_in_order".
+void ppmc_decode_order1(void)
+{
+ unsigned long current_cump, counter;
+
+
+ // First check if current order-1 table is empty
+ if(order1_defined_bytes_array[o1_byte]==0) //it's empty
+ {
+ byte=-1; //byte not coded, nothing done
+ return;
+ }
+
+
+ // Get the total cump needed for decoding symbol
+ ppmc_get_totf_order1(); //total cump needed for decoding
+
+ // It's possible that due to excluding, there's no byte left, in that case
+ // return.
+ if(exc_defined_bytes==0)
+ {
+ byte=-1;
+ return;
+ }
+
+ symb_cump=range_decoder_decode(&rc_decoder,exc_total_cump); //decode it
+
+ // Now check if it's an escape code
+ if(symb_cump>=exc_max_cump) //the defined code space for the escape code
+ {
+
+ // Update coding values
+
+ ppmc_get_escape_prob_order1();
+ range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob);
+
+
+ // Now update "exclude" table
+ for(counter=0;counter!=256;++counter)
+ if(order1_array[o1_byte][counter]!=0)
+ excluded[counter]=1; //occurred but was not code, now exclude
+
+ // Mark as escape code (in fact nothing coded)
+
+ byte=-1;
+
+ return; //an escape code
+ }
+ else
+ {
+ // Now we have to check what symbol it is
+
+ current_cump=0; //cump of the current symbol
+
+ for(counter=0 ; counter!= 256 ; ++counter)
+ {
+ if(symb_cump<current_cump)
+ break; //symbol found, break search loop
+ if(excluded[counter]==0)
+ current_cump+=order1_array[o1_byte][counter]; //update cump
+ }
+
+ counter--; //-1 (see ppmc.txt, searching for a given symbol)
+
+ byte=counter; //update the variable with the found symbol
+
+
+ // Get the cump and prob of byte
+
+ symb_prob=order1_array[o1_byte][byte]; //the probabilty
+ symb_cump=current_cump-order1_array[o1_byte][byte]; //using the cump which was
+ //already made
+
+ // Update coding state
+
+ range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob);
+
+
+ // Update coded_in_order used for update exclusion
+
+ coded_in_order=1;
+
+ return;
+ }
+
+}
+
+
+// This function returns the probability for the escape codes in the variables
+void ppmc_get_escape_prob_order1(void)
+{
+ // To understand that remember that the escape allocated for the escape code
+ // is above the current maximum cump and that it has a probability determined
+ // by the scheme C.
+ symb_prob=exc_defined_bytes;
+ symb_cump=exc_max_cump;
+ total_cump=exc_defined_bytes+exc_max_cump;
+}
+
+
+
+// ORDER-2 functions
+//
+// Order-2 uses a table which holds the contexts data structure of every
+// structure, its accessed with "ppmc_order2_hash_key(o1_byte, o2_byte)"
+// and because it uses direct addressing there's no need to check for hash
+// collisions, which makes it faster. The variable "o2_cntxt" contains o1_byte
+// and o2_byte used directly to index the order-2 array.
+//
+// Though order-2 uses the same routines as lower orders they are rather
+// different, because we can't allocate memory for all the probabilities
+// tables, we have to use linked lists for the bytes and probabilities.
+//
+// Under order-2 we don't let a probability to be lower than 1, that is,
+// there can't be a symbol in a linked list with a 0 probability, I choosed
+// this for three factors: the code is easier to read (it isn't messy)
+// we gain some speed, and the loss of compression seems little.
+//
+// To ensure that the updating is done correctly, "o2_cntxt" and
+// "o2_ll_node" must not be modified by any routine except those who manage
+// order-2.
+//
+// Order-2 uses exclusions. Exclusion values are always prepared in "get_totf"
+// so there's no need to get them again. It does update the exclude table.
+
+
+// Returns in the variable "total_cump" the current total cump of
+// order-2. cntxt must have o1_byte and o2_byte hashed with the hash key
+// for order-2.
+//
+// Here we have to compute both "exc_defined_bytes" and "exc_max_cump"
+void ppmc_get_totf_order2(void)
+{
+ struct _byte_and_freq *node;
+
+
+ // Read the whole linked list for making the values
+ node=order2_array[o2_cntxt].prob;
+ exc_max_cump=0;
+ exc_defined_bytes=0;
+
+ do{
+ if(excluded[node->byte]==0)
+ {
+ exc_defined_bytes++;
+ exc_max_cump+=node->freq; //add the probability of this byte to the cump
+ }
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+
+ // Total cump is current total cump plus the probability for the escape code
+ exc_total_cump=exc_max_cump+exc_defined_bytes;
+
+}
+
+
+// Codes a byte under order-2 and returns 1.
+// Otherwise it returns a 0. It may be that it has coded an escape code, or
+// that current table was empty.
+//
+// Returns: 1 in case a byte was coded. 0 in case of escape code or empty table.
+// In case the byte is coded under this context, coded_in_order=2.
+char ppmc_code_byte_order2(void)
+{
+ unsigned long counter;
+ struct _byte_and_freq *node;
+
+
+ // Initialize o2_cntxt
+
+ o2_cntxt=ppmc_order2_hash_key(o1_byte,o2_byte);
+
+
+ // First check if current order-2 context is empty
+ if(order2_array[o2_cntxt].defined_bytes==0) //it's empty
+ {
+ return 0; //byte not coded, nothing done
+ }
+
+
+ // Now try to code this byte under order-2
+
+ ppmc_get_totf_order2(); //get total cump
+
+
+ // It's possible that due to exclusion, there's no byte left, in that case
+ // return.
+ if(exc_defined_bytes==0)
+ return 0;
+
+
+ // See if the byte is present and compute its cump at the same time
+
+ node=order2_array[o2_cntxt].prob; //pointer to first element in the linked list
+
+ symb_cump=0; //the first symbol always has a 0 cump
+
+
+ // Now search the byte in the linked list
+
+ do{
+ if(node->byte==byte)
+ goto ppmc_o2_byte_found; //bad thing, I know, anyone has a better idea?
+ if(excluded[node->byte]==0)
+ symb_cump+=node->freq; //add the probability of this byte to the cump
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+
+ // If we reach that point, the byte was not found in the linked list
+ // so we don't need the cump, we have to output an escape code, whose
+ // probabilities are know using the context structure in the table.
+
+ // Byte was not present in the linked list, current node is the last one,
+ // and that's the node needed for creating a new node, save it.
+
+ o2_ll_node=node;
+
+ // Now get the probability and cump of the escape code
+
+ symb_cump=exc_max_cump;
+ symb_prob=exc_defined_bytes;
+
+ // Code the escape code
+ range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob);
+
+ // Then, update "excluded" table
+
+ node=order2_array[o2_cntxt].prob;
+
+ do{
+ excluded[node->byte]=1;
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+ return 0; //now exit
+
+
+ // That code is executed when the byte is found in the linked list
+
+ ppmc_o2_byte_found:
+
+
+ // Everything has been tested, now we can feel free to code the byte,
+ // the symb_cump is already computed, now get its probability and code
+ // the byte, also save pointer to this element in the linked lists for
+ // updating.
+
+ coded_in_order=2; //successfully coded under order-2
+
+ o2_ll_node=node; //save it for updating
+
+ symb_prob=node->freq; //get the probability of the byte
+
+ // Code it.
+
+ range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob);
+
+ return 1; //byte coded under order-2
+}
+
+
+// This functions update order-2 probabilities with current byte, also takes
+// care about updating variables, and renormalizing.
+// Of course "o2_ll_node" must point to the element to update or the last one,
+// based on the "coded_in_order" and checking the pointer of the element we
+// can decide what to do.
+//
+// This updating is only for encoding.
+void ppmc_update_order2(void)
+{
+ struct _byte_and_freq *node;
+
+
+ // First of all check if that's the first byte in this context, in that case
+ // we have to initialize some variables in the context structure.
+
+ if(order2_array[o2_cntxt].defined_bytes==0) //no byte defined yet
+ {
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+ order2_array[o2_cntxt].defined_bytes=1;
+ order2_array[o2_cntxt].max_cump=1;
+ order2_array[o2_cntxt].prob=_bytes_pool;
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ return; //nothing else to do
+ }
+
+
+ // The byte was coded under order two, otherwise it was coded under a
+ // lower order (never higher ones, remember that we are using update
+ // exclusion) in this case we have to create a new node in the list.
+
+ if(coded_in_order==2) //coded under order-2
+ {
+
+ // Update its count and variables of this context and check for renormalization
+
+ o2_ll_node->freq++; //increment its frequency (rather probability)
+
+ order2_array[o2_cntxt].max_cump++; //total cump
+
+ if(o2_ll_node->freq==255) //do we have to renormalize?
+ ppmc_renormalize_order2(); //renormalize
+
+ }
+ else
+ {
+
+ // Once every paranoid check has been done we are sure that this byte
+ // did not existed and so we have to create a new node in the linked
+ // list. Also we have to take care of memory issues.
+ //
+ // However due to the use of exclusion, we have to ensure that "o2_ll_node"
+ // points to the last element in the linked lists of this context
+
+ node=order2_array[o2_cntxt].prob; //get pointer to linked list
+
+ while(1)
+ {
+ if(node->next==0) //check for the end of the linked list
+ break;
+ node=node->next; //next node
+ }
+
+ o2_ll_node=node;
+
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate mem
+
+ o2_ll_node->next=_bytes_pool; //put it in the next free entry
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ order2_array[o2_cntxt].max_cump++; //total cump
+ order2_array[o2_cntxt].defined_bytes++; //total cump
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ }
+
+}
+
+
+// This functions renormalizes the probabilities at order-2 updating context
+// variables.
+void ppmc_renormalize_order2(void)
+{
+ unsigned long counter;
+ struct _byte_and_freq *node;
+
+ // Initialize variables. Defined bytes remain the same.
+ order2_array[o2_cntxt].max_cump=0; //clear them
+
+ node=order2_array[o2_cntxt].prob; //get pointer to lined lists
+
+ // Divide all the probabilities by 2 and update context variables
+ while(1)
+ {
+ node->freq>>=1; //divide by a factor of 2
+
+ if(node->freq==0) //don't allow a probability to be 0
+ node->freq=1;
+
+ order2_array[o2_cntxt].max_cump+=node->freq; //sum to the total cump
+
+ if(node->next==0) //last element
+ break;
+ node=node->next;
+ }
+
+
+ //printf("\nRenormalization, context:%c%c",o2_byte,o1_byte);
+
+}
+
+
+// Decodes the symbol correspoding to the current order, it returns the byte
+// or in case of an escape code or empty table it returns -1.
+// It updates "coded_in_order".
+//
+// If we decode an escape code, we don't modify "o2_ll_node" and thus its
+// work of the updating routine to read till the end of the linked list
+// (for adding a new element) however if we decode a byte, we save on
+// "o2_ll_node" the pointer to its node. (so updating is faster)
+void ppmc_decode_order2(void)
+{
+ unsigned long current_cump, counter;
+ struct _byte_and_freq *node;
+
+ // Initialize o2_cntxt
+
+ o2_cntxt=ppmc_order2_hash_key(o1_byte,o2_byte);
+
+
+ // First check if current order-2 context is empty
+ if(order2_array[o2_cntxt].defined_bytes==0) //it's empty
+ {
+ byte=-1; //byte not coded, nothing done
+ return;
+ }
+
+
+ // Get the total cump needed for decoding symbol
+ ppmc_get_totf_order2(); //total cump needed for decoding
+
+
+ // It's possible that due to exclusion, there's no byte left, in that case
+ // return.
+ if(exc_defined_bytes==0)
+ {
+ byte=-1;
+ return; //byte not coded, nothing done
+ }
+
+
+ symb_cump=range_decoder_decode(&rc_decoder,exc_total_cump); //decode it
+
+ // Now check if it's an escape code
+ if(symb_cump>=exc_max_cump) //the defined code space for the escape code
+ {
+
+ // Update coding values
+
+ ppmc_get_escape_prob_order2();
+ range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob);
+
+ // Mark as escape code
+
+ byte=-1;
+
+ // Then, update "excluded" table
+
+ node=order2_array[o2_cntxt].prob;
+
+ do{
+ excluded[node->byte]=1;
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+ return; //an escape code
+ }
+ else
+ {
+ // Now we have to check what symbol it is
+
+ current_cump=0; //cump of the current symbol
+
+ node=order2_array[o2_cntxt].prob; //get pointer to linked lists
+
+ while(1)
+ {
+ if(excluded[node->byte]==0) //only if it's not excluded
+ current_cump+=node->freq; //update cump
+ if(symb_cump<current_cump)
+ break; //symbol found, break search loop
+
+ node=node->next; //next element
+ //we have no need to check for the last symbol, we'll never read further
+ //the end of the linked lists, before we'll found the last byte.
+ }
+
+
+ //read byte value and probability
+
+ symb_prob=node->freq; //get the probability for updating the state
+ byte=node->byte; //get byte
+ o2_ll_node=node; //used for updating
+
+
+ // Get the cump of byte
+
+ symb_cump=current_cump-symb_prob;
+
+ // Update coding state
+
+ range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob);
+
+ // Update coded_in_order used for update exclusion
+
+ coded_in_order=2;
+
+ return;
+ }
+
+}
+
+
+// This is the routine for updating while decoding. We have to search the byte
+// in the linked list, if it's present, update its count, otherwise we have
+// hitted the end of the linked list, and there we have to create a new node.
+//
+// Of course if the byte was matched in order-2 we'll have a pointer to it
+// in "o2_ll_node" so we don't need to read the linked list. (we already did
+// in decoding)
+//
+// Another case which we also have to specially deal with, this is the case
+// when the context has not been initalized yet.
+void ppmc_update_dec_order2(void)
+{
+ struct _byte_and_freq *node;
+
+
+ // Handle the case when the context is not initialized
+ // This code is the same as the one for the encoding.
+
+ if(order2_array[o2_cntxt].defined_bytes==0) //no byte defined yet
+ {
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+ order2_array[o2_cntxt].defined_bytes=1;
+ order2_array[o2_cntxt].max_cump=1;
+ order2_array[o2_cntxt].prob=_bytes_pool;
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+
+ return; //nothing else to do
+ }
+
+
+ // Current context is initalized, proceed
+
+ if(coded_in_order==2) //check if it was decoded under order-2
+ {
+
+ // We can be sure that the pointer "o2_ll_node" points to its entry, and
+ // it has a non 0 probability (otherwise it couldn't be coded) so just
+ // update its probability and max_cump
+
+ o2_ll_node->freq++; //the probability of the byte
+ order2_array[o2_cntxt].max_cump++; //the max_cump
+
+ if(o2_ll_node->freq==255) //check for renormalization
+ ppmc_renormalize_order2();
+
+ }
+ else
+ {
+
+ // An escape code was decoded under order-2, we have to read till the
+ // end of the linked list so we can add a new node for this new byte.
+
+ node=order2_array[o2_cntxt].prob; //get pointer to linked list
+
+ while(1)
+ {
+ if(node->next==0) //check for the end of the linked list
+ break;
+ node=node->next; //next node
+ }
+
+
+ // We reached the end of the linked list, add a new node if possible,
+ // we are using the same code of "ppmc_update_order2()" with the
+ // difference that the pointer to the linked list is "node"
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate mem
+
+ node->next=_bytes_pool; //put it in the next free entry
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ order2_array[o2_cntxt].max_cump++; //total cump
+ order2_array[o2_cntxt].defined_bytes++; //total cump
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ return; //we are finished updating
+
+ }
+
+}
+
+
+// This function returns the probability for the escape codes in the variables
+void ppmc_get_escape_prob_order2(void)
+{
+ // To understand that remember that the escape allocated for the escape code
+ // is above the current maximum cump and that it has a probability determined
+ // by the scheme C.
+
+ symb_prob=exc_defined_bytes;
+ symb_cump=exc_max_cump;
+}
+
+
+
+// ORDER-3 functions
+//
+// The difference between order-3 and order-3 are just a few, instead of
+// keeping a table with the context structures, we keep a hash table with
+// pointers to linked lists with the context, so it's only a matter of
+// searching current context in the linked list corresponding to its hash
+// entry. This is done in "ppmc_get_totf_order3" because that's the first
+// routine that both encoding and decoding routines call.
+
+
+// Returns in the variable "total_cump" the current total cump of
+// order-3. Must be called while encoding or decoding before anything else
+// because it initializes the pointers to the context structure in
+// "o3_context" and o3_cntxt.
+//
+// If the hash entry is not initialized it returns "o3_context"=0
+// If the context is not present in the linked list of context, "o3_context"
+// will point to the last element in the linked list.
+// If the context is present "o3_context" will point to the context to use.
+// One can distinguish the last two by checking the context value of the
+// structure, if it's not the same, is the last element.
+//
+// The routine returns 0 if the hash entry is not initialized or if the
+// the context was not present. Otherwise it returns 1, meaning that we
+// have to code under this context.
+char ppmc_get_totf_order3(void)
+{
+ struct context *cntxt_node;
+ struct _byte_and_freq *node;
+
+
+ // First make the hash key for order-3
+
+ o3_cntxt=ppmc_order3_hash_key(o1_byte,o2_byte,o3_byte);
+ full_o3_cntxt=(o1_byte)+(o2_byte<<8)+(o3_byte<<16); //order-3
+
+
+ // Now check the hash entry in the table
+
+ if(order3_hasht[o3_cntxt]==0) //if 0, not initialized
+ {
+
+ o3_context=0; //no hash entry
+
+ return 0; //hash entry not initialized
+ }
+
+
+ // Now read trough the linked list of context searching current one
+
+ cntxt_node=order3_hasht[o3_cntxt];
+
+ while(1)
+ {
+
+ if(cntxt_node->order4321==full_o3_cntxt) //compare context
+ goto ppmc_gtf_cntxt_found;
+
+ if(cntxt_node->next==0) //end of context's linked list
+ break;
+
+ cntxt_node=cntxt_node->next; //next element
+
+ }
+
+
+ // Once there the context was not found
+ o3_context=cntxt_node; //pointer to last element in the linked list
+
+ return 0; //it was not present
+
+
+ // The context is found, so return pointer and cump
+
+ ppmc_gtf_cntxt_found:
+
+ o3_context=cntxt_node;
+
+ // Read the whole linked list for making the values
+ node=o3_context->prob;
+ exc_max_cump=0;
+ exc_defined_bytes=0;
+
+ do{
+ if(excluded[node->byte]==0)
+ {
+ exc_defined_bytes++;
+ exc_max_cump+=node->freq; //add the probability of this byte to the cump
+ }
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+
+ // Total cump is current total cump plus the probability for the escape code
+ exc_total_cump=exc_max_cump+exc_defined_bytes;
+
+
+ return 1; //context found
+
+}
+
+
+// Codes a byte under order-3 and returns 1.
+// Otherwise it returns a 0.
+//
+// In case the byte is coded under this context, coded_in_order=3.
+char ppmc_code_byte_order3(void)
+{
+ unsigned long counter;
+ struct _byte_and_freq *node;
+
+
+ // Get current context (if present) and total cump.
+
+ if(ppmc_get_totf_order3()==0)
+ return 0;
+
+
+ // It's possible that due to exclusion, there's no byte left, in that case
+ // return.
+ if(exc_defined_bytes==0)
+ return 0;
+
+
+ // See if the byte is present and compute its cump at the same time
+
+ node=o3_context->prob; //pointer to first element in the linked list
+
+ symb_cump=0; //the first symbol always has a 0 cump
+
+
+ // Now search the byte in the linked list
+
+ do{
+ if(node->byte==byte)
+ goto ppmc_o3_byte_found; //bad thing, I know, anyone has a better idea?
+ if(excluded[node->byte]==0)
+ symb_cump+=node->freq; //add the probability of this byte to the cump
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+
+ // If we reach that point, the byte was not found in the linked list
+ // so we don't need the cump, we have to output an escape code, whose
+ // probabilities are know using the context structure in the table.
+
+ // Byte was not present in the linked list, current node is the last one,
+ // and that's the node needed for creating a new node, save it.
+
+ o3_ll_node=node;
+
+ // Now get the probability and cump of the escape code
+
+ symb_cump=exc_max_cump;
+ symb_prob=exc_defined_bytes;
+
+ // Code the escape code
+ range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob);
+
+ // Then, update "excluded" table
+
+ node=o3_context->prob;
+
+ do{
+ excluded[node->byte]=1;
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+ return 0;
+
+
+ // That code is executed when the byte is found in the linked list
+
+ ppmc_o3_byte_found:
+
+
+ // Everything has been tested, now we can feel free to code the byte,
+ // the symb_cump is already computed, now get its probability and code
+ // the byte, also save pointer to this element in the linked lists for
+ // updating.
+
+ coded_in_order=3; //successfully coded under order-3
+
+ o3_ll_node=node; //save it for updating
+
+ symb_prob=node->freq; //get the probability of the byte
+
+ // Code it.
+
+ range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob);
+
+ return 1; //byte coded under order-3
+}
+
+
+// This functions update order-3 probabilities with current byte, also takes
+// care about updating variables, and renormalizing.
+//
+// "o3_ll_node" must point to the element to update or the last one,
+// based on the "coded_in_order" and checking the pointer of the element we
+// can decide what to do. Also "o3_context" must be initialized.
+//
+// This updating is only for encoding.
+void ppmc_update_order3(void)
+{
+ struct _byte_and_freq *node;
+
+
+ // First thing first, check if the hash entry is initialized
+
+ if(order3_hasht[o3_cntxt]==0) //no pointer to linked list defined yet
+ {
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ order3_hasht[o3_cntxt]=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o3_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ return; //nothing else to do
+ }
+
+
+
+
+ // The byte was coded under order three, otherwise it was coded under a
+ // lower order (never higher ones, remember that we are using update
+ // exclusion) in this case we have to create a new node in the list.
+
+ if(coded_in_order==3) //coded under order-3
+ {
+
+ // Update its count and variables of this context and check for renormalization
+
+ o3_ll_node->freq++; //increment its frequency (rather probability)
+
+ o3_context->max_cump++; //total cump
+
+ if(o3_ll_node->freq==255) //do we have to renormalize?
+ ppmc_renormalize_order3(); //renormalize
+
+ }
+ else
+ {
+
+ // Now we have two cases, under a given context (which we actually found)
+ // we coded an escape coded, in that case just create a new node in the
+ // linked list of bytes and probabilities. Otherwise we didn't find the
+ // same node so we have to create it in the linked list for context.
+ // And we can be sure that it at least has one element and that
+ // "o3_context" points to the last element, so we can put the new element.
+
+ if(o3_context->order4321==full_o3_cntxt) //chech if that's the last
+ { //element or a context found
+
+ // However due to the use of exclusion, we have to ensure that "o3_ll_node"
+ // points to the last element in the linked lists of this context
+
+ node=o3_context->prob; //get pointer to linked list
+
+ while(1)
+ {
+ if(node->next==0) //check for the end of the linked list
+ break;
+ node=node->next; //next node
+ }
+
+ o3_ll_node=node;
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate mem
+
+ o3_ll_node->next=_bytes_pool; //put it in the next free entry
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ o3_context->max_cump++; //total cump
+ o3_context->defined_bytes++; //total cump
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+ }
+ else
+ {
+
+ // Ensure that we are at the end of the linked list of contexts
+ o3_context=order3_hasht[o3_cntxt];
+
+ do{
+ if(o3_context->next==0)
+ break;
+ o3_context=o3_context->next;
+ }while(1);
+
+ // We have to create a new context node
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ o3_context->next=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o3_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ }
+
+ }
+
+}
+
+
+// This functions renormalizes the probabilities at order-3 updating context
+// variables.
+void ppmc_renormalize_order3(void)
+{
+ unsigned long counter;
+ struct _byte_and_freq *node;
+
+
+ // Initialize variables. Defined bytes remain the same.
+ o3_context->max_cump=0; //clear them
+
+ node=o3_context->prob; //get pointer to lined lists
+
+ // Divide all the probabilities by 2 and update context variables
+ while(1)
+ {
+ node->freq>>=1; //divide by a factor of 2
+
+ if(node->freq==0) //don't allow a probability to be 0
+ node->freq=1;
+
+ o3_context->max_cump+=node->freq; //sum to the total cump
+
+ if(node->next==0) //last element
+ break;
+ node=node->next;
+ }
+
+
+}
+
+
+// Decodes the symbol correspoding to the current order, it returns the byte
+// or in case of an escape code or empty table it returns -1.
+// It updates "coded_in_order".
+//
+// If we decode an escape code, we don't modify "o3_ll_node" and thus its
+// work of the updating routine to read till the end of the linked list
+// (for adding a new element) however if we decode a byte, we save on
+// "o3_ll_node" the pointer to its node. (so updating is faster)
+void ppmc_decode_order3(void)
+{
+ unsigned long current_cump, counter;
+ struct _byte_and_freq *node;
+
+
+ // Get current context (if present) and total cump.
+
+ if(ppmc_get_totf_order3()==0)
+ {
+ byte=-1;
+ return;
+ }
+
+ // It's possible that due to exclusion, there's no byte left, in that case
+ // return.
+ if(exc_defined_bytes==0)
+ {
+ byte=-1;
+ return;
+ }
+
+
+
+ // Decode current cump
+
+ symb_cump=range_decoder_decode(&rc_decoder,exc_total_cump); //decode it
+
+ // Now check if it's an escape code
+ if(symb_cump>=exc_max_cump) //the defined code space for the escape code
+ {
+
+ // Update coding values
+
+ ppmc_get_escape_prob_order3();
+ range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob);
+
+ // Mark as escape code
+
+ byte=-1;
+
+ // Then, update "excluded" table
+
+ node=o3_context->prob;
+
+ do{
+ excluded[node->byte]=1;
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+ return; //an escape code
+ }
+ else
+ {
+ // Now we have to check what symbol it is
+
+ current_cump=0; //cump of the current symbol
+
+ node=o3_context->prob; //get pointer to linked lists
+
+ while(1)
+ {
+ if(excluded[node->byte]==0)
+ current_cump+=node->freq; //update cump
+ if(symb_cump<current_cump)
+ break; //symbol found, break search loop
+
+ node=node->next; //next element
+ //we have no need to check for the last symbol, we'll never read further
+ //the end of the linked lists, before we'll found the last byte.
+ }
+
+
+ //read byte value and probability
+
+ symb_prob=node->freq; //get the probability for updating the state
+ byte=node->byte; //get byte
+ o3_ll_node=node; //used for updating
+
+
+ // Get the cump of byte
+
+ symb_cump=current_cump-symb_prob;
+
+ // Update coding state
+
+ range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob);
+
+ // Update coded_in_order used for update exclusion
+
+ coded_in_order=3;
+
+ return;
+ }
+
+}
+
+
+// This is the routine for updating while decoding. The only difference with
+// the routine for coding is that when an escape code was coded, "o3_ll_node"
+// is not initialized so we have to read till the end of the linked list.
+// Fortunately "o3_context" will be initialized so we don't need to read its
+// linked list.
+void ppmc_update_dec_order3(void)
+{
+ struct _byte_and_freq *node;
+
+ // First thing first, check if the hash entry is initialized
+
+ if(order3_hasht[o3_cntxt]==0) //no pointer to linked list defined yet
+ {
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ order3_hasht[o3_cntxt]=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o3_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ return; //nothing else to do
+ }
+
+
+ // The byte was coded under order three, otherwise it was coded under a
+ // lower order (never higher ones, remember that we are using update
+ // exclusion) in this case we have to create a new node in the list.
+
+ if(coded_in_order==3) //coded under order-3
+ {
+
+ // Update its count and variables of this context and check for renormalization
+
+ o3_ll_node->freq++; //increment its frequency (rather probability)
+
+ o3_context->max_cump++; //total cump
+
+ if(o3_ll_node->freq==255) //do we have to renormalize?
+ ppmc_renormalize_order3(); //renormalize
+
+ }
+ else
+ {
+
+ // Now we have two cases, under a given context (which we actually found)
+ // we coded an escape coded, in that case just create a new node in the
+ // linked list of bytes and probabilities. Otherwise we didn't find the
+ // same node so we have to create it in the linked list for context.
+ // And we can be sure that it at least has one element and that
+ // "o3_context" points to the last element, so we can put the new element.
+
+ if(o3_context->order4321==full_o3_cntxt) //chech if that's the last
+ { //element or the a context found
+
+ // Read till the end of the linked list
+
+ node=o3_context->prob; //get pointer to linked list
+
+ while(1)
+ {
+ if(node->next==0) //check for the end of the linked list
+ break;
+ node=node->next; //next node
+ }
+
+ // Now add element
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate mem
+
+ node->next=_bytes_pool; //put it in the next free entry
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ o3_context->max_cump++; //total cump
+ o3_context->defined_bytes++; //total cump
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+ }
+ else
+ {
+
+ // We have to create a new context node
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ o3_context->next=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o3_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.");
+ exit(1);
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ }
+
+ }
+
+}
+
+
+
+// This function returns the probability for the escape codes in the variables
+void ppmc_get_escape_prob_order3(void)
+{
+ // To understand that remember that the escape allocated for the escape code
+ // is above the current maximum cump and that it has a probability determined
+ // by the scheme C.
+
+ symb_prob=exc_defined_bytes;
+ symb_cump=exc_max_cump;
+}
+
+
+
+// ORDER-4 functions
+//
+// The routines for order-4 are *equal* to those for order-3, there are a few
+// changes like different global variables, and different hash keys.
+//
+// If you want to go to higher orders, you'd use the same code and data
+// structures, with the difference of the context bytes (order4321)
+// stored in every context's linked list.
+
+
+// Returns in the variable "total_cump" the current total cump of
+// order-4. Must be called while encoding or decoding before anything else
+// because it initializes the pointers to the context structure in
+// "o4_context" and o4_cntxt.
+//
+// If the hash entry is not initialized it returns "o4_context"=0
+// If the context is not present in the linked list of context, "o4_context"
+// will point to the last element in the linked list.
+// If the context is present "o4_context" will point to the context to use.
+// One can distinguish the last two by checking the context value of the
+// structure, if it's not the same, is the last element.
+//
+// The routine returns 0 if the hash entry is not initialized or if the
+// the context was not present. Otherwise it returns 1, meaning that we
+// have to code under this context.
+char ppmc_get_totf_order4(void)
+{
+ struct context *cntxt_node;
+ struct _byte_and_freq *node;
+
+
+ // First make the hash key for order-4
+
+ o4_cntxt=ppmc_order4_hash_key(o1_byte,o2_byte,o3_byte,o4_byte);
+ full_o4_cntxt=(o1_byte)+(o2_byte<<8)+(o3_byte<<16)+(o4_byte<<24); //order-4
+
+
+ // Now check the hash entry in the table
+
+ if(order4_hasht[o4_cntxt]==0) //if 0, not initialized
+ {
+
+ o4_context=0; //no hash entry
+
+ return 0; //hash entry not initialized
+ }
+
+
+ // Now read trough the linked list of context searching current one
+
+ cntxt_node=order4_hasht[o4_cntxt];
+
+ while(1)
+ {
+
+ if(cntxt_node->order4321==full_o4_cntxt) //compare context
+ goto ppmc_gtfo4_cntxt_found;
+
+ if(cntxt_node->next==0) //end of context's linked list
+ break;
+
+ cntxt_node=cntxt_node->next; //next element
+
+ }
+
+
+ // Once there the context was not found
+ o4_context=cntxt_node; //pointer to last element in the linked list
+
+ return 0; //it was not present
+
+
+ // The context is found, so return pointer and cump
+
+ ppmc_gtfo4_cntxt_found:
+
+ o4_context=cntxt_node;
+
+ // Read the whole linked list for making the values
+ node=o4_context->prob;
+ exc_max_cump=0;
+ exc_defined_bytes=0;
+
+ do{
+ if(excluded[node->byte]==0)
+ {
+ exc_defined_bytes++;
+ exc_max_cump+=node->freq; //add the probability of this byte to the cump
+ }
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+
+ // Total cump is current total cump plus the probability for the escape code
+ exc_total_cump=exc_max_cump+exc_defined_bytes;
+
+
+ return 1; //context found
+
+}
+
+
+// Codes a byte under order-4 and returns 1.
+// Otherwise it returns a 0.
+//
+// In case the byte is coded under this context, coded_in_order=4.
+char ppmc_code_byte_order4(void)
+{
+ unsigned long counter;
+ struct _byte_and_freq *node;
+
+
+ // Get current context (if present) and total cump.
+
+ if(ppmc_get_totf_order4()==0)
+ return 0;
+
+
+ // It's possible that due to exclusion, there's no byte left, in that case
+ // return.
+ if(exc_defined_bytes==0)
+ return 0;
+
+
+ // See if the byte is present and compute its cump at the same time
+
+ node=o4_context->prob; //pointer to first element in the linked list
+
+ symb_cump=0; //the first symbol always has a 0 cump
+
+
+ // Now search the byte in the linked list
+
+ do{
+ if(node->byte==byte)
+ goto ppmc_o4_byte_found; //bad thing, I know, anyone has a better idea?
+ if(excluded[node->byte]==0)
+ symb_cump+=node->freq; //add the probability of this byte to the cump
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+
+ // If we reach that point, the byte was not found in the linked list
+ // so we don't need the cump, we have to output an escape code, whose
+ // probabilities are know using the context structure in the table.
+
+ // Byte was not present in the linked list, current node is the last one,
+ // and that's the node needed for creating a new node, save it.
+
+ o4_ll_node=node;
+
+ // Now get the probability and cump of the escape code
+
+ symb_cump=exc_max_cump;
+ symb_prob=exc_defined_bytes;
+
+ // Code the escape code
+ range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob);
+
+ // Then, update "excluded" table
+
+ node=o4_context->prob;
+
+ do{
+ excluded[node->byte]=1;
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+ return 0;
+
+
+ // That code is executed when the byte is found in the linked list
+
+ ppmc_o4_byte_found:
+
+
+ // Everything has been tested, now we can feel free to code the byte,
+ // the symb_cump is already computed, now get its probability and code
+ // the byte, also save pointer to this element in the linked lists for
+ // updating.
+
+ coded_in_order=4; //successfully coded under order-4
+
+ o4_ll_node=node; //save it for updating
+
+ symb_prob=node->freq; //get the probability of the byte
+
+ // Code it.
+
+ range_coder_encode(&rc_coder,exc_total_cump,symb_cump,symb_prob);
+
+ return 1; //byte coded under order-4
+}
+
+
+// This functions update order-4 probabilities with current byte, also takes
+// care about updating variables, and renormalizing.
+//
+// "o4_ll_node" must point to the element to update or the last one,
+// based on the "coded_in_order" and checking the pointer of the element we
+// can decide what to do. Also "o4_context" must be initialized.
+//
+// This updating is only for encoding.
+void ppmc_update_order4(void)
+{
+
+ // First thing first, check if the hash entry is initialized
+
+ if(order4_hasht[o4_cntxt]==0) //no pointer to linked list defined yet
+ {
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ order4_hasht[o4_cntxt]=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o4_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.");
+ exit(1);
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ return; //nothing else to do
+ }
+
+
+
+
+ // The byte was coded under order three, otherwise it was coded under a
+ // lower order (never higher ones, remember that we are using update
+ // exclusion) in this case we have to create a new node in the list.
+
+ if(coded_in_order==4) //coded under order-4
+ {
+
+ // Update its count and variables of this context and check for renormalization
+
+ o4_ll_node->freq++; //increment its frequency (rather probability)
+
+ o4_context->max_cump++; //total cump
+
+ if(o4_ll_node->freq==255) //do we have to renormalize?
+ ppmc_renormalize_order4(); //renormalize
+
+ }
+ else
+ {
+
+ // Now we have two cases, under a given context (which we actually found)
+ // we coded an escape coded, in that case just create a new node in the
+ // linked list of bytes and probabilities. Otherwise we didn't find the
+ // same node so we have to create it in the linked list for context.
+ // And we can be sure that it at least has one element and that
+ // "o4_context" points to the last element, so we can put the new element.
+
+ if(o4_context->order4321==full_o4_cntxt) //chech if that's the last
+ { //element or the a context found
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate mem
+
+ o4_ll_node->next=_bytes_pool; //put it in the next free entry
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ o4_context->max_cump++; //total cump
+ o4_context->defined_bytes++; //total cump
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+ }
+ else
+ {
+
+ // We have to create a new context node
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ o4_context->next=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o4_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.");
+ exit(1);
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ }
+
+ }
+
+}
+
+
+// This functions renormalizes the probabilities at order-4 updating context
+// variables.
+void ppmc_renormalize_order4(void)
+{
+ unsigned long counter;
+ struct _byte_and_freq *node;
+
+
+ // Initialize variables. Defined bytes remain the same.
+ o4_context->max_cump=0; //clear them
+
+ node=o4_context->prob; //get pointer to lined lists
+
+ // Divide all the probabilities by 2 and update context variables
+ while(1)
+ {
+ node->freq>>=1; //divide by a factor of 2
+
+ if(node->freq==0) //don't allow a probability to be 0
+ node->freq=1;
+
+ o4_context->max_cump+=node->freq; //sum to the total cump
+
+ if(node->next==0) //last element
+ break;
+ node=node->next;
+ }
+
+
+}
+
+
+// Decodes the symbol correspoding to the current order, it returns the byte
+// or in case of an escape code or empty table it returns -1.
+// It updates "coded_in_order".
+//
+// If we decode an escape code, we don't modify "o4_ll_node" and thus its
+// work of the updating routine to read till the end of the linked list
+// (for adding a new element) however if we decode a byte, we save on
+// "o4_ll_node" the pointer to its node. (so updating is faster)
+void ppmc_decode_order4(void)
+{
+ unsigned long current_cump, counter;
+ struct _byte_and_freq *node;
+
+
+ // Get current context (if present) and total cump.
+
+ if(ppmc_get_totf_order4()==0)
+ {
+ byte=-1;
+ return;
+ }
+
+
+ // It's possible that due to exclusion, there's no byte left, in that case
+ // return.
+ if(exc_defined_bytes==0)
+ {
+ byte=-1;
+ return;
+ }
+
+
+ // Decode current cump
+
+ symb_cump=range_decoder_decode(&rc_decoder,exc_total_cump); //decode it
+
+ // Now check if it's an escape code
+ if(symb_cump>=exc_max_cump) //the defined code space for the escape code
+ {
+
+ // Update coding values
+
+ ppmc_get_escape_prob_order4();
+ range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob);
+
+ // Mark as escape code
+
+ byte=-1;
+
+ // Then, update "excluded" table
+
+ node=o4_context->prob;
+
+ do{
+ excluded[node->byte]=1;
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+ return; //an escape code
+ }
+ else
+ {
+ // Now we have to check what symbol it is
+
+ current_cump=0; //cump of the current symbol
+
+ node=o4_context->prob; //get pointer to linked lists
+
+ while(1)
+ {
+ if(excluded[node->byte]==0)
+ current_cump+=node->freq; //update cump
+ if(symb_cump<current_cump)
+ break; //symbol found, break search loop
+
+ node=node->next; //next element
+ //we have no need to check for the last symbol, we'll never read further
+ //the end of the linked lists, before we'll found the last byte.
+ }
+
+
+ //read byte value and probability
+
+ symb_prob=node->freq; //get the probability for updating the state
+ byte=node->byte; //get byte
+ o4_ll_node=node; //used for updating
+
+
+ // Get the cump of byte
+
+ symb_cump=current_cump-symb_prob;
+
+ // Update coding state
+
+ range_decoder_update(&rc_decoder,exc_total_cump,symb_cump,symb_prob);
+
+ // Update coded_in_order used for update exclusion
+
+ coded_in_order=4;
+
+ return;
+ }
+
+}
+
+
+// This is the routine for updating while decoding. The only difference with
+// the routine for coding is that when an escape code was coded, "o4_ll_node"
+// is not initialized so we have to read till the end of the linked list.
+// Fortunately "o4_context" will be initialized so we don't need to read its
+// linked list.
+void ppmc_update_dec_order4(void)
+{
+ struct _byte_and_freq *node;
+
+ // First thing first, check if the hash entry is initialized
+
+ if(order4_hasht[o4_cntxt]==0) //no pointer to linked list defined yet
+ {
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ order4_hasht[o4_cntxt]=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o4_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.");
+ exit(1);
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ return; //nothing else to do
+ }
+
+
+ // The byte was coded under order four, otherwise it was coded under a
+ // lower order (never higher ones, remember that we are using update
+ // exclusion) in this case we have to create a new node in the list.
+
+ if(coded_in_order==4) //coded under order-4
+ {
+
+ // Update its count and variables of this context and check for renormalization
+
+ o4_ll_node->freq++; //increment its frequency (rather probability)
+
+ o4_context->max_cump++; //total cump
+
+ if(o4_ll_node->freq==255) //do we have to renormalize?
+ ppmc_renormalize_order4(); //renormalize
+
+ }
+ else
+ {
+
+ // Now we have two cases, under a given context (which we actually found)
+ // we coded an escape coded, in that case just create a new node in the
+ // linked list of bytes and probabilities. Otherwise we didn't find the
+ // same node so we have to create it in the linked list for context.
+ // And we can be sure that it at least has one element and that
+ // "o4_context" points to the last element, so we can put the new element.
+
+ if(o4_context->order4321==full_o4_cntxt) //chech if that's the last
+ { //element or the a context found
+
+ // Read till the end of the linked list
+
+ node=o4_context->prob; //get pointer to linked list
+
+ while(1)
+ {
+ if(node->next==0) //check for the end of the linked list
+ break;
+ node=node->next; //next node
+ }
+
+ // Now add element
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate mem
+
+ node->next=_bytes_pool; //put it in the next free entry
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ o4_context->max_cump++; //total cump
+ o4_context->defined_bytes++; //total cump
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+ }
+ else
+ {
+
+ // We have to create a new context node
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ o4_context->next=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o4_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.");
+ exit(1);
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ }
+
+ }
+
+}
+
+
+
+// This function returns the probability for the escape codes in the variables
+void ppmc_get_escape_prob_order4(void)
+{
+ // To understand that remember that the escape allocated for the escape code
+ // is above the current maximum cump and that it has a probability determined
+ // by the scheme C.
+
+ symb_prob=exc_defined_bytes;
+ symb_cump=exc_max_cump;
+}
+
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "ppmc.h" (exclusions)
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+ Part of the ppmc encoder and decoder.
+
+ This module contains the definitions of different functions and all the
+ data structures defined by ppmc. Also contains defines.
+*/
+
+// Definitions
+
+#define ppmc_order4_hash_size 65536
+#define ppmc_order4_hash_key(k,j,i,l) ( (k)+(j<<8)+(i<<1)+(l<<9) )& ppmc_order4_hash_size-1
+#define ppmc_order3_hash_size 65536
+#define ppmc_order3_hash_key(k,j,i) ((k)+(j<<7)+(i<<11)) & ppmc_order3_hash_size-1
+#define ppmc_order2_hash_key(k,j) ((k)+(j<<8))
+#define _bytes_pool_elements 125000 //this is used the first time
+ //that we allocate memory, that's
+ //the number of entries
+#define _bytes_pool_elements_inc 125000 //if we need to alloc again, this
+ //is the number of entries to get
+#define _context_pool_elements 50000
+#define _context_pool_elements_inc 50000
+
+#define _mempool_max_index 1000 //the number of entries in the array with
+ //pointers
+
+
+// Data structures
+
+// This structure contains a single element of a linked lists which contains
+// the probability distribution of a given order. This structure takes 6 bytes.
+struct _byte_and_freq{
+unsigned char byte; //the byte itself
+unsigned char freq; //and the frequency of it
+struct _byte_and_freq *next; //pointer to next element in linked list or 0
+};
+
+
+// This structure is used for both order-3 and order-4. It takes 20 bytes,
+// and it can still hold another byte more. (only 19 being used)
+// Order 2-1-0-(-1) use different structures for a faster accessing.
+struct context{
+struct context *next; //next context in the hash entry
+unsigned long order4321; //order-4-3-2-1 (or order-3-2-1 for order-3)
+struct _byte_and_freq *prob; //pointer to linked lists containing probability distribution
+unsigned int max_cump; //maximum cumulative probability (can't exceed (2^16)-1 )
+unsigned int defined_bytes; //the number of bytes in this context
+};
+
+// That's the same but for order-2 where there's no hash collisions.
+struct context_o2{
+struct _byte_and_freq *prob; //pointer to linked lists containing probability distribution
+unsigned int max_cump; //maximum cumulative probability (can't exceed (2^16)-1 )
+unsigned int defined_bytes; //the number of bytes in this context
+};
+
+
+// Declaration of functions
+
+
+// Functions for initializing
+void ppmc_alloc_memory(void);
+void ppmc_initialize_contexts(void);
+void ppmc_encoder_initialize(void);
+void ppmc_decoder_initialize(void);
+void ppmc_free_memory(void);
+void ppmc_flush_mem_enc(void);
+void ppmc_flush_mem_dec(void);
+
+// Functions for order-(-1)
+void ppmc_get_prob_ordern1(void);
+unsigned long ppmc_get_symbol_ordern1(void);
+void ppmc_get_totf_ordern1(void);
+void ppmc_renormalize_order1(void);
+
+// Functions for order-0
+void ppmc_get_totf_order0(void);
+char ppmc_code_byte_order0(void);
+void ppmc_update_order0(void);
+void ppmc_renormalize_order0(void);
+void ppmc_decode_order0(void);
+void ppmc_get_escape_prob_order0(void);
+void ppmc_get_prob_order0(void);
+
+// Functions for order-1
+void ppmc_get_totf_order1(void);
+char ppmc_code_byte_order1(void);
+void ppmc_update_order1(void);
+void ppmc_renormalize_order1(void);
+void ppmc_decode_order1(void);
+void ppmc_get_escape_prob_order1(void);
+void ppmc_get_prob_order1(void);
+
+
+// Functions for order-2
+void ppmc_get_totf_order2(void);
+char ppmc_code_byte_order2(void);
+void ppmc_update_order2(void);
+void ppmc_renormalize_order2(void);
+void ppmc_decode_order2(void);
+void ppmc_update_dec_order2(void);
+void ppmc_get_escape_prob_order2(void);
+void ppmc_get_prob_order2(void);
+
+
+// Functions for order-3
+char ppmc_get_totf_order3(void);
+char ppmc_code_byte_order3(void);
+void ppmc_update_order3(void);
+void ppmc_renormalize_order3(void);
+void ppmc_decode_order3(void);
+void ppmc_update_dec_order3(void);
+void ppmc_get_escape_prob_order3(void);
+void ppmc_get_prob_order3(void);
+
+
+// Functions for order-4
+char ppmc_get_totf_order4(void);
+char ppmc_code_byte_order4(void);
+void ppmc_update_order4(void);
+void ppmc_renormalize_order4(void);
+void ppmc_decode_order4(void);
+void ppmc_update_dec_order4(void);
+void ppmc_get_escape_prob_order4(void);
+void ppmc_get_prob_order4(void);
+
+
+
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "ppmcdata.c" (exclusions)
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ Part of the ppmc encoder and decoder.
+
+ This module contains global data.
+*/
+
+#include "ppmc.h" //defines
+#include "range.h"
+
+// Order-4 uses a hash table which points to the start of a linked list with
+// the different context, which has the cump, the number of different symbols
+// and a pointer to the linked list with the bytes and frequencies.
+// Order-3 is almost the same, both take 262144 bytes.
+struct context *order4_hasht[ppmc_order4_hash_size];
+
+struct context *order3_hasht[ppmc_order3_hash_size];
+
+
+// The array for order-2 is different, as we do directly hashing, and thus
+// we have no need to do the stuff of linked lists for the context itself,
+// so it contains the context used. This takes 1310720 bytes.
+struct context_o2 order2_array[65536];
+
+
+// Those are the multiple arrays for order-1. It takes 65536 bytes.
+unsigned char order1_array[256][256];
+unsigned int order1_defined_bytes_array[256]; //the defined bytes in every context
+unsigned int order1_max_cump_array[256]; //max cump of every context
+
+
+// This is the array for order-0. It takes 256 bytes.
+unsigned char order0_array[256];
+unsigned int order0_defined_bytes;
+unsigned int order0_max_cump;
+
+
+// No need of variables for order-(-1), because it's fixed.
+
+
+
+// Those are the pointers and variables used for managing the mem pool for
+// both context, and bytes and frequencies.
+struct _byte_and_freq *_bytes_pool, //pointer to pool containing linked
+ //lists with bytes and frequencies
+ *_bytes_pool_max; //the maximum of this buffer
+struct context *_context_pool; //pointer to pool containing contexts
+struct context *_context_pool_max; //the same as with _bytes_pool
+
+unsigned long _bytes_pool_index; //index in array of pointers
+unsigned long _context_pool_index;
+
+//the following is an array keeping pointers to different buffers. A new
+//buffer is allocated when the current one is full, so we always have a
+//buffer for linked lists. (without allocating a buffer for every element)
+struct _byte_and_freq *_bytes_pool_array[_mempool_max_index];
+struct context *_context_pool_array[_mempool_max_index];
+
+char ppmc_out_of_memory; //0 if we have enough memory, 1 instead, any
+ //routine that needs to allocate memory must
+ //quit if that's 1.
+
+
+// Variables which contain current byte to code and order
+unsigned long byte, //current byte to code
+ o1_byte, //order-1 byte
+ o2_byte, //order-2 byte
+ o3_byte, //order-3 byte
+ o4_byte; //order-4 byte
+
+unsigned long o2_cntxt; //used in the hash key of order-2
+unsigned long o3_cntxt; //use as hash key for order-3
+unsigned long o4_cntxt; //use as hash key for order-4
+unsigned long full_o3_cntxt; //o1_byte, o2_byte and o3_byte together
+unsigned long full_o4_cntxt; //order-4-3-2-1
+
+unsigned long coded_in_order; //in which order the last byte was coded
+ //it's for update exclusion
+ //also used for decoding
+
+// Variables used for coding
+
+unsigned long
+ total_cump, //the total cumulative probability
+ symb_cump, //the symbol cumulative probability
+ symb_prob; //the symbol frequency
+
+rangecoder rc_coder; //state of range coder
+rangecoder rc_decoder; //state of range decoder
+
+// File handles
+
+ FILE *file_input, //file to code
+ *file_output; //file where the coded data is placed
+
+
+// Pointers to linked lists and context structures used for faster updating
+// or creation of new nodes, because instead of reading again all the linked
+// list till the end (in the case of creation) we have a pointer to the last
+// element. In the case that a byte was present in the linked lists but it
+// had a 0 count, we just have to update its probability. And in the case
+// that it already was present and we coded it under that context or a lower
+// one, we just have to update its probability.
+
+struct _byte_and_freq *o2_ll_node; //pointer to linked lists under order-2
+ //where does it points depends in which
+ //order the byte was coded.
+struct _byte_and_freq *o3_ll_node; //the same but for order-3
+struct _byte_and_freq *o4_ll_node;
+
+struct context *o3_context; //pointer to current order-3 context
+struct context *o4_context; //pointer to current order-3 context
+
+
+// Those are the variables especially used for exclusion.
+
+// the array for doing exclusion. Every routine from every order can
+// use them.
+unsigned char excluded[256]; //if 0 it was not present, if it's 1 then
+ //it appeared in a higher order.
+
+unsigned long exc_total_cump, //total cump of context (after exclusion)
+ exc_defined_bytes,//defined bytes in context (after exclusion)
+ exc_max_cump;
+
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "ppmcdata.h" (exclusions)
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ Part of the ppmc encoder and decoder.
+
+ This module contains externs definition for global data.
+*/
+
+#include "ppmc.h"
+
+// Order-4 uses a hash table which points to the start of a linked list with
+// the different context, which has the cump, the number of different symbols
+// and a pointer to the linked list with the bytes and frequencies.
+// Order-3 is almost the same, both take 262144 bytes.
+extern struct context *order4_hasht[];
+
+extern struct context *order3_hasht[];
+
+
+// The array for order-2 is different, as we do directly hashing, and thus
+// we have no need to do the stuff of linked lists for the context itself,
+// so it contains the context used. This takes 1310720 bytes.
+extern struct context_o2 order2_array[];
+
+
+// Those are the multiple arrays for order-1. It takes 65536 bytes.
+extern unsigned char order1_array[256][256];
+extern unsigned int order1_defined_bytes_array[]; //the defined bytes in every context
+extern unsigned int order1_max_cump_array[]; //max cump of every context
+
+
+// This is the array for order-0. It takes 256 bytes.
+extern unsigned char order0_array[];
+extern unsigned int order0_defined_bytes;
+extern unsigned int order0_max_cump;
+
+
+// Those are the pointers and variables used for managing the mem pool for
+// both context, and bytes and frequencies.
+extern struct _byte_and_freq *_bytes_pool, //pointer to pool containing linked
+ //lists with bytes and frequencies
+ *_bytes_pool_max; //the maximum of this buffer
+extern struct context *_context_pool; //pointer to pool containing contexts
+extern struct context *_context_pool_max; //the same as with _bytes_pool
+
+extern unsigned long _bytes_pool_index; //index in array of pointers
+extern unsigned long _context_pool_index;
+
+//the following is an array keeping pointers to different buffers. A new
+//buffer is allocated when the current one is full, so we always have a
+//buffer for linked lists. (without allocating a buffer for every element)
+extern struct _byte_and_freq *_bytes_pool_array[_mempool_max_index];
+extern struct context *_context_pool_array[_mempool_max_index];
+
+extern char ppmc_out_of_memory; //0 if we have enough memory, 1 instead, any
+ //routine that needs to allocate memory must
+ //quit if that's 1.
+
+
+
+// Variables which contain current byte to code and order
+extern unsigned long //they are only bytes
+ byte, //current byte to code
+ o1_byte, //order-1 byte
+ o2_byte, //order-2 byte
+ o3_byte, //order-3 byte
+ o4_byte; //order-4 byte
+
+extern unsigned long o2_cntxt; //used in the hash key of order-2
+extern unsigned long o3_cntxt; //use as hash key for order-3
+unsigned long o4_cntxt; //use as hash key for order-4
+extern unsigned long full_o3_cntxt; //o1_byte, o2_byte and o3_byte together
+extern unsigned long full_o4_cntxt; //order-4-3-2-1
+
+extern unsigned long coded_in_order; //in which order the last byte was coded
+ //it's for update exclusion
+ //also used for decoding
+// Variables used for coding
+
+extern unsigned long //no need for negative values
+ total_cump, //the total cumulative probability
+ symb_cump, //the symbol cumulative probability
+ symb_prob; //the symbol frequency
+
+extern rangecoder rc_coder; //state of range coder
+extern rangecoder rc_decoder; //state of range decoder
+
+// File handles
+
+ FILE *file_input, //file to code
+ *file_output; //file where the coded data is placed
+
+
+
+// Pointers to linked lists and context structures used for faster updating
+// or creation of new nodes, because instead of reading again all the linked
+// list till the end (in the case of creation) we have a pointer to the last
+// element. In the case that a byte was present in the linked lists but it
+// had a 0 count, we just have to update its probability. And in the case
+// that it already was present and we coded it under that context or a lower
+// one, we just have to update its probability.
+
+
+extern struct _byte_and_freq *o2_ll_node;//pointer to linked lists under order-2
+ //where does it points depends in which
+ //order the byte was coded.
+extern struct _byte_and_freq *o3_ll_node; //the same but for order-3
+extern struct _byte_and_freq *o4_ll_node;
+
+extern struct context *o3_context; //pointer to current order-3 context
+extern struct context *o4_context; //pointer to current order-3 context
+
+
+// the array for doing exclusion. Every routine from every order can
+// use them.
+extern unsigned char excluded[256]; //if 0 it was not present, if it's 1 then
+ //it appeared in a higher order.
+
+extern unsigned long exc_total_cump, //total cump of context (after exclusion)
+ exc_defined_bytes,//defined bytes in context (after exclusion)
+ exc_max_cump;
+
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "ppmcmain.c" (exclusions)
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ Part of the ppmc encoder only. Using exclusion.
+
+ This module is the main module and calls the different modules to do
+ the encoding of a file. When done prints bpb and kbyps.
+*/
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "range.h" //the range coder functions and data
+#include "ppmcdata.h"
+
+
+long filesize(FILE *stream);
+
+unsigned long debug=0;
+
+
+//Main
+void main (char argc, char *argv[])
+{
+ unsigned long counter, //temporal counter for loops like for or while
+ counter2, //another temporal counter for sub loops
+ size_file_input; //the size of the input file
+
+
+ // Print title
+ printf("PPMC using range coder. (with exclusion)\n");
+ printf("Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.\n");
+ printf("Permission is granted to make verbatim copies of this program for private\n");
+ printf("use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.\n");
+
+
+
+ // Check for correct number of parameters
+ if(argc!=3)
+ {
+ printf("Bad number of arguments.\n");
+ exit(1);
+ }
+
+
+ // Try to open input and output files
+ if((file_input=fopen(argv[1],"r+b"))==NULL)
+ {
+ printf("Couldn't open %s.\n",argv[1]);
+ exit(1);
+ }
+
+ if((file_output=fopen(argv[2],"w+b"))==NULL)
+ {
+ printf("Couldn't create %s.\n",argv[2]);
+ exit(1);
+ }
+
+
+ // Check input file length and not accept 0 length files
+ size_file_input=filesize(file_input);
+
+ if(size_file_input<5)
+ {
+ printf("Can't work with files below than 5 bytes!");
+ exit(1);
+ }
+
+
+ // First output file length
+ fwrite(&size_file_input,1,4,file_output); //input length
+
+
+ // Initialize ppmc encoder
+ ppmc_alloc_memory(); //get memory
+ ppmc_initialize_contexts(); //initialize model
+ ppmc_encoder_initialize();
+
+ // Initialize range coder
+ range_coder_init(&rc_coder,file_output);
+
+
+ // Start main loop which codes the file
+ while((byte=fgetc(file_input))!=EOF)
+ {
+
+ // Clear exclusion table
+ for(counter=0;counter!=256;++counter)
+ excluded[counter]=0;
+
+
+ // Try to code current byte under order-4 if possible then go to lower orders
+ if(ppmc_code_byte_order4()==0)
+ if(ppmc_code_byte_order3()==0)
+ if(ppmc_code_byte_order2()==0)
+ if(ppmc_code_byte_order1()==0)
+ if(ppmc_code_byte_order0()==0) //else try to code under order-0
+ {
+ // Code under order-(-1)
+ ppmc_get_prob_ordern1();
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+ coded_in_order=0; //update all the tables (unless order-(-1))
+ }
+
+
+ // Now do update exclusion
+
+ switch(coded_in_order)
+ {
+ case 0: ppmc_update_order0(); //update only order-0
+ case 1: ppmc_update_order1(); //update order-0 and order-1
+ case 2: ppmc_update_order2(); //update order-2 1 and 0...
+ case 3: ppmc_update_order3();
+ case 4: ppmc_update_order4();
+ default: break;
+ };
+
+
+ // Update order variables
+
+ o4_byte=o3_byte;
+ o3_byte=o2_byte;
+ o2_byte=o1_byte;
+ o1_byte=byte; //current one is next time order-1
+
+ debug++;
+
+ // Check if we run out of memory, in that case, flush the encoder
+
+ if(ppmc_out_of_memory==1)
+ {
+ printf("Flushing memory! Output file might be not decodable.\n");
+ ppmc_flush_mem_enc();
+ }
+
+
+ }
+
+
+ // Flush range coder
+ range_coder_flush(&rc_coder);
+
+ // Free memory
+ ppmc_free_memory();
+
+
+ // Print bpb and kbyps
+ printf("%s at %f bpb.\n",argv[1],((float)filesize(file_output)/(float)size_file_input)*(float)8);
+
+
+ // Close file handles
+ fclose(file_input);
+ fclose(file_output);
+
+
+
+ // Nicely exit
+ exit(0);
+}
+
+
+// Routines not used by ppmc but rather by main.
+// Not including the range coder.
+
+
+// Returns the file size of a given file.
+long filesize(FILE *stream)
+{
+ long curpos, length;
+
+ curpos = ftell(stream);
+ fseek(stream, 0L, SEEK_END);
+ length = ftell(stream);
+ fseek(stream, curpos, SEEK_SET);
+ return length;
+}
+
+
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "range.c" (exclusion)
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ This module contains the routines of both the range coder and decoder.
+
+ The range coder works internally in 32 bits, and uses bytes as symbols.
+ Also the end of message symbol is used. So z=257.
+
+ Both input and output use rc_file as the file stream. Of course we can't
+ code and decode at the same time. All the input or output comes from the
+ same file, no matter what range coder structure are we using. The modules
+ here provided don't manage the io except for reading and writing, they
+ don't open nor close the files. The reading and writing is done via
+ putc and getc.
+*/
+
+#include "range.h"
+
+/*
+ Inits the range coder state. Must be called before encoding any symbol.
+ It uses a magic number 0xB3 as the first byte outputted.
+ -rangecoder *rc, the range coder to be used.
+
+ Shoulde be called like that:
+ range_coder_init(&o0_rc_state,file_output);
+*/
+void range_coder_init(rangecoder *rc, FILE *stream)
+{
+ rc_file=stream;
+ rc->low=0; //define state
+ rc->range=0x80000000;
+ rc->byte_buffer=0xB3; //magic number
+ rc->help=0; //temp value
+}
+
+
+/*
+ Encodes a symbol.
+ -rangecoder *rc, the range coder to be used.
+ -unsigned long tot_f, the maximum cumulative frequency
+ -unsigned long lt_f, the cumulative probabilty of the symbol
+ -unsigned long sy_f, the probability of the symbol
+*/
+void range_coder_encode(rangecoder *rc,unsigned long tot_f, unsigned long lt_f,unsigned long sy_f)
+{
+ unsigned long temp, r;
+
+ if(lt_f>tot_f)
+ {
+ printf("BUG!!");
+ exit(1);
+ }
+
+// printf("\nc> %d,%d,%d ",tot_f,lt_f,sy_f);
+
+ range_coder_renormalize(rc); //&rc?
+
+ r=rc->range/tot_f;
+ temp=r*lt_f;
+ if(lt_f+sy_f<tot_f)
+ rc->range=r*sy_f;
+ else
+ rc->range-=temp;
+ rc->low+=temp;
+}
+
+/*
+ Renormalizes the state when coding.
+ -rangecoder *rc, the range coder to be used.
+*/
+
+void range_coder_renormalize(rangecoder *rc)
+{
+ while(rc->range<=(unsigned long)0x00800000)
+ {
+ if(rc->low<(unsigned long)0x7F800000)
+ {
+ putc(rc->byte_buffer,rc_file);
+ for(;rc->help;rc->help--)
+ putc(0xFF,rc_file);
+ rc->byte_buffer=(unsigned char)(rc->low>>23);
+ }
+ else
+ {
+ if(rc->low&(unsigned long)0x80000000)
+ {
+ putc(rc->byte_buffer+1,rc_file);
+ for(;rc->help;rc->help--)
+ putc(0x00,rc_file);
+ rc->byte_buffer=(unsigned char)(rc->low>>23);
+ }
+ else
+ rc->help++;
+ }
+ rc->range<<=8;
+ rc->low=(rc->low<<8)&(unsigned long)(0x7FFFFFFF);
+ }
+}
+
+
+/*
+ Flushes the encoder. Must be called when the coding is done.
+ -rangecoder *rc, the range coder to be used.
+
+ Shoulde be called like that:
+ range_coder_flush(&o0_rc_state);
+*/
+void range_coder_flush(rangecoder *rc)
+{
+ unsigned long tmp;
+
+ range_coder_renormalize(rc);
+ tmp = rc->low >> 23;
+ if (tmp > 0xff)
+ {
+ putc(rc->byte_buffer+1,rc_file);
+ for(; rc->help; rc->help--)
+ putc(0,rc_file);
+ }
+ else
+ {
+ putc(rc->byte_buffer,rc_file);
+ for(; rc->help; rc->help--)
+ putc(0xff,rc_file);
+ }
+
+ putc(tmp & 0xff,rc_file);
+ putc((tmp = rc->low >> (23-8)) & 0xff,rc_file);
+}
+
+
+/*
+ Inits the range decoder state. Also checks for the magic number, and
+ quits in case it isn't the first, so be careful.
+ -rangecoder *rc, the range coder to be used.
+*/
+void range_decoder_init(rangecoder *rc, FILE *stream)
+{
+ unsigned int _rd_c;
+
+ rc_file=stream;
+ if((_rd_c=getc(rc_file))!=0xB3)
+ {
+ printf("\nThis is not range coded data. Magic number not found. Exiting.");
+ exit(1);
+ }
+ rc->byte_buffer=getc(rc_file);
+ rc->low=rc->byte_buffer>>1;
+ rc->range=0x80;
+}
+
+
+/*
+ Decode a symbol, get its cumulative probability.
+ Input:
+ -rangecoder *rc, the range coder to be used.
+ -unsigned long tot_f, the maximum cumulative probability
+ Output:
+ -unsigned long, cumulative probability of the current symbol
+ Should be called like that:
+ current_cump=range_decoder_decode(&o0_rc_state,o0_tot_f);
+*/
+unsigned long range_decoder_decode(rangecoder *rc, unsigned long tot_f)
+{
+ unsigned long temp;
+
+ range_decoder_renormalize(rc);
+ rc->help=rc->range/tot_f;
+ temp=rc->low/rc->help;
+ if(temp>=tot_f)
+ return tot_f-1;
+ else
+ return temp;
+}
+
+
+/*
+ Updates the state so next symbol can be decoded.
+ Input:
+ -rangecoder *rc, the range coder to be used.
+ -unsigned long tot_f, the maximum cumulative probability
+ -unsigned long lt_f, the cumulative probabilty of the symbol
+ -unsigned long sy_f, the probability of the symbol
+
+*/
+void range_decoder_update(rangecoder *rc, unsigned long tot_f, unsigned long lt_f,unsigned long sy_f)
+{
+ unsigned long temp;
+
+// printf("\nd> %d,%d,%d ",tot_f,lt_f,sy_f);
+
+ temp=rc->help*lt_f;
+ rc->low-=temp;
+ if(lt_f+sy_f<tot_f)
+ rc->range=rc->help*sy_f;
+ else
+ rc->range-=temp;
+}
+
+
+/*
+ Renormalizes the state while decoding.
+ -rangecoder *rc, the range coder to be used.
+*/
+void range_decoder_renormalize(rangecoder *rc)
+{
+ while(rc->range<=0x00800000)
+ {
+ rc->low=(rc->low<<8)|((rc->byte_buffer<<7)&0xFF);
+ rc->byte_buffer=getc(rc_file);
+ rc->low |= rc->byte_buffer >> (1);
+ rc->range<<=8;
+ }
+}
+
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "range.h" (exclusions)
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ Declarations for the coder.
+*/
+
+#include <stdio.h>
+
+typedef struct{
+ unsigned long low, range, help;
+ unsigned char byte_buffer;
+}rangecoder;
+
+FILE *rc_file;
+
+void range_coder_init(rangecoder *rc, FILE *stream); //coding routines
+void range_coder_encode(rangecoder *rc,unsigned long tot_f, unsigned long lt_f,unsigned long sy_f);
+void range_coder_renormalize(rangecoder *rc);
+void range_coder_flush(rangecoder *rc);
+void range_decoder_init(rangecoder *rc, FILE *stream);//decoding routines
+unsigned long range_decoder_decode(rangecoder *rc, unsigned long tot_f);
+void range_decoder_update(rangecoder *rc, unsigned long tot_f, unsigned long lt_f,unsigned long sy_f);
+void range_decoder_renormalize(rangecoder *rc);
+
+
+typedef unsigned long code_value;
+#define CODE_BITS 32
+#define Top_value ((code_value)1 << (CODE_BITS-1))
+#define SHIFT_BITS (CODE_BITS - 9)
+#define EXTRA_BITS ((CODE_BITS-2) % 8 + 1)
+#define Bottom_value (Top_value >> 8)
+#define outbyte(cod,x) putc(x,stdout)
+#define inbyte(cod) getc(stdin)
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "unppmc.c" (exclusion)
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ Part of the ppmc decoder.
+
+ This module is the main module and calls the different modules to do
+ the decoding of a file. When done prints kbyps.
+*/
+
+
+// Bibliotecas necesarias
+#include <stdio.h>
+#include <stdlib.h>
+#include "range.h" //the range coder functions and data
+#include "ppmcdata.h"
+
+
+// Declaracion de funciones del ppmcmain.c
+long filesize(FILE *stream);
+
+
+
+
+//Main
+void main (int argc, char *argv[])
+{
+ unsigned long counter, //temporal counter for loops like for or while
+ counter2, //another temporal counter for sub loops
+ size_file_output, //the size of the output file
+ main_counter; //used in main
+ char expected_flush=0; //used for checking flushing which can't be done
+
+
+ // Print title and version.
+ printf("UNPPMC using range coder.\n");
+ printf("Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.\n");
+ printf("Permission is granted to make verbatim copies of this program for private\n");
+ printf("use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.\n");
+
+
+ // Check for correct number of parameters
+ if(argc!=3)
+ {
+ printf("Bad number of arguments.\n");
+ exit(1);
+ }
+
+
+ // Try to open input and output files
+ if((file_input=fopen(argv[1],"r+b"))==NULL)
+ {
+ printf("Couldn't open %s.\n",argv[1]);
+ exit(1);
+ }
+
+ if((file_output=fopen(argv[2],"w+b"))==NULL)
+ {
+ printf("Couldn't create %s.\n",argv[2]);
+ exit(1);
+ }
+
+
+ // Get output length
+ fread(&size_file_output,1,4,file_input);
+
+
+ // Initialize ppmc decoder
+ ppmc_alloc_memory();
+ ppmc_initialize_contexts();
+ ppmc_decoder_initialize();
+
+
+
+ // Initialize decoder
+ range_decoder_init(&rc_decoder,file_input);
+
+
+ // Start main loop which decodes the file
+ main_counter=size_file_output-4; //take in account the bytes already written
+ expected_flush=0; //we don't expect a flush yet
+
+ while(main_counter!=0)
+ {
+
+ // Clear exclusion table
+ for(counter=0;counter!=256;++counter)
+ excluded[counter]=0;
+
+// Try to decode current byte in order-4 if possible, else in lower ones
+ppmc_decode_order4();
+if(byte==-1)
+ ppmc_decode_order3();
+ if(byte==-1)
+ {
+ ppmc_decode_order2();
+ if(byte==-1)
+ {
+ ppmc_decode_order1();
+ if(byte==-1)
+ {
+ ppmc_decode_order0();
+ if(byte==-1) //check if it was an escape code
+ {
+ // Decode in order-(-1)
+ ppmc_get_totf_ordern1();
+ symb_cump=range_decoder_decode(&rc_decoder,total_cump);
+ byte=ppmc_get_symbol_ordern1();
+ ppmc_get_prob_ordern1();
+ range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob);
+ coded_in_order=0; //update all orders
+
+ // Now see if it's the code of flushing
+
+ if(symb_cump==256)
+ {
+ printf("Flushing.\n");
+ ppmc_flush_mem_dec();
+ expected_flush=0;
+ continue; //do not output byte nor update
+ }
+
+ }
+ }
+ }
+ }
+
+ // Output byte and update model
+
+ fputc(byte,file_output);
+
+ switch(coded_in_order) //update exclusion
+ {
+ case 0: ppmc_update_order0(); //update only order-0
+ case 1: ppmc_update_order1(); //update order-0 and order-1
+ case 2: ppmc_update_dec_order2(); //update order-0 1 and 2
+ case 3: ppmc_update_dec_order3();
+ case 4: ppmc_update_dec_order4();
+ default: break;
+ };
+
+
+ // Check if flushing has to be done and has not been done.
+ // This is optional, in case you limit the memory usage, you don't
+ // need to include this
+
+ if(expected_flush==1) // If flushing didn't happen, we can't decode
+ {
+ printf("Can't decompress file. Not enough memory.\nTry in a machine with more memory.\n");
+ exit(1);
+ }
+ if(ppmc_out_of_memory==1)
+ {
+ expected_flush=1; // Next code must be a flush code, otherwise we don't
+ // have enough memory, and therefore we can't decode
+ }
+
+
+ // Update order variables
+
+ o4_byte=o3_byte;
+ o3_byte=o2_byte;
+ o2_byte=o1_byte;
+ o1_byte=byte; //current one, is next time order-1
+
+ // Byte decoded and model updated, loop
+ main_counter--;
+
+//printf("\n%d",size_file_output-main_counter);
+
+ }
+
+
+ ppmc_free_memory();
+
+ // Close file handles and free memory
+ fclose(file_input);
+ fclose(file_output);
+
+
+
+
+ // Nicely exit
+ exit(0);
+}
+
+
+// Ruotines not used by ppmc but rather by main.
+// Not including the range coder.
+
+
+// Returns the file size of a given file.
+long filesize(FILE *stream)
+{
+ long curpos, length;
+
+ curpos = ftell(stream);
+ fseek(stream, 0L, SEEK_END);
+ length = ftell(stream);
+ fseek(stream, curpos, SEEK_SET);
+ return length;
+}
+
+
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "ppmc.c"
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ Part of the ppmc encoder and decoder.
+
+ This module contains the whole ppmc encoder. It uses hash tables for
+ managing most of the orders. And a maximum order of 4. It codes bytes.
+ Order-1-0-(-1) are all handled in tables. Order-2 has a table with
+ direct hashing with pointers to the linked lists. Order-4 and order-3
+ both have hash tables with pointers to contexts in a linked lists which
+ finally have a pointer to the start of the linked list with the
+ probability distribution. Update exclusion is used, but exclusion is not.
+
+ Please, note that if the machine where the decoder is run doesn't has as
+ much memory as the computer where the encoder was ran, the decoder will
+ not be able to properly decode the file, because it will not be able to
+ keep track of new statistics, in this case it will just exit.
+
+ For applications where the loss of data is not admisible, I suggest you to
+ limit both encoder and decoder's memory requeriments to a given minimum
+ which both machines have.
+*/
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "range.h"
+#include "ppmcdata.h"
+
+
+
+// Ruotines used by ppmc. Not including the range coder.
+//
+// They are for initializing of both encoder and decoder, and unless there
+// are two version, both encoder and decoder use the same routines. Like
+// "ppmc_initialize_contexts".
+
+
+// This one allocs the memory needed by ppmc, and adjust some pointers used
+// for allocating elements in the linked lists. The mempool arrays must be
+// initialized now.
+void ppmc_alloc_memory(void)
+{
+ unsigned long counter;
+
+
+ // Init mempool buffers
+
+ for(counter=0;counter!=_mempool_max_index;++counter)
+ {
+ _bytes_pool_array[counter]=0;
+ _context_pool_array[counter]=0;
+ }
+
+ _bytes_pool_index=1; //first entry will be used now
+ _context_pool_index=1;
+
+
+ // Allocate memory for ppmc structures and adjust some variables
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+
+ //save pointers in the array for freeing
+ _bytes_pool_array[0]=_bytes_pool;
+ _context_pool_array[0]=_context_pool;
+
+
+ //adjust variables
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements;
+ _context_pool_max=_context_pool+_context_pool_elements;
+
+ ppmc_out_of_memory=0; //we still have memory
+}
+
+
+// This routine initializes all the contexts, and all the tables including
+// those who care about the number of bytes defined in a context.
+void ppmc_initialize_contexts(void)
+{
+ unsigned long counter, counter2;
+
+
+ // Order-0
+ for(counter=0;counter!=256;++counter) //clear table
+ order0_array[counter]=0;
+
+ order0_defined_bytes=0; //adjust variables
+ order0_max_cump=0;
+
+
+ // Order-1
+ for(counter=0;counter!=256;++counter) //erase every table of every context
+ for(counter2=0;counter2!=256;++counter2)
+ order1_array[counter][counter2]=0;
+
+ for(counter=0;counter!=256;++counter) //adjust variables
+ {
+ order1_defined_bytes_array[counter]=0;
+ order1_max_cump_array[counter]=0;
+ }
+
+
+ // Order-2
+ for(counter=0;counter!=65536;++counter)
+ {
+ //order2_array[counter].prob=0; //clear pointer to bytes and frequencies
+ //order2_array[counter].max_cump=0;
+ order2_array[counter].defined_bytes=0;
+ }
+
+
+ // Order-4-3
+ for(counter=0;counter!=65536;++counter) //order-4-3
+ {
+ order4_hasht[counter]=0;
+ order3_hasht[counter]=0;
+ }
+}
+
+
+// This routine initializes the encode model by outputting as many bytes as
+// needed to prepare the models. This should be called before the main loop
+// and after the memory has been allocated and tables initialized.
+//
+// It does not need uses the range coder. It output the first 1 bytes.
+void ppmc_encoder_initialize(void)
+{
+
+ // Initialize order-0 and prepare different bytes for orders
+ fputc((byte=fgetc(file_input)),file_output);
+ o4_byte=byte; //order-4
+
+ fputc((byte=fgetc(file_input)),file_output);
+ o3_byte=byte; //order-3
+
+ fputc((byte=fgetc(file_input)),file_output);
+ o2_byte=byte; //order-2
+ ppmc_update_order0();
+
+ fputc((byte=fgetc(file_input)),file_output);
+ o1_byte=byte;
+
+}
+
+
+// This routine initializes the decoder model, should be called to do the same
+// changes as "ppmc_encoder_initialize()" did.
+void ppmc_decoder_initialize(void)
+{
+
+ // Initialize order-0 and context bytes
+ byte=fgetc(file_input);
+ o4_byte=byte; //order-4
+ fputc(byte,file_output);
+
+ byte=fgetc(file_input);
+ o3_byte=byte; //order-3
+ fputc(byte,file_output);
+
+ byte=fgetc(file_input);
+ o2_byte=byte; //order-2
+
+ fputc(byte,file_output); //output first byte
+ ppmc_update_order0();
+
+ byte=fgetc(file_input);
+ o1_byte=byte; //order-1
+ fputc(byte,file_output);
+}
+
+
+// Once coding or decoding is finished you have to call this routine.
+// It must be called when done.
+void ppmc_free_memory(void)
+{
+ unsigned long counter;
+
+ // Free the memory buffers
+
+ for(counter=0;counter!=_mempool_max_index;++counter)
+ {
+ if(_bytes_pool_array[counter]!=0)
+ free(_bytes_pool_array[counter]);
+
+ if(_context_pool_array[counter]!=0)
+ free(_context_pool_array[counter]);
+ }
+
+}
+
+
+// This routine flushes the memory and restarts all the tables of
+// probabilities, current order bytes are not modified, this function
+// is called when we ran out of memory. We have to output the code
+// number 256 which means memory flushing, for doing this we have to go
+// to order-(-1) so we have to output an escape code in all the orders
+// till we reach order-(-1) where we can code it. Then we free all the
+// memory, alloc it again, and reinitialize all the orders.
+//
+// However we may find the case when the current order is not initialized,
+// in this case we don't need to output an escape code.
+void ppmc_flush_mem_enc(void)
+{
+
+ // Code an escape code in order-4
+ if(ppmc_get_totf_order4()!=0) //if 0 no need of escape code
+ {
+
+ ppmc_get_escape_prob_order4(); //get prob and cump
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ }
+
+
+ // Code an escape code in order-3
+ if(ppmc_get_totf_order3()!=0) //if 0 no need of escape code
+ {
+
+ ppmc_get_escape_prob_order3(); //get prob and cump
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ }
+
+
+ // Code an escape code in order-2
+
+ o2_cntxt=ppmc_order2_hash_key(o1_byte,o2_byte);
+
+ // First check if current order-2 context is empty
+ if(order2_array[o2_cntxt].defined_bytes!=0) //it's not empty
+ {
+ ppmc_get_totf_order2();
+ ppmc_get_escape_prob_order2();
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+ }
+
+
+ // Code an escape code in order-1
+
+ // First check if current order-1 table is empty
+ if(order1_defined_bytes_array[o1_byte]!=0) //it's not empty
+ {
+ ppmc_get_totf_order1();
+ ppmc_get_escape_prob_order1();
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+ }
+
+
+ // Code an escape code in order-0. Order-0 always has at least one symbol
+
+ ppmc_get_totf_order0();
+ ppmc_get_escape_prob_order0();
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+
+
+ // Now we can code the code 256
+
+ symb_prob=1;
+ symb_cump=256;
+ total_cump=257;
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+
+ // Now that decoder knows the flushing, free memory and reinit
+
+ ppmc_free_memory();
+ ppmc_alloc_memory();
+ ppmc_initialize_contexts();
+
+
+ // Be sure that order-0 has at least one probability
+
+ order0_array[o1_byte]++;
+ order0_max_cump++;
+ order0_defined_bytes++;
+
+}
+
+
+// When the decoder gets the symbol of flushing, most of the job is done
+// because we already got all the escape codes, so we only have to reinit.
+void ppmc_flush_mem_dec(void)
+{
+
+ // Free memory and reinit
+
+ ppmc_free_memory();
+ ppmc_alloc_memory();
+ ppmc_initialize_contexts();
+
+
+ // Be sure that order-0 has at least one probability
+
+ order0_array[o1_byte]++;
+ order0_max_cump++;
+ order0_defined_bytes++;
+
+
+}
+
+
+
+// ORDER-(-1) functions, also called ordern1 (Negative1) in functions
+//
+// Because order-(-1) does not need to update its probability tables, it
+// has no tables, and relies on the fact that the cump of byte is its own
+// value, and the probability is fixed, 1, and the total cump is 257.
+//
+// The alphabet has the following distribution: 0-255 the bytes. 256 is
+// an special symbol which means that we have flushed the encoder tables,
+// and thus the encoder must flush its tables too.
+//
+// The rest of the tables only have 256 symbols, because we have no need
+// of assign a symbol to the flush code (which already is the order-(-1)
+// table) nor to the escape code.
+
+
+// Gets the probability for a given symbol in the order-(-1) (ordern1)
+void ppmc_get_prob_ordern1(void)
+{
+ symb_cump=byte; //its value
+ symb_prob=1; //flat probability
+ total_cump=257; //total cump
+}
+
+
+// Returns in the variable "total_cump" the current total cump of
+// order-(-1)
+void ppmc_get_totf_ordern1(void)
+{
+ total_cump=257; //this is fixed
+}
+
+
+// Returns the symbol for a given cump under order-(-1)
+unsigned long ppmc_get_symbol_ordern1 (void)
+{
+ return symb_cump;
+}
+
+
+
+// ORDER-0 functions
+//
+// Due to the fact that order-0 has no context, I use an array for all the
+// probabilities under order-0, just as you could do in a basic model for
+// arithmetic coding.
+//
+// The main array is: order0_array. Where order0_array[byte] contains the
+// probability for a given byte. The same applies to order-1.
+//
+// To ensure that the updating and coding is done correctly, "byte" can't
+// be changed till all the coding and updating is done.
+
+
+// Returns in the variable "total_cump" the current total cump of
+// order-0
+void ppmc_get_totf_order0(void)
+{
+ // Total cump is current total cump plus the escape for the escape code
+ total_cump=order0_defined_bytes+order0_max_cump;
+}
+
+
+// Codes a byte under order-0 and returns 1, otherwise it returns a 0 and
+// has coded an escape code. In this case further coding is needed.
+//
+// Returns: 1 in case a byte was coded. 0 in case of escape code.
+char ppmc_code_byte_order0(void)
+{
+ unsigned long counter;
+
+ ppmc_get_totf_order0(); //get total cump
+
+ // See if the byte is present
+ if(order0_array[byte]==0) //a probability of 0
+ {
+
+ // Because it was not present, output an escape code, prepare variables
+
+ symb_cump=order0_max_cump; //obviously its cump is current max_cump
+ //without escape code's space
+
+ symb_prob=order0_defined_bytes; //the number of defined bytes
+
+ // Code the escape code
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ return 0; //byte not coded
+ }
+ else
+ {
+
+ coded_in_order=0;
+
+ // The symbol is present, code it under order-0
+
+ symb_prob=order0_array[byte]; //get probability directly
+
+ // Make cump for current symbol
+
+ symb_cump=0; //for first symbol is 0
+ for(counter=0; counter!=byte ; ++counter)
+ symb_cump+=order0_array[counter]; //sum probabilities before our symbol
+
+ // Code the symbol
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ return 1; //symbol coded under order-0
+ }
+}
+
+
+// This functions update order-0 probabilities with current byte, also takes
+// care about updating variables, and renormalizing.
+void ppmc_update_order0(void)
+{
+ if(order0_array[byte]==0)
+ {
+ // It had a zero probability
+ order0_array[byte]++; //increment symbol probability
+ ++order0_defined_bytes; //new byte defined
+ ++order0_max_cump; //total cump
+ return;
+ }
+ else
+ {
+ // It had a non-zero probability
+
+ // Increment its probability
+ order0_array[byte]++; //increment symbol probability
+ ++order0_max_cump; //total cump
+
+ // Check to see if its the maximum in this case renormalize
+ if(order0_array[byte]==255)
+ ppmc_renormalize_order0();
+
+ return;
+ }
+}
+
+
+// This functions renormalizes the probabilities at order-0 updating variables
+void ppmc_renormalize_order0(void)
+{
+ unsigned long counter;
+
+ // Initialize variables
+ order0_defined_bytes=0; //clear them
+ order0_max_cump=0;
+
+ // Loop over all probabilities, divide them by a factor of 2 and update variables
+ for(counter=0 ; counter!=256 ; ++counter)
+ {
+ order0_array[counter]>>=1; //divide by a factor of 2
+
+ if(order0_array[counter]!=0) //see if it has a non zero probability
+ order0_defined_bytes++;
+
+ order0_max_cump+=order0_array[counter]; //sum to the total cump
+ }
+}
+
+
+// Decodes the symbol correspoding to the current order, it returns the byte
+// or in case of a escape code it returns -1
+void ppmc_decode_order0(void)
+{
+ unsigned long current_cump, counter;
+
+
+ // Get the total cump needed for decoding symbol
+ ppmc_get_totf_order0(); //total cump needed for decoding
+ symb_cump=range_decoder_decode(&rc_decoder,total_cump); //decode it
+
+ // Now check if it's an escape code
+ if(symb_cump>=order0_max_cump) //the defined code space for the escape code
+ {
+
+ // Update coding values
+
+ ppmc_get_escape_prob_order0();
+ range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob);
+
+ // Mark as escape code
+
+ byte=-1;
+
+ return; //an escape code
+ }
+ else
+ {
+ // Now we have to check what symbol it is
+
+ current_cump=0; //cump of the current symbol
+
+ for(counter=0 ; counter!= 256 ; ++counter)
+ {
+ if(symb_cump<current_cump)
+ break; //symbol found, break search loop
+ current_cump+=order0_array[counter]; //update cump
+ }
+
+ counter--; //-1 (see ac_ppmc.html, searching for a given symbol)
+
+ byte=counter; //update the variable with the found symbol
+
+
+ // Get the cump and prob of byte
+
+ symb_prob=order0_array[byte]; //the probabilty
+ symb_cump=current_cump-order0_array[byte]; //using the cump which was
+ //already made
+
+ // Update coding state
+
+ range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob);
+
+ coded_in_order=0; //decoded under order-0
+
+ return;
+ }
+
+}
+
+
+// This function returns the probability for the escape codes in the variables
+void ppmc_get_escape_prob_order0(void)
+{
+ // To understand that remember that the escape allocated for the escape code
+ // is above the current maximum cump and that it has a probability determined
+ // by the scheme C.
+ symb_prob=order0_defined_bytes;
+ symb_cump=order0_max_cump;
+ total_cump=order0_defined_bytes+order0_max_cump;
+}
+
+
+
+// ORDER-1 functions
+//
+// Order-1 uses 256 arrays one for every context, obviously they are accessed
+// like that: order1_array[o1_byte][byte]
+//
+// Also for computing the value of the current escape code in a given context
+// we need to know how many bytes are defined in a given context, they are
+// stored in an array, and you use them like that:
+// order1_defined_bytes_array[o1_byte]
+//
+// The same goes for the maximum cump of a given context:
+// order1_max_cump_array[o1_byte]
+//
+// Read the order-0 explanation.
+//
+// To ensure that the updating and coding is done correctly, "byte" and
+// "o1_byte" can't be changed till all the coding and updating is done.
+
+
+// Returns in the variable "total_cump" the current total cump of
+// order-1. o1_byte should contain the order-1
+void ppmc_get_totf_order1(void)
+{
+ // Total cump is current total cump plus the escape for the escape code
+ total_cump=order1_defined_bytes_array[o1_byte]+order1_max_cump_array[o1_byte];
+}
+
+
+// Codes a byte under order-1 and returns 1.
+// Otherwise it returns a 0 it may be that it has coded an escape code, or
+// that current table was empty, in this case nothing is outputted
+// In both cases further coding is needed.
+//
+// Returns: 1 in case a byte was coded. 0 in case of escape code or empty table.
+// In case the byte is coded under this context, coded_in_order=1.
+char ppmc_code_byte_order1(void)
+{
+ unsigned long counter;
+
+
+ // First check if current order-1 table is empty
+ if(order1_defined_bytes_array[o1_byte]==0) //it's empty
+ {
+ return 0; //byte not coded, nothing done
+ }
+
+
+ // Now try to code this byte under order-1
+
+ ppmc_get_totf_order1(); //get total cump
+
+ // See if the byte is present
+ if(order1_array[o1_byte][byte]==0) //a probability of 0
+ {
+
+ // Because it was not present, output an escape code, prepare variables
+
+ symb_cump=order1_max_cump_array[o1_byte];//obviously its cump is current max_cump
+ //without escape code's space
+
+ symb_prob=order1_defined_bytes_array[o1_byte];//the number of defined bytes
+
+ // Code the escape code
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ return 0; //byte not coded, escape coded
+ }
+ else
+ {
+
+ coded_in_order=1; //because we coded it under order-1
+
+ // The symbol is present, code it under order-1
+
+ symb_prob=order1_array[o1_byte][byte]; //get probability directly
+
+ // Make cump for current symbol
+
+ symb_cump=0; //for first symbol is 0
+ for(counter=0; counter!=byte ; ++counter)
+ symb_cump+=order1_array[o1_byte][counter]; //sum probabilities before our symbol
+
+ // Code the symbol
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ return 1; //symbol coded under order-1
+ }
+}
+
+
+// This functions update order-1 probabilities with current byte, also takes
+// care about updating variables, and renormalizing. o1_byte must be filled.
+void ppmc_update_order1(void)
+{
+ if(order1_array[o1_byte][byte]==0)
+ {
+ // It had a zero probability
+ order1_array[o1_byte][byte]++; //increment symbol probability
+ ++order1_defined_bytes_array[o1_byte]; //new byte defined
+ ++order1_max_cump_array[o1_byte]; //total cump
+ return;
+ }
+ else
+ {
+ // It had a non-zero probability
+
+ // Increment its probability
+ order1_array[o1_byte][byte]++; //increment symbol probability
+ ++order1_max_cump_array[o1_byte]; //total cump
+
+ // Check to see if its the maximum in this case renormalize
+ if(order1_array[o1_byte][byte]==255)
+ ppmc_renormalize_order1();
+
+ return;
+ }
+}
+
+
+// This functions renormalizes the probabilities at order-1 updating variables
+void ppmc_renormalize_order1(void)
+{
+ unsigned long counter;
+
+ // Initialize variables
+ order1_defined_bytes_array[o1_byte]=0; //clear them
+ order1_max_cump_array[o1_byte]=0;
+
+ // Loop over all probabilities, divide them by a factor of 2 and update variables
+ for(counter=0 ; counter!=256 ; ++counter)
+ {
+ order1_array[o1_byte][counter]>>=1; //divide by a factor of 2
+
+ if(order1_array[o1_byte][counter]!=0) //see if it has a non zero probability
+ order1_defined_bytes_array[o1_byte]++;
+
+ order1_max_cump_array[o1_byte]+=order1_array[o1_byte][counter]; //sum to the total cump
+ }
+}
+
+
+// Decodes the symbol correspoding to the current order, it returns the byte
+// or in case of an escape code or empty table it returns -1.
+// It updates "coded_in_order".
+void ppmc_decode_order1(void)
+{
+ unsigned long current_cump, counter;
+
+
+ // First check if current order-1 table is empty
+ if(order1_defined_bytes_array[o1_byte]==0) //it's empty
+ {
+ byte=-1; //byte not coded, nothing done
+ return;
+ }
+
+
+ // Get the total cump needed for decoding symbol
+ ppmc_get_totf_order1(); //total cump needed for decoding
+ symb_cump=range_decoder_decode(&rc_decoder,total_cump); //decode it
+
+ // Now check if it's an escape code
+ if(symb_cump>=order1_max_cump_array[o1_byte]) //the defined code space for the escape code
+ {
+
+ // Update coding values
+
+ ppmc_get_escape_prob_order1();
+ range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob);
+
+ // Mark as escape code
+
+ byte=-1;
+
+ return; //an escape code
+ }
+ else
+ {
+ // Now we have to check what symbol it is
+
+ current_cump=0; //cump of the current symbol
+
+ for(counter=0 ; counter!= 256 ; ++counter)
+ {
+ if(symb_cump<current_cump)
+ break; //symbol found, break search loop
+ current_cump+=order1_array[o1_byte][counter]; //update cump
+ }
+
+ counter--; //-1 (see ac_ppmc.html, searching for a given symbol)
+
+ byte=counter; //update the variable with the found symbol
+
+
+ // Get the cump and prob of byte
+
+ symb_prob=order1_array[o1_byte][byte]; //the probabilty
+ symb_cump=current_cump-order1_array[o1_byte][byte]; //using the cump which was
+ //already made
+
+ // Update coding state
+
+ range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob);
+
+
+ // Update coded_in_order used for update exclusion
+
+ coded_in_order=1;
+
+ return;
+ }
+
+}
+
+
+// This function returns the probability for the escape codes in the variables
+void ppmc_get_escape_prob_order1(void)
+{
+ // To understand that remember that the escape allocated for the escape code
+ // is above the current maximum cump and that it has a probability determined
+ // by the scheme C.
+ symb_prob=order1_defined_bytes_array[o1_byte];
+ symb_cump=order1_max_cump_array[o1_byte];
+ total_cump=order1_defined_bytes_array[o1_byte]+order1_max_cump_array[o1_byte];
+}
+
+
+
+// ORDER-2 functions
+//
+// Order-2 uses a table which holds the contexts data structure of every
+// structure, its accessed with "ppmc_order2_hash_key(o1_byte, o2_byte)"
+// and because it uses direct addressing there's no need to check for hash
+// collisions, which makes it faster. The variable "o2_cntxt" contains o1_byte
+// and o2_byte used directly to index the order-2 array.
+//
+// Though order-2 uses the same routines as lower orders they are rather
+// different, because we can't allocate memory for all the probabilities
+// tables, we have to use linked lists for the bytes and probabilities.
+//
+// Under order-2 we don't let a probability to be lower than 1, that is,
+// there can't be a symbol in a linked list with a 0 probability, I choosed
+// this for three factors: the code is easier to read (it isn't messy)
+// we gain some speed, and the loss of compression seems little.
+//
+// To ensure that the updating is done correctly, "o2_cntxt" and
+// "o2_ll_node" must not be modified by any routine except those who manage
+// order-2.
+
+
+// Returns in the variable "total_cump" the current total cump of
+// order-2. cntxt must have o1_byte and o2_byte hashed with the hash key
+// for order-2.
+void ppmc_get_totf_order2(void)
+{
+ // Total cump is current total cump plus the escape for the escape code
+ total_cump=order2_array[o2_cntxt].defined_bytes+order2_array[o2_cntxt].max_cump;
+}
+
+
+// Codes a byte under order-2 and returns 1.
+// Otherwise it returns a 0. It may be that it has coded an escape code, or
+// that current table was empty.
+//
+// Returns: 1 in case a byte was coded. 0 in case of escape code or empty table.
+// In case the byte is coded under this context, coded_in_order=2.
+char ppmc_code_byte_order2(void)
+{
+ unsigned long counter;
+ struct _byte_and_freq *node;
+
+
+ // Initialize o2_cntxt
+
+ o2_cntxt=ppmc_order2_hash_key(o1_byte,o2_byte);
+
+
+ // First check if current order-2 context is empty
+ if(order2_array[o2_cntxt].defined_bytes==0) //it's empty
+ {
+ return 0; //byte not coded, nothing done
+ }
+
+
+ // Now try to code this byte under order-2
+
+ ppmc_get_totf_order2(); //get total cump
+
+
+ // See if the byte is present and compute its cump at the same time
+
+ node=order2_array[o2_cntxt].prob; //pointer to first element in the linked list
+
+ symb_cump=0; //the first symbol always has a 0 cump
+
+
+ // Now search the byte in the linked list
+
+ do{
+ if(node->byte==byte)
+ goto ppmc_o2_byte_found; //bad thing, I know, anyone has a better idea?
+ symb_cump+=node->freq; //add the probability of this byte to the cump
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+
+ // If we reach that point, the byte was not found in the linked list
+ // so we don't need the cump, we have to output an escape code, whose
+ // probabilities are know using the context structure in the table.
+
+ // Byte was not present in the linked list, current node is the last one,
+ // and that's the node needed for creating a new node, save it.
+
+ o2_ll_node=node;
+
+ // Now get the probability and cump of the escape code
+
+ symb_cump=order2_array[o2_cntxt].max_cump;
+ symb_prob=order2_array[o2_cntxt].defined_bytes;
+
+ // Code the escape code
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ return 0;
+
+
+ // That code is executed when the byte is found in the linked list
+
+ ppmc_o2_byte_found:
+
+
+ // Everything has been tested, now we can feel free to code the byte,
+ // the symb_cump is already computed, now get its probability and code
+ // the byte, also save pointer to this element in the linked lists for
+ // updating.
+
+ coded_in_order=2; //successfully coded under order-2
+
+ o2_ll_node=node; //save it for updating
+
+ symb_prob=node->freq; //get the probability of the byte
+
+ // Code it.
+
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ return 1; //byte coded under order-2
+}
+
+
+// This functions update order-2 probabilities with current byte, also takes
+// care about updating variables, and renormalizing.
+// Of course "o2_ll_node" must point to the element to update or the last one,
+// based on the "coded_in_order" and checking the pointer of the element we
+// can decide what to do.
+//
+// This updating is only for encoding.
+void ppmc_update_order2(void)
+{
+
+ // First of all check if that's the first byte in this context, in that case
+ // we have to initialize some variables in the context structure.
+
+ if(order2_array[o2_cntxt].defined_bytes==0) //no byte defined yet
+ {
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+ order2_array[o2_cntxt].defined_bytes=1;
+ order2_array[o2_cntxt].max_cump=1;
+ order2_array[o2_cntxt].prob=_bytes_pool;
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ return; //nothing else to do
+ }
+
+
+ // The byte was coded under order two, otherwise it was coded under a
+ // lower order (never higher ones, remember that we are using update
+ // exclusion) in this case we have to create a new node in the list.
+
+ if(coded_in_order==2) //coded under order-2
+ {
+
+ // Update its count and variables of this context and check for renormalization
+
+ o2_ll_node->freq++; //increment its frequency (rather probability)
+
+ order2_array[o2_cntxt].max_cump++; //total cump
+
+ if(o2_ll_node->freq==255) //do we have to renormalize?
+ ppmc_renormalize_order2(); //renormalize
+
+ }
+ else
+ {
+
+ // Once every paranoid check has been done we are sure that this byte
+ // did not existed and so we have to create a new node in the linked
+ // list. Also we have to take care of memory issues.
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate mem
+
+ o2_ll_node->next=_bytes_pool; //put it in the next free entry
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ order2_array[o2_cntxt].max_cump++; //total cump
+ order2_array[o2_cntxt].defined_bytes++; //total cump
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ }
+
+}
+
+
+// This functions renormalizes the probabilities at order-2 updating context
+// variables.
+void ppmc_renormalize_order2(void)
+{
+ unsigned long counter;
+ struct _byte_and_freq *node;
+
+ // Initialize variables. Defined bytes remain the same.
+ order2_array[o2_cntxt].max_cump=0; //clear them
+
+ node=order2_array[o2_cntxt].prob; //get pointer to lined lists
+
+ // Divide all the probabilities by 2 and update context variables
+ while(1)
+ {
+ node->freq>>=1; //divide by a factor of 2
+
+ if(node->freq==0) //don't allow a probability to be 0
+ node->freq=1;
+
+ order2_array[o2_cntxt].max_cump+=node->freq; //sum to the total cump
+
+ if(node->next==0) //last element
+ break;
+ node=node->next;
+ }
+
+
+ //printf("\nRenormalization, context:%c%c",o2_byte,o1_byte);
+
+}
+
+
+// Decodes the symbol correspoding to the current order, it returns the byte
+// or in case of an escape code or empty table it returns -1.
+// It updates "coded_in_order".
+//
+// If we decode an escape code, we don't modify "o2_ll_node" and thus its
+// work of the updating routine to read till the end of the linked list
+// (for adding a new element) however if we decode a byte, we save on
+// "o2_ll_node" the pointer to its node. (so updating is faster)
+void ppmc_decode_order2(void)
+{
+ unsigned long current_cump, counter;
+ struct _byte_and_freq *node;
+
+ // Initialize o2_cntxt
+
+ o2_cntxt=ppmc_order2_hash_key(o1_byte,o2_byte);
+
+
+ // First check if current order-2 context is empty
+ if(order2_array[o2_cntxt].defined_bytes==0) //it's empty
+ {
+ byte=-1; //byte not coded, nothing done
+ return;
+ }
+
+
+ // Get the total cump needed for decoding symbol
+ ppmc_get_totf_order2(); //total cump needed for decoding
+ symb_cump=range_decoder_decode(&rc_decoder,total_cump); //decode it
+
+ // Now check if it's an escape code
+ if(symb_cump>=order2_array[o2_cntxt].max_cump) //the defined code space for the escape code
+ {
+
+ // Update coding values
+
+ ppmc_get_escape_prob_order2();
+ range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob);
+
+ // Mark as escape code
+
+ byte=-1;
+
+ return; //an escape code
+ }
+ else
+ {
+ // Now we have to check what symbol it is
+
+ current_cump=0; //cump of the current symbol
+
+ node=order2_array[o2_cntxt].prob; //get pointer to linked lists
+
+ while(1)
+ {
+ current_cump+=node->freq; //update cump
+ if(symb_cump<current_cump)
+ break; //symbol found, break search loop
+
+ node=node->next; //next element
+ //we have no need to check for the last symbol, we'll never read further
+ //the end of the linked lists, before we'll found the last byte.
+ }
+
+
+ //read byte value and probability
+
+ symb_prob=node->freq; //get the probability for updating the state
+ byte=node->byte; //get byte
+ o2_ll_node=node; //used for updating
+
+
+ // Get the cump of byte
+
+ symb_cump=current_cump-symb_prob;
+
+ // Update coding state
+
+ range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob);
+
+ // Update coded_in_order used for update exclusion
+
+ coded_in_order=2;
+
+ return;
+ }
+
+}
+
+
+// This is the routine for updating while decoding. We have to search the byte
+// in the linked list, if it's present, update its count, otherwise we have
+// hitted the end of the linked list, and there we have to create a new node.
+//
+// Of course if the byte was matched in order-2 we'll have a pointer to it
+// in "o2_ll_node" so we don't need to read the linked list. (we already did
+// in decoding)
+//
+// Another case which we also have to specially deal with, this is the case
+// when the context has not been initalized yet.
+void ppmc_update_dec_order2(void)
+{
+ struct _byte_and_freq *node;
+
+
+ // Handle the case when the context is not initialized
+ // This code is the same as the one for the encoding.
+
+ if(order2_array[o2_cntxt].defined_bytes==0) //no byte defined yet
+ {
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+ order2_array[o2_cntxt].defined_bytes=1;
+ order2_array[o2_cntxt].max_cump=1;
+ order2_array[o2_cntxt].prob=_bytes_pool;
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+
+ return; //nothing else to do
+ }
+
+
+ // Current context is initalized, proceed
+
+ if(coded_in_order==2) //check if it was decoded under order-2
+ {
+
+ // We can be sure that the pointer "o2_ll_node" points to its entry, and
+ // it has a non 0 probability (otherwise it couldn't be coded) so just
+ // update its probability and max_cump
+
+ o2_ll_node->freq++; //the probability of the byte
+ order2_array[o2_cntxt].max_cump++; //the max_cump
+
+ if(o2_ll_node->freq==255) //check for renormalization
+ ppmc_renormalize_order2();
+
+ }
+ else
+ {
+
+ // An escape code was decoded under order-2, we have to read till the
+ // end of the linked list so we can add a new node for this new byte.
+
+ node=order2_array[o2_cntxt].prob; //get pointer to linked list
+
+ while(1)
+ {
+ if(node->next==0) //check for the end of the linked list
+ break;
+ node=node->next; //next node
+ }
+
+
+ // We reached the end of the linked list, add a new node if possible,
+ // we are using the same code of "ppmc_update_order2()" with the
+ // difference that the pointer to the linked list is "node"
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate mem
+
+ node->next=_bytes_pool; //put it in the next free entry
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ order2_array[o2_cntxt].max_cump++; //total cump
+ order2_array[o2_cntxt].defined_bytes++; //total cump
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ return; //we are finished updating
+
+ }
+
+}
+
+
+// This function returns the probability for the escape codes in the variables
+void ppmc_get_escape_prob_order2(void)
+{
+ // To understand that remember that the escape allocated for the escape code
+ // is above the current maximum cump and that it has a probability determined
+ // by the scheme C.
+
+ symb_prob=order2_array[o2_cntxt].defined_bytes;
+ symb_cump=order2_array[o2_cntxt].max_cump;
+}
+
+
+
+// ORDER-3 functions
+//
+// The difference between order-3 and order-3 are just a few, instead of
+// keeping a table with the context structures, we keep a hash table with
+// pointers to linked lists with the context, so it's only a matter of
+// searching current context in the linked list corresponding to its hash
+// entry. This is done in "ppmc_get_totf_order3" because that's the first
+// routine that both encoding and decoding routines call.
+
+
+// Returns in the variable "total_cump" the current total cump of
+// order-3. Must be called while encoding or decoding before anything else
+// because it initializes the pointers to the context structure in
+// "o3_context" and o3_cntxt.
+//
+// If the hash entry is not initialized it returns "o3_context"=0
+// If the context is not present in the linked list of context, "o3_context"
+// will point to the last element in the linked list.
+// If the context is present "o3_context" will point to the context to use.
+// One can distinguish the last two by checking the context value of the
+// structure, if it's not the same, is the last element.
+//
+// The routine returns 0 if the hash entry is not initialized or if the
+// the context was not present. Otherwise it returns 1, meaning that we
+// have to code under this context.
+char ppmc_get_totf_order3(void)
+{
+ struct context *cntxt_node;
+
+
+ // First make the hash key for order-3
+
+ o3_cntxt=ppmc_order3_hash_key(o1_byte,o2_byte,o3_byte);
+ full_o3_cntxt=(o1_byte)+(o2_byte<<8)+(o3_byte<<16); //order-3
+
+
+ // Now check the hash entry in the table
+
+ if(order3_hasht[o3_cntxt]==0) //if 0, not initialized
+ {
+
+ o3_context=0; //no hash entry
+
+ return 0; //hash entry not initialized
+ }
+
+
+ // Now read trough the linked list of context searching current one
+
+ cntxt_node=order3_hasht[o3_cntxt];
+
+ while(1)
+ {
+
+ if(cntxt_node->order4321==full_o3_cntxt) //compare context
+ goto ppmc_gtf_cntxt_found;
+
+ if(cntxt_node->next==0) //end of context's linked list
+ break;
+
+ cntxt_node=cntxt_node->next; //next element
+
+ }
+
+
+ // Once there the context was not found
+ o3_context=cntxt_node; //pointer to last element in the linked list
+
+ return 0; //it was not present
+
+
+ // The context is found, so return pointer and cump
+
+ ppmc_gtf_cntxt_found:
+
+ o3_context=cntxt_node;
+
+ // Total cump is current total cump plus the escape for the escape code
+
+ total_cump=o3_context->defined_bytes+o3_context->max_cump;
+
+ return 1; //context found
+
+}
+
+
+// Codes a byte under order-3 and returns 1.
+// Otherwise it returns a 0.
+//
+// In case the byte is coded under this context, coded_in_order=3.
+char ppmc_code_byte_order3(void)
+{
+ unsigned long counter;
+ struct _byte_and_freq *node;
+
+
+ // Get current context (if present) and total cump.
+
+ if(ppmc_get_totf_order3()==0)
+ return 0;
+
+
+ // See if the byte is present and compute its cump at the same time
+
+ node=o3_context->prob; //pointer to first element in the linked list
+
+ symb_cump=0; //the first symbol always has a 0 cump
+
+
+ // Now search the byte in the linked list
+
+ do{
+ if(node->byte==byte)
+ goto ppmc_o3_byte_found; //bad thing, I know, anyone has a better idea?
+ symb_cump+=node->freq; //add the probability of this byte to the cump
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+
+ // If we reach that point, the byte was not found in the linked list
+ // so we don't need the cump, we have to output an escape code, whose
+ // probabilities are know using the context structure in the table.
+
+ // Byte was not present in the linked list, current node is the last one,
+ // and that's the node needed for creating a new node, save it.
+
+ o3_ll_node=node;
+
+ // Now get the probability and cump of the escape code
+
+ symb_cump=o3_context->max_cump;
+ symb_prob=o3_context->defined_bytes;
+
+ // Code the escape code
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ return 0;
+
+
+ // That code is executed when the byte is found in the linked list
+
+ ppmc_o3_byte_found:
+
+
+ // Everything has been tested, now we can feel free to code the byte,
+ // the symb_cump is already computed, now get its probability and code
+ // the byte, also save pointer to this element in the linked lists for
+ // updating.
+
+ coded_in_order=3; //successfully coded under order-3
+
+ o3_ll_node=node; //save it for updating
+
+ symb_prob=node->freq; //get the probability of the byte
+
+ // Code it.
+
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ return 1; //byte coded under order-3
+}
+
+
+// This functions update order-3 probabilities with current byte, also takes
+// care about updating variables, and renormalizing.
+//
+// "o3_ll_node" must point to the element to update or the last one,
+// based on the "coded_in_order" and checking the pointer of the element we
+// can decide what to do. Also "o3_context" must be initialized.
+//
+// This updating is only for encoding.
+void ppmc_update_order3(void)
+{
+
+ // First thing first, check if the hash entry is initialized
+
+ if(order3_hasht[o3_cntxt]==0) //no pointer to linked list defined yet
+ {
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ order3_hasht[o3_cntxt]=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o3_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ return; //nothing else to do
+ }
+
+
+
+
+ // The byte was coded under order three, otherwise it was coded under a
+ // lower order (never higher ones, remember that we are using update
+ // exclusion) in this case we have to create a new node in the list.
+
+ if(coded_in_order==3) //coded under order-3
+ {
+
+ // Update its count and variables of this context and check for renormalization
+
+ o3_ll_node->freq++; //increment its frequency (rather probability)
+
+ o3_context->max_cump++; //total cump
+
+ if(o3_ll_node->freq==255) //do we have to renormalize?
+ ppmc_renormalize_order3(); //renormalize
+
+ }
+ else
+ {
+
+ // Now we have two cases, under a given context (which we actually found)
+ // we coded an escape coded, in that case just create a new node in the
+ // linked list of bytes and probabilities. Otherwise we didn't find the
+ // same node so we have to create it in the linked list for context.
+ // And we can be sure that it at least has one element and that
+ // "o3_context" points to the last element, so we can put the new element.
+
+ if(o3_context->order4321==full_o3_cntxt) //chech if that's the last
+ { //element or the a context found
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate mem
+
+ o3_ll_node->next=_bytes_pool; //put it in the next free entry
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ o3_context->max_cump++; //total cump
+ o3_context->defined_bytes++; //total cump
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+ }
+ else
+ {
+
+ // We have to create a new context node
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ o3_context->next=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o3_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ }
+
+ }
+
+}
+
+
+// This functions renormalizes the probabilities at order-3 updating context
+// variables.
+void ppmc_renormalize_order3(void)
+{
+ unsigned long counter;
+ struct _byte_and_freq *node;
+
+
+ // Initialize variables. Defined bytes remain the same.
+ o3_context->max_cump=0; //clear them
+
+ node=o3_context->prob; //get pointer to lined lists
+
+ // Divide all the probabilities by 2 and update context variables
+ while(1)
+ {
+ node->freq>>=1; //divide by a factor of 2
+
+ if(node->freq==0) //don't allow a probability to be 0
+ node->freq=1;
+
+ o3_context->max_cump+=node->freq; //sum to the total cump
+
+ if(node->next==0) //last element
+ break;
+ node=node->next;
+ }
+
+
+}
+
+
+// Decodes the symbol correspoding to the current order, it returns the byte
+// or in case of an escape code or empty table it returns -1.
+// It updates "coded_in_order".
+//
+// If we decode an escape code, we don't modify "o3_ll_node" and thus its
+// work of the updating routine to read till the end of the linked list
+// (for adding a new element) however if we decode a byte, we save on
+// "o3_ll_node" the pointer to its node. (so updating is faster)
+void ppmc_decode_order3(void)
+{
+ unsigned long current_cump, counter;
+ struct _byte_and_freq *node;
+
+
+ // Get current context (if present) and total cump.
+
+ if(ppmc_get_totf_order3()==0)
+ {
+ byte=-1;
+ return;
+ }
+
+
+ // Decode current cump
+
+ symb_cump=range_decoder_decode(&rc_decoder,total_cump); //decode it
+
+ // Now check if it's an escape code
+ if(symb_cump>=o3_context->max_cump) //the defined code space for the escape code
+ {
+
+ // Update coding values
+
+ ppmc_get_escape_prob_order3();
+ range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob);
+
+ // Mark as escape code
+
+ byte=-1;
+
+ return; //an escape code
+ }
+ else
+ {
+ // Now we have to check what symbol it is
+
+ current_cump=0; //cump of the current symbol
+
+ node=o3_context->prob; //get pointer to linked lists
+
+ while(1)
+ {
+ current_cump+=node->freq; //update cump
+ if(symb_cump<current_cump)
+ break; //symbol found, break search loop
+
+ node=node->next; //next element
+ //we have no need to check for the last symbol, we'll never read further
+ //the end of the linked lists, before we'll found the last byte.
+ }
+
+
+ //read byte value and probability
+
+ symb_prob=node->freq; //get the probability for updating the state
+ byte=node->byte; //get byte
+ o3_ll_node=node; //used for updating
+
+
+ // Get the cump of byte
+
+ symb_cump=current_cump-symb_prob;
+
+ // Update coding state
+
+ range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob);
+
+ // Update coded_in_order used for update exclusion
+
+ coded_in_order=3;
+
+ return;
+ }
+
+}
+
+
+// This is the routine for updating while decoding. The only difference with
+// the routine for coding is that when an escape code was coded, "o3_ll_node"
+// is not initialized so we have to read till the end of the linked list.
+// Fortunately "o3_context" will be initialized so we don't need to read its
+// linked list.
+void ppmc_update_dec_order3(void)
+{
+ struct _byte_and_freq *node;
+
+ // First thing first, check if the hash entry is initialized
+
+ if(order3_hasht[o3_cntxt]==0) //no pointer to linked list defined yet
+ {
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ order3_hasht[o3_cntxt]=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o3_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ return; //nothing else to do
+ }
+
+
+ // The byte was coded under order three, otherwise it was coded under a
+ // lower order (never higher ones, remember that we are using update
+ // exclusion) in this case we have to create a new node in the list.
+
+ if(coded_in_order==3) //coded under order-3
+ {
+
+ // Update its count and variables of this context and check for renormalization
+
+ o3_ll_node->freq++; //increment its frequency (rather probability)
+
+ o3_context->max_cump++; //total cump
+
+ if(o3_ll_node->freq==255) //do we have to renormalize?
+ ppmc_renormalize_order3(); //renormalize
+
+ }
+ else
+ {
+
+ // Now we have two cases, under a given context (which we actually found)
+ // we coded an escape coded, in that case just create a new node in the
+ // linked list of bytes and probabilities. Otherwise we didn't find the
+ // same node so we have to create it in the linked list for context.
+ // And we can be sure that it at least has one element and that
+ // "o3_context" points to the last element, so we can put the new element.
+
+ if(o3_context->order4321==full_o3_cntxt) //chech if that's the last
+ { //element or the a context found
+
+ // Read till the end of the linked list
+
+ node=o3_context->prob; //get pointer to linked list
+
+ while(1)
+ {
+ if(node->next==0) //check for the end of the linked list
+ break;
+ node=node->next; //next node
+ }
+
+ // Now add element
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate mem
+
+ node->next=_bytes_pool; //put it in the next free entry
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ o3_context->max_cump++; //total cump
+ o3_context->defined_bytes++; //total cump
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+ }
+ else
+ {
+
+ // We have to create a new context node
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ o3_context->next=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o3_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ }
+
+ }
+
+}
+
+
+
+// This function returns the probability for the escape codes in the variables
+void ppmc_get_escape_prob_order3(void)
+{
+ // To understand that remember that the escape allocated for the escape code
+ // is above the current maximum cump and that it has a probability determined
+ // by the scheme C.
+
+ symb_prob=o3_context->defined_bytes;
+ symb_cump=o3_context->max_cump;
+}
+
+
+
+// ORDER-4 functions
+//
+// The routines for order-4 are *equal* to those for order-3, there are a few
+// changes like different global variables, and different hash keys.
+//
+// If you want to go to higher orders, you'd use the same code and data
+// structures, with the difference of the context bytes (order4321)
+// stored in every context's linked list.
+
+
+// Returns in the variable "total_cump" the current total cump of
+// order-4. Must be called while encoding or decoding before anything else
+// because it initializes the pointers to the context structure in
+// "o4_context" and o4_cntxt.
+//
+// If the hash entry is not initialized it returns "o4_context"=0
+// If the context is not present in the linked list of context, "o4_context"
+// will point to the last element in the linked list.
+// If the context is present "o4_context" will point to the context to use.
+// One can distinguish the last two by checking the context value of the
+// structure, if it's not the same, is the last element.
+//
+// The routine returns 0 if the hash entry is not initialized or if the
+// the context was not present. Otherwise it returns 1, meaning that we
+// have to code under this context.
+char ppmc_get_totf_order4(void)
+{
+ struct context *cntxt_node;
+
+
+ // First make the hash key for order-4
+
+ o4_cntxt=ppmc_order4_hash_key(o1_byte,o2_byte,o3_byte,o4_byte);
+ full_o4_cntxt=(o1_byte)+(o2_byte<<8)+(o3_byte<<16)+(o4_byte<<24); //order-4
+
+
+ // Now check the hash entry in the table
+
+ if(order4_hasht[o4_cntxt]==0) //if 0, not initialized
+ {
+
+ o4_context=0; //no hash entry
+
+ return 0; //hash entry not initialized
+ }
+
+
+ // Now read trough the linked list of context searching current one
+
+ cntxt_node=order4_hasht[o4_cntxt];
+
+ while(1)
+ {
+
+ if(cntxt_node->order4321==full_o4_cntxt) //compare context
+ goto ppmc_gtfo4_cntxt_found;
+
+ if(cntxt_node->next==0) //end of context's linked list
+ break;
+
+ cntxt_node=cntxt_node->next; //next element
+
+ }
+
+
+ // Once there the context was not found
+ o4_context=cntxt_node; //pointer to last element in the linked list
+
+ return 0; //it was not present
+
+
+ // The context is found, so return pointer and cump
+
+ ppmc_gtfo4_cntxt_found:
+
+ o4_context=cntxt_node;
+
+ // Total cump is current total cump plus the escape for the escape code
+
+ total_cump=o4_context->defined_bytes+o4_context->max_cump;
+
+ return 1; //context found
+
+}
+
+
+// Codes a byte under order-4 and returns 1.
+// Otherwise it returns a 0.
+//
+// In case the byte is coded under this context, coded_in_order=4.
+char ppmc_code_byte_order4(void)
+{
+ unsigned long counter;
+ struct _byte_and_freq *node;
+
+
+ // Get current context (if present) and total cump.
+
+ if(ppmc_get_totf_order4()==0)
+ return 0;
+
+
+ // See if the byte is present and compute its cump at the same time
+
+ node=o4_context->prob; //pointer to first element in the linked list
+
+ symb_cump=0; //the first symbol always has a 0 cump
+
+
+ // Now search the byte in the linked list
+
+ do{
+ if(node->byte==byte)
+ goto ppmc_o4_byte_found; //bad thing, I know, anyone has a better idea?
+ symb_cump+=node->freq; //add the probability of this byte to the cump
+ if(node->next==0)
+ break;
+ node=node->next; //next element in the linked list
+ }while(1);
+
+
+ // If we reach that point, the byte was not found in the linked list
+ // so we don't need the cump, we have to output an escape code, whose
+ // probabilities are know using the context structure in the table.
+
+ // Byte was not present in the linked list, current node is the last one,
+ // and that's the node needed for creating a new node, save it.
+
+ o4_ll_node=node;
+
+ // Now get the probability and cump of the escape code
+
+ symb_cump=o4_context->max_cump;
+ symb_prob=o4_context->defined_bytes;
+
+ // Code the escape code
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ return 0;
+
+
+ // That code is executed when the byte is found in the linked list
+
+ ppmc_o4_byte_found:
+
+
+ // Everything has been tested, now we can feel free to code the byte,
+ // the symb_cump is already computed, now get its probability and code
+ // the byte, also save pointer to this element in the linked lists for
+ // updating.
+
+ coded_in_order=4; //successfully coded under order-4
+
+ o4_ll_node=node; //save it for updating
+
+ symb_prob=node->freq; //get the probability of the byte
+
+ // Code it.
+
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+
+ return 1; //byte coded under order-4
+}
+
+
+// This functions update order-4 probabilities with current byte, also takes
+// care about updating variables, and renormalizing.
+//
+// "o4_ll_node" must point to the element to update or the last one,
+// based on the "coded_in_order" and checking the pointer of the element we
+// can decide what to do. Also "o4_context" must be initialized.
+//
+// This updating is only for encoding.
+void ppmc_update_order4(void)
+{
+
+ // First thing first, check if the hash entry is initialized
+
+ if(order4_hasht[o4_cntxt]==0) //no pointer to linked list defined yet
+ {
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ order4_hasht[o4_cntxt]=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o4_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ return; //nothing else to do
+ }
+
+
+
+
+ // The byte was coded under order three, otherwise it was coded under a
+ // lower order (never higher ones, remember that we are using update
+ // exclusion) in this case we have to create a new node in the list.
+
+ if(coded_in_order==4) //coded under order-4
+ {
+
+ // Update its count and variables of this context and check for renormalization
+
+ o4_ll_node->freq++; //increment its frequency (rather probability)
+
+ o4_context->max_cump++; //total cump
+
+ if(o4_ll_node->freq==255) //do we have to renormalize?
+ ppmc_renormalize_order4(); //renormalize
+
+ }
+ else
+ {
+
+ // Now we have two cases, under a given context (which we actually found)
+ // we coded an escape coded, in that case just create a new node in the
+ // linked list of bytes and probabilities. Otherwise we didn't find the
+ // same node so we have to create it in the linked list for context.
+ // And we can be sure that it at least has one element and that
+ // "o4_context" points to the last element, so we can put the new element.
+
+ if(o4_context->order4321==full_o4_cntxt) //chech if that's the last
+ { //element or the a context found
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate mem
+
+ o4_ll_node->next=_bytes_pool; //put it in the next free entry
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ o4_context->max_cump++; //total cump
+ o4_context->defined_bytes++; //total cump
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+ }
+ else
+ {
+
+ // We have to create a new context node
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ o4_context->next=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o4_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ }
+
+ }
+
+}
+
+
+// This functions renormalizes the probabilities at order-4 updating context
+// variables.
+void ppmc_renormalize_order4(void)
+{
+ unsigned long counter;
+ struct _byte_and_freq *node;
+
+
+ // Initialize variables. Defined bytes remain the same.
+ o4_context->max_cump=0; //clear them
+
+ node=o4_context->prob; //get pointer to lined lists
+
+ // Divide all the probabilities by 2 and update context variables
+ while(1)
+ {
+ node->freq>>=1; //divide by a factor of 2
+
+ if(node->freq==0) //don't allow a probability to be 0
+ node->freq=1;
+
+ o4_context->max_cump+=node->freq; //sum to the total cump
+
+ if(node->next==0) //last element
+ break;
+ node=node->next;
+ }
+
+
+}
+
+
+// Decodes the symbol correspoding to the current order, it returns the byte
+// or in case of an escape code or empty table it returns -1.
+// It updates "coded_in_order".
+//
+// If we decode an escape code, we don't modify "o4_ll_node" and thus its
+// work of the updating routine to read till the end of the linked list
+// (for adding a new element) however if we decode a byte, we save on
+// "o4_ll_node" the pointer to its node. (so updating is faster)
+void ppmc_decode_order4(void)
+{
+ unsigned long current_cump, counter;
+ struct _byte_and_freq *node;
+
+
+ // Get current context (if present) and total cump.
+
+ if(ppmc_get_totf_order4()==0)
+ {
+ byte=-1;
+ return;
+ }
+
+
+ // Decode current cump
+
+ symb_cump=range_decoder_decode(&rc_decoder,total_cump); //decode it
+
+ // Now check if it's an escape code
+ if(symb_cump>=o4_context->max_cump) //the defined code space for the escape code
+ {
+
+ // Update coding values
+
+ ppmc_get_escape_prob_order4();
+ range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob);
+
+ // Mark as escape code
+
+ byte=-1;
+
+ return; //an escape code
+ }
+ else
+ {
+ // Now we have to check what symbol it is
+
+ current_cump=0; //cump of the current symbol
+
+ node=o4_context->prob; //get pointer to linked lists
+
+ while(1)
+ {
+ current_cump+=node->freq; //update cump
+ if(symb_cump<current_cump)
+ break; //symbol found, break search loop
+
+ node=node->next; //next element
+ //we have no need to check for the last symbol, we'll never read further
+ //the end of the linked lists, before we'll found the last byte.
+ }
+
+
+ //read byte value and probability
+
+ symb_prob=node->freq; //get the probability for updating the state
+ byte=node->byte; //get byte
+ o4_ll_node=node; //used for updating
+
+
+ // Get the cump of byte
+
+ symb_cump=current_cump-symb_prob;
+
+ // Update coding state
+
+ range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob);
+
+ // Update coded_in_order used for update exclusion
+
+ coded_in_order=4;
+
+ return;
+ }
+
+}
+
+
+// This is the routine for updating while decoding. The only difference with
+// the routine for coding is that when an escape code was coded, "o4_ll_node"
+// is not initialized so we have to read till the end of the linked list.
+// Fortunately "o4_context" will be initialized so we don't need to read its
+// linked list.
+void ppmc_update_dec_order4(void)
+{
+ struct _byte_and_freq *node;
+
+ // First thing first, check if the hash entry is initialized
+
+ if(order4_hasht[o4_cntxt]==0) //no pointer to linked list defined yet
+ {
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ order4_hasht[o4_cntxt]=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o4_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ return; //nothing else to do
+ }
+
+
+ // The byte was coded under order four, otherwise it was coded under a
+ // lower order (never higher ones, remember that we are using update
+ // exclusion) in this case we have to create a new node in the list.
+
+ if(coded_in_order==4) //coded under order-4
+ {
+
+ // Update its count and variables of this context and check for renormalization
+
+ o4_ll_node->freq++; //increment its frequency (rather probability)
+
+ o4_context->max_cump++; //total cump
+
+ if(o4_ll_node->freq==255) //do we have to renormalize?
+ ppmc_renormalize_order4(); //renormalize
+
+ }
+ else
+ {
+
+ // Now we have two cases, under a given context (which we actually found)
+ // we coded an escape coded, in that case just create a new node in the
+ // linked list of bytes and probabilities. Otherwise we didn't find the
+ // same node so we have to create it in the linked list for context.
+ // And we can be sure that it at least has one element and that
+ // "o4_context" points to the last element, so we can put the new element.
+
+ if(o4_context->order4321==full_o4_cntxt) //chech if that's the last
+ { //element or the a context found
+
+ // Read till the end of the linked list
+
+ node=o4_context->prob; //get pointer to linked list
+
+ while(1)
+ {
+ if(node->next==0) //check for the end of the linked list
+ break;
+ node=node->next; //next node
+ }
+
+ // Now add element
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate mem
+
+ node->next=_bytes_pool; //put it in the next free entry
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ o4_context->max_cump++; //total cump
+ o4_context->defined_bytes++; //total cump
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+ }
+ else
+ {
+
+ // We have to create a new context node
+
+ if(ppmc_out_of_memory==1)
+ return; //exit this function, we can't allocate memory
+
+
+ // First create the context
+
+ o4_context->next=_context_pool;
+
+ _context_pool->next=0; //this is the last element
+ _context_pool->order4321=full_o4_cntxt; //put context
+ _context_pool->prob=_bytes_pool; //pointer to linked list
+ _context_pool->max_cump=1;
+ _context_pool->defined_bytes=1;
+
+
+ // Do update of linked list variables and memory use of contexts
+
+ ++_context_pool; //next time use next entry (this is a pointer)
+
+ if(_context_pool==_context_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_context_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_context_pool=(struct context *)malloc
+ (sizeof(struct context)*_context_pool_elements_inc))==NULL)
+ {
+ printf("Can't allocate memory for context's memory pool.\nIndex: %d",_context_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _context_pool_array[_context_pool_index]=_context_pool;
+
+ _context_pool_max=_context_pool+_context_pool_elements_inc;
+
+ _context_pool_index++;
+
+ }
+
+ // Now care about the first (and last) linked list element
+
+ _bytes_pool->byte=byte; //initialize byte to current one
+ _bytes_pool->freq=1; //it appeared once
+ _bytes_pool->next=0; //now this is last element in ll
+
+ // Do update of linked list variables and memory use
+
+ ++_bytes_pool; //next time use next entry (this is a pointer)
+
+ if(_bytes_pool==_bytes_pool_max) //maximum reached
+ {
+
+ // First check to see that we still have entries in the array
+
+ if(_bytes_pool_index==_mempool_max_index)
+ {
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ // Allocate memory for new buffer
+
+ if((_bytes_pool=(struct _byte_and_freq *)malloc
+ (sizeof(struct _byte_and_freq)*_bytes_pool_elements_inc))==NULL)
+ {
+ printf("\nCan't allocate memory for bytes memory pool.\nIndex: %d",_bytes_pool_index);
+ ppmc_out_of_memory=1; //flush
+ return;
+ }
+
+ _bytes_pool_array[_bytes_pool_index]=_bytes_pool;
+
+ _bytes_pool_max=_bytes_pool+_bytes_pool_elements_inc;
+
+ _bytes_pool_index++;
+
+ }
+
+ }
+
+ }
+
+}
+
+
+
+// This function returns the probability for the escape codes in the variables
+void ppmc_get_escape_prob_order4(void)
+{
+ // To understand that remember that the escape allocated for the escape code
+ // is above the current maximum cump and that it has a probability determined
+ // by the scheme C.
+
+ symb_prob=o4_context->defined_bytes;
+ symb_cump=o4_context->max_cump;
+}
+
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "ppmc.h"
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ Part of the ppmc encoder and decoder.
+
+ This module contains the definitions of different functions and all the
+ data structures defined by ppmc. Also contains defines.
+*/
+
+// Definitions
+
+#define ppmc_order4_hash_size 65536
+#define ppmc_order4_hash_key(k,j,i,l) ( (k)+(j<<8)+(i<<1)+(l<<9) )& ppmc_order4_hash_size-1
+#define ppmc_order3_hash_size 65536
+#define ppmc_order3_hash_key(k,j,i) ((k)+(j<<7)+(i<<11)) & ppmc_order3_hash_size-1
+#define ppmc_order2_hash_key(k,j) ((k)+(j<<8))
+#define _bytes_pool_elements 125000 //this is used the first time
+ //that we allocate memory, that's
+ //the number of entries
+#define _bytes_pool_elements_inc 125000 //if we need to alloc again, this
+ //is the number of entries to get
+#define _context_pool_elements 50000
+#define _context_pool_elements_inc 50000
+
+#define _mempool_max_index 1000 //the number of entries in the array with
+ //pointers
+
+
+// Data structures
+
+// This structure contains a single element of a linked lists which contains
+// the probability distribution of a given order. This structure takes 6 bytes.
+struct _byte_and_freq{
+unsigned char byte; //the byte itself
+unsigned char freq; //and the frequency of it
+struct _byte_and_freq *next; //pointer to next element in linked list or 0
+};
+
+
+// This structure is used for both order-3 and order-4. It takes 20 bytes,
+// and it can still hold another byte more. (only 19 being used)
+// Order 2-1-0-(-1) use different structures for a faster accessing.
+struct context{
+struct context *next; //next context in the hash entry
+unsigned long order4321; //order-4-3-2-1 (or order-3-2-1 for order-3)
+struct _byte_and_freq *prob; //pointer to linked lists containing probability distribution
+unsigned int max_cump; //maximum cumulative probability (can't exceed (2^16)-1 )
+unsigned int defined_bytes; //the number of bytes in this context
+};
+
+// That's the same but for order-2 where there's no hash collisions.
+struct context_o2{
+struct _byte_and_freq *prob; //pointer to linked lists containing probability distribution
+unsigned int max_cump; //maximum cumulative probability (can't exceed (2^16)-1 )
+unsigned int defined_bytes; //the number of bytes in this context
+};
+
+
+// Declaration of functions
+
+
+// Functions for initializing
+void ppmc_alloc_memory(void);
+void ppmc_initialize_contexts(void);
+void ppmc_encoder_initialize(void);
+void ppmc_decoder_initialize(void);
+void ppmc_free_memory(void);
+void ppmc_flush_mem_enc(void);
+void ppmc_flush_mem_dec(void);
+
+// Functions for order-(-1)
+void ppmc_get_prob_ordern1(void);
+unsigned long ppmc_get_symbol_ordern1(void);
+void ppmc_get_totf_ordern1(void);
+void ppmc_renormalize_order1(void);
+
+// Functions for order-0
+void ppmc_get_totf_order0(void);
+char ppmc_code_byte_order0(void);
+void ppmc_update_order0(void);
+void ppmc_renormalize_order0(void);
+void ppmc_decode_order0(void);
+void ppmc_get_escape_prob_order0(void);
+void ppmc_get_prob_order0(void);
+
+// Functions for order-1
+void ppmc_get_totf_order1(void);
+char ppmc_code_byte_order1(void);
+void ppmc_update_order1(void);
+void ppmc_renormalize_order1(void);
+void ppmc_decode_order1(void);
+void ppmc_get_escape_prob_order1(void);
+void ppmc_get_prob_order1(void);
+
+
+// Functions for order-2
+void ppmc_get_totf_order2(void);
+char ppmc_code_byte_order2(void);
+void ppmc_update_order2(void);
+void ppmc_renormalize_order2(void);
+void ppmc_decode_order2(void);
+void ppmc_update_dec_order2(void);
+void ppmc_get_escape_prob_order2(void);
+void ppmc_get_prob_order2(void);
+
+
+// Functions for order-3
+char ppmc_get_totf_order3(void);
+char ppmc_code_byte_order3(void);
+void ppmc_update_order3(void);
+void ppmc_renormalize_order3(void);
+void ppmc_decode_order3(void);
+void ppmc_update_dec_order3(void);
+void ppmc_get_escape_prob_order3(void);
+void ppmc_get_prob_order3(void);
+
+
+// Functions for order-4
+char ppmc_get_totf_order4(void);
+char ppmc_code_byte_order4(void);
+void ppmc_update_order4(void);
+void ppmc_renormalize_order4(void);
+void ppmc_decode_order4(void);
+void ppmc_update_dec_order4(void);
+void ppmc_get_escape_prob_order4(void);
+void ppmc_get_prob_order4(void);
+
+
+
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "ppmcdata.c"
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ Part of the ppmc encoder and decoder.
+
+ This module contains global data.
+*/
+
+#include "ppmc.h" //defines
+#include "range.h"
+
+// Order-4 uses a hash table which points to the start of a linked list with
+// the different context, which has the cump, the number of different symbols
+// and a pointer to the linked list with the bytes and frequencies.
+// Order-3 is almost the same, both take 262144 bytes.
+struct context *order4_hasht[ppmc_order4_hash_size];
+
+struct context *order3_hasht[ppmc_order3_hash_size];
+
+
+// The array for order-2 is different, as we do directly hashing, and thus
+// we have no need to do the stuff of linked lists for the context itself,
+// so it contains the context used. This takes 1310720 bytes.
+struct context_o2 order2_array[65536];
+
+
+// Those are the multiple arrays for order-1. It takes 65536 bytes.
+unsigned char order1_array[256][256];
+unsigned int order1_defined_bytes_array[256]; //the defined bytes in every context
+unsigned int order1_max_cump_array[256]; //max cump of every context
+
+
+// This is the array for order-0. It takes 256 bytes.
+unsigned char order0_array[256];
+unsigned int order0_defined_bytes;
+unsigned int order0_max_cump;
+
+
+// No need of variables for order-(-1), because it's fixed.
+
+
+// Those are the pointers and variables used for managing the mem pool for
+// both context, and bytes and frequencies.
+struct _byte_and_freq *_bytes_pool, //pointer to pool containing linked
+ //lists with bytes and frequencies
+ *_bytes_pool_max; //the maximum of this buffer
+struct context *_context_pool; //pointer to pool containing contexts
+struct context *_context_pool_max; //the same as with _bytes_pool
+
+unsigned long _bytes_pool_index; //index in array of pointers
+unsigned long _context_pool_index;
+
+//the following is an array keeping pointers to different buffers. A new
+//buffer is allocated when the current one is full, so we always have a
+//buffer for linked lists. (without allocating a buffer for every element)
+struct _byte_and_freq *_bytes_pool_array[_mempool_max_index];
+struct context *_context_pool_array[_mempool_max_index];
+
+char ppmc_out_of_memory; //0 if we have enough memory, 1 instead, any
+ //routine that needs to allocate memory must
+ //quit if that's 1.
+
+
+// Variables which contain current byte to code and order
+unsigned long byte, //current byte to code
+ o1_byte, //order-1 byte
+ o2_byte, //order-2 byte
+ o3_byte, //order-3 byte
+ o4_byte; //order-4 byte
+
+unsigned long o2_cntxt; //used in the hash key of order-2
+unsigned long o3_cntxt; //use as hash key for order-3
+unsigned long o4_cntxt; //use as hash key for order-4
+unsigned long full_o3_cntxt; //o1_byte, o2_byte and o3_byte together
+unsigned long full_o4_cntxt; //order-4-3-2-1
+
+unsigned long coded_in_order; //in which order the last byte was coded
+ //it's for update exclusion
+ //also used for decoding
+
+// Variables used for coding
+
+unsigned long
+ total_cump, //the total cumulative probability
+ symb_cump, //the symbol cumulative probability
+ symb_prob; //the symbol frequency
+
+rangecoder rc_coder; //state of range coder
+rangecoder rc_decoder; //state of range decoder
+
+// File handles
+
+ FILE *file_input, //file to code
+ *file_output; //file where the coded data is placed
+
+
+// Pointers to linked lists and context structures used for faster updating
+// or creation of new nodes, because instead of reading again all the linked
+// list till the end (in the case of creation) we have a pointer to the last
+// element. In the case that a byte was present in the linked lists but it
+// had a 0 count, we just have to update its probability. And in the case
+// that it already was present and we coded it under that context or a lower
+// one, we just have to update its probability.
+
+
+struct _byte_and_freq *o2_ll_node; //pointer to linked lists under order-2
+ //where does it points depends in which
+ //order the byte was coded.
+struct _byte_and_freq *o3_ll_node; //the same but for order-3
+struct _byte_and_freq *o4_ll_node;
+
+struct context *o3_context; //pointer to current order-3 context
+struct context *o4_context; //pointer to current order-3 context
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "ppmcdata.h"
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ Part of the ppmc encoder and decoder.
+
+ This module contains externs definition for global data.
+*/
+
+#include "ppmc.h"
+
+// Order-4 uses a hash table which points to the start of a linked list with
+// the different context, which has the cump, the number of different symbols
+// and a pointer to the linked list with the bytes and frequencies.
+// Order-3 is almost the same, both take 262144 bytes.
+extern struct context *order4_hasht[];
+
+extern struct context *order3_hasht[];
+
+
+// The array for order-2 is different, as we do directly hashing, and thus
+// we have no need to do the stuff of linked lists for the context itself,
+// so it contains the context used. This takes 1310720 bytes.
+extern struct context_o2 order2_array[];
+
+
+// Those are the multiple arrays for order-1. It takes 65536 bytes.
+extern unsigned char order1_array[256][256];
+extern unsigned int order1_defined_bytes_array[]; //the defined bytes in every context
+extern unsigned int order1_max_cump_array[]; //max cump of every context
+
+
+// This is the array for order-0. It takes 256 bytes.
+extern unsigned char order0_array[];
+extern unsigned int order0_defined_bytes;
+extern unsigned int order0_max_cump;
+
+
+// Those are the pointers and variables used for managing the mem pool for
+// both context, and bytes and frequencies.
+extern struct _byte_and_freq *_bytes_pool, //pointer to pool containing linked
+ //lists with bytes and frequencies
+ *_bytes_pool_max; //the maximum of this buffer
+extern struct context *_context_pool; //pointer to pool containing contexts
+extern struct context *_context_pool_max; //the same as with _bytes_pool
+
+extern unsigned long _bytes_pool_index; //index in array of pointers
+extern unsigned long _context_pool_index;
+
+//the following is an array keeping pointers to different buffers. A new
+//buffer is allocated when the current one is full, so we always have a
+//buffer for linked lists. (without allocating a buffer for every element)
+extern struct _byte_and_freq *_bytes_pool_array[_mempool_max_index];
+extern struct context *_context_pool_array[_mempool_max_index];
+
+extern char ppmc_out_of_memory; //0 if we have enough memory, 1 instead, any
+ //routine that needs to allocate memory must
+ //quit if that's 1.
+
+
+
+// Variables which contain current byte to code and order
+extern unsigned long //they are only bytes
+ byte, //current byte to code
+ o1_byte, //order-1 byte
+ o2_byte, //order-2 byte
+ o3_byte, //order-3 byte
+ o4_byte; //order-4 byte
+
+extern unsigned long o2_cntxt; //used in the hash key of order-2
+extern unsigned long o3_cntxt; //use as hash key for order-3
+unsigned long o4_cntxt; //use as hash key for order-4
+extern unsigned long full_o3_cntxt; //o1_byte, o2_byte and o3_byte together
+extern unsigned long full_o4_cntxt; //order-4-3-2-1
+
+extern unsigned long coded_in_order; //in which order the last byte was coded
+ //it's for update exclusion
+ //also used for decoding
+// Variables used for coding
+
+extern unsigned long //no need for negative values
+ total_cump, //the total cumulative probability
+ symb_cump, //the symbol cumulative probability
+ symb_prob; //the symbol frequency
+
+extern rangecoder rc_coder; //state of range coder
+extern rangecoder rc_decoder; //state of range decoder
+
+// File handles
+
+ FILE *file_input, //file to code
+ *file_output; //file where the coded data is placed
+
+
+
+// Pointers to linked lists and context structures used for faster updating
+// or creation of new nodes, because instead of reading again all the linked
+// list till the end (in the case of creation) we have a pointer to the last
+// element. In the case that a byte was present in the linked lists but it
+// had a 0 count, we just have to update its probability. And in the case
+// that it already was present and we coded it under that context or a lower
+// one, we just have to update its probability.
+
+
+extern struct _byte_and_freq *o2_ll_node;//pointer to linked lists under order-2
+ //where does it points depends in which
+ //order the byte was coded.
+extern struct _byte_and_freq *o3_ll_node; //the same but for order-3
+extern struct _byte_and_freq *o4_ll_node;
+
+extern struct context *o3_context; //pointer to current order-3 context
+extern struct context *o4_context; //pointer to current order-3 context
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "ppmcmain.c"
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ Part of the ppmc encoder only.
+
+ This module is the main module and calls the different modules to do
+ the encoding of a file. When done prints bpb and kbyps.
+*/
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "range.h" //the range coder functions and data
+#include "ppmcdata.h"
+
+
+long filesize(FILE *stream);
+
+
+
+//Main
+int main (char argc, char *argv[])
+{
+ unsigned long counter, //temporal counter for loops like for or while
+ counter2, //another temporal counter for sub loops
+ size_file_input; //the size of the input file
+
+
+ // Print title, version and copyright
+ printf("PPMC using range coder. (without exclusion)\n");
+ printf("Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.\n");
+ printf("Permission is granted to make verbatim copies of this program for private\n");
+ printf("use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.\n");
+
+ // Check for correct number of parameters
+ if(argc!=3)
+ {
+ printf("Bad number of arguments.\n");
+ exit(1);
+ }
+
+
+ // Try to open input and output files
+ if((file_input=fopen(argv[1],"r+b"))==NULL)
+ {
+ printf("Couldn't open %s.\n",argv[1]);
+ exit(1);
+ }
+
+ if((file_output=fopen(argv[2],"w+b"))==NULL)
+ {
+ printf("Couldn't create %s.\n",argv[2]);
+ exit(1);
+ }
+
+
+ // Check input file length and not accept 0 length files
+ size_file_input=filesize(file_input);
+
+ if(size_file_input<5)
+ {
+ printf("Can't work with files below than 5 bytes!");
+ exit(1);
+ }
+
+
+ // First output file length
+ fwrite(&size_file_input,1,4,file_output); //input length
+
+
+ // Initialize ppmc encoder
+ ppmc_alloc_memory(); //get memory
+ ppmc_initialize_contexts(); //initialize model
+ ppmc_encoder_initialize();
+
+ // Initialize range coder
+ range_coder_init(&rc_coder,file_output);
+
+
+ // Start main loop which codes the file
+ while((byte=fgetc(file_input))!=EOF)
+ {
+
+ // Try to code current byte under order-4 if possible then go to lower orders
+ if(ppmc_code_byte_order4()==0)
+ if(ppmc_code_byte_order3()==0)
+ if(ppmc_code_byte_order2()==0)
+ if(ppmc_code_byte_order1()==0)
+ if(ppmc_code_byte_order0()==0) //else try to code under order-0
+ {
+ // Code under order-(-1)
+ ppmc_get_prob_ordern1();
+ range_coder_encode(&rc_coder,total_cump,symb_cump,symb_prob);
+ coded_in_order=0; //update all the tables (unless order-(-1))
+ }
+
+
+ // Now do update exclusion
+
+ switch(coded_in_order)
+ {
+ case 0: ppmc_update_order0(); //update only order-0
+ case 1: ppmc_update_order1(); //update order-0 and order-1
+ case 2: ppmc_update_order2(); //update order-2 1 and 0...
+ case 3: ppmc_update_order3();
+ case 4: ppmc_update_order4();
+ default: break;
+ };
+
+
+
+ // Update order variables
+
+ o4_byte=o3_byte;
+ o3_byte=o2_byte;
+ o2_byte=o1_byte;
+ o1_byte=byte; //current one is next time order-1
+
+
+ // Check if we run out of memory, in that case, flush the encoder
+
+ if(ppmc_out_of_memory==1)
+ {
+ printf("Flushing memory! Output file might be not decodable.\n");
+ ppmc_flush_mem_enc();
+ }
+
+
+ }
+
+
+ // Flush range coder
+ range_coder_flush(&rc_coder);
+
+ // Free memory
+ ppmc_free_memory();
+
+
+ // Print bpb and kbyps
+ printf("%s at %f bpb.\n",argv[1],((float)filesize(file_output)/(float)size_file_input)*(float)8);
+
+
+ // Close file handles
+ fclose(file_input);
+ fclose(file_output);
+
+
+
+ // Nicely exit
+ return 0;
+}
+
+
+// Routines not used by ppmc but rather by main.
+// Not including the range coder.
+
+
+// Returns the file size of a given file.
+long filesize(FILE *stream)
+{
+ long curpos, length;
+
+ curpos = ftell(stream);
+ fseek(stream, 0L, SEEK_END);
+ length = ftell(stream);
+ fseek(stream, curpos, SEEK_SET);
+ return length;
+}
+
+
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "range.c"
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ This module contains the routines of both the range coder and decoder.
+
+ The range coder works internally in 32 bits, and uses bytes as symbols.
+ Also the end of message symbol is used. So z=257.
+
+ Both input and output use rc_file as the file stream. Of course we can't
+ code and decode at the same time. All the input or output comes from the
+ same file, no matter what range coder structure are we using. The modules
+ here provided don't manage the io except for reading and writing, they
+ don't open nor close the files. The reading and writing is done via
+ putc and getc.
+*/
+
+#include "range.h"
+#include <stdlib.h>
+
+/*
+ Inits the range coder state. Must be called before encoding any symbol.
+ It uses a magic number 0xB3 as the first byte outputted.
+ -rangecoder *rc, the range coder to be used.
+
+ Shoulde be called like that:
+ range_coder_init(&o0_rc_state,file_output);
+*/
+void range_coder_init(rangecoder *rc, FILE *stream)
+{
+ rc_file=stream;
+ rc->low=0; //define state
+ rc->range=0x80000000;
+ rc->byte_buffer=0xB3; //magic number
+ rc->help=0; //temp value
+}
+
+
+/*
+ Encodes a symbol.
+ -rangecoder *rc, the range coder to be used.
+ -unsigned long tot_f, the maximum cumulative frequency
+ -unsigned long lt_f, the cumulative probabilty of the symbol
+ -unsigned long sy_f, the probability of the symbol
+*/
+void range_coder_encode(rangecoder *rc,unsigned long tot_f, unsigned long lt_f,unsigned long sy_f)
+{
+ unsigned long temp, r;
+
+ range_coder_renormalize(rc); //&rc?
+
+ r=rc->range/tot_f;
+ temp=r*lt_f;
+ if(lt_f+sy_f<tot_f)
+ rc->range=r*sy_f;
+ else
+ rc->range-=temp;
+ rc->low+=temp;
+}
+
+/*
+ Renormalizes the state when coding.
+ -rangecoder *rc, the range coder to be used.
+*/
+
+void range_coder_renormalize(rangecoder *rc)
+{
+ while(rc->range<=(unsigned long)0x00800000)
+ {
+ if(rc->low<(unsigned long)0x7F800000)
+ {
+ putc(rc->byte_buffer,rc_file);
+ for(;rc->help;rc->help--)
+ putc(0xFF,rc_file);
+ rc->byte_buffer=(unsigned char)(rc->low>>23);
+ }
+ else
+ {
+ if(rc->low&(unsigned long)0x80000000)
+ {
+ putc(rc->byte_buffer+1,rc_file);
+ for(;rc->help;rc->help--)
+ putc(0x00,rc_file);
+ rc->byte_buffer=(unsigned char)(rc->low>>23);
+ }
+ else
+ rc->help++;
+ }
+ rc->range<<=8;
+ rc->low=(rc->low<<8)&(unsigned long)(0x7FFFFFFF);
+ }
+}
+
+
+/*
+ Flushes the encoder. Must be called when the coding is done.
+ -rangecoder *rc, the range coder to be used.
+
+ Shoulde be called like that:
+ range_coder_flush(&o0_rc_state);
+*/
+void range_coder_flush(rangecoder *rc)
+{
+ unsigned long tmp;
+
+ range_coder_renormalize(rc);
+ tmp = rc->low >> 23;
+ if (tmp > 0xff)
+ {
+ putc(rc->byte_buffer+1,rc_file);
+ for(; rc->help; rc->help--)
+ putc(0,rc_file);
+ }
+ else
+ {
+ putc(rc->byte_buffer,rc_file);
+ for(; rc->help; rc->help--)
+ putc(0xff,rc_file);
+ }
+
+ putc(tmp & 0xff,rc_file);
+ putc((tmp = rc->low >> (23-8)) & 0xff,rc_file);
+}
+
+
+/*
+ Inits the range decoder state. Also checks for the magic number, and
+ quits in case it isn't the first, so be careful.
+ -rangecoder *rc, the range coder to be used.
+*/
+void range_decoder_init(rangecoder *rc, FILE *stream)
+{
+ unsigned int _rd_c;
+
+ rc_file=stream;
+ if((_rd_c=getc(rc_file))!=0xB3)
+ {
+ printf("\nThis is not range coded data. Magic number not found. Exiting.");
+ exit(1);
+ }
+ rc->byte_buffer=getc(rc_file);
+ rc->low=rc->byte_buffer>>1;
+ rc->range=0x80;
+}
+
+
+/*
+ Decode a symbol, get its cumulative probability.
+ Input:
+ -rangecoder *rc, the range coder to be used.
+ -unsigned long tot_f, the maximum cumulative probability
+ Output:
+ -unsigned long, cumulative probability of the current symbol
+ Should be called like that:
+ current_cump=range_decoder_decode(&o0_rc_state,o0_tot_f);
+*/
+unsigned long range_decoder_decode(rangecoder *rc, unsigned long tot_f)
+{
+ unsigned long temp;
+
+ range_decoder_renormalize(rc);
+ rc->help=rc->range/tot_f;
+ temp=rc->low/rc->help;
+ if(temp>=tot_f)
+ return tot_f-1;
+ else
+ return temp;
+}
+
+
+/*
+ Updates the state so next symbol can be decoded.
+ Input:
+ -rangecoder *rc, the range coder to be used.
+ -unsigned long tot_f, the maximum cumulative probability
+ -unsigned long lt_f, the cumulative probabilty of the symbol
+ -unsigned long sy_f, the probability of the symbol
+
+*/
+void range_decoder_update(rangecoder *rc, unsigned long tot_f, unsigned long lt_f,unsigned long sy_f)
+{
+ unsigned long temp;
+
+ temp=rc->help*lt_f;
+ rc->low-=temp;
+ if(lt_f+sy_f<tot_f)
+ rc->range=rc->help*sy_f;
+ else
+ rc->range-=temp;
+}
+
+
+/*
+ Renormalizes the state while decoding.
+ -rangecoder *rc, the range coder to be used.
+*/
+void range_decoder_renormalize(rangecoder *rc)
+{
+ while(rc->range<=0x00800000)
+ {
+ rc->low=(rc->low<<8)|((rc->byte_buffer<<7)&0xFF);
+ rc->byte_buffer=getc(rc_file);
+ rc->low |= rc->byte_buffer >> (1);
+ rc->range<<=8;
+ }
+}
+
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this file for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "range.h"
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+ Declarations for the coder.
+*/
+
+#include <stdio.h>
+
+typedef struct{
+ unsigned long low, range, help;
+ unsigned char byte_buffer;
+}rangecoder;
+
+FILE *rc_file;
+
+void range_coder_init(rangecoder *rc, FILE *stream); //coding routines
+void range_coder_encode(rangecoder *rc,unsigned long tot_f, unsigned long lt_f,unsigned long sy_f);
+void range_coder_renormalize(rangecoder *rc);
+void range_coder_flush(rangecoder *rc);
+void range_decoder_init(rangecoder *rc, FILE *stream);//decoding routines
+unsigned long range_decoder_decode(rangecoder *rc, unsigned long tot_f);
+void range_decoder_update(rangecoder *rc, unsigned long tot_f, unsigned long lt_f,unsigned long sy_f);
+void range_decoder_renormalize(rangecoder *rc);
+
+
+typedef unsigned long code_value;
+#define CODE_BITS 32
+#define Top_value ((code_value)1 << (CODE_BITS-1))
+#define SHIFT_BITS (CODE_BITS - 9)
+#define EXTRA_BITS ((CODE_BITS-2) % 8 + 1)
+#define Bottom_value (Top_value >> 8)
+#define outbyte(cod,x) putc(x,stdout)
+#define inbyte(cod) getc(stdin)
--- /dev/null
+/*
+ Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.
+ Permission is granted to make verbatim copies of this program for private
+ use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.
+
+ This file is: "unppmc.c"
+ Email: arturo@arturocampos.com
+ Web: http://www.arturocampos.com
+
+
+ Part of the ppmc decoder.
+
+ This module is the main module and calls the different modules to do
+ the decoding of a file. When done prints kbyps.
+*/
+
+
+// Bibliotecas necesarias
+#include <stdio.h>
+#include <stdlib.h>
+#include "range.h" //the range coder functions and data
+#include "ppmcdata.h"
+
+
+// Declaracion de funciones del ppmcmain.c
+long filesize(FILE *stream);
+
+
+
+
+//Main
+void main (int argc, char *argv[])
+{
+ unsigned long counter, //temporal counter for loops like for or while
+ counter2, //another temporal counter for sub loops
+ size_file_output, //the size of the output file
+ main_counter; //used in main
+ char expected_flush=0; //used for checking flushing which can't be done
+
+
+ // Print title, version and copyright
+ printf("UNPPMC using range coder.\n");
+ printf("Copyright (C) Arturo San Emeterio Campos 1999. All rights reserved.\n");
+ printf("Permission is granted to make verbatim copies of this program for private\n");
+ printf("use only. There is ABSOLUTELY NO WARRANTY. Use it at your OWN RISK.\n");
+
+
+
+ // Check for correct number of parameters
+ if(argc!=3)
+ {
+ printf("Bad number of arguments.\n");
+ exit(1);
+ }
+
+
+ // Try to open input and output files
+ if((file_input=fopen(argv[1],"r+b"))==NULL)
+ {
+ printf("Couldn't open %s.\n",argv[1]);
+ exit(1);
+ }
+
+ if((file_output=fopen(argv[2],"w+b"))==NULL)
+ {
+ printf("Couldn't create %s.\n",argv[2]);
+ exit(1);
+ }
+
+
+ // Get output length
+ fread(&size_file_output,1,4,file_input);
+
+
+ // Initialize ppmc decoder
+ ppmc_alloc_memory();
+ ppmc_initialize_contexts();
+ ppmc_decoder_initialize();
+
+
+
+ // Initialize decoder
+ range_decoder_init(&rc_decoder,file_input);
+
+
+ // Start main loop which decodes the file
+ main_counter=size_file_output-4; //take in account the bytes already written
+ expected_flush=0; //we don't expect a flush yet
+
+ while(main_counter!=0)
+ {
+
+// Try to decode current byte in order-4 if possible, else in lower ones
+ppmc_decode_order4();
+if(byte==-1)
+ ppmc_decode_order3();
+ if(byte==-1)
+ {
+ ppmc_decode_order2();
+ if(byte==-1)
+ {
+ ppmc_decode_order1();
+ if(byte==-1)
+ {
+ ppmc_decode_order0();
+ if(byte==-1) //check if it was an escape code
+ {
+ // Decode in order-(-1)
+ ppmc_get_totf_ordern1();
+ symb_cump=range_decoder_decode(&rc_decoder,total_cump);
+ byte=ppmc_get_symbol_ordern1();
+ ppmc_get_prob_ordern1();
+ range_decoder_update(&rc_decoder,total_cump,symb_cump,symb_prob);
+ coded_in_order=0; //update all orders
+
+ // Now see if it's the code of flushing
+
+ if(symb_cump==256)
+ {
+ printf("Flushing.\n");
+ ppmc_flush_mem_dec();
+ expected_flush=0;
+ continue; //do not output byte nor update
+ }
+
+ }
+ }
+ }
+ }
+
+ // Output byte and update model
+
+ fputc(byte,file_output);
+
+ switch(coded_in_order) //update exclusion
+ {
+ case 0: ppmc_update_order0(); //update only order-0
+ case 1: ppmc_update_order1(); //update order-0 and order-1
+ case 2: ppmc_update_dec_order2(); //update order-0 1 and 2
+ case 3: ppmc_update_dec_order3();
+ case 4: ppmc_update_dec_order4();
+ default: break;
+ };
+
+
+ // Check if flushing has to be done and has not been done.
+ // This is optional, in case you limit the memory usage, you don't
+ // need to include this
+
+ if(expected_flush==1) // If flushing didn't happen, we can't decode
+ {
+ printf("Can't decompress file. Not enough memory.\nTry in a machine with more memory.\n");
+ exit(1);
+ }
+ if(ppmc_out_of_memory==1)
+ {
+ expected_flush=1; // Next code must be a flush code, otherwise we don't
+ // have enough memory, and therefore we can't decode
+ }
+
+
+ // Update order variables
+
+ o4_byte=o3_byte;
+ o3_byte=o2_byte;
+ o2_byte=o1_byte;
+ o1_byte=byte; //current one, is next time order-1
+
+ // Byte decoded and model updated, loop
+ main_counter--;
+
+
+ }
+
+
+ ppmc_free_memory();
+
+ // Close file handles and free memory
+ fclose(file_input);
+ fclose(file_output);
+
+
+ // Nicely exit
+ exit(0);
+}
+
+
+// Ruotines not used by ppmc but rather by main.
+// Not including the range coder.
+
+
+// Returns the file size of a given file.
+long filesize(FILE *stream)
+{
+ long curpos, length;
+
+ curpos = ftell(stream);
+ fseek(stream, 0L, SEEK_END);
+ length = ftell(stream);
+ fseek(stream, curpos, SEEK_SET);
+ return length;
+}
+
+