Algorithms – Huffman encoding
Hello everyone, let’s talk about the Huffman encoding.
The algorithm was taken from here and slightly modified to be more clear.
As this algorithm is fairly simple and Internet is full of explanation regarding it, I will not go in too much detail.
The algorithm is compressing the number of bits that are used for representing a string by assigning each letter a unique binary code, the one being more frequent receiving the smallest binary representation of that ASCII character.
First of all, we need to create a map of frequencies.
This is done by counting the number of times a character appear in the sentence we want to encode.
The frequency looks like this, for the string ‘hello’ ->
‘h’ : 1
‘e’ : 1
‘l’ : 2
‘o’ : 1
Secondly, we will create a priority tree, where:
- The leafs are the actual characters, with their assigned frequency
- The parent nodes will contain the frequency sum of the two child leafs assigned to it.
The tree is created by keeping a priority queue, containing at start all the characters and their given frequency, retrieved for a specific string.
Until the queue contain only one element (the root), we will take the two most frequent leafs/nodes and we will create a parent node that will contain them.
Once we have the tree created, we can generate the codes for the used characters, by iterating through each node until all the leafs were visited.
Starting from the root, all the left nodes will receive the value 0, while all the right nodes will receive the value 1. (or opposite way, just be consistent!).
Let’s see some code.
Generate Tree function:
- Receives the frequency map
- Returns the root node
INode* GenerateTree(FrequencyMap& frequencyMap)
INode* root = nullptr;
// insert the characters into the priority tree
for (auto const& i : frequencyMap)
auto character = i.second;
auto frequency = i.first;
// get the two most frequent nodes, up to this point, and remove them from the queue
INode* childL = priorityTree.top();
INode* childR = priorityTree.top();
// create a parent node that contains them
// returns the root node
The code generator will receive the root node and will insert data into the HuffmanCodes structure
(which is a map with key = character, and value = it’s corresponding Huffman code)
void GenerateCode(INode* node, PrefixQueue prefix, HuffmanCodes& codes)
// if the node is a leaf, we store its code into the map, with key = character
// the node is a parent node, forking in order to reach both child leafs
// the left children have 0 added to the Huffman code
PrefixQueue leftSide = prefix;
// the right children have 1 added to the Huffman code
PrefixQueue rightSide = prefix;
The full code can be found in the github repository: