Using playing cards to store hidden data:
The implied card method for
encoding data into playing cards
Using the method to encode letters of the alphabet
The Implied Card Method becomes much more useful when we use it to encode messages using letters of the alphabet. There are 26 letters in the English alphabet, which can all be portrayed using only 5-bits of data: the binary numbers 00000 to 11001 (0 to 25 in decimal). At first, it might seem sensible if we let 00000 represent "a", 00001 represent "b", 00010: "c", and so on. However, we can encode data much more efficiently than this.
One possible idea, if we phrased our messages cleverly, is to use only the most popular 16 letters of the alphabet. If we did this, we could fit the letters we were going to use into just 4-bits: 0000 to 1111. The order of frequency for the English alphabet is "etaoinsrhldcumfpgwybvkxjqz". [Strictly speaking, the order of the later letters varies depending on which English text has been analysed to create the order. However, to keep things simple, we will say that this order is correct.] If we wanted to use a 4-bit frequency-alphabet encoding, "e" would be represented by 0000, "t" would be 0001, "a" would be 0010, "o" would be "0011", and so on. We would only be able to use letters up to and including "p". The rest of the letters ("gwybvkxjqz") would not be included, but we might be able to cope with this if we put some thought into our messages.
As it is, it is better to use the whole alphabet. To do this, we will need to use 5-bits. We can still use the alphabet sorted by frequency. This gives us some advantages, as lower numbers will match on to the most frequently used letters. The 5-bit binary number 00000 will be mapped on to the most common letter, "e". This means that the most common letter in the English alphabet won't actually use any cards at all. A string of a few "e"s will have no effect on the number of cards used, thus leaving more cards for other letters. The next most common letter is "t". The number 00001 maps on to "t", and every time "t" appears, it will only use one card.
One of the biggest problems to look out for when encoding long text strings is that we do not end up with too many consecutive zeroes near the end of the pack. For example, if we have a string of five zeroes, but only 5 cards left to use, those five cards would be removed and put to the back of the pack in exactly the same order and position as before. On decoding, the digits they represented would not appear. In that situation, we would have to stop before we got to the end of the pack, which would mean that on decoding, we would have five unused cards (decoded as ones). Those unused cards might be misinterpreted as a valid character. To stop that happening, it makes sense to treat the binary numbers containing all ones as a sign that we have reached the end of the pack, and not to map them on to letters.
The chart for mapping 5-bit numbers to the frequency-alphabet is as follows:
Mapping Table 1:
00000 (00 in decimal): e
00001 (01): t
00010 (02): a
00011 (03): o
00100 (04): i
00101 (05): n
00110 (06): s
00111 (07): r
01000 (08): h
01001 (09): l
01010 (10): d
01011 (11): c
01100 (12): u
01101 (13): m
01110 (14): f
01111 (15): p
10000 (16): g
10001 (17): w
10010 (18): y
10011 (19): b
10100 (20): v
10101 (21): k
10110 (22): x
10111 (23): j
11000 (24): q
11001 (25): z
11010 (26): [unused]
11011 (27): [unused]
11100 (28): [unused]
11101 (29): [unused]
11110 (30): [unused]
11111 (31): Not a character: End of the pack
As you can see, we have 5 unused numbers. This would allow this encoding method to work with some non-English alphabets containing more than 26 letters, but as we are using the basic English alphabet here, we could use the numbers for things such as spaces, punctuation or the more common numerals. If we were going to do this here, I think it would be best to use them to represent the most commonly used English words. In this way, we will save even more cards. The most commonly used words in the English language are: the, be, to, of, and, a, in, that, have, I, it, for, not, on, with, he, as, you, do, at. The word "the" can be ignored. Single letter words can be skipped too as they are already in our alphabet. This leaves: "be", "to", "of", "and" and "in" for 11010 (26) to 11110 (30).
I will show later that this alphabet mapping is not as efficient a way for encoding the letters as it could be. However, it is simpler than later encodings and a good starting point to understand how this works.
The following is how to encode a simple message using this alphabet mapping. We are going to encode the message: "I am encoding text using some cards" using the frequency alphabet and 5-bit numbers. We will ignore the spaces, leaving us 29 letters to encode into 52 cards, which is just under two cards per letter. There will be 5-bits for each letter, which means 145 bits of data will be encoded within one pack of cards. This works out as 2.8 bits per card.
Using the table above, the letters of the message, "I am encoding text using some cards" are encoded as 5-bit numbers as follows:
i: 00100 (4 in decimal)
a: 00010 (2)
m: 01101 (13)
e: 00000 (0)
n: 00101 (5)
c: 01011 (11)
o: 00011 (3)
d: 01010 (10)
i: 00100 (4)
n: 00101 (5)
g: 10000 (16)
t: 00001 (1)
e: 00000 (0)
x: 10110 (16)
t: 00001 (1)
u: 01100 (12)
s: 00110 (6)
i: 00100 (4)
n: 00101 (5)
g: 10000 (16)
s: 00110 (6)
o: 00011 (3)
m: 01101 (13)
e: 00000 (0)
c: 01011 (11)
a: 00010 (2)
r: 00111 (7)
d: 01010 (10)
s: 00110 (6)
To start encoding this message, we must first make sure the pack is in order: aces to kings, with the suits in order of spades, clubs, hearts then diamonds.
Now that the idea behind the absent cards is clearer, try keeping the yet to be used (and previously absent) cards of the pack in one hand, and putting the "kept" cards either in your other hand or in a neat stack face down on the table. It is a lot quicker to do this.
- The first number is 00100. We pass the first two cards to the back of the pack, Keep the third one, and pass the next two to the back of the pack.
- The next number is 00010. We pass the first three cards to the back of the pack, keep the next one, and pass the one after that to the back.
- Next, we have 01101. We pass the first and fourth cards of this group of five to the back of the pack and keep the second, third and fifth.
We then continue like this through the pack until we have done the last letter. We will have three leftover cards that we do not need to use: the jack of spades, the four of clubs and the seven of spades.
We now have a pack of cards containing our encoded message.
Decoding this message:
Decoding the message in this pack of cards is the same as with our earlier example. On a piece of paper, we write out the names of the cards in order from the ace of spades to the king of diamonds.
Then, we go through the pack from the start, card by card, marking a 1 if that card is present, and a 0 if that card is absent. The first card is the 3 of spades, so we put a 0 underneath the ace and the 2 of spades, and a 1 underneath the 3 of spades. The next card is the 9 of spades, so we put a 1 under that, and a 0 underneath the 4, 5, 6, 7 and 8. We continue like this until we get to the end of our Decoding Row (in other words when we pass the point that the king of diamonds should have appeared). Next, we make another row underneath, consisting of the cards that did not appear in the first row (the cards that have a zero beneath them). We go through this second row marking ones and zeroes until we get to the end of this too. We then make a third row, a fourth row, and so on until we have done all the cards. In this particular example, we have to do 8 Decoding Rows. The 8th row only contains the 7 of spades, which, up to that point, acted as the zero in 7 different numbers.
When the rows are done, we can read off the binary numbers from left to right, top to bottom. To do this most easily, we mark them into groups of five, treat each group as a binary number, and then read off which letter that number corresponds to in the table. The first group of five gives the binary number 00100. This translates to the letter "i". The second group of five gives the binary number 00010, which translates to the letter "a". When we have gone through all the rows, we will have recovered our original message.
First Optimisation:
Although it looks like we are saving space in the last example by using the frequency alphabet and 5-bit numbers, there is one big flaw in the mapping table. The Implied Card Method benefits from using as few ones in every 5-bit number as possible; it does not necessarily benefit from using
low numbers (unless those numbers happen to contain few ones). In the previous example, we were just mapping on to low numbers as opposed to numbers with few ones in them. Therefore, a better form of mapping would map numbers with the fewest number of ones with the most frequently used letters in the English alphabet, as I have done in the following table. This requires re-ordering the binary numbers as follows:
Mapping Table 2:
Numbers without any 1s:
00000 (0 in decimal): e
Numbers with one occurrence of 1:
00001 (01): t
00010 (02): a
00100 (04): o
01000 (08): i
10000 (16): n
Numbers with two occurrences of 1s:
00011 (03): s
00101 (05): r
00110 (06): h
01001 (09): l
01010 (10): d
01100 (12): c
10001 (17): u
10010 (18): m
10100 (20): f
11000 (24): p
Three 1s:
00111 (07): g
01011 (11): w
01101 (13): y
01110 (14): b
10011 (19): v
10101 (21): k
10110 (22): x
11001 (25): j
11010 (26): q
11100 (28): z
Four 1s:
01111 (15): [Unassigned]
10111 (23): [Unassigned]
11011 (27): [Unassigned]
11101 (29): [Unassigned]
11110 (30): [Unassigned]
Five 1s:
11111 (31): Not a character: End of the pack
The first 16 letters of the alphabet are mapped on to numbers with two or fewer ones, so will use two or fewer cards in the pack. The remaining letters use just three cards. Already, we have improved our system to be more efficient. Admittedly, it is now more complicated. It would be harder to remember which letter maps to which number, but the mapping table isn't random. It is calculated logically, so can easily be recreated even if the person decoding cannot remember the actual details.
For the unassigned numbers that contain four or more ones, we could again assign commonly used words. However, now that we are aware that we should be using as few ones as possible, we need to be more careful with the words. Our previous choice included "be". If that is mapped to, say, 01111, it will use 4 cards. However, if it were spelled out with individual letters, it would be "b" (01110) and "e" (00000) so only use 3 cards. The letters "t" and "o" only use 2 cards, so the word "to" is also less efficient, given that it would end up using four ones. The word "and" spelled out with individual letters uses 4 cards, so there is no advantage in using it in the surplus numbers. We could of course swap "and" with "z", which is generally a pointless letter at the best of times and can be easily replaced in written messages with "s". The word "and" would then only use 3 cards. There are probably quite a few words that would appear in written text more often than the letters "z", "q", "j" and "x". It would make sense to swap the words with the letters so the words use three ones and the rare letters use four. There are huge advantages in mapping common words on to some of the numbers, but it needs some thought to make it as efficient as possible. For now, I will ignore commonly used words and the unassigned numbers, and just concentrate on letters in order to keep this explanation simple.
If we want to encode our previous message ("I am encoding text using some cards") using the new "fewest ones" table then the binary numbers used will be as follows:
i: 01000
a: 00010
m: 10010
e: 00000
n: 10000
c: 01100
o: 00100
d: 01010
i: 01000
n: 10000
g: 00111
t: 00001
e: 00000
x: 10110
t: 00001
u: 10001
s: 00011
i: 01000
n: 10000
g: 00111
s: 00011
o: 00100
m: 10010
e: 00000
c: 01100
a: 00010
r: 00101
d: 01010
s: 00011
Encoding the letters using this mapping leaves 9 unneeded cards at the end of the pack: the ace, 2, 10 and king of diamonds, the 6 of spades, the 4, 6, 10, and king of clubs.
Therefore, we have encoded a message of 29 letters using just 43 cards (as opposed to 49 in the last example). This works out at 1.48 cards per letter. When it comes to binary, we are encoding 145 bits of data in 43 cards, which works out at 3.3 bits per card, although they are specially chosen bits so it doesn't really count.
To decode this, we only need 6 Decoding Rows. The 6th row contains the last five unused cards.
Second Optimisation:
Now that we have reduced the number of ones in all our letter mappings, there is a counter-intuitive way to reduce them even more. Ideally, we want to reduce the number of ones used in each binary number. As we are using 5-bit numbers, it is unavoidable that 16 of our numbers should have 3 or more ones in them. However, if we used 6-bit numbers, there would be a larger quantity of numbers containing two ones. We would technically be using more bits, but counter-intuitively we would be using fewer cards. (The bits we are using are carefully chosen, and we are actually taking advantage of the patterns in the numbers to fit the letters into the cards). If we were not encoding letters of the alphabet, there would be no advantage to this, but when using letters it makes sense.
Here is a 6-bit table with the numbers with the fewest ones mapped on to the most frequent letters of the English alphabet. I have split each binary number into two parts to make it easier to read.
Mapping Table 3:
No 1s at all:
000 000 (00): e
One 1:
000 001 (01): t
000 010 (02): a
000 100 (04): o
001 000 (08): i
010 000 (16): n
100 000 (32): s
Two 1s:
000 011 (03): r
000 101 (05): h
000 110 (06): l
001 001 (09): d
001 010 (10): c
001 100 (12): u
010 001 (17): m
010 010 (18): f
010 100 (20): p
011 000 (24): g
100 001 (33): w
100 010 (34): y
100 100 (36): b
101 000 (40): v
110 000 (48): k
Three 1s:
000 111 (07): x
001 011 (11): j
001 101 (13): q
001 110 (14): z
010 011 (19): [Unassigned 1]
010 101 (21): [Unassigned 2]
010 110 (22): [Unassigned 3]
011 001 (25): [Unassigned 4]
011 010 (26): [Unassigned 5]
011 100 (28): [Unassigned 6]
100 011 (35): [Unassigned 7]
100 101 (37): [Unassigned 8]
100 110 (38): [Unassigned 9]
101 001 (41): [Unassigned 10]
101 010 (42): [Unassigned 11]
101 100 (44): [Unassigned 12]
110 001 (49): [Unassigned 13]
110 010 (50): [Unassigned 14]
110 100 (52): [Unassigned 15]
111 000 (56): [Unassigned 16]
Other:
111 111 (63): Not a character: End of the pack
All except 4 letters are now encoded using two or fewer ones. No letters require four ones. We also have 16 unassigned numbers using three ones, and we can use them to assign common words and punctuation if we so wish. Not only does the new table provide us with more words, but also each word will use one less card than before. The extra unassigned numbers also allow the method to work properly with alphabets containing more than 26 letters. Up to 42 letters of an alphabet can be encoded using three ones or fewer. There are also more 6-bit numbers containing more ones that we could use if we wanted.
Now, when we encode our message ("I am encoding text using some cards") using the 6-bit numbers, the binary numbers used will be as follows: (again, I have split each binary number into two parts to make it easier to read):
i: 001 000
a: 000 010
m: 010 001
e: 000 000
n: 010 000
c: 001 010
o: 000 100
d: 001 001
i: 001 000
n: 010 000
g: 011 000
t: 000 001
e: 000 000
x: 000 111
t: 000 001
u: 001 100
s: 100 000
i: 001 000
n: 010 000
g: 011 000
s: 100 000
o: 000 100
m: 010 001
e: 000 000
c: 001 010
a: 000 010
r: 000 011
d: 001 001
s: 100 000
After encoding, this leaves 14 unused cards: the 2, 3, 7, 8 and 9 of clubs; the 3, 4 and 10 of hearts; the 2 and jack of diamonds; and the 2, 4, 7 and 8 of spades. We have encoded 29 letters with 38 cards, which works out as 1.31 cards per letter. (In the previous example, we used 43 cards in total). On decoding, we will use 7 Decoding Rows, the last of which will contain the last four cards.
Further optimisations:
If we used 7 bits instead of 6, we would be able to encode all of our letters using just either one 1 or two 1s (or zero 1s for "e"). In the 7-bit numbers from 0000000 to 1111111, there exist:
- 1 number with no 1s (0000000)
- 7 numbers with just one 1.
- 21 numbers with two 1s.
- 35 numbers with three 1s.
- 36 numbers with four 1s.
- 20 numbers with five 1s.
- 7 numbers with six 1s.
- 1 number with 7 1s (1111111)
Therefore, there are 29 numbers with two ones or fewer. This leaves three unassigned numbers with two ones. There are also a further 35 opportunities to map frequently used words using numbers with three ones. Most alphabets (as far as I know) could be efficiently encoded using 7-bits.
If we used 8-bit numbers, we would increase the number of letters encoded with two or fewer ones even more, and create more opportunities for having numbers with two ones mapped on to whole words.
There is a potential problem with using numbers with large numbers of bits in them. The more bits in each number, the more likely it is that we won't be able to use the final few cards. For example, with 8-bit numbers, if there were only 8 or fewer cards left and we needed to encode 00000000, then we would pass all those cards to the back of the pack, and the remainder of the pack would look exactly the same as if we had done nothing. On decoding, that 00000000 would not exist. If we had 9 cards left in the pack, encoding the 8-bit number 00000000 would be fine
if that was the last number we were doing, or the next number that needed encoding started with a 1. That might not seem too terrible, but it becomes a bigger problem when we consider that often we might be encoding consecutive letters that
between them contain lots of zeroes. For example, two "e"s would require 16 zeroes, so could not occur within the last 16 cards of the pack.
Just for interest's sake, to show how our message would look with 7-bit numbers, here is the way we would encode the alphabet. I have split each binary number into two parts to make it easier to read:
Mapping Table 4:
Containing no 1s:
000 0000 (00 in decimal): e
One 1:
000 0001 (01): t
000 0010 (02): a
000 0100 (04): o
000 1000 (08): i
001 0000 (16): n
010 0000 (32): s
100 0000 (64): r
Two 1s:
000 0011 (03): h
000 0101 (05): l
000 0110 (06): d
000 1001 (09): c
000 1010 (10): u
000 1100 (12): m
001 0001 (17): f
001 0010 (18): p
001 0100 (20): g
001 1000 (24): w
010 0001 (33): y
010 0010 (34): b
010 0100 (36): v
010 1000 (40): k
011 0000 (48): x
100 0001 (65): j
100 0010 (66): q
100 0100 (68): z
100 1000 (72): [Unassigned]
101 0000 (80): [Unassigned]
110 0000 (96): [Unassigned]
Three 1s:
000 0111 (07): [Unassigned]
000 1011 (11): [Unassigned]
000 1101 (13): [Unassigned]
(and another 32 unassigned numbers containing just three ones)
Other:
111 1111 : Not a character: End of the pack
Our message ("I am encoding text using some cards") mapped out using this 7-bit system is as follows:
i: 000 1000
a: 000 0010
m: 000 1100
e: 000 0000
n: 001 0000
c: 000 1001
o: 000 0100
d: 000 0110
i: 000 1000
n: 001 0000
g: 001 0100
t: 000 0001
e: 000 0000
x: 011 0000
t: 000 0001
u: 000 1010
s: 010 0000
i: 000 1000
n: 001 0000
g: 001 0100
s: 010 0000
o: 000 0100
m: 000 1100
e: 000 0000
c: 000 1001
a: 000 0010
r: 100 0000
d: 000 0110
s: 010 0000
When encoding this, we are left with 16 unused cards: the 5, 6, 7, 10 and queen of spades; the 2, 4, 9 and jack of clubs; the 6, 8 and queen of hearts; the 2, 4, and 6 of diamonds; and the ace of spades. The ace of spades ends up as the very last card.
We have encoded 29 letters into 36 cards, which works out as 1.24 cards per letter.
When we decode the encoded message, we use 8 Decoding Rows, the last of which contains just one card: the ace of spades.
Previous page
Next page