Python Frequency Analysis for Ciphers
Frequency analysis is the study of how often letters, or groups of letters, appear in a ciphertext.
Using Python, we can extract the counts of letters, bigrams, and trigrams. Let's have a look:
$ ./frequency.py --help
usage: frequency.py [-h] [--letters] [--bigrams] [--trigrams] msg

positional arguments:
  msg             Message to count letters in

optional arguments:
  -h, --help      show this help message and exit
  --letters, -l   Frequency of letters
  --bigrams, -b   Frequency of bigrams
  --trigrams, -t  Frequency of trigrams
Let's go ahead and enter a simple sentence and do some testing:
$ ./frequency.py 'all work no play makes jack a dull boy' -l
===== Letters =====
('a', 5)
('l', 5)
('k', 3)
('o', 3)
('y', 2)
('c', 1)
('b', 1)
('e', 1)
('d', 1)
('j', 1)
('m', 1)
('n', 1)
('p', 1)
('s', 1)
('r', 1)
('u', 1)
('w', 1)
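Raw counts depend on the length of the message, so when comparing against a reference table of letter frequencies it can help to turn them into proportions. A minimal sketch, assuming a hypothetical helper named `relative_frequencies` that is not part of frequency.py and that, like the tool, only counts ASCII letters:

```python
from collections import Counter
from string import ascii_letters


def relative_frequencies(msg):
    """Return (letter, share) pairs, most frequent first.

    Hypothetical helper for illustration; not part of frequency.py.
    """
    # Count ASCII letters only, skipping spaces, digits and punctuation
    counts = Counter(ch for ch in msg if ch in ascii_letters)
    total = sum(counts.values())
    if total == 0:
        return []
    # Divide each count by the total number of letters seen
    return [(letter, count / total) for letter, count in counts.most_common()]


if __name__ == "__main__":
    msg = "all work no play makes jack a dull boy"
    for letter, share in relative_frequencies(msg):
        print("{} {:.1%}".format(letter, share))
```

The division assumes Python 3; under Python 2 it would need `from __future__ import division` to avoid integer division.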
How about bigrams:
$ ./frequency.py 'all work no play makes jack a dull boy' -b
===== Bigrams =====
('ll', 2)
('ck', 1)
('ac', 1)
('bo', 1)
('ma', 1)
('ke', 1)
('no', 1)
('wo', 1)
('la', 1)
('al', 1)
('ak', 1)
('ja', 1)
('ul', 1)
('es', 1)
('oy', 1)
('ay', 1)
('du', 1)
('or', 1)
('pl', 1)
('rk', 1)
And lastly, trigrams:
$ ./frequency.py 'all work no play makes jack a dull boy' -t
===== Trigrams =====
('boy', 1)
('all', 1)
('dul', 1)
('ull', 1)
('ack', 1)
('wor', 1)
('lay', 1)
('pla', 1)
('mak', 1)
('kes', 1)
('jac', 1)
('ake', 1)
('ork', 1)
The underlying code for this tool is pretty horrendous, but it's just a small tool for performing a simple task:
#!/usr/bin/env python
import argparse
from string import ascii_letters
from operator import itemgetter

# Build the parser, with help text for each option
parser = argparse.ArgumentParser()
parser.add_argument('msg', help='Message to count letters in')
parser.add_argument('--letters', '-l', help='Frequency of letters',
                    action='store_true', dest='letters', default=None)
parser.add_argument('--bigrams', '-b', help='Frequency of bigrams',
                    action='store_true', dest='bigrams', default=None)
parser.add_argument('--trigrams', '-t', help='Frequency of trigrams',
                    action='store_true', dest='trigrams', default=None)
args = parser.parse_args()

if args.letters:
    # Count every ASCII letter; spaces, digits and punctuation are ignored
    letter_dict = {}
    for letter in args.msg:
        if letter in ascii_letters:
            try:
                letter_dict[letter] += 1
            except KeyError:
                letter_dict[letter] = 1
    print("===== Letters =====")
    # Most frequent letters first
    for letter in sorted(letter_dict.items(), key=itemgetter(1), reverse=True):
        print(letter)

if args.bigrams:
    bigram_dict = {}
    bigram_holder = []
    for letter in args.msg:
        if letter not in ascii_letters:
            # A non-letter ends the current word, so bigrams never span words
            bigram_holder = []
            continue
        else:
            bigram_holder.append(letter)
        if len(bigram_holder) == 2:
            bigram = bigram_holder[0] + bigram_holder[1]
            try:
                bigram_dict[bigram] += 1
            except KeyError:
                bigram_dict[bigram] = 1
            # Keep the last letter as the start of the next bigram
            last = bigram_holder.pop()
            bigram_holder = []
            bigram_holder.append(last)
    print("===== Bigrams =====")
    for bigram in sorted(bigram_dict.items(), key=itemgetter(1), reverse=True):
        print(bigram)

if args.trigrams:
    trigram_dict = {}
    trigram_holder = []
    for letter in args.msg:
        if letter not in ascii_letters:
            # A non-letter ends the current word, so trigrams never span words
            trigram_holder = []
            continue
        else:
            trigram_holder.append(letter)
        if len(trigram_holder) == 3:
            trigram = trigram_holder[0] + trigram_holder[1] + trigram_holder[2]
            try:
                trigram_dict[trigram] += 1
            except KeyError:
                trigram_dict[trigram] = 1
            # Keep the last two letters, in order, for the next trigram
            l1 = trigram_holder.pop()
            l2 = trigram_holder.pop()
            trigram_holder = []
            trigram_holder.append(l2)
            trigram_holder.append(l1)
    print("===== Trigrams =====")
    for trigram in sorted(trigram_dict.items(), key=itemgetter(1), reverse=True):
        print(trigram)
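
The three branches above differ only in the size of the window, so they could be folded into a single function. A minimal sketch, assuming a hypothetical `ngram_counts` helper built on `collections.Counter` and `collections.deque` instead of the hand-rolled dictionaries and lists. Like the script, it counts only ASCII letters and never lets an n-gram span a space or punctuation mark:

```python
from collections import Counter, deque
from string import ascii_letters


def ngram_counts(msg, n):
    """Count n-letter sequences inside each run of letters in msg.

    Hypothetical helper for illustration; not part of frequency.py.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    counts = Counter()
    window = deque(maxlen=n)  # holds at most the last n letters
    for ch in msg:
        if ch not in ascii_letters:
            # A non-letter breaks the run, so start a fresh window
            window.clear()
            continue
        window.append(ch)
        if len(window) == n:
            counts["".join(window)] += 1
    return counts


if __name__ == "__main__":
    msg = "all work no play makes jack a dull boy"
    for n, title in ((1, "Letters"), (2, "Bigrams"), (3, "Trigrams")):
        print("===== {} =====".format(title))
        for item in ngram_counts(msg, n).most_common():
            print(item)
```

With `n=1` this reduces to the plain letter count, and `most_common()` takes over the job of sorting by frequency. The order of items with equal counts may differ from the listings above.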