Python Frequency Analysis for Ciphers

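Frequency analysis is the study of how often letters, or groups of letters, occur in a ciphertext. Against a simple substitution cipher, those counts can be compared with the typical letter frequencies of the plaintext language to guess which symbol stands for which letter.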

Using Python, we can count the letters, bigrams, and trigrams in a message. Let's have a look:

$ ./frequency.py --help
usage: frequency.py [-h] [--letters] [--bigrams] [--trigrams] msg

positional arguments:
  msg             Message to count letters in

optional arguments:
  -h, --help      show this help message and exit
  --letters, -l   Frequency of letters
  --bigrams, -b   Frequency of bigrams
  --trigrams, -t  Frequency of trigrams

Let's go ahead and enter a simple sentence and do some testing:

$ ./frequency.py 'all work no play makes jack a dull boy' -l
===== Letters =====
('a', 5)
('l', 5)
('k', 3)
('o', 3)
('y', 2)
('c', 1)
('b', 1)
('e', 1)
('d', 1)
('j', 1)
('m', 1)
('n', 1)
('p', 1)
('s', 1)
('r', 1)
('u', 1)
('w', 1)
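
The same letter counts can be reproduced with `collections.Counter` from the standard library. This is a minimal sketch rather than the tool's actual code: the function name `letter_counts` is made up for illustration, and ties are broken alphabetically, so letters with equal counts may appear in a different order than in the output above.

```python
from collections import Counter
from string import ascii_letters


def letter_counts(msg):
    """Return (letter, count) pairs, most frequent first.

    Hypothetical helper: ties are broken alphabetically.
    """
    # Count only ASCII letters; spaces and punctuation are ignored
    counts = Counter(ch for ch in msg if ch in ascii_letters)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for pair in letter_counts('all work no play makes jack a dull boy'):
        print(pair)
```

Like the tool, this sketch is case-sensitive, so 'A' and 'a' would be counted separately.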

How about bigrams:

$ ./frequency.py 'all work no play makes jack a dull boy' -b
===== Bigrams =====
('ll', 2)
('ck', 1)
('ac', 1)
('bo', 1)
('ma', 1)
('ke', 1)
('no', 1)
('wo', 1)
('la', 1)
('al', 1)
('ak', 1)
('ja', 1)
('ul', 1)
('es', 1)
('oy', 1)
('ay', 1)
('du', 1)
('or', 1)
('pl', 1)
('rk', 1)

And lastly, trigrams:

$ ./frequency.py 'all work no play makes jack a dull boy' -t
===== Trigrams =====
('boy', 1)
('all', 1)
('dul', 1)
('ull', 1)
('ack', 1)
('wor', 1)
('lay', 1)
('pla', 1)
('mak', 1)
('kes', 1)
('jac', 1)
('ake', 1)
('ork', 1)
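
Notice that no bigram or trigram in these listings spans a space, and that the one-letter word 'a' contributes nothing: the window resets at every non-letter. Here is a rough sketch of that behaviour, using a regular expression to pull out runs of letters and slicing to slide the window. The name `word_ngrams` is illustrative and not part of the tool.

```python
import re


def word_ngrams(msg, n):
    """List every n-letter window that lies inside a single run of letters."""
    grams = []
    # [A-Za-z]+ matches runs of the same characters as string.ascii_letters
    for word in re.findall(r'[A-Za-z]+', msg):
        # A word shorter than n gives an empty range, so it adds nothing
        for i in range(len(word) - n + 1):
            grams.append(word[i:i + n])
    return grams


if __name__ == "__main__":
    print(word_ngrams('jack a dull boy', 2))
    print(word_ngrams('jack a dull boy', 3))
```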

The underlying code for this tool is pretty horrendous, but it's just a small tool for performing a simple task:

#!/usr/bin/env python
import argparse
from string import ascii_letters
from operator import itemgetter

# Build my Parser with help for user input
parser = argparse.ArgumentParser()
parser.add_argument('msg', help='Message to count letters in')
parser.add_argument('--letters', '-l', help='Frequency of letters',
                    action='store_true')
parser.add_argument('--bigrams', '-b', help='Frequency of bigrams',
                    action='store_true')
parser.add_argument('--trigrams', '-t', help='Frequency of trigrams',
                    action='store_true')
args = parser.parse_args()

if args.letters:
    letter_dict = {}
    for letter in args.msg:
        if letter in ascii_letters:
            try:
                letter_dict[letter] += 1
            except KeyError:
                letter_dict[letter] = 1

    # A single parenthesised argument prints the same on Python 2 and Python 3
    print("{0} Letters {0}".format("=" * 5))
    for letter in sorted(letter_dict.items(), key=itemgetter(1), reverse=True):
        print(letter)

if args.bigrams:
    bigram_dict = {}
    bigram_holder = []
    for letter in args.msg:
        if letter not in ascii_letters:
            bigram_holder = []
            continue
        else:
            bigram_holder.append(letter)

        if len(bigram_holder) == 2:
            bigram = bigram_holder[0] + bigram_holder[1]
            try:
                bigram_dict[bigram] += 1
            except KeyError:
                bigram_dict[bigram] = 1

            # Slide the window: keep the last letter to start the next bigram
            bigram_holder = bigram_holder[1:]

    print("{0} Bigrams {0}".format("=" * 5))
    for bigram in sorted(bigram_dict.items(), key=itemgetter(1), reverse=True):
        print(bigram)

if args.trigrams:
    trigram_dict = {}
    trigram_holder = []
    for letter in args.msg:
        if letter not in ascii_letters:
            trigram_holder = []
            continue
        else:
            trigram_holder.append(letter)

        if len(trigram_holder) == 3:
            trigram = trigram_holder[0] + trigram_holder[1] + trigram_holder[2]
            try:
                trigram_dict[trigram] += 1
            except KeyError:
                trigram_dict[trigram] = 1

            # Slide the window: keep the last two letters for the next trigram
            trigram_holder = trigram_holder[1:]

    print("{0} Trigrams {0}".format("=" * 5))
    for trigram in sorted(trigram_dict.items(), key=itemgetter(1), reverse=True):
        print(trigram)
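
The three branches above differ only in the window size, so they could be folded into one function. The sketch below is one possible refactoring, not the original tool: it assumes Python 3, and the name `ngram_counts` is invented here. A `collections.deque` with `maxlen=n` plays the role of the holder list, dropping the oldest letter automatically as each new one arrives.

```python
from collections import Counter, deque
from string import ascii_letters


def ngram_counts(msg, n):
    """Count n-letter sequences in msg, resetting at every non-letter."""
    counts = Counter()
    window = deque(maxlen=n)  # holds at most the last n letters
    for ch in msg:
        if ch not in ascii_letters:
            window.clear()  # n-grams never span spaces or punctuation
            continue
        window.append(ch)
        if len(window) == n:
            counts[''.join(window)] += 1
    return counts


if __name__ == "__main__":
    msg = 'all work no play makes jack a dull boy'
    for title, n in (('Letters', 1), ('Bigrams', 2), ('Trigrams', 3)):
        print('=' * 5, title, '=' * 5)
        for pair in ngram_counts(msg, n).most_common():
            print(pair)
```

With `n=1` this gives the same counts as the letters branch, so a single code path could cover all three flags. The order of entries with equal counts may differ from the listings earlier in the post.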