PostgreSQL Vacuum Script

PostgreSQL does have a built in auto vacuum , but sometimes you just want a small script that can be ran through Jenkins to perform the vacuum for you.

Wanted to share with you guys a small Python script I wrote that will perform a VACUUM VERBOSE ANALYZE on every table within a database.

You will need to get psycopg2 installed from PyPi first:

pip install psycopg2

At which point you should be able to use the below script with the correct environment variables to vacuum your database:

import os
import psycopg2
from datetime import datetime

dbname = os.environ.get('PG_DATABASE', '')
user = os.environ.get('PG_USER', '')
host = os.environ.get('PG_HOST', '')
password = os.environ.get('PG_PASS', '')
schema = os.environ.get('PG_SCHEMA', '')

class Vaccum(object):

    def __init__(self):
        """
        Connect to Postgre database.
        """
        c = "dbname='%s' user='%s' host='%s' password='%s'"
        try:
            self.conn = psycopg2.connect(c % (dbname, user, host, password))
        except:
            raise Exception('Unable to connect to PostgreSQL.')

    def get_tables(self):
        """
        Get all tables in database's schema.
        """

        query = """
            SELECT table_name
            FROM information_schema.tables
            WHERE table_schema = '%s'
            """ % schema

        cur = self.conn.cursor()
        cur.execute(query)

        for table in cur.fetchall():
            yield(table[0])

    def vaccum(self, table):
        """
        Run Vacuum on a given table.
        """

        query = "VACUUM VERBOSE ANALYZE %s" % table

        # VACUUM can not run in a transaction block,
        # which psycopg2 uses by default.
        # http://bit.ly/1OUbYB3
        isolation_level = self.conn.isolation_level
        self.conn.set_isolation_level(0)

        cur = self.conn.cursor()
        cur.execute(query)

        # Set our isolation_level back to normal
        self.conn.set_isolation_level(isolation_level)

        return self.conn.notices

if __name__ == '__main__':

    v = Vaccum()
    for table in v.get_tables():

        # simple verbose printout for VACUUM process.
        now = datetime.utcnow()
        verbose = '[%s] VACUUM VERBOSE ANALYZE %s' % (now, table)
        print '=' * len(verbose)
        print verbose

        # print out the VACUUM  VERBOSE results
        # using formating.
        notices = v.vaccum(table)
        for notice in notices:
            print notice

Usage looks like the following:

## set our environment variables for our database.
export PG_DATABASE=project
export PG_USER=user
export PG_PASS=password
export PG_SCHEMA=public
export PG_HOST=localhost

## execute the script to vacuum the database
$ python pg_vacuum.p

You should then get verbose output from the vacuum:

=====================================================================
[2016-03-08 18:56:28.332974] VACUUM VERBOSE ANALYZE django_migrations
INFO:  vacuuming "public.django_migrations"

INFO:  index "django_migrations_pkey" now contains 22 row versions in 2 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

INFO:  "django_migrations": found 0 removable, 22 nonremovable row
    versions in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

INFO:  vacuuming "pg_toast.pg_toast_95319"

INFO:  index "pg_toast_95319_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

INFO:  "pg_toast_95319": found 0 removable, 0 nonremovable row versions
    in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

INFO:  analyzing "public.django_migrations"

INFO:  "django_migrations": scanned 1 of 1 pages, containing 22 live rows
    and 0 dead rows; 22 rows in sample, 22 estimated total rows