Simple CLI for Categorizing Text and Analyzing Its Sentiment

Now that I’ve spent some time with huggingface.co, specifically their NLP Course (natural language processing), I wanted to combine a couple of the things I learned into a simple Python script.

What I ended up with is a script that both categorizes text using a zero-shot-classification model and scores its sentiment using a sentiment-analysis model.
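To make the moving parts a bit more concrete, here is a small sketch of the result shapes the two pipelines hand back and how they turn into the printed output. It does not load any model: the dictionaries are hand-written stand-ins, the Beauty, Sports and POSITIVE numbers simply echo the first example in this post, and the Travel score is made up to show a label falling under the 10% cutoff. The helper names `as_percent` and `top_categories` are my own for illustration and are not part of main.py.

```python
# Hand-written stand-ins for the two pipeline results; no model is loaded here.
# zero-shot-classification returns a dict with labels sorted by descending score,
# sentiment-analysis returns a list with one {'label', 'score'} dict per input.
zsc = {
    "sequence": "Riding bikes on the beach is glorious",
    "labels": ["Beauty", "Sports", "Travel"],
    "scores": [0.4421, 0.1798, 0.05],  # Travel's 0.05 is a made-up value
}
sa_results = [{"label": "POSITIVE", "score": 0.9999}]


def as_percent(score):
    """Format a 0..1 score as a percentage with two decimals."""
    return "%.2f%%" % (score * 100)


def top_categories(result, threshold=0.10):
    """Return (label, score) pairs whose score is above the threshold."""
    return [
        (label, score)
        for label, score in zip(result["labels"], result["scores"])
        if score > threshold
    ]


print("Sentiment: %s %s" % (sa_results[0]["label"], as_percent(sa_results[0]["score"])))
print("Categories:")
for label, score in top_categories(zsc):
    print("  %s %s" % (label, as_percent(score)))
```

Keeping the scores numeric until the moment they are printed avoids having to parse the percent strings back into floats when filtering.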

You can interact with this script in one of two ways. The first is to pass a string as an argument when you run it:

% python main.py "Riding bikes on the beach is glorious"
Loading...

Sentiment: POSITIVE 99.99%
Categories:
  Beauty 44.21%
  Sports 17.98%

The second is more efficient, since it allows for multiple interactions while loading the models only once:

% python main.py
Loading...

Text: My favorite tv show lately has been the Mandalorian

Sentiment: POSITIVE 99.60%
Categories:
  Television 89.12%

Text: I wish I had more pizza at home :(

Sentiment: NEGATIVE 99.61%
Categories:
  Food 60.00%
  Home 19.30%

And that is it, that’s the post ✌️

main.py

#!/usr/bin/env python3

import sys

from transformers import pipeline

# List of categories used in zero_shot_classification
categories = [
    'Automotive', 'Beauty', 'Books', 'Literature', 'Business', 'Careers', 'Education', 'Family', 'Parenting', 'Food', 'Gaming', 'Health', 'Hobbies',
    'Interests', 'Home', 'Garden', 'Law', 'Government', 'Politics', 'Life', 'Movies', 'Television', 'Music', 'Radio', 'Finance', 'Pets',
    'Science', 'Sports', 'Fashion', 'Technology', 'Computing', 'Travel'
]

print("Loading...")

# load our language models into memory, this can take time.
zero_shot_classification = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
sentiment_analysis = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

while True:

    print()

    try:

        if len(sys.argv) > 1:
            text = sys.argv[1]
        else:
            # prompt inside the try so Ctrl-C at the prompt exits cleanly
            text = input("Text: ")
            print()

        # run zero_shot_classification on text using our categories
        zsc = zero_shot_classification(
            text,
            candidate_labels=categories
        )

        # convert scores to human readable %
        zsc['scores'] = [ '%.2f' % round(i * 100, 2) + '%' for i in zsc['scores'] ]

        # combine labels and scores into single list
        zsc_results = list(zip(zsc['labels'], zsc['scores']))

        sa_results = sentiment_analysis(text)

        # display sentiment
        sentiment = sa_results[0]['label']
        sentiment_score = '%.2f' % round(sa_results[0]['score'] * 100, 2) + '%'
        print('Sentiment: %s %s' % (sentiment, sentiment_score))

        # display every label that scored above 10%
        print('Categories:')
        for label, score in zsc_results:
            if float(score.replace('%', '')) > 10:
                print('  %s %s' % (label, score))

        if len(sys.argv) > 1:
            sys.exit()

    except (KeyboardInterrupt, EOFError):
        sys.exit()
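
If you would rather keep the one-shot versus interactive logic separate from the model calls, here is one way it could be sketched. This is a possible restructuring under my own assumptions, not what main.py does: `run`, `analyze` and `read` are hypothetical names, and `analyze` stands in for whatever function runs the two pipelines and prints the results. A plain list's `append` is used as the stand-in here so the sketch runs without transformers installed.

```python
def run(analyze, argv, read=input):
    """Call analyze(text) once for argv[1], or repeatedly for prompted input."""
    if len(argv) > 1:
        # one-shot mode: the text came from the command line
        analyze(argv[1])
        return
    while True:
        try:
            text = read("Text: ")
        except (KeyboardInterrupt, EOFError):
            # Ctrl-C or Ctrl-D at the prompt ends the session
            return
        analyze(text)


# Stand-in for the real model calls: just record what would be analyzed.
seen = []
run(seen.append, ["main.py", "Riding bikes on the beach is glorious"])
print(seen)
```

In the real script the models would still be loaded once before calling `run`, and `sys.argv` would be passed as `argv`.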