Early Exploration of Large Language Models on Python
With the current hype around artificial intelligence and platforms like OpenAI’s ChatGPT, I decided it was about time I did some exploring of my own.
I’m not going to lie, I’m not exactly excited about a company named OpenAI being a closed-source, for-profit platform; however, this has been one of the reasons I decided to explore offline, open, and self-hosted solutions (in the hopes of creating my own models).
If you recall, I’ve experimented with Python and machine learning in the past; look no further than my 2016 Test your machine learning blog post.
It didn’t take me long to find https://huggingface.co/, described as ‘The AI community building the future’.
This community maintains a Python library named transformers and provides access to thousands of pre-trained models.
The first model I explored was facebook/opt-6.7b, a pre-trained model coming in at around 12GB on disk:
% du -sh ~/.cache/huggingface/hub/models--facebook--opt-6.7b
12G /Users/jness/.cache/huggingface/hub/models--facebook--opt-6.7b
Let’s take a step back and talk through the process of installing transformers on my M1 MacBook.
Like every Python project, I’m going to start with a new virtualenv; this isolates my requirements from other projects and from the base Python installation.
% virtualenv -p python3 venv/
% source venv/bin/activate
Once sourced, we’re ready to install transformers, as well as Python’s torch library (needed for model generation).
% pip install transformers torch
At the time of writing these were the dependencies:
% pip freeze
certifi==2022.12.7
charset-normalizer==3.1.0
filelock==3.11.0
huggingface-hub==0.13.4
idna==3.4
Jinja2==3.1.2
MarkupSafe==2.1.2
mpmath==1.3.0
networkx==3.1
numpy==1.24.2
packaging==23.1
PyYAML==6.0
regex==2023.3.23
requests==2.28.2
sympy==1.11.1
tokenizers==0.13.3
torch==2.0.0
tqdm==4.65.0
transformers==4.28.0
typing_extensions==4.5.0
urllib3==1.26.15
Once installed, you can fire up your Python interpreter:
% python
Python 3.9.15 (main, Oct 11 2022, 21:39:54)
[Clang 14.0.0 (clang-1400.0.29.102)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
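Since this is an Apple Silicon machine, it’s worth a quick sanity check that this torch build can see Apple’s Metal (MPS) backend. A small optional sketch; everything in this post still runs on the CPU by default:
import torch

# Optional sanity check: confirm this torch build sees Apple's Metal
# (MPS) backend on the M1. Nothing below depends on it.
print(torch.__version__)                  # 2.0.0 in this environment
print(torch.backends.mps.is_available())  # True on Apple Silicon builds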
The first time you access the model and tokenizer, the transformers library will fetch that 12GB model (in my case the model is already present).
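If your home directory is short on disk space, you can point that cache elsewhere before loading anything. A minimal sketch using the documented HF_HOME environment variable (the path below is just an example):
import os

# Relocate the Hugging Face cache (the path is illustrative). Set this
# before importing transformers so the new location takes effect.
os.environ["HF_HOME"] = "/Volumes/big-disk/huggingface"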
Time to get to the fun stuff: let’s interact with the facebook/opt-6.7b model and see what it can do (be patient during generation, as this is a very computationally heavy process):
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>>
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/opt-6.7b")
>>> model = AutoModelForCausalLM.from_pretrained("facebook/opt-6.7b")
>>>
>>> prompt = "What is a blog?"
>>>
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
>>> generated_ids = model.generate(input_ids, max_new_tokens=50)
>>>
After generation, all that’s left is to display the textual result:
>>> print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
What is a blog?
A blog is a website that is updated regularly with new content. It is a great way to share your thoughts and ideas with others.
What is a website?
A website is a collection of web pages that are linked together
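Generation here runs on the CPU and can take a while. If you’re curious exactly how long, a quick sketch timing the same call as above:
import time

# Time a single generation pass (same call as above).
start = time.time()
generated_ids = model.generate(input_ids, max_new_tokens=50)
print(f"generated in {time.time() - start:.1f} seconds")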
How about a question that is a little tougher to answer:
>>> prompt = "What colors are dogs?"
>>>
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
>>> generated_ids = model.generate(input_ids, max_new_tokens=30)
>>>
>>> print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
What colors are dogs?
Dogs are a very diverse group of animals. They come in a wide variety of colors, shapes, sizes, and breeds.
This model works well when asking questions, but it doesn’t take instructions well (from what I’ve seen):
>>> prompt = "Write a technical introduction."
>>>
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
>>> generated_ids = model.generate(input_ids, max_new_tokens=40)
>>>
>>> print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
Write a technical introduction.
The introduction should be written in a way that it is easy to understand and read. It should be written in a way that it is easy to understand and read. It should be written in
>>> prompt = "The cow jumped over"
>>>
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
>>> generated_ids = model.generate(input_ids, max_new_tokens=40)
>>>
>>> print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
The cow jumped over the moon.
The cow jumped over the moon. The cow jumped over the moon. The cow jumped over the moon. The cow jumped over the moon. The cow jumped over
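That looping is typical of the default greedy decoding: generate() keeps picking the most likely next token, and the most likely continuation of a repeated phrase is often more repetition. Turning on sampling usually breaks the loop; a sketch with illustrative (not tuned) parameter values:
# Sampling instead of greedy decoding; parameter values are illustrative.
generated_ids = model.generate(
    input_ids,
    max_new_tokens=40,
    do_sample=True,          # sample from the distribution
    top_p=0.9,               # nucleus sampling: keep the top 90% of probability mass
    temperature=0.8,         # values below 1.0 sharpen the distribution
    repetition_penalty=1.2,  # discourage tokens that already appeared
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])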
Let’s step away from this model and explore another. The gpt2 model comes in much smaller, at just over 500MB:
% du -sh ~/.cache/huggingface/hub/models--gpt2
537M /Users/jness/.cache/huggingface/hub/models--gpt2
This model is able to produce biased predictions, and sometimes the results are rather silly.
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='gpt2')
>>> set_seed(42)
>>>
>>> prompt = "The best pizza topping is"
>>>
>>> for i in generator(prompt, max_length=50, num_return_sequences=1):
...     print(i['generated_text'])
...
The best pizza topping is made by combining one recipe with another. This is usually made with olive oil, but you can make it all up with a few extra drops of extra virgin olive oil if you like.
The next step you may want
Let’s try a few more and see what we can generate:
>>> prompt = "I drive around"
>>>
>>> for i in generator(prompt, max_length=50, num_return_sequences=1):
...     print(i['generated_text'])
...
I drive around in my dark jeans, my black tie, and the black, black, black and brown hooded sweatshirt on my thighs. I've worked in restaurants for a long time and I'll show you what a perfect job I went through
>>> prompt = "I once went to"
>>>
>>> for i in generator(prompt, max_length=50, num_return_sequences=1):
...     print(i['generated_text'])
...
I once went to a museum in Italy," says the 25-year-old. "I was standing in a crowded museum, and a lady came up, who asked me: 'Do you want to do that?" 'When you know what you
As I said, these predictions are sometimes quite silly.
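One way to cope with the silliness is to vary the seed and request several candidates per prompt, then keep whichever reads best. A sketch reusing the generator from above (the seed value is arbitrary):
# Ask the pipeline for several samples of the same prompt; a different
# seed gives a different batch of candidates.
set_seed(7)
for candidate in generator("I once went to", max_length=50, num_return_sequences=3):
    print(candidate['generated_text'])
    print('---')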
And that’s it; that’s what I was able to accomplish in a night of exploration.
Hope y’all find this slightly helpful, and it encourages you to go out and explore some of the readily available large language models.
As for me, I hope to get to the point where I can build some models of my own for self-exploration and hobby.
Cheers