Python and sentiment analysis
While looking for datasets to throw at sklearn, I came across the UCI Sentiment Labelled Sentences Data Set.
UCI provides positive/negative tagging on real-world data drawn from three sources (Amazon, Yelp, and IMDB).
The only problem is that the format is a little strange. There is a .txt file for each source containing raw, unstructured text, and not every line is tagged with sentiment.
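One way to tame a file like this is to keep only the lines that carry a tag. The sketch below is not the code from the post. It assumes each tagged line looks like a sentence, a tab, then a label of 0 or 1, and that anything else can be skipped. The function name parse_labelled_lines and the sample lines are made up for illustration.

```python
def parse_labelled_lines(lines):
    """Return (sentences, labels) from an iterable of raw text lines."""
    sentences, labels = [], []
    for line in lines:
        # Assumed layout: "<sentence>\t<label>" where label is "0" or "1".
        text, sep, label = line.rstrip("\n").rpartition("\t")
        text, label = text.strip(), label.strip()
        if not sep or label not in ("0", "1") or not text:
            continue  # untagged or malformed line, skip it
        sentences.append(text)
        labels.append(int(label))
    return sentences, labels


# Made-up sample lines standing in for one of the .txt files.
sample = [
    "Great phone, works perfectly.\t1\n",
    "The battery died after a week.\t0\n",
    "This line has no sentiment tag\n",
]
sentences, labels = parse_labelled_lines(sample)
```

For a real file, the same function could be fed an open file handle, since iterating over a file yields one line at a time.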
Test your Machine Learning
In my previous post, "Python Machine Learning with Presidential Tweets", I started messing around with sklearn and text classification.
Since then I’ve discovered a great tutorial from SciPy 2015. The video starts out slow enough for novices, and a recurring theme is testing your datasets.
After watching a good chunk of this video, I decided to go back to my code and implement a testing phase. Basically, I’ll split my data into two pieces: a training set and a testing set.
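To make the split concrete, here is a rough sketch in plain Python so the mechanics are visible. It is an illustration, not the code from the post: the function name split_train_test, the 25% test fraction, the fixed seed, and the sample data are all assumptions. sklearn ships a ready-made train_test_split in sklearn.model_selection that covers the same idea.

```python
import random


def split_train_test(samples, labels, test_fraction=0.25, seed=0):
    """Shuffle the data, then carve off a fraction of it for testing."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_x = [samples[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    test_x = [samples[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]
    return train_x, test_x, train_y, test_y


# Made-up data: eight short texts with alternating labels.
texts = ["text %d" % i for i in range(8)]
tags = [i % 2 for i in range(8)]
train_x, test_x, train_y, test_y = split_train_test(texts, tags)
```

The model only ever sees train_x and train_y while fitting. test_x and test_y are held back to check how well it does on data it has not seen.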
Python Machine Learning with Presidential Tweets
I’ve been spending a little bit of time researching machine learning, and was very happy to come across a Python library called sklearn.
While digging around Google, I came across a fantastic write-up on document classification by Zac Steward. The article goes pretty deep into writing a spam filter using machine learning and sklearn. After reading it I wanted to try some of the concepts, but had no interest in writing a spam filter.