Python and sentiment analysis

While looking for datasets to throw at sklearn, I came across the UCI Sentiment Labelled Sentences Data Set. UCI provides positive/negative tagging on real-world data drawn from three sources (Amazon, Yelp, and IMDB). The only problem is that the format is a little strange: there is one raw, unstructured .txt file per source, and not every line is tagged with sentiment.
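A minimal sketch of how those files could be cleaned up, assuming each tagged line ends with a tab followed by `0` (negative) or `1` (positive). Lines that don't match that pattern are treated as untagged and skipped. The helper name `parse_labelled_lines` and the sample lines are hypothetical, made up for illustration.

```python
# Sketch only: assumes each tagged line looks like "sentence<TAB>0" or
# "sentence<TAB>1"; anything else is treated as untagged and skipped.

def parse_labelled_lines(lines):
    """Return (sentence, label) pairs from an iterable of raw text lines."""
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Split on the last tab so the label is whatever follows it.
        sentence, sep, tag = line.rpartition("\t")
        if not sep or tag.strip() not in ("0", "1"):
            continue  # no tab, or the trailing field isn't a 0/1 sentiment tag
        pairs.append((sentence.strip(), int(tag)))
    return pairs


# Hypothetical sample lines standing in for one source file.
sample = [
    "Great phone, works perfectly.\t1\n",
    "Battery died after two days.\t0\n",
    "This line has no sentiment tag\n",
    "\n",
]
print(parse_labelled_lines(sample))
# [('Great phone, works perfectly.', 1), ('Battery died after two days.', 0)]
```

Because iterating an open file yields its lines, the same function can be handed a file handle opened on any of the three source files.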

Test your Machine Learning

In my previous post “Python Machine Learning with Presidential Tweets”, I started messing around with sklearn and text classification. Since then I’ve discovered a great tutorial from SciPy 2015. The video starts out slowly enough for novices, and a recurring theme is testing your datasets. After watching a good chunk of it, I decided to go back to my code and implement a testing phase. Basically, I’ll split my data into two pieces, a training set and a testing set, so the model is evaluated on examples it never saw during training.
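A stdlib-only sketch of that hold-out split, plus an accuracy check on the test set. The names `split_train_test` and `accuracy`, the 25% test fraction, the seed, and the keyword-based `predict` placeholder are all assumptions for illustration. The placeholder stands in for whatever classifier would actually be trained on `train`.

```python
import random


def split_train_test(pairs, test_size=0.25, seed=42):
    """Shuffle a copy of the data and cut it into (train, test) lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = int(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, test_pairs):
    """Fraction of held-out (text, label) pairs that predict() gets right."""
    if not test_pairs:
        return 0.0
    correct = sum(1 for text, label in test_pairs if predict(text) == label)
    return correct / len(test_pairs)


# Hypothetical toy data: 10 positive and 10 negative examples.
data = [(f"good example {i}", 1) for i in range(10)]
data += [(f"bad example {i}", 0) for i in range(10)]
train, test = split_train_test(data)
print(len(train), len(test))  # 15 5


# Placeholder classifier (assumption); a real one would be fitted on `train` only.
def predict(text):
    return 1 if "good" in text else 0


print(accuracy(predict, test))  # 1.0
```

sklearn ships the same idea as `sklearn.model_selection.train_test_split`, which accepts `test_size` and `random_state` arguments. Fixing the seed keeps the split identical between runs, so score changes reflect changes to the model rather than to the split.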

Python Machine Learning with Presidential Tweets

I’ve been spending a little bit of time researching Machine Learning, and was very happy to come across a Python library called sklearn. While digging around Google, I found a fantastic write-up on Document Classification by Zac Stewart. The article goes pretty deep into writing a spam filter using machine learning and sklearn. After reading it I wanted to try some of the concepts, but had no interest in writing a spam filter.
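To make the core idea concrete without a spam filter, here is a small stdlib-only sketch of bag-of-words document classification with multinomial Naive Bayes. It is not the code from that write-up and not sklearn's implementation (sklearn offers this technique through `CountVectorizer` and `MultinomialNB`). The tokenizer, the class name `NaiveBayesText`, and the toy sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Assumed tokenizer: lowercase, keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes on word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per label
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # log P(label) + sum of log P(word | label), smoothed over the vocabulary
            score = math.log(n / n_docs)
            denom = self.total_words[label] + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (1 = positive, 0 = negative).
texts = [
    "great product, love it",
    "love the great battery",
    "terrible screen, hate it",
    "hate the terrible support",
]
labels = [1, 1, 0, 0]
model = NaiveBayesText().fit(texts, labels)
print(model.predict("I love this great phone"))    # 1
print(model.predict("hate this terrible screen"))  # 0
```

Working in log space avoids underflow when many small word probabilities are multiplied, and add-one smoothing keeps a word that never appeared with one label from zeroing out that label's score.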