Sentiment analysis can provide key insight into how your customers feel about your company, and so it is becoming an increasingly important part of data analysis. Building a machine learning model to identify positive and negative sentiment is fairly complex, but luckily for us, there is a Python library that can help us out. It's called TextBlob. In this post, we'll look at how to use TextBlob with Python and its CSV functionality, and also with Pandas, using dataframes.

In [4]:
from textblob import TextBlob
import pandas as pd
path = 'sentiment.csv'
In [6]:
df = pd.read_csv(path, delimiter=',', header='infer', encoding='latin-1')
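
Pandas isn't the only way into the file. As a rough sketch of the plain-Python route, the standard-library `csv` module can read the same kind of file row by row. The `read_texts` helper and the two sample rows are hypothetical. The rows are shortened versions of tweets from the `df.head()` output and use its `text`, `id` and `pubdate` columns. The file is written to a temporary folder so the example runs anywhere.

```python
import csv
import os
import tempfile

def read_texts(path: str) -> list[dict[str, str]]:
    """Read every row of a CSV file into a dict keyed by the header names.

    latin-1 mirrors the encoding passed to pd.read_csv; it never raises a
    decode error, because every byte value maps to some character.
    """
    with open(path, newline="", encoding="latin-1") as f:
        return list(csv.DictReader(f))

# Hypothetical two-row sample in the same shape as sentiment.csv
sample = [
    {"text": "can I please have the new twitter?", "id": "2602684185", "pubdate": "18535"},
    {"text": "Aaaaaaaand I have the new twitter! Yay!", "id": "2602738438", "pubdate": "18535"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sentiment.csv")
    # Write the sample file first so there is something to read back
    with open(path, "w", newline="", encoding="latin-1") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "id", "pubdate"])
        writer.writeheader()
        writer.writerows(sample)
    rows = read_texts(path)

for row in rows:
    print(row["id"], row["text"])
```

Every value comes back as a string. Pandas, by contrast, infers numeric types for columns such as `id`.
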
In [8]:
df.head()
Out[8]:
   text                                               id          pubdate
0  10 Things Missing In The New Twitter Interface...  2602860537  18536
1  RT @_NATURALBWINNER OH AND I DONT LIKE THIS #N...  2602850443  18536
2  RT @HBO24 yo the #newtwitter is better.. YUPP ...  2602761852  18535
3  Aaaaaaaand I have the new twitter! Yay! I shou...  2602738438  18535
4  can I please have the new twitter? #twitter #n...  2602684185  18535

Calculating Subjectivity & Polarity

I guess you're probably wondering what polarity and subjectivity are? Well, polarity is a measure of how positive or negative a statement is, ranging from -1 (very negative) to +1 (very positive). Subjectivity measures how opinionated the comment is, ranging from 0 (very objective, fact-based) to 1 (very subjective, opinionated).
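
To make the two scores concrete, here is a toy sketch of lexicon-based scoring, which is the general idea behind TextBlob's default analyzer. Each word carries pre-assigned polarity and subjectivity values, and a sentence's scores combine the values of the words it contains. The four-word `TOY_LEXICON` and its numbers are made up for illustration and are not TextBlob's real lexicon. The real analyzer also handles intensifiers and negation, which this sketch ignores.

```python
# Hypothetical (polarity, subjectivity) scores; NOT TextBlob's real lexicon.
TOY_LEXICON: dict[str, tuple[float, float]] = {
    "great": (0.75, 0.75),
    "bad": (-0.75, 0.5),
    "new": (0.25, 0.5),
    "slow": (-0.25, 0.5),
}

def toy_sentiment(text: str) -> tuple[float, float]:
    """Average (polarity, subjectivity) over the words found in TOY_LEXICON.

    Returns (0.0, 0.0), i.e. neutral and objective, when no word is known.
    """
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not hits:
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity

print(toy_sentiment("The new Twitter is great!"))  # (0.5, 0.625)
print(toy_sentiment("The new Twitter is slow"))    # (0.0, 0.5)
```

The second call shows how mixed words pull polarity back toward zero, even though the sentence still scores as fairly subjective.
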

In [12]:
df['subjectivity'] = df.text.apply(lambda x: TextBlob(str(x)).sentiment.subjectivity)
df.head()
Out[12]:
   text                                               id          pubdate  subjectivity
0  10 Things Missing In The New Twitter Interface...  2602860537  18536    0.252273
1  RT @_NATURALBWINNER OH AND I DONT LIKE THIS #N...  2602850443  18536    0.627273
2  RT @HBO24 yo the #newtwitter is better.. YUPP ...  2602761852  18535    0.477273
3  Aaaaaaaand I have the new twitter! Yay! I shou...  2602738438  18535    0.377273
4  can I please have the new twitter? #twitter #n...  2602684185  18535    0.454545
In [13]:
df['polarity'] = df.text.apply(lambda x: TextBlob(str(x)).sentiment.polarity)
df.head()
Out[13]:
   text                                               id          pubdate  subjectivity  polarity
0  10 Things Missing In The New Twitter Interface...  2602860537  18536    0.252273      -0.031818
1  RT @_NATURALBWINNER OH AND I DONT LIKE THIS #N...  2602850443  18536    0.627273      -0.014773
2  RT @HBO24 yo the #newtwitter is better.. YUPP ...  2602761852  18535    0.477273      0.318182
3  Aaaaaaaand I have the new twitter! Yay! I shou...  2602738438  18535    0.377273      0.169034
4  can I please have the new twitter? #twitter #n...  2602684185  18535    0.454545      0.136364

We can take it a step further by cleaning up the input data and creating columns that say 'yes' or 'no' to whether each row is positive, negative or neutral. In my tests, I ran this across a 5,000-row dataset of Amazon reviews, and it achieved 90% accuracy when I manually checked 500 of the rows.
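
The 90% figure comes from a manual spot check. One reasonable way to run that kind of check is sketched here as an assumption, not as the exact procedure used. First draw a random, repeatable sample of row positions. Then label those rows by hand and report the share that match the model. The function names are hypothetical. The labels in the usage lines are made up purely to show the arithmetic: 450 matches out of 500 is 90%.

```python
import random

def pick_rows_to_check(n_rows: int, k: int, seed: int = 0) -> list[int]:
    """Choose k distinct row positions at random for manual review (seeded, so repeatable)."""
    return sorted(random.Random(seed).sample(range(n_rows), k))

def spot_check_accuracy(predicted: list[str], manual: list[str]) -> float:
    """Fraction of reviewed rows where the model's label matches the hand-assigned label."""
    if len(predicted) != len(manual):
        raise ValueError("predicted and manual must have the same length")
    if not predicted:
        raise ValueError("need at least one reviewed row")
    matches = sum(p == m for p, m in zip(predicted, manual))
    return matches / len(predicted)

rows_to_check = pick_rows_to_check(5000, 500)
print(len(rows_to_check), rows_to_check[:5])

# Hypothetical labels: 450 of the 500 hand-checked rows agree with the model
predicted = ["positive"] * 450 + ["negative"] * 50
manual = ["positive"] * 500
print(spot_check_accuracy(predicted, manual))  # 0.9
```

Seeding the sampler means the same 500 rows can be pulled again later, for example to re-check them after changing the cleaning steps.
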

In [14]:
import os
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stop = stopwords.words('english')
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/keenek1/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!

Clean up the data for better sentiment analysis

In [ ]:
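#Start from the raw text as strings (missing values become empty strings)
df['text1'] = df['text'].fillna('').astype(str)
#Handle the strange character sequence in the source (a garbled apostrophe) before punctuation is stripped
df['text1'] = df.text1.str.replace("‰‰Ûª", "'", regex=False)
#Remove punctuation from the text (keep only word characters and whitespace)
df['text1'] = df.text1.str.replace(r"[^\w\s]", "", regex=True)
#Make everything lower case
df['text1'] = df.text1.str.lower()
#Correct incorrect spellings (can be slow on large datasets); convert the TextBlob result back to a string
df['text1'] = df.text1.apply(lambda x: str(TextBlob(x).correct()))
#Drop stopwords (a set makes the membership test fast)
stop_set = set(stop)
df['text1'] = df.text1.apply(lambda x: ' '.join(word for word in x.split() if word not in stop_set))
df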
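
The order of the cleaning steps matters. Here is a plain-Python sketch of the same steps, minus spelling correction. The short `STOPWORDS` set is a hypothetical stand-in for NLTK's full English list, and `clean` is a hypothetical helper name.

```python
import re

# Hypothetical mini stopword list; the post uses NLTK's English stopwords instead.
STOPWORDS = {"the", "is", "a", "and", "i", "this"}
GARBLED_APOSTROPHE = "‰‰Ûª"  # the strange sequence found in the source data

def clean(text: str) -> str:
    """Fix the garbled apostrophe, strip punctuation, lower-case, drop stopwords."""
    text = text.replace(GARBLED_APOSTROPHE, "'")  # must run before the punctuation regex
    text = re.sub(r"[^\w\s]", "", text)           # keep word characters and whitespace
    text = text.lower()
    return " ".join(w for w in text.split() if w not in STOPWORDS)

print(clean("RT @HBO24 yo the #newtwitter is better.. YUPP"))
# rt hbo24 yo newtwitter better yupp
print(clean("It‰‰Ûªs new"))
# its new
```

If the apostrophe fix ran after the punctuation regex, the second example would come out as `itûªs new`. That is because `‰` counts as punctuation and is removed, while `Û` and `ª` count as word characters and survive, so the replacement never matches.
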

Calculate subjectivity & polarity on the cleaned data

In [ ]:
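import numpy as np

#Work on a copy so that adding columns doesn't trigger SettingWithCopyWarning
df2 = df[['text', 'text1']].copy()
#Calculate subjectivity & polarity, as above
df2['subjectivity'] = df2.text1.apply(lambda x: TextBlob(str(x)).sentiment.subjectivity)
df2['polarity'] = df2.text1.apply(lambda x: TextBlob(str(x)).sentiment.polarity)

#Based on polarity, give YES / NO answers to 'neutral', 'positive' and 'negative' using +/-0.2 cut-offs
df2['neutral'] = np.where((df2['polarity'] > -0.2) & (df2['polarity'] < 0.2), 'yes', 'no')
df2['positive'] = np.where(df2['polarity'] >= 0.2, 'yes', 'no')
df2['negative'] = np.where(df2['polarity'] <= -0.2, 'yes', 'no')
#Reconstructed: only "> 10]" survived of the original filter; keeping rows whose cleaned text is longer than 10 characters is an assumption
df3 = df2[df2['text1'].str.len() > 10]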

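With non-overlapping cut-offs, each row falls into exactly one bucket, so the three yes/no columns are really one label spread across three columns. Below is a plain-Python sketch of that bucketing. It assumes a symmetric cut-off of ±0.2 around zero, which matches the -0.2 neutral bound in the labelling code. `label_polarity` and `yes_no_columns` are hypothetical helper names, and the sample values are the raw-text polarity scores from the earlier `df.head()` output.

```python
def label_polarity(polarity: float, cutoff: float = 0.2) -> str:
    """Bucket a polarity score into 'positive', 'negative' or 'neutral'.

    The +/-0.2 cut-off is an assumption based on the -0.2 bound in the post.
    """
    if polarity >= cutoff:
        return "positive"
    if polarity <= -cutoff:
        return "negative"
    return "neutral"

def yes_no_columns(polarity: float) -> dict[str, str]:
    """Return the 'yes'/'no' flags for the neutral, positive and negative columns."""
    label = label_polarity(polarity)
    return {col: "yes" if col == label else "no" for col in ("neutral", "positive", "negative")}

# Raw-text polarity values from the df.head() output earlier in the post
for p in (-0.031818, -0.014773, 0.318182, 0.169034, 0.136364):
    print(p, label_polarity(p), yes_no_columns(p))
```

Only row 2 (0.318182) clears the positive cut-off. The rest land in the neutral band.

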
Output data to a file

In [ ]:
import os

#Make sure the output folder exists, then write the results to a CSV file inside it
outpath = os.path.join('Desktop', 'sentiment', 'out.csv')
os.makedirs(os.path.dirname(outpath), exist_ok=True)
df3.to_csv(outpath)
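
The same output step can be done without Pandas, using the standard-library `csv.DictWriter`. In this minimal sketch, the helper name `write_labelled_rows`, the column list and the choice of UTF-8 are assumptions. The single row reuses the id, subjectivity and polarity shown for row 2 in the earlier output. A temporary folder stands in for `Desktop/sentiment` so the example runs anywhere.

```python
import csv
import os
import tempfile

# Hypothetical column layout for the output file
FIELDNAMES = ["id", "text", "subjectivity", "polarity", "label"]

def write_labelled_rows(outpath: str, rows: list[dict]) -> None:
    """Create the output folder if needed, then write the rows with a header line."""
    os.makedirs(os.path.dirname(outpath) or ".", exist_ok=True)
    with open(outpath, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)

rows = [{
    "id": 2602761852,
    "text": "RT @HBO24 yo the #newtwitter is better.. YUPP",
    "subjectivity": 0.477273,
    "polarity": 0.318182,
    "label": "positive",
}]

with tempfile.TemporaryDirectory() as tmp:
    outpath = os.path.join(tmp, "Desktop", "sentiment", "out.csv")
    write_labelled_rows(outpath, rows)
    # Read the file back to confirm what was written
    with open(outpath, newline="", encoding="utf-8") as f:
        written = list(csv.DictReader(f))

print(written)
```

Unlike `df3.to_csv`, this writes no index column. Numbers are stored as text, so they come back as strings when read with `csv.DictReader`.
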