Text Mining - Twitter Sentiment Analysis

Sentiment analysis is the automated process of analyzing text data and classifying it as positive or negative. Applying sentiment analysis tools to Twitter data can help companies understand how people talk about their brand.

Dataset Details:
Training data: 90K rows and 6 columns
Test data: 10K rows

Models: Logistic regression implemented from scratch (no in-built libraries were used for the gradient or logistic functions).

Results: with 10-fold cross-validation, test accuracy: 70%, precision: 70%, recall: 89%.

Text Mining Report

By analyzing social media posts, product reviews, customer feedback, and NPS responses (among other unstructured data), businesses can understand how their customers feel about their product or service.

Sentiment analysis has become an important part of business. What is sentiment? It is the feeling or emotion associated with a piece of text, such as a message, tweet, or review. This task is specific to a Twitter dataset, and each tweet in the training data is labeled as either positive or negative.

In data science, text processing is studied intensively because the internet is full of text data. By processing the huge amount of data available online, businesses can understand what is being shared or posted about them.

As part of this project, I used various text mining skills. The first step is cleaning the dataset.

This is the most important step in any NLP task. Why? Because from messy, uncleaned data our models may not be able to infer anything or learn exactly what they need to know.

To clean the dataset, I removed words starting with symbols and converted the text to lowercase. After cleaning, stemming and tokenization were performed using Python's NLTK library. Then I created encoded vectors using sklearn's TfidfVectorizer.
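The cleaning step can be sketched roughly as follows. This is only an illustration, not the project's actual code: the helper name `clean_tweet` is hypothetical, and it assumes that "words starting with symbols" means tokens such as @mentions, #hashtags, and emoticons. To stay dependency-free, it splits on whitespace and leaves out the NLTK stemming and the TF-IDF encoding.

```python
import re
from typing import List

# Assumption: a "word starting with a symbol" is any whitespace-delimited
# token whose first character is neither a letter, digit, nor underscore
# (for example @user, #happy, or ":)").
SYMBOL_WORD = re.compile(r"(?<!\S)[^\w\s]\S*")


def clean_tweet(text: str) -> List[str]:
    """Lowercase a tweet, drop symbol-led tokens, and split on whitespace."""
    text = text.lower()
    text = SYMBOL_WORD.sub(" ", text)
    return text.split()


print(clean_tweet("@user I LOVE this phone #happy :)"))
```

The resulting tokens would then be stemmed and joined back into strings before being passed to a vectorizer.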

Now our data is ready for model development. As this is a classic classification problem, I built a Logistic Regression model and used 10-fold cross-validation to tune its parameters.
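As an illustration of how 10-fold cross-validation partitions the data, here is a minimal sketch using only the standard library. The function name `k_fold_indices` and the shuffling seed are assumptions made for this example; the project's own splitting code may differ.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


for train_idx, val_idx in k_fold_indices(100, k=10):
    print(len(train_idx), len(val_idx))
```

Each candidate parameter setting would be trained on the training indices and scored on the validation indices of every fold, and the scores averaged across folds.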

For this project, the logistic regression model was developed from scratch instead of using in-built libraries.
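To give a sense of what "from scratch" involves, below is a minimal sketch of logistic regression trained with batch gradient descent in plain Python. It is not the project's implementation: the function names, learning rate, epoch count, and the toy two-feature data are all assumptions chosen for illustration.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(
    X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 200
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X: List[List[float]], w: List[float], b: float) -> List[int]:
    """Label a row 1 (positive) when its predicted probability is >= 0.5."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


# Toy data (hypothetical): feature 0 hints positive, feature 1 hints negative.
X_toy = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
y_toy = [1, 1, 0, 0]
w, b = fit_logistic(X_toy, y_toy)
print(predict(X_toy, w, b))
```

In the real project the rows of `X` would be TF-IDF vectors rather than two hand-made features, and a vectorized implementation would be far faster on 90K rows.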

The results (70% test accuracy, 70% precision, and 0.89 recall) are promising and can be optimized further.
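For reference, accuracy, precision, and recall for a binary classifier can be computed from the predictions along these lines. The helper name `classification_metrics` and the sample labels are hypothetical and serve only to show the formulas; label 1 is assumed to mean positive.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, precision, and recall, treating label 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


print(classification_metrics([1, 1, 1, 0, 0, 1], [1, 1, 0, 1, 0, 1]))
```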

Please feel free to contact me to learn more about this project.

Code: https://github.com/PawanSran/Twitter_Sentiment_Analysis