Twitter Sentiment Analysis

CPE342 Mini Project

Impact

Automated emotional tone classification of social media text with 85%+ precision.

Overview

The Challenge: Listening to the Digital Noise

Social media is a firehose of raw human emotion. For brands and researchers, manually reading through thousands of tweets to gauge public sentiment is impossible. We needed a scalable, automated way to "listen" to the digital noise and categorize it into meaningful emotional buckets: Positive, Negative, or Neutral.

The Solution: An NLP Classification Pipeline

We built a sentiment analysis system that transforms unstructured text into structured sentiment labels. With scikit-learn as the modeling engine and Streamlit as the interactive dashboard, the tool classifies any user-supplied text in real time.

Figure: Twitter Sentiment Analysis Hero

Technical Architecture

The system follows a classic Machine Learning pipeline, optimized for short-form text processing.

Figure: System Architecture

1. Text Preprocessing Pipeline

Tweets are messy. Our pipeline cleans the "digital noise" before it hits the model:

Tokenization: Splitting sentences into individual words.
Stopword Removal: Filtering out common words (the, is, at) that don't carry emotional weight.
Stemming: Reducing words to their root form (e.g., "loving" -> "love") to decrease vocabulary dimensionality.
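The three steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the stopword set is a small hand-picked subset, and the suffix-stripping `stem` function is a toy stand-in for a real stemmer such as Porter's (all names here are assumptions).

```python
import re

# Illustrative subset only; a real pipeline would use a fuller stopword list.
STOPWORDS = {"the", "is", "at", "a", "an", "and", "this", "to", "of", "i", "my"}
# Hypothetical suffix list for a toy stemmer (not the Porter algorithm).
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenization, stopword removal, and stemming in order."""
    return [stem(t) for t in remove_stopwords(tokenize(text))]


print(preprocess("The crew is helping at the gate and answered questions"))
# ['crew', 'help', 'gate', 'answer', 'question']
```

Plain suffix stripping truncates some words more aggressively than a rule-based stemmer would, so a production pipeline would normally rely on an established stemming library instead.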

2. Multi-Model Benchmarking

We built a comparison framework and evaluated three models on accuracy, training speed, and deployment efficiency to find the best performer for the task.

Naive Bayes: Fast and surprisingly effective for text classification.
Logistic Regression: Provides clear probabilistic interpretations of sentiment.
Random Forest: Captures non-linear relationships between word combinations.
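A comparison of this kind could look roughly like the sketch below. It assumes TF-IDF features and cross-validated accuracy, neither of which is confirmed by the write-up; the toy texts, the `benchmark` helper, and the hyperparameters are illustrative only.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; not the project's dataset.
texts = [
    "i love this phone", "what a great day", "best service ever",
    "i hate waiting", "this is terrible", "worst flight ever",
    "the meeting is at noon", "i am at the airport", "the store opens today",
]
labels = ["Positive"] * 3 + ["Negative"] * 3 + ["Neutral"] * 3


def benchmark(texts, labels, cv=3):
    """Return mean cross-validated accuracy for each candidate model."""
    candidates = {
        "Naive Bayes": MultinomialNB(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    }
    results = {}
    for name, classifier in candidates.items():
        # Vectorizer and classifier are fitted together inside each fold,
        # so no vocabulary leaks from the held-out split.
        pipeline = make_pipeline(TfidfVectorizer(), classifier)
        scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")
        results[name] = float(scores.mean())
    return results


for name, accuracy in benchmark(texts, labels).items():
    print(f"{name}: {accuracy:.2f}")
```

With only nine toy sentences the scores carry no meaning; the point is the structure, where every model sees identical features and identical folds so the numbers are comparable.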

Dataset

We utilized the Kaggle Sentiment Analysis Dataset, which provides a rich diversity of labeled tweets, including metadata like time of posting and user demographics.

Interactive Web Application

Built with Streamlit, the application offers:

Real-time Prediction: Type any tweet and get an instant sentiment classification.
Confidence Scores: Visualize how "sure" the model is about its prediction.
Model Selection: Toggle between models (NB, LR, RF) to see how they interpret the same text differently.
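One way the prediction and confidence features could work behind the UI is sketched below. It assumes each selectable model exposes scikit-learn's `predict_proba` and `classes_` interface; `predict_with_confidence`, `KeywordModel`, and `MODELS` are hypothetical names, and the keyword-based stand-in exists only so the snippet runs without a trained model.

```python
def predict_with_confidence(model, text):
    """Return (label, probability) for the most likely class of one text."""
    probabilities = model.predict_proba([text])[0]
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return model.classes_[best], float(probabilities[best])


class KeywordModel:
    """Hypothetical stand-in that mimics the predict_proba/classes_ interface."""

    classes_ = ["Negative", "Neutral", "Positive"]

    def predict_proba(self, texts):
        rows = []
        for text in texts:
            lowered = text.lower()
            if "love" in lowered:
                rows.append([0.1, 0.1, 0.8])
            elif "hate" in lowered:
                rows.append([0.8, 0.1, 0.1])
            else:
                rows.append([0.2, 0.6, 0.2])
        return rows


# Hypothetical registry; in the app this would hold the fitted NB, LR, and RF models.
MODELS = {"Stand-in": KeywordModel()}

label, confidence = predict_with_confidence(MODELS["Stand-in"], "I love this phone")
print(f"{label} ({confidence:.0%})")  # Positive (80%)
```

In a Streamlit front end, the dictionary keys could feed a model selector, and the returned label and probability could be rendered next to the input box.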

Figure: Application Interface

Why This Matters

This project serves as a foundation for understanding how NLP can be used to monitor brand health, public opinion, and social trends. By pairing ML models with an intuitive UI, we make complex data science accessible through a simple interface.

NLP, Machine Learning, Streamlit, Scikit-Learn
