About
Learn About My Work
I am Sourav Kumar.
A Computer Science Engineering graduate (2025) with more than two years of hands-on experience in the data domain, working across data analytics, applied statistics, machine learning, deep learning, and Generative AI. My foundation lies in statistical analysis and exploratory data analysis (EDA), including hypothesis testing, regression, and data validation, which I apply to understand and prepare data for downstream modeling.
Experience
Professional Journey
Building expertise through hands-on experience in data science, machine learning, and MLOps
Data Analyst
Globus IT Onsite
- Providing real-time technical and analytical support to the district administration by transforming raw school data into actionable insights for district-level planning.
- Implemented and monitored structured data synchronization and validation workflows between the e-Shikshakosh portal and the national UDISE+ database, ensuring schema-level consistency across teacher, student, and infrastructure datasets.
AI/ML Intern
AI4S Solutions Pvt. Ltd. Kolkata
- Built an AI-driven legacy medical data migration system for a hospital’s initial SAP S/4HANA implementation, reducing manual data correction by automating data quality checks, normalization, and clinically aware imputation.
- Developed an automated invoice-to-SAP pipeline that extracts structured data from invoice email attachments via OCR and seamlessly loads it into the SAP S/4HANA database, reducing manual data entry.
- Contributed to the implementation of AI solutions within enterprise-level SAP environments.
Data Analytics Virtual Intern
Tata iQ (Forage) Remote
- Completed a Tata iQ job simulation on Forage focused on AI-driven data analysis, leveraging GenAI for EDA, risk assessment, and strategic insights.
- Proposed a predictive model for delinquency risk, designed an AI-powered collections strategy, and created a stakeholder-focused report.
Expertise
Empowering Decisions with AI & Analytics
Specialized skills in Machine Learning, Deep Learning, Data Science, and MLOps for enterprise solutions
Data Analytics & Applied Statistics
Expert in exploratory data analysis (EDA), statistical modeling, and business intelligence. 2+ years solving real-world business problems with data-driven insights.
Machine Learning & Deep Learning
Proficient in building end-to-end ML models using TensorFlow, Scikit-Learn, and PyTorch. Expertise in CNN, RNN, NLP, and Computer Vision for production applications.
MLOps & Cloud Deployment
Specialized in model lifecycle management, CI/CD pipelines, containerization with Docker, and cloud deployment on AWS. Proficient with MLflow, DVC, and GitHub Actions.
Generative AI & LLMs
Building AI-powered applications using Gemini API, LLMs, and RAG systems. Experience in prompt engineering and creating intelligent automation solutions.
Tech Stack
Tools & Technologies
A comprehensive collection of languages, frameworks, and tools I use to build scalable data and AI solutions
Portfolio
Explore My Projects
Here is a showcase of my significant projects, including web applications, data analytics, and machine learning models.
AutoStocks: Weekly Stock Forecasts & Insights Delivery
An end-to-end AI system that automatically delivers weekly stock insights as PDFs to users. Includes historical trends, LSTM-based forecasts, and visual analytics, all powered by a fully scheduled GitHub Actions pipeline.
Chat with Website: RAG-based Web QA
Built a GenAI assistant using Python, BeautifulSoup, ChromaDB & Gemini API to answer questions from any website URL. Integrated scraping, embeddings, and semantic search into an interactive Streamlit chat UI.
Gurugram Housing Price Predictor
Developed a Flask-based ML app to analyze Gurugram real estate by scraping 99acres, predicting prices using GBM & RF models, and offering personalized property recommendations with an interactive dashboard.
NewsRoom: AI-Powered Summarizer & Verifier
Fetches news from BBC, summarizes with NLP, checks authenticity using ML, and updates automatically via GitHub Actions. Offers a smooth, scrollable card-based UI.
Kidney Disease Classification
Built a CNN model to classify kidney disease from medical images with 95% accuracy. Integrated MLflow, DVC, and CI/CD for full MLOps automation, and deployed the Dockerized app on AWS EC2 for real-time access.
Multiple Disease Prediction
Developed a Streamlit web app to predict multiple human diseases based on user input. Uses trained ML models to provide early diagnosis support with a clean UI and instant feedback.
New York Taxi Trip Analysis
Big Data + MLOps project to analyze and cluster high-dimensional taxi trip data using K-Means and t-SNE. Includes trip duration prediction, visualizations, and pipeline-based automation.
YouTube Sentiment Analysis
Built a sentiment analysis plugin for YouTube comments using Llama & Scikit-learn, with summary generation and insightful visualizations. Optimized via Optuna and deployed using DVC, MLflow, Docker, and GitHub Actions on AWS EC2.
WhatsApp Chat Analyzer
Performs detailed chat analysis from exported WhatsApp data using Python. Generates user-wise message stats, word clouds, activity heatmaps, and emoji insights with rich visualizations that help users understand their chat patterns and interactions.
Instacaption: AI-Powered Caption Generator
Built an image captioning tool using Hugging Face for visual descriptions and Gemini API to generate emotion-aware captions. Upload an image to get creative Instagram captions instantly via a smooth web UI.
Spotify Hybrid Recommendation System
Built a hybrid music recommender using content-based & collaborative filtering with Spotify data. Integrated DVC for model versioning, CI/CD, and blue-green deployment strategies on AWS.
Next Word with LSTM
Trained an LSTM model on custom PDF text to predict the next word in a sentence. Used NLP preprocessing, tokenization, and sequential modeling to build a context-aware text generation tool with deep learning.
Need Assistance? Get in Touch!
Feel free to contact me for collaboration or inquiries.