About

Learn About My Work

I am Sourav Kumar.

I am a Computer Science Engineering graduate (2025) with more than two years of hands-on experience in the data domain, working across data analytics, applied statistics, machine learning, deep learning, and Generative AI. My foundation lies in statistical analysis and exploratory data analysis (EDA), including hypothesis testing, regression, and data validation, which I apply to understand and prepare data for downstream modeling.

Experience

Professional Journey

Building expertise through hands-on experience in data science, machine learning, and MLOps

Data Analyst

Globus IT Onsite
Dec 2025 - Present
  • Providing real-time technical and analytical support to the district administration by transforming raw school data into actionable insights for district-level planning.
  • Implemented and monitored structured data synchronization and validation workflows between the e-Shikshakosh portal and the national UDISE+ database, ensuring schema-level consistency across teacher, student, and infrastructure datasets.
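
A schema-level consistency check of the kind mentioned above could look roughly like the following Python sketch. It is illustrative only: the field names, the `EXPECTED_SCHEMA` mapping, and the helper `find_schema_issues` are hypothetical and do not reflect the actual e-Shikshakosh or UDISE+ schemas.

```python
from typing import Any

# Hypothetical target schema: field name -> required Python type.
EXPECTED_SCHEMA: dict[str, type] = {
    "udise_code": str,
    "teacher_name": str,
    "students_enrolled": int,
}


def find_schema_issues(record: dict[str, Any]) -> list[str]:
    """Return readable issues for one record (an empty list means it is clean)."""
    issues = []
    # Check that every expected field is present and has the expected type.
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            issues.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            issues.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    # Flag fields that the target schema does not know about.
    for field in record:
        if field not in EXPECTED_SCHEMA:
            issues.append(f"unexpected field: {field}")
    return issues
```

Running a check like this on each record before synchronization would surface mismatches early, rather than after the data has reached the target system.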

AI/ML Intern

AI4S Solutions Pvt. Ltd. Kolkata
Aug 2025 - Dec 2025
  • Built an AI-driven legacy medical data migration system for a hospital’s initial SAP S/4HANA implementation, reducing manual data correction by automating data quality checks, normalization, and clinically aware imputation.
  • Developed an automated invoice-to-SAP pipeline that extracts structured data from invoice email attachments via OCR and seamlessly loads it into the SAP S/4HANA database, reducing manual data entry.
  • Contributed to the implementation of AI solutions within enterprise-level SAP environments.

Data Analytics Virtual Intern

Tata iQ (Forage) Remote
June 2025
  • Completed a Tata iQ job simulation on Forage focused on AI-driven data analysis, leveraging GenAI for EDA, risk assessment, and strategic insights.
  • Proposed a predictive model for delinquency risk, designed an AI-powered collections strategy, and created a stakeholder-focused report.

Expertise

Empowering Decisions with AI & Analytics

Specialized skills in Machine Learning, Deep Learning, Data Science, and MLOps for enterprise solutions

Data Analytics & Applied Statistics

Expert in exploratory data analysis (EDA), statistical modeling, and business intelligence. 2+ years solving real-world business problems with data-driven insights.

Machine Learning & Deep Learning

Proficient in building end-to-end ML models using TensorFlow, Scikit-Learn, and PyTorch. Expertise in CNN, RNN, NLP, and Computer Vision for production applications.

MLOps & Cloud Deployment

Specialized in model lifecycle management, CI/CD pipelines, containerization with Docker, and cloud deployment on AWS. Proficient with MLflow, DVC, and GitHub Actions.

Generative AI & LLMs

Building AI-powered applications using Gemini API, LLMs, and RAG systems. Experience in prompt engineering and creating intelligent automation solutions.

Tech Stack

Tools & Technologies

A comprehensive collection of languages, frameworks, and tools I use to build scalable data and AI solutions

Python SQL Pandas NumPy SciPy Statsmodels Matplotlib Seaborn Scikit-Learn TensorFlow PyTorch OpenCV MLflow DVC Docker Git GitHub Actions FastAPI Streamlit Gemini API LLMs LangChain RAG Pipelines FAISS Chroma AWS Azure

Portfolio

Explore My Projects

Here is a showcase of my significant projects, including web applications, data analytics, and machine learning models.

AutoStocks: Weekly Stock Forecasts & Insights Delivery

An end-to-end AI system that automatically delivers weekly stock insights as PDFs to users. Includes historical trends, LSTM-based forecasts, and visual analytics, all powered by a fully scheduled GitHub Actions pipeline.

Chat with Website: RAG-based Web QA

Built a GenAI assistant using Python, BeautifulSoup, ChromaDB & Gemini API to answer questions from any website URL. Integrated scraping, embeddings, and semantic search into an interactive Streamlit chat UI.

Gurugram Housing Price Predictor

Developed a Flask-based ML app to analyze Gurugram real estate by scraping 99acres, predicting prices using GBM & RF models, and offering personalized property recommendations with an interactive dashboard.

NewsRoom: AI-Powered Summarizer & Verifier

Fetches news from BBC, summarizes with NLP, checks authenticity using ML, and updates automatically via GitHub Actions. Offers a smooth, scrollable card-based UI.

Kidney Disease Classification

Built a CNN model to classify kidney disease from medical images with 95% accuracy. Integrated MLflow, DVC, and CI/CD for full MLOps automation, and deployed the Dockerized app on AWS EC2 for real-time access.

Multiple Disease Prediction

Developed a Streamlit web app to predict multiple human diseases based on user input. Uses trained ML models to provide early diagnosis support with a clean UI and instant feedback.

New York Taxi Trip Analysis

Big Data + MLOps project to analyze and cluster high-dimensional taxi trip data using K-Means and t-SNE. Includes trip duration prediction, visualizations, and pipeline-based automation.

YouTube Sentiment Analysis

Built a sentiment analysis plugin for YouTube comments using Llama & Scikit-learn, with summary generation and insightful visualizations. Optimized via Optuna and deployed using DVC, MLflow, Docker, and GitHub Actions on AWS EC2.

WhatsApp Chat Analyzer

Performs detailed chat analysis from exported WhatsApp data using Python. Generates user-wise message stats, word clouds, activity heatmaps, and emoji insights with rich visualizations that help users understand their chat patterns and interactions.
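
As a rough illustration of the user-wise message stats, the sketch below counts messages per sender from exported chat lines. It assumes each message starts with a line shaped like `<date>, <time> - <sender>: <message>`; real exports vary by locale and app version, and the names `LINE_PATTERN` and `count_messages_per_user` are hypothetical, not taken from the project.

```python
import re
from collections import Counter

# Assumed line shape: "<date>, <time> - <sender>: <message>".
LINE_PATTERN = re.compile(r"^[^,]+, [^-]+ - ([^:]+): (.*)$")


def count_messages_per_user(lines):
    """Count how many messages each sender wrote."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line)
        # Lines that do not match (continuation lines of a multi-line
        # message, system notices without a sender) are skipped.
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "12/31/23, 9:15 PM - Alice: Hello",
    "12/31/23, 9:16 PM - Bob: Hi",
    "a continuation line without a timestamp",
    "12/31/23, 9:17 PM - Alice: Bye",
]
message_counts = count_messages_per_user(sample)
```

The resulting `Counter` can feed directly into a bar chart or a most-active-users table.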

Instacaption: AI-Powered Caption Generator

Built an image captioning tool using Hugging Face for visual descriptions and Gemini API to generate emotion-aware captions. Upload an image to get creative Instagram captions instantly via a smooth web UI.

Spotify Hybrid Recommendation System

Built a hybrid music recommender using content-based & collaborative filtering with Spotify data. Integrated DVC for model versioning, CI/CD, and blue-green deployment strategies on AWS.
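
One common way to combine content-based and collaborative signals is a weighted blend of their scores. The sketch below shows that idea in Python; it is an assumption about how such a hybrid could work, not a description of this project's actual implementation, and the function name `hybrid_ranking` and the `content_weight` parameter are hypothetical.

```python
def hybrid_ranking(content_scores, collaborative_scores, content_weight=0.5):
    """Blend two score dicts keyed by track ID and return IDs, best first."""
    if not 0.0 <= content_weight <= 1.0:
        raise ValueError("content_weight must be between 0 and 1")
    # A track missing from one source contributes a score of 0 from it.
    track_ids = set(content_scores) | set(collaborative_scores)
    blended = {
        track_id: content_weight * content_scores.get(track_id, 0.0)
        + (1.0 - content_weight) * collaborative_scores.get(track_id, 0.0)
        for track_id in track_ids
    }
    return sorted(blended, key=blended.get, reverse=True)


ranking = hybrid_ranking(
    {"a": 1.0, "b": 0.0},
    {"a": 0.0, "b": 1.0, "c": 0.2},
    content_weight=0.8,
)
```

A higher `content_weight` favors tracks similar to what the listener already plays, which can help when little interaction data is available.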

Next Word with LSTM

Trained an LSTM model on custom PDF text to predict the next word in a sentence. Used NLP preprocessing, tokenization, and sequential modeling to build a context-aware text generation tool with deep learning.

Need Assistance? Get in Touch!

Feel free to contact me for collaboration or inquiries.