I’m a passionate and curious Data Scientist, Analyst, and Machine Learning Engineer, focused on turning raw data into meaningful insights and intelligent solutions. With a strong foundation in statistics, Python, and machine learning, I love exploring real-world problems, from predicting disease risk to building energy-efficient models, and bringing them to life through clean code and impactful visualizations. Through hands-on projects, internships at AWS, and research in AI-based solutions, I've built experience in data preprocessing, model building, and visualization. Whether it's structured metadata or unstructured images, I enjoy transforming complexity into clarity. Currently, I'm working to grow as a full-stack data professional, eager to collaborate, learn, and build data-driven systems that make a difference.
Samarth Mule
+91 9665942917
samarthmule1704@gmail.com
As a Research Intern, I am working on the problem statement “Metadata Preprocessing and Feature Engineering for Diabetic Retinopathy Risk Scoring” using the OLIVES dataset. My role involves cleaning and standardizing clinical metadata, engineering features such as age groups, blood sugar levels, and visual acuity, and identifying key predictors for DR risk. I’ve built a preprocessing pipeline and implemented baseline models like logistic regression and random forest to validate the effectiveness of the features. The goal is to develop an interpretable risk scoring framework that supports early diagnosis using metadata alone.
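As a rough illustration of the cleaning and age-group feature step described above, here is a minimal sketch in plain Python. The field name `age`, the bin edges, and the group labels are all assumptions made for this example; they are not taken from the OLIVES dataset or from the actual pipeline.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels (assumptions for illustration only).
AGE_EDGES = [40, 60]
AGE_LABELS = ["under_40", "40_to_59", "60_plus"]


def age_group(age):
    """Map an age in years to a coarse group label using the edges above."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


def clean_record(raw):
    """Standardize one metadata record.

    Strips whitespace, parses the age as a number, and keeps missing
    values as None instead of guessing a replacement.
    """
    value = raw.get("age")
    text = "" if value is None else str(value).strip()
    age = float(text) if text else None
    return {
        "age": age,
        "age_group": age_group(age) if age is not None else None,
    }


# Hypothetical raw records with inconsistent formatting and a missing value.
records = [{"age": " 35 "}, {"age": "60"}, {"age": ""}]
cleaned = [clean_record(r) for r in records]
```

Keeping missing values explicit, rather than imputing them during cleaning, leaves that choice to the modeling stage, where baseline models such as logistic regression can be compared under different imputation strategies.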
Led the coordination of 10+ technical events and 5+ workshops on campus, handling logistics, promotion, and participant engagement. Successfully managed large-scale events including CODIGO (200+ participants), Technical Tuesday (100+), Query Quest and Poster Presentation (250+), and Break;Through (50+).
I am currently pursuing my undergraduate studies at Pimpri Chinchwad College of Engineering, Pune, a reputed institute known for its focus on technical education and innovation. The college offers a strong curriculum in engineering and technology, supported by experienced faculty and modern infrastructure. It encourages hands-on learning through projects, workshops, and industrial training. I’ve had the opportunity to lead and participate in various technical events, enhancing both my leadership and technical skills. The collaborative environment and exposure to real-world challenges have helped me grow as a problem solver. My time here has laid a strong foundation for my career in data science and machine learning.
Explored and analyzed customer demographics to uncover patterns influencing bike purchases using a clean 1000-record dataset. The analysis is suited to marketing strategy, customer profiling, and predictive modeling.
Built an interactive Power BI dashboard to analyze global survey data from data professionals, covering roles, salaries, tools, and industry trends. Extracted key insights to support career planning and industry benchmarking in the data domain.
Built a dynamic dashboard in Power BI to analyze and visualize real-time weather data, including temperature, humidity, and wind patterns across different cities.
Developed a Python-based voice assistant capable of performing tasks like web search, application launching, and more using speech recognition and text-to-speech. Integrated APIs and automation libraries to create a functional and interactive desktop assistant.
Developed a machine learning pipeline to detect fraudulent transactions using data preprocessing, feature engineering, and model evaluation techniques. Applied supervised learning models and ensemble methods, with a focus on improving recall for fraud cases and explaining model predictions.
Developed a machine learning pipeline for predicting Titanic passenger survival using structured metadata. Focused on data cleaning, feature transformation, and decision tree model tuning to demonstrate an end-to-end classification workflow.
Developed a machine learning pipeline for early-stage diabetic retinopathy risk prediction using patient metadata. Focused on feature engineering, data preprocessing, and model evaluation to support proactive medical intervention.