Check out my GitHub page for details of my work!
Deep AUC Maximization (DAM) is a promising approach that optimizes the Area Under the Curve (AUC) score of deep neural networks for improved classification performance. We employ the LibAUC library to conduct simulation-based experiments on seven distinct medical image datasets. By exploring various strategies to mitigate overfitting and enhance DAM's performance, we aim to identify optimal configurations. Through rigorous evaluation and comparisons with benchmark results, our study provides valuable insights into the potential applications of DAM in the medical field. Check out the details of the methods implemented to surpass the standard cross-entropy benchmark values, along with visualizations of the methods used.
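For intuition on what DAM optimizes: AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, and DAM maximizes a differentiable surrogate of this pairwise quantity. A minimal sketch of the pairwise definition itself (not the LibAUC training loop, just the metric being targeted):

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))

# A perfect ranking scores 1.0; a fully reversed ranking scores 0.0.
print(pairwise_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # -> 1.0
```

Because AUC is a ranking statistic over pairs rather than a per-sample loss, it is robust to class imbalance, which is why it is a natural objective for medical image datasets where positives are rare.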
StyleMate is an end-to-end e-commerce platform that uses AI-powered personal fashion assistants to deliver customized product recommendations based on each user's individual style preferences. A chat-based interface makes it easy for users to interact with the fashion assistants and receive tailored recommendations for products that match their tastes. In addition, we provide the top positive and negative reviews for each product, similar-item suggestions, and aspect-based sentiment analysis of personalized features.
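Aspect-based sentiment analysis pairs each product aspect (fit, fabric, price, ...) with its own polarity instead of a single review-level score. A minimal lexicon-based sketch of the idea, with illustrative aspect and opinion word lists that are not taken from the project:

```python
# Illustrative lexicons for a toy aspect-based sentiment pass.
ASPECTS = {"fit": ["fit", "size", "sizing"],
           "fabric": ["fabric", "material", "cotton"],
           "price": ["price", "cost", "value"]}
POSITIVE = {"great", "perfect", "soft", "good", "love"}
NEGATIVE = {"poor", "bad", "itchy", "overpriced", "tight"}

def aspect_sentiment(review):
    """Map each detected aspect to the net sentiment of its sentence."""
    results = {}
    for sentence in review.lower().replace("!", ".").split("."):
        words = sentence.split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        for aspect, keywords in ASPECTS.items():
            if any(k in words for k in keywords):
                results[aspect] = results.get(aspect, 0) + score
    return results

print(aspect_sentiment("The fit is perfect. The fabric feels itchy."))
```

A production system would replace the lexicons with a learned model, but the output shape is the same: one polarity per aspect, which is what powers per-feature review summaries.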
Performed topic modelling using Latent Dirichlet Allocation (LDA) to discover trends in machine learning over the past 10 years using a consolidated dataset of NIPS research papers. Implemented in Python (Jupyter Notebook), parallelizing the code to reduce computation time by 75%. Identified the evolution of major topics in machine learning using zero-shot classification and created impactful visualizations using the seaborn, matplotlib, and plotly packages. LDA can be used to discover hidden themes in large collections of documents, such as text corpora, webpages, blog posts, or even tweets.
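LDA assigns every token a latent topic and alternates between how much each document uses a topic and how much each topic favors a word. A compact, pure-Python collapsed Gibbs sampler sketch of that mechanism (the project itself used library implementations; the corpus below is illustrative):

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Randomly initialize a topic assignment for every token.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [Counter(zd) for zd in z]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_count = [0] * num_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            topic_word[z[d][i]][w] += 1
            topic_count[z[d][i]] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][t] -= 1; topic_word[t][w] -= 1; topic_count[t] -= 1
                # Resample proportional to P(topic | doc) * P(word | topic).
                weights = [(doc_topic[d][k] + alpha) *
                           (topic_word[k][w] + beta) / (topic_count[k] + vocab_size * beta)
                           for k in range(num_topics)]
                t = rng.choices(range(num_topics), weights)[0]
                z[d][i] = t
                doc_topic[d][t] += 1; topic_word[t][w] += 1; topic_count[t] += 1
    return topic_word  # per-topic word counts

docs = [["neural", "network", "training"], ["network", "training", "gradient"],
        ["market", "price", "trading"], ["price", "trading", "stocks"]]
for t, counts in enumerate(lda_gibbs(docs, num_topics=2)):
    print(t, counts.most_common(3))
```

On a real corpus the per-topic word counts, normalized, become the topic-word distributions whose drift over years is what the trend analysis visualizes.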
This Python program can be used to automatically download (tested as of Dec 17, 2022) research papers published in NIPS (https://proceedings.neurips.cc/) between YEAR_MIN and YEAR_MAX. The code also creates a dataset containing the Author, Year, Paper Title, Abstract, and Paper Text for each research paper, which can be used for further data analytics. Implemented multiprocessing to speed up the process (from 3+ hrs to <15 mins). This dataset has been used in the Topic Modeling project on the left!
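The parallelization pattern is fan-out over years, gather, then flatten into one dataset. The project used multiprocessing; since downloads are I/O-bound, a thread pool gives the same shape of speedup and is sketched below with a stub fetcher (the stub records and year range are illustrative; the real script scrapes proceedings.neurips.cc):

```python
from concurrent.futures import ThreadPoolExecutor

YEAR_MIN, YEAR_MAX = 2015, 2017  # illustrative range

def fetch_year(year):
    """Stub: the real version would request the proceedings index for
    `year` and return one record per paper (author, title, abstract, text)."""
    return [{"year": year, "title": f"paper-{year}-{i}"} for i in range(3)]

# Fan the per-year downloads out across worker threads, preserving year order.
with ThreadPoolExecutor(max_workers=8) as pool:
    records = [r for rows in pool.map(fetch_year, range(YEAR_MIN, YEAR_MAX + 1))
               for r in rows]

print(len(records))  # 3 stub papers per year over 3 years -> 9
```

`pool.map` keeps results in submission order, so the flattened dataset stays sorted by year without any extra bookkeeping.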
This project focuses on analyzing the forest dataset to predict forest cover types using machine learning models and exploratory data analysis (EDA). We applied various classification algorithms, including Support Vector Machines (SVM), Random Trees, Gradient Boosted Trees, K-Nearest Neighbors, and the Normal Bayes classifier, to determine the optimal method for classifying forest cover types. Through hyperparameter tuning and visualization techniques, we achieved an 18% increase in prediction accuracy and improved performance parameters across the models. Our project highlights the importance of anomaly detection, cross-validation, and data-driven approaches in enhancing forest cover type classification.
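Cross-validation is what makes the model comparison above fair: each classifier is scored on folds it never trained on. A minimal sketch using a pure-Python K-Nearest Neighbors classifier and k-fold splitting (the two synthetic "cover types" below are illustrative, not the real dataset):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)))
    return Counter(label for _, label in nearest[:k]).most_common(1)[0][0]

def cross_val_accuracy(data, folds=4, k=3):
    """Plain k-fold cross-validation accuracy for the kNN classifier."""
    scores = []
    for f in range(folds):
        test = data[f::folds]  # every folds-th point is held out
        train = [p for i, p in enumerate(data) if i % folds != f]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / folds

# Two well-separated synthetic "cover types" (feature values are illustrative).
data = [((i, i + 1), "pine") for i in range(8)] + \
       [((i + 20, i + 21), "spruce") for i in range(8)]
print(cross_val_accuracy(data))  # separable classes -> 1.0
```

The same loop generalizes to any of the listed classifiers: swap `knn_predict` for the model's fit/predict step and compare the averaged fold accuracies.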
Our project conducts exploratory data analysis on a European IT market salary survey dataset comprising three years of data collected during COVID-19. By examining salary distributions across different countries, we can uncover valuable insights and identify trends in the IT job market. Exploring the relationships between variables gives a deeper understanding of the factors influencing the IT job market. Additionally, identifying outliers and anomalies within the dataset allows us to address data-quality issues and improve the reliability of the analysis.
This project is a comprehensive repository showcasing widely used recommender system algorithms in the industry. Using the MovieLens dataset, we provide detailed explanations and code implementations from scratch for each algorithm. The repository covers various recommender system techniques, including Neural Collaborative Filtering using PyTorch, User-User Collaborative Filtering, Matrix Factorization (MF), and Bayesian Personalized Ranking (BPR). These algorithms leverage user-item interactions to make personalized recommendations, either by considering explicit preferences or by decomposing the interaction matrix to learn latent factors.
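The "decomposing the interaction matrix" idea can be shown in a few lines: learn a low-dimensional vector per user and per item so their dot product approximates each observed rating. A pure-Python SGD sketch of Matrix Factorization (the tiny rating triples below are illustrative, not MovieLens data):

```python
import random

def matrix_factorization(ratings, n_users, n_items, dim=4, lr=0.02,
                         reg=0.02, epochs=500, seed=0):
    """Learn latent user/item factors by SGD over observed (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(dim):  # gradient step with L2 regularization
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 2, 5)]
P, Q = matrix_factorization(ratings, n_users=3, n_items=3)
pred = sum(p * q for p, q in zip(P[0], Q[0]))
print(round(pred, 2))  # should land close to the observed rating of 5
```

Once trained, the dot product also yields scores for unobserved user-item pairs, which is exactly what gets ranked to produce recommendations; BPR changes only the loss, optimizing pairwise ranking instead of squared error.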
This project focuses on the diagnosis of epileptic seizures using machine learning algorithms applied to time series EEG signal datasets. The objective is to accurately classify EEG signals into different seizure types, enabling timely intervention and personalized treatment for epilepsy patients. To address the challenge of multi-class classification, various machine learning techniques were explored, including LSTM-based models combined with ARIMA and spectral analysis. Time-frequency analysis methods such as wavelet transforms and spectrogram analysis were utilized to capture both spectral and temporal characteristics, further enhancing the accuracy of the classification models. Class imbalance issues were effectively addressed through the implementation of various sampling techniques to ensure reliable classification performance across different seizure types.
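One of the sampling techniques for the class imbalance problem above is random oversampling: duplicate minority-class windows until every seizure type matches the majority count, so the classifier is not dominated by the most common type. A minimal sketch (the feature vectors and class names below are illustrative placeholders, not real EEG data):

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Resample minority classes with replacement up to the majority-class count."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Keep all originals, then draw extras with replacement.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

X = [[0.1], [0.2], [0.3], [0.4], [0.9], [1.0]]  # toy per-window features
y = ["focal"] * 4 + ["generalized"] * 2
Xb, yb = random_oversample(X, y)
print(Counter(yb))  # both seizure types now appear 4 times each
```

Oversampling must be applied only to the training split, never before the train/test split, or duplicated windows leak into evaluation and inflate the reported accuracy.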
Recent Certifications
Data doesn't lie, but it does tell interesting stories!