Selected Projects
[Conformalized Quantile Regression for Energy Disaggregation] Building upon our previously accepted [paper] at BuildSys '22, implemented distribution-free uncertainty quantification for energy disaggregation using Conformalized Quantile Regression with the sequence-to-point (S2P) model and the publicly available REDD dataset, ensuring reliable prediction intervals and coverage without assuming a specific distribution. Applied smoothing to mitigate identical score-function outputs for sparsely used appliances. Achieved superior or comparable results relative to state-of-the-art, compute-intensive, distribution-dependent uncertainty quantification methods, with 10x faster training and 10-50x faster inference. Submitted to the Special Track on Web4Good of The Web Conference 2024 (formerly known as the WWW Conference).
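The CQR calibration step described above can be sketched as follows. This is a minimal illustration under assumed inputs, not the project's code: `lo_cal`/`hi_cal` stand in for the S2P quantile model's lower/upper predictions on a held-out calibration set.

```python
import numpy as np

def conformalize_intervals(lo_cal, hi_cal, y_cal, lo_test, hi_test, alpha=0.1):
    """Conformalized Quantile Regression: widen the quantile model's
    intervals by the (1 - alpha) empirical quantile of calibration scores.
    Argument names and shapes are illustrative assumptions."""
    # Conformity score: how far each calibration point falls outside its interval.
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = len(y_cal)
    # Finite-sample-corrected quantile level.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level)
    # Calibrated intervals carry distribution-free coverage guarantees.
    return lo_test - q, hi_test + q
```

The correction `q` is a single scalar per appliance, which is what keeps inference cheap relative to distribution-dependent methods.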
[Employing Coresets in Machine Learning] Conducted extensive research on coresets: analyzed mathematical bounds on predictions for problems using linear regression and K-means clustering, and highlighted the lack of guarantees for logistic regression. Developed and implemented a new importance sampling algorithm for the K-means classifier, achieving results comparable to the original algorithm on real-world clusterable datasets, namely Dry Beans and MNIST.
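A common sensitivity-style importance sampling scheme for K-means coresets looks like the following. This is an illustrative sketch under assumed names, not the exact algorithm developed in the project.

```python
import numpy as np

def kmeans_coreset(X, centers, m, rng=None):
    """Sample a weighted coreset for K-means via importance sampling
    around a rough initial clustering. Illustrative sketch only."""
    rng = np.random.default_rng(rng)
    # Squared distance of each point to its nearest rough center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(1)
    # Sample proportionally to each point's contribution to the cost,
    # mixed with a uniform term to bound the sensitivities.
    p = 0.5 * d2 / d2.sum() + 0.5 / len(X)
    idx = rng.choice(len(X), size=m, p=p)
    # Inverse-probability weights keep the coreset cost unbiased.
    weights = 1.0 / (m * p[idx])
    return X[idx], weights
```

Running K-means on the `m` weighted points then approximates running it on all of `X`.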
[CNAPs for Multitask Few-Shot Classification and Regression] Reproduced the architecture of Conditional Neural Adaptive Processes (CNAPs) for multitask few-shot classification, tested it on MNIST, CIFAR10, and CIFAR100, and analyzed the results for an increasing number of shots. Built a pre-processing pipeline in CNAPs that supports our own complex datasets, namely Celeb10 (10 facial features as classes), CelebFaces (150 celebrity names as classes), and CelebCars (100 celebrity names and 50 car models as classes). Compared CNAPs against a Transfer Learning baseline on Celeb10 and CelebFaces and obtained better results with CNAPs for the same number of shots. Contributed to implementing CNAPs for regression from scratch, achieving better performance than Hypernetworks and performance comparable to Conditional Neural Processes (CNPs).
[Patient Vitals Extraction from ICU Monitor Screens, Challenge by Cloudphysician] Developed a deep learning model using YOLOv7 for real-time extraction and interpretation of vital signs such as heart rate, blood pressure, and pulse rate from patient monitor screens, contributing to a Smart ICU solution that lets doctors monitor multiple patients concurrently from their workstations. Implemented a comprehensive pipeline encompassing monitor screen segmentation, object classification, and graph digitization using EasyOCR and clustering algorithms, ensuring accurate and contextually relevant extraction of patient vitals for clinical decision-making.
[Multimodal Content Analysis and Generation for Social Media Platforms, Challenge by Adobe] Developed a DNN for predicting the popularity (likes) of tweets from a multimodal representation of the data, including timestamps, content, and images, using ResNet-50 and the Universal Sentence Encoder. Utilized Bootstrapping Language-Image Pre-training (BLIP) for initial image captioning, enriching content with media insights, and fine-tuned LLaMA-2 for enhanced tweet accuracy and relevance. Established a pipeline combining image captioning and large language models to generate contextually rich and engaging tweets.
[Statistical Language Modeling using N-Grams] Spearheaded a comprehensive NLP pipeline for processing a large dataset of Reddit comments, applying data cleaning and preprocessing techniques to refine and tokenize the corpus for subsequent analysis. Engineered a custom language modeling class from the ground up, integrating smoothing techniques including Laplace, Add-k, and Good-Turing for n-gram models (unigram to 5-gram), and rigorously compared their log-perplexity values to identify the most effective smoothing strategy.
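An add-k smoothed bigram model with perplexity evaluation, of the kind built in the pipeline above, can be sketched as follows (minimal illustration; Laplace smoothing is the k=1 case).

```python
import math
from collections import Counter

def add_k_logprob(tokens, unigrams, bigrams, vocab_size, k=1.0):
    """Add-k smoothed bigram log-probability of a token sequence.
    `unigrams`/`bigrams` are Counters built from a training corpus."""
    lp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add k pseudo-counts to every bigram so unseen pairs get mass.
        num = bigrams[(prev, cur)] + k
        den = unigrams[prev] + k * vocab_size
        lp += math.log(num / den)
    return lp

def perplexity(tokens, unigrams, bigrams, vocab_size, k=1.0):
    """exp of the average negative log-probability per predicted token."""
    n = len(tokens) - 1
    return math.exp(-add_k_logprob(tokens, unigrams, bigrams, vocab_size, k) / n)
```

Comparing `perplexity(...)` across k values (and against Good-Turing) on held-out text is the selection criterion the entry describes.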
[Global Income Inequality Analysis: Challenging Gaussian Distribution Assumptions] Conducted an in-depth analysis of global income inequality using comprehensive datasets, employing key metrics such as the Gini index, Atkinson coefficient, and percentile ratios to evaluate wealth distribution patterns across multiple countries. Used data visualization to present plots conclusively refuting the Gaussian distribution assumption for income and wealth, highlighting significant skewness and long-tailed distributions indicative of pervasive global income inequality.
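The Gini index used above can be computed from sorted incomes via the standard cumulative-sum identity; a minimal sketch for non-negative incomes:

```python
import numpy as np

def gini(incomes):
    """Gini coefficient: 0 means perfect equality, values near 1 mean
    extreme concentration. O(n log n) form for non-negative incomes."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = len(x)
    # Normalized cumulative share of income (the Lorenz curve heights).
    cum = np.cumsum(x)
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n
```

For example, `gini([1, 1, 1, 1])` is 0, while `gini([0, 0, 0, 1])` is 0.75, the kind of concentration the long-tailed plots in the analysis exhibit.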
[Notpy - Our Own Programming Language] Worked in a team of 5 to develop a new programming language, “notpy,” from scratch, including language rules, data structures, and core features, by building tree-walking and bytecode interpreters. Earned top honors in a rigorous hackathon by solving all twenty Project Euler problems in “notpy” while fixing language bugs, and eventually ranked second in the course.
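A tree-walking interpreter evaluates the AST directly, whereas a bytecode interpreter first compiles it to instructions. A minimal sketch of the former for arithmetic expressions (not notpy's actual grammar or node types):

```python
# Binary operators the toy evaluator understands.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b}

def evaluate(node):
    """node is either a number literal or a tuple (op, left, right);
    the interpreter recurses over the tree and combines results."""
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))
```

A bytecode interpreter would instead flatten this tree into a linear instruction list executed over a stack, trading a compilation pass for faster repeated evaluation.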
[Alumni Database Management System] Led a 10-member team in developing an Alumni Database System for IIT Gandhinagar, from ER diagram to MySQL database creation, with Flask hosting and a Tailwind CSS-driven UI, facilitating role-specific access. Engineered data management features including insertion, updates, deletions, filtering, indexing, and custom searches, and empowered admins with data export, import, and mailing tools.
[Analysis of Web Cookies and Identifying Vulnerabilities] Conducted an in-depth analysis of the cookies of top websites using tools like Requests and Selenium. Scraped cookies from around 2500 IITGN websites using hakrawler on Kali Linux and analyzed vulnerabilities based on the TLS versions in use. Attempted basic ARP spoofing and XSS attacks on IITGN websites.
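A cookie-flag audit of the kind described above can be run over each scraped site's Set-Cookie headers; a stdlib-only sketch (header parsing deliberately simplified, not the project's tooling):

```python
def audit_set_cookie(headers):
    """Given a list of Set-Cookie header values, flag cookies that are
    missing the Secure or HttpOnly attribute, two common weaknesses."""
    findings = {}
    for header in headers:
        parts = [p.strip() for p in header.split(";")]
        # First segment is "name=value"; the rest are attributes.
        name = parts[0].split("=", 1)[0]
        attrs = {p.split("=", 1)[0].lower() for p in parts[1:]}
        missing = [f for f in ("Secure", "HttpOnly") if f.lower() not in attrs]
        if missing:
            findings[name] = missing
    return findings
```

In practice the header values would come from `requests` responses or the crawler's output, with a separate check recording each site's negotiated TLS version.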