Research — Ayushmaan Puri

My work sits at the intersection of performance and sustainability in computing. I'm drawn to questions about architectural tradeoffs: where to run computation, how to measure and optimize efficiency in resource-constrained environments, and what "sustainable" actually means at the systems level. I've been lucky to explore these questions across four very different research environments.

The Junkyard project repurposes discarded smartphones as compute nodes for distributed systems. My focus is autograding: in a recent offering of CSE 160, 330 students shared 40 reservable GPU pods on UCSD's DSMLP cluster. Around deadlines, demand reliably outpaced supply — students waited for feedback at exactly the moment they needed it most.

The insight is that student submissions don't need raw GPU power; they need a consistent, available execution environment. A cluster of repurposed phones can provide that at a fraction of the cost, using hardware that would otherwise be discarded. Phones are power-efficient, self-contained, and cheaper to source than GPUs. I'm investigating two core questions: how many phones are needed to serve a class of n students without meaningful queuing delays, and what is the throughput ceiling of a phone cluster before it degrades under load?
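One way to frame the first question is as a textbook queueing problem. The sketch below is only an illustration, not the project's actual model: it assumes submissions arrive as a Poisson process, grading times are exponentially distributed, and each phone grades one submission at a time (an M/M/c queue, solved with the Erlang C formula). The function names and the example numbers are hypothetical.

```python
import math


def erlang_c_wait_probability(servers: int, offered_load: float) -> float:
    """Probability that an arriving job has to queue in an M/M/c system.

    offered_load = arrival_rate * mean_service_time (in Erlangs).
    """
    if offered_load >= servers:
        return 1.0  # unstable regime: the queue grows without bound
    # Erlang B via the standard recurrence, then convert to Erlang C.
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    utilization = offered_load / servers
    return blocking / (1.0 - utilization * (1.0 - blocking))


def mean_queue_wait(servers: int, arrival_rate: float,
                    mean_service_time: float) -> float:
    """Mean time a job waits before a server (phone) picks it up."""
    offered_load = arrival_rate * mean_service_time
    if offered_load >= servers:
        return math.inf
    p_wait = erlang_c_wait_probability(servers, offered_load)
    return p_wait * mean_service_time / (servers - offered_load)


def phones_needed(arrival_rate: float, mean_service_time: float,
                  max_mean_wait: float) -> int:
    """Smallest number of phones whose mean queueing delay meets the target."""
    # Start from the smallest stable cluster size and grow until the target is met.
    servers = max(1, math.floor(arrival_rate * mean_service_time) + 1)
    while mean_queue_wait(servers, arrival_rate, mean_service_time) > max_mean_wait:
        servers += 1
    return servers


if __name__ == "__main__":
    # Hypothetical deadline-hour load: 2 submissions per minute,
    # 3 minutes of grading per submission, target mean wait of 30 seconds.
    print(phones_needed(arrival_rate=2.0, mean_service_time=3.0, max_mean_wait=0.5))
```

Real submission traffic near a deadline is bursty rather than Poisson, so a model like this is better read as a lower bound to check measurements against than as a sizing rule.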

I worked with Prof. Patrick Pannuto and Raymond Dueñas on CPU–GPU split inference for CNN models on NVIDIA Jetson devices. Although edge devices ship with multi-core CPUs, these processors are usually ignored for AI processing in favor of the “always superior” GPU.

The core question we explored: how can we make the CPU a part of the AI inference story on devices with unified memory?

Our observation: convolutions (which act like powerful image filters) are highly parallelizable and run 370–740x faster on the GPU than on the CPU, while the less parallelizable fully connected layers see only 37–38x gains.

Our hypothesis: we assign convolutions to the GPU and the remaining layers to the CPU. Unified memory lets both processors share data without copies, and a queue-based pipeline overlaps their work across batches, keeping each busy. A summary of our work can be found in this methods paper I wrote and submitted to UCSD's Summer Research Conference 2025.
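The queue-based handoff described above can be sketched structurally as a two-stage producer/consumer pipeline. This is a minimal illustration under stated assumptions, not the UnifiedSplitting implementation: `conv_stage` and `fc_stage` are hypothetical stand-ins that do toy arithmetic instead of running real layers, and a bounded `queue.Queue` plays the role of the shared buffer that unified memory would provide.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the batch stream


def conv_stage(batch):
    """Hypothetical stand-in for the convolutional layers (the GPU side)."""
    return [x * 2 for x in batch]


def fc_stage(features):
    """Hypothetical stand-in for the remaining layers (the CPU side)."""
    return sum(features)


def run_pipeline(batches, max_in_flight=2):
    """Run both stages concurrently, handing batches over through a queue."""
    handoff = queue.Queue(maxsize=max_in_flight)
    results = []

    def producer():
        for batch in batches:
            # put() blocks when the queue is full, so the first stage
            # cannot run arbitrarily far ahead of the second.
            handoff.put(conv_stage(batch))
        handoff.put(_DONE)

    def consumer():
        while True:
            features = handoff.get()
            if features is _DONE:
                break
            results.append(fc_stage(features))

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([[1, 2], [3, 4], [5]]))
```

While the second stage works on batch k, the first stage is free to start batch k+1, so steady-state throughput is bounded by the slower of the two stages rather than by their sum. With pure-Python stand-ins the two threads will not truly run in parallel; the sketch shows only the control flow.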

This project also produced my first research talk, presented at UCSD SRC 2025. For my work on UnifiedSplitting, I received an honorable mention in the CRA Outstanding Undergraduate Researcher Awards. I'm grateful to Pat and Raymond for taking a chance on me and teaching me how research actually works.

I spent a summer at IITD SeNSE working with Prof. Ravibabu Mulaveesala on signal processing for nondestructive testing. The work involved adapting an open-source pipeline to reconstruct subsurface images using Fourier and correlation transforms, even under noisy measurement conditions.
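To give a flavor of why correlation helps under noise, here is a generic matched-filtering sketch. It is not the lab's pipeline: the swept-frequency (chirp) excitation, the function names, and every parameter value are assumptions chosen for illustration. A known template is slid across a noisy record, and the lag with the largest dot product is taken as the delay estimate.

```python
import math
import random


def linear_chirp(length, f_start, f_end):
    """Sampled linear frequency sweep (frequencies in cycles per sample)."""
    sweep_rate = (f_end - f_start) / length
    return [math.sin(2 * math.pi * (f_start * n + 0.5 * sweep_rate * n * n))
            for n in range(length)]


def cross_correlate(signal, template):
    """Dot product of the template with the signal at every valid lag."""
    lags = len(signal) - len(template) + 1
    return [sum(signal[lag + i] * template[i] for i in range(len(template)))
            for lag in range(lags)]


def estimate_delay(signal, template):
    """Lag at which the template best matches the signal."""
    scores = cross_correlate(signal, template)
    return max(range(len(scores)), key=scores.__getitem__)


def simulate_measurement(template, delay, total_length, noise_std, seed=0):
    """Gaussian noise record with one delayed copy of the template added."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, noise_std) for _ in range(total_length)]
    for i, value in enumerate(template):
        signal[delay + i] += value
    return signal


if __name__ == "__main__":
    template = linear_chirp(128, 0.01, 0.2)
    noisy = simulate_measurement(template, delay=57, total_length=400, noise_std=0.5)
    print(estimate_delay(noisy, template))
```

The direct sum keeps the example dependency-free. For long records the same correlation is usually computed in the frequency domain with an FFT, which is where the Fourier transform enters.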

This was my first mentored research experience — and where I learned what it actually means to do science: reading papers critically, communicating findings clearly, working through problems as a team. Tea time with Ravi sir and the PhD students was also excellent.

I worked on the sensors and circuitry team for a lunar rover prototype, improving Arduino-based control code to synchronize UV and Hall effect sensors. The team repurposed an RC car as a base — which meant adding custom suspension to keep the sensors stable enough to collect consistent data.

Through signal processing improvements and low-level code refinements, we pushed pipeline efficiency up by ~65% and sensor precision by 2.5×. I presented our findings and poster to the NASA California Space Grant Consortium at the end of the summer.

Publications & Talks

SCCUR 2025 — Southern California Conference for Undergraduate Research Nov 2025

Presented UnifiedSplitting with Parth Mehta and Fahad Alkhazam at CSU Channel Islands. Abstract · Slides

UCSD Summer Research Conference (SRC) 2025 Aug 2025

CPU–GPU split inference optimizations on edge devices; results reproduced across multiple Jetson platforms. Abstract · Slides