I work with Prof. Patrick Pannuto and Raymond Dueñas on CPU–GPU split inference for CNN models on NVIDIA Jetson edge devices. The core question: given a model too large to run entirely on a GPU, at what layer should you cut it — and does the answer change when you optimize for energy rather than throughput?
This project produced my first full research paper, which was presented at UCSD SRC 2025 and is being integrated into a submission to an international conference (name redacted for double-blind review). I'm grateful to Pat and Raymond for taking a chance on me and teaching me how research actually works.