Matt Baughman
Postdoctoral Computational Scientist // Princeton University & PPPL
Hi there! I recently started a postdoc at Princeton University, working as an Associate Computational Scientist at the DOE's Princeton Plasma Physics Laboratory (PPPL). At PPPL, I work in the AI4Science group with Shantenu Jha. Before Princeton, I completed my Ph.D. in Computer Science with Globus Labs at the University of Chicago in August 2025, advised by Ian Foster and Kyle Chard. My research spans high-performance computing, distributed systems, and cost-aware computing. I completed my Bachelor's in Computer Science and Philosophy at Minerva University and have experience at Argonne National Laboratory, Opinary, and Google.
RESEARCH
DISSERTATIONS
Corralling the Computing Continuum: Enabling Multi-System Workflows with Serverless Computing [Aug 2025]
Doctoral Dissertation | PDF | Slides
ABSTRACT: The computing continuum describes the convergence of global compute infrastructure as network bandwidths increase. To mobilize that infrastructure, we need a system that ties these diverse resources together: we need to corral the computing continuum. This effort began with task-wise solutions addressing the individual components of task placement: profiling, predicting, and provisioning. These solutions enabled an early system that accounted for compute costs, workload execution profiles, and the ability to move computation between systems. We combined and extended these works into a more robust task scheduling system called DELTA and its successor, DELTA+. These systems incorporated task execution time, data transfer costs, and machine performance, but could not be used on batch-scheduled systems or in multi-node environments. While compute is the currency of the future, there is no unified way to access that currency. To fill this gap, we introduce Adaptive Task Management (ATM), a framework that acts as a multi-system task manager, mapping tasks to the many resources that comprise the continuum. ATM is built on top of the Globus Compute framework, using existing infrastructure from edge devices to batch-scheduled HPC systems. ATM includes a novel placement algorithm and novel monitoring and task management systems designed to accommodate both large batches of tasks and more complex DAG-based workflows. To ground and evaluate the development of these frameworks, we explore the application of cost-aware principles in federated learning, in materials design and protein docking applications, and in the performance optimization of serverless computing benchmarks.
Committee: Kyle Chard, Ian Foster, and Omer Rana (Cardiff University)
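To make the placement idea concrete, here is a minimal sketch (emphatically not ATM itself) of cost-aware task dispatch on top of the Globus Compute SDK. The endpoint IDs, cost estimates, and scoring weights are hypothetical placeholders invented for illustration; only the Executor/submit API is real.

    # A minimal sketch of cost-aware task placement over Globus Compute.
    # Endpoint IDs and all cost numbers below are invented for illustration;
    # running this for real requires Globus auth and actual endpoint UUIDs.
    from globus_compute_sdk import Executor

    # Hypothetical per-endpoint estimates: price ($/hour), expected runtime (hours).
    ENDPOINTS = {
        "edge-endpoint-uuid":  {"price": 0.00, "est_runtime": 2.0},  # edge device
        "cloud-endpoint-uuid": {"price": 0.09, "est_runtime": 0.5},  # cloud VM
        "hpc-endpoint-uuid":   {"price": 0.04, "est_runtime": 0.1},  # HPC allocation
    }

    def predicted_cost(stats, alpha=1.0, beta=0.5):
        # Illustrative score: weighted sum of dollar cost and time-to-solution.
        return alpha * stats["price"] * stats["est_runtime"] + beta * stats["est_runtime"]

    def simulate(n):
        # Stand-in for a science task (docking, materials design, an FL round, ...).
        return sum(i * i for i in range(n))

    def run_cost_aware(task, *args):
        # Greedy placement: submit to the endpoint with the lowest predicted cost.
        best = min(ENDPOINTS, key=lambda e: predicted_cost(ENDPOINTS[e]))
        with Executor(endpoint_id=best) as ex:
            return ex.submit(task, *args).result()

    if __name__ == "__main__":
        print(run_cost_aware(simulate, 10_000))

The greedy single-task rule here is just the simplest possible stand-in; the dissertation's placement algorithm also handles batches and DAG-based workflows.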
Profiling, Predicting, and Provisioning: Enabling Cost-Aware Computation for the Cloud and Modern Heterogeneous Environments [Aug 2021]
Master's Thesis | PDF | Slides
ABSTRACT: The growing prevalence of cloud resources and specialized hardware in the form of GPUs, ASICs, and IoT devices demands increasingly efficient and intentional use of these resources. Moreover, the complexity of choice presented by these diverse resources creates an optimization problem that is largely intractable to manual control. Modern computation in heterogeneous environments must therefore be executed in a cost-aware, automated fashion. This control system can be decomposed into three discrete tasks: profiling, prediction, and provisioning. We profile the execution characteristics of a range of workloads on a range of hardware. Given those characteristics, we optimize the choice of resources for workload deployment based on predicted cost. Finally, we seamlessly provision the necessary resources and deploy the workload on the chosen resource. In this thesis, we integrate several projects spanning the profiling, predicting, and provisioning cycle into a unified system for the cost-aware distribution of workloads in dynamic, heterogeneous computing environments. Specifically, we develop a modular profiling system that characterizes the execution performance of scientific workflows deployed on cloud resources, employ statistical analyses and machine learning to predict the cost of using preemptible cloud resources, examine the role of computational tradeoffs in various workloads, and build on existing Function-as-a-Service (FaaS) frameworks to demonstrate a novel, cost-aware function distribution system. Through this work, we demonstrate significant cost and time reductions for scientific workload execution while enabling function-based distributed computing in cost-aware, heterogeneous environments.
Committee: Kyle Chard, Ian Foster, and Hank Hoffmann
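As a pocket illustration of the profile, predict, provision loop, the sketch below compares the expected cost of a preemptible instance against an on-demand one under a simple exponential preemption model. All prices, rates, and the restart-from-scratch assumption are invented for illustration and are not results from the thesis.

    # Sketch: choose between preemptible and on-demand capacity by predicted cost.
    # The preemption model and every number here are illustrative assumptions.
    import math

    def expected_preemptible_cost(runtime_h, spot_price, hazard_per_h):
        # If preemptions arrive at rate hazard_per_h and force a restart from
        # scratch, a full run survives with probability p = exp(-hazard * T),
        # so the expected number of attempts is 1/p. Charging each attempt the
        # full runtime gives a pessimistic (upper-bound) cost estimate.
        p_survive = math.exp(-hazard_per_h * runtime_h)
        return (1.0 / p_survive) * spot_price * runtime_h

    def choose_resource(runtime_h, spot_price, ondemand_price, hazard_per_h):
        # Prediction step: compare expected costs; the provisioning step would
        # then launch whichever instance type wins.
        spot = expected_preemptible_cost(runtime_h, spot_price, hazard_per_h)
        ondemand = ondemand_price * runtime_h
        return ("preemptible", spot) if spot < ondemand else ("on-demand", ondemand)

    # Profiling says the workload runs ~3 hours; prediction does the rest.
    print(choose_resource(runtime_h=3.0, spot_price=0.03,
                          ondemand_price=0.10, hazard_per_h=0.05))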
PROJECTS
Check out all of my projects on GitHub.
SELECTED PUBLICATIONS
Listed most recent first.
ALL PUBLICATIONS
Listed most recent first and grouped by topic. A BibTeX file is available for download here.
PRESENTATIONS
Listed most recent first.