PROPOSAL
Extending GPU Memory Profiling Dataset with Transformer-Based Models
This project focuses on extending an existing dataset for predicting GPU memory requirements during deep learning training by incorporating transformer-based models such as BERT, GPT, and their variants. The student will study the architectures of these models and develop training scripts to run them under controlled conditions.
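As an illustration of what such a controlled training script could look like, the sketch below runs a single forward/backward step of a BERT classifier with fixed batch size, sequence length, and model dimensions. It assumes PyTorch, the Hugging Face transformers library, and a CUDA-capable GPU; the function name run_training_step and all parameter values are placeholders, not settings prescribed by the project.

    # Hypothetical sketch: one controlled training step for a BERT model.
    # Fixing (batch_size, seq_len, num_layers, hidden) per run lets each
    # memory measurement be attributed to a single, known configuration.
    import torch
    from transformers import BertConfig, BertForSequenceClassification

    def run_training_step(batch_size=8, seq_len=128, num_layers=12, hidden=768):
        config = BertConfig(num_hidden_layers=num_layers, hidden_size=hidden,
                            num_attention_heads=hidden // 64, num_labels=2)
        model = BertForSequenceClassification(config).cuda()
        optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

        # Synthetic batch: random token ids and labels with fixed shapes,
        # so runs are reproducible and comparable across configurations.
        input_ids = torch.randint(0, config.vocab_size,
                                  (batch_size, seq_len), device="cuda")
        labels = torch.randint(0, 2, (batch_size,), device="cuda")

        outputs = model(input_ids=input_ids, labels=labels)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    if __name__ == "__main__":
        run_training_step()

The same pattern extends to GPT-style models by swapping in the corresponding configuration and model classes.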
During training, key GPU metrics—including memory usage, utilization, and runtime characteristics—will be recorded. These observations will be used to build a richer dataset that links model specifications with their GPU resource needs. The extended dataset will help improve the accuracy and generalizability of GPU memory estimation models, particularly for transformer workloads. The project will also offer practical experience in profiling deep learning training and understanding the hardware demands of modern AI models.
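One way the metric collection could be implemented is sketched below. It assumes PyTorch's CUDA memory statistics and the pynvml bindings for NVML; the function profile_configuration, the train_step_fn callable (for example, run_training_step from the sketch above), and the CSV column names are illustrative and not part of the existing dataset schema.

    # Hypothetical sketch: record peak memory, device memory, utilization,
    # and step time for one configuration, and append the row to a CSV.
    import csv
    import time

    import pynvml
    import torch

    def profile_configuration(train_step_fn, batch_size, seq_len,
                              csv_path="gpu_profile.csv"):
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)

        torch.cuda.reset_peak_memory_stats()
        start = time.perf_counter()
        train_step_fn(batch_size=batch_size, seq_len=seq_len)
        torch.cuda.synchronize()          # wait for all GPU work to finish
        elapsed = time.perf_counter() - start

        row = {
            "batch_size": batch_size,
            "seq_len": seq_len,
            "peak_allocated_mb": torch.cuda.max_memory_allocated() / 2**20,
            "peak_reserved_mb": torch.cuda.max_memory_reserved() / 2**20,
            "device_memory_used_mb": pynvml.nvmlDeviceGetMemoryInfo(handle).used / 2**20,
            # utilization is an instantaneous NVML sample, not a per-step average
            "gpu_utilization_pct": pynvml.nvmlDeviceGetUtilizationRates(handle).gpu,
            "step_time_s": elapsed,
        }
        with open(csv_path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=row.keys())
            if f.tell() == 0:             # new file: write the header first
                writer.writeheader()
            writer.writerow(row)
        pynvml.nvmlShutdown()
        return row

Sweeping such a function over a grid of batch sizes, sequence lengths, and model configurations would produce the rows that link model specifications to their GPU resource needs.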