AI Systems Performance Engineering: Optimizing Model Training and Inference Workloads with GPUs, CUDA, and PyTorch by Chris Fregly

Regular price: Tk 1,100.00 BDT
Sale price: Tk 650.00 BDT
Sale Sold out
Shipping calculated at checkout.

🚚 Cash on delivery across Bangladesh 🕒 Nationwide delivery within 72 hours

The core thesis of AI Systems Performance Engineering is that throwing more hardware at an AI workload is an expensive and fundamentally flawed strategy; true scalability requires co-design of hardware, software, and algorithms. Fregly targets a widespread, costly failure pattern across modern tech companies: software teams deploy large generative or other neural network models with default, high-level configurations, without considering memory-bandwidth limits, compute-bound phases, or communication latency. The result is large server bills, low hardware utilization, and slow inference.
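As a rough illustration of the memory-bandwidth versus compute-bound distinction mentioned above, the following sketch applies the standard roofline rule of thumb: a kernel whose arithmetic intensity (FLOPs per byte moved) falls below the hardware's ratio of peak compute to peak bandwidth is limited by memory, not by compute. This is not taken from the book; the hardware figures and the function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Roofline-style classification sketch. All hardware numbers are
# hypothetical placeholders, not the specs of any real GPU.
PEAK_FLOPS = 100e12       # assumed peak compute, FLOP/s
PEAK_BANDWIDTH = 2e12     # assumed peak memory bandwidth, bytes/s


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte read from or written to memory."""
    return flops / bytes_moved


def classify(flops: float, bytes_moved: float) -> str:
    """Label a kernel as memory-bound or compute-bound under the assumed peaks."""
    ridge_point = PEAK_FLOPS / PEAK_BANDWIDTH  # FLOP/byte where the two limits meet
    if arithmetic_intensity(flops, bytes_moved) < ridge_point:
        return "memory-bound"
    return "compute-bound"


# Element-wise add of two float32 vectors: 1 FLOP per element,
# 12 bytes per element (two 4-byte reads, one 4-byte write).
n = 1_000_000
vector_add = classify(flops=n, bytes_moved=12 * n)

# Square float32 matrix multiply: 2*N^3 FLOPs, three N*N matrices moved once
# (an idealized lower bound on memory traffic).
N = 4096
matmul = classify(flops=2 * N**3, bytes_moved=3 * N * N * 4)

print(vector_add, matmul)
```

Under these assumed peaks the ridge point is 50 FLOP/byte, so the vector add sits far on the memory-bound side while the large matrix multiply does not. Adding more GPUs changes neither kernel's intensity.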

Instead of presenting high-level conceptual summaries or basic cloud service guides, Fregly digs deeply into low-level infrastructure tuning. He teaches engineers to look past raw processing speed and analyze "goodput": the actual volume of useful data a system processes per unit of time. The book covers bottleneck diagnosis in depth, using profiling tools such as NVIDIA Nsight Systems and PyTorch Profiler. It shows readers how to avoid restrictive C++ boilerplate by using modern compiler stacks such as OpenAI Triton to write high-impact custom GPU kernels, handle memory layouts effectively, and remove processing delays across large-scale distributed computing systems.
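The difference between raw throughput and goodput can be sketched in a few lines. This is a minimal illustration of the definition given above, not code from the book: the `Step` record, its `useful` flag, and the sample numbers are all hypothetical, standing in for whatever a real training or serving log would record about work that was later discarded (for example, steps redone after a restart).

```python
from dataclasses import dataclass


@dataclass
class Step:
    tokens: int      # tokens processed during this step
    seconds: float   # wall-clock time the step took
    useful: bool     # hypothetical flag: False if the work was later thrown away


def throughput(steps: list[Step]) -> float:
    """All tokens processed per second, whether or not the work was kept."""
    total_time = sum(s.seconds for s in steps)
    if total_time <= 0:
        raise ValueError("total wall-clock time must be positive")
    return sum(s.tokens for s in steps) / total_time


def goodput(steps: list[Step]) -> float:
    """Only the useful tokens per second, over the same wall-clock time."""
    total_time = sum(s.seconds for s in steps)
    if total_time <= 0:
        raise ValueError("total wall-clock time must be positive")
    return sum(s.tokens for s in steps if s.useful) / total_time


# Four one-second steps of 4096 tokens each; the third was wasted.
steps = [
    Step(4096, 1.0, True),
    Step(4096, 1.0, True),
    Step(4096, 1.0, False),
    Step(4096, 1.0, True),
]

print(throughput(steps), goodput(steps))
```

In this made-up log the system looks equally busy in every step, yet a quarter of the paid-for time produced nothing that was kept. That gap is what the goodput metric is meant to expose.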


As regional software infrastructure houses, dedicated AI cloud startups, and offshore engineering operations aggressively spin up large-scale machine learning systems, engineering groups are hitting a severe financial and operational barrier. Developers can easily build and test models on small, local datasets, but moving those systems onto production clusters frequently results in massive cloud expenses, slow system responsiveness, and broken data flows caused by bottlenecks in the underlying code.

AI Systems Performance Engineering delivers the low-level cure our industry requires. Chris Fregly combines his experience at tech giants like Netflix and AWS with highly detailed code implementations. By packing over a thousand pages with clear, actionable PyTorch, CUDA C++, and Triton engineering examples, and by providing a 200+ point optimization checklist, this book arms machine learning platform engineers, infrastructure leads, and systems architects with the precise skills needed to run massive models at maximum speed and lowest cost. It is an indispensable desk reference for serious tech teams.

Language: English.

Genre: High-Performance Computing.

Binding: Sewn binding

Quality: Premium Quality Books.

Printing: High Quality Printing.

Paper: Eye Friendly paper (Cream White)

Cover: Matte cover (Paperback).
