{"product_id":"ai-systems-performance-engineering","title":"AI Systems Performance Engineering: Optimizing Model Training and Inference Workloads with GPUs, CUDA, and PyTorch by Chris Fregly","description":"\u003ch2\u003eAI Systems Performance Engineering: Optimizing Model Training and Inference Workloads with GPUs, CUDA, and PyTorch by Chris Fregly\u003c\/h2\u003e\n\u003cp data-path-to-node=\"6\"\u003e\u003cspan class=\"\"\u003eThe core thesis of \u003c\/span\u003e\u003ci data-path-to-node=\"6\" data-index-in-node=\"19\" class=\"\"\u003eAI Systems Performance Engineering\u003c\/i\u003e\u003cspan class=\"\"\u003e is that \u003c\/span\u003e\u003cb data-path-to-node=\"6\" data-index-in-node=\"62\" class=\"\"\u003ethrowing more hardware at an AI workload is an expensive, fundamentally flawed strategy; true scalability requires hardware, software, and algorithmic co-design.\u003c\/b\u003e\u003cspan class=\"\"\u003e Fregly targets a widespread,\u003c\/span\u003e\u003cspan class=\"\"\u003e costly failure pattern across modern tech companies:\u003c\/span\u003e\u003cspan class=\"\"\u003e software teams regularly deploy large generative and other neural network models using default,\u003c\/span\u003e\u003cspan class=\"\"\u003e high-level configurations without considering memory bandwidth limits,\u003c\/span\u003e\u003cspan class=\"\"\u003e compute-bound phases,\u003c\/span\u003e\u003cspan class=\"\"\u003e or communication delays.\u003c\/span\u003e\u003cspan class=\"\"\u003e The result is large server bills,\u003c\/span\u003e\u003cspan class=\"\"\u003e low hardware utilization,\u003c\/span\u003e\u003cspan class=\"\"\u003e and high inference latency.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp data-path-to-node=\"7\"\u003e\u003cspan class=\"\"\u003eInstead of presenting high-level conceptual summaries or basic cloud service guides,\u003c\/span\u003e\u003cspan class=\"\"\u003e Fregly digs deeply into low-level infrastructure tuning.\u003c\/span\u003e\u003cspan 
class=\"\"\u003e He teaches engineers to look past raw throughput and analyze \"goodput\", the amount of useful work a system completes per unit of time.\u003c\/span\u003e\u003cspan class=\"\"\u003e The book covers in depth how to diagnose bottlenecks with profiling tools such as NVIDIA Nsight Systems and the PyTorch Profiler.\u003c\/span\u003e\u003cspan class=\"\"\u003e It shows readers how to avoid much of the usual C++ boilerplate by using compiler stacks such as \u003c\/span\u003e\u003cb data-path-to-node=\"7\" data-index-in-node=\"531\" class=\"\"\u003eOpenAI Triton\u003c\/b\u003e\u003cspan class=\"\"\u003e to write custom GPU kernels,\u003c\/span\u003e\u003cspan class=\"\"\u003e handle memory layouts effectively,\u003c\/span\u003e\u003cspan class=\"\"\u003e and remove processing delays across large-scale distributed systems.\u003c\/span\u003e\u003c\/p\u003e\n\u003cdiv class=\"code-block ng-tns-c3878679226-132 ng-animate-disabled ng-trigger ng-trigger-codeBlockRevealAnimation\" data-hveid=\"0\" data-ved=\"0CAAQhtANahgKEwjP-v3438mUAxUAAAAAHQAAAAAQggQ\"\u003e\n\u003cdiv class=\"formatted-code-block-internal-container ng-tns-c3878679226-132\"\u003e\n\u003cdiv class=\"animated-opacity ng-tns-c3878679226-132\"\u003e\n\u003cp data-path-to-node=\"29\"\u003e\u003cspan class=\"\"\u003eAs regional software infrastructure houses,\u003c\/span\u003e\u003cspan class=\"\"\u003e dedicated AI cloud startups,\u003c\/span\u003e\u003cspan class=\"\"\u003e and offshore engineering operations 
aggressively spin up large-scale machine learning systems,\u003c\/span\u003e\u003cspan class=\"\"\u003e engineering groups are hitting an intense financial and operational barrier.\u003c\/span\u003e\u003cspan class=\"\"\u003e While developers can effortlessly build and test models on small,\u003c\/span\u003e\u003cspan class=\"\"\u003e localized training data,\u003c\/span\u003e\u003cspan class=\"\"\u003e moving those systems into production clusters frequently results in massive cloud expenses,\u003c\/span\u003e\u003cspan class=\"\"\u003e slow system responsiveness,\u003c\/span\u003e\u003cspan class=\"\"\u003e and broken data flows due to underlying code bottlenecks.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp data-path-to-node=\"30\"\u003e\u003ci data-path-to-node=\"30\" data-index-in-node=\"0\" class=\"\"\u003eAI Systems Performance Engineering\u003c\/i\u003e\u003cspan class=\"\"\u003e delivers the definitive,\u003c\/span\u003e\u003cspan class=\"\"\u003e low-level cure our industry requires.\u003c\/span\u003e\u003cspan class=\"\"\u003e Chris Fregly merges his unmatched history at tech giants like Netflix and AWS with incredibly detailed code implementations.\u003c\/span\u003e\u003cspan class=\"\"\u003e By packing over a thousand pages with clear,\u003c\/span\u003e\u003cspan class=\"\"\u003e actionable PyTorch,\u003c\/span\u003e\u003cspan class=\"\"\u003e CUDA C++,\u003c\/span\u003e\u003cspan class=\"\"\u003e and Triton engineering examples—and providing an elite,\u003c\/span\u003e\u003cspan class=\"\"\u003e 200+ point optimization checklist—this book arms machine learning platform engineers,\u003c\/span\u003e\u003cspan class=\"\"\u003e infrastructure leads,\u003c\/span\u003e\u003cspan class=\"\"\u003e and systems architects with the precise skills needed to run massive models at maximum speed and lowest cost.\u003c\/span\u003e\u003cspan class=\"\"\u003e It is an indispensable desk reference for serious tech teams.\u003c\/span\u003e\u003c\/p\u003e\n\u003csection 
id=\"shopify-section-template--21334189375673__main\" class=\"shopify-section section\"\u003e\n\u003cdiv class=\"page-width\"\u003e\n\u003cdiv class=\"product product--medium product--left product--thumbnail product--mobile-hide grid grid--1-col grid--2-col-tablet\"\u003e\n\u003cdiv class=\"product__info-wrapper grid__item scroll-trigger animate--slide-in\"\u003e\n\u003csection id=\"ProductInfo-template--21334189375673__main\" class=\"product__info-container product__column-sticky\"\u003e\n\u003cdiv class=\"product__description rte quick-add-hidden\"\u003e\n\u003cdiv class=\"code-block ng-tns-c3299913081-108 ng-animate-disabled ng-trigger ng-trigger-codeBlockRevealAnimation\" data-hveid=\"0\" data-ved=\"0CAAQhtANahgKEwjvvJOByMeUAxUAAAAAHQAAAAAQ-gI\"\u003e\n\u003cdiv class=\"formatted-code-block-internal-container ng-tns-c3299913081-108\"\u003e\n\u003cdiv class=\"animated-opacity ng-tns-c3299913081-108\"\u003e\n\u003cp\u003e\u003cspan\u003e\u003cstrong\u003eLanguage: English.\u003c\/strong\u003e\u003c\/span\u003e\u003c\/p\u003e\n\u003cp\u003e\u003cspan\u003e\u003cstrong\u003eGenre: High-Performance Computing.\u003c\/strong\u003e\u003c\/span\u003e\u003c\/p\u003e\n\u003cp\u003e\u003cspan\u003e\u003cstrong\u003eBinding: Sewn binding.\u003c\/strong\u003e\u003c\/span\u003e\u003c\/p\u003e\n\u003cp\u003e\u003cspan\u003e\u003cstrong\u003eQuality: Premium Quality Books.\u003c\/strong\u003e\u003c\/span\u003e\u003c\/p\u003e\n\u003cp\u003e\u003cspan\u003e\u003cstrong\u003ePrinting: High Quality Printing.\u003c\/strong\u003e\u003c\/span\u003e\u003c\/p\u003e\n\u003cp\u003e\u003cspan\u003e\u003cstrong\u003ePaper: Eye-friendly paper (Cream White)\u003c\/strong\u003e\u003c\/span\u003e\u003c\/p\u003e\n\u003cp\u003e\u003cspan\u003e\u003cstrong\u003eCover: Matte cover 
(Paperback).\u003c\/strong\u003e\u003c\/span\u003e\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv id=\"shopify-block-AbTNjUDdGSldnOEd5a__judge_me_reviews_preview_badge_MPK6Xc\" class=\"shopify-block shopify-app-block\"\u003e\u003c\/div\u003e\n\u003c\/section\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/section\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e","brand":"Royal Books BD","offers":[{"title":"Default Title","offer_id":47231951405241,"sku":null,"price":650.0,"currency_code":"BDT","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0780\/0874\/6169\/files\/231191100.jpg?v=1779351764","url":"https:\/\/royalbooksbd.com\/products\/ai-systems-performance-engineering","provider":"Royal Books BD","version":"1.0","type":"link"}