General-purpose graphics processing units (GPGPUs) have emerged as an important class of shared-memory parallel processing architectures, with widespread deployment in every computer class from high-end supercomputers to embedded mobile platforms. Relative to today's traditional multicore systems, GPGPUs have distinctly higher degrees of hardware multithreading (hundreds of hardware thread contexts vs. tens), a return to wide vector units (several tens of lanes vs. 1 to 10), memory architectures that deliver higher peak memory bandwidth (hundreds of gigabytes per second vs. tens), and smaller caches and scratchpad memories (less than 1 megabyte vs. 1 to 10 megabytes).

In this book, we provide a high-level overview of current GPGPU architectures and programming models. We review the principles used in earlier shared-memory parallel platforms, focusing on recent results in both the theory and practice of parallel algorithms, and suggest how they connect to GPGPU platforms. We aim to help architects understand the algorithmic aspects of GPGPU computing. We also provide detailed performance analysis and use it to guide optimizations, from high-level algorithms down to low-level, instruction-level tuning. As a case study, we use n-body particle simulation with the fast multipole method (FMM). We also briefly survey the state of the art in GPU performance analysis tools and techniques.

Table of Contents: GPU Design, Programming, and Trends / Performance Principles / From Principles to Practice: Analysis and Tuning / Using Detailed Performance Analysis to Guide Optimization...
|Title||:||Performance Analysis and Tuning for General Purpose Graphics Processing Units (Synthesis Lectures on Computer Architecture)|
|Publisher||:||Morgan & Claypool; 1st edition (November 26, 2012)|
|Number of Pages||:||96 pages|
|File Size||:||763 KB|