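David B. Kirk is a member of the U.S. National Academy of Engineering, an NVIDIA Fellow, and a former Chief Scientist at NVIDIA. He led the development of NVIDIA's graphics technology and is one of the founders of CUDA technology. In 2002 he received the ACM SIGGRAPH Computer Graphics Achievement Award for his outstanding contributions to bringing high-performance computer graphics systems to the mass market. He holds a Ph.D. in computer science from the California Institute of Technology.

Wen-mei W. Hwu is the AMD Jerry Sanders Chair Professor in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign and chief scientist of the Parallel Computing Research Center, where he leads the research of the IMPACT group and the CUDA Center of Excellence. He has made outstanding contributions to compiler design, computer architecture, microarchitecture, and parallel computing. He is an IEEE Fellow and an ACM Fellow, and has received numerous awards, including the ACM SIGARCH Maurice Wilkes Award. He is also a co-founder and the CTO of MulticoreWare. He holds a Ph.D. in computer science from the University of California, Berkeley.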
Table of Contents
Preface
Acknowledgements

CHAPTER 1 Introduction
1.1 Heterogeneous Parallel Computing
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism?
1.4 Speeding Up Real Applications
1.5 Challenges in Parallel Programming
1.6 Parallel Programming Languages and Models
1.7 Overarching Goals
1.8 Organization of the Book
References

CHAPTER 2 Data Parallel Computing
2.1 Data Parallelism
2.2 CUDA C Program Structure
2.3 A Vector Addition Kernel
2.4 Device Global Memory and Data Transfer
2.5 Kernel Functions and Threading
2.6 Kernel Launch
2.7 Summary
    Function Declarations
    Kernel Launch
    Built-in (Predefined) Variables
    Run-time API
2.8 Exercises
References

CHAPTER 3 Scalable Parallel Execution
3.1 CUDA Thread Organization
3.2 Mapping Threads to Multidimensional Data
3.3 Image Blur: A More Complex Kernel
3.4 Synchronization and Transparent Scalability
3.5 Resource Assignment
3.6 Querying Device Properties
3.7 Thread Scheduling and Latency Tolerance
3.8 Summary
3.9 Exercises

CHAPTER 4 Memory and Data Locality
4.1 Importance of Memory Access Efficiency
4.2 Matrix Multiplication
4.3 CUDA Memory Types
4.4 Tiling for Reduced Memory Traffic
4.5 A Tiled Matrix Multiplication Kernel
4.6 Boundary Checks
4.7 Memory as a Limiting Factor to Parallelism
4.8 Summary
4.9 Exercises