Preface xv

CHAPTERS

1 Computer Abstractions and Technology 2
  1.1 Introduction 3
  1.2 Seven Great Ideas in Computer Architecture 10
  1.3 Below Your Program 13
  1.4 Under the Covers 16
  1.5 Technologies for Building Processors and Memory 24
  1.6 Performance 28
  1.7 The Power Wall 40
  1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors 43
  1.9 Real Stuff: Benchmarking the Intel Core i7 46
  1.10 Going Faster: Matrix Multiply in Python 49
  1.11 Fallacies and Pitfalls 50
  1.12 Concluding Remarks 53
  1.13 Historical Perspective and Further Reading 55
  1.14 Self-Study 55
  1.15 Exercises 59

2 Instructions: Language of the Computer 66
  2.1 Introduction 68
  2.2 Operations of the Computer Hardware 69
  2.3 Operands of the Computer Hardware 72
  2.4 Signed and Unsigned Numbers 79
  2.5 Representing Instructions in the Computer 86
  2.6 Logical Operations 93
  2.7 Instructions for Making Decisions 96
  2.8 Supporting Procedures in Computer Hardware 102
  2.9 Communicating with People 112
  2.10 MIPS Addressing for 32-Bit Immediates and Addresses 118
  2.11 Parallelism and Instructions: Synchronization 127
  2.12 Translating and Starting a Program 129
  2.13 A C Sort Example to Put It All Together 138
  2.14 Arrays versus Pointers 147
  2.15 Advanced Material: Compiling C and Interpreting Java 151
  2.16 Real Stuff: ARMv7 (32-bit) Instructions 151
  2.17 Real Stuff: ARMv8 (64-bit) Instructions 155
  2.18 Real Stuff: RISC-V Instructions 156
  2.19 Real Stuff: x86 Instructions 157
  2.20 Going Faster: Matrix Multiply in C 166
  2.21 Fallacies and Pitfalls 167
  2.22 Concluding Remarks 169
  2.23 Historical Perspective and Further Reading 172
  2.24 Self-Study 172
  2.25 Exercises 175

3 Arithmetic for Computers 186
  3.1 Introduction 188
  3.2 Addition and Subtraction 188
  3.3 Multiplication 193
  3.4 Division 199
  3.5 Floating Point 206
  3.6 Parallelism and Computer Arithmetic: Subword Parallelism 232
  3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86 234
  3.8 Going Faster: Subword Parallelism and Matrix Multiply 235
  3.9 Fallacies and Pitfalls 237
  3.10 Concluding Remarks 241
  3.11 Historical Perspective and Further Reading 245
  3.12 Self-Study 245
  3.13 Exercises 248

4 The Processor 254
  4.1 Introduction 256
  4.2 Logic Design Conventions 260
  4.3 Building a Datapath 263
  4.4 A Simple Implementation Scheme 271
  4.5 A Multicycle Implementation 284
  4.6 An Overview of Pipelining 285
  4.7 Pipelined Datapath and Control 298
  4.8 Data Hazards: Forwarding versus Stalling 315
  4.9 Control Hazards 328
  4.10 Exceptions 337
  4.11 Parallelism via Instructions 344
  4.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53 358
  4.13 Going Faster: Instruction-Level Parallelism and Matrix Multiply 366
  4.14 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations 368
  4.15 Fallacies and Pitfalls 369
  4.16 Concluding Remarks 370
  4.17 Historical Perspective and Further Reading 371
  4.18 Self-Study 371
  4.19 Exercises 372

5 Large and Fast: Exploiting Memory Hierarchy 390
  5.1 Introduction 392
  5.2 Memory Technologies 396
  5.3 The Basics of Caches 401
  5.4 Measuring and Improving Cache Performance 416
  5.5 Dependable Memory Hierarchy 436
  5.6 Virtual Machines 442
  5.7 Virtual Memory 446
  5.8 A Common Framework for Memory Hierarchy 472
  5.9 Using a Finite-State Machine to Control a Simple Cache 479
  5.10 Parallelism and Memory Hierarchies: Cache Coherence 484
  5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks 488
  5.12 Advanced Material: Implementing Cache Controllers 488
  5.13 Real Stuff: The ARM Cortex-A8 an......