This book was written by Professor Michael J. Quinn of Oregon State University, based on his many years of teaching a course on parallel programming. It presents parallel programming in C using MPI and OpenMP, covering parallel architectures, parallel algorithm design, message-passing programming, the Sieve of Eratosthenes, Floyd's algorithm, performance analysis, matrix-vector multiplication, document classification, Monte Carlo methods, matrix multiplication, solving linear systems, finite difference methods, sorting, the fast Fourier transform, combinatorial search, shared-memory programming, combining MPI and OpenMP, plus five appendices. The chapters are organized for classroom use and apply a four-step design methodology of partitioning, communication, agglomeration, and mapping to a range of practical parallel-programming problems, so that readers learn a systematic approach to parallel program design and can develop efficient parallel programs. Beyond serving as an excellent textbook on parallel programming, it is also a valuable reference for practitioners in related fields.
About the Author
An author biography for Parallel Programming in C with MPI and OpenMP (《并行程序設(shè)計C、MPI與OpenMP》) is not yet available.
Table of Contents
CHAPTER 1 Motivation and History 1: 1.1 Introduction 1 1.2 Modern Scientific Method 3 1.3 Evolution of Supercomputing 1.4 Modern Parallel Computers 1.4.1 The Cosmic Cube 6 1.4.2 Commercial Parallel Computers 6 1.4.3 Beowulf 7 1.4.4 Advanced Strategic Computing Initiative 8 1.5 Seeking Concurrency 9 1.5.1 Data Dependence Graphs 9 1.5.2 Data Parallelism 10 1.5.3 Functional Parallelism 10 1.5.4 Pipelining 12 1.5.5 Size Considerations 13 1.6 Data Clustering 14 1.7 Programming Parallel Computers 17 1.7.1 Extend a Compiler 17 1.7.2 Extend a Sequential Programming Language 18 1.7.3 Add a Parallel Programming Layer 19 1.7.4 Create a Parallel Language 19 1.7.5 Current Status 21 1.8 Summary 21 1.9 Key Terms 22 1.10 Bibliographic Notes 22 1.11 Exercises 23
CHAPTER 2 Parallel Architectures 27: 2.1 Introduction 27 2.2 Interconnection Networks 28 2.2.1 Shared versus Switched Media 28 2.2.2 Switch Network Topologies 29 2.2.3 2-D Mesh Network 29 2.2.4 Binary Tree Network 30 2.2.5 Hypertree Network 31 2.2.6 Butterfly Network 32 2.2.7 Hypercube Network 33 2.2.8 Shuffle-exchange Network 35 2.2.9 Summary 36 2.3 Processor Arrays 37 2.3.1 Architecture and Data-parallel Operations 37 2.3.2 Processor Array Performance 39 2.3.3 Processor Interconnection Network 40 2.3.4 Enabling and Disabling Processors 40 2.3.5 Additional Architectural Features 42 2.3.6 Shortcomings of Processor Arrays 42 2.4 Multiprocessors 43 2.4.1 Centralized Multiprocessors 43 2.4.2 Distributed Multiprocessors 45 2.5 Multicomputers 49 2.5.1 Asymmetrical Multicomputers 49 2.5.2 Symmetrical Multicomputers 51 2.5.3 Which Model Is Best for a Commodity Cluster? 52 2.5.4 Differences between Clusters and Networks of Workstations 53 2.6 Flynn's Taxonomy 54 2.6.1 SISD 54 2.6.2 SIMD 55 2.6.3 MISD 55 2.6.4 MIMD 56 2.7 Summary 58 2.8 Key Terms 59 2.9 Bibliographic Notes 59 2.10 Exercises 60
CHAPTER 3 Parallel Algorithm Design 63: 3.1 Introduction 63 3.2 The Task/Channel Model 63 3.3 Foster's Design Methodology 64 3.3.1 Partitioning 65 3.3.2 Communication 67 3.3.3 Agglomeration 68 3.3.4 Mapping 70 3.4 Boundary Value Problem 73 3.4.1 Introduction 73 3.4.2 Partitioning 75 3.4.3 Communication 75 3.4.4 Agglomeration and Mapping 76 3.4.5 Analysis 76 3.5 Finding the Maximum 77 3.5.1 Introduction 77 3.5.2 Partitioning 77 3.5.3 Communication 77 3.5.4 Agglomeration and Mapping 81 3.5.5 Analysis 82 3.6 The n-Body Problem 82 3.6.1 Introduction 82 3.6.2 Partitioning 83 3.6.3 Communication 83 3.6.4 Agglomeration and Mapping 85 3.6.5 Analysis 85 3.7 Adding Data Input 86 3.7.1 Introduction 86 3.7.2 Communication 87 3.7.3 Analysis 88 3.8 Summary 89 3.9 Key Terms 90 3.10 Bibliographic Notes 90 3.11 Exercises 90
CHAPTER 4 Message-Passing Programming 93: 4.1 Introduction 93 4.2 The Message-Passing Model 94 4.3 The Message-Passing Interface 95 4.4 Circuit Satisfiability 96 4.4.1 Function MPI_Init 99 4.4.2 Functions MPI_Comm_rank and MPI_Comm_size 99 4.4.3 Function MPI_Finalize 101 4.4.4 Compiling MPI Programs 102 4.4.5 Running MPI Programs 102 4.5 Introducing Collective Communication 104 4.5.1 Function MPI_Reduce 105 4.6 Benchmarking Parallel Performance 108 4.6.1 Functions MPI_Wtime and MPI_Wtick 108 4.6.2 Function MPI_Barrier 108 4.7 Summary 110 4.8 Key Terms 110 4.9 Bibliographic Notes 110 4.10 Exercises 111
CHAPTER 5 The Sieve of Eratosthenes 115: 5.1 Introduction 115 5.2 Sequential Algorithm 115 5.3 Sources of Parallelism 117 5.4 Data Decomposition Options 117 5.4.1 Interleaved Data Decomposition 118 5.4.2 Block Data Decomposition 118 5.4.3 Block Decomposition Macros 120 5.4.4 Local Index versus Global Index 120 5.4.5 Ramifications of Block Decomposition 121 5.5 Developing the Parallel Algorithm 121 5.5.1 Function MPI_Bcast 122 5.6 Analysis of Parallel Sieve Algorithm 122 5.7 Documenting the Parallel Program 123 5.8 Benchmarking 128 5.9 Improvements 129 5.9.1 Delete Even Integers 129 5.9.2 Eliminate Broadcast 130 5.9.3 Reorganize Loops 131 5.9.4 Benchmarking 131 5.10 Summary 133 5.11 Key Terms 134 5.12 Bibliographic Notes 134 5.13 Exercises 134
CHAPTER 6 Floyd's Algorithm 137: 6.1 Introduction 137 6.2 The All-Pairs Shortest-Path Problem 137 6.3 Creating Arrays at Run Time 139 6.4 Designing the Parallel Algorithm 140 6.4.1 Partitioning 140 6.4.2 Communication 141 6.4.3 Agglomeration and Mapping 142 6.4.4 Matrix Input/Output 143 6.5 Point-to-Point Communication 145 6.5.1 Function MPI_Send 146 6.5.2 Function MPI_Recv 147 6.5.3 Deadlock 148 6.6 Documenting the Parallel Program 149 6.7 Analysis and Benchmarking 151 6.8 Summary 154 6.9 Key Terms 154 6.10 Bibliographic Notes 154 6.11 Exercises 154
CHAPTER 7 Performance Analysis 159: 7.1 Introduction 159 7.2 Speedup and Efficiency 159 7.3 Amdahl's Law 161 7.3.1 Limitations of Amdahl's Law 164 7.3.2 The Amdahl Effect 164 7.4 Gustafson-Barsis's Law 164 7.5 The Karp-Flatt Metric 167 7.6 The Isoefficiency Metric 170 7.7 Summary 174 7.8 Key Terms 175 7.9 Bibliographic Notes 175 7.10 Exercises 176
CHAPTER 8 Matrix-Vector Multiplication 178: 8.1 Introduction 178 8.2 Sequential Algorithm 179 8.3 Data Decomposition Options 180 8.4 Rowwise Block-Striped Decomposition 181 8.4.1 Design and Analysis 181 8.4.2 Replicating a Block-Mapped Vector 183 8.4.3 Function MPI_Allgatherv 184 8.4.4 Replicated Vector Input/Output 186 8.4.5 Documenting the Parallel Program 187 8.4.6 Benchmarking 187 8.5 Columnwise Block-Striped Decomposition 189 8.5.1 Design and Analysis 189 8.5.2 Reading a Columnwise Block-Striped Matrix 191 8.5.3 Function MPI_Scatterv 191 8.5.4 Printing a Columnwise Block-Striped Matrix 193 8.5.5 Function MPI_Gatherv 193 8.5.6 Distributing Partial Results 195 8.5.7 Function MPI_Alltoallv 195 8.5.8 Documenting the Parallel Program 196 8.5.9 Benchmarking 198 8.6 Checkerboard Block Decomposition 199 8.6.1 Design and Analysis 199 8.6.2 Creating a Communicator 202 8.6.3 Function MPI_Dims_create 203 8.6.4 Function MPI_Cart_create 204 8.6.5 Reading a Checkerboard Matrix 205 8.6.6 Function MPI_Cart_rank 205 8.6.7 Function MPI_Cart_coords 207 8.6.8 Function MPI_Comm_split 207 8.6.9 Benchmarking 208 8.7 Summary 210 8.8 Key Terms 211 8.9 Bibliographic Notes 211 8.10 Exercises 211
CHAPTER 9 Document Classification 216: 9.1 Introduction 216 9.2 Parallel Algorithm Design 217 9.2.1 Partitioning and Communication 217 9.2.2 Agglomeration and Mapping 217 9.2.3 Manager/Worker Paradigm 218 9.2.4 Manager Process 219 9.2.5 Function MPI_Abort 220 9.2.6 Worker Process 221 9.2.7 Creating a Workers-only Communicator 223 9.3 Nonblocking Communications 223 9.3.1 Manager's Communication 224 9.3.2 Function MPI_Irecv 224 9.3.3 Function MPI_Wait 225 9.3.4 Workers' Communications 225 9.3.5 Function MPI_Isend 225 9.3.6 Function MPI_Probe 225 9.3.7 Function MPI_Get_count 226 9.4 Documenting the Parallel Program 226 9.5 Enhancements 232 9.5.1 Assigning Groups of Documents 232 9.5.2 Pipelining 232 9.5.3 Function MPI_Testsome 234 9.6 Summary 235 9.7 Key Terms 236 9.8 Bibliographic Notes 236 9.9 Exercises 236
CHAPTER 10 Monte Carlo Methods 239: 10.1 Introduction 239 10.1.1 Why Monte Carlo Works 240 10.1.2 Monte Carlo and Parallel Computing 243 10.2 Sequential Random Number Generators 243 10.2.1 Linear Congruential 244 10.2.2 Lagged Fibonacci 245 10.3 Parallel Random Number Generators 245 10.3.1 Manager-Worker Method 246 10.3.2 Leapfrog Method 246 10.3.3 Sequence Splitting 247 10.3.4 Parameterization 248 10.4 Other Random Number Distributions 248 10.4.1 Inverse Cumulative Distribution Function Transformation 249 10.4.2 Box-Muller Transformation 250 10.4.3 The Rejection Method 251 10.5 Case Studies 253 10.5.1 Neutron Transport 253 10.5.2 Temperature at a Point Inside a 2-D Plate 255 10.5.3 Two-Dimensional Ising Model 257 10.5.4 Room Assignment Problem 259 10.5.5 Parking Garage 262 10.5.6 Traffic Circle 264 10.6 Summary 268 10.7 Key Terms 269 10.8 Bibliographic Notes 269 10.9 Exercises 270
CHAPTER 11 Matrix Multiplication 273: 11.1 Introduction 273 11.2 Sequential Matrix Multiplication 274 11.2.1 Iterative, Row-Oriented Algorithm 274 11.2.2 Recursive, Block-Oriented Algorithm 275 11.3 Rowwise Block-Striped Parallel Algorithm 277 11.3.1 Identifying Primitive Tasks 277 11.3.2 Agglomeration 278 11.3.3 Communication and Further Agglomeration 279 11.3.4 Analysis 279 11.4 Cannon's Algorithm 281 11.4.1 Agglomeration 281 11.4.2 Communication 283 11.4.3 Analysis 284 11.5 Summary 286 11.6 Key Terms 287 11.7 Bibliographic Notes 287 11.8 Exercises 287
CHAPTER 12 Solving Linear Systems 290: 12.1 Introduction 290 12.2 Terminology 291 12.3 Back Substitution 292 12.3.1 Sequential Algorithm 292 12.3.2 Row-Oriented Parallel Algorithm 293 12.3.3 Column-Oriented Parallel Algorithm 295 12.3.4 Comparison 295 12.4 Gaussian Elimination 296 12.4.1 Sequential Algorithm 296 12.4.2 Parallel Algorithms 298 12.4.3 Row-Oriented Algorithm 299 12.4.4 Column-Oriented Algorithm 303 12.4.5 Comparison 303 12.4.6 Pipelined, Row-Oriented Algorithm 304 12.5 Iterative Methods 306 12.6 The Conjugate Gradient Method 309 12.6.1 Sequential Algorithm 309 12.6.2 Parallel Implementation 310 12.7 Summary 313 12.8 Key Terms 314 12.9 Bibliographic Notes 314 12.10 Exercises 314
CHAPTER 13 Finite Difference Methods 318: 13.1 Introduction 318 13.2 Partial Differential Equations 320 13.2.1 Categorizing PDEs 320 13.2.2 Difference Quotients 321 13.3 Vibrating String 322 13.3.1 Deriving Equations 322 13.3.2 Deriving the Sequential Program 323 13.3.3 Parallel Program Design 324 13.3.4 Isoefficiency Analysis 327 13.3.5 Replicating Computations 327 13.4 Steady-State Heat Distribution 329 13.4.1 Deriving Equations 329 13.4.2 Deriving the Sequential Program 330 13.4.3 Parallel Program Design 332 13.4.4 Isoefficiency Analysis 332 13.4.5 Implementation Details 334 13.5 Summary 334 13.6 Key Terms 335 13.7 Bibliographic Notes 335 13.8 Exercises 335
CHAPTER 14 Sorting 338: 14.1 Introduction 338 14.2 Quicksort 339 14.3 A Parallel Quicksort Algorithm 340 14.3.1 Definition of Sorted 340 14.3.2 Algorithm Development 341 14.3.3 Analysis 341 14.4 Hyperquicksort 343 14.4.1 Algorithm Description 343 14.4.2 Isoefficiency Analysis 345 14.5 Parallel Sorting by Regular Sampling 346 14.5.1 Algorithm Description 346 14.5.2 Isoefficiency Analysis 347 14.6 Summary 349 14.7 Key Terms 349 14.8 Bibliographic Notes 350 14.9 Exercises 350
CHAPTER 15 The Fast Fourier Transform 353: 15.1 Introduction 353 15.2 Fourier Analysis 353 15.3 The Discrete Fourier Transform 355 15.3.1 Inverse Discrete Fourier Transform 357 15.3.2 Sample Application: Polynomial Multiplication 357 15.4 The Fast Fourier Transform 360 15.5 Parallel Program Design 363 15.5.1 Partitioning and Communication 363 15.5.2 Agglomeration and Mapping 365 15.5.3 Isoefficiency Analysis 365 15.6 Summary 367 15.7 Key Terms 367 15.8 Bibliographic Notes 367 15.9 Exercises 367
CHAPTER 16 Combinatorial Search 369: 16.1 Introduction 369 16.2 Divide and Conquer 370 16.3 Backtrack Search 371 16.3.1 Example 371 16.3.2 Time and Space Complexity 374 16.4 Parallel Backtrack Search 374 16.5 Distributed Termination Detection 377 16.6 Branch and Bound 380 16.6.1 Example 380 16.6.2 Sequential Algorithm 382 16.6.3 Analysis 385 16.7 Parallel Branch and Bound 385 16.7.1 Storing and Sharing Unexamined Subproblems 386 16.7.2 Efficiency 387 16.7.3 Halting Conditions 387 16.8 Searching Game Trees 388 16.8.1 Minimax Algorithm 388 16.8.2 Alpha-Beta Pruning 392 16.8.3 Enhancements to Alpha-Beta Pruning 395 16.9 Parallel Alpha-Beta Search 395 16.9.1 Parallel Aspiration Search 396 16.9.2 Parallel Subtree Evaluation 396 16.9.3 Distributed Tree Search 397 16.10 Summary 399 16.11 Key Terms 400 16.12 Bibliographic Notes 400 16.13 Exercises 401
CHAPTER 17 Shared-Memory Programming 404: 17.1 Introduction 404 17.2 The Shared-Memory Model 405 17.3 Parallel for Loops 407 17.3.1 parallel for Pragma 408 17.3.2 Function omp_get_num_procs 410 17.3.3 Function omp_set_num_threads 410 17.4 Declaring Private Variables 410 17.4.1 private Clause 411 17.4.2 firstprivate Clause 412 17.4.3 lastprivate Clause 412 17.5 Critical Sections 413 17.5.1 critical Pragma 415 17.6 Reductions 415 17.7 Performance Improvements 417 17.7.1 Inverting Loops 417 17.7.2 Conditionally Executing Loops 418 17.7.3 Scheduling Loops 419 17.8 More General Data Parallelism 421 17.8.1 parallel Pragma 422 17.8.2 Function omp_get_thread_num 423 17.8.3 Function omp_get_num_threads 425 17.8.4 for Pragma 425 17.8.5 single Pragma 427 17.8.6 nowait Clause 427 17.9 Functional Parallelism 428 17.9.1 parallel sections Pragma 429 17.9.2 section Pragma 429 17.9.3 sections Pragma 429 17.10 Summary 430 17.11 Key Terms 432 17.12 Bibliographic Notes 432 17.13 Exercises 433
CHAPTER 18 Combining MPI and OpenMP 436: 18.1 Introduction 436 18.2 Conjugate Gradient Method 438 18.2.1 MPI Program 438 18.2.2 Functional Profiling 442 18.2.3 Parallelizing Function matrix_vector_product 442 18.2.4 Benchmarking 443 18.3 Jacobi Method 444 18.3.1 Profiling the MPI Program 444 18.3.2 Parallelizing Function find_steady_state 444 18.3.3 Benchmarking 446 18.4 Summary 448 18.5 Exercises 448
APPENDIX A MPI Functions 450
APPENDIX B Utility Functions 485: B.1 Header File MyMPI.h 485 B.2 Source File MyMPI.c 486
APPENDIX C Debugging MPI Programs 505: C.1 Introduction 505 C.2 Typical Bugs in MPI Programs 505 C.2.1 Bugs Resulting in Deadlock 505 C.2.2 Bugs Resulting in Incorrect Results 506 C.2.3 Advantages of Collective Communications 507 C.3 Practical Debugging Strategies 507
APPENDIX D Review of Complex Numbers 509
APPENDIX E OpenMP Functions 513
Bibliography 515
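For readers unfamiliar with the book's subject matter, the following is a minimal, illustrative sketch (not taken from the book) of the kind of hybrid C program its later chapters cover: each MPI process sums a block of integers with an OpenMP parallel-for reduction, and MPI_Reduce combines the per-process partial sums on process 0. The chosen problem (summing 0 through n-1), the value of n, and all variable names are assumptions made for this example only.

```c
/* Illustrative sketch: hybrid MPI + OpenMP summation in C.
 * Not taken from the book; problem, n, and names are assumptions. */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int id, p;                        /* process rank and process count */
    long long local = 0, total = 0;   /* per-process and global sums */
    const long long n = 1000000;      /* assumed problem size */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    /* Block decomposition of the range 0..n-1 across processes */
    long long first = id * n / p;
    long long last  = (id + 1) * n / p;

    /* OpenMP parallel-for reduction inside each MPI process */
    #pragma omp parallel for reduction(+:local)
    for (long long i = first; i < last; i++)
        local += i;

    /* Combine the partial sums on process 0 */
    MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (id == 0)
        printf("Sum of 0..%lld = %lld\n", n - 1, total);

    MPI_Finalize();
    return 0;
}
```

A program of this kind is typically built with an MPI wrapper compiler and the OpenMP flag enabled (for example, `mpicc -fopenmp`) and launched with `mpirun`, giving one MPI process per node and one OpenMP thread per core within each node.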