preface

chapter 1 introduction
  1.1 is pattern recognition important?
  1.2 features, feature vectors, and classifiers
  1.3 supervised versus unsupervised pattern recognition
  1.4 outline of the book

chapter 2 classifiers based on bayes decision theory
  2.1 introduction
  2.2 bayes decision theory
  2.3 discriminant functions and decision surfaces
  2.4 bayesian classification for normal distributions
  2.5 estimation of unknown probability density functions
    2.5.1 maximum likelihood parameter estimation
    2.5.2 maximum a posteriori probability estimation
    2.5.3 bayesian inference
    2.5.4 maximum entropy estimation
    2.5.5 mixture models
    2.5.6 nonparametric estimation
  2.6 the nearest neighbor rule

chapter 3 linear classifiers
  3.1 introduction
  3.2 linear discriminant functions and decision hyperplanes
  3.3 the perceptron algorithm
  3.4 least squares methods
    3.4.1 mean square error estimation
    3.4.2 stochastic approximation and the lms algorithm
    3.4.3 sum of error squares estimation
  3.5 mean square estimation revisited
    3.5.1 mean square error regression
    3.5.2 mse estimates posterior class probabilities
    3.5.3 the bias-variance dilemma
  3.6 support vector machines
    3.6.1 separable classes
    3.6.2 nonseparable classes

chapter 4 nonlinear classifiers
  4.1 introduction
  4.2 the xor problem
  4.3 the two-layer perceptron
    4.3.1 classification capabilities of the two-layer perceptron
  4.4 three-layer perceptrons
  4.5 algorithms based on exact classification of the training set
  4.6 the backpropagation algorithm
  4.7 variations on the backpropagation theme
  4.8 the cost function choice
  4.9 choice of the network size
  4.10 a simulation example
  4.11 networks with weight sharing
  4.12 generalized linear classifiers
  4.13 capacity of the l-dimensional space in linear dichotomies
  4.14 polynomial classifiers
  4.15 radial basis function networks
  4.16 universal approximators
  4.17 support vector machines: the nonlinear case
  4.18 decision trees
    4.18.1 set of questions
    4.18.2 splitting criterion
    4.18.3 stop-splitting rule
    4.18.4 class assignment rule
  4.19 discussion

chapter 5 feature selection
  5.1 introduction
  5.2 preprocessing
    5.2.1 outlier removal
    5.2.2 data normalization
    5.2.3 missing data
  5.3 feature selection based on statistical hypothesis testing
    5.3.1 hypothesis testing basics
    5.3.2 application of the t-test in feature selection
  5.4 the receiver operating characteristics (roc) curve
  5.5 class separability measures
    5.5.1 divergence
    5.5.2 chernoff bound and bhattacharyya distance
    5.5.3 scatter matrices
  5.6 feature subset selection
    5.6.1 scalar feature selection
    5.6.2 feature vector selection
  5.7 optimal feature generation
  5.8 neural networks and feature generation/selection
  5.9 a hint on the vapnik-chervonenkis learning theory

chapter 6 feature generation i: linear transforms
  6.1 introduction
  6.2 basis vectors and images
  6.3 the karhunen-loeve transform
  6.4 the singular value decomposition
  6.5 independent component analysis
    6.5.1 ica based on second- and fourth-order cumulants
    6.5.2 ica based on mutual information
    6.5.3 an ica simulation example
  6.6 the discrete fourier transform (dft)
    6.6.1 one-dimensional dft
    6.6.2 two-dimensional dft
  6.7 the discrete cosine and sine transforms
  6.8 the hadamard transform
  6.9 the haar transform
  6.10 the haar expansion revisited
  6.11 discrete time wavelet transform (dtwt)
  6.12 the multiresolution interpretation
  6.13 wavelet packets
  6.14 a look at two-dimensional generalizations
  6.15 applications

chapter 7 feature generation ii
  7.1 introduction
  7.2 regional features
    7.2.1 features for texture characterization
    7.2.2 local linear transforms for texture feature extraction
    7.2.3 moments
    7.2.4 parametric models
  7.3 features for shape and size characterization
    7.3.1 fourier features
    7.3.2 chain codes
    7.3.3 moment-based features
    7.3.4 geometric features
  7.4 a glimpse at fractals
    7.4.1 self-similarity and fractal dimension
    7.4.2 fractional brownian motion

chapter 8 template matching
  8.1 introduction
  8.2 measures based on optimal path searching techniques
    8.2.1 bellman's optimality principle and dynamic programming
    8.2.2 the edit distance
    8.2.3 dynamic time warping in speech recognition
  8.3 measures based on correlations
  8.4 deformable template models

chapter 9 context-dependent classification
  9.1 introduction
  9.2 the bayes classifier
  9.3 markov chain models
  9.4 the viterbi algorithm
  9.5 channel equalization
  9.6 hidden markov models
  9.7 training markov models via neural networks
  9.8 a discussion of markov random fields

chapter 10 system evaluation
  10.1 introduction
  10.2 error counting approach
  10.3 exploiting the finite size of the data set
  10.4 a case study from medical imaging

chapter 11 clustering: basic concepts
  11.1 introduction
    11.1.1 applications of cluster analysis
    11.1.2 types of features
    11.1.3 definitions of clustering
  11.2 proximity measures
    11.2.1 definitions
    11.2.2 proximity measures between two points
    11.2.3 proximity functions between a point and a set
    11.2.4 proximity functions between two sets

chapter 12 clustering algorithms i: sequential algorithms
  12.1 introduction
    12.1.1 number of possible clusterings
  12.2 categories of clustering algorithms
  12.3 sequential clustering algorithms
    12.3.1 estimation of the number of clusters
  12.4 a modification of bsas
  12.5 a two-threshold sequential scheme
  12.6 refinement stages
  12.7 neural network implementation
    12.7.1 description of the architecture
    12.7.2 implementation of the bsas algorithm

chapter 13 clustering algorithms ii: hierarchical algorithms
  13.1 introduction
  13.2 agglomerative algorithms
    13.2.1 definition of some useful quantities
    13.2.2 agglomerative algorithms based on matrix theory
    13.2.3 monotonicity and crossover
    13.2.4 implementational issues
    13.2.5 agglomerative algorithms based on graph theory
    13.2.6 ties in the proximity matrix
  13.3 the cophenetic matrix
  13.4 divisive algorithms
  13.5 choice of the best number of clusters

chapter 14 clustering algorithms iii: schemes based on function optimization
  14.1 introduction
  14.2 mixture decomposition schemes
    14.2.1 compact and hyperellipsoidal clusters
    14.2.2 a geometrical interpretation
  14.3 fuzzy clustering algorithms
    14.3.1 point representatives
    14.3.2 quadric surfaces as representatives
    14.3.3 hyperplane representatives
    14.3.4 combining quadric and hyperplane representatives
    14.3.5 a geometrical interpretation
    14.3.6 convergence aspects of the fuzzy clustering algorithms
    14.3.7 alternating cluster estimation
  14.4 possibilistic clustering
    14.4.1 the mode-seeking property
    14.4.2 an alternative possibilistic scheme
  14.5 hard clustering algorithms
    14.5.1 the isodata or k-means or c-means algorithm
  14.6 vector quantization

chapter 15 clustering algorithms iv
  15.1 introduction
  15.2 clustering algorithms based on graph theory
    15.2.1 minimum spanning tree algorithms
    15.2.2 algorithms based on regions of influence
    15.2.3 algorithms based on directed trees
  15.3 competitive learning algorithms
    15.3.1 basic competitive learning algorithm
    15.3.2 leaky learning algorithm
    15.3.3 conscientious competitive learning algorithms
    15.3.4 competitive learning-like algorithms associated with cost functions
    15.3.5 self-organizing maps
    15.3.6 supervised learning vector quantization
  15.4 branch and bound clustering algorithms
  15.5 binary morphology clustering algorithms (bmcas)
    15.5.1 discretization
    15.5.2 morphological operations
    15.5.3 determination of the clusters in a discrete binary set
    15.5.4 assignment of feature vectors to clusters
    15.5.5 the algorithmic scheme
  15.6 boundary detection algorithms
  15.7 valley-seeking clustering algorithms
  15.8 clustering via cost optimization (revisited)
    15.8.1 simulated annealing
    15.8.2 deterministic annealing
  15.9 clustering using genetic algorithms
  15.10 other clustering algorithms

chapter 16 cluster validity
  16.1 introduction
  16.2 hypothesis testing revisited
  16.3 hypothesis testing in cluster validity
    16.3.1 external criteria
    16.3.2 internal criteria
  16.4 relative criteria
    16.4.1 hard clustering
    16.4.2 fuzzy clustering
  16.5 validity of individual clusters
    16.5.1 external criteria
    16.5.2 internal criteria
  16.6 clustering tendency
    16.6.1 tests for spatial randomness

appendix a hints from probability and statistics
appendix b linear algebra basics
appendix c cost function optimization
appendix d basic definitions from linear systems theory

index