1 Introduction
1.1 Types of Uncertainty
1.2 Uncertainty Modeling and Data Mining
1.3 Related Works
References

2 Induction and Learning
2.1 Introduction
2.2 Machine Learning
2.2.1 Searching in Hypothesis Space
2.2.2 Supervised Learning
2.2.3 Unsupervised Learning
2.2.4 Instance-Based Learning
2.3 Data Mining and Algorithms
2.3.1 Why Do We Need Data Mining?
2.3.2 How Do We Do Data Mining?
2.3.3 Artificial Neural Networks
2.3.4 Support Vector Machines
2.4 Measurement of Classifiers
2.4.1 ROC Analysis for Classification
2.4.2 Area Under the ROC Curve
2.5 Summary
References

3 Label Semantics Theory
3.1 Uncertainty Modeling with Labels
3.1.1 Fuzzy Logic
3.1.2 Computing with Words
3.1.3 Mass Assignment Theory
3.2 Label Semantics
3.2.1 Epistemic View of Label Semantics
3.2.2 Random Set Framework
3.2.3 Appropriateness Degrees
3.2.4 Assumptions for Data Analysis
3.2.5 Linguistic Translation
3.3 Fuzzy Discretization
3.3.1 Percentile-Based Discretization
3.3.2 Entropy-Based Discretization
3.4 Reasoning with Fuzzy Labels
3.4.1 Conditional Distribution Given Mass Assignments
3.4.2 Logical Expressions of Fuzzy Labels
3.4.3 Linguistic Interpretation of Appropriate Labels
3.4.4 Evidence Theory and Mass Assignment
3.5 Label Relations
3.6 Summary
References

4 Linguistic Decision Trees for Classification
4.1 Introduction
4.2 Tree Induction
4.2.1 Entropy
4.2.2 Soft Decision Trees
4.3 Linguistic Decision for Classification
4.3.1 Branch Probability
4.3.2 Classification by LDT
4.3.3 Linguistic ID3 Algorithm
4.4 Experimental Studies
4.4.1 Influence of the Threshold
4.4.2 Overlapping Between Fuzzy Labels
4.5 Comparison Studies
4.6 Merging of Branches
4.6.1 Forward Merging Algorithm
4.6.2 Dual-Branch LDTs
4.6.3 Experimental Studies for Forward Merging
4.6.4 ROC Analysis for Forward Merging
4.7 Linguistic Reasoning
4.7.1 Linguistic Interpretation of an LDT
4.7.2 Linguistic Constraints
4.7.3 Classification of Fuzzy Data
4.8 Summary
References

5 Linguistic Decision Trees for Prediction
5.1 Prediction Trees
5.2 Linguistic Prediction Trees
5.2.1 Branch Evaluation
5.2.2 Defuzzification
5.2.3 Linguistic ID3 Algorithm for Prediction
5.2.4 Forward Branch Merging for Prediction
5.3 Experimental Studies
5.3.1 3D Surface Regression
5.3.2 Abalone and Boston Housing Problem
5.3.3 Prediction of Sunspots
5.3.4 Flood Forecasting
5.4 Query Evaluation
5.4.1 Single Queries
5.4.2 Compound Queries
5.5 ROC Analysis for Prediction
5.5.1 Predictors and Probabilistic Classifiers
5.5.2 AUC Value for Prediction
5.6 Summary
References

6 Bayesian Methods Based on Label Semantics
6.1 Introduction
6.2 Naive Bayes
6.2.1 Bayes Theorem
6.2.2 Fuzzy Naive Bayes
6.3 Fuzzy Semi-Naive Bayes
6.4 Online Fuzzy Bayesian Prediction
6.4.1 Bayesian Methods
6.4.2 Online Learning
6.5 Bayesian Estimation Trees
6.5.1 Bayesian Estimation Given an LDT
6.5.2 Bayesian Estimation from a Set of Trees
6.6 Experimental Studies
6.7 Summary
References

7 Unsupervised Learning with Label Semantics
7.1 Introduction
7.2 Non-Parametric Density Estimation
7.3 Clustering
7.3.1 Logical Distance
7.3.2 Clustering of Mixed Objects
7.4 Experimental Studies
7.4.1 Logical Distance Example
7.4.2 Images and Labels Clustering
7.5 Summary
References

8 Linguistic FOIL and Multiple Attribute Hierarchy for Decision Making
8.1 Introduction
8.2 Rule Induction
8.3 Multi-Dimensional Label Semantics
8.4 Linguistic FOIL
8.4.1 Information Heuristics for LFOIL
8.4.2 Linguistic Rule Generation
8.4.3 Class Probabilities Given a Rule Base
8.5 Experimental Studies
8.6 Multiple Attribute Decision Making
8.6.1 Linguistic Attribute Hierarchies
8.6.2 Information Propagation Using LDT
8.7 Summary
References

9 A Prototype Theory Interpretation of Label Semantics
9.1 Introduction
9.2 Prototype Semantics for Vague Concepts
9.2.1 Uncertainty Measures about the Similarity Neighborhoods Determined by Vague Concepts
9.2.2 Relating Prototype Theory and Label Semantics
9.2.3 Gaussian-Type Density Function
9.3 Vague Information Coarsening in Theory of Prototypes
9.4 Linguistic Inference Systems
9.5 Summary
References

10 Prototype Theory for Learning
10.1 Introduction
10.1.1 General Rule Induction Process
10.1.2 A Clustering Based Rule Coarsening
10.2 Linguistic Modeling of Time Series Predictions
10.2.1 Mackey-Glass Time Series Prediction
10.2.2 Prediction of Sunspots
10.3 Summary
References

11 Prototype-Based Rule Systems
11.1 Introduction
11.2 Prototype-Based IF-THEN Rules
11.3 Rule Induction Based on Data Clustering and Least-Square Regression
11.4 Rule Learning Using a Conjugate Gradient Algorithm
11.5 Applications in Prediction Problems
11.5.1 Surface Prediction
11.5.2 Mackey-Glass Time Series Prediction
11.5.3 Prediction of Sunspots
11.6 Summary
References

12 Information Cells and Information Cell Mixture Models
12.1 Introduction
12.2 Information Cell for Cognitive Representation of Vague Concept Semantics
12.3 Information Cell Mixture Model (ICMM) for Semantic Representation of Complex Concept
12.4 Learning Information Cell Mixture Model from Data Set
12.4.1 Objective Function Based on Positive Density Function
12.4.2 Updating Probability Distribution of Information Cells
12.4.3 Updating Density Functions of Information Cells
12.4.4 Information Cell Updating Algorithm
12.4.5 Learning Component Number of ICMM
12.5 Experimental Study
12.6 Summary
References