Part I Descriptive Statistics

Unit 1 Statistics 3
  1.1 What is Statistics? 4
    1.1.1 Meanings of Statistics 4
    1.1.2 Definition of Statistics 5
    1.1.3 Types of Statistics 6
    1.1.4 Applications of Statistics 6
  1.2 The Language of Statistics 9
    1.2.1 Population and Sample 9
    1.2.2 Kinds of Variables 11
  1.3 Measurability and Variability 14
  1.4 Data Collection 16
    1.4.1 The Data Collection Process 17
    1.4.2 Sampling Frame and Elements 18
  1.5* Single-Stage Methods 21
    1.5.1 Simple Random Sample 21
    1.5.2 Systematic Sample 22
  1.6* Multistage Methods 25
  1.7* Types of Statistical Study 27
  1.8 The Process of a Statistical Study 31
  Glossary 34
  Reading English Materials 35
    Passage 1. What is Statistics? 35
    Passage 2. From Data to Foresight 35
  Problems 36

Unit 2 Descriptive Analysis of Single-Variable Data 40
  2.1 Graphs, Pareto Diagrams, and Stem-and-Leaf Displays 41
    2.1.1 Qualitative Data 41
    2.1.2 Quantitative Data 43
  2.2 Frequency Distributions and Histograms 47
    2.2.1 Frequency Distribution 47
    2.2.2 Histograms 51
    2.2.3 Cumulative Frequency Distribution and Ogives 53
  2.3 Measures of Central Tendency 55
    2.3.1 Finding the Mean 55
    2.3.2 Finding the Median 56
    2.3.3 Finding the Mode 57
    2.3.4 Finding the Midrange 58
  2.4 Measures of Dispersion 60
    2.4.1 Sample Standard Deviation 62
  2.5 Measures of Position 64
    2.5.1 Quartiles 64
    2.5.2 Percentiles 64
    2.5.3 Other Measures of Position 66
  2.6 Interpreting and Understanding Standard Deviation 70
    2.6.1 The Empirical Rule and Testing for Normality 70
    2.6.2 Chebyshev’s Theorem 72
  Glossary 74
  Problems 75

Unit 3 Descriptive Analysis of Bivariate Data 79
  3.1 Bivariate Data 80
    3.1.1 Two Qualitative Variables 80
    3.1.2 One Qualitative and One Quantitative Variable 82
    3.1.3 Two Quantitative Variables 83
  3.2 Linear Correlation 85
    3.2.1 Calculating the Linear Correlation Coefficient, r 86
    3.2.2* Causation and Lurking Variables 89
  3.3 Linear Regression 91
    3.3.1 Line of Best Fit 92
    3.3.2 Making Predictions 97
  Reading English Materials 99
    Passage 1. The First Regression 99
    Passage 2. Simpson’s Paradox 99
  Problems 100

Unit 4 Introduction to Probability 104
  4.1 Sample Spaces, Events and Sets 105
    4.1.1 Introduction 105
    4.1.2 Sample Spaces 105
    4.1.3 Events 106
    4.1.4 Set Theory 108
  4.2 Probability Axioms and Simple Counting Problems 109
    4.2.1 Probability Axioms and Simple Properties 109
    4.2.2 Interpretations of Probability 111
    4.2.3 Classical Probability 112
    4.2.4 The Multiplication Principle 113
  4.3 Permutations and Combinations 115
    4.3.1 Introduction 115
    4.3.2 Permutations 116
    4.3.3 Combinations 118
    4.3.4 The Difference Between Permutations and Combinations 120
  4.4 Conditional Probability and the Multiplication Rule 122
    4.4.1 Conditional Probability 122
    4.4.2 The Multiplication Rule 123
  4.5 Independent Events, Partitions and Bayes’ Theorem 124
    4.5.1 Independence 124
    4.5.2 Partitions 125
    4.5.3 Law of Total Probability 126
    4.5.4 Bayes’ Theorem 126
    4.5.5 Bayes’ Theorem for Partitions 127
  Reading English Materials 130
    Passage 1. Probability and Odds 130
    Passage 2. The Relationship between Odds and Probability 130
    Passage 3. How the Odds Change across the Range of the Probability 131
  Problems 132

Unit 5 Discrete Probability Models 134
  5.1 Introduction, Mass Functions and Distribution Functions 135
    5.1.1 Introduction 135
    5.1.2 Probability Mass Functions (PMFs) 136
    5.1.3 Cumulative Distribution Functions (CDFs) 137
  5.2 Expectation and Variance for Discrete Random Quantities 138
    5.2.1 Expectation 138
    5.2.2 Variance 139
  5.3 Properties of Expectation and Variance 140
    5.3.1 Expectation of a Function of a Random Quantity 140
    5.3.2 Expectation of a Linear Transformation 140
    5.3.3 Expectation of the Sum of Two Random Quantities 141
    5.3.4 Expectation of an Independent Product 141
    5.3.5 Variance of an Independent Sum 142
  5.4 The Binomial Distribution 142
    5.4.1 Introduction 142
    5.4.2 Bernoulli Random Quantities 143
    5.4.3 The Binomial Distribution 143
    5.4.4 Expectation and Variance of a Binomial Random Quantity 145
  5.5 The Geometric Distribution 146
    5.5.1 PMF 146
    5.5.2 CDF 147
    5.5.3 Useful Series in Probability 148
    5.5.4 Expectation and Variance of Geometric Random Quantities 148
  5.6 The Poisson Distribution 149
    5.6.1 Poisson as the Limit of a Binomial 149
    5.6.2 PMF 150
    5.6.3 Expectation and Variance of Poisson 151
    5.6.4 Sum of Poisson Random Quantities 152
    5.6.5 The Poisson Process 152
  Reading English Materials 154
    Passage 1. The Founder of Modern Statistics: Karl Pearson 154
    Passage 2. The Relations of Several Discrete Probability Models 154
  Problems 155

Unit 6 Continuous Probability Models 158
  6.1 Introduction, PDF and CDF 159
    6.1.1 Introduction 159
    6.1.2 The Probability Density Function 159
    6.1.3 The Distribution Function 160
    6.1.4 Median and Quartiles 161
  6.2 Properties of Continuous Random Quantities 161
    6.2.1 Expectation and Variance of Continuous Random Quantities 161
    6.2.2 PDF and CDF of a Linear Transformation 162
  6.3 The Uniform Distribution 163
  6.4 The Exponential Distribution 165
    6.4.1 Definition and Properties 165
    6.4.2 Relationship with the Poisson Process 166
    6.4.3 The Memoryless Property 167
  6.5 The Normal Distribution 168
    6.5.1 Definition 168
    6.5.2 Properties 168
  6.6 The Standard Normal Distribution 169
    6.6.1 Properties of the Standard Normal Distribution 170
    6.6.2 Finding Area to the Right of z = 0 171
    6.6.3 Finding Area in the Right Tail of a Normal Curve 171
    6.6.4 Finding Area to the Left of a Positive z Value 172
    6.6.5 Finding Area from a Negative z to z = 0 172
    6.6.6 Finding Area in the Left Tail of a Normal Curve 172
    6.6.7 Finding Area from a Negative z to a Positive z 172
    6.6.8 Finding Area between Two z Values of the Same Sign 173
    6.6.9 Finding z-Scores Associated with a Percentile 173
    6.6.10 Finding z-Scores That Bound an Area 174
  6.7 Applications of Normal Distributions 175
    6.7.1 Probabilities and Normal Curves 175
    6.7.2 Using the Normal Curve and z 176
  6.8 Specific z-Score 178
    6.8.1 Visual Interpretation of z(α) 179
    6.8.2 Determining Corresponding z Values for z(α) 179
    6.8.3 Determining z-Scores for Bounded Areas 180
  6.9 Normal Approximation of Binomial and Poisson 181
    6.9.1 Normal Approximation of the Binomial 181
    6.9.2 Normal Approximation of the Poisson 182
  Problems 182

Unit 7 Sampling Distributions and CLT 187
  7.1 Sampling Distributions 188
    7.1.1 Forming a Sampling Distribution of Means 188
    7.1.2 Creating a Sampling Distribution of Sample Means 189
  7.2 The Sampling Distribution of Sample Means 192
    7.2.1 Central Limit Theorem 193
    7.2.2 Constructing a Sampling Distribution of Sample Means 194
  7.3 Application of the Sampling Distribution of Sample Means 199
    7.3.1 Converting Information into z-Scores 199
    7.3.2 Distribution of x̄ and Increasing Individual Sample Size 200
  7.4 Advanced Central Limit Theorem 202
    7.4.1 Central Limit Theorem (Sample Mean) 203
    7.4.2 Central Limit Theorem (Sample Sum) 203
  Problems 207

Part II Inferential Statistics

Unit 8 Introduction to Statistical Inferences 210
  8.1 Point Estimation and Interval Estimation 211
    8.1.1 Point Estimate 211
    8.1.2 Interval Estimate 212
  8.2 Estimation of Mean μ (σ Known) 214
    8.2.1 The Principle of Constructing a Confidence Interval 214
    8.2.2 Applications 216
    8.2.3 Sample Size and Confidence Interval 217
  8.3 Introduction to Hypothesis Testing 220
    8.3.1 Null Hypothesis and Alternative Hypothesis 220
    8.3.2 Four Possible Outcomes in a Hypothesis Test 222
  8.4 Formulating the Statistical Null and Alternative Hypotheses 226
    8.4.1 Writing Null and Alternative Hypotheses in a One-Tailed Situation 226
    8.4.2 Writing Null and Alternative Hypotheses in a Two-Tailed Situation 227
  8.5 Hypothesis Test of Mean μ (σ Known): A Probability-Value Approach 228
    8.5.1 One-Tailed Hypothesis Test Using the p-Value Approach 229
    8.5.2 Two-Tailed Hypothesis Test Using the p-Value Approach 233
    8.5.3 Evaluating the p-Value Approach 234
  8.6 Hypothesis Test of Mean μ (σ Known): A Classical Approach 235
    8.6.1 One-Tailed Hypothesis Test Using the Classical Approach 236
    8.6.2 Two-Tailed Hypothesis Test Using the Classical Approach 239
  Problems 241

Unit 9 Inferences Involving One Population 246
  9.1 Inferences about the Mean μ (σ Unknown) 247
    9.1.1 Using the t-Distribution Table 249
    9.1.2 Confidence Interval Procedure 251
    9.1.3 Hypothesis-Testing Procedure 252
  9.2 Inferences about the Binomial Probability of Success 258
    9.2.1 Confidence Interval Procedure 259
    9.2.2 Determining Sample Size 261
    9.2.3 Hypothesis-Testing Procedure 263
  9.3 Inferences about the Variance and Standard Deviation 268
    9.3.1 Critical Values of Chi-Square 269
    9.3.2 Hypothesis-Testing Procedure 270
  Problems 279

Unit 10 Inferences Involving Two Populations 284
  10.1 Dependent and Independent Samples 285
  10.2 Inferences Concerning the Mean Difference Using Two Dependent Samples 287
    10.2.1 Procedures and Assumptions for Inferences Involving Paired Data 287
    10.2.2 Confidence Interval Procedure 288
    10.2.3 Hypothesis-Testing Procedure 290
  10.3 Inferences Concerning the Difference between Means Using Two Independent Samples 294
    10.3.1 Confidence Interval Procedure 295
    10.3.2 Hypothesis-Testing Procedure 297
  10.4 Inferences Concerning the Difference between Proportions 301
    10.4.1 Confidence Interval Procedure 303
    10.4.2 Hypothesis-Testing Procedure 304
  10.5 Inferences Concerning the Ratio of Variances Using Two Independent Samples 308
    10.5.1 Writing Hypotheses for the Equality of Variances 308
    10.5.2 Using the F-Distribution 309
    10.5.3 One-Tailed Hypothesis Test for the Equality of Variances 310
    10.5.4 Critical F-Values for One- and Two-Tailed Tests 313
  Problems 315

Unit 11 An Introduction to Simple Regression 321
  11.1 Regression as a Best Fitting Line 322
    11.1.1 Regression as a Best Fitting Line 322
    11.1.2 Errors and Residuals 324
  11.2 Interpreting OLS Estimates 326
  11.3 Fitted Values and R²: Measuring the Fit of a Regression Model 328
  11.4 Nonlinearity in Regression 331
  Reading English Materials 335
  Problems 336

Part III Statistical Methods and Data Science

Unit 12 Statistics and Data Science 339
  12.1 Statistics and Data Science (I) 340
    12.1.1 What is Data Science 340
    12.1.2 Statistics and Data Science 340
  12.2 Statistics and Data Science (II) 343
    12.2.1 Statistics as Part of Data Science 343
    12.2.2 The Modern Statistical Analysis Process 344
    12.2.3 Statistician and Data Scientist 345
  12.3 Statistical Thinking 348
    12.3.1 What is Statistical Thinking 348
    12.3.2 The Two Cultures of Statistical Modeling 348
    12.3.3 A New Research Community 350
  12.4 Distinguishing Analytics, Business Intelligence, Data Science 352
    12.4.1 Analytics 352
    12.4.2 Business Intelligence 355
    12.4.3 Data Science 356
  Reading English Materials 359
  Problems 361

Commonly Used Statistical Terms 362
Appendix A Commonly Used Statistical Tables 367
Appendix B Summary of Univariate Descriptive Statistics and Graphs for the Four Levels of Measurement 379
Appendix C Order of Magnitude of Data 380
References 381