Data Mining: Concepts and Techniques (English Edition, 2nd Edition)

List price: ¥79.00

Author: Jiawei Han (Canada)
Publisher: China Machine Press
Series: Classic Original-Edition Library
Tags: Database Storage and Management

ISBN: 9787111188285   Publication date: 2006-04-01   Binding: Paperback
Format: 16-kai   Pages: 770

Book Description

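"The second edition updates and improves on the already rich and comprehensive first edition and adds important new topics, such as mining stream data, mining social networks, and mining spatial, multimedia, and other complex data. This book will be an excellent textbook for courses in data mining and knowledge discovery."
(Gregory Piatetsky-Shapiro, President, KDnuggets)

"The second edition is the most complete and comprehensive treatment of the key knowledge and technical innovations in the field of data mining. Compared with the already quite comprehensive first edition, it presents the field's latest research results, such as mining streams, time-series data, and sequence data, as well as mining spatial, multimedia, text, and Web data. This book is essential reading for all teachers, researchers, developers, and users in data mining and knowledge discovery."
(Hans-Peter Kriegel, University of Munich, Germany)

Our ability to generate and collect data is growing rapidly. Data comes from the increasing computerization of most business, scientific, and government transactions, and also from the widespread use of digital cameras, publishing tools, and bar codes. On the collection side, scanned text and image platforms, satellite remote-sensing systems, and the Internet have surrounded us with enormous volumes of data. This explosive growth makes the need for new techniques and automated tools more urgent than ever. Such tools must help us turn this data into useful information and knowledge.

KDnuggets readers voted the first edition the most popular data mining book, and it is a highly readable textbook. It gives a comprehensive, systematic introduction to the basic concepts, methods, and techniques of data mining from a database perspective. It also covers research advances, with a focus on feasibility, usefulness, effectiveness, and scalability. Since the first edition appeared, however, data mining research has made considerable progress and produced new methods, systems, and applications. The second edition strengthens its coverage of these developments. It adds several chapters on recent methods for mining complex types of data, including stream data, sequence data, graph-structured data, social network data, and multirelational data.

The book is suitable as an elective textbook for senior undergraduates in computer science and related majors and is especially well suited as a graduate course textbook. It can also serve as an essential reference for practitioners engaged in data mining research and application development.

Key features of this book:
● A comprehensive, practical treatment of the concepts and techniques you need to know to extract value from real business data.
● Updated to incorporate reader feedback, technical changes in the data mining field, and more material on statistics and machine learning.
● Includes many algorithms and implementation examples, all written in easy-to-understand pseudocode and suitable for real-world, large-scale data mining projects.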

About the Authors

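Jiawei Han is a professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He has received many honors and awards for his research in data mining and database systems, including the 2004 ACM SIGKDD Innovation Award. He is Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data. He also serves on the editorial boards of IEEE Transactions on Knowledge and Data Engineering and Data Mining and Knowledge Discovery.

Micheline Kamber holds a master's degree in computer science from Concordia University, Canada, and is currently a postdoctoral researcher at Simon Fraser University, Canada.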

Table of Contents

Foreword
Preface
Chapter 1 Introduction
1.1   What Motivated Data Mining? Why Is It Important?
1.2  So, What Is Data Mining?
1.3   Data Mining: On What Kind of Data?
1.3.1  Relational Databases
1.3.2 Data Warehouses
1.3.3  Transactional Databases
1.3.4  Advanced Data and Information Systems and Advanced Applications
1.4   Data Mining Functionalities: What Kinds of Patterns Can Be Mined?
1.4.1  Concept/Class Description: Characterization and Discrimination
1.4.2  Mining Frequent Patterns, Associations, and Correlations
1.4.3 Classification and Prediction
1.4.4 Cluster Analysis
1.4.5 Outlier Analysis
1.4.6  Evolution Analysis
1.5   Are All of the Patterns Interesting?
1.6   Classification of Data Mining Systems
1.7   Data Mining Task Primitives
1.8   Integration of a Data Mining System with a Database or Data Warehouse System
1.9   Major Issues in Data Mining
1.10  Summary
Exercises
Bibliographic Notes
Chapter 2 Data Preprocessing
2.1   Why Preprocess the Data?
2.2   Descriptive Data Summarization
2.2.1  Measuring the Central Tendency
2.2.2 Measuring the Dispersion of Data
2.2.3  Graphic Displays of Basic Descriptive Data Summaries
2.3   Data Cleaning
2.3.1  Missing Values
2.3.2 Noisy Data
2.3.3 Data Cleaning as a Process
2.4   Data Integration and Transformation
2.4.1  Data Integration
2.4.2  Data Transformation
2.5   Data Reduction
2.5.1  Data Cube Aggregation
2.5.2 Attribute Subset Selection
2.5.3  Dimensionality Reduction
2.5.4 Numerosity Reduction
2.6   Data Discretization and Concept Hierarchy Generation
2.6.1 Discretization and Concept Hierarchy Generation for Numerical Data
2.6.2  Concept Hierarchy Generation for Categorical Data
2.7   Summary
Exercises
Bibliographic Notes
Chapter 3 Data Warehouse and OLAP Technology: An Overview
3.1   What Is a Data Warehouse?
3.1.1  Differences between Operational Database Systems and Data Warehouses
3.1.2  But, Why Have a Separate Data Warehouse?
3.2   A Multidimensional Data Model
3.2.1 From Tables and Spreadsheets to Data Cubes
3.2.2  Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases
3.2.3  Examples for Defining Star, Snowflake, and Fact Constellation Schemas
3.2.4  Measures: Their Categorization and Computation
3.2.5 Concept Hierarchies
3.2.6  OLAP Operations in the Multidimensional Data Model
3.2.7  A Starnet Query Model for Querying Multidimensional Databases
3.3   Data Warehouse Architecture
3.3.1  Steps for the Design and Construction of Data Warehouses
3.3.2 A Three-Tier Data Warehouse Architecture
3.3.3  Data Warehouse Back-End Tools and Utilities
3.3.4 Metadata Repository
3.3.5  Types of OLAP Servers: ROLAP versus MOLAP versus HOLAP
3.4   Data Warehouse Implementation
3.4.1  Efficient Computation of Data Cubes
3.4.2 Indexing OLAP Data
3.4.3 Efficient Processing of OLAP Queries
3.5   From Data Warehousing to Data Mining
3.5.1  Data Warehouse Usage
3.5.2  From On-Line Analytical Processing to On-Line Analytical Mining
3.6   Summary
Exercises
Bibliographic Notes
Chapter 4 Data Cube Computation and Data Generalization
4.1   Efficient Methods for Data Cube Computation
4.1.1  A Road Map for the Materialization of Different Kinds of Cubes
4.1.2  Multiway Array Aggregation for Full Cube Computation
4.1.3  BUC: Computing Iceberg Cubes from the Apex Cuboid Downward
4.1.4  Star-cubing: Computing Iceberg Cubes Using a Dynamic Star-tree Structure
4.1.5  Precomputing Shell Fragments for Fast High-Dimensional OLAP
4.1.6 Computing Cubes with Complex Iceberg Conditions
4.2   Further Development of Data Cube and OLAP
4.3   Attribute-Oriented Induction: An Alternative Method for Data Generalization and Concept Description
4.3.1  Attribute-Oriented Induction for Data Characterization
4.3.2  Efficient Implementation of Attribute-Oriented Induction
4.3.3  Presentation of the Derived Generalization
4.3.4  Mining Class Comparisons: Discriminating between Different Classes
4.3.5  Class Description: Presentation of Both Characterization and Comparison
4.4   Summary
Exercises
Bibliographic Notes
Chapter 5 Mining Frequent Patterns, Associations, and Correlations
5.1   Basic Concepts and a Road Map
5.1.1 Market Basket Analysis: A Motivating Example
5.1.2 Frequent Itemsets, Closed Itemsets, and Association Rules
5.1.3  Frequent Pattern Mining: A Road Map
5.2   Efficient and Scalable Frequent Itemset Mining Methods
5.2.1  The Apriori Algorithm: Finding Frequent Itemsets Using Candidate Generation
5.2.2  Generating Association Rules from Frequent Itemsets
5.2.3  Improving the Efficiency of Apriori
5.2.4  Mining Frequent Itemsets without Candidate Generation
5.2.5  Mining Frequent Itemsets Using Vertical Data Format
5.2.6  Mining Closed Frequent Itemsets
5.3   Mining Various Kinds of Association Rules
5.3.1  Mining Multilevel Association Rules
5.3.2 Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
5.4   From Association Mining to Correlation Analysis
5.4.1  Strong Rules Are Not Necessarily Interesting: An Example
5.4.2 From Association Analysis to Correlation Analysis
5.5   Constraint-Based Association Mining
5.5.1  Metarule-Guided Mining of Association Rules
5.5.2 Constraint Pushing: Mining Guided by Rule Constraints
5.6  Summary
Exercises
Bibliographic Notes
Chapter 6 Classification and Prediction
6.1   What Is Classification? What Is Prediction?
6.2   Issues Regarding Classification and Prediction
6.2.1  Preparing the Data for Classification and Prediction
6.2.2 Comparing Classification and Prediction Methods
6.3   Classification by Decision Tree Induction
6.3.1  Decision Tree Induction
6.3.2 Attribute Selection Measures
6.3.3 Tree Pruning
6.3.4 Scalability and Decision Tree Induction
6.4  Bayesian Classification
6.4.1  Bayes' Theorem
6.4.2  Naive Bayesian Classification
6.4.3  Bayesian Belief Networks
6.4.4 Training Bayesian Belief Networks
6.5   Rule-Based Classification
6.5.1  Using IF-THEN Rules for Classification
6.5.2  Rule Extraction from a Decision Tree
6.5.3  Rule Induction Using a Sequential Covering Algorithm
6.6   Classification by Backpropagation
6.6.1  A Multilayer Feed-Forward Neural Network
6.6.2 Defining a Network Topology
6.6.3  Backpropagation
6.6.4 Inside the Black Box: Backpropagation and Interpretability
6.7   Support Vector Machines
6.7.1  The Case When the Data Are Linearly Separable
6.7.2 The Case When the Data Are Linearly Inseparable
6.8  Associative Classification: Classification by Association Rule Analysis
6.9   Lazy Learners (or Learning from Your Neighbors)
6.9.1  k-Nearest-Neighbor Classifiers
6.9.2 Case-Based Reasoning
6.10  Other Classification Methods
6.10.1 Genetic Algorithms
6.10.2 Rough Set Approach
6.10.3 Fuzzy Set Approaches
6.11  Prediction
6.11.1 Linear Regression
6.11.2 Nonlinear Regression
6.11.3 Other Regression-Based Methods
7.6.2  OPTICS: Ordering Points to Identify the Clustering Structure
7.6.3  DENCLUE: Clustering Based on Density Distribution Functions
7.7  Grid-Based Methods
7.7.1  STING: STatistical INformation Grid
7.7.2 WaveCluster: Clustering Using Wavelet Transformation
7.8  Model-Based Clustering Methods
7.8.1  Expectation-Maximization
7.8.2 Conceptual Clustering
7.8.3 Neural Network Approach
7.9   Clustering High-Dimensional Data
7.9.1 CLIQUE: A Dimension-Growth Subspace Clustering Method
7.9.2 PROCLUS: A Dimension-Reduction Subspace Clustering Method
7.9.3  Frequent Pattern-Based Clustering Methods
7.10  Constraint-Based Cluster Analysis
7.10.1 Clustering with Obstacle Objects
7.10.2 User-Constrained Cluster Analysis
7.10.3 Semi-Supervised Cluster Analysis
7.11  Outlier Analysis
7.11.1 Statistical Distribution-Based Outlier Detection
7.11.2 Distance-Based Outlier Detection
7.11.3 Density-Based Local Outlier Detection
7.11.4 Deviation-Based Outlier Detection
7.12  Summary
Exercises
Bibliographic Notes
Chapter 8 Mining Stream, Time-Series, and Sequence Data
8.1   Mining Data Streams
8.1.1  Methodologies for Stream Data Processing and Stream Data Systems
8.1.2 Stream OLAP and Stream Data Cubes
8.1.3  Frequent-Pattern Mining in Data Streams
8.1.4 Classification of Dynamic Data Streams
8.1.5  Clustering Evolving Data Streams
8.2   Mining Time-Series Data
8.2.1  Trend Analysis
8.2.2 Similarity Search in Time-Series Analysis
8.3   Mining Sequence Patterns in Transactional Databases
8.3.1  Sequential Pattern Mining: Concepts and Primitives
8.3.2  Scalable Methods for Mining Sequential Patterns
8.3.3  Constraint-Based Mining of Sequential Patterns
8.3.4  Periodicity Analysis for Time-Related Sequence Data
8.4   Mining Sequence Patterns in Biological Data
8.4.1  Alignment of Biological Sequences
8.4.2 Hidden Markov Model for Biological Sequence Analysis
8.5   Summary
Exercises
Bibliographic Notes
Chapter 9 Graph Mining, Social Network Analysis, and Multirelational Data Mining
9.1   Graph Mining
9.1.1  Methods for Mining Frequent Subgraphs
9.1.2 Mining Variant and Constrained Substructure Patterns
9.1.3  Applications: Graph Indexing, Similarity Search, Classification, and Clustering
9.2   Social Network Analysis
9.2.1  What Is a Social Network?
9.2.2  Characteristics of Social Networks
9.2.3  Link Mining: Tasks and Challenges
9.2.4  Mining on Social Networks
9.3   Multirelational Data Mining
9.3.1  What Is Multirelational Data Mining?
9.3.2  ILP Approach to Multirelational Classification
9.3.3 Tuple ID Propagation
9.3.4 Multirelational Classification Using Tuple ID Propagation
9.3.5  Multirelational Clustering with User Guidance
9.4   Summary
Exercises
Bibliographic Notes
Chapter  10 Mining Object, Spatial, Multimedia, Text, and Web Data
10.1 Multidimensional Analysis and Descriptive Mining of Complex Data Objects
10.1.1 Generalization of Structured Data
10.1.2 Aggregation and Approximation in Spatial and Multimedia Data Generalization
10.1.3 Generalization of Object Identifiers and Class/Subclass Hierarchies
10.1.4 Generalization of Class Composition Hierarchies
10.1.5 Construction and Mining of Object Cubes
10.1.6 Generalization-Based Mining of Plan Databases by Divide-and-Conquer
10.2  Spatial Data Mining
10.2.1 Spatial Data Cube Construction and Spatial OLAP
10.2.2 Mining Spatial Association and Co-location Patterns
10.2.3 Spatial Clustering Methods
10.2.4 Spatial Classification and Spatial Trend Analysis
10.2.5 Mining Raster Databases
10.3  Multimedia Data Mining
10.3.1 Similarity Search in Multimedia Data
10.3.2 Multidimensional Analysis of Multimedia Data
10.3.3 Classification and Prediction Analysis of Multimedia Data
10.3.4 Mining Associations in Multimedia Data
10.3.5 Audio and Video Data Mining
10.4  Text Mining
10.4.1 Text Data Analysis and Information Retrieval
10.4.2 Dimensionality Reduction for Text
10.4.3 Text Mining Approaches
10.5  Mining the World Wide Web
10.5.1 Mining the Web Page Layout Structure
10.5.2 Mining the Web's Link Structures to Identify Authoritative Web Pages
10.5.3 Mining Multimedia Data on the Web
10.5.4 Automatic Classification of Web Documents
10.5.5 Web Usage Mining
10.6  Summary
Exercises
Bibliographic Notes
Chapter 11 Applications and Trends in Data Mining
11.1  Data Mining Applications
11.1.1 Data Mining for Financial Data Analysis
11.1.2 Data Mining for the Retail Industry
11.1.3 Data Mining for the Telecommunication Industry
11.1.4 Data Mining for Biological Data Analysis
11.1.5 Data Mining in Other Scientific Applications
11.1.6 Data Mining for Intrusion Detection
11.2  Data Mining System Products and Research Prototypes
11.2.1 How to Choose a Data Mining System
11.2.2 Examples of Commercial Data Mining Systems
11.3  Additional Themes on Data Mining
11.3.1 Theoretical Foundations of Data Mining
11.3.2 Statistical Data Mining
11.3.3 Visual and Audio Data Mining
11.3.4 Data Mining and Collaborative Filtering
11.4  Social Impacts of Data Mining
11.4.1 Ubiquitous and Invisible Data Mining
11.4.2 Data Mining, Privacy, and Data Security
11.5  Trends in Data Mining
11.6  Summary
Exercises
Bibliographic Notes
Appendix  An Introduction to Microsoft's OLE DB for Data Mining
A.1 Model Creation
A.2 Model Training
A.3 Model Prediction and Browsing
Bibliography
Index