
Deep Reinforcement Learning Hands-On (Reprint Edition, English)

List price: ¥109.00

Author: Maxim Lapan
Publisher: Southeast University Press

ISBN: 9787564183219 Publication date: 2019-05-01 Binding: Paperback
Format: 16mo Pages: 523

About the Book

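  Recent developments in reinforcement learning (RL), combined with deep learning (DL), have brought unprecedented progress in training agents to solve complex problems in a human-like way. Google's use of algorithms to win at the well-known Atari arcade games pushed the field to prominence, and researchers keep producing a steady stream of new ideas. Deep Reinforcement Learning Hands-On (Reprint Edition, English) introduces the fundamentals of RL and gives you the principles needed to write intelligent learning agents that take on a range of demanding practical tasks. You will learn how to implement Q-learning in "grid world" environments, teach your agent to buy and trade stocks, and find out how natural language models have driven the boom in chatbots.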

About the Author

  Maxim Lapan is a deep learning enthusiast and independent researcher. His background and 15 years of work experience as a software developer and systems architect range from low-level Linux kernel driver development to performance optimization and the design of distributed applications running on thousands of servers. With extensive experience in big data, machine learning, and large parallel distributed HPC and non-HPC systems, he has a talent for explaining the gist of complicated things in simple words and vivid examples. His current areas of interest lie in practical applications of deep learning, such as deep natural language processing and deep reinforcement learning. Maxim lives in Moscow, Russian Federation, with his family, and works for an Israeli start-up as a Senior NLP developer.

Table of Contents

Preface
Chapter 1: What is Reinforcement Learning?
Learning - supervised, unsupervised, and reinforcement
RL formalisms and relations
Reward
The agent
The environment
Actions
Observations
Markov decision processes
Markov process
Markov reward process
Markov decision process
Summary
Chapter 2: OpenAI Gym
The anatomy of the agent
Hardware and software requirements
OpenAI Gym API
Action space
Observation space
The environment
Creation of the environment
The CartPole session
The random CartPole agent
The extra Gym functionality - wrappers and monitors
Wrappers
Monitor
Summary
Chapter 3: Deep Learning with PyTorch
Tensors
Creation of tensors
Scalar tensors
Tensor operations
GPU tensors
Gradients
Tensors and gradients
NN building blocks
Custom layers
Final glue - loss functions and optimizers
Loss functions
Optimizers
Monitoring with TensorBoard
TensorBoard 101
Plotting stuff
Example - GAN on Atari images
Summary
Chapter 4: The Cross-Entropy Method
Taxonomy of RL methods
Practical cross-entropy
Cross-entropy on CartPole
Cross-entropy on FrozenLake
Theoretical background of the cross-entropy method
Summary
Chapter 5: Tabular Learning and the Bellman Equation
Value, state, and optimality
The Bellman equation of optimality
Value of action
The value iteration method
Value iteration in practice
Q-learning for FrozenLake
Summary
Chapter 6: Deep Q-Networks
Chapter 7: DQN Extensions
Chapter 8: Stocks Trading Using RL
Chapter 9: Policy Gradients - An Alternative
Chapter 10: The Actor-Critic Method
Chapter 11: Asynchronous Advantage Actor-Critic
Chapter 12: Chatbots Training with RL
Chapter 13: Web Navigation
Chapter 14: Continuous Action Space
Chapter 15: Trust Regions - TRPO, PPO, and ACKTR
Chapter 16: Black-Box Optimization in RL
Chapter 17: Beyond Model-Free - Imagination
Chapter 18: AlphaGo Zero
Other Books You May Enjoy
Index
