Maxim Lapan is a deep learning enthusiast and independent researcher. His 15 years of experience as a software developer and systems architect span low-level Linux kernel driver development, performance optimization, and the design of distributed applications running on thousands of servers. With broad experience in big data, machine learning, and large parallel distributed HPC and non-HPC systems, he has a talent for explaining the gist of complicated topics in simple words and vivid examples. His current areas of interest lie in practical applications of deep learning, such as deep natural language processing and deep reinforcement learning. Maxim lives in Moscow, Russian Federation, with his family, and he works for an Israeli start-up as a senior NLP developer.
Table of Contents
Preface
Chapter 1: What is Reinforcement Learning?
    Learning - supervised, unsupervised, and reinforcement
    RL formalisms and relations
        Reward
        The agent
        The environment
        Actions
        Observations
    Markov decision processes
        Markov process
        Markov reward process
        Markov decision process
    Summary
Chapter 2: OpenAI Gym
    The anatomy of the agent
    Hardware and software requirements
    OpenAI Gym API
        Action space
        Observation space
        The environment
    Creation of the environment
    The CartPole session
    The random CartPole agent
    The extra Gym functionality - wrappers and monitors
        Wrappers
        Monitor
    Summary
Chapter 3: Deep Learning with PyTorch
    Tensors
        Creation of tensors
        Scalar tensors
        Tensor operations
        GPU tensors
    Gradients
        Tensors and gradients
    NN building blocks
    Custom layers
    Final glue - loss functions and optimizers
        Loss functions
        Optimizers
    Monitoring with TensorBoard
        TensorBoard 101
        Plotting stuff
    Example - GAN on Atari images
    Summary
Chapter 4: The Cross-Entropy Method
    Taxonomy of RL methods
    Practical cross-entropy
    Cross-entropy on CartPole
    Cross-entropy on FrozenLake
    Theoretical background of the cross-entropy method
    Summary
Chapter 5: Tabular Learning and the Bellman Equation
    Value, state, and optimality
    The Bellman equation of optimality
    Value of action
    The value iteration method
    Value iteration in practice
    Q-learning for FrozenLake
    Summary
Chapter 6: Deep Q-Networks
Chapter 7: DQN Extensions
Chapter 8: Stocks Trading Using RL
Chapter 9: Policy Gradients - An Alternative
Chapter 10: The Actor-Critic Method
Chapter 11: Asynchronous Advantage Actor-Critic
Chapter 12: Chatbots Training with RL
Chapter 13: Web Navigation
Chapter 14: Continuous Action Space
Chapter 15: Trust Regions - TRPO, PPO, and ACKTR
Chapter 16: Black-Box Optimization in RL
Chapter 17: Beyond Model-Free - Imagination
Chapter 18: AlphaGo Zero
Other Books You May Enjoy
Index