OpenAI Universe

AlphaGo has shown us what might be possible and sparked the imagination: one day AI may be able to do almost anything, for example translating, writing, driving, and even writing highly efficient code. Programmers could lose their jobs if companies hire AI robots to write code for them.

Let’s get back to reality. A reinforcement learning algorithm, Monte Carlo Tree Search, plays an important role in AlphaGo. Reinforcement learning is powerful in interactive environments, especially in settings such as driving and video games. For anyone who wants to apply RL algorithms to video games, the crucial first task is to build a game simulator, which is hard work and a waste of time for researchers who are not familiar with game development.


A glimpse of Markov property

The Markov property is the core property of a Markov process, and understanding it will give you a broader view of reinforcement learning. Put simply, a Markov process does not care about the past. It is of course the past that determines the present, which means the present is the outcome of the past. Nevertheless, the only thing we need to do is focus on the present, because the present will itself become the past.

So what should we take into consideration? Remembering the entire past is not an ideal method; we should summarize it instead. From Sutton’s book: “What we would like, ideally, is a state signal that summarizes past sensations compactly, yet in such a way that all relevant information is retained. … A state signal that succeeds in retaining all relevant information is said to be Markov, or to have the Markov property.”
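
To make the idea concrete, here is a minimal sketch of a process with the Markov property: the next state is sampled from the current state alone, and no history is consulted. The two weather states, the transition probabilities, and the function names are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical two-state chain; the states and probabilities are illustrative.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def next_state(current, rng):
    """Sample the next state using only the current state (no history)."""
    states = list(TRANSITIONS[current])
    weights = [TRANSITIONS[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def simulate(start, steps, seed=0):
    """Return the list of visited states, starting from `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        # Only path[-1], the present state, is passed on.
        path.append(next_state(path[-1], rng))
    return path


print(simulate("sunny", 5))
```

Because `next_state` receives nothing but the present state, the current state here plays the role of the compact summary of the past that the quote describes.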


What is a deterministic model?

I have been reading Sutton’s new book, Reinforcement Learning: An Introduction (2nd edition), for many days, and what confused me was the deterministic model. What is a deterministic model?

Mathematical model in which outcomes are precisely determined through known relationships among states and events, without any room for random variation. In such models, a given input will always produce the same output, such as in a known chemical reaction. In comparison, stochastic models use ranges of values for variables in the form of probability distributions[1].

In deterministic models, the output of the model is fully determined by the parameter values and the initial conditions. Take the multi-armed bandit problem in Sutton’s book as an example: if the value obtained from an action is drawn from a normal distribution, say with mean 0 and variance 1, the model is stochastic; if each action always yields the same fixed value, the model is deterministic.
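
The contrast can be sketched in a few lines of Python. The class names `DeterministicBandit` and `StochasticBandit` are hypothetical, and the stochastic version assumes rewards drawn from a normal distribution with unit variance around a per-action mean, which is one reading of the bandit setup rather than the book's exact code.

```python
import random


class DeterministicBandit:
    """Each action always returns the same fixed value."""

    def __init__(self, values):
        self.values = list(values)

    def pull(self, action):
        return self.values[action]


class StochasticBandit:
    """Each action returns a sample from Normal(mean, variance 1)."""

    def __init__(self, means, seed=None):
        self.means = list(means)
        self.rng = random.Random(seed)

    def pull(self, action):
        # The standard deviation is 1.0, so the variance is 1 as well.
        return self.rng.gauss(self.means[action], 1.0)


det = DeterministicBandit([0.5, -1.0])
sto = StochasticBandit([0.0, 0.0], seed=1)

print([det.pull(0) for _ in range(3)])  # the same value every time
print([sto.pull(0) for _ in range(3)])  # varies from pull to pull
```

Pulling the same arm repeatedly shows the difference: the deterministic bandit repeats one number, while the stochastic bandit returns a new sample on every pull.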


[1] http://www.businessdictionary.com/definition/deterministic-model.html




  • Open http://www.lfd.uci.edu/~gohlke/pythonlibs/#OpenCV
  • Choose the version you need, then run pip install opencv_python-xxx.whl


Then run import cv2 at the Python command line. If there are no errors, the installation succeeded.
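
The same check can be scripted. This is a small sketch; the helper name `opencv_version` is hypothetical, while `cv2.__version__` is the version string exposed by the OpenCV Python package.

```python
def opencv_version():
    """Return the installed OpenCV version string, or None if cv2 cannot be imported."""
    try:
        import cv2
    except ImportError:
        return None
    return cv2.__version__


version = opencv_version()
if version is None:
    print("cv2 could not be imported; the installation did not succeed")
else:
    print("OpenCV", version, "is installed")
```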




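NumPy is the foundation of scientific computing in Python and underpins Python's role as a major language for machine learning. Libraries such as SciPy, Pandas, and Scikit-Learn, along with the more recent machine learning and deep learning libraries such as TensorFlow and Keras, all base their representation and handling of multidimensional data on NumPy, so their syntax, usage, and features closely resemble NumPy's. This article introduces a very important NumPy concept: the axis. Once you fully understand axes, working with multidimensional arrays becomes second nature.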
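
As a quick illustration of what an axis is, the sketch below sums a small array along each axis. The array contents are arbitrary; the point is that the axis you name is the dimension that gets collapsed.

```python
import numpy as np

# A 2x3 array: axis 0 runs down the rows, axis 1 runs across the columns.
a = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

col_sums = a.sum(axis=0)  # collapses axis 0, leaving shape (3,)
row_sums = a.sum(axis=1)  # collapses axis 1, leaving shape (2,)

print(col_sums)  # [3 5 7]
print(row_sums)  # [ 3 12]

# The same rule holds in higher dimensions: the named axis disappears.
b = np.zeros((2, 3, 4))
print(b.sum(axis=1).shape)  # (2, 4)
```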


Python Tricks: Unpacking Argument Lists

The Python 2.7 documentation has a section on Unpacking Argument Lists. The phrase may look unfamiliar at first, but its meaning becomes clear once you see it in code.

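**d is one way of unpacking arguments: the dictionary d is unpacked and its entries are passed as the corresponding arguments to the named tuple Point. Note that the keys of the unpacked dictionary must match the field_names of the namedtuple, otherwise an error is raised.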
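
The post's original listing is not reproduced in this excerpt, so the following is a reconstruction under assumptions: a two-field `Point` with fields `x` and `y`, and a dictionary `d` whose values are arbitrary.

```python
from collections import namedtuple

# Assumed definition: a named tuple with two fields, x and y.
Point = namedtuple("Point", ["x", "y"])

d = {"x": 1, "y": 2}
p = Point(**d)  # equivalent to Point(x=1, y=2)
print(p)  # Point(x=1, y=2)

# A key that is not a field name is rejected with a TypeError.
try:
    Point(**{"x": 1, "z": 2})
    mismatch_raised = False
except TypeError as exc:
    mismatch_raised = True
    print("mismatched key:", exc)

# Sequences can be unpacked positionally with a single star.
q = Point(*[3, 4])
print(q)  # Point(x=3, y=4)
```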










