W2D5 -- Reinforcement Learning: Erratum

Daily post collecting feedback from TAs on potential issues with the tutorial notebooks that may not have been resolved before the tutorials started:

Bugs:

none reported.

Clarifications / Inconsistencies:

  • T1: In interactive demo 2, it’s impossible to truly set alpha to zero (i.e. no learning), even though the number 0.00 is shown (this is slightly confusing)

  • T1: In interactive demo 2: What is so special about states 9 and 19? Are they the positions of a cue and of a reward? Maybe it would be better to explain this in the plot?

  • T2 section 2: maybe it will be less confusing to write q(a,s) = q(a) = … and comment that this environment does not change its state? Otherwise it is hard to relate it to the intro video.

  • T2: maybe add alpha=0.7 to plot_parameter_performance? It is not clear which epsilon is better in section 4.

T1: In interactive demo 2, widgets.FloatSlider uses the .2f readout format by default, so very small values are displayed as zero even when they are not zero. The widget's readout_format should be changed.
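
To illustrate the readout issue, here is a minimal sketch. The alpha value, the slider range, and the step are assumptions chosen for illustration and are not taken from the tutorial notebook; only readout_format is the FloatSlider option under discussion.

```python
def readout(value, fmt):
    """Render a value the way a slider readout would, using a Python format spec."""
    return format(value, fmt)

alpha = 0.004  # hypothetical small learning rate, not from the notebook

default_text = readout(alpha, ".2f")  # default FloatSlider format: looks like zero
fixed_text = readout(alpha, ".3f")    # finer format: the value becomes visible
print(default_text, fixed_text)

# Assumed slider settings; the real notebook may use a different range and step.
slider_kwargs = dict(value=alpha, min=0.0, max=1.0, step=0.001,
                     readout_format=".3f", description="alpha")

try:
    import ipywidgets as widgets
    alpha_slider = widgets.FloatSlider(**slider_kwargs)
except ImportError:
    alpha_slider = None  # ipywidgets not installed; only the formatting demo runs
```

With a readout format of .3f the slider would show 0.004 instead of 0.00, which removes the impression that alpha has been set to exactly zero.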

T2: In the interactive demo Changing Epsilon and Alpha, k can take values \le 0, which produces an error. k should be restricted to values \ge 2.
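
One possible way to enforce the lower bound on k is sketched below. The helper name check_k, the default value, and the upper bound of the slider are hypothetical and are not taken from the notebook; only the requirement k \ge 2 comes from the report above.

```python
K_MIN = 2  # lower bound suggested in the report

def check_k(k, k_min=K_MIN):
    """Return k unchanged if it is valid, otherwise raise a clear error."""
    if k < k_min:
        raise ValueError(f"k must be >= {k_min}, got {k}")
    return k

# Assumed slider settings; value and max are placeholders for illustration.
k_slider_kwargs = dict(value=10, min=K_MIN, max=20, step=1, description="k")

try:
    import ipywidgets as widgets
    k_slider = widgets.IntSlider(**k_slider_kwargs)  # slider cannot go below K_MIN
except ImportError:
    k_slider = None  # ipywidgets not installed; the validation helper still works
```

Setting min on the slider prevents invalid values from being selected in the first place, while the helper gives a readable error if the function is called directly with a bad k.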

I have raised a GitHub issue for these inconsistencies.