Daily post of feedback collected from TAs on potential issues in the tutorial notebooks that may not have been resolved before the tutorials started:
Bugs:
None reported.
Clarifications / Inconsistencies:
- T1: In interactive demo 2, it is impossible to actually set alpha to zero (i.e., no learning), even though 0.00 is displayed. This is slightly confusing.
- T1: In interactive demo 2, what is special about states 9 and 19? Are they the positions of the cue and the reward? It might be better to explain this in the plot.
- T2, section 2: It might be less confusing to write q(a,s) = q(a) = … and note that the state of this environment does not change. Otherwise it is hard to relate this section to the intro video.
- T2: Maybe add alpha=0.7 to plot_parameter_performance? As it stands, it is not clear which epsilon is better in section 4.
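For reference when addressing the T2 notes, here is a minimal sketch of an epsilon-greedy bandit with a constant step size alpha. It assumes the tutorial uses the standard incremental update q(a) <- q(a) + alpha * (r - q(a)). The value table is indexed by action only, which matches the q(a,s) = q(a) point for an environment whose state never changes. With alpha = 0 the values never move, which is also why a true zero matters in the T1 demo. The names run_bandit and epsilon_greedy are hypothetical and are not the notebook's functions. Gaussian rewards with unit variance are also an assumption.

```python
import random


def epsilon_greedy(q, epsilon, rng):
    """With probability epsilon explore a random arm, otherwise exploit the best arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    # Greedy choice; ties go to the lowest-index arm.
    return max(range(len(q)), key=lambda a: q[a])


def run_bandit(true_means, epsilon, alpha, n_steps, seed=0):
    """Hypothetical single-state bandit: q is indexed by action only, i.e. q(a,s) = q(a).

    Assumes rewards ~ Normal(true_means[a], 1). Returns the final action values
    and the average reward over all steps.
    """
    rng = random.Random(seed)
    q = [0.0] * len(true_means)  # one value per action, no state dimension
    total = 0.0
    for _ in range(n_steps):
        a = epsilon_greedy(q, epsilon, rng)
        r = rng.gauss(true_means[a], 1.0)
        # Constant-step-size update; alpha = 0 leaves q unchanged (no learning).
        q[a] += alpha * (r - q[a])
        total += r
    return q, total / n_steps


if __name__ == "__main__":
    # Compare several epsilons at a fixed alpha = 0.7, as suggested for section 4.
    for eps in (0.0, 0.1, 0.2):
        _, avg = run_bandit([0.2, 0.5, 0.9], eps, alpha=0.7, n_steps=1000)
        print(f"epsilon={eps}: average reward {avg:.3f}")
```

Holding alpha fixed while sweeping epsilon separates the two parameters' effects. This is the kind of comparison the section 4 note asks for, although a single seed is noisy, and averaging over many seeds would make a plot more convincing.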