Q1) [Chain MDP] The state is given by s = (x, y). There are 2 actions from each state, namely A = {left, right}. Each action succeeds with probability p, and with probability 1 − p the opposite action is taken. There are two terminal states T1 and T2 (once in a terminal state, the agent is stuck there forever). The reward in the left terminal state is −1 and in the right terminal state it is +1; in every other state it is 0.
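The transition rule above can be sketched as a one-step simulator. This is a minimal illustration, assuming a chain of n non-terminal states indexed 1..n with the left terminal state at index 0 (reward −1) and the right terminal state at index n+1 (reward +1); the values of p and n here are illustrative, not given in the problem.

```python
import random

def chain_step(s, action, p=0.9, n=5):
    """One transition of the chain MDP; action is 'left' or 'right'."""
    if s in (0, n + 1):          # terminal states: agent is stuck, no reward
        return s, 0.0
    if random.random() < p:      # intended action succeeds with probability p
        move = -1 if action == "left" else +1
    else:                        # otherwise the opposite action is taken
        move = +1 if action == "left" else -1
    s2 = s + move
    if s2 == 0:
        return s2, -1.0          # entered left terminal state
    if s2 == n + 1:
        return s2, +1.0          # entered right terminal state
    return s2, 0.0
```

With p = 1.0 the dynamics are deterministic, which is convenient for checking the reward structure by hand.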
Q2) [Grid MDP] The state is given by s = (x, y). There are 4 actions from each state, namely A = {up, down, left, right}. Each action succeeds with probability p, and with probability 1 − p one of the other 3 actions is chosen.
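A one-step sampler for this grid dynamics might look as follows. This is a sketch under two assumptions not stated in the problem: the grid is W × H, and a move that would leave the grid keeps the agent in place; p, W, and H are illustrative.

```python
import random

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def grid_step(state, action, p=0.8, W=4, H=4):
    """With probability p take `action`; with probability 1 - p pick
    one of the other 3 actions uniformly at random."""
    if random.random() >= p:
        action = random.choice([a for a in MOVES if a != action])
    x, y = state
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < W and 0 <= ny < H:   # assumption: bumping a wall = stay put
        return (nx, ny)
    return (x, y)
```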
Q3) [Mountain Car] A car is stuck in a 1-dim valley and needs to find its way to the top. The car has three actions, namely A = {−1, 0, +1}, which mean accelerate backward, no acceleration, and accelerate forward, respectively. The ranges for position and velocity are [−1.2, 0.5] and [−0.07, 0.07], respectively. The car needs to reach the top on the right, i.e., a position of 0.5. The dynamics follow the equations:
v_{t+1} = v_t + 0.001 a_t − 0.0025 cos(3 p_t)
p_{t+1} = p_t + v_t        (1)
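Equation (1) can be turned into a one-step simulator. This is a sketch: clipping velocity to [−0.07, 0.07] and position to [−1.2, 0.5] follows the stated ranges, and zeroing the velocity at the left wall is a common convention assumed here rather than stated in the problem.

```python
import math

def mountain_car_step(p, v, a):
    """One step of the dynamics; a in {-1, 0, +1} means accelerate
    backward, no acceleration, accelerate forward."""
    v = v + 0.001 * a - 0.0025 * math.cos(3 * p)
    v = max(-0.07, min(0.07, v))      # clip velocity to its stated range
    p = p + v
    p = max(-1.2, min(0.5, p))        # clip position to its stated range
    if p == -1.2:                     # assumption: left wall stops the car
        v = max(v, 0.0)
    done = p >= 0.5                   # goal: reach the top on the right
    return p, v, done
```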