Simulation Based Search

forward search

Forward search builds a search tree rooted at the current state $s_t$, in which each leaf node corresponds to a terminal state; the action that maximizes the expected return is then selected.

  1. simulate episodes of experience from the current moment with the model
  2. apply model-free RL to the simulated episodes
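A minimal sketch of this recipe follows, assuming a hypothetical generative model with `model.sample(s, a) -> (next_state, reward, done)` and a helper `actions(s)` listing legal actions (both names are placeholders, not from these notes); the model-free step here is every-visit Monte-Carlo control with an $\epsilon$-greedy simulation policy:

```python
import random
from collections import defaultdict

# Hedged sketch: `model.sample(s, a)` and `actions(s)` are hypothetical
# interfaces chosen for illustration, not part of the original notes.
def forward_search(model, actions, s_t, n_episodes=1000,
                   max_depth=50, gamma=1.0, eps=0.1):
    """Plan from the real state s_t: simulate episodes with the model,
    then run model-free Monte-Carlo control on the simulated experience."""
    Q = defaultdict(float)                   # (state, action) -> mean return so far
    N = defaultdict(int)                     # (state, action) -> visit count

    def simulation_policy(s):                # epsilon-greedy w.r.t. the current Q
        acts = actions(s)
        if random.random() < eps:
            return random.choice(acts)
        return max(acts, key=lambda a: Q[(s, a)])

    for _ in range(n_episodes):
        s, episode = s_t, []
        for _ in range(max_depth):           # 1. simulate an episode with the model
            a = simulation_policy(s)
            s_next, r, done = model.sample(s, a)
            episode.append((s, a, r))
            s = s_next
            if done:
                break
        G = 0.0                              # 2. model-free (every-visit MC) backup
        for s_k, a_k, r_k in reversed(episode):
            G = r_k + gamma * G
            N[(s_k, a_k)] += 1
            Q[(s_k, a_k)] += (G - Q[(s_k, a_k)]) / N[(s_k, a_k)]

    return max(actions(s_t), key=lambda a: Q[(s_t, a)])  # greedy choice at the root
```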

MCTS

MCTS.ai

Monte-Carlo Tree Search

simple Monte-Carlo search

  1. given a model $M_v$ and a simulation policy $\pi$
  2. for each action $a\in A$
    • simulate K episodes from the current (real) state $s_t$, taking $a$ as the first action
    • evaluate each action by its mean return (Monte-Carlo evaluation)
  3. select the action with the maximum estimated value
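A compact sketch of simple Monte-Carlo search, using the same hypothetical `model.sample` / `actions` interface as above; each candidate root action is scored by the mean return of K rollouts under a fixed random simulation policy, and no search tree is built:

```python
import random

def simple_mc_search(model, actions, s_t, k=100, max_depth=50, gamma=1.0):
    """Score each candidate root action by the mean return of k rollouts
    generated with a fixed random simulation policy; no tree is built."""
    def rollout(s, a):
        G, discount = 0.0, 1.0
        for _ in range(max_depth):
            s, r, done = model.sample(s, a)   # hypothetical model interface
            G += discount * r
            discount *= gamma
            if done:
                break
            a = random.choice(actions(s))     # fixed random simulation policy
        return G

    q = {a: sum(rollout(s_t, a) for _ in range(k)) / k for a in actions(s_t)}
    return max(q, key=q.get)                  # select action with maximum mean return
```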

Monte-Carlo tree search

  1. given a model $M_v$
  2. simulate K episodes from the current state $s_t$ using the current simulation policy $\pi$
  3. build a search tree containing the visited states and actions
  4. evaluate actions $Q(s,a)$ by the mean return of episodes starting from $(s,a)$
  5. select the action with the maximum value in the search tree

Each simulation consists of two phases (in-tree, out-of-tree):

  - tree policy (improves): pick actions to maximise $Q(S,A)$
  - default policy (fixed): pick actions randomly

Repeat (each simulation):

  - evaluate $Q(S,A)$
  - improve the tree policy, e.g. by $\epsilon$-greedy$(Q)$
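The sketch below puts these pieces together (same hypothetical `model.sample` / `actions` interface as above; all names are my own choices): inside the tree an $\epsilon$-greedy tree policy over the stored $Q(S,A)$ picks actions, outside the tree a fixed random default policy finishes the episode, the tree grows by one node per simulation, and $Q$ is updated as the mean return of episodes passing through each $(s,a)$:

```python
import random
from collections import defaultdict

def mcts(model, actions, s_t, n_episodes=1000, max_depth=50,
         gamma=1.0, eps=0.1):
    """Monte-Carlo tree search with mean-return Q values, an epsilon-greedy
    tree policy in-tree, and a random default policy out-of-tree."""
    Q = defaultdict(float)                   # (state, action) -> mean return
    N = defaultdict(int)                     # (state, action) -> visit count
    tree = {s_t}                             # states currently stored in the tree

    for _ in range(n_episodes):
        s, path, new_state = s_t, [], None
        for _ in range(max_depth):
            if s in tree:                    # in-tree phase: tree policy (improves)
                acts = actions(s)
                if random.random() < eps:
                    a = random.choice(acts)
                else:
                    a = max(acts, key=lambda a_: Q[(s, a_)])
            else:                            # out-of-tree phase: default policy (fixed)
                a = random.choice(actions(s))
                if new_state is None:
                    new_state = s            # expand the tree by one node per episode
            s_next, r, done = model.sample(s, a)
            path.append((s, a, r))
            s = s_next
            if done:
                break
        if new_state is not None:
            tree.add(new_state)
        G = 0.0                              # backup: Q(s,a) = mean return from (s,a)
        for s_k, a_k, r_k in reversed(path):
            G = r_k + gamma * G
            if s_k in tree:
                N[(s_k, a_k)] += 1
                Q[(s_k, a_k)] += (G - Q[(s_k, a_k)]) / N[(s_k, a_k)]

    return max(actions(s_t), key=lambda a: Q[(s_t, a)])  # best action at the root
```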

Upper Confidence Bound (UCB)

UCT: Upper Confidence Bounds applied to Trees, UCT = MCTS + UCB
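In the standard UCT formulation, the tree policy selects at each in-tree state $s$ the action maximizing the UCB1 score, where $N(s)$ is the visit count of $s$, $N(s,a)$ the visit count of the edge $(s,a)$, and $c$ an exploration constant (often $\sqrt{2}$):

$$a^* = \arg\max_{a}\left[Q(s,a) + c\sqrt{\frac{\ln N(s)}{N(s,a)}}\right]$$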

TODO: AlphaGo; build my own Gomoku (五子棋) project on GitHub