# Importance Sampling

### Monte Carlo

* Monte Carlo methods rely on repeated random sampling
* In RL: estimate values directly from experience (sampled episodes)
* DP, by contrast:
  * The agent must know the transition probabilities
* Monte Carlo: estimate values by averaging the returns over a large number of random samples
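
To make the averaging idea concrete, here is a minimal sketch of first-visit Monte Carlo prediction. The environment is an assumption made purely for illustration (a hypothetical 5-state random walk that pays a reward of 1 only when the agent exits on the right), and the names `sample_episode` and `first_visit_mc` are not from these notes. Note that the estimator never looks at transition probabilities; it only averages sampled returns.

```python
import random
from collections import defaultdict


def sample_episode(rng, n_states=5, start=2):
    """Hypothetical 1-D random walk (illustrative assumption).

    The agent moves left or right with equal probability and the episode
    ends when it steps off either end. Reward is 1 only on a right exit.
    Returns a list of (state, reward) pairs.
    """
    state = start
    trajectory = []
    while 0 <= state < n_states:
        next_state = state + rng.choice([-1, 1])
        reward = 1.0 if next_state == n_states else 0.0
        trajectory.append((state, reward))
        state = next_state
    return trajectory


def first_visit_mc(num_episodes=20000, gamma=1.0, seed=0):
    """Estimate V(s) as the average return following the first visit to s."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = sample_episode(rng)
        # Record the time step of the first visit to each state.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)
        # Walk backwards, accumulating the discounted return G.
        g = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            g = gamma * g + r
            if first_visit[s] == t:
                returns_sum[s] += g
                returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in sorted(returns_sum)}


values = first_visit_mc()
for s, v in values.items():
    print(f"V({s}) ~ {v:.3f}")
```

For this particular walk the exact value of state `s` is `(s + 1) / 6`, so the printed estimates should land close to those fractions and tighten as `num_episodes` grows.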

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MkB0_WM_-fHfTS4KBdP%2F-MkB3WlTyIJKs3SuCPDo%2Fimage.png?alt=media\&token=78c76646-00ac-4b19-8189-62a85f0745d9)

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MkB0_WM_-fHfTS4KBdP%2F-MkB3be5Rj8Bc0TEwhk2%2Fimage.png?alt=media\&token=7c767551-2264-4672-91b2-e536c3092dd9)

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MkB0_WM_-fHfTS4KBdP%2F-MkB3dcK0JpeCVRAglzB%2Fimage.png?alt=media\&token=fa4f908d-ddf1-49ca-aa62-90268ce1a204)

### Epsilon-soft policies

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MkB0_WM_-fHfTS4KBdP%2F-MkBJ2cjnL6a5QHhvssr%2Fimage.png?alt=media\&token=5aab5e5b-8424-4a1d-bb7a-b5873b881493)

* The agent keeps exploring throughout learning
* Every action has a non-zero probability in every state (at least ε / |A(s)|)
* The policy is always stochastic, never deterministic
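
As an illustration of these properties, the sketch below builds an epsilon-greedy policy, which is one common example of an epsilon-soft policy. The function names and the sample action values are assumptions chosen for the example, not something defined in these notes.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Return action probabilities for an epsilon-greedy policy.

    Every action receives epsilon / n, and the greedy action receives the
    remaining 1 - epsilon on top, so no probability drops below epsilon / n.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def sample_action(q_values, epsilon, rng=random):
    """Draw one action index according to the epsilon-greedy probabilities."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical action values for a single state with three actions.
q = [0.1, 0.5, 0.2]
probs = epsilon_greedy_probs(q, epsilon=0.3)
print([round(p, 3) for p in probs])
```

With `epsilon=0.3` and three actions, each action keeps a floor of about 0.1 and the greedy action (index 1) ends up with about 0.8. Setting `epsilon=0` collapses the policy to a deterministic greedy one, which is no longer epsilon-soft.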

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MkB0_WM_-fHfTS4KBdP%2F-MkBJSQwKT7QS9be7jPe%2Fimage.png?alt=media\&token=34be068a-19bd-49be-8c17-9dd2327f9884)

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MkB0_WM_-fHfTS4KBdP%2F-MkBJnJWFtitSppA0GXZ%2Fimage.png?alt=media\&token=f48c424b-52f4-4ce6-830e-a53dc3e1b2d6)

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MkB0_WM_-fHfTS4KBdP%2F-MkBJvpDLPLx01_uocvr%2Fimage.png?alt=media\&token=010c5887-07f5-4f70-ab1b-ca5821f1ddac)
