# BayesFlow Monte Carlo (contrib)

[TOC]

Monte Carlo integration and helpers.

## Background

Monte Carlo integration refers to the practice of estimating an expectation with
a sample mean. For example, given random variable `Z in R^k`

with density `p`

,
the expectation of function `f`

can be approximated like:

```
E_p[f(Z)] = \int f(z) p(z) dz
~ S_n
:= n^{-1} \sum_{i=1}^n f(z_i), z_i iid samples from p.
```

If `E_p[|f(Z)|] < infinity`

, then `S_n --> E_p[f(Z)]`

by the strong law of large
numbers. If `E_p[f(Z)^2] < infinity`

, then `S_n`

is asymptotically normal with
variance `Var[f(Z)] / n`

.

Practitioners of Bayesian statistics often find themselves wanting to estimate
`E_p[f(Z)]`

when the distribution `p`

is known only up to a constant. For
example, the joint distribution `p(z, x)`

may be known, but the evidence
`p(x) = \int p(z, x) dz`

may be intractable. In that case, a parameterized
distribution family `q_lambda(z)`

may be chosen, and the optimal `lambda`

is the
one minimizing the KL divergence between `q_lambda(z)`

and
`p(z | x)`

. We only know `p(z, x)`

, but that is sufficient to find `lambda`

.

## Log-space evaluation and subtracting the maximum.

Care must be taken when the random variable lives in a high dimensional space.
For example, the naive importance sample estimate `E_q[f(Z) p(Z) / q(Z)]`

involves the ratio of two terms `p(Z) / q(Z)`

, each of which must have tails
dropping off faster than `O(|z|^{-(k + 1)})`

in order to have finite integral.
This ratio would often be zero or infinity up to numerical precision.

For that reason, we write

```
Log E_q[ f(Z) p(Z) / q(Z) ]
= Log E_q[ exp{Log[f(Z)] + Log[p(Z)] - Log[q(Z)] - C} ] + C, where
C := Max[ Log[f(Z)] + Log[p(Z)] - Log[q(Z)] ].
```

The maximum value of the exponentiated term will be 0.0, and the the expectation can be evaluated in a stable manner.

## Ops

`tf.contrib.bayesflow.monte_carlo.expectation(f, p, z=None, n=None, seed=None, name='expectation')`

Monte Carlo estimate of an expectation: `E_p[f(Z)]`

with sample mean.

This `Op`

returns

```
n^{-1} sum_{i=1}^n f(z_i), where z_i ~ p
\approx E_p[f(Z)]
```

User supplies either `Tensor`

of samples `z`

, or number of samples to draw `n`

##### Args:

: Callable mapping samples from`f`

`p`

to`Tensors`

.:`p`

`tf.contrib.distributions.BaseDistribution`

.:`z`

`Tensor`

of samples from`p`

, produced by`p.sample_n`

.: Integer`n`

`Tensor`

. Number of samples to generate if`z`

is not provided.: Python integer to seed the random number generator.`seed`

: A name to give this`name`

`Op`

.

##### Returns:

A `Tensor`

with the same `dtype`

as `p`

.

:`Example`

```
N_samples = 10000
distributions = tf.contrib.distributions
dist = distributions.Uniform([0.0, 0.0], [1.0, 2.0])
elementwise_mean = lambda x: x
mean_sum = lambda x: tf.reduce_sum(x, 1)
estimate_elementwise_mean_tf = monte_carlo.expectation(elementwise_mean,
dist,
n=N_samples)
estimate_mean_sum_tf = monte_carlo.expectation(mean_sum,
dist,
n=N_samples)
with tf.Session() as sess:
estimate_elementwise_mean, estimate_mean_sum = (
sess.run([estimate_elementwise_mean_tf, estimate_mean_sum_tf]))
print estimate_elementwise_mean
>>> np.array([ 0.50018013 1.00097895], dtype=np.float32)
print estimate_mean_sum
>>> 1.49571
```

`tf.contrib.bayesflow.monte_carlo.expectation_importance_sampler(f, log_p, sampling_dist_q, z=None, n=None, seed=None, name='expectation_importance_sampler')`

Monte Carlo estimate of `E_p[f(Z)] = E_q[f(Z) p(Z) / q(Z)]`

.

With `p(z) := exp{log_p(z)}`

, this `Op`

returns

```
n^{-1} sum_{i=1}^n [ f(z_i) p(z_i) / q(z_i) ], z_i ~ q,
\approx E_q[ f(Z) p(Z) / q(Z) ]
= E_p[f(Z)]
```

This integral is done in log-space with max-subtraction to better handle the
often extreme values that `f(z) p(z) / q(z)`

can take on.

If `f >= 0`

, it is up to 2x more efficient to exponentiate the result of
`expectation_importance_sampler_logspace`

applied to `Log[f]`

.

User supplies either `Tensor`

of samples `z`

, or number of samples to draw `n`

##### Args:

: Callable mapping samples from`f`

`sampling_dist_q`

to`Tensors`

with shape broadcastable to`q.batch_shape`

. For example,`f`

works "just like"`q.log_prob`

.: Callable mapping samples from`log_p`

`sampling_dist_q`

to`Tensors`

with shape broadcastable to`q.batch_shape`

. For example,`log_p`

works "just like"`sampling_dist_q.log_prob`

.: The sampling distribution.`sampling_dist_q`

`tf.contrib.distributions.BaseDistribution`

.`float64`

`dtype`

recommended.`log_p`

and`q`

should be supported on the same set.:`z`

`Tensor`

of samples from`q`

, produced by`q.sample_n`

.: Integer`n`

`Tensor`

. Number of samples to generate if`z`

is not provided.: Python integer to seed the random number generator.`seed`

: A name to give this`name`

`Op`

.

##### Returns:

The importance sampling estimate. `Tensor`

with `shape`

equal
to batch shape of `q`

, and `dtype`

= `q.dtype`

.

`tf.contrib.bayesflow.monte_carlo.expectation_importance_sampler_logspace(log_f, log_p, sampling_dist_q, z=None, n=None, seed=None, name='expectation_importance_sampler_logspace')`

Importance sampling with a positive function, in log-space.

With `p(z) := exp{log_p(z)}`

, and `f(z) = exp{log_f(z)}`

, this `Op`

returns

```
Log[ n^{-1} sum_{i=1}^n [ f(z_i) p(z_i) / q(z_i) ] ], z_i ~ q,
\approx Log[ E_q[ f(Z) p(Z) / q(Z) ] ]
= Log[E_p[f(Z)]]
```

This integral is done in log-space with max-subtraction to better handle the
often extreme values that `f(z) p(z) / q(z)`

can take on.

In contrast to `expectation_importance_sampler`

, this `Op`

returns values in
log-space.

User supplies either `Tensor`

of samples `z`

, or number of samples to draw `n`

##### Args:

: Callable mapping samples from`log_f`

`sampling_dist_q`

to`Tensors`

with shape broadcastable to`q.batch_shape`

. For example,`log_f`

works "just like"`sampling_dist_q.log_prob`

.: Callable mapping samples from`log_p`

`sampling_dist_q`

to`Tensors`

with shape broadcastable to`q.batch_shape`

. For example,`log_p`

works "just like"`q.log_prob`

.: The sampling distribution.`sampling_dist_q`

`tf.contrib.distributions.BaseDistribution`

.`float64`

`dtype`

recommended.`log_p`

and`q`

should be supported on the same set.:`z`

`Tensor`

of samples from`q`

, produced by`q.sample_n`

.: Integer`n`

`Tensor`

. Number of samples to generate if`z`

is not provided.: Python integer to seed the random number generator.`seed`

: A name to give this`name`

`Op`

.

##### Returns:

Logarithm of the importance sampling estimate. `Tensor`

with `shape`

equal
to batch shape of `q`

, and `dtype`

= `q.dtype`

.