Monitors (contrib)

[TOC]

Monitors allow user instrumentation of the training process.

Monitors are useful to track training, report progress, request early stopping and more. Monitors use the observer pattern and notify at the following points:

  • when training begins
  • before a training step
  • after a training step
  • when training ends
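The notification sequence can be sketched in plain Python. The `RecordingMonitor` class and `run_training` driver below are illustrative stand-ins for what an estimator does internally, not part of the tf.contrib.learn API:

```python
class RecordingMonitor:
    """Illustrative stand-in that records which hooks fire, and in what order."""

    def __init__(self):
        self.calls = []

    def begin(self, max_steps=None):
        self.calls.append(("begin", max_steps))

    def step_begin(self, step):
        self.calls.append(("step_begin", step))
        return []  # no extra tensors requested

    def step_end(self, step, outputs):
        self.calls.append(("step_end", step))
        return False  # do not request early stopping

    def end(self, session=None):
        self.calls.append(("end", session))


def run_training(monitor, max_steps):
    """Minimal driver mimicking how a trainer notifies a monitor."""
    monitor.begin(max_steps)
    for step in range(max_steps):
        monitor.step_begin(step)
        outputs = {}  # results of Session.run would go here
        if monitor.step_end(step, outputs):
            break  # the monitor requested early stopping
    monitor.end()


m = RecordingMonitor()
run_training(m, max_steps=2)
```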

Monitors are not intended to be reusable.

There are a few pre-defined monitors:

  • CaptureVariable: saves a variable's values
  • GraphDump: intended for debug only - saves all tensor values
  • PrintTensor: outputs one or more tensor values to log
  • SummarySaver: saves summaries to a summary writer
  • ValidationMonitor: runs model validation, by periodically calculating eval metrics on a separate data set; supports optional early stopping

For more specific needs, you can create custom monitors by extending one of the following classes:

  • BaseMonitor: the base class for all monitors
  • EveryN: triggers a callback every N training steps

Example:

  class ExampleMonitor(monitors.BaseMonitor):
    def __init__(self):
      print('Init')

    def begin(self, max_steps):
      print('Starting run. Will train until step %d.' % max_steps)

    def end(self):
      print('Completed run.')

    def step_begin(self, step):
      print('About to run step %d...' % step)
      return ['loss_1:0']

    def step_end(self, step, outputs):
      print('Done running step %d. The value of "loss" tensor: %s' % (
          step, outputs['loss_1:0']))

  linear_regressor = LinearRegressor()
  example_monitor = ExampleMonitor()
  linear_regressor.fit(
      x, y, steps=2, batch_size=1, monitors=[example_monitor])

tf.contrib.learn.monitors.get_default_monitors(loss_op=None, summary_op=None, save_summary_steps=100, output_dir=None, summary_writer=None)

Returns a default set of typically-used monitors.

Args:
  • loss_op: Tensor, the loss tensor. This will be printed using PrintTensor at the default interval.
  • summary_op: See SummarySaver.
  • save_summary_steps: See SummarySaver.
  • output_dir: See SummarySaver.
  • summary_writer: See SummarySaver.
Returns:

list of monitors.


class tf.contrib.learn.monitors.BaseMonitor

Base class for Monitors.

Defines basic interfaces of Monitors. Monitors can either be run on all workers or, more commonly, restricted to run exclusively on the elected chief worker.


tf.contrib.learn.monitors.BaseMonitor.__init__()


tf.contrib.learn.monitors.BaseMonitor.begin(max_steps=None)

Called at the beginning of training.

When called, the default graph is the one we are executing.

Args:
  • max_steps: int, the maximum global step this training will run until.
Raises:
  • ValueError: if we've already begun a run.

tf.contrib.learn.monitors.BaseMonitor.end(session=None)

Callback at the end of training/evaluation.

Args:
  • session: A tf.Session object that can be used to run ops.
Raises:
  • ValueError: if we've not begun a run.

tf.contrib.learn.monitors.BaseMonitor.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.BaseMonitor.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.BaseMonitor.post_step(step, session)

Callback after the step is finished.

Called after step_end and receives a session in which to perform extra session.run calls. This callback is invoked even if a failure occurred during the step.

Args:
  • step: int, global step of the model.
  • session: Session object.

tf.contrib.learn.monitors.BaseMonitor.run_on_all_workers


tf.contrib.learn.monitors.BaseMonitor.set_estimator(estimator)

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

Args:
  • estimator: the estimator that this monitor monitors.
Raises:
  • ValueError: if the estimator is None.

tf.contrib.learn.monitors.BaseMonitor.step_begin(step)

Callback before training step begins.

You may use this callback to request evaluation of additional tensors in the graph.

Args:
  • step: int, the current value of the global step.
Returns:

List of Tensor objects or string tensor names to be run.

Raises:
  • ValueError: if we've already begun a step, or step < 0, or step > max_steps.

tf.contrib.learn.monitors.BaseMonitor.step_end(step, output)

Callback after training step finished.

This callback provides access to the tensors/ops evaluated at this step, including the additional tensors for which evaluation was requested in step_begin.

In addition, the callback has the opportunity to stop training by returning True. This is useful for early stopping, for example.

Note that this method is not called if the call to Session.run() that followed the last call to step_begin() failed.

Args:
  • step: int, the current value of the global step.
  • output: dict mapping tensor names (strings) to the values resulting from running those tensors. Values may be scalars, for scalar tensors, or NumPy arrays, for non-scalar tensors.
Returns:

bool. True if training should stop.

Raises:
  • ValueError: if we've not begun a step, or step number does not match.
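The early-stopping contract of step_begin/step_end can be illustrated with a plain-Python sketch. The class, tensor name, and threshold below are hypothetical:

```python
class StopOnLowLoss:
    """Illustrative monitor fragment: stop once a watched loss drops below a threshold."""

    def __init__(self, tensor_name, threshold):
        self._tensor_name = tensor_name
        self._threshold = threshold

    def step_begin(self, step):
        # Request evaluation of the loss tensor for this step.
        return [self._tensor_name]

    def step_end(self, step, output):
        # Returning True tells the training loop to stop.
        return output[self._tensor_name] < self._threshold


monitor = StopOnLowLoss("loss:0", threshold=0.1)
monitor.step_begin(0)
should_stop = monitor.step_end(0, {"loss:0": 0.05})
```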

class tf.contrib.learn.monitors.CaptureVariable

Captures a variable's values into a collection.

This monitor is useful for unit testing. You should exercise caution when using this monitor in production, since it never discards values.

This is an EveryN monitor and has consistent semantics for every_n and first_n.
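The capture behavior can be sketched in plain Python. CaptureSketch below is an illustrative stand-in, not the actual implementation:

```python
class CaptureSketch:
    """Plain-Python sketch of CaptureVariable's bookkeeping: one value per step, never discarded."""

    def __init__(self, var_name):
        self._var_name = var_name
        self._captured = {}

    def every_n_step_begin(self, step):
        # Request the variable's value for this step.
        return [self._var_name]

    def every_n_step_end(self, step, outputs):
        # Store the value under the step number; nothing is ever evicted,
        # which is why this pattern is discouraged in production.
        self._captured[step] = outputs[self._var_name]
        return False

    @property
    def values(self):
        return self._captured


cap = CaptureSketch("weights:0")
cap.every_n_step_end(0, {"weights:0": 1.5})
cap.every_n_step_end(100, {"weights:0": 1.2})
```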


tf.contrib.learn.monitors.CaptureVariable.__init__(var_name, every_n=100, first_n=1)

Initializes a CaptureVariable monitor.

Args:
  • var_name: string. The variable name, including suffix (typically ":0").
  • every_n: int, capture every N steps. See EveryN.
  • first_n: int, also capture the first N steps. See EveryN.

tf.contrib.learn.monitors.CaptureVariable.begin(max_steps=None)

Called at the beginning of training.

When called, the default graph is the one we are executing.

Args:
  • max_steps: int, the maximum global step this training will run until.
Raises:
  • ValueError: if we've already begun a run.

tf.contrib.learn.monitors.CaptureVariable.end(session=None)


tf.contrib.learn.monitors.CaptureVariable.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.CaptureVariable.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.CaptureVariable.every_n_post_step(step, session)

Callback after a step is finished or end() is called.

Args:
  • step: int, the current value of the global step.
  • session: Session object.

tf.contrib.learn.monitors.CaptureVariable.every_n_step_begin(step)


tf.contrib.learn.monitors.CaptureVariable.every_n_step_end(step, outputs)


tf.contrib.learn.monitors.CaptureVariable.post_step(step, session)


tf.contrib.learn.monitors.CaptureVariable.run_on_all_workers


tf.contrib.learn.monitors.CaptureVariable.set_estimator(estimator)

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

Args:
  • estimator: the estimator that this monitor monitors.
Raises:
  • ValueError: if the estimator is None.

tf.contrib.learn.monitors.CaptureVariable.step_begin(step)

Overrides BaseMonitor.step_begin.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
Returns:

A list, the result of every_n_step_begin, if that was called this step, or an empty list otherwise.

Raises:
  • ValueError: if called more than once during a step.

tf.contrib.learn.monitors.CaptureVariable.step_end(step, output)

Overrides BaseMonitor.step_end.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
  • output: dict mapping tensor names (strings) to the values resulting from running those tensors. Values may be scalars, for scalar tensors, or NumPy arrays, for non-scalar tensors.
Returns:

bool, the result of every_n_step_end, if that was called this step, or False otherwise.


tf.contrib.learn.monitors.CaptureVariable.values

Returns the values captured so far.

Returns:

dict mapping int step numbers to the value of the variable at the respective step.


class tf.contrib.learn.monitors.CheckpointSaver

Saves checkpoints every N steps.


tf.contrib.learn.monitors.CheckpointSaver.__init__(checkpoint_dir, save_secs=None, save_steps=None, saver=None, checkpoint_basename='model.ckpt', scaffold=None)

Initialize CheckpointSaver monitor.

Args:
  • checkpoint_dir: str, base directory for the checkpoint files.
  • save_secs: int, save every N secs.
  • save_steps: int, save every N steps.
  • saver: Saver object, used for saving.
  • checkpoint_basename: str, base name for the checkpoint files.
  • scaffold: Scaffold, use to get saver object.
Raises:
  • ValueError: If save_steps and save_secs are both provided, or both None; exactly one must be set.
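The exactly-one-of constraint can be sketched as follows; check_save_args is a hypothetical helper, not part of the API:

```python
def check_save_args(save_secs=None, save_steps=None):
    """Sketch of the exactly-one-of check: reject calls that set both
    save_secs and save_steps, or neither."""
    if (save_secs is None) == (save_steps is None):
        raise ValueError("Exactly one of save_secs and save_steps must be provided.")
    return save_secs if save_secs is not None else save_steps


interval = check_save_args(save_steps=1000)
```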

tf.contrib.learn.monitors.CheckpointSaver.begin(max_steps=None)


tf.contrib.learn.monitors.CheckpointSaver.end(session=None)


tf.contrib.learn.monitors.CheckpointSaver.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.CheckpointSaver.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.CheckpointSaver.post_step(step, session)


tf.contrib.learn.monitors.CheckpointSaver.run_on_all_workers


tf.contrib.learn.monitors.CheckpointSaver.set_estimator(estimator)

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

Args:
  • estimator: the estimator that this monitor monitors.
Raises:
  • ValueError: if the estimator is None.

tf.contrib.learn.monitors.CheckpointSaver.step_begin(step)


tf.contrib.learn.monitors.CheckpointSaver.step_end(step, output)

Callback after training step finished.

This callback provides access to the tensors/ops evaluated at this step, including the additional tensors for which evaluation was requested in step_begin.

In addition, the callback has the opportunity to stop training by returning True. This is useful for early stopping, for example.

Note that this method is not called if the call to Session.run() that followed the last call to step_begin() failed.

Args:
  • step: int, the current value of the global step.
  • output: dict mapping tensor names (strings) to the values resulting from running those tensors. Values may be scalars, for scalar tensors, or NumPy arrays, for non-scalar tensors.
Returns:

bool. True if training should stop.

Raises:
  • ValueError: if we've not begun a step, or step number does not match.

class tf.contrib.learn.monitors.EveryN

Base class for monitors that execute callbacks every N steps.

This class adds three new callbacks:

  • every_n_step_begin
  • every_n_step_end
  • every_n_post_step

The callbacks are executed every n steps, or optionally every step for the first m steps, where m and n can both be user-specified.

When extending this class, note that if you wish to use any of the BaseMonitor callbacks, you must call their respective super implementation:

  def step_begin(self, step):
    super(ExampleMonitor, self).step_begin(step)
    return []

Failing to call the super implementation will cause unpredictable behavior.

The every_n_post_step() callback is also called after the last step if it was not already called through the regular conditions. Note that every_n_step_begin() and every_n_step_end() do not receive that special treatment.
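The firing schedule can be approximated in plain Python. active_steps below is an illustrative sketch under the stated assumptions; the actual implementation may differ in edge cases:

```python
def active_steps(total_steps, every_n_steps=100, first_n_steps=1):
    """Approximate the steps on which EveryN fires its every_n_* callbacks:
    always during the first `first_n_steps`, then whenever `every_n_steps`
    steps have elapsed since the last active step."""
    active, last_active = [], None
    for step in range(total_steps):
        if (step < first_n_steps or last_active is None
                or step >= last_active + every_n_steps):
            active.append(step)
            last_active = step
    return active


steps = active_steps(10, every_n_steps=3, first_n_steps=2)
```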


tf.contrib.learn.monitors.EveryN.__init__(every_n_steps=100, first_n_steps=1)

Initializes an EveryN monitor.

Args:
  • every_n_steps: int, the number of steps to allow between callbacks.
  • first_n_steps: int, the number of initial steps during which the callbacks will always be executed, regardless of the value of every_n_steps. Note that this value is relative to the global step.

tf.contrib.learn.monitors.EveryN.begin(max_steps=None)

Called at the beginning of training.

When called, the default graph is the one we are executing.

Args:
  • max_steps: int, the maximum global step this training will run until.
Raises:
  • ValueError: if we've already begun a run.

tf.contrib.learn.monitors.EveryN.end(session=None)


tf.contrib.learn.monitors.EveryN.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.EveryN.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.EveryN.every_n_post_step(step, session)

Callback after a step is finished or end() is called.

Args:
  • step: int, the current value of the global step.
  • session: Session object.

tf.contrib.learn.monitors.EveryN.every_n_step_begin(step)

Callback before every n'th step begins.

Args:
  • step: int, the current value of the global step.
Returns:

A list of tensors that will be evaluated at this step.


tf.contrib.learn.monitors.EveryN.every_n_step_end(step, outputs)

Callback after every n'th step finished.

This callback provides access to the tensors/ops evaluated at this step, including the additional tensors for which evaluation was requested in step_begin.

In addition, the callback has the opportunity to stop training by returning True. This is useful for early stopping, for example.

Args:
  • step: int, the current value of the global step.
  • outputs: dict mapping tensor names (strings) to the values resulting from running those tensors. Values may be scalars, for scalar tensors, or NumPy arrays, for non-scalar tensors.
Returns:

bool. True if training should stop.


tf.contrib.learn.monitors.EveryN.post_step(step, session)


tf.contrib.learn.monitors.EveryN.run_on_all_workers


tf.contrib.learn.monitors.EveryN.set_estimator(estimator)

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

Args:
  • estimator: the estimator that this monitor monitors.
Raises:
  • ValueError: if the estimator is None.

tf.contrib.learn.monitors.EveryN.step_begin(step)

Overrides BaseMonitor.step_begin.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
Returns:

A list, the result of every_n_step_begin, if that was called this step, or an empty list otherwise.

Raises:
  • ValueError: if called more than once during a step.

tf.contrib.learn.monitors.EveryN.step_end(step, output)

Overrides BaseMonitor.step_end.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
  • output: dict mapping tensor names (strings) to the values resulting from running those tensors. Values may be scalars, for scalar tensors, or NumPy arrays, for non-scalar tensors.
Returns:

bool, the result of every_n_step_end, if that was called this step, or False otherwise.


class tf.contrib.learn.monitors.ExportMonitor

Monitor that exports Estimator every N steps.


tf.contrib.learn.monitors.ExportMonitor.__init__(*args, **kwargs)

Initializes ExportMonitor. (deprecated arguments)

SOME ARGUMENTS ARE DEPRECATED. They will be removed after 2016-09-23. Instructions for updating: The signature of the input_fn accepted by export is changing to be consistent with what's used by tf.Learn Estimator's train/evaluate. input_fn (and in most cases, input_feature_key) will both become required args.

Args:
  • every_n_steps: Run monitor every N steps.
  • export_dir: str, folder to export.
  • input_fn: A function that takes no argument and returns a tuple of (features, targets), where features is a dict of string key to Tensor and targets is a Tensor that's currently not used (and so can be None).
  • input_feature_key: String key into the features dict returned by input_fn that corresponds to the raw Example strings Tensor that the exported model will take as input. Can only be None if you're using a custom signature_fn that does not use the first arg (examples).
  • exports_to_keep: int, number of exports to keep.
  • signature_fn: Function that returns a default signature and a named signature map, given Tensor of Example strings, dict of Tensors for features and dict of Tensors for predictions.
  • default_batch_size: Default batch size of the Example placeholder.
Raises:
  • ValueError: If input_fn and input_feature_key are not both defined or are not both None.

tf.contrib.learn.monitors.ExportMonitor.begin(max_steps=None)

Called at the beginning of training.

When called, the default graph is the one we are executing.

Args:
  • max_steps: int, the maximum global step this training will run until.
Raises:
  • ValueError: if we've already begun a run.

tf.contrib.learn.monitors.ExportMonitor.end(session=None)


tf.contrib.learn.monitors.ExportMonitor.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.ExportMonitor.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.ExportMonitor.every_n_post_step(step, session)

Callback after a step is finished or end() is called.

Args:
  • step: int, the current value of the global step.
  • session: Session object.

tf.contrib.learn.monitors.ExportMonitor.every_n_step_begin(step)

Callback before every n'th step begins.

Args:
  • step: int, the current value of the global step.
Returns:

A list of tensors that will be evaluated at this step.


tf.contrib.learn.monitors.ExportMonitor.every_n_step_end(step, outputs)


tf.contrib.learn.monitors.ExportMonitor.export_dir


tf.contrib.learn.monitors.ExportMonitor.exports_to_keep


tf.contrib.learn.monitors.ExportMonitor.last_export_dir

Returns the directory containing the last completed export.

Returns:

The string path to the last exported directory. NB: this functionality was added on 2016/09/25; clients that depend on the return value may need to handle the case where this property returns None, because the estimator being fitted does not yet return a value during export.


tf.contrib.learn.monitors.ExportMonitor.post_step(step, session)


tf.contrib.learn.monitors.ExportMonitor.run_on_all_workers


tf.contrib.learn.monitors.ExportMonitor.set_estimator(estimator)

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

Args:
  • estimator: the estimator that this monitor monitors.
Raises:
  • ValueError: if the estimator is None.

tf.contrib.learn.monitors.ExportMonitor.signature_fn


tf.contrib.learn.monitors.ExportMonitor.step_begin(step)

Overrides BaseMonitor.step_begin.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
Returns:

A list, the result of every_n_step_begin, if that was called this step, or an empty list otherwise.

Raises:
  • ValueError: if called more than once during a step.

tf.contrib.learn.monitors.ExportMonitor.step_end(step, output)

Overrides BaseMonitor.step_end.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
  • output: dict mapping tensor names (strings) to the values resulting from running those tensors. Values may be scalars, for scalar tensors, or NumPy arrays, for non-scalar tensors.
Returns:

bool, the result of every_n_step_end, if that was called this step, or False otherwise.


class tf.contrib.learn.monitors.GraphDump

Dumps almost all tensors in the graph at every step.

Note: this is very expensive; prefer PrintTensor in production.


tf.contrib.learn.monitors.GraphDump.__init__(ignore_ops=None)

Initializes GraphDump monitor.

Args:
  • ignore_ops: list of string. Names of ops to ignore. If None, GraphDump.IGNORE_OPS is used.

tf.contrib.learn.monitors.GraphDump.begin(max_steps=None)


tf.contrib.learn.monitors.GraphDump.compare(other_dump, step, atol=1e-06)

Compares two GraphDump monitors and returns differences.

Args:
  • other_dump: Another GraphDump monitor.
  • step: int, step to compare on.
  • atol: float, absolute tolerance in comparison of floating arrays.
Returns:

Returns tuple:

  • matched: list of keys that matched.
  • non_matched: dict of keys to tuple of 2 mismatched values.
Raises:
  • ValueError: if a key in data is missing from other_dump at step.
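The comparison can be sketched in plain Python, using scalar values for simplicity. compare_dumps is a hypothetical helper; the real monitor compares NumPy arrays within the given absolute tolerance:

```python
def compare_dumps(dump_a, dump_b, atol=1e-6):
    """Sketch of the comparison described above: split the keys of two
    per-step tensor dumps into matches and mismatches within an absolute
    tolerance, raising if a key is missing from the second dump."""
    matched, non_matched = [], {}
    for key, value_a in dump_a.items():
        if key not in dump_b:
            raise ValueError("missing key: %s" % key)
        value_b = dump_b[key]
        if abs(value_a - value_b) <= atol:
            matched.append(key)
        else:
            non_matched[key] = (value_a, value_b)
    return matched, non_matched


matched, non_matched = compare_dumps({"w:0": 1.0, "b:0": 2.0},
                                     {"w:0": 1.0, "b:0": 2.5})
```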

tf.contrib.learn.monitors.GraphDump.data


tf.contrib.learn.monitors.GraphDump.end(session=None)

Callback at the end of training/evaluation.

Args:
  • session: A tf.Session object that can be used to run ops.
Raises:
  • ValueError: if we've not begun a run.

tf.contrib.learn.monitors.GraphDump.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.GraphDump.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.GraphDump.post_step(step, session)

Callback after the step is finished.

Called after step_end and receives a session in which to perform extra session.run calls. This callback is invoked even if a failure occurred during the step.

Args:
  • step: int, global step of the model.
  • session: Session object.

tf.contrib.learn.monitors.GraphDump.run_on_all_workers


tf.contrib.learn.monitors.GraphDump.set_estimator(estimator)

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

Args:
  • estimator: the estimator that this monitor monitors.
Raises:
  • ValueError: if the estimator is None.

tf.contrib.learn.monitors.GraphDump.step_begin(step)


tf.contrib.learn.monitors.GraphDump.step_end(step, output)


class tf.contrib.learn.monitors.LoggingTrainable

Writes trainable variable values into log every N steps.

Writes the values of trainable variables to the log every every_n steps, and also for each of the first first_n steps.


tf.contrib.learn.monitors.LoggingTrainable.__init__(scope=None, every_n=100, first_n=1)

Initializes LoggingTrainable monitor.

Args:
  • scope: An optional string to match variable names using re.match.
  • every_n: Print every N steps.
  • first_n: Print first N steps.
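The scope filtering can be sketched as follows; select_trainables is a hypothetical helper, and the variable names are made up:

```python
import re


def select_trainables(variable_names, scope=None):
    """Sketch of scope filtering: keep variable names matching `scope` via
    re.match (anchored at the start of the name), or all names when scope
    is None."""
    if scope is None:
        return list(variable_names)
    return [name for name in variable_names if re.match(scope, name)]


names = ["dense/kernel:0", "dense/bias:0", "output/kernel:0"]
picked = select_trainables(names, scope="dense")
```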

tf.contrib.learn.monitors.LoggingTrainable.begin(max_steps=None)

Called at the beginning of training.

When called, the default graph is the one we are executing.

Args:
  • max_steps: int, the maximum global step this training will run until.
Raises:
  • ValueError: if we've already begun a run.

tf.contrib.learn.monitors.LoggingTrainable.end(session=None)


tf.contrib.learn.monitors.LoggingTrainable.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.LoggingTrainable.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.LoggingTrainable.every_n_post_step(step, session)

Callback after a step is finished or end() is called.

Args:
  • step: int, the current value of the global step.
  • session: Session object.

tf.contrib.learn.monitors.LoggingTrainable.every_n_step_begin(step)


tf.contrib.learn.monitors.LoggingTrainable.every_n_step_end(step, outputs)


tf.contrib.learn.monitors.LoggingTrainable.post_step(step, session)


tf.contrib.learn.monitors.LoggingTrainable.run_on_all_workers


tf.contrib.learn.monitors.LoggingTrainable.set_estimator(estimator)

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

Args:
  • estimator: the estimator that this monitor monitors.
Raises:
  • ValueError: if the estimator is None.

tf.contrib.learn.monitors.LoggingTrainable.step_begin(step)

Overrides BaseMonitor.step_begin.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
Returns:

A list, the result of every_n_step_begin, if that was called this step, or an empty list otherwise.

Raises:
  • ValueError: if called more than once during a step.

tf.contrib.learn.monitors.LoggingTrainable.step_end(step, output)

Overrides BaseMonitor.step_end.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
  • output: dict mapping tensor names (strings) to the values resulting from running those tensors. Values may be scalars, for scalar tensors, or NumPy arrays, for non-scalar tensors.
Returns:

bool, the result of every_n_step_end, if that was called this step, or False otherwise.


class tf.contrib.learn.monitors.NanLoss

NaN Loss monitor.

Monitors loss and stops training if loss is NaN. Can either fail with exception or just stop training.


tf.contrib.learn.monitors.NanLoss.__init__(loss_tensor, every_n_steps=100, fail_on_nan_loss=True)

Initializes NanLoss monitor.

Args:
  • loss_tensor: Tensor, the loss tensor.
  • every_n_steps: int, run check every this many steps.
  • fail_on_nan_loss: bool, whether to raise exception when loss is NaN.
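The check can be sketched in plain Python. NanLossSketch is an illustrative stand-in, and the RuntimeError below stands in for the TensorFlow-specific error the real monitor raises:

```python
import math


class NanLossSketch:
    """Plain-Python sketch of the check above: on every watched step, stop
    training (or raise) when the loss comes back as NaN."""

    def __init__(self, fail_on_nan_loss=True):
        self._fail = fail_on_nan_loss

    def every_n_step_end(self, step, loss_value):
        if math.isnan(loss_value):
            if self._fail:
                raise RuntimeError("NaN loss at step %d." % step)
            return True  # request early stopping instead of raising
        return False


monitor = NanLossSketch(fail_on_nan_loss=False)
stop = monitor.every_n_step_end(5, float("nan"))
```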

tf.contrib.learn.monitors.NanLoss.begin(max_steps=None)

Called at the beginning of training.

When called, the default graph is the one we are executing.

Args:
  • max_steps: int, the maximum global step this training will run until.
Raises:
  • ValueError: if we've already begun a run.

tf.contrib.learn.monitors.NanLoss.end(session=None)


tf.contrib.learn.monitors.NanLoss.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.NanLoss.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.NanLoss.every_n_post_step(step, session)

Callback after a step is finished or end() is called.

Args:
  • step: int, the current value of the global step.
  • session: Session object.

tf.contrib.learn.monitors.NanLoss.every_n_step_begin(step)


tf.contrib.learn.monitors.NanLoss.every_n_step_end(step, outputs)


tf.contrib.learn.monitors.NanLoss.post_step(step, session)


tf.contrib.learn.monitors.NanLoss.run_on_all_workers


tf.contrib.learn.monitors.NanLoss.set_estimator(estimator)

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

Args:
  • estimator: the estimator that this monitor monitors.
Raises:
  • ValueError: if the estimator is None.

tf.contrib.learn.monitors.NanLoss.step_begin(step)

Overrides BaseMonitor.step_begin.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
Returns:

A list, the result of every_n_step_begin, if that was called this step, or an empty list otherwise.

Raises:
  • ValueError: if called more than once during a step.

tf.contrib.learn.monitors.NanLoss.step_end(step, output)

Overrides BaseMonitor.step_end.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
  • output: dict mapping tensor names (strings) to the values resulting from running those tensors. Values may be scalars, for scalar tensors, or NumPy arrays, for non-scalar tensors.
Returns:

bool, the result of every_n_step_end, if that was called this step, or False otherwise.


class tf.contrib.learn.monitors.PrintTensor

Prints given tensors every N steps.

This is an EveryN monitor and has consistent semantics for every_n and first_n.

The tensors will be printed to the log, with INFO severity.


tf.contrib.learn.monitors.PrintTensor.__init__(tensor_names, every_n=100, first_n=1)

Initializes a PrintTensor monitor.

Args:
  • tensor_names: dict of tag to tensor names or iterable of tensor names (strings).
  • every_n: int, print every N steps. See EveryN.
  • first_n: int, also print the first N steps. See EveryN.
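The two accepted forms of tensor_names can be sketched as follows; normalize_tensor_names is a hypothetical helper:

```python
def normalize_tensor_names(tensor_names):
    """Sketch of the two accepted forms: a dict maps tag -> tensor name,
    while a plain iterable uses each tensor name as its own tag."""
    if isinstance(tensor_names, dict):
        return dict(tensor_names)
    return {name: name for name in tensor_names}


by_tag = normalize_tensor_names({"loss": "loss:0"})
by_name = normalize_tensor_names(["loss:0", "accuracy:0"])
```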

tf.contrib.learn.monitors.PrintTensor.begin(max_steps=None)

Called at the beginning of training.

When called, the default graph is the one we are executing.

Args:
  • max_steps: int, the maximum global step this training will run until.
Raises:
  • ValueError: if we've already begun a run.

tf.contrib.learn.monitors.PrintTensor.end(session=None)


tf.contrib.learn.monitors.PrintTensor.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.PrintTensor.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.PrintTensor.every_n_post_step(step, session)

Callback after a step is finished or end() is called.

Args:
  • step: int, the current value of the global step.
  • session: Session object.

tf.contrib.learn.monitors.PrintTensor.every_n_step_begin(step)


tf.contrib.learn.monitors.PrintTensor.every_n_step_end(step, outputs)


tf.contrib.learn.monitors.PrintTensor.post_step(step, session)


tf.contrib.learn.monitors.PrintTensor.run_on_all_workers


tf.contrib.learn.monitors.PrintTensor.set_estimator(estimator)

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

Args:
  • estimator: the estimator that this monitor monitors.
Raises:
  • ValueError: if the estimator is None.

tf.contrib.learn.monitors.PrintTensor.step_begin(step)

Overrides BaseMonitor.step_begin.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
Returns:

A list, the result of every_n_step_begin, if that was called this step, or an empty list otherwise.

Raises:
  • ValueError: if called more than once during a step.

tf.contrib.learn.monitors.PrintTensor.step_end(step, output)

Overrides BaseMonitor.step_end.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
  • output: dict mapping tensor names (strings) to the values resulting from running those tensors. Values may be scalars, for scalar tensors, or NumPy arrays, for non-scalar tensors.
Returns:

bool, the result of every_n_step_end, if that was called this step, or False otherwise.


class tf.contrib.learn.monitors.StepCounter

Steps per second monitor.
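The reported rate can be sketched as follows; steps_per_sec is a hypothetical helper, and in the real monitor the timestamps come from the wall clock and the result is written as a summary:

```python
def steps_per_sec(prev_step, prev_time, step, now):
    """Sketch of the steps/sec computation a StepCounter would report
    between two observations (timestamps in seconds)."""
    return (step - prev_step) / (now - prev_time)


rate = steps_per_sec(prev_step=100, prev_time=10.0, step=200, now=14.0)
```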


tf.contrib.learn.monitors.StepCounter.__init__(every_n_steps=100, output_dir=None, summary_writer=None)


tf.contrib.learn.monitors.StepCounter.begin(max_steps=None)

Called at the beginning of training.

When called, the default graph is the one we are executing.

Args:
  • max_steps: int, the maximum global step this training will run until.
Raises:
  • ValueError: if we've already begun a run.

tf.contrib.learn.monitors.StepCounter.end(session=None)


tf.contrib.learn.monitors.StepCounter.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.StepCounter.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.StepCounter.every_n_post_step(step, session)

Callback after a step is finished or end() is called.

Args:
  • step: int, the current value of the global step.
  • session: Session object.

tf.contrib.learn.monitors.StepCounter.every_n_step_begin(step)

Callback before every Nth step begins.

Args:
  • step: int, the current value of the global step.
Returns:

A list of tensors that will be evaluated at this step.


tf.contrib.learn.monitors.StepCounter.every_n_step_end(current_step, outputs)


tf.contrib.learn.monitors.StepCounter.post_step(step, session)


tf.contrib.learn.monitors.StepCounter.run_on_all_workers


tf.contrib.learn.monitors.StepCounter.set_estimator(estimator)


tf.contrib.learn.monitors.StepCounter.step_begin(step)

Overrides BaseMonitor.step_begin.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
Returns:

A list, the result of every_n_step_begin, if that was called this step, or an empty list otherwise.

Raises:
  • ValueError: if called more than once during a step.

tf.contrib.learn.monitors.StepCounter.step_end(step, output)

Overrides BaseMonitor.step_end.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
  • output: dict mapping string tensor names to the values that resulted from running those tensors. Values may be either scalars (for scalar tensors) or NumPy arrays (for non-scalar tensors).
Returns:

bool, the result of every_n_step_end, if that was called this step, or False otherwise.


class tf.contrib.learn.monitors.StopAtStep

Monitor to request stop at a specified step.


tf.contrib.learn.monitors.StopAtStep.__init__(num_steps=None, last_step=None)

Create a StopAtStep monitor.

This monitor requests stop after either a number of steps have been executed or a last step has been reached. Only one of the two options can be specified.

If num_steps is specified, it indicates the number of steps to execute after begin() is called. If instead last_step is specified, it indicates the last step we want to execute, as passed to the step_begin() call.

Args:
  • num_steps: Number of steps to execute.
  • last_step: Step after which to stop.
Raises:
  • ValueError: If one of the arguments is invalid.
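The stopping rule can be sketched in plain Python. This is an illustrative re-implementation of the semantics described above (a relative num_steps is converted into an absolute last step at the first step_begin), not the actual class; `StopAtStepSketch` is a hypothetical name.

```python
class StopAtStepSketch:
    """Requests stop after num_steps from begin(), or at an absolute last_step."""
    def __init__(self, num_steps=None, last_step=None):
        if (num_steps is None) == (last_step is None):
            raise ValueError('Exactly one of num_steps or last_step must be given.')
        self._num_steps = num_steps
        self._last_step = last_step

    def begin(self, max_steps=None):
        self._first_step_seen = None

    def step_begin(self, step):
        if self._last_step is None and self._first_step_seen is None:
            # Convert the relative step count into an absolute last step.
            self._first_step_seen = step
            self._last_step = step + self._num_steps - 1

    def step_end(self, step):
        """Returns True when training should stop."""
        return step >= self._last_step

m = StopAtStepSketch(num_steps=3)
m.begin()
m.step_begin(10)
stop_early = m.step_end(10)  # False: only 1 of the 3 steps has run
m.step_begin(12)
stop_now = m.step_end(12)    # True: steps 10, 11, 12 have run
```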

tf.contrib.learn.monitors.StopAtStep.begin(max_steps=None)

Called at the beginning of training.

When called, the default graph is the one we are executing.

Args:
  • max_steps: int, the maximum global step this training will run until.
Raises:
  • ValueError: if we've already begun a run.

tf.contrib.learn.monitors.StopAtStep.end(session=None)

Callback at the end of training/evaluation.

Args:
  • session: A tf.Session object that can be used to run ops.
Raises:
  • ValueError: if we've not begun a run.

tf.contrib.learn.monitors.StopAtStep.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.StopAtStep.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.StopAtStep.post_step(step, session)

Callback after the step is finished.

Called after step_end and receives the session to perform extra session.run calls. This callback is invoked even if a failure occurred during the step.

Args:
  • step: int, global step of the model.
  • session: Session object.

tf.contrib.learn.monitors.StopAtStep.run_on_all_workers


tf.contrib.learn.monitors.StopAtStep.set_estimator(estimator)

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

Args:
  • estimator: the estimator that this monitor monitors.
Raises:
  • ValueError: if the estimator is None.

tf.contrib.learn.monitors.StopAtStep.step_begin(step)


tf.contrib.learn.monitors.StopAtStep.step_end(step, output)


class tf.contrib.learn.monitors.SummarySaver

Saves summaries every N steps.


tf.contrib.learn.monitors.SummarySaver.__init__(summary_op, save_steps=100, output_dir=None, summary_writer=None, scaffold=None)

Initializes a SummarySaver monitor.

Args:
  • summary_op: Tensor of type string. A serialized Summary protocol buffer, as output by TF summary methods like scalar_summary or merge_all_summaries.
  • save_steps: int, save summaries every N steps. See EveryN.
  • output_dir: string, the directory to save the summaries to. Only used if no summary_writer is supplied.
  • summary_writer: SummaryWriter. If None and an output_dir was passed, one will be created accordingly.
  • scaffold: Scaffold to get summary_op if it's not provided.

tf.contrib.learn.monitors.SummarySaver.begin(max_steps=None)

Called at the beginning of training.

When called, the default graph is the one we are executing.

Args:
  • max_steps: int, the maximum global step this training will run until.
Raises:
  • ValueError: if we've already begun a run.

tf.contrib.learn.monitors.SummarySaver.end(session=None)


tf.contrib.learn.monitors.SummarySaver.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.SummarySaver.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.SummarySaver.every_n_post_step(step, session)

Callback after a step is finished or end() is called.

Args:
  • step: int, the current value of the global step.
  • session: Session object.

tf.contrib.learn.monitors.SummarySaver.every_n_step_begin(step)


tf.contrib.learn.monitors.SummarySaver.every_n_step_end(step, outputs)


tf.contrib.learn.monitors.SummarySaver.post_step(step, session)


tf.contrib.learn.monitors.SummarySaver.run_on_all_workers


tf.contrib.learn.monitors.SummarySaver.set_estimator(estimator)


tf.contrib.learn.monitors.SummarySaver.step_begin(step)

Overrides BaseMonitor.step_begin.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
Returns:

A list, the result of every_n_step_begin, if that was called this step, or an empty list otherwise.

Raises:
  • ValueError: if called more than once during a step.

tf.contrib.learn.monitors.SummarySaver.step_end(step, output)

Overrides BaseMonitor.step_end.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
  • output: dict mapping string tensor names to the values that resulted from running those tensors. Values may be either scalars (for scalar tensors) or NumPy arrays (for non-scalar tensors).
Returns:

bool, the result of every_n_step_end, if that was called this step, or False otherwise.


class tf.contrib.learn.monitors.ValidationMonitor

Runs evaluation of a given estimator, at most every N steps.

Note that the evaluation is done based on the saved checkpoint, which will usually be older than the current step.

Can do early stopping on validation metrics if early_stopping_rounds is provided.


tf.contrib.learn.monitors.ValidationMonitor.__init__(x=None, y=None, input_fn=None, batch_size=None, eval_steps=None, every_n_steps=100, metrics=None, early_stopping_rounds=None, early_stopping_metric='loss', early_stopping_metric_minimize=True, name=None)

Initializes a ValidationMonitor.

Args:
  • x: See BaseEstimator.evaluate.
  • y: See BaseEstimator.evaluate.
  • input_fn: See BaseEstimator.evaluate.
  • batch_size: See BaseEstimator.evaluate.
  • eval_steps: See BaseEstimator.evaluate.
  • every_n_steps: Check for new checkpoints to evaluate every N steps. If a new checkpoint is found, it is evaluated. See EveryN.
  • metrics: See BaseEstimator.evaluate.
  • early_stopping_rounds: int. If the metric indicated by early_stopping_metric does not improve (decrease when early_stopping_metric_minimize is True, increase otherwise) for this many steps, training will be stopped.
  • early_stopping_metric: string, name of the metric to check for early stopping.
  • early_stopping_metric_minimize: bool, True if early_stopping_metric is expected to decrease (thus early stopping occurs when this metric stops decreasing), False if early_stopping_metric is expected to increase. Typically, early_stopping_metric_minimize is True for loss metrics like mean squared error, and False for performance metrics like accuracy.
  • name: See BaseEstimator.evaluate.
Raises:
  • ValueError: If both x and input_fn are provided.
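The early-stopping rule can be sketched in plain Python. This is illustrative only: the real monitor evaluates the latest checkpoint and reads the metric from the eval results, whereas `EarlyStoppingSketch` below is a hypothetical class fed metric values directly.

```python
class EarlyStoppingSketch:
    """Stops when the tracked metric has not improved for early_stopping_rounds steps."""
    def __init__(self, early_stopping_rounds, minimize=True):
        self._rounds = early_stopping_rounds
        self._minimize = minimize
        self.best_value = None
        self.best_step = None

    def observe(self, step, value):
        """Records a metric value; returns True if training should stop."""
        improved = (self.best_value is None or
                    (value < self.best_value if self._minimize
                     else value > self.best_value))
        if improved:
            self.best_value = value
            self.best_step = step
        # Stop once `rounds` steps have passed without a new best.
        return step - self.best_step >= self._rounds

es = EarlyStoppingSketch(early_stopping_rounds=200, minimize=True)
es.observe(100, 0.9)           # first value becomes the best
es.observe(200, 0.5)           # improved: new best at step 200
stop_a = es.observe(300, 0.6)  # worse, but only 100 steps since the best -> False
stop_b = es.observe(400, 0.7)  # 200 steps since the best -> True
```

The best_value and best_step attributes mirror the monitor's properties of the same names.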

tf.contrib.learn.monitors.ValidationMonitor.begin(max_steps=None)

Called at the beginning of training.

When called, the default graph is the one we are executing.

Args:
  • max_steps: int, the maximum global step this training will run until.
Raises:
  • ValueError: if we've already begun a run.

tf.contrib.learn.monitors.ValidationMonitor.best_step

Returns the step at which the best early stopping metric was found.


tf.contrib.learn.monitors.ValidationMonitor.best_value

Returns the best early stopping metric value found so far.


tf.contrib.learn.monitors.ValidationMonitor.early_stopped

Returns True if this monitor caused an early stop.


tf.contrib.learn.monitors.ValidationMonitor.end(session=None)


tf.contrib.learn.monitors.ValidationMonitor.epoch_begin(epoch)

Begin epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've already begun an epoch, or epoch < 0.

tf.contrib.learn.monitors.ValidationMonitor.epoch_end(epoch)

End epoch.

Args:
  • epoch: int, the epoch number.
Raises:
  • ValueError: if we've not begun an epoch, or epoch number does not match.

tf.contrib.learn.monitors.ValidationMonitor.every_n_post_step(step, session)

Callback after a step is finished or end() is called.

Args:
  • step: int, the current value of the global step.
  • session: Session object.

tf.contrib.learn.monitors.ValidationMonitor.every_n_step_begin(step)

Callback before every Nth step begins.

Args:
  • step: int, the current value of the global step.
Returns:

A list of tensors that will be evaluated at this step.


tf.contrib.learn.monitors.ValidationMonitor.every_n_step_end(step, outputs)


tf.contrib.learn.monitors.ValidationMonitor.post_step(step, session)


tf.contrib.learn.monitors.ValidationMonitor.run_on_all_workers


tf.contrib.learn.monitors.ValidationMonitor.set_estimator(estimator)

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

Args:
  • estimator: the estimator that this monitor monitors.
Raises:
  • ValueError: if the estimator is None.

tf.contrib.learn.monitors.ValidationMonitor.step_begin(step)

Overrides BaseMonitor.step_begin.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
Returns:

A list, the result of every_n_step_begin, if that was called this step, or an empty list otherwise.

Raises:
  • ValueError: if called more than once during a step.

tf.contrib.learn.monitors.ValidationMonitor.step_end(step, output)

Overrides BaseMonitor.step_end.

When overriding this method, you must call the super implementation.

Args:
  • step: int, the current value of the global step.
  • output: dict mapping string tensor names to the values that resulted from running those tensors. Values may be either scalars (for scalar tensors) or NumPy arrays (for non-scalar tensors).
Returns:

bool, the result of every_n_step_end, if that was called this step, or False otherwise.

Other Functions and Classes


class tf.contrib.learn.monitors.RunHookAdapterForMonitors

Wraps monitors into a SessionRunHook.


tf.contrib.learn.monitors.RunHookAdapterForMonitors.__init__(monitors)


tf.contrib.learn.monitors.RunHookAdapterForMonitors.after_run(run_context, run_values)


tf.contrib.learn.monitors.RunHookAdapterForMonitors.before_run(run_context)


tf.contrib.learn.monitors.RunHookAdapterForMonitors.begin()


tf.contrib.learn.monitors.RunHookAdapterForMonitors.end(session)
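The adapter's job can be sketched in plain Python: it fans hook-style callbacks out to each wrapped monitor. This is illustrative only; the real class conforms to the SessionRunHook interface, and `MonitorAdapterSketch` and `RecordingMonitor` are hypothetical names.

```python
class MonitorAdapterSketch:
    """Forwards hook-style callbacks to a list of monitor-style objects."""
    def __init__(self, monitors):
        self._monitors = monitors

    def begin(self, max_steps=None):
        for m in self._monitors:
            m.begin(max_steps)

    def before_run(self, step):
        # Collect tensor requests from every monitor's step_begin.
        requests = []
        for m in self._monitors:
            requests.extend(m.step_begin(step))
        return requests

    def after_run(self, step, outputs):
        # Training should stop if any monitor's step_end returns True.
        return any(m.step_end(step, outputs) for m in self._monitors)

class RecordingMonitor:
    """Minimal monitor that records which callbacks fired."""
    def __init__(self):
        self.events = []
    def begin(self, max_steps=None):
        self.events.append('begin')
    def step_begin(self, step):
        self.events.append(('step_begin', step))
        return ['loss:0']
    def step_end(self, step, outputs):
        self.events.append(('step_end', step))
        return False

mon = RecordingMonitor()
adapter = MonitorAdapterSketch([mon])
adapter.begin(max_steps=10)
tensors = adapter.before_run(1)                   # ['loss:0']
should_stop = adapter.after_run(1, {'loss:0': 0.5})
```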


class tf.contrib.learn.monitors.SummaryWriterCache

Cache for summary writers.

This class caches summary writers, one per directory.


tf.contrib.learn.monitors.SummaryWriterCache.clear()

Clear cached summary writers. Currently only used for unit tests.


tf.contrib.learn.monitors.SummaryWriterCache.get(logdir)

Returns the SummaryWriter for the specified directory.

Args:
  • logdir: str, name of the directory.
Returns:

A SummaryWriter.
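The one-writer-per-directory behavior can be sketched like this. Purely illustrative: `FakeWriter` stands in for a real SummaryWriter, and `WriterCacheSketch` is a hypothetical name for the caching logic.

```python
class FakeWriter:
    """Stand-in for a SummaryWriter; records only its log directory."""
    def __init__(self, logdir):
        self.logdir = logdir

class WriterCacheSketch:
    """Caches one writer per directory, as SummaryWriterCache does."""
    _cache = {}

    @classmethod
    def get(cls, logdir):
        # Create a writer the first time a directory is seen; reuse it afterwards.
        if logdir not in cls._cache:
            cls._cache[logdir] = FakeWriter(logdir)
        return cls._cache[logdir]

    @classmethod
    def clear(cls):
        cls._cache = {}

w1 = WriterCacheSketch.get('/tmp/run1')
w2 = WriterCacheSketch.get('/tmp/run1')  # same object as w1
w3 = WriterCacheSketch.get('/tmp/run2')  # a different writer
```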
