Monitors (contrib)
[TOC]
Monitors allow user instrumentation of the training process.
Monitors are useful to track training, report progress, request early stopping and more. Monitors use the observer pattern and notify at the following points:
- when training begins
- before a training step
- after a training step
- when training ends
Monitors are not intended to be reusable.
There are a few pre-defined monitors:
- `CaptureVariable`: saves a variable's values
- `GraphDump`: intended for debug only; saves all tensor values
- `PrintTensor`: outputs one or more tensor values to the log
- `SummarySaver`: saves summaries to a summary writer
- `ValidationMonitor`: runs model validation, by periodically calculating eval metrics on a separate data set; supports optional early stopping
For more specific needs, you can create custom monitors by extending one of the following classes:
- `BaseMonitor`: the base class for all monitors
- `EveryN`: triggers a callback every N training steps
Example:
```python
class ExampleMonitor(monitors.BaseMonitor):
  def __init__(self):
    print('Init')

  def begin(self, max_steps):
    print('Starting run. Will train until step %d.' % max_steps)

  def end(self):
    print('Completed run.')

  def step_begin(self, step):
    print('About to run step %d...' % step)
    return ['loss_1:0']

  def step_end(self, step, outputs):
    print('Done running step %d. The value of "loss" tensor: %s' %
          (step, outputs['loss_1:0']))

linear_regressor = LinearRegressor()
example_monitor = ExampleMonitor()
linear_regressor.fit(
    x, y, steps=2, batch_size=1, monitors=[example_monitor])
```
tf.contrib.learn.monitors.get_default_monitors(loss_op=None, summary_op=None, save_summary_steps=100, output_dir=None, summary_writer=None)
Returns a default set of typically-used monitors.
Args:
- `loss_op`: `Tensor`, the loss tensor. This will be printed using `PrintTensor` at the default interval.
- `summary_op`: see `SummarySaver`.
- `save_summary_steps`: see `SummarySaver`.
- `output_dir`: see `SummarySaver`.
- `summary_writer`: see `SummarySaver`.
Returns:
A `list` of monitors.
class tf.contrib.learn.monitors.BaseMonitor
Base class for Monitors.
Defines basic interfaces of Monitors. Monitors can either be run on all workers or, more commonly, restricted to run exclusively on the elected chief worker.
tf.contrib.learn.monitors.BaseMonitor.__init__()
tf.contrib.learn.monitors.BaseMonitor.begin(max_steps=None)
Called at the beginning of training.
When called, the default graph is the one we are executing.
Args:
- `max_steps`: `int`, the maximum global step this training will run until.
Raises:
- `ValueError`: if we've already begun a run.
tf.contrib.learn.monitors.BaseMonitor.end(session=None)
Callback at the end of training/evaluation.
Args:
- `session`: a `tf.Session` object that can be used to run ops.
Raises:
- `ValueError`: if we've not begun a run.
tf.contrib.learn.monitors.BaseMonitor.epoch_begin(epoch)
Begin epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've already begun an epoch, or `epoch` < 0.
tf.contrib.learn.monitors.BaseMonitor.epoch_end(epoch)
End epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've not begun an epoch, or the `epoch` number does not match.
tf.contrib.learn.monitors.BaseMonitor.post_step(step, session)
Callback after the step is finished.
Called after `step_end`, and receives a session to perform extra `session.run` calls. This callback is also invoked if a failure occurred during the step.
Args:
- `step`: `int`, global step of the model.
- `session`: `Session` object.
tf.contrib.learn.monitors.BaseMonitor.run_on_all_workers
tf.contrib.learn.monitors.BaseMonitor.set_estimator(estimator)
A setter called automatically by the target estimator.
If the estimator is locked, this method does nothing.
Args:
- `estimator`: the estimator that this monitor monitors.
Raises:
- `ValueError`: if the estimator is `None`.
tf.contrib.learn.monitors.BaseMonitor.step_begin(step)
Callback before training step begins.
You may use this callback to request evaluation of additional tensors in the graph.
Args:
- `step`: `int`, the current value of the global step.
Returns:
A list of `Tensor` objects or string tensor names to be run.
Raises:
- `ValueError`: if we've already begun a step, or `step` < 0, or `step` > `max_steps`.
tf.contrib.learn.monitors.BaseMonitor.step_end(step, output)
Callback after training step finished.
This callback provides access to the tensors/ops evaluated at this step, including the additional tensors for which evaluation was requested in `step_begin`.
In addition, the callback has the opportunity to stop training by returning `True`. This is useful for early stopping, for example.
Note that this method is not called if the call to `Session.run()` that followed the last call to `step_begin()` failed.
Args:
- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` values representing tensor names to the value resulting from running these tensors. Values may be either scalars, for scalar tensors, or NumPy `ndarray`s, for non-scalar tensors.
Returns:
`bool`. `True` if training should stop.
Raises:
- `ValueError`: if we've not begun a step, or the `step` number does not match.
class tf.contrib.learn.monitors.CaptureVariable
Captures a variable's values into a collection.
This monitor is useful for unit testing. You should exercise caution when using this monitor in production, since it never discards values.
This is an `EveryN` monitor and has consistent semantics for `every_n` and `first_n`.
tf.contrib.learn.monitors.CaptureVariable.__init__(var_name, every_n=100, first_n=1)
Initializes a CaptureVariable monitor.
Args:
- `var_name`: `string`. The variable name, including suffix (typically ":0").
- `every_n`: `int`, print every N steps. See `PrintN`.
- `first_n`: `int`, also print the first N steps. See `PrintN`.
tf.contrib.learn.monitors.CaptureVariable.begin(max_steps=None)
Called at the beginning of training.
When called, the default graph is the one we are executing.
Args:
- `max_steps`: `int`, the maximum global step this training will run until.
Raises:
- `ValueError`: if we've already begun a run.
tf.contrib.learn.monitors.CaptureVariable.end(session=None)
tf.contrib.learn.monitors.CaptureVariable.epoch_begin(epoch)
Begin epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've already begun an epoch, or `epoch` < 0.
tf.contrib.learn.monitors.CaptureVariable.epoch_end(epoch)
End epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've not begun an epoch, or the `epoch` number does not match.
tf.contrib.learn.monitors.CaptureVariable.every_n_post_step(step, session)
Callback after a step is finished or `end()` is called.
Args:
- `step`: `int`, the current value of the global step.
- `session`: `Session` object.
tf.contrib.learn.monitors.CaptureVariable.every_n_step_begin(step)
tf.contrib.learn.monitors.CaptureVariable.every_n_step_end(step, outputs)
tf.contrib.learn.monitors.CaptureVariable.post_step(step, session)
tf.contrib.learn.monitors.CaptureVariable.run_on_all_workers
tf.contrib.learn.monitors.CaptureVariable.set_estimator(estimator)
A setter called automatically by the target estimator.
If the estimator is locked, this method does nothing.
Args:
- `estimator`: the estimator that this monitor monitors.
Raises:
- `ValueError`: if the estimator is `None`.
tf.contrib.learn.monitors.CaptureVariable.step_begin(step)
Overrides BaseMonitor.step_begin
.
When overriding this method, you must call the super implementation.
Args:
- `step`: `int`, the current value of the global step.
Returns:
A `list`, the result of `every_n_step_begin` if that was called this step, or an empty list otherwise.
Raises:
- `ValueError`: if called more than once during a step.
tf.contrib.learn.monitors.CaptureVariable.step_end(step, output)
Overrides BaseMonitor.step_end
.
When overriding this method, you must call the super implementation.
Args:
- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` values representing tensor names to the value resulting from running these tensors. Values may be either scalars, for scalar tensors, or NumPy `ndarray`s, for non-scalar tensors.
Returns:
`bool`, the result of `every_n_step_end` if that was called this step, or `False` otherwise.
tf.contrib.learn.monitors.CaptureVariable.values
Returns the values captured so far.
Returns:
A `dict` mapping `int` step numbers to the values of the variable at the respective steps.
class tf.contrib.learn.monitors.CheckpointSaver
Saves checkpoints every N steps.
tf.contrib.learn.monitors.CheckpointSaver.__init__(checkpoint_dir, save_secs=None, save_steps=None, saver=None, checkpoint_basename='model.ckpt', scaffold=None)
Initialize CheckpointSaver monitor.
Args:
- `checkpoint_dir`: `str`, base directory for the checkpoint files.
- `save_secs`: `int`, save every N secs.
- `save_steps`: `int`, save every N steps.
- `saver`: `Saver` object, used for saving.
- `checkpoint_basename`: `str`, base name for the checkpoint files.
- `scaffold`: `Scaffold`, used to get the saver object.
Raises:
- `ValueError`: if both `save_steps` and `save_secs` are not `None`.
- `ValueError`: if both `save_steps` and `save_secs` are `None`.
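The exactly-one-of constraint on `save_secs` and `save_steps` can be sketched as a small validation helper (hypothetical function, mirroring the two `ValueError` conditions documented above):

```python
def validate_save_args(save_secs=None, save_steps=None):
    """Require exactly one of save_secs / save_steps to be set."""
    if save_secs is not None and save_steps is not None:
        raise ValueError('Can not provide both save_secs and save_steps.')
    if save_secs is None and save_steps is None:
        raise ValueError('Either save_secs or save_steps must be provided.')

# Valid: exactly one scheduling mode chosen.
validate_save_args(save_steps=1000)
validate_save_args(save_secs=600)
```

Passing both (or neither) argument raises `ValueError` before any training starts, so misconfiguration fails fast rather than silently never saving.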
tf.contrib.learn.monitors.CheckpointSaver.begin(max_steps=None)
tf.contrib.learn.monitors.CheckpointSaver.end(session=None)
tf.contrib.learn.monitors.CheckpointSaver.epoch_begin(epoch)
Begin epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've already begun an epoch, or `epoch` < 0.
tf.contrib.learn.monitors.CheckpointSaver.epoch_end(epoch)
End epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've not begun an epoch, or the `epoch` number does not match.
tf.contrib.learn.monitors.CheckpointSaver.post_step(step, session)
tf.contrib.learn.monitors.CheckpointSaver.run_on_all_workers
tf.contrib.learn.monitors.CheckpointSaver.set_estimator(estimator)
A setter called automatically by the target estimator.
If the estimator is locked, this method does nothing.
Args:
- `estimator`: the estimator that this monitor monitors.
Raises:
- `ValueError`: if the estimator is `None`.
tf.contrib.learn.monitors.CheckpointSaver.step_begin(step)
tf.contrib.learn.monitors.CheckpointSaver.step_end(step, output)
Callback after training step finished.
This callback provides access to the tensors/ops evaluated at this step, including the additional tensors for which evaluation was requested in `step_begin`.
In addition, the callback has the opportunity to stop training by returning `True`. This is useful for early stopping, for example.
Note that this method is not called if the call to `Session.run()` that followed the last call to `step_begin()` failed.
Args:
- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` values representing tensor names to the value resulting from running these tensors. Values may be either scalars, for scalar tensors, or NumPy `ndarray`s, for non-scalar tensors.
Returns:
`bool`. `True` if training should stop.
Raises:
- `ValueError`: if we've not begun a step, or the `step` number does not match.
class tf.contrib.learn.monitors.EveryN
Base class for monitors that execute callbacks every N steps.
This class adds three new callbacks:
- every_n_step_begin
- every_n_step_end
- every_n_post_step
The callbacks are executed every n steps, or optionally every step for the first m steps, where m and n can both be user-specified.
When extending this class, note that if you wish to use any of the `BaseMonitor` callbacks, you must call their respective super implementation:

```python
def step_begin(self, step):
  super(ExampleMonitor, self).step_begin(step)
  return []
```

Failing to call the super implementation will cause unpredictable behavior.
The `every_n_post_step()` callback is also called after the last step if it was not already called through the regular conditions. Note that `every_n_step_begin()` and `every_n_step_end()` do not receive that special treatment.
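The firing schedule can be modeled in plain Python. This is a simplified sketch, not the contrib implementation (the real class tracks the last active step, so behavior can differ when global steps are non-contiguous):

```python
def should_fire(step, every_n_steps=100, first_n_steps=1):
    """Simplified EveryN schedule: fire on each of the first N global
    steps, then once every `every_n_steps` steps thereafter."""
    if step <= first_n_steps:
        return True
    return step % every_n_steps == 0

# With the defaults, callbacks run at step 1, then 100, 200, 300, ...
fired = [s for s in range(1, 301) if should_fire(s)]
```

The `first_n_steps` window is useful for catching problems early (e.g. NaN losses or shape mismatches) without paying the callback cost on every later step.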
tf.contrib.learn.monitors.EveryN.__init__(every_n_steps=100, first_n_steps=1)
Initializes an EveryN
monitor.
Args:
- `every_n_steps`: `int`, the number of steps to allow between callbacks.
- `first_n_steps`: `int`, specifying the number of initial steps during which the callbacks will always be executed, regardless of the value of `every_n_steps`. Note that this value is relative to the global step.
tf.contrib.learn.monitors.EveryN.begin(max_steps=None)
Called at the beginning of training.
When called, the default graph is the one we are executing.
Args:
- `max_steps`: `int`, the maximum global step this training will run until.
Raises:
- `ValueError`: if we've already begun a run.
tf.contrib.learn.monitors.EveryN.end(session=None)
tf.contrib.learn.monitors.EveryN.epoch_begin(epoch)
Begin epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've already begun an epoch, or `epoch` < 0.
tf.contrib.learn.monitors.EveryN.epoch_end(epoch)
End epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've not begun an epoch, or the `epoch` number does not match.
tf.contrib.learn.monitors.EveryN.every_n_post_step(step, session)
Callback after a step is finished or `end()` is called.
Args:
- `step`: `int`, the current value of the global step.
- `session`: `Session` object.
tf.contrib.learn.monitors.EveryN.every_n_step_begin(step)
Callback before every n'th step begins.
Args:
- `step`: `int`, the current value of the global step.
Returns:
A `list` of tensors that will be evaluated at this step.
tf.contrib.learn.monitors.EveryN.every_n_step_end(step, outputs)
Callback after every n'th step finished.
This callback provides access to the tensors/ops evaluated at this step, including the additional tensors for which evaluation was requested in `step_begin`.
In addition, the callback has the opportunity to stop training by returning `True`. This is useful for early stopping, for example.
Args:
- `step`: `int`, the current value of the global step.
- `outputs`: `dict` mapping `string` values representing tensor names to the value resulting from running these tensors. Values may be either scalars, for scalar tensors, or NumPy `ndarray`s, for non-scalar tensors.
Returns:
`bool`. `True` if training should stop.
tf.contrib.learn.monitors.EveryN.post_step(step, session)
tf.contrib.learn.monitors.EveryN.run_on_all_workers
tf.contrib.learn.monitors.EveryN.set_estimator(estimator)
A setter called automatically by the target estimator.
If the estimator is locked, this method does nothing.
Args:
- `estimator`: the estimator that this monitor monitors.
Raises:
- `ValueError`: if the estimator is `None`.
tf.contrib.learn.monitors.EveryN.step_begin(step)
Overrides BaseMonitor.step_begin
.
When overriding this method, you must call the super implementation.
Args:
- `step`: `int`, the current value of the global step.
Returns:
A `list`, the result of `every_n_step_begin` if that was called this step, or an empty list otherwise.
Raises:
- `ValueError`: if called more than once during a step.
tf.contrib.learn.monitors.EveryN.step_end(step, output)
Overrides BaseMonitor.step_end
.
When overriding this method, you must call the super implementation.
Args:
- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` values representing tensor names to the value resulting from running these tensors. Values may be either scalars, for scalar tensors, or NumPy `ndarray`s, for non-scalar tensors.
Returns:
`bool`, the result of `every_n_step_end` if that was called this step, or `False` otherwise.
class tf.contrib.learn.monitors.ExportMonitor
Monitor that exports Estimator every N steps.
tf.contrib.learn.monitors.ExportMonitor.__init__(*args, **kwargs)
Initializes ExportMonitor. (deprecated arguments)
SOME ARGUMENTS ARE DEPRECATED. They will be removed after 2016-09-23. Instructions for updating: The signature of the input_fn accepted by export is changing to be consistent with what's used by tf.Learn Estimator's train/evaluate. input_fn (and in most cases, input_feature_key) will both become required args.
Args:
every_n_steps: Run monitor every N steps.
export_dir: str, folder to export.
input_fn: A function that takes no argument and returns a tuple of
(features, targets), where features is a dict of string key to `Tensor`
and targets is a `Tensor` that's currently not used (and so can be
`None`).
input_feature_key: String key into the features dict returned by
`input_fn` that corresponds to the raw `Example` strings `Tensor` that
the exported model will take as input. Can only be `None` if you're
using a custom `signature_fn` that does not use the first arg
(examples).
exports_to_keep: int, number of exports to keep.
signature_fn: Function that returns a default signature and a named
signature map, given `Tensor` of `Example` strings, `dict` of `Tensor`s
for features and `dict` of `Tensor`s for predictions.
default_batch_size: Default batch size of the `Example` placeholder.
Raises:
ValueError: If `input_fn` and `input_feature_key` are not both defined or
are not both `None`.
tf.contrib.learn.monitors.ExportMonitor.begin(max_steps=None)
Called at the beginning of training.
When called, the default graph is the one we are executing.
Args:
- `max_steps`: `int`, the maximum global step this training will run until.
Raises:
- `ValueError`: if we've already begun a run.
tf.contrib.learn.monitors.ExportMonitor.end(session=None)
tf.contrib.learn.monitors.ExportMonitor.epoch_begin(epoch)
Begin epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've already begun an epoch, or `epoch` < 0.
tf.contrib.learn.monitors.ExportMonitor.epoch_end(epoch)
End epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've not begun an epoch, or the `epoch` number does not match.
tf.contrib.learn.monitors.ExportMonitor.every_n_post_step(step, session)
Callback after a step is finished or `end()` is called.
Args:
- `step`: `int`, the current value of the global step.
- `session`: `Session` object.
tf.contrib.learn.monitors.ExportMonitor.every_n_step_begin(step)
Callback before every n'th step begins.
Args:
- `step`: `int`, the current value of the global step.
Returns:
A `list` of tensors that will be evaluated at this step.
tf.contrib.learn.monitors.ExportMonitor.every_n_step_end(step, outputs)
tf.contrib.learn.monitors.ExportMonitor.export_dir
tf.contrib.learn.monitors.ExportMonitor.exports_to_keep
tf.contrib.learn.monitors.ExportMonitor.last_export_dir
Returns the directory containing the last completed export.
Returns:
The string path to the exported directory. NB: this functionality was added on 2016/09/25; clients that depend on the return value may need to handle the case where this function returns None because the estimator being fitted does not yet return a value during export.
tf.contrib.learn.monitors.ExportMonitor.post_step(step, session)
tf.contrib.learn.monitors.ExportMonitor.run_on_all_workers
tf.contrib.learn.monitors.ExportMonitor.set_estimator(estimator)
A setter called automatically by the target estimator.
If the estimator is locked, this method does nothing.
Args:
- `estimator`: the estimator that this monitor monitors.
Raises:
- `ValueError`: if the estimator is `None`.
tf.contrib.learn.monitors.ExportMonitor.signature_fn
tf.contrib.learn.monitors.ExportMonitor.step_begin(step)
Overrides BaseMonitor.step_begin
.
When overriding this method, you must call the super implementation.
Args:
- `step`: `int`, the current value of the global step.
Returns:
A `list`, the result of `every_n_step_begin` if that was called this step, or an empty list otherwise.
Raises:
- `ValueError`: if called more than once during a step.
tf.contrib.learn.monitors.ExportMonitor.step_end(step, output)
Overrides BaseMonitor.step_end
.
When overriding this method, you must call the super implementation.
Args:
- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` values representing tensor names to the value resulting from running these tensors. Values may be either scalars, for scalar tensors, or NumPy `ndarray`s, for non-scalar tensors.
Returns:
`bool`, the result of `every_n_step_end` if that was called this step, or `False` otherwise.
class tf.contrib.learn.monitors.GraphDump
Dumps almost all tensors in the graph at every step.
Note: this is very expensive; prefer `PrintTensor` in production.
tf.contrib.learn.monitors.GraphDump.__init__(ignore_ops=None)
Initializes GraphDump monitor.
Args:
- `ignore_ops`: `list` of `string`. Names of ops to ignore. If `None`, `GraphDump.IGNORE_OPS` is used.
tf.contrib.learn.monitors.GraphDump.begin(max_steps=None)
tf.contrib.learn.monitors.GraphDump.compare(other_dump, step, atol=1e-06)
Compares two GraphDump
monitors and returns differences.
Args:
- `other_dump`: another `GraphDump` monitor.
- `step`: `int`, step to compare on.
- `atol`: `float`, absolute tolerance in comparison of floating arrays.
Returns:
A tuple:
- `matched`: `list` of keys that matched.
- `non_matched`: `dict` of keys to tuples of the two mismatched values.
Raises:
- `ValueError`: if a key in `data` is missing from `other_dump` at `step`.
tf.contrib.learn.monitors.GraphDump.data
tf.contrib.learn.monitors.GraphDump.end(session=None)
Callback at the end of training/evaluation.
Args:
- `session`: a `tf.Session` object that can be used to run ops.
Raises:
- `ValueError`: if we've not begun a run.
tf.contrib.learn.monitors.GraphDump.epoch_begin(epoch)
Begin epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've already begun an epoch, or `epoch` < 0.
tf.contrib.learn.monitors.GraphDump.epoch_end(epoch)
End epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've not begun an epoch, or the `epoch` number does not match.
tf.contrib.learn.monitors.GraphDump.post_step(step, session)
Callback after the step is finished.
Called after `step_end`, and receives a session to perform extra `session.run` calls. This callback is also invoked if a failure occurred during the step.
Args:
- `step`: `int`, global step of the model.
- `session`: `Session` object.
tf.contrib.learn.monitors.GraphDump.run_on_all_workers
tf.contrib.learn.monitors.GraphDump.set_estimator(estimator)
A setter called automatically by the target estimator.
If the estimator is locked, this method does nothing.
Args:
- `estimator`: the estimator that this monitor monitors.
Raises:
- `ValueError`: if the estimator is `None`.
tf.contrib.learn.monitors.GraphDump.step_begin(step)
tf.contrib.learn.monitors.GraphDump.step_end(step, output)
class tf.contrib.learn.monitors.LoggingTrainable
Writes trainable variable values into log every N steps.
Writes the tensors in trainable variables every `every_n` steps, starting with the `first_n`th step.
tf.contrib.learn.monitors.LoggingTrainable.__init__(scope=None, every_n=100, first_n=1)
Initializes LoggingTrainable monitor.
Args:
- `scope`: an optional string to match variable names using `re.match`.
- `every_n`: print every N steps.
- `first_n`: print first N steps.
tf.contrib.learn.monitors.LoggingTrainable.begin(max_steps=None)
Called at the beginning of training.
When called, the default graph is the one we are executing.
Args:
- `max_steps`: `int`, the maximum global step this training will run until.
Raises:
- `ValueError`: if we've already begun a run.
tf.contrib.learn.monitors.LoggingTrainable.end(session=None)
tf.contrib.learn.monitors.LoggingTrainable.epoch_begin(epoch)
Begin epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've already begun an epoch, or `epoch` < 0.
tf.contrib.learn.monitors.LoggingTrainable.epoch_end(epoch)
End epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've not begun an epoch, or the `epoch` number does not match.
tf.contrib.learn.monitors.LoggingTrainable.every_n_post_step(step, session)
Callback after a step is finished or `end()` is called.
Args:
- `step`: `int`, the current value of the global step.
- `session`: `Session` object.
tf.contrib.learn.monitors.LoggingTrainable.every_n_step_begin(step)
tf.contrib.learn.monitors.LoggingTrainable.every_n_step_end(step, outputs)
tf.contrib.learn.monitors.LoggingTrainable.post_step(step, session)
tf.contrib.learn.monitors.LoggingTrainable.run_on_all_workers
tf.contrib.learn.monitors.LoggingTrainable.set_estimator(estimator)
A setter called automatically by the target estimator.
If the estimator is locked, this method does nothing.
Args:
- `estimator`: the estimator that this monitor monitors.
Raises:
- `ValueError`: if the estimator is `None`.
tf.contrib.learn.monitors.LoggingTrainable.step_begin(step)
Overrides BaseMonitor.step_begin
.
When overriding this method, you must call the super implementation.
Args:
- `step`: `int`, the current value of the global step.
Returns:
A `list`, the result of `every_n_step_begin` if that was called this step, or an empty list otherwise.
Raises:
- `ValueError`: if called more than once during a step.
tf.contrib.learn.monitors.LoggingTrainable.step_end(step, output)
Overrides BaseMonitor.step_end
.
When overriding this method, you must call the super implementation.
Args:
- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` values representing tensor names to the value resulting from running these tensors. Values may be either scalars, for scalar tensors, or NumPy `ndarray`s, for non-scalar tensors.
Returns:
`bool`, the result of `every_n_step_end` if that was called this step, or `False` otherwise.
class tf.contrib.learn.monitors.NanLoss
NaN Loss monitor.
Monitors the loss and stops training if it becomes NaN. It can either fail with an exception or simply stop training.
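The core check can be sketched in plain Python (hypothetical `check_loss` helper; the real monitor raises a TensorFlow-specific error type rather than `RuntimeError`):

```python
import math

def check_loss(loss_value, fail_on_nan_loss=True):
    """Returns True to request a training stop when the loss is NaN;
    raises instead when fail_on_nan_loss is set."""
    if math.isnan(loss_value):
        if fail_on_nan_loss:
            raise RuntimeError('NaN loss during training.')
        return True  # request stop via step_end's return value
    return False
```

With `fail_on_nan_loss=False`, the monitor uses the `step_end` return value to stop training gracefully instead of aborting the job.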
tf.contrib.learn.monitors.NanLoss.__init__(loss_tensor, every_n_steps=100, fail_on_nan_loss=True)
Initializes NanLoss monitor.
Args:
- `loss_tensor`: `Tensor`, the loss tensor.
- `every_n_steps`: `int`, run the check every this many steps.
- `fail_on_nan_loss`: `bool`, whether to raise an exception when the loss is NaN.
tf.contrib.learn.monitors.NanLoss.begin(max_steps=None)
Called at the beginning of training.
When called, the default graph is the one we are executing.
Args:
- `max_steps`: `int`, the maximum global step this training will run until.
Raises:
- `ValueError`: if we've already begun a run.
tf.contrib.learn.monitors.NanLoss.end(session=None)
tf.contrib.learn.monitors.NanLoss.epoch_begin(epoch)
Begin epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've already begun an epoch, or `epoch` < 0.
tf.contrib.learn.monitors.NanLoss.epoch_end(epoch)
End epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've not begun an epoch, or the `epoch` number does not match.
tf.contrib.learn.monitors.NanLoss.every_n_post_step(step, session)
Callback after a step is finished or `end()` is called.
Args:
- `step`: `int`, the current value of the global step.
- `session`: `Session` object.
tf.contrib.learn.monitors.NanLoss.every_n_step_begin(step)
tf.contrib.learn.monitors.NanLoss.every_n_step_end(step, outputs)
tf.contrib.learn.monitors.NanLoss.post_step(step, session)
tf.contrib.learn.monitors.NanLoss.run_on_all_workers
tf.contrib.learn.monitors.NanLoss.set_estimator(estimator)
A setter called automatically by the target estimator.
If the estimator is locked, this method does nothing.
Args:
- `estimator`: the estimator that this monitor monitors.
Raises:
- `ValueError`: if the estimator is `None`.
tf.contrib.learn.monitors.NanLoss.step_begin(step)
Overrides BaseMonitor.step_begin
.
When overriding this method, you must call the super implementation.
Args:
- `step`: `int`, the current value of the global step.
Returns:
A `list`, the result of `every_n_step_begin` if that was called this step, or an empty list otherwise.
Raises:
- `ValueError`: if called more than once during a step.
tf.contrib.learn.monitors.NanLoss.step_end(step, output)
Overrides BaseMonitor.step_end
.
When overriding this method, you must call the super implementation.
Args:
- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` values representing tensor names to the value resulting from running these tensors. Values may be either scalars, for scalar tensors, or NumPy `ndarray`s, for non-scalar tensors.
Returns:
`bool`, the result of `every_n_step_end` if that was called this step, or `False` otherwise.
class tf.contrib.learn.monitors.PrintTensor
Prints given tensors every N steps.
This is an `EveryN` monitor and has consistent semantics for `every_n` and `first_n`.
The tensors will be printed to the log with `INFO` severity.
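The formatting side can be sketched without TensorFlow (hypothetical `format_tensor_values` helper; the real monitor first evaluates the named tensors through the `EveryN` callbacks):

```python
import logging

def format_tensor_values(values):
    """Formats tag/value pairs the way a PrintTensor-style monitor
    might log them at INFO severity (illustrative formatting only)."""
    line = ', '.join('%s = %s' % (tag, val)
                     for tag, val in sorted(values.items()))
    logging.getLogger('monitors').info(line)
    return line
```

Passing a `dict` of tag to tensor name (rather than a plain iterable of names) lets you control the tags that appear in the log line.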
tf.contrib.learn.monitors.PrintTensor.__init__(tensor_names, every_n=100, first_n=1)
Initializes a PrintTensor monitor.
Args:
- `tensor_names`: `dict` of tag to tensor names, or `iterable` of tensor names (strings).
- `every_n`: `int`, print every N steps. See `PrintN`.
- `first_n`: `int`, also print the first N steps. See `PrintN`.
tf.contrib.learn.monitors.PrintTensor.begin(max_steps=None)
Called at the beginning of training.
When called, the default graph is the one we are executing.
Args:
- `max_steps`: `int`, the maximum global step this training will run until.
Raises:
- `ValueError`: if we've already begun a run.
tf.contrib.learn.monitors.PrintTensor.end(session=None)
tf.contrib.learn.monitors.PrintTensor.epoch_begin(epoch)
Begin epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've already begun an epoch, or `epoch` < 0.
tf.contrib.learn.monitors.PrintTensor.epoch_end(epoch)
End epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've not begun an epoch, or the `epoch` number does not match.
tf.contrib.learn.monitors.PrintTensor.every_n_post_step(step, session)
Callback after a step is finished or `end()` is called.
Args:
- `step`: `int`, the current value of the global step.
- `session`: `Session` object.
tf.contrib.learn.monitors.PrintTensor.every_n_step_begin(step)
tf.contrib.learn.monitors.PrintTensor.every_n_step_end(step, outputs)
tf.contrib.learn.monitors.PrintTensor.post_step(step, session)
tf.contrib.learn.monitors.PrintTensor.run_on_all_workers
tf.contrib.learn.monitors.PrintTensor.set_estimator(estimator)
A setter called automatically by the target estimator.
If the estimator is locked, this method does nothing.
Args:
- `estimator`: the estimator that this monitor monitors.
Raises:
- `ValueError`: if the estimator is `None`.
tf.contrib.learn.monitors.PrintTensor.step_begin(step)
Overrides BaseMonitor.step_begin
.
When overriding this method, you must call the super implementation.
Args:
- `step`: `int`, the current value of the global step.
Returns:
A `list`, the result of `every_n_step_begin` if that was called this step, or an empty list otherwise.
Raises:
- `ValueError`: if called more than once during a step.
tf.contrib.learn.monitors.PrintTensor.step_end(step, output)
Overrides BaseMonitor.step_end
.
When overriding this method, you must call the super implementation.
Args:
- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` values representing tensor names to the value resulting from running these tensors. Values may be either scalars, for scalar tensors, or NumPy `ndarray`s, for non-scalar tensors.
Returns:
`bool`, the result of `every_n_step_end` if that was called this step, or `False` otherwise.
class tf.contrib.learn.monitors.StepCounter
Steps per second monitor.
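The rate computation amounts to dividing the step delta by the elapsed wall time between two observations; a sketch with a hypothetical helper (the real monitor samples the global step and the clock inside its `EveryN` callbacks):

```python
def steps_per_sec(prev_step, prev_time, cur_step, cur_time):
    """Steps per second between two (step, wall time) observations."""
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError('time must advance between observations')
    return (cur_step - prev_step) / float(elapsed)

# 100 steps in 2 seconds -> 50 steps/sec
rate = steps_per_sec(0, 0.0, 100, 2.0)
```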
tf.contrib.learn.monitors.StepCounter.__init__(every_n_steps=100, output_dir=None, summary_writer=None)
tf.contrib.learn.monitors.StepCounter.begin(max_steps=None)
Called at the beginning of training.
When called, the default graph is the one we are executing.
Args:
- `max_steps`: `int`, the maximum global step this training will run until.
Raises:
- `ValueError`: if we've already begun a run.
tf.contrib.learn.monitors.StepCounter.end(session=None)
tf.contrib.learn.monitors.StepCounter.epoch_begin(epoch)
Begin epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've already begun an epoch, or `epoch` < 0.
tf.contrib.learn.monitors.StepCounter.epoch_end(epoch)
End epoch.
Args:
- `epoch`: `int`, the epoch number.
Raises:
- `ValueError`: if we've not begun an epoch, or the `epoch` number does not match.
tf.contrib.learn.monitors.StepCounter.every_n_post_step(step, session)
Callback after a step is finished or `end()` is called.
Args:
- `step`: `int`, the current value of the global step.
- `session`: `Session` object.
tf.contrib.learn.monitors.StepCounter.every_n_step_begin(step)
Callback before every n'th step begins.
Args:
- `step`: `int`, the current value of the global step.
Returns:
A `list` of tensors that will be evaluated at this step.
tf.contrib.learn.monitors.StepCounter.every_n_step_end(current_step, outputs)
tf.contrib.learn.monitors.StepCounter.post_step(step, session)
tf.contrib.learn.monitors.StepCounter.run_on_all_workers
tf.contrib.learn.monitors.StepCounter.set_estimator(estimator)
tf.contrib.learn.monitors.StepCounter.step_begin(step)
Overrides BaseMonitor.step_begin
.
When overriding this method, you must call the super implementation.
Args:
step: int, the current value of the global step.
Returns:
A list, the result of every_n_step_begin if that was called this step, or an empty list otherwise.
Raises:
ValueError: if called more than once during a step.
tf.contrib.learn.monitors.StepCounter.step_end(step, output)
Overrides BaseMonitor.step_end.
When overriding this method, you must call the super implementation.
Args:
step: int, the current value of the global step.
output: dict mapping string values representing tensor names to the values resulting from running these tensors. Values may be either scalars, for scalar tensors, or Numpy arrays, for non-scalar tensors.
Returns:
bool, the result of every_n_step_end if that was called this step, or False otherwise.
class tf.contrib.learn.monitors.StopAtStep
Monitor to request stop at a specified step.
tf.contrib.learn.monitors.StopAtStep.__init__(num_steps=None, last_step=None)
Create a StopAtStep monitor.
This monitor requests stop after either a number of steps have been executed or a last step has been reached. Only one of the two options can be specified.
If num_steps is specified, it indicates the number of steps to execute after begin() is called. If instead last_step is specified, it indicates the last step we want to execute, as passed to the step_begin() call.
Args:
num_steps: Number of steps to execute.
last_step: Step after which to stop.
Raises:
ValueError: If one of the arguments is invalid.
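The interaction between num_steps and last_step can be sketched with a framework-free stand-in (the class name is hypothetical; in the real monitor, step_end returning True is what requests the stop):

```python
class StopAtStepSketch:
    """Hypothetical framework-free sketch of StopAtStep's stopping logic."""

    def __init__(self, num_steps=None, last_step=None):
        if num_steps is None and last_step is None:
            raise ValueError("One of num_steps or last_step must be specified.")
        if num_steps is not None and last_step is not None:
            raise ValueError("Only one of num_steps or last_step can be specified.")
        self._num_steps = num_steps
        self._last_step = last_step

    def step_begin(self, step):
        # With num_steps, the last step is resolved relative to the
        # first global step actually seen, not from step zero.
        if self._last_step is None:
            self._last_step = step + self._num_steps - 1

    def step_end(self, step, output=None):
        # Returning True requests that training stop.
        return step >= self._last_step
```

Note that with num_steps the stopping point depends on the global step at which training resumes, which matters when continuing from a checkpoint.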
tf.contrib.learn.monitors.StopAtStep.begin(max_steps=None)
Called at the beginning of training.
When called, the default graph is the one we are executing.
Args:
max_steps: int, the maximum global step this training will run until.
Raises:
ValueError: if we've already begun a run.
tf.contrib.learn.monitors.StopAtStep.end(session=None)
Callback at the end of training/evaluation.
Args:
session: A tf.Session object that can be used to run ops.
Raises:
ValueError: if we've not begun a run.
tf.contrib.learn.monitors.StopAtStep.epoch_begin(epoch)
Begin epoch.
Args:
epoch: int, the epoch number.
Raises:
ValueError: if we've already begun an epoch, or epoch < 0.
tf.contrib.learn.monitors.StopAtStep.epoch_end(epoch)
End epoch.
Args:
epoch: int, the epoch number.
Raises:
ValueError: if we've not begun an epoch, or epoch number does not match.
tf.contrib.learn.monitors.StopAtStep.post_step(step, session)
Callback after the step is finished.
Called after step_end, and receives the session to perform extra session.run calls. It will be called even if a failure occurred during the step.
Args:
step: int, global step of the model.
session: Session object.
tf.contrib.learn.monitors.StopAtStep.run_on_all_workers
tf.contrib.learn.monitors.StopAtStep.set_estimator(estimator)
A setter called automatically by the target estimator.
If the estimator is locked, this method does nothing.
Args:
estimator: the estimator that this monitor monitors.
Raises:
ValueError: if the estimator is None.
tf.contrib.learn.monitors.StopAtStep.step_begin(step)
tf.contrib.learn.monitors.StopAtStep.step_end(step, output)
class tf.contrib.learn.monitors.SummarySaver
Saves summaries every N steps.
tf.contrib.learn.monitors.SummarySaver.__init__(summary_op, save_steps=100, output_dir=None, summary_writer=None, scaffold=None)
Initializes a SummarySaver
monitor.
Args:
summary_op: Tensor of type string. A serialized Summary protocol buffer, as output by TF summary methods like scalar_summary or merge_all_summaries.
save_steps: int, save summaries every N steps. See EveryN.
output_dir: string, the directory to save the summaries to. Only used if no summary_writer is supplied.
summary_writer: SummaryWriter. If None and an output_dir was passed, one will be created accordingly.
scaffold: Scaffold to get summary_op from if it's not provided.
tf.contrib.learn.monitors.SummarySaver.begin(max_steps=None)
Called at the beginning of training.
When called, the default graph is the one we are executing.
Args:
max_steps: int, the maximum global step this training will run until.
Raises:
ValueError: if we've already begun a run.
tf.contrib.learn.monitors.SummarySaver.end(session=None)
tf.contrib.learn.monitors.SummarySaver.epoch_begin(epoch)
Begin epoch.
Args:
epoch: int, the epoch number.
Raises:
ValueError: if we've already begun an epoch, or epoch < 0.
tf.contrib.learn.monitors.SummarySaver.epoch_end(epoch)
End epoch.
Args:
epoch: int, the epoch number.
Raises:
ValueError: if we've not begun an epoch, or epoch number does not match.
tf.contrib.learn.monitors.SummarySaver.every_n_post_step(step, session)
Callback after a step is finished or end()
is called.
Args:
step: int, the current value of the global step.
session: Session object.
tf.contrib.learn.monitors.SummarySaver.every_n_step_begin(step)
tf.contrib.learn.monitors.SummarySaver.every_n_step_end(step, outputs)
tf.contrib.learn.monitors.SummarySaver.post_step(step, session)
tf.contrib.learn.monitors.SummarySaver.run_on_all_workers
tf.contrib.learn.monitors.SummarySaver.set_estimator(estimator)
tf.contrib.learn.monitors.SummarySaver.step_begin(step)
Overrides BaseMonitor.step_begin.
When overriding this method, you must call the super implementation.
Args:
step: int, the current value of the global step.
Returns:
A list, the result of every_n_step_begin if that was called this step, or an empty list otherwise.
Raises:
ValueError: if called more than once during a step.
tf.contrib.learn.monitors.SummarySaver.step_end(step, output)
Overrides BaseMonitor.step_end.
When overriding this method, you must call the super implementation.
Args:
step: int, the current value of the global step.
output: dict mapping string values representing tensor names to the values resulting from running these tensors. Values may be either scalars, for scalar tensors, or Numpy arrays, for non-scalar tensors.
Returns:
bool, the result of every_n_step_end if that was called this step, or False otherwise.
class tf.contrib.learn.monitors.ValidationMonitor
Runs evaluation of a given estimator, at most every N steps.
Note that the evaluation is done based on the saved checkpoint, which will usually be older than the current step.
Can do early stopping on validation metrics if early_stopping_rounds is provided.
tf.contrib.learn.monitors.ValidationMonitor.__init__(x=None, y=None, input_fn=None, batch_size=None, eval_steps=None, every_n_steps=100, metrics=None, early_stopping_rounds=None, early_stopping_metric='loss', early_stopping_metric_minimize=True, name=None)
Initializes a ValidationMonitor.
Args:
x: See BaseEstimator.evaluate.
y: See BaseEstimator.evaluate.
input_fn: See BaseEstimator.evaluate.
batch_size: See BaseEstimator.evaluate.
eval_steps: See BaseEstimator.evaluate.
every_n_steps: Check for new checkpoints to evaluate every N steps. If a new checkpoint is found, it is evaluated. See EveryN.
metrics: See BaseEstimator.evaluate.
early_stopping_rounds: int. If the metric indicated by early_stopping_metric does not improve, as judged by early_stopping_metric_minimize, for this many steps, then training will be stopped.
early_stopping_metric: string, name of the metric to check for early stopping.
early_stopping_metric_minimize: bool, True if early_stopping_metric is expected to decrease (thus early stopping occurs when this metric stops decreasing), False if early_stopping_metric is expected to increase. Typically, early_stopping_metric_minimize is True for loss metrics like mean squared error, and False for performance metrics like accuracy.
name: See BaseEstimator.evaluate.
Raises:
ValueError: If both x and input_fn are provided.
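The early-stopping bookkeeping behind best_step, best_value and early_stopped can be sketched framework-free (names hypothetical; the real monitor additionally runs estimator.evaluate against the latest checkpoint to obtain each metric value):

```python
class EarlyStopperSketch:
    """Hypothetical framework-free sketch of ValidationMonitor's stopping rule."""

    def __init__(self, early_stopping_rounds, minimize=True):
        self._rounds = early_stopping_rounds
        self._minimize = minimize
        self.best_value = None
        self.best_step = None
        self.early_stopped = False

    def update(self, step, value):
        """Record an eval result for `step`; returns True when training should stop."""
        improved = (
            self.best_value is None
            or (self._minimize and value < self.best_value)
            or (not self._minimize and value > self.best_value)
        )
        if improved:
            self.best_value = value
            self.best_step = step
        elif step - self.best_step >= self._rounds:
            # No improvement for early_stopping_rounds steps past the best step.
            self.early_stopped = True
        return self.early_stopped
```

With minimize=True this suits loss-like metrics; with minimize=False it suits accuracy-like metrics, matching the early_stopping_metric_minimize argument above.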
tf.contrib.learn.monitors.ValidationMonitor.begin(max_steps=None)
Called at the beginning of training.
When called, the default graph is the one we are executing.
Args:
max_steps: int, the maximum global step this training will run until.
Raises:
ValueError: if we've already begun a run.
tf.contrib.learn.monitors.ValidationMonitor.best_step
Returns the step at which the best early stopping metric was found.
tf.contrib.learn.monitors.ValidationMonitor.best_value
Returns the best early stopping metric value found so far.
tf.contrib.learn.monitors.ValidationMonitor.early_stopped
Returns True if this monitor caused an early stop.
tf.contrib.learn.monitors.ValidationMonitor.end(session=None)
tf.contrib.learn.monitors.ValidationMonitor.epoch_begin(epoch)
Begin epoch.
Args:
epoch: int, the epoch number.
Raises:
ValueError: if we've already begun an epoch, or epoch < 0.
tf.contrib.learn.monitors.ValidationMonitor.epoch_end(epoch)
End epoch.
Args:
epoch: int, the epoch number.
Raises:
ValueError: if we've not begun an epoch, or epoch number does not match.
tf.contrib.learn.monitors.ValidationMonitor.every_n_post_step(step, session)
Callback after a step is finished or end()
is called.
Args:
step: int, the current value of the global step.
session: Session object.
tf.contrib.learn.monitors.ValidationMonitor.every_n_step_begin(step)
Callback before every n'th step begins.
Args:
step: int, the current value of the global step.
Returns:
A list of tensors that will be evaluated at this step.
tf.contrib.learn.monitors.ValidationMonitor.every_n_step_end(step, outputs)
tf.contrib.learn.monitors.ValidationMonitor.post_step(step, session)
tf.contrib.learn.monitors.ValidationMonitor.run_on_all_workers
tf.contrib.learn.monitors.ValidationMonitor.set_estimator(estimator)
A setter called automatically by the target estimator.
If the estimator is locked, this method does nothing.
Args:
estimator: the estimator that this monitor monitors.
Raises:
ValueError: if the estimator is None.
tf.contrib.learn.monitors.ValidationMonitor.step_begin(step)
Overrides BaseMonitor.step_begin.
When overriding this method, you must call the super implementation.
Args:
step: int, the current value of the global step.
Returns:
A list, the result of every_n_step_begin if that was called this step, or an empty list otherwise.
Raises:
ValueError: if called more than once during a step.
tf.contrib.learn.monitors.ValidationMonitor.step_end(step, output)
Overrides BaseMonitor.step_end.
When overriding this method, you must call the super implementation.
Args:
step: int, the current value of the global step.
output: dict mapping string values representing tensor names to the values resulting from running these tensors. Values may be either scalars, for scalar tensors, or Numpy arrays, for non-scalar tensors.
Returns:
bool, the result of every_n_step_end if that was called this step, or False otherwise.
Other Functions and Classes
class tf.contrib.learn.monitors.RunHookAdapterForMonitors
Wraps monitors into a SessionRunHook.
tf.contrib.learn.monitors.RunHookAdapterForMonitors.__init__(monitors)
tf.contrib.learn.monitors.RunHookAdapterForMonitors.after_run(run_context, run_values)
tf.contrib.learn.monitors.RunHookAdapterForMonitors.before_run(run_context)
tf.contrib.learn.monitors.RunHookAdapterForMonitors.begin()
tf.contrib.learn.monitors.RunHookAdapterForMonitors.end(session)
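The adapter's job is to translate SessionRunHook-style callbacks into the monitor callbacks documented above. A framework-free sketch (the class and its wiring are hypothetical stand-ins; the real adapter also handles run_context/run_values objects and the global step tensor):

```python
class HookAdapterSketch:
    """Hypothetical sketch of fanning hook callbacks out to a list of monitors."""

    def __init__(self, monitors, max_steps=None):
        self._monitors = monitors
        self._max_steps = max_steps

    def begin(self):
        for m in self._monitors:
            m.begin(self._max_steps)

    def before_run(self, step):
        # Collect the tensors each monitor wants evaluated this step
        # (mirrors step_begin returning a list of tensor names).
        requests = []
        for m in self._monitors:
            requests.extend(m.step_begin(step))
        return requests

    def after_run(self, step, outputs):
        # A monitor returning True from step_end requests early stopping.
        return any(m.step_end(step, outputs) for m in self._monitors)

    def end(self):
        for m in self._monitors:
            m.end()
```

Any object implementing the BaseMonitor callbacks (like the ExampleMonitor at the top of this page) can be driven this way.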
class tf.contrib.learn.monitors.SummaryWriterCache
Cache for summary writers.
This class caches summary writers, one per directory.
tf.contrib.learn.monitors.SummaryWriterCache.clear()
Clear cached summary writers. Currently only used for unit tests.
tf.contrib.learn.monitors.SummaryWriterCache.get(logdir)
Returns the SummaryWriter for the specified directory.
Args:
logdir: str, name of the directory.
Returns:
A SummaryWriter.
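The cache keeps one writer per directory, so that multiple monitors logging to the same output_dir share a writer rather than opening competing event files. A framework-free sketch (names hypothetical; factory stands in for SummaryWriter construction):

```python
class WriterCacheSketch:
    """Hypothetical sketch of a per-directory writer cache."""

    def __init__(self, factory):
        self._factory = factory  # stands in for SummaryWriter(logdir)
        self._cache = {}

    def get(self, logdir):
        # One writer per directory; repeated calls return the same instance.
        if logdir not in self._cache:
            self._cache[logdir] = self._factory(logdir)
        return self._cache[logdir]

    def clear(self):
        # Drop all cached writers (the real class uses this in unit tests).
        self._cache.clear()
```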