Layers (contrib)
[TOC]
Ops for building neural network layers, regularizers, summaries, etc.
Higher level ops for building neural network layers.
This package provides several ops that take care of creating variables that are used internally in a consistent way and provide the building blocks for many common machine learning algorithms.
tf.contrib.layers.avg_pool2d(*args, **kwargs)
Adds a 2D average pooling op.
It is assumed that pooling is done per image, not across the batch or channel dimensions.
Args:
- `inputs`: A `Tensor` of size [batch_size, height, width, channels].
- `kernel_size`: A list of length 2: [kernel_height, kernel_width] of the pooling kernel over which the op is computed. Can be an int if both values are the same.
- `stride`: A list of length 2: [stride_height, stride_width]. Can be an int if both strides are the same. Note that presently both strides must have the same value.
- `padding`: The padding method, either 'VALID' or 'SAME'.
- `outputs_collections`: The collections to which the outputs are added.
- `scope`: Optional scope for name_scope.

Returns:

A `Tensor` representing the results of the pooling operation.
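A minimal usage sketch (the input tensor and its shape are hypothetical):

```python
import tensorflow as tf

# Hypothetical NHWC input: batch of 8 RGB images of size 32x32.
images = tf.random_normal([8, 32, 32, 3])

# 2x2 average pooling with stride 2 halves the spatial dimensions.
pooled = tf.contrib.layers.avg_pool2d(images, kernel_size=[2, 2], stride=2,
                                      padding='VALID')
# pooled has shape [8, 16, 16, 3].
```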
tf.contrib.layers.batch_norm(*args, **kwargs)
Adds a Batch Normalization layer from http://arxiv.org/abs/1502.03167.
"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
Sergey Ioffe, Christian Szegedy
Can be used as a normalizer function for conv2d and fully_connected.
Note: when `is_training` is `True`, the `moving_mean` and `moving_variance` need to be updated. By default the update ops are placed in `tf.GraphKeys.UPDATE_OPS`, so they need to be added as a dependency to the `train_op`, for example:

```python
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
  updates = tf.group(*update_ops)
  total_loss = control_flow_ops.with_dependencies([updates], total_loss)
```

One can set `updates_collections=None` to force the updates in place, but that can incur a speed penalty, especially in distributed settings.
Args:
- `inputs`: a tensor with 2 or more dimensions, where the first dimension has `batch_size`. The normalization is over all but the last dimension.
- `decay`: decay for the moving average.
- `center`: If True, subtract `beta`. If False, `beta` is ignored.
- `scale`: If True, multiply by `gamma`. If False, `gamma` is not used. When the next layer is linear (also e.g. `nn.relu`), this can be disabled since the scaling can be done by the next layer.
- `epsilon`: small float added to variance to avoid dividing by zero.
- `activation_fn`: activation function, default set to None to skip it and maintain a linear activation.
- `updates_collections`: collections to collect the update ops for computation. The update ops need to be executed with the `train_op`. If None, a control dependency would be added to make sure the updates are computed in place.
- `is_training`: whether or not the layer is in training mode. In training mode it would accumulate the statistics of the moments into `moving_mean` and `moving_variance` using an exponential moving average with the given `decay`. When it is not in training mode then it would use the values of the `moving_mean` and the `moving_variance`.
- `reuse`: whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
- `variables_collections`: optional collections for the variables.
- `outputs_collections`: collections to add the outputs.
- `trainable`: If `True`, also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
- `batch_weights`: An optional tensor of shape `[batch_size]`, containing a frequency weight for each batch item. If present, then the batch normalization uses weighted mean and variance. (This can be used to correct for bias in training example selection.)
- `scope`: Optional scope for `variable_scope`.

Returns:

A `Tensor` representing the output of the operation.

Raises:

- `ValueError`: if rank or last dimension of `inputs` is undefined.
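A minimal sketch of using `batch_norm` as a `normalizer_fn` and wiring the update ops into the train op (the input, loss, and optimizer below are hypothetical placeholders):

```python
import tensorflow as tf

slim = tf.contrib.layers  # shorthand used in this sketch

images = tf.random_normal([8, 32, 32, 3])      # hypothetical input batch

net = slim.conv2d(images, 64, [3, 3],
                  normalizer_fn=slim.batch_norm,
                  normalizer_params={'is_training': True})

loss = tf.reduce_mean(net)                     # stand-in for a real loss
optimizer = tf.train.GradientDescentOptimizer(0.1)

# Make sure moving_mean/moving_variance get updated during training.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
  train_op = optimizer.minimize(loss)
```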
tf.contrib.layers.convolution2d(*args, **kwargs)
Adds a 2D convolution followed by an optional batch_norm layer.
`convolution2d` creates a variable called `weights`, representing the convolutional kernel, that is convolved with the `inputs` to produce a `Tensor` of activations. If a `normalizer_fn` is provided (such as `batch_norm`), it is then applied. Otherwise, if `normalizer_fn` is None and a `biases_initializer` is provided, then a `biases` variable is created and added to the activations. Finally, if `activation_fn` is not None, it is applied to the activations as well.

Performs a'trous convolution with input stride equal to `rate` if `rate` is greater than one.
Args:
- `inputs`: a 4-D tensor `[batch_size, height, width, channels]`.
- `num_outputs`: integer, the number of output filters.
- `kernel_size`: a list of length 2 `[kernel_height, kernel_width]` of the filters. Can be an int if both values are the same.
- `stride`: a list of length 2 `[stride_height, stride_width]`. Can be an int if both strides are the same. Note that presently both strides must have the same value.
- `padding`: one of `VALID` or `SAME`.
- `rate`: integer. If less than or equal to 1, a standard convolution is used. If greater than 1, a'trous convolution is applied and `stride` must be set to 1.
- `activation_fn`: activation function, set to None to skip it and maintain a linear activation.
- `normalizer_fn`: normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added. Default is None for no normalizer function.
- `normalizer_params`: normalization function parameters.
- `weights_initializer`: An initializer for the weights.
- `weights_regularizer`: Optional regularizer for the weights.
- `biases_initializer`: An initializer for the biases. If None skip biases.
- `biases_regularizer`: Optional regularizer for the biases.
- `reuse`: whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
- `variables_collections`: optional list of collections for all the variables, or a dictionary containing a different list of collections per variable.
- `outputs_collections`: collection to add the outputs.
- `trainable`: If `True`, also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
- `scope`: Optional scope for `variable_scope`.

Returns:

a tensor representing the output of the operation.

Raises:

- `ValueError`: if both `rate` and `stride` are larger than one.
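A minimal sketch (the input shape and layer sizes are hypothetical):

```python
import tensorflow as tf

images = tf.random_normal([8, 64, 64, 3])   # hypothetical NHWC batch

# 32 filters of size 3x3, 'SAME' padding, ReLU activation by default,
# with an L2 weight regularizer added to GraphKeys.REGULARIZATION_LOSSES.
net = tf.contrib.layers.convolution2d(
    images, num_outputs=32, kernel_size=[3, 3], stride=1, padding='SAME',
    weights_regularizer=tf.contrib.layers.l2_regularizer(1e-4),
    scope='conv1')
```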
tf.contrib.layers.convolution2d_in_plane(*args, **kwargs)
Performs the same in-plane convolution to each channel independently.
This is useful for performing various simple channel-independent convolution operations such as image gradients:
```python
image = tf.constant(..., shape=(16, 240, 320, 3))
vert_gradients = layers.conv2d_in_plane(image,
                                        kernel=[1, -1],
                                        kernel_size=[2, 1])
horz_gradients = layers.conv2d_in_plane(image,
                                        kernel=[1, -1],
                                        kernel_size=[1, 2])
```
Args:
- `inputs`: a 4-D tensor with dimensions [batch_size, height, width, channels].
- `kernel_size`: a list of length 2 holding the [kernel_height, kernel_width] of the pooling. Can be an int if both values are the same.
- `stride`: a list of length 2 `[stride_height, stride_width]`. Can be an int if both strides are the same. Note that presently both strides must have the same value.
- `padding`: the padding type to use, either 'SAME' or 'VALID'.
- `activation_fn`: activation function, set to None to skip it and maintain a linear activation.
- `normalizer_fn`: normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added. Default is None for no normalizer function.
- `normalizer_params`: normalization function parameters.
- `weights_initializer`: An initializer for the weights.
- `weights_regularizer`: Optional regularizer for the weights.
- `biases_initializer`: An initializer for the biases. If None skip biases.
- `biases_regularizer`: Optional regularizer for the biases.
- `reuse`: whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
- `variables_collections`: optional list of collections for all the variables, or a dictionary containing a different list of collections per variable.
- `outputs_collections`: collection to add the outputs.
- `trainable`: If `True`, also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
- `scope`: Optional scope for `variable_scope`.

Returns:

A `Tensor` representing the output of the operation.
tf.contrib.layers.convolution2d_transpose(*args, **kwargs)
Adds a convolution2d_transpose with an optional batch normalization layer.
The function creates a variable called `weights`, representing the kernel, that is convolved with the input. If `batch_norm_params` is `None`, a second variable called 'biases' is added to the result of the operation.
Args:
- `inputs`: a tensor of size [batch_size, height, width, channels].
- `num_outputs`: integer, the number of output filters.
- `kernel_size`: a list of length 2 holding the [kernel_height, kernel_width] of the filters. Can be an int if both values are the same.
- `stride`: a list of length 2: [stride_height, stride_width]. Can be an int if both strides are the same. Note that presently both strides must have the same value.
- `padding`: one of 'VALID' or 'SAME'.
- `activation_fn`: activation function, set to None to skip it and maintain a linear activation.
- `normalizer_fn`: normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added. Default is None for no normalizer function.
- `normalizer_params`: normalization function parameters.
- `weights_initializer`: An initializer for the weights.
- `weights_regularizer`: Optional regularizer for the weights.
- `biases_initializer`: An initializer for the biases. If None skip biases.
- `biases_regularizer`: Optional regularizer for the biases.
- `reuse`: whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
- `variables_collections`: optional list of collections for all the variables, or a dictionary containing a different list of collections per variable.
- `outputs_collections`: collection to add the outputs.
- `trainable`: whether or not the variables should be trainable.
- `scope`: Optional scope for variable_scope.

Returns:

a tensor representing the output of the operation.

Raises:

- `ValueError`: if 'kernel_size' is not a list of length 2.
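A minimal upsampling sketch (shapes and layer sizes are hypothetical):

```python
import tensorflow as tf

feature_map = tf.random_normal([8, 16, 16, 64])   # hypothetical feature map

# Transposed convolution with stride 2 roughly doubles height and width.
upsampled = tf.contrib.layers.convolution2d_transpose(
    feature_map, num_outputs=32, kernel_size=[3, 3], stride=2,
    padding='SAME', scope='deconv1')
# upsampled has shape [8, 32, 32, 32].
```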
tf.contrib.layers.flatten(*args, **kwargs)
Flattens the input while maintaining the batch_size.
Assumes that the first dimension represents the batch.
Args:
- `inputs`: a tensor of size [batch_size, ...].
- `outputs_collections`: collection to add the outputs.
- `scope`: Optional scope for name_scope.

Returns:

a flattened tensor with shape [batch_size, k].

Raises:

- `ValueError`: if inputs.shape is wrong.
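A small sketch (the shape is hypothetical):

```python
import tensorflow as tf

features = tf.random_normal([8, 7, 7, 64])    # hypothetical conv output
flat = tf.contrib.layers.flatten(features)    # shape [8, 7 * 7 * 64] = [8, 3136]
```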
tf.contrib.layers.fully_connected(*args, **kwargs)
Adds a fully connected layer.
`fully_connected` creates a variable called `weights`, representing a fully connected weight matrix, which is multiplied by the `inputs` to produce a `Tensor` of hidden units. If a `normalizer_fn` is provided (such as `batch_norm`), it is then applied. Otherwise, if `normalizer_fn` is None and a `biases_initializer` is provided, then a `biases` variable is created and added to the hidden units. Finally, if `activation_fn` is not None, it is applied to the hidden units as well.

Note: if `inputs` have a rank greater than 2, then `inputs` is flattened prior to the initial matrix multiply by `weights`.
Args:
- `inputs`: A tensor with at least rank 2 and a known value for the last dimension, i.e. `[batch_size, depth]`, `[None, None, None, channels]`.
- `num_outputs`: Integer or long, the number of output units in the layer.
- `activation_fn`: activation function, set to None to skip it and maintain a linear activation.
- `normalizer_fn`: normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added. Default is None for no normalizer function.
- `normalizer_params`: normalization function parameters.
- `weights_initializer`: An initializer for the weights.
- `weights_regularizer`: Optional regularizer for the weights.
- `biases_initializer`: An initializer for the biases. If None skip biases.
- `biases_regularizer`: Optional regularizer for the biases.
- `reuse`: whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
- `variables_collections`: Optional list of collections for all the variables, or a dictionary containing a different list of collections per variable.
- `outputs_collections`: collection to add the outputs.
- `trainable`: If `True`, also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
- `scope`: Optional scope for variable_scope.

Returns:

the tensor variable representing the result of the series of operations.

Raises:

- `ValueError`: if `x` has rank less than 2 or if its last dimension is not set.
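A minimal sketch of stacking two fully connected layers (the sizes are hypothetical):

```python
import tensorflow as tf

x = tf.random_normal([32, 100])     # hypothetical input batch

hidden = tf.contrib.layers.fully_connected(x, 256, scope='fc1')  # ReLU by default
logits = tf.contrib.layers.fully_connected(hidden, 10,
                                           activation_fn=None,   # linear output
                                           scope='fc2')
```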
tf.contrib.layers.layer_norm(*args, **kwargs)
Adds a Layer Normalization layer from https://arxiv.org/abs/1607.06450.
"Layer Normalization"
Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
Can be used as a normalizer function for conv2d and fully_connected.
Args:
- `inputs`: a tensor with 2 or more dimensions. The normalization occurs over all but the first dimension.
- `center`: If True, subtract `beta`. If False, `beta` is ignored.
- `scale`: If True, multiply by `gamma`. If False, `gamma` is not used. When the next layer is linear (also e.g. `nn.relu`), this can be disabled since the scaling can be done by the next layer.
- `activation_fn`: activation function, default set to None to skip it and maintain a linear activation.
- `reuse`: whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
- `variables_collections`: optional collections for the variables.
- `outputs_collections`: collections to add the outputs.
- `trainable`: If `True`, also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
- `scope`: Optional scope for `variable_op_scope`.

Returns:

A `Tensor` representing the output of the operation.

Raises:

- `ValueError`: if rank or last dimension of `inputs` is undefined.
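A minimal sketch of using `layer_norm` as the `normalizer_fn` of a fully connected layer (the shapes are hypothetical):

```python
import tensorflow as tf

x = tf.random_normal([32, 128])     # hypothetical input batch

net = tf.contrib.layers.fully_connected(
    x, 64, normalizer_fn=tf.contrib.layers.layer_norm, scope='fc_ln')
```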
tf.contrib.layers.max_pool2d(*args, **kwargs)
Adds a 2D Max Pooling op.
It is assumed that pooling is done per image, not across the batch or channel dimensions.
Args:
- `inputs`: A `Tensor` of size [batch_size, height, width, channels].
- `kernel_size`: A list of length 2: [kernel_height, kernel_width] of the pooling kernel over which the op is computed. Can be an int if both values are the same.
- `stride`: A list of length 2: [stride_height, stride_width]. Can be an int if both strides are the same. Note that presently both strides must have the same value.
- `padding`: The padding method, either 'VALID' or 'SAME'.
- `outputs_collections`: The collections to which the outputs are added.
- `scope`: Optional scope for name_scope.

Returns:

A `Tensor` representing the results of the pooling operation.

Raises:

- `ValueError`: If 'kernel_size' is not a 2-D list.
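Usage mirrors `avg_pool2d`; a minimal sketch (the shape is hypothetical):

```python
import tensorflow as tf

images = tf.random_normal([8, 32, 32, 3])
pooled = tf.contrib.layers.max_pool2d(images, kernel_size=[2, 2], stride=2)
# pooled has shape [8, 16, 16, 3].
```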
tf.contrib.layers.one_hot_encoding(*args, **kwargs)
Transforms numeric labels into one-hot labels using `tf.one_hot`.
Args:
- `labels`: [batch_size] target labels.
- `num_classes`: total number of classes.
- `on_value`: A scalar defining the on-value.
- `off_value`: A scalar defining the off-value.
- `outputs_collections`: collection to add the outputs.
- `scope`: Optional scope for name_scope.

Returns:

one hot encoding of the labels.
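A minimal sketch (the label values are hypothetical):

```python
import tensorflow as tf

labels = tf.constant([0, 2, 1])    # hypothetical integer class labels
one_hot = tf.contrib.layers.one_hot_encoding(labels, num_classes=3)
# one_hot == [[1., 0., 0.], [0., 0., 1.], [0., 1., 0.]]
```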
tf.contrib.layers.repeat(inputs, repetitions, layer, *args, **kwargs)
Applies the same layer with the same arguments repeatedly.
```python
y = repeat(x, 3, conv2d, 64, [3, 3], scope='conv1')
# It is equivalent to:

x = conv2d(x, 64, [3, 3], scope='conv1/conv1_1')
x = conv2d(x, 64, [3, 3], scope='conv1/conv1_2')
y = conv2d(x, 64, [3, 3], scope='conv1/conv1_3')
```
If the `scope` argument is not given in `kwargs`, it is set to `layer.__name__`, or `layer.func.__name__` (for `functools.partial` objects). If neither `__name__` nor `func.__name__` is available, the layers are called with `scope='stack'`.
Args:
- `inputs`: A `Tensor` suitable for `layer`.
- `repetitions`: Int, number of repetitions.
- `layer`: A layer with arguments `(inputs, *args, **kwargs)`.
- `*args`: Extra args for the layer.
- `**kwargs`: Extra kwargs for the layer.

Returns:

a tensor result of applying the layer, repetitions times.

Raises:

- `ValueError`: if the op is unknown or wrong.
tf.contrib.layers.safe_embedding_lookup_sparse(embedding_weights, sparse_ids, sparse_weights=None, combiner=None, default_id=None, name=None, partition_strategy='div')
Lookup embedding results, accounting for invalid IDs and empty features.
The partitioned embedding tensors in `embedding_weights` must all be the same shape except for the first dimension. The first dimension is allowed to vary as the vocabulary size is not necessarily a multiple of `P`. `embedding_weights` may be a `PartitionedVariable` as returned by using `tf.get_variable()` with a partitioner.

Invalid IDs (< 0) are pruned from input IDs and weights, as well as any IDs with non-positive weight. For an entry with no features, the embedding vector for `default_id` is returned, or the 0-vector if `default_id` is not supplied.

The ids and weights may be multi-dimensional. Embeddings are always aggregated along the last dimension.
Args:
- `embedding_weights`: A list of `P` float tensors or values representing partitioned embedding tensors. Alternatively, a `PartitionedVariable`, created by partitioning along dimension 0. The total unpartitioned shape should be `[e_0, e_1, ..., e_m]`, where `e_0` represents the vocab size and `e_1, ..., e_m` are the embedding dimensions.
- `sparse_ids`: `SparseTensor` of shape `[d_0, d_1, ..., d_n]` containing the ids. `d_0` is typically batch size.
- `sparse_weights`: `SparseTensor` of same shape as `sparse_ids`, containing float weights corresponding to `sparse_ids`, or `None` if all weights are assumed to be 1.0.
- `combiner`: A string specifying how to combine embedding results for each entry. Currently "mean", "sqrtn" and "sum" are supported, with "mean" the default.
- `default_id`: The id to use for an entry with no features.
- `name`: A name for this operation (optional).
- `partition_strategy`: A string specifying the partitioning strategy. Currently `"div"` and `"mod"` are supported. Default is `"div"`.

Returns:

Dense tensor of shape `[d_0, d_1, ..., d_{n-1}, e_1, ..., e_m]`.

Raises:

- `ValueError`: if `embedding_weights` is empty.
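A minimal sketch (the vocabulary size, embedding dimension, and sparse ids below are hypothetical):

```python
import tensorflow as tf

# Hypothetical embedding table: vocab of 10 tokens, 4-dimensional embeddings.
embedding_weights = tf.get_variable(
    'embeddings', shape=[10, 4],
    initializer=tf.contrib.layers.xavier_initializer())

# Two examples; the second row contains only an invalid id (-1) that gets pruned.
sparse_ids = tf.SparseTensor(
    indices=[[0, 0], [0, 1], [1, 0]],
    values=tf.constant([3, 7, -1], dtype=tf.int64),
    dense_shape=[2, 2])

embedded = tf.contrib.layers.safe_embedding_lookup_sparse(
    [embedding_weights], sparse_ids, combiner='mean')
# embedded has shape [2, 4]; the second row is the 0-vector (no default_id given).
```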
tf.contrib.layers.separable_convolution2d(*args, **kwargs)
Adds a depth-separable 2D convolution with optional batch_norm layer.
This op first performs a depthwise convolution that acts separately on channels, creating a variable called `depthwise_weights`. If `num_outputs` is not None, it adds a pointwise convolution that mixes channels, creating a variable called `pointwise_weights`. Then, if `batch_norm_params` is None, it adds bias to the result, creating a variable called 'biases'; otherwise it adds a batch normalization layer. It finally applies an activation function to produce the end result.
Args:
- `inputs`: a tensor of size [batch_size, height, width, channels].
- `num_outputs`: the number of pointwise convolution output filters. If it is None, then we skip the pointwise convolution stage.
- `kernel_size`: a list of length 2: [kernel_height, kernel_width] of the filters. Can be an int if both values are the same.
- `depth_multiplier`: the number of depthwise convolution output channels for each input channel. The total number of depthwise convolution output channels will be equal to `num_filters_in * depth_multiplier`.
- `stride`: a list of length 2: [stride_height, stride_width], specifying the depthwise convolution stride. Can be an int if both strides are the same.
- `padding`: one of 'VALID' or 'SAME'.
- `activation_fn`: activation function, set to None to skip it and maintain a linear activation.
- `normalizer_fn`: normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added. Default is None for no normalizer function.
- `normalizer_params`: normalization function parameters.
- `weights_initializer`: An initializer for the weights.
- `weights_regularizer`: Optional regularizer for the weights.
- `biases_initializer`: An initializer for the biases. If None skip biases.
- `biases_regularizer`: Optional regularizer for the biases.
- `reuse`: whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
- `variables_collections`: optional list of collections for all the variables, or a dictionary containing a different list of collections per variable.
- `outputs_collections`: collection to add the outputs.
- `trainable`: whether or not the variables should be trainable.
- `scope`: Optional scope for variable_scope.

Returns:

A `Tensor` representing the output of the operation.
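A minimal sketch (the input shape and filter counts are hypothetical):

```python
import tensorflow as tf

images = tf.random_normal([8, 32, 32, 3])   # hypothetical input batch

# Depthwise 3x3 convolution (depth_multiplier=1) followed by a 1x1 pointwise
# convolution producing 64 output channels.
net = tf.contrib.layers.separable_convolution2d(
    images, num_outputs=64, kernel_size=[3, 3], depth_multiplier=1,
    scope='sep_conv1')
```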
tf.contrib.layers.stack(inputs, layer, stack_args, **kwargs)
Builds a stack of layers by applying layer repeatedly using stack_args.
`stack` allows you to repeatedly apply the same operation with different arguments `stack_args[i]`. For each application of the layer, `stack` creates a new scope appended with an increasing number. For example:

```python
y = stack(x, fully_connected, [32, 64, 128], scope='fc')
# It is equivalent to:

x = fully_connected(x, 32, scope='fc/fc_1')
x = fully_connected(x, 64, scope='fc/fc_2')
y = fully_connected(x, 128, scope='fc/fc_3')
```
If the `scope` argument is not given in `kwargs`, it is set to `layer.__name__`, or `layer.func.__name__` (for `functools.partial` objects). If neither `__name__` nor `func.__name__` is available, the layers are called with `scope='stack'`.
Args:
- `inputs`: A `Tensor` suitable for `layer`.
- `layer`: A layer with arguments `(inputs, *args, **kwargs)`.
- `stack_args`: A list/tuple of parameters for each call of `layer`.
- `**kwargs`: Extra kwargs for the layer.

Returns:

a `Tensor` result of applying the stacked layers.

Raises:

- `ValueError`: if the op is unknown or wrong.
tf.contrib.layers.unit_norm(*args, **kwargs)
Normalizes the given input across the specified dimension to unit length.
Note that the rank of `input` must be known.
Args:
- `inputs`: A `Tensor` of arbitrary size.
- `dim`: The dimension along which the input is normalized.
- `epsilon`: A small value to add to the inputs to avoid dividing by zero.
- `scope`: Optional scope for variable_scope.

Returns:

The normalized `Tensor`.

Raises:

- `ValueError`: If dim is smaller than the number of dimensions in 'inputs'.
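A minimal sketch normalizing each row of a matrix to unit length (the shape is hypothetical):

```python
import tensorflow as tf

x = tf.random_normal([4, 16])                        # hypothetical batch of vectors
normalized = tf.contrib.layers.unit_norm(x, dim=1)   # each row has L2 norm ~1
```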
Aliases for `fully_connected` which set a default activation function are available: `relu`, `relu6` and `linear`.
Regularizers

Regularization can help prevent overfitting. These have the signature `fn(weights)`. The loss is typically added to `tf.GraphKeys.REGULARIZATION_LOSSES`.
tf.contrib.layers.apply_regularization(regularizer, weights_list=None)
Returns the summed penalty by applying `regularizer` to the `weights_list`.
Adding a regularization penalty over the layer weights and embedding weights can help prevent overfitting the training data. Regularization over layer biases is less common/useful, but assuming proper data preprocessing/mean subtraction, it usually shouldn't hurt much either.
Args:
- `regularizer`: A function that takes a single `Tensor` argument and returns a scalar `Tensor` output.
- `weights_list`: List of weights `Tensors` or `Variables` to apply `regularizer` over. Defaults to the `GraphKeys.WEIGHTS` collection if `None`.

Returns:

A scalar representing the overall regularization penalty.

Raises:

- `ValueError`: If `regularizer` does not return a scalar output, or if we find no weights.
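A minimal sketch combining `l2_regularizer` (documented below) with `apply_regularization`; the variables are hypothetical:

```python
import tensorflow as tf

w1 = tf.get_variable('w1', shape=[100, 50])   # hypothetical weight matrices
w2 = tf.get_variable('w2', shape=[50, 10])

l2 = tf.contrib.layers.l2_regularizer(scale=1e-4)
reg_penalty = tf.contrib.layers.apply_regularization(l2, [w1, w2])

# Typically added to the task loss:
# total_loss = task_loss + reg_penalty
```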
tf.contrib.layers.l1_regularizer(scale, scope=None)
Returns a function that can be used to apply L1 regularization to weights.
L1 regularization encourages sparsity.
Args:
- `scale`: A scalar multiplier `Tensor`. 0.0 disables the regularizer.
- `scope`: An optional scope name.

Returns:

A function with signature `l1(weights)` that applies L1 regularization.

Raises:

- `ValueError`: If scale is negative or if scale is not a float.
tf.contrib.layers.l2_regularizer(scale, scope=None)
Returns a function that can be used to apply L2 regularization to weights.
Small values of L2 can help prevent overfitting the training data.
Args:
- `scale`: A scalar multiplier `Tensor`. 0.0 disables the regularizer.
- `scope`: An optional scope name.

Returns:

A function with signature `l2(weights)` that applies L2 regularization.

Raises:

- `ValueError`: If scale is negative or if scale is not a float.
tf.contrib.layers.sum_regularizer(regularizer_list, scope=None)
Returns a function that applies the sum of multiple regularizers.
Args:
- `regularizer_list`: A list of regularizers to apply.
- `scope`: An optional scope name.

Returns:

A function with signature `sum_reg(weights)` that applies the sum of all the input regularizers.
Initializers
Initializers are used to initialize variables with sensible values given their size, data type, and purpose.
tf.contrib.layers.xavier_initializer(uniform=True, seed=None, dtype=tf.float32)
Returns an initializer performing "Xavier" initialization for weights.
This function implements the weight initialization from:
Xavier Glorot and Yoshua Bengio (2010): Understanding the difficulty of training deep feedforward neural networks. International conference on artificial intelligence and statistics.
This initializer is designed to keep the scale of the gradients roughly the same in all layers. With a uniform distribution this ends up being the range `x = sqrt(6. / (in + out)); [-x, x]`, and for a normal distribution a standard deviation of `sqrt(3. / (in + out))` is used.
Args:
- `uniform`: Whether to use uniform or normal distributed random initialization.
- `seed`: A Python integer. Used to create random seeds. See `set_random_seed` for behavior.
- `dtype`: The data type. Only floating point types are supported.
Returns:
An initializer for a weight matrix.
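A minimal sketch of using the initializer for a layer's weights (the layer and shapes are hypothetical):

```python
import tensorflow as tf

x = tf.random_normal([32, 100])    # hypothetical input batch

net = tf.contrib.layers.fully_connected(
    x, 64,
    weights_initializer=tf.contrib.layers.xavier_initializer(uniform=False),
    scope='fc_xavier')
```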
tf.contrib.layers.xavier_initializer_conv2d(uniform=True, seed=None, dtype=tf.float32)
Returns an initializer performing "Xavier" initialization for weights.
This function implements the weight initialization from:
Xavier Glorot and Yoshua Bengio (2010): Understanding the difficulty of training deep feedforward neural networks. International conference on artificial intelligence and statistics.
This initializer is designed to keep the scale of the gradients roughly the same in all layers. With a uniform distribution this ends up being the range `x = sqrt(6. / (in + out)); [-x, x]`, and for a normal distribution a standard deviation of `sqrt(3. / (in + out))` is used.
Args:
- `uniform`: Whether to use uniform or normal distributed random initialization.
- `seed`: A Python integer. Used to create random seeds. See `set_random_seed` for behavior.
- `dtype`: The data type. Only floating point types are supported.
Returns:
An initializer for a weight matrix.
tf.contrib.layers.variance_scaling_initializer(factor=2.0, mode='FAN_IN', uniform=False, seed=None, dtype=tf.float32)
Returns an initializer that generates tensors without scaling variance.
When initializing a deep network, it is in principle advantageous to keep the scale of the input variance constant, so it does not explode or diminish by reaching the final layer. This initializer uses the following formula:

```python
if mode == 'FAN_IN':     # Count only number of input connections.
  n = fan_in
elif mode == 'FAN_OUT':  # Count only number of output connections.
  n = fan_out
elif mode == 'FAN_AVG':  # Average number of input and output connections.
  n = (fan_in + fan_out) / 2.0

truncated_normal(shape, 0.0, stddev=sqrt(factor / n))
```

- To get "Delving Deep into Rectifiers" (the default), use: `factor=2.0 mode='FAN_IN' uniform=False`
- To get "Convolutional Architecture for Fast Feature Embedding", use: `factor=1.0 mode='FAN_IN' uniform=True`
- To get "Understanding the difficulty of training deep feedforward neural networks", use: `factor=1.0 mode='FAN_AVG' uniform=True`
- To get `xavier_initializer`, use either: `factor=1.0 mode='FAN_AVG' uniform=True`, or `factor=1.0 mode='FAN_AVG' uniform=False`
Args:
- `factor`: Float. A multiplicative factor.
- `mode`: String. 'FAN_IN', 'FAN_OUT', 'FAN_AVG'.
- `uniform`: Whether to use uniform or normal distributed random initialization.
- `seed`: A Python integer. Used to create random seeds. See `set_random_seed` for behavior.
- `dtype`: The data type. Only floating point types are supported.

Returns:

An initializer that generates tensors with unit variance.

Raises:

- `ValueError`: if `dtype` is not a floating point type.
- `TypeError`: if `mode` is not in ['FAN_IN', 'FAN_OUT', 'FAN_AVG'].
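A minimal sketch (the conv layer and shapes are hypothetical):

```python
import tensorflow as tf

images = tf.random_normal([8, 32, 32, 3])

# He/MSRA-style initialization (factor=2.0, FAN_IN) for a ReLU conv layer.
net = tf.contrib.layers.convolution2d(
    images, 32, [3, 3],
    weights_initializer=tf.contrib.layers.variance_scaling_initializer(
        factor=2.0, mode='FAN_IN', uniform=False),
    scope='conv_msra')
```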
Optimization
Optimize weights given a loss.
tf.contrib.layers.optimize_loss(loss, global_step, learning_rate, optimizer, gradient_noise_scale=None, gradient_multipliers=None, clip_gradients=None, learning_rate_decay_fn=None, update_ops=None, variables=None, name=None, summaries=None, colocate_gradients_with_ops=False)
Given loss and parameters for optimizer, returns a training op.
Various ways of passing optimizers include:

- string, name of the optimizer like 'SGD', 'Adam'; see OPTIMIZER_CLS_NAMES for the full list. E.g. `optimize_loss(..., optimizer='Adam')`.
- function, takes learning rate `Tensor` as argument and must return an `Optimizer` instance. E.g. `optimize_loss(..., optimizer=lambda lr: tf.train.MomentumOptimizer(lr, momentum=0.5))`. Alternatively, if `learning_rate` is `None`, the function takes no arguments. E.g. `optimize_loss(..., learning_rate=None, optimizer=lambda: tf.train.MomentumOptimizer(0.5, momentum=0.5))`.
- class, subclass of `Optimizer` that takes only one required argument - learning rate, such as AdamOptimizer, AdagradOptimizer. E.g. `optimize_loss(..., optimizer=tf.train.AdagradOptimizer)`.
- object, instance of a subclass of `Optimizer`. E.g. `optimize_loss(..., optimizer=tf.train.AdagradOptimizer(0.5))`.
Args:
- `loss`: Tensor, 0 dimensional.
- `global_step`: Tensor, step counter for each update.
- `learning_rate`: float or Tensor, magnitude of update per each training step.
- `optimizer`: string, class or optimizer instance, used as trainer. A string should be the name of an optimizer, like 'SGD', 'Adam', 'Adagrad' (full list in the OPTIMIZER_CLS_NAMES constant). A class should be a subclass of `tf.Optimizer` that implements the `compute_gradients` and `apply_gradients` functions. An optimizer instance should be an instantiation of a `tf.Optimizer` subclass and have `compute_gradients` and `apply_gradients` functions.
- `gradient_noise_scale`: float or None, adds 0-mean normal noise scaled by this value.
- `gradient_multipliers`: dict of variables or variable names to floats. If present, gradients for specified variables will be multiplied by the given constant.
- `clip_gradients`: float or `None`, clips gradients by this value.
- `learning_rate_decay_fn`: function, takes `learning_rate` and `global_step` `Tensor`s, returns a `Tensor`. Can be used to implement any learning rate decay function. For example: `tf.train.exponential_decay`.
- `update_ops`: list of update `Operation`s to execute at each step. If `None`, uses elements of the UPDATE_OPS collection. The order of execution between `update_ops` and `loss` is non-deterministic.
- `variables`: list of variables to optimize, or `None` to use all trainable variables.
- `name`: The name for this operation; used to scope operations and summaries.
- `summaries`: List of internal quantities to visualize on tensorboard. If not set, only the loss and the learning rate will be reported. The complete list is in OPTIMIZER_SUMMARIES.
- `colocate_gradients_with_ops`: If True, try colocating gradients with the corresponding op.

Returns:

Training op.

Raises:

- `ValueError`: if optimizer is wrong type.
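A minimal sketch (the loss, variables, and global step below are hypothetical):

```python
import tensorflow as tf

x = tf.random_normal([32, 10])
w = tf.get_variable('w', shape=[10, 1])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))   # hypothetical scalar loss

global_step = tf.Variable(0, trainable=False, name='global_step')

train_op = tf.contrib.layers.optimize_loss(
    loss, global_step, learning_rate=0.01, optimizer='SGD',
    clip_gradients=5.0)
```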
Summaries
Helper functions to summarize specific variables or ops.
tf.contrib.layers.summarize_activation(op)
Summarize an activation.
This applies the given activation and adds useful summaries specific to the activation.
Args:
- `op`: The tensor to summarize (assumed to be a layer activation).

Returns:

The summary op created to summarize `op`.
tf.contrib.layers.summarize_tensor(tensor, tag=None)
Summarize a tensor using a suitable summary type.
This function adds a summary op for `tensor`. The type of summary depends on the shape of `tensor`. For scalars, a `scalar_summary` is created; for all other tensors, a `histogram_summary` is used.
Args:
- `tensor`: The tensor to summarize.
- `tag`: The tag to use; if None then use the tensor's op's name.
Returns:
The summary op created or None for string tensors.
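A minimal sketch (the tensors are hypothetical):

```python
import tensorflow as tf

loss = tf.constant(0.5)                    # scalar tensor -> scalar summary
weights = tf.random_normal([100, 10])      # non-scalar tensor -> histogram summary

tf.contrib.layers.summarize_tensor(loss, tag='loss')
tf.contrib.layers.summarize_tensor(weights, tag='weights')
```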
tf.contrib.layers.summarize_tensors(tensors, summarizer=summarize_tensor)
Summarize a set of tensors.
tf.contrib.layers.summarize_collection(collection, name_filter=None, summarizer=summarize_tensor)
Summarize a graph collection of tensors, possibly filtered by name.
The layers module defines convenience functions `summarize_variables`, `summarize_weights` and `summarize_biases`, which set the `collection` argument of `summarize_collection` to `VARIABLES`, `WEIGHTS` and `BIASES`, respectively.
tf.contrib.layers.summarize_activations(name_filter=None, summarizer=summarize_activation)
Summarize activations, using `summarize_activation` to summarize.