Sharing Variables
You can create, initialize, save and load single variables
in the way described in the Variables HowTo.
But when building complex models you often need to share large sets of
variables and you might want to initialize all of them in one place.
This tutorial shows how this can be done using tf.variable_scope()
and
the tf.get_variable()
.
The Problem
Imagine you create a simple model for image filters, similar to our
Convolutional Neural Networks Tutorial
model but with only 2 convolutions (for simplicity of this example). If you use
just tf.Variable
, as explained in Variables HowTo,
your model might look like this.
def my_image_filter(input_images):
conv1_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
name="conv1_weights")
conv1_biases = tf.Variable(tf.zeros([32]), name="conv1_biases")
conv1 = tf.nn.conv2d(input_images, conv1_weights,
strides=[1, 1, 1, 1], padding='SAME')
relu1 = tf.nn.relu(conv1 + conv1_biases)
conv2_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
name="conv2_weights")
conv2_biases = tf.Variable(tf.zeros([32]), name="conv2_biases")
conv2 = tf.nn.conv2d(relu1, conv2_weights,
strides=[1, 1, 1, 1], padding='SAME')
return tf.nn.relu(conv2 + conv2_biases)
As you can easily imagine, models quickly get much more complicated than
this one, and even here we already have 4 different variables: conv1_weights
,
conv1_biases
, conv2_weights
, and conv2_biases
.
The problem arises when you want to reuse this model. Assume you want to
apply your image filter to 2 different images, image1
and image2
.
You want both images processed by the same filter with the same parameters.
You can call my_image_filter()
twice, but this will create two sets
of variables, 4 variables in each one, for a total of 8 variables.
# First call creates one set of 4 variables.
result1 = my_image_filter(image1)
# Another set of 4 variables is created in the second call.
result2 = my_image_filter(image2)
A common way to share variables is to create them in a separate piece of code and pass them to functions that use them. For example by using a dictionary:
variables_dict = {
"conv1_weights": tf.Variable(tf.random_normal([5, 5, 32, 32]),
name="conv1_weights")
"conv1_biases": tf.Variable(tf.zeros([32]), name="conv1_biases")
... etc. ...
}
def my_image_filter(input_images, variables_dict):
conv1 = tf.nn.conv2d(input_images, variables_dict["conv1_weights"],
strides=[1, 1, 1, 1], padding='SAME')
relu1 = tf.nn.relu(conv1 + variables_dict["conv1_biases"])
conv2 = tf.nn.conv2d(relu1, variables_dict["conv2_weights"],
strides=[1, 1, 1, 1], padding='SAME')
return tf.nn.relu(conv2 + variables_dict["conv2_biases"])
# The 2 calls to my_image_filter() now use the same variables
result1 = my_image_filter(image1, variables_dict)
result2 = my_image_filter(image2, variables_dict)
While convenient, creating variables like above, outside of the code, breaks encapsulation:
- The code that builds the graph must document the names, types, and shapes of variables to create.
- When the code changes, the callers may have to create more, or less, or different variables.
One way to address the problem is to use classes to create a model, where the classes take care of managing the variables they need. For a lighter solution, not involving classes, TensorFlow provides a Variable Scope mechanism that allows to easily share named variables while constructing a graph.
Variable Scope Example
Variable Scope mechanism in TensorFlow consists of 2 main functions:
tf.get_variable(<name>, <shape>, <initializer>)
: Creates or returns a variable with a given name.tf.variable_scope(<scope_name>)
: Manages namespaces for names passed totf.get_variable()
.
The function tf.get_variable()
is used to get or create a variable instead
of a direct call to tf.Variable
. It uses an initializer instead of passing
the value directly, as in tf.Variable
. An initializer is a function that
takes the shape and provides a tensor with that shape. Here are some
initializers available in TensorFlow:
tf.constant_initializer(value)
initializes everything to the provided value,tf.random_uniform_initializer(a, b)
initializes uniformly from [a, b],tf.random_normal_initializer(mean, stddev)
initializes from the normal distribution with the given mean and standard deviation.
To see how tf.get_variable()
solves the problem discussed
before, let's refactor the code that created one convolution into
a separate function, named conv_relu
:
def conv_relu(input, kernel_shape, bias_shape):
# Create variable named "weights".
weights = tf.get_variable("weights", kernel_shape,
initializer=tf.random_normal_initializer())
# Create variable named "biases".
biases = tf.get_variable("biases", bias_shape,
initializer=tf.constant_initializer(0.0))
conv = tf.nn.conv2d(input, weights,
strides=[1, 1, 1, 1], padding='SAME')
return tf.nn.relu(conv + biases)
This function uses short names "weights"
and "biases"
.
We'd like to use it for both conv1
and conv2
, but
the variables need to have different names.
This is where tf.variable_scope()
comes into play:
it pushes a namespace for variables.
def my_image_filter(input_images):
with tf.variable_scope("conv1"):
# Variables created here will be named "conv1/weights", "conv1/biases".
relu1 = conv_relu(input_images, [5, 5, 32, 32], [32])
with tf.variable_scope("conv2"):
# Variables created here will be named "conv2/weights", "conv2/biases".
return conv_relu(relu1, [5, 5, 32, 32], [32])
Now, let's see what happens when we call my_image_filter()
twice.
result1 = my_image_filter(image1)
result2 = my_image_filter(image2)
# Raises ValueError(... conv1/weights already exists ...)
As you can see, tf.get_variable()
checks that already existing variables
are not shared by accident. If you want to share them, you need to specify
it by setting reuse_variables()
as follows.
with tf.variable_scope("image_filters") as scope:
result1 = my_image_filter(image1)
scope.reuse_variables()
result2 = my_image_filter(image2)
This is a good way to share variables, lightweight and safe.
How Does Variable Scope Work?
Understanding tf.get_variable()
To understand variable scope it is necessary to first
fully understand how tf.get_variable()
works.
Here is how tf.get_variable
is usually called.
v = tf.get_variable(name, shape, dtype, initializer)
This call does one of two things depending on the scope it is called in. Here are the two options.
- Case 1: the scope is set for creating new variables, as evidenced by
tf.get_variable_scope().reuse == False
.
In this case, v
will be a newly created tf.Variable
with the provided
shape and data type. The full name of the created variable will be set to
the current variable scope name + the provided name
and a check will be
performed to ensure that no variable with this full name exists yet.
If a variable with this full name already exists, the function will
raise a ValueError
. If a new variable is created, it will be
initialized to the value initializer(shape)
. For example:
with tf.variable_scope("foo"):
v = tf.get_variable("v", [1])
assert v.name == "foo/v:0"
- Case 2: the scope is set for reusing variables, as evidenced by
tf.get_variable_scope().reuse == True
.
In this case, the call will search for an already existing variable with
name equal to the current variable scope name + the provided name
.
If no such variable exists, a ValueError
will be raised. If the variable
is found, it will be returned. For example:
with tf.variable_scope("foo"):
v = tf.get_variable("v", [1])
with tf.variable_scope("foo", reuse=True):
v1 = tf.get_variable("v", [1])
assert v1 is v
Basics of tf.variable_scope()
Knowing how tf.get_variable()
works makes it easy to understand variable
scope. The primary function of variable scope is to carry a name that will
be used as prefix for variable names and a reuse-flag to distinguish the two
cases described above. Nesting variable scopes appends their names in a way
analogous to how directories work:
with tf.variable_scope("foo"):
with tf.variable_scope("bar"):
v = tf.get_variable("v", [1])
assert v.name == "foo/bar/v:0"
The current variable scope can be retrieved using tf.get_variable_scope()
and the reuse
flag of the current variable scope can be set to True
by
calling tf.get_variable_scope().reuse_variables()
:
with tf.variable_scope("foo"):
v = tf.get_variable("v", [1])
tf.get_variable_scope().reuse_variables()
v1 = tf.get_variable("v", [1])
assert v1 is v
Note that you cannot set the reuse
flag to False
. The reason behind
this is to allow to compose functions that create models. Imagine you write
a function my_image_filter(inputs)
as before. Someone calling the function
in a variable scope with reuse=True
would expect all inner variables to be
reused as well. Allowing to force reuse=False
inside the function would break
this contract and make it hard to share parameters in this way.
Even though you cannot set reuse
to False
explicitly, you can enter
a reusing variable scope and then exit it, going back to a non-reusing one.
This can be done using a reuse=True
parameter when opening a variable scope.
Note also that, for the same reason as above, the reuse
parameter is
inherited. So when you open a reusing variable scope, all sub-scopes will
be reusing too.
with tf.variable_scope("root"):
# At start, the scope is not reusing.
assert tf.get_variable_scope().reuse == False
with tf.variable_scope("foo"):
# Opened a sub-scope, still not reusing.
assert tf.get_variable_scope().reuse == False
with tf.variable_scope("foo", reuse=True):
# Explicitly opened a reusing scope.
assert tf.get_variable_scope().reuse == True
with tf.variable_scope("bar"):
# Now sub-scope inherits the reuse flag.
assert tf.get_variable_scope().reuse == True
# Exited the reusing scope, back to a non-reusing one.
assert tf.get_variable_scope().reuse == False
Capturing variable scope
In all examples presented above, we shared parameters only because their names agreed, that is, because we opened a reusing variable scope with exactly the same string. In more complex cases, it might be useful to pass a VariableScope object rather than rely on getting the names right. To this end, variable scopes can be captured and used instead of names when opening a new variable scope.
with tf.variable_scope("foo") as foo_scope:
v = tf.get_variable("v", [1])
with tf.variable_scope(foo_scope)
w = tf.get_variable("w", [1])
with tf.variable_scope(foo_scope, reuse=True)
v1 = tf.get_variable("v", [1])
w1 = tf.get_variable("w", [1])
assert v1 is v
assert w1 is w
When opening a variable scope using a previously existing scope we jump out of the current variable scope prefix to an entirely different one. This is fully independent of where we do it.
with tf.variable_scope("foo") as foo_scope:
assert foo_scope.name == "foo"
with tf.variable_scope("bar")
with tf.variable_scope("baz") as other_scope:
assert other_scope.name == "bar/baz"
with tf.variable_scope(foo_scope) as foo_scope2:
assert foo_scope2.name == "foo" # Not changed.
Initializers in variable scope
Using tf.get_variable()
allows to write functions that create or reuse
variables and can be transparently called from outside. But what if we wanted
to change the initializer of the created variables? Do we need to pass an extra
argument to every function that creates variables? What about the most common
case, when we want to set the default initializer for all variables in one
place, on top of all functions? To help with these cases, variable scope
can carry a default initializer. It is inherited by sub-scopes and passed
to each tf.get_variable()
call. But it will be overridden if another
initializer is specified explicitly.
with tf.variable_scope("foo", initializer=tf.constant_initializer(0.4)):
v = tf.get_variable("v", [1])
assert v.eval() == 0.4 # Default initializer as set above.
w = tf.get_variable("w", [1], initializer=tf.constant_initializer(0.3)):
assert w.eval() == 0.3 # Specific initializer overrides the default.
with tf.variable_scope("bar"):
v = tf.get_variable("v", [1])
assert v.eval() == 0.4 # Inherited default initializer.
with tf.variable_scope("baz", initializer=tf.constant_initializer(0.2)):
v = tf.get_variable("v", [1])
assert v.eval() == 0.2 # Changed default initializer.
Names of ops in tf.variable_scope()
We discussed how tf.variable_scope
governs the names of variables.
But how does it influence the names of other ops in the scope?
It is natural that ops created inside a variable scope should also
share that name. For this reason, when we do with tf.variable_scope("name")
,
this implicitly opens a tf.name_scope("name")
. For example:
with tf.variable_scope("foo"):
x = 1.0 + tf.get_variable("v", [1])
assert x.op.name == "foo/add"
Name scopes can be opened in addition to a variable scope, and then they will only affect the names of the ops, but not of variables.
with tf.variable_scope("foo"):
with tf.name_scope("bar"):
v = tf.get_variable("v", [1])
x = 1.0 + v
assert v.name == "foo/v:0"
assert x.op.name == "foo/bar/add"
When opening a variable scope using a captured object instead of a string, we do not alter the current name scope for ops.
Examples of Use
Here are pointers to a few files that make use of variable scope. In particular, it is heavily used for recurrent neural networks and sequence-to-sequence models.
File | What's in it? |
---|---|
models/image/cifar10.py |
Model for detecting objects in images. |
models/rnn/rnn_cell.py |
Cell functions for recurrent neural networks. |
models/rnn/seq2seq.py |
Functions for building sequence-to-sequence models. |