TensorFlow tf.gradients
Created at 20170708 Updated at 20170708 Category TensorFlow
TensorFlow is such a powerful tool that you can easily compute any value you need. Today I am going to introduce a function called tf.gradients(),
with which you can compute gradients. Let's go.
Before using it, let's have a look at [the docs](https://www.tensorflow.org/api_docs/python/tf/gradients):
gradients(
ys,
xs,
grad_ys=None,
name='gradients',
colocate_gradients_with_ops=False,
gate_gradients=False,
aggregation_method=None
)
And its description:

Constructs symbolic partial derivatives of sum of ys w.r.t. x in xs.

ys and xs are each a Tensor or a list of tensors. grad_ys is a list of Tensor, holding the gradients received by the ys. The list must be the same length as ys.

gradients() adds ops to the graph to output the partial derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys.

grad_ys is a list of tensors of the same length as ys that holds the initial gradients for each y in ys. When grad_ys is None, we fill in a tensor of '1's of the shape of y for each y in ys. A user can provide their own initial grad_ys to compute the derivatives using a different initial gradient for each y (e.g., if one wanted to weight the gradient differently for each value in each y).
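To illustrate the grad_ys argument described above, here is a small sketch (the functions and the weights 0.5 and 3.0 are my own example, not from the docs), written against tf.compat.v1 so it also runs under TensorFlow 2:

```python
import tensorflow as tf

# tf.gradients builds graph ops, so disable eager execution under TF2
tf.compat.v1.disable_eager_execution()

x = tf.constant(3.0)
y1 = x * x       # dy1/dx = 2x = 6 at x = 3
y2 = 2.0 * x     # dy2/dx = 2

# Default grad_ys (all 1s): result is sum(dy/dx) = 6 + 2 = 8
g_default = tf.gradients([y1, y2], x)

# Custom grad_ys: weight y1 by 0.5 and y2 by 3.0 -> 0.5*6 + 3.0*2 = 9
g_weighted = tf.gradients([y1, y2], x,
                          grad_ys=[tf.constant(0.5), tf.constant(3.0)])

with tf.compat.v1.Session() as sess:
    gd = sess.run(g_default)   # [8.0]
    gw = sess.run(g_weighted)  # [9.0]
    print(gd, gw)
```

Note that even with multiple ys, the result for x is a single tensor: the (weighted) gradients of all ys are summed.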
Usually, we need to calculate gradients and then apply them to update variables. In practice, one can detect whether gradient vanishing or exploding is happening by summarising gradients via TensorBoard, the visualization tool developed by the TensorFlow team.
Here is a simple example to clarify its usage.
Suppose you have a simple linear function:
$$
\widehat Y = W \times X + b
$$
We want to fit this linear function so that we can predict unseen data, and normally we use a quadratic cost function:
$$
cost = \frac{1}{2} \times (\widehat Y - Y)^2
$$
We can get our fit by minimising the cost below a threshold, for example 0.01. Here is the code to calculate gradients:
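A minimal sketch of such code (the variable names and the sample values X = 2, Y = 5, W = 3, b = 1 are my own; written against tf.compat.v1 so it also runs under TensorFlow 2):

```python
import tensorflow as tf

# tf.gradients works on graphs, so disable eager execution under TF2
tf.compat.v1.disable_eager_execution()

X = tf.compat.v1.placeholder(tf.float32, name='X')
Y = tf.compat.v1.placeholder(tf.float32, name='Y')
W = tf.Variable(3.0, name='W')
b = tf.Variable(1.0, name='b')

Y_hat = W * X + b                  # the linear model above
cost = 0.5 * tf.square(Y_hat - Y)  # the quadratic cost above

# Build ops that output d(cost)/dW and d(cost)/db
grads = tf.gradients(cost, [W, b])

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    dW, db = sess.run(grads, feed_dict={X: 2.0, Y: 5.0})
    print(dW, db)  # (W*X + b - Y)*X = 4.0 and (W*X + b - Y) = 2.0
```

In a training loop you would then feed these gradients (or simply the cost) to an optimizer and repeat until the cost drops below the threshold.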


If you have learnt calculus, you can easily calculate the result by hand:
$$
\frac {\partial cost}{\partial W} = (W \times X + b - Y) \times X
$$
and
$$
\frac {\partial cost}{\partial b} = W \times X + b - Y
$$
Since the gradients returned are the accumulated gradients w.r.t. each x in xs, to get the result, just add up all the corresponding gradients.
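To double-check the hand-derived formulas, here is a small pure-Python sketch (the sample values are my own) comparing the analytic derivatives against finite differences:

```python
# Sample point (my own numbers): model W*X + b, target Y
W, b, X, Y = 3.0, 1.0, 2.0, 5.0

def cost(W, b):
    # cost = 1/2 * (Y_hat - Y)^2
    return 0.5 * (W * X + b - Y) ** 2

# Analytic gradients from the formulas above
dW = (W * X + b - Y) * X   # = (6 + 1 - 5) * 2 = 4.0
db = (W * X + b - Y)       # = 2.0

# Numerical check via central finite differences
eps = 1e-6
dW_num = (cost(W + eps, b) - cost(W - eps, b)) / (2 * eps)
db_num = (cost(W, b + eps) - cost(W, b - eps)) / (2 * eps)

print(dW, dW_num)  # should agree closely
print(db, db_num)
```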
Hope you can understand this post.