[튜토리얼3] 자동 미분과 그래디언트 테이프

이번 튜토리얼에서는 머신러닝 모델을 최적화할 수 있는 주요 기술 중 하나인 자동 미분(automatic differentiation)에 대해 알아보겠습니다.

import warnings
warnings.filterwarnings(action='ignore')

import tensorflow as tf

텐서플로우는 자동 미분(주어진 입력 변수에 대한 연산의 그래디언트(gradient)를 계산하는 것)을 위한 tf.GradientTape API를 제공합니다. tf.GradientTape는 컨텍스트(context) 안에서 실행된 모든 연산을 테이프(tape)에 기록합니다. 그 다음 텐서플로우는 후진 방식 자동 미분(reverse mode differentiation)을 사용해 테이프에 기록된 연산의 그래디언트를 계산합니다.

예를 들면:

x = tf.ones((2, 2))

with tf.GradientTape() as t:
    t.watch(x)
    y = tf.reduce_sum(x)
    z = tf.multiply(y, y)

print("y : ",y)
print("z : ",z)
    
# 입력 텐서 x에 대한 z의 도함수
dz_dx = t.gradient(z, x)
for i in [0, 1]:
    for j in [0, 1]:
        # assert 뒤의 조건이 True가 아니면 AssertError가 발생합니다.
        assert dz_dx[i][j].numpy() == 8.0
        
print("dz_dx : ",dz_dx)

또한 tf.GradientTape 컨텍스트 안에서 계산된 중간값에 대한 그래디언트도 구할 수 있습니다.

x = tf.ones((2, 2))

with tf.GradientTape() as t:
    t.watch(x)
    y = tf.reduce_sum(x)
    z = tf.multiply(y, y)

print("y : ",y)
print("z : ",z)
    
# 테이프를 사용하여 중간값 y에 대한 도함수를 계산합니다. 
dz_dy = t.gradient(z, y)
assert dz_dy.numpy() == 8.0

print("dz_dy : ",dz_dy)

기본적으로 GradientTape.gradient() 메서드가 호출되면 GradientTape에 포함된 리소스가 해제됩니다. 동일한 연산에 대해 여러 그래디언트를 계산하려면, 지속성있는(persistent) 그래디언트 테이프를 생성하면 됩니다. 이 그래디언트 테이프는 gradient() 메서드의 다중 호출을 허용합니다. 테이프 객체가 garbage collection(동적으로 할당했던 메모리 영역 중에서 필요없게 된 영역을 해제)할때 리소스는 해제됩니다. 예를 들면 다음과 같습니다:

x = tf.constant(3.0)

with tf.GradientTape(persistent=True) as t:
    t.watch(x)
    y = x * x
    z = y * y
    
dz_dx = t.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
dy_dx = t.gradient(y, x)  # 6.0
del t  # 테이프에 대한 참조를 삭제합니다.

print("dz_dx : ",dz_dx)
print("dy_dx : ",dy_dx)

1.1 제어 흐름 기록

연산이 실행되는 순서대로 테이프에 기록되기 때문에, 파이썬 제어 흐름(예를 들어 if while, for문 같은)이 자연스럽게 처리됩니다.

def f(x, y):
    output = 1.0
    for i in range(y):
        if i > 1 and i < 5:
            output = tf.multiply(output, x)
    return output

def grad(x, y):
    with tf.GradientTape() as t:
        t.watch(x)
        out = f(x, y)
    return t.gradient(out, x)

x = tf.convert_to_tensor(2.0)

print(grad(x, 6).numpy())
print(grad(x, 5).numpy())
print(grad(x, 4).numpy())

1.2 고계도 그래디언트

GradientTape 컨텍스트 매니저안에 있는 연산들은 자동미분을 위해 기록됩니다. 만약 이 컨텍스트 안에서 그래디언트를 계산하면 해당 그래디언트 연산 또한 기록됩니다. 그 결과 똑같은 API가 고계도(Higher-order) 그래디언트에서도 잘 작동합니다. 예를 들면:

x = tf.Variable(1.0)  # 1.0으로 초기화된 텐서플로 변수를 생성합니다.

with tf.GradientTape() as t:
    with tf.GradientTape() as t2:
        y = x * x * x
    # 't' 컨텍스트 매니저 안의 그래디언트를 계산합니다.
    # 이것은 또한 그래디언트 연산 자체도 미분가능하다는 것을 의미합니다. 
    dy_dx = t2.gradient(y, x)
d2y_dx2 = t.gradient(dy_dx, x)

print(dy_dx.numpy())
print(d2y_dx2.numpy())

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[튜토리얼3] 자동 미분과 그래디언트 테이프

목차

1. 그래디언트 테이프

1.1 제어 흐름 기록

1.2 고계도 그래디언트

Copyright 2018 The TensorFlow Authors.