[튜토리얼5] 분산 학습을 활용한 모델 저장과 복원

학습 중 모델을 저장하고 불러오는 것은 일반적입니다. 케라스(Keras)모델 저장 및 로드를 위해서는 두 가지 API 세트 (고수준 API 및 저수준 API)가 있습니다. 이번 튜토리얼에서는 tf.distribute.Strategy를 쓸 때 SavedModel API를 사용하는 방법을 설명합니다.

먼저 간단한 예제로 시작해보겠습니다.

from __future__ import absolute_import, division, print_function, unicode_literals

import warnings
warnings.simplefilter('ignore')

import tensorflow_datasets as tfds
import tensorflow as tf
tfds.disable_progress_bar()

mirrored_strategy = tf.distribute.MirroredStrategy()

def get_data():
    datasets, ds_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
    mnist_train, mnist_test = datasets['train'], datasets['test']

    BUFFER_SIZE = 10000

    BATCH_SIZE_PER_REPLICA = 64
    BATCH_SIZE = BATCH_SIZE_PER_REPLICA * mirrored_strategy.num_replicas_in_sync

    def scale(image, label):
        image = tf.cast(image, tf.float32)
        image /= 255

        return image, label

    train_dataset = mnist_train.map(scale).cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
    eval_dataset = mnist_test.map(scale).batch(BATCH_SIZE)

    return train_dataset, eval_dataset

def get_model():
    with mirrored_strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(10, activation='softmax')
        ])

        model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])
    return model

모델을 학습시킵니다.

model = get_model()
train_dataset, eval_dataset = get_data()
model.fit(train_dataset, epochs=2)

2. 모델 저장하고 불러오기

이제 간단한 모델과 함께 작업할 수 있게 되었습니다. 다음은 API를 저장하고 불러오는 방법에 대해 살펴보겠습니다. 사용할 수 있는 API는 두 가지가 있습니다.

고수준(high level) 케라스의 model.save와 tf.keras.models.load_model
저수준(low level)의 tf.saved_model.save와 tf.saved_model.load

2.1 케라스 API

다음은 Keras API를 사용하여 모델을 저장하고 불러오는 예입니다:

keras_model_path = "/tmp/keras_save"
model.save(keras_model_path)  # save()는 전략 범위 밖에서 호출되어야합니다.

tf.distribute.Strategy를 사용하지 않고 모델을 복원합니다.

restored_keras_model = tf.keras.models.load_model(keras_model_path)
restored_keras_model.fit(train_dataset, epochs=2)

모델을 복원한 후에는 저장하기 전에 이미 컴파일 되었기 때문에 compile()을 호출하지 않아도 계속 학습시킬 수 있습니다. 모델은 텐서플로우의 표준 SavedModel 프로토(proto) 형식으로 저장됩니다.

tf.distribute.strategy의 범위 밖에서만 model.save() 메서드를 호출해야 합니다. 범위 내에서 호출하는 것은 지원되지 않습니다.

이제 tf.distribute.Strategy를 사용해서 모델을 불러오고 학습시켜봅시다:

another_strategy = tf.distribute.OneDeviceStrategy("/cpu:0")
with another_strategy.scope():
    restored_keras_model_ds = tf.keras.models.load_model(keras_model_path)
    restored_keras_model_ds.fit(train_dataset, epochs=2)

보시다시피 tf.distribute.Strategy를 사용하여 로드합니다. 여기서 사용하는 전략은 저장하기 전에 사용한 전략과 같을 필요는 없습니다.

2.2 `tf.saved_model` API

이제 저수준의 API를 살펴보겠습니다. 모델을 저장하는 것은 케라스 API와 유사합니다.

model = get_model()  # get a fresh model
saved_model_path = "/tmp/tf_save"
tf.saved_model.save(model, saved_model_path)

tf.saved_model.load()를 사용하여 데이터를 가져올 수 있습니다. 그러나 저수준의 API사용 사례가 더 넓기 때문에 케라스 모델을 반환하지 않습니다. 대신 추론(inference)하는 데 사용할 수 있는 함수가 포함된 객체를 반환합니다. 다음의 예를 보겠습니다.

DEFAULT_FUNCTION_KEY = "serving_default"
loaded = tf.saved_model.load(saved_model_path)
inference_func = loaded.signatures[DEFAULT_FUNCTION_KEY]

불러온 객체에는 각각 키와 연결된 여러 함수가 포함되어있을 수 있습니다. "serving_default"는 저장된 케라스 모델의 추론(inference) 함수의 기본 키입니다. 추론(inference)하려면 아래의 함수를 사용합니다.

predict_dataset = eval_dataset.map(lambda image, label: image)
for batch in predict_dataset.take(1):
    print(inference_func(batch))

분산하여 불러오고 추론 할 수 있습니다.

another_strategy = tf.distribute.MirroredStrategy()
with another_strategy.scope():
    loaded = tf.saved_model.load(saved_model_path)
    inference_func = loaded.signatures[DEFAULT_FUNCTION_KEY]

    dist_predict_dataset = another_strategy.experimental_distribute_dataset(
        predict_dataset)

    # 분산된 방식으로 함수를 호출합니다.
    for batch in dist_predict_dataset:
        another_strategy.experimental_run_v2(inference_func, 
                                         args=(batch,))

복원된 함수를 호출한다는 것은 그저 저장된 모델에 전달해준다는 것을 의미합니다. 불러온 함수를 계속 학습하려면 어떻게 해야 할까요? 불러온 함수를 더 큰 모델에 넣어야 할까요?

일반적으로는 이를 수행하기 위해 불러온 객체를 케라스 레이어로 감싸서 사용합니다. 다행히도 TF Hub에는 이를 위한 hub.KerasLayer가 있습니다. 아래의 예시를 보겠습니다.

import tensorflow_hub as hub

def build_model(loaded):
    x = tf.keras.layers.Input(shape=(28, 28, 1), name='input_x')
    # 불러온 것을 케라스 레이어로 감쌉니다.
    keras_layer = hub.KerasLayer(loaded, trainable=True)(x)
    model = tf.keras.Model(x, keras_layer)
    return model

another_strategy = tf.distribute.MirroredStrategy()
with another_strategy.scope():
    loaded = tf.saved_model.load(saved_model_path)
    model = build_model(loaded)

    model.compile(loss='sparse_categorical_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])
    model.fit(train_dataset, epochs=2)

보다시피 hub.KerasLayer는 tf.saved_model.load()에서 불러온 결과를 다른 모델을 만드는 데 사용할 수 있는 케라스 레이어로 변환할 수 있도록 해줍니다. 이는 전이 학습에 매우 유용합니다.

2.3 어떤 API를 사용해야할까?

케라스 모델로 저장하는 경우, 거의 모든 경우에서 Keras의 model.save() API를 사용하는 것이 좋습니다. 하지만 Keras 모델을 저장하는 것이 아니라면 저수준의 API만 사용할 수 있습니다.

모델을 불러오려는 경우, 모델을 불러오는 API가 어떤 것을 얻고자 하는지에 따라 어떤 API를 사용할 지 정해집니다.

케라스 모델을 가져올 수 없거나 원하지 않을 경우 tf.saved_model.load()를 사용합니다.
그렇지 않으면 tf.keras.models.load_model()을 사용합니다. 케라스 모델을 저장한 경우에만 케라스 모델을 다시 가져올 수 있습니다.

API를 혼합하여 사용하는 것도 가능합니다. model.save로 케라스 모델을 저장하고, 저수준의 API인 tf.saved_model.load로 케라스가 아닌 모델을 불러올 수 있습니다.

model = get_model()

# 케라스의 save() API를 이용해서 모델을 저장합니다.
model.save(keras_model_path) 

another_strategy = tf.distribute.MirroredStrategy()
# 저수준 API를 사용해서 모델을 불러옵니다.
with another_strategy.scope():
    loaded = tf.saved_model.load(keras_model_path)

2.4 경고사항

입력이 잘 정의되지 않은 케라스 모델이 있는 경우는 특별합니다. 예를 들어, 시퀀셜(sequetial) 모델은 입력 형태(shape) 없이 생성할 수 있습니다(“Sequential([Dense(3, …])”). 하위 클래스인 모델도 초기화한 후에는 입력이 제대로 정의되지 않습니다. 이러한 경우 저장하고 불러오는 것 모두 하위 레벨 API를 사용해야 합니다. 그렇지 않으면 오류가 발생합니다.

모델에 입력이 잘 정의되어 있는지 확인하려면 model.inputs가 None인지 확인하기만 하면 됩니다. None이 아니면 괜찮습니다. 입력 형태는 모델을 .fit, .evaluate, .predict을 통해 사용하거나 모델을 호출할 때(model(inputs)) 자동으로 정의됩니다.

아래 예시를 보겠습니다.

class SubclassedModel(tf.keras.Model):

    output_name = 'output_layer'

    def __init__(self):
        super(SubclassedModel, self).__init__()
        self._dense_layer = tf.keras.layers.Dense(
            5, dtype=tf.dtypes.float32, name=self.output_name)

    def call(self, inputs):
        return self._dense_layer(inputs)

my_model = SubclassedModel()
# my_model.save(keras_model_path)  # ERROR! 
tf.saved_model.save(my_model, saved_model_path)

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[튜토리얼5] 분산 학습을 활용한 모델 저장과 복원

목차

1. 데이터와 모델 준비

2. 모델 저장하고 불러오기

2.1 케라스 API

2.2 `tf.saved_model` API

2.3 어떤 API를 사용해야할까?

2.4 경고사항

Copyright 2019 The TensorFlow Authors.

[튜토리얼5] 분산 학습을 활용한 모델 저장과 복원

목차

1. 데이터와 모델 준비

2. 모델 저장하고 불러오기

2.1 케라스 API

2.2 tf.saved_model API

2.3 어떤 API를 사용해야할까?

2.4 경고사항

Copyright 2019 The TensorFlow Authors.

2.2 `tf.saved_model` API