[Deep Learning Basics 21] Lab 07_2 - meet mnist dataset


Nero :) 2017. 6. 8. 00:21

These are my notes from Professor Sung Kim's lecture series "Machine Learning / Deep Learning for Everyone".

Lab 07_2 - meet mnist dataset

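@ Prerequisites

1. numpy : a Python library for high-performance numerical computing

- numpy tutorial

- ndarray : the N-dimensional array object, often praised as NumPy's most powerful feature

2. matplotlib : a Python library for visualizing data as 2D graphics

- matplotlib tutorial

- Basics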

1. Import the library

import matplotlib.pyplot as plt

2. Convert an image to a numpy array

# image to array
img = plt.imread('example.png')
print(type(img))
# <class 'numpy.ndarray'>

3. Convert a numpy array to an image and display it (plt.imshow() takes array-like data such as a numpy array; plt.show() then renders the figure.)

# array to image
image = [[1, 2, 3, 4, 5],
         [5, 4, 3, 2, 1]]
plt.imshow(image, cmap='gray')  # render as a grayscale image
plt.show()

- Result of plt.show()
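As a rough illustration of what happens to the nested list in the example above, the sketch below converts it to an ndarray and rescales it the way a grayscale colormap does by default when no vmin/vmax is given (smallest value to one end, largest to the other). It uses numpy only; the names arr and normalized are made up for this sketch and are not part of matplotlib.

```python
import numpy as np

# Same nested list as in the imshow() example above.
image = [[1, 2, 3, 4, 5],
         [5, 4, 3, 2, 1]]

# A nested list of equal-length rows becomes a 2-D ndarray.
arr = np.array(image)
print(type(arr))   # <class 'numpy.ndarray'>
print(arr.shape)   # (2, 5) -> 2 rows, 5 columns

# Min-max scaling: the smallest value becomes 0.0 and the largest 1.0,
# which cmap='gray' draws as black and white respectively.
normalized = (arr - arr.min()) / (arr.max() - arr.min())
print(normalized[0])
```

The first row runs from black to white and the second row from white to black, which matches the two-row gradient the plot shows.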

@ MNIST dataset

- MNIST data download

- Images of handwritten digits from 0 to 9

- 28 x 28 (784) pixels

- File structure

[Training data] - 60,000 examples (train : validation = 55,000 : 5,000)

train-images-idx3-ubyte.gz

train-labels-idx1-ubyte.gz

[Test data] - 10,000 examples

t10k-images-idx3-ubyte.gz

t10k-labels-idx1-ubyte.gz
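The "idx3" and "idx1" in the file names refer to the IDX format: a 4-byte magic number (two zero bytes, a data type code, and the number of dimensions) followed by one 4-byte big-endian size per dimension. The sketch below parses such a header. The helper name parse_idx_header is made up for illustration, and the header bytes are synthesized here rather than read from the real files.

```python
import struct

def parse_idx_header(header: bytes):
    """Parse an IDX header and return (dtype_code, dims)."""
    # Bytes 0-1 are always zero, byte 2 is the data type code
    # (0x08 = unsigned byte), byte 3 is the number of dimensions.
    zero, dtype_code, ndim = struct.unpack(">HBB", header[:4])
    if zero != 0:
        raise ValueError("not an IDX header")
    # Each dimension size is a 4-byte big-endian unsigned integer.
    dims = struct.unpack(">" + "I" * ndim, header[4:4 + 4 * ndim])
    return dtype_code, dims

# Synthetic header with the values expected for the training image file:
# 3 dimensions = (number of images, rows, columns).
fake_header = struct.pack(">IIII", 0x00000803, 60000, 28, 28)
dtype_code, dims = parse_idx_header(fake_header)
print(dtype_code, dims)    # 8 (60000, 28, 28)
print(dims[1] * dims[2])   # 784 pixels per image
```

For the real files, the same function could presumably be fed the first 16 bytes read through gzip.open(path, 'rb'); the label files have a single dimension, so their header is only 8 bytes long.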

- Import the library

from tensorflow.examples.tutorials.mnist import input_data

- Data loading example : set one_hot to True for actual training

Left image : first training example = 7

Right image : first test example = 7 (it looks like a 3, but it is labeled 7.)

# If the data is not present it is downloaded automatically, which takes a little time.
# one_hot option - True : [0, 1, 0, 0, 0, 0, 0, 0, 0, 0] // False : 1 (example for the digit 1)
mnist = input_data.read_data_sets("data/MNIST_data/", one_hot=False)

# Fetch training data : x is the image as a numpy array, y is the label
x, y = mnist.train.next_batch(1, shuffle=False)
print("Training: ", y)
# Training: [7]

# Fetch test data : x is the image as a numpy array, y is the label
x, y = mnist.test.next_batch(1, shuffle=False)
print("Test: ", y)
# Test: [7]
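To make the one_hot option concrete, here is a minimal pure-Python sketch of the encoding itself. The helpers one_hot and from_one_hot are hypothetical names used only for this illustration; they are not part of the TensorFlow input_data module.

```python
def one_hot(label, nb_classes=10):
    """Return a list of nb_classes zeros with a single 1 at index `label`."""
    vec = [0] * nb_classes
    vec[label] = 1
    return vec

def from_one_hot(vec):
    """Recover the integer label: the index of the largest entry."""
    return vec.index(max(vec))

encoded = one_hot(7)
print(encoded)                # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(from_one_hot(encoded))  # 7
```

With one_hot=False the label for the first example is the plain integer 7; with one_hot=True it would be the 10-element vector printed above.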

@ TensorFlow

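- MNIST training example : the dataset is large, so training is done in batches

batch : a large dataset cannot be loaded into memory all at once, so it is processed in pieces; the batch size is the number of examples fetched per step

epoch : one full pass over the entire training data; training_epochs below is the number of such passes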
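The relationship between batch size, epochs, and the number of training steps can be sketched in a few lines. The numbers mirror this lab (55,000 training examples, batch size 100, 15 epochs); the generator iterate_batches is a hypothetical helper and does not reproduce how mnist.train.next_batch works internally.

```python
def iterate_batches(data, batch_size):
    """Yield consecutive slices of `data` with at most `batch_size` items."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

num_examples = 55000
batch_size = 100
training_epochs = 15

# Steps needed to see every example once, and across all epochs.
total_batch = num_examples // batch_size
total_steps = total_batch * training_epochs
print(total_batch, total_steps)  # 550 8250

# Toy data: when the size is not a multiple of batch_size,
# the last batch is simply smaller.
toy = list(range(10))
print([len(b) for b in iterate_batches(toy, 4)])  # [4, 4, 2]
```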

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

import matplotlib.pyplot as plt
import random

# If the data is not present it is downloaded automatically (this takes some time)
# With one_hot=True the label data is returned in one-hot form
mnist = input_data.read_data_sets("data/MNIST_data/", one_hot=True)

# Digit classes 0 ~ 9
nb_classes = 10

# MNIST data image of shape 28 * 28 = 784
X = tf.placeholder(tf.float32, [None, 784])
# 0 - 9 digits recognition = 10 classes
Y = tf.placeholder(tf.float32, [None, nb_classes])

W = tf.Variable(tf.random_normal([784, nb_classes]))
b = tf.Variable(tf.random_normal([nb_classes]))

# Hypothesis (using softmax)
hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)

# cross_entropy
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

# Test model
is_correct = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# parameters
# epoch : number of passes over the entire training data
# batch : number of examples loaded into memory at once
training_epochs = 15
batch_size = 100

with tf.Session() as sess:
    # Initialize TensorFlow variables
    sess.run(tf.global_variables_initializer())
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0
        # Dividing the total number of examples by batch_size gives the number of steps in one epoch.
        total_batch = int(mnist.train.num_examples / batch_size)

        for i in range(total_batch):
            # Train on a batch of training data
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            c, _ = sess.run([cost, optimizer], feed_dict={X: batch_xs, Y: batch_ys})
            avg_cost += c / total_batch

        print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))
        # Epoch: 0015 cost = 0.450051648

    # Measure accuracy on the test data
    # accuracy.eval(session=sess, ...) does the same thing as sess.run(accuracy, ...).
    print("Accuracy: ", accuracy.eval(session=sess, feed_dict={X: mnist.test.images, Y: mnist.test.labels}))
    # Accuracy: 0.8886

    # Pick one test example at random and check the prediction
    r = random.randint(0, mnist.test.num_examples - 1)
    print("Label:", sess.run(tf.argmax(mnist.test.labels[r:r+1], 1)))
    print("Prediction:", sess.run(tf.argmax(hypothesis, 1), feed_dict={X: mnist.test.images[r:r+1]}))
    # Label: [3]
    # Prediction: [3]

    # Display the image
    plt.imshow(mnist.test.images[r:r+1].reshape(28, 28), cmap='Greys', interpolation='nearest')
    plt.show()

- An image picked at random from the test data

- A basic softmax classifier, without a neural network (no hidden layers), reaches about 88% test accuracy
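To show what the hypothesis, cost, and accuracy nodes in the graph above compute, here is a small pure-Python sketch on three made-up examples with three classes. The logits and labels are invented for illustration. Unlike the TensorFlow code, this softmax subtracts the maximum logit for numerical stability, and the cross-entropy skips zero-label terms instead of multiplying by them.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_one_hot, probs):
    """-sum(y * log(p)) for one example; only the true class contributes."""
    return -sum(y * math.log(p) for y, p in zip(y_one_hot, probs) if y > 0)

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Made-up scores (X*W + b) and one-hot labels for three examples.
logits_batch = [[2.0, 1.0, 0.1], [0.1, 0.2, 3.0], [1.5, 0.3, 0.2]]
labels_batch = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]

probs_batch = [softmax(logits) for logits in logits_batch]

# cost: mean cross-entropy over the batch
cost = sum(cross_entropy(y, p)
           for y, p in zip(labels_batch, probs_batch)) / len(labels_batch)

# accuracy: fraction of examples whose predicted class matches the label
is_correct = [argmax(p) == argmax(y)
              for p, y in zip(probs_batch, labels_batch)]
accuracy = sum(is_correct) / len(is_correct)
print(is_correct, accuracy)
```

The first two examples are classified correctly and the third is not, so the accuracy here is 2/3. The 0.8886 reported above is the same ratio computed over the 10,000 test images.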

