[Deep Learning Basics 10] Lab 04_2 - Reading data from files with tensorflow

Nero :) 2017. 5. 31. 15:33

These are my notes from Professor Sung Kim's lecture series "Machine Learning / Deep Learning for Everyone".

Lab 04_2 - Reading data from files with tensorflow

@ Predict Y (final exam) from x1, x2, x3 (attendance, quiz, midterm)

- data set

@ Saved as the file data-01-test-score.csv

73,80,75,152
93,88,93,185
89,91,90,180
96,98,100,196
73,66,70,142
53,46,55,101

@ Working with arrays in the numpy library

- Indexing, Slicing, Iterating

import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

print(a[:, 1])
# [ 2  6 10]

print(a[-1])
# [ 9 10 11 12]

print(a[-1, :])
# [ 9 10 11 12]

print(a[-1, ...])
# [ 9 10 11 12]

print(a[0:2, :])
#[[1 2 3 4]
# [5 6 7 8]]
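
One detail of the slicing above matters for the TensorFlow code that follows: indexing a column with an integer and indexing it with a one-element list give different shapes. The short sketch below reuses the same array `a`; the variable names are only illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

last_col_1d = a[:, -1]     # integer index drops the column axis
last_col_2d = a[:, [-1]]   # list index keeps the result as a column
features = a[:, 0:-1]      # every column except the last

print(last_col_1d.shape)   # (3,)
print(last_col_2d.shape)   # (3, 1)
print(features.shape)      # (3, 3)
```

The `(3, 1)` form is the one that matches a placeholder declared with `shape=[None, 1]`, which is why the label column is selected with `[-1]` rather than `-1`.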

@ TensorFlow

- Read the data from the data-01-test-score.csv file and train on it

- Feed test data into the hypothesis to predict results

import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
import numpy as np

# Read the file with numpy's loadtxt() and build a matrix
xy = np.loadtxt('data-01-test-score.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]  # all rows, every column except the last
y_data = xy[:, [-1]]  # all rows, last column only (kept 2-D)

# Check the shapes
print(x_data.shape, "\n", x_data)
print(y_data.shape, "\n", y_data)
# (6, 3)
# [[  73.   80.   75.]
# [  93.   88.   93.]
# [  89.   91.   90.]
# [  96.   98.  100.]
# [  73.   66.   70.]
# [  53.   46.   55.]]

# (6, 1)
# [[ 152.]
# [ 185.]
# [ 180.]
# [ 196.]
# [ 142.]
# [ 101.]]

# placeholder
X = tf.placeholder(tf.float32, shape=[None, 3])
Y = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.random_normal([3, 1]), name="weight")
b = tf.Variable(tf.random_normal([1]), name="bias")

# Hypothesis
hypothesis = tf.matmul(X, W) + b

# cost/loss function
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# Minimize. Need a very small learning rate for this data set
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

# Launch the graph in a session.
sess = tf.Session()

# Initializes global variables in the graph.
sess.run(tf.global_variables_initializer())
for step in range(4001):
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train], feed_dict={X: x_data, Y: y_data})
    if step % 1000 == 0:
        print(step, "Cost: ", cost_val, "\nPrediction:\n", hy_val)
        # 4000
        # Cost: 1.68514
        # Prediction:
        # [[152.03288269]
        #  [183.73718262]
        #  [180.53677368]
        #  [196.42729187]
        #  [140.5350647]
        #  [103.42864227]]

# Feed new values into the hypothesis to make predictions
print("Your score will be ", sess.run(hypothesis, feed_dict={X: [[100, 70, 101]]}))
print("Other scores will be ", sess.run(hypothesis, feed_dict={X: [[60, 70, 110], [90, 100, 80]]}))

# Your score will be  [[ 167.70051575]]
# Other scores will be  [[ 120.13847351] [ 196.6124115 ]]
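
To make the training loop above less of a black box, here is a minimal numpy sketch of the same hypothesis, cost, and gradient descent update written out by hand. It is not what TensorFlow runs internally, only the same arithmetic under a few assumptions: the six rows are inlined instead of read from the file, and `W` and `b` start at zero (instead of `tf.random_normal`) so the run is repeatable.

```python
import numpy as np

# Same six rows as data-01-test-score.csv, inlined so the sketch runs on its own.
xy = np.array([[73, 80, 75, 152],
               [93, 88, 93, 185],
               [89, 91, 90, 180],
               [96, 98, 100, 196],
               [73, 66, 70, 142],
               [53, 46, 55, 101]], dtype=np.float64)
x_data = xy[:, 0:-1]   # shape (6, 3)
y_data = xy[:, [-1]]   # shape (6, 1)

# Assumption: zero initialization for repeatable results.
W = np.zeros((3, 1))
b = np.zeros(1)
learning_rate = 1e-5

costs = []
for step in range(2001):
    hypothesis = x_data @ W + b                 # same as tf.matmul(X, W) + b
    error = hypothesis - y_data
    costs.append(float(np.mean(error ** 2)))    # mean squared error
    # Gradients of the mean squared error with respect to W and b
    grad_W = 2.0 * x_data.T @ error / len(x_data)
    grad_b = 2.0 * error.mean()
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

print("first cost:", costs[0], "last cost:", costs[-1])
```

Because the inputs are raw scores close to 100 and are not normalized, the gradients are large, which is why a learning rate as small as 1e-5 is used here as well.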

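- Queue Runners : when a file is too large to load into memory at once, the data is split across multiple files and read through a queue

1. List the files to train on

2. Specify the reader that will read the files

3. Decode the records and specify their data types

4. Start reading the data in batches

5. Start training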

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf

# 1. Specify the list of files
filename_queue = tf.train.string_input_producer(['data-01-test-score.csv'], shuffle=False, name='filename_queue')

# 2. Define the reader
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# 3. Decode and specify the data types
record_defaults = [[0.], [0.], [0.], [0.]]  # a default of 0. makes every column a float
xy = tf.decode_csv(value, record_defaults=record_defaults)  # parse one CSV line

# 4. Fetch batches and split them by role into x and y (6 records at a time)
train_x_batch, train_y_batch = tf.train.batch([xy[0:-1], xy[-1:]], batch_size=6)

# Watch the shapes!
X = tf.placeholder(tf.float32, shape=[None, 3])
Y = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.random_normal([3, 1]), name="weight")
b = tf.Variable(tf.random_normal([1]), name="bias")

# Hypothesis
hypothesis = tf.matmul(X, W) + b

# cost/loss function
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# Minimize
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

# Start the session
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# The Coordinator helps multiple threads stop together
coord = tf.train.Coordinator()
# Start the threads that run the queue runners and fill the queues in the graph
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

for step in range(2001):
    # Fetch the data as a batch
    x_batch, y_batch = sess.run([train_x_batch, train_y_batch])
    # 5. Train
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train], feed_dict={X: x_batch, Y: y_batch})
    if step % 1000 == 0:
        print(step, "Cost: ", cost_val, "\nPrediction:\n", hy_val)
        # 2000
        # Cost: 5.89609
        # Prediction:
        # [[155.44619751]
        #  [182.85728455]
        #  [182.59645081]
        #  [195.05700684]
        #  [141.46882629]
        #  [97.68396759]]

# Ask the threads to stop
coord.request_stop()
# Wait for the threads to finish so the program does not exit before they do
coord.join(threads)
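
As a rough picture of what the queue does, the sketch below imitates the same flow in plain Python with the standard library only. It is an analogy and not the TensorFlow API: `produce_records`, `read_batch`, and the in-memory `CSV_TEXT` are hypothetical names introduced for illustration. A background thread parses rows one at a time into a queue, and the consumer pulls fixed-size batches from it.

```python
import csv
import io
import queue
import threading

# Hypothetical in-memory stand-in for data-01-test-score.csv.
CSV_TEXT = """73,80,75,152
93,88,93,185
89,91,90,180
96,98,100,196
73,66,70,142
53,46,55,101
"""

def produce_records(sources, out_queue, epochs):
    """Read every source `epochs` times, pushing one parsed row at a time."""
    for _ in range(epochs):
        for text in sources:
            for row in csv.reader(io.StringIO(text)):
                out_queue.put([float(v) for v in row])
    out_queue.put(None)  # sentinel: no more data

def read_batch(in_queue, batch_size):
    """Collect up to batch_size rows and split them into features and labels."""
    rows = []
    while len(rows) < batch_size:
        row = in_queue.get()
        if row is None:
            in_queue.put(None)  # leave the sentinel for later calls
            break
        rows.append(row)
    x_batch = [r[:-1] for r in rows]   # every column except the last
    y_batch = [r[-1:] for r in rows]   # last column, kept as a list
    return x_batch, y_batch

record_queue = queue.Queue(maxsize=32)
worker = threading.Thread(target=produce_records,
                          args=([CSV_TEXT], record_queue, 2), daemon=True)
worker.start()

first_x, first_y = read_batch(record_queue, batch_size=4)
second_x, second_y = read_batch(record_queue, batch_size=4)
worker.join()  # wait for the producer thread, like coord.join(threads)

print(first_y)    # [[152.0], [185.0], [180.0], [196.0]]
print(second_x)   # last two rows of pass 1, then first two rows of pass 2
```

With a batch size of 4 and six rows per pass, the second batch runs past the end of the data into the next pass. This mirrors how a queue-based reader hands out records continuously rather than one file at a time.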
