
Lecture 3 | Loss Functions and Optimization

아이셩짱셩 2018. 11. 6. 23:20

#overview

-Loss function : takes in a W, looks at the scores, and tells us quantitatively how bad that W is.

-Optimization : finding the W that minimizes the loss function



#Loss function 

support vector machine (SVM)

binary SVM 

multi-class SVM Loss


- Given a dataset of examples $\{(x_i, y_i)\}_{i=1}^{N}$

- $x_i$ : the image (pixels)

- $y_i$ : the (integer) label, i.e. the expected category



- Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

- $s = f(x_i, W)$ are the predicted scores for the classes coming out of the classifier

- $s_{y_i}$ : score of the true class

- $1$ : an arbitrarily chosen margin


-Hinge loss : the name for this max(0, s_j - s_{y_i} + 1) form of the loss




Q1- What happens to loss if car scores change a bit?

-> nothing (the car score is already above the other scores by more than the margin, so the loss stays 0)


Q2- what is the min/max possible loss?

-> 0 / infinity


Q3- At initialization W is small so all s (approximate)=0. What is the loss?

-> C - 1, where C is the number of classes (each of the C - 1 incorrect classes contributes max(0, 0 - 0 + 1) = 1)


Q4- What if the sum was over all classes? (including j=y_i)

-> the loss increases by 1 (the j = y_i term contributes max(0, s_{y_i} - s_{y_i} + 1) = 1)


Q5- What if we used mean instead of sum?  

-> nothing changes in effect; the loss is only rescaled by a constant, so the best W stays the same


Q6- What if we used the squared hinge loss $\sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)^2$ instead?

-> it would be a different loss function (squaring changes how strongly large violations are penalized); see the small comparison below
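A tiny numeric comparison of the two (the margin values are toy numbers picked for illustration):

import numpy as np

margins = np.array([0.5, 3.0])     # two example margin violations
print(np.sum(margins))             # hinge loss:         3.5
print(np.sum(margins ** 2))        # squared hinge loss: 9.25 -> large violations are punished much harder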



ex) 

import numpy as np

def L_i_vectorized(x, y, W):
    # class scores for a single example x (one score per class)
    scores = W.dot(x)
    # margins: max(0, s_j - s_{y_i} + 1) for every class j
    margins = np.maximum(0, scores - scores[y] + 1)
    # the correct class should not contribute to the loss
    margins[y] = 0
    loss_i = np.sum(margins)
    return loss_i
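A quick usage sketch (the shapes below, 3 classes and 4 "pixels", are made up for illustration). With a near-zero W all scores are roughly 0, so the loss comes out close to C - 1 = 2, matching Q3 above:

import numpy as np

np.random.seed(0)
W = np.random.randn(3, 4) * 0.0001   # 3 classes, 4 pixels, tiny initial weights
x = np.random.randn(4)               # one example
y = 1                                # its true class
print(L_i_vectorized(x, y, W))       # ~2.0, i.e. C - 1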



E.g. Suppose that we found a W such that L = 0. Is this W unique?

-> No! 2W also has L = 0!

-> Then which W should we choose?
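A small check of this (the numbers are toy values chosen so the margins are already satisfied; L_i_vectorized is the helper defined above):

import numpy as np

x = np.array([1.0, 2.0])
y = 0
# W chosen so the true class beats every other class by more than the margin of 1
W = np.array([[ 2.0,  2.0],
              [ 0.0,  0.0],
              [-1.0, -1.0]])
print(L_i_vectorized(x, y, W))      # 0.0
print(L_i_vectorized(x, y, 2 * W))  # still 0.0: scaling W up only widens the margins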


-Data loss: Model predictions should match training data

-Regularization: Model should be "simple", so it works on test data

-Occam's Razor: "Among competing hypotheses, the simplest is the best"

- Full loss: $L(W) = \frac{1}{N}\sum_{i=1}^{N} L_i(f(x_i, W), y_i) + \lambda R(W)$, where $\lambda$ (the regularization strength) is a hyper-parameter




#Regularization : penalizing the complexity of the model, rather than explicitly trying to fit the training data

-L2 regularization : penalizing the Euclidean (squared L2) norm of the weight vector (one way to measure the complexity of the model)

-L1 regularization : penalizing the L1 norm of the weight vector; encourages sparsity in the matrix W (a small sketch of both penalties follows this list)

Elastic net (L1+L2):

Max norm regularization

Dropout

Fancier: Batch normalization, stochastic depth
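A minimal sketch of how these penalties enter the full objective (the function names, the batch loop, and the value of lam are illustrative, not from the lecture):

import numpy as np

def l2_reg(W):
    # R(W) = sum of squared weights (squared Euclidean norm)
    return np.sum(W * W)

def l1_reg(W):
    # R(W) = sum of absolute weights; tends to drive many entries of W to 0
    return np.sum(np.abs(W))

def full_loss(X, y, W, lam=0.1):
    # data loss (mean SVM loss over the examples) + regularization loss
    data_loss = np.mean([L_i_vectorized(x_i, y_i, W) for x_i, y_i in zip(X, y)])
    return data_loss + lam * l2_reg(W)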




#Softmax Classifier(Multinomial Logistic Regression)

-endow the scores with some additional meaning

-use the scores to compute a probability distribution over our classes

-exponentiation -> normalization

- $P(Y = k \mid X = x_i) = \dfrac{e^{s_k}}{\sum_j e^{s_j}}$, where $s = f(x_i; W)$ are the scores

- $L_i = -\log P(Y = y_i \mid X = x_i)$


-we want the probability of the true class to be high (close to one), so we minimize the negative log of that probability


Q1- what is the min/max possible loss?

-> 0 / infinity (0 is never actually reached; it is only the theoretical minimum, since it would require the true class to get probability exactly 1)


Q2- At initialization W is small so all s (approximate)=0. What is the loss?

-> -log(1/C) = log C
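A minimal sketch of the softmax loss for one example (the function name softmax_loss_i and the toy shapes are mine, for illustration); with W = 0 every class gets probability 1/C, so the printed loss equals log C, matching Q2 above:

import numpy as np

def softmax_loss_i(x, y, W):
    scores = W.dot(x)
    # shift the scores for numerical stability (does not change the probabilities)
    scores = scores - np.max(scores)
    probs = np.exp(scores) / np.sum(np.exp(scores))
    # cross-entropy loss: negative log probability of the true class
    return -np.log(probs[y])

W = np.zeros((3, 4))          # 3 classes, 4 pixels, all-zero weights
x = np.random.randn(4)
print(softmax_loss_i(x, 0, W), np.log(3))   # both ~1.0986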



- The overall recipe: 1. find a score function f, 2. find a loss L_i that checks how good the predictions are, and 3. find a regularization loss R(W) that keeps the model simple.



#Optimization

-Strategy #1 : Random search


-Strategy #2 : Follow the slope

                 : Numerical gradient : approximate, slow, easy to write

                 : Analytic gradient : exact, fast, error-prone (in practice, derive the analytic gradient and check it against the numerical one)

                 : (Vanilla) Gradient Descent


-Stochastic Gradient Descent (SGD) : approximate the full-sum gradient using a small minibatch of examples (see the sketch below)
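A minimal sketch of the minibatch SGD loop in the spirit of the lecture's pseudocode (loss_fun, evaluate_gradient, X_train, y_train, weights, step_size, and batch_size are placeholders to be supplied by surrounding code, not a real API):

import numpy as np

while True:
    # sample a small minibatch instead of the full training set
    idx = np.random.choice(len(X_train), batch_size, replace=False)
    X_batch, y_batch = X_train[idx], y_train[idx]
    # gradient of the loss on this minibatch with respect to the weights
    weights_grad = evaluate_gradient(loss_fun, X_batch, y_batch, weights)
    # parameter update: small step in the direction of the negative gradient
    weights += -step_size * weights_grad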






