
Lecture 3 | Loss Functions and Optimization

아이셩짱셩 2018. 11. 6. 23:20

#overview

-Loss function : takes in a W, looks at the scores, and tells us quantitatively how bad that W is.

-Optimization : finding the W that minimizes the loss function



#Loss function 

support vector machine (SVM)

binary SVM 

multi-class SVM Loss


- Given a dataset of examples $\{(x_i, y_i)\}_{i=1}^{N}$

- $x_i$ : the image (pixels)

- $y_i$ : the (integer) label, i.e. the expected category



- Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

- $s = f(x_i, W)$ are the predicted scores for the classes coming out of the classifier

- $s_{y_i}$ : score of the true class

- $1$ : an arbitrarily chosen margin


-Hinge loss : the name for this max(0, s_j - s_{y_i} + 1) form of the loss




Q1- What happens to loss if car scores change a bit?

-> nothing (the car score is already above the other scores by more than the margin, so the loss stays 0)


Q2- what is the min/max possible loss?

-> 0 / infinity


Q3- At initialization W is small so all s (approximate)=0. What is the loss?

-> C - 1, where C is the number of classes (each of the C - 1 incorrect classes contributes max(0, 0 - 0 + 1) = 1)


Q4- What if the sum was over all classes? (including j=y_i)

-> the loss increases by 1 (the j = y_i term contributes max(0, s_{y_i} - s_{y_i} + 1) = 1)


Q5- What if we used mean instead of sum?  

-> nothing changes in effect; the loss is only rescaled by a constant, so the best W stays the same


Q6- What if we used the squared hinge loss $\sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)^2$ instead?

-> it would be a different loss function (squaring changes how strongly large violations are penalized); see the small comparison below
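A tiny numeric comparison of the two (the margin values are toy numbers picked for illustration):

import numpy as np

margins = np.array([0.5, 3.0])     # two example margin violations
print(np.sum(margins))             # hinge loss:         3.5
print(np.sum(margins ** 2))        # squared hinge loss: 9.25 -> large violations are punished much harder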



ex) 

import numpy as np

def L_i_vectorized(x, y, W):
    # class scores for a single example x (one score per class)
    scores = W.dot(x)
    # margins: max(0, s_j - s_{y_i} + 1) for every class j
    margins = np.maximum(0, scores - scores[y] + 1)
    # the correct class should not contribute to the loss
    margins[y] = 0
    loss_i = np.sum(margins)
    return loss_i
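A quick usage sketch (the shapes below, 3 classes and 4 "pixels", are made up for illustration). With a near-zero W all scores are roughly 0, so the loss comes out close to C - 1 = 2, matching Q3 above:

import numpy as np

np.random.seed(0)
W = np.random.randn(3, 4) * 0.0001   # 3 classes, 4 pixels, tiny initial weights
x = np.random.randn(4)               # one example
y = 1                                # its true class
print(L_i_vectorized(x, y, W))       # ~2.0, i.e. C - 1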



E.g. Suppose that we found a W such that L = 0. Is this W unique?

-> No! 2W also has L = 0!

-> Then which W should we choose?
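A small check of this (the numbers are toy values chosen so the margins are already satisfied; L_i_vectorized is the helper defined above):

import numpy as np

x = np.array([1.0, 2.0])
y = 0
# W chosen so the true class beats every other class by more than the margin of 1
W = np.array([[ 2.0,  2.0],
              [ 0.0,  0.0],
              [-1.0, -1.0]])
print(L_i_vectorized(x, y, W))      # 0.0
print(L_i_vectorized(x, y, 2 * W))  # still 0.0: scaling W up only widens the margins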


-Data loss: Model predictions should match training data

-Regularization: Model should be "simple", so it works on test data

-Occam's Razor: "Among competing hypotheses, the simplest is the best"

- Full loss: $L(W) = \frac{1}{N}\sum_{i=1}^{N} L_i(f(x_i, W), y_i) + \lambda R(W)$, where $\lambda$ (the regularization strength) is a hyper-parameter




#Regularization : penalizing the complexity of the model, rather than explicitly trying to fit the training data

-L2 regularization : penalizing the Euclidean (squared L2) norm of the weight vector (one way to measure the complexity of the model)

-L1 regularization : penalizing the L1 norm of the weight vector; encourages sparsity in the matrix W (a small sketch of both penalties follows this list)

Elastic net (L1+L2):

Max norm regularization

Dropout

Fancier: Batch normalization, stochastic depth
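A minimal sketch of how these penalties enter the full objective (the function names, the batch loop, and the value of lam are illustrative, not from the lecture):

import numpy as np

def l2_reg(W):
    # R(W) = sum of squared weights (squared Euclidean norm)
    return np.sum(W * W)

def l1_reg(W):
    # R(W) = sum of absolute weights; tends to drive many entries of W to 0
    return np.sum(np.abs(W))

def full_loss(X, y, W, lam=0.1):
    # data loss (mean SVM loss over the examples) + regularization loss
    data_loss = np.mean([L_i_vectorized(x_i, y_i, W) for x_i, y_i in zip(X, y)])
    return data_loss + lam * l2_reg(W)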




#Softmax Classifier(Multinomial Logistic Regression)

-endow the scores with some additional meaning

-use the scores to compute a probability distribution over our classes

-exponentiation -> normalization

- $P(Y = k \mid X = x_i) = \dfrac{e^{s_k}}{\sum_j e^{s_j}}$, where $s = f(x_i; W)$ are the scores

- $L_i = -\log P(Y = y_i \mid X = x_i)$


-we want the probability of the true class to be high (close to one), so we minimize the negative log of that probability


Q1- what is the min/max possible loss?

-> 0 / infinity (0 is never actually reached; it is only the theoretical minimum, since it would require the true class to get probability exactly 1)


Q2- At initialization W is small so all s (approximate)=0. What is the loss?

-> -log(1/C) = log C
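A minimal sketch of the softmax loss for one example (the function name softmax_loss_i and the toy shapes are mine, for illustration); with W = 0 every class gets probability 1/C, so the printed loss equals log C, matching Q2 above:

import numpy as np

def softmax_loss_i(x, y, W):
    scores = W.dot(x)
    # shift the scores for numerical stability (does not change the probabilities)
    scores = scores - np.max(scores)
    probs = np.exp(scores) / np.sum(np.exp(scores))
    # cross-entropy loss: negative log probability of the true class
    return -np.log(probs[y])

W = np.zeros((3, 4))          # 3 classes, 4 pixels, all-zero weights
x = np.random.randn(4)
print(softmax_loss_i(x, 0, W), np.log(3))   # both ~1.0986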



- The overall recipe: 1. find a score function f, 2. find a loss L_i that checks how good the predictions are, and 3. find a regularization loss R(W) that keeps the model simple.



#Optimization

-Strategy #1 : Random search


-Strategy #2 : Follow the slope

                 : Numerical gradient : approximate, slow, easy to write

                 : Analytic gradient : exact, fast, error-prone (in practice, derive the analytic gradient and check it against the numerical one)

                 : (Vanilla) Gradient Descent


-Stochastic Gradient Descent (SGD) : approximate the full-sum gradient using a small minibatch of examples (see the sketch below)
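A minimal sketch of the minibatch SGD loop in the spirit of the lecture's pseudocode (loss_fun, evaluate_gradient, X_train, y_train, weights, step_size, and batch_size are placeholders to be supplied by surrounding code, not a real API):

import numpy as np

while True:
    # sample a small minibatch instead of the full training set
    idx = np.random.choice(len(X_train), batch_size, replace=False)
    X_batch, y_batch = X_train[idx], y_train[idx]
    # gradient of the loss on this minibatch with respect to the weights
    weights_grad = evaluate_gradient(loss_fun, X_batch, y_batch, weights)
    # parameter update: small step in the direction of the negative gradient
    weights += -step_size * weights_grad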






