Lecture 3 | Loss Functions and Optimization
#Overview
-Loss function : takes in a W, looks at the scores, and tells us quantitatively how bad that W is.
-Optimization : the procedure for finding the W that minimizes the loss function.
#Loss function
support vector machine(SVM)
binary SVM
multi-class SVM Loss
-Given a training dataset of examples {(x_i, y_i)}_{i=1}^N
-x_i is the image (pixels)
-y_i is the (integer) label : the ground-truth category
-Loss over the dataset is the average of the per-example losses: L = (1/N) * Σ_i L_i(f(x_i, W), y_i)
-s = f(x_i, W) are the predicted scores for the classes that are coming out of the classifier
-s_{y_i} : score of the true class
-1 : an arbitrary margin constant (the exact choice does not matter, since the scale of W is arbitrary)
-Hinge loss
-Multi-class SVM loss: L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)
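For concreteness, a small worked example with made-up numbers (3 classes, true class = class 0, scores s = [3.2, 5.1, -1.7]):
L_i = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = max(0, 2.9) + max(0, -3.9) = 2.9 + 0 = 2.9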
Q1- What happens to the loss if the car scores change a bit?
-> Nothing: the car score already beats the incorrect classes by more than the margin, so the loss stays 0.
Q2- What is the min/max possible loss?
-> 0 / infinity
Q3- At initialization W is small, so all s ≈ 0. What is the loss?
-> C - 1 (each of the C - 1 incorrect classes contributes max(0, 0 - 0 + 1) = 1); this is a useful sanity check at the start of training.
Q4- What if the sum were over all classes (including j = y_i)?
-> The loss increases by 1, since the j = y_i term is max(0, s_{y_i} - s_{y_i} + 1) = 1.
Q5- What if we used the mean instead of the sum?
-> Nothing changes in practice; the loss is just rescaled by a constant.
Q6- What if we used the squared hinge loss, max(0, s_j - s_{y_i} + 1)^2?
-> It would be a different loss function (it penalizes large violations much more heavily).
ex)
import numpy as np

def L_i_vectorized(x, y, W):
    scores = W.dot(x)                                # class scores for one example
    margins = np.maximum(0, scores - scores[y] + 1)  # hinge loss per class
    margins[y] = 0                                   # do not count the true class
    loss_i = np.sum(margins)
    return loss_i
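A quick usage check of the function above; the dimensions, seed, and label y are made up for illustration (it continues from the code above, so numpy is already imported):

np.random.seed(0)
x = np.random.randn(3073)               # one flattened image (with bias trick)
y = 4                                   # hypothetical true class index
W = 0.0001 * np.random.randn(10, 3073)  # tiny random weights, so all scores are ~0
print(L_i_vectorized(x, y, W))          # close to C - 1 = 9, matching Q3 above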
E.g. Suppose we found a W such that L = 0. Is this W unique?
-> No! 2W also has L = 0 (scaling W up only widens margins that are already satisfied)!
-> Then which W should we choose?
-Data loss: Model predictions should match training data
-Regularization: Model should be "simple", so it works on test data
-Occam's Razor: "Among competing hypotheses, the simplest is the best"
-Full loss: L(W) = (1/N) * Σ_i L_i(f(x_i, W), y_i) + λR(W), where λ (the regularization strength) is a hyper-parameter (a minimal sketch of this objective follows below)
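Here is a minimal sketch of that full objective in numpy, reusing the L_i_vectorized function above; the regularization strength lam and the loop over examples are illustrative, not from the lecture:

import numpy as np

def full_loss(X, Y, W, lam=0.1):  # lam is a hypothetical regularization strength
    data_loss = np.mean([L_i_vectorized(x, y, W) for x, y in zip(X, Y)])  # average over N examples
    reg_loss = lam * np.sum(W * W)                                        # L2 regularization R(W)
    return data_loss + reg_loss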
#Regularization : penalizing the complexity of the model, rather than explicitly trying to fit the training data
-L2 regularization : R(W) = Σ_k Σ_l W_{k,l}^2, penalizing the (squared) Euclidean norm of the weight vector; one way to measure the complexity of the model, and it prefers spreading weight across many dimensions
-L1 regularization : R(W) = Σ_k Σ_l |W_{k,l}|, penalizing the L1 norm of the weight vector; encourages sparsity in the matrix W
Elastic net (L1 + L2): R(W) = Σ_k Σ_l (β W_{k,l}^2 + |W_{k,l}|) (see the comparison sketch after this list)
Max norm regularization
Dropout
Fancier: batch normalization, stochastic depth
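A small illustrative comparison of the simple regularizers listed above on a random weight matrix (the shape and the elastic-net beta are made up):

import numpy as np
W = 0.01 * np.random.randn(10, 3073)
R_L2 = np.sum(W * W)                          # L2: sum of squared weights (spreads weight out)
R_L1 = np.sum(np.abs(W))                      # L1: sum of absolute values (encourages sparsity)
beta = 0.5                                    # hypothetical elastic-net coefficient
R_elastic = np.sum(beta * W * W + np.abs(W))  # elastic net: beta * L2 + L1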
#Softmax Classifier (Multinomial Logistic Regression)
-endow the scores with some additional meaning
-use the scores to compute a probability distribution over our classes
-exponentiate the scores, then normalize them so they sum to 1
-P(Y = k | X = x_i) = e^{s_k} / Σ_j e^{s_j}, where s = f(x_i; W)
-L_i = -log P(Y = y_i | X = x_i)
-we want the probability of the true class to be high, ideally close to one (a minimal sketch of this computation follows below)
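A minimal numpy sketch of this per-example softmax loss; the max-subtraction for numerical stability is standard practice rather than something stated in these notes:

import numpy as np

def softmax_loss_i(scores, y):
    shifted = scores - np.max(scores)                  # stabilize the exponentials
    probs = np.exp(shifted) / np.sum(np.exp(shifted))  # exponentiate, then normalize to sum to 1
    return -np.log(probs[y])                           # -log probability of the true class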
Q1- What is the min/max possible loss?
-> 0 (never actually reached, since the true-class probability would have to be exactly 1; it is only the theoretical minimum) / infinity
Q2- At initialization W is small, so all s ≈ 0. What is the loss?
-> -log(1/C) = log C (another useful sanity check)
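A quick numeric check of this answer using the softmax_loss_i sketch above (numpy is already imported there):

print(softmax_loss_i(np.zeros(10), 3))  # ~2.3026 for C = 10 classes
print(np.log(10))                       # same value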
-In summary, the task is: 1. find a score function f that computes the class scores, 2. find a loss L_i that checks how accurate those scores are, 3. find a regularization term R that keeps the model simple.
#Optimization
-Strategy #1 : Random search
-Strategy #2 : Follow the slope
: Numerical gradient : approximate, slow, easy to write
: Analytic gradient : exact, fast, error-prone (in practice, derive the analytic gradient and verify it with a numerical gradient check)
: (Vanilla) Gradient Descent
-Stochastic Gradient Descent (SGD) : approximate the full-sum gradient using a small minibatch of examples at each step
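A minibatch SGD loop in the spirit of the lecture's pseudocode; sample_training_data, evaluate_gradient, loss_fun, weights, and step_size are placeholders, not defined in these notes:

while True:
    data_batch = sample_training_data(data, 256)                     # sample 256 training examples
    weights_grad = evaluate_gradient(loss_fun, data_batch, weights)  # analytic gradient on the batch
    weights += -step_size * weights_grad                             # gradient descent parameter update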