Lecture 1 | Introduction to Convolutional Neural Networks for Visual Recognition

아이셩짱셩 2018. 9. 30. 20:14

※introduction


#computer vision - study of visual data

-sensors (e.g. smartphone cameras) -> explosion of visual data



#statistics (2015 Cisco study)

by 2017 -> about 80% of internet traffic will be video

-from a pure number-of-bits perspective, most of the data on the internet is visual data


#problem 

visual data is the "dark matter" of the internet (dark matter makes up an astonishingly large fraction of the universe, yet is hard to observe directly)

-it is difficult for algorithms to go in, see, and understand what all this visual data actually contains.



#statistics (youtube)

every second -> about 5 hours of video uploaded

-to catalog the videos

-to serve relevant videos

-to monetize them by putting ads on those videos

understanding the content of visual data is important



#relationship with other fields

-Physics

-Biology

-Psychology

-Computer Science

-Mathematics

-Engineering


-------------------------------------------------------------------------------------------------------------


※history of computer vision


#biological vision

evolution's Big Bang - the evolution of vision

vision is used to survive, work, move around, manipulate things, communicate, entertain



#mechanical vision (camera obscura)

(pinhole camera theory) a hole that collects light - a plate onto which the image is projected - similar to the eyes that early animals developed



#biological study of the mechanism of vision

Hubel & Wiesel, 1959 - electrophysiology

-what the visual processing mechanism is like in mammals

-cat brain - primary visual cortex

-simple cells respond to simple structures - oriented edges



#history of computer vision

*Block world - Larry Roberts, 1963


*The summer vision project - MIT, 1966 - visual system


*VISION - David Marr (MIT), 1970s

  - primal sketch -> 2.5-D sketch -> 3-D representation


*Generalized Cylinder, Stanford 1979 / Pictorial Structure, SRI 1973

   -every object is composed of simple geometric primitives

   -reduce the complex structure of the object into a collection of simpler shapes and their geometric configuration.


*David Lowe, 1987

  -visual world (e.g. recognizing razors) -> lines and edges, mostly straight lines, and their combinations.


***reducing the visual world to simple structures

***object recognition


--------------------------------------------


#problems on the way to solving vision

***if object recognition is too hard, maybe we should first do object segmentation

-the task of taking an image and grouping its pixels into meaningful areas

-image segmentation - separating the pixels that belong to a certain object from its background

-approached with a graph theory algorithm
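
A rough sketch of the "group the pixels into meaningful areas" idea, treating the image as a graph whose nodes are pixels and whose edges join similar neighbors. This is NOT the graph algorithm from the lecture; it is a much simpler stand-in (connected components by breadth-first search), and the function name, the toy image, and the threshold are all made up for illustration.

```python
from collections import deque


def segment_by_similarity(image, threshold):
    """Label 4-connected regions whose neighboring pixels differ by <= threshold.

    image: 2D list of grayscale intensities (hypothetical toy input).
    Returns a 2D list of integer region labels with the same shape.
    """
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]  # -1 means "not visited yet"
    next_label = 0
    for start_r in range(height):
        for start_c in range(width):
            if labels[start_r][start_c] != -1:
                continue
            # Grow a new region from this unvisited pixel with BFS.
            labels[start_r][start_c] = next_label
            queue = deque([(start_r, start_c)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and labels[nr][nc] == -1
                            and abs(image[nr][nc] - image[r][c]) <= threshold):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels


# Toy image: a dark region on the left and a bright region on the right.
image = [
    [10, 12, 200],
    [11, 13, 205],
    [10, 198, 202],
]
labels = segment_by_similarity(image, threshold=10)
for row in labels:
    print(row)
```

Each region label plays the role of one "meaningful area"; real segmentation methods use far richer similarity measures than a single intensity threshold.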


***face detection

-1999~2000: statistical machine learning techniques

-support vector machines, boosting, graphical models, the first wave of neural networks

-real-time face detection using the AdaBoost algorithm, Paul Viola & Michael Jones, 2001

-Fujifilm camera with face detection, 2006



#feature based object recognition  

*SIFT feature, David Lowe, 1999

-some features tend to remain diagnostic and invariant to changes (e.g. camera angle, lighting)

-the task begins with identifying these critical features on the object

-and then matching the features to a similar object


*Spatial Pyramid Matching

-features in an image can give us clues about which type of scene it is, whether it's a landscape, a kitchen, or a highway

1. take features from different parts of the image at different resolutions

2. put them together in a "feature descriptor"

3. run a support vector machine algorithm on top of that descriptor
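
Steps 1 and 2 can be sketched roughly as below. Assumptions: the image has already been turned into a grid of "visual word" indices (one quantized local feature per location), the function name and toy map are hypothetical, and the histograms are simply concatenated without the per-level weighting a full spatial pyramid method would use. Step 3 (the support vector machine) is left out.

```python
def spatial_pyramid_descriptor(word_map, num_words, levels=2):
    """Concatenate per-cell visual-word histograms over a pyramid of grids.

    word_map:  2D list of ints in [0, num_words), one quantized feature label
               per location (assumed to be computed elsewhere).
    levels:    pyramid depth; level l splits the map into 2**l x 2**l cells.
    """
    height, width = len(word_map), len(word_map[0])
    descriptor = []
    for level in range(levels + 1):
        cells = 2 ** level
        for cell_row in range(cells):
            for cell_col in range(cells):
                # Integer bounds of this cell; together the cells tile the map.
                r0 = cell_row * height // cells
                r1 = (cell_row + 1) * height // cells
                c0 = cell_col * width // cells
                c1 = (cell_col + 1) * width // cells
                hist = [0] * num_words
                for r in range(r0, r1):
                    for c in range(c0, c1):
                        hist[word_map[r][c]] += 1
                descriptor.extend(hist)
    return descriptor


# Toy 4x4 map with 2 visual words: left half is word 0, right half is word 1.
toy_map = [[0, 0, 1, 1] for _ in range(4)]
desc = spatial_pyramid_descriptor(toy_map, num_words=2, levels=1)
print(desc)  # one coarse histogram followed by four finer ones
```

The coarse level only says how much of each word is present; the finer levels add where it appears, which is what lets the descriptor tell scene layouts apart.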


*Histogram of Oriented Gradients (HOG)


*deformable part models



***what was changing - better data to study computer vision (internet & digital cameras)


--------------------------------------------


#benchmark data sets and overfitting

*PASCAL Visual Object Challenge (20 object categories), Everingham et al., 2006-2012

-benchmark data set - to measure progress in object recognition


*ImageNet

-to recognize objects

-to overcome the machine learning bottleneck of overfitting

-ImageNet Large Scale Visual Recognition Challenge



#CNN (convolutional neural network) - deep learning****

-also called convnets



#sister fields

*natural language processing

*speech recognition


-------------------------------------------------------------------------------------------------------------


※course overview

#tasks

*image classification

1. the algorithm looks at an image

2. it picks one label from a fixed set of categories to classify that image
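
The two steps above, as a minimal sketch. The classifier here is a toy nearest-prototype rule with L1 distance, chosen only to make "pick from a fixed set of categories" concrete; the category names, prototypes, and function names are all hypothetical, and a CNN would replace the distance computation with learned scores.

```python
def l1_distance(a, b):
    """Sum of absolute pixel differences between two flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(image, prototypes):
    """Return the category whose prototype is closest to the image.

    image:      2D list of grayscale pixel values.
    prototypes: dict mapping each category label to a flat list of pixels;
                its keys are the fixed set of categories.
    """
    pixels = [p for row in image for p in row]  # step 1: look at the image
    # Step 2: pick exactly one label from the fixed category set.
    return min(prototypes, key=lambda label: l1_distance(pixels, prototypes[label]))


prototypes = {
    "dark": [0, 0, 0, 0],
    "bright": [255, 255, 255, 255],
}
image = [[200, 220], [240, 210]]
print(classify(image, prototypes))
```

Whatever the input looks like, the output is always one of the predefined labels; the classifier cannot answer with a category outside that set.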


*object detection

-finding where objects are in the image


*image captioning

-system needs to produce a natural language sentence describing the image.


#CNN

*AlexNet (SuperVision), Alex Krizhevsky & Ilya Sutskever, 2012

-7-layer convolutional neural network


*GoogLeNet, Google, 2014

*VGG, Oxford, 2014

-19 layers


*Residual Networks (ResNet), Microsoft Research Asia, 2015

-152 layers

 

#before CNN(2012)

*Yann LeCun, 1998

-convolutional neural network - recognizing digits

-takes in the pixels of an image and then classifies which digit or letter it is

1. take raw pixels

2. pass them through many layers of convolution and subsampling, followed by fully connected layers
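
One convolution step and one subsampling step on raw pixels, as a rough sketch. Assumptions: the kernel is hand-picked (in a real network the weights are learned), subsampling is plain 2x2 averaging, and the function names and the toy image are made up for illustration. The fully connected layers are not shown.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1.

    Like most deep learning code, this does not flip the kernel
    (strictly speaking it is cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def subsample_2x2(feature_map):
    """Average non-overlapping 2x2 blocks, halving each dimension."""
    return [
        [
            (feature_map[r][c] + feature_map[r][c + 1]
             + feature_map[r + 1][c] + feature_map[r + 1][c + 1]) / 4
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


# 5x5 raw pixels with a vertical edge: columns 0-1 are 0, columns 2-4 are 1.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
vertical_edge = [[-1, 1], [-1, 1]]  # hand-picked kernel, not learned

features = conv2d_valid(image, vertical_edge)  # 4x4 map, strong at the edge
pooled = subsample_2x2(features)               # 2x2 map after subsampling
print(features)
print(pooled)
```

The kernel responds only where the intensity jumps from left to right, which echoes the oriented-edge responses Hubel & Wiesel observed; subsampling then keeps that response at a coarser resolution.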



#what changed from the 90s to 2012

1. increasing computation

-number of transistors

-GPUs


2. data

-number of pixels used in training



#further tasks

*activity recognition

*augmented reality, virtual reality

*describing pictures

*deep understanding of images (including social, political, cultural aspects)



#Computer Vision Technology and better lives

*medical diagnosis

*self-driving cars

*robotics

*understanding human intelligence




derivatives / matrix multiplication













