Lecture 1 | Introduction to Convolutional Neural Networks for Visual Recognition
※introduction
#computer vision - study of visual data
-sensors (smartphones, cameras) -> visual data exploded
#statistics (2015 Cisco study)
by 2017 -> 80% of internet traffic will be video
-from a pure bits perspective, most data on the internet is visual data
#problem
visual data is the dark matter of the internet (like dark matter, it makes up an astonishingly large fraction of it)
-difficult for algorithms to go in and understand what comprises all the visual data
#statistics (youtube)
every 1 second -> about 5 hours of video uploaded
-to catalog it
-to serve relevant videos
-to monetize by putting ads on those videos
understanding the content of visual data is important
#relationship with other studies
-Physics
-Biology
-Psychology
-Computer Science
-Mathematics
-Engineering
-------------------------------------------------------------------------------------------------------------
※history of computer vision
#biological vision
evolution's big bang (Cambrian explosion) - linked to the evolution of vision
-vision lets us survive, work, move around, manipulate things, communicate, and be entertained
#mechanical vision (camera obscura)
(pinhole camera theory) a hole that collects light -> a plate onto which the image is projected - similar to the eyes early animals developed
#biological study of the mechanism of vision
Hubel & Wiesel, 1959 - electrophysiology
-what the visual processing mechanism is like in mammals
-cat brain - primary visual cortex
-simple cells respond to simple structures - oriented edges
#history of computer vision
*Block world - Larry Roberts, 1963
*The Summer Vision Project - MIT, 1966 - an attempt to build a significant part of the visual system
*VISION - David Marr (MIT), 1970s
- primal sketch -> 2.5-D sketch -> 3-D model representation
*Generalized Cylinder, Stanford 1979 / Pictorial Structure, SRI 1973
-every object is composed of simple geometric primitives
-reduce the complex structure of the object into a collection of simpler shapes and their geometric configuration.
*David Lowe, 1987
-tried to recognize razors by combining lines and edges (mostly straight lines and their combinations)
***reducing the visual world to simple structures
***object recognition
--------------------------------------------
#problems in solving vision
***if object recognition is too hard, maybe we should first do object segmentation
-the task of taking an image and grouping the pixels into meaningful areas
-image segmentation - extracting the pixels that belong to a certain object from its background
-graph theory algorithms (e.g., Normalized Cuts)
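A rough sketch of the pixel-grouping idea using scikit-image's Felzenszwalb graph-based segmentation. This is my substitution, not the lecture's exact method (the lecture points to Normalized Cuts); it only illustrates grouping pixels into regions with a graph-based algorithm:

```python
# Graph-based image segmentation sketch (Felzenszwalb's method from scikit-image;
# a different graph-based algorithm than Normalized Cuts, same pixel-grouping idea).
import matplotlib.pyplot as plt
from skimage import data
from skimage.segmentation import felzenszwalb, mark_boundaries

image = data.astronaut()                      # sample RGB image shipped with scikit-image
segments = felzenszwalb(image, scale=100,     # larger scale -> larger segments
                        sigma=0.5,            # Gaussian smoothing before building the graph
                        min_size=50)          # merge away tiny regions

print("number of segments:", segments.max() + 1)
plt.imshow(mark_boundaries(image, segments))  # draw segment boundaries over the image
plt.axis("off")
plt.show()
```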
***face detection
-1999~2000: statistical machine learning techniques
-support vector machines, boosting, graphical models, the first wave of neural networks
-using the AdaBoost algorithm to do real-time face detection, Paul Viola & Michael Jones, 2001
-Fujifilm camera with built-in face detection, 2006
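A minimal sketch of Viola-Jones style real-time face detection using the boosted Haar-cascade detector that ships with OpenCV ("my_photo.jpg" is a placeholder path):

```python
# Viola-Jones style face detection with OpenCV's bundled Haar cascade.
import cv2

img = cv2.imread("my_photo.jpg")                      # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Load the pre-trained frontal-face cascade that ships with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

# Slide the boosted detector over the image at multiple scales.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces_detected.jpg", img)
print(f"found {len(faces)} face(s)")
```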
#feature based object recognition
*SIFT feature, David Lowe, 1999
-there are some features that tend to remain diagnostic and invariant to changes
-the task begins with identifying these critical features on the object
-and then matching those features to the features of a similar object
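A minimal sketch of SIFT keypoint detection and matching with OpenCV (SIFT is built into OpenCV 4.4+; "object.jpg" and "scene.jpg" are placeholder paths). Lowe's ratio test keeps only distinctive matches:

```python
# SIFT keypoints + descriptors, matched between two images with a ratio test.
import cv2

img1 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder paths
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)   # keypoints + 128-d descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matching; keep a match only if it is clearly better than the runner-up.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

print(f"{len(kp1)} / {len(kp2)} keypoints, {len(good)} good matches")
out = cv2.drawMatches(img1, kp1, img2, kp2, good, None,
                      flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
cv2.imwrite("matches.jpg", out)
```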
*Spatial Pyramid Matching
-there are features in the images that can give us clues about which type of scene it is, whether it's a landscape or a kitchen or a highway
1. take features from different parts of the image at different resolutions
2. put them together in a single "feature descriptor"
3. run a support vector machine on the descriptor (see the sketch below)
*histogram of oriented gradients (HoG)
*deformable part models
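A rough sketch of the hand-crafted-feature + SVM recipe of that era: HoG-style features fed to a linear SVM. This is not an exact Spatial Pyramid Matching implementation (no multi-resolution pooling); it only shows the overall pipeline, run on scikit-learn's small digits set so it is self-contained:

```python
# Hand-crafted features (HoG) + linear SVM classifier, the pre-deep-learning recipe.
import numpy as np
from skimage.feature import hog
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

digits = load_digits()                          # 8x8 grayscale digit images
features = np.array([
    hog(img, orientations=8, pixels_per_cell=(4, 4),
        cells_per_block=(1, 1))                 # tiny HoG descriptor per image
    for img in digits.images
])

X_train, X_test, y_train, y_test = train_test_split(
    features, digits.target, test_size=0.25, random_state=0)

clf = LinearSVC(max_iter=10000).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```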
***what changed - having better data to study computer vision (internet & cameras)
--------------------------------------------
#overcoming the overfitting bottleneck with data sets
*PASCAL Visual Object Challenge (20 object categories), Everingham et al., 2006-2012
-benchmark data set - to measure progress in object recognition
*ImageNet
-to recognize every object in the world
-to overcome the machine learning bottleneck of overfitting (complex models, too little data)
-Large Scale Visual Recognition Challenge (ILSVRC)
#CNN (convolutional neural network) - deep learning****
-a.k.a. convnets
#sister studies
*natural language processing
*speech recognition
-------------------------------------------------------------------------------------------------------------
*image classification
1. the algorithm looks at an image
2. picks from among some fixed set of categories to classify that image
*object detection
-where objects are in the image
*image captioning
-system needs to produce a natural language sentence describing the image.
#CNN
*AlexNet (SuperVision), Alex Krizhevsky, Ilya Sutskever & Geoffrey Hinton, 2012
-7 layer convolutional neural network
*GoogLeNet, Google, 2014
*VGG, Oxford, 2014
-19 layers
*ResNet (Residual Networks), Microsoft Research Asia, 2015
-152 layers
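A minimal sketch of running one of these ImageNet-era architectures via torchvision's reference implementations (assumes torchvision >= 0.13 for the weights API; "cat.jpg" is a placeholder image path):

```python
# Classify one image with a pretrained ImageNet model from torchvision.
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet50_Weights.DEFAULT          # pretrained ImageNet weights
model = models.resnet50(weights=weights).eval()    # could also try alexnet / vgg19 / googlenet
preprocess = weights.transforms()                  # resize / crop / normalize as the model expects

img = preprocess(Image.open("cat.jpg").convert("RGB")).unsqueeze(0)  # add batch dimension

with torch.no_grad():
    probs = model(img).softmax(dim=1)

top = probs[0].argmax().item()
print(weights.meta["categories"][top], probs[0, top].item())
```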
#CNNs before 2012
*Yann LeCun, 1998
-convolutional neural network for recognizing digits
-takes in the pixels of an image and then classifies which digit or letter it is
1. take raw pixels
2. many layers of convolution and sub-sampling, followed by fully connected layers
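A LeNet-flavored sketch in PyTorch of that pipeline (raw pixels -> convolution + sub-sampling -> fully connected layers). The 1998 network differed in details (tanh activations, average pooling), so this only shows the overall shape:

```python
# Raw pixels -> conv + sub-sampling (pooling) stacks -> fully connected layers -> class scores.
import torch
import torch.nn as nn

class LeNetLike(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 1x32x32 -> 6x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                  # sub-sampling: 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 16x10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),       # scores for the 10 digit classes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# One forward pass on a dummy batch of 32x32 grayscale "digit" images.
scores = LeNetLike()(torch.randn(4, 1, 32, 32))
print(scores.shape)  # torch.Size([4, 10])
```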
#what's different from the 90s to 2012
1. increasing computation
-number of transistors
-GPUs
2. data
-amount of labeled data (pixels) available for training
#further tasks
*activity recognition
*augmented reality, virtual reality
*describing pictures
*deep understanding of images (including social, political, cultural aspects)
#Computer Vision Technology and better lives
*medical diagnosis
*self-driving cars
*robotics
*understanding human intelligence
prerequisites: derivatives / matrix multiplication