Lecture 1 | Introduction to Convolutional Neural Networks for Visual Recognition

아이셩짱셩 2018. 9. 30. 20:14

※introduction

#computer vision - study of visual data

-sensors (e.g., smartphone cameras) -> the amount of visual data has exploded

#statistics (Cisco study, 2015)

by 2017, about 80% of internet traffic will be video

-from a pure bits perspective, most of the data on the internet is visual data

#problem

visual data is the dark matter of the internet (like dark matter, it is an astonishingly large fraction of the whole, yet hard to observe)

- it is difficult for algorithms to go in and understand what all of this visual data contains.

#statistics (youtube)

every second, about 5 hours of video are uploaded

-to catalog the videos

-to serve relevant videos

-to monetize them by putting ads on those videos

understanding the content of visual data is important

#relationship with other studies

-Physics

-Biology

-Psychology

-Computer Science

-Mathematics

-Engineering

-------------------------------------------------------------------------------------------------------------

※history of computer vision

#biological vision

evolution's big bang - the onset of vision may have set off an explosion in the number of animal species

vision is used to survive, work, move around, manipulate things, communicate, and entertain

#mechanical vision (camera obscura)

(pinhole camera theory) a hole that collects light - a plate onto which the image is projected - similar to the eyes that early animals developed

#biological study of the mechanism of vision

Hubel & Wiesel, 1959 - electrophysiology

-what the visual processing mechanism is like in mammals

-cat brain - primary visual cortex

-visual processing starts with simple structures - oriented edges

#history of computer vision

*Block World - Larry Roberts, 1963

*The Summer Vision Project - MIT, 1966 - an attempt to build a visual system

*Vision - David Marr (MIT), 1970s

- primal sketch -> 2.5-D sketch -> 3-D representation

*Generalized Cylinder, Stanford 1979 / Pictorial Structure, SRI 1973

-every object is composed of simple geometric primitives

-reduce the complex structure of the object into a collection of simpler shapes and their geometric configuration.

*David Lowe, 1987

-visual world (example: razors) -> reconstructed from lines and edges, mostly straight lines and their combinations.

***reduce the visual world to simple structures

***object recognition

--------------------------------------------

#problems in solving vision

***if object recognition is too hard, maybe we should first do object segmentation

-the task of taking an image and grouping the pixels into meaningful areas

-image segmentation - extracting pixels that belong to certain object from its background

-graph theory algorithm
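
A minimal sketch of the "group pixels into meaningful areas" idea, assuming a grayscale image stored as a 2-D list. This is not the algorithm from the lecture: it simply treats pixels as graph nodes, joins 4-neighbors whose intensities are close, and returns the connected components. The names `segment` and `threshold` are hypothetical.

```python
def segment(image, threshold=0.1):
    """Label 4-connected pixels whose intensities differ by <= threshold."""
    height, width = len(image), len(image[0])
    parent = list(range(height * width))  # union-find forest, one node per pixel

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for y in range(height):
        for x in range(width):
            # Look at the pixel below and the pixel to the right.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny >= height or nx >= width:
                    continue
                if abs(image[y][x] - image[ny][nx]) <= threshold:
                    parent[find(ny * width + nx)] = find(y * width + x)

    # Pixels sharing a root belong to the same segment.
    return [[find(y * width + x) for x in range(width)] for y in range(height)]


image = [
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
]
labels = segment(image)
num_segments = len({label for row in labels for label in row})
```

Here the dark pixels end up in one segment and the bright pixels in another, so `num_segments` is 2.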

***face detection

-1999~2000: statistical machine learning techniques gain momentum

-support vector machines, boosting, graphical models, the first wave of neural networks

-real-time face detection using the AdaBoost algorithm, Paul Viola & Michael Jones, 2001

-Fujifilm camera with real-time face detection, 2006

#feature based object recognition

*SIFT feature, David Lowe, 1999

-some features of an object tend to remain diagnostic and invariant to changes

-the task begins with identifying these critical features on the object

-and then matching the features to a similar object

*Spatial Pyramid Matching

-there are features in the images that can give us clues about which type of scene it is, whether it's a landscape or a kitchen or a highway

1 take features from different resolutions

2 put them together in a "feature descriptor"

3 run a support vector machine algorithm on the descriptor
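
Steps 1 and 2 above can be sketched as follows. This is a rough illustration under stated assumptions, not the original implementation: it assumes local features have already been quantized into "visual word" indices on a grid (`word_map`), and it reads "different resolutions" as progressively finer spatial grids. The name `spatial_pyramid_descriptor` and its parameters are hypothetical. Step 3 (training an SVM on the descriptors) is left out.

```python
from collections import Counter


def spatial_pyramid_descriptor(word_map, vocab_size, levels=2):
    """Concatenate visual-word histograms over a pyramid of grids.

    word_map: 2-D list, each entry is a visual-word index in [0, vocab_size).
    Level l splits the image into 2**l x 2**l cells; level 0 is the whole image.
    """
    height, width = len(word_map), len(word_map[0])
    descriptor = []
    for level in range(levels + 1):
        cells = 2 ** level
        for cy in range(cells):
            for cx in range(cells):
                # Pixel bounds of this grid cell.
                y0, y1 = cy * height // cells, (cy + 1) * height // cells
                x0, x1 = cx * width // cells, (cx + 1) * width // cells
                counts = Counter(
                    word_map[y][x] for y in range(y0, y1) for x in range(x0, x1)
                )
                # One histogram bin per visual word, appended to the descriptor.
                descriptor.extend(counts.get(w, 0) for w in range(vocab_size))
    return descriptor


word_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
descriptor = spatial_pyramid_descriptor(word_map, vocab_size=2, levels=1)
# descriptor == [8, 8, 4, 0, 0, 4, 0, 4, 4, 0]
```

The level-0 histogram `[8, 8]` alone cannot tell where each word occurs; the level-1 cells add that coarse layout information, which is the point of the pyramid.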

*histogram of gradients

*deformable part models

***what changed - better data for studying computer vision (thanks to the internet & digital cameras)

--------------------------------------------

#benchmark data sets and overfitting

*PASCAL Visual Object Challenge (20 object categories), Everingham et al., 2006-2012

-benchmark data set - to measure progress in object recognition

*ImageNet

-to recognize objects at a much larger scale

-to overcome the machine learning bottleneck of overfitting

-ImageNet Large Scale Visual Recognition Challenge

#CNN (convolutional neural network) - deep learning****

-also called convnets

#sister studies

*natural language processing

*speech recognition

-------------------------------------------------------------------------------------------------------------

※course overview

#tasks

*image classification

1 the algorithm looks at an image

2 it picks one category from a fixed set of categories to classify that image
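
The two steps above can be sketched in a few lines. This is only an illustration: the scoring rule (one linear score per category) is an assumption chosen for brevity, and `classify`, `CATEGORIES`, `WEIGHTS`, and `BIASES` are hypothetical names with made-up values, not trained parameters.

```python
def classify(pixels, weights, biases, categories):
    """Score every category for a flattened image and return the best one."""
    scores = [
        sum(w * p for w, p in zip(row, pixels)) + b
        for row, b in zip(weights, biases)
    ]
    # Pick the index of the highest score among the fixed categories.
    best = max(range(len(categories)), key=lambda i: scores[i])
    return categories[best], scores


CATEGORIES = ["cat", "dog", "ship"]  # hypothetical fixed label set
WEIGHTS = [  # hypothetical values: one row of weights per category
    [1.0, 0.0, -1.0, 0.5],
    [0.0, 1.0, 0.5, -1.0],
    [-0.5, -0.5, 1.0, 1.0],
]
BIASES = [0.0, 0.1, -0.1]

tiny_image = [0.2, 0.9, 0.1, 0.4]  # a 2x2 image flattened to 4 pixel values
label, scores = classify(tiny_image, WEIGHTS, BIASES, CATEGORIES)
```

With these made-up numbers the "dog" score is the largest, so `label` is `"dog"`. The output is always one of the predefined categories, which is what separates classification from detection or captioning.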

*object detection

-determine where objects are in the image (e.g., draw a bounding box around each one)

*image captioning

-system needs to produce a natural language sentence describing the image.

#CNN

*AlexNet (SuperVision), Alex Krizhevsky & Ilya Sutskever, 2012

-7-layer convolutional neural network

*GoogLeNet, Google, 2014

*VGG, Oxford, 2014

-19 layers

*ResNet (Residual Networks), Microsoft Research Asia, 2015

-152 layers

#before CNN(2012)

*Yann LeCun, 1998

-convolutional neural network - recognizing digits

-takes in the pixels of an image and classifies which digit or letter it is

1 take raw pixels

2 apply many layers of convolution and sub-sampling, followed by fully connected layers
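
A minimal sketch of that pipeline, assuming a single convolution layer, 2x2 average sub-sampling, and one fully connected layer on plain Python lists. The function names, the kernel, and the weights are hypothetical and hand-picked (nothing is learned), and the nonlinearities a real network uses are omitted.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def subsample(feature_map, size=2):
    """Average non-overlapping size x size blocks (sub-sampling)."""
    out_h, out_w = len(feature_map) // size, len(feature_map[0]) // size
    return [
        [
            sum(feature_map[y * size + i][x * size + j]
                for i in range(size) for j in range(size)) / (size * size)
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def fully_connected(feature_map, weights):
    """Flatten the feature map and compute one score per output class."""
    flat = [v for row in feature_map for v in row]
    return [sum(w * v for w, v in zip(row, flat)) for row in weights]


# 5x5 raw pixels with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]          # hypothetical vertical-edge filter

features = convolve2d(image, edge_kernel)  # 4x4 map, each row is [0, 2, 0, 0]
pooled = subsample(features)               # [[1.0, 0.0], [1.0, 0.0]]
scores = fully_connected(pooled, [[1, 1, 1, 1], [0, 1, 0, 1]])  # [2.0, 0.0]
```

The filter responds only where the edge is, the sub-sampling shrinks the map while keeping that response, and the fully connected layer turns the remaining values into class scores.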

#what changed between the 90s and 2012

1 increasing computation

-number of transistors

-GPUs

2 data

-number of pixels used in training

#further tasks

*activity recognition

*augmented reality, virtual reality

*describing pictures

*deep understanding of images (including social, political, cultural aspects)

#Computer Vision Technology and better lives

*medical diagnosis

*self-driving cars

*robotics

*understanding human intelligence

#prerequisites

-calculus and linear algebra: derivatives / matrix multiplication
