[Column] Artificial Intelligence III

- August 11, 2018

Recap: What is AI good at? Classification!

Two steps of classification: feature extraction and feature classification
What does an image mean to a computer? Pixels arranged in a matrix, whose size is the resolution

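Grayscale (black and white): one value per pixel, from 0 (black) to 255 (white)
Color: three values per pixel (0-255, 0-255, 0-255), i.e. RGB, so the image becomes a tensor
Number of channels (an extra dimension of the tensor): 1 for grayscale, 3 for RGB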
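Here is a minimal sketch of the two representations. It uses plain Python nested lists in place of a real image library, and the pixel values are made up for illustration. A grayscale image holds one 0-255 value per pixel, while an RGB image holds three, which adds the channel dimension. The `shape` helper is hypothetical.

```python
# A 2 x 3 grayscale image: one intensity (0 = black, 255 = white) per pixel.
gray = [
    [0, 128, 255],
    [64, 192, 32],
]

# A 2 x 2 RGB image: each pixel holds (red, green, blue), each 0-255.
rgb = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

def shape(image):
    """Return (height, width, channels) for a nested-list image."""
    height, width = len(image), len(image[0])
    first = image[0][0]
    # A tuple/list pixel means several channels; a bare number means one channel.
    channels = len(first) if isinstance(first, (tuple, list)) else 1
    return (height, width, channels)

print(shape(gray))  # (2, 3, 1)
print(shape(rgb))   # (2, 2, 3)
```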

Tensor

Order 0: scalar
Order 1: vector
Order 2: matrix
Order 3 and higher: tensor (e.g. a color image)
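One way to make "order" concrete is to count how many levels of nesting you must descend to reach a single number. The sketch below assumes tensors are stored as nested Python lists, and `order` is a hypothetical helper:

```python
def order(t):
    """Number of nesting levels: 0 for a scalar, 1 for a vector, 2 for a matrix, ..."""
    n = 0
    while isinstance(t, (list, tuple)):
        t = t[0]  # descend into the first element
        n += 1
    return n

scalar = 7
vector = [1, 2, 3]
matrix = [[1, 2], [3, 4]]
tensor3 = [[[1, 2, 3], [4, 5, 6]]]  # e.g. 1 x 2 pixels x 3 channels

for t in (scalar, vector, matrix, tensor3):
    print(order(t))
```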

Before deep learning (hand-designed features)

Image features were designed by hand in computer vision, for example:

edge
texture

Combined with machine learning, these features were used for:

object recognition
object detection

Convolution

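Convolution can be applied to vectors, matrices, and tensors.
Steps of convolution: repeat (slide the kernel to the next position, extract the patch of input under it, compute the inner product of the patch and the kernel)
The small array of weights that slides over the input is called the convolution kernel.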
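The repeat (slide, extract, inner product) loop can be sketched in plain Python. This version makes some assumptions: no padding ("valid" output) and a stride of 1. Like most deep-learning libraries, it does not flip the kernel, so strictly it is cross-correlation. The `[[-1, 1]]` kernel is an illustrative choice that responds where brightness changes from left to right, i.e. at a vertical edge.

```python
def conv2d(image, kernel):
    """Valid 2D convolution, stride 1, kernel not flipped (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):          # slide down
        row = []
        for j in range(out_w):      # slide right
            # extract the patch under the kernel and take the inner product
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output

image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
kernel = [[-1, 1]]  # horizontal difference: highlights vertical edges
print(conv2d(image, kernel))  # [[0, 10, 0], [0, 10, 0], [0, 10, 0]]
```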

Histogram of oriented gradients (HOG): a hand-designed feature that describes the edges of an image

Step 1: use convolution to compute the edges (gradients) of the image
Step 2: divide the image into multiple regions (cells)
Step 3: within each cell, gather statistics of the edges by magnitude (amplitude) and orientation
Step 4: the result is one histogram of edge orientations per cell
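The four steps can be sketched in simplified form as follows. The parameters `cell_size=2` and `n_bins=9` (unsigned orientations over 0-180 degrees) are assumed values for illustration. Real HOG implementations also interpolate votes between bins and normalize histograms over blocks of cells, and this sketch skips both.

```python
import math

def hog_cell_histograms(image, cell_size=2, n_bins=9):
    """Simplified HOG: one magnitude-weighted orientation histogram per cell."""
    h, w = len(image), len(image[0])
    bin_width = 180.0 / n_bins
    magnitude = [[0.0] * w for _ in range(h)]
    orientation = [[0.0] * w for _ in range(h)]
    # Step 1: edges via the [-1, 0, 1] derivative kernels (border pixels left at 0)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude[y][x] = math.hypot(gx, gy)
            orientation[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned angle
    histograms = []
    # Step 2: split the image into cell_size x cell_size cells
    for cy in range(0, h - cell_size + 1, cell_size):
        for cx in range(0, w - cell_size + 1, cell_size):
            hist = [0.0] * n_bins
            # Step 3: each pixel votes its magnitude into its orientation bin
            for y in range(cy, cy + cell_size):
                for x in range(cx, cx + cell_size):
                    b = int(orientation[y][x] // bin_width) % n_bins
                    hist[b] += magnitude[y][x]
            # Step 4: keep this cell's histogram
            histograms.append(hist)
    return histograms

# A vertical edge: dark on the left, bright on the right
image = [[0, 0, 10, 10] for _ in range(4)]
for hist in hog_cell_histograms(image):
    print(hist)
```

For this vertical edge, every gradient points horizontally (0 degrees), so each cell puts all of its weight into the first bin.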

ImageNet

Deep Neural Network (DNN)

Deep neural network

Learns the features of an image automatically
Before: feature extraction > feature classification (two independent steps)
After: feature extraction and classification are combined in one model

What is a DNN?

It is constructed from multiple layers.
We can view each layer as a converter that moves its input one step along the path from concrete (raw pixels) to abstract (concepts),
just like learning English (letters > vocabulary > sentences ...).
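Here is a minimal sketch of the "converter" view. It assumes each layer is a Python function from a list of numbers to a list of numbers. The three toy layers are hypothetical and only show how each layer's output becomes the next layer's input.

```python
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]

def forward(layers: List[Layer], x: List[float]) -> List[float]:
    """Pass the input through each layer in turn; each output feeds the next."""
    for layer in layers:
        x = layer(x)
    return x

# Toy, hypothetical layers, just to show the chaining.
def double(v: List[float]) -> List[float]:
    return [2 * a for a in v]

def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]

def total(v: List[float]) -> List[float]:
    return [sum(v)]

print(forward([double, relu, total], [1.0, -3.0, 2.0]))  # [6.0]
```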
What is in a DNN?

CONVOLUTIONAL LAYER
FULLY CONNECTED LAYER
SOFTMAX LAYER
NON-LINEAR ACTIVATION LAYER (e.g. RELU)
POOLING LAYER

Convolution layer [1]

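A layer that uses convolution to transform the original input data or the features produced by the previous layer
Each layer uses multiple convolution kernels; each kernel produces one channel of the output
FEATURE MAP: a 3rd-order tensor (one matrix per channel), the output of a convolutional layer
Layer by layer, the network goes from feature maps toward a feature vector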
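A convolutional layer with several kernels can be sketched as follows. The sketch assumes a channels-first layout (`x[channel][row][col]`), no padding, and stride 1. Each kernel covers all input channels and adds its own bias, and each kernel yields one channel of the output feature map. The kernel and bias values are illustrative.

```python
def conv_layer(x, kernels, biases):
    """x: input as [channels][height][width] nested lists.
    kernels: K kernels, each [channels][kh][kw]; biases: K numbers.
    Returns a feature map [K][out_h][out_w] (valid padding, stride 1)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    feature_map = []
    for kernel, bias in zip(kernels, biases):
        kh, kw = len(kernel[0]), len(kernel[0][0])
        channel = []
        for i in range(h - kh + 1):
            row = []
            for j in range(w - kw + 1):
                # inner product of the kernel with the patch, summed over input channels
                total = bias
                for c in range(c_in):
                    for a in range(kh):
                        for b in range(kw):
                            total += x[c][i + a][j + b] * kernel[c][a][b]
                row.append(total)
            channel.append(row)
        feature_map.append(channel)  # one output channel per kernel
    return feature_map

x = [[[1, 1, 1], [1, 1, 1], [1, 1, 1]] for _ in range(2)]  # 2 channels, 3 x 3
ones = [[[1, 1], [1, 1]] for _ in range(2)]                  # 2 x 2 x 2 kernel
zeros = [[[0, 0], [0, 0]] for _ in range(2)]
fm = conv_layer(x, [ones, zeros], [0, 1])
print(fm)  # [[[8, 8], [8, 8]], [[1, 1], [1, 1]]]
```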

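Fully connected layer [2]

Each output is the inner product of the whole input with a parameter vector, plus a bias; all outputs are combined into one output vector.
$y_k = X \cdot W_k + b_k$, and $Y = (y_1, y_2, \dots, y_K)$

X: input vector
W_k: parameter (weight) vector for output k; b_k: bias for output k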
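The formula translates almost directly into code. This is a minimal sketch in which `W` is assumed to be a list of the parameter vectors `W_k` and `b` a list of the biases `b_k`, with illustrative values:

```python
def fully_connected(x, W, b):
    """y_k = x . W_k + b_k for each output k; returns Y = [y_1, ..., y_K]."""
    return [sum(xi * wi for xi, wi in zip(x, W_k)) + b_k for W_k, b_k in zip(W, b)]

x = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # K = 3 outputs
b = [0.0, 0.0, 1.0]
print(fully_connected(x, W, b))  # [1.0, 2.0, 4.0]
```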

Softmax layer [n]

The last layer of a classification network
Outputs, for each input image, the probability that it belongs to each class (the probabilities sum to 1)
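The standard softmax turns raw class scores into probabilities: exponentiate each score, then divide by the sum. The sketch below also subtracts the maximum score before exponentiating. That is a common numerical-stability trick not mentioned in the notes, and it does not change the result. The scores are made up.

```python
import math

def softmax(scores):
    """exp(s_i) / sum_j exp(s_j), computed stably by shifting by max(scores)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for 3 classes
print(probs, sum(probs))
```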

Non-linear activation layer

Placement: convolutional layer > non-linear activation layer
Placement: fully connected layer > non-linear activation layer
Purpose: a stack of linear functions is itself just one linear function, so without non-linear activations the effect of multiple layers would collapse into that of a single layer.
The activation converts the LINEAR FUNCTION computed by a convolutional or fully connected layer into a NON-LINEAR FUNCTION.
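The collapse argument can be checked numerically. Two linear layers `W2 (W1 x)` give exactly the same result as the single layer `(W2 W1) x`, but inserting a ReLU between them breaks that equivalence. The matrices and input below are arbitrary illustrative values, and biases are omitted for brevity.

```python
def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Matrix-matrix product."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu(v):
    return [max(0.0, a) for a in v]

W1 = [[1.0, -1.0], [2.0, 1.0]]
W2 = [[1.0, 1.0], [-1.0, 3.0]]
x = [1.0, 2.0]

two_layers = matvec(W2, matvec(W1, x))       # linear layer, then linear layer
one_layer = matvec(matmul(W2, W1), x)        # a single equivalent linear layer
with_relu = matvec(W2, relu(matvec(W1, x)))  # non-linearity in between

print(two_layers, one_layer, with_relu)  # [3.0, 13.0] [3.0, 13.0] [4.0, 12.0]
```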
Commonly used non-linear functions:

logistic function
hyperbolic tangent function
rectified linear function (ReLU), the most common choice because it is

easy to use and understand
fast (cheap to compute, and training tends to converge quickly)
effective in practice
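The three activation functions can be written in a few lines of Python. The logistic function is written in two branches so that large negative inputs do not overflow `math.exp`. The ReLU derivative is just 0 or 1, which is part of why it is cheap to compute.

```python
import math

def logistic(z):
    """1 / (1 + e^-z), arranged so math.exp never receives a large positive argument."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh(z):
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear function: max(0, z)."""
    return max(0.0, z)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

for z in (-2.0, 0.0, 2.0):
    print(z, logistic(z), tanh(z), relu(z))
```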

Pooling layer

Purpose: reduce the resolution of the feature maps, which reduces the amount of computation
Where: commonly added after several convolutional layers
How it operates:

Step 1: split the feature map by channel, giving one matrix per channel
Step 2: divide each matrix into multiple squares of the same size
Step 3: in each square, take the maximum or the average value; these values form a new, smaller matrix
Step 4: stack the channels back together in their original order; the resulting 3rd-order tensor is the output of the pooling layer

Taking the maximum gives a max pooling layer; taking the average gives an average pooling layer.
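The four pooling steps map onto two small functions. The sketch assumes non-overlapping squares (the stride equals the square size) and a channels-first feature map. Rows or columns left over at the edge that do not fill a whole square are dropped.

```python
def pool2d(matrix, size=2, mode="max"):
    """Steps 2-3 for one channel: split into size x size squares, keep max or average."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    out = []
    for i in range(0, len(matrix) - size + 1, size):
        row = []
        for j in range(0, len(matrix[0]) - size + 1, size):
            square = [matrix[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(square) if mode == "max" else sum(square) / len(square))
        out.append(row)
    return out

def pooling_layer(feature_map, size=2, mode="max"):
    """Steps 1 and 4: pool each channel separately, keeping the channel order."""
    return [pool2d(channel, size, mode) for channel in feature_map]

m = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
print(pool2d(m))                  # [[6, 8], [14, 16]]
print(pool2d(m, mode="average"))  # [[3.5, 5.5], [11.5, 13.5]]
```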

Biological Neural Networks

The idea of DNNs comes from biological neural networks (BNNs).
DNN algorithms were inspired by one type of biological neural network.
So far, DNNs cannot replace BNNs; a DNN is built for a specific task only.

DNN Training

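Training is the process of finding the "optimal parameters".
For training multi-layer networks, the "BACK PROPAGATION" algorithm was introduced; it is the most effective way to train a DNN.

Its key ingredients:

CHAIN RULE (computes the gradient of the loss with respect to every parameter, layer by layer from the output back)
STOCHASTIC GRADIENT DESCENT (updates the parameters using gradients from randomly chosen training samples)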
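Here is a toy sketch of back propagation with SGD. The network is hypothetical: `y_hat = w2 * relu(w1 * x + b1) + b2` with squared-error loss. The training data (`y = 2x + 1`), initial values, learning rate, and epoch count are all assumptions chosen for illustration. The backward pass applies the chain rule by hand, and each SGD step uses the gradient of a single sample.

```python
import random

def mean_loss(params, data):
    """Average of 0.5 * (prediction - target)^2 over the data set."""
    w1, b1, w2, b2 = params
    total = 0.0
    for x, y in data:
        h = max(0.0, w1 * x + b1)
        total += 0.5 * (w2 * h + b2 - y) ** 2
    return total / len(data)

def train(xs, ys, lr=0.05, epochs=200, seed=0):
    """Back propagation + SGD for y_hat = w2 * relu(w1 * x + b1) + b2."""
    rng = random.Random(seed)
    w1, b1, w2, b2 = 0.5, 0.1, 0.5, 0.0         # hypothetical initial values
    data = list(zip(xs, ys))
    losses = [mean_loss((w1, b1, w2, b2), data)]
    for _ in range(epochs):
        rng.shuffle(data)                       # stochastic: random sample order
        for x, y in data:
            # Forward pass
            z = w1 * x + b1
            h = max(0.0, z)                     # ReLU
            y_hat = w2 * h + b2
            # Backward pass: chain rule from the loss back to each parameter
            d_yhat = y_hat - y                  # dL/dy_hat for L = 0.5 * (y_hat - y)^2
            d_w2, d_b2 = d_yhat * h, d_yhat
            d_h = d_yhat * w2
            d_z = d_h * (1.0 if z > 0 else 0.0)  # ReLU derivative
            d_w1, d_b1 = d_z * x, d_z
            # SGD update: step against this sample's gradient
            w1 -= lr * d_w1
            b1 -= lr * d_b1
            w2 -= lr * d_w2
            b2 -= lr * d_b2
        losses.append(mean_loss((w1, b1, w2, b2), data))
    return losses

xs = [i / 10 for i in range(1, 11)]
ys = [2 * x + 1 for x in xs]                    # hypothetical target: y = 2x + 1
losses = train(xs, ys)
print(losses[0], losses[-1])
```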

What does "deep" mean in DNN?

Multiple layers: depth is the number of layers (and more layers usually means more parameters)
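To see how parameters grow with depth, count them for a stack of fully connected layers. A layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The layer sizes below are hypothetical examples.

```python
def fc_param_count(layer_sizes):
    """Total weights + biases in a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(fc_param_count([784, 100, 10]))       # 79510
print(fc_param_count([784, 100, 100, 10]))  # 89610: one more layer, 10100 more parameters
```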

It's Time for Deep Learning

Two major opportunities: data and hardware

Huge amounts of diverse data, like the fuel used to launch a rocket called deep learning
Computing power: the move from CPUs to GPUs

Are more layers always better?

OVER-FITTING (the model does well on the training data but poorly on the test data; it learns the exceptions in the training data instead of the general pattern)
UNDER-FITTING (the model itself is too weak: even with good data and well-tuned parameters, a poor core algorithm limits the overall performance)
Ways to reduce over-fitting:

WEIGHT DECAY
REGULARIZATION
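Weight decay can be shown as a change to the SGD update: every step also shrinks each weight toward zero. With this update form, it matches gradient descent on the loss plus an L2 penalty `(weight_decay / 2) * sum(w^2)`. This is a minimal sketch, and the `lr` and `weight_decay` values are arbitrary.

```python
def sgd_step(weights, grads, lr=0.1, weight_decay=0.0):
    """One SGD update with weight decay: w <- w - lr * (grad + weight_decay * w)."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

w = [1.0, -2.0]
print(sgd_step(w, [0.0, 0.0], lr=0.1, weight_decay=0.5))  # weights shrink toward 0
```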

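With many layers, GRADIENT VANISHING appears: the gradients that reach the early layers become tiny, so those layers get almost no direction (guidance) during training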
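Vanishing gradients follow from the chain rule: the gradient reaching an early layer is a product of the local derivatives of all later layers. The sketch makes simplifying assumptions: sigmoid activations, all weights equal to 1, and every pre-activation at 0. That is the best case for the sigmoid, whose derivative there is 0.25, yet after 10 layers the gradient is already below one millionth.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # at most 0.25, reached at z = 0

def gradient_through_layers(n_layers, z=0.0):
    """Chain rule: multiply the local derivative of every layer passed through."""
    grad = 1.0
    for _ in range(n_layers):
        grad *= sigmoid_grad(z)
    return grad

for n in (1, 5, 10, 20):
    print(n, gradient_through_layers(n))
```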
Ways to address vanishing gradients:

BATCH NORMALIZATION
SHORT-CUT (skip) CONNECTIONS
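Both remedies can be sketched for a single scalar feature. Batch normalization rescales a feature across a mini-batch to zero mean and unit variance, then applies a scale `gamma` and shift `beta`. Here they are fixed, though a real network learns them, and inference uses running statistics, which this sketch skips. A short-cut connection outputs `x + f(x)`, so its derivative is `1 + f'(x)`, which never falls below 1 when `f'(x) >= 0`. The numerical derivative below checks this for a sigmoid `f`.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch norm for one feature across a mini-batch."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def residual(x, f):
    """Short-cut connection: the block outputs x + f(x)."""
    return x + f(x)

def numeric_derivative(g, x, h=1e-5):
    """Central finite difference."""
    return (g(x + h) - g(x - h)) / (2 * h)

print(batch_norm([1.0, 2.0, 3.0, 4.0]))
print(numeric_derivative(sigmoid, 0.0))                          # about 0.25
print(numeric_derivative(lambda x: residual(x, sigmoid), 0.0))   # about 1.25
```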
