[Column] Artificial Intelligence III
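Recap: What is AI good at? Classification!
- Two steps of classification: feature extraction and feature classification
- What does an image mean to a computer? Pixels, a matrix, resolution
- Black and white (grayscale): one value per pixel, from 0 to 255
- Color: three values per pixel (0-255, 0-255, 0-255), i.e. RGB, so the image becomes a tensor
- Number of channels (the size of the channel dimension: 1 for grayscale, 3 for RGB)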
- Tensor
- 0th order: scalar
- 1st order: vector
- 2nd order: matrix
- 3rd order: e.g. a color image (height × width × channels)
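A minimal Python sketch of the ideas above. It assumes images are stored as nested lists in height × width × channels order (real libraries such as NumPy or PyTorch use arrays and may put the channel dimension first). The helper names `shape` and `order` are hypothetical, chosen for illustration.

```python
# A tiny 2 x 2 grayscale image: one channel, each pixel an intensity in 0-255.
gray = [
    [0, 255],
    [128, 64],
]

# A tiny 2 x 2 RGB image: each pixel is an (R, G, B) triple, so the data is
# height x width x channels = 2 x 2 x 3, a 3rd-order tensor.
rgb = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

def shape(t):
    """Return the shape of a (regular) nested list, e.g. (2, 2, 3)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

def order(t):
    """Tensor order = number of indices needed to reach a single number."""
    return len(shape(t))

print(shape(gray), order(gray))  # (2, 2) 2
print(shape(rgb), order(rgb))    # (2, 2, 3) 3
print(order(7))                  # 0 (a scalar)
```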
- Before deep learning (hand-designed features)
- image features were defined by computer vision researchers, e.g.
- edge
- texture
- combined with machine learning, these features were used for
- object recognition
- object detection
- Convolution
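- can be applied to vectors, matrices, and tensors
- steps of convolution: repeatedly slide the kernel, extract the patch under it, and take the inner product
- convolution kernel: the small grid of weights that slides over the input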
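The slide / extract / inner-product loop can be sketched in plain Python. This is a minimal version assuming a single-channel image, no padding, and a stride of 1; like most deep-learning libraries it does not flip the kernel (strictly speaking, this is cross-correlation). The example image and the `[[-1, 1]]` kernel are hypothetical.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and take the
    inner product with the patch under it at every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):              # slide vertically
        row = []
        for j in range(out_w):          # slide horizontally
            # extract the patch under the kernel and take the inner product
            total = 0
            for u in range(kh):
                for v in range(kw):
                    total += image[i + u][j + v] * kernel[u][v]
            row.append(total)
        out.append(row)
    return out

# A vertical edge: dark on the left, bright on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
# Horizontal difference kernel: responds where brightness changes left to right.
kernel = [[-1, 1]]
print(conv2d(image, kernel))  # [[0, 255, 0], [0, 255, 0], [0, 255, 0]]
```

The non-zero column in the output marks exactly where the edge is, which is the idea behind using convolution to find edges.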
- histogram of oriented gradients (HOG): a feature built from the edges of an image
- step 1: use convolution to compute the edges (gradients) of the image
- step 2: divide the image into multiple zones (cells)
- step 3: within each zone, collect statistics of the edge features by magnitude and orientation
- step 4: get a histogram of orientations for each zone
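A simplified HOG in plain Python, following the four steps above. It is a sketch, not the full algorithm: it assumes gradients from the `[-1, 0, 1]` kernel, 4 × 4-pixel cells, and 9 unsigned orientation bins over 0-180°, and it omits the block normalization and bin interpolation that the standard HOG descriptor adds. The function names are hypothetical.

```python
import math

def gradients(img):
    """Step 1: central differences, i.e. convolving with [-1, 0, 1]
    horizontally and vertically. Border pixels are left at 0 for simplicity."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx[i][j] = img[i][j + 1] - img[i][j - 1]
            gy[i][j] = img[i + 1][j] - img[i - 1][j]
    return gx, gy

def hog(img, cell=4, bins=9):
    gx, gy = gradients(img)
    h, w = len(img), len(img[0])
    features = []
    # Step 2: split the image into cell x cell zones.
    for ci in range(0, h - cell + 1, cell):
        for cj in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            # Step 3: each pixel votes for its orientation bin, weighted by magnitude.
            for i in range(ci, ci + cell):
                for j in range(cj, cj + cell):
                    mag = math.hypot(gx[i][j], gy[i][j])
                    angle = math.degrees(math.atan2(gy[i][j], gx[i][j])) % 180
                    hist[int(angle // (180 / bins)) % bins] += mag
            # Step 4: each zone contributes one orientation histogram.
            features.append(hist)
    return features

# 8 x 8 image with a vertical edge down the middle.
img = [[0] * 4 + [255] * 4 for _ in range(8)]
for hist in hog(img):
    print([round(v) for v in hist])  # each cell: [765, 0, 0, 0, 0, 0, 0, 0, 0]
```

A vertical edge produces a purely horizontal gradient (0°), so all the weight lands in the first bin; a horizontal edge would land in the 80-100° bin instead.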
- ImageNet (a large-scale labeled image dataset)
Deep Neural Network (DNN)
- Deep neural network
- automatically learns the features of the image
- before: feature extraction > feature classification (two independent steps)
- now: feature extraction and classification are combined and learned together
- What is a DNN?
- built from multiple layers
- each layer can be viewed as a converter; together the layers form a path from the concrete (raw pixels) to the abstract
- just like learning English (letters > vocabulary > sentences ...)
- What is in a DNN?
- CONVOLUTIONAL LAYER
- FULLY CONNECTED LAYER
- SOFTMAX LAYER
- NON-LINEAR ACTIVATION LAYER (RELU)
- POOLING LAYER
- Convolution layer [1]
- a layer that uses convolution to transform the original data or the features from the previous layer
- uses multiple convolution kernels in one layer
- FEATURE MAP: a 3rd-order tensor (the final output of a convolutional layer)
- the network proceeds from feature maps to a feature vector
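A convolutional layer applies several kernels to the same input and stacks their outputs into a 3rd-order feature map. The sketch below assumes a single-channel input, no padding, and stride 1 (a real layer also sums over all input channels); the kernels and the `conv_layer` name are hypothetical.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 sliding inner product on one channel."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v]
                for u in range(kh) for v in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

def conv_layer(image, kernels, biases):
    """One kernel -> one 2-D output; stacking them gives H x W x channels."""
    maps = [conv2d(image, k) for k in kernels]          # channels x H x W
    h, w = len(maps[0]), len(maps[0][0])
    return [
        [[maps[c][i][j] + biases[c] for c in range(len(kernels))]
         for j in range(w)]
        for i in range(h)
    ]                                                   # H x W x channels

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernels = [
    [[1, 0], [0, 1]],    # hypothetical kernel 1: diagonal sum
    [[0, 0], [0, 1]],    # hypothetical kernel 2: picks the bottom-right pixel
    [[-1, 1], [-1, 1]],  # hypothetical kernel 3: horizontal difference
]
feature_map = conv_layer(image, kernels, biases=[0, 0, 0])
print(len(feature_map), len(feature_map[0]), len(feature_map[0][0]))  # 2 2 3
print(feature_map[0][0])  # [6, 5, 2]
```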
- Fully connected layer [2]
- uses inner products to combine all of the results into one vector as output
- y_k = X · W_k + b_k; Y = (y_1, y_2, ..., y_K)
- X: input vector
- W_k: parameter (weight) vector for output k; b_k: bias for output k
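A minimal sketch of a fully connected layer: each output y_k = X · W_k + b_k, and the K outputs are collected into one output vector, as the bullet on inner products describes. Flattening a small feature map into X, and the weight and bias values, are assumptions for illustration.

```python
def fully_connected(x, weights, biases):
    """y_k = x . W_k + b_k for every output k; the y_k form the output vector."""
    return [
        sum(xi * wi for xi, wi in zip(x, w_k)) + b_k
        for w_k, b_k in zip(weights, biases)
    ]

# Flatten a (hypothetical) 2 x 2 x 1 feature map into the input vector X.
feature_map = [[[1.0], [2.0]], [[3.0], [4.0]]]
x = [v for row in feature_map for pixel in row for v in pixel]  # [1.0, 2.0, 3.0, 4.0]

weights = [[0.5, 0.5, 0.5, 0.5],   # W_1
           [1.0, 0.0, 0.0, -1.0]]  # W_2
biases = [0.0, 1.0]                # b_1, b_2
print(fully_connected(x, weights, biases))  # [5.0, -2.0]
```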
- Softmax layer [n]
- the last layer of a classification network
- outputs, for each image, the probability that it belongs to each class
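A minimal softmax sketch in Python: exponentiate each class score and divide by the sum, so the outputs are positive and add up to 1. Subtracting the largest score first is a common numerical-stability trick and does not change the result. The scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)                          # stability: avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical outputs of the last fully connected layer
probs = softmax(scores)
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The largest score gets the largest probability, and the predicted class is the one with the highest probability.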
- Non-linear activation layer
- order: convolutional layer > non-linear activation layer
- order: fully connected layer > non-linear activation layer
- purpose: stacking linear functions only gives another linear function, so without a non-linear step the benefit of having multiple layers would be lost
- it converts the output of a LINEAR FUNCTION (such as a convolutional layer or a fully connected layer) into a NON-LINEAR FUNCTION
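Why stacking linear layers gains nothing on its own: take two layers with no activation in between, written in matrix form with hypothetical weight matrices W_1, W_2 and bias vectors b_1, b_2. The composition is again a single linear (affine) function of x:

```latex
\begin{aligned}
y &= W_2\,(W_1 x + b_1) + b_2 \\
  &= (W_2 W_1)\,x + (W_2 b_1 + b_2) \\
  &= W' x + b', \qquad W' = W_2 W_1,\quad b' = W_2 b_1 + b_2
\end{aligned}
```

Two layers therefore behave like one layer with parameters W' and b'; inserting a non-linear function between them breaks this collapse.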
- commonly used non-linear functions
- logistic (sigmoid) function
- hyperbolic tangent function
- rectified linear function (aka ReLU layer)
- easy to use and understand
- fast convergence in training
- works well in practice
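The three functions listed above, sketched in plain Python. Each is applied element by element to a layer's output. The logistic version branches on the sign of z, an implementation choice (not from the source) that avoids overflow in `math.exp` for large negative inputs.

```python
import math

def logistic(z):
    """Logistic (sigmoid): squashes any input into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)

def tanh(z):
    """Hyperbolic tangent: squashes any input into (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: keeps positive values, zeroes out negatives."""
    return max(0.0, z)

for z in [-2.0, 0.0, 2.0]:
    print(z, round(logistic(z), 3), round(tanh(z), 3), relu(z))

# Applied element-wise to a (hypothetical) layer output:
print([relu(v) for v in [5.0, -2.0]])  # [5.0, 0.0]
```

ReLU needs only a comparison, which is part of why it is cheap and easy to use.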
- Pooling layer
- purpose: reduce the resolution of the feature map, which reduces the amount of computation
- where: commonly added after several convolutional layers
- how it works:
- step 1: split the feature map by channel into multiple matrices
- step 2: divide each matrix into squares of the same size
- step 3: in each square, take the maximum or the average value; these values form a new, smaller matrix
- step 4: stack all channels back together in the original order; the resulting 3rd-order tensor is the pooling layer's output
- two variants: max pooling and average pooling
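A plain-Python sketch of the four pooling steps, assuming non-overlapping 2 × 2 squares (stride equal to the square size) and feature maps stored as height × width × channels nested lists. The function names `pool2d` and `pool_layer` are hypothetical.

```python
def pool2d(matrix, size=2, mode="max"):
    """Steps 2-3: cut one channel into size x size squares (non-overlapping)
    and keep the max or the average of each square."""
    out = []
    for i in range(0, len(matrix) - size + 1, size):
        row = []
        for j in range(0, len(matrix[0]) - size + 1, size):
            square = [matrix[i + u][j + v] for u in range(size) for v in range(size)]
            row.append(max(square) if mode == "max" else sum(square) / len(square))
        out.append(row)
    return out

def pool_layer(feature_map, size=2, mode="max"):
    channels = len(feature_map[0][0])
    # Step 1: split the H x W x C tensor into C separate matrices.
    matrices = [[[pixel[c] for pixel in row] for row in feature_map]
                for c in range(channels)]
    # Steps 2-3 on every channel.
    pooled = [pool2d(m, size, mode) for m in matrices]
    # Step 4: restack the channels in their original order.
    return [[[pooled[c][i][j] for c in range(channels)]
             for j in range(len(pooled[0][0]))]
            for i in range(len(pooled[0]))]

channel = [[1, 3, 2, 4],
           [5, 6, 1, 0],
           [7, 2, 9, 8],
           [1, 0, 3, 4]]
print(pool2d(channel, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(channel, mode="average"))  # [[3.75, 1.75], [2.5, 6.0]]
```

A 4 × 4 channel shrinks to 2 × 2, so later layers have a quarter as many positions to process.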
Biological Neural Networks
- the idea of the DNN comes from biological neural networks
- the DNN algorithm is modeled on one type of biological neural network
- so far, DNNs cannot replace biological neural networks (BNNs); a DNN only handles specific tasks
DNN Training
- training is the process of finding the "optimal parameters"
- to train multi-layer networks, the "BACK PROPAGATION" algorithm was proposed; it is the standard and most effective way to train a DNN
- it relies on
- CHAIN RULE (to compute the gradient of the loss with respect to every parameter)
- STOCHASTIC GRADIENT DESCENT (to update the parameters using those gradients)
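A minimal sketch of CHAIN RULE plus STOCHASTIC GRADIENT DESCENT on the smallest possible network: one neuron z = w·x + b followed by the logistic function, with a squared-error loss. The toy data, learning rate, loss choice, and epoch count are assumptions for illustration. Full back propagation repeats the same chain-rule step layer by layer, from the output back to the input.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: label is 1 when x > 0, else 0.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]

w, b = 0.0, 0.0
lr = 0.5
rng = random.Random(0)  # fixed seed so the run is repeatable

for epoch in range(200):
    rng.shuffle(data)
    for x, y in data:          # "stochastic": update after every single example
        # Forward pass
        z = w * x + b
        p = sigmoid(z)
        # Loss L = (p - y)^2 / 2. Backward pass with the chain rule:
        dL_dp = p - y
        dp_dz = p * (1 - p)    # derivative of the sigmoid
        dL_dz = dL_dp * dp_dz
        dL_dw = dL_dz * x      # dz/dw = x
        dL_db = dL_dz * 1.0    # dz/db = 1
        # Gradient descent step
        w -= lr * dL_dw
        b -= lr * dL_db

print(round(w, 2), round(b, 2))
print([round(sigmoid(w * x + b)) for x in (-1.5, 1.5)])  # [0, 1]
```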
- What does "deep" mean in DNN?
- many layers: the number of layers (and hence the number of parameters)
It's Time for Deep Learning
- two major opportunities: data and hardware
- huge amounts of diverse data act like the fuel that launches a rocket called deep learning
- computing power: from CPUs to GPUs
- OVER-FITTING (good on the training data, bad on the test data, because there are always exceptions)
- UNDER-FITTING (the model itself is too weak: even if the data and the parameters are good, a poor core algorithm limits the overall performance)
- common remedies, mainly against over-fitting:
- WEIGHT DECAY
- REGULARIZATION
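A sketch of WEIGHT DECAY in plain Python, assuming plain gradient descent on a mean-squared-error loss: each step adds `weight_decay * w` to the gradient, which is the gradient of an L2 penalty (weight_decay / 2)·‖w‖², so it acts as a form of REGULARIZATION that pulls the weights toward zero. The data and hyperparameters are hypothetical.

```python
def train(xs, ys, weight_decay, lr=0.1, steps=500):
    """Fit y ≈ w · x by gradient descent on the mean squared error,
    with L2 weight decay added to every update."""
    w = [0.0] * len(xs[0])
    for _ in range(steps):
        # Gradient of the mean squared error with respect to each weight
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / len(xs)
        # Weight decay: the extra weight_decay * wi term shrinks each weight toward 0
        w = [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]
    return w

xs = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical inputs
ys = [1.0, 1.0]                # hypothetical targets

print([round(v, 3) for v in train(xs, ys, weight_decay=0.0)])  # [1.0, 1.0]
print([round(v, 3) for v in train(xs, ys, weight_decay=0.5)])  # [0.667, 0.667]
```

With decay the weights settle at 2/3 instead of 1: the fit to the training data gets slightly worse, but the model is pushed toward smaller weights, which is how weight decay limits over-fitting.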
- with many layers, GRADIENT VANISHING appears: the gradient becomes too small to guide the updates, so training loses its direction
- ways to address gradient vanishing:
- BATCH NORMALIZATION
- SHORT-CUT (skip) connections
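Two small plain-Python sketches of the remedies above. `batch_norm` normalizes one feature across a mini-batch and then applies a scale `gamma` and shift `beta` (learned in a real network; fixed here). `residual_block` adds a shortcut so the output is f(x) + x; for an element-wise f, the derivative of the output with respect to x is f'(x) + 1, so some gradient always flows back even when f'(x) is tiny. The inputs and the `tiny_layer` function are hypothetical.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch to mean 0 and variance ~1,
    then scale by gamma and shift by beta."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]

def residual_block(x, f):
    """Shortcut connection: output = f(x) + x, element by element."""
    return [fx + xi for fx, xi in zip(f(x), x)]

def tiny_layer(v):
    """Hypothetical layer whose own gradient is tiny (weight 0.01)."""
    return [0.01 * vi for vi in v]

activations = [50.0, 52.0, 48.0, 54.0]  # hypothetical values of one feature
normed = batch_norm(activations)
print([round(v, 2) for v in normed])  # [-0.45, 0.45, -1.34, 1.34]

print(residual_block([1.0, 2.0], tiny_layer))
```

Even though `tiny_layer` alone would pass back only 1% of the gradient, the shortcut path passes it back unchanged.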