[Column] Artificial Intelligence IV

Voice and AI (transform music to data)
  1. speech and music
  2. cocktail party effect
  3. human voice: fundamental frequency of roughly 85-1100 Hz
  4. Steps from Human voice to MP3 
    1. VOICE > SENSOR > VOLTAGE
    2. SAMPLING (a computer cannot store a continuous signal) : DISCRETE (time)
    3. QUANTIZATION : DISCRETE (amplitude)
    4. ENCODING
  5. TIME SERIES
  6. WAVEFORM
  7. SAMPLING RATE (44100 Hz for CD-quality audio and commonly for MP3; human hearing reaches only about 20 kHz, and the sampling rate must be at least twice the highest frequency to be captured)
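
The sampling, quantization and encoding steps above can be sketched in a few lines of Python. This is only an illustration: a 440 Hz sine tone stands in for the sensor voltage, the function name sample_and_quantize and the 16-bit depth are assumptions, and the encoding shown is raw uncompressed PCM, not the lossy compression an MP3 encoder performs.

```python
import math
import struct

def sample_and_quantize(freq_hz, duration_s, sample_rate_hz, bit_depth):
    """Sample a sine tone at discrete times, then round each sample to an integer level."""
    n_samples = round(duration_s * sample_rate_hz)
    max_level = 2 ** (bit_depth - 1) - 1               # 32767 for 16-bit signed samples
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                          # sampling: discrete time
        voltage = math.sin(2 * math.pi * freq_hz * t)   # stand-in for the sensor voltage, in [-1, 1]
        samples.append(round(voltage * max_level))      # quantization: discrete amplitude
    return samples

pcm = sample_and_quantize(freq_hz=440.0, duration_s=0.01, sample_rate_hz=44100, bit_depth=16)

# encoding: pack the integers as little-endian 16-bit values (2 bytes per sample)
encoded = struct.pack("<%dh" % len(pcm), *pcm)
print(len(pcm), "samples ->", len(encoded), "bytes")
```

At 44100 Hz the highest frequency that can be represented is 22050 Hz, slightly above the upper limit of human hearing.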
Characteristics of Voice
  1. frequency spectrum (x - frequency ; y - amplitude), plotted on a logarithmic scale
  2. Amplitude (Loudness)
  3. Frequency (Pitch)
  4. Timbre (related to harmonics)
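
As a small illustration of the frequency spectrum, the sketch below computes a naive discrete Fourier transform of a synthetic tone. The signal (a 200 Hz fundamental plus a weaker 400 Hz harmonic), the 8000 Hz sampling rate and the function name dft_magnitudes are all assumptions chosen for the example; real code would normally use an FFT library.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Return the magnitudes of DFT bins 0..N/2 of a real signal (naive O(N^2) version)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc) / n)
    return mags

sample_rate_hz = 8000
n = 400                                   # bin spacing = 8000 / 400 = 20 Hz

# fundamental at 200 Hz (sets the pitch) plus a weaker harmonic at 400 Hz (shapes the timbre)
signal = [
    math.sin(2 * math.pi * 200 * t / sample_rate_hz)
    + 0.5 * math.sin(2 * math.pi * 400 * t / sample_rate_hz)
    for t in range(n)
]

mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
peak_hz = peak_bin * sample_rate_hz / n   # frequency of the strongest component
print("strongest component:", peak_hz, "Hz")
```

The height of each peak corresponds to amplitude (loudness), the position of the lowest strong peak to pitch, and the pattern of harmonic peaks to timbre.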
Classic Voice Features - MFCC
  1. MFCC - Mel-Frequency Cepstral Coefficients
    1. Formant
    2. Vowel
  2. Steps of using MFCC
    1. 26-dimensional vector
      1. Mel frequency: mel(f) = 1125 * ln(1 + f/700)
      2. fine resolution at low frequencies, coarse resolution at high frequencies (similar to human hearing)
    2. Calculate its cepstrum
      1. the purpose of the cepstrum step is to reduce the 26 dimensions to 13 dimensions.
      2. window width (frame length)
      3. window step (spacing between successive windows)
    3. finally obtain 13-dimensional MFCC features
  3. Deep learning applied (feature extraction and feature classification)
    1. convolutional layer (extracts more specific features)
    2. pooling layer (reduces dimension)
    3. fully connected layer (combines the convolutional features through inner products)
    4. softmax layer (output probabilities)
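
The mel formula and the 26-to-13 reduction in the MFCC steps above can be sketched as follows. The notes do not say how the cepstrum is computed; this sketch assumes the common approach of taking a DCT-II of the log filter-bank energies and keeping the first 13 terms. The function names, the 0-8000 Hz range and the energy values are hypothetical, and framing (window width and step) is left out.

```python
import math

def hz_to_mel(f_hz):
    """mel(f) = 1125 * ln(1 + f/700), as given in the notes."""
    return 1125.0 * math.log(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(m / 1125.0) - 1.0)

def mel_center_frequencies(low_hz, high_hz, n_filters=26):
    """Centre frequencies (Hz) of n_filters bands spaced evenly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(n_filters)]

def cepstral_coefficients(log_energies, n_coeffs=13):
    """Assumed cepstrum step: DCT-II of the log energies, keeping the first n_coeffs terms."""
    n = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (i + 0.5) / n) for i, e in enumerate(log_energies))
        for k in range(n_coeffs)
    ]

centers = mel_center_frequencies(0.0, 8000.0)

# hypothetical filter-bank energies for a single frame (26 positive values)
energies = [1.0 + 0.5 * math.cos(i / 4.0) for i in range(26)]
mfcc = cepstral_coefficients([math.log(e) for e in energies])
print(len(energies), "filter-bank energies ->", len(mfcc), "MFCC features")
```

Because the bands are evenly spaced in mel rather than in Hz, neighbouring centre frequencies are close together at the low end and far apart at the high end, which is the low-frequency sensitivity noted above.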
Application - SPEECH RECOGNITION
  1. voice assistant (input without typing, meeting transcription)
  2. acoustic model
  3. language model
  4. automatic music search (fuzzy search) - window scan
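
One possible reading of the "window scan" used for fuzzy music search is a sliding-window comparison: a short query is moved along a longer track, and the offset with the smallest distance wins. The sketch below assumes each frame is reduced to a single number (for example a pitch value); the function name best_window_match and the sample data are hypothetical, and a real system would compare feature vectors such as MFCCs.

```python
def best_window_match(track, query):
    """Slide query along track and return (offset, distance) of the closest window."""
    if not query or len(query) > len(track):
        raise ValueError("query must be non-empty and no longer than the track")
    best_offset, best_dist = 0, float("inf")
    for offset in range(len(track) - len(query) + 1):
        window = track[offset:offset + len(query)]
        # squared Euclidean distance: small when the window resembles the query
        dist = sum((a - b) ** 2 for a, b in zip(window, query))
        if dist < best_dist:
            best_offset, best_dist = offset, dist
    return best_offset, best_dist

track = [60, 62, 64, 65, 67, 69, 71, 72]   # hypothetical per-frame feature of a stored song
query = [65, 66, 69]                        # slightly inaccurate fragment, e.g. hummed by a user

offset, dist = best_window_match(track, query)
print("best match at offset", offset, "with distance", dist)
```

The search is fuzzy because the query does not need to match exactly: the window with the smallest distance is returned even when no window is identical to the query.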
