ML/음성인식

Phoneme recognition (Spikegram, MFCC, Spectrogram, Melspectrogram)

HanSeokhyeon 2020. 1. 21. 16:41
반응형

Phoneme recognition을 위해 다양한 feature를 사용하여 실험해보았다.
https://github.com/HanSeokhyeon/Deep_learning_for_Phoneme_recognition

1. Spikegram

Obstruent - Stops : 0.5738  
Obstruent - Affricate : 0.4219  
Obstruent - Fricative : 0.7066  
Sonorant - Glides : 0.5514  
Sonorant - Nasals : 0.5915  
Sonorant - Vowels : 0.5305  
Others : 0.9224  

Obstruent : 0.6576  
Sonorant : 0.5412  
Others : 0.9224  

Non-mute : 0.5749  
Mute : 0.9224  

Total : 0.6526  

2. MFCC

Obstruent - Stops : 0.5097
Obstruent - Affricate : 0.3061
Obstruent - Fricative : 0.6693
Sonorant - Glides : 0.5659
Sonorant - Nasals : 0.6236
Sonorant - Vowels : 0.5338
Others : 0.9196

Obstruent : 0.6106
Sonorant : 0.5499
Others : 0.9196

Non-mute : 0.5677
Mute : 0.9196

Total : 0.6550

3. Spectrogram

Obstruent - Stops : 0.5045
Obstruent - Affricate : 0.3606
Obstruent - Fricative : 0.6698
Sonorant - Glides : 0.5598
Sonorant - Nasals : 0.6039
Sonorant - Vowels : 0.5244
Others : 0.9179

Obstruent : 0.6123
Sonorant : 0.5399
Others : 0.9179

Non-mute : 0.5611
Mute : 0.9179

Total : 0.6496

4. Melspectrogram

Obstruent - Stops : 0.4918
Obstruent - Affricate : 0.3574
Obstruent - Fricative : 0.6723
Sonorant - Glides : 0.5543
Sonorant - Nasals : 0.6009
Sonorant - Vowels : 0.5370
Others : 0.9194

Obstruent : 0.6106
Sonorant : 0.5475
Others : 0.9194

Non-mute : 0.5661
Mute : 0.9194

Total : 0.6537

반응형