This page presents examples of synthesized speech-laugh generated by the model described in our research paper. Samples 1 to 4 appear in the paper; the others do not.
Abstract: This study is the first challenge of building a synthetic speech-laugh model via a deep learning technique. To maintain the phonetic intelligibility of synthesized speech-laugh, the model was trained with nonlaughing read speech material for both phones of speech-laugh (SL) and of speech (SP). To control laughing onset in SL, the model was also trained using SL material only for the phones of SL instances. The listening tests revealed that the naturalness score for synthesized female SL was as high as that for human SL and that the laughter-likeness score for synthesized SL was higher than that for synthesized SP in almost all conditions. The dictation test revealed that the training for phonetic intelligibility in SL synthesis was highly effective for synthesized SL. However, the difference between segmented SL onset and correct onset was greater for synthesized SL with phonetic intelligibility training than for that without training.
Index Terms: speech-laugh synthesis, paralinguistic information, laughter onset controllability, naturality, intelligibility
R. Setoguchi and Y. Arimoto, “Assessment of the synthetic quality and controllability of laughing onset in speech-laugh synthesis,” in Proceedings of Interspeech 2025, 2025. (accepted)
Score: 4.15 out of 5 in naturalness
The input text: "h i cl k a k a cl t e r u y o" in Japanese ("It's stuck")
Score: 4.00 out of 5 in laughter-likeness
The input text: "d o sh i t a N d a r o n e" in Japanese ("I wonder what's going on")
CER: 0
The input text: "n a m a e o k a i t e o k i n a s a i" in Japanese ("Write your name down")
CER: 0.90
The input text: "n a m a e o k a i t e o k i n a s a i" in Japanese ("Write your name down")
Score: 3.50 out of 5 in naturalness
The input text: "m a cl t e m a cl t e m a cl t e" in Japanese ("Wait wait wait")
Score: 3.76 out of 5 in laughter-likeness
The input text: "u w a" in Japanese ("Wow")
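The CER values listed with some of the samples above presumably denote the character error rate from the dictation test. As a rough illustration only, the sketch below computes CER in the conventional way, as the Levenshtein edit distance between a reference and a listener's transcription divided by the reference length. This page does not state the exact procedure or the transcription unit used in the paper, so the function names `edit_distance` and `cer` and the choice of space-separated phone symbols as tokens are assumptions made for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    """
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Error rate: edit distance divided by the reference length."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Assumption: tokens are the space-separated symbols of the input text.
reference = "n a m a e o k a i t e o k i n a s a i".split()

# A transcription identical to the reference has an error rate of zero.
print(cer(reference, list(reference)))
```

Under this definition, a value of 0 means the transcription matched the reference exactly, and a value near 1 means that almost every token had to be edited.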