IS2025

Overview

This page presents examples of synthesized speech-laugh generated by the model proposed in our research paper. Samples 1 to 4 appear in the paper; the remaining samples do not.

Abstract: This study is the first attempt at building a speech-laugh synthesis model with a deep learning technique. To maintain the phonetic intelligibility of synthesized speech-laugh, the model was trained with non-laughing read speech material for the phones of both speech-laugh (SL) and speech (SP). To control the laughing onset in SL, the model was also trained with SL material only for the phones of SL instances. The listening tests revealed that the naturalness score for synthesized female SL was as high as that for human SL, and that the laughter-likeness score for synthesized SL was higher than that for synthesized SP in almost all conditions. The dictation test revealed that the training for phonetic intelligibility was highly effective for synthesized SL. However, the difference between the segmented SL onset and the correct onset was greater for synthesized SL with phonetic intelligibility training than for synthesized SL without it.

Index Terms: speech-laugh synthesis, paralinguistic information, laughter onset controllability, naturalness, intelligibility

R. Setoguchi and Y. Arimoto, “Assessment of the synthetic quality and controllability of laughing onset in speech-laugh synthesis,” in Proceedings of Interspeech 2025, 2025. (accepted)

Speech-laugh synthesis

Overview

Sample 1: synthesized speech-laugh in the closed condition, generated by the female model with pretraining, which received a high naturalness score in Figure 5 (a)

Sample 2: synthesized speech-laugh in the closed condition, generated by the female model with pretraining, which received a high laughter-likeness score in Figure 5 (b)

Sample 3: synthesized speech-laugh generated by the model with pretraining, which yielded a low CER in Figure 6 (a)

Sample 4: synthesized speech-laugh generated by the model without pretraining, which yielded a high CER in Figure 6 (b)

Sample 5: synthesized speech-laugh in the open condition, generated by the female model with pretraining, which received a high naturalness score

Sample 6: synthesized speech-laugh in the open condition, generated by the male model with pretraining, which received a high laughter-likeness score
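Samples 3 and 4 are characterized by their character error rate (CER). The sketch below assumes the CER follows its usual definition: the Levenshtein edit distance (substitutions, insertions, deletions) between a listener's dictated transcript and the reference text, divided by the number of reference characters. The function names `levenshtein` and `cer` are hypothetical. This page does not state the exact text normalization or tokenization used in the paper's dictation test, so treat this as an illustration of the metric, not the authors' evaluation script.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions, and deletions
    needed to turn `ref` into `hyp` (classic dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution (or match)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalized by reference length.
    Assumes no normalization (spaces and punctuation count as characters)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Toy strings, not data from the paper.
    print(cer("abcd", "abed"))  # one substitution out of four characters
```

A lower CER means the dictated transcript is closer to the intended text, i.e. the synthesized speech-laugh was more intelligible to listeners.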