diff --git a/TTS-SotA/0.Papers b/TTS-SotA/0.Papers
index faee51ed578e5e234513528d940500dd21d59f0b..0946efcfc87642771194c72af59eaa33cb9474be 100644
--- a/TTS-SotA/0.Papers
+++ b/TTS-SotA/0.Papers
@@ -1,4 +1,3 @@
-
 **********2020**********
 
 [PAPER] End-to-End Adversarial Text-to-Speech
@@ -14,7 +13,7 @@
 [PAPER] Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
 > https://arxiv.org/abs/2005.11129
 
-[PAPER] Using Vaes and Normalizing Flows for One-Shot Text-To-Speech Synthesis of Expressive Speech
+[PAPER] Using VAEs and Normalizing Flows for One-Shot Text-To-Speech Synthesis of Expressive Speech
 > https://ieeexplore.ieee.org/document/9053678
 
 [PAPER] Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
@@ -26,6 +25,10 @@
 
 **********2019**********
 
+[PAPER] Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
+> https://arxiv.org/abs/1910.11997
+> https://github.com/NVIDIA/mellotron
+
 [PAPER] Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis
 > https://arxiv.org/abs/1906.03402
 
@@ -59,17 +62,17 @@
 [PAPER] Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization
 > https://openreview.net/pdf?id=Bkg9ZeBB37
 
-
 **********2017**********
 
-[PAPER] Tacotron: Towards End-to-End Speech Synthesis
-> https://arxiv.org/abs/1703.10135
+[PAPER] {TACOTRON2} Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
+> https://arxiv.org/abs/1712.05884
+> https://github.com/NVIDIA/tacotron2
 
 [PAPER] Uncovering Latent Style Factors for Expressive Speech Synthesis
 > https://arxiv.org/abs/1711.00520
 
-[PAPER] Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
-> https://arxiv.org/abs/1712.05884
+[PAPER] Tacotron: Towards End-to-End Speech Synthesis
+> https://arxiv.org/abs/1703.10135
 
 [PAPER] Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
 > https://arxiv.org/abs/1710.07654
@@ -80,6 +83,7 @@
 [PAPER] Deep Voice: Real-time Neural Text-to-Speech
 > https://arxiv.org/abs/1702.07825
 
-[PAPER] Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
+[PAPER] {DC-TTS} Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
 > https://arxiv.org/abs/1710.08969
 > https://github.com/Kyubyong/dc_tts
+> https://github.com/CSTR-Edinburgh/ophelia