Flow2GAN

Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-fidelity Audio Generation


Samples Generated from Mel-Spectrogram

LibriTTS test-clean

Columns: Reference, Flow2GAN (1 step), Flow2GAN (2 steps), Flow2GAN (4 steps), Vocos, BigVGAN, BigVGAN v2, RFWave, PeriodWave-Turbo, WaveFM

LibriTTS test-other

Columns: Reference, Flow2GAN (1 step), Flow2GAN (2 steps), Flow2GAN (4 steps), Vocos, BigVGAN, BigVGAN v2, RFWave, PeriodWave-Turbo, WaveFM

Samples Generated from EnCodec Audio Tokens

Speech

Columns: Reference, Bandwidth, Flow2GAN (1 step), Flow2GAN (2 steps), Flow2GAN (4 steps), EnCodec, MBD, RFWave, PeriodWave-Turbo
Rows: five sample groups, each at 1.5, 3.0, 6.0, and 12.0 kbps

Vocals

Columns: Reference, Bandwidth, Flow2GAN (1 step), Flow2GAN (2 steps), Flow2GAN (4 steps), EnCodec, MBD, RFWave, PeriodWave-Turbo
Rows: five sample groups, each at 1.5, 3.0, 6.0, and 12.0 kbps

Sound Effects

Columns: Reference, Bandwidth, Flow2GAN (1 step), Flow2GAN (2 steps), Flow2GAN (4 steps), EnCodec, MBD, RFWave, PeriodWave-Turbo
Rows: five sample groups, each at 1.5, 3.0, 6.0, and 12.0 kbps