NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction

Authors: Qichao Wang*, Ziqiao Meng*, Wenqian Cui, Yifei Zhang, Pengcheng Wu, Bingzhe Wu, Irwin King, Liang Chen, Peilin Zhao†

Key features:

Pre-training: Single-channel audio is converted into discrete tokens, and the model is trained with next-token prediction
SFT (supervised fine-tuning): A novel "next-token-pair prediction" objective, trained on dual-channel dialogue, for natural conversation comprehension
Result: Spoken interactions that are more natural and fluid than those of baseline approaches
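
To make the next-token-pair idea more concrete, the following is a minimal sketch and not the repository's implementation. It assumes each speaker's channel has already been tokenized into an equally long list of integer token IDs, and it only shows how (context, target) examples could be laid out so that the target at each step is a pair of tokens, one per channel. The names `make_token_pairs` and `next_pair_examples` are hypothetical.

```python
from typing import List, Tuple

TokenPair = Tuple[int, int]


def make_token_pairs(channel_a: List[int], channel_b: List[int]) -> List[TokenPair]:
    """Align two equally long token streams into per-step (a_t, b_t) pairs."""
    if len(channel_a) != len(channel_b):
        raise ValueError("both channels must have the same number of tokens")
    return list(zip(channel_a, channel_b))


def next_pair_examples(
    pairs: List[TokenPair],
) -> List[Tuple[List[TokenPair], TokenPair]]:
    """Build (context, target) examples for next-token-pair prediction.

    The context holds every pair before step t; the target is the pair at step t.
    """
    return [(pairs[:t], pairs[t]) for t in range(1, len(pairs))]


if __name__ == "__main__":
    # Toy token IDs, chosen only for illustration.
    pairs = make_token_pairs([11, 12, 13], [21, 22, 23])
    for context, target in next_pair_examples(pairs):
        print(context, "->", target)
```

In this layout both speakers advance together at every step, in contrast to plain next-token prediction over a single stream.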

Installation

git clone https://github.com/Chaos96/NTPP.git
cd NTPP
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
pip install -r requirement.txt

Usage

Prepare audio data: single-channel recordings for pre-training and dual-channel recordings for fine-tuning
Pre-train: python pretrain.py --input_data path/to/single_channel_data
Fine-tune: python finetune.py --input_data path/to/double_channel_data
Inference: python inference.py --input_audio path/to/input.wav
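
Since pre-training expects single-channel data and fine-tuning expects dual-channel data, it can help to check a recording's channel count before choosing a stage. The sketch below uses only Python's standard `wave` module and assumes the recordings are WAV files, which this README does not confirm for the training scripts. The helpers `count_channels`, `suggested_stage`, and `write_silent_wav` are hypothetical and are not part of the repository.

```python
import wave


def count_channels(path: str) -> int:
    """Return the number of audio channels stored in a WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnchannels()


def suggested_stage(path: str) -> str:
    """Map a channel count to the stage this README associates with it."""
    channels = count_channels(path)
    if channels == 1:
        return "pretrain"  # single-channel data
    if channels == 2:
        return "finetune"  # dual-channel data
    raise ValueError(f"unsupported channel count: {channels}")


def write_silent_wav(path: str, channels: int, frames: int = 160) -> None:
    """Write a short silent 16-bit, 16 kHz WAV file (demo helper only)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(16000)
        wav_file.writeframes(b"\x00\x00" * channels * frames)


if __name__ == "__main__":
    write_silent_wav("demo_stereo.wav", channels=2)
    print(suggested_stage("demo_stereo.wav"))
```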
