Existing audio deepfake datasets have greatly expanded evaluation across generators, languages, and domains, but most remain generation-centric and offer limited support for studying post-generation delivery. In realistic misuse scenarios, forged audio is often altered by platform re-encoding, telephony transmission, or replay-like recapture before it reaches a listener or an automated system. Recent delivery-related studies usually model such factors as isolated distortions rather than as structured, ordered delivery routes, and often lack control over transcript and speaker conditions.
ChainBench-ADD addresses this gap by treating post-generation delivery as a structured benchmark dimension. It models delivery through reusable operators, ordered templates, and realized chains across five delivery families: direct, platform-like, telephony, simulated replay, and hybrid. Each delivered sample remains linked to a clean bona fide or spoof parent under matched-parent control, enabling attribution of detector behavior specifically to delivery rather than to differences in content, speaker, or source quality.
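The operator/template/chain design with matched-parent control can be pictured as a small data model: an ordered list of waveform transforms realized against a parent utterance, with the parent link preserved in the output. The sketch below is illustrative only; the names (`Operator`, `Template`, `DeliveredSample`, `apply_chain`) and field layout are assumptions, not the benchmark's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np

# An operator is a waveform-domain transform: (waveform, sample_rate) -> waveform.
Operator = Callable[[np.ndarray, int], np.ndarray]

@dataclass
class Template:
    family: str                # e.g. "telephony" (hypothetical value)
    name: str                  # e.g. "nb_mulaw"
    operators: List[Operator]  # ordered: applied left to right

@dataclass
class DeliveredSample:
    parent_id: str             # link back to the clean bona fide / spoof parent
    template: Template         # which delivery route produced this waveform
    waveform: np.ndarray

def apply_chain(parent_id: str, wav: np.ndarray, sr: int,
                template: Template) -> DeliveredSample:
    """Realize one delivery chain while preserving the matched-parent link."""
    out = wav
    for op in template.operators:
        out = op(out, sr)
    return DeliveredSample(parent_id, template, out)
```

Because every delivered sample carries its `parent_id` and template, detector scores on delivered audio can be compared directly against scores on the untouched parent, which is what makes attribution to delivery (rather than content or speaker) possible.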
The current release contains 941,201 waveforms derived from 55,813 parents (18,703 bona fide from Common Voice and AISHELL-3; 37,110 spoof from six contemporary TTS systems) across 448 speakers in English and Mandarin Chinese. From the shared metadata, we define five evaluation tasks: in-chain detection, three matched local interventions (operator substitution, parameter perturbation, and order swap), and lineage-based delivery robustness. A leave-one-template-out protocol further tests transfer to unseen templates within a family.
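The leave-one-template-out protocol reduces to a split over the shared metadata: within one delivery family, hold out all samples of one template for testing and train on the remaining templates. A minimal sketch, assuming each metadata record carries hypothetical `family` and `template` fields:

```python
def loto_split(records, family, held_out_template):
    """Leave-one-template-out: train on all templates in `family`
    except the held-out one; test on the held-out template only."""
    in_family = [r for r in records if r["family"] == family]
    train = [r for r in in_family if r["template"] != held_out_template]
    test = [r for r in in_family if r["template"] == held_out_template]
    return train, test
```

Cycling the held-out template over every template in a family yields one transfer measurement per template, probing generalization to unseen delivery routes within that family.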
| Operator | Representative Settings | Families |
|---|---|---|
| resample | 16→8, 16→24, 16→8→16, 16→24→16, 16→32→16 kHz | P / T / H |
| band-limit | Narrowband (250–3400 Hz), Wideband (50–7000 Hz) | T / H |
| codec | AAC (24/32/48 kbps), Opus (16/24/32 kbps), GSM, μ-law PCM | P / T / H |
| re-encode | Same/cross-codec AAC/Opus recompression (24/32 kbps) | P / T / H |
| packet loss | 1/3/5/10% loss; burst 2/3/5; repeat-fade, interpolation, or noise-fill | T / H |
| noise | White, pink, brown, babble, hiss, hum; 30/20/15/10 dB SNR | R / H |
| RIR | Small/medium/large rooms; RT60 0.2/0.4/0.6/0.8 s; 0.5/1/2/3 m | R / H |
| call-path | Joint channel filtering, codec, packet loss, jitter (0/8/16 ms), AGC | T / H |
Families: P = Platform-like, T = Telephony, R = Simulated Replay, H = Hybrid. All operators are waveform-domain transforms, not symbolic tags.
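Two of the operators above are simple enough to sketch directly in the waveform domain: the μ-law PCM codec step (compand to 8 bits and expand back) and additive noise at a target SNR. These are simplified NumPy illustrations under assumed conventions (mono float waveforms in [-1, 1]), not the benchmark's exact implementations.

```python
import numpy as np

def mulaw_roundtrip(wav: np.ndarray, mu: float = 255.0) -> np.ndarray:
    """Compand to 8-bit mu-law and expand back, as in mu-law PCM telephony."""
    x = np.clip(wav, -1.0, 1.0)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)   # compress
    q = np.round((y + 1.0) / 2.0 * 255.0) / 255.0 * 2.0 - 1.0  # 8-bit quantize
    return np.sign(q) * ((1.0 + mu) ** np.abs(q) - 1.0) / mu   # expand

def add_noise(wav: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Add white noise scaled so the output has the requested SNR in dB."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.standard_normal(len(wav))
    sig_p = np.mean(wav ** 2)
    noise_p = np.mean(noise ** 2)
    scale = np.sqrt(sig_p / (noise_p * 10 ** (snr_db / 10.0)))
    return wav + scale * noise
```

Chaining such functions in a fixed order is exactly what a realized delivery template does; the remaining operators (codecs, RIR convolution, packet loss) follow the same waveform-in, waveform-out contract but require external codec and impulse-response resources.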
Each card below shows one parent utterance delivered through all five families. Compare how different delivery scenarios alter the same underlying speech.
| Family | Template | Operator Chain | Audio |
|---|---|---|---|
| Direct | direct_clean | identity (no processing) | |
| Platform-like | aac_reencode | codec→re-encode | |
| Telephony | session_nb_mulaw | call-path | |
| Sim. Replay | rir_noise_resample | RIR→noise→resample | |
| Hybrid | reencode_rir_plr | re-encode→RIR→packet loss | |
| Family | Template | Operator Chain | Audio |
|---|---|---|---|
| Direct | direct_clean | identity (no processing) | |
| Platform-like | aac_single | codec | |
| Telephony | wb_opus | band-limit→codec | |
| Sim. Replay | noise_rir | noise→RIR | |
| Hybrid | rir_aac | RIR→codec | |
| Family | Template | Operator Chain | Audio |
|---|---|---|---|
| Direct | direct_clean | identity (no processing) | |
| Platform-like | aac_single | codec | |
| Telephony | nb_mulaw | band-limit→codec | |
| Sim. Replay | rir_reencode | RIR→re-encode | |
| Hybrid | aac_rir | codec→RIR | |
| Family | Template | Operator Chain | Audio |
|---|---|---|---|
| Direct | direct_clean | identity (no processing) | |
| Platform-like | aac_reencode | codec→re-encode | |
| Telephony | nb_mulaw_plr | band-limit→codec→packet loss | |
| Sim. Replay | rir_noise | RIR→noise | |
| Hybrid | bandlimit_codec_rir | band-limit→codec→RIR | |
ChainBench-ADD distinguishes between code and the benchmark package. The code repository, including construction scripts, configurations, and baseline implementations, is released under the MIT License. The dataset release is distributed under the ChainBench-ADD Dataset Terms of Use.
ChainBench-ADD is assembled from multiple upstream speech resources and speech-generation systems. All third-party datasets, models, audio, and other external assets remain subject to their original licenses, terms, and attribution requirements. The ChainBench-ADD Dataset Terms of Use apply to this benchmark release as distributed by the authors and do not replace or weaken any applicable upstream obligations.
ChainBench-ADD is released for research on audio deepfake detection, robustness evaluation, forensic analysis, provenance, and benchmarking. It is provided for defensive and scientific use only. The benchmark must not be used to support or enable impersonation, fraud, harassment, social engineering, unauthorized voice cloning, deceptive media generation, biometric surveillance, or the training or improvement of systems intended for deceptive speech generation.
Redistribution of the benchmark package, or any subset that includes third-party material, is permitted only to the extent allowed by all applicable upstream terms and must preserve the relevant notices and this use statement.