ChainBench-ADD

A Delivery-Aware Dataset and Benchmark for Audio Deepfake Detection
ACM Multimedia 2026 · November 10–14, Rio de Janeiro, Brazil

What is ChainBench-ADD?

Existing audio deepfake datasets have greatly expanded evaluation across generators, languages, and domains, but most remain generation-centric and provide limited support for studying post-generation delivery. In realistic misuse scenarios, forged audio is often altered by platform re-encoding, telephony transmission, or replay-like recapture before it reaches a listener or an automated system. Recent delivery-related studies usually model such factors as isolated distortions rather than structured, ordered delivery routes, and often lack control over transcript and speaker conditions.

ChainBench-ADD addresses this gap by treating post-generation delivery as a structured benchmark dimension. It models delivery through reusable operators, ordered templates, and realized chains across five delivery families: direct, platform-like, telephony, simulated replay, and hybrid. Each delivered sample remains linked to a clean bona fide or spoof parent under matched-parent control, enabling attribution of detector behavior specifically to delivery rather than to differences in content, speaker, or source quality.
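The operator → template → realized-chain hierarchy can be pictured as a small data model. The sketch below is illustrative only: the class names, field names, and the `realize` helper are hypothetical and are not the released metadata schema. It shows the two properties the paragraph relies on: a template fixes an ordered list of operators, and every realized chain keeps a link to its clean parent and inherits that parent's label.

```python
from dataclasses import dataclass

# Hypothetical schema: names and fields are illustrative, not the released metadata format.

@dataclass(frozen=True)
class OperatorStep:
    name: str           # reusable operator, e.g. "codec" or "RIR"
    params: tuple = ()  # realized settings, e.g. (("codec", "aac"), ("kbps", 32))

@dataclass(frozen=True)
class Template:
    family: str         # "direct", "platform-like", "telephony", "replay", or "hybrid"
    name: str           # e.g. "aac_reencode"
    operators: tuple    # ordered operator names; the order is part of the template

@dataclass(frozen=True)
class DeliveredSample:
    sample_id: str
    parent_id: str      # the clean bona fide or spoof parent this sample derives from
    label: str          # inherited unchanged from the parent
    template: Template
    chain: tuple        # realized OperatorStep sequence, in delivery order

def realize(template, settings, sample_id, parent_id, label):
    """Instantiate a template as a concrete chain, one settings tuple per operator."""
    if len(settings) != len(template.operators):
        raise ValueError("expected one settings tuple per operator in the template")
    chain = tuple(OperatorStep(op, p) for op, p in zip(template.operators, settings))
    return DeliveredSample(sample_id, parent_id, label, template, chain)

# Two deliveries of the same parent: the clean reference and a platform-like chain.
direct = Template("direct", "direct_clean", ())
aac_reencode = Template("platform-like", "aac_reencode", ("codec", "re-encode"))
clean = realize(direct, (), "s0", "p1", "bonafide")
shared = realize(
    aac_reencode,
    ((("codec", "aac"), ("kbps", 32)), (("codec", "opus"), ("kbps", 24))),
    "s1", "p1", "bonafide",
)
```

Because both samples carry the same `parent_id`, any difference in how a detector treats them can only come from the delivery chain, which is the point of matched-parent control.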

Overview of ChainBench-ADD showing the delivery hierarchy and matched-parent design

The current release contains 941,201 waveforms derived from 55,813 parents (18,703 bona fide from Common Voice and AISHELL-3; 37,110 spoof from six contemporary TTS systems) across 448 speakers in English and Mandarin Chinese. From the shared metadata, we define five evaluation tasks: in-chain detection, three matched local interventions (operator substitution, parameter perturbation, and order swap), and lineage-based delivery robustness. A leave-one-template-out protocol further tests transfer to unseen templates within a family.
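A leave-one-template-out protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the official protocol: the `(sample_id, family, template)` record format is hypothetical, training is restricted to the remaining templates of the same family, and the sketch does not apply the release's train/dev/test parent split, which a real evaluation would combine with it.

```python
from collections import defaultdict

def leave_one_template_out(samples):
    """Yield (family, held_out_template, train_ids, test_ids) for each template.

    samples: iterable of (sample_id, family, template) tuples (hypothetical schema).
    Within each family, one template is held out for testing and the family's
    other templates form the training pool, so test templates are never seen.
    """
    by_family = defaultdict(lambda: defaultdict(list))
    for sample_id, family, template in samples:
        by_family[family][template].append(sample_id)
    for family, templates in sorted(by_family.items()):
        if len(templates) < 2:
            continue  # e.g. "direct" has a single template, so there is nothing to transfer to
        for held_out in sorted(templates):
            train = [sid for tpl, ids in templates.items() if tpl != held_out for sid in ids]
            test = list(templates[held_out])
            yield family, held_out, train, test
```

With the template counts listed for the five families, this would produce one fold per template in every family except direct.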

ChainBench-ADD at a Glance

Total Waveforms: 941,201
Parent Utterances: 55,813
Speakers: 448
Languages: 2
Delivery Families: 5
Templates: 33
Operator Sequences: …
TTS Generators: 6

Figure panels: Samples by Delivery Family · Label Distribution · Train / Dev / Test Split · Language Distribution

Five Delivery Families, Eight Reusable Operators

Direct

1 template · Identity control
No delivery processing. Serves as the clean reference baseline for controlled comparison.

Platform-like

6 templates · Operators: codec, re-encode, resample
Approximates online redistribution and recompression pipelines typical of media-sharing platforms.

Telephony

10 templates · Operators: band-limit, codec, packet loss, call-path, resample
Models bandwidth-limited call transmission with telephony codecs, packet impairment, jitter, and AGC.

Simulated Replay

7 templates · Operators: RIR, noise, re-encode, resample
Models playback and recapture in physical spaces through room impulse response and environmental noise.

Hybrid

9 templates · Operators: codec, RIR, band-limit, re-encode, packet loss, resample
Combines communication and acoustic effects within a single delivery path.
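Templates are ordered because delivery operators generally do not commute, which is also what the order-swap intervention probes. The sketch below is a toy demonstration, not the benchmark's processing code: `band_limit_stub` is a crude moving-average low-pass standing in for band-limiting (it is not the 250–3400 Hz filter), `mu_law_stub` is a continuous μ-law compand/quantize/expand round trip rather than a bit-exact G.711 codec, and all function names are hypothetical.

```python
import math
from functools import reduce

def band_limit_stub(x, taps=3):
    """Crude low-pass: zero-padded moving average. A stand-in for band-limiting only;
    it does not implement the narrowband (250-3400 Hz) filter from the inventory."""
    half = taps // 2
    n = len(x)
    return [
        sum(x[j] for j in range(i - half, i + half + 1) if 0 <= j < n) / taps
        for i in range(n)
    ]

def mu_law_stub(x, mu=255, bits=8):
    """Compand with the continuous mu-law curve, quantize uniformly, then expand.
    A G.711-like sketch, not a bit-exact G.711 encoder/decoder."""
    levels = 2 ** (bits - 1) - 1
    out = []
    for s in x:
        s = max(-1.0, min(1.0, s))  # clip to the nominal full-scale range
        y = math.copysign(math.log1p(mu * abs(s)) / math.log1p(mu), s)
        yq = round(y * levels) / levels  # uniform quantization in the companded domain
        out.append(math.copysign(((1 + mu) ** abs(yq) - 1) / mu, yq))
    return out

def apply_chain(x, ops):
    """Apply operators left to right, so ops[0] is the first delivery stage."""
    return reduce(lambda signal, op: op(signal), ops, x)

x = [0.0, 0.5, 0.0, -0.5, 0.0]
in_order = apply_chain(x, [band_limit_stub, mu_law_stub])  # band-limit -> codec
swapped = apply_chain(x, [mu_law_stub, band_limit_stub])   # codec -> band-limit
```

The two outputs differ because quantization is nonlinear: filtering before the codec quantizes the smoothed signal, while filtering after it averages already-quantized samples. The same parent, operators, and settings therefore yield a different waveform when only the order changes.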

Delivery Operator Inventory

Operator | Representative settings | Families
--- | --- | ---
resample | 16→8, 16→24, 16→8→16, 16→24→16, 16→32→16 kHz | P / T / R / H
band-limit | Narrowband (250–3400 Hz), wideband (50–7000 Hz) | T / H
codec | AAC (24/32/48 kbps), Opus (16/24/32 kbps), GSM, μ-law PCM | P / T / H
re-encode | Same- or cross-codec AAC/Opus recompression (24/32 kbps) | P / T / R / H
packet loss | 1/3/5/10% loss; burst 2/3/5; repeat-fade, interpolation, or noise-fill concealment | T / H
noise | White, pink, brown, babble, hiss, hum; SNR 30/20/15/10 dB | R / H
RIR | Small/medium/large rooms; RT60 0.2/0.4/0.6/0.8 s; 0.5/1/2/3 m | R / H
call-path | Joint channel filtering, codec, packet loss, jitter (0/8/16 ms), AGC | T / H

Families: P = Platform-like, T = Telephony, R = Simulated Replay, H = Hybrid. All operators are waveform-domain transforms, not symbolic tags.
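For the noise operator, mixing at a target SNR amounts to scaling the noise by g = sqrt(P_s / (P_n · 10^(SNR/10))), where P_s and P_n are the mean-square powers of the speech and the unscaled noise, so that 10·log10(P_s / (g²·P_n)) equals the requested SNR. The sketch below shows only that scaling step; the function names are hypothetical, it uses white Gaussian noise on a synthetic tone, and it does not reproduce the benchmark's pink, brown, babble, hiss, or hum noise sources.

```python
import math
import random

def power(x):
    """Mean-square power of a sequence of samples."""
    return sum(s * s for s in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it so that
    10 * log10(power(speech) / power(scaled_noise)) == snr_db.
    Returns the mixture and the gain applied to the noise."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the speech")
    noise = noise[: len(speech)]
    p_s, p_n = power(speech), power(noise)
    if p_n == 0:
        raise ValueError("noise has zero power")
    gain = math.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)], gain

# Toy inputs: a 440 Hz tone sampled at 16 kHz and seeded white Gaussian noise.
sr = 16000
speech = [0.3 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(len(speech))]

for snr in (30, 20, 15, 10):  # the SNR grid listed for the noise operator
    mixed, g = mix_at_snr(speech, noise, snr)
    achieved = 10 * math.log10(power(speech) / power([g * n for n in noise]))
    print(snr, round(achieved, 6))
```

Measuring power over the whole utterance is the simplest choice; an implementation could instead measure only active-speech regions, which would change the effective SNR for utterances with long pauses.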

Listen: Delivery Transforms Across Families

Each card below shows one parent utterance delivered through all five families. Compare how different delivery scenarios alter the same underlying speech.

Bona Fide · English · “She has won numerous national and regional awards”

Family | Template | Operator chain
--- | --- | ---
Direct | direct_clean | identity (no processing)
Platform-like | aac_reencode | codec → re-encode
Telephony | session_nb_mulaw | call-path
Sim. Replay | rir_noise_resample | RIR → noise → resample
Hybrid | reencode_rir_plr | re-encode → RIR → packet loss

Spoof (Spark-TTS) · English · “She has won numerous national and regional awards”

Family | Template | Operator chain
--- | --- | ---
Direct | direct_clean | identity (no processing)
Platform-like | aac_single | codec
Telephony | wb_opus | band-limit → codec
Sim. Replay | noise_rir | noise → RIR
Hybrid | rir_aac | RIR → codec

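Bona Fide · Chinese · “住房维修资金约一亿元三是已有人员和机构” (“Housing maintenance funds of about 100 million yuan; third, existing personnel and institutions”)

Family | Template | Operator chain
--- | --- | ---
Direct | direct_clean | identity (no processing)
Platform-like | aac_single | codec
Telephony | nb_mulaw | band-limit → codec
Sim. Replay | rir_reencode | RIR → re-encode
Hybrid | aac_rir | codec → RIR
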
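Spoof (Spark-TTS) · Chinese · “住房维修资金约一亿元三是已有人员和机构” (“Housing maintenance funds of about 100 million yuan; third, existing personnel and institutions”)

Family | Template | Operator chain
--- | --- | ---
Direct | direct_clean | identity (no processing)
Platform-like | aac_reencode | codec → re-encode
Telephony | nb_mulaw_plr | band-limit → codec → packet loss
Sim. Replay | rir_noise | RIR → noise
Hybrid | bandlimit_codec_rir | band-limit → codec → RIR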
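Because every card pairs delivered versions with the same parent's direct copy, a detector's sensitivity to each family can be read off as a paired score difference. The sketch below illustrates that matched-parent comparison under assumptions: the `(parent_id, family, score)` record format, the `delivery_shift` name, and the use of a plain mean are hypothetical choices, not the benchmark's robustness metric.

```python
from collections import defaultdict
from statistics import mean

def delivery_shift(records, reference="direct"):
    """Mean per-family change in detector score relative to the same parent's direct copy.

    records: iterable of (parent_id, family, score) tuples (hypothetical schema).
    Each difference compares two renditions of one parent, so content, speaker,
    and source quality are held fixed and only the delivery path varies.
    Parents without a direct reference are skipped.
    """
    records = list(records)
    ref = {pid: score for pid, fam, score in records if fam == reference}
    diffs = defaultdict(list)
    for pid, fam, score in records:
        if fam != reference and pid in ref:
            diffs[fam].append(score - ref[pid])
    return {fam: mean(d) for fam, d in sorted(diffs.items())}
```

Averaging paired differences, rather than comparing pooled scores across families, keeps parent-level variation out of the estimate; a median or per-template breakdown would be equally valid variants.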

Release, License, and Use

ChainBench-ADD distinguishes between the code and the benchmark package. The code repository, including construction scripts, configurations, and baseline implementations, is released under the MIT License. The dataset release is distributed under the ChainBench-ADD Dataset Terms of Use.

ChainBench-ADD is assembled from multiple upstream speech resources and speech-generation systems. All third-party datasets, models, audio, and other external assets remain subject to their original licenses, terms, and attribution requirements. The ChainBench-ADD Dataset Terms of Use apply to this benchmark release as distributed by the authors and do not replace or weaken any applicable upstream obligations.

ChainBench-ADD is released for research on audio deepfake detection, robustness evaluation, forensic analysis, provenance, and benchmarking. It is provided for defensive and scientific use only. The benchmark must not be used to support or enable impersonation, fraud, harassment, social engineering, unauthorized voice cloning, deceptive media generation, biometric surveillance, or the training or improvement of systems intended for deceptive speech generation.

Redistribution of the benchmark package, or any subset that includes third-party material, is permitted only to the extent allowed by all applicable upstream terms and must preserve the relevant notices and this use statement.