Amplifying Membership Signal Through Chained Regeneration
Abstract
The tendency of large generative models to memorize training data makes sample verification critical for privacy auditing and copyright enforcement. Current membership inference attacks (MIAs) and dataset inference (DI) attacks often rely on one-shot generations, which yield weak signals and limited sensitivity across modalities. Inspired by Model Autophagy Disorder (MAD), we introduce MADreMIA, a model-agnostic framework that enhances white-, gray-, and black-box MIA and DI. Rather than relying on shadow model training, which is often infeasible for large generative models, our framework enables scalable inference by exploiting signals inherent in iterative trajectories. It chains generations across diverse modalities, with each output serving as the next input, to improve membership evidence at low false positive rates (FPR). We demonstrate that memorized training samples exhibit significantly higher coherence and slower degradation during iterative regeneration than non-member samples. Our results show that MADreMIA provides richer signals across diverse model families and modalities; we present comprehensive evaluations for image autoregressive models (IARs), diffusion models, and language models, alongside preliminary results demonstrating its potential for audio models.
1 Introduction
The rapid development of generative AI has triggered a pressing demand for training data, frequently leading to the unauthorized ingestion of private, sensitive, or copyrighted content. Consequently, as generative models scale, membership inference attacks (MIAs) [28] and dataset inference (DI) [24] have become critical. Practical auditing, ranging from protecting medical privacy [49] to identifying licensed content [12] or detecting benchmark contamination [22, 30, 44], requires determining whether specific samples or datasets were used to shape a model’s parameters. The definitive test is whether a model retains a structural “echo” of its training data, manifesting as a high-fidelity memorization signal that can be surfaced through targeted inference. Existing auditing methods, however, face a significant bottleneck. Most extract evidence from a single query [46, 39] or a set of loosely coupled samples [8]. These one-shot signals are often fragile; recent evaluations on unbiased benchmarks show that many MIAs degrade significantly under distributional shifts, often performing only slightly better than random guessing [22, 12]. Furthermore, high-performance “shadow model” attacks [41, 5], which require training multiple auxiliary models to simulate the target, are computationally expensive and impractical for real-world large-scale generative architectures.
To address these limitations, we shift the perspective from a single static query to a dynamic trajectory. This concept is best illustrated through a forensic parallel: in a criminal interrogation, a suspect may maintain a lie for a single response, but that lie often collapses under the pressure of repeated, recursive follow-up questions. A truthful narrative, by contrast, remains coherent because it is grounded in a fixed reality. We argue that generative models exhibit a similar phenomenon: their “truth” is the training set. While a model can produce a plausible-looking output for a non-member sample once, it may struggle to sustain that plausibility over a recursive chain of self-generated inputs.
Our framework, MADreMIA, is inspired by the mechanics of Model Autophagy Disorder (MAD) [1, 29]. Traditionally, MAD describes a failure mode where models trained on their own synthetic outputs progressively lose variance and collapse into a state of degenerated “madness”. We pivot this phenomenon into a diagnostic, inference-time tool: if a sample was present during training, it acts as a stable “attractor” in the model’s latent space. By repeatedly feeding a model’s outputs back into itself, creating an iterative regeneration chain, we can amplify the signal of memorization.
Within this framework, we distinguish between two types of trajectories:
- Re-members: These are member samples (training data) that are iteratively re-generated. Because the model has “memorized” these points, they exhibit high stability and slow semantic degradation over time.
- Re-non-members: These are unseen samples that are iteratively re-generated. Lacking a structural anchor in the model’s weights, these samples drift rapidly toward the model’s average biases or dissolve into noise (see Figure 1).
MADreMIA functions as a modular, inference-time add-on that is intentionally method-, model-, and modality-agnostic. By measuring consistency across recursive loops, we provide richer signals across diverse architectures, including image autoregressive models (IARs), diffusion models (DMs), large language models (LLMs), and audio voice conversion models. We demonstrate that, while a single output is often too noisy to be decisive, the trajectory of a “re-member” differs from that of a “re-non-member”; this difference acts as a powerful signal amplifier, surfacing traces of training data that are otherwise invisible.
The MADreMIA iterative procedure moves beyond one-shot plausibility by probing whether the model preserves semantic and structural consistency under repeated self-interaction. Consequently, this work investigates a central research question: Can the dynamics of recursive self-generation serve as a signal amplifier to expose training data membership?
In summary, the main contributions of our paper are:
- We introduce an iterative regeneration setup to uncover data memorization that is invisible during single-pass inference.
- We show theoretically and empirically that trajectory features (generation dynamics over time) yield a statistically more robust membership signal. By functioning as a variance reduction mechanism, these features isolate the underlying membership information more effectively than standard one-shot baselines.
- We propose an inference-time, cross-modal framework that improves membership and dataset inference across vision and language models without the need for expensive shadow model training.
2 Related Works
Memorization.
Memorization in generative models — the tendency to reproduce training examples rather than generate novel samples — has been studied across multiple model families. Early work formalized the distinction between memorization, mode collapse, and overfitting [35], while subsequent studies characterized the generalization-to-memorization transition in diffusion models [15], localized it through attention patterns [26], and showed that standard evaluation metrics fail to surface it [2]. Mitigation strategies have been proposed for both LLMs [16] and text-to-image models [7].
Membership and Dataset Inference.
Individual Membership Inference Attacks (MIAs) can be confounded by distribution shifts [22], prompting a shift toward Dataset Inference, which aggregates evidence across many samples [23, 12, 19]. Shadow-model approaches [41, 5] are computationally infeasible for large architectures, so modern attacks extract signals from limited black-box outputs [46, 6, 32]. Most relevant to our work, Li et al. [20] perform MIAs on diffusion models by repeatedly perturbing a target image and comparing averaged outputs to the original. However, because these queries are independent and do not evolve with model responses, deeper structural memorization remains unexploited.
Model Collapse.
Recursive self-training in generative models leads to progressive quality and diversity degradation when insufficient real data is injected — a phenomenon termed Model Autophagy Disorder [1]. Training on model-generated data further causes tails of the original distribution to disappear [29]. Together, these works suggest that iterative generation is structurally revealing: memorized regions may persist differently from non-member examples under repeated reuse. Our method turns this insight into a privacy-auditing mechanism, exploiting chained regeneration at inference time to amplify membership-relevant differences rather than treating collapse as a training-time pathology. The extended related works section can be found in Appendix E.
3 Theory of Trajectory-Based Signal Amplification
For each sample, we define an iterative trajectory , where is the observed sample and is produced by one regeneration step. Let denote membership. Define a per-step score and the average . The attack predicts from . We use when for a constant independent of , and for two-sided bounds.
Assumption 3.1 (Signal and Noise).
(A1) There exists a sequence such that . (A2) . (A3) The centered process is geometrically mixing with effective autocorrelation time , implying .
Theorem 3.2 (Trajectory Averaging).
Under A1–A3, the signal and SNR satisfy:
Interpretation. Multi-step attacks improve when mean signal decays slowly relative to variance reduction.
Corollary 3.3 (Exponential Leakage).
If , then . If , then where . The maximizer yields an optimal .
Corollary 3.4 (Amplification Gain).
Let . At , the gain over the single-step baseline is , with .
It is worth noting that we do not claim that trajectory iteration increases the Bayes information ceiling; instead, it improves practical fixed-form statistics via temporal variance reduction.
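The trade-off between decaying signal and variance reduction can be explored numerically. The sketch below is illustrative only and rests on assumptions that are not taken from the paper: the per-step mean gap between members and non-members is modeled as `delta0 * exp(-lam * t)`, the variance of the trajectory average is approximated as `sigma2 * tau / T`, and all function and parameter names are hypothetical.

```python
import math


def trajectory_snr(T, delta0=1.0, lam=0.2, sigma2=1.0, tau=1.0):
    """Signal-to-noise ratio of a T-step score average under an assumed model.

    Assumptions (illustrative, not the paper's exact statement):
      * the member/non-member mean gap at step t is delta0 * exp(-lam * t),
        for t = 0, ..., T-1;
      * the variance of the T-step average is sigma2 * tau / T, where tau
        plays the role of an effective autocorrelation time.
    """
    mean_gap = sum(delta0 * math.exp(-lam * t) for t in range(T)) / T
    var_of_average = sigma2 * tau / T
    return mean_gap ** 2 / var_of_average


def best_horizon(max_T=200, **model_params):
    """Return the horizon T in 1..max_T with the largest assumed SNR."""
    return max(range(1, max_T + 1),
               key=lambda T: trajectory_snr(T, **model_params))


if __name__ == "__main__":
    T_star = best_horizon(lam=0.2)
    gain = trajectory_snr(T_star, lam=0.2) / trajectory_snr(1, lam=0.2)
    print("best horizon:", T_star, "gain over one step:", round(gain, 2))
```

With no decay (`lam=0.0`) the assumed SNR grows linearly in the number of steps, while any positive decay rate produces a finite best horizon, which is the qualitative behavior the interpretation above describes.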
4 Method
MADreMIA is a trajectory-augmentation framework for privacy inference on generative models. It is designed as an any-box extension of standard one-shot attacks (MIA/DI): black-box by default, gray-box when richer outputs are available, and white-box when needed. The central design principle is to keep the downstream scorer unchanged and improve only its input representation through additional trajectory-derived evidence.
Unified setup.
Following Sec. 3, for each queried sample we construct
where is the queried sample, is the audited generator, and is a modality-specific regeneration operator executed under a fixed protocol. For MIA, the label is (member/non-member). For DI, we use an analogous binary label (in-target-dataset/out-of-target-dataset).
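As a minimal sketch of the chained setup, the loop below feeds each output back in as the next input and records the distance of every regeneration to the original sample. The names `regeneration_chain`, `drift_profile`, and `toy_step` are hypothetical; `toy_step` is a stand-in for a real regeneration operator, which in practice would call the audited generator under a fixed protocol.

```python
from typing import Callable, List, TypeVar

X = TypeVar("X")


def regeneration_chain(x0: X, regen_step: Callable[[X], X], num_steps: int) -> List[X]:
    """Return [x0, x1, ..., x_T], where each element is regenerated from the previous one."""
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    chain = [x0]
    for _ in range(num_steps):
        chain.append(regen_step(chain[-1]))
    return chain


def drift_profile(chain: List[X], distance: Callable[[X, X], float]) -> List[float]:
    """Distance of every regenerated element to the original queried sample."""
    return [distance(chain[0], x) for x in chain[1:]]


def toy_step(x: float) -> float:
    """Toy stand-in for a generator: pulls the value halfway toward zero."""
    return 0.5 * x


if __name__ == "__main__":
    chain = regeneration_chain(8.0, toy_step, 3)
    print(chain)
    print(drift_profile(chain, lambda a, b: abs(a - b)))
```

Keeping the regeneration operator as a plain callable mirrors the modality-agnostic design: only `regen_step` and `distance` change between images, text, and audio.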
Threat model.
MADreMIA supports: black-box (query access to outputs only), gray-box (query access plus output-level statistics such as loss/log-probability signals), and white-box (optional access to internals/gradients when available). In all cases, the adversary/auditor has no access to training data identities (labels), performs at most regeneration steps per sample, and outputs a binary prediction via : for MIA or for DI.
Base one-shot signal.
The theory defines
A trajectory one-shot comparator corresponds to the case (using ). When available, we additionally report classical one-shot baselines . Importantly, for each modality/model, the orientation (sign) of is fixed on train data only (equivalently or ) and then frozen for test-time evaluation.
Signals and Fusion.
MADreMIA augments one-shot evidence with trajectory summaries computed from . We define
Here aggregates temporal statistics aligned with the process (e.g., drift, consistency, quality evolution, diversity, score decay, and summaries derived from and ). The fused representation is and the final attack score is with a calibrated scorer. By default, following Kowalczuk et al. [19], is an L1-regularized logistic regression fit as a plug-in estimator of .
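The fusion step can be illustrated with a small, dependency-free sketch. It fits an L1-regularized logistic regression by proximal gradient descent on toy data in which the one-shot score is weak and a single trajectory summary (mean drift) is informative. The optimizer, hyperparameters, feature choice, and synthetic data are all assumptions made for illustration; they are not the configuration used in the experiments.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_l1_logreg(X, y, l1=0.01, lr=0.1, epochs=300):
    """Fit logistic regression with an L1 penalty via proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * l1) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def fused_score(w, b, one_shot, trajectory_summary):
    """Score the concatenation [one-shot score, trajectory summaries]."""
    features = [one_shot] + list(trajectory_summary)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


def make_toy_data(n_per_class=100, seed=0):
    """Synthetic data: weak one-shot signal, members drift less (assumed)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, drift_mean in ((1, 1.0), (0, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(0.2 * label, 1.0), rng.gauss(drift_mean, 0.3)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    w, b = fit_l1_logreg(X, y)
    print("weights:", w, "bias:", b)
```

The downstream scorer sees only a longer feature vector, so the same fitting routine serves both the one-shot baseline (one feature) and the fused representation.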
Mechanism.
MADreMIA leverages the fact that members often exhibit slower average drift than non-members. Memorized samples typically lie in deeper local probability wells, causing iterative regenerations to remain closer to . Gains represent fixed-statistic SNR improvements consistent with the data processing inequality (DPI): .
4.1 Modality-specific instantiations
Image autoregressive models (IARs) and diffusion models.
is image-to-image regeneration under fixed controls (autoregressive decoding for IARs; controlled re-noise/re-denoise for diffusion, i.e., partial forward noising to a fixed noise level followed by reverse denoising under fixed scheduler/settings). Trajectory features are defined relative to , in particular , [47], and [37].
Large language models (LLMs).
is an autophagous text loop where each generation is fed back as the next prompt/input under a fixed template, fixed context-window policy (with left-sided truncation to keep only the newest text), and fixed decoding configuration. We use multiple features to measure the quality and diversity of generations, specifically: Kullback-Leibler Divergence, Jensen-Shannon Divergence, Jaccard Index, Predictive Entropy, and Logit Margin:
for . These are summarized along the trajectory and fused with . For clarity, KLD/JSD are computed on aligned token-distribution vectors: in gray/white-box settings from next-token logits, and in black-box settings from smoothed empirical token-frequency distributions under a fixed tokenizer/vocabulary. The metrics in our experiments follow the gray-box setting, but the framework itself also supports the black-box setting, which requires repeated queries per step to estimate distributions. Jaccard is computed on token sets after the same fixed preprocessing. More information about the features for vision and language models is provided in Appendix H.
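For orientation, the five named quantities can be written in a few lines of plain Python. These are the standard textbook definitions, given here as an assumption about what is being computed; the exact variants, smoothing, and normalization used in the experiments may differ, and all function names are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two aligned probability vectors (eps avoids log(0))."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def jaccard_index(tokens_a, tokens_b):
    """Overlap of two token sets: |A & B| / |A | B|."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def predictive_entropy(p):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def softmax(logits):
    """Convert logits to probabilities with the max-subtraction trick."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_margin(logits):
    """Gap between the largest and second-largest logit."""
    top, second = sorted(logits, reverse=True)[:2]
    return top - second


if __name__ == "__main__":
    p, q = softmax([2.0, 0.5, 1.0]), softmax([1.0, 1.0, 1.0])
    print(kl_divergence(p, q), js_divergence(p, q))
    print(jaccard_index(["a", "b"], ["b", "c"]), predictive_entropy(p))
    print(logit_margin([2.0, 0.5, 1.0]))
```

In a gray-box setting the probability vectors would come from next-token logits; in a black-box setting they would be smoothed empirical token frequencies, as described above.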
Audio generative models.
In the audio domain, employs iterative reconstruction loops. Notably, we do not conduct a full Membership or Dataset Inference evaluation for audio models, as the literature currently lacks proper audio benchmarks and specialized attacks tailored to the voice conversion setting. Nevertheless, to demonstrate the cross-modal generality of our framework, our first experiment explores this potential using an objective audio fidelity metric.
Across all modalities, MADreMIA follows the pipeline: .
5 Experiments
5.1 Experimental Setup
To ensure a scientifically sound evaluation across our MIA tasks, we restrict our setup to models trained on public datasets with well-defined training and test splits. We evaluate our method across three diverse modalities to demonstrate its broad applicability. For image generation, we analyze state-of-the-art autoregressive models (VAR-d{20, 24, 30} [33], RAR-{L, XL, XXL} [43]) and diffusion models (DiT-RF-{XL, G} [13], UViT-T2I-Deep [3]), trained primarily on the ImageNet [10] or COCO [36] datasets for class-conditioned and text-to-image generation. We extend this evaluation to the audio domain using modern voice conversion models (AutoVC [25], FreeVC [21]), and to the language domain using prominent LLMs (LLaMA-13B [34], Pythia-6.9B [4], OLMo-7B [14], and OPT-6.7B [48]). Comprehensive details regarding all models and datasets used in the experiments are provided in Appendices F and G. All experiments were conducted on a machine equipped with 3 NVIDIA RTX PRO 5000 Blackwell GPUs (48 GB VRAM each) and an Intel Xeon Gold 6526Y CPU.
5.2 Metrics
To measure similarity between feature representations and their fidelity, we utilize the Fréchet Inception Distance (FID) [17], and Fréchet Audio Distance (FAD) [18] for vision and audio models, respectively. For LLMs, we measure Token Diversity as the Kullback–Leibler Divergence (KLD) between the normalized average token probability distribution at the current iteration and that of the first evaluation iteration: Token Diversity at iteration (for ) is defined as the Kullback-Leibler divergence from iteration :
where and are the normalized average token probability distributions for step and step respectively.
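A small sketch of this metric follows. It averages the token probability vectors collected at each iteration, renormalizes, and reports the KL divergence of every later iteration relative to the first one. The direction of the divergence (current iteration relative to the first), the `eps` smoothing, and the function names are assumptions made for illustration.

```python
import math


def average_distribution(step_distributions):
    """Average a list of token-probability vectors and renormalize to sum to 1."""
    vocab_size = len(step_distributions[0])
    count = len(step_distributions)
    avg = [sum(dist[v] for dist in step_distributions) / count
           for v in range(vocab_size)]
    total = sum(avg)
    return [a / total for a in avg]


def token_diversity(trajectory, eps=1e-12):
    """KL(P_t || P_first) for every iteration after the first.

    `trajectory[t]` is the list of token-probability vectors observed at
    iteration t. The divergence direction is an assumption of this sketch.
    """
    reference = average_distribution(trajectory[0])
    values = []
    for step in trajectory[1:]:
        current = average_distribution(step)
        values.append(sum(c * math.log((c + eps) / (r + eps))
                          for c, r in zip(current, reference) if c > 0))
    return values


if __name__ == "__main__":
    toy_trajectory = [
        [[0.5, 0.5], [0.5, 0.5]],  # first evaluation iteration
        [[0.5, 0.5]],              # unchanged distribution
        [[0.9, 0.1]],              # distribution has drifted
    ]
    print(token_diversity(toy_trajectory))
```

A flat curve indicates that the output token distribution stays close to the first iteration, while a rising curve indicates drift.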
5.3 MIA and DI procedures
MIA pipeline.
For each labeled member/non-member sample, we generate , compute , , and modality-specific trajectory features, form , and fit . We evaluate univariate trajectory statistics by direct thresholding and multivariate features by logistic-regression fusion on strictly stratified 80/20 train-test splits. We report three established metrics: AUC, TPR at 1% FPR, and accuracy. Splitting is performed at the sample/source level before trajectory generation: all descendants of the same (all , all derived features) remain in the same partition. Thresholds, feature normalization, and LR calibration are fit on the training split only and applied unchanged to the test split. The primary endpoint is the multivariate fusion score; univariate results are reported as theory-aligned diagnostics. If is tuned, it is selected on the training split (or a train-only validation split) and never on the test split.
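The two ranking metrics can be computed directly from scores and labels. The sketch below uses the pairwise definition of AUC and a simple threshold sweep for TPR at a fixed FPR budget; it is a plain reference implementation written for readability rather than speed, and the function names are hypothetical.

```python
def roc_auc(scores, labels):
    """Probability that a random member outscores a random non-member (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Largest TPR over thresholds whose FPR does not exceed max_fpr.

    A sample is predicted to be a member when its score is >= the threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0  # a threshold above every score gives TPR = FPR = 0
    for threshold in set(scores):
        fpr = sum(n >= threshold for n in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(p >= threshold for p in pos) / len(pos)
            best = max(best, tpr)
    return best


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 0]
    print(roc_auc(scores, labels))
    print(tpr_at_fpr(scores, labels, max_fpr=0.0))
```

Note that this helper sweeps thresholds on the evaluated set to report a metric; an operating threshold used for decisions would still be fixed on the training split, as described above.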
DI pipeline.
The DI pipeline is identical, replacing the target label with dataset-origin variable . The same , , and trajectory-fusion machinery is used; only label semantics and calibration change. For DI, splitting/evaluation are performed at dataset or source-group level, and per-sample logits are aggregated by a fixed mean rule into a dataset-level score. Dataset-level decisions are evaluated against a permutation-based null over dataset labels within the evaluation fold.
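The aggregation and null comparison can be sketched as follows. Per-sample logits are averaged into a dataset-level score, and the gap between a suspect set and a reference set is compared against gaps obtained after shuffling set membership. The choice of test statistic (difference of means), the use of a reference set, the number of permutations, and all names are assumptions of this sketch, since the text specifies only a mean rule and a permutation-based null.

```python
import random
from statistics import mean


def dataset_score(per_sample_logits):
    """Aggregate per-sample logits into one dataset-level score (mean rule)."""
    return mean(per_sample_logits)


def permutation_p_value(suspect_logits, reference_logits,
                        num_permutations=2000, seed=0):
    """One-sided permutation test on the difference of dataset-level scores.

    Returns the fraction of label shufflings whose score gap is at least as
    large as the observed gap (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = dataset_score(suspect_logits) - dataset_score(reference_logits)
    pooled = list(suspect_logits) + list(reference_logits)
    k = len(suspect_logits)
    hits = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        if dataset_score(pooled[:k]) - dataset_score(pooled[k:]) >= observed:
            hits += 1
    return (hits + 1) / (num_permutations + 1)


if __name__ == "__main__":
    suspect = [2.0 + 0.1 * i for i in range(20)]
    reference = [0.1 * i for i in range(20)]
    print(permutation_p_value(suspect, reference))
```

A small p-value indicates that the suspect set scores higher than would be expected if dataset labels carried no information.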
Both MIA and DI setups inherit standard generative privacy-audit conventions, including the IAR setting introduced in [19].
5.4 Research questions
We evaluate whether chained regeneration can be a signal amplifier for one-shot auditing across modalities, model families, and access regimes. Our analysis focuses on the following questions: (Q1) What distinguishes member/non-member chained generation trajectories? (Q2) Can the one-shot membership signal be amplified for single features? What are the gains for trajectory-based over across modalities? (Q3) Does MADreMIA increase member/non-member separability compared to one-shot MIA? (Q4) Does increasing the generative model’s stochasticity during the regeneration loop affect the separation between member and non-member trajectories? (Q5) How does model size affect member/non-member trajectory signals? Finally, we also provide a short analysis of the Getty Images case [9] in Appendix K.
5.5 Members and Non-members differ in generative trajectories: qualitative results.
Across all modalities, members and non-members exhibit distinct regeneration dynamics. Members preserve structure longer and drift more slowly, while non-members degrade faster and diverge toward the model’s generic prior. This pattern is visible both in per-step qualitative examples (Figures 3(a) and 3(b)) and in aggregate divergence trajectories (Figure 2), which compare the quality of regenerations to base samples (FID for images, FAD for audio) and track the drift of the output token distribution for text models. The results support the core hypothesis that the auto-regeneration trajectory contains multiple membership cues. The key trajectory asymmetry findings are:
1. Fidelity and degradation: Re-members maintain high structural quality throughout the trajectory, whereas re-non-members exhibit rapid perceptual and semantic degradation.
2. Persistence and divergence: Re-members demonstrate significant structural persistence and coherence across iterations. Conversely, re-non-members diverge more quickly, drifting toward the model’s general distribution and losing the specific characteristics of the original input.
The asymmetry is present across diverse models and modalities.
We test broad architectural diversity: image autoregressive and diffusion models, audio voice conversion/generation models, and text generative models. Figure 2 summarizes trajectory behavior using modality-appropriate divergence metrics (Section 5.2). This design directly tests whether our proposed signal amplification is model- and modality-agnostic.
5.6 statistics may increase membership signal over one-shot .
We evaluate the validity of our theoretical assumptions using empirical generative trajectories, fixing to the first 15 iterations. As summarized in Table 2, while Assumption A2 is fully supported, A1 and A3 receive only partial empirical backing. Specifically, for certain values of , the absence of clear exponential decay within the first 15 iterations is acceptable for our main claim, since it indicates slower or plateau-like leakage. It suggests that non-exponential leakage forms may also govern real trajectories.
To assess the efficacy of modality-specific trajectory statistics, we evaluate whether aggregated trajectory evidence remains competitive with, or outperforms, the one-shot evidence. We define and show results in Table 2. Trajectory diagnostics are strong: with median gain . Given the small number of tested features, we interpret these numbers as supportive preliminary evidence.
| Family | Median | |||
|---|---|---|---|---|
| VAR | 3 | 0.67 | 1.00 | 1.00 |
| Diffusion | 3 | 0.67 | 1.00 | 1.00 |
| LLM | 5 | 0.80 | 0.80 | 1.04 |

| Family | A1 | A2 | A3 |
|---|---|---|---|
| VAR | |||
| Diffusion | |||
| LLM |
5.7 MADreMIA amplifies baseline MIA
Tables 3 and 4 compare MADreMIA-augmented attacks against their unaided baselines across LLMs and IARs. Across base attacks and model families, incorporating reconstruction Diversity (, ), Quality (, ), or both (Combined) raises attack performance in nearly all settings; the few exceptions include Min-K% with Diversity on Pythia-6.9B and OPT-6.7B, where AUC drops slightly. Gains are most pronounced on OLMo-7B, where, for example, the Zlib baseline collapses to AUC 0.179 yet recovers to 0.868 with Combined signals, and CAMIA reaches AUC 0.969, the strongest result across all settings. On the remaining LLMs the improvements are more modest. For IARs, MADreMIA yields clear gains in classification accuracy: VAR-d30 improves from 0.607 to 0.696 (+8.9 p.p.) and RAR-XXL from 0.562 to 0.713 (+15.1 p.p.), although TPR@1%FPR gains are smaller and less stable. Together, these results indicate that iterative reconstruction signals provide complementary, architecture-agnostic information that strengthens membership inference across both LLMs and IARs.
| Attack | Pythia-6.9B TPR@1%FPR | Pythia-6.9B AUC | OLMo-7B TPR@1%FPR | OLMo-7B AUC | OPT-6.7B TPR@1%FPR | OPT-6.7B AUC | Llama-13B TPR@1%FPR | Llama-13B AUC |
|---|---|---|---|---|---|---|---|---|
| Loss [42] | 0.004 ± 0.00 | 0.349 ± 0.02 | 0.008 ± 0.01 | 0.523 ± 0.02 | 0.013 ± 0.01 | 0.390 ± 0.04 | 0.009 ± 0.01 | 0.368 ± 0.04 |
| + Diversity | 0.093 ± 0.06 | 0.647 ± 0.05 | 0.303 ± 0.09 | 0.735 ± 0.04 | 0.092 ± 0.12 | 0.613 ± 0.09 | 0.173 ± 0.14 | 0.690 ± 0.08 |
| + Quality | 0.096 ± 0.07 | 0.686 ± 0.05 | 0.032 ± 0.04 | 0.702 ± 0.04 | 0.084 ± 0.09 | 0.652 ± 0.07 | 0.198 ± 0.13 | 0.679 ± 0.09 |
| + Combined | 0.100 ± 0.08 | 0.673 ± 0.06 | 0.263 ± 0.14 | 0.804 ± 0.03 | 0.112 ± 0.12 | 0.672 ± 0.09 | 0.188 ± 0.15 | 0.702 ± 0.07 |
| Zlib [5] | 0.000 ± 0.00 | 0.338 ± 0.02 | 0.022 ± 0.01 | 0.179 ± 0.01 | 0.012 ± 0.02 | 0.369 ± 0.03 | 0.009 ± 0.01 | 0.337 ± 0.03 |
| + Diversity | 0.129 ± 0.08 | 0.677 ± 0.05 | 0.318 ± 0.11 | 0.842 ± 0.03 | 0.099 ± 0.11 | 0.628 ± 0.08 | 0.176 ± 0.14 | 0.689 ± 0.07 |
| + Quality | 0.124 ± 0.08 | 0.673 ± 0.06 | 0.208 ± 0.10 | 0.833 ± 0.03 | 0.092 ± 0.10 | 0.667 ± 0.08 | 0.210 ± 0.14 | 0.688 ± 0.08 |
| + Combined | 0.128 ± 0.08 | 0.690 ± 0.06 | 0.295 ± 0.14 | 0.868 ± 0.02 | 0.121 ± 0.12 | 0.672 ± 0.08 | 0.194 ± 0.15 | 0.693 ± 0.08 |
| Min-K% [27] | 0.124 ± 0.08 | 0.680 ± 0.05 | 0.067 ± 0.07 | 0.703 ± 0.04 | 0.086 ± 0.11 | 0.650 ± 0.08 | 0.127 ± 0.11 | 0.648 ± 0.09 |
| + Diversity | 0.120 ± 0.07 | 0.677 ± 0.05 | 0.219 ± 0.08 | 0.775 ± 0.03 | 0.064 ± 0.09 | 0.640 ± 0.08 | 0.144 ± 0.13 | 0.685 ± 0.08 |
| + Quality | 0.124 ± 0.07 | 0.695 ± 0.05 | 0.095 ± 0.09 | 0.772 ± 0.03 | 0.094 ± 0.11 | 0.674 ± 0.09 | 0.178 ± 0.14 | 0.686 ± 0.08 |
| + Combined | 0.113 ± 0.07 | 0.694 ± 0.05 | 0.240 ± 0.15 | 0.837 ± 0.03 | 0.092 ± 0.10 | 0.694 ± 0.08 | 0.182 ± 0.14 | 0.700 ± 0.07 |
| CAMIA [6] | 0.111 ± 0.09 | 0.683 ± 0.05 | 0.428 ± 0.25 | 0.958 ± 0.01 | 0.128 ± 0.12 | 0.664 ± 0.08 | 0.166 ± 0.13 | 0.686 ± 0.09 |
| + Diversity | 0.118 ± 0.08 | 0.690 ± 0.05 | 0.517 ± 0.25 | 0.966 ± 0.01 | 0.104 ± 0.11 | 0.668 ± 0.08 | 0.146 ± 0.12 | 0.692 ± 0.08 |
| + Quality | 0.131 ± 0.08 | 0.708 ± 0.05 | 0.501 ± 0.26 | 0.964 ± 0.01 | 0.115 ± 0.13 | 0.682 ± 0.08 | 0.192 ± 0.14 | 0.712 ± 0.08 |
| + Combined | 0.109 ± 0.08 | 0.696 ± 0.05 | 0.553 ± 0.27 | 0.969 ± 0.01 | 0.109 ± 0.12 | 0.689 ± 0.08 | 0.176 ± 0.13 | 0.716 ± 0.08 |

| Attack | VAR-d30 TPR@1%FPR | VAR-d30 AUC | VAR-d30 ACC | RAR-XXL TPR@1%FPR | RAR-XXL AUC | RAR-XXL ACC |
|---|---|---|---|---|---|---|
| Baseline [19] | 0.040 ± 0.02 | 0.750 ± 0.02 | 0.607 ± 0.07 | 0.044 ± 0.02 | 0.754 ± 0.01 | 0.562 ± 0.02 |
| + Diversity | 0.090 ± 0.09 | 0.755 ± 0.03 | 0.691 ± 0.03 | 0.084 ± 0.06 | 0.771 ± 0.03 | 0.700 ± 0.03 |
| + Quality | 0.076 ± 0.08 | 0.757 ± 0.03 | 0.703 ± 0.03 | 0.079 ± 0.07 | 0.754 ± 0.04 | 0.703 ± 0.03 |
| + Combined | 0.088 ± 0.06 | 0.750 ± 0.04 | 0.696 ± 0.03 | 0.069 ± 0.05 | 0.775 ± 0.03 | 0.713 ± 0.03 |
5.8 MADreMIA amplifies baseline DI
The p-value histograms in Figure 4 demonstrate that MADreMIA trajectory features consistently strengthen the statistical evidence for dataset-level inference across all evaluated architectures. On Pythia-6.9B, augmented variants reach the 95% confidence threshold at around 100 samples versus roughly 150 for the baseline. Furthermore, augmented variants shift the distribution of values noticeably rightward relative to the baseline, with this pattern holding across all three signal types. The effect is more pronounced on RAR-XXL, where the Combined variant produces a substantially larger rightward shift, indicating that individual trials yield stronger and more reliable evidence for membership inference.
5.9 Sensitivity analysis of generation strength
Figure 5 shows PR curves for VAR-d30 across regeneration strengths , where controls how many final scales are regenerated. Members consistently achieve higher precision and recall than non-members across all values of , confirming that the MIA signal is robust to the choice of regeneration strength. As increases, however, the two groups converge in PR space (see Appendix J).
5.10 Trajectory asymmetry scaling across model families
As illustrated in Figure 6, the membership signal, quantified by , persists across all model scales, suggesting that the observed asymmetry is a fundamental property rather than an artifact of specific parameter regimes. While the magnitude of this separation varies across architectures, its relationship with model scale is not uniform. The separation grows stronger with model size in VAR and DiT-MoE, but remains largely unaffected by scaling in RARs. Ultimately, the underlying trend is robust: iterative trajectory chaining consistently exposes a larger membership gap compared to standard one-shot generations.
6 Conclusions
We introduced MADreMIA, a model-agnostic membership inference signal amplifier for large generative models. By chaining repeated regenerations rather than relying on a single query, MADreMIA exploits a consistent asymmetry: member samples retain coherence across iterations while non-members drift and deteriorate. This signal generalizes across image, text, and audio generators, spanning IAR, diffusion, and LLM families. Our experimental results show that fusing trajectory-derived features with baseline MIA/DI scores further improves member/non-member separability, suggesting that iterative regeneration is a broadly applicable lens for privacy auditing and copyright attribution.
Acknowledgments
We gratefully acknowledge the Polish high-performance computing infrastructure PLGrid for providing computer facilities and support within computational grant no. PLG/2025/018391. This research was partially funded by the National Science Centre, Poland, grant no. 2023/51/I/ST6/02854.
References
- [1] (2023) Self-consuming generative models go mad. In The Twelfth International Conference on Learning Representations, Cited by: §E.3, §1, §2.
- [2] (2021) On training sample memorization: lessons from benchmarking generative modeling with a large-scale competition. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, KDD ’21, New York, NY, USA, pp. 2534–2542. External Links: ISBN 9781450383325, Link, Document Cited by: §E.1, §2.
- [3] (2023) All are worth words: a vit backbone for diffusion models. In CVPR, Cited by: Appendix F, §5.1.
- [4] (2023) Pythia: a suite for analyzing large language models across training and scaling. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. Cited by: Appendix F, §5.1.
- [5] (2021) Extracting training data from large language models. In 30th USENIX security symposium (USENIX Security 21), pp. 2633–2650. Cited by: §E.2, §1, §2, Table 3.
- [6] (2025) Context-aware membership inference attacks against pre-trained large language models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 7299–7321. Cited by: §E.2, §2, Table 3.
- [7] (2025) Enhancing privacy-utility trade-offs to mitigate memorization in diffusion models. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8182–8191. External Links: Link Cited by: §E.1, §2.
- [8] (2021) Label-only membership inference attacks. In International conference on machine learning, pp. 1964–1974. Cited by: §1.
- [9] (2024) Aiming for fairness: an exploration into getty images v. stability ai and its importance in the landscape of modern copyright law. DePaul J. Art Tech. & Intell. Prop. L 34, pp. 124. Cited by: Appendix K, §5.4.
- [10] (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: Table 8, §5.1.
- [11] (2024) Do membership inference attacks work on large language models?. In Conference on Language Modeling (COLM), Cited by: Table 8, Appendix G.
- [12] (2025) Cdi: copyrighted data identification in diffusion models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 18674–18684. Cited by: §E.2, §1, §2.
- [13] (2024) Scaling diffusion transformers to 16 billion parameters. External Links: 2407.11633, Link Cited by: Appendix F, §5.1.
- [14] (2024) OLMo: accelerating the science of language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 15789–15809. Cited by: Appendix F, §5.1.
- [15] (2023) On memorization in diffusion models. arXiv preprint arXiv:2310.02664. Cited by: §E.1, §2.
- [16] (2024) Be like a goldfish, don’t memorize! mitigating memorization in generative llms. Advances in Neural Information Processing Systems 37, pp. 24022–24045. Cited by: §E.1, §2.
- [17] (2017) Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems 30. Cited by: §5.2.
- [18] (2018) Fr’echet audio distance: a metric for evaluating music enhancement algorithms. arXiv preprint arXiv:1812.08466. Cited by: §5.2.
- [19] (2025) Privacy attacks on image autoregressive models. In Forty-second International Conference on Machine Learning, External Links: Link Cited by: §E.2, §2, §4, §5.3, Table 4.
- [20] (2024) Towards black-box membership inference attack for diffusion models. CoRR abs/2405.20771. External Links: Document, 2405.20771, Link Cited by: §E.2, §2.
- [21] (2023) Freevc: towards high-quality text-free one-shot voice conversion. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. Cited by: Appendix F, §5.1.
- [22] (2024) LLM dataset inference: did you train on my dataset?. CoRR abs/2406.06443. External Links: Document, 2406.06443, Link Cited by: §E.2, §1, §2.
- [23] Reassessing emnlp 2024’s best paper: does divergence-based calibration for mias hold up?. In The Fourth Blogpost Track at ICLR 2025, Cited by: §E.2, §2.
- [24] (2021) Dataset inference: ownership resolution in machine learning. arXiv preprint arXiv:2104.10706. Cited by: §1.
- [25] (2019) Autovc: zero-shot voice style transfer with only autoencoder loss. In International Conference on Machine Learning, pp. 5210–5219. Cited by: Appendix F, §5.1.
- [26] (2024) Mitigating memorization in language models. In The Thirteenth International Conference on Learning Representations, Cited by: §E.1, §2.
- [27] (2023) Detecting pretraining data from large language models. arXiv preprint arXiv:2310.16789. Cited by: Table 3.
- [28] (2017) Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP), pp. 3–18. Cited by: §1.
- [29] (2024-07) AI models collapse when trained on recursively generated data. Nature 631, pp. 755–759. External Links: Document Cited by: §E.3, §1, §2.
- [30] (2024) Evaluation data contamination in llms: how do we measure it and (when) does it matter?. arXiv preprint arXiv:2411.03923. Cited by: §1.
- [31] (2024) Dolma: an open corpus of three trillion tokens for language model pretraining research. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 15725–15788. Cited by: Table 8.
- [32] (2025) (Token-level) InfoRMIA: stronger membership inference and memorization assessment for LLMs. CoRR abs/2510.05582. External Links: Document, 2510.05582, Link Cited by: §E.2, §2.
- [33] (2024) Visual autoregressive modeling: scalable image generation via next-scale prediction. External Links: 2404.02905, Link Cited by: Appendix F, §5.1.
- [34] (2023) Llama: open and efficient foundation language models. arXiv preprint arXiv:2302.13971. Cited by: Appendix F, §5.1.
- [35] (2021) On memorization in probabilistic deep generative models. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. W. Vaughan (Eds.), Vol. 34, pp. 27916–27928. External Links: Link Cited by: §E.1, §2.
- [36] (2016) Coco-text: dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140. Cited by: Table 8, §5.1.
- [37] (2004) Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13 (4), pp. 600–612. Cited by: §H.2, §4.1.
- [38] (2024) Detecting, explaining, and mitigating memorization in diffusion models. In The Twelfth International Conference on Learning Representations, External Links: Link Cited by: §E.1.
- [39] (2024) You only query once: an efficient label-only membership inference attack. In The Twelfth International Conference on Learning Representations, External Links: Link Cited by: §1.
- [40] (2019) CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit (version 0.92). The Rainbow Passage which the speakers read out can be found in the International Dialects of English Archive (http://web.ku.edu/~idea/readings/rainbow.htm). Cited by: Table 8.
- [41] (2022) Enhanced membership inference attacks against machine learning models. In Proceedings of the 2022 ACM SIGSAC conference on computer and communications security, pp. 3093–3106. Cited by: §E.2, §1, §2.
- [42] (2018) Privacy risk in machine learning: analyzing the connection to overfitting. In 2018 IEEE 31st computer security foundations symposium (CSF), pp. 268–282. Cited by: Table 3.
- [43] (2024) Randomized autoregressive visual generation. External Links: 2411.00776, Link Cited by: Appendix F, §5.1.
- [44] (2026) Detecting data contamination in LLMs via in-context learning. In The Fourteenth International Conference on Learning Representations, External Links: Link Cited by: Appendix G, §1.
- [45] (2019) LibriTTS: a corpus derived from librispeech for text-to-speech. In Interspeech, External Links: Link Cited by: Table 8.
- [46] (2024) Min-%++: improved baseline for detecting pre-training data from large language models. CoRR abs/2404.02936. External Links: Document, 2404.02936, Link Cited by: §E.2, §1, §2.
- [47] (2018) The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 586–595. Cited by: §H.2, §4.1.
- [48] (2022) Opt: open pre-trained transformer language models. arXiv preprint arXiv:2205.01068. Cited by: Appendix F, §5.1.
- [49] (2022) Membership inference attacks against synthetic health data. Journal of biomedical informatics 125, pp. 103977. Cited by: §1.
Appendix A Impact Statement
This work advances methods for auditing generative models by improving membership and dataset inference through chained regeneration. The primary positive impact is stronger accountability: MADreMIA can help detect memorization of sensitive, proprietary, or benchmark data, supporting privacy audits, copyright verification, and unlearning validation across model families and modalities.
While enhanced inference capabilities can assist in model auditing and transparency, they also require responsible application to avoid potential misuse. We frame MADreMIA as a tool for research evaluation, compliance monitoring, and internal red-teaming. It is important to note that our method provides statistical evidence rather than a definitive proof of data inclusion; therefore, results should be interpreted alongside additional forensic and procedural evidence within a broader data governance framework.
Appendix B Limitations
While our proposed framework is designed to be cross-modal and model-agnostic, our experimental scope is naturally constrained by several practical and theoretical factors. Most notably, we do not conduct full Membership Inference Attack (MIA) evaluations on audio generation models. Although our initial signal-degradation experiments indicate that iterative trajectory features exist in the audio domain, the literature currently lacks established single-step baselines tailored for these architectures, leaving MIA for audio models untested. Furthermore, while our framework is conceptually compatible with restricted setups, our current empirical evaluations rely on gray-box access to exact next-token logits, meaning that strictly black-box MIA remains untested in our work. Operationally, the primary limitation of our method is its scalability; the iterative regeneration loop inherently introduces a linear computational overhead by requiring multiple forward passes per sample. From a theoretical perspective, our core assumptions A1 and A3 are only partially satisfied in practice, as demonstrated by the empirical measurements in Table 2. Finally, our evaluations may be susceptible to distribution-shift confounds, where trajectory differences might stem from inherent dataset mismatches rather than pure memorization, and the exploratory findings presented in Section 5.6 are based on preliminary small-scale evidence that will require larger-scale validation in future work.
Appendix C LLM Usage
Large language models were used to improve the readability and clarity of portions of the manuscript, as well as to provide feedback during the writing and revision process. The authors verified all technical statements, citations, and claims and take full responsibility for the final content.
Appendix D Method Overview
Appendix E Extended Related Works
Our work builds upon three intersecting lines of prior research: the characterization of data memorization in generative models, the evolution of membership inference, and the dynamics of model collapse during recursive generation.
E.1 Memorization
Memorization, the tendency of generative models to reproduce training examples rather than generate novel samples, has been studied across multiple model families and from both measurement and mitigation perspectives. van den Burg and Williams [35] formalized the problem for probabilistic generative models such as VAEs, showing that memorization differs fundamentally from mode collapse and overfitting and is not captured by commonly used nearest-neighbor tests. For diffusion models, Gu et al. [15] show that the denoising score matching objective has a closed-form optimum that can only replicate training samples, and introduce the EMM metric to quantify how dataset size and model configuration govern the generalization-to-memorization transition. Sakarvadia et al. [26] localize this phenomenon through bright-ending cross-attention patterns, while the sharpness-based framework of [38] justifies score-difference memorization metrics and proposes mitigation via sharpness-aware regularization of the initial noise. The benchmarking study [2] demonstrated that standard evaluation metrics fail to surface memorization even in competitive settings. Mitigation has been tackled both for LLMs, where Hans et al. [16] propose the goldfish loss that excludes randomly sampled token subsets from the training objective to prevent verbatim reproduction, and for text-to-image diffusion models, where Chen et al. [7] address the privacy–utility tension by combining prompt re-anchoring with semantic prompt search to improve both dimensions simultaneously.
E.2 Membership/Dataset Inference
A second line of work investigates whether specific examples or datasets can be identified from model behavior. Because individual Membership Inference Attacks (MIAs) can be confounded by distribution shifts [22], recent literature often favors Dataset Inference (DI), which aggregates feature evidence across many samples to statistically detect training data usage [23, 12, 19]. Concurrently, individual MIA methods must adapt to increasingly restrictive black-box deployments. Furthermore, approaches based on training multiple shadow models to learn membership distributions [41, 5] are now computationally infeasible for massive modern architectures. Consequently, modern attacks must extract signals using only limited outputs rather than internal weights or gradients [46, 6, 32].
In these restricted settings, recent black-box attacks heavily rely on output variations. For example, Li et al. [20] perform MIAs on diffusion models by repeatedly perturbing a target image via an API, averaging the results, and comparing them to the original sample. However, in an interrogation analogy, this approach merely asks multiple paraphrased versions of the exact same question. Because the target sample is perturbed independently each time, the query does not dynamically evolve in response to the model’s previous answers, leaving deeper structural memorization unexploited.
E.3 Model Collapse
The third line of work concerns recursive self-training in generative models. Alemohammad et al. [1] showed that self-consuming generative loops lead to progressive degradation in quality or diversity when insufficient fresh real data is injected at each generation, a phenomenon they term Model Autophagy Disorder. Their analysis is especially important for our setting because it frames repeated regeneration not as a neutral operation, but as a process that can magnify latent properties of the learned distribution. Closely related, Shumailov et al. [29] showed that recursively training on model-generated data causes model collapse, where tails of the original distribution disappear and learned behaviour drifts toward degenerate approximations. Taken together, these works suggest that iterative generation is structurally revealing: under repeated reuse, memorized or high-density regions may persist differently from non-member examples, while generic outputs may drift or collapse. Our method turns this insight into a privacy-auditing mechanism: rather than studying recursive generation as a training-time pathology, we exploit chained regeneration at inference time to amplify membership-relevant differences.
Appendix F Model Details
In our experiments, we consider two vision model families: image autoregressive models (IARs) and diffusion models. The IAR category includes VAR [33] and RAR [43] variants, while the diffusion category includes DiT-MoE [13] and UViT-T2I [3]. As additional modalities, we evaluate large language models (LLMs) and voice conversion (VC) models. The LLMs include Pythia [4], OLMo [14], OPT [48], and Llama [34], while the VC models consist of AutoVC [25] and FreeVC [21]. Across all settings, we focus on representative, high-performing model variants.
| | VAR-d30 | VAR-d24 | VAR-d20 | RAR-XXL | RAR-XL | RAR-L | DiT-MoE-G | DiT-MoE-XL | UViT-T2I-Deep |
|---|---|---|---|---|---|---|---|---|---|
| Model family | IAR | IAR | IAR | IAR | IAR | IAR | Diffusion | Diffusion | Diffusion |
| Model parameters | 2.1B | 1.0B | 600M | 1.5B | 955M | 462M | 16.5B | 4.1B | 141M |
| Training epochs | 350 | 300 | 250 | 400 | 400 | 400 | — | — | — |
| FID | 1.92 | 2.33 | 2.95 | 1.48 | 1.50 | 1.70 | 1.72 | 2.10 | 5.48 |

| | OLMo | Llama | Pythia | OPT |
|---|---|---|---|---|
| Model parameters | 7B | 13B | 6.9B | 6.7B |
| Training tokens | 2.46T | 1T | 300B | 180B |

| | AutoVC | FreeVC |
|---|---|---|
| Model parameters | 28M | 39M |
| Training data (hours) | 44 | 40 |
| SMOS (seen-to-seen) | 3.5 | 4.1 |
Appendix G Dataset Details
For vision and audio models with publicly known and available train/test splits, we use those datasets. For most LLMs we use established MIA benchmarks (e.g., WikiMIA), but for OLMo we use its training set as members and the Global News dataset as the non-member set, as suggested in [44].
| Model | Members | Non-members |
|---|---|---|
| VAR | ImageNet [10] | ImageNet |
| RAR | ImageNet | ImageNet |
| DiT-MoE | ImageNet | ImageNet |
| UViT-T2I | COCO [36] | COCO |
| Pythia | Mimir [11] | Mimir |
| OLMo | Dolma [31] | Global News |
| Llama | WikiMIA | WikiMIA |
| OPT | WikiMIA | WikiMIA |
| AutoVC | VCTK [40] | LibriTTS [45] |
| FreeVC | VCTK | LibriTTS |
Importantly, for Pythia-6.9B we use the Mimir dataset [11], which consists of 6 subsets: arxiv, dm_mathematics, github, hackernews, pubmed_central, and wikipedia_(en). We concatenate all these subsets and randomly select samples from the pool. We use the ngram_7_0.2 data split. For the rest of the models, we employ their corresponding datasets' train split as members and val/test split as non-members.
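The pooling step described above can be sketched as follows. This is a minimal illustration only: the function name `pool_and_sample`, the seed, and the toy records are hypothetical, and loading the actual Mimir `ngram_7_0.2` split is not shown.

```python
import random

# Subset names as listed in the text; the records used below are placeholders.
MIMIR_SUBSETS = [
    "arxiv", "dm_mathematics", "github",
    "hackernews", "pubmed_central", "wikipedia_(en)",
]

def pool_and_sample(subsets, n_samples, seed=0):
    """Concatenate per-subset sample lists and draw n_samples without replacement."""
    pool = [(name, text) for name, texts in subsets.items() for text in texts]
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    return rng.sample(pool, n_samples)

# Hypothetical stand-in data: three short strings per subset.
toy_members = {
    name: [f"{name} member doc {i}" for i in range(3)] for name in MIMIR_SUBSETS
}
selected = pool_and_sample(toy_members, n_samples=5)
```

Keeping the subset name alongside each sample makes it possible to check afterwards how the random draw is distributed across subsets.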
Appendix H Metrics Details
The following metrics are computed over the sequence of model outputs collected across MADreMIA iterations, capturing how the model’s generative behavior evolves under repeated generation.
H.1 Features for Language Models
- Jaccard Similarity:
  $$J_t = \frac{|G_3(y_t) \cap G_3(y_0)|}{|G_3(y_t) \cup G_3(y_0)|}$$
  Measures the lexical overlap between the model's output $y_t$ at iteration $t$ and its initial response $y_0$, computed over the trigram sets $G_3(\cdot)$. A high Jaccard similarity indicates that the model rigidly reproduces the same surface forms across iterations, which is characteristic of memorized content.
- Token Diversity:
  $$D_{\mathrm{KL}}(P_t \,\|\, P_0) = \sum_{v \in \mathcal{V}} P_t(v) \log \frac{P_t(v)}{P_0(v)}$$
  Quantifies the divergence between the token probability distribution $P_t$ at the current iteration and the initial distribution $P_0$. Large values indicate that the model's vocabulary preferences shift substantially during reconstruction, reflecting instability in its output distribution.
- Token Distribution Shift:
  $$\mathrm{JSD}(P_t \,\|\, P_0) = \tfrac{1}{2} D_{\mathrm{KL}}(P_t \,\|\, M) + \tfrac{1}{2} D_{\mathrm{KL}}(P_0 \,\|\, M), \qquad M = \tfrac{1}{2}(P_t + P_0)$$
  We define it as the Jensen-Shannon divergence (JSD), a symmetric and bounded variant of the KL divergence (KLD) that measures the distributional distance between $P_t$ and $P_0$ via their mixture $M$. Compared to KLD, JSD is well-defined even when the supports of $P_t$ and $P_0$ do not fully overlap, making it a more numerically stable measure of distributional drift across iterations.
- Predictive Entropy:
  $$H(p) = -\sum_{v \in \mathcal{V}} p(v) \log p(v)$$
  Measures the uncertainty of the model's next-token distribution $p$ over the full vocabulary $\mathcal{V}$. Low entropy indicates that the model assigns high probability mass to a single token, consistent with confident, memorized reproduction, whereas high entropy reflects diffuse, uncertain predictions.
- Margin:
  $$\Delta = p_{(1)} - p_{(2)}$$
  Captures the decisiveness of the model's token predictions by computing the difference in probability between the top-ranked token, $p_{(1)}$, and the second-ranked token, $p_{(2)}$. A large margin indicates high confidence in a specific token, which may signal memorized recall, while a small margin reflects genuine uncertainty between competing continuations.
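The language-model features above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, token distributions are assumed to be probability vectors over a shared vocabulary, the KL direction (current iteration against the initial one) is an assumption, and the small `eps` smoothing is an assumption added to avoid division by zero.

```python
import math

def trigrams(tokens):
    """Set of consecutive token triples in a sequence."""
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}

def jaccard_trigram(tokens_t, tokens_0):
    """Trigram Jaccard overlap between iteration-t output and the initial output."""
    a, b = trigrams(tokens_t), trigrams(tokens_0)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); eps smoothing is an assumption for numerical safety."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence via the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def predictive_entropy(p):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def margin(p):
    """Probability gap between the top-ranked and second-ranked tokens."""
    top = sorted(p, reverse=True)
    return top[0] - top[1]

# Toy usage with hand-written inputs.
overlap = jaccard_trigram("a b c d e".split(), "a b c d f".split())
drift = js_divergence([1.0, 0.0], [0.0, 1.0])
```

In the toy usage, the two sequences share two of four distinct trigrams, and the two distributions have disjoint supports, which is the case where JSD stays finite while plain KL would not.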
H.2 Features for Vision Models
- Mean Squared Error (MSE):
  $$\mathrm{MSE}(x_t, x_0) = \frac{1}{N} \sum_{i=1}^{N} \left(x_{t,i} - x_{0,i}\right)^2$$
  Measures the average pixel-level reconstruction error between the generated image $x_t$ at iteration $t$ and the original input $x_0$, where $N$ is the number of pixel values. Lower MSE indicates that the model consistently reproduces fine-grained pixel details across iterations, which is a strong signal of memorization.
- Structural Similarity Index Measure (SSIM) [37]:
  $$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
  Evaluates perceptual similarity between the reconstructed image $x = x_t$ and the original $y = x_0$ by jointly comparing luminance, contrast, and structural information across local image patches. Unlike MSE, SSIM is sensitive to perceptual distortions that are meaningful to human observers, and its stability across iterations serves as a complementary signal to pixel-level metrics. Here $\mu_x$, $\mu_y$ are local means, $\sigma_x^2$, $\sigma_y^2$ are local variances, $\sigma_{xy}$ is the cross-covariance, and $C_1$, $C_2$ are stabilization constants.
- Learned Perceptual Image Patch Similarity (LPIPS) [47]:
  $$\mathrm{LPIPS}(x_t, x_0) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w} \left\| w_l \odot \left(\phi_l(x_t)_{hw} - \phi_l(x_0)_{hw}\right) \right\|_2^2$$
  Quantifies perceptual dissimilarity between $x_t$ and $x_0$ using deep feature representations extracted from a pretrained network $\phi$. By operating in a learned feature space rather than pixel space, LPIPS captures high-level semantic and textural differences that are invisible to MSE or SSIM, making it particularly sensitive to cases where a model reproduces semantic content while varying low-level details. Here $\phi_l$ denotes the feature map at layer $l$ of the pretrained network, $H_l \times W_l$ is its spatial resolution, and $w_l$ are learned channel-wise weights.
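As a rough illustration of the two pixel-space features, the sketch below computes MSE and a simplified SSIM in pure Python. It rests on stated assumptions: images are flattened to equal-length lists of intensities in [0, 1], the SSIM is evaluated over a single window covering the whole image rather than over local patches as in [37], and the constants `k1 = 0.01`, `k2 = 0.03` are the commonly used defaults. LPIPS is omitted because it requires a pretrained network. The function names are hypothetical.

```python
def mse(x, y):
    """Mean squared error between two flattened images of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM formula over one global window (a simplification of patch-wise SSIM)."""
    n = len(x)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2  # stabilization constants
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Toy 2x2 images, flattened; values are placeholders.
original = [0.1, 0.4, 0.8, 0.3]
regenerated = [0.2, 0.4, 0.7, 0.3]
err = mse(regenerated, original)
sim = global_ssim(regenerated, original)
```

A patch-wise SSIM would apply the same formula inside a sliding window and average the results; the global variant is used here only to keep the example short.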
Appendix I Additional Dataset Inference Results
Figure 8 extends our dataset inference evaluation to Llama-13B and VAR-d30. On Llama-13B, augmented variants reach the 95% confidence threshold faster than the baseline, with the Combined and Quality signals leading, though convergence is noisier at low sample counts. On VAR-d30, the benefit is more pronounced: augmented variants cross the threshold at roughly 100 samples compared to over 200 for the baseline, with all three signal types outperforming it consistently. The significance histograms corroborate these findings: the Combined variant shifts the distribution rightward on both models, confirming that trajectory features yield stronger per-trial evidence.
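To make the "samples needed to reach the threshold" reading concrete, the sketch below estimates the smallest sample count at which a one-sided two-sample test rejects at the 5% level. Several things here are assumptions rather than details from the paper: the test is a Welch-style statistic with a normal approximation (a t-distribution would be more exact for small samples), the scores are synthetic Gaussian placeholders, and the names `welch_p_value` and `samples_to_threshold` are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

def welch_p_value(member_scores, nonmember_scores):
    """One-sided p-value for H1: mean(member) > mean(non-member), normal approximation."""
    n1, n2 = len(member_scores), len(nonmember_scores)
    se = math.sqrt(variance(member_scores) / n1 + variance(nonmember_scores) / n2)
    z = (mean(member_scores) - mean(nonmember_scores)) / se
    return 1.0 - NormalDist().cdf(z)

def samples_to_threshold(member_scores, nonmember_scores, alpha=0.05, step=10):
    """Smallest prefix size (in multiples of step) with p < alpha, or None if never reached."""
    limit = min(len(member_scores), len(nonmember_scores))
    for n in range(step, limit + 1, step):
        if welch_p_value(member_scores[:n], nonmember_scores[:n]) < alpha:
            return n
    return None

# Synthetic placeholder scores: members shifted upward by 0.5 (not real results).
rng = random.Random(0)
members = [rng.gauss(0.5, 1.0) for _ in range(200)]
nonmembers = [rng.gauss(0.0, 1.0) for _ in range(200)]
needed = samples_to_threshold(members, nonmembers)
```

Under this reading, a stronger per-sample signal shows up as a smaller value of `needed`, which is the comparison the convergence curves summarize.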
Appendix J Precision and Recall for Generative Models
Figure 9 shows Precision and Recall across iterations for VAR-d30 and DiT-MoE-XL. For both models and both metrics, members consistently score higher than non-members throughout all iterations, confirming that the membership signal is stable and model-agnostic. Notably, the gap between members and non-members widens as iterations progress, indicating that chained regeneration amplifies the underlying asymmetry rather than merely preserving it.
Appendix K Getty Images Case
As a practical case study, we consider the Getty Images v. Stability AI dispute [9] and evaluate whether chained regeneration can distinguish images that are plausibly associated with the Stable Diffusion training distribution from images that are very unlikely to have been included. We use Stable Diffusion 1.5 as the target model. For the positive pool, we extract 2,000 images from LAION-2B whose metadata contains the string gettyimages and treat them as members. For the negative pool, we collect 2,000 images from the Getty Images website whose upload date is after January 1, 2025, and treat them as non-members. Because these images post-date the original Stable Diffusion 1.5 training era (late 2022), they provide a conservative practical control group for this experiment.
For each pool, we run the same chained-regeneration procedure for 15 iterations and summarize the trajectories with SSIM and reconstruction error (MSE) (see Figure 10). The SSIM plot measures whether regenerations remain structurally closer to the initial query for the member pool than for the non-member pool. The MSE plot provides a complementary pixel-level view across regeneration depth by measuring how quickly reconstructed samples drift away from their reference images. In our experiments, the two pools remain visibly separated under both SSIM and MSE. We do not use FID in this case, because FID estimates are very unstable on pools of only 2,000 images. We still interpret MSE conservatively: it is sensitive to low-level reconstruction error rather than semantic fidelity alone. For this reason, we use MSE as a stable auxiliary trajectory measure across iterations, while SSIM remains the more directly interpretable structural signal in this case study.
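The trajectory computation described above can be sketched generically. In this sketch the regeneration step is a hypothetical stand-in: `toy_regenerate` simply pulls every pixel 10% of the way toward mid-gray, whereas the case study would call the target model (Stable Diffusion 1.5) at that point. The names `chained_trajectory`, `toy_regenerate`, and `mse` are illustrative assumptions.

```python
def mse(x, y):
    """Mean squared error between two flattened images of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def chained_trajectory(x0, regenerate, metric, n_iters=15):
    """Feed each output back in as the next input; score every step against x0."""
    trajectory = []
    x = x0
    for _ in range(n_iters):
        x = regenerate(x)                 # output of step t becomes input of step t+1
        trajectory.append(metric(x, x0))  # always compare with the initial query
    return trajectory

def toy_regenerate(x):
    """Hypothetical model stand-in: contract each pixel 10% toward mid-gray (0.5)."""
    return [0.9 * v + 0.1 * 0.5 for v in x]

query = [0.0, 0.25, 0.75, 1.0]  # placeholder flattened image
mse_curve = chained_trajectory(query, toy_regenerate, mse, n_iters=15)
```

Running the same loop on a member pool and a non-member pool and averaging the per-iteration values gives curves of the kind compared in this case study; swapping `mse` for a similarity function yields the SSIM view.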
Appendix L Proofs for Section 3
L.1 Proof of Theorem 3.2
Proof. By definition,
Hence
Under A1,
so
Therefore
For the denominator, A3 gives, for each class ,
Hence
Combining with the lower bound on ,
This proves Theorem 3.2. ∎
L.2 Proof of Corollary 3.3
Proof. Assume . Then
Since for , with we get
Therefore
By Theorem 3.2,
where
and absorbs only -independent constants (including and comparability constants).
To optimize the shape in , differentiate:
For , critical points satisfy
This has a unique positive solution , so the surrogate shape is maximized at
∎
L.3 Proof of Corollary 3.4 (shape-constant clarification)
Proof. From the previous corollary (under the same comparability regime),
Assume additionally
so . Taking square roots and ratio:
where
Thus is the idealized shape constant; additional model-dependent prefactors remain absorbed by . ∎
L.4 Additional comments on Bayes-cap statement at the end of Section 3
If membership is deterministic in the initial sample, , then , so
Also, conditioning on already determines , hence
which implies
Therefore, by the chain rule for mutual information,
Thus trajectory iteration cannot increase Bayes-optimal information; it can improve practical fixed-form statistics through variance reduction and temporal aggregation.
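For readability, the argument of this subsection can be written out in one chain. The notation is an assumption and may differ from that of Section 3: $m$ denotes the membership label, $x_0$ the initial sample, $x_{1:T}$ the regenerated trajectory, and $m = g(x_0)$ for some deterministic function $g$.

```latex
\begin{align*}
H(m \mid x_0) = 0
  &\;\Longrightarrow\; I(m; x_0) = H(m) - H(m \mid x_0) = H(m), \\
0 \le H(m \mid x_0, x_{1:T}) \le H(m \mid x_0) = 0
  &\;\Longrightarrow\; H(m \mid x_0, x_{1:T}) = 0, \\
I(m; x_{1:T} \mid x_0)
  &= H(m \mid x_0) - H(m \mid x_0, x_{1:T}) = 0, \\
I(m; x_{0:T})
  &= I(m; x_0) + I(m; x_{1:T} \mid x_0) = I(m; x_0) = H(m).
\end{align*}
```

The second line uses only the fact that conditioning cannot increase entropy, and the last line is the chain rule for mutual information applied to the pair $(x_0, x_{1:T})$.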