Amplifying Membership Signal
Through Chained Regeneration

Wojciech Łapacz
Warsaw University of Technology
Stanisław Pawlak¹
Warsaw University of Technology
¹ Equal contribution. Contact: wojciech.lapacz02@gmail.com

Abstract

The tendency of large generative models to memorize training data makes sample verification critical for privacy auditing and copyright enforcement. Current membership inference attacks (MIAs) and dataset inference (DI) methods often rely on one-shot generations, which yield weak signals and limited sensitivity across modalities. Inspired by Model Autophagy Disorder (MAD), we introduce MADreMIA, a model-agnostic framework that enhances white-, gray-, and black-box MIA and DI. Rather than relying on shadow model training – often infeasible for large generative models – our framework enables scalable inference by leveraging inherent signals along iterative trajectories. The process chains generations across diverse modalities, with each output serving as the subsequent input, to improve membership evidence at low false positive rates (FPR). We demonstrate that memorized training samples exhibit significantly higher coherence and slower degradation during iterative regeneration than non-member samples. Our results show that MADreMIA provides richer signals across diverse model families and modalities; we present comprehensive evaluations for image autoregressive models (IARs), diffusion models, and language models, alongside preliminary results demonstrating its potential for audio models.

1 Introduction

The rapid development of generative AI has triggered a pressing demand for training data, frequently leading to the unauthorized ingestion of private, sensitive, or copyrighted content. Consequently, as generative models scale, membership inference attacks (MIAs) [28] and dataset inference (DI) [24] have become critically important. Practical auditing – ranging from protecting medical privacy [49] to identifying licensed content [12] or detecting benchmark contamination [22, 30, 44] – requires determining whether specific samples or datasets were used to shape a model’s parameters. The definitive test is whether a model retains a structural “echo” of its training data, manifesting as a high-fidelity memorization signal that can be surfaced through targeted inference. Existing auditing methods, however, face a significant bottleneck. Most extract evidence from a single query [46, 39] or a set of loosely coupled samples [8]. These one-shot signals are often fragile; recent evaluations on unbiased benchmarks show that many MIAs degrade significantly under distributional shifts, often performing only slightly better than random guessing [22, 12]. Furthermore, high-performance “shadow model” attacks [41, 5] – which require training multiple auxiliary models to simulate the target – are computationally expensive and impractical for real-world large-scale generative architectures.

Figure 1: Comparison between a conventional one-shot membership inference attack and our chained-generation approach. The former uses a single query, which yields a weak signal that often fails to separate members from non-members. In the latter, each generation informs the next query, progressively amplifying membership evidence and improving separability: re-members are more coherent and degrade more slowly than re-non-members.

To address these limitations, we shift the perspective from a single static query to a dynamic trajectory. This concept is best illustrated through a forensic parallel: in a criminal interrogation, a suspect may maintain a lie for a single response, but that lie often collapses under the pressure of repeated, recursive follow-up questions. A truthful narrative, by contrast, remains coherent because it is grounded in a fixed reality. We argue that generative models exhibit a similar phenomenon – their “truth” is the training set. While a model can produce a plausible-looking output for a non-member sample once, it may struggle to sustain that plausibility over a recursive chain of self-generated inputs.

Our framework, MADreMIA, is inspired by the mechanics of Model Autophagy Disorder (MAD) [1, 29]. Traditionally, MAD describes a failure mode where models trained on their own synthetic outputs progressively lose variance and collapse into a state of degenerate “madness”. We turn this phenomenon into a diagnostic, inference-time tool: if a sample was present during training, it acts as a stable “attractor” in the model’s latent space. By repeatedly feeding a model’s outputs back into itself – creating an iterative regeneration chain – we can amplify the signal of memorization.

Within this framework, we distinguish between two types of trajectories:

• Re-members: These are member samples (training data) that are iteratively re-generated. Because the model has “memorized” these points, they exhibit high stability and slow semantic degradation over time.

• Re-non-members: These are unseen samples that are iteratively re-generated. Lacking a structural anchor in the model’s weights, these samples drift rapidly toward the model’s average biases or dissolve into noise (see Figure 1).

MADreMIA functions as a modular, inference-time add-on that is intentionally method-, model-, and modality-agnostic. By measuring consistency across recursive loops, we provide richer signals across diverse architectures, including image autoregressive models (IARs), diffusion models (DMs), large language models (LLMs), and audio voice conversion models. We demonstrate that while a single output is often too noisy to be decisive, the trajectory of a re-member differs from that of a re-non-member, so the trajectory acts as a powerful signal amplifier, surfacing traces of training data that are otherwise invisible.

The iterative procedure of MADreMIA moves beyond one-shot plausibility by probing whether the model preserves semantic and structural consistency under repeated self-interaction. Consequently, this work investigates a central research question: Can the dynamics of recursive self-generation serve as a signal amplifier to expose training data membership?

In summary, the main contributions of our paper are:

• We introduce an iterative regeneration setup to uncover data memorization invisible during single-pass inference.

• We show theoretically and empirically that trajectory features (generation dynamics over time) yield a significantly more statistically robust membership signal. By functioning as a variance reduction mechanism, these features isolate the underlying membership information much more effectively than standard one-shot baselines.

• We propose an inference-time, cross-modal framework that improves Membership and Dataset Inference efficiency across Vision and Language models without the need for expensive shadow model training.

2 Related Works

Memorization.

Memorization in generative models — the tendency to reproduce training examples rather than generate novel samples — has been studied across multiple model families. Early work formalized the distinction between memorization, mode collapse, and overfitting [35], while subsequent studies characterized the generalization-to-memorization transition in diffusion models [15], localized it through attention patterns [26], and showed that standard evaluation metrics fail to surface it [2]. Mitigation strategies have been proposed for both LLMs [16] and text-to-image models [7].

Membership and Dataset Inference.

Individual Membership Inference Attacks (MIAs) can be confounded by distribution shifts [22], prompting a shift toward Dataset Inference, which aggregates evidence across many samples [23, 12, 19]. Shadow-model approaches [41, 5] are now computationally infeasible for large architectures, so modern attacks extract signals from limited black-box outputs [46, 6, 32]. Most relevant to our work, Li et al. [20] perform MIAs on diffusion models by repeatedly perturbing a target image and comparing averaged outputs to the original. However, since these queries are independent and do not evolve with model responses, deeper structural memorization remains unexploited.

Model Collapse.

Recursive self-training in generative models leads to progressive quality and diversity degradation when insufficient real data is injected — a phenomenon termed Model Autophagy Disorder [1]. Training on model-generated data further causes tails of the original distribution to disappear [29]. Together, these works suggest that iterative generation is structurally revealing: memorized regions may persist differently from non-member examples under repeated reuse. Our method turns this insight into a privacy-auditing mechanism, exploiting chained regeneration at inference time to amplify membership-relevant differences rather than treating collapse as a training-time pathology. The extended related works section can be found in Appendix E.

3 Theory of Trajectory-Based Signal Amplification

For each sample, we define an iterative trajectory $Z_{0},Z_{1},\dots,Z_{T}$ , where $Z_{0}$ is the observed sample and $Z_{t+1}$ is produced by one regeneration step. Let $M\in\{0,1\}$ denote membership. Define a per-step score $\phi_{t}:=\phi(Z_{t},Z_{t+1})$ and the average $S_{T}:=\frac{1}{T}\sum_{t=0}^{T-1}\phi_{t}$ . The attack predicts $M$ from $S_{T}$ . We use $a_{T}\gtrsim b_{T}$ when $a_{T}\geq c\,b_{T}$ for a constant $c>0$ independent of $T$ , and $a_{T}\asymp b_{T}$ for two-sided bounds.

Assumption 3.1 (Signal and Noise).

(A1) There exists a sequence $(\Delta_{t}\geq 0)$ such that $\mathbb{E}[\phi_{t}\mid M=1]-\mathbb{E}[\phi_{t}\mid M=0]\geq\Delta_{t}$ . (A2) $\max_{m}\sup_{t}\mathrm{Var}(\phi_{t}\mid M=m)\leq\sigma^{2}<\infty$ . (A3) The centered process $\tilde{\phi}_{t}:=\phi_{t}-\mathbb{E}[\phi_{t}\mid M]$ is geometrically mixing with effective autocorrelation time $\tau_{\mathrm{eff}}$ , implying $\mathrm{Var}(S_{T}\mid M)\leq C\frac{\sigma^{2}\tau_{\mathrm{eff}}}{T}$ .

Theorem 3.2 (Trajectory Averaging).

Under A1–A3, the signal $\Gamma_{T}:=|\mathbb{E}[S_{T}\mid M=1]-\mathbb{E}[S_{T}\mid M=0]|$ and SNR satisfy:

\Gamma_{T}\geq\frac{1}{T}\sum_{t=0}^{T-1}\Delta_{t},\quad\mathrm{SNR}^{2}(S_{T}):=\frac{\Gamma_{T}^{2}}{\max_{m}\mathrm{Var}(S_{T}\mid M=m)}\geq\frac{(\frac{1}{T}\sum\Delta_{t})^{2}}{C\sigma^{2}\tau_{\mathrm{eff}}/T}.

Interpretation. Multi-step attacks improve when mean signal decays slowly relative to variance reduction.
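The variance-reduction term in Theorem 3.2 can be illustrated with a small simulation. The sketch below models the centered per-step noise as a stationary AR(1) process, which is one simple example of a geometrically mixing sequence. The AR(1) model, the function names, and the parameter values (`rho`, `sigma`, `n_chains`) are illustrative assumptions and not the protocol used in the experiments.

```python
import random
import statistics


def simulate_centered_scores(T, rho, sigma, rng):
    """Draw T steps of stationary AR(1) noise with marginal variance sigma^2."""
    innovation_sd = sigma * (1.0 - rho ** 2) ** 0.5
    x = rng.gauss(0.0, sigma)  # start in the stationary distribution
    scores = []
    for _ in range(T):
        scores.append(x)
        x = rho * x + rng.gauss(0.0, innovation_sd)
    return scores


def variance_of_average(T, rho=0.5, sigma=1.0, n_chains=2000, seed=0):
    """Monte Carlo estimate of Var(S_T) for the centered process."""
    rng = random.Random(seed)
    averages = [
        statistics.fmean(simulate_centered_scores(T, rho, sigma, rng))
        for _ in range(n_chains)
    ]
    return statistics.pvariance(averages)


if __name__ == "__main__":
    for T in (1, 5, 20):
        print(T, round(variance_of_average(T), 3))
```

Under these assumptions the estimated variance of $S_{T}$ shrinks roughly in proportion to $1/T$, with a constant that grows as `rho` (and hence the autocorrelation time) increases.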

Corollary 3.3 (Exponential Leakage).

If $\Delta_{t}=\Delta_{0}e^{-t/\tau_{g}}$ , then $\Gamma_{T}\geq\Delta_{0}\frac{1-e^{-T/\tau_{g}}}{T/\tau_{g}}$ . If $\Gamma_{T}\asymp\Delta_{0}\frac{1-e^{-T/\tau_{g}}}{T/\tau_{g}}$ , then $\mathrm{SNR}^{2}(S_{T})\gtrsim g(T/\tau_{g})$ where $g(x):=\frac{(1-e^{-x})^{2}}{x}$ . The maximizer $x^{\star}\approx 1.2564$ yields an optimal $T^{\star}\approx 1.2564\,\tau_{g}$ .

Corollary 3.4 (Amplification Gain).

Let $\kappa:=\tau_{g}/\tau_{\mathrm{eff}}$ . At $T=T^{\star}$ , the gain over the single-step baseline $S_{1}$ is $\frac{\mathrm{SNR}(S_{T^{\star}})}{\mathrm{SNR}(S_{1})}\gtrsim c\sqrt{\kappa}$ , with $c\approx 0.638$ .
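The constants in Corollaries 3.3 and 3.4 can be checked numerically. Setting $g'(x)=0$ for $x>0$ reduces to $e^{x}=1+2x$, which the minimal sketch below solves by bisection; the function names and the bracketing interval are choices made here for illustration.

```python
import math


def g(x: float) -> float:
    """g(x) = (1 - exp(-x))^2 / x from Corollary 3.3."""
    return (1.0 - math.exp(-x)) ** 2 / x


def stationarity(x: float) -> float:
    """g'(x) = 0 for x > 0 is equivalent to exp(x) - 1 - 2x = 0."""
    return math.exp(x) - 1.0 - 2.0 * x


def find_x_star(lo: float = 0.5, hi: float = 3.0, tol: float = 1e-12) -> float:
    """Bisection on the stationarity condition; it is negative at lo, positive at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stationarity(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_star = find_x_star()
c = math.sqrt(g(x_star))
print(f"x* = {x_star:.4f}, sqrt(g(x*)) = {c:.3f}")
```

The root agrees with $x^{\star}\approx 1.2564$, and $\sqrt{g(x^{\star})}$ evaluates to about $0.638$, matching the constant $c$ quoted in Corollary 3.4.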

It is worth noting that we do not claim that trajectory iteration increases the Bayes information ceiling $I(M;Z_{0})$ ; instead, it improves practical fixed-form statistics via temporal variance reduction.

This theory applies to any iterative protocol satisfying A1–A3. Theorem 3.2 provides a conditional amplification guarantee. We present proofs in Appendix L.

4 Method

MADreMIA is a trajectory-augmentation framework for privacy inference on generative models. It is designed as an any-box extension of standard one-shot attacks (MIA/DI): black-box by default, gray-box when richer outputs are available, and white-box when needed. The central design principle is to keep the downstream scorer unchanged and improve only its input representation through additional trajectory-derived evidence.

Unified setup.

Following Sec. 3, for each queried sample we construct

Z_{0},Z_{1},\dots,Z_{T},\qquad Z_{t+1}=\mathcal{R}(f,Z_{t}),\;t=0,\dots,T-1,

where $Z_{0}=x$ is the queried sample, $f$ is the audited generator, and $\mathcal{R}$ is a modality-specific regeneration operator executed under a fixed protocol. For MIA, the label is $M\in\{0,1\}$ (member/non-member). For DI, we use an analogous binary label $D\in\{0,1\}$ (in-target-dataset/out-of-target-dataset).
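The recursion above can be sketched in a few lines. In the following minimal example, `regenerate` stands in for $\mathcal{R}(f,\cdot)$ and `phi` for the per-step score $\phi$; both are hypothetical callables supplied by the user, and the toy scalar operator in the usage lines is only a placeholder for a real generator.

```python
from typing import Callable, List, TypeVar

Sample = TypeVar("Sample")


def build_trajectory(
    x: Sample, regenerate: Callable[[Sample], Sample], T: int
) -> List[Sample]:
    """Return [Z_0, ..., Z_T] with Z_0 = x and Z_{t+1} = regenerate(Z_t)."""
    trajectory = [x]
    for _ in range(T):
        trajectory.append(regenerate(trajectory[-1]))
    return trajectory


def average_step_score(
    trajectory: List[Sample], phi: Callable[[Sample, Sample], float]
) -> float:
    """S_T = (1/T) * sum_{t=0}^{T-1} phi(Z_t, Z_{t+1})."""
    T = len(trajectory) - 1
    return sum(phi(trajectory[t], trajectory[t + 1]) for t in range(T)) / T


# Toy usage: a contraction toward 2.0 plays the role of the regeneration step,
# and the score is the negated step size (larger means more stable).
toy_regenerate = lambda z: 0.5 * z + 1.0
toy_phi = lambda a, b: -abs(a - b)

stable = build_trajectory(2.0, toy_regenerate, T=3)
drifting = build_trajectory(10.0, toy_regenerate, T=3)
print(average_step_score(stable, toy_phi), average_step_score(drifting, toy_phi))
```

In this toy setting the sample that already sits at the fixed point scores higher than the one that drifts, which mirrors the intended behavior of re-members versus re-non-members without modeling any real generator.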

Threat model.

MADreMIA supports: black-box (query access to $f$ outputs only), gray-box (query access plus output-level statistics such as loss/log-probability signals), and white-box (optional access to internals/gradients when available). In all cases, the adversary/auditor has no access to training data identities (labels), performs at most $T$ regeneration steps per sample, and outputs a binary prediction via $h$ : $M$ for MIA or $D$ for DI.

Base one-shot signal.

The theory defines $\phi_{t}:=\phi(Z_{t},Z_{t+1}),\qquad S_{T}:=\frac{1}{T}\sum_{t=0}^{T-1}\phi_{t}.$

A trajectory one-shot comparator corresponds to the $T=1$ case (using $\phi_{0}=\phi(Z_{0},Z_{1})$ ). When available, we additionally report classical one-shot baselines $z_{\mathrm{base}}=\phi_{\mathrm{base}}(Z_{0})$ . Importantly, for each modality/model, the orientation (sign) of $\phi_{t}$ is fixed on train data only (equivalently $\phi_{t}$ or $-\phi_{t}$ ) and then frozen for test-time evaluation.

Signals and Fusion.

MADreMIA augments one-shot evidence with trajectory summaries computed from $(Z_{0},\dots,Z_{T})$ . We define

z_{\mathrm{base}}=\phi_{\mathrm{base}}(Z_{0})\in\mathbb{R}^{d},\qquad z_{\mathrm{traj}}=\psi(Z_{0},\dots,Z_{T})\in\mathbb{R}^{k},

Here $\psi$ aggregates temporal statistics aligned with the $\phi_{t}$ process (e.g., drift, consistency, quality evolution, diversity, score decay, and summaries derived from $\{\phi_{t}\}_{t=0}^{T-1}$ and $S_{T}$ ). The fused representation is $\tilde{z}=[z_{\mathrm{base}}\|z_{\mathrm{traj}}]\in\mathbb{R}^{d+k},$ and the final attack score is $s(Z_{0})=h(\tilde{z}),$ with $h$ a calibrated scorer. By default, following Kowalczuk et al. [19], $h$ is an L1-regularized logistic regression fit as a plug-in estimator of $P(M=1\mid\tilde{z})$ .
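As a rough illustration of the fusion step, the sketch below builds $\tilde{z}$ by concatenating a base feature vector with a handful of trajectory summaries of $\{\phi_{t}\}$. The particular summaries chosen here (mean, standard deviation, least-squares slope, and end-to-start change) are assumptions for illustration; the actual feature sets are described in Appendix H, and the scorer $h$ is not implemented here.

```python
import statistics
from typing import List, Sequence


def trajectory_summary(phis: Sequence[float]) -> List[float]:
    """Hypothetical psi: [mean (S_T), std, linear slope over t, last minus first]."""
    T = len(phis)
    mean = statistics.fmean(phis)
    std = statistics.pstdev(phis)
    t_mean = (T - 1) / 2.0
    denom = sum((t - t_mean) ** 2 for t in range(T))
    # Least-squares slope of phi_t against t; zero when only one step exists.
    slope = (
        sum((t - t_mean) * (p - mean) for t, p in enumerate(phis)) / denom
        if denom > 0
        else 0.0
    )
    return [mean, std, slope, phis[-1] - phis[0]]


def fuse(z_base: Sequence[float], phis: Sequence[float]) -> List[float]:
    """Concatenate [z_base || z_traj] into a vector of length d + k."""
    return list(z_base) + trajectory_summary(phis)


fused = fuse([0.3], [1.0, 2.0, 3.0])
print(fused)
```

The resulting vector would then be passed to a calibrated scorer such as the L1-regularized logistic regression mentioned above.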

Mechanism.

MADreMIA leverages the fact that members often exhibit slower average drift than non-members. Memorized samples typically lie in deeper local probability wells, causing iterative regenerations to remain closer to $Z_{0}$ . Gains represent fixed-statistic SNR improvements consistent with the data processing inequality (DPI): $I(M;\tilde{z})\leq I(M;Z_{0})$ .

4.1 Modality-specific instantiations

Image autoregressive models (IARs) and diffusion models.

$\mathcal{R}$ is image-to-image regeneration under fixed controls (autoregressive decoding for IARs; controlled re-noise/re-denoise for diffusion, i.e., partial forward noising to a fixed noise level followed by reverse denoising under fixed scheduler/settings). Trajectory features are defined relative to $Z_{0}$ , in particular $\mathrm{MSE}(Z_{0},Z_{t})$ , $\mathrm{LPIPS}(Z_{0},Z_{t})$ [47], and $\mathrm{SSIM}(Z_{0},Z_{t})$ [37].

Large language models (LLMs).

$\mathcal{R}$ is an autophagous text loop where each generation is fed back as the next prompt/input under a fixed template, fixed context-window policy (with left-sided truncation to keep only the newest text), and fixed decoding configuration. We use multiple features to measure the quality and diversity of generations, specifically: Kullback-Leibler Divergence, Jensen-Shannon Divergence, Jaccard Index, Predictive Entropy, and Logit Margin:

\mathrm{KLD}(Z_{0},Z_{t}),\quad\mathrm{JSD}(Z_{0},Z_{t}),\quad\mathrm{Jaccard}(Z_{0},Z_{t}),\quad\mathrm{Entropy}(Z_{t}),\quad\mathrm{LogitMargin}(Z_{t}),

for $t\in\{1,\dots,T\}$ . These are summarized along the trajectory and fused with $z_{\mathrm{base}}$ . For clarity, KLD/JSD are computed on aligned token-distribution vectors: in gray/white-box settings from next-token logits, and in black-box settings from smoothed empirical token-frequency distributions under a fixed tokenizer/vocabulary. The metrics in our experiments follow the gray-box setting, but the framework itself is open to the black-box setting as well; a black-box setting requires repeated queries per step to estimate distributions. Jaccard is computed on token sets after the same fixed preprocessing. More information about features for vision and language models is provided in Appendix H.
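For the black-box case, the distribution-based features can be sketched as follows. The additive smoothing constant `alpha`, the toy vocabulary, and the function names are assumptions made for illustration; the text above only specifies that the empirical token-frequency distributions are smoothed and aligned to a fixed vocabulary.

```python
import math
from collections import Counter
from typing import List, Sequence


def smoothed_distribution(
    tokens: Sequence[str], vocab: Sequence[str], alpha: float = 1.0
) -> List[float]:
    """Additively smoothed token frequencies, aligned to a fixed vocabulary order."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return [(counts[v] + alpha) / total for v in vocab]


def kld(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); assumes q > 0 wherever p > 0, which smoothing guarantees."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log), bounded above by log 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard index on token sets; defined as 1.0 when both sets are empty."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


vocab = ["a", "b", "c"]
p0 = smoothed_distribution(["a", "a", "b"], vocab)   # stands in for Z_0
pt = smoothed_distribution(["c", "c", "c"], vocab)   # stands in for Z_t
print(kld(pt, p0), jsd(p0, pt), jaccard(["a", "b"], ["b", "c"]))
```

In a gray-box setting the two distributions would instead be derived from next-token logits, with the divergence functions unchanged.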

Audio generative models.

In the audio domain, $\mathcal{R}$ employs iterative reconstruction loops. Notably, we do not conduct a full Membership or Dataset Inference evaluation for audio models, as the literature currently lacks proper audio benchmarks and specialized attacks tailored to the voice conversion setting. Nevertheless, to demonstrate the cross-modal generality of our framework, our first experiment explores this potential using an objective audio fidelity metric.

Across all modalities, MADreMIA follows the pipeline: $Z_{0}\rightarrow(Z_{0:T})\rightarrow(\phi_{0:T-1},S_{T},z_{\mathrm{traj}})\rightarrow\tilde{z}\rightarrow s(Z_{0})$ .

5 Experiments

5.1 Experimental Setup

To ensure a scientifically sound evaluation across our MIA tasks, we restrict our setup to models trained on public datasets with well-defined training and test splits. We evaluate our method across three diverse modalities to demonstrate its broad applicability. For image generation, we analyze SOTA autoregressive models (VAR-d{20, 24, 30} [33], RAR-{L, XL, XXL} [43]) and diffusion models (DiT-RF-{XL, G} [13], UViT-T2I-Deep [3]), trained primarily on the ImageNet [10] or COCO [36] datasets for class-conditioned and text-to-image generation. We extend this evaluation to the audio domain using modern Voice Conversion models (AutoVC [25], FreeVC [21]), and to the language domain using prominent LLMs (LLaMA-13B [34], Pythia-6.9B [4], OLMo-7B [14], and OPT-6.7B [48]). Comprehensive details regarding all models and datasets used in the experiments are provided in Appendices F and G. All experiments were conducted on a machine equipped with 3 NVIDIA RTX PRO 5000 Blackwell GPUs (48 GB VRAM each) and an Intel Xeon Gold 6526Y CPU.

5.2 Metrics

To measure similarity between feature representations and their fidelity, we use the Fréchet Inception Distance (FID) [17] for vision models and the Fréchet Audio Distance (FAD) [18] for audio models. For LLMs, we measure Token Diversity at iteration $t$ (for $t>1$ ) as the Kullback–Leibler Divergence (KLD) between the normalized average token probability distribution at iteration $t$ and that of the first evaluation iteration:

\mathrm{TokenDiversity}(t)=D_{\mathrm{KL}}\!\left(p_{t}\,\|\,p_{1}\right)=\sum_{i\in V}p_{t}(i)\,\log\frac{p_{t}(i)}{p_{1}(i)}.

where $p_{t}$ and $p_{1}$ are the normalized average token probability distributions for step $t$ and step $1$ respectively.

5.3 MIA and DI procedures

MIA pipeline.

For each labeled member/non-member sample, we generate $Z_{0},\dots,Z_{T}$ , compute $\phi_{t}$ , $S_{T}$ , and modality-specific trajectory features, form $\tilde{z}$ , and fit $h_{\mathrm{mia}}$ . We evaluate univariate trajectory statistics by direct thresholding and multivariate features by logistic-regression fusion on strictly stratified 80/20 train-test splits. We report the established metrics AUC, TPR at 1% FPR, and accuracy. Splitting is performed at sample/source level before trajectory generation: all descendants of the same $Z_{0}$ (all $Z_{t}$ , all derived features) remain in the same partition. Thresholds, feature normalization, and LR calibration are fit on train only and applied unchanged to test. The primary endpoint is the multivariate fusion score; univariate $S_{T}^{\star}$ results are reported as theory-aligned diagnostics. If $T$ is tuned, it is selected on train (or a train-only validation split) and never on test.
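The direct-thresholding step for a univariate statistic can be sketched as below: a threshold is chosen on training non-member scores so that the training FPR does not exceed the target, then applied unchanged to held-out member scores. The function names, the strict-inequality decision rule, and the toy scores are assumptions for illustration, and the sketch assumes the score has already been oriented so that larger values indicate membership.

```python
import math
from typing import Sequence


def fit_threshold(train_nonmember_scores: Sequence[float], max_fpr: float = 0.01) -> float:
    """Pick the smallest threshold whose train FPR is at most max_fpr.

    A sample is flagged as a member when score > threshold.
    """
    ranked = sorted(train_nonmember_scores, reverse=True)
    # Number of false positives we can afford (small epsilon guards float rounding).
    allowed = math.floor(max_fpr * len(ranked) + 1e-12)
    allowed = min(allowed, len(ranked) - 1)
    return ranked[allowed]


def true_positive_rate(member_scores: Sequence[float], threshold: float) -> float:
    """Fraction of member scores strictly above the frozen threshold."""
    return sum(s > threshold for s in member_scores) / len(member_scores)


# Toy usage with hypothetical scores.
train_nonmembers = [float(i) for i in range(100)]
test_members = [99.5, 98.5, 50.0, 10.0]
threshold = fit_threshold(train_nonmembers, max_fpr=0.01)
print(threshold, true_positive_rate(test_members, threshold))
```

Because the threshold is frozen on the training partition, the realized test FPR can differ slightly from the nominal 1%.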

DI pipeline.

The DI pipeline is identical, replacing the target label with dataset-origin variable $D$ . The same $Z_{t}$ , $\phi_{t}$ , and trajectory-fusion machinery is used; only label semantics and calibration change. For DI, splitting/evaluation are performed at dataset or source-group level, and per-sample logits are aggregated by a fixed mean rule into a dataset-level score. Dataset-level decisions are evaluated against a permutation-based null over dataset labels within the evaluation fold.
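One way to realize the dataset-level decision is sketched below. Per-sample logits are averaged within each group, and the gap between the suspect set and a reference set is compared against a null obtained by permuting group labels. The choice of test statistic (difference of means), the presence of a reference set, the one-sided test, and all names are assumptions for illustration rather than the exact procedure used in the experiments.

```python
import random
import statistics
from typing import Sequence


def permutation_p_value(
    suspect_logits: Sequence[float],
    reference_logits: Sequence[float],
    n_perm: int = 2000,
    seed: int = 0,
) -> float:
    """One-sided permutation p-value for mean(suspect) - mean(reference) > 0."""
    rng = random.Random(seed)
    observed = statistics.fmean(suspect_logits) - statistics.fmean(reference_logits)
    pooled = list(suspect_logits) + list(reference_logits)
    n_suspect = len(suspect_logits)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign dataset labels at random
        stat = statistics.fmean(pooled[:n_suspect]) - statistics.fmean(pooled[n_suspect:])
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy usage with hypothetical per-sample logits.
separated = permutation_p_value(
    [2.0 + 0.01 * i for i in range(30)], [0.01 * i for i in range(30)]
)
identical = permutation_p_value(
    [0.01 * i for i in range(30)], [0.01 * i for i in range(30)]
)
print(separated, identical)
```

With a clearly shifted suspect set the p-value approaches its floor of $1/(n_{\mathrm{perm}}+1)$, while two identical sets give a p-value near one half.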

Both MIA and DI setups inherit standard generative privacy-audit conventions, including the IAR setting introduced in [19].

5.4 Research questions

We evaluate whether chained regeneration can be a signal amplifier for one-shot auditing across modalities, model families, and access regimes. Our analysis focuses on the following questions: (Q1) What distinguishes member/non-member chained generation trajectories? (Q2) Can the one-shot membership signal be amplified for single $\phi_{t}$ features? What are the gains for trajectory-based $S_{T}$ over $\phi_{t}$ across modalities? (Q3) Does MADreMIA increase member/non-member separability compared to one-shot MIA? (Q4) Does increasing generative model stochasticity during the regeneration loop affect the trajectory separation between members and non-members? (Q5) How does model size affect member/non-member trajectory signals? Finally, we also provide a short analysis of the Getty Images case [9] in Appendix K.

5.5 Members and non-members differ in generative trajectories: qualitative results

Across all modalities, members and non-members exhibit distinct regeneration dynamics. Members preserve structure longer and drift more slowly, while non-members degrade faster and diverge toward the model’s generic prior. This pattern is visible both in per-step qualitative examples (Figures 3(a) and 3(b)) and in aggregate divergence trajectories (Figure 2) comparing the quality of regenerations to base samples (FID for images, FAD for audio) and the drift of the output token distribution in text models. These results support the core hypothesis that the auto-regeneration trajectory contains multiple membership cues. The key trajectory asymmetry findings are:

1. Fidelity and degradation: Re-members maintain high structural quality throughout the trajectory, whereas re-non-members exhibit rapid perceptual and semantic degradation.

2. Persistence and divergence: Re-members demonstrate significant structural persistence and coherence across iterations. Conversely, re-non-members diverge more quickly, drifting toward the model’s general distribution and losing the specific characteristics of the original input.

The asymmetry is present across diverse models and modalities.

We test broad architectural diversity: image autoregressive and diffusion models, audio voice conversion/generation models, and text generative models. Figure 2 summarizes trajectory behavior using the modality-appropriate divergence metrics of Sec. 5.2. This design directly tests whether our proposed signal amplification is model- and modality-agnostic.

5.6 $\phi_{t}$ statistics may increase membership signal over one-shot $\phi_{0}$

We evaluate the validity of our theoretical assumptions using empirical generative trajectories, fixing $T$ to the first 15 iterations. As summarized in Table 2, while Assumption A2 is fully supported, A1 and A3 receive only partial empirical backing. Specifically, for certain values of $\phi_{t}$ , the absence of clear exponential decay within the first 15 iterations is acceptable for our main claim, since it indicates slower or plateau-like leakage. It suggests that non-exponential leakage forms may also govern real trajectories.

To assess the efficacy of modality-specific trajectory statistics, we evaluate whether aggregated trajectory evidence remains competitive with, or outperforms, the one-shot evidence. We define $\mathrm{gain}:=\frac{\max_{T}\mathrm{SNR}^{2}(S_{T})}{\max_{t}\mathrm{SNR}^{2}(\phi_{t})},$ and show results in Table 1. Trajectory diagnostics are strong: $P(\mathrm{gain}\geq 1)=\frac{8}{11}=0.73,\qquad P(\mathrm{gain}\geq 0.9)=\frac{10}{11}=0.91,$ with median gain $=1.00$ . Given the small number of tested features, we interpret these numbers as supportive preliminary evidence.
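The gain statistic can be computed directly from per-sample score sequences. The sketch below assumes that `member_phi[i][t]` and `nonmember_phi[i][t]` hold $\phi_{t}$ for sample $i$, uses the worst-case class variance in the denominator as in Theorem 3.2, and assumes that variance is nonzero; the names and toy numbers are hypothetical.

```python
import statistics
from typing import List, Sequence


def snr_squared(member_vals: Sequence[float], nonmember_vals: Sequence[float]) -> float:
    """Squared mean gap divided by the larger of the two class variances."""
    gap = statistics.fmean(member_vals) - statistics.fmean(nonmember_vals)
    worst_var = max(statistics.pvariance(member_vals), statistics.pvariance(nonmember_vals))
    return gap ** 2 / worst_var  # assumes worst_var > 0


def gain(member_phi: List[List[float]], nonmember_phi: List[List[float]]) -> float:
    """max_T SNR^2(S_T) / max_t SNR^2(phi_t) over the available steps."""
    n_steps = len(member_phi[0])

    def prefix_average(rows, T):
        return [statistics.fmean(row[:T]) for row in rows]

    def column(rows, t):
        return [row[t] for row in rows]

    best_trajectory = max(
        snr_squared(prefix_average(member_phi, T), prefix_average(nonmember_phi, T))
        for T in range(1, n_steps + 1)
    )
    best_one_shot = max(
        snr_squared(column(member_phi, t), column(nonmember_phi, t))
        for t in range(n_steps)
    )
    return best_trajectory / best_one_shot


# Toy usage: per-step noise is negatively correlated across steps, so averaging helps.
members = [[1.0, 0.2], [0.2, 0.8]]
nonmembers = [[0.0, -0.8], [-0.8, -0.2]]
toy_gain = gain(members, nonmembers)
print(toy_gain)
```

Since $S_{1}=\phi_{0}$ , the numerator is at least $\mathrm{SNR}^{2}(\phi_{0})$ , but it can still fall below the best single-step value, which is why gains under 1 are possible.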

Table 1: $S_{T}$ gains over $\phi_{t}$ across modalities. $P(\text{gain}\geq 1)$ indicates the fraction of models where scoring matches or exceeds the baseline.

Family	$n$	$P(\geq 1)$	$P(\geq 0.9)$	Median
VAR	3	0.67	1.00	1.00
Diffusion	3	0.67	1.00	1.00
LLM	5	0.80	0.80	1.04

Table 2: Assumption support across model families. Fractions indicate the number of models satisfying each assumption.

Family	A1	A2	A3
VAR	$3/3$	$3/3$	$3/3$
Diffusion	$3/3$	$3/3$	$2/3$
LLM	$3/5$	$5/5$	$2/5$

5.7 MADreMIA amplifies baseline MIA

Tables 3 and 4 compare MADreMIA-augmented attacks against their unaided baselines across LLMs and IARs. Across all base attacks and model families, incorporating reconstruction Diversity ( $MSE_{\text{sum}}$ , $LPIPS_{\text{sum}}$ ), Quality ( $SSIM_{\text{sum}}$ , $SSIM_{\text{std}}$ ), or both (Combined) raises attack performance in most settings. Gains are most pronounced on OLMo-7B, where, for example, the Zlib baseline collapses to AUC 0.179 yet recovers to 0.868 with Combined signals, and CAMIA reaches AUC 0.969, the strongest result across all settings. On the remaining LLMs the improvements are more modest. For IARs, MADreMIA yields clear gains in classification accuracy: VAR-d30 improves from 0.607 to 0.696 (+8.9 p.p.) and RAR-XXL from 0.562 to 0.713 (+15.1 p.p.), although TPR@1%FPR gains are smaller and less stable. Together, these results indicate that iterative reconstruction signals provide complementary, architecture-agnostic information that strengthens membership inference across both LLMs and IARs.

Table 3: MIA results on established LLM benchmarks (described in detail in Appendix G), where MADreMIA trajectory features are aggregated across 15 iterations. Augmenting a base attack with diversity, quality, or combined signals improves the metrics over the unaided baselines in most settings.

	Pythia-6.9B		OLMo-7B		OPT-6.7B		Llama-13B
Attack	TPR@1%FPR	AUC	TPR@1%FPR	AUC	TPR@1%FPR	AUC	TPR@1%FPR	AUC
Loss [42]	0.004 $\pm$ 0.00	0.349 $\pm$ 0.02	0.008 $\pm$ 0.01	0.523 $\pm$ 0.02	0.013 $\pm$ 0.01	0.390 $\pm$ 0.04	0.009 $\pm$ 0.01	0.368 $\pm$ 0.04
+ Diversity	0.093 $\pm$ 0.06	0.647 $\pm$ 0.05	0.303 $\pm$ 0.09	0.735 $\pm$ 0.04	0.092 $\pm$ 0.12	0.613 $\pm$ 0.09	0.173 $\pm$ 0.14	0.690 $\pm$ 0.08
+ Quality	0.096 $\pm$ 0.07	0.686 $\pm$ 0.05	0.032 $\pm$ 0.04	0.702 $\pm$ 0.04	0.084 $\pm$ 0.09	0.652 $\pm$ 0.07	0.198 $\pm$ 0.13	0.679 $\pm$ 0.09
+ Combined	0.100 $\pm$ 0.08	0.673 $\pm$ 0.06	0.263 $\pm$ 0.14	0.804 $\pm$ 0.03	0.112 $\pm$ 0.12	0.672 $\pm$ 0.09	0.188 $\pm$ 0.15	0.702 $\pm$ 0.07
Zlib [5]	0.000 $\pm$ 0.00	0.338 $\pm$ 0.02	0.022 $\pm$ 0.01	0.179 $\pm$ 0.01	0.012 $\pm$ 0.02	0.369 $\pm$ 0.03	0.009 $\pm$ 0.01	0.337 $\pm$ 0.03
+ Diversity	0.129 $\pm$ 0.08	0.677 $\pm$ 0.05	0.318 $\pm$ 0.11	0.842 $\pm$ 0.03	0.099 $\pm$ 0.11	0.628 $\pm$ 0.08	0.176 $\pm$ 0.14	0.689 $\pm$ 0.07
+ Quality	0.124 $\pm$ 0.08	0.673 $\pm$ 0.06	0.208 $\pm$ 0.10	0.833 $\pm$ 0.03	0.092 $\pm$ 0.10	0.667 $\pm$ 0.08	0.210 $\pm$ 0.14	0.688 $\pm$ 0.08
+ Combined	0.128 $\pm$ 0.08	0.690 $\pm$ 0.06	0.295 $\pm$ 0.14	0.868 $\pm$ 0.02	0.121 $\pm$ 0.12	0.672 $\pm$ 0.08	0.194 $\pm$ 0.15	0.693 $\pm$ 0.08
Min-K% [27]	0.124 $\pm$ 0.08	0.680 $\pm$ 0.05	0.067 $\pm$ 0.07	0.703 $\pm$ 0.04	0.086 $\pm$ 0.11	0.650 $\pm$ 0.08	0.127 $\pm$ 0.11	0.648 $\pm$ 0.09
+ Diversity	0.120 $\pm$ 0.07	0.677 $\pm$ 0.05	0.219 $\pm$ 0.08	0.775 $\pm$ 0.03	0.064 $\pm$ 0.09	0.640 $\pm$ 0.08	0.144 $\pm$ 0.13	0.685 $\pm$ 0.08
+ Quality	0.124 $\pm$ 0.07	0.695 $\pm$ 0.05	0.095 $\pm$ 0.09	0.772 $\pm$ 0.03	0.094 $\pm$ 0.11	0.674 $\pm$ 0.09	0.178 $\pm$ 0.14	0.686 $\pm$ 0.08
+ Combined	0.113 $\pm$ 0.07	0.694 $\pm$ 0.05	0.240 $\pm$ 0.15	0.837 $\pm$ 0.03	0.092 $\pm$ 0.10	0.694 $\pm$ 0.08	0.182 $\pm$ 0.14	0.700 $\pm$ 0.07
CAMIA [6]	0.111 $\pm$ 0.09	0.683 $\pm$ 0.05	0.428 $\pm$ 0.25	0.958 $\pm$ 0.01	0.128 $\pm$ 0.12	0.664 $\pm$ 0.08	0.166 $\pm$ 0.13	0.686 $\pm$ 0.09
+ Diversity	0.118 $\pm$ 0.08	0.690 $\pm$ 0.05	0.517 $\pm$ 0.25	0.966 $\pm$ 0.01	0.104 $\pm$ 0.11	0.668 $\pm$ 0.08	0.146 $\pm$ 0.12	0.692 $\pm$ 0.08
+ Quality	0.131 $\pm$ 0.08	0.708 $\pm$ 0.05	0.501 $\pm$ 0.26	0.964 $\pm$ 0.01	0.115 $\pm$ 0.13	0.682 $\pm$ 0.08	0.192 $\pm$ 0.14	0.712 $\pm$ 0.08
+ Combined	0.109 $\pm$ 0.08	0.696 $\pm$ 0.05	0.553 $\pm$ 0.27	0.969 $\pm$ 0.01	0.109 $\pm$ 0.12	0.689 $\pm$ 0.08	0.176 $\pm$ 0.13	0.716 $\pm$ 0.08

Table 4: MIA results on IARs, where MADreMIA trajectory features are aggregated across 10 iterations (benchmark details in Appendix G). While AUC remains stable across augmentation variants, TPR@1%FPR and Accuracy improve substantially.

	VAR-d30			RAR-XXL
Attack	TPR@1%FPR	AUC	ACC	TPR@1%FPR	AUC	ACC
Baseline [19]	0.040 $\pm$ 0.02	0.750 $\pm$ 0.02	0.607 $\pm$ 0.07	0.044 $\pm$ 0.02	0.754 $\pm$ 0.01	0.562 $\pm$ 0.02
+ Diversity	0.090 $\pm$ 0.09	0.755 $\pm$ 0.03	0.691 $\pm$ 0.03	0.084 $\pm$ 0.06	0.771 $\pm$ 0.03	0.700 $\pm$ 0.03
+ Quality	0.076 $\pm$ 0.08	0.757 $\pm$ 0.03	0.703 $\pm$ 0.03	0.079 $\pm$ 0.07	0.754 $\pm$ 0.04	0.703 $\pm$ 0.03
+ Combined	0.088 $\pm$ 0.06	0.750 $\pm$ 0.04	0.696 $\pm$ 0.03	0.069 $\pm$ 0.05	0.775 $\pm$ 0.03	0.713 $\pm$ 0.03

5.8 MADreMIA amplifies baseline DI

The p-value histograms in Figure 4 demonstrate that MADreMIA trajectory features consistently strengthen the statistical evidence for dataset-level inference across all evaluated architectures. On Pythia-6.9B, augmented variants reach the 95% confidence threshold at around 100 samples versus roughly 150 for the baseline. Furthermore, augmented variants shift the distribution of $-\log_{10}(p)$ values noticeably rightward relative to the baseline, with this pattern holding across all three signal types. The effect is more pronounced on RAR-XXL, where the Combined variant produces a substantially larger rightward shift, indicating that individual trials yield stronger and more reliable evidence for membership inference.

5.9 Sensitivity analysis of generation strength

Figure 5 shows PR curves for VAR-d30 across regeneration strengths $s\in\{2,4,6,8\}$ , where $s$ controls how many final scales are regenerated. Members consistently achieve higher precision and recall than non-members across all values of $s$ , confirming that the MIA signal is robust to the choice of regeneration strength. As $s$ increases, however, the two groups converge in PR space (see Appendix J).

5.10 Trajectory asymmetry scaling across model families

As illustrated in Figure 6, the membership signal – quantified by $\Delta\text{FID}=\text{FID}_{\text{nonmem}}-\text{FID}_{\text{mem}}$ – persists across all model scales, suggesting that the observed asymmetry is a fundamental property rather than an artifact of specific parameter regimes. While the magnitude of this separation varies across architectures, its relationship with model scale is not uniform. The separation grows stronger with model size in VAR and DiT-MoE, but remains largely unaffected by scaling in RARs. Ultimately, the underlying trend is robust: iterative trajectory chaining consistently exposes a larger membership gap compared to standard one-shot generations.

6 Conclusions

We introduced MADreMIA, a model-agnostic membership inference signal amplifier for large generative models. By chaining repeated regenerations rather than relying on a single query, MADreMIA exploits a consistent asymmetry: member samples retain coherence across iterations while non-members drift and deteriorate. This signal generalizes across image, text, and audio generators, spanning IAR, diffusion, and LLM families. Our experimental results show that fusing trajectory-derived features with baseline MIA/DI scores further improves member/non-member separability, suggesting that iterative regeneration is a broadly applicable lens for privacy auditing and copyright attribution.

Acknowledgments

We gratefully acknowledge the Polish high-performance computing infrastructure PLGrid for providing computing facilities and support within computational grant no. PLG/2025/018391. This research was partially funded by the National Science Centre, Poland, grant no. 2023/51/I/ST6/02854.

References

[1] S. Alemohammad, J. Casco-Rodriguez, L. Luzi, A. I. Humayun, H. Babaei, D. LeJeune, A. Siahkoohi, and R. Baraniuk (2023) Self-consuming generative models go mad. In The Twelfth International Conference on Learning Representations, Cited by: §E.3, §1, §2.
[2] C. Bai, H. Lin, C. Raffel, and W. C. Kan (2021) On training sample memorization: lessons from benchmarking generative modeling with a large-scale competition. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, KDD ’21, New York, NY, USA, pp. 2534–2542. External Links: ISBN 9781450383325, Link, Document Cited by: §E.1, §2.
[3] F. Bao, S. Nie, K. Xue, Y. Cao, C. Li, H. Su, and J. Zhu (2023) All are worth words: a vit backbone for diffusion models. In CVPR, Cited by: Appendix F, §5.1.
[4] S. Biderman, H. Schoelkopf, Q. Anthony, H. Bradley, K. O’Brien, E. Hallahan, M. A. Khan, S. Purohit, U. S. Prashanth, E. Raff, A. Skowron, L. Sutawika, and O. Van Der Wal (2023) Pythia: a suite for analyzing large language models across training and scaling. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. Cited by: Appendix F, §5.1.
[5] N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, et al. (2021) Extracting training data from large language models. In 30th USENIX security symposium (USENIX Security 21), pp. 2633–2650. Cited by: §E.2, §1, §2, Table 3.
[6] H. Chang, A. S. Shamsabadi, K. Katevas, H. Haddadi, and R. Shokri (2025) Context-aware membership inference attacks against pre-trained large language models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 7299–7321. Cited by: §E.2, §2, Table 3.
[7] C. Chen, D. Liu, M. Shah, and C. Xu (2025) Enhancing privacy-utility trade-offs to mitigate memorization in diffusion models. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8182–8191. External Links: Link Cited by: §E.1, §2.
[8] C. A. Choquette-Choo, F. Tramer, N. Carlini, and N. Papernot (2021) Label-only membership inference attacks. In International conference on machine learning, pp. 1964–1974. Cited by: §1.
[9] M. Coulter (2024) Aiming for fairness: an exploration into getty images v. stability ai and its importance in the landscape of modern copyright law. DePaul J. Art Tech. & Intell. Prop. L 34, pp. 124. Cited by: Appendix K, §5.4.
[10] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: Table 8, §5.1.
[11] M. Duan, A. Suri, N. Mireshghallah, S. Min, W. Shi, L. Zettlemoyer, Y. Tsvetkov, Y. Choi, D. Evans, and H. Hajishirzi (2024) Do membership inference attacks work on large language models?. In Conference on Language Modeling (COLM), Cited by: Table 8, Appendix G.
[12] J. Dubiński, A. Kowalczuk, F. Boenisch, and A. Dziedzic (2025) Cdi: copyrighted data identification in diffusion models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 18674–18684. Cited by: §E.2, §1, §2.
[13] Z. Fei, M. Fan, C. Yu, D. Li, and J. Huang (2024) Scaling diffusion transformers to 16 billion parameters. External Links: 2407.11633, Link Cited by: Appendix F, §5.1.
[14] D. Groeneveld, I. Beltagy, E. Walsh, A. Bhagia, R. Kinney, O. Tafjord, A. Jha, H. Ivison, I. Magnusson, Y. Wang, et al. (2024) OLMo: accelerating the science of language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 15789–15809. Cited by: Appendix F, §5.1.
[15] X. Gu, C. Du, T. Pang, C. Li, M. Lin, and Y. Wang (2023) On memorization in diffusion models. arXiv preprint arXiv:2310.02664. Cited by: §E.1, §2.
[16] A. Hans, Y. Wen, N. Jain, J. Kirchenbauer, H. Kazemi, P. Singhania, S. Singh, G. Somepalli, J. Geiping, A. Bhatele, et al. (2024) Be like a goldfish, don’t memorize! mitigating memorization in generative llms. Advances in Neural Information Processing Systems 37, pp. 24022–24045. Cited by: §E.1, §2.
[17] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter (2017) Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems 30. Cited by: §5.2.
[18] K. Kilgour, M. Zuluaga, D. Roblek, and M. Sharifi (2018) Fréchet audio distance: a metric for evaluating music enhancement algorithms. arXiv preprint arXiv:1812.08466. Cited by: §5.2.
[19] A. Kowalczuk, J. Dubiński, F. Boenisch, and A. Dziedzic (2025) Privacy attacks on image autoregressive models. In Forty-second International Conference on Machine Learning, External Links: Link Cited by: §E.2, §2, §4, §5.3, Table 4.
[20] J. Li, J. Dong, T. He, and J. Zhang (2024) Towards black-box membership inference attack for diffusion models. CoRR abs/2405.20771. External Links: Document, 2405.20771, Link Cited by: §E.2, §2.
[21] J. Li, W. Tu, and L. Xiao (2023) Freevc: towards high-quality text-free one-shot voice conversion. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. Cited by: Appendix F, §5.1.
[22] P. Maini, H. Jia, N. Papernot, and A. Dziedzic (2024) LLM dataset inference: did you train on my dataset?. CoRR abs/2406.06443. External Links: Document, 2406.06443, Link Cited by: §E.2, §1, §2.
[23] P. Maini and A. Suri Reassessing emnlp 2024’s best paper: does divergence-based calibration for mias hold up?. In The Fourth Blogpost Track at ICLR 2025, Cited by: §E.2, §2.
[24] P. Maini, M. Yaghini, and N. Papernot (2021) Dataset inference: ownership resolution in machine learning. arXiv preprint arXiv:2104.10706. Cited by: §1.
[25] K. Qian, Y. Zhang, S. Chang, X. Yang, and M. Hasegawa-Johnson (2019) Autovc: zero-shot voice style transfer with only autoencoder loss. In International Conference on Machine Learning, pp. 5210–5219. Cited by: Appendix F, §5.1.
[26] M. Sakarvadia, A. Ajith, A. M. Khan, N. C. Hudson, C. Geniesse, K. Chard, Y. Yang, I. Foster, and M. W. Mahoney (2024) Mitigating memorization in language models. In The Thirteenth International Conference on Learning Representations, Cited by: §E.1, §2.
[27] W. Shi, A. Ajith, M. Xia, Y. Huang, D. Liu, T. Blevins, D. Chen, and L. Zettlemoyer (2023) Detecting pretraining data from large language models. arXiv preprint arXiv:2310.16789. Cited by: Table 3.
[28] R. Shokri, M. Stronati, C. Song, and V. Shmatikov (2017) Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP), pp. 3–18. Cited by: §1.
[29] I. Shumailov, Z. Shumaylov, Y. Zhao, N. Papernot, R. Anderson, and Y. Gal (2024-07) AI models collapse when trained on recursively generated data. Nature 631, pp. 755–759. External Links: Document Cited by: §E.3, §1, §2.
[30] A. K. Singh, M. Y. Kocyigit, A. Poulton, D. Esiobu, M. Lomeli, G. Szilvasy, and D. Hupkes (2024) Evaluation data contamination in llms: how do we measure it and (when) does it matter?. arXiv preprint arXiv:2411.03923. Cited by: §1.
[31] L. Soldaini, R. Kinney, A. Bhagia, D. Schwenk, D. Atkinson, R. Authur, B. Bogin, K. Chandu, J. Dumas, Y. Elazar, et al. (2024) Dolma: an open corpus of three trillion tokens for language model pretraining research. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 15725–15788. Cited by: Table 8.
[32] J. Tao and R. Shokri (2025) (Token-level) InfoRMIA: stronger membership inference and memorization assessment for LLMs. CoRR abs/2510.05582. External Links: Document, 2510.05582, Link Cited by: §E.2, §2.
[33] K. Tian, Y. Jiang, Z. Yuan, B. Peng, and L. Wang (2024) Visual autoregressive modeling: scalable image generation via next-scale prediction. External Links: 2404.02905, Link Cited by: Appendix F, §5.1.
[34] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al. (2023) Llama: open and efficient foundation language models. arXiv preprint arXiv:2302.13971. Cited by: Appendix F, §5.1.
[35] G. van den Burg and C. Williams (2021) On memorization in probabilistic deep generative models. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. W. Vaughan (Eds.), Vol. 34, pp. 27916–27928. External Links: Link Cited by: §E.1, §2.
[36] A. Veit, T. Matera, L. Neumann, J. Matas, and S. Belongie (2016) Coco-text: dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140. Cited by: Table 8, §5.1.
[37] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13 (4), pp. 600–612. Cited by: item Structural Similarity Index Measure (SSIM), §4.1.
[38] Y. Wen, Y. Liu, C. Chen, and L. Lyu (2024) Detecting, explaining, and mitigating memorization in diffusion models. In The Twelfth International Conference on Learning Representations, External Links: Link Cited by: §E.1.
[39] Y. WU, H. Qiu, S. Guo, J. Li, and T. Zhang (2024) You only query once: an efficient label-only membership inference attack. In The Twelfth International Conference on Learning Representations, External Links: Link Cited by: §1.
[40] J. Yamagishi, C. Veaux, and K. MacDonald (2019) CSTR vctk corpus: english multi-speaker corpus for cstr voice cloning toolkit (version 0.92). The Rainbow Passage which the speakers read out can be found in the International Dialects of English Archive (http://web.ku.edu/~idea/readings/rainbow.htm). Cited by: Table 8.
[41] J. Ye, A. Maddi, S. K. Murakonda, V. Bindschaedler, and R. Shokri (2022) Enhanced membership inference attacks against machine learning models. In Proceedings of the 2022 ACM SIGSAC conference on computer and communications security, pp. 3093–3106. Cited by: §E.2, §1, §2.
[42] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha (2018) Privacy risk in machine learning: analyzing the connection to overfitting. In 2018 IEEE 31st computer security foundations symposium (CSF), pp. 268–282. Cited by: Table 3.
[43] Q. Yu, J. He, X. Deng, X. Shen, and L. Chen (2024) Randomized autoregressive visual generation. External Links: 2411.00776, Link Cited by: Appendix F, §5.1.
[44] M. Zawalski, M. Boubdir, K. Bałazy, B. Nushi, and P. Ribalta (2026) Detecting data contamination in LLMs via in-context learning. In The Fourteenth International Conference on Learning Representations, External Links: Link Cited by: Appendix G, §1.
[45] H. Zen, V. Dang, R. A. J. Clark, Y. Zhang, R. J. Weiss, Y. Jia, Z. Chen, and Y. Wu (2019) LibriTTS: a corpus derived from librispeech for text-to-speech. In Interspeech, External Links: Link Cited by: Table 8.
[46] J. Zhang, J. Sun, E. C. Yeats, Y. Ouyang, M. Kuo, J. Zhang, H. Yang, and H. H. Li (2024) Min-$k$%++: improved baseline for detecting pre-training data from large language models. CoRR abs/2404.02936. External Links: Document, 2404.02936, Link Cited by: §E.2, §1, §2.
[47] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang (2018) The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 586–595. Cited by: item Learned Perceptual Image Patch Similarity (LPIPS), §4.1.
[48] S. Zhang, S. Roller, N. Goyal, M. Artetxe, M. Chen, S. Chen, C. Dewan, M. Diab, X. Li, X. V. Lin, et al. (2022) Opt: open pre-trained transformer language models. arXiv preprint arXiv:2205.01068. Cited by: Appendix F, §5.1.
[49] Z. Zhang, C. Yan, and B. A. Malin (2022) Membership inference attacks against synthetic health data. Journal of biomedical informatics 125, pp. 103977. Cited by: §1.

Appendix A Impact Statement

This work advances methods for auditing generative models by improving membership and dataset inference through chained regeneration. The primary positive impact is stronger accountability: MADreMIA can help detect memorization of sensitive, proprietary, or benchmark data, supporting privacy audits, copyright verification, and unlearning validation across model families and modalities.

While enhanced inference capabilities can assist in model auditing and transparency, they also require responsible application to avoid potential misuse. We frame MADreMIA as a tool for research evaluation, compliance monitoring, and internal red-teaming. It is important to note that our method provides statistical evidence rather than a definitive proof of data inclusion; therefore, results should be interpreted alongside additional forensic and procedural evidence within a broader data governance framework.

Appendix B Limitations

While our proposed framework is designed to be cross-modal and model-agnostic, our experimental scope is naturally constrained by several practical and theoretical factors. Most notably, we do not conduct full Membership Inference Attack (MIA) evaluations on audio generation models. Although our initial signal-degradation experiments indicate that iterative trajectory features exist in the audio domain, the literature currently lacks established single-step baselines tailored for these architectures, leaving MIA for audio models untested. Furthermore, while our framework is conceptually compatible with restricted setups, our current empirical evaluations rely on gray-box access to exact next-token logits, meaning that strictly black-box MIA remains untested in our work. Operationally, the primary limitation of our method is its scalability; the iterative regeneration loop requires multiple forward passes per sample, so its computational overhead grows linearly with the number of iterations. From a theoretical perspective, our core assumptions A1 and A3 are only partially satisfied in practice, as demonstrated by the empirical measurements in Table 2. Finally, our evaluations may be susceptible to distribution-shift confounds, where trajectory differences might stem from inherent dataset mismatches rather than pure memorization, and the exploratory findings presented in Section 5.6 are based on preliminary small-$n$ evidence that will require larger-scale validation in future work.

Appendix C LLM Usage

Large language models were used to improve the readability and clarity of portions of the manuscript, as well as to provide feedback during the writing and revision process. The authors verified all technical statements, citations, and claims and take full responsibility for the final content.

Appendix D Method Overview

Appendix E Extended Related Works

Our work builds upon three intersecting lines of prior research: the characterization of data memorization in generative models, the evolution of membership inference, and the dynamics of model collapse during recursive generation.

E.1 Memorization

Memorization, the tendency of generative models to reproduce training examples rather than generate novel samples, has been studied across multiple model families and from both measurement and mitigation perspectives. van den Burg and Williams [35] formalized the problem for probabilistic generative models such as VAEs, showing that memorization differs fundamentally from mode collapse and overfitting and is not captured by commonly used nearest-neighbor tests. For diffusion models, Gu et al. [15] show that the denoising score matching objective has a closed-form optimum that can only replicate training samples, and introduce the EMM metric to quantify how dataset size and model configuration govern the generalization-to-memorization transition. Sakarvadia et al. [26] localize this phenomenon through bright-ending cross-attention patterns, while the sharpness-based framework of [38] justifies score-difference memorization metrics and proposes mitigation via sharpness-aware regularization of the initial noise. The benchmarking study [2] demonstrated that standard evaluation metrics fail to surface memorization even in competitive settings. Mitigation has been tackled both for LLMs, where Hans et al. [16] propose the goldfish loss that excludes randomly sampled token subsets from the training objective to prevent verbatim reproduction, and for text-to-image diffusion models, where Chen et al. [7] address the privacy–utility tension by combining prompt re-anchoring with semantic prompt search to improve both dimensions simultaneously.

E.2 Membership/Dataset Inference

A second line of work investigates whether specific examples or datasets can be identified from model behavior. Because individual Membership Inference Attacks (MIAs) can be confounded by distribution shifts [22], recent literature often favors Dataset Inference (DI), which aggregates feature evidence across many samples to statistically detect training data usage [23, 12, 19]. Concurrently, individual MIA methods must adapt to increasingly restrictive black-box deployments. Furthermore, approaches based on training multiple shadow models to learn membership distributions [41, 5] are now computationally infeasible for massive modern architectures. Consequently, modern attacks must extract signals using only limited outputs rather than internal weights or gradients [46, 6, 32].

In these restricted settings, recent black-box attacks heavily rely on output variations. For example, Li et al. [20] perform MIAs on diffusion models by repeatedly perturbing a target image via an API, averaging the results, and comparing them to the original sample. However, in an interrogation analogy, this approach merely asks multiple paraphrased versions of the exact same question. Because the target sample is perturbed independently each time, the query does not dynamically evolve in response to the model’s previous answers, leaving deeper structural memorization unexploited.

E.3 Model Collapse

The third line of work concerns recursive self-training in generative models. Alemohammad et al. [1] showed that self-consuming generative loops lead to progressive degradation in quality or diversity when insufficient fresh real data is injected at each generation, a phenomenon they term Model Autophagy Disorder. Their analysis is especially important for our setting because it frames repeated regeneration not as a neutral operation, but as a process that can magnify latent properties of the learned distribution. Closely related, Shumailov et al. [29] showed that recursively training on model-generated data causes model collapse, where tails of the original distribution disappear and learned behaviour drifts toward degenerate approximations. Taken together, these works suggest that iterative generation is structurally revealing: under repeated reuse, memorized or high-density regions may persist differently from non-member examples, while generic outputs may drift or collapse. Our method turns this insight into a privacy-auditing mechanism: rather than studying recursive generation as a training-time pathology, we exploit chained regeneration at inference time to amplify membership-relevant differences.

Appendix F Model Details

In our experiments, we consider two vision model families: image autoregressive models (IARs) and diffusion models. The IAR category includes VAR [33] and RAR [43] variants, while the diffusion category includes DiT-MoE [13] and UViT-T2I [3]. Furthermore, to cover other modalities, we evaluate large language models (LLMs) and voice conversion (VC) models. The LLMs include Pythia [4], OLMo [14], OPT [48], and Llama [34], while the VC models consist of AutoVC [25] and FreeVC [21]. Across all settings, we focus on representative, high-performing model variants.

Table 5: Vision model details.

	IAR Models						Diffusion Models
	VAR-d30	VAR-d24	VAR-d20	RAR-XXL	RAR-XL	RAR-L	DiT-MoE-G	DiT-MoE-XL	UViT-T2I-Deep
Model parameters	2.1B	1.0B	600M	1.5B	955M	462M	16.5B	4.1B	141M
Training epochs	350	300	250	400	400	400	—	—	—
FID	1.92	2.33	2.95	1.48	1.50	1.70	1.72	2.10	5.48

Table 6: Language model details.

	OLMo	Llama	Pythia	OPT
Model parameters	7B	13B	6.9B	6.7B
Training tokens	2.46T	1T	300B	180B

Table 7: Audio model details.

	AutoVC	FreeVC
Model parameters	28M	39M
Training data (hours)	44	40
SMOS (seen-to-seen)	3.5	4.1

Appendix G Dataset Details

For vision and audio models with publicly known and available train/test splits, we use those splits. For most LLMs we use established MIA benchmarks (e.g. WikiMIA); for OLMo, we use its corresponding training set as the member set and Global News as the non-member set, as suggested in [44].

Table 8: Datasets used to construct member and non-member sets for each model family in our experiments, spanning vision, language, and speech domains.

Model	Members	Non-members
VAR	ImageNet [10]	ImageNet
RAR	ImageNet	ImageNet
DiT-MoE	ImageNet	ImageNet
UViT-T2I	COCO [36]	COCO
Pythia	Mimir [11]	Mimir
OLMo	Dolma [31]	Global News
Llama	WikiMIA	WikiMIA
OPT	WikiMIA	WikiMIA
AutoVC	VCTK [40]	LibriTTS [45]
FreeVC	VCTK	LibriTTS

Importantly, for Pythia-6.9B we use the Mimir dataset [11], which consists of 6 subsets: arxiv, dm_mathematics, github, hackernews, pubmed_central, and wikipedia_(en). We concatenate all these subsets and randomly select samples from the pool. We use the ngram_7_0.2 data split. For the rest of the models, we use the train split of the corresponding dataset as members and the val/test split as non-members.

Appendix H Metrics Details

The following metrics are computed over the sequence of model outputs collected across MADreMIA iterations, capturing how the model’s generative behavior evolves under repeated generation.

H.1 Features for Language Models

Jaccard Similarity:

Measures the lexical overlap between the model’s output at a given iteration and its initial response, computed over trigrams. A high Jaccard similarity indicates that the model rigidly reproduces the same surface forms across iterations, which is characteristic of memorized content.

J(A,B)=\frac{|A\cap B|}{|A\cup B|}
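
As a minimal sketch of the trigram-level Jaccard computation, the snippet below compares two short strings. Whitespace tokenization and the helper names `trigrams` and `jaccard` are our illustrative assumptions; the tokenizer actually used for this feature is not specified here.

```python
from typing import List, Set, Tuple


def trigrams(text: str) -> Set[Tuple[str, ...]]:
    """Set of word trigrams; whitespace tokenization is an assumption."""
    tokens: List[str] = text.split()
    # range() is empty when there are fewer than three tokens
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def jaccard(a: Set, b: Set) -> float:
    """J(A, B) = |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


initial = "the quick brown fox jumps over the lazy dog"
later = "the quick brown fox leaps over the lazy dog"
score = jaccard(trigrams(initial), trigrams(later))
print(score)
```

Changing a single word removes three of the seven trigrams on each side, so 4 of the 10 distinct trigrams are shared.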

Token Diversity:

Quantifies the Kullback-Leibler divergence (KLD) between the token probability distribution at the current iteration $P$ and the initial distribution $Q$. Large values indicate that the model’s vocabulary preferences shift substantially during reconstruction, reflecting instability in its output distribution.

D_{KL}(P\parallel Q)=\sum_{x\in\mathcal{X}}P(x)\log\left(\frac{P(x)}{Q(x)}\right)

Token Distribution Shift:

We define it as the Jensen-Shannon divergence (JSD), a symmetric and bounded variant of KLD that measures the distributional distance between $P$ and $Q$ via their mixture $M$. Compared to KLD, JSD is well-defined even when the supports of $P$ and $Q$ do not fully overlap, making it a more numerically stable measure of distributional drift across iterations.

\mathrm{JSD}(P\parallel Q)=\frac{1}{2}D_{KL}(P\parallel M)+\frac{1}{2}D_{KL}(Q\parallel M),\qquad\text{where }M=\frac{1}{2}(P+Q)
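
The contrast between KLD and JSD on non-overlapping supports can be illustrated with a small sketch. The function names and the toy two-token distributions are hypothetical; natural logarithms are assumed, so values are in nats.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) in nats; terms with P(x) = 0 contribute zero."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf  # P puts mass where Q has none
            total += pi * math.log(pi / qi)
    return total


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """JSD(P || Q) computed through the mixture M = (P + Q) / 2."""
    m: List[float] = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy distributions with disjoint supports
p = [1.0, 0.0]
q = [0.0, 1.0]
kld = kl_divergence(p, q)
jsd = js_divergence(p, q)
print(kld, jsd)
```

Here KLD is infinite, while JSD stays finite and reaches its upper bound of $\log 2$.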

Predictive Entropy:

Measures the uncertainty of the model’s next-token distribution over the full vocabulary $\mathcal{V}$. Low entropy indicates that the model assigns high probability mass to a single token, consistent with confident, memorized reproduction, whereas high entropy reflects diffuse, uncertain predictions.

H(Y\mid\mathbf{x})=-\sum_{c\in\mathcal{V}}P(y=c\mid\mathbf{x})\log P(y=c\mid\mathbf{x})

Margin:

Captures the decisiveness of the model’s token predictions by computing the difference in probability between the top-ranked and second-ranked tokens. A large margin indicates high confidence in a specific token, which may signal memorized recall, while a small margin reflects genuine uncertainty between competing continuations.

M=P(\hat{y}_{1}\mid\mathbf{x})-P(\hat{y}_{2}\mid\mathbf{x})
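
Predictive entropy and margin can both be read off a single next-token distribution. The sketch below uses a hypothetical four-token vocabulary and illustrative function names; in practice the probabilities would come from a softmax over the model's logits.

```python
import math
from typing import Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability tokens contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def margin(probs: Sequence[float]) -> float:
    """Probability gap between the top-ranked and second-ranked tokens."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


peaked = [0.97, 0.01, 0.01, 0.01]  # confident prediction
flat = [0.25, 0.25, 0.25, 0.25]    # maximally uncertain prediction

for name, dist in [("peaked", peaked), ("flat", flat)]:
    print(name, predictive_entropy(dist), margin(dist))
```

The peaked distribution has low entropy and a large margin, while the uniform one attains the maximum entropy $\log 4$ and zero margin.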

H.2 Features for Vision Models

Mean Squared Error (MSE):

Measures the average pixel-level reconstruction error between the generated image at a given iteration and the original input. Lower MSE indicates that the model consistently reproduces fine-grained pixel details across iterations, which is a strong signal of memorization.

\mathrm{MSE}(x,\hat{x})=\frac{1}{N}\sum_{i=1}^{N}\left(x_{i}-\hat{x}_{i}\right)^{2}

Structural Similarity Index Measure (SSIM) [37]:

Evaluates perceptual similarity between the reconstructed image $\hat{x}$ and the original $x$ by jointly comparing luminance, contrast, and structural information across local image patches. Unlike MSE, SSIM is sensitive to perceptual distortions that are meaningful to human observers, and its stability across iterations serves as a complementary signal to pixel-level metrics.

\mathrm{SSIM}(x,\hat{x})=\frac{(2\mu_{x}\mu_{\hat{x}}+c_{1})(2\sigma_{x\hat{x}}+c_{2})}{(\mu_{x}^{2}+\mu_{\hat{x}}^{2}+c_{1})(\sigma_{x}^{2}+\sigma_{\hat{x}}^{2}+c_{2})}

where $\mu_{x}$, $\mu_{\hat{x}}$ are local means, $\sigma_{x}^{2}$, $\sigma_{\hat{x}}^{2}$ are local variances, $\sigma_{x\hat{x}}$ is the cross-covariance, and $c_{1}$, $c_{2}$ are stabilization constants.
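
For intuition, MSE and the SSIM expression can be evaluated on a tiny flattened patch. This is a simplified sketch, not the evaluation code: it computes SSIM over a single window covering the whole input (standard implementations average the index over many local windows), and it assumes the commonly used constants $c_{1}=(0.01L)^{2}$ and $c_{2}=(0.03L)^{2}$ for dynamic range $L$.

```python
from typing import Sequence


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error over flattened pixel values."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def ssim_single_window(x: Sequence[float], x_hat: Sequence[float],
                       data_range: float = 1.0) -> float:
    """SSIM formula applied to one window spanning the whole input."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(x_hat) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in x_hat) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, x_hat)) / n
    c1 = (0.01 * data_range) ** 2  # assumed default stabilization constants
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


patch = [0.1, 0.4, 0.6, 0.9]
reversed_patch = [0.9, 0.6, 0.4, 0.1]  # same mean and variance, opposite structure

print(mse(patch, patch), ssim_single_window(patch, patch))
print(mse(patch, reversed_patch), ssim_single_window(patch, reversed_patch))
```

The reversed patch shares its mean and variance with the original, so only the covariance term changes; this is the structural component that MSE does not isolate.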

Learned Perceptual Image Patch Similarity (LPIPS) [47]:

Quantifies perceptual dissimilarity between $x$ and $\hat{x}$ using deep feature representations extracted from a pretrained network $\phi$. By operating in a learned feature space rather than pixel space, LPIPS captures high-level semantic and textural differences that are invisible to MSE or SSIM, making it particularly sensitive to cases where a model reproduces semantic content while varying low-level details.

\mathrm{LPIPS}(x,\hat{x})=\sum_{l}\frac{1}{H_{l}W_{l}}\sum_{h,w}\left\|w_{l}\odot\left(\phi_{l}(x)_{hw}-\phi_{l}(\hat{x})_{hw}\right)\right\|_{2}^{2}

where $\phi_{l}$ denotes the feature map at layer $l$ of the pretrained network and $w_{l}$ are learned channel-wise weights.

Appendix I Additional Dataset Inference Results

Figure 8 extends our dataset inference evaluation to Llama-13B and VAR-d30. On Llama-13B, augmented variants reach the 95% confidence threshold faster than the baseline, with the Combined and Quality signals leading, though convergence is noisier at low sample counts. On VAR-d30, the benefit is more pronounced: augmented variants cross the threshold at roughly 100 samples compared to over 200 for the baseline, with all three signal types outperforming it consistently. The significance histograms corroborate these findings: the Combined variant shifts the $-\log_{10}(p)$ distribution rightward on both models, confirming that trajectory features yield stronger per-trial evidence.

Appendix J Precision and Recall for Generative Models

Figure 9 shows Precision and Recall across iterations for VAR-d30 and DiT-MoE-XL. In both models and both metrics, members consistently score higher than non-members throughout all iterations, confirming that the membership signal is stable and model-agnostic. Notably, the gap between members and non-members widens as iterations progress, indicating that chained regeneration amplifies the underlying asymmetry rather than merely preserving it.

Appendix K Getty Images Case

As a practical case study, we consider the Getty Images v. Stability AI dispute [9] and evaluate whether chained regeneration can distinguish images that are plausibly associated with the Stable Diffusion training distribution from images that are very unlikely to have been included. We use Stable Diffusion 1.5 as the target model. For the positive pool, we extract 2,000 images from LAION-2B whose metadata contains the string gettyimages and treat them as members. For the negative pool, we collect 2,000 images from the Getty Images website whose upload date is after January 1, 2025, and treat them as non-members. Because these images post-date the original Stable Diffusion 1.5 training era (late 2022), they provide a conservative practical control group for this experiment.

For each pool, we run the same chained-regeneration procedure for 15 iterations and summarize the trajectories with SSIM and reconstruction error (MSE) (see Figure 10). The SSIM plot measures whether regenerations remain structurally closer to the initial query for the member pool than for the non-member pool. The MSE plot provides a complementary pixel-level view across regeneration depth by measuring how quickly reconstructed samples drift away from their reference images. In our experiments, the two pools remain visibly separated under both SSIM and MSE. We do not use FID in this case, because it is very unstable on 2,000-image pools. We still interpret MSE conservatively: it is sensitive to low-level reconstruction error rather than semantic fidelity alone. For this reason, we use MSE as a stable auxiliary trajectory measure across iterations, while SSIM remains the more directly interpretable structural signal in this case study.
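
The trajectory computation described above can be sketched as a simple loop in which each output becomes the next input and every step is scored against the initial query. Everything below is illustrative: `toy_regenerate` is a hypothetical stand-in for a real model call (it only pulls pixels toward mid-gray), and images are represented as flat lists of intensities.

```python
from typing import Callable, List, Sequence

Image = List[float]  # flattened pixel intensities, for illustration only


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error over flattened pixel values."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def chained_trajectory(x0: Image,
                       regenerate: Callable[[Image], Image],
                       n_iters: int = 15) -> List[float]:
    """Feed each output back as the next input; score every step against x0."""
    errors: List[float] = []
    current = x0
    for _ in range(n_iters):
        current = regenerate(current)
        errors.append(mse(x0, current))
    return errors


def toy_regenerate(x: Image) -> Image:
    """Hypothetical stand-in for a model call: drifts pixels toward 0.5."""
    return [0.9 * v + 0.1 * 0.5 for v in x]


trajectory = chained_trajectory([0.0, 1.0, 0.25, 0.75], toy_regenerate)
print(trajectory)
```

With a real generator in place of `toy_regenerate`, the same loop would yield one error curve per sample, and the member and non-member pools could then be compared curve against curve.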

Appendix L Proofs for Section 3

L.1 Proof of Theorem 3.2

Proof. By definition,

\mathbb{E}[S_{T}\mid M=m]=\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[\phi_{t}\mid M=m],\qquad m\in\{0,1\}.

Hence

\mathbb{E}[S_{T}\mid 1]-\mathbb{E}[S_{T}\mid 0]=\frac{1}{T}\sum_{t=0}^{T-1}\Big(\mathbb{E}[\phi_{t}\mid 1]-\mathbb{E}[\phi_{t}\mid 0]\Big).

Under A1,

\mathbb{E}[\phi_{t}\mid 1]-\mathbb{E}[\phi_{t}\mid 0]\geq\Delta_{t}\geq 0,\quad\forall t,

so averaging over $t$ gives

\mathbb{E}[S_{T}\mid 1]-\mathbb{E}[S_{T}\mid 0]\geq\frac{1}{T}\sum_{t=0}^{T-1}\Delta_{t}\geq 0.

Therefore

\Gamma_{T}:=\big|\mathbb{E}[S_{T}\mid 1]-\mathbb{E}[S_{T}\mid 0]\big|\geq\frac{1}{T}\sum_{t=0}^{T-1}\Delta_{t}.

For the denominator, A3 gives, for each class $m$,

\mathrm{Var}(S_{T}\mid M=m)\leq C\frac{\sigma^{2}\tau_{\mathrm{eff}}}{T}.

Hence

\max_{m}\mathrm{Var}(S_{T}\mid M=m)\leq C\frac{\sigma^{2}\tau_{\mathrm{eff}}}{T}.

Combining with the lower bound on $\Gamma_{T}$,

\mathrm{SNR}^{2}(S_{T})=\frac{\Gamma_{T}^{2}}{\max_{m}\mathrm{Var}(S_{T}\mid M=m)}\geq\frac{\left(\frac{1}{T}\sum_{t=0}^{T-1}\Delta_{t}\right)^{2}}{C\sigma^{2}\tau_{\mathrm{eff}}/T}.

This proves Theorem 3.2. ∎
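
To make the bound concrete, its right-hand side can be evaluated for a given sequence of per-step gaps. The function name, the choice $C=1$, and the numerical values are illustrative assumptions, not measurements.

```python
from typing import Sequence


def snr2_lower_bound(deltas: Sequence[float], sigma: float,
                     tau_eff: float, c: float = 1.0) -> float:
    """Right-hand side of the bound: (mean gap)^2 / (C * sigma^2 * tau_eff / T)."""
    t = len(deltas)
    mean_gap = sum(deltas) / t
    return mean_gap ** 2 / (c * sigma ** 2 * tau_eff / t)


# A gap of 0.1 that persists for 10 steps versus a single step
single = snr2_lower_bound([0.1], sigma=1.0, tau_eff=1.0)
persistent = snr2_lower_bound([0.1] * 10, sigma=1.0, tau_eff=1.0)
print(single, persistent)
```

When the gap does not decay, the bound grows linearly in $T$; a decaying gap lowers the average in the numerator and offsets part of that growth.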

L.2 Proof of Corollary 3.3

Proof. Assume $\Delta_{t}=\Delta_{0}e^{-t/\tau_{g}}$. Then

\frac{1}{T}\sum_{t=0}^{T-1}\Delta_{t}=\frac{\Delta_{0}}{T}\sum_{t=0}^{T-1}e^{-t/\tau_{g}}=\frac{\Delta_{0}}{T}\cdot\frac{1-e^{-T/\tau_{g}}}{1-e^{-1/\tau_{g}}}.

Since $1-e^{-u}\leq u$ for $u>0$, with $u=1/\tau_{g}$ we get

1-e^{-1/\tau_{g}}\leq\frac{1}{\tau_{g}}\quad\Longrightarrow\quad\frac{1}{1-e^{-1/\tau_{g}}}\geq\tau_{g}.

Therefore

\frac{1}{T}\sum_{t=0}^{T-1}\Delta_{t}\geq\Delta_{0}\frac{1-e^{-T/\tau_{g}}}{T/\tau_{g}}.

By Theorem 3.2,

\mathrm{SNR}^{2}(S_{T})\geq\frac{\left(\frac{1}{T}\sum_{t=0}^{T-1}\Delta_{t}\right)^{2}}{C\sigma^{2}\tau_{\mathrm{eff}}/T}\;\gtrsim\;\frac{\Delta_{0}^{2}\tau_{g}}{\sigma^{2}\tau_{\mathrm{eff}}}\,g(x),

where

g(x):=\frac{(1-e^{-x})^{2}}{x},\qquad x:=T/\tau_{g},

and $\gtrsim$ absorbs only $T$-independent constants (including $1/C$ and comparability constants).

To optimize the shape in $x$, differentiate:

g^{\prime}(x)=\frac{(1-e^{-x})\big(2xe^{-x}-(1-e^{-x})\big)}{x^{2}}.

For $x>0$, critical points satisfy

2xe^{-x}=1-e^{-x}\quad\Longleftrightarrow\quad e^{x}=2x+1.

This has a unique positive solution $x^{\star}\approx 1.2564$, so the surrogate shape is maximized at

T^{\star}\approx x^{\star}\tau_{g}\approx 1.2564\,\tau_{g}.

∎
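
The stationarity condition $e^{x}=2x+1$ has no closed-form solution, but its positive root is easy to check numerically. The sketch below uses bisection; the function names and the bracket $[0.5, 3.0]$ are our assumptions, chosen because the residual changes sign on that interval.

```python
import math


def g(x: float) -> float:
    """Surrogate shape g(x) = (1 - e^{-x})^2 / x."""
    return (1.0 - math.exp(-x)) ** 2 / x


def stationarity(x: float) -> float:
    """Residual e^x - (2x + 1); its positive root is the critical point of g."""
    return math.exp(x) - (2.0 * x + 1.0)


def solve_x_star(lo: float = 0.5, hi: float = 3.0, tol: float = 1e-12) -> float:
    """Bisection on a bracket where the residual changes sign."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stationarity(lo) * stationarity(mid) <= 0.0:
            hi = mid  # root lies in [lo, mid]
        else:
            lo = mid  # root lies in [mid, hi]
    return 0.5 * (lo + hi)


x_star = solve_x_star()
print(x_star, g(x_star))
```

Evaluating `g` slightly to either side of `x_star` gives smaller values, consistent with the root being a maximizer.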

L.3 Proof of Corollary 3.4 (shape-constant clarification)

Proof. From the previous corollary (under the same comparability regime),

\mathrm{SNR}^{2}(S_{T^{\star}})\gtrsim\frac{\Delta_{0}^{2}\tau_{g}}{\sigma^{2}\tau_{\mathrm{eff}}}\,g(x^{\star}).

Assume additionally

\Gamma_{1}\asymp\Delta_{0},\qquad\mathrm{Var}(S_{1}\mid M)\asymp\sigma^{2},

so $\mathrm{SNR}(S_{1})\asymp\Delta_{0}/\sigma$. Taking the square root of the bound above and dividing by $\mathrm{SNR}(S_{1})$ gives

\frac{\mathrm{SNR}(S_{T^{\star}})}{\mathrm{SNR}(S_{1})}\gtrsim\sqrt{g(x^{\star})}\sqrt{\frac{\tau_{g}}{\tau_{\mathrm{eff}}}}=c_{\mathrm{shape}}\sqrt{\kappa},

where

\kappa:=\frac{\tau_{g}}{\tau_{\mathrm{eff}}},\qquad c_{\mathrm{shape}}:=\sqrt{g(x^{\star})}\approx 0.638.

Thus $c_{\mathrm{shape}}$ is the idealized shape constant; additional model-dependent prefactors remain absorbed by $\gtrsim$ . ∎
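To make the shape constant concrete, the sketch below recomputes $x^{\star}$ with Newton's method and evaluates $c_{\mathrm{shape}}\sqrt{\kappa}$. It reads the bound in an idealized way, treating every constant absorbed by $\gtrsim$ as 1. That is an assumption made here, so the numbers indicate scale rather than guaranteed gains. The function names are illustrative.

```python
import math


def x_star(iters: int = 50) -> float:
    """Newton iteration on h(x) = exp(x) - 2x - 1, started at x = 1.5,
    to the right of the root where h is increasing and convex."""
    x = 1.5
    for _ in range(iters):
        x -= (math.exp(x) - 2.0 * x - 1.0) / (math.exp(x) - 2.0)
    return x


def shape_constant() -> float:
    """c_shape = sqrt(g(x*)) with g(x) = (1 - exp(-x))**2 / x."""
    x = x_star()
    return math.sqrt((1.0 - math.exp(-x)) ** 2 / x)


def idealized_gain(tau_g: float, tau_eff: float) -> float:
    """c_shape * sqrt(kappa) with kappa = tau_g / tau_eff,
    taking all absorbed prefactors to be 1 (assumption)."""
    return shape_constant() * math.sqrt(tau_g / tau_eff)


def break_even_kappa() -> float:
    """Smallest kappa for which the idealized gain reaches 1."""
    return 1.0 / shape_constant() ** 2
```

Under this idealized reading the bound exceeds 1 only once $\kappa$ is larger than $1/c_{\mathrm{shape}}^{2}\approx 2.46$. The signal gap must therefore persist noticeably longer than the noise correlation time before the bound guarantees an improvement over the one-shot statistic.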

L.4 Additional comments on Bayes-cap statement at the end of Section 3

If membership is deterministic in the initial sample, $M=f(Z_{0})$, then $H(M\mid Z_{0})=0$, so

I(M;Z_{0})=H(M)-H(M\mid Z_{0})=H(M).

Also, conditioning on $Z_{0}$ already determines $M$, hence

H(M\mid Z_{0},Z_{1:T})=0=H(M\mid Z_{0}),

which implies

I(M;Z_{1:T}\mid Z_{0})=H(M\mid Z_{0})-H(M\mid Z_{0},Z_{1:T})=0.

Therefore, by the chain rule for mutual information,

I(M;Z_{0:T})=I(M;Z_{0})+I(M;Z_{1:T}\mid Z_{0})=I(M;Z_{0}).

Thus iterating the trajectory cannot increase the information available to a Bayes-optimal attacker; it can only improve practical fixed-form statistics, through variance reduction and temporal aggregation.
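The identity can be checked on a small discrete example. In the sketch below everything is a toy assumption introduced for illustration: $Z_{0}$ is uniform on four states, $M=Z_{0}\bmod 2$, and $Z_{1}$ keeps the value of $Z_{0}$ with probability 0.7 and otherwise moves uniformly to one of the other three states. Mutual information is computed exactly from the joint tables, in bits.

```python
import math
from collections import defaultdict
from itertools import product


def mutual_information(joint: dict) -> float:
    """I(A;B) in bits from a joint table {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0.0
    )


def build_joints(keep: float = 0.7):
    """Joint tables of M with Z0, with Z1, and with the pair (Z0, Z1)."""
    m_z0 = defaultdict(float)
    m_z1 = defaultdict(float)
    m_z01 = defaultdict(float)
    states = range(4)
    for z0, z1 in product(states, states):
        # P(Z0 = z0) = 1/4; Z1 stays put w.p. keep, else moves uniformly.
        p = 0.25 * (keep if z1 == z0 else (1.0 - keep) / 3.0)
        m = z0 % 2  # membership is a deterministic function of Z0
        m_z0[(m, z0)] += p
        m_z1[(m, z1)] += p
        m_z01[(m, (z0, z1))] += p
    return dict(m_z0), dict(m_z1), dict(m_z01)


m_z0, m_z1, m_z01 = build_joints()
info_z0 = mutual_information(m_z0)     # I(M; Z0)
info_z1 = mutual_information(m_z1)     # I(M; Z1)
info_z01 = mutual_information(m_z01)   # I(M; Z0, Z1)
```

In this toy chain `info_z01` equals `info_z0`, so the extra step adds no information once $Z_{0}$ is known, while `info_z1` alone is strictly smaller. This matches the statement that iteration helps fixed-form statistics rather than the Bayes-optimal attacker.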
