A curated list of Machine Learning Security & Privacy papers published in the top-4 security conferences (IEEE S&P, ACM CCS, USENIX Security, and NDSS).
- Awesome-ML-Security-and-Privacy-Papers
- Contents:
- 1. Security Papers
- 2. Privacy Papers
- Contributing
- Licenses
-
Hybrid Batch Attacks: Finding Black-box Adversarial Examples with Limited Queries. USENIX Security 2020.
Transferability + Query. Black-box Attack[pdf] [code] -
Adversarial Preprocessing: Understanding and Preventing Image-Scaling Attacks in Machine Learning. USENIX Security 2020.
Defense of Image Scaling Attack[pdf] [code] -
HopSkipJumpAttack: A Query-Efficient Decision-Based Attack. IEEE S&P 2020.
Query-based Black-box Attack[pdf] [code] -
PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking. USENIX Security 2021.
Adversarial Patch Defense[pdf] [code] -
Gotta Catch'Em All: Using Honeypots to Catch Adversarial Attacks on Neural Networks. ACM CCS 2020.
Build a trap in the model to induce specific adversarial perturbations[pdf] [code] -
A Tale of Evil Twins: Adversarial Inputs versus Poisoned Models. ACM CCS 2020.
Perturb both input and model[pdf] [code] -
Feature-Indistinguishable Attack to Circumvent Trapdoor-Enabled Defense. ACM CCS 2021.
A new attack method can break TeD defense mechanism[pdf] [code] -
DetectorGuard: Provably Securing Object Detectors against Localized Patch Hiding Attacks. ACM CCS 2021.
Provable robustness for patch hiding in object detection[pdf] [code] -
It's Not What It Looks Like: Manipulating Perceptual Hashing based Applications. ACM CCS 2021.
Adversarial Attack against PHash[pdf] [code] -
RamBoAttack: A Robust and Query Efficient Deep Neural Network Decision Exploit. NDSS 2022.
Query-based black box attack[pdf] [code] -
What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction. NDSS 2022.
Generative-based AE detection[pdf] [code] -
AutoDA: Automated Decision-based Iterative Adversarial Attacks. USENIX Security 2022.
Program Synthesis for Adversarial Attack[pdf] -
Blacklight: Scalable Defense for Neural Networks against Query-Based Black-Box Attacks. USENIX Security 2022.
AE Detection using probabilistic fingerprints based on hash of input similarity[pdf] [code] -
Physical Hijacking Attacks against Object Trackers. ACM CCS 2022.
Adversarial Attacks on Object Trackers[pdf] [code] -
Post-breach Recovery: Protection against White-box Adversarial Examples for Leaked DNN Models. ACM CCS 2022.
Protecting leaked DNN models against white-box adversarial examples[pdf] -
Squint Hard Enough: Attacking Perceptual Hashing with Adversarial Machine Learning. USENIX Security 2023.
Adversarial Attacks against PhotoDNA and PDQ[pdf] -
The Space of Adversarial Strategies. USENIX Security 2023.
Decompose the Adversarial Attack Components and combine them together[pdf] -
Stateful Defenses for Machine Learning Models Are Not Yet Secure Against Black-box Attacks. ACM CCS 2023.
Attack strategy to enhance the query-based attack against the stateful defense[pdf] [code] -
BounceAttack: A Query-Efficient Decision-based Adversarial Attack by Bouncing into the Wild. IEEE S&P 2024.
Query-based hard-label attack (see the sketch below)[pdf] -
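Query-based black-box attacks such as HopSkipJumpAttack, RamBoAttack, and BounceAttack above craft adversarial examples from query feedback alone. As a hedged illustration, here is a minimal SimBA-style *score-based* sketch; hard-label attacks like BounceAttack follow the same loop but only observe the predicted label, so they instead walk along the decision boundary. `model`, tensor shapes, and parameters are illustrative assumptions.

```python
import torch

@torch.no_grad()
def simba_attack(model, x, label, eps=0.2, max_steps=1000):
    # SimBA-style black-box attack (sketch): try +/- eps along one
    # random pixel direction per step and keep the change whenever the
    # true-class probability drops. Assumes x has shape (1, C, H, W)
    # and model(x) returns logits of shape (1, num_classes).
    x_adv = x.clone()
    best = model(x_adv).softmax(-1)[0, label]
    order = torch.randperm(x_adv.numel())
    for i in range(min(max_steps, x_adv.numel())):
        delta = torch.zeros_like(x_adv).view(-1)
        delta[order[i]] = eps
        delta = delta.view_as(x_adv)
        for step in (delta, -delta):
            cand = (x_adv + step).clamp(0, 1)
            p = model(cand).softmax(-1)[0, label]
            if p < best:                 # query feedback: probability drop
                x_adv, best = cand, p
                break
    return x_adv
```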
Sabre: Cutting through Adversarial Noise with Adaptive Spectral Filtering and Input Reconstruction. IEEE S&P 2024.
Filter-based adversarial perturbation defense[pdf] [code] -
Sabre: Cutting through Adversarial Noise with Adaptive Spectral Filtering and Input Reconstruction. IEEE S&P 2024.
Adversarial attack against face recognition system[pdf] [code] -
Why Does Little Robustness Help? A Further Step Towards Understanding Adversarial Transferability. IEEE S&P 2024.
Exploring the transferability of adversarial examples (see the FGSM sketch below)[pdf] [code] -
Group-based Robustness: A General Framework for Customized Robustness in the Real World. NDSS 2024.
New metrics to measure adversarial examples[pdf] -
DorPatch: Distributed and Occlusion-Robust Adversarial Patch to Evade Certifiable Defenses. NDSS 2024.
Adversarial path against certified robustness[pdf] [code] -
UniID: Spoofing Face Authentication System by Universal Identity. NDSS 2024.
Face spoofing attack[pdf] -
Enhance Stealthiness and Transferability of Adversarial Attacks with Class Activation Mapping Ensemble Attack. NDSS 2024.
Enhancing transferability of adversarial examples[pdf] [code] -
Neural Invisibility Cloak: Concealing Adversary in Images via Compromised AI-driven Image Signal Processing. USENIX Security 2025. [pdf]
-
Self-interpreting Adversarial Images. USENIX Security 2025. [pdf]
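The image-domain attacks and defenses above all revolve around small, loss-maximizing input perturbations. As a generic reference point (not the method of any specific paper here), a minimal white-box FGSM sketch in PyTorch, assuming a classifier `model` that returns logits and image inputs in [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    # Fast Gradient Sign Method (sketch): one signed-gradient step that
    # maximizes the classification loss under an L-infinity budget eps.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()    # keep pixels in the valid range
```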
-
TextShield: Robust Text Classification Based on Multimodal Embedding and Neural Machine Translation. USENIX Security 2020.
Defense in preprocessing[pdf] -
Bad Characters: Imperceptible NLP Attacks. IEEE S&P 2022.
Use unicode to conduct human imperceptible attack[pdf] [code] -
Order-Disorder: Imitation Adversarial Attacks for Black-box Neural Ranking Models. ACM CCS 2022.
Attack Neural Ranking Models[pdf] -
No more Reviewer #2: Subverting Automatic Paper-Reviewer Assignment using Adversarial Learning. USENIX Security 2023.
Adversarial Attack on Paper Assignment[pdf]
-
WaveGuard: Understanding and Mitigating Audio Adversarial Examples. USENIX Security 2021.
Defense in preprocessing[pdf] [code] -
Dompteur: Taming Audio Adversarial Examples. USENIX Security 2021.
Defense in preprocessing. Preprocesses the audio to make adversarial noise noticeable to humans[pdf] [code] -
EarArray: Defending against DolphinAttack via Acoustic Attenuation. NDSS 2021.
Defense[pdf] -
Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems. IEEE S&P 2021.
Attack[pdf] [code] -
Hear "No Evil", See "Kenansville": Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems. IEEE S&P 2021.
Black-box Attack[pdf] -
SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems. IEEE S&P 2021.
Survey[pdf] -
AdvPulse: Universal, Synchronization-free, and Targeted Audio Adversarial Attacks via Subsecond Perturbations. ACM CCS 2020.
Attack[pdf] -
Black-box Adversarial Attacks on Commercial Speech Platforms with Minimal Information. ACM CCS 2021.
Black-box Attack. Physical World[pdf] -
Perception-Aware Attack: Creating Adversarial Music via Reverse-Engineering Human Perception. ACM CCS 2022.
Adversarial Audio with human-aware noise[pdf] -
SpecPatch: Human-in-the-Loop Adversarial Audio Spectrogram Patch Attack on Speech Recognition. ACM CCS 2022.
Adversarial Patch for audio[pdf] -
Learning Normality is Enough: A Software-based Mitigation against Inaudible Voice Attacks. USENIX Security 2023.
Unsupervised learning-based defense[pdf] -
Understanding and Benchmarking the Commonality of Adversarial Examples. IEEE S&P 2024.
Common features of adversarial audio examples[pdf] -
ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features. IEEE S&P 2024.
Black-box adversarial audio attack[pdf] [code] -
Parrot-Trained Adversarial Examples: Pushing the Practicality of Black-Box Audio Attacks against Speaker Recognition Models. NDSS 2024.
Black-box adversarial audio attack using parrot-trained examples[pdf] -
When Translators Refuse to Translate: A Novel Attack to Speech Translation Systems. USENIX Security 2025. [pdf]
-
Universal 3-Dimensional Perturbations for Black-Box Attacks on Video Recognition Systems. IEEE S&P 2022.
Adversarial attack in video recognition[pdf] -
StyleFool: Fooling Video Classification Systems via Style Transfer. IEEE S&P 2023.
Style Transfer to conduct adversarial attack[pdf] [code]
- A Hard Label Black-box Adversarial Attack Against Graph Neural Networks. ACM CCS 2021.
Graph Classification[pdf]
-
Evading Classifiers by Morphing in the Dark. ACM CCS 2017.
Morpher and search to generate adversarial PDF[pdf] -
Misleading Authorship Attribution of Source Code using Adversarial Learning. USENIX Security 2019.
Adversarial attack in source code, MCST[pdf] [code] -
Intriguing Properties of Adversarial ML Attacks in the Problem Space. IEEE S&P 2020.
Attack Malware Classification[pdf] -
Structural Attack against Graph Based Android Malware Detection. IEEE S&P 2020.
Perturbed function call graph[pdf] -
URET: Universal Robustness Evaluation Toolkit (for Evasion). USENIX Security 2023.
General toolbox to select predefined perturbations[pdf] [code] -
Adversarial Training for Raw-Binary Malware Classifiers. USENIX Security 2023.
Adversarial Training for Windows PE malware[pdf] -
PELICAN: Exploiting Backdoors of Naturally Trained Deep Learning Models In Binary Code Analysis. USENIX Security 2023.
Reverse engineering natural backdoor in transformer-based x86 binary code analysis task[pdf] -
Black-box Adversarial Example Attack towards FCG Based Android Malware Detection under Incomplete Feature Information. USENIX Security 2023.
Black-box Android Adversarial Malware against the FCG-based ML classifier[pdf] -
Efficient Query-Based Attack against ML-Based Android Malware Detection under Zero Knowledge Setting. ACM CCS 2023.
Semantically similar perturbations are more likely to have similar evasion effectiveness[pdf] [code] -
Make a Feint to the East While Attacking in the West: Blinding LLM-Based Code Auditors with Flashboom Attacks. IEEE S&P 2025. [pdf]
-
ATTRITION: Attacking Static Hardware Trojan Detection Techniques Using Reinforcement Learning. ACM CCS 2022.
Attack Hardware Trojan Detection[pdf] -
DeepShuffle: A Lightweight Defense Framework against Adversarial Fault Injection Attacks on Deep Neural Networks in Multi-Tenant Cloud-FPGA. IEEE S&P 2024.
Adversarial defense against adversarial fault injection[pdf]
-
Interpretable Deep Learning under Fire. USENIX Security 2020.
Attack both image classification and interpret method[pdf] -
“Is your explanation stable?”: A Robustness Evaluation Framework for Feature Attribution. ACM CCS 2022.
Hypothesis testing to increase the robustness of explanation methods[pdf] -
AIRS: Explanation for Deep Reinforcement Learning based Security Applications. USENIX Security 2023.
DRL interpretation method to pinpoint the most influential step[pdf] [code] -
SoK: Explainable Machine Learning in Adversarial Environments. IEEE S&P 2024.
Adversarial explanation SoK[pdf]
-
SLAP: Improving Physical Adversarial Examples with Short-Lived Adversarial Perturbations. USENIX Security 2021.
Projector light causes misclassification[pdf] [code] -
Understanding Real-world Threats to Deep Learning Models in Android Apps. ACM CCS 2022.
Adversarial Attack in real-world models[pdf] -
X-Adv: Physical Adversarial Object Attacks against X-ray Prohibited Item Detection. USENIX Security 2023.
Adversarial Attack on X-ray Images[pdf] [code] -
That Person Moves Like A Car: Misclassification Attack Detection for Autonomous Systems Using Spatiotemporal Consistency. USENIX Security 2023.
Robust object detection in autonomous systems using spatiotemporal information[pdf] -
You Can't See Me: Physical Removal Attacks on LiDAR-based Autonomous Vehicles Driving Frameworks. USENIX Security 2023.
Adversarial attack against autonomous vehicles using lasers[pdf] [demo] -
CAPatch: Physical Adversarial Patch against Image Captioning Systems. USENIX Security 2023.
Physical Adversarial Patch against the image caption system[pdf] [code] -
Exorcising "Wraith": Protecting LiDAR-based Object Detector in Automated Driving System from Appearing Attacks. USENIX Security 2023.
Defend the appearing attack in autonomous system using local objectness predictor[pdf] [code] -
Invisible Reflections: Leveraging Infrared Laser Reflections to Target Traffic Sign Perception. NDSS 2024.
Adversarial attacks on autonomous vehicles using infrared laser reflections[pdf] -
Avara: A Uniform Evaluation System for Perceptibility Analysis Against Adversarial Object Evasion Attacks. CCS 2024.
Adversarial Object Evasion attack evaluation system[pdf] [code] -
VisionGuard: Secure and Robust Visual Perception of Autonomous Vehicles in Practice. CCS 2024.
Adversarial Patch detection[pdf] [demo] -
Invisible but Detected: Physical Adversarial Shadow Attack and Defense on LiDAR Object Detection. USENIX Security 2025. [pdf]
-
From Threat to Trust: Exploiting Attention Mechanisms for Attacks and Defenses in Cooperative Perception. USENIX Security 2025. [pdf]
-
Adversarial Policy Training against Deep Reinforcement Learning. USENIX Security 2021.
Adversarial policy triggers abnormal actions in the opponent agent. Two-agent competitive game[pdf] [code] -
SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems. CCS 2024.
Adversarial policy against the reinforcement learning system[pdf] [code] -
CAMP in the Odyssey: Provably Robust Reinforcement Learning with Certified Radius Maximization. USENIX Security 2025. [pdf]
-
Cost-Aware Robust Tree Ensembles for Security Applications. USENIX Security 2021.
Proposes feature-manipulation cost to certify model robustness[pdf] [code] -
CADE: Detecting and Explaining Concept Drift Samples for Security Applications. USENIX Security 2021.
Detect concept drift samples[pdf] [code] -
Learning Security Classifiers with Verified Global Robustness Properties. ACM CCS 2021.
Train a classifier with global robustness[pdf] [code] -
On the Robustness of Domain Constraints. ACM CCS 2021.
Domain constraints. Input space robustness[pdf] -
Cert-RNN: Towards Certifying the Robustness of Recurrent Neural Networks. ACM CCS 2021.
Certify robustness in RNN[pdf] -
TSS: Transformation-Specific Smoothing for Robustness Certification. ACM CCS 2021.
Certify robustness against transformations[pdf] [code] -
Transcend: Detecting Concept Drift in Malware Classification Models. USENIX Security 2017.
Conformal evaluators[pdf] [code] -
Transcending Transcend: Revisiting Malware Classification in the Presence of Concept Drift. IEEE S&P 2022.
New conformal evaluators[pdf][code] -
Transferring Adversarial Robustness Through Robust Representation Matching. USENIX Security 2022.
Robust Transfer Learning[pdf] -
DiffSmooth: Certifiably Robust Learning via Diffusion Models and Local Smoothing. USENIX Security 2023.
Diffusion model improves certified robustness (see the smoothing sketch below)[pdf] -
Anomaly Detection in the Open World: Normality Shift Detection, Explanation, and Adaptation. NDSS 2023.
Concept drift detection using an unsupervised approach[pdf] [code] -
BARS: Local Robustness Certification for Deep Learning based Traffic Analysis Systems. NDSS 2023.
Certified Robustness for Traffic Analysis Systems[pdf] [code] -
REaaS: Enabling Adversarially Robust Downstream Classifiers via Robust Encoder as a Service. NDSS 2023.
Build a certifiable encoder-as-a-service (EaaS) model[pdf] -
Continuous Learning for Android Malware Detection. USENIX Security 2023.
New continual learning paradigm for malware detection[pdf] [code] -
ObjectSeeker: Certifiably Robust Object Detection against Patch Hiding Attacks via Patch-agnostic Masking. IEEE S&P 2023.
Certified robustness of object detection[pdf] [code] -
On The Empirical Effectiveness of Unrealistic Adversarial Hardening Against Realistic Adversarial Attacks. IEEE S&P 2023.
Adversarial attacks on feature space may enhance the robustness in problem space[pdf] [code] -
Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks. IEEE S&P 2024.
Certified robustness on adversarial text[pdf] [code] -
It's Simplex! Disaggregating Measures to Improve Certified Robustness. IEEE S&P 2024.
Disaggregating measures to improve certified robustness[pdf] [code] -
SoK: Efficiency Robustness of Dynamic Deep Learning Systems. USENIX Security 2025. [pdf]
-
AGNNCert: Defending Graph Neural Networks against Arbitrary Perturbations with Deterministic Certification. USENIX Security 2025. [pdf]
-
Robustifying ML-powered Network Classifiers with PANTS. USENIX Security 2025. [pdf]
-
CertTA: Certified Robustness Made Practical for Learning-Based Traffic Analysis. USENIX Security 2025. [pdf]
-
Sylva: Tailoring Personalized Adversarial Defense in Pre-trained Models via Collaborative Fine-tuning. ACM CCS 2025. [pdf]
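Several certification entries above (TSS, DiffSmooth, Text-CRS, CertTA) build on randomized smoothing. A minimal prediction sketch, assuming a classifier `model` that returns logits; real certification additionally lower-bounds the top-class probability statistically (e.g., via a Clopper-Pearson interval) to derive a certified L2 radius:

```python
import torch

@torch.no_grad()
def smoothed_predict(model, x, sigma=0.25, n=100):
    # Randomized smoothing (sketch): classify n Gaussian-noised copies
    # of x (shape (1, C, H, W)) and return the majority vote. The vote
    # margin is what certification turns into a certified L2 radius.
    noisy = x + sigma * torch.randn(n, *x.shape[1:])
    votes = model(noisy).argmax(dim=-1)
    return votes.mode().values.item()
```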
-
Defeating DNN-Based Traffic Analysis Systems in Real-Time With Blind Adversarial Perturbations. USENIX Security 2021.
Adversarial attack to defeat DNN-based traffic analysis[pdf] [code] -
Pryde: A Modular Generalizable Workflow for Uncovering Evasion Attacks Against Stateful Firewall Deployments. IEEE S&P 2024.
Evasion attack against Firewalls[pdf] -
Multi-Instance Adversarial Attack on GNN-Based Malicious Domain Detection. IEEE S&P 2024.
Adversarial attack on GNN-based malicious domain detection[pdf] [code] -
Swallow: A Transfer-Robust Website Fingerprinting Attack via Consistent Feature Learning. ACM CCS 2025. [pdf]
- Robust Adversarial Attacks Against DNN-Based Wireless Communication Systems. ACM CCS 2021.
Attack[pdf]
- Adversarial Robustness for Tabular Data through Cost and Utility Awareness. NDSS 2023.
Adversarial Attack & Defense on tabular data[pdf]
-
Local Model Poisoning Attacks to Byzantine-Robust Federated Learning. USENIX Security 2020.
Poisoning Attack[pdf] -
Manipulating the Byzantine: Optimizing Model Poisoning Attacks and Defenses for Federated Learning. NDSS 2021.
Poisoning Attack[pdf] -
DeepSight: Mitigating Backdoor Attacks in Federated Learning Through Deep Model Inspection. NDSS 2022.
Backdoor defense[pdf] -
FLAME: Taming Backdoors in Federated Learning. USENIX Security 2022.
Backdoor defense[pdf] -
EIFFeL: Ensuring Integrity for Federated Learning. ACM CCS 2022.
New FL protocol to guarantee integrity[pdf] -
Eluding Secure Aggregation in Federated Learning via Model Inconsistency. ACM CCS 2022.
Model inconsistency to break the secure aggregation[pdf] -
FedRecover: Recovering from Poisoning Attacks in Federated Learning using Historical Information. IEEE S&P 2023.
Poisoned Model Recovery Algorithm[pdf] -
Every Vote Counts: Ranking-Based Training of Federated Learning to Resist Poisoning Attacks. USENIX Security 2023.
Discretize the model updates and prune the model to defend against the poisoning attack[pdf] [code] -
Securing Federated Sensitive Topic Classification against Poisoning Attacks. NDSS 2023.
Robust Aggregation against the poisoning attack[pdf] -
BayBFed: Bayesian Backdoor Defense for Federated Learning. IEEE S&P 2023.
Purify the model updates using Bayesian inference[pdf] -
ADI: Adversarial Dominating Inputs in Vertical Federated Learning Systems. IEEE S&P 2023.
Poisoning the vertical federated learning system[pdf] [code] -
3DFed: Adaptive and Extensible Framework for Covert Backdoor Attack in Federated Learning. IEEE S&P 2023.
Adapting conventional backdoor attacks to the federated learning scenario[pdf] -
FLShield: A Validation Based Federated Learning Framework to Defend Against Poisoning Attacks. IEEE S&P 2023.
Data poisoning defense[pdf] -
BadVFL: Backdoor Attacks in Vertical Federated Learning. IEEE S&P 2023.
Backdoor attacks against vertical federated learning[pdf] -
CrowdGuard: Federated Backdoor Detection in Federated Learning. NDSS 2024.
Backdoor detection in federated learning leveraging hidden layer outputs[pdf] [code] -
Automatic Adversarial Adaption for Stealthy Poisoning Attacks in Federated Learning. NDSS 2024.
Adaptive poisoning attacks in FL[pdf] -
FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning Attacks in Federated Learning. NDSS 2024.
Mitigate poisoning attack in FL using frequency analysis techniques[pdf] -
Dealing Doubt: Unveiling Threat Models in Gradient Inversion Attacks under Federated Learning – A Survey and Taxonomy. CCS 2024.
Survey and taxonomy of gradient inversion attacks in federated learning[pdf] -
Byzantine-Robust Decentralized Federated Learning. CCS 2024.
Byzantine-robust federated learning (see the aggregation sketch below)[pdf] -
PoiSAFL: Scalable Poisoning Attack Framework to Byzantine-resilient Semi-asynchronous Federated Learning. USENIX Security 2025. [pdf]
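Many of the poisoning defenses above replace plain federated averaging with a Byzantine-robust aggregator. A minimal sketch of two standard baselines (coordinate-wise median and trimmed mean), not the specific protocol of any listed paper; `updates` is an assumed list of equally-shaped client update tensors:

```python
import torch

def median_aggregate(updates):
    # Coordinate-wise median of client updates: a classic baseline that
    # tolerates a minority of arbitrarily poisoned updates.
    return torch.stack(updates).median(dim=0).values

def trimmed_mean_aggregate(updates, trim=1):
    # Trimmed mean: drop the `trim` largest and smallest values per
    # coordinate before averaging (assumes len(updates) > 2 * trim).
    stacked, _ = torch.stack(updates).sort(dim=0)
    return stacked[trim:len(updates) - trim].mean(dim=0)
```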
-
DeBackdoor: A Deductive Framework for Detecting Backdoor Attacks on Deep Models with Limited Data. USENIX Security 2025. [pdf]
- Justinian's GAAvernor: Robust Distributed Learning with Gradient Aggregation Agent. USENIX Security 2020.
Defense in Gradient Aggregation. Reinforcement learning[pdf]
- Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning. IEEE S&P 2020.
Hijack Word Embedding[pdf]
-
You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion. USENIX Security 2021.
Hijack Code Autocomplete[pdf] -
TROJANPUZZLE: Covertly Poisoning Code-Suggestion Models. IEEE S&P 2024.
Hijack Code Autocomplete[pdf] [code]
- Poisoning the Unlabeled Dataset of Semi-Supervised Learning. USENIX Security 2021.
Poisoning semi-supervised learning[pdf]
-
Data Poisoning Attacks to Deep Learning Based Recommender Systems. NDSS 2021.
The attacker-chosen items are recommended as often as possible[pdf] -
Reverse Attack: Black-box Attacks on Collaborative Recommendation. ACM CCS 2021.
Black-box setting. Surrogate model. Collaborative Filtering. Demoting and Promoting[pdf]
-
Subpopulation Data Poisoning Attacks. ACM CCS 2021.
Poisoning to flip a group of data samples[pdf] -
Get a Model! Model Hijacking Attack Against Machine Learning Models. NDSS 2022.
Fusing datasets to hijack the model[pdf] [code]
-
PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning. USENIX Security 2022.
Poisoning attack in contrastive learning[pdf] -
Preference Poisoning Attacks on Reward Model Learning. IEEE S&P 2025.
Poison attack in reward model learning[pdf]
- Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets. ACM CCS 2022.
Poison attack to reveal sensitive information[pdf]
- Test-Time Poisoning Attacks Against Test-Time Adaptation Models. IEEE S&P 2024.
Poisoning attack at test time[pdf] [code]
-
Poison Forensics: Traceback of Data Poisoning Attacks in Neural Networks. USENIX Security 2022.
Identify the poisoned subset by clustering and pruning the benign set[pdf] -
Understanding Implosion in Text-to-Image Generative Models. CCS 2024.
Analytic framework for the poisoning attack against T2I model[pdf]
-
Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection. USENIX Security 2021.
Class-specific Backdoor. Defense by decomposition[pdf] -
Double-Cross Attacks: Subverting Active Learning Systems. USENIX Security 2021.
Active Learning System. Backdoor Attack[pdf] -
Detecting AI Trojans Using Meta Neural Analysis. IEEE S&P 2021.
Meta Neural Classifier[pdf] [code] -
BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning. IEEE S&P 2022.
Backdoor attack in image-text pretrained model[pdf] [code] -
Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features. ACM CCS 2020.
Composite backdoor. Image & text tasks[pdf] [code] -
AI-Lancet: Locating Error-inducing Neurons to Optimize Neural Networks. ACM CCS 2021.
Locate error-inducing neurons and fine-tune them[pdf] -
LoneNeuron: a Highly-Effective Feature-Domain Neural Trojan Using Invisible and Polymorphic Watermarks. ACM CCS 2022.
Backdoor attack by modifying neurons[pdf] -
ATTEQ-NN: Attention-based QoE-aware Evasive Backdoor Attacks. NDSS 2022.
Backdoor attack by attention techniques[pdf] -
RAB: Provable Robustness Against Backdoor Attacks. IEEE S&P 2023.
Backdoor certification[pdf] -
A Data-free Backdoor Injection Approach in Neural Networks. USENIX Security 2023.
Data free backdoor injection[pdf] [code] -
Backdoor Attacks Against Dataset Distillation. NDSS 2023.
Backdoor attack against dataset distillation[pdf] [code] -
BEAGLE: Forensics of Deep Learning Backdoor Attack for Better Defense. NDSS 2023.
Backdoor Forensics[pdf] [code] -
Disguising Attacks with Explanation-Aware Backdoors. IEEE S&P 2023.
Backdoor to mislead the explanation method[pdf] -
Selective Amnesia: On Efficient, High-Fidelity and Blind Suppression of Backdoor Effects in Trojaned Machine Learning Models. IEEE S&P 2023.
Finetuning to remove backdoor[pdf] -
AI-Guardian: Defeating Adversarial Attacks using Backdoors. IEEE S&P 2023.
Using backdoors with an all-to-all mapping, and reversing the mapping, to detect adversarial examples[pdf] -
REDEEM MYSELF: Purifying Backdoors in Deep Learning Models using Self Attention Distillation. IEEE S&P 2023.
Purifying backdoor using model distillation[pdf] -
NARCISSUS: A Practical Clean-Label Backdoor Attack with Limited Information. ACM CCS 2023.
Clean-label backdoor attack (contrast with the dirty-label sketch below)[pdf] [code] -
ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning Paradigms. USENIX Security 2023.
Backdoor Defense works in Different Learning Paradigms[pdf] [code] -
ODSCAN: Backdoor Scanning for Object Detection Models. IEEE S&P 2024.
Backdoor defense by model dynamics[pdf] [github] -
MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic. IEEE S&P 2024.
Backdoor defense using maximum margin statistic in classification layer[pdf] [github] -
Distribution Preserving Backdoor Attack in Self-supervised Learning. IEEE S&P 2024.
Backdoor attack in contrastive learning by improving the distribution[pdf] [github] -
Backdooring Bias (B^2) into Stable Diffusion Models. USENIX Security 2025.
Backdoor attack in stable diffusion model[pdf] -
Watch the Watchers! On the Security Risks of Robustness-Enhancing Diffusion Models. USENIX Security 2025. [pdf]
-
Pretender: Universal Active Defense against Diffusion Finetuning Attacks. USENIX Security 2025. [pdf]
-
Rowhammer-Based Trojan Injection: One Bit Flip Is Sufficient for Backdooring DNNs. USENIX Security 2025. [pdf]
-
From Purity to Peril: Backdooring Merged Models From "Harmless" Benign Components. USENIX Security 2025. [pdf]
-
Revisiting Training-Inference Trigger Intensity in Backdoor Attacks. USENIX Security 2025. [pdf]
-
Persistent Backdoor Attacks in Continual Learning. USENIX Security 2025. [pdf]
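Most data-poisoning backdoors above follow the BadNets recipe: stamp a fixed trigger onto a few training inputs and relabel them to the target class, so that any test input carrying the trigger is misclassified. A minimal dirty-label sketch, assuming image tensors in [0, 1]; clean-label attacks such as NARCISSUS above keep the original label instead:

```python
import torch

def poison_sample(x, target_class, patch=1.0, size=3):
    # BadNets-style dirty-label poisoning (sketch): stamp a small solid
    # patch in the bottom-right corner and relabel to the target class.
    x = x.clone()
    x[..., -size:, -size:] = patch
    return x, target_class
```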
-
T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification. USENIX Security 2021.
Backdoor Defense. GAN to recover trigger[pdf] [code] -
Hidden Backdoors in Human-Centric Language Models. ACM CCS 2021.
Novel trigger[pdf] [code] -
Backdoor Pre-trained Models Can Transfer to All. ACM CCS 2021.
Backdoor in pre-trained models to poison downstream tasks[pdf] [code] -
Hidden Trigger Backdoor Attack on NLP Models via Linguistic Style Manipulation. USENIX Security 2022.
Backdoor via linguistic style manipulation[pdf] -
TextGuard: Provable Defense against Backdoor Attacks on Text Classification. NDSS 2024.
Provable backdoor defense by splitting the sentence and ensemble learning[pdf] [code]
-
Graph Backdoor. USENIX Security 2021.
Classification[pdf] [code] -
Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses. CCS 2024.
Distributed Backdoor attacks on federated graph learning[pdf] [code]
- Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers. USENIX Security 2021.
Explanation Method. Evade Classification[pdf] [code]
-
TrojanModel: A Practical Trojan Attack against Automatic Speech Recognition Systems. IEEE S&P 2023.
Backdoor attack in speech recognition systems[pdf] -
MagBackdoor: Beware of Your Loudspeaker as Backdoor of Magnetic Attack for Malicious Command Injection. IEEE S&P 2023.
Backdoor attack in audio using a magnetic trigger[pdf]
- Sneaky Spikes: Uncovering Stealthy Backdoor Attacks in Spiking Neural Networks with Neuromorphic Data. NDSS 2024.
Backdoor attack in neuromorphic data[pdf] [code]
-
Blind Backdoors in Deep Learning Models. USENIX Security 2021.
Loss Manipulation. Backdoor[pdf] [code] -
IvySyn: Automated Vulnerability Discovery in Deep Learning Frameworks. USENIX Security 2023.
Automatic Bug Discovery in ML libraries[pdf]
-
Towards Understanding and Detecting Cyberbullying in Real-world Images. NDSS 2021.
Detect cyberbullying in images[pdf] -
You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content. IEEE S&P 2024.
Using LLM for toxic content detection[pdf] [code] -
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns. USENIX Security 2025. [pdf] [code]
-
FARE: Enabling Fine-grained Attack Categorization under Low-quality Labeled Data. NDSS 2021.
Clustering method to complete dataset labels[pdf] [code] -
From Grim Reality to Practical Solution: Malware Classification in Real-World Noise. IEEE S&P 2023.
Label-noise learning method for malware classification[pdf] [code] -
Decoding the Secrets of Machine Learning in Windows Malware Classification: A Deep Dive into Datasets, Features, and Model Performance. ACM CCS 2023.
Static features outperform dynamic features in Windows PE malware detection[pdf] -
KAIROS: Practical Intrusion Detection and Investigation using Whole-system Provenance. IEEE S&P 2024.
GNN-based intrusion detection method[pdf] [code] -
FLASH: A Comprehensive Approach to Intrusion Detection via Provenance Graph Representation Learning. IEEE S&P 2024.
GNN-based intrusion detection method[pdf] [code] -
Understanding and Bridging the Gap Between Unsupervised Network Representation Learning and Security Analytics. IEEE S&P 2024.
Unsupervised graph learning for graph-based security applications[pdf] [code] -
FP-Fed: Privacy-Preserving Federated Detection of Browser Fingerprinting. NDSS 2024.
Federated learning for browser fingerprinting[pdf] -
GNNIC: Finding Long-Lost Sibling Functions with Abstract Similarity. NDSS 2024.
GNN for static analysis[pdf] -
Experimental Analyses of the Physical Surveillance Risks in Client-Side Content Scanning. NDSS 2024.
Attack client scanning systems[pdf] -
Attributions for ML-based ICS Anomaly Detection: From Theory to Practice. NDSS 2024.
Evaluating attribution methods for industrial control systems[pdf] [code] -
DRAINCLoG: Detecting Rogue Accounts with Illegally-obtained NFTs using Classifiers Learned on Graphs. NDSS 2024.
Detecting rogue accounts in NFTs using GNN[pdf] -
Low-Quality Training Data Only? A Robust Framework for Detecting Encrypted Malicious Network Traffic. NDSS 2024.
Training ML-based traffic detection using low-quality data[pdf] [code] -
SafeEar: Content Privacy-Preserving Audio Deepfake Detection. ACM CCS 2024.
Speech content privacy-preserving deepfake detection[pdf] [website] [code] [dataset] -
USD: NSFW Content Detection for Text-to-Image Models via Scene Graph. USENIX Security 2025.
NSFW image detection[pdf] -
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts. USENIX Security 2025.
Defense against NSFW image generation[pdf] -
VoiceWukong: Benchmarking Deepfake Voice Detection. USENIX Security 2025. [pdf]
-
SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis. USENIX Security 2025. [pdf]
-
Slot: Provenance-Driven APT Detection through Graph Reinforcement Learning. ACM CCS 2025. [pdf]
-
Combating Concept Drift with Explanatory Detection and Adaptation for Android Malware Classification. ACM CCS 2025. [pdf]
-
MM4flow: A Pre-trained Multi-modal Model for Versatile Network Traffic Analysis. ACM CCS 2025. [pdf]
-
Analyzing PDFs like Binaries: Adversarially Robust PDF Malware Analysis via Intermediate Representation and Language Model. ACM CCS 2025. [pdf]
- WtaGraph: Web Tracking and Advertising Detection using Graph Neural Networks. IEEE S&P 2022.
GNN[pdf]
-
Text Captcha Is Dead? A Large Scale Deployment and Empirical Study. ACM CCS 2020.
Adversarial CAPTCHA[pdf] -
Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition Systems. NDSS 2023.
Adversarial Audio CAPTCHA[pdf] [demo] -
A Generic, Efficient, and Effortless Solver with Self-Supervised Learning for Breaking Text Captchas. IEEE S&P 2023.
Text CAPTCHA Solver[pdf]
-
PalmTree: Learning an Assembly Language Model for Instruction Embedding. ACM CCS 2021.
Pre-trained model to generate code embedding[pdf] [code] -
CALLEE: Recovering Call Graphs for Binaries with Transfer and Contrastive Learning. IEEE S&P 2023.
Recovering call graph from binaries using transfer and contrastive learning[pdf] [code] -
Examining Zero-Shot Vulnerability Repair with Large Language Models. IEEE S&P 2023.
Zero-shot vulnerability repair using large language models[pdf] -
Raconteur: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer. NDSS 2025.
LLM-powered malicious code analysis[pdf] [website]
- Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots. ACM CCS 2022.
Measuring chatbot toxic behavior[pdf]
-
Towards a General Video-based Keystroke Inference Attack. USENIX Security 2023.
Self-supervised learning to recover keyboard input[pdf] -
Deep perceptual hashing algorithms with hidden dual purpose: when client-side scanning does facial recognition. IEEE S&P 2023.
Manipulate a deep perceptual hashing algorithm to perform facial recognition of a specific person[pdf] [code]
-
Dos and Don'ts of Machine Learning in Computer Security. USENIX Security 2022.
Survey pitfalls in ML4Security[pdf] -
“Security is not my field, I’m a stats guy”: A Qualitative Root Cause Analysis of Barriers to Adversarial Machine Learning Defenses in Industry. USENIX Security 2023.
Survey AML Application in Industry[pdf] -
Everybody’s Got ML, Tell Me What Else You Have: Practitioners’ Perception of ML-Based Security Tools and Explanations. IEEE S&P 2023.
Explainable AI in practice[pdf]
- CERBERUS: Exploring Federated Prediction of Security Events. ACM CCS 2022.
Federated learning to predict security events[pdf]
- VulChecker: Graph-based Vulnerability Localization in Source Code. USENIX Security 2023.
Detecting Bugs using GCN[pdf] [code]
- On the Security Risks of AutoML. USENIX Security 2022.
Adversarial evasion. Model poisoning. Backdoor. Functionality stealing. Membership Inference[pdf]
-
DeepDyve: Dynamic Verification for Deep Neural Networks. ACM CCS 2020. [pdf]
-
NeuroPots: Realtime Proactive Defense against Bit-Flip Attacks in Neural Networks. USENIX Security 2023.
Honeypot to trap bit-flip attacks[pdf] -
Aegis: Mitigating Targeted Bit-flip Attacks against Deep Neural Networks. USENIX Security 2023.
Train multiple classifiers to defend against bit-flip attacks[pdf] [code]
- DeepAID: Interpreting and Improving Deep Learning-based Anomaly Detection in Security Applications. ACM CCS 2021.
Anomaly detection[pdf] [code]
- Good-looking but Lacking Faithfulness: Understanding Local Explanation Methods through Trend-based Testing. ACM CCS 2023.
Trend-based faithfulness testing[pdf] [code]
- FINER: Enhancing State-of-the-art Classifiers with Feature Attribution to Facilitate Security Analysis. ACM CCS 2023.
Ensemble explanation for different stakeholders[pdf] [code]
- Who Are You (I Really Wanna Know)? Detecting Audio DeepFakes Through Vocal Tract Reconstruction. USENIX Security 2022.
Deepfake detection using vocal tract reconstruction[pdf]
-
ImU: Physical Impersonating Attack for Face Recognition System with Natural Style Changes. IEEE S&P 2023.
StyleGAN to impersonate a person[pdf] [code] -
DepthFake: Spoofing 3D Face Authentication with a 2D Photo. IEEE S&P 2023.
Spoofing 3D face authentication with a 2D photo[pdf] [demo]
- Understanding the (In)Security of Cross-side Face Verification Systems in Mobile Apps: A System Perspective. IEEE S&P 2023.
Measurement study of the security risks of cross-side face verification systems.[pdf]
-
Deepfake Text Detection: Limitations and Opportunities. IEEE S&P 2023.
Detecting machine-generated text[pdf] [code] -
MGTBench: Benchmarking Machine-Generated Text Detection. CCS 2024.
Benchmarking machine generated text detection[pdf] [code] -
SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models. CCS 2025.
Content safety control for text-to-image models[pdf]
-
SoK: The Good, The Bad, and The Unbalanced: Measuring Structural Limitations of Deepfake Media Datasets. USENIX Security 2024.
Issues in deepfake media dataset[pdf] [website] -
SafeEar: Content Privacy-Preserving Audio Deepfake Detection. ACM CCS 2024.
Speech content privacy-preserving deepfake detection[pdf] [website] [code] [dataset] -
"Better Be Computer or I’m Dumb": A Large-Scale Evaluation of Humans as Audio Deepfake Detectors. ACM CCS 2024.
Humans as audio deepfake detectors[pdf]
-
Large Language Models for Code: Security Hardening and Adversarial Testing. ACM CCS 2023.
Prefix tuning for secure code generation[pdf] [code] -
DeGPT: Optimizing Decompiler Output with LLM. NDSS 2024.
LLM-enhanced reverse engineering[pdf] [code] -
Raconteur: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer. NDSS 2025.
LLM-powered malicious code analysis[pdf] [website] -
PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs). CCS 2024.
Black-box LLM secure code generation[pdf] [code] -
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs. USENIX Security 2025. [pdf]
-
Transferable Multimodal Attack on Vision-Language Pre-training Models. IEEE S&P 2024.
Transferable adversarial attack on VLM[pdf] -
SneakyPrompt: Jailbreaking Text-to-image Generative Models. IEEE S&P 2024.
Jailbreaking text-to-image generative model using reinforcement-learning adversarial NLP methods[pdf] [code] -
SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models. ACM CCS 2024.
defending against unsafe content generation in text-to-image models[pdf] [code] [model] -
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution. ACM CCS 2024.
Bypassing the safety filter of T2I model[pdf] -
Moderator: Moderating Text-to-Image Diffusion Models through Fine-grained Context-based Policies. ACM CCS 2024.
Content moderating for T2I model[pdf] [code] -
Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities. USENIX Security 2025. [pdf]
-
Are CAPTCHAs Still Bot-hard? Generalized Visual CAPTCHA Solving with Agentic Vision Language Model. USENIX Security 2025. [pdf]
-
From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language Models. USENIX Security 2025. [pdf]
-
MASTERKEY: Automated Jailbreaking of Large Language Model Chatbots. NDSS 2024.
LLM jailbreaking[pdf] -
Mind the Inconspicuous: Revealing the Hidden Weakness in Aligned LLMs' Refusal Boundaries. USENIX Security 2025.
LLM jailbreaking[pdf] -
Refusal Is Not an Option: Unlearning Safety Alignment of Large Language Models. USENIX Security 2025.
LLM jailbreaking[pdf] [code] -
Activation Approximations Can Incur Safety Vulnerabilities in Aligned LLMs: Comprehensive Analysis and Defense. USENIX Security 2025.
LLM jailbreaking defense[pdf] -
Exposing the Guardrails: Reverse-Engineering and Jailbreaking Safety Filters in DALL·E Text-to-Image Pipelines. USENIX Security 2025.
LLM jailbreaking[pdf] -
TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts. USENIX Security 2025.
LLM jailbreaking[pdf] -
Exploiting Task-Level Vulnerabilities: An Automatic Jailbreak Attack and Defense Benchmarking for LLMs. USENIX Security 2025.
LLM jailbreaking[pdf] -
PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs. USENIX Security 2025.
LLM jailbreaking[pdf] -
Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack. USENIX Security 2025.
LLM jailbreaking[pdf] -
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner. USENIX Security 2025.
LLM jailbreaking defense[pdf] -
JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation. USENIX Security 2025. [pdf]
- Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention. NDSS 2024.
Improving the robustness of LLM by dynamic attention[pdf]
-
DEMASQ: Unmasking the ChatGPT Wordsmith. NDSS 2024.
Generated text detection[pdf] -
Organic or Diffused: Can We Distinguish Human Art from AI-generated Images?. CCS 2024.
Distinguishing human art from AI-generated images[pdf] -
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing. CCS 2024.
LLM-generated content detection[pdf] -
GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors. USENIX Security 2025.
Evading LLM-generated content detection[pdf] [code] -
Data-Free Model-Related Attacks: Unleashing the Potential of Generative AI. USENIX Security 2025. [pdf] [code]
-
"I Cannot Write This Because It Violates Our Content Policy": Understanding Content Moderation Policies and User Experiences in Generative AI Products. USENIX Security 2025. [pdf]
-
Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data. USENIX Security 2025. [pdf]
-
LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors. NDSS 2024.
Task-agnostic backdoor detection[pdf] [code] -
EmbedX: Embedding-Based Cross-Trigger Backdoor Attack Against Large Language Models. USENIX Security 2025.
Backdoor attack in LLM[pdf]
-
When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs. USENIX Security 2025. [pdf]
-
Make Agent Defeat Agent: Automatic Detection of Taint-Style Vulnerabilities in LLM-based Agents. USENIX Security 2025. [pdf]
-
TracLLM: A Generic Framework for Attributing Long Context LLMs. USENIX Security 2025. [pdf] [code]
-
Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search. USENIX Security 2025. [pdf]
-
Cloak, Honey, Trap: Proactive Defenses Against LLM Agents. USENIX Security 2025. [pdf]
-
Big Help or Big Brother? Auditing Tracking, Profiling, and Personalization in Generative AI Assistants. USENIX Security 2025. [pdf]
-
AgentSentinel: An End-to-End and Real-Time Security Defense Framework for Computer-Use Agents. ACM CCS 2025. [pdf]
-
StruQ: Defending Against Prompt Injection with Structured Queries. USENIX Security 2025.
Prompt Injection Defense[pdf] -
Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents. USENIX Security 2025. [pdf]
-
SecAlign: Defending Against Prompt Injection with Preference Optimization. ACM CCS 2025. [pdf]
- Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink. USENIX Security 2025. [pdf]
-
Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning. USENIX Security 2020.
Online Learning. Model updates[pdf] -
Extracting Training Data from Large Language Models. USENIX Security 2021.
Training data extraction attack on GPT-2[pdf] -
Analyzing Information Leakage of Updates to Natural Language Models. ACM CCS 2020.
Data leakage from model updates[pdf] -
TableGAN-MCA: Evaluating Membership Collisions of GAN-Synthesized Tabular Data Releasing. ACM CCS 2021.
Membership collision in GAN[pdf] -
DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation. ACM CCS 2021.
DP to train a privacy-preserving GAN[pdf] -
Property Inference Attacks Against GANs. NDSS 2022.
Property Inference Attacks Against GAN[pdf] [code] -
MIRROR: Model Inversion for Deep Learning Network with High Fidelity. NDSS 2022.
Model inversion attack using GAN[pdf] [code] -
Analyzing Leakage of Personally Identifiable Information in Language Models. IEEE S&P 2023.
Personally identifiable information leakage in language model[pdf] [code] -
Timing Channels in Adaptive Neural Networks. NDSS 2024.
Infer input of adaptive NN using timing information[pdf] [code] -
Crafter: Facial Feature Crafting against Inversion-based Identity Theft on Deep Models. NDSS 2024.
Protection against model inversion attacks[pdf] [code] -
Transpose Attack: Stealing Datasets with Bidirectional Training. NDSS 2024.
Stealing dataset in bidirectional models[pdf] [code] -
SafeEar: Content Privacy-Preserving Audio Deepfake Detection. ACM CCS 2024.
Speech content privacy-preserving deepfake detection[pdf] [website] [code] [dataset] -
Dye4AI: Assuring Data Boundary on Generative AI Services. ACM CCS 2024.
Dye testing system in LLM[pdf] -
Evaluations of Machine Learning Privacy Defenses are Misleading. ACM CCS 2024.
Evaluating ML privacy defenses[pdf] [code] -
Towards a Re-evaluation of Data Forging Attacks in Practice. USENIX Security 2025. [pdf]
-
SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark. USENIX Security 2025. [pdf]
-
Anonymity Unveiled: A Practical Framework for Auditing Data Use in Deep Learning Models. ACM CCS 2025. [pdf]
-
Poisoning Attacks to Local Differential Privacy for Ranking Estimation. ACM CCS 2025. [pdf]
-
Stolen Memories: Leveraging Model Memorization for Calibrated White-Box Membership Inference. USENIX Security 2020.
White-box Setting[pdf] -
Systematic Evaluation of Privacy Risks of Machine Learning Models. USENIX Security 2020.
Metric-based membership inference attack method. Defines a privacy risk score (see the loss-threshold sketch below)[pdf] [code] -
Practical Blind Membership Inference Attack via Differential Comparisons. NDSS 2021.
Use non-member data to replace shadow model[pdf] [code] -
GAN-Leaks: A Taxonomy of Membership Inference Attacks against Generative Models. ACM CCS 2020.
Membership inference attack in Generative model. Member has small reconstruction error[pdf] -
Quantifying and Mitigating Privacy Risks of Contrastive Learning. ACM CCS 2021.
Membership inference attack. Property inference attack. Contrastive learning in classification task[pdf] [code] -
Membership Inference Attacks Against Recommender Systems. ACM CCS 2021.
Recommender System[pdf] [code] -
EncoderMI: Membership Inference against Pre-trained Encoders in Contrastive Learning. ACM CCS 2021.
Contrastive learning in pre-trained model. Data augmentation has higher similarity[pdf] [code] -
Auditing Membership Leakages of Multi-Exit Networks. ACM CCS 2022.
Membership inference attack in multi-exit networks[pdf] -
Membership Inference Attacks by Exploiting Loss Trajectory. ACM CCS 2022.
Membership inference attack, knowledge distillation[pdf] -
On the Privacy Risks of Cell-Based NAS Architectures. ACM CCS 2022.
Membership inference attack in NAS[pdf] -
Membership Inference Attacks and Defenses in Neural Network Pruning. USENIX Security 2022.
Membership inference attack in Neural Network Pruning[pdf] -
Mitigating Membership Inference Attacks by Self-Distillation Through a Novel Ensemble Architecture. USENIX Security 2022.
Membership inference defense by ensemble[pdf] -
Enhanced Membership Inference Attacks against Machine Learning Models. USENIX Security 2022.
Membership inference attack with hypothesis testing[pdf] [code] -
Membership Inference Attacks and Generalization: A Causal Perspective. ACM CCS 2022.
Membership inference attack with causal reasoning[pdf] -
SLMIA-SR: Speaker-Level Membership Inference Attacks against Speaker Recognition Systems. NDSS 2024.
Membership inference attack in speaker recognition[pdf] [code] -
Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction. NDSS 2024.
The defense of membership inference attack[pdf] [code] -
Membership Inference Attacks Against Vision-Language Models. USENIX Security 2025.
Membership inference attack in vision language model[pdf] -
Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models. USENIX Security 2025.
Membership inference attack in LLM[pdf] -
Enhanced Label-Only Membership Inference Attacks with Fewer Queries. USENIX Security 2025. [pdf]
-
SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks. USENIX Security 2025. [pdf]
-
Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble. ACM CCS 2025. [pdf]
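The metric-based attacks above reduce membership inference to thresholding a per-sample statistic. A minimal loss-threshold sketch; in practice `tau` is calibrated on shadow models or held-out data, and stronger attacks (e.g., the hypothesis-testing variants above) calibrate per example:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def loss_threshold_mia(model, x, y, tau):
    # Metric-based membership inference (sketch): training members tend
    # to have lower loss, so predict "member" when the per-sample loss
    # falls below a calibrated threshold tau.
    losses = F.cross_entropy(model(x), y, reduction="none")
    return losses < tau          # True = predicted training-set member
```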
-
Label Inference Attacks Against Vertical Federated Learning. USENIX Security 2022.
Label Leakage. Federated Learning[pdf] [code] -
The Value of Collaboration in Convex Machine Learning with Differential Privacy. IEEE S&P 2020.
DP as Defense[pdf] -
Leakage of Dataset Properties in Multi-Party Machine Learning. USENIX Security 2021.
Dataset Properties Leakage[pdf] -
Unleashing the Tiger: Inference Attacks on Split Learning. ACM CCS 2021.
Split learning. Feature-space hijacking attack[pdf] [code] -
Local and Central Differential Privacy for Robustness and Privacy in Federated Learning. NDSS 2022.
DP in federated learning (see the DP-SGD sketch below)[pdf] -
Gradient Obfuscation Gives a False Sense of Security in Federated Learning. USENIX Security 2023.
Data Recovery in federated learning[pdf] -
PPA: Preference Profiling Attack Against Federated Learning. NDSS 2023.
Preference Leakage in federated learning[pdf] [code] -
On the (In)security of Peer-to-Peer Decentralized Machine Learning. IEEE S&P 2023.
Information leakage in peer-to-peer decentralized machine learning system[pdf] -
RoFL: Robustness of Secure Federated Learning. IEEE S&P 2023.
Robust federated learning framework using secure aggregation[pdf] [code] -
Scalable and Privacy-Preserving Federated Principal Component Analysis. IEEE S&P 2023.
Privacy-preserving federated PCA algorithm[pdf] -
Protecting Label Distribution in Cross-Silo Federated Learning. IEEE S&P 2024.
Privacy-preserving SGD to protect label distribution[pdf] -
LOKI: Large-scale Data Reconstruction Attack against Federated Learning through Model Manipulation. IEEE S&P 2024.
Dataset reconstruction attack in federated learning by sending customized convolutional kernels[pdf] -
Analyzing Inference Privacy Risks Through Gradients In Machine Learning. CCS 2024.
Information leakage through gradients[pdf] -
Boosting Gradient Leakage Attacks: Data Reconstruction in Realistic FL Settings. USENIX Security 2025. [pdf]
-
Refiner: Data Refining against Gradient Leakage Attacks in Federated Learning. USENIX Security 2025. [pdf]
-
Aion: Robust and Efficient Multi-Round Single-Mask Secure Aggregation Against Malicious Participants. USENIX Security 2025. [pdf]
-
SoK: On Gradient Leakage in Federated Learning. USENIX Security 2025. [pdf]
-
DP-BREM: Differentially-Private and Byzantine-Robust Federated Learning with Client Momentum. USENIX Security 2025. [pdf]
-
SLOTHE: Lazy Approximation of Non-Arithmetic Neural Network Functions over Encrypted Data. USENIX Security 2025. [pdf]
-
Sharpness-Aware Initialization: Improving Differentially Private Machine Learning from First Principles. USENIX Security 2025. [pdf]
-
Task-Oriented Training Data Privacy Protection for Cloud-based Model Training. USENIX Security 2025. [pdf]
-
From Risk to Resilience: Towards Assessing and Mitigating the Risk of Data Reconstruction Attacks in Federated Learning. USENIX Security 2025. [pdf]
-
SoK: Gradient Inversion Attacks in Federated Learning. USENIX Security 2025. [pdf]
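On the defense side, several entries above build on differentially private training. A minimal DP-SGD step sketch in the style of Abadi et al. (per-example clipping plus Gaussian noise); `per_sample_grads` is an assumed list holding one list of gradient tensors per example:

```python
import torch

def dp_sgd_step(params, per_sample_grads, clip=1.0, sigma=1.0, lr=0.1):
    # DP-SGD (sketch): clip each example's gradient to bound its
    # influence, sum, add Gaussian noise scaled to the clip norm, and
    # take an averaged step.
    clipped = []
    for grads in per_sample_grads:                 # grads: list of tensors
        norm = torch.cat([g.flatten() for g in grads]).norm()
        scale = min(1.0, clip / (norm.item() + 1e-12))
        clipped.append([g * scale for g in grads])
    for i, p in enumerate(params):
        total = sum(c[i] for c in clipped)
        total += sigma * clip * torch.randn_like(total)
        p.data -= lr * total / len(per_sample_grads)
```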
-
Privacy Risks of General-Purpose Language Models. IEEE S&P 2020.
Pretrained Language Model[pdf] -
Information Leakage in Embedding Models. ACM CCS 2020.
Exact Word Recovery. Attribute inference. Membership inference[pdf] -
Honest-but-Curious Nets: Sensitive Attributes of Private Inputs Can Be Secretly Coded into the Classifiers' Outputs. ACM CCS 2021.
Infer privacy information in classification output[pdf] [code]
-
Stealing Links from Graph Neural Networks. USENIX Security 2021.
Infer graph links (see the link-stealing sketch below)[pdf] -
Inference Attacks Against Graph Neural Networks. USENIX Security 2022.
Property inference: number of nodes. Subgraph inference. Graph reconstruction[pdf] [code] -
LinkTeller: Recovering Private Edges from Graph Neural Networks via Influence Analysis. IEEE S&P 2022.
Use node connection influence to infer graph edges[pdf] -
Locally Private Graph Neural Networks. IEEE S&P 2022.
LDP as defense for node privacy[pdf] [code] -
Finding MNEMON: Reviving Memories of Node Embeddings. ACM CCS 2022.
Graph recovery attack through node embedding[pdf] -
Group Property Inference Attacks Against Graph Neural Networks. ACM CCS 2022.
Group Property inference attack on GNN[pdf] -
LPGNet: Link Private Graph Networks for Node Classification. ACM CCS 2022.
DP to build private GNN[pdf] -
GraphGuard: Detecting and Counteracting Training Data Misuse in Graph Neural Networks. NDSS 2024.
Mitigate data misuse issues in GNN[pdf] [code] -
GRID: Protecting Training Graph from Link Stealing Attacks on GNN Models. IEEE S&P 2025.
Link stealing defense[pdf] [code]
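The link-stealing attacks above exploit that a trained GNN tends to give connected nodes similar posteriors. A minimal scoring sketch, assuming `posteriors` is the node-by-class prediction matrix and `candidate_pairs` an (M, 2) tensor of node-index pairs; LDP defenses such as Locally Private GNNs above blunt exactly this signal:

```python
import torch

def link_stealing_scores(posteriors, candidate_pairs):
    # Link stealing (sketch): connected nodes tend to receive similar
    # GNN posteriors, so rank candidate edges by the cosine similarity
    # of the two endpoints' prediction vectors.
    u, v = candidate_pairs[:, 0], candidate_pairs[:, 1]
    return torch.cosine_similarity(posteriors[u], posteriors[v], dim=-1)
```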
-
Machine Unlearning. IEEE S&P 2020.
Shard and isolate the training dataset (SISA; see the sketch below)[pdf] [code] -
When Machine Unlearning Jeopardizes Privacy. ACM CCS 2021.
Membership inference attack in unlearning setting[pdf] [code] -
Graph Unlearning. ACM CCS 2022.
Graph Unlearning[pdf] [code] -
On the Necessity of Auditable Algorithmic Definitions for Machine Unlearning. ACM CCS 2022.
Auditable Unlearning[pdf] -
Machine Unlearning of Features and Labels. NDSS 2023.
Influence Function to achieve unlearning[pdf] [code] -
A Duty to Forget, a Right to be Assured? Exposing Vulnerabilities in Machine Unlearning Services. NDSS 2024.
The vulnerabilities in machine unlearning[pdf] [code] -
ERASER: Machine Unlearning in MLaaS via an Inference Serving-Aware Approach. CCS 2024.
Machine unlearning via an inference-serving-aware approach[pdf] -
Rectifying Privacy and Efficacy Measurements in Machine Unlearning: A New Inference Attack Perspective. USENIX Security 2025. [pdf]
-
Data Duplication: A Novel Multi-Purpose Attack Paradigm in Machine Unlearning. USENIX Security 2025. [pdf]
-
Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness. USENIX Security 2025. [pdf]
-
Split Unlearning. ACM CCS 2025. [pdf]
-
Rethinking Machine Unlearning in Image Generation Models. ACM CCS 2025. [pdf]
-
Prototype Surgery: Tailoring Neural Prototypes via Soft Labels for Efficient Machine Unlearning. ACM CCS 2025. [pdf]
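The sharding approach of the "Machine Unlearning" entry above (SISA) makes forgetting cheap by construction: one model per disjoint shard, aggregated by majority vote, so unlearning a sample retrains only its shard. A minimal bookkeeping sketch:

```python
import torch

def make_shards(num_samples, num_shards=5, seed=0):
    # SISA-style sharding (sketch): split the training set into disjoint
    # shards; each shard trains its own model.
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_samples, generator=g)
    return [perm[i::num_shards] for i in range(num_shards)]

def shard_to_retrain(shards, sample_idx):
    # Locate the single shard whose model must be retrained to unlearn
    # the given sample; all other shard models stay untouched.
    for s, idx in enumerate(shards):
        if (idx == sample_idx).any():
            return s
```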
-
Are Attribute Inference Attacks Just Imputation?. ACM CCS 2022.
Attribute inference attack via neurons identified with the data[pdf] [code] -
Feature Inference Attack on Shapley Values. ACM CCS 2022.
Attribute Inference Attack using shapley values[pdf] -
QuerySnout: Automating the Discovery of Attribute Inference Attacks against Query-Based Systems. ACM CCS 2022.
Attribute Inference detection[pdf] -
Disparate Privacy Vulnerability: Targeted Attribute Inference Attacks and Defenses. USENIX Security 2025. [pdf]
- SNAP: Efficient Extraction of Private Properties with Poisoning. IEEE S&P 2023.
Stronger Property Inference Attack by poisoning the data[pdf] [code]
- SoK: Privacy-Preserving Data Synthesis. IEEE S&P 2024.
Privacy-Preserving Data Synthesis[pdf] [website]
-
ORL-AUDITOR: Dataset Auditing in Offline Deep Reinforcement Learning. NDSS 2024.
Dataset auditing in deep reinforcement learning[pdf] [code] -
SoK: Dataset Copyright Auditing in Machine Learning Systems. IEEE S&P 2025.
Dataset copyright[pdf]
-
Exploring Connections Between Active Learning and Model Extraction. USENIX Security 2020.
Active Learning[pdf] -
High Accuracy and High Fidelity Extraction of Neural Networks. USENIX Security 2020.
Fidelity[pdf] -
DRMI: A Dataset Reduction Technology based on Mutual Information for Black-box Attacks. USENIX Security 2021.
Query data selection method to reduce the number of queries (see the extraction sketch below)[pdf] -
Entangled Watermarks as a Defense against Model Extraction. USENIX Security 2021.
Backdoor as watermark against model extraction[pdf] -
CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples. NDSS 2020.
Adversarial Example to strengthen model stealing[pdf] -
Teacher Model Fingerprinting Attacks Against Transfer Learning. USENIX Security 2022.
Teacher model fingerprinting[pdf] -
StolenEncoder: Stealing Pre-trained Encoders in Self-supervised Learning. ACM CCS 2022.
Model Stealing attack in encoder[pdf] -
D-DAE: Defense-Penetrating Model Extraction Attacks. IEEE S&P 2023.
Meta-classifier to identify the deployed defense and a generative model to reduce the noise[pdf] -
SoK: Neural Network Extraction Through Physical Side Channels. USENIX Security 2024.
Physical Side Channel-based model extraction[pdf] -
SoK: All You Need to Know About On-Device ML Model Extraction - The Gap Between Research and Practice. USENIX Security 2024.
On-device model extraction[pdf]
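Most extraction attacks above share a distillation core: query the victim, then fit a local surrogate to its outputs. A minimal sketch with illustrative names; the listed papers add query selection (DRMI), adversarial queries (CloudLeak), or defense penetration (D-DAE) on top:

```python
import torch
import torch.nn.functional as F

def extract_model(victim, surrogate, queries, epochs=10, lr=1e-3):
    # Model extraction (sketch): collect the victim's soft labels once,
    # then train the surrogate to match them (knowledge distillation).
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    with torch.no_grad():
        soft_labels = victim(queries).softmax(dim=-1)
    for _ in range(epochs):
        opt.zero_grad()
        log_probs = surrogate(queries).log_softmax(dim=-1)
        loss = F.kl_div(log_probs, soft_labels, reduction="batchmean")
        loss.backward()
        opt.step()
    return surrogate
```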
-
Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding. IEEE S&P 2021.
Encode secret message into LM[pdf] -
Rethinking White-Box Watermarks on Deep Learning Models under Neural Structural Obfuscation. USENIX Security 2023.
Inject dummy neurons into the model to break the white-box model watermark[pdf] -
MEA-Defender: A Robust Watermark against Model Extraction Attack. IEEE S&P 2024.
Backdoor as watermark (see the verification sketch below)[pdf] [code] -
SSL-WM: A Black-Box Watermarking Approach for Encoders Pre-trained by Self-Supervised Learning. NDSS 2024.
Watermark on self-supervised learning[pdf] [code] -
Watermarking Language Models for Many Adaptive Users. IEEE S&P 2025.
Watermark on LLM[pdf] [code] -
SoK: Watermarking for AI-Generated Content. IEEE S&P 2025. [pdf]
-
Provably Robust Multi-bit Watermarking for AI-generated Text. USENIX Security 2025. [pdf] [code]
-
AUDIO WATERMARK: Dynamic and Harmless Watermark for Black-box Voice Dataset Copyright Protection. USENIX Security 2025. [pdf]
-
AudioMarkNet: Audio Watermarking for Deepfake Speech Detection. USENIX Security 2025. [pdf]
-
Towards Understanding and Enhancing Security of Proof-of-Training for DNN Model Ownership Verification. USENIX Security 2025. [pdf]
-
LightShed: Defeating Perturbation-based Image Copyright Protections. USENIX Security 2025. [pdf]
-
A Crack in the Bark: Leveraging Public Knowledge to Remove Tree-Ring Watermarks. USENIX Security 2025. [pdf]
-
Proof-of-Learning: Definitions and Practice. IEEE S&P 2021.
Prove the ownership of model parameters[pdf] -
SoK: How Robust is Image Classification Deep Neural Network Watermarking?. IEEE S&P 2022.
Survey of DNN watermarking[pdf] -
Copy, Right? A Testing Framework for Copyright Protection of Deep Learning Models. IEEE S&P 2022.
Calculate model similarity by generating test examples[pdf] [code] -
SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders. ACM CCS 2022.
Watermarking in encoder[pdf] -
RAI2: Responsible Identity Audit Governing the Artificial Intelligence. NDSS 2023.
Model and Data auditing in AI[pdf] [code] -
ActiveDaemon: Unconscious DNN Dormancy and Waking Up via User-specific Invisible Token. NDSS 2024.
Protecting DNN models by specific user tokens[pdf] [code] -
THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models. USENIX Security 2025. [pdf]
- PublicCheck: Public Integrity Verification for Services of Run-time Deep Models. IEEE S&P 2023.
Model verification via crafted query[pdf]
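Several ownership schemes above (backdoor-based watermarks, proof-of-learning, crafted-query verification) reduce at verification time to the same statistical test: does a suspect model answer a secret verification set the way only the protected model should? A minimal sketch of such a trigger-set check follows; the threshold is an illustrative assumption, not a value from any of these papers.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def verify_ownership(suspect: nn.Module, triggers: torch.Tensor,
                     trigger_labels: torch.Tensor,
                     threshold: float = 0.8) -> bool:
    """Flag the suspect model as derived from ours if it reproduces the
    secret trigger labels far above chance. The 0.8 threshold is an
    illustrative assumption; real schemes calibrate it statistically."""
    preds = suspect(triggers).argmax(dim=1)
    agreement = (preds == trigger_labels).float().mean().item()
    return agreement >= threshold
```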
-
Prompt Inversion Attack against Collaborative Inference of Large Language Models. IEEE S&P 2025.
Prompt inversion attack[pdf] -
On the Effectiveness of Prompt Stealing Attacks on In-the-Wild Prompts. IEEE S&P 2025.
Prompt stealing attack[pdf] -
PRSA: Prompt Stealing Attacks against Real-World Prompt Services. USENIX Security 2025.
Prompt stealing attack[pdf] -
Cross-Modal Prompt Inversion: Unifying Threats to Text and Image Generative AI Models. USENIX Security 2025.
Prompt inversion attack[pdf] -
Prompt Obfuscation for Large Language Models. USENIX Security 2025.
Prompt defense[pdf] -
Prompt Inference Attack on Distributed Large Language Model Inference Frameworks. ACM CCS 2025. [pdf]
-
Codebreaker: Dynamic Extraction Attacks on Code Language Models. IEEE S&P 2025.
Personal information extraction in code LLMs[pdf] -
LLMmap: Fingerprinting for Large Language Models. USENIX Security 2025.
LLM fingerprinting[pdf] -
Unlocking the Power of Differentially Private Zeroth-order Optimization for Fine-tuning LLMs. USENIX Security 2025.
Differentially private zeroth-order fine-tuning for LLMs[pdf] -
Depth Gives a False Sense of Privacy: LLM Internal States Inversion. USENIX Security 2025. [pdf]
-
Evaluating LLM-based Personal Information Extraction and Countermeasures. USENIX Security 2025. [pdf]
-
PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty. USENIX Security 2025. [pdf]
-
Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications. USENIX Security 2025. [pdf]
-
Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems. USENIX Security 2025. [pdf]
-
Effective PII Extraction from LLMs through Augmented Few-Shot Learning. USENIX Security 2025. [pdf]
-
Private Investigator: Extracting Personally Identifiable Information from Large Language Models Using Optimized Prompts. USENIX Security 2025. [pdf]
-
Fawkes: Protecting Privacy against Unauthorized Deep Learning Models. USENIX Security 2020.
Protect Face Privacy[pdf] [code] -
Automatically Detecting Bystanders in Photos to Reduce Privacy Risks. IEEE S&P 2020.
Detecting bystanders[pdf] -
Characterizing and Detecting Non-Consensual Photo Sharing on Social Networks. IEEE S&P 2020.
Detecting non-consensual photo sharing[pdf] -
Fairness Properties of Face Recognition and Obfuscation Systems. USENIX Security 2023.
Fairness in face recognition and obfuscation systems[pdf] [code]
-
SWIFT: Super-fast and Robust Privacy-Preserving Machine Learning. USENIX Security 2021. [pdf]
-
BLAZE: Blazing Fast Privacy-Preserving Machine Learning. NDSS 2020. [pdf]
-
Bicoptor: Two-round Secure Three-party Non-linear Computation without Preprocessing for Privacy-preserving Machine Learning. IEEE S&P 2023. [pdf]
-
Ents: An Efficient Three-party Training Framework for Decision Trees by Communication Optimization. ACM CCS 2024. [pdf]
- Trident: Efficient 4PC Framework for Privacy Preserving Machine Learning. NDSS 2020. [pdf]
-
Cerebro: A Platform for Multi-Party Cryptographic Collaborative Learning. USENIX Security 2021. [pdf] [code]
-
Private, Efficient, and Accurate: Protecting Models Trained by Multi-party Learning with Differential Privacy. IEEE S&P 2023. [pdf]
-
MPCDiff: Testing and Repairing MPC-Hardened Deep Learning Models. NDSS 2023. [pdf] [code]
-
Pencil: Private and Extensible Collaborative Learning without the Non-Colluding Assumption. NDSS 2024. [pdf] [code]
-
Securely Training Decision Trees Efficiently. ACM CCS 2024. [pdf]
-
CoGNN: Towards Secure and Efficient Collaborative Graph Learning. ACM CCS 2024. [pdf]
-
SoK: Cryptographic Neural-Network Computation. IEEE S&P 2023. [pdf]
-
From Individual Computation to Allied Optimization: Remodeling Privacy-Preserving Neural Inference with Function Input Tuning. IEEE S&P 2024. [pdf]
-
BOLT: Privacy-Preserving, Accurate and Efficient Inference for Transformers. IEEE S&P 2024. [pdf] [code]
-
Flamingo: Multi-Round Single-Server Secure Aggregation with Applications to Private Federated Learning. IEEE S&P 2023. [pdf] [code]
-
ELSA: Secure Aggregation for Federated Learning with Malicious Actors. IEEE S&P 2023. [pdf] [code]
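The secure-aggregation entries above (Flamingo, ELSA) build on the classic mask-cancelling idea: each client pair shares a seed, one adds the derived mask and the other subtracts it, so the server's sum reveals only the aggregate. The sketch below is a semi-honest, no-dropout toy version under those assumptions; real protocols add key agreement, dropout recovery, and malicious security.

```python
import random

MOD = 2**32  # work in a finite group so masks wrap around

def masked_update(client_id, update, peers, shared_seeds):
    """Mask an integer-encoded update with pairwise PRG masks.
    `shared_seeds` maps frozenset({i, j}) -> seed agreed by that pair."""
    masked = [x % MOD for x in update]
    for peer in peers:
        if peer == client_id:
            continue
        rng = random.Random(shared_seeds[frozenset((client_id, peer))])
        sign = 1 if client_id < peer else -1  # one adds, the other subtracts
        masked = [(m + sign * rng.randrange(MOD)) % MOD for m in masked]
    return masked

def aggregate(masked_updates):
    # Pairwise masks cancel: the sum equals the sum of true updates mod MOD.
    dim = len(masked_updates[0])
    return [sum(u[i] for u in masked_updates) % MOD for i in range(dim)]
```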
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models. USENIX Security 2022.
Membership inference attack. Model inversion. Attribute inference. Model stealing[pdf]
- SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning. IEEE S&P 2023.
Systematizing privacy risks using a game framework[pdf]
- Federated Boosted Decision Trees with Differential Privacy. ACM CCS 2022.
Federated learning with tree models under DP[pdf]
-
Spectral-DP: Differentially Private Deep Learning through Spectral Perturbation and Filtering. IEEE S&P 2023.
Spectral DP[pdf] -
Bounded and Unbiased Composite Differential Privacy. IEEE S&P 2024.
Composite DP[pdf] [code] -
Cohere: Managing Differential Privacy in Large Scale Systems. IEEE S&P 2024.
Unified DP in large system[pdf] [code] -
You Can Use But Cannot Recognize: Preserving Visual Privacy in Deep Neural Networks. NDSS 2024.
DP in image recognition[pdf] [code]
-
Locally Differentially Private Frequency Estimation Based on Convolution Framework. IEEE S&P 2023. [pdf]
-
Data Poisoning Attacks to Locally Differentially Private Frequent Itemset Mining Protocols. ACM CCS 2024. [pdf]
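The LDP entries above build on local randomizers such as randomized response, where each user perturbs their own value before reporting and the aggregator corrects for the known noise. Below is a minimal sketch for a single binary attribute (basic randomized response only, not the convolution framework or the itemset protocols above).

```python
import math
import random

def randomize(bit: int, epsilon: float) -> int:
    """Report truthfully with probability e^eps / (e^eps + 1); this
    randomizer satisfies epsilon-LDP for a single binary attribute."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truth else 1 - bit

def estimate_frequency(reports: list, epsilon: float) -> float:
    """Unbiased server-side estimate of the true fraction of 1-bits."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    # E[observed] = p*f + (1-p)*(1-f)  =>  f = (observed - (1-p)) / (2p - 1)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```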
- PLeak: Prompt Leaking Attacks against Large Language Model Applications. ACM CCS 2024.
Stealing system prompts[pdf] [code]
This list is mainly maintained by Ping He from NESA Lab.
Contributions to this repository are very welcome!
Markdown format:
**Paper Name**. Conference Year. `Keywords` [[pdf](pdf_link)] [[code](code_link)]
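For example (the links below are placeholders, not a real entry):
**Example Paper Title**. USENIX Security 2025. `Model Extraction` [[pdf](https://example.org/paper.pdf)] [[code](https://github.com/example/repo)]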
To the extent possible under law, gnipping holds all copyright and related or neighboring rights to this repository.
