MyselfYangjz/binary-similarity-learning
binary-similarity-learning

Study notes on binary code similarity analysis.

[paper]: paper publication page; [note]: reading notes; [github]: GitHub source code; [dataset]: dataset; [model]: trained model

A * before a method name indicates that the method uses dynamic analysis.

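Basic concepts

These are collected in the basic concepts document. Wherever the reading notes touch on one of these concepts, a hyperlink to its explanation has been added.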

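Survey (review)

  • A Survey of Binary Code Similarity (WOS-Q1; CAS Zone 1; 2021) [paper] [note]
    • HAQ I U, CABALLERO J. A Survey of Binary Code Similarity [J]. ACM Comput Surv, 2021, 54(3): Article 51.
    • Classifies and summarizes the methods commonly used in the field; a good starting point for newcomers
    • Covers only literature published in 2019 or earlier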

Binary Diffing

  • Bindiff (DIMVA2004) [paper]
    • FLAKE H. Structural comparison of executable objects[C]//Proc. of the International GI Workshop on Detection of Intrusions and Malware & Vulnerability Assessment, number P-46 in Lecture Notes in Informatics.2004:161-174.
  • Graph-based comparison of executable objects (SSTIC2005) [paper] [note]
    • DULLIEN T, ROLLES R. Graph-based comparison of executable objects (english version) [J]. Sstic, 2005, 5(1): 3.
  • BinHunt (CCF-C;ICICS2008) [paper] [note]
    • GAO D, REITER M K, SONG D. BinHunt: Automatically Finding Semantic Differences in Binary Programs[C]//International Conference on Information and Communications Security. Berlin, Heidelberg:Springer Berlin Heidelberg,2008:238-255.

Binary Similarity (one-to-one)

  • *BLEX (CCF-A; USENIX2014) [paper] [note]
    • EGELE M, WOO M, CHAPMAN P, et al. Blanket execution: Dynamic similarity testing for program binaries and components[C]//23rd USENIX Security Symposium (USENIX Security 14).2014:303-317.

Binary Search (one-to-many)

  • TEDEM (CCF-B; ACSAC2014) [paper] [note]

    • PEWNY J, SCHUSTER F, BERNHARD L, et al. Leveraging semantic signatures for bug search in binary programs[C]//Proceedings of the 30th Annual Computer Security Applications Conference.2014:406-415.
  • Tracy (CCF-A;PLDI2014) [paper] [github] [note]

    • DAVID Y, YAHAV E. Tracelet-based code search in executables[C]//Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation. Edinburgh, United Kingdom:Association for Computing Machinery,2014:349–360. 10.1145/2594291.2594343.
  • Multi-MH (CCF-A;S&P2015) [paper] [note]

    • PEWNY J, GARMANY B, GAWLIK R, et al. Cross-Architecture Bug Search in Binary Executables[C]//2015 IEEE Symposium on Security and Privacy.2015:709-724. 10.1109/SP.2015.49.
  • BinGo (CCF-A;FSE2016) [paper] [note]

    • CHANDRAMOHAN M, XUE Y, XU Z, et al. BinGo: Cross-Architecture Cross-OS Binary Search[C]//Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering.2016:678-689.
  • discovRE (CCF-A;NDSS2016) [paper] [note]

    • ESCHWEILER S, YAKDAN K, GERHARDS-PADILLA E. discovRE: Efficient Cross-Architecture Identification of Bugs in Binary Code[C]//NDSS.2016
  • Esh (CCF-A;PLDI2016) [paper] [github] [note]

    • DAVID Y, PARTUSH N, YAHAV E. Statistical similarity of binaries[C]//Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation. Santa Barbara, CA, USA:Association for Computing Machinery,2016:266–280. 10.1145/2908080.2908126.
  • Genius (CCF-A;CCS2016) [paper] [github] [note]

    • FENG Q, ZHOU R, XU C, et al. Scalable Graph-based Bug Search for Firmware Images[C]//Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. Vienna, Austria:Association for Computing Machinery,2016:480–491. 10.1145/2976749.2978370.
  • Gemini (CCF-A;CCS2017) [paper] [github] [note]

    • XU X, LIU C, FENG Q, et al. Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection[C]//Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. Dallas, Texas, USA:Association for Computing Machinery,2017:363–376. 10.1145/3133956.3134018.
  • SAFE (CCF-C;DIMVA2019) [paper] [github] [note]

    • MASSARELLI L, LUNA G A D, PETRONI F, et al. SAFE: Self-Attentive Function Embeddings for Binary Similarity[C]//International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment.Springer,2019:309-329.
  • InnerEye (CCF-A;NDSS2019) [paper] [model] [note]

    • ZUO F, LI X, YOUNG P, et al. Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs[C]//Network and Distributed Systems Security (NDSS) Symposium 2019.2019
  • Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection (CCF-A;AAAI2020) [paper] [note]

    • YU Z, CAO R, TANG Q, et al. Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:1145-1152. 10.1609/aaai.v34i01.5466.
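
In the one-to-many setting, a query function is compared against a corpus and the candidates are returned ranked by similarity. The following is a minimal sketch of that ranking step only, assuming every function has already been mapped to a fixed-length vector by some model. The function names, the corpus entries, and the toy vectors are hypothetical and are not taken from any of the papers listed above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query, corpus):
    """Return corpus names ordered from most to least similar to the query."""
    scored = [(cosine(query, vec), name) for name, vec in corpus.items()]
    # Sort by descending similarity; break ties by name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


# Hypothetical embeddings of three functions (illustrative values only).
corpus = {
    "memcpy_arm": [1.0, 0.0, 1.0],
    "strlen_x86": [0.0, 1.0, 0.0],
    "memcpy_mips": [1.0, 0.1, 0.9],
}
ranking = rank_candidates([1.0, 0.0, 1.0], corpus)
```

A one-to-one comparison would call `cosine` on a single pair; the one-to-many case differs only in that the scores are sorted to produce a ranked list.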

Plagiarism Detection

  • CoP (CCF-A;FSE2014) [paper] [note]
    • LUO L, MING J, WU D, et al. Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection[C]//Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering.2014:389-400.

Clone Search

  • Kam1n0 (CCF-A; KDD2016) [paper] [github]
    • DING S H H, FUNG B C M, CHARLAND P. Kam1n0: MapReduce-based Assembly Clone Search for Reverse Engineering[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2016:461-470.
    • Focuses on the design of a new hashing algorithm and a MapReduce-based scheme
  • Asm2Vec (CCF-A;S&P2019) [paper] [note]
    • DING S H H, FUNG B C M, CHARLAND P. Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization[C]//2019 IEEE Symposium on Security and Privacy (SP).2019:472-489. 10.1109/SP.2019.00003.
  • *BinGo-E (WOS-Q1; CAS Zone 1; 2019) [paper] [note]
    • XUE Y, XU Z, CHANDRAMOHAN M, et al. Accurate and Scalable Cross-Architecture Cross-OS Binary Code Search with Emulation [J]. IEEE Transactions on Software Engineering, 2019, 45: 1125-1149.

Measurement Study

  • BinKit (WOS-Q1; CAS Zone 1; 2022) [paper] [dataset] [github]

    • KIM D, KIM E, CHA S K, et al. Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned [J]. IEEE Transactions on Software Engineering, 2022: 1-23.
    • Analyzes the role that non-semantic features (syntactic and structural features) play in binary similarity analysis
  • How machine learning is solving the binary function similarity problem (CCF-A; USENIX2022) [paper] [github] [note]

    • MARCELLI A, GRAZIANO M, UGARTE-PEDRERO X, et al. How machine learning is solving the binary function similarity problem[C]//31st USENIX Security Symposium (USENIX Security 22).2022:2099-2116.
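    • Builds an open-source dataset and evaluates existing methods on a common benchmark
    • Useful as a reference when reading the related papers; read the sections that cover the method in question closely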

Dataset

  • Esh Dataset [dataset]
    • Contains 3015 binary functions covering 8 real-world vulnerabilities

Terms and abbreviations

| Abbreviation | Full name | Chinese term |
| --- | --- | --- |
| ACFG | Attributed Control Flow Graph | 属性控制流图 |
| ALSH | Adaptive Locality Sensitive Hashing | 自适应局部敏感哈希 |
| ASLR | Address Space Layout Randomization | 地址空间布局随机化 |
| BB | Basic Block | 基本块 |
| CDF | Cumulative Distribution Function | 累积分布函数 |
| CFG | Control Flow Graph | 控制流图 |
| CG | Call Graph | 函数调用图 |
| GI | Graph Isomorphism | 图同构 |
| IR | Intermediate Representation | 中间表示 |
| IVL | Intermediate Verification Language | 中间验证语言 |
| LCS | Longest Common Subsequence | 最长公共子序列 |
| LSH | Locality Sensitive Hashing | 局部敏感哈希 |
| MCS | Maximum Common Subgraph | 最大公共子图 |
| MLP | Multilayer Perceptron | 多层感知机 |
| MRR | Mean Reciprocal Rank | 平均倒数排名 |
| PDG | Program Dependence Graph | 程序依赖图 |
| TED | Tree Edit Distance | 树编辑距离 |
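
Of the terms above, MRR is the one most easily pinned down with a small calculation: for each query, take the reciprocal of the rank at which the correct match appears, then average over all queries. The sketch below assumes one correct answer per query and scores a query as 0 when the answer is not retrieved. The function name and the toy data are hypothetical.

```python
def mean_reciprocal_rank(rankings, targets):
    """Average of 1/rank of the correct item over all queries.

    rankings: list of ranked candidate lists, one per query.
    targets:  the single correct item for each query.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, target in zip(rankings, targets):
        if target in ranked:
            # Ranks are 1-based, list indices are 0-based.
            total += 1.0 / (ranked.index(target) + 1)
        # A query whose target is missing contributes 0.
    return total / len(rankings)


# Three queries whose correct answer "a" appears at ranks 1, 2 and 3.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "a"]
mrr = mean_reciprocal_rank(rankings, targets)  # (1 + 1/2 + 1/3) / 3
```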

to-do list
