Ambobo是從頭開始實施的基於梯度的符號執行引擎的概念證明。在最新進步的基礎上,利用梯度下降來求解SMT公式[1, 2] ,助理利用梯度下降來發現在符號執行過程中滿足每個路徑約束的輸入值。有關更多信息,請檢查我們的文檔。
與其他突出的符號執行工具相比,Gymbo的實現非常簡單,更緊湊。我們希望該項目將幫助個人掌握符號執行的基本原則和通過梯度下降解決問題的基本原則。
一種實際用途的是鍛煉諸如神經網絡之類的ML模型,以檢測出意外的行為。例如,您可以通過將神經網絡轉換為當務之急的程序來生成用鍛煉的對手示例。
Ambobo的另一個獨特功能是它可以跟踪概率符號變量。我們採用[3]中提出的PBranch算法,目前支持離散的統一,Bernoulli和二項式分佈。
請注意,Amphbo目前正在開發,可能有錯誤。您的反饋和貢獻總是非常感謝。
git clone https://github.com/Koukyosyumei/Gymbo.git
./script/build.sh在示例目錄中查看一些示例:
Ambobo目前通過以下BNF語法支持類似C的程序:
program = stmt*
stmt = expr ";"
| "{" stmt* "}"
| "if" "(" expr ")" stmt ("else" stmt)?
| "return" expr ";"
expr = assign
assign = logical ("=" assign)?
logical = equality ("&&" equality | "||" equality)*
equality = relational ("==" relational | "!=" relational)*
relational = add ("<" add | "<=" add | ">" add | ">=" add)*
add = mul ("+" mul | "-" mul)*
mul = unary ("*" unary | "/" unary)*
unary = ("+" | "-")? primary
primary = num | ident | "(" expr ")"
請注意,當目前忽略
/解決路徑約束時。
Amerbo將路徑約束轉換為數值損耗函數,僅當滿足路徑約束時,該損失函數才會變為負。體育館使用以下轉換規則:
!(a) => -a + eps
(a < b) => a - b + eps
(a <= b) => a - b
(a > b) => b - a + eps
(a >= b) => b - a
(a == b) => abs(a - b)
(a != b) => -abs(a - b) + eps
(a && b) => max(a, b)
(a || b) => min(a, b)
在這裡, eps是A和B類型的最小正值。
例如,當a和b是整數( eps = 1 )時, (a < 3) && (!(a < 3) || (b == 5))變為max(a - 2, min(3 - a, abs(b - 5))) 。
可選地,Ambo可以使用DPLL(SAT求解器)來決定每個唯一術語的分配,有時會導致更好的可擴展性。例如,將DPLL應用於上面的示例會導致(a < 3)為真, (b == 5)為true。然後,足體將該任務轉換為要解決的損失函數: max(a - 2, abs(b - 5)) 。
gymbo命令接受以下命令行選項:
-d :設置符號執行的最大深度(默認值:256)。-v :設置詳細級別(默認值:1)。使用-1來最小輸出。 -1: the number of satisfiable path constraints and unsatisfiable path constraints.
0: + the list of unique unsatisfiable path constraints.
1: + estimated concrete parameters for each path constraint.
2: + trace at each operation, including the content of the virtual stack and memory.
3: + complete stack machine.
-i :設置梯度下降的迭代次數(默認值:100)。-a :設置梯度下降的步長(默認值:1.0)。-e :設置可區分錶達式的EPS(默認:1.0)。-t :設置梯度下降的最大試驗數(默認值:3)-l :設置初始參數的最小值(默認值:-10)-h :設置初始參數的最大值(默認值:10)-s :設置隨機種子(默認:42)-p :(可選)如果設置,請使用DPLL確定每個項的分配。否則,求解從路徑約束直接轉換的損耗函數。 ./gymbo " if (a < 3) if (a > 4) return 1; " -v 0
> Compiling the input program...
> Start Symbolic Execution...
> ---------------------------
> Result Summary
> # Total Path Constraints: 4
> # SAT: 3
> # UNSAT: 1
> List of UNSAT Path Constraints
> # var_1 < 3 and 4 < var_1 and 1 
libgymbo :僅標題圖書館由於Amphbo由僅標題庫組成,因此您可以輕鬆地創建自己的符號執行工具。
# include " libgymbo/compiler.h "
# include " libgymbo/gd.h "
# include " libgymbo/parser.h "
# include " libgymbo/tokenizer.h "
# include " libgymbo/type.h "
char user_input[] = " if (a < 3) return 1; "
std::vector<gymbo::Node *> code;
gymbo::Prog prg;
gymbo::GDOptimizer optimizer (num_itrs, step_size);
gymbo::SymState init;
// tokenize the input source code
gymbo::Token *token = gymbo::tokenize(user_input);
// generate AST from the tokens
gymbo::generate_ast (token, user_input, code);
// construct virtual stack machine from AST
gymbo::compile_ast (code, prg);
// execute gradient-based symbolie execution
gymbo::SExecutor executor (optimizer, maxSAT, maxUNSAT, max_num_trials,
ignore_memory, use_dpll, verbose_level);
executor.run(prg, target_pcs, init, max_depth); pip install git+https://github.com/Koukyosyumei/Gymbo
pylibgymbo :python包裝器的libgymbo import pylibgymbo as plg
inp = "a = 1; if (a == 1) return 2;"
var_counter , prg = plg . gcompile ( inp )
optimizer = plg . GDOptimizer ( num_itrs , step_size , ...)
executor = plg . SExecutor ( optimizer , maxSAT , maxUNSAT , max_num_trials ,
ignore_memory , use_dpll , verbose_level )
executor . run ( prg , target_pcs , init , max_depth )pymlgymbo :調試機器學習模型我們在pymlgymbo庫中提供了一些輔助功能,以將著名的Python庫的ML模型(如Sklearn和Pytorch)轉換為Amberbo可以處理的程序。
如[4]中提出的那樣,以下代碼針對神經網絡生成了對抗性示例。
from sklearn . neural_network import MLPClassifier
import pylibgymbo as plg
import pymlgymbo as pmg
clf = MLPClassifier ( activation = "relu" )
clf . fit ( X_train , y_train )
mlp_code = pmg . dump_sklearn_MLP ( clf , feature_names )
adv_condition = (
"("
+ " || " . join (
[ f"(y_ { c } > y_ { y_pred } )" for c in range ( len ( clf . classes_ )) if y_pred != c ]
)
+ ")"
)
optimizer = plg . GDOptimizer ( num_itrs , step_size , ...)
var_counter , prg = plg . gcompile ( mlp_code )
executor = plg . SExecutor ( optimizer , maxSAT , maxUNSAT , max_num_trials ,
ignore_memory , use_dpll , verbose_level )
executor . run ( prg , target_pcs , init , max_depth )Amphbo完全在C ++中實現,僅需要標準庫。從源代碼到堆棧計算機編譯的過程基於rui314/chibicc [5]的實現,而符號執行方法則受beala/symbolic [6]的啟發。
- [1] Chen, Peng, Jianzhong Liu, and Hao Chen. "Matryoshka: fuzzing deeply nested branches." Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 2019.
- [2] Minghao Liu, Kunhang Lv, Pei Huang, Rui Han, Fuqi Jia, Yu Zhang, Feifei Ma, Jian Zhang. "NRAgo: Solving SMT(NRA) Formulas with Gradient-based Optimization." IEEE/ACM International Conference on Automated Software Engineering, 2023
- [3] Susag, Zachary, et al. "Symbolic execution for randomized programs." Proceedings of the ACM on Programming Languages 6.OOPSLA2 (2022): 1583-1612.
- [4] Gopinath, Divya, et al. "Symbolic execution for importance analysis and adversarial generation in neural networks." 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2019.
- [5] https://github.com/rui314/chibicc
- [6] https://github.com/beala/symbolic