Paper: Low-Quality Training Data Only? A Robust Framework for Detecting Encrypted Malicious Network Traffic
cd ./Preprocess
python Feature_Extract.py "input_dir" "sequence_data_path" "ext"
python get_origin_flow_data.py "sequence_data_path" "save_dir" "data_type"
input_dir: The directory containing all raw pcap files of one kind of data, e.g. benign, malicious, or test.
sequence_data_path: The path of the sequence data of all flows in the pcap files, without zero-padding. The values are prefix-cumulative and are further processed by get_origin_flow_data.py.
ext: The extension of the pcap files to process (e.g. pcap, pcapng).
save_dir: The directory where the processed sequence data of all flows is saved.
data_type: The kind of processed data, e.g. w, b, and test.
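The exact format written by Feature_Extract.py is not specified here. As a minimal sketch, assuming "prefix-cumulative" means each entry is a running sum of per-packet values, the per-packet values can be recovered by differencing, and each flow can then be zero-padded or truncated to the fixed length of 50. The helper names cumulative_to_per_step and pad_or_truncate are hypothetical and are not taken from get_origin_flow_data.py.

```python
import numpy as np


def cumulative_to_per_step(cum_seq):
    """Recover per-step values from a prefix-cumulative sequence.

    Assumption (hypothetical, not the code in get_origin_flow_data.py):
    cum_seq[i] = x[0] + x[1] + ... + x[i].
    """
    cum = np.asarray(cum_seq)
    # prepend=0 keeps the first element unchanged: x[0] = cum[0] - 0
    return np.diff(cum, prepend=0)


def pad_or_truncate(seq, length=50):
    """Zero-pad (or truncate) a 1-D sequence to a fixed length."""
    out = np.zeros(length, dtype=float)
    n = min(len(seq), length)
    out[:n] = seq[:n]
    return out


# Example: a running sum [3, 5, 10] becomes per-step values [3, 2, 5],
# which are then zero-padded to 50 entries.
```

Differencing is the natural inverse of a running sum; padding to a fixed width is what makes the per-flow sequences stackable into a single (n, 50) array.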
Output: A NumPy sequence file for data_type in save_dir, i.e., {save_dir}/{data_type}.npy. Each sample has 50 dimensions; a 51st dimension (the label) must be appended before detection.
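Appending the 51st dimension amounts to stacking a label column onto the (n, 50) array. The sketch below assumes every sample in one file shares the same label (e.g. a file of benign flows gets 0, a file of malicious flows gets 1); the function name add_label_column is hypothetical.

```python
import numpy as np


def add_label_column(features, label):
    """Append a constant label column to an (n, 50) array, giving (n, 51)."""
    features = np.asarray(features)
    if features.ndim != 2 or features.shape[1] != 50:
        raise ValueError(f"expected shape (n, 50), got {features.shape}")
    # One label per row, same dtype as the features so hstack keeps it.
    labels = np.full((features.shape[0], 1), label, dtype=features.dtype)
    return np.hstack([features, labels])


# Example (hypothetical paths):
#   arr = add_label_column(np.load(f"{save_dir}/{data_type}.npy"), 1)
#   np.save(f"{data_dir}/malicious.npy", arr)
```

For test data whose labels are unknown, any constant (e.g. 0) can be used, since the last dimension is ignored when RAPIER is only used for prediction.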
cd ./main
python main.py
The arguments that can be modified in main.py are:
data_dir: The directory of all sequence data.
feat_dir: The directory of all feature data.
made_dir: The directory of all results calculated by MADE.
model_dir: The directory of all trained models.
result_dir: The directory of the detection (prediction) results for the test data.
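If these directories do not already exist, a small helper like the following can create them before running main.py. This is a sketch under assumptions: the base path and subdirectory names are hypothetical placeholders, and it is not confirmed here whether main.py creates missing directories on its own.

```python
from pathlib import Path


def ensure_dirs(base, names=("data", "feat", "made", "model", "result")):
    """Create one subdirectory per main.py directory argument under base.

    The names are hypothetical; use whatever values data_dir, feat_dir,
    made_dir, model_dir and result_dir actually hold in main.py.
    """
    paths = {}
    for name in names:
        p = Path(base) / name
        # parents=True creates base if needed; exist_ok=True makes reruns safe.
        p.mkdir(parents=True, exist_ok=True)
        paths[f"{name}_dir"] = p
    return paths
```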
Required input files:
{data_dir}/benign.npy: The preprocessed benign training data.
{data_dir}/malicious.npy: The preprocessed malicious training data.
{data_dir}/test.npy: The preprocessed test data.
All arrays in {data_dir} must have shape (n, 51), where n is the number of samples. Each sample is a 51-dimensional vector: the first 50 dimensions are the traffic time series, and the last one is the sample's ground-truth label (0 for benign, 1 for malicious), used for evaluation. If RAPIER is used only for prediction, the last dimension can take any value.
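Before running main.py, it can help to sanity-check each input file against this layout. The sketch below (check_rapier_input is a hypothetical name) verifies the (n, 51) shape, splits off the label column, and optionally checks that labels are 0 or 1; pass require_labels=False for test data whose labels are placeholders.

```python
import numpy as np


def check_rapier_input(arr, require_labels=True):
    """Validate an (n, 51) array and split it into features and labels."""
    arr = np.asarray(arr)
    if arr.ndim != 2 or arr.shape[1] != 51:
        raise ValueError(f"expected shape (n, 51), got {arr.shape}")
    features, labels = arr[:, :50], arr[:, 50]
    # Labels only matter for evaluation; in pure prediction they may be anything.
    if require_labels and not np.isin(labels, (0, 1)).all():
        raise ValueError("last column must be 0 (benign) or 1 (malicious)")
    return features, labels


# Example (hypothetical path):
#   features, labels = check_rapier_input(np.load(f"{data_dir}/benign.npy"))
```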
Output:
{result_dir}/prediction.npy: The predictions for all test data; 1 is malicious and 0 is benign.
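When test.npy carries true labels, the predictions can be scored against them. This sketch assumes prediction.npy holds one 0/1 value per row of test.npy, in the same order; the function evaluate is hypothetical and treats malicious (1) as the positive class.

```python
import numpy as np


def evaluate(pred, labels):
    """Accuracy, precision, recall and F1 with 1 (malicious) as positive."""
    pred = np.asarray(pred).astype(int).ravel()
    labels = np.asarray(labels).astype(int).ravel()
    tp = int(np.sum((pred == 1) & (labels == 1)))
    fp = int(np.sum((pred == 1) & (labels == 0)))
    fn = int(np.sum((pred == 0) & (labels == 1)))
    acc = float(np.mean(pred == labels))
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": acc, "precision": precision, "recall": recall, "f1": f1}


# Example (hypothetical paths):
#   pred = np.load(f"{result_dir}/prediction.npy")
#   labels = np.load(f"{data_dir}/test.npy")[:, -1]
#   print(evaluate(pred, labels))
```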