This project creates a new dataset from CSV files generated by LycoSTand.
LycoSTand is a tool which forms network flows and extracts features characterising them to understand network traffic.
Lycos is the Greek word for wolf ("flow" spelled in reverse).
These characteristics can be used for network intrusion detection systems.
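To illustrate what forming flows and extracting features can look like, here is a minimal sketch in Python. It is not LycoSTand's actual algorithm: the packet tuple layout, the function names (`flow_key`, `build_flows`, `flow_features`) and the three features are assumptions chosen for illustration, and the sketch ignores flow timeouts and TCP connection termination, which a real extraction tool has to handle.

```python
from collections import defaultdict


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Direction-independent key: both directions of a conversation share one flow."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)


def build_flows(packets):
    """Group packets into flows.

    Each packet is a hypothetical tuple:
    (timestamp, src_ip, src_port, dst_ip, dst_port, proto, length).
    """
    flows = defaultdict(list)
    for ts, src_ip, src_port, dst_ip, dst_port, proto, length in packets:
        key = flow_key(src_ip, src_port, dst_ip, dst_port, proto)
        flows[key].append((ts, length))
    return flows


def flow_features(pkts):
    """Compute a few simple flow-based features from (timestamp, length) pairs."""
    times = [ts for ts, _ in pkts]
    sizes = [n for _, n in pkts]
    return {
        "packet_count": len(pkts),
        "total_bytes": sum(sizes),
        "duration": max(times) - min(times),
    }


# Made-up packets: one TCP conversation (both directions) and one UDP packet.
packets = [
    (0.00, "10.0.0.1", 51000, "10.0.0.2", 80, "TCP", 60),
    (0.05, "10.0.0.2", 80, "10.0.0.1", 51000, "TCP", 60),
    (0.20, "10.0.0.1", 51000, "10.0.0.2", 80, "TCP", 500),
    (1.00, "10.0.0.3", 40000, "10.0.0.2", 53, "UDP", 80),
]
features = {key: flow_features(pkts) for key, pkts in build_flows(packets).items()}
```

Sorting the two endpoints inside the key is what makes the flow bidirectional: the request and the reply of the TCP conversation end up in the same record, so the example yields two flows rather than three.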
As connected objects become the standard for quality of life, network intrusion detection is becoming more critical than ever. Over the past decades, various datasets have been developed to address this security challenge. Analysis of earlier datasets, such as KDD-Cup99 and NSL-KDD, has highlighted some of their problems, paving the way for newer datasets to correct them. CIC-IDS2017, one of the newest network intrusion detection datasets, has become a popular choice. It contains a record of real network traffic in two forms: the raw data in PCAP format, and the processed data with flow-based features in CSV format, generated by the CICFlowMeter feature extraction tool.
In this work, a detailed analysis of this dataset is performed, and several problems discovered in the flows extracted from the network packets are reported. To overcome these problems, a new feature extraction tool named LycoSTand is suggested. In addition, a feature selection is proposed considering correlations and feature importance. The performance comparison between the original and the new dataset shows significant improvements for all evaluated machine learning algorithms.
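As a rough illustration of the correlation part of such a feature selection, the sketch below greedily drops any feature that is almost perfectly correlated with one already kept. It is an assumption-laden example, not the selection procedure used for LYCOS-IDS2017: the 0.95 threshold, the feature names and the sample values are invented, and feature importance is not covered.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.95):
    """Keep a feature only if |r| with every already kept feature is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Invented feature columns; "total_bits" is exactly 8 * "total_bytes".
columns = {
    "total_bytes": [100, 200, 300, 400, 500],
    "total_bits": [800, 1600, 2400, 3200, 4000],
    "duration": [0.5, 0.1, 0.9, 0.3, 0.2],
}
selected = drop_correlated(columns)
```

Here `total_bits` carries no information beyond `total_bytes` and is discarded, while `duration` is kept. The result depends on column order, since the first feature of a correlated group is the one retained.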
Based on the improvements in CIC-IDS2017, we also examine other datasets generated by CICFlowMeter, affected by the same issues, on which LycoSTand can be used to produce improved datasets for network intrusion detection.
LYCOS-IDS2017 is an improved version of CIC-IDS2017 and therefore uses the CIC-IDS2017 PCAP files, which can be downloaded from https://www.unb.ca/cic/datasets/ids-2017.html
LYCOS-IDS2017 is composed of labelled network flows with flow-based features generated by LycoSTand from the packet captures provided in PCAP format. The CSV files, LycoSTand and the Python source code are available to researchers, and contributions are welcome.
Should you use our dataset or source code, please cite our related publication: