Le Mans University


LYCOS-IDS2017
 
  • Publicly available for researchers.
  • Language: Python


LycoSTand
 
  • Publicly available for researchers.
  • Language: C

LYCOS-IDS2017

This project creates a new dataset from CSV files generated by LycoSTand.

LycoSTand

LycoSTand is a tool that builds network flows from captured packets and extracts features characterising them, in order to understand network traffic. Lycos is the Greek word for wolf ("flow" spelled backwards).
These features can be used by network intrusion detection systems.
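LycoSTand itself is written in C. The Python sketch below only illustrates the general idea of flow formation. Packets are grouped into bidirectional flows keyed by their 5-tuple (IP addresses, ports, protocol). The direction of the first packet counts as "forward", and a flow is closed after an idle timeout. Several details are assumptions for illustration and do not describe LycoSTand's actual rules or feature set: the packet tuple layout, the 120 s timeout, and the small set of features (duration, packet and byte counts per direction).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical packet record:
# (timestamp_s, src_ip, src_port, dst_ip, dst_port, protocol, length_bytes)
Packet = Tuple[float, str, int, str, int, int, int]


@dataclass
class Flow:
    initiator: Tuple[str, int]  # endpoint that sent the first packet ("forward" direction)
    first_ts: float
    last_ts: float
    fwd_packets: int = 0
    bwd_packets: int = 0
    fwd_bytes: int = 0
    bwd_bytes: int = 0

    @property
    def duration(self) -> float:
        return self.last_ts - self.first_ts


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: int) -> tuple:
    """Direction-independent key: both directions of a conversation map to the same flow."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)


def build_flows(packets: List[Packet], timeout_s: float = 120.0) -> List[Flow]:
    """Aggregate packets into bidirectional flows, splitting a flow after an idle gap > timeout_s."""
    active: Dict[tuple, Flow] = {}
    done: List[Flow] = []
    for ts, sip, sport, dip, dport, proto, length in sorted(packets):  # process in time order
        key = flow_key(sip, sport, dip, dport, proto)
        flow = active.get(key)
        if flow is not None and ts - flow.last_ts > timeout_s:
            done.append(flow)  # idle too long: close it and start a new flow
            flow = None
        if flow is None:
            flow = Flow(initiator=(sip, sport), first_ts=ts, last_ts=ts)
            active[key] = flow
        flow.last_ts = ts
        if (sip, sport) == flow.initiator:
            flow.fwd_packets += 1
            flow.fwd_bytes += length
        else:
            flow.bwd_packets += 1
            flow.bwd_bytes += length
    done.extend(active.values())  # flush flows still open at the end of the capture
    return done


if __name__ == "__main__":
    demo = [
        (0.0, "10.0.0.1", 5000, "10.0.0.2", 80, 6, 60),    # protocol 6 = TCP
        (0.1, "10.0.0.2", 80, "10.0.0.1", 5000, 6, 1500),
        (0.2, "10.0.0.1", 5000, "10.0.0.2", 80, 6, 40),
    ]
    for f in build_flows(demo):
        print(f.initiator, f.duration, f.fwd_packets, f.bwd_packets, f.fwd_bytes, f.bwd_bytes)
```

In this sketch, the three demo packets form a single flow with two forward packets (100 bytes) and one backward packet (1500 bytes).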

Installation LYCOS-IDS2017

    Installation
  • All Python packages required to run this program are listed in requirements.txt and can be installed with pip or conda.
  • Retrieve the code from the Git repository
      git clone http://maupiti.univ-lemans.fr:2443/lycos/lycos-ids2017.git
      cd lycos-ids2017
  • Retrieve the CIC-IDS2017 CSV files from https://www.unb.ca/cic/datasets/ids-2017.html (download link at the bottom of the page)
      Put the CSV files in lycos-ids2017/cicids2017/csv_files/
      For convenience, they are provided in compressed format; extract them:
      cd cicids2017/csv_files/
      unzip '*.zip'
      cd ../..
  • Put the CSV files generated by LycoSTand in lycos-ids2017/pcap_lycos/
      For convenience, they are provided in compressed format; extract them:
      cd pcap_lycos
      unzip '*.zip'
      cd ..
    Execution
  • Label all flows in the CSV files generated by LycoSTand.
      python labelling.py
  • Create training/cross-validation/test sets.
      python create_dataset.py
  • Run the performance analysis.
      python analysis.py
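The steps above do not say how create_dataset.py divides the labelled flows into training, cross-validation and test sets. The sketch below shows one common approach, a stratified split. It keeps each label's proportion in every subset, so rare attack classes remain represented in all three. The function name, the 60/20/20 ratios and the fixed seed are assumptions for illustration, not necessarily what create_dataset.py does.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str],
    ratios: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # assumed train/cv/test ratios
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Return row indices for (train, cross-validation, test), split per label."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    by_label: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed gives a reproducible split
    train: List[int] = []
    cv: List[int] = []
    test: List[int] = []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_cv = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        cv += idxs[n_train:n_train + n_cv]
        test += idxs[n_train + n_cv:]  # remainder goes to the test set
    return sorted(train), sorted(cv), sorted(test)


if __name__ == "__main__":
    demo_labels = ["BENIGN"] * 100 + ["DoS"] * 10
    tr, va, te = stratified_split(demo_labels)
    print(len(tr), len(va), len(te))
```

With 100 benign and 10 attack rows, each subset receives the same 90/10 mix. Assigning the rounding remainder to the test set ensures every row lands in exactly one subset.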

Installation LycoSTand

    Installation
  • Install the libpcap development library (Debian/Ubuntu command shown)
      sudo apt-get install libpcap-dev
  • Retrieve the code from the Git repository
      git clone http://maupiti.univ-lemans.fr:2443/lycos/lycostand.git
      cd lycostand
  • Retrieve the CIC-IDS2017 PCAP files from https://www.unb.ca/cic/datasets/ids-2017.html (download link at the bottom of the page)
      Put the PCAP files in lycostand/pcap/
  • Compile the code (install gcc and make first if they are not already installed):
      make
    Execution
  • Launch LycoSTand
      ./lycostand -i ./pcap/ -o ./pcap_lycos/
  • IMPORTANT NOTE:
    LycoSTand processes all PCAP files located in the ./pcap/ folder, one after the other (multi-threading is not implemented in this version). This can take up to 16 hours on a laptop with a Core i7-8750H. To speed up processing on a multi-core machine, you can enable the compiler switch ARG_BYPASS in options.def and, in the main function, uncomment the single PCAP file to process. Once this modified program is compiled and launched, you can repeat the procedure for other PCAP files while the first ones are still running. The programs then execute in parallel, reducing the total time needed to generate the CSV files in the ./pcap_lycos directory.
    Outputs
  • The program generates 5 CSV files (one for each PCAP file) in ./pcap_lycos/
  • For convenience, we provide them in zip files.
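The IMPORTANT NOTE above describes running several separately built instances of LycoSTand at the same time. The small Python launcher below is a minimal sketch of that idea. It assumes you have already produced one executable per PCAP file by following the ARG_BYPASS procedure; the executable names in the usage comment are hypothetical. The note does not say whether such a build still needs the -i/-o options. If it does, append them to each command list.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_in_parallel(commands: Sequence[Sequence[str]], max_workers: int = os.cpu_count() or 1) -> List[int]:
    """Run each command as its own process, at most max_workers at once.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(cmd: Sequence[str]) -> int:
        # The threads only wait on the child processes; the CPU work happens in the children.
        return subprocess.run(list(cmd), check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Usage (hypothetical executable names, one build per PCAP file):
    #   python run_parallel.py ./lycostand_monday ./lycostand_tuesday ./lycostand_wednesday
    executables = sys.argv[1:]
    exit_codes = run_in_parallel([[exe] for exe in executables])
    for exe, code in zip(executables, exit_codes):
        print(f"{exe}: exit code {code}")
```

The default of one worker per CPU core matches the note's goal of using a multi-core machine. Lower max_workers if memory becomes the bottleneck.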

Research

As connected objects become standard in everyday life, network intrusion detection is more critical than ever. Over the past decades, various datasets have been developed to address this security challenge. Analysis of earlier datasets, such as KDD-Cup99 and NSL-KDD, highlighted some of their problems and paved the way for newer datasets that correct them. CIC-IDS2017, one of the most recent network intrusion detection datasets, has become a popular choice. It contains a record of real network traffic in two forms: the raw data in PCAP format, and the processed data with flow-based features in CSV format, generated by the CICFlowMeter feature extraction tool.

In this work, a detailed analysis of this dataset is performed, and several problems discovered in the flows extracted from the network packets are reported. To overcome these problems, a new feature extraction tool named LycoSTand is proposed. In addition, a feature selection based on correlations and feature importance is proposed. A performance comparison between the original and the new dataset shows significant improvements for all evaluated machine learning algorithms.
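The paragraph above mentions a feature selection based on correlations and feature importance. The following is a generic sketch of that kind of filter, not the exact procedure from the paper. Constant columns are dropped first. The remaining features are then visited from most to least important, and a feature is kept only if its absolute Pearson correlation with every feature already kept stays below a threshold. Several choices here are assumptions: the 0.95 threshold, the function names, and the source of the importance scores (for example, impurity-based importances from a random forest).

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0.0 or dev_y == 0.0:
        return 0.0  # undefined for a constant column; such columns are filtered out beforehand
    return cov / (dev_x * dev_y)


def select_features(
    columns: Dict[str, Sequence[float]],
    importance: Dict[str, float],
    threshold: float = 0.95,  # assumed correlation cut-off
) -> List[str]:
    """Greedy correlation filter guided by importance scores.

    1. Drop constant (zero-variance) columns, which carry no information.
    2. Visit the remaining features from most to least important and keep a
       feature only if |r| < threshold against every feature already kept.
    """
    candidates = [name for name, col in columns.items() if max(col) != min(col)]
    candidates.sort(key=lambda name: importance.get(name, 0.0), reverse=True)
    kept: List[str] = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    cols = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2], "d": [5, 5, 5, 5]}
    imp = {"a": 0.5, "b": 0.3, "c": 0.2, "d": 0.9}
    print(select_features(cols, imp))
```

In the demo, "d" is removed as constant even though it has the highest score. "b" is a scaled copy of "a" (r = 1) and is dropped. "c" is only weakly correlated with "a" (r = -0.4) and is kept.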

Building on the improvements obtained for CIC-IDS2017, we also examine other datasets generated by CICFlowMeter that suffer from the same issues; LycoSTand can be applied to them to produce improved datasets for network intrusion detection.

License

LYCOS-IDS2017 is an improved version of CIC-IDS2017 and therefore uses the CIC-IDS2017 PCAP files, which can be downloaded from https://www.unb.ca/cic/datasets/ids-2017.html
LYCOS-IDS2017 is composed of labelled network flows with flow-based features generated by LycoSTand from the packets provided in PCAP format. The CSV files, LycoSTand, and the Python source code are available to researchers, and contributions are welcome.
Should you use our dataset or source code, please cite our related publications:

  • Arnaud Rosay, Florent Carlier, Eloïse Cheval, and Pascal Leroux (2021). From CIC-IDS2017 to LYCOS-IDS2017: A corrected dataset for better performance. In IEEE/WIC/ACM International Conference on Web Intelligence (WI-IAT ’21), December 14–17, 2021, Essendon, VIC, Australia. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3486622.3493973
  • Arnaud Rosay, Eloïse Cheval, Florent Carlier, and Pascal Leroux (2022). Network Intrusion Detection: A Comprehensive Analysis of CIC-IDS2017. In Proceedings of the 8th International Conference on Information Systems Security and Privacy (ICISSP 2022), pages 25-36. ISBN: 978-989-758-553-1; ISSN: 2184-4356