CLP

Compressed Log Processor (CLP) is a tool capable of losslessly compressing text logs and searching the compressed logs without decompression. To learn more about it, you can read our paper.

Getting Started

You can download a release from the releases page, or you can build the latest version using the packager.

Project Structure

CLP is currently split across a few different components in the components directory:

  • clp-py-utils contains Python utilities common to several of the other components.
  • compression-job-handler contains code to submit compression jobs to a cluster.
  • core contains code to compress uncompressed logs, decompress compressed logs, and search compressed logs.
  • job-orchestration contains code to schedule compression jobs on the cluster.
  • package-template contains the base directory structure and files of the CLP package.

Packages

The packages held by this repository are:

  1. Docker Image clp/clp-core-dependencies-x86-ubuntu-focal
    • A docker image containing all the necessary dependencies to build CLP core in an Ubuntu Focal x86 environment
  2. Docker Image clp/clp-core-dependencies-x86-ubuntu-bionic
    • A docker image containing all the necessary dependencies to build CLP core in an Ubuntu Bionic x86 environment
  3. Docker Image clp/clp-core-dependencies-x86-centos7.4
    • A docker image containing all the necessary dependencies to build CLP core in a CentOS 7.4 x86 environment
  4. Docker Image clp/clp-execution-x86-ubuntu-focal
    • A docker image containing all the necessary dependencies to run the full CLP package in an x86 environment

Next Steps

This is our open-source release, which we will be constantly updating with bug fixes, features, etc. If you would like a feature or want to report a bug, please file an issue and we'll be happy to engage. We also welcome any contributions!

Comments
  • [ERROR] [clp] Unable to connect to the database with the provided credentials

    When I run ./sbin/start-clp --uncompressed-logs-dir <directory containing your uncompressed logs>, it reports the error shown in the title. The detailed execution log is as follows:

    $ ./sbin/start-clp --uncompressed-logs-dir ./myTestData/input/
    2021-10-24 21:25:28,315 [INFO] [clp] Using default config file at etc/clp-config.yaml
    2021-10-24 21:25:28,320 [INFO] [clp] Provision docker network bridge
    2021-10-24 21:25:28,634 [INFO] [clp] Starting CLP scheduler
    2021-10-24 21:25:28,634 [INFO] [clp] Starting scheduler mariadb database
    2021-10-24 21:25:34,037 [INFO] [clp] Starting scheduler queue
    2021-10-24 21:25:39,313 [INFO] [clp] Initializing scheduler queue
    2021-10-24 21:25:40,831 [INFO] [clp] Initializing scheduler database tables
    2021-10-24 21:26:05,825 [ERROR] [clp] Unable to connect to the database with the provided credentials
    2021-10-24 21:26:05,826 [ERROR] [clp] 
    2021-10-24 21:26:05,826 [ERROR] [clp] Failed to provision "clp-mini-cluster"
    
  • Compression: unable to mount uncompressed logs directory in the container

    Hello, I am unable to compress the log file because the path for the uncompressed logs is not mounted in the container. Please see the logs below for further inspection. CLP starts successfully; uncomp_logs is the <uncompressed-logs-dir>.

    $ ./clp-package-ubuntu-focal-x86_64-v0.0.1/sbin/start-clp --uncompressed-logs-dir uncomp_logs/
    2021-11-02 11:25:29,892 [INFO] [clp] Using default config file at clp-package-ubuntu-focal-x86_64-v0.0.1/etc/clp-config.yaml
    2021-11-02 11:25:29,898 [INFO] [clp] Provision docker network bridge
    2021-11-02 11:25:30,102 [INFO] [clp] Starting CLP scheduler
    2021-11-02 11:25:30,103 [INFO] [clp] Starting scheduler mariadb database
    2021-11-02 11:25:33,941 [INFO] [clp] Starting scheduler queue
    2021-11-02 11:25:39,687 [INFO] [clp] Initializing scheduler queue
    2021-11-02 11:25:41,374 [INFO] [clp] Initializing scheduler database tables
    2021-11-02 15:25:41,608 [INFO] Successfully created clp metadata tables for compression and search
    2021-11-02 15:25:41,835 [INFO] Successfully created compression_jobs and compression_tasks orchestration tables
    2021-11-02 11:25:41,851 [INFO] [clp] Starting scheduler service
    2021-11-02 11:25:41,960 [INFO] [clp] Starting CLP worker
    

    Upon compressing, it gives the following error:

    $ ./clp-package-ubuntu-focal-x86_64-v0.0.1/sbin/compress uncomp_logs/auth.log 
    2021-11-02 11:26:23,597 [INFO] [clp] Using default config file at clp-package-ubuntu-focal-x86_64-v0.0.1/etc/clp-config.yaml
    2021-11-02 15:26:23,842 [INFO] [compress] Compression job submitted to compression-job-handler.
    2021-11-02 15:26:23,842 [INFO] [compression-job-handler] compression-job-handler started.
    2021-11-02 15:26:23,863 [INFO] [job-8] Iterating and partitioning files into tasks.
    2021-11-02 15:26:23,863 [ERROR] [job-8] "/opt/clp/uncomp_logs/auth.log" does not exist.
    2021-11-02 15:26:23,871 [INFO] [job-8] Waiting for 0 task(s) to finish.
    

    Indeed, the directory in question does not exist in the container:

    [email protected]:/opt/clp# ls   
    LICENSE  README.md  bin  etc  lib  requirements-pre-3.7.txt  sbin  var
    

    I checked the permissions of the uncompressed logs directory, and its ownership is user:docker. Is there anything else I need to check?

  • Use ArrayBackedSet to replace std::set for index in segment

    References

    N/A

    Description

    1. Introduced a new data structure, ArrayBackedIntPosSet, which replaces std::unordered_set for tracking which IDs have occurred in a segment. The new data structure wraps a vector<bool>, using 1 bit per ID. Compared to std::unordered_set, ArrayBackedIntPosSet consumes significantly less memory and achieves similar performance (see the sketch after this list).
    2. Removed the variable ID set from the encoded file object. Instead, IDs are added to the segment index as each message is encoded. For files that don't start with timestamps, we don't know whether the file will end up in the segment for files with timestamps or the segment for files without, so this change adds a temporary ID holder in the archive to handle this case.
    3. Embedded the file object into the archive object to enforce that only one file can be compressed at any point during execution.
    4. Updated make-dictionaries-readable to dump the segment index as well.
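
    As a rough illustration of the idea in item 1, here is a minimal sketch of a bit-vector-backed integer set (a simplification for illustration, not CLP's actual ArrayBackedIntPosSet):

    #include <cstddef>
    #include <vector>

    // Minimal sketch of a bit-vector-backed integer set: one bit per possible
    // ID, so memory use is ~max_id/8 bytes no matter how many IDs are added.
    // (Hypothetical class; CLP's real ArrayBackedIntPosSet may differ.)
    class ArrayBackedIntSet {
    public:
        explicit ArrayBackedIntSet(std::size_t max_id) : m_bits(max_id, false) {}

        void insert(std::size_t id) {
            if (id >= m_bits.size()) {
                m_bits.resize(id + 1, false);  // grow to cover new IDs
            }
            m_bits[id] = true;
        }

        bool contains(std::size_t id) const {
            return id < m_bits.size() && m_bits[id];
        }

    private:
        std::vector<bool> m_bits;  // std::vector<bool> is bit-packed
    };

    Because dictionary IDs within a segment are small, dense integers, one bit per possible ID is far cheaper than the tens of bytes per entry a hash set typically needs, which is consistent with the RSS savings reported below.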

    Validation performed

    Ran compression locally on var-logs, openstack-24hrs, and hadoop-24hrs. Confirmed that the output is correct and that RSS usage and performance match expectations.

    The following are the changes in RSS (bytes):

    • Openstack-24hrs: 1,163,071,488 -> 902,053,888 (~22% saving)
    • Spark-hibench: 1,019,035,648 -> 974,655,488 (~4% saving)
    • hadoop-24hrs: 80,744,448 -> 76,058,624 (~5.8% saving)
    • var-log: 436,318,208 -> 365,416,448 (~16% saving)

  • Source Compile & Test

    I have a question here; I don't know if you can answer it. I found that CLP uses docker to run the program, and that it uses a MySQL database, but the core source code is implemented in C++. I want to know how to compile and test the CLP algorithm using only the C++ code. Thank you! @kirkrodrigues @jackluo923

  • Fail to compile CLP

    I have run into another compilation problem when executing the make command. The linker reports a missing xmlReaderForIO symbol in libarchive.a, and I wonder why, since I have libxml2.a installed in my lib directory.

  • Compress command from Package-template gets stuck (process runs indefinitely)

    Bug

    • After installing everything and running the compress command via package-template/src/sbin, it gives the following output and hangs indefinitely.
    • The ../var directory does have file sources, BUT it never produces any archived files or directories.
    • Feel free to let me know if any other information is required.
    [email protected]:~/code/CLP/clp/components/package-template/src/sbin$ ./compress /home/indra/TestLogs
    2022-11-30 05:15:39,689 [INFO] [/opt/clp/sbin/native/compress] Compression job submitted to compression-job-handler.
    2022-11-30 05:15:39,689 [INFO] [compression-job-handler] compression-job-handler started.
    2022-11-30 05:15:39,695 [INFO] [job-1] Iterating and partitioning files into tasks.
    2022-11-30 05:15:39,702 [INFO] [job-1] Waiting for 1 task(s) to finish.
    

    Nothing happens after this. Here is the listing of the ~/TestLogs directory:

    [email protected]:~/code/CLP/clp/components/package-template/src/sbin$ ls -l ~/TestLogs
    total 12
    -rw-rw-r-- 1 indra indra 4493 Nov 30 03:57 ibm_dummy_logs.log
    -rw-rw-r-- 1 indra indra   70 Nov 30 02:01 my_log_file.log
    

    CLP version

    0.0.1

    Environment

    Ubuntu 20.04 LTS:

    [email protected]:~/code/CLP/clp/components/package-template/src/sbin$ cat /etc/*release
    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=20.04
    DISTRIB_CODENAME=focal
    DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"
    NAME="Ubuntu"
    VERSION="20.04.5 LTS (Focal Fossa)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 20.04.5 LTS"
    VERSION_ID="20.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=focal
    UBUNTU_CODENAME=focal
    
    [email protected]:~/code/CLP/clp/components/package-template/src/sbin$ docker -v
    Docker version 20.10.12, build 20.10.12-0ubuntu2~20.04.1
    

    Reproduction steps

    • Installed core per the directions in the repository, directly on the Ubuntu environment (did not go the docker image route).
    • Installed the clp_py_utils, compression_job_handler, and job_orchestration packages in /components/package-template/src/lib/python3/site-packages/.
    • Request: I want to build the package from scratch rather than playing with package-template, and see if that works.
    • From the docs, I am unsure how to do that. Any pointers would help with troubleshooting.
  • Unable to get CLP running on macOS

    Bug

    I am trying to get CLP up and running by following the instructions provided in https://github.com/y-scope/clp/blob/main/tools/packager/README.md, but I see the following error: ModuleNotFoundError: No module named 'zstandard.backend_c'

    CLP version

    6d35126

    Environment

    OS: macOS Monterey
    Python: 3.10.8
    Docker: version 20.10.17, build 100c701

    Reproduction steps

    Steps:

    1. Followed steps as per building package
    2. cd out/clp-package-ubuntu-focal-x86_64-v0.0.1
    3. Start clp - /sbin/start-clp --uncompressed-logs-dir <full path>
  • CentOS 7.4 build failed

    All dependencies have been successfully compiled and installed, but an error is reported when make is executed. The error message is as follows:

    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_read_support_filter_lz4.c.o): in function `lz4_filter_read_legacy_stream':
    archive_read_support_filter_lz4.c:(.text+0x1f9): undefined reference to `LZ4_decompress_safe'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_read_support_filter_lz4.c.o): in function `lz4_filter_read_default_stream':
    archive_read_support_filter_lz4.c:(.text+0x575): undefined reference to `LZ4_decompress_safe_usingDict'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_read_support_filter_lz4.c:(.text+0x79e): undefined reference to `LZ4_decompress_safe'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_cryptor.c.o): in function `aes_ctr_release':
    archive_cryptor.c:(.text+0x18): undefined reference to `EVP_CIPHER_CTX_free'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_cryptor.c.o): in function `aes_ctr_init':
    archive_cryptor.c:(.text+0x4e): undefined reference to `EVP_CIPHER_CTX_new'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_cryptor.c:(.text+0x89): undefined reference to `EVP_aes_192_ecb'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_cryptor.c:(.text+0xb7): undefined reference to `EVP_CIPHER_CTX_init'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_cryptor.c:(.text+0xc9): undefined reference to `EVP_aes_128_ecb'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_cryptor.c:(.text+0xd9): undefined reference to `EVP_aes_256_ecb'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_cryptor.c.o): in function `aes_ctr_update':
    archive_cryptor.c:(.text+0x2f9): undefined reference to `EVP_EncryptInit_ex'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_cryptor.c:(.text+0x318): undefined reference to `EVP_EncryptUpdate'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_hmac.c.o): in function `__hmac_sha1_cleanup':
    archive_hmac.c:(.text+0x10): undefined reference to `HMAC_CTX_cleanup'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_hmac.c.o): in function `__hmac_sha1_final':
    archive_hmac.c:(.text+0x48): undefined reference to `HMAC_Final'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_hmac.c.o): in function `__hmac_sha1_init':
    archive_hmac.c:(.text+0x95): undefined reference to `EVP_sha1'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_hmac.c:(.text+0xa9): undefined reference to `HMAC_Init_ex'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_hmac.c.o): in function `__hmac_sha1_update':
    archive_hmac.c:(.text+0x64): undefined reference to `HMAC_Update'
    collect2: error: ld returned 1 exit status
    make[2]: *** [clp] Error 1
    make[1]: *** [CMakeFiles/clp.dir/all] Error 2
    make: *** [all] Error 2

  • Cannot build on Red Hat 7

    I am trying to build CLP on a Linux server (Red Hat 7), but I encountered a CMake (version 3.21.1) build failure. The debug messages say it could not find Boost (missing: iostreams, program_options, filesystem, system). But I have libboost-iostreams.a, libboost-program_options.a, libboost-filesystem.a, and libboost-system.a (all version 1.59.0) under /usr/include/lib/, so I wonder why CMake cannot find them. I am looking forward to a solution, thanks.

  • Documentation and Code Corrections

    References

    Description

    • The proposed changes are ones I had to make to get CLP up and running on my Ubuntu instance.
    • The code in the README.md for copying the compression-job-handler and job-orchestration packages is misplaced.
    • I have highlighted the relevant line numbers and made corrections where necessary.

    Validation performed

    • I followed the corrected steps and got CLP up and running on my Ubuntu box.
    • If someone tries to follow the current directions, they'd either get confused or end up copying the clp-py-utils directory again and again.
    • In the end, they would not be able to run the start or compress commands, as the necessary packages won't be there.
  • Replace many of the magic strings with constants and remove unnecessary prefixes from database column names; Some code reorganization.

    Description

    • Replaced many of the magic strings (e.g., job/task statuses) with constants (see the sketch after this list).
    • Removed unnecessary prefixes from some column names in the orchestration tables (e.g., job_status in compression_jobs doesn't need the job_ prefix).
    • Moved some of the job orchestration code from clp_py_utils into job-orchestration and combined all the compression task code into one file.
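
    As a generic illustration of the first two points, here is a hedged sketch in C++ (the actual change is in the Python orchestration code, and the names below are invented for illustration):

    // Before (sketch): magic strings at every call site are easy to mistype
    // and invisible to the compiler:
    //
    //     if (job_status == "SCHEDULED") { /* ... */ }
    //
    // After: one definition per status; a typo becomes a compile error.
    enum class CompressionJobStatus {
        Scheduled,
        InProgress,
        Succeeded,
        Failed,
    };

    // Dropping the redundant prefix: inside the compression_jobs table the
    // "job_" prefix adds no information, so the column "job_status" becomes
    // "status". (The rename is from this PR; the constant name is invented.)
    namespace compression_jobs_columns {
    constexpr char const* cStatus = "status";  // was "job_status"
    }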

    Validation performed

    • Compressed some logs and verified decompression.
    • Verified search still works.
  • mariadb connector 3.2.3 no longer available

    Bug

    Running

    ./tools/scripts/lib_install/mariadb-connector-c.sh 3.2.3
    

    Returns

    dpkg-query: no packages found matching libmariadb-dev
    Checking for elevated privileges...
    curl: (22) The requested URL returned error: 404
    

    Running with 3.3.3 works for now.

    CLP version

    https://github.com/y-scope/clp/commit/54497a0dafd9fb87ec3d7c1cdfff5e21aebbe01c

    Environment

    Ubuntu 22.04.1 LTS

    Docker v4.14.1

    uname -a: Linux 485fbd483ae2 5.15.49-linuxkit #1 SMP PREEMPT Tue Sep 13 07:51:32 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

    Reproduction steps

    Run

    ./tools/scripts/lib_install/mariadb-connector-c.sh 3.2.3
    
  • Change ErrorCode enum to an enum class

    References

    Description

    Changed the ErrorCode enum to an enum class in ErrorCode.hpp.
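
    For illustration, the shape of such a change looks like the following (a simplified sketch; the actual enumerators in ErrorCode.hpp may differ):

    // Before (sketch): a plain enum whose enumerators leak into the enclosing
    // scope and convert implicitly to int:
    //
    //     enum ErrorCode {
    //         ErrorCode_Success = 0,
    //         ErrorCode_Failure,
    //     };
    //
    // After: a scoped enum; enumerators must be qualified and no longer
    // convert implicitly to int, so accidental integer comparisons and
    // name collisions are caught at compile time.
    enum class ErrorCode {
        Success = 0,
        Failure,
    };

    int main() {
        ErrorCode ec = ErrorCode::Success;  // enumerator must be qualified
        int raw = static_cast<int>(ec);     // conversion must be explicit
        return raw;
    }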

    Validation performed

    Validated that compression and decompression operate as normal.

  • How to pass custom delimiters, dictionary and non-dictionary schemas

    According to the paper, we can pass the following configs to CLP:

    1. delimiters
    2. dictionary_variables
    3. non_dictionary_variables

    But, as far as I understand, there is currently no way to pass these to clg and clp.

    Can you let me know if I've missed anything? Thanks!
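
    For context, here is a minimal sketch (hypothetical code, not CLP's actual API or schema format) of how these three configs interact during parsing: delimiters split a message into tokens, and the variable schemas decide whether a token is encoded directly as a value (non-dictionary) or stored in the variable dictionary:

    #include <cctype>
    #include <iostream>
    #include <string>
    #include <vector>

    // Split a message into tokens using a configurable delimiter set.
    std::vector<std::string> tokenize(std::string const& msg, std::string const& delims) {
        std::vector<std::string> tokens;
        std::string cur;
        for (char c : msg) {
            if (delims.find(c) != std::string::npos) {
                if (!cur.empty()) { tokens.push_back(cur); cur.clear(); }
            } else {
                cur += c;
            }
        }
        if (!cur.empty()) tokens.push_back(cur);
        return tokens;
    }

    // Toy schema: purely numeric tokens are treated as non-dictionary
    // variables (encodable directly as values); everything else would be
    // checked against dictionary-variable patterns.
    bool is_non_dictionary_variable(std::string const& token) {
        if (token.empty()) return false;
        for (char c : token) {
            if (!std::isdigit(static_cast<unsigned char>(c))) return false;
        }
        return true;
    }

    int main() {
        std::string const delimiters = " =:,";  // assumption: set by the delimiters config
        for (auto const& tok : tokenize("job_id=1337 task=compress", delimiters)) {
            std::cout << tok << " -> "
                      << (is_non_dictionary_variable(tok) ? "non-dictionary" : "dictionary/static text")
                      << '\n';
        }
    }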
