Pattern-defeating quicksort.

pdqsort

Pattern-defeating quicksort (pdqsort) is a novel sorting algorithm that combines the fast average case of randomized quicksort with the fast worst case of heapsort, while achieving linear time on inputs with certain patterns. pdqsort is an extension and improvement of David Mussers introsort. All code is available for free under the zlib license.

Best        Average     Worst       Memory      Stable      Deterministic
n           n log n     n log n     log n       No          Yes

Usage

pdqsort is a drop-in replacement for std::sort. Just replace a call to std::sort with pdqsort to start using pattern-defeating quicksort. If your comparison function is branchless, you can call pdqsort_branchless for a potential big speedup. If you are using C++11, the type you're sorting is arithmetic and your comparison function is not given or is std::less/std::greater, pdqsort automatically delegates to pdqsort_branchless.

Benchmark

A comparison of pdqsort and GCC's std::sort and std::stable_sort with various input distributions:

Performance graph

Compiled with -std=c++11 -O2 -m64 -march=native.

Visualization

A visualization of pattern-defeating quicksort sorting a ~200 element array with some duplicates. Generated using Timo Bingmann's The Sound of Sorting program, a tool that has been invaluable during the development of pdqsort. For the purposes of this visualization the cutoff point for insertion sort was lowered to 8 elements.

Visualization

The best case

pdqsort is designed to run in linear time for a couple of best-case patterns. Linear time is achieved for inputs that are in strictly ascending or descending order, only contain equal elements, or are strictly in ascending order followed by one out-of-place element. There are two separate mechanisms at play to achieve this.

For equal elements a smart partitioning scheme is used that always puts equal elements in the partition containing elements greater than the pivot. When a new pivot is chosen it's compared to the greatest element in the partition before it. If they compare equal we can derive that there are no elements smaller than the chosen pivot. When this happens we switch strategy for this partition, and filter out all elements equal to the pivot.

To get linear time for the other patterns we check after every partition if any swaps were made. If no swaps were made and the partition was decently balanced we will optimistically attempt to use insertion sort. This insertion sort aborts if more than a constant amount of moves are required to sort.

The average case

On average case data where no patterns are detected pdqsort is effectively a quicksort that uses median-of-3 pivot selection, switching to insertion sort if the number of elements to be (recursively) sorted is small. The overhead associated with detecting the patterns for the best case is so small it lies within the error of measurement.

pdqsort gets a great speedup over the traditional way of implementing quicksort when sorting large arrays (1000+ elements). This is due to a new technique described in "BlockQuicksort: How Branch Mispredictions don't affect Quicksort" by Stefan Edelkamp and Armin Weiss. In short, we bypass the branch predictor by using small buffers (entirely in L1 cache) of the indices of elements that need to be swapped. We fill these buffers in a branch-free way that's quite elegant (in pseudocode):

buffer_num = 0; buffer_max_size = 64;
for (int i = 0; i < buffer_max_size; ++i) {
    // With branch:
    if (elements[i] < pivot) { buffer[buffer_num] = i; buffer_num++; }
    // Without:
    buffer[buffer_num] = i; buffer_num += (elements[i] < pivot);
}

This is only a speedup if the comparison function itself is branchless, however. By default pdqsort will detect this if you're using C++11 or higher, the type you're sorting is arithmetic (e.g. int), and you're using either std::less or std::greater. You can explicitly request branchless partitioning by calling pdqsort_branchless instead of pdqsort.

The worst case

Quicksort naturally performs bad on inputs that form patterns, due to it being a partition-based sort. Choosing a bad pivot will result in many comparisons that give little to no progress in the sorting process. If the pattern does not get broken up, this can happen many times in a row. Worse, real world data is filled with these patterns.

Traditionally the solution to this is to randomize the pivot selection of quicksort. While this technically still allows for a quadratic worst case, the chances of it happening are astronomically small. Later, in introsort, pivot selection is kept deterministic, instead switching to the guaranteed O(n log n) heapsort if the recursion depth becomes too big. In pdqsort we adopt a hybrid approach, (deterministically) shuffling some elements to break up patterns when we encounter a "bad" partition. If we encounter too many "bad" partitions we switch to heapsort.

Bad partitions

A bad partition occurs when the position of the pivot after partitioning is under 12.5% (1/8th) percentile or over 87,5% percentile - the partition is highly unbalanced. When this happens we will shuffle four elements at fixed locations for both partitions. This effectively breaks up many patterns. If we encounter more than log(n) bad partitions we will switch to heapsort.

The 1/8th percentile is not chosen arbitrarily. An upper bound of quicksorts worst case runtime can be approximated within a constant factor by the following recurrence:

T(n, p) = n + T(p(n-1), p) + T((1-p)(n-1), p)

Where n is the number of elements, and p is the percentile of the pivot after partitioning. T(n, 1/2) is the best case for quicksort. On modern systems heapsort is profiled to be approximately 1.8 to 2 times as slow as quicksort. Choosing p such that T(n, 1/2) / T(n, p) ~= 1.9 as n gets big will ensure that we will only switch to heapsort if it would speed up the sorting. p = 1/8 is a reasonably close value and is cheap to compute on every platform using a bitshift.

Owner
Comments
  • Integer overflows in partition_right_branchless

    Integer overflows in partition_right_branchless

    Looks like there are multiple possible integer overflows in partition_right_branchless.

    The first kind of overflow happens on the case when the sequence is already correctly partitioned. In this case first >= last and therefore there is an overflow in the loop condition: while (last - first > 2 * block_size) { This could be fixed by this tiny patch (I have not changed the indentation to show the idea):

    --- a/pdqsort.h
    +++ b/pdqsort.h
    @@ -224,7 +224,6 @@ namespace pdqsort_detail {
             if (!already_partitioned) {
                 std::iter_swap(first, last);
                 ++first;
    -        }
    
             // The following branchless partitioning is derived from "BlockQuicksort: How Branch
             // Mispredictions don’t affect Quicksort" by Stefan Edelkamp and Armin Weiss.
    @@ -325,6 +324,7 @@ namespace pdqsort_detail {
                 while (num_r--) std::iter_swap(last - offsets_r[num_r], first), ++first;
                 last = first;
             }
    +        }
    
             // Put the pivot in the right place.
             Iter pivot_pos = first - 1;
    

    However, the use of int variables there is also a bit suspicious. I have not checked all possible code paths to ensure that they cannot overflow in some cases though

  • pdqsort interface cmake

    pdqsort interface cmake

    Hi, I'd like to attempt to sue pdqsort through some package manager for C++ like CPM.

    To that end, I'd need this repo to incorporate some minimal CMakeLists.txt file that would roughly look like:

    project(pdqsort CXX)
    
    set(CMAKE_CXX_STANDARD 11)
    
    add_library(pdqsort INTERFACE)
    target_include_directories(pdqsort INTERFACE .)
    

    This would be sufficient for CPM to figure out what compiler flags to pass to other cmake based projects that want to use pdqsort.

    Is that OK from your side @orlp or would you rather not include anything like that in your repo?

  • pdqsort is slower than qsort when compare function is slow

    pdqsort is slower than qsort when compare function is slow

    I tried to use pdqsort as an implementation of Array#sort method for the programming language Ruby. Unfortunately, it was about twice slower than the current implementation (based on glibc's qsort(3)) against random input.

    It seems that pdqsort tends to call compare function more often than qsort(3). Here is a self-contained demonstration:

    $ g++ -O3 -o cmp-test cmp-test.cpp
    
    $ ./cmp-test qsort
    1536197
    
    $ ./cmp-test pdqsort
    1861162
    
    $ ./cmp-test std::sort
    1978708
    
    #include <vector>
    #include "pdqsort.h"
    
    #define N 100000
    
    int counter;
    
    int cmp(const void *a, const void *b) {
        long c = *(long*)a - *(long*)b;
        counter++;
        return c < 0 ? -1 : c > 0 ? 1 : 0;
    }
    
    int main(int argc, char *argv[]) {
        std::vector<long> vec;
    
        // reproducible random input
        long x = 1;
        for (int i = 0; i < N; i++) {
            vec.push_back(x);
            x = (48271 * x) & 0xffffffff;
        }
    
        counter = 0;
    
        std::string mode(argv[1]);
    
        if (mode == "qsort") {
            qsort(vec.data(), vec.size(), sizeof(long), cmp);
        }
        else if (mode == "pdqsort") {
            pdqsort(vec.begin(), vec.end(), [] (const auto &a, const auto &b) {
                    counter++;
                    return a < b;
            });
        }
        else if (mode == "std::sort") {
            std::sort(vec.begin(), vec.end(), [] (const auto &a, const auto &b) {
                    counter++;
                    return a < b;
            });
        }
    
        // test
        //for (int i = 0; i < N; i++) {
        //    printf("%ld\n", vec[i]);
        //}
    
        printf("%d\n", counter);
    }
    

    Because Ruby's compare function is slow (it invokes Ruby-level method for each comparison), qsort(3) implementation is faster.

    Is this a known characteristic of pdqsort? I know that there is no perfect sorting algorithm in every respect, but I'd like to report this just for the possibility that it may be an unknown room for improvement. If it is known, I'm sorry, feel free to close this issue. If there is a way to reduce the number of comparison at the expense of some speed, I'd be glad to hear about it.

  • error handling

    error handling

    Fantastic work. Not an issue more of a question. standard C++ lib is not "exception free", but I will ask anyway. Can we use this lib in "exception free" programs? Using let's say MSVC.

  • Fix MSVC 2015 warning when sorting vector<size_t>

    Fix MSVC 2015 warning when sorting vector

    MSVC 2015 would produce a warning 4244 (conversion from int64 to int) on both of these code lines, when compiling to x64, and sorting a vector<size_t>.

    As far as I can tell, these casts will not alter the behaviour of pdqsort.

  • Add pdq namespace

    Add pdq namespace

    Adds a pdq namespace so that one can use pdq::sort or use std::sort and switch back and forth easily. Added pdqsort functions back in so that this change does not break code using the old interface.

    It's a small, mostly cosmetic change, of course.

  • pdqsort requires Iter to be default constructible

    pdqsort requires Iter to be default constructible

    Currently pdqsort require Iter to be default constructible due to the following code in pdqsort_loop:

    std::pair<Iter, bool> part_result;
    if (Branchless) part_result = partition_right_branchless(begin, end, comp);
    else part_result = partition_right(begin, end, comp);
    

    Changing this to

    std::pair<Iter, bool> part_result =
      (Branchless ?
        partition_right_branchless(begin, end, comp) : partition_right(begin, end, comp));
    

    fixes the problem.

  • Resolve MSVC++ /W4 warnings in bench.cpp.

    Resolve MSVC++ /W4 warnings in bench.cpp.

    Each of the benchmark "generate pattern" functions that accepted a size_t then compared that with an int, triggering "signed/unsigned mismatch" warnings. Changed those functions to accept int instead.

    argc/argv in main() were unused, triggering unused parameter warnings, those were removed.

    A cast was added to the cycles.push_back call to suppress implicit narrowing from double to uint64_t warnings.

  • Including to Boost.Sort

    Including to Boost.Sort

    Hello.

    I have seen, that you wanted to include pdfqsort algorithm to Boost.Sort. What the status of integrating to Boost.Sort now? Can i help you in any way?

  • Move check inside of condition

    Move check inside of condition

    In partial_insertion_sort, I don't think we need to check whether limit > partial_insertion_sort_limit if we're outside the if as we only change the value of limit inside the if.

  • Benchmark suggestion - pdqsort vs Quicksort 'Magnetica'

    Benchmark suggestion - pdqsort vs Quicksort 'Magnetica'

    Hi Orson, just was asked how my Quicksort 'Magnetica' fares against pdqsort: https://github.com/Gaming32/ArrayV-v4.0/issues/73#issuecomment-979988753

    Please consider running them in some (VISUAL) benchmark. Love seeing those bars moved around. Here it is, in qsm.h: https://www.qb64.org/forum/index.php?topic=3518.msg138505#msg138505

  • Support for C++17 parallel execution?

    Support for C++17 parallel execution?

    As a drop-in replacement for std::sort, do you have plans to add support for parallel execution in pdqsort?

    As I'm sure you know, sort can now do this,

    std::sort(std::execution::par, lines.begin(), lines.end());

  • Eliminate nvcc (NVidia compiler) warning

    Eliminate nvcc (NVidia compiler) warning

    The NVidia nvcc compiler gives a "pdqsort.h(170): warning #68-D: integer conversion resulted in a change of sign" warning. It may be storing the enum as an unsigned value?

  • Improved C++11 detection on Visual Studio

    Improved C++11 detection on Visual Studio

    Visual Studio has broken __cplusplus (See: https://docs.microsoft.com/en-us/cpp/build/reference/zc-cplusplus?view=msvc-160)

    Instead one can use _MSVC_LANG (See: https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-160)

    The user can override the behaviour by defining PDQSORT_HAS_CPP11 before including pdqsort.h.

  • pdqselect?

    pdqselect?

    I'm using pqdsort in my ranges library (slightly modified to support projections, proxy iterators and constexpr evaluation). I think it's fantastic, and it's proven almost always faster (and never slower) than both libc++ and libstdc++ std::sort() in every benchmark I've tried.

    Have you given any thought to writing a companion "pdqselect" algorithm to implement std::nth_element()? As far as I know all the mainstream C++ implementations of nth_element use introselect, but it seems that (in principle at least) this algorithm could benefit from the same optimisations that pdqsort uses relative to introsort.

    (A quick Google turned up this Rust crate, but it doesn't seem to be part of the Rust standard library yet.)

Glob pattern to regex translator in C++11. Optionally, directory traversal with glob pattern in C++17. Header-only library.

Glob pattern to regex translator in C++11. Optionally, directory traversal with glob pattern in C++17. Header-only library.

Oct 27, 2021
Functional programming style pattern-matching library for C++
Functional programming style pattern-matching library for C++

Mach7: Pattern Matching for C++ by Yuriy Solodkyy, Gabriel Dos Reis, Bjarne Stroustrup Abstract Pattern matching is an abstraction mechanism that can

Jun 24, 2022
Hello from pattern-f.

TQ-pre-jailbreak A PRE-jailbreak for iOS 14.0 ~ iOS 14.3 on all devices. Generally speaking, jailbreak starts from an arbitrary kernel r/w vulnerabili

Jun 11, 2022
Simple header only pattern matching for c++14

Simple, Extensible C++ Pattern Matching Library I have recently been looking at Haskell and Rust. One of the things I wanted in C++ from those languag

Jun 15, 2022
An efficient, composable design pattern for range processing
An efficient, composable design pattern for range processing

Transrangers An efficient, composable design pattern for range processing. Intro Pull-based approach Push-based approach Transrangers Performance Tran

Apr 15, 2022
Ideas, thoughts, and notes on a typeclass/interface based polymorphism pattern for standard C

Polymorphism through Typeclasses / Interface / Traits Ideas, thoughts, and notes on an action based polymorphism pattern for good ol' C. Originally us

Jun 15, 2022
YARA pattern matching scannner GUI
YARA pattern matching scannner GUI

YARA GUI This is a GUI for the binary pattern matching scanner YARA. Features Drag and drop targets Directory scanning Compiled rule cache Favorite/re

Jul 2, 2021
match(it): A lightweight header-only pattern-matching library for C++17 with macro-free APIs.

match(it): A lightweight header-only pattern-matching library for C++17 with macro-free APIs. Features Easy to get started. Single header library. Mac

May 14, 2022
A kata to practice refactoring to the State Pattern

A kata to practice refactoring to the State Pattern

May 16, 2022
A kata to practice refactoring to the strategy pattern.

<style> commit{ color:orange; } heading{ color:firebrick; font-weight: bold; } </style> Instructions Introduction This kata is designed to help you le

May 11, 2022
Pattern Printing For beginners!

Patterns Project on Patterns Installation Download the source files and compile it. Linux g++ main.cpp -o patterns.out ./patterns.out Windows g++ mai

Oct 17, 2021
The most powerful and customizable binary pattern scanner written on modern C++

Sig The most powerful and customizable binary pattern scanner written on modern C++ ✔ Capabilities: Support for all common pattern formats: Pattern +

Jun 11, 2022
Taichi Pattern
Taichi Pattern

Taichi Pattern

Oct 8, 2021
A simple network library powered by epoll and proactor pattern.

spinet A simple network library powered by epoll and proactor pattern. Installation Required cmake version 3.10 or above c++ standard 17 or above git

Jan 27, 2022
Edf is an event-driven framework for embedded system (e.g. FreeRTOS) with state machine and subscriber-publisher pattern.

Edf means event-driven framework. Event-driven programming is a common pattern in embedded systems. However, if you develop software directly on top o

Apr 21, 2022
This repository contains the Assignment code of Data Structures and Algorithms Assignments of SPPU, Second Year IT Syllabus (2019 pattern)

DSAL This repository contains the Assignment code of Data Structures and Algorithms Assignments of SPPU, Second Year IT Syllabus (2019 pattern) Assign

Apr 6, 2022
Manticore - iOS Jailbreak based on cicuta virosa by ModernPwner and Pattern F's pre-jailbreak's amfid bypass.

Manticore Jailbreak Manticore Jailbreak is a Free and Open-Source Jailbreak utility developed by the Manticore Team. Current compatibility: iOS 14.0 -

Jun 12, 2022