R :package: and header-only C++ library for geospatial space-division based compression and encoding

spress

Lifecycle: experimental CRAN status

spress provides utilities for encoding and compressing geospatial objects, such as sf objects.

Installation

This package requires C++11 for compilation. Once you have that installed, you can install the development version from GitHub with devtools or pak:

# pak
pak::pkg_install("UFOKN/spress")

# devtools
devtools::install_github("UFOKN/spress")

Usage

spress offers both encoding and compressing utilities. For clarity, in the context of spress, encoding refers to projecting 2-dimensional coordinates to a 1-dimensional system (particularly, geographic coordinates to a Hilbert Curve index); whereas, compressing refers to the process for reducing the object and file size of the encoded object.

Encoding

At a high level, spress performs the following algorithm for encoding:

  1. Given a set of points P
  2. Retrieve P’s extent/bounding box
  3. Create a 2^n x 2^n grid with centroids aligned to the extent’s corner points
  4. Index the grid to a Hilbert Curve
  5. Assign each point in P to the closest index on the Hilbert Curve

A visualization of this algorithm is shown below:

After encoding, the data necessary to be stored is reduced down to:

  1. The set of points P’s extent;
  2. The n value used;
  3. and, the list of indices.

With this set of data, the decoding process is exactly the reverse of the encoding algorithm.

In the visual example above, an n of 4 was used (since 2^4 = 16, and we have a 16x16 grid), however, note that the maximum possible error E converges toward 0 as the extent becomes smaller, and as n grows. Thus, we can achieve a much more accurate encoding scheme by modifying these when possible.

Realistically, this algorithm is computationally expensive if the extent and n are large. However, this is mitigated via bitwise operations and taking advantage of efficient C++ structures. For the specifics of this, see libspate’s source code in inst/include/spate.hpp.

Encoding Example

# Random set of points
P <- data.frame(
  X = c(-77.8015434499876,
        -77.8707250008617,
        -77.9212227055911,
        -77.9385055000025,
        -77.8452515318622,
        -77.8715886279598),
  Y = c(34.2961615000021,
        34.1818796104794,
        34.1447587911948,
        34.2165775000314,
        34.2218266693184,
        34.2632856737062)
)

# n = 8 --> 2^8 x 2^8 grid --> 256x256 grid
n <- 8L

# Encode using vectors
spress::encode(P$X, P$Y, n = n)
#> > spress encoded coordinates: 6 indices
#> # Grid Dimensions: 256 x 256 (65536)
#> # Extent:          -77.80, -77.94, 34.30, 34.14
#>       Index  Coordinates
#> [1]:  43690  (-77.92, 34.14)
#> [2]:   6827  (-77.87, 34.18)
#> [3]:   3754  (-77.94, 34.22)
#> [4]:  16323  (-77.85, 34.22)
#> [5]:  34040  (-77.80, 34.30)
#> # … with 1 more index

# We can also use the data.frame directly and attach to it
encoded_P <- spress::encode(P, coords = c("X", "Y"), attach = FALSE)

encoded_df <- spress::encode(P, coords = c("X", "Y"), attach = TRUE)
encoded_df
#>           X        Y   h
#> 1 -77.80154 34.29616 170
#> 2 -77.87073 34.18188  37
#> 3 -77.92122 34.14476  14
#> 4 -77.93851 34.21658  63
#> 5 -77.84525 34.22183 132
#> 6 -77.87159 34.26329 111

Decoding Example

We will use encoded_P and encoded_df from the encoding example If we decode an object that used spress::encode to encode, there’s no need to pass extent or n to the decoding function. These values are stored in the attributes spress_extent and spress_n, respectively.

spress::decode(encoded_P)
#>           x        y
#> 1 -77.80154 34.29616
#> 2 -77.87459 34.18513
#> 3 -77.92024 34.14476
#> 4 -77.93851 34.21541
#> 5 -77.84720 34.22551
#> 6 -77.87459 34.26588

# Columns `x` and `y` are the new decoded coordinate columns
# The argument `index` refers to the column index of the encoded index values
spress::decode(encoded_df, index = 3)[, c("x", "y")]
#>           x        y
#> 1 -77.80154 34.29616
#> 2 -77.87459 34.18513
#> 3 -77.92024 34.14476
#> 4 -77.93851 34.21541
#> 5 -77.84720 34.22551
#> 6 -77.87459 34.26588

Compression

Compression is not implement yet as of 11/30/21.

For compression, spress performs the following algorithm:

  1. Given a set of points P
  2. Encode P based on the encoding algorithm
  3. Create an ordering index for order preservation upon decompression
  4. Sort indices by the encoding index, such that each point in P follows along corresponding Hilbert Curve
  5. Keep the first encoding index and get the difference between every successive point

This algorithm returns a set of indices that (generally) should be small integers. The returned table can then be written to any file format, preferably a columnar format (such as Apache Parquet), and additionally compressed using another compression library (such as xz or snappy).

C++ Library

Included in this package is the header-only C++ library libspate, which contains the foundational code used for encoding/decoding, and was developed specifically for this package.

This can be found at inst/include/spate.hpp from the root of this repository. Unit testing for this library is done with Catch via testthat. See testthat::use_catch for details on how this works.

Note that the C++ code for this package contains purely the algorithm for encoding and decoding. The current roadmap for this package includes making the C++ library fully vendorable.

Attribution

This package was inspired by the paper:

Liu, Dongge, Tao Wang, Xiaojuan Li, Yeqing Ni, Yanping Li, and Zhao Jin. 2020. “A Multiresolution Vector Data Compression Algorithm Based on Space Division” ISPRS International Journal of Geo-Information 9, no. 12: 721. https://doi.org/10.3390/ijgi9120721

Similar Resources

Space partitioning structures visualization

vptree-draw SVG export of several 2D space partitioning structures. Code structure geo.h minimalist 2D geometry structures (Point, Vector, Box, Sphere

Nov 17, 2022

cartographer work space with detailed comments

cartographer 超详细注释 基于2021.04.20日在github上下载的master的代码的注释版本, 会增加更规范、详细的注释, 包括: 1、增加关键地方的公式推导 2、难理解部分代码画图展示原理 3、原作者代码本身疑似/确定bug的说明 by lixiang 注释多数在 .cc 文

Dec 2, 2022

A family of header-only, very fast and memory-friendly hashmap and btree containers.

A family of header-only, very fast and memory-friendly hashmap and btree containers.

The Parallel Hashmap Overview This repository aims to provide a set of excellent hash map implementations, as well as a btree alternative to std::map

Nov 25, 2022

A simple single header 64 and 32 bit hash function using only add, sub, ror, and xor.

A simple single header 64 and 32 bit hash function using only add, sub, ror, and xor.

K-HASH A simple single header 64 bit hash function using only add, sub, ror, and xor. This a just general-purpose hash function for like making hash m

Jul 31, 2022

BladeBit - Fast Chia (XCH) RAM-only k32-only Plotter

BladeBit Chia Plotter A fast RAM-only, k32-only, Chia plotter. Requirements 416 GiB of RAM are required to run it, plus a few more megabytes for stack

Nov 28, 2022

Easy to use, header only, macro generated, generic and type-safe Data Structures in C

Easy to use, header only, macro generated, generic and type-safe Data Structures in C

C Macro Collections Easy to use, header only, macro generated, generic and type-safe Data Structures in C. Table of Contents Installation Contributing

Nov 27, 2022

Simple Useful Libraries: C++17/20 header-only dynamic bitset

dynamic_bitset Simple Useful Libraries: C++17/20 header-only dynamic bitset Requirements To use this dynamic bitset, you will need a C++17 (or later)

Nov 16, 2022

a header-only, constexpr alternative to gperf for C++14 users

Frozen Header-only library that provides 0 cost initialization for immutable containers, fixed-size containers, and various algorithms. Frozen provide

Nov 25, 2022

C++14 header only result monad implementation

constexpr Either S, E C++14 header only result monad implementation. Features constexpr support 0 dependencies single header Status in development T

Oct 7, 2021
Analysing and implementation of lossless data compression techniques like Huffman encoding and LZW was conducted along with JPEG lossy compression technique based on discrete cosine transform (DCT) for Image compression.

PROJECT FILE COMPRESSION ALGORITHMS - Huffman compression LZW compression DCT Aim of the project - Implement above mentioned compression algorithms an

Dec 14, 2021
Simple C++ code to benchmark fast division algorithms

fast_division Simple C++ code to benchmark fast division algorithms relying on constant divisors. The code is a companion to the paper Integer Divisio

Oct 31, 2022
The open source edition of Raising the Bar: Redux's Division 1.2 release.

//===================================================================================================================================================

Jul 18, 2022
C header to execute user-space functions in ring 0

r0e - Execute User Code in Ring0 This small header allows executing functions in your application in ring 0, i.e., with kernel privileges. Use cases i

Nov 11, 2022
nanoplan is a header-only C++11 library for search-based robot planning.
nanoplan is a header-only C++11 library for search-based robot planning.

nanoplan is a header-only C++11 library for search-based robot planning. The primary design goals are correctness, ease-of-use, and efficiency (in tha

May 17, 2022
COBS framing with implicit run-length-encoding, optimized for data containing statistically a bit more 0 and FF bytes in a row, as data often carry 16, 32 or 64 bit numbers with small values.

TCOBS Table of Contents About The project TCOBS Specification TCOBS code Getting Started 4.1. Prerequisites 4.2. Installation 4.3. Usage in Go 4.3.1.

Nov 6, 2022
ring-span lite - A C++yy-like ring_span type for C++98, C++11 and later in a single-file header-only library

ring-span lite: A circular buffer view for C++98 and later Contents Example usage In a nutshell Dependencies Installation Synopsis Reported to work wi

Nov 26, 2022
Step is a C++17, header-only library of STL-like algorithms and data structures
Step is a C++17, header-only library of STL-like algorithms and data structures

Step is a C++17, header-only library of STL-like algorithms and data structures. Installation git clone --depth 1 https://github.com/storm-ptr/step.gi

Sep 1, 2022
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of bil

Dec 3, 2022