STFT based multi pitch shifting with optional formant preservation in C++ and Python

stftPitchShift


This is a reimplementation of Stephan M. Bernsee's smbPitchShift.cpp, a pitch shifting algorithm based on the Short-Time Fourier Transform (STFT), intended especially for vocal audio signals.

This repository features two analogous implementations of the algorithm, one in C++ and one in Python. Both contain several function blocks of the same name (but with a different file extension, of course).

In addition to the base algorithm implementation, it also features spectral multi pitch shifting and cepstral formant preservation extensions.

Both sources contain a ready-to-use command line tool as well as a library for custom needs. See more details in the build section.

Modules

StftPitchShift

The StftPitchShift module provides a full-featured audio processing chain to perform the pitch shifting of a single audio track, based on the built-in STFT implementation.

In the C++ environment only, the additional StftPitchShiftCore module can be used to embed this pitch shifting implementation in an existing real-time STFT pipeline.

Vocoder

The Vocoder module transforms the DFT spectral data according to the original algorithm, which is essentially the instantaneous frequency estimation technique. See also further reading for more details.

The encode function replaces each input DFT value with the complex number magnitude + j * frequency, where the imaginary part holds the frequency estimate derived from the phase error.

The decode function performs the inverse transformation back to the original DFT complex numbers, replacing the possibly modified frequency value with the reconstructed phase value.
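The encode and decode steps can be sketched in pure Python as follows. This is a minimal illustration of phase error based frequency estimation, not the library's own code: the class name SketchVocoder and its attributes are hypothetical, and the input is assumed to be the half spectrum of one DFT frame (framesize / 2 + 1 bins).

```python
import cmath
import math


class SketchVocoder:
    """Hypothetical sketch of the encode/decode idea; not the library class."""

    def __init__(self, framesize, hopsize, samplerate):
        self.bins = framesize // 2 + 1
        self.freq_per_bin = samplerate / framesize
        # Expected phase advance per hop for a tone exactly at bin 1
        self.phase_per_hop = 2 * math.pi * hopsize / framesize
        self.encode_phase = [0.0] * self.bins
        self.decode_phase = [0.0] * self.bins

    def encode(self, dft):
        """Turn real + j * imag into magnitude + j * frequency."""
        out = []
        for k, value in enumerate(dft):
            magnitude, phase = abs(value), cmath.phase(value)
            delta = phase - self.encode_phase[k]
            self.encode_phase[k] = phase
            # Phase error: deviation from the expected advance of bin k,
            # wrapped to the interval [-pi, pi)
            error = delta - k * self.phase_per_hop
            error = (error + math.pi) % (2 * math.pi) - math.pi
            frequency = (k + error / self.phase_per_hop) * self.freq_per_bin
            out.append(complex(magnitude, frequency))
        return out

    def decode(self, frame):
        """Turn magnitude + j * frequency back into real + j * imag."""
        out = []
        for k, value in enumerate(frame):
            magnitude, frequency = value.real, value.imag
            # Accumulate the phase advance implied by the frequency value
            self.decode_phase[k] += frequency / self.freq_per_bin * self.phase_per_hop
            out.append(cmath.rect(magnitude, self.decode_phase[k]))
        return out
```

If the encoded frame is left unmodified, decoding reproduces the original DFT values, because the accumulated phase equals the measured phase up to a multiple of 2π.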

Pitcher

The Pitcher module performs single or multi pitch shifting of the encoded DFT frame depending on the specified fractional factors.

Resampler

The Resampler module provides the linear interpolation routine that actually performs the pitch shifting on the frames encoded by the Vocoder.

Cepster

The Cepster module estimates a spectral envelope of the DFT magnitude vector, representing the vocal tract resonances. This computation takes place in the cepstral domain by applying a low-pass filter. The cutoff value of the low-pass filter or lifter is the quefrency value to be specified in seconds or milliseconds.

Normalizer

The Normalizer module optionally performs an RMS normalization right after pitch shifting, relative to the original signal, to get about the same loudness level. This correction takes place in the frequency domain, for each DFT frame separately.
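A per-frame RMS correction of this kind could look like the sketch below. The function names rms and normalize are hypothetical, and the inputs are assumed to be the plain magnitude vectors of the original and the pitch shifted frame.

```python
import math


def rms(magnitudes):
    """Root mean square of a magnitude vector."""
    return math.sqrt(sum(m * m for m in magnitudes) / len(magnitudes))


def normalize(original, shifted):
    """Scale the shifted magnitudes so their RMS matches the original frame."""
    target, actual = rms(original), rms(shifted)
    if actual == 0:
        return list(shifted)  # silent frame, nothing to scale
    gain = target / actual
    return [m * gain for m in shifted]
```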

STFT

As the name of this module already implies, it performs the comprehensive STFT analysis and synthesis steps.

Pitch shifting

Single pitch

Since the Vocoder module transforms the original DFT complex values real + j * imag into magnitude + j * frequency representation, the single pitch shifting is a comparatively easy task. Both magnitude and frequency vectors are to be resampled according to the desired pitch shifting factor:

  • The factor 1 means no change.
  • The factor <1 means downsampling.
  • The factor >1 means upsampling.

Any fractional resampling factor such as 0.5 requires interpolation. In the simplest case, linear interpolation will be sufficient. Otherwise, bilinear interpolation can also be applied to smooth values between two consecutive STFT hops.

Since resampling only moves the frequency values to other bins, the resampled frequency values also need to be multiplied by the resampling factor.
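The single pitch procedure can be illustrated with a short sketch. The function names resample and shift_pitch are hypothetical, the vectors are plain Python lists, and bins whose source position falls outside the input are simply set to zero, which is an assumption of this example.

```python
def resample(values, factor):
    """Linearly interpolate values at the positions i / factor."""
    n = len(values)
    out = [0.0] * n
    for i in range(n):
        position = i / factor
        if position > n - 1:
            break  # source position lies beyond the input, keep zeros
        left = int(position)
        right = min(left + 1, n - 1)
        weight = position - left
        out[i] = (1 - weight) * values[left] + weight * values[right]
    return out


def shift_pitch(magnitudes, frequencies, factor):
    """Resample both vectors and scale the frequency values by the factor."""
    shifted_magnitudes = resample(magnitudes, factor)
    shifted_frequencies = [f * factor for f in resample(frequencies, factor)]
    return shifted_magnitudes, shifted_frequencies
```

With a factor of 2, for example, a peak in bin 2 carrying 200 Hz ends up in bin 4 carrying 400 Hz.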

Multi pitch

In terms of multi pitch shifting, multiple differently resampled magnitude and frequency vectors are to be combined together. For example, the magnitude vectors can easily be averaged. But what about the frequency vectors?

The basic concept of this algorithm extension is to keep, for each DFT bin, only the frequency value that belongs to the strongest magnitude value. So the strongest magnitude will mask the weaker ones. Thus, all remaining masked components become inaudible.

In this way, the multi pitch shifting can be performed simultaneously in the same DFT frame. There is no need to build a separate STFT pipeline for different pitch variations to superimpose the synthesized signals in the time domain.
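The combination rule described above can be sketched as follows. The function name combine is hypothetical, and averaging the magnitudes is just the simple option mentioned in the text, not necessarily what the library does.

```python
def combine(frames):
    """Merge several (magnitudes, frequencies) pairs, one per pitch factor.

    Per bin, the magnitudes are averaged and the frequency of the
    strongest magnitude is kept.
    """
    bins = len(frames[0][0])
    magnitudes, frequencies = [], []
    for k in range(bins):
        candidates = [(mags[k], freqs[k]) for mags, freqs in frames]
        strongest = max(candidates, key=lambda c: c[0])
        magnitudes.append(sum(m for m, _ in candidates) / len(candidates))
        frequencies.append(strongest[1])
    return magnitudes, frequencies
```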

Formant preservation

The pitch shifting also causes distortion of the original vocal formants, leading to a so-called Mickey Mouse effect if scaled up. One way to reduce this artifact is to exclude the formant feature from the pitch shifting procedure.

The vocal formants are represented by the spectral envelope, which is given by the smoothed DFT magnitude vector. In this implementation, the smoothing of the DFT magnitude vector takes place in the cepstral domain by low-pass liftering. The extracted envelope is then removed from the original DFT magnitude. The remaining residual or excitation signal goes through the pitch shifting algorithm. After that, the previously extracted envelope is combined with the processed residual.
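The envelope extraction by low-pass liftering can be sketched in pure Python with a naive cosine transform. All function names are hypothetical, the input is assumed to be the half spectrum magnitude vector (framesize / 2 + 1 bins), and the symmetric lifter used here is one possible choice. The quadratic transform is only meant for illustration on small frames.

```python
import math


def spectral_envelope(magnitudes, quefrency, samplerate):
    """Smooth the log magnitude spectrum by low-pass liftering.

    quefrency is the lifter cutoff in seconds.
    """
    half = len(magnitudes) - 1
    size = 2 * half
    log_spectrum = [math.log(max(m, 1e-12)) for m in magnitudes]

    # Inverse real DFT of the symmetric log spectrum gives the real cepstrum
    cepstrum = []
    for n in range(size):
        acc = log_spectrum[0] + log_spectrum[half] * (-1) ** n
        acc += 2 * sum(log_spectrum[k] * math.cos(2 * math.pi * k * n / size)
                       for k in range(1, half))
        cepstrum.append(acc / size)

    # Low-pass lifter: keep only quefrencies up to the cutoff (and their mirror)
    cutoff = round(quefrency * samplerate)
    for n in range(size):
        if cutoff < n < size - cutoff:
            cepstrum[n] = 0.0

    # Forward DFT of the symmetric cepstrum, then undo the logarithm
    envelope = []
    for k in range(half + 1):
        acc = sum(cepstrum[n] * math.cos(2 * math.pi * k * n / size)
                  for n in range(size))
        envelope.append(math.exp(acc))
    return envelope


def remove_envelope(magnitudes, envelope):
    """Residual (excitation) magnitudes that go through the pitch shifting."""
    return [m / e for m, e in zip(magnitudes, envelope)]


def apply_envelope(residual, envelope):
    """Recombine the processed residual with the extracted envelope."""
    return [r * e for r, e in zip(residual, envelope)]
```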

Build

C++

Use CMake to manually build the C++ library, main and example programs like this:

cmake -S . -B build
cmake --build build

Or alternatively just get the packaged library from:

To include this library in your C++ audio project, study the minimal C++ example in the examples folder:

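#include <StftPitchShift/StftPitchShift.h>

#include <vector>

using namespace stftpitchshift;

// frame size 1024, hop size 256, sample rate 44100 Hz
StftPitchShift pitchshifter(1024, 256, 44100);

std::vector<float> x(44100);    // input samples, here one second of silence
std::vector<float> y(x.size()); // output samples

pitchshifter.shiftpitch(x, y, 1); // factor 1 means no change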

Optionally specify following CMake options for custom builds:

  • -DBUILD_SHARED_LIBS=ON to enable a shared library build,
  • -DVCPKG=ON to enable the vcpkg compatible library only build without executables,
  • -DDEB=ON to enable the deb package build for library and main executable.

Python

The Python program stftpitchshift can be installed via pip install stftpitchshift.

Also feel free to explore the Python class StftPitchShift in your personal audio project:

from stftpitchshift import StftPitchShift

pitchshifter = StftPitchShift(1024, 256, 44100)

x = [0] * 44100
y = pitchshifter.shiftpitch(x, 1)

Usage

Both programs, C++ and Python, provide a similar set of command line options:

-h  --help       print this help
    --version    print version number

-i  --input      input .wav file name
-o  --output     output .wav file name

-p  --pitch      fractional pitch shifting factors separated by comma
                 (default 1.0)

-q  --quefrency  optional formant lifter quefrency in milliseconds
                 (default 0.0)

-r  --rms        enable spectral rms normalization

-w  --window     stft window size
                 (default 1024)

-v  --overlap    stft window overlap
                 (default 32)

-c  --chrono     enable runtime measurements
                 (only available in the C++ version)

-d  --debug      plot spectrograms before and after processing
                 (only available in the Python version)

Currently only .wav files are supported. Please use e.g. Audacity or SoX to prepare your audio files for pitch shifting.

To apply multiple pitch shifts at once, separate each factor by a comma, e.g. -p 0.5,1,2. Alternatively specify pitch shifting factors as semitones denoted by the + or - prefix, e.g. -p -12,0,+12. For precise pitch corrections append the number of cents after semitones, e.g. -p -11-100,0,+11+100.
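The relation between the semitone and cent notation and the fractional factor is factor = 2 ^ ((semitones + cents / 100) / 12). The parser below is only a sketch of that conversion and not the actual command line parser. The names parse_factor and parse_factors are hypothetical, and treating a bare 0 as zero semitones is an assumption based on the example above.

```python
import re

# Semitones with mandatory sign, optionally followed by signed cents
SEMITONE_PATTERN = re.compile(r'^([+-]\d+(?:\.\d+)?)([+-]\d+(?:\.\d+)?)?$')


def parse_factor(token):
    """Convert one -p token into a fractional pitch shifting factor."""
    token = token.strip()
    match = SEMITONE_PATTERN.match(token)
    if match:
        semitones = float(match.group(1))
        cents = float(match.group(2) or 0)
        return 2 ** ((semitones + cents / 100) / 12)
    if token == '0':
        return 1.0  # assumption: a bare 0 means zero semitones
    return float(token)


def parse_factors(argument):
    """Convert a comma separated -p argument into a list of factors."""
    return [parse_factor(token) for token in argument.split(',')]
```

For instance, -11-100 stands for 11 semitones plus 100 cents down, which is a full octave and hence the factor 0.5.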

To enable the formant preservation feature specify a suitable quefrency value in milliseconds. Depending on the source signal, begin with a small value like -q 1. Generally, the quefrency value has to be smaller than the fundamental period of the source signal, which is the reciprocal of its fundamental frequency.
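As a rough orientation, the upper bound for the quefrency follows directly from the fundamental frequency. The helper names below are hypothetical, and the conversion to a cepstrum index assumes the usual relation index = quefrency * sample rate.

```python
def fundamental_period_ms(fundamental_hz):
    """Fundamental period in milliseconds, the upper bound for -q."""
    return 1000.0 / fundamental_hz


def lifter_cutoff_index(quefrency_ms, samplerate):
    """Cepstrum index that corresponds to a quefrency in milliseconds."""
    return round(quefrency_ms * 1e-3 * samplerate)
```

A voice with a fundamental frequency of 200 Hz has a fundamental period of 5 ms, so the quefrency should stay below 5 ms. At a sample rate of 44100 Hz, -q 1 corresponds to about 44 cepstrum samples.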

At the moment the formant preservation doesn't seem to work well along with the multi pitch shifting and smaller pitch shifting factors. Further investigation is therefore necessary...

Further reading

Instantaneous frequency estimation

Cepstrum analysis and formant changing

Credits

License

stftPitchShift is licensed under the terms of the MIT license. For details please refer to the accompanying LICENSE file distributed with stftPitchShift.
