The StftPitchShift module provides a full-featured audio processing chain to perform the pitch shifting of a single audio track, based on the built in STFT implementation.
Exclusively in the C++ environment the additional StftPitchShiftCore module can be used to embed this pitch shifting implementation in an existing real-time STFT pipeline.
The Vocoder module transforms the DFT spectral data according to the original algorithm, which is actually the instantaneous frequency estimation technique. See also further reading for more details.
encode function replaces the input DFT values by the
magnitude + j * frequency complex numbers, representing the phase error based frequency estimation in the imaginary part.
decode function does an inverse transformation back to the original DFT complex numbers, by replacing eventually modified frequency value by the reconstructed phase value.
The Pitcher module performs single or multi pitch shifting of the encoded DFT frame depending on the specified fractional factors.
The Resampler module provides the
linear interpolation routine, to actually perform pitch shifting, based on the Vocoder DFT transform.
The Cepster module estimates a spectral envelope of the DFT magnitude vector, representing the vocal tract resonances. This computation takes place in the cepstral domain by applying a low-pass filter. The cutoff value of the low-pass filter or lifter is the quefrency value to be specified in seconds or milliseconds.
The Normalizer module optionally performs a RMS normalization right after pitch shifting relative to the original signal to get about the same loudness level. This correction takes place in the frequency domain each DFT frame separately.
As the name of this module already implies, it performs the comprehensive STFT analysis and synthesis steps.
Since the Vocoder module transforms the original DFT complex values
real + j * imag into
magnitude + j * frequency representation, the single pitch shifting is a comparatively easy task. Both
frequency vectors are to be resampled according to the desired pitch shifting factor:
- The factor
1means no change.
- The factor
- The factor
Any fractional resampling factor such as
0.5 requires interpolation. In the simplest case, linear interpolation will be sufficient. Otherwise, bilinear interpolation can also be applied to smooth values between two consecutive STFT hops.
Due to frequency vector alteration, the resampled frequency values needs also be multiplied by the resampling factor.
In terms of multi pitch shifting, multiple differently resampled
frequency vectors are to be combined together. For example, the magnitude vectors can easily be averaged. But what about the frequency vectors?
The basic concept of this algorithm extension is to only keep the frequency value of the strongest magnitude value. So the strongest magnitude will mask the weakest one. Thus, all remaining masked components become inaudible.
In this way, the multi pitch shifting can be performed simultaneously in the same DFT frame. There is no need to build a separate STFT pipeline for different pitch variations to superimpose the synthesized signals in the time domain.
The pitch shifting also causes distortion of the original vocal formants, leading to a so called Mickey Mouse effect if scaled up. One possibility to reduce this artifact, is to exclude the formant feature from the pitch shifting procedure.
The vocal formants are represented by the spectral envelope, which is given by the smoothed DFT mangitude vector. In this implementation, the smoothing of the DFT mangitude vector takes place in the cepstral domain by low-pass liftering. The extracted envelope is then removed from the original DFT magnitude. The remaining residual or excitation signal goes through the pitch shifting algorithm. After that, the previously extracted envelope is combined with the processed residual.
Use CMake to manually build the C++ library, main and example programs like this:
cmake -S . -B build cmake --build build
Or alternatively just get the packaged library from:
To include this library in your C++ audio project, study the minimal C++ example in the examples folder:
#include <StftPitchShift/StftPitchShift.h> using namespace stftpitchshift; StftPitchShift pitchshifter(1024, 256, 44100); std::vector<float> x(44100); std::vector<float> y(x.size()); pitchshifter.shiftpitch(x, y, 1);
Optionally specify following CMake options for custom builds:
-DBUILD_SHARED_LIBS=ONto enable a shared library build,
-DVCPKG=ONto enable the vcpkg compatible library only build without executables,
-DDEB=ONto enable the deb package build for library and main executable.
The Python program
stftpitchshift can be installed via
pip install stftpitchshift.
Also feel free to explore the Python class
StftPitchShift in your personal audio project:
from stftpitchshift import StftPitchShift pitchshifter = StftPitchShift(1024, 256, 44100) x =  * 44100 y = pitchshifter.shiftpitch(x, 1)
Both programs C++ and Python provides a similar set of command line options:
-h --help print this help --version print version number -i --input input .wav file name -o --output output .wav file name -p --pitch fractional pitch shifting factors separated by comma (default 1.0) -q --quefrency optional formant lifter quefrency in milliseconds (default 0.0) -r --rms enable spectral rms normalization -w --window sfft window size (default 1024) -v --overlap stft window overlap (default 32) -c --chrono enable runtime measurements (only available in the C++ version) -d --debug plot spectrograms before and after processing (only available in the Python version)
To apply multiple pitch shifts at once, separate each factor by a comma, e.g.
-p 0.5,1,2. Alternatively specify pitch shifting factors as semitones denoted by the + or - prefix, e.g.
-p -12,0,+12. For precise pitch corrections append the number of cents after semitones, e.g.
To enable the formant preservation feature specify a suitable quefrency value in milliseconds. Depending on the source signal, begin with a small value like
-q 1. Generally, the quefrency value has to be smaller than the fundamental period, as reciprocal of the fundamental frequency, of the source signal.
At the moment the formant preservation doesn't seem to work well along with the multi pitch shifting and smaller pitch shifting factors. Further investigation is therefore necessary...
Instantaneous frequency estimation
- Fundamentals of Music Processing by Meinard Müller (section 8.2.1 in the second edition or online)
- Digital Audio Effects by Udo Zölzer (sections 7.3.1 and 7.3.5 in the second edition)
Cepstrum analysis and formant changing
- Digital Audio Effects by Udo Zölzer (sections 8.2.3 and 8.3.2 in the second edition)
- Discrete-Time Signal Processing by Oppenheim & Schafer (chapter 13 in the third edition)
stftPitchShift is licensed under the terms of the MIT license. For details please refer to the accompanying LICENSE file distributed with stftPitchShift.