Pitch Shifting via Sample-Rate Conversion

Pitch shifting via sample-rate conversion changes the pitch of a digital audio sample by resampling it in the digital domain. Samplers commonly use this method, including software samplers such as the EXS24 in Logic Pro. Resampling changes the playback speed, so pitch and tempo change together, while the output sampling rate remains constant.
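The idea described above can be sketched in a few lines of Python. This is only a minimal illustration using linear interpolation and no anti-aliasing filter; it is not how the EXS24 or Csound implement it, and the function names (semitones_to_ratio, resample_linear) are made up for this example.

```python
import math

def semitones_to_ratio(semitones):
    """Playback-speed ratio for an equal-tempered shift (assumed tuning)."""
    return 2.0 ** (semitones / 12.0)

def resample_linear(samples, ratio):
    """Read `samples` at `ratio` times the original speed.

    A ratio above 1 raises the pitch and shortens the sound; a ratio below 1
    lowers the pitch and lengthens it. Linear interpolation only, with no
    anti-aliasing filter.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    n_out = int(last / ratio) + 1
    out = []
    for k in range(n_out):
        pos = k * ratio                    # fractional read position
        i = min(int(pos), last)            # sample just before the position
        frac = pos - i
        nxt = samples[min(i + 1, last)]    # clamp at the end of the input
        out.append((1.0 - frac) * samples[i] + frac * nxt)
    return out

# A 0.1-second, 1 kHz sine at 96 kHz, shifted up one octave.
sr = 96000
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(sr // 10)]
up_octave = resample_linear(tone, semitones_to_ratio(12))
```

Played back at the same 96 kHz rate, up_octave holds half as many samples as tone, so it sounds an octave higher and lasts half as long.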

Because of the resampling process, aliasing can occur when shifting up or down by non-octave intervals. A high-quality interpolation scheme helps reduce it. For example, the diskin2 opcode in Csound supports sinc interpolation of 8 points or more with anti-aliasing, at the cost of increased processing time.
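To get a rough feel for why the interpolation scheme matters, the sketch below compares reading a sine wave with no interpolation (truncating the read position) against linear interpolation, and measures how far each strays from the ideal value. It is an assumption-laden toy: it measures RMS error on a single tone rather than producing a spectrogram, it does not implement sinc interpolation or anti-aliasing, and the helper names are hypothetical.

```python
import math

def read_truncated(samples, pos):
    """No interpolation: use the sample at the truncated position."""
    return samples[int(pos)]

def read_linear(samples, pos):
    """Linear interpolation between the two neighbouring samples."""
    i = int(pos)
    frac = pos - i
    return (1.0 - frac) * samples[i] + frac * samples[i + 1]

def rms_error(reader, freq_hz, sample_rate_hz, ratio, n_out):
    """RMS difference between `reader` output and the ideal resampled sine."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    n_in = int(n_out * ratio) + 2          # enough input for every read
    source = [math.sin(w * n) for n in range(n_in)]
    total = 0.0
    for k in range(n_out):
        pos = k * ratio
        ideal = math.sin(w * pos)          # exact value at the read position
        total += (reader(source, pos) - ideal) ** 2
    return math.sqrt(total / n_out)

ratio = 2.0 ** (1 / 12.0)                  # up one semitone
err_none = rms_error(read_truncated, 1000, 96000, ratio, 5000)
err_linear = rms_error(read_linear, 1000, 96000, ratio, 5000)
```

For this test tone, err_linear comes out smaller than err_none, which is in line with the cleaner spectrograms for the interpolated versions.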

A 10-second linear sine-wave sweep from 200Hz to 12kHz, sampled at 96kHz. A) Original (200Hz-12kHz), B) transposed up a semi-tone with no interpolation, C) linear interpolation, D) cubic interpolation, E) 8-point sinc interpolation with anti-aliasing, F) 32-point sinc interpolation with anti-aliasing, G) 64-point sinc interpolation with anti-aliasing.
Same as the previous but down a semi-tone.

When shifting up by octave intervals, aliasing is less problematic. The main thing to watch for is that the shifted signal does not go above the Nyquist frequency. If it does, aliasing will occur, and an interpolation scheme that provides anti-aliasing may only partially remove the unwanted signal that folds back into the spectrum.

A 10-second linear sine-wave sweep from 200Hz to 12kHz, sampled at 96kHz. A) Original (200Hz-12kHz), B) transposed up an octave (400Hz-24kHz) with no interpolation, C) transposed up 3 octaves with no interpolation or anti-aliasing (note that it would be 1600Hz-96kHz, but once it hits the Nyquist frequency at 48kHz, it folds back into the frequency spectrum due to aliasing), D) the same as the previous, but with 64-point sinc interpolation with anti-aliasing (note that the unwanted signal is only partially removed).
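The fold-back arithmetic in the three-octave case can be checked with a small helper. This is just a sketch of the standard folding rule for a real-valued signal sampled with no anti-aliasing filter; the function name aliased_frequency is made up for this example.

```python
def aliased_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a tone shows up after sampling with no filtering.

    Frequencies repeat every `sample_rate_hz` and mirror around the Nyquist
    frequency (half the sampling rate).
    """
    folded = freq_hz % sample_rate_hz
    nyquist = sample_rate_hz / 2.0
    if folded > nyquist:
        folded = sample_rate_hz - folded
    return folded

sr = 96000
shift = 2 ** 3                                   # three octaves up
start = aliased_frequency(200 * shift, sr)       # 1600 Hz, below Nyquist
middle = aliased_frequency(7000 * shift, sr)     # 56 kHz folds to 40 kHz
end = aliased_frequency(12000 * shift, sr)       # 96 kHz folds to 0 Hz
```

So the sweep rises from 1600Hz to the 48kHz Nyquist frequency and then folds back down toward 0Hz as the shifted frequency approaches 96kHz.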

Shifting down by octave intervals will also cause aliasing. As with non-octave shifts, a high-quality interpolation scheme helps reduce it.

A 2-second linear sine-wave sweep from 200Hz to 12kHz, sampled at 96kHz. A) Original (200Hz-12kHz), B) transposed down an octave with no interpolation, C) linear interpolation, D) cubic interpolation, E) 8-point sinc interpolation with anti-aliasing, F) 32-point sinc interpolation with anti-aliasing, G) 64-point sinc interpolation with anti-aliasing. Then repeat B) through G) for two octaves down and again for three octaves down.

Note: This form of pitch shifting can also be accomplished in Csound by loading a sound file into a function table and reading it with a table-lookup opcode. In terms of quality, loscilx offers all the interpolation schemes available in diskin2, and the results appear to be exactly the same: all the previously mentioned pitch shifts were also rendered with loscilx, and the resulting spectrograms are exactly like those pictured above.

Note: You can change the interpolation quality in the EXS24 software sampler in Logic Pro. On the EXS24 interface, click the button marked “edit.” In the dialog box that pops up, go to Edit–>Preferences…–>Sample Rate Conversion: Best or Normal. According to the documentation, this setting controls the sample-rate conversion algorithm, but my tests have not shown any difference between the two settings.

10-second linear sine-wave sweep from 200Hz to 12kHz, sampled at 96kHz. A) Original, B) up a semi-tone, C) down a semi-tone. Logic Pro’s sample rate conversion was in “Normal” mode.
Same as the previous. Logic Pro’s sample rate conversion was in “Best” mode.

Note: The speed effect in conjunction with the rate effect in SoX allows pitch shifting via sample-rate conversion. The level of quality is programmable. See this post.

2 Responses

  1. Garth Ferris

    This is very informative, and quite on-point to me as a producer. What led me to your article was searching for how I can adjust only the sample rate of an existing .wav file without attempting to retain the pitch (a 192KHz recording of bird sounds that I seek to slow by 4, or 192KHz/4=48KHz, with frequency/4 and duration*4 expected). I was also thinking that it may be possible by editing the header of the sample .wav. Thanks if you see this!

    • John

      Glad it was useful!
