Teaching a drum generator Afro-Cuban patterns
For the last year and a half, I’ve been learning about various Afro-Cuban musical traditions. A friend and I have studied their musical characteristics in depth, so when it came time to brainstorm for my Computer Science thesis, I was full of ideas. What if I were to generate an Afro-Cuban percussion ensemble track using deep learning? What if the generative engine could take a melody as input and output a drum groove in real time?
Yeah, I was far more ambitious than a kid with no deep learning experience should be. Regardless, I went hunting for a paper I could use as a baseline for real-time drum generation, and found one by Behzad Haki et al. that did just that. But to train an Afro-Cuban-influenced model, I needed more Afro-Cuban drum MIDI data than I could find on the internet. So I asked myself: “If Mathew and I can integrate some of the fundamental patterns of the tradition into our rhythmic vocabulary by reading a book, could a deep learning model do the same?”
This question led me to devise a data augmentation algorithm that infuses Afro-Cuban sensibilities into a MIDI drum dataset. Essentially, the algorithm transforms each individual 2-bar example in the original dataset by randomly swapping out one of its voices with the same voice from a randomly chosen augmentation seed example. For instance: suppose the example we want to transform is a 2-bar rock back-beat drum pattern, and the randomly chosen voice to replace is the hi-hat; we then replace the rock hi-hat pattern with the hi-hat pattern from a randomly chosen Afro-Cuban seed example. If that replacement pattern happens to be the clave, the transformed MIDI file will sound like a rock back-beat, but with the hi-hat playing the clave. To craft the seed examples, I transcribed all the unique patterns in Frank Malabe and Bob Weiner’s Afro-Cuban Rhythms for Drumset.
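In simplified Python, the core of the transformation looks something like this. (This is a sketch, not the thesis code: the step-grid representation, the function name, and the fixed sixteenth-note resolution are all illustrative, and real drum examples also carry velocities and microtiming.)

```python
import random

# A 2-bar pattern: each drum voice maps to the set of
# sixteenth-note steps (0-31) on which it plays.
Pattern = dict[str, set[int]]

def augment(example: Pattern, seeds: list[Pattern]) -> Pattern:
    """Swap one randomly chosen voice of `example` for the same
    voice from a randomly chosen seed pattern."""
    seed = random.choice(seeds)
    voice = random.choice(list(example.keys()))
    transformed = {v: set(steps) for v, steps in example.items()}
    transformed[voice] = set(seed.get(voice, set()))  # absent voice -> silence
    return transformed

# A 2-bar rock back-beat.
rock: Pattern = {
    "kick":   {0, 8, 16, 24},
    "snare":  {4, 12, 20, 28},
    "hi-hat": set(range(0, 32, 2)),
}

# A seed whose hi-hat voice carries the 3-2 son clave.
clave_seed: Pattern = {"hi-hat": {0, 6, 12, 20, 24}}

print(augment(rock, [clave_seed]))
```

Because both the voice and the seed example are drawn at random, repeated passes over the original dataset yield different blends of the two rhythmic vocabularies.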
With the Afro-Cuban teaching plan in order, all that was left to do was to code the model detailed in the drum generation paper. Doing this taught me a lot: I learned to use PyTorch to train a deep learning model, I coded up many useful MIDI editing tools, and I expanded my software development skills by working on my largest project yet.
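To give a taste of those tools, here’s a sketch of one such utility (illustrative only, written against the pretty_midi library rather than pulled from the project): it flattens every onset of a drum groove onto a single voice, which is essentially what the “tapped rhythm” generator input shown below is.

```python
import pretty_midi

def to_tapped_rhythm(in_path: str, out_path: str, tap_pitch: int = 42) -> None:
    """Flatten every drum onset in a MIDI file onto a single voice."""
    pm = pretty_midi.PrettyMIDI(in_path)
    tap = pretty_midi.Instrument(program=0, is_drum=True, name="taps")
    seen = set()  # avoid double taps when voices hit simultaneously
    for inst in pm.instruments:
        if not inst.is_drum:
            continue
        for note in inst.notes:
            onset = round(note.start, 3)
            if onset in seen:
                continue
            seen.add(onset)
            tap.notes.append(pretty_midi.Note(
                velocity=note.velocity,
                pitch=tap_pitch,        # 42 = closed hi-hat in the GM drum map
                start=note.start,
                end=note.start + 0.1,   # short, uniform tap duration
            ))
    out = pretty_midi.PrettyMIDI()
    out.instruments.append(tap)
    out.write(out_path)
```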
I’ve trained some models with the augmented dataset. I’m still evaluating them, but I can already showcase some of their generated MIDI. This clip is the “tapped rhythm” that’s used as input for the generator:
And this clip is a drum groove generated by a model trained on the augmented data, prompted by the tapped rhythm:
To me, it sounds like the model has assimilated a sense of syncopation intrinsic to Afro-Cuban music. But what do you think?