Learn Creative Coding (#19) - Sound-Reactive Visuals

in StemSocial



Last episode we gave our particles mass. Springs that overshoot and settle, friction that makes things slide, flocking boids that self-organize into something that looks alive. Our sketches feel physical now -- things have weight, momentum, energy. But everything still responds to the mouse and the keyboard. What if visuals could respond to sound instead?

Visuals that react to music. It's one of those things that immediately makes people go "whoa, how does that work?" And the answer is surprisingly accessible. The browser has everything you need built in -- no libraries, no plugins, no external dependencies. Just the Web Audio API that ships with every modern browser.

I used to think audio programming was this mysterious, math-heavy domain reserved for DSP engineers and music tech people. Then I actually tried the Web Audio API and realized... it's just arrays of numbers. You get an array of frequency values, 60 times per second, each one between 0 and 255. Low indices are bass, high indices are treble. You already know how to work with arrays. You already know how to map values to visual properties. So let's make things dance :-)

The Web Audio API: just an array of numbers

The Web Audio API processes audio through a graph of connected nodes -- think of it like a signal chain. For visualization, we really only need one node: an AnalyserNode that gives us frequency data every frame.

let audioContext;
let analyser;
let dataArray;

async function setupAudio() {
  audioContext = new AudioContext();
  analyser = audioContext.createAnalyser();
  analyser.fftSize = 256;  // determines resolution (must be power of 2)

  // frequencyBinCount = fftSize / 2
  dataArray = new Uint8Array(analyser.frequencyBinCount); // 128 bins
}

fftSize controls the resolution of the frequency analysis. FFT stands for Fast Fourier Transform -- it's the algorithm that splits raw audio into its component frequencies, the same way a prism splits white light into a rainbow. 256 gives you 128 frequency bins, which is plenty for visualization. Higher values (1024, 2048) give more frequency detail, but the longer analysis window makes the data respond more sluggishly to fast changes. For creative coding, 256 or 512 hits the sweet spot.
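To make that tradeoff concrete, here's the arithmetic. Each bin covers sampleRate / fftSize Hz (binWidthHz is just a tiny illustrative helper, not part of the Web Audio API):

```javascript
// Each FFT bin spans sampleRate / fftSize Hz.
// (illustrative helper -- not part of the Web Audio API)
function binWidthHz(sampleRate, fftSize) {
  return sampleRate / fftSize;
}

// At the common 44100 Hz sample rate:
// binWidthHz(44100, 256)  -> ~172 Hz per bin (coarse but responsive)
// binWidthHz(44100, 2048) -> ~21.5 Hz per bin (fine detail, longer window)
```

At fftSize 256, the entire sub-bass region fits inside the first bin or two -- coarse, but for visuals that's usually all you need.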

Each frame, you grab the current frequency spectrum with one function call:

function getFrequencies() {
  analyser.getByteFrequencyData(dataArray);
  // dataArray now has 128 values, each 0-255
  // index 0 = lowest frequency (sub-bass)
  // index 127 = highest frequency (treble/air)
  return dataArray;
}

That's it. An array of 128 numbers. The value at each index tells you how loud that frequency band is right now. Low indices are bass, high indices are treble. If you've been following along since episode 10 where we worked with pixel arrays -- same concept, different data. Instead of RGBA values describing color, we have amplitude values describing sound.

Connecting audio sources

You need to connect an audio source to the analyser before it has anything to analyze. Two main options: an audio file or the microphone.

Loading an audio file:

let audio;

async function loadSong(url) {
  await setupAudio();

  audio = new Audio(url);
  let source = audioContext.createMediaElementSource(audio);
  source.connect(analyser);
  analyser.connect(audioContext.destination); // so we hear it too

  audio.play();
}

Or using the microphone for live input:

async function useMicrophone() {
  await setupAudio();

  let stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  let source = audioContext.createMediaStreamSource(stream);
  source.connect(analyser);
  // DON'T connect to destination -- feedback loop!
}

With microphone input, the visualizer reacts to whatever audio is in the room. Point it at a speaker playing music, clap your hands, play an instrument -- it responds to everything it hears. Live audio-reactive installations at galleries and festivals use exactly this approach. The bigger challenge with microphone input is ambient noise -- in a loud room, your bass detector triggers on conversations instead of music. A direct audio connection is always cleaner.

One important practical thing: modern browsers require a user gesture (click, tap, keypress) before audio can play. Your AudioContext starts in a "suspended" state until the user interacts. Always trigger audio setup from a click handler, and show a "click to start" overlay so people know what to do. Small UX detail, big difference in how polished your piece feels.
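A minimal version of that pattern might look like this sketch. ensureRunning is a hypothetical helper name; taking the context as a parameter just keeps the logic easy to follow:

```javascript
// Sketch of the "resume on gesture" pattern. The context starts out
// 'suspended'; calling resume() inside a click handler unlocks it.
async function ensureRunning(ctx) {
  if (ctx.state === 'suspended') {
    await ctx.resume();  // only succeeds after a user gesture
  }
}

// p5.js click handler -- audioContext comes from setupAudio() above
function mousePressed() {
  if (audioContext) ensureRunning(audioContext);
}
```

If you create the AudioContext inside the click handler itself (like the loadSong example above does), it usually starts in the "running" state right away and the resume() call is a no-op.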

First visualizer: frequency bars

The classic. A bar chart where each bar represents a frequency bin, and its height represents how loud that frequency is right now:

function setup() {
  createCanvas(800, 400);
  // audio setup triggered by user click
}

function draw() {
  background(15);

  if (!analyser) return;
  analyser.getByteFrequencyData(dataArray);

  let barWidth = width / dataArray.length;

  for (let i = 0; i < dataArray.length; i++) {
    let value = dataArray[i] / 255;  // normalize to 0-1
    let barHeight = value * height * 0.8;

    // color shifts from blue (bass) to red (treble)
    let hue = map(i, 0, dataArray.length, 200, 360);
    colorMode(HSB, 360, 100, 100);
    fill(hue % 360, 80, 50 + value * 50);
    noStroke();

    rect(i * barWidth, height - barHeight, barWidth - 1, barHeight);
  }
  colorMode(RGB, 255);
}

function mousePressed() {
  useMicrophone(); // or loadSong('track.mp3')
}

Bass on the left, treble on the right. Each bar's height maps to amplitude. The color gradient shifts from blue in the bass range to red in the treble, and brightness increases with volume. It already looks cool, and it's maybe 20 lines of actual logic.

But raw frequency bars are just the starting point. The creative part is deciding what to do with that data.

Splitting into frequency bands

Working with 128 individual frequency bins is overwhelming. For creative control, group them into named bands:

function getBands() {
  analyser.getByteFrequencyData(dataArray);

  let len = dataArray.length;

  // rough splits -- adjust to taste
  let bass = average(dataArray, 0, Math.floor(len * 0.1));
  let lowMid = average(dataArray, Math.floor(len * 0.1), Math.floor(len * 0.3));
  let mid = average(dataArray, Math.floor(len * 0.3), Math.floor(len * 0.5));
  let highMid = average(dataArray, Math.floor(len * 0.5), Math.floor(len * 0.7));
  let treble = average(dataArray, Math.floor(len * 0.7), len);

  return { bass, lowMid, mid, highMid, treble };
}

function average(arr, start, end) {
  let sum = 0;
  for (let i = start; i < end; i++) sum += arr[i];
  return sum / (end - start) / 255;  // normalized 0-1
}

Now instead of "bin 7 is at 184" you can say "bass is at 0.72." Named bands let you think creatively instead of numerically. "Make this circle pulse with the bass." "Shift background color with treble." "Spawn particles when mid-range spikes." Way easier to reason about.

The splits are approximate -- there's no universal standard for where bass ends and midrange begins. The 10%/30%/50%/70% ratios work well for most music but you can adjust them. Heavy electronic music might benefit from a wider bass band. Acoustic music might need more mid-range resolution. Experiment and tune by ear.
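If you want those boundaries tunable rather than hard-coded, you can derive the index ranges from a list of fractional split points. This is a sketch with the same 10/30/50/70% defaults as above; bandRanges is a made-up helper name:

```javascript
// Turn fractional split points into [start, end) bin index ranges,
// so band boundaries are easy to tune per genre.
function bandRanges(binCount, splits = [0.1, 0.3, 0.5, 0.7]) {
  let edges = [0, ...splits.map(s => Math.floor(binCount * s)), binCount];
  let ranges = [];
  for (let i = 0; i < edges.length - 1; i++) {
    ranges.push([edges[i], edges[i + 1]]);
  }
  return ranges;
}

// bandRanges(128) -> [[0,12],[12,38],[38,64],[64,89],[89,128]]
```

Feed each range into the average() helper above and you get the same five bands, but now widening the bass band for an EDM set is a one-number change.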

Smoothing: making it feel right

Raw audio data is jittery. Play a sustained bass note and the value still flickers between frames because of the FFT's sampling window. For smooth, polished visuals, lerp between frames:

let smoothBass = 0;
let smoothMid = 0;
let smoothTreble = 0;

function draw() {
  let bands = getBands();

  // smooth with lerp -- different speeds per band
  smoothBass = lerp(smoothBass, bands.bass, 0.15);
  smoothMid = lerp(smoothMid, bands.mid, 0.2);
  smoothTreble = lerp(smoothTreble, bands.treble, 0.25);

  // use smoothed values for visuals
}

See what we're doing here? It's the exact lerp-toward-target pattern from episode 16. Each frame, the smoothed value moves a fraction of the remaining distance toward the raw value. The lerp speed controls how responsive vs smooth each band feels.

Bass gets the most smoothing (0.15) because bass hits are big and slow -- a kick drum sustains for a moment, so the visual response should too. Treble gets less smoothing (0.25) because hi-hats and cymbals are sharp transients that need to feel snappy. Mid-range sits in between. This per-band smoothing is the difference between a visualizer that looks twitchy and one that looks polished. Same data, different feel -- just like how different easing speeds changed the character of motion in episode 16.

Beat detection

Knowing the bass level is useful. Knowing when a beat drops is more useful. Simple beat detection tracks when bass exceeds a dynamic threshold:

let beatThreshold = 0.6;
let beatDecay = 0.98;
let isBeat = false;

function detectBeat(bass) {
  // adaptive threshold -- adjusts to the music's loudness
  if (bass > beatThreshold) {
    isBeat = true;
    beatThreshold = bass * 1.1;  // raise threshold after a beat
  } else {
    isBeat = false;
    beatThreshold *= beatDecay;   // gradually lower threshold
    beatThreshold = Math.max(beatThreshold, 0.3); // floor
  }

  return isBeat;
}

The adaptive threshold is key. A fixed threshold fires constantly on loud songs and never on quiet ones. By raising the threshold after each beat detection and decaying it over time, it adapts to whatever's playing. Loud EDM track? The threshold ratchets up. Quiet jazz? It settles low. The decay rate (0.98) controls how quickly it readjusts -- lower values make it more sensitive to changes in loudness, higher values make it more stable.
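You can even predict the behavior with a little math. After n frames of decay the threshold is t0 * decay^n, so solving for n tells you how long it takes to fall back to a given level (framesToDecay is just an illustrative helper):

```javascript
// t0 * decay^n = t1  =>  n = log(t1 / t0) / log(decay)
function framesToDecay(t0, t1, decay) {
  return Math.ceil(Math.log(t1 / t0) / Math.log(decay));
}

// After a beat at bass 0.6 the threshold jumps to 0.66; at decay 0.98
// it takes about 40 frames (~0.66s at 60fps) to fall back to the 0.3 floor.
```

So with these numbers the detector goes "deaf" for roughly two thirds of a second after a loud hit -- which conveniently also debounces double-triggers on a single kick drum.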

Use it to trigger visual events:

function draw() {
  let bands = getBands();

  if (detectBeat(bands.bass)) {
    // flash the background
    background(40, 20, 60);

    // spawn a burst of particles
    for (let i = 0; i < 10; i++) {
      particles.push(createParticle(width/2, height/2));
    }
  } else {
    background(10, 10, 15, 30);
  }
}

Remember the particle spawning from episode 11? Same pattern. But instead of spawning on mouse click, we spawn on bass beats. The music becomes the interaction -- every kick drum creates a visual event. Combine this with the physics from last episode and those spawned particles can fly outward with velocity, decelerate with friction, and settle with springs. The beat is the impulse, the physics is the response.

Circular visualizer

Bars are classic but let's build something more interesting. A circular visualizer with a bass-pulsing core, frequency spikes radiating outward, and orbiting dots driven by different bands:

function draw() {
  background(10, 10, 15, 40);  // trail persistence

  if (!analyser) return;

  let bands = getBands();
  smoothBass = lerp(smoothBass, bands.bass, 0.15);
  smoothMid = lerp(smoothMid, bands.mid, 0.2);
  smoothTreble = lerp(smoothTreble, bands.treble, 0.25);

  translate(width / 2, height / 2);

  // pulsing center circle driven by bass
  let baseRadius = 80;
  let pulseRadius = baseRadius + smoothBass * 60;

  fill(20 + smoothBass * 40, 10, 30 + smoothBass * 20);
  noStroke();
  ellipse(0, 0, pulseRadius * 2);

  // frequency ring -- raw spectrum as radial spikes
  analyser.getByteFrequencyData(dataArray);

  noFill();
  strokeWeight(2);

  for (let i = 0; i < dataArray.length; i++) {
    let value = dataArray[i] / 255;
    let angle = (i / dataArray.length) * TWO_PI - HALF_PI;

    let innerR = pulseRadius + 10;
    let outerR = innerR + value * 100;

    let x1 = cos(angle) * innerR;
    let y1 = sin(angle) * innerR;
    let x2 = cos(angle) * outerR;
    let y2 = sin(angle) * outerR;

    stroke(100 + value * 155, 150 + value * 105, 255, 150 + value * 105);
    line(x1, y1, x2, y2);
  }

  // orbiting dots driven by mid and treble
  let orbitR = pulseRadius + 130;
  for (let i = 0; i < 8; i++) {
    let angle = (i / 8) * TWO_PI + frameCount * 0.01;
    let r = orbitR + smoothMid * 30;
    let x = cos(angle) * r;
    let y = sin(angle) * r;

    fill(255, 200 + smoothTreble * 55, 100);
    noStroke();
    ellipse(x, y, 6 + bands.highMid * 15);
  }
}

Three layers, each driven by different audio properties. The center circle pulses with bass -- big, slow, impactful. The frequency ring shows the full spectrum as radial spikes using the same cos(angle) / sin(angle) polar coordinate conversion from episode 13. The orbiting dots respond to mid and treble frequencies. Everything moves together, driven by the same audio source, but each layer listens to a different part of the spectrum.

That semi-transparent background (background(10, 10, 15, 40)) creates the motion trail effect we used in the galaxy project (episode 15). Frequency spikes leave ghostly afterimages that fade over a few frames, giving the whole thing a sense of visual momentum.

Waveform data

Besides frequency data, you can get the raw waveform -- the actual audio signal shape as it oscillates:

let waveform = new Uint8Array(analyser.fftSize);

function drawWaveform() {
  analyser.getByteTimeDomainData(waveform);

  noFill();
  stroke(100, 255, 180);
  strokeWeight(2);

  beginShape();
  for (let i = 0; i < waveform.length; i++) {
    let x = (i / waveform.length) * width;
    let y = (waveform[i] / 255) * height;
    vertex(x, y);
  }
  endShape();
}

The waveform oscillates around the center value of 128 (silence). Loud sounds push it toward 0 or 255. It looks like an oscilloscope display and it's mesmerizing to watch with actual music playing. Where frequency data tells you what frequencies are present, the waveform shows you the actual shape of the sound wave. Different instruments with the same pitch have different waveforms -- that's what gives them their unique timbre.

You could draw this waveform as a circle instead of a line (using polar coordinates from episode 13), or displace a grid of particles by the waveform values, or use it as a Bezier control point path like we explored in episode 14. The waveform is just data -- what you map it to is entirely your creative choice.

Mapping strategies: the creative part

Setting up Web Audio is a one-time thing. The real creative work is deciding which audio properties drive which visual properties. Some mappings feel natural because they match how our brains already associate sound and sight:

  • Bass level -- circle radius, background brightness, shake intensity
  • Bass beat -- flash, particle burst, color shift, screen shake
  • Mid-range -- rotation speed, line thickness, shape complexity
  • Treble -- small detail jitter, sparkle, grain amount, brightness
  • Overall volume -- global opacity, zoom level, color saturation
  • Waveform -- direct shape drawing, mesh displacement

The principle: big, slow audio features drive big, slow visual changes. Small, fast audio features drive small, fast visual details. Bass is heavy and impactful -- it should move large things. Treble is sharp and bright -- it should add texture and sparkle. Mid-range is where melody lives -- it should drive aesthetic properties like color and form.

The best audio visualizers layer multiple mappings. Bass drives overall scale. Mid-range drives color. Treble drives particle emission rate. Amplitude drives opacity. Each audio band influences a different visual dimension, creating a rich, multi-layered response that feels deeply connected to the music.
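One way to keep layered mappings organized is a single function that turns a bands object into visual parameters. Everything here is an arbitrary sketch -- audioToVisuals is a made-up name and the specific ranges are taste, not gospel:

```javascript
// Hypothetical mapping layer: audio bands in, visual parameters out.
// All the constants are tuning choices, not fixed rules.
function audioToVisuals(bands) {
  return {
    scale: 1 + bands.bass * 0.5,              // bass -> overall size
    hue: (200 + bands.mid * 120) % 360,       // mid -> color
    emitCount: Math.floor(bands.treble * 5),  // treble -> particles per frame
  };
}

// quiet frame: audioToVisuals({ bass: 0, mid: 0, treble: 0 })
//   -> { scale: 1, hue: 200, emitCount: 0 }
```

Pulling the mapping into one place makes it easy to swap mappings live -- or even expose them as sliders while a track plays.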

A note about latency

There's always a delay between the audio event and the visual response. The FFT needs a buffer of samples before it can compute frequencies -- at fftSize=256 and a sample rate of 44100Hz, that's about 6ms. Add rendering time and display latency and you're at maybe 20-40ms total. For most music this is imperceptible.

But for sharp transients like drum hits, the visual might lag slightly behind the sound. The fix: use waveform amplitude detection (which has lower latency than FFT) for triggering instant visual events like flashes and particle bursts, and use FFT frequency data for continuous properties like color and size. This hybrid approach gives you both responsiveness and richness. Professional VJ software uses exactly this strategy.
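That amplitude trigger can come straight from the time-domain data. RMS (root mean square) is the usual way to turn a buffer of samples into one loudness number; rmsAmplitude is an illustrative helper, assuming the byte samples center on 128 like getByteTimeDomainData output does:

```javascript
// RMS loudness from getByteTimeDomainData output.
// Byte samples sit around 128 at silence, so convert to -1..1 first.
function rmsAmplitude(timeData) {
  let sum = 0;
  for (let i = 0; i < timeData.length; i++) {
    let v = (timeData[i] - 128) / 128;  // -1..1
    sum += v * v;
  }
  return Math.sqrt(sum / timeData.length);
}

// each frame:
//   analyser.getByteTimeDomainData(waveform);
//   if (rmsAmplitude(waveform) > 0.4) { /* flash, burst, ... */ }
```

The 0.4 trigger level is a placeholder -- tune it to your source, or run it through the same adaptive-threshold trick as the beat detector.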

CORS: the one annoying gotcha

If you load an audio file from a different domain than your sketch, the browser blocks audio analysis due to CORS (Cross-Origin Resource Sharing) policy. The AnalyserNode gets data but it's all zeros. Solutions:

  • Host audio on the same server as your sketch
  • Use local files with a local dev server (python3 -m http.server works great)
  • Set audio.crossOrigin = 'anonymous' before the element loads -- this works if the remote server sends the right CORS headers
  • Use the microphone instead -- no CORS issues at all
  • Use a CORS proxy for testing (not for production)

p5.js's loadSound() handles some CORS cases automatically, but with vanilla Canvas you'll hit this wall. The microphone approach sidesteps it entirely, which is why a lot of live installations just use a mic pointed at the speakers.

Putting it all together: a sound-reactive particle field

Alright, let's combine everything. Particles that respond to audio with physics -- bass makes them expand outward, beats spawn new bursts, treble adds jitter, and everything smooths with lerp:

let particles = [];
let smoothBass = 0, smoothMid = 0, smoothTreble = 0;

function setup() {
  createCanvas(800, 600);
  colorMode(HSB, 360, 100, 100, 100);

  for (let i = 0; i < 200; i++) {
    particles.push({
      x: width/2 + random(-50, 50),
      y: height/2 + random(-50, 50),
      vx: 0, vy: 0,
      baseHue: random(180, 280),
      size: random(2, 5)
    });
  }
}

function draw() {
  background(0, 0, 5, 15);

  if (!analyser) {
    fill(0, 0, 80);
    textAlign(CENTER);
    textSize(18);
    text('click to start audio', width/2, height/2);
    return;
  }

  let bands = getBands();
  smoothBass = lerp(smoothBass, bands.bass, 0.15);
  smoothMid = lerp(smoothMid, bands.mid, 0.2);
  smoothTreble = lerp(smoothTreble, bands.treble, 0.25);

  let beat = detectBeat(bands.bass);

  // spawn burst on beat
  if (beat) {
    for (let i = 0; i < 8; i++) {
      let angle = random(TWO_PI);
      let speed = random(3, 8);
      particles.push({
        x: width/2, y: height/2,
        vx: cos(angle) * speed,
        vy: sin(angle) * speed,
        baseHue: random(180, 280),
        size: random(3, 7)
      });
    }
  }

  for (let i = particles.length - 1; i >= 0; i--) {
    let p = particles[i];

    // bass pushes particles outward from center
    let dx = p.x - width/2;
    let dy = p.y - height/2;
    let dist = Math.sqrt(dx * dx + dy * dy);
    if (dist > 0) {
      p.vx += (dx / dist) * smoothBass * 0.3;
      p.vy += (dy / dist) * smoothBass * 0.3;
    }

    // treble adds jitter
    p.vx += (random(-1, 1)) * smoothTreble * 0.5;
    p.vy += (random(-1, 1)) * smoothTreble * 0.5;

    // gentle pull back toward center (spring)
    p.vx += (width/2 - p.x) * 0.001;
    p.vy += (height/2 - p.y) * 0.001;

    // friction
    p.vx *= 0.97;
    p.vy *= 0.97;

    p.x += p.vx;
    p.y += p.vy;

    // remove particles that drifted far off
    if (p.x < -100 || p.x > width + 100 ||
        p.y < -100 || p.y > height + 100) {
      particles.splice(i, 1);
      continue;
    }

    let hue = (p.baseHue + smoothMid * 60) % 360;
    let brightness = 60 + smoothBass * 40;
    fill(hue, 70, brightness, 80);
    noStroke();
    ellipse(p.x, p.y, p.size + smoothBass * 3);
  }

  // keep particle count reasonable
  while (particles.length > 500) particles.shift();
}

function mousePressed() {
  useMicrophone();
}

There's a lot going on here but every piece is something we've already built in a previous episode. The particles use the physics loop from episode 18 -- velocity, friction, position update. The gentle center pull is a weak spring. The beat detection spawns bursts using polar coordinates from episode 13 (cos(angle) * speed, sin(angle) * speed). The smoothed audio values use the lerp-toward-target pattern from episode 16. The semi-transparent background creates trails like episode 15's galaxy.

That's the beautiful thing about creative coding at this point in the series -- new topics don't replace what you know, they combine with it. Audio data is just another input source. The rendering, the physics, the math -- it's all the same tools we've been sharpening since episode 10.

Where this goes

Audio-reactive coding is one of those areas where the floor is low (bars bouncing to music, done in 20 lines) but the ceiling is incredibly high. VJ artists perform at festivals with custom visuals driven by live audio input. Generative music videos are rendered in real-time. Interactive installations in galleries respond to ambient sound or visitor-generated noise. The visual side can get as sophisticated as your imagination allows -- you could drive shader parameters with audio data, or feed frequency values into noise functions, or let beats trigger state machine transitions like we built in episode 17.

And once you've built something that looks good reacting to music in real-time, you'll probably want to capture it. Turn it into a video, export frames as a PNG sequence, record a GIF to share. There are techniques for that too, and we'll get there soon :-)

What it comes down to...

  • The Web Audio API's AnalyserNode gives you frequency data as an array of 0-255 values, 60 times per second
  • Low indices = bass, high indices = treble -- the FFT splits audio into its component frequencies
  • Group frequencies into named bands (bass, mid, treble) for easier creative mapping
  • Smooth raw values with lerp -- different smoothing speeds per band for different response feels
  • Beat detection uses an adaptive threshold that rises on hits and decays over time
  • Waveform data (getByteTimeDomainData) gives you the raw audio signal shape -- different from frequency data
  • Map big/slow audio features to big/slow visual changes, small/fast to small/fast details
  • Use amplitude for instant triggers (flashes, bursts) and FFT for continuous properties (color, size)
  • Always start audio from a user gesture -- browsers require it
  • CORS blocks cross-origin audio analysis -- host files locally or use the microphone

Phase 3 is almost wrapped up. We've covered smooth motion (lerp and easing), structured behavior (state machines), physical simulation (springs, friction, flocking), and now audio-reactive input. Our sketches can move naturally, respond to sound, and feel alive. But so far everything we've made lives only in the browser window -- what about sharing it? Capturing your running sketch as a GIF, a PNG sequence, or a proper video file is its own skill, and it's coming up next.

Cheers! Thanks for reading.

X

@femdev