Voice over: WAV mono 48 kHz/24-bit and clear file names

Voice over file format covers three things: which audio file you receive (e.g., WAV or MP3), at what technical quality (sample rate and bit depth), and how files and versions are labeled so they can be dropped into your production without guesswork.

The essentials in brief

Ask for WAV for anything that will go into the mix/edit. MP3 is only suitable for quick listening.
Always match the project's sample rate/bit depth (usually 48 kHz/24-bit for video). Otherwise you’ll incur unnecessary conversions and sometimes glitches.
Decide file names and versions before recording: “final”, “alt”, “clean”, “timed”, “pickup” should mean the same thing for everyone.

File format: what you actually should request

If you are going to cut, sync or mix, in practice always request WAV. It is uncompressed, so it goes through post-processing without lossy-compression artifacts baked in from the start.

MP3 may be okay as a reference (to approve takes or text), but I would not build a delivery or a mix chain on MP3 if avoidable. It quickly becomes a debugging issue: “is it the room, the voice or the compression?”

Here is how I usually word the order when I want to avoid unnecessary back-and-forth:

Master: WAV (mono), 48 kHz, 24-bit
Reference: MP3, 320 kbps (optional)

Mono is normally right for voice over. Stereo yields larger files and more opportunities for something to go wrong in the chain. Exceptions exist (e.g., if you explicitly want a stereo file with room/effect), but the default is mono.
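
The spec above can be checked automatically on delivery. What follows is a minimal sketch using Python's standard `wave` module; the names `EXPECTED`, `read_wav_spec` and `spec_mismatches` are made up for this illustration, and the expected values are simply the master spec from the order text. Note that `wave` only reads PCM WAV files, and some 24-bit files use an extended header that older Python versions may reject.

```python
import wave

# Assumed house spec, copied from the order text; adjust per project.
EXPECTED = {"channels": 1, "sample_rate": 48000, "bit_depth": 24}


def read_wav_spec(path):
    """Return channel count, sample rate and bit depth from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
        }


def spec_mismatches(path, expected=EXPECTED):
    """List readable differences between a delivered file and the spec."""
    actual = read_wav_spec(path)
    return [
        f"{key}: expected {expected[key]}, got {actual[key]}"
        for key in expected
        if actual[key] != expected[key]
    ]
```

An empty list means the header matches the spec. It says nothing about how the recording sounds, so it complements listening rather than replacing it.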

Levels: so you avoid tearing apart your mix

Levels are where many productions run into unnecessary problems. Either the file comes in too loud (clipping), or it is so quiet that someone boosts it by 20 dB and brings up the noise and room sound along with it.

What you want is a file recorded with good headroom and not “hard-limited” to sound loud. In practice:

No clipping. Not “just a little”.
Even level, but not over-compressed.
Headroom so you can mix against music/sound without everything already hitting the ceiling.

If you must specify a number: request peaks around -6 dBFS and a reasonable loudness for speech (not maxed). The important thing is not the exact LUFS in raw VO, but that the material is clean and usable.
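
To make the -6 dBFS figure concrete: peak level in dBFS is 20 times the base-10 logarithm of the largest sample magnitude divided by full scale. The sketch below assumes signed 24-bit PCM; `peak_dbfs` and `decode_24bit_mono` are hypothetical helper names for this illustration, not part of any audio library.

```python
import math

# Assumption: signed 24-bit PCM, so full scale is 2**23.
FULL_SCALE_24BIT = 2 ** 23


def peak_dbfs(samples, full_scale=FULL_SCALE_24BIT):
    """Peak level in dBFS for signed integer samples; -inf for silence."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak / full_scale)


def decode_24bit_mono(raw):
    """Decode little-endian signed 24-bit PCM bytes into Python ints."""
    usable = len(raw) - len(raw) % 3  # ignore a trailing partial sample
    return [
        int.from_bytes(raw[i:i + 3], "little", signed=True)
        for i in range(0, usable, 3)
    ]
```

A peak at half of full scale comes out at roughly -6 dBFS, and a result of 0 dBFS means at least one sample sits at the ceiling, which is worth a closer listen for clipping.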

A recurring problem is that someone delivers the voice at “finished radio” level, and it then has to go into a film where you need dynamic range and room for music. Then you have to try to undo what has already been done. Therefore: prefer a clean delivery (and if you need a “broadcast” variant, request it as an extra version).

Versions: what “final” means when things get stressful

Versions are not technical. They are logistics. What usually happens in production is you have three script versions, two different pronunciations of a product name and a client who approves something in an email that no one else sees.

To avoid that, you need a simple standard for what gets delivered:

Clean: edited VO without music/SFX (the one you want 9 out of 10 times)
Timed: the same VO but time-aligned to picture/length if you need exact timing
Alt: alternative reading (different emphasis/tempo) as a separate file
Pickup: supplementary recordings, clearly labeled per line/timecode

When someone says “can you just send the final?” it’s often unclear if they mean “latest recording” or “the version that is approved.” I recommend reserving the word final for “approved for publication,” and using v01, v02… for working versions.
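
The v01, v02… convention is easy to keep consistent with a small helper. This is a sketch only, assuming file names end in `_vNN.wav` as in the naming examples in this article; `next_working_version` is a hypothetical name. It never produces “final”, which stays a manual decision made on approval.

```python
import re

# Assumption: working versions end in "_v<digits>.wav", e.g. "..._Clean_v02.wav".
VERSION_PATTERN = re.compile(r"_v(\d+)\.wav$", re.IGNORECASE)


def next_working_version(existing_names):
    """Return the next working-version label (v01, v02, ...) for a set of files."""
    numbers = []
    for name in existing_names:
        match = VERSION_PATTERN.search(name)
        if match:
            numbers.append(int(match.group(1)))
    return f"v{max(numbers, default=0) + 1:02d}"
```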

File naming and structure: dull, but it saves hours

This is one of the most concrete time-saving measures I see in real projects: clear file names and a folder structure that matches your editing/mixing.

A simple standard that works in most teams:

Project_Client_Language_ScriptVersion_VOName_Type_Version.wav
Example: ACME_Film_SE_Mv3_AnnaKarlsson_Clean_v02.wav
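
A naming standard only helps if files that break it are caught early. The following sketch splits a name into the seven underscore-separated fields of the pattern above and rejects anything that does not fit. The field names, the allowed type list (taken from the delivery types in this article) and the function name `parse_vo_filename` are assumptions for illustration.

```python
import re

# Assumed field order, mirroring the naming pattern above.
FIELDS = ("project", "client", "language", "script_version",
          "voice", "type", "version")
# Assumed allowed types, taken from the delivery types in this article.
ALLOWED_TYPES = {"Clean", "Timed", "Alt", "Pickup"}


def parse_vo_filename(filename):
    """Split a VO file name into named fields, or raise ValueError."""
    stem, dot, extension = filename.rpartition(".")
    if not dot or extension.lower() != "wav":
        raise ValueError(f"not a .wav file: {filename}")
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    info = dict(zip(FIELDS, parts))
    if info["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unknown type: {info['type']}")
    # Working versions are v01, v02, ...; "final" is reserved for approved files.
    if not re.fullmatch(r"v\d{2,}|final", info["version"]):
        raise ValueError(f"bad version label: {info['version']}")
    return info
```

Because the underscore is the separator, the individual fields must not contain underscores themselves, which is why the voice name in the example is written as one word.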

If you work with many short assets (e.g., e-learning, IVR, ads) it’s often better to deliver per line/ID:

ACME_IVR_SE_L001_v01.wav
ACME_IVR_SE_L002_v01.wav
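
Per-line names like these are easiest to keep consistent when they are generated rather than typed. A minimal sketch, assuming three-digit line IDs and two-digit versions as in the examples above; `line_filename` is a hypothetical helper name.

```python
def line_filename(project, asset, language, line_number, version=1):
    """Build a per-line file name such as ACME_IVR_SE_L001_v01.wav."""
    return f"{project}_{asset}_{language}_L{line_number:03d}_v{version:02d}.wav"


# Example: names for the first three lines of a script.
names = [line_filename("ACME", "IVR", "SE", n) for n in range(1, 4)]
```

Zero-padded numbers also keep the files in script order when a folder is sorted by name.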

The point is that both the technical and the production side should be able to see at a glance what a file is, without opening it. This reduces mix-ups, especially when deadlines are tight.

Process / checklist

First decide: where will the VO be used (film, web, radio, phone, e-learning)? This determines whether you need 48 or 44.1 kHz and how tight the timing needs to be.
Write it into the order: “Delivery: WAV mono, 48 kHz/24-bit, clean + possibly timed/alt.”
Agree on versioning logic: v01/v02 for working versions, “final” only when approved.
Decide file names before recording and keep them consistent. Include the script version in the file name if possible.
Ask for both the clean WAV and an MP3 reference if you want to be able to listen quickly without handling heavy files.
Check on delivery: file format, sample rate, mono/stereo, that nothing clips, and that the file names match the script/shot list.

Next steps

If you want to avoid technical surprises: send your spec before recording (format, sample rate, mono/stereo, versions and file names). If you are unsure what fits your chain, tell me what you cut and mix in (e.g., “Premiere 48k”) so we can lock the delivery spec immediately. You can reach me via the contact page. To see how I typically handle deliveries and versions in practice, see the about page.

FAQ

Can you deliver in MP3 instead of WAV?

Yes, but tell me if MP3 is intended as the final format or just for listening. For editing/mixing: use WAV. For quick approval: MP3 works.

We have received files that cannot be imported or that sound “odd.” What is that usually due to?

The most common causes are an incorrect sample rate (44.1 kHz in a 48 kHz project), the wrong channel format (stereo when you expect mono), or a file that has been converted several times. Write 48 kHz/24-bit mono in the spec and stick to WAV, and these problems almost always go away.

Should I request 44.1 kHz or 48 kHz?

Video/film: almost always 48 kHz. Music release: often 44.1 kHz. Match what the project is set to from the start.

What is the difference between “clean” and “raw”?

Raw is essentially an untouched recording. Clean is edited (pauses, clicks, breaths as needed) but without music/SFX. For production, clean is usually what you want.

How do you handle pickups so that we don’t accidentally place the wrong line?

Pickup files are labeled per line/ID or with a clear note in the file name, and delivered separately from the main file. That way you can place them in the right spot without scanning through a long file.
