An audiobook is not a long commercial. It is 8 hours of concentration compressed into a single voice that must never lose its tempo. The difference from short voice over formats is fundamental: every minute of an audiobook requires the voice to maintain the same level of engagement and clarity as the first. Here is the process, the pricing models and the technical requirements — from manuscript to finished master.
Key points at a glance
- Audiobook voice over is a long-form format where consistency over time matters more than individual takes.
- The recording process is chapter-based. You do not record the book in one continuous pass from cover to cover but in structured, chapter-sized blocks.
- The pricing model is typically per finished hour (PFH), not per studio hour.
- The ratio of studio time to finished time is roughly 2:1 to 3:1 for straight narration, higher for character voices.
- Technical requirements vary between distributors. ACX (Audible) publishes its own specification; Storytel and Bookbeat each have theirs.
- Human voice still dominates the audiobook market despite AI synthesis. Listeners notice the difference within minutes.
- Author involvement in the process — pronunciation guide, tone direction, review — affects the final result more than most people expect.
- Straight narration (one voice, no dialogue) and narration with character voices are two entirely different assignments.
How audiobooks differ from short voice over formats
A 30-second commercial is a sprint. You deliver maximum energy for half a minute and then you are done. A 3-minute corporate film demands steady pacing but is over before the voice has time to fatigue. A 20-minute e-learning module tests your consistency but has natural breaks between sections.
An audiobook is a marathon. A typical novel runs 8–12 finished hours. Non-fiction can be 4–8 hours. Children's books are shorter but often demand more character work per minute.
That changes everything.
Consistency becomes the central challenge. The listener hears your voice for hours. If the pacing shifts, if the energy drops, if the tone changes between recording days — they notice. Not consciously, but as a feeling that "something is off." It pulls them out of the story.
Fatigue is a factor that does not exist in short formats. After three hours in the recording booth, the voice changes. The vocal cords dry out, the diaphragm muscles tire, concentration drops. This affects not only quality but also pacing. An experienced audiobook narrator knows when to stop — not when it starts feeling difficult, but before it starts sounding different.
The margin for error shrinks. In a commercial you can do twenty takes of one line and pick the best. In an audiobook you cannot redo every sentence. You need to deliver consistent quality in the first or second take, hour after hour. Otherwise the schedule breaks.
Preparation takes longer. You cannot "just read." You need to go through the manuscript, mark difficult words, understand the text's rhythm, identify characters and decide how each voice should sound. For a novel, that preparation can take as long as the recording itself.
The recording process step by step
1. Preparation and manuscript review
Before I enter the studio, I spend time reading the entire manuscript. Not skimming — reading. The goal is to understand the narrative structure, identify tonal shifts and mark everything that could cause problems.
In concrete terms:
- Pronunciation marking. Proper nouns, place names, technical terms, words in other languages. Everything without an obvious pronunciation is marked and verified. If the author is available, I send a list of questions.
- Character map. If the book has dialogue: which characters appear, how do they sound, are there consistency requirements (dialect, age, temperament)? This is decided before recording starts, not during.
- Structure analysis. Where are the chapter breaks? Are there time jumps, perspective shifts or tonal changes? This affects how I plan the recording sessions.
- Technical specification. Which distributor? ACX, Storytel, Bookbeat or direct publishing? That determines format, level requirements and metadata.
2. Recording: chapter by chapter
Audiobooks are recorded chapter by chapter, each chapter as its own unit, rather than in one continuous pass. The reason is practical: if I need to redo a chapter (due to script changes, pronunciation errors or quality issues), I can do it without touching the rest.
A typical recording day looks like this:
- Warm-up (15–20 minutes). Non-negotiable. The voice needs to get going. Without warm-up, the first takes sound different from the rest.
- Recording block 1 (90 minutes). Focused recording with breaks between chapters.
- Break (15–20 minutes). Water, rest, recovery. Not "check emails" — actual rest.
- Recording block 2 (90 minutes). Same structure.
- Optional block 3. Depends on the material's complexity. With dense texts or heavy character work, two blocks per day is the maximum. Straight narration can sometimes take three.
On a good day I produce 1.5–2.5 finished hours of material. This varies depending on the text. Non-fiction with technical terminology goes slower. A flowing narrative novel can go faster.
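The daily output above translates directly into a rough count of recording days. The sketch below is only an illustration: the function name is made up, and it assumes every day lands somewhere in the 1.5–2.5 finished-hour range, which real projects will not always do.

```python
import math


def recording_days(finished_hours, per_day_low=1.5, per_day_high=2.5):
    """Return (fewest, most) recording days for a book of the given length.

    per_day_low and per_day_high are finished hours produced per day,
    taken from the 1.5-2.5 range quoted in the text.
    """
    if finished_hours <= 0:
        raise ValueError("finished_hours must be positive")
    fewest = math.ceil(finished_hours / per_day_high)  # every day is a fast day
    most = math.ceil(finished_hours / per_day_low)     # every day is a slow day
    return fewest, most


# An 8-hour novel:
print(recording_days(8))  # (4, 6)
```

These are recording days only. Editing, review and corrections come on top.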
3. Editing and quality control
After each recording day I edit the material:
- Removal of retakes and false starts. Everything that is not the final take is cut.
- Consistency listening. I listen through the finished chapter and check pacing, pitch and energy level against previous chapters. If something does not match, I do supplementary recording.
- Technical QC. Level checks, noise floor, room tone, mouth clicks and breath sounds. Audiobooks have stricter noise requirements than most short formats — the listener wears headphones for hours and hears everything.
- Normalisation and mastering. Final processing according to the distributor's specification. ACX, for example, requires an RMS level between -23 dB and -18 dB, peaks no higher than -3 dB and a noise floor no higher than -60 dB RMS.
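The level checks in this step come down to measuring a file in decibels relative to full scale. Here is a minimal sketch of those two measurements, assuming the audio is already loaded as floating-point samples between -1.0 and 1.0. The function names are illustrative. Note that this is plain RMS and sample peak, not LUFS loudness or true peak, which need frequency weighting and oversampling that a dedicated meter handles.

```python
import math


def rms_dbfs(samples):
    """RMS level of the samples in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)


def peak_dbfs(samples):
    """Highest absolute sample value in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("no samples")
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak)


# A constant signal at one tenth of full scale sits at about -20 dB.
quiet = [0.1, -0.1, 0.1, -0.1]
print(round(rms_dbfs(quiet), 1), round(peak_dbfs(quiet), 1))
```

Running `rms_dbfs` on a stretch of room tone with no speech gives an estimate of the noise floor.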
4. Review and corrections
When all chapters are edited, the material goes to review — usually the author or publisher. They listen through and note any pronunciation errors, pacing issues or missed nuances.
Correction means I re-record individual sentences or paragraphs and swap them into the edit. This requires matching tone, tempo and level exactly — which is easier if the recording was done within a reasonable time frame. The longer it has been since the original recording, the harder it is to match.
Pricing models: per finished hour vs per studio hour
Per finished hour (PFH)
The most common model for audiobook voice over. You pay for the number of finished hours in the delivered audiobook.
What is included:
- Recording
- Editing
- QC and mastering
- One round of corrections after review
Typical price range: Varies significantly depending on genre, length and whether character voices are involved. Straight narration without dialogue has a different price point than a novel with ten characters. Contact me with your manuscript for a specific quote.
The advantage: You know exactly what the final cost will be. A 10-hour book costs 10 times the PFH rate.
The disadvantage: The PFH rate does not always reflect difficulty. 500 pages of non-fiction with technical terminology is more work per hour than 500 pages of light fiction, but they may result in roughly the same number of finished hours.
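The PFH arithmetic is simple enough to sketch. Everything in the example is an assumption for illustration: the function names, the rate of 300 (in no particular currency) and the figure of about 9,300 words per finished hour, which is a commonly used planning estimate and not a measurement of any specific narrator.

```python
def estimate_finished_hours(word_count, words_per_hour=9300):
    """Rough finished length from a manuscript word count.

    words_per_hour is an assumed planning figure; actual pace varies
    with genre and narrator.
    """
    if word_count <= 0 or words_per_hour <= 0:
        raise ValueError("word_count and words_per_hour must be positive")
    return word_count / words_per_hour


def pfh_total(finished_hours, pfh_rate):
    """Total cost under a per-finished-hour model."""
    return round(finished_hours * pfh_rate, 2)


hours = estimate_finished_hours(93_000)  # hypothetical 93,000-word manuscript
print(hours)                             # 10.0
print(pfh_total(hours, 300))             # hypothetical rate of 300 per finished hour
```

The estimate is only good enough for budgeting. The invoice is based on the actual length of the delivered audio.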
Per studio hour
Less common for audiobooks, but it does occur — particularly for shorter projects or when the author wants to attend and direct.
The advantage: More transparent in the moment. You see what you are paying for.
The disadvantage: Difficult to budget in advance. You do not know how many studio hours an 8-hour book requires until it is finished.
The studio time to finished time ratio
A standard rule of thumb:
| Type | Studio time per finished hour |
|---|---|
| Straight narration (non-fiction) | 2–3 hours |
| Narration (novel, simple dialogue) | 2.5–3.5 hours |
| Character-intensive (many voices) | 3–5 hours |
| Children/young adult (voice play, energy) | 3–4 hours |
This includes recording, editing and QC. Corrections after review are additional.
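The table can be turned into a quick studio-time estimate. This is a sketch with made-up key names. The ratios are the ones from the table, and the result is a range, not a promise.

```python
# Studio hours per finished hour, copied from the table above.
STUDIO_RATIO = {
    "straight_narration": (2.0, 3.0),
    "novel_simple_dialogue": (2.5, 3.5),
    "character_intensive": (3.0, 5.0),
    "children_young_adult": (3.0, 4.0),
}


def studio_hours(finished_hours, kind):
    """Return (low, high) studio hours for a book of the given type."""
    if kind not in STUDIO_RATIO:
        raise ValueError(f"unknown type: {kind}")
    low, high = STUDIO_RATIO[kind]
    return finished_hours * low, finished_hours * high


# An 8-hour novel with simple dialogue:
print(studio_hours(8, "novel_simple_dialogue"))  # (20.0, 28.0)
```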
Technical requirements by distributor
Different platforms have different specifications. Here are the most common ones.
ACX (Audible/Amazon)
- Format: MP3 192 kbit/s CBR or higher
- Sample rate: 44.1 kHz
- Mono recommended (all files in a title must be consistently mono or stereo)
- Loudness: between -23 dB and -18 dB RMS
- Peak: no higher than -3 dB
- Noise floor: -60 dB RMS or lower
- Each file: one chapter or section, with 0.5–1 second of room tone at the head and 1–5 seconds at the tail
- Metadata: chapter name in file name
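A pre-upload check against these numbers can be sketched in a few lines. The thresholds below are ACX's RMS-based figures (RMS between -23 dB and -18 dB, peak at or below -3 dB, noise floor at or below -60 dB). The class and function names are hypothetical, and the measurements themselves are assumed to come from your audio editor or metering tool.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Measured values for one chapter file, in dB relative to full scale."""
    rms_db: float
    peak_db: float
    noise_floor_db: float


def acx_issues(m):
    """Return a list of human-readable problems; an empty list means no issues found."""
    issues = []
    if not (-23.0 <= m.rms_db <= -18.0):
        issues.append(f"RMS {m.rms_db} dB is outside -23 to -18 dB")
    if m.peak_db > -3.0:
        issues.append(f"peak {m.peak_db} dB is above -3 dB")
    if m.noise_floor_db > -60.0:
        issues.append(f"noise floor {m.noise_floor_db} dB is above -60 dB")
    return issues


chapter = Measurement(rms_db=-20.0, peak_db=-3.5, noise_floor_db=-62.0)
print(acx_issues(chapter))  # []
```

A check like this covers levels only. Format, sample rate, room tone and file naming still need to be verified separately, and the distributor's own current guidelines take precedence.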
Storytel / Bookbeat
- Format: WAV 44.1 kHz/16-bit or MP3 192+ kbit/s
- Mono
- Loudness: -16 to -20 LUFS (varies, check current guidelines)
- Chapter-divided files
- Metadata according to their template
Direct publishing (self-distribution)
- Format: your choice, ideally a WAV master plus MP3 for distribution
- Recommendation: 44.1 kHz/24-bit WAV as archive, MP3 192 kbit/s for publishing
- Chapter-divided files simplify player navigation
Regardless of platform, I always deliver a WAV master plus converted files in the format the distributor requires. The master is your insurance if you switch platforms or need adjustments in the future.
Human voice vs AI synthesis: where audiobooks stand today
AI-generated voices have improved. That is undeniable. For short formats — IVR prompts, automated messages, informational text — synthesis often works well enough. But the audiobook is a format where human voice still dominates, and there are specific reasons for that.
Consistency over time. AI can sound good for 30 seconds. Over 8 hours, inconsistencies become clear: tonal shifts that are not motivated by the text, pauses that land in the wrong place, emphases that miss the point. The listener perceives it as a growing sense of artificiality.
Character work. A novel with dialogue requires the narrator to switch between characters and maintain each voice's identity throughout the entire book. AI synthesis can approximate different voices but lacks the dramaturgical understanding of why a character sounds different in chapter 12 than in chapter 3.
Emotional nuance. An audiobook lives on subtle changes in tempo, volume and colour that follow the text's emotional arc. This is not the same as applying "happy" or "sad" as a filter. It is about understanding why a sentence needs to be broken differently from the one before it.
Listener preference. Surveys from platforms including Storytel consistently show that listeners prefer human narration for narrative literature. Tolerance for synthesis is higher in non-fiction, but it is still not the majority preference.
This does not mean AI has no place in audiobook production. There are use cases: prototyping, internal review, accessibility versions where budget does not allow full recording. But for the published product — the one that carries the author's name — human voice remains the standard.
More on this topic: AI voice vs human voice: what decision-makers should consider.
What you should do — checklist for authors and publishers
Before the project
- Decide on a distributor (ACX, Storytel, Bookbeat, self-publishing). This determines the technical spec.
- Decide whether the book requires straight narration or character voices. This affects price and schedule.
- Send the manuscript in its final version. Changes after recording starts cost time and money.
- Create a pronunciation guide: proper nouns, place names, technical terms, foreign words. Write them phonetically if the pronunciation is not obvious.
When selecting a voice
- Listen to demos. Not just one — listen to several minutes of material to hear consistency.
- Request a test recording of a passage from your book (typically 2–5 minutes). This gives a better picture than a generic demo.
- Ask about long-form experience. Voice over for commercials and voice over for audiobooks require different skill sets.
- Ask about technical capability: what equipment, what room treatment, what delivery formats.
During recording
- Be available for questions. Pronunciation questions always come up, regardless of how thorough the guide is.
- Establish a review process: chapter by chapter or the whole book at once?
- Set a realistic correction window (typically one round after review).
At delivery
- Verify the files meet the distributor's specification before uploading.
- Keep the WAV master. It is your insurance if you switch platforms.
- Check chapter division, metadata and file names.
Next steps
Audiobook voice over is a specialist format. It requires a different kind of preparation, endurance and technical precision than short voice over assignments. The pricing model (PFH) reflects the complete scope of recording, editing and mastering.
If you are an author or publisher considering producing an audiobook: send the manuscript and tell me which distributor you are targeting. I will give you a specific quote based on length, genre and character requirements. Contact me.
To hear how the voice sounds in longer passages: demos. For more on pricing in general: rates and voice over pricing.
FAQ
How long does it take to record an entire audiobook?
It depends on the book's length and complexity. An 8-hour novel takes roughly 3–4 weeks from recording start to finished master, including editing, QC and one round of corrections. Studio time is roughly 16–28 hours (2–3.5 studio hours per finished hour), spread across sessions of 3–5 hours per day.
Can I as the author attend during recording?
Yes. This can be done on-site in Stockholm or via a remote session with Cleanfeed. The advantage is that you can give direction in real time. The disadvantage is that it can slow down the process if every paragraph is discussed. A common compromise: the author attends the first chapter to set tone and direction, and the rest is recorded self-directed with review afterwards.
What is the difference between straight narration and character voices?
Straight narration means the same voice carries the entire text without imitating characters. Dialogue is marked with subtle tonal shifts but not with distinct voices. Character voices means each speaking person in the book gets their own voice — age, dialect, temperament. The latter requires more preparation and takes longer to record.
Can I use the same voice over artist for both Swedish and English versions?
Yes, if the artist is proficient in both languages at the required level. I record in both Swedish and English (with a Nordic accent). The advantage of the same voice is consistency — particularly for non-fiction and series where the listener recognises the narrator.
How are updates and new editions handled?
If you need to update parts of the audiobook (corrected text, new edition), I record the changed sections and swap them into the master. This requires that the original files are archived. I always keep project files for at least one year after delivery. Longer archiving can be arranged.
How does audiobook pricing compare to short voice over?
Per minute, audiobooks are cheaper: the PFH rate works out to far less per finished minute than the fee for a 30-second commercial. The total cost is higher because the volume is so much greater, so an 8-hour book costs more overall than a 3-minute corporate film. See voice over pricing and how voice over pricing works: rights and usage for more on pricing structure.
Do distributors accept audiobooks recorded in a home studio?
Yes, if the studio meets their technical requirements. ACX has specific requirements for noise floor and loudness. My studio (Isovox 2 Midnight + Austrian Audio OC18) meets all major distributors' specifications. What matters is not where the studio is located but how it sounds.
Do I need an ISBN for the audiobook?
Usually, yes. When an ISBN is used, the audiobook needs its own (separate from the print book and the e-book). This is typically the publisher's responsibility. If you are self-publishing, you will need to arrange it through your national ISBN agency. Requirements vary between distributors, so check with yours before uploading.