Notes on making an audiobook


Let’s say you write a book and decide to turn it into an audiobook. Basically, all you have to do is read it onto a recording device, edit the tape and upload the chapters to the audiobook seller’s web site. A stunningly simple process. What, I began to think about two years ago, could possibly be easier?

The book in question was the original Roo book, Notes from a Dog Rescue in Progress, now five years old. The first consideration was who to get to read it. Anyone who listens to audiobooks has experienced bad readings, or, at least, some readings that shine so brightly that you wish others were as good.

I can’t stand the sound of my own voice. When I was forced to narrate a documentary after Tom Waits backed out at the last minute, leaving our broke production on the hook for the studio rental, it was torture having to listen to my voice spool through the rest of the editing process and the mix. Every line grated on me, sounding stiff, stupid, mispronounced, off-tone. When the film was broadcast in Australia, they seemed to agree. They had me dubbed in Australian. They said it was because no one down there would understand my heavy American accent.

What with the huge quantity of audiobooks being produced these days, there’s an efficient system in place for finding a narrator. It’s right at the Amazon subsidiary where you will eventually publish the audiobook. You upload a page or two of the book, and list it as open for auditions. The next thing you know, your inbox is filled with recordings. Some of them are pros, some of them read as if they were delivering the Gettysburg Address to a high school public speaking class. None of them sounded right. I began to accept what I already knew, but had been resisting: that the only person who should read the book was me. It’s a personal story written in the first person, after all, and all those voices reading it as if the story was happening to them rang untrue. At least to me.

After reviewing 127 auditions, I knew it wouldn’t work. I gave up on it, and, because I couldn’t stand the idea of doing it myself, gave up on turning the book into an audiobook.

Then, when Roo and I were in Utah a couple of years ago, I decided to give it a go. What did I have to lose by plugging a mic into the computer and giving it a go?

Nothing, I figured. I had nothing to lose, other than a little time.

Now, I understand basic sound production. I’ve been the sound man on a couple of films and the sound was always good. I figured this would be a lot easier. There would be no adjusting to constantly changing levels and tones in new sound environments. There would be none of the crosstalk when two or more people talk over each other and ruin the take. 

I did some research into the hardware and software I’d need. I ordered an inexpensive USB microphone from Amazon. For recording software, I could have used GarageBand, which is already on the Mac, but from everything I read, as a music program, it wasn’t really suited for the type of editing you have to do on narration. The program everyone seemed to love was called Hindenburg.

I looked at their web site, and two things struck me. The price was outrageous, $375, and their web site had so many typos — hundreds of them — that it was hard to believe. I made a list of the typos, wrote the company to tell them that I had found hundreds of them and would trade the editing for the software. They turned me down. I told them, fine, if they wanted to be a bunch of cheapskates, that was their problem, but the work was already done so they could have it with my compliments. They gave me a two-year license.

Folks, I'm an experienced computer user. I can use all sorts of complicated software. I have never seen anything as complicated as Hindenburg. The learning curve goes right off the top of the chart. All it edits is sound — no pictures, no film — and yet, it's incomprehensibly difficult and unintuitive. But, once you start generating files in it, they've got you, because you can't transfer those files to any other program. You can export the finished product, but you're stuck editing it in their format. It does a good job, but the smallest tasks can put you through the most complicated procedures.

When I got everything set up, I read the first paragraph of the book into the microphone and played it back. I had steeled myself to having to hear my own voice, and so, though it made me cringe, it didn’t even make it into the Top Ten list of the things that were wrong. The problem was the sound.

When you say any word that starts with a P or B, you put your lips together and separate them as you blow some air out. It’s called a “plosive” when that wave of air hits a microphone with an annoying thud. Controlling them is a matter of positioning the mic and using a pop filter, which is just a screen placed in front of the mic that breaks up the air. The mic I had bought came with one, but it didn’t work. I made a better one out of some drug store nylons stretched over a hose clamp and attached it to the mic stand with a clothes hanger. Figuring this out took a long time.

Another irritant when recording the human voice is sibilance, the hissing produced when you say any word that has an S in it. You don’t hear this when you’re talking to someone, unless it’s one of those people who can’t help whistling through their teeth, but that hissing is magnified by the microphone. If it gets into the final recording, it’s profoundly annoying to the listener. So, it took a lengthy process of experimentation to try to position the microphone far enough away from the mouth not to pick it up too much while still producing a clear voice. 

The thing is, if you change the distance between the microphone and the mouth by just a few millimeters, the levels change, the whole sound changes. That means that you have to freeze in position to maintain the proper distance. What made this worse was tongue and lip clicking. When you talk or listen to someone else talk, you don’t hear the constant barrage of clicks and pops made by the tongue and lips. On the recordings, these sounded like a gorilla with loose dentures eating with his mouth open. It was revolting. I couldn’t control all of it. A better mic in a less noisy environment would have taken care of it, because it could be positioned far enough away not to pick them up, but with the ambient noise always present in the camper, I had to trade a closer mic position, with lower recording levels to keep out the external noises, for the clicks and pops.

None of this would have been necessary in a studio where there is no extraneous noise. People who do this at home often do it in a closet, where they can tape some foam or blankets to the walls and achieve a silent room. In the camper, nothing like that was possible.

The next problem was flubbed words or bad reading. Try reading one of these paragraphs out loud and see whether you can maintain perfect diction. Easier said than done.

Then, there were the external noises. There was always something. First, the camper’s tiny vent fan hummed in the background, and I didn’t realize it. Everything had to be re-recorded. Roo would move or snore or come over to say hi. The vinyl on the seat would squeak, translating into a piercing shriek on the recording. If I forgot myself and failed to hold perfectly still, my shirt might rustle the tiniest bit, enough to ruin the take. Or, outside, an airplane would fly over, or the wind would blow, or another camper would crush a Bud Light can or drive by in a diesel pickup.

Working day and night, taking time off only to take Roo for her hikes, it took me five weeks to record the raw tracks. It would have taken a professional reader, any one of the 127 people who had auditioned, three or four hours.

Once I had the tracks on hand, I edited out the flubbed words and assembled it into a completed audiobook. It took several more weeks, and I was out of patience. I fooled myself into thinking it might not be too rough, and sent it to my friend Jon Winokur to see what he thought. 

“Well,” he said, always the gentle — but honest — critic, “it’s a little breathy.”

I listened to it. It was breathy as hell. It sounded like I had recorded it while being ventilated in an iron lung. I’d have to find a way to edit all of that out.

This is what two minutes of finished audio looks like. Every one of those lines is an edited section, some of which took hours when a word or phrase had to be re-recorded, matched, filtered. Any sound professionals looking at this will show it to their friends at a bar and they'll all have a good laugh at the idiot who produced this, but there we are.

To make all of these edits, several things had to be done. The track had to be cut where there was a breath or a click and the bad stuff deleted. But the problem with that is that those breaths or clicks are part of the words. If you cut them out, you cut the words out. So, you have to re-record tons of stuff. And then they are at different levels, and each one of those has to be individually adjusted. The levels, the tone. That Hindenburg software? The smallest edits require as many as 20 mouse clicks and careful, frame-by-frame repositioning. A single word could take a couple of minutes to fix.

This went on and on. Eventually I thought it would never end, and I put it on the back burner. Then I turned the burner off. I had met the enemy face-to-face and the way he beat me was with the old fact of garbage in, garbage out.

So much work had gone into it, though, that from time to time I revived it. Another minute of editing would take a few hours, and no matter how many times I checked the result, there were always more problems. After a while I thought of just deleting it. The book was so old now that it had already missed its shot at selling as an audiobook anyway. What was I doing it for? Still, it’s hard to trash something after that much effort has gone into it.

I kept plugging away at it. Eventually, it grew into thousands of edits. It was ridiculous. Every time I thought it might be finished, I’d hear another click or something. Or I’d find out that the technical specs required by the audiobook company were in a different standard than the one in the software and have to read scientific articles about how to translate them, and once I did, re-engineer the whole thing. 

But finally it was done. I’ve never been in a place quiet enough, or had headphones good enough, to tell for sure, but it seemed to sound okay.

And, so, here it comes. The audiobook company takes a couple of weeks to review the audio quality before listing it for sale on Amazon, iTunes and Audible, and it’s entirely possible that I’ll get an email apprising me of some other mistake I’m not even aware of. If that happens, I’ll fix it if I can, but if it’s something beyond my capacity in this camper-based recording and mixing studio, I’ll probably have to let it go. Fingers crossed.

With deep thanks, I'll be sending a download link to all of you who have supported Roo’s and my travels on Patreon and beyond. I hope it's okay. And I’m glad it’s done.

If it is.