Top five audio/visual mistakes and how to avoid them


Whether you create audio podcasts or videos for vlogs on YouTube, or have ambitions to work in TV or radio production, you’ll likely have a clear idea of what you want your content to look or sound like. When creating this type of content, it’s important to consider how technology is linked to your artistic vision. If you don’t know and understand what’s really going on within the technology, it’s difficult to make informed judgements about whether what you’re trying to accomplish will work properly and meet the required standards. The advice below should help you become more aware of the technical issues and avoid common mistakes, so you get the results you’re looking for.

1. Microphones

In order to get the best quality sound, you need to get the microphone in as close to your subject as you can. Using the mic on the camera will pick up far too much of the surrounding background noise and make the wanted speech rather difficult to distinguish if the background is relatively noisy. Even in a quiet environment, the acoustics of the room will play far too great a role.

This is especially true if you’re relying on the automatic recording level control. This is because when nobody is speaking, the electronic gain will automatically be raised until the background noise (or acoustic properties of the room, e.g. reverberation) reaches foreground level and is recorded as if it were the main wanted speech. Then, when someone speaks, the level is much higher so you’ll get a distorted result, before the system plunges the recording level down again to cope with the higher-level signal. The result is a horrible pumping and distorted sound which is really unpleasant. It’s better to set everything manually having tested the levels first. This is a common theme that differentiates amateurs from professionals.

Get a good quality microphone and, unless you’re using a professional broadcast video camera with professional ‘XLR’ balanced audio inputs, a digital recorder (preferably one which will enable you to synchronise with the video signal in the camera). If you want to keep the mic out of shot, go for a ‘gun’ (or hyper cardioid) microphone and a boom pole. This enables the mic to be held just out of shot but the highly directional response means the mic will pick up the speech very well (but only in the direction towards which it is pointing). It won’t pick up too much sound from other directions which is exactly what you want. Alternatively, good quality tie-clip (‘Lavalier’) mics can be fitted to your interviewees if you don’t mind those being visible in shot.

Make sure your audio recorder is set to record WAV (not MP3) files at a 48kHz sampling frequency, as this matches professional standards and video editing software will expect files with those settings.
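It’s worth verifying those settings before a file reaches the edit. A minimal sketch using Python’s standard-library wave module (the check_recording helper name is just for illustration, not part of any particular tool):

```python
import wave

def check_recording(path):
    """Report whether a WAV file matches common video-editing expectations.

    48 kHz sampling is the broadcast/video norm; 44.1 kHz is the CD/music
    norm and will usually need resampling in the edit.
    """
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        bits = w.getsampwidth() * 8  # bytes per sample -> bits per sample
        return {"sample_rate": rate, "bit_depth": bits,
                "video_ready": rate == 48000}
```

Running this over a folder of takes before the shoot wraps means a mis-set recorder can be caught while everyone is still on location.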

Always (always) wear headphones to listen to what is being captured; you’ll be amazed at what is sometimes being recorded that you didn’t realise was there. It’s too late, once you’re back at base, to discover that an annoying fridge was running in the background. Keep the recording peaks well away from 0dBFS (full scale), as that will result in horrible clipping and distortion. Peaks of around -10dBFS are fine; you can fine-tune the levels during mixing in post-production.
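That -10dBFS guideline is easy to check numerically: dBFS is just 20·log10 of the peak sample magnitude relative to full scale (1.0). A small Python sketch of the arithmetic (the function name is illustrative):

```python
import math

def peak_dbfs(samples):
    """Peak level of normalised samples (-1.0..1.0) in dBFS.

    0 dBFS is digital full scale; the advice above is to keep
    recording peaks around -10 dBFS to leave clipping headroom.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(peak)
```

A peak sample magnitude of about 0.316 corresponds to -10dBFS, so roughly a third of full scale still leaves comfortable headroom.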

Remember to give your editor a break by starting each recording with a verbal shot/take number to match a visual reference held in front of the camera (followed by a sync clap that’s visible on screen too, if you’re using a separate audio recorder). Professional productions often use a clapper board which is useful for both shot numbering and sync, but even if you haven’t got one, just capturing the audio and video simultaneously while clapping your hands in vision will help (make sure you do it every time you stop and re-start recording or you’ll drive your editor mad).

Also, without moving any equipment or changing any settings, help your audio editor by capturing some ‘room tone’. This involves recording apparent silence in the room for around 20 seconds. Make sure nobody moves or is creating any sound during this time. This enables your audio editor to drop in segments of ‘silence’ with exactly the same audio properties as during the wanted recorded sequence if required during editing, to make any cuts sound more natural, or to act as a sample on which to base any electronic noise reduction processes required during post-production.

Notwithstanding everything listed above, there is still some merit in using the on-camera mic as the emergency back-up on one channel, with your main mic on the other channel, assuming you’ve only got one person speaking in the video. Then, if the main mic fails for whatever reason, you have at least got something to accompany the pictures, albeit of lower quality. In the edit, you’d normally just mute the back-up and duplicate the good channel so that both output channels have the correct audio.

2. Exposure

Just as with a stills camera, the amount of light and, more significantly, the range of light that can be captured with a video camera is limited. In order to restrict the range of light to be within the range of the camera’s ability, you need to look at sources of illumination.

The sun is the most obvious source of illumination (especially outdoors) and will usually present you with lots of light. You then have to adjust the aperture on the lens, in combination with the shutter speed (more accurately, the integration time), so that the amount of light arriving at the sensor during the integration time is within the correct range of exposure.

The combination of the display screen’s characteristics and the eye’s temporal response means that it’s best to fix the shutter speed at around 1/50th of a second while we still have relatively slow frame rates of 25 frames/second (in most of the world – slightly higher in the Americas and some Pacific countries). The reasons behind this relate to the temporal response of the eye, flicker perception and digital compression efficiency.

If the shutter speed is locked at 1/50th sec, which it should be – again the difference between amateurs and professionals – this only leaves the aperture (iris) f-stop to adjust the effective exposure. However, the iris also changes the depth of field, perhaps in a way that you don’t want, either leaving too much or too little of the background in focus. As you can see, once again an understanding of the technology is required in order to achieve the desired artistic effect. You might want to select a particular aperture to control the depth of field, so now you’re stuck – can’t change the shutter speed (to avoid flicker) and can’t change the iris (to control the depth of field). So what do you do?

If the subject is too bright, you can do exactly what you’d do for your own eyes – put shades on. Except that, instead of being coloured glass (or plastic), this time they mustn’t change the colour of light entering the camera so we use what are called neutral density (ND) filters. These are filters that attach over the lens (or often inside the body of broadcast quality video cameras) which reduce the amount of light entering the lens. They can be stacked on top of each other to further decrease the light or differing opacities can be bought as single filters.
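ND filters are conventionally labelled by optical density (ND0.3, ND0.6, ND0.9 and so on), and densities simply add when filters are stacked; each 0.3 of density halves the light, i.e. one stop. A short Python sketch of that arithmetic (the helper name is illustrative):

```python
import math

def nd_stack(densities):
    """Combined effect of stacked ND filters given their optical densities.

    Densities add when filters are stacked. Transmission is 10**(-density),
    and each 0.3 of density cuts the light by one stop (a factor of two).
    Returns (fraction of light transmitted, equivalent stops of reduction).
    """
    total = sum(densities)
    transmission = 10 ** (-total)
    stops = total / math.log10(2)  # 0.3 density ~ 1 stop
    return transmission, stops
```

So stacking an ND0.3 on an ND0.6 passes about 12.6% of the light – very close to a three-stop reduction – which is exactly the headroom needed to keep a wide aperture in bright sun.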

Alternatively, you might find that there is insufficient light for your needs even with the iris fully open (which itself creates problems, as the depth of field will then be so shallow that you’ll have difficulty keeping the subject in focus, which looks pretty unprofessional). There are two ways to deal with this. The wrong way is to increase the gain in the camera (equivalent to increasing the ISO or film speed). All this does is amplify the signal coming off the sensor, but electronic circuits generate internal noise, so you end up amplifying that too until it quickly becomes visible as a random snowy effect – known in electronics as the SNR (Signal to Noise Ratio) problem. The right way is to increase the signal at source rather than just amplifying the electronic signal that you’ve already got, and that means bringing in…

3. Lighting

Our eyes and brain are remarkable. Depending on the actual light level, this combination enables us to see a range of light levels between 10 and 20 ‘f-stops’ (approx. 1,000:1 to 1,000,000:1 in terms of dynamic range of light). Cameras and display devices, however, are nowhere near as good, ranging from around 8 up to (expensive broadcast quality) 14 stops. In order to guarantee a usable range, allowing a bit of ‘space’ in case of errors, poor set-up at a viewer’s home etc., we’ll often restrict lighting for TV and video to, say, 5 or 6 stops (around 30:1 to 60:1). In order to capture full facial expressions, we might restrict the light levels on the face to within 1 or 2 stops (2:1 to 4:1). At the time of writing, High Dynamic Range displays (Ultra-HD-1 Phase 2) are hitting the stores, though reasonably priced cameras are some way off and there is still uncertainty over whether broadcast systems will use Hybrid Log Gamma or Perceptual Quantisation to convey the additional range. Public Service Broadcasters are unlikely to implement these enhancements until the mid 2020s.
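The stop-to-ratio figures quoted above follow directly from the fact that each stop is a doubling of light, so the contrast ratio is just a power of two. A one-line Python check:

```python
def stops_to_ratio(stops):
    """Contrast ratio corresponding to a number of photographic stops.

    Each stop doubles the light, so the ratio is 2**stops:
    5 stops is 32:1, 10 stops is 1024:1 (~1,000:1) and
    20 stops is 1,048,576:1 (~1,000,000:1).
    """
    return 2 ** stops
```

This is why restricting a scene to 5 or 6 stops gives roughly 30:1 to 60:1, comfortably inside even a modest camera’s capture range.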

Without extra lighting, if you adjust the camera’s iris so that the brightest parts (highlights) are okay, you might find that other parts of the reproduced image are too dark. Conversely, you might find that if you adjust so that the darker parts are okay, the lighter parts are now peaking and crushed into white. What you need to do is bring in additional illumination which could be artificial lighting or even as simple as a white sheet reflector to bounce some of the incident light back to ‘fill in’ the shadows.

By bringing in lighting, you are reducing the tonal range of the image so that, when you adjust the iris for flesh tones that are right, the darker parts of the surrounding image aren’t all black and the brighter parts aren’t all ‘burnt out’ (the ‘soot and whitewash’ effect). If you have a light meter, you could adjust your lighting to ensure that none of your human subject has an exposure range of more than one to two ‘stops’.

The three-point lighting set-up referred to in the video is a classic technique which uses a ‘key’ light (the one that provides the majority of illumination to the face) at about 45 degrees up and 45 degrees to one side of the camera, a ‘fill’ light (a softer, more diffuse light) that fills in the darker shadows (usually slightly lower and 45 degrees to the other side of the camera and a little further away) and a back light (a narrow beam that illuminates the back of the head and shoulders, out of sight up above and behind the subject, providing a nice halo effect on hair which separates the person from the background). Search the internet for ‘three point lighting’ and you’ll find lots of examples. You can also extend this to 4-point lighting by having another light illuminate the scene behind the subject.

There’s also colour balance to consider. Once again, our eyes and brain are amazing at sorting out colour but the camera is not so good. Sunlight contains light across the full range of the visible spectrum. Artificial lights do not (especially gas discharge and LEDs, unless specially treated for photographic purposes). We use ‘correlated colour temperature’ to describe the average colour of the light. Standard daylight for video is 5600 Kelvins, which is somewhat blue. Standard tungsten studio lighting is 3200 Kelvins which is somewhat yellow.

If you go from daylight outdoors to shooting an interior scene without resetting the colour balance, your shots will look orange. If you go from interior to outdoors without resetting the balance, your shots will look blue. At each new scene, make sure you hold a piece of neutral white (or light grey) card or paper in front of the subject (ensuring that it’s illuminated by the same source(s) that will light the subject), zoom in to fill the frame with the card, then press the ‘white balance’ button on the camera. You can then zoom out to compose your shot. (You might help your editor again by briefly filming the card in position after you’ve marked up the scene/take numbers and before the action starts so that it can be used for reference, in case further post-production tweaks are required.)

Don’t fall into the trap of using mixed-source lighting (some tungsten artificial lights and some daylight) and saying “we can fix it in post-production”. You can’t! (Like most things that amateurs think can be ‘fixed in post’, you’ll be compromising everything if you try this and your editor will hate you.) Apart from the fact that having to adjust in post means you’re already dealing with a compromised image, if you try to balance for the daylight parts, the tungsten parts will look more orange; if you try to balance for the tungsten, the daylight parts will look more blue. The only way to do it is to be professional and plan for it on set (‘fix it in pre’).

Use CTB (colour temperature blue) colour correction gels over the lights to give them the same emission characteristics as the sunlight. If you’re using conventional hot tungsten bulb lighting, make sure the gel is heatproof or you’ll quickly have a fire on your hands. Then the whole scene is balanced to 5,600 Kelvins. (You could, instead, use orange CTO gels over the windows and balance everything to 3,200K but that needs a lot of gel and is very tricky.)
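Gel strengths are usually quoted as a ‘mired shift’ (mireds being 1,000,000 divided by the colour temperature in kelvin), which is why a single ‘full CTB’ gel (around -130 mired, depending on manufacturer) takes tungsten almost exactly to daylight. A quick Python check of the arithmetic (the function name is illustrative):

```python
def mired_shift(from_kelvin, to_kelvin):
    """Mired shift needed to convert one colour temperature to another.

    Mireds = 1,000,000 / kelvin. Negative shifts are bluer (CTB gels),
    positive shifts are more orange (CTO gels).
    """
    return 1_000_000 / to_kelvin - 1_000_000 / from_kelvin
```

Converting 3,200K tungsten to 5,600K daylight needs a shift of about -134 mired, which is why one full CTB per lamp is normally enough.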

4. Banding

It’s important to set your camera to the correct scanning standard before you start your project. This will depend on where you intend to film and show your material. For most of the Americas and some Pacific Rim countries (plus part of the Middle East), the mains frequency is 60Hz. For the rest of the world, it’s 50Hz.

Therefore, you need to ensure that the frame rate you set on your camera matches these frequencies, otherwise there will be a ‘beat frequency’ set up between the camera frame rate and any mains-powered lighting. This will appear as moving or rippling bands of light and dark.
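A simplified model of that beat: mains-powered lights flicker at twice the mains frequency, and any remainder between that flicker rate and the nearest multiple of the frame rate shows up as rolling bands at the beat frequency. A Python sketch (the function name is illustrative, and this ignores shutter angle and rolling-shutter effects):

```python
def flicker_beat(frame_rate, mains_hz):
    """Approximate visible beat frequency between camera and mains lighting.

    Lights flicker at twice the mains frequency. If that flicker rate is
    an exact multiple of the frame rate, the beat is zero (no banding);
    otherwise the remainder appears as bands rolling at the beat rate.
    """
    flicker = 2 * mains_hz
    nearest_multiple = round(flicker / frame_rate) * frame_rate
    return abs(flicker - nearest_multiple)
```

So 25 frames/second under 50Hz mains beats at 0Hz (no banding), while 30 frames/second under the same lighting produces a visible 10Hz roll.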

Depending on your camera, you’ll have a range of settings available. Assuming UK settings (and ignoring Ultra HD, which most people won’t have yet), the best option is to produce in 1080p/50 if possible as that allows for compatible down-conversion to all the lesser standards. Many affordable cameras can’t manage that either yet so you’re more likely to have a choice of 1080i/25, 1080p/25 or 720p/50. Confusingly, many cameras use American nomenclature which refers to refresh rate so for interlaced scanning that will be shown as 50Hz fields even though that means 25 frames/sec (since there are 2 fields per frame).

This is a complex area – one of many covered in our 2nd year module on broadcast standards – but it needs to be understood, and all camera operators must have agreed and switched to the same settings before the start of a project. If the scanning standards are wrong, they will, at best, completely change the look and feel of your project or, worse still, produce completely unusable results. (At the very least, make sure nobody is shooting in 30 (29.97), 60 (59.94) or 24 (23.98) frames/second.)

There are many subtleties and nuances to be balanced in the decisions over static resolution numbers and scanning systems (progressive versus interlaced) which we can’t cover here. Briefly, if you really must have old-fashioned ‘film look’ go for 1080p/25. If you don’t need 1080 lines then 720p/50 will give the best image reproduction, good fluid motion, least visible artefacts and optimum compression for final emission (but your camera might not be optimised for this resolution and, occasionally, might be slightly more likely to produce what’s called ‘spatio-temporal aliasing’ which shows up as a pattern moving in the opposite direction on very finely patterned objects as they move, though this could happen in 1080 lines too). If you need 1080 lines then choose either 1080p/25 (as above) or 1080i/25 which will give the video look but introduces the disadvantage of visible interlacing artefacts, which can look awful on a display device designed for progressive scanning (e.g. flat panel LCD) and will result in around 20% increased file sizes or emission bandwidth. If you’re shooting for UK broadcast then you shouldn’t need me to tell you that you must choose 1080 lines to meet the DPP standards and your camera will need to achieve the appropriate tier in the EBU R118 specs.

5. Mixing

Poor speech intelligibility is reported to be the biggest source of complaints to all broadcasters so take heed of their pain and learn from it. You know what’s being said and have heard it dozens of times while assembling your audio tracks. The viewer or listener is hearing it for the first time and may well struggle to hear the words over any background music and effects. For many decades, broadcasters used metering systems based on quasi-peak programme meters (QPPMs or just PPMs). Advertisers and others have now found ways to abuse these methods so new techniques based on average perceived loudness levels are in place.

Those mixing for broadcast emission must abide by the technical standards required. Those creating material for other purposes, even just for your own podcasts or vlogs, would also do well to take heed of the standards. In this respect, working to EBU R128 when mixing would be a good policy, but the subtleties of achieving the correct absolute loudness target are beyond the scope of these notes. At the very least, try to base your levels around those required by the broadcasters and ensure that any speech levels are kept within a range of no more than about 6dB (i.e. avoid quiet mumbling then loud shouting). Also ensure that your speech level averages at least 4dB (preferably at least 6dB) higher than any background music and effects so that it isn’t swamped. Again, keep all levels well away from 0dBFS (i.e. full scale); the closer you get to it, the worse the audio quality will be once it’s been digitally compressed for emission (the classic ‘wasp in a jam jar’ sound). Your final mix level, averaged across the entire programme, should be at what’s called ‘-23LUFS’ under the EBU R128 standard, which is approximately -24dBFS, for normal programme material. This is a lot quieter than many people currently mix.
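The speech-over-background rule of thumb can be checked numerically with average (RMS) levels – a crude stand-in for a full R128 loudness measurement, but enough to catch a swamped voice track. A Python sketch with illustrative helper names:

```python
import math

def rms_dbfs(samples):
    """Average (RMS) level of normalised samples (-1.0..1.0) in dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def speech_clears_background(speech, background, margin_db=4.0):
    """True if speech averages at least margin_db above the background.

    The notes above suggest a minimum of 4 dB, preferably 6 dB or more,
    between speech and any background music and effects.
    """
    return rms_dbfs(speech) - rms_dbfs(background) >= margin_db
```

A proper broadcast check would use the ITU-R BS.1770 gated loudness measurement behind EBU R128, but even this simple comparison flags mixes where the music drowns the words.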

Remember – great content causes people to switch on but poor technical quality causes them to switch off again.


These hints and tips only start to scratch the surface of the sort of decisions that have to be made for a professional production and in order to do that properly you need to understand the technical opportunities and limitations presented by the equipment. If you’re interested in learning more, then consider studying Audio and Music Production, Audio and Music Production with Foundation Year or Video Production and Streaming.


