The quickness of the hand?

The recent minor fracas in the White House, in which Jim Acosta was accused of inappropriate behaviour towards a member of staff who was trying to remove a microphone from him, has quickly descended into the ‘two extremes’ of argument to which politics sadly resorts these days, with accusations of ‘fake news’ levelled, on this occasion, not by the President but against him.
The question is whether this was the ‘quickness of the hand that deceived the eye’ or a genuine case of ‘doctored video evidence’. To understand what is going on here, we need a good knowledge of video imaging.
An image is created from a series of lines that are scanned left to right, top to bottom. Repeating the process frequently enough (within the temporal response of the eye, assisted by persistence of vision) enables us to perceive motion. The response of the eye also gives rise to a useful phenomenon that TV system design engineers have exploited for decades: the eye resolves static detail far better than it resolves detail in the dynamic parts of a picture (i.e. those with movement in them). It has therefore been possible to transmit moving images that are somewhat compromised without the eye and brain being able to detect the impairment. One such technique is ‘interlacing’, in which each picture is sent as two ‘fields’ of alternate lines. It meant that savings could be made in the cost of the final transmission of the signals to the home, since image quality is related to the ‘bandwidth’ of the signals (i.e. how much radio spectrum is occupied), and the high-frequency signals required were difficult to achieve in the early days of TV, when the BBC started its TV service in 1936 and others around the world followed.
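As a rough illustration of what interlacing does to a captured frame, here is a minimal Python/NumPy sketch; the frame size and the choice of which field carries which lines are assumptions for the example rather than details of any particular broadcast standard:

    import numpy as np

    def split_into_fields(frame: np.ndarray):
        """Split one progressive frame (rows x cols) into two interlaced fields.

        One field carries the even-numbered lines, the other the odd-numbered
        ones; each field holds half the lines, which is where the bandwidth
        saving comes from.
        """
        top_field = frame[0::2, :]      # lines 0, 2, 4, ...
        bottom_field = frame[1::2, :]   # lines 1, 3, 5, ...
        return top_field, bottom_field

    # Hypothetical 8-bit greyscale frame, 1080 lines of 1920 pixels
    frame = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
    top, bottom = split_into_fields(frame)
    print(top.shape, bottom.shape)  # (540, 1920) (540, 1920)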
Although compromises still occur in the final transmission chain even today (since bandwidth costs money), as we moved into the High Definition era we started to shift away from ‘interlaced’ images towards ‘progressive’ scanning systems, which overcome many of those compromises. The preferred format for acquisition is progressive, though for economic reasons the signal is often ‘down-converted’ to interlaced for the end viewer. (Although it is possible to convert an interlaced image to a progressive signal, part of the image was never captured during the interlaced scan, so a perfect reconstruction is impossible.)
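To make that parenthetical point concrete, here is a hedged sketch of one simple way a single interlaced field might be converted back to a full progressive frame, assuming plain line-averaging (many more sophisticated methods exist); the interpolated lines are estimates, not recovered picture information:

    import numpy as np

    def deinterlace_by_interpolation(field: np.ndarray) -> np.ndarray:
        """Rebuild a full-height frame from one field by averaging neighbouring lines.

        The interpolated lines are guesses: the real lines in between were
        never captured at that instant, so the reconstruction cannot be perfect.
        """
        rows, cols = field.shape
        frame = np.zeros((rows * 2, cols), dtype=field.dtype)
        frame[0::2, :] = field  # put the captured lines back in place
        # estimate each missing line as the average of the lines above and below,
        # repeating the last captured line at the bottom edge
        above = field
        below = np.vstack([field[1:, :], field[-1:, :]])
        frame[1::2, :] = ((above.astype(np.float32) + below) / 2).astype(field.dtype)
        return frame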
What do the differences show and how might they have arisen?
A frame-by-frame analysis of the recent event, as undertaken by Storyful, shows that both versions appear to have originated from a progressively-scanned image, but there are clear differences between the version that was transmitted live and the version that has been issued by the White House.
In the original, every frame is sharp and clear. There are no jagged or doubled, ‘feathered’ edges on movement. That is how we can be reasonably sure that the original is from a progressively-scanned signal format. In the version issued by the White House, the same is true for many of the frames. However, there are several frames with no movement at all (where the original contained movement), and there are others in which two images portraying movement appear to be overlaid.
There are already some people who are claiming that this is merely a function of the White House version having been passed through an interlacing process and they refer to ‘pulldown’. However, the images suggest otherwise. Certainly, the resolution has been reduced, making it harder to see fine detail. That could have been accidental or deliberate. Nonetheless, the ‘giveaway’ is the fact that where double edges do occur, they appear to do so on every single line. In an interlaced image, they would do so on every alternate line. In other words, the image appears to have been processed in a progressive scanning format. The question of pulldown, which some are claiming is the issue here, refers primarily to the method by which celluloid film is transferred to video. In most of the world, we currently operate at 25 frames per second. In the Americas and much of the Pacific, a frame rate of approximately 30 frames/second is used (fractionally less for reasons far too complex to discuss here).
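One way such a distinction could be tested numerically is with a crude ‘comb’ measure; the following Python/NumPy sketch is an illustrative assumption about how such a check might be coded, not a description of the analysis Storyful actually performed:

    import numpy as np

    def combing_score(frame: np.ndarray) -> float:
        """Crude interlace-comb detector.

        Genuine interlacing artefacts make each line disagree with BOTH of its
        vertical neighbours on alternate lines (the classic 'comb'), so the
        product below is large and positive. A progressive-domain blend puts
        the doubled edges on every line, the lines agree with their
        neighbours, and the score stays low.
        """
        f = frame.astype(np.int32)
        above = f[:-2, :]
        centre = f[1:-1, :]
        below = f[2:, :]
        comb = (centre - above) * (centre - below)  # positive where a line 'sticks out'
        return float(np.mean(np.clip(comb, 0, None)))

    # A noticeably higher score on one clip than on the other would point
    # towards genuine interlacing rather than progressive-domain blending.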
For decades, US audiences have watched temporally-compromised images on TV that were originally shot for cinematic release on celluloid at 24 frames/sec, through a process known as ‘2:3 pulldown’, which produces a usable video signal in interlaced format at the required 29.97 frames/sec. This has absolutely nothing whatsoever to do with these images, which were shot on video cameras at the appropriate frame rate for TV. Even if we use the incorrect terminology of ‘pull-down’ for an interlaced 2:1 image (i.e. one in which a full ‘frame’ of video is split into the two ‘fields’ of alternate lines), this still doesn’t appear to come into play since, as already mentioned, the double edges are not on alternate lines but on every line. Additionally, splitting a progressively-scanned image into two fields for interlacing would result in no movement between the two fields created by that process and, hence, there would be no double edges at all. (That is the standard telecine process used in most of the rest of the world: the film projection is simply sped up to 25 frames/sec and both fields are created from the same original celluloid frame.) Thus, references to interlacing and pull-down appear to be a complete ‘red herring’.
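For readers unfamiliar with the cadence being referred to, a toy Python sketch of the 2:3 pattern is shown below; the choice of starting field is an illustrative assumption, and the fractional 1000/1001 rate adjustment that gives 29.97 rather than 30 frames/sec is ignored:

    def two_three_pulldown(film_frames):
        """Map 24 fps film frames to interlaced video fields using a 2:3 cadence.

        Frames are emitted as 2, 3, 2, 3, ... fields, so 4 film frames become
        10 fields = 5 video frames, i.e. 24 fps film -> ~30 fps interlaced video.
        """
        fields = []
        for i, frame in enumerate(film_frames):
            repeat = 2 if i % 2 == 0 else 3
            for _ in range(repeat):
                parity = 'top' if len(fields) % 2 == 0 else 'bottom'
                fields.append((frame, parity))
        return fields

    # Four film frames A, B, C, D become ten fields:
    print(two_three_pulldown(['A', 'B', 'C', 'D']))
    # [('A','top'), ('A','bottom'), ('B','top'), ('B','bottom'), ('B','top'),
    #  ('C','bottom'), ('C','top'), ('D','bottom'), ('D','top'), ('D','bottom')]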
The only way in which these double edges could appear in a progressively-scanned image is if someone has deliberately attempted to ‘blend’ frames together. The initial delay of movement during the ‘frozen’ frames lasts only around 7 milliseconds, which is insufficient to become obvious when played at normal speed because the movement in that frame was minimal and, thus, the result looks natural. However, the later ‘blended’ frames then appear to accelerate and accentuate the subsequent movement.
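As a rough illustration of the kind of processing being described (a sketch of a generic frame blend, not a claim about the specific tool or settings used), blending two whole frames is simply a weighted average, which is why the doubled edges land on every line rather than on alternate ones:

    import numpy as np

    def blend_frames(frame_a: np.ndarray, frame_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
        """Blend two whole frames together.

        Because every line of both frames contributes, any moving edge appears
        doubled on every line, unlike interlacing, where the two pictures only
        ever occupy alternate lines.
        """
        mixed = alpha * frame_a.astype(np.float32) + (1.0 - alpha) * frame_b.astype(np.float32)
        return mixed.astype(frame_a.dtype)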
How come we don’t see the alteration?
Once again, we need to remember the interaction of the temporal and spatial resolution of the eye in order to appreciate how ‘special effects’ can be achieved. As noted above, the dynamic resolution of the eye, when viewing something that is moving, is reduced compared with its ability to resolve static images. (The overall reduction in resolution of the White House sequence has probably contributed in that regard. The regular cry from those wanting ever higher resolution from their TVs is often tempered, somewhat too late, when they find that the improved definition merely makes it easier to see all the unwanted signal artefacts and distortions that were previously invisible!) Thus, it is possible to hide all sorts of manipulation in a lower-resolution moving part of the image, and it only becomes visible when that movement is suspended to enable greater scrutiny.
What conclusions can we draw?
Rigorous analysis, based on a comprehensive understanding and robust evidence, is the only way to draw accurate conclusions: the correct academic process that is recognised around the world. Anything less is merely speculation and opinion. There is already a lot of the latter in this case.
The evidence that has been presented suggests that the clip released by the White House has certainly been altered from the original. One original frame has been duplicated across several output frames. There are other output frames in which multiple images are present from previously separate frames in the original. The resolution has been reduced. The impression given when these are played at normal speed is that the movement of the arms is more sudden than it originally was.
In summary
The camera never lies…but its output can be manipulated by humans!
Video credit: Storyful News