Information theory and video upscaling
July 30, 2016
Often when people first hear about Video Enhancer and its main feature, super resolution video resizing, they say "No way, it cannot work! If information is lost in downsizing you cannot get it back." And when we're talking about total information in a video they are absolutely right. However when looking at particular frames... Here's an experiment for you. I captured a couple of phrases with my two revolutionary ASCII video cameras that capture text, unfortunately one camera has low resolution and only captures two out of three letters, and the other camera has a nice resolution but results are noisy. Here are a few frames captured by my ASCII cameras. Can you guess what were the original phrases?
From my low-res text camera:
*nf*rm*ti*n *he*ry* | Frame 1
i*fo*ma*io* t*eo*y | Frame 2
in*or*at*on*th*or* | Frame 3
From my noisy text camera:
Ahis 2enteGce cJntJiVs exaXtlQ sixYBord6. | Frame 1
ThVs06enDenc8 conHain7 eKactly 7i8 w3rds. | Frame 2
TCis sG5tenCeX1o8ta6ns CxXct5y s5x8wo5ds. | Frame 3
GhiZ 4enteVcZ contKinH exaFAl8Xsix RordH. | Frame 4
Th3s5senB9nce Z4n6ains eNactly Bix worJs. | Frame 5
7ZiV seBten4eXcont3iD4 BZaNtQyXs57 QoZds. | Frame 6
Th1M 1ent0nc6 4o8tains exXcAl7 six wZrdZ. | Frame 7
I believe it wasn't hard for your brain to reconstruct original phrases, even though no single frame contained any of the phrases fully. Each particular frame contains either a part of information (in first case) or noisy information (second case), but since the same phrase is captured in all frames of a sequence, by comparing them and analyzing together we can still extract full information. And after you recover the original phrase, do you have more information than there were in the low res or noisy "video"? No, not at all. But does your one "frame" (line of text) with recovered sentence contain more clean information than one frame of original "videos"? Yes, we can say so. This is the principle of how Super Resolution works to recover high-res frames from multitude of low-res frames, and how temporal denoising (like Film Dirt Cleaner for example) works to remove noise from video by comparing each frame with its neighbors. Of course, in real videos the objects are usually moving so we have to use motion search to do the work, and for super resolution to work we're doing quarter-pixel accurate search, this is what makes these filters somewhat slow.
Now, from these examples I think you see now the limits of this approach. Although each particular frame may contain just a part of information about original content, when combined the frames must have full information and it must not be too distorted and noisy to be recoverable. When noise or compression is too high or blurring is too strong, information is really lost, we can't do wonders, just as you woudn't recover the right word from frames "jnkfnh", "6uidkf" and "596fnd".