Digital Video Introduction

Basic terminology

An image can be thought of as a 2D matrix. If we think about colors, we can extrapolate this idea seeing this image as a 3D matrix where the additional dimensions are used to provide color data.

If we chose to represent these colors using the primary colors (red, green and blue), we define three planes: the first one for red, the second for green, and the last one for the blue color.

We'll call each point in this matrix a pixel (picture element). One pixel represents the intensity (usually a numeric value) of a given color. For example, a red pixel means 0 of green, 0 of blue and maximum of red. The pink color pixel can be formed with a combination of the three colors. Using a representative numeric range from 0 to 255, the pink pixel is defined by Red=255, Green=192 and Blue=203.

Other ways to encode a color image

Many other possible models may be used to represent the colors that make up an image. We could, for instance, use an indexed palette where we'd only need a single byte to represent each pixel instead of the 3 needed when using the RGB model. In such a model we could use a 2D matrix instead of a 3D matrix to represent our color, this would save on memory but yield fewer color options.

For instance, look at the picture down below. The first face is fully colored. The others are the red, green, and blue planes (shown as gray tones).

We can see that the red color will be the one that contributes more (the brightest parts in the second face) to the final color while the blue color contribution can be mostly only seen in Mario's eyes (last face) and part of his clothes, see how all planes contribute less (darkest parts) to the Mario's mustache.

And each color intensity requires a certain amount of bits, this quantity is known as bit depth. Let's say we spend 8 bits (accepting values from 0 to 255) per color (plane), therefore we have a color depth of 24 bits (8 bits * 3 planes R/G/B), and we can also infer that we could use 2 to the power of 24 different colors.

It's great to learn how an image is captured from the world to the bits.

Another property of an image is the resolution, which is the number of pixels in one dimension. It is often presented as width × height, for example, the 4×4 image below.

Hands-on: play around with image and color

You can play around with image and colors using jupyter (python, numpy, matplotlib and etc).

You can also learn how image filters (edge detection, sharpen, blur...) work.

Another property we can see while working with images or video is the aspect ratio which simply describes the proportional relationship between width and height of an image or pixel.

When people says this movie or picture is 16x9 they usually are referring to the Display Aspect Ratio (DAR), however we also can have different shapes of individual pixels, we call this Pixel Aspect Ratio (PAR).

DVD is DAR 4:3

Although the real resolution of a DVD is 704x480 it still keeps a 4:3 aspect ratio because it has a PAR of 10:11 (704x10/480x11)

Finally, we can define a video as a succession of n frames in time which can be seen as another dimension, n is the frame rate or frames per second (FPS).

The number of bits per second needed to show a video is its bit rate.

bit rate = width * height * bit depth * frames per second

For example, a video with 30 frames per second, 24 bits per pixel, resolution of 480x240 will need 82,944,000 bits per second or 82.944 Mbps (30x480x240x24) if we don't employ any kind of compression.

When the bit rate is nearly constant it's called constant bit rate (CBR) but it also can vary then called variable bit rate (VBR).

This graph shows a constrained VBR which doesn't spend too many bits while the frame is black.

In the early days, engineers came up with a technique for doubling the perceived frame rate of a video display without consuming extra bandwidth. This technique is known as interlaced video; it basically sends half of the screen in 1 "frame" and the other half in the next "frame".

Today screens render mostly using progressive scan technique. Progressive is a way of displaying, storing, or transmitting moving images in which all the lines of each frame are drawn in sequence.

Now we have an idea about how an image is represented digitally, how its colors are arranged, how many bits per second do we spend to show a video, if it's constant (CBR) or variable (VBR), with a given resolution using a given frame rate and many other terms such as interlaced, PAR and others.

Hands-on: Check video properties

You can check most of the explained properties with ffmpeg or mediainfo.

Next chapter

Redundancy removal

Next Chapter