H264 - what, why and how

Below we present some information about a stream that can be fetched from its SPS NAL units.

SPS NAL unit structure

seq_parameter_set_rbsp( )                                    Descriptor
{
    seq_parameter_set_data( )
    rbsp_trailing_bits( )
}

As presented above, an SPS NAL unit consists of seq_parameter_set_data(), which holds the actual fields of the SPS, and rbsp_trailing_bits(), which is a single "1" stop bit followed by "0" bits that pad the SPS NALu to a byte boundary, ensuring that the SPS NALu is byte-aligned.

seq_parameter_set_data( )                                    Descriptor
{
    profile_idc                                              u(8)
    constraint_set0_flag                                     u(1)
    constraint_set1_flag                                     u(1)
    constraint_set2_flag                                     u(1)
    constraint_set3_flag                                     u(1)
    constraint_set4_flag                                     u(1)
    constraint_set5_flag                                     u(1)
    reserved_zero_2bits /* equal to 0 */                     u(2)
    level_idc                                                u(8)
    seq_parameter_set_id                                     ue(v)
    if( profile_idc == 100 || profile_idc == 110 ||
        profile_idc == 122 || profile_idc == 244 ||
        profile_idc == 44  || profile_idc == 83  ||
        profile_idc == 86  || profile_idc == 118 ||
        profile_idc == 128 || profile_idc == 138 ||
        profile_idc == 139 || profile_idc == 134 ||
        profile_idc == 135 )
    {
        chroma_format_idc                                    ue(v)
        if( chroma_format_idc == 3 )
            separate_colour_plane_flag                       u(1)
        bit_depth_luma_minus8                                ue(v)
        bit_depth_chroma_minus8                              ue(v)
        qpprime_y_zero_transform_bypass_flag                 u(1)
        seq_scaling_matrix_present_flag                      u(1)
        if( seq_scaling_matrix_present_flag )
        {
            for( i = 0; i < ( ( chroma_format_idc != 3 ) ? 8 : 12 ); i++ )
            {
                seq_scaling_list_present_flag[ i ]           u(1)
                if( seq_scaling_list_present_flag[ i ] )
                    if( i < 6 )
                        scaling_list( ScalingList4x4[ i ], 16, UseDefaultScalingMatrix4x4Flag[ i ] )
                    else
                        scaling_list( ScalingList8x8[ i - 6 ], 64, UseDefaultScalingMatrix8x8Flag[ i - 6 ] )
            }
        }
    }
    log2_max_frame_num_minus4                                ue(v)
    pic_order_cnt_type                                       ue(v)
    if( pic_order_cnt_type == 0 )
        log2_max_pic_order_cnt_lsb_minus4                    ue(v)
    else if( pic_order_cnt_type == 1 )
    {
        delta_pic_order_always_zero_flag                     u(1)
        offset_for_non_ref_pic                               se(v)
        offset_for_top_to_bottom_field                       se(v)
        num_ref_frames_in_pic_order_cnt_cycle                ue(v)
        for( i = 0; i < num_ref_frames_in_pic_order_cnt_cycle; i++ )
            offset_for_ref_frame[ i ]                        se(v)
    }
    max_num_ref_frames                                       ue(v)
    gaps_in_frame_num_value_allowed_flag                     u(1)
    pic_width_in_mbs_minus1                                  ue(v)
    pic_height_in_map_units_minus1                           ue(v)
    frame_mbs_only_flag                                      u(1)
    if( !frame_mbs_only_flag )
        mb_adaptive_frame_field_flag                         u(1)
    direct_8x8_inference_flag                                u(1)
    frame_cropping_flag                                      u(1)
    if( frame_cropping_flag )
    {
        frame_crop_left_offset                               ue(v)
        frame_crop_right_offset                              ue(v)
        frame_crop_top_offset                                ue(v)
        frame_crop_bottom_offset                             ue(v)
    }
    vui_parameters_present_flag                              u(1)
    if( vui_parameters_present_flag )
        vui_parameters( )
}
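
The Descriptor column in the table tells how each field is coded in the bitstream: u(n) is an n-bit fixed-length unsigned field, while ue(v) and se(v) are variable-length unsigned and signed Exp-Golomb codes, respectively. Below is a minimal C sketch of a bit reader able to consume such fields; the names are hypothetical, there is no error handling, and we assume the emulation prevention bytes have already been removed from the NALu payload:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Minimal big-endian bit reader over an RBSP buffer. Bounds checking is
     * omitted for brevity. */
    typedef struct {
        const uint8_t *data;
        size_t size;
        size_t bit_pos;
    } BitReader;

    static uint32_t read_bit(BitReader *br) {
        uint32_t bit = (br->data[br->bit_pos / 8] >> (7 - br->bit_pos % 8)) & 1;
        br->bit_pos++;
        return bit;
    }

    /* u(n): n-bit fixed-length unsigned field */
    static uint32_t read_u(BitReader *br, int n) {
        uint32_t value = 0;
        for (int i = 0; i < n; i++)
            value = (value << 1) | read_bit(br);
        return value;
    }

    /* ue(v): unsigned Exp-Golomb code */
    static uint32_t read_ue(BitReader *br) {
        int leading_zeros = 0;
        while (read_bit(br) == 0)
            leading_zeros++;
        return (1u << leading_zeros) - 1 + read_u(br, leading_zeros);
    }

    /* se(v): signed Exp-Golomb code, mapped from the unsigned one */
    static int32_t read_se(BitReader *br) {
        uint32_t code = read_ue(br);
        return (code % 2 == 0) ? -(int32_t)(code / 2) : (int32_t)((code + 1) / 2);
    }

    int main(void) {
        /* Hypothetical beginning of an SPS payload: profile_idc = 100,
         * constraint flags and reserved bits all 0, level_idc = 40,
         * seq_parameter_set_id = 0 coded as the single ue(v) bit "1". */
        const uint8_t sps[] = {0x64, 0x00, 0x28, 0x80};
        BitReader br = {sps, sizeof(sps), 0};

        printf("profile_idc:          %u\n", (unsigned)read_u(&br, 8));
        printf("flags + reserved:     %u\n", (unsigned)read_u(&br, 8));
        printf("level_idc:            %u\n", (unsigned)read_u(&br, 8));
        printf("seq_parameter_set_id: %u\n", (unsigned)read_ue(&br));
        return 0;
    }

With such a reader in place, parsing the SPS boils down to reading the fields in exactly the order given by the table above.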

Fetching resolution of a video

In order to calculate the resolution of a video (by which we mean its width and height in pixels), the following sequence of operations needs to be performed:

if sps.separate_colour_plane_flag == 0 then:
    chroma_array_type = sps.chroma_format_idc
else:
    chroma_array_type = 0

case sps.chroma_format_idc of
    1: sub_width_c = 2, sub_height_c = 2
    2: sub_width_c = 2, sub_height_c = 1
    3: sub_width_c = 1, sub_height_c = 1
    _other: sub_width_c = nil, sub_height_c = nil

if chroma_array_type == 0 then:
    crop_unit_x = 1
    crop_unit_y = 2 - sps.frame_mbs_only_flag
else:
    crop_unit_x = sub_width_c
    crop_unit_y = sub_height_c * (2 - sps.frame_mbs_only_flag)

if sps.frame_cropping_flag == 1 then:
    width_offset = (sps.frame_crop_right_offset + sps.frame_crop_left_offset) * crop_unit_x
    height_offset = (sps.frame_crop_top_offset + sps.frame_crop_bottom_offset) * crop_unit_y
else:
    width_offset = 0
    height_offset = 0

width_in_mbs = sps.pic_width_in_mbs_minus1 + 1
width = width_in_mbs * 16 - width_offset

height_in_map_units = sps.pic_height_in_map_units_minus1 + 1
height_in_mbs = (2 - sps.frame_mbs_only_flag) * height_in_map_units
height = height_in_mbs * 16 - height_offset
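
As a sanity check, consider a hypothetical 1080p stream with chroma_format_idc = 1 (4:2:0), frame_mbs_only_flag = 1, pic_width_in_mbs_minus1 = 119, pic_height_in_map_units_minus1 = 67, frame_cropping_flag = 1 and frame_crop_bottom_offset = 4 (all other crop offsets equal to 0). The width is 120 * 16 = 1920 pixels, the vertical crop unit is crop_unit_y = 2 * (2 - 1) = 2, and the height is 68 * 16 - 4 * 2 = 1088 - 8 = 1080 pixels - the cropping is needed because 1080 is not a multiple of the 16-pixel macroblock height.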

Fetching H.264 profile

There are a number of possible H.264 profiles - sets of features used by encoders and decoders to provide higher or lower levels of compression. For example, some profiles (baseline and constrained baseline) don't use B-frames, while others do, so if your decoder cannot deal with B-frames, you need to use a baseline or constrained baseline stream. You can read more about H.264 profiles and the differences between them here, but generally speaking, knowing the profile of a stream is helpful in many cases. The H.264 profile is determined by the values of fields fetched from the SPS. Below is a table showing the field values that identify each profile.

Profile                   Field values
BASELINE                  profile_idc = 66
CONSTRAINED BASELINE      profile_idc = 66, constraint_set1_flag = 1
MAIN                      profile_idc = 77
EXTENDED                  profile_idc = 88
HIGH                      profile_idc = 100
PROGRESSIVE HIGH          profile_idc = 100, constraint_set4_flag = 1
CONSTRAINED HIGH          profile_idc = 100, constraint_set4_flag = 1, constraint_set5_flag = 1
HIGH 10                   profile_idc = 110
HIGH 4:2:2                profile_idc = 122
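
The conditions above translate directly into a simple mapping function. Below is a minimal C sketch of such a mapping, assuming the relevant fields have already been parsed from the SPS; the function name and the "UNKNOWN" fallback are hypothetical, and only the profiles from the table are handled:

    #include <stdbool.h>
    #include <stdio.h>

    /* Maps SPS field values to a profile name according to the table above. */
    static const char *h264_profile_name(int profile_idc,
                                         bool constraint_set1_flag,
                                         bool constraint_set4_flag,
                                         bool constraint_set5_flag) {
        switch (profile_idc) {
        case 66:
            return constraint_set1_flag ? "CONSTRAINED BASELINE" : "BASELINE";
        case 77:
            return "MAIN";
        case 88:
            return "EXTENDED";
        case 100:
            if (constraint_set4_flag && constraint_set5_flag)
                return "CONSTRAINED HIGH";
            if (constraint_set4_flag)
                return "PROGRESSIVE HIGH";
            return "HIGH";
        case 110:
            return "HIGH 10";
        case 122:
            return "HIGH 4:2:2";
        default:
            return "UNKNOWN";
        }
    }

    int main(void) {
        /* profile_idc = 100 with no constraint flags set -> HIGH */
        printf("%s\n", h264_profile_name(100, false, false, false));
        return 0;
    }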