Objective Quality Assessment Metrics for Video Codecs - Sridhar Godavarthy
Contents: Why are we doing this? Classification of metrics. Aren’t we comparing apples and oranges? Results of experiments. Conclusion.
What is Video? A sequential combination of images. Utilizes the persistence of vision. High bandwidth due to large size.
Compression: To conserve bandwidth. Removes redundancy: spatial (within a frame) and temporal (between frames). Transmit only changes from a base frame.
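The idea of transmitting only changes can be illustrated with a toy sketch. Frames here are hypothetical equal-length flat lists of pixel values; the first frame is kept whole as the base, and each later frame is stored only as the positions whose values differ from the frame before it. Real codecs use motion-compensated prediction and transform coding rather than this exact-match scheme, so treat it purely as an illustration of temporal redundancy.

```python
# Toy illustration only: frames are equal-length flat lists of pixel values.

def delta_encode(frames):
    """Keep the first frame whole; store later frames as (index, value) changes."""
    base = list(frames[0])
    deltas = []
    prev = base
    for frame in frames[1:]:
        # Record only the pixels that differ from the previous frame.
        changes = [(i, v) for i, (old, v) in enumerate(zip(prev, frame)) if old != v]
        deltas.append(changes)
        prev = frame
    return base, deltas


def delta_decode(base, deltas):
    """Rebuild every frame by applying each change list to the frame before it."""
    frames = [list(base)]
    for changes in deltas:
        frame = list(frames[-1])
        for i, v in changes:
            frame[i] = v
        frames.append(frame)
    return frames


frames = [
    [10, 10, 10, 10],  # base frame
    [10, 12, 10, 10],  # one pixel changed
    [10, 12, 10, 10],  # nothing changed
]
base, deltas = delta_encode(frames)
print(deltas)  # [[(1, 12)], []]
```

Because every change is stored exactly, this scheme is lossless; the saving comes entirely from frames that barely change.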
Compression Continued: Lossless: low compression, high quality. Lossy: high compression, lower quality (some information is discarded).
Video Encoding Formats: 1) MPEG-1 2) MPEG-2 3) MPEG-4 4) H.263 5) ASF 6) WMV, and many more.
What is a Codec? Coder-decoder: capable of both encoding and decoding. Implemented in hardware or software. Several codecs exist for each format. Separate codecs for audio and video.
Quality Measurement: Subjective. Objective.
Subjective: Mean opinion score (MOS): a group of viewers watch the video and rate it. Not stable (scores vary across viewers and sessions). Slow. Expensive.
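The MOS itself is simply the average rating. Because scores vary between viewers, it is usually reported together with a confidence interval. A minimal sketch, assuming ratings on a 5-point scale (5 best, 1 worst) and the normal-approximation 95% interval, 1.96·s/√N, described in ITU-R BT.500; `mos_with_ci` and the sample ratings are hypothetical.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Mean opinion score and the half-width of its ~95% confidence interval.

    ratings: one score per viewer on a 1 (worst) to 5 (best) scale; needs >= 2 ratings.
    """
    n = len(ratings)
    mos = statistics.mean(ratings)
    s = statistics.stdev(ratings)  # sample standard deviation across viewers
    return mos, z * s / math.sqrt(n)


mos, delta = mos_with_ci([4, 5, 3, 4, 4])
print(f"MOS = {mos:.2f} +/- {delta:.2f}")  # MOS = 4.00 +/- 0.62
```

A wide interval is a direct sign of the instability noted above; more viewers shrink it in proportion to 1/√N.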
Objective: A measurable quantity. Rate/distortion. Common measurements: PSNR, MSE. Not accurate: these measures often correlate poorly with perceived quality.
Existing Techniques
General Considerations: No magic formula. How does codec X compare with codec Y in quality? Is the comparison measurable? How does codec X compare with codec Y in performance? Which codec gives the best performance for a given bit rate, for a given frame size, for a given type of content? Does the codec give consistent performance?
Evaluation Methodologies: Objective. Uses mathematical models to emulate the Human Visual System (HVS). Based on feature extraction from a bit stream. Quick. Cost efficient. Useful for assessing progress on a regular basis. Strongly dependent on the parameters chosen. Very little correlation with subjective evaluation results, especially at low bit rates.
Evaluation Methodologies Contd.: Subjective. Reliable. Accurate. Time consuming. Requires more effort (hence expensive). Needs to be executed correctly.
Objective Evaluation: PSNR: Peak Signal-to-Noise Ratio. EPFL: Moving Picture Quality Metric (Christian van den Branden Lambrecht). VQMG: General Video Quality Model (Stephen Wolf & Margaret Pinson). SDM: Structural Distortion Based Model (Zhou Wang, Ligang Lu & Alan Bovik).
PSNR
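PSNR is derived from the mean squared error between a reference frame x and a distorted frame y of N pixels: MSE = (1/N) Σ (x_i - y_i)², and PSNR = 10·log10(MAX² / MSE) dB, where MAX is the largest possible sample value (255 for 8-bit video). A minimal sketch, assuming frames are given as lists of rows of 8-bit luma values; the sample frames are made up. For a sequence, per-frame values are commonly averaged.

```python
import math


def mse(ref, dist):
    """Mean squared error between two equally sized frames (lists of rows)."""
    diffs = [(a - b) ** 2 for row_r, row_d in zip(ref, dist) for a, b in zip(row_r, row_d)]
    return sum(diffs) / len(diffs)


def psnr(ref, dist, max_value=255):
    """PSNR in dB; max_value=255 assumes 8-bit samples. Identical frames give infinity."""
    err = mse(ref, dist)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


ref = [[52, 55], [61, 59]]
dist = [[54, 55], [60, 59]]
print(f"{psnr(ref, dist):.2f} dB")  # 47.16 dB
```

Note that PSNR depends only on the error magnitudes, not on where the errors fall or what they look like, which is why it can disagree with viewers.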
General Video Quality Model
Structural Distortion Based Model
Comparison of Objective Evaluation Methods
Subjective Evaluation: DSIS: Double Stimulus Impairment Scale. DSCQS: Double Stimulus Continuous Quality Scale. SSCQE: Single Stimulus Continuous Quality Evaluation. SDSCE: Simultaneous Double Stimulus for Continuous Evaluation. SCACJ: Stimulus Comparison Adjectival Categorical Judgement.
Double Stimulus Impairment Scale (DSIS): Videos are shown consecutively in pairs. The first is the reference; the second is impaired. The expert gives an opinion after playback using the opinion scale (5 the best, 1 the worst). Recency effect: the most recent clip has more influence on the decision.
Double Stimulus Continuous Quality Scale (DSCQS): Videos are played in pairs, and both videos are shown simultaneously. Each pair is repeated a given number of times. One video is the reference and the other is the distorted one; the expert is not told which is which. The most commonly used method, especially when the qualities are similar.
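DSCQS ratings are usually analysed as differences: for each viewer, the score given to the reference minus the score given to the distorted clip. Averaging these gives a differential mean opinion score (DMOS), the scale this deck later uses to compare metrics. A minimal sketch, assuming paired ratings on a 0 to 100 continuous scale; the function name and ratings are hypothetical, and the per-viewer normalization that some studies apply is omitted.

```python
import statistics


def dmos(ref_scores, test_scores):
    """Differential mean opinion score for one distorted clip.

    ref_scores[i] and test_scores[i] are viewer i's ratings (0 to 100) of the
    hidden reference and of the distorted clip. Larger DMOS = larger perceived
    impairment; 0 means no perceived difference.
    """
    differences = [r - t for r, t in zip(ref_scores, test_scores)]
    return statistics.mean(differences)


# Hypothetical ratings from three viewers.
score = dmos([80, 75, 90], [60, 70, 72])
print(round(score, 2))  # 14.33
```

Working with differences cancels out each viewer's personal bias toward rating everything high or low, since the same viewer rates both clips.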
Single Stimulus Continuous Quality Evaluation (SSCQE): Longer program (20 to 30 min). The reference is not presented. Continuous rating; ratings are sampled throughout. Changes in rating over time can be measured. Drawbacks: memory effect; continuous grading distracts the viewer.
Simultaneous Double Stimulus for Continuous Evaluation (SDSCE): Similar to SSCQE, but the reference and test videos are shown simultaneously.
Stimulus Comparison Adjectival Categorical Judgement (SCACJ): Two sequences are played simultaneously. The expert then gives an opinion on an adjectival comparison scale (from “much worse” to “much better”).
Recommendation: Both objective and subjective testing should be used to complement each other. As objective testing methods, VQMG and SDM are recommended. PSNR can be used to evaluate the efficiency of one codec compared to another. As a subjective testing method, SAMVIQ (Subjective Assessment Methodology for Video Quality) is recommended. Video sequences chosen for testing should match the type of content of the intended application.
Classification of Objective Measurements: Full Reference. No Reference. Reduced Reference.
Full Reference: Measures distortion with full reference to the original image/video (e.g., pixel-by-pixel comparison).
No Reference: Measures distortion with no reference to the original image/video. Less accurate. Measures a specific distortion (or set of distortions). Lower complexity.
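As an illustration of a no-reference measure aimed at one specific distortion, the sketch below estimates JPEG-style blocking from the distorted image alone. It is a hypothetical, simplified score, not NRJPEGQS or any published metric: it compares pixel differences across assumed 8-pixel block boundaries with differences everywhere else, for a grayscale image given as a list of rows wider than 8 pixels.

```python
def blockiness(image, block=8):
    """Hypothetical no-reference blocking score for a grayscale image (list of rows).

    Compares the mean absolute difference between horizontally adjacent pixels that
    straddle a block boundary with the mean over all other adjacent pairs.
    Rows must be wider than `block`. About 1.0 = no visible blocking; larger = blockier.
    """
    boundary, inner = [], []
    for row in image:
        for x in range(len(row) - 1):
            d = abs(row[x + 1] - row[x])
            # Pixels x and x+1 straddle a boundary when x+1 is a multiple of the block size.
            if (x + 1) % block == 0:
                boundary.append(d)
            else:
                inner.append(d)
    mean_boundary = sum(boundary) / len(boundary)
    mean_inner = sum(inner) / len(inner)
    if mean_inner == 0:
        return 1.0 if mean_boundary == 0 else float("inf")
    return mean_boundary / mean_inner


blocky = [[10, 11] * 4 + [30, 31] * 4]  # two 8-pixel blocks at different levels
smooth = [list(range(16))]              # a gentle ramp with no block edges
print(blockiness(blocky), blockiness(smooth))  # 19.0 1.0
```

The score is cheap and needs no original, but it only sees blocking; blur or ringing would pass unnoticed, which is the "specific distortion" limitation noted above.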
Reduced Reference: Extracts information from the original image and compares it with the same information extracted from the distorted image.
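A reduced-reference scheme can be sketched in two steps: the sender extracts a compact feature vector from the original and transmits it alongside the video, and the receiver computes the same feature from the distorted version and compares the two. In this sketch the feature is a hypothetical 4-bin histogram of horizontal pixel differences and the comparison is an L1 distance. Published RR metrics such as RRIQA use richer statistics (wavelet-coefficient distributions), so this is only a stand-in.

```python
def gradient_histogram(image, edges=(0, 2, 8, 32)):
    """Hypothetical RR feature: normalized histogram of |horizontal pixel differences|.

    `edges` are assumed lower bin edges (edges[0] must be 0; the last bin is
    open-ended), so only len(edges) numbers have to be sent with the video.
    """
    counts = [0] * len(edges)
    for row in image:
        for a, b in zip(row, row[1:]):
            d = abs(b - a)
            # Index of the last edge that d reaches.
            counts[max(i for i, e in enumerate(edges) if d >= e)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def rr_distance(ref_feature, dist_feature):
    """L1 distance between feature vectors; 0 means no detectable change."""
    return sum(abs(p - q) for p, q in zip(ref_feature, dist_feature))


# Sender side: feature from the original frame.
ref_feature = gradient_histogram([[0, 10, 20, 30, 40]])
# Receiver side: same feature from the (smoothed) distorted frame.
dist_feature = gradient_histogram([[5, 10, 20, 30, 35]])
print(rr_distance(ref_feature, dist_feature))  # 1.0
```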
Another Classification of Measurements: Error sensitivity. Structural similarity/distortion. Statistics of natural images. Others.
Error Sensitivity: Based on the Human Visual System (HVS). A full-reference (FR) method. Decomposes videos into spatio-temporal sub-bands, followed by an error normalization and weighting process. Metrics differ in the sub-band decomposition method and the HVS model adopted.
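The decompose, normalize, weight and pool pipeline can be sketched for a single frame. The sketch assumes a one-level 2-D Haar transform as the sub-band decomposition, normalizes each band to its mean squared coefficient, and uses made-up band weights in place of a real HVS model. Actual error-sensitivity metrics use multi-level spatio-temporal decompositions with calibrated contrast-sensitivity and masking models.

```python
def haar_subbands(err):
    """One level of a 2-D Haar transform of an error image with even width and height.

    Returns coefficient lists for LL (coarse), LH and HL (detail in the two
    directions) and HH (diagonal detail).
    """
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for y in range(0, len(err), 2):
        for x in range(0, len(err[0]), 2):
            a, b = err[y][x], err[y][x + 1]
            c, d = err[y + 1][x], err[y + 1][x + 1]
            bands["LL"].append((a + b + c + d) / 2)
            bands["LH"].append((a - b + c - d) / 2)
            bands["HL"].append((a + b - c - d) / 2)
            bands["HH"].append((a - b - c + d) / 2)
    return bands


# Hypothetical weights: coarse errors are assumed more visible than fine ones.
WEIGHTS = {"LL": 1.0, "LH": 0.5, "HL": 0.5, "HH": 0.25}


def weighted_subband_error(ref, dist, weights=WEIGHTS):
    """Decompose the error, normalize each band to its mean square, pool with weights."""
    err = [[r - d for r, d in zip(row_r, row_d)] for row_r, row_d in zip(ref, dist)]
    bands = haar_subbands(err)
    return sum(weights[name] * sum(c * c for c in coeffs) / len(coeffs)
               for name, coeffs in bands.items())


ref = [[100, 100], [100, 100]]
offset = [[102, 102], [102, 102]]  # uniform brightness shift, MSE = 4
checker = [[98, 102], [102, 98]]   # fine checkerboard noise, MSE = 4
print(weighted_subband_error(ref, offset), weighted_subband_error(ref, checker))  # 16.0 4.0
```

Both distortions have the same MSE, yet the weighted score differs by a factor of four because the checkerboard error falls entirely in the down-weighted HH band. That is the kind of distinction plain MSE cannot make.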
Structural Similarity/Distortion: Extracts structural information. Based on the HVS. Top-down approach. Structural information is treated as independent of intensity and contrast variations. Metrics for scaling, translation and rotation. Similar to reduced-reference methods.
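The structural approach is usually illustrated with the SSIM index of Wang et al., which compares means (luminance), variances (contrast) and covariance (structure): SSIM(x, y) = ((2μxμy + C1)(2σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), with C1 = (0.01·MAX)² and C2 = (0.03·MAX)². This sketch is simplified: it evaluates the formula once over the whole frame, whereas the published index (and the video SDM built on local SSIM-style indices) computes it in local windows and averages the results. The sample pixel values are made up.

```python
import statistics


def global_ssim(x, y, max_value=255):
    """Simplified SSIM evaluated once over a whole frame.

    x, y: flattened luma values of the reference and distorted frames.
    The published index applies this formula in local windows and averages.
    """
    c1 = (0.01 * max_value) ** 2  # stabilizing constants, K1 = 0.01 and K2 = 0.03
    c2 = (0.03 * max_value) ** 2
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx = statistics.fmean([(a - mx) ** 2 for a in x])
    vy = statistics.fmean([(b - my) ** 2 for b in y])
    cov = statistics.fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


ref = [52, 55, 61, 59, 79, 61, 76, 61]
dist = [54, 55, 60, 59, 70, 66, 76, 58]
print(round(global_ssim(ref, ref), 4))  # 1.0
print(global_ssim(ref, dist))           # below 1.0
```

The index reaches 1 only for identical signals. The covariance term is what makes it sensitive to changes in structure rather than just in error energy.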
Statistics of Natural Images: Only for natural images; not applicable to text images, graphics, animations, radar, X-rays, etc. Essentially a measurement of information loss. Uses statistical models.
Others: Spatial information losses. Edge shifting. Luminance variations, etc. Degradation caused by watermarking.
No Questions Slide: All metrics are converted to predicted DMOS (DMOSp, differential mean opinion score), which allows for easier comparison. Normalized DMOSp is used in some cases. The mapping is obtained by a non-linear equation.
Measurements: All metrics are trained on a standard set to obtain objective and subjective indices. These indices are used to calculate the unknown parameters of the mapping equation. Voilà! A mapping from each metric to DMOSp.
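The exact non-linear mapping is not given here. As an assumption, the sketch below uses a three-parameter logistic, DMOSp = b1 / (1 + exp(-b2·(Q - b3))), a common choice for mapping an objective score Q to DMOS. It fits the parameters by least squares: a grid search over b2 and b3, with the optimal b1 computed in closed form for each pair. The training data are synthetic, not results from this study; with real data, a general optimizer such as `scipy.optimize.curve_fit` would be the usual tool.

```python
import math


def logistic(q, b1, b2, b3):
    """Assumed mapping from an objective score q (e.g. PSNR) to predicted DMOS."""
    return b1 / (1 + math.exp(-b2 * (q - b3)))


def fit_logistic(scores, subjective_dmos, b2_grid, b3_grid):
    """Least-squares fit: grid search over b2 and b3, closed-form b1 for each pair."""
    best = None
    for b2 in b2_grid:
        for b3 in b3_grid:
            g = [logistic(q, 1.0, b2, b3) for q in scores]
            # For fixed b2, b3 the model is linear in b1, so the optimal b1 is a projection.
            b1 = sum(d * gi for d, gi in zip(subjective_dmos, g)) / sum(gi * gi for gi in g)
            sse = sum((d - b1 * gi) ** 2 for d, gi in zip(subjective_dmos, g))
            if best is None or sse < best[0]:
                best = (sse, b1, b2, b3)
    return best[1:]


# Synthetic training data generated from known parameters (not real results).
scores = [25, 30, 35, 40, 45]
subjective_dmos = [logistic(q, 80, -0.3, 35) for q in scores]

b2_grid = [i / 10 for i in range(-10, 0)]  # negative slopes: higher score -> lower DMOS
b3_grid = range(20, 51)
b1, b2, b3 = fit_logistic(scores, subjective_dmos, b2_grid, b3_grid)
print(round(b1, 3), b2, b3)  # 80.0 -0.3 35
```

Once fitted, `logistic(q, b1, b2, b3)` converts any new objective score into DMOSp. The negative slope reflects that a higher objective score should map to a lower DMOSp, that is, better quality.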
Training Set: SSIM, VIF, RRIQA, DMOSp-PSNR: LIVE database (release 2). NRJPEGQS: JPEG images only. NRJPEG2000: JPEG2000 images only. VQM-GM: 8 video sequences, including Foreman. Training based only on luminance (grayscale).
Analysis of Results: Why DMOSp? PSNR shows differences even at high bit rates. The objective measure DMOSp saturates at high bit rates, as subjective indices do. Hence DMOSp, which behaves like a subjective measure.
Analysis of Results Contd.: High bit rates result in saturation. DMOSp differences of less than 5 are not perceivable. DMOSp decreases as bit rate increases (good!). But at low bit rates, the metrics vary widely.
Conclusions: Metrics were compared in the predicted DMOS space. All metrics were “trained” with the same dataset to obtain the mapping. NRJPEG2000 gave wrong quality scores. NRJPEGQS does not accurately perceive differences in quality. VQM ranks certain formats wrongly. What does all this mean?
Conclusions Contd.: The choice of metric depends on the application, frame size and bit-rate range. VIF: the most accurate of the lot. RRIQA: the best of the methods that do not need the full reference (it is reduced-reference). DMOSp-PSNR: the fastest.
References: EBU BPN 055, “Subjective viewing evaluations of some internet video codecs – Phase 1 Report”, EBU Project Group B/VIM (Video In Multimedia), May; F. Kozamernik, P. Sunna and E. Wyckens, “Subjective quality of internet video codecs – Phase 2 evaluations using SAMVIQ”, EBU Technical Review, January; Johnny Biström, “Comparing Video Codec Evaluation Methods for Handheld Digital TV”, T Digital TV Services in Handheld Devices; Fernando C. N. Pereira and Touradj Ebrahimi, “The MPEG-4 Book”, IMSC Press Multimedia Series; M. Pinson and S. Wolf, “Comparing subjective video quality testing methodologies”; TAPESTRIES Consortium.
Sridhar Godavarthy, Dept. of Computer Science and Engineering, University of South Florida