Automating AV Archival Workflows: Part 4
Prioritizing Manual QC
Manual inspection of video - watching every frame - will always be an important part of rigorous quality control and analysis. But in moving image archives with enormous libraries, that level of manual inspection is not always feasible. One of our goals in building the macOS app AV Spex was to “flag” aspects of a video that need a closer look, so archivists can prioritize which parts of a video receive further investigation.
In my last post, I talked about how AV Spex checks the metadata of a video file to ensure it conforms to expected specifications (the “specs” in AV Spex). The app also builds on the features of the open source Python tool qct-parse to automate analysis of the XML data of QCTools reports. Using the qct-parse features in AV Spex, we can identify specific frames that might have artifacts or errors. Unfortunately, sometimes that’s still too many frames! To truly prioritize manual review of video, we want to be able to investigate what is happening within the frame itself.
Beyond Frame-Level QC Values
QCTools reports contain the values of FFmpeg’s signalstats filter for every frame. Here’s an example of all the data QCTools holds on a single frame from a video in the National Museum of African American History and Culture (NMAAHC) collection:
<frames>
<frame media_type="video" stream_index="0" key_frame="1" pkt_pts="0" pkt_pts_time="0.0000000" pkt_duration_time="0.0330000" pkt_pos="990" pkt_size="384986" width="720" height="486" pix_fmt="yuv422p10le" pict_type="I">
<tag key="lavfi.signalstats.YMIN" value="6.000000"/>
<tag key="lavfi.signalstats.YLOW" value="25.000000"/>
<tag key="lavfi.signalstats.YAVG" value="335.074000"/>
<tag key="lavfi.signalstats.YHIGH" value="705.000000"/>
<tag key="lavfi.signalstats.YMAX" value="933.000000"/>
<tag key="lavfi.signalstats.UMIN" value="36.000000"/>
<tag key="lavfi.signalstats.ULOW" value="279.000000"/>
<tag key="lavfi.signalstats.UAVG" value="518.363000"/>
<tag key="lavfi.signalstats.UHIGH" value="751.000000"/>
<tag key="lavfi.signalstats.UMAX" value="982.000000"/>
<tag key="lavfi.signalstats.VMIN" value="147.000000"/>
<tag key="lavfi.signalstats.VLOW" value="241.000000"/>
<tag key="lavfi.signalstats.VAVG" value="509.733000"/>
<tag key="lavfi.signalstats.VHIGH" value="782.000000"/>
<tag key="lavfi.signalstats.VMAX" value="898.000000"/>
<tag key="lavfi.signalstats.VDIF" value="0.000000"/>
<tag key="lavfi.signalstats.UDIF" value="0.000000"/>
<tag key="lavfi.signalstats.YDIF" value="0.000000"/>
<tag key="lavfi.signalstats.SATMIN" value="0.000000"/>
<tag key="lavfi.signalstats.SATLOW" value="1.000000"/>
<tag key="lavfi.signalstats.SATAVG" value="185.665000"/>
<tag key="lavfi.signalstats.SATHIGH" value="352.000000"/>
<tag key="lavfi.signalstats.SATMAX" value="510.000000"/>
<tag key="lavfi.signalstats.HUEMED" value="167.000000"/>
<tag key="lavfi.signalstats.HUEAVG" value="172.651000"/>
<tag key="lavfi.signalstats.TOUT" value="0.004392"/>
<tag key="lavfi.signalstats.VREP" value="0.000000"/>
<tag key="lavfi.signalstats.BRNG" value="0.117555"/>
<tag key="lavfi.signalstats.VBITDEPTH" value="10.000000"/>
<tag key="lavfi.signalstats.UBITDEPTH" value="10.000000"/>
<tag key="lavfi.signalstats.YBITDEPTH" value="10.000000"/>
<tag key="lavfi.psnr.mse.v" value="1319.025635"/>
<tag key="lavfi.psnr.mse.u" value="2299.578369"/>
<tag key="lavfi.psnr.mse.y" value="3538.474609"/>
<tag key="lavfi.psnr.psnr.v" value="28.994980"/>
<tag key="lavfi.psnr.psnr.u" value="26.581030"/>
<tag key="lavfi.psnr.psnr.y" value="24.709352"/>
<tag key="lavfi.psnr.mse_avg" value="2673.888428"/>
<tag key="lavfi.psnr.psnr_avg" value="25.926081"/>
</frame>
</frames>
NMAAHC found that many of the videos in their collection had frames outside of broadcast range. But when the archivists at the museum took a closer look at the files, they found that the flagged frames simply contained parts of the blanking area (video content outside of the picture area). These black borders around the outside of the picture area contain “sub-blacks,” or black levels below broadcast range. The borders are not an issue in and of themselves, but they were preventing NMAAHC’s team from effectively using the QCTools reports to automate their checks.
The blanking area of this frame that is outside of broadcast range is highlighted in yellow
The archivists want to flag frames where pixels inside the picture area are out of broadcast range, without being bothered by frames that are perfectly fine aside from their sub-black borders. To automate examination of the picture area while ignoring the borders of the frame, we need information at the pixel level, not the frame level.
Using Computer Vision to See Pixels
My first goal was to automate detection of the borders of the picture area. I turned to a Python library called OpenCV (Open Source Computer Vision Library). Using OpenCV’s “cvtColor” function, I convert captured frames to greyscale and then calculate the average pixel brightness of each vertical column. When the scan reaches a column that is not black (average brightness > 10), that’s where the content starts. Through trial and error I’ve added a lot of refinement to that initial detection, and I’m now able to account for common features of analog-to-digital video borders, like head switching artifacts and visible vertical blanking intervals.
This image is used in the AV Spex HTML report to demonstrate the border detection results
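To illustrate the column-scanning idea, here’s a minimal Python sketch (not AV Spex’s actual implementation); the brightness threshold of 10 comes from the description above, while the function name and the example file path are purely illustrative:
import cv2
import numpy as np

def detect_left_border(frame_bgr, threshold=10):
    """Return the first column (scanning left to right) whose average
    brightness exceeds the threshold, i.e. the first column that isn't black."""
    grey = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    column_means = grey.mean(axis=0)  # average brightness per vertical column
    bright_columns = np.where(column_means > threshold)[0]
    return int(bright_columns[0]) if bright_columns.size else 0

# Example: grab a single frame from a capture and find where the picture starts
capture = cv2.VideoCapture("input.mov")
ok, frame = capture.read()
if ok:
    print("Picture area appears to start at column", detect_left_border(frame))
capture.release()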
On-Demand SignalStats
Once the picture area has been detected, I can compare analytics between the full frame and the picture area to rule out frames that only have sub-blacks in the blanking area. Using FFprobe to run the signalstats filter combined with a crop filter, I can get the same “BRNG” value collected by QCTools, but just for the detected picture area. This gives me an effective “apples to apples” comparison between the full frame’s values and the picture area’s corresponding values.
But, as any of us who have made a QCTools report already know, collecting signalstats values for an entire video is a very time-consuming process. Once again, I can leverage the existing QCTools XML data to selectively choose which frames to analyze. Using logic very similar to the Python code in qct-parse, AV Spex loops through the XML and notes which periods have the most values outside of broadcast range (excluding the color bars and segments of all black).
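As a rough sketch of that XML pass (assuming the report structure shown earlier; the threshold and function name are mine, and qct-parse’s actual logic for grouping periods and skipping color bars is omitted):
import gzip
import xml.etree.ElementTree as ET

BRNG_KEY = "lavfi.signalstats.BRNG"

def frames_over_brng_threshold(report_path, threshold=0.01):
    """Collect (timestamp, BRNG) pairs for video frames whose BRNG value
    exceeds the threshold, reading the per-frame tags shown above."""
    flagged = []
    open_report = gzip.open if report_path.endswith(".gz") else open
    with open_report(report_path, "rb") as report:
        # iterparse keeps memory use low on long reports
        for _, element in ET.iterparse(report, events=("end",)):
            if element.tag != "frame" or element.get("media_type") != "video":
                continue
            timestamp = float(element.get("pkt_pts_time", 0))
            for tag in element.findall("tag"):
                if tag.get("key") == BRNG_KEY and float(tag.get("value", 0)) > threshold:
                    flagged.append((timestamp, float(tag.get("value"))))
            element.clear()  # free the frame element once its tags are read
    return flagged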
Once the start point of an analysis period has been selected, and the active picture area is identified, FFprobe can run signalstats on a targeted period, using crop to isolate the picture area only. The FFprobe command looks like this:
ffprobe \
-f lavfi \
-i "movie='[/path/to/input/video.mov]':seek_point=[start_time],crop=[active_picture_area],signalstats=stat=brng,trim=duration=5" \
-show_entries frame_tags=lavfi.signalstats.BRNG \
-of csv=p=0
Then the process of comparing the BRNG values of the signalstats output to the BRNG values in the QCTools XML is relatively straightforward. In many cases, like in the graph below, the number of frames with pixels outside of broadcast range is much lower when looking at just the active picture area.
This graph is a visualization from the AV Spex HTML report
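AV Spex handles this internally, but a simplified Python version of the same step might look like the following; the helper name, the example crop geometry, and the BRNG threshold are all illustrative:
import subprocess

def cropped_brng_values(video_path, start_time, crop, duration=5):
    """Run the ffprobe command shown above and return the per-frame BRNG
    values for just the cropped picture area."""
    filtergraph = (
        f"movie='{video_path}':seek_point={start_time},"
        f"crop={crop},signalstats=stat=brng,trim=duration={duration}"
    )
    result = subprocess.run(
        ["ffprobe", "-f", "lavfi", "-i", filtergraph,
         "-show_entries", "frame_tags=lavfi.signalstats.BRNG",
         "-of", "csv=p=0"],
        capture_output=True, text=True, check=True,
    )
    return [float(line) for line in result.stdout.splitlines() if line.strip()]

# Count how many frames in a 5-second window still exceed a (hypothetical)
# BRNG threshold once the borders are cropped away
values = cropped_brng_values("video.mov", 120.0, "708:480:6:3")
print(sum(1 for v in values if v > 0.01), "of", len(values), "cropped frames flagged")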
Highlighted BRNG
One of my favorite filters from the QCTools GUI is the “Broadcast Range Pixels” filter. The filter once again makes use of FFmpeg’s signalstats filter, this time to highlight pixels outside of broadcast range. Through my experience using OpenCV to identify the borders of the picture area, I thought it might be possible to use computer vision on the highlighted pixels to characterize their pattern within a frame.
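Outside of the QCTools GUI, the same highlight can be rendered with the signalstats filter’s “out=brng” mode; here’s a hedged sketch of generating a highlighted copy of a video (the helper name, output codec, and color choice are mine, not AV Spex’s):
import subprocess

def render_brng_highlights(src, dst, color="magenta"):
    """Write a copy of the video with pixels outside of broadcast range
    painted in the chosen highlight color."""
    subprocess.run(
        ["ffmpeg", "-i", src,
         "-vf", f"signalstats=out=brng:color={color}",
         "-c:v", "ffv1", dst],  # FFV1 chosen here just to keep the output lossless
        check=True,
    )

render_brng_highlights("input.mov", "input_brng_highlighted.mkv")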
Characterizing the highlighted pixels was not as easy as I had initially hoped. The first challenge I encountered was “false positives.” No matter what color I chose to highlight with, that color already appeared somewhere in the video!
LL Cool J being interviewed for Ebony Jet Showcase (original)
When I was using cyan to highlight pixels outside of broadcast range, this image of LL Cool J would consistently be flagged. The edges of the white shirt on the man to LL Cool J’s left, and the pant legs of the interviewer (bottom left of the frame), are not highlighted by FFmpeg’s signalstats. But there is a turquoise halo around these objects caused by an analog video artifact (maybe chroma shift/smear). This frame in particular took me a long time to figure out because parts of the white shirt are, in fact, outside of broadcast range and would also be highlighted.
LL Cool J being interviewed for Ebony Jet Showcase (pixels outside of broadcast range in the picture area highlighted)
This feature is the roughest around the edges and the one most in need of refinement. The core challenge is distinguishing FFmpeg’s highlights from colors that were already in the video. First, I compare the original frame to the highlighted version using OpenCV to detect where new magenta pixels have appeared. Then I convert both versions (original and highlighted) to the HSV color space and look for shifts in hue and saturation that correspond to the highlights. If the two methods agree, there’s a far lower chance of a “false positive” from a frame that just happens to have magenta in it.
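Here’s a rough sketch of that two-pass agreement check; the change threshold, the hue target (magenta in OpenCV’s 0–179 hue scale), and the function name are illustrative rather than AV Spex’s actual values:
import cv2
import numpy as np

def highlight_mask(original_bgr, highlighted_bgr, hue_target=150, hue_tolerance=10):
    """Estimate which pixels were painted by the broadcast-range highlight,
    keeping only pixels that both detection methods agree on."""
    # Method 1: per-pixel difference between the original and highlighted frames
    diff = cv2.absdiff(original_bgr, highlighted_bgr)
    changed = diff.max(axis=2) > 20  # illustrative change threshold

    # Method 2: convert both versions to HSV and keep pixels whose hue shifted
    # toward the highlight color (i.e. they weren't that color in the original)
    hue_orig = cv2.cvtColor(original_bgr, cv2.COLOR_BGR2HSV)[..., 0].astype(int)
    hue_high = cv2.cvtColor(highlighted_bgr, cv2.COLOR_BGR2HSV)[..., 0].astype(int)
    near_target_now = np.abs(hue_high - hue_target) < hue_tolerance
    near_target_before = np.abs(hue_orig - hue_target) < hue_tolerance
    hue_shifted = near_target_now & ~near_target_before

    # Agreement between the two methods lowers the chance of flagging colors
    # that were already present in the video
    return changed & hue_shifted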
The detection works for examples like this one, where the pixels are grouped together well, but it does not do as well with more diffuse patterns.
The Road Ahead
Right now these analysis features focus on border detection and flagging frames with values not in broadcast range, but of course there are lots of common analog video artifacts and errors that would not be detected by these features. I'm hoping to expand analysis to automate detecting other types of errors as well. If you have suggestions, I'd love to hear them!
In the meantime, I'll be testing these features on digital video created from analog sources and refining the code based on what works and what doesn't. The goal remains the same: help archivists spend less time sifting through frames that are fine and more time investigating the ones that actually need attention.
