Posted By: Ben Waggoner | Jan 11th @ 1:43 PM

Way back in September, we announced we were doing a VC-1 Encoder SDK. I'm happy to report that not only are we making it available, but we've decided to release the "Professional" version for free, for incorporation into compression products and tools. It's just an SDK, so it will need to be incorporated into a product to be used. Enterprising developers can find it here.

We've also combined the "Enterprise" and "Broadcast" versions of the SDK into a single library with a single license.

Professional

Professional ("ProSDK") is the baseline version of the SDK. It's a free download with a "clickthrough" EULA that allows its no-cost use in a variety of products.

Performance

The SDK includes a variety of performance improvements, both in algorithms and in use of SIMD instructions like SSE3 and SSSE3. With equivalent settings, you'll get about a 10-15% improvement in speed over the WMP 11/Vista versions of the codec ("FSDK 11") on a recent Intel or AMD processor.

Note that different options are on by default between the two, so it's possible for an encode with FSDK 11 without any registry keys turned on to be faster than an optimized one in the ProSDK, although the quality difference will be bigger.

Quality

Quality is harder to quantify, of course, and depends heavily on the content. Using the standard PSNR (Peak Signal-to-Noise Ratio) metric, we average about 0.5 dB better with ProSDK compared to an optimized v11 encode using all the registry keys. The more perceptually focused SSIM shows improvements on the order of 3-4%. Of course, compared to the stock settings without using registry keys, the improvements will be more dramatic yet.
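For readers who want a feel for what a PSNR figure means, here is a minimal sketch of the textbook definition for 8-bit samples. It is not the measurement tool behind the numbers above; the function names `psnr` and `mse_ratio_for_gain` are made up for illustration.

```python
import math


def psnr(reference, encoded, peak=255):
    """PSNR in dB between two equal-length sequences of sample values.

    Textbook definition: 10 * log10(peak^2 / MSE). The default peak of
    255 assumes 8-bit samples.
    """
    if len(reference) != len(encoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals have no error
    return 10 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(db_gain):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10 ** (db_gain / 10)
```

Under this definition, a 0.5 dB gain means the mean squared error is smaller by a factor of about 1.12, or roughly 11% lower.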

Note that the DQuant (Differential Quantization) features in the ProSDK are different than in the registry key settings available in the v11 codec. The new DQuant is mainly targeting HD DVD and Blu-ray bitrates, and is less tuned for lower bitrates. For web bitrates, typically you'd want to have DQuant off, or on for I-frames only (making I-frames look better can dramatically improve the quality of lower-motion video, since all other frames in the GOP are based on that I-frame).

I'm just back from vacation and CES, but I'll try to get some sample encodes demonstrating the quality improvements up next week.

Elementary Streams

The SDK natively creates VC-1 elementary streams, in file form a ".vc1" (equivalent to a .m2v file in MPEG-2). This makes it possible to directly encode a file that can be muxed into an HD DVD or Blu-ray disc. Thus, the ProSDK will finally enable a wide variety of applications to add support for encoding VC-1 for optical discs. The VC-1 elementary stream can also be muxed into an MPEG-2 transport stream for IPTV use.

When the SDK is used to make a Windows Media (ASF) file, it is used in conjunction with the existing Format SDK to provide audio encoding and muxing.

Enterprise

The VC-1 Encoder Enterprise SDK ("EntSDK") now incorporates the features of what we had announced as the "Broadcast" version of the SDK in the same library, although there are different modes for the live scenarios. Enterprise will require a paid, commercial license from the software company licensing it, in part to offset the greater help required for integrating the more advanced features. Compared to Pro, there are two big areas of improvement in Enterprise: Live and Segment-based encoding.

Live Encoding

The biggest advantages in the new SDKs are probably in the live area. The basic improvements from Pro are certainly useful, plus the performance improvements make it possible to encode with higher complexity modes, further improving quality. But the new Enterprise features in the live encoding mode go much further.

Lookahead Rate Control

The main reason why 2-pass encoding produces better results than 1-pass encoding is that it enables the codec to "look into the future" and make decisions based on how the content is going to change. In the FSDK 11 codec, we introduced a "Lookahead" mode, which enabled the codec to buffer ahead up to 16 frames to make frame mode decisions, like using "BI" frames (intra-coded B-frames) for flash frames, doing better scene detection, and turning off B-frames for fades to activate VC-1's intensity compensation mode. In Enterprise, this is extended to rate control itself in the "Lookahead Rate Control" mode, letting the codec do things like start to lower the data rate a bit to save bits for a coming hard frame, or to know it's safe to spend a lot of bits on an I-frame where following frames are nearly identical.

The net effect of Lookahead Rate Control is that it brings much of the quality improvement of 2-pass encoding to live encoding, resulting in video that maintains much more consistent quality. In particular, you'll see fewer cases where a sudden change in the video results in a sudden drop in quality.
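To make the idea concrete, here is a toy sketch of lookahead bit allocation. It is not the SDK's rate control algorithm. It assumes each frame has a positive "complexity" estimate, and the function name `allocate_bits` and its parameters are hypothetical. A real encoder would also have to respect buffer constraints, which this ignores.

```python
def allocate_bits(complexities, bits_per_frame, lookahead=16):
    """Toy lookahead allocation (illustrative only).

    For each frame, look at the next `lookahead` frames (including the
    current one) and give the frame a share of that window's budget in
    proportion to its complexity. Bits saved on easy frames are carried
    forward and spent on harder frames later. Complexities are assumed
    to be positive numbers.
    """
    targets = []
    carry = 0.0  # bits saved (positive) or overspent (negative) so far
    for i, complexity in enumerate(complexities):
        window = complexities[i:i + lookahead]
        window_budget = bits_per_frame * len(window) + carry
        target = window_budget * complexity / sum(window)
        targets.append(target)
        carry += bits_per_frame - target
    return targets


# Three easy frames followed by one hard frame, 1000 bits per frame on average.
with_lookahead = allocate_bits([1, 1, 1, 5], 1000, lookahead=4)
without_lookahead = allocate_bits([1, 1, 1, 5], 1000, lookahead=1)
```

With a lookahead of 1 the sketch cannot see the hard frame coming, so every frame gets the average. With a lookahead of 4 it holds back bits on the easy frames and spends them on the hard one, while the total stays the same.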

Dynamic Complexity

One challenge in live encoding, especially at higher resolutions and frame rates, is finding settings that produce optimal quality without dropping frames with the most complex scenes. This often leads to encoding using a less aggressive complexity leaving a lot of CPU headroom most of the time in order to avoid dropping frames in the most intense sections. Additionally, different encoding modes might be more efficient in different sections of the video.

Dynamic complexity addresses this by adjusting a number of encoding settings in real time in order to hit a specified level of CPU utilization, switching features on and off to get the best quality for each section.
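The general shape of such a feedback loop can be sketched as below. This is only an illustration of the concept, not how the SDK does it: the single integer "complexity level", the 0-5 range, the 85% target, and the name `next_complexity` are all assumptions, and the real feature toggles individual encoding settings rather than one dial.

```python
def next_complexity(level, cpu_util, target=0.85, deadband=0.05,
                    min_level=0, max_level=5):
    """Pick the complexity level for the next stretch of video.

    `cpu_util` is the measured CPU utilization as a fraction (0.0-1.0).
    If the encoder is running hotter than the target, step down to avoid
    dropping frames; if there is headroom, step up for better quality.
    The deadband keeps the level from flapping around the target.
    """
    if cpu_util > target + deadband:
        return max(min_level, level - 1)
    if cpu_util < target - deadband:
        return min(max_level, level + 1)
    return level
```

For example, an encoder at level 3 that measures 99% utilization would drop to level 2, while one measuring 50% would move up to level 4.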

Segment Reencoding

Professionally encoded VC-1 for HD DVD and Blu-ray took advantage of "segment reencoding", where encoding settings like bitrate and DQuant can be changed per scene. This enables tweaking to get transparency to the source at lower bitrates, and to speed encoding by only using the slowest modes on the most challenging scenes.

Grid encoding

And of course, once you have the ability to encode each segment individually, there's no reason to encode everything on the same machine. The Enterprise SDK includes rich support for grid encoding, where different sections of a long encode can be split up between multiple machines. And now that we have 8- and 16-core machines readily available, we can even split up a single encode across multiple instances on the same system (typically using 4 cores per encoder instance, so a 16-core machine can encode four 15-minute sections of an hour-long movie simultaneously). Unlike other popular grid encoding products, the VC-1 Encoder Enterprise SDK enables full 2-pass VBR encoding support across the nodes, letting the encoding software shift bits between segments for optimal quality and efficiency.
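The arithmetic in that 16-core example can be sketched as follows. This is a simplified illustration with a hypothetical function name, `plan_segments`. It splits by time only, whereas a real tool would presumably cut at scene or GOP boundaries, and it does not model the 2-pass bit shifting between segments.

```python
def plan_segments(duration_s, total_cores, cores_per_instance=4):
    """Split an encode into one equal-length segment per encoder instance.

    Returns a list of (start_seconds, end_seconds) tuples.
    """
    instances = total_cores // cores_per_instance
    if instances < 1:
        raise ValueError("not enough cores for a single encoder instance")
    # Compute boundaries from the total so rounding error doesn't accumulate.
    bounds = [duration_s * i / instances for i in range(instances + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


# An hour-long movie on a 16-core machine, 4 cores per instance.
segments = plan_segments(3600, 16)
```

Here `segments` holds four 900-second (15-minute) ranges, one per encoder instance.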

SSE4 support for interlaced compression

There is one performance improvement in the EntSDK that's not in the ProSDK: SSE4 support for interlaced encoding. The new Intel "Penryn" processors will offer about another 15% improvement from our use of SSE4 instructions when encoding to the Advanced Profile's interlaced mode.

Conclusion

And so begins the new era of VC-1. Many vendors of encoding tools have been working with prerelease versions of the SDKs for a few months now, so anticipate product announcements soon. I'll keep you posted as products ship.

Today, the first shipping SDK products are Inlet's Fathom and Spinnaker.
