Microsoft Communities

VC-1

Posted By: Ben Waggoner | May 28th @ 11:39 PM
I've always called my annual compressionist's party something like the "Ben Waggoner Compressionist's Party," but David Sayed from the Expression Encoder team suggests the much better name of the "Secret Compressionist's Ball."

Better yet, he's got a couple of video interviews from the party! You'll see me in the background in my mildly famous purple shirt in a few shots.

First, Tony Houghton from Promoscape, talking about Silverlight.

Second, Bruce Lidl from MainConcept talking about their VC-1 codec implementation they're now licensing out, including support on Mac and Linux.

The clips are embedded at a smaller size than they're encoded at; double-click to see them in their full glory.
Posted By: Ben Waggoner | Apr 22nd @ 7:46 PM

As promised, here are the final decks for my presentations at NAB.


Encoding for the Next Generation: MPEG-2/H.264/VC-1

This session was part of the all-day Next Generation DVD track sponsored by the DVD Association.

I talked about the various options for and issues in encoding for high definition, focusing on Blu-ray but also talking about digital downloads. It's apparently become somewhat infamous, with mentions of it over at Doom9 and AVSForum. Although now that I look, they're both from the same guy...

Here's the deck: Encoding for the Next Generation.pdf


In-Depth Microsoft Silverlight

This was a three-hour media-focused overview of Silverlight and the Expression tools. It took a look at encoding and hosting video and audio assets for integration, and at incorporating those into a Silverlight Rich Interactive Application. I used a shorter version of the same presentation for the "Successfully Set Up Your Own Streaming Media Solutions in a Worship Environment" track, so I won't bother to include both.

Here's the deck: In-Depth Microsoft Silverlight.pdf

I've been blogging versions of the tutorials. So far I've got:

Tutorial 1: Default Settings (no blog post there, as it's just a demo of the pretty-good quality that EEv2 can do without changing ANY settings, automatically adapting to the 480i 16:9 source).

Tutorial 2: Movie Trailer at 2 Mbps

Tutorial 3: Encoding screen recordings (less the navigation demo in the online version)

Now I need to figure out a good .pptx to Silverlight conversion workflow.


And if you like this stuff, don't forget about my class at Stanford in June.

Posted By: Ben Waggoner | Apr 9th @ 5:23 PM
Oops, forgot to post this video clip of a VERY exhausted me at Mix talking to Expression Encoder's James Clarke about 2.0. The clip itself is actually quarter HD (960x540), so double-click on it to take it full screen.

Could be a good case study for the captioning features in Silverlight...
Posted By: Ben Waggoner | Mar 31st @ 8:20 PM

Hard to believe that NAB is starting at the end of next week!

Here's my schedule for booth duty and conferences if you'd like to swing by and say hello.

 

Encoding for the Next Generation: MPEG-2/H.264/VC-1

Saturday, April 12th 3:30 pm - 4:45 pm

Las Vegas Convention Center N252

This session is part of the all-day Next Generation DVD track sponsored by the DVD Association.

I'll be talking about the various options for and issues in encoding for high definition, focusing on Blu-ray but also talking about digital downloads.

 

Successfully Set Up Your Own Streaming Media Solutions in a Worship Environment

Sunday, April 13th 9:00 am - 5:00 pm

Las Vegas Convention Center N115

This is going to be a fun event. We're going to be demonstrating the end-to-end experience of live broadcasting for both live events and on-demand HD, emphasizing the hands-on and best practices aspects.

 

Microsoft Booth

Monday, April 14th 1:30 pm - 6:00 pm

Wednesday, April 16th, 9:00 am - 1:30 pm

I've got two shifts at the booth, and I'm always happy for more company. I'll be manning the Silverlight pod. Come on down to get some questions answered or see some demos.

 

In-Depth Microsoft Silverlight

Tuesday, April 15th, 10:00 am - 1:00 pm

Las Vegas Convention Center N256

This is a three-hour media-focused overview of Silverlight and the Expression tools. We'll take a look at encoding and hosting video and audio assets for integration, and how to incorporate those into a Silverlight Rich Interactive Application.

 

Annual Compressionist's Party

Tuesday, April 15th, evening

The Wynn

Yep, it's time again for my annual compressionist's party at NAB. And now that I work for Microsoft, the snacks are better than ever! We're not exactly sure what time and which room yet; I'll share details when we get closer. But we try to start early enough and run late enough that we're a good stop on the way to or from dinner. Drop on in and let's talk about the news of the compression world from the show.

RSVP is not required, but if you think you're coming, drop me an email so I can track a rough headcount. My email is first name period last name at Microsoft.com

Posted By: Ben Waggoner | Mar 27th @ 7:55 PM

One of the best parts of my job at Microsoft is when I can put aside the video strategy stuff and do some real-world hands-on video compression encoding for a project. My friends on the IIS team asked me to encode their new tutorials for Silverlight playback, and I thought it was a great project to illustrate the screen encoding tips I talked about a few weeks ago.

As mentioned a few weeks back, Silverlight 1.0 and 2 only support Windows Media Video 7, 8, and 9 (aka VC-1) as video codecs. We don't support the older Windows Media Video 7 and 9 Screen codecs. This is a fine thing from my perspective; it makes the install size of Silverlight smaller, and we can get better results with our current VC-1 implementation than we can out of the screen codecs. This is because a modern OS interface like Vista's Aero Glass or Mac OS X 10.5 uses a lot of gradients and transparencies that older screen codecs don't handle efficiently, but that match much more closely the kind of video image that VC-1 is designed for.

So, using the beta of Expression Encoder 2, which incorporates the new VC-1 Encoder SDK, let me show a real world project delivering in VC-1 for screen captures.

Goal

The job was to provide a series of source clips demonstrating common tasks in the new IIS 7. Previous screen recordings the team had done used the Windows Media Video 9 Screen and Windows Media Audio 9 Voice codecs with a total bitrate of 500 Kbps for 1024x768, 5 frames per second. There were apparent artifacts in both video and audio, although the content itself was comprehensible. I wanted to reduce the total bitrate to 400 Kbps, while tripling the frame rate to 15 fps and largely eliminating apparent video or audio issues.

Additionally, I also wanted to make files with specs to stream off Silverlight Streaming, which recommends a max peak bitrate of 1400 Kbps. So the total of my peak of video and audio needed to be no more than 1400.

Source

The source had been recorded in Techsmith's Camtasia Studio product, which captures screen activity live to an .AVI file using their lossless video codec. Camtasia does a great job of this kind of screen recording; something like the HDMI to HD SDI I used for my previous Expression Encoder 1.0 training would have been serious overkill for this low-motion lower resolution content, and forced an extra color conversion step.

The tech spec for all the files was:

  • Video: 1024x768 15 fps
  • Audio: 44.1 KHz 16-bit stereo

Encoding Settings

IIS_encode_settings

Video Settings

  • Frame Rate: Source. VC-1 is extremely efficient, so we can increase the frame rate from the typical 5 to the full 15 that were originally captured
  • Key frame interval: 20 seconds. This is an unusually high setting, but critical to keeping our bitrate down. Since screen recordings often have long sequences without any dramatic changes in the video, it's pretty common for the B- and P-frames to be tiny, and for I-frames to make up the majority of the total bandwidth. So if you wind up with too-frequent I-frames, the codec spends a ton of bits repeating the same static parts of the frame, leaving it unable to spend those bits on other parts of the image. The normal drawback of long gaps between I-frames is slow random access. However, random access is really a matter of how many P-frames there are between I-frames (as B-frames can be skipped during decoding since no frame references them). Thus, increasing the number of B-frames between P-frames improves random access. Since we'll be using 4 B-frames as you'll see below, only 1 out of 5 frames between I-frames is a P-frame, giving us a max of 60 P-frames between I-frames (15 fps, of which 3 can be P-frames, over 20 seconds between I-frames). So, we'll have about the same random access performance as if we'd encoded at 30 fps with the standard 1 B-frame and a max 4-second keyframe interval (30 fps, of which 15 can be P-frames, over 4 seconds between I-frames).
  • Profile: VC-1 Advanced Profile, so we can use the I-frame DQuant feature below. For Silverlight 1 (which is progressive-scan only) the lack of I-frame DQuant is the only disadvantage to Main Profile compared to Advanced Profile.
  • Mode: VBR peak constrained, so we can specify both an average bitrate (to control file size) and a peak (to make sure it fits within the Silverlight Streaming 1400 Kbps maximum). VBR peak constrained is always a 2-pass encoding process, which we also want in order for the codec to be able to do optimal bitrate distribution over this file with highly variable complexity
  • Bitrate (Average): 350 Kbps, leaving us with 50 Kbps to use on audio.
  • Peak Bitrate: 1300 Kbps, leaving another 100 for audio peak.
  • Buffer Size: 5. I stuck with the default, which is fine for VBR at this bitrate. Bigger would give the codec a little more flexibility to move bits around, but could make playback over the web a little more touchy on slower connections.
  • Width and Height: 1024x768, matching the source.
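
The random access arithmetic in the key frame interval bullet can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes a fixed frame pattern where exactly one frame in every (B-frames + 1) is a P-frame, ignoring any extra I-frames that scene change detection might insert.

```python
def max_p_frames_per_gop(fps, b_frames, keyframe_seconds):
    """Upper bound on P-frames between two I-frames.

    Assumes a fixed pattern: with `b_frames` B-frames between each pair of
    reference frames, one frame in every (b_frames + 1) is a P-frame.
    """
    p_frames_per_second = fps / (b_frames + 1)
    return p_frames_per_second * keyframe_seconds


# Settings used for the IIS screen recordings.
screen = max_p_frames_per_gop(fps=15, b_frames=4, keyframe_seconds=20)

# The "standard" comparison case from the text.
typical = max_p_frames_per_gop(fps=30, b_frames=1, keyframe_seconds=4)

print(screen, typical)
```

Both cases come out to the same number of P-frames, which is why the 20-second interval doesn't cost more in random access than a conventional 4-second one.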

Audio Settings

  • Codec: WMA. While Silverlight 2 adds support for Windows Media Audio 10 Professional, it isn't supported in Silverlight 1.0, which we wanted to use for this demo. We'll stick with good old WMA for maximum backwards compatibility.
  • Mode: VBR. Again, so the codec will distribute bits optimally throughout the piece, saving bits from pauses and spending them on harder bits of content.
  • Bitrate: 48 Kbps. This is the lowest supported bitrate for WMA in VBR mode. I could go lower with CBR, but there are often some high-frequency artifacts in WMA CBR @ 32 Kbps and below for voice that I find annoying, so I'd rather have overkill with VBR @ 48 Kbps.
  • Sample Rate: 44.1 KHz. Silverlight's internal sound engine runs at 44.1, so I recommend encoding audio to that to avoid an unneeded sample rate conversion. In this case, it also matches the source.
  • Bits per sample: 16, the only option with WMA. I'd use it anyway, as it matches the source.
  • Channels: Stereo, the only option with VBR WMA. WMA will intelligently encode the audio only once when it's identical in both channels, so it's safe to encode a mainly mono mix like this as stereo without a risk of inefficiency. The source in this case is nominally stereo, but is a mono mix.
  • Audio Peak Bitrate: 96, to add to the 1300 for video and to keep us under the 1400 Kbps max for Silverlight Streaming. That's plenty for voice content.
  • Audio Peak Buffer Size: 1.5. This default is nearly always fine.
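
A quick way to sanity-check the bitrate budget is to add up the video and audio numbers and compare them against the two limits from the Goal section. The sketch below just restates the settings above; the variable names are made up for illustration.

```python
# Limits from the Goal section.
TARGET_TOTAL_AVERAGE_KBPS = 400    # desired total average bitrate
STREAMING_PEAK_LIMIT_KBPS = 1400   # Silverlight Streaming recommended max peak

# Settings chosen above.
video = {"average": 350, "peak": 1300}
audio = {"average": 48, "peak": 96}

total_average = video["average"] + audio["average"]
total_peak = video["peak"] + audio["peak"]

print(f"average: {total_average} Kbps (target {TARGET_TOTAL_AVERAGE_KBPS})")
print(f"peak:    {total_peak} Kbps (limit {STREAMING_PEAK_LIMIT_KBPS})")

average_ok = total_average <= TARGET_TOTAL_AVERAGE_KBPS
peak_ok = total_peak <= STREAMING_PEAK_LIMIT_KBPS
```

Both totals land just under their limits, with only a few Kbps of headroom on each.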

Advanced Codec Settings

 IIS_advanced_settings

  •  Video Complexity: Normal (3). The default is just fine for simple motion like in screen recordings. Higher values are mainly useful with lots of differing motion in fine details, like with film grain or particle effects. I probably could have gotten away with lower without much drop in quality for this content.

Perceptual Options

  • Adaptive Deadzone: Off. This is good for preserving some coarse texture like film grain, but we don't have any textures we want to preserve here - it's pretty much flat areas, gradients, and fine details like font edges.

  • DQuant: I-Frames Only. DQuant is short for Differential Quantization, where the codec is able to vary the degree of compression (quantization) per macroblock (16x16 block of pixels) in the frame. The DQuant implementation in the VC-1 Encoder SDK used in Expression Encoder 2 looks for areas of smoother texture and then compresses them less. This implementation is much more aggressive than the one that shipped with Format SDK 11, and isn't appropriate for most low-bitrate encoding. But for screen captures, using it just for I-frames (which can be as rare as 1 for every 60 P-frames, as we determined above) can improve the quality of the I-frames without taking too many bits away from the other frames. And by establishing a very clean reference frame, the following frames based on the I-frame, or based on a frame based on the I-frame, start from a near-perfect copy of the screen image. This reduces the common effect in older codecs where the image can be soft or blocky after a scene change, with the quality improving over the next few frames even though the original image didn't have that change.

Filters

  • In-Loop: On. The In-Loop deblocking filter softens areas where a compression artifact would otherwise be visible, and then predicts future frames on that improved version. This always helps quality at Silverlight bitrates, and I recommend it always be on as long as a low-powered device like a cellphone isn't being targeted; it does slightly increase CPU requirements for playback.

  • Overlap: On. The Overlap filter further softens potential artifacts. Since Silverlight doesn't have the postprocessing modes of Windows Media Player, the Overlap filter is good to have on at typical Silverlight bitrates. It's more of a brute-force filter than the In-Loop filter, and can soften the image a bit at high bitrates.

  • Denoise: Off. Source isn't noisy.

  • Noise Edge Removal: Off. No noisy edges

Group of Pictures

  • B-Frame Number: 4. We get two things out of using this instead of the normal 1 with screen recordings. First, it helps improve compression efficiency, given the very simple motion in screen recordings. A B-frame can be based on the previous and/or next I- or P-frame, but not on another B-frame. With content like film or video that has some random noise in it, too many B-frames hurt quality since a B-frame can be so temporally separate from its reference frames. But a Camtasia screen recording is pixel-perfect, without any random noise. So we actually get an improvement in efficiency. Also, the greater number of B-frames lets us push up the interval between keyframes without hurting random access (as mentioned above), further improving efficiency. Going from a keyframe every 5 seconds with 1 B-frame to a keyframe every 20 seconds with 4 B-frames, I was able to get better quality at 350 Kbps than I was getting at 600 Kbps before.

  • Scene Change Detection: Always have this on. It will automatically insert an I-frame at cuts, improving compression efficiency and random access.

  • Adaptive GOP: On. Always have this on. It tells the codec not to insert I-frames at regular intervals as defined by "Keyframe every" but just to treat that as a maximum distance between I-frames. This helps efficiency quite a bit.

  • Closed GOP: No. Always have this off. Closed GOP makes editing easier (which we're not going to do) but hurts efficiency slightly.
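
To make the random access argument concrete, here is a rough sketch of how many frames a player has to decode to land on a given frame. It is a simplified model, not how any particular decoder is implemented: it assumes a fixed display-order pattern (frame 0 is the I-frame, then every (B-frames + 1)th frame is a P-frame), no extra I-frames from scene change detection, and that a B-frame needs the reference frame on each side of it. The function name is hypothetical.

```python
def frames_to_decode(target, b_frames):
    """Frames that must be decoded to show frame `target` of a GOP.

    `target` is a display-order index, with 0 being the I-frame.
    """
    spacing = b_frames + 1
    # The I-frame plus every P-frame at or before the target.
    references_before = target // spacing + 1
    if target % spacing == 0:
        # The target is itself an I- or P-frame.
        return references_before
    # The target is a B-frame: add the following reference frame
    # and the B-frame itself.
    return references_before + 2


# Worst case in a 20-second GOP at 15 fps (300 frames, last index 299).
with_4_b_frames = frames_to_decode(299, b_frames=4)
with_no_b_frames = frames_to_decode(299, b_frames=0)
print(with_4_b_frames, with_no_b_frames)
```

Under these assumptions, seeking to the last frame of the GOP with 4 B-frames takes roughly a fifth of the decoding work it would take with no B-frames at all.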

Motion Estimation

  • Chroma Search: Full True Chroma. Not normally needed with screen captures, but helpful in this case as the recordings were done with ClearType on. See the previous blog post about ClearType for why that's a potential problem.

  • Motion Method: SAD. The Sum of Absolute Differences is quite a bit faster than the alternate Hadamard or Adaptive modes, and perfectly good for screen recordings without any noise.

  • Search Range: Adaptive. Sometimes those dialog boxes can go pretty fast. And with 4 B-frames, each P-frame has to go back a third of a second to the previous P- or I-frame for reference. An adaptive motion search range makes sure it'll find the match if it's there.
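
The "third of a second" figure falls straight out of the frame rate and B-frame count, and it gives a feel for how far an object can travel between a P-frame and its reference. In the sketch below the drag speed is a made-up number purely for illustration; only the 15 fps and 4 B-frames come from the settings above.

```python
from fractions import Fraction


def reference_gap_seconds(fps, b_frames):
    """Time between a P-frame and the previous I- or P-frame it references."""
    return Fraction(b_frames + 1, fps)


gap = reference_gap_seconds(fps=15, b_frames=4)

# Hypothetical example: a dialog box dragged across the screen.
drag_speed_px_per_s = 600
displacement_px = drag_speed_px_per_s * gap

print(f"gap: {gap} s, displacement: {displacement_px} px")
```

A fixed, small search range could miss a jump that large, which is the case the adaptive setting is meant to cover.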

The Results

And here are the final files, embedded in Silverlight up at IIS.net. Remember to double-click on the video window to go full screen and enjoy their full glory. Beyond being a compression demo, they're pretty darn useful demos of common IIS7 activities. There will be a few more files uploaded in the next few weeks, and I'll update this post to include those.

Installing Necessary IIS7 Components on Windows Vista

Install only the components you need for your Web applications by leveraging IIS7’s modular architecture.  This tutorial will cover installing the modules necessary for serving ASP and ASP.NET pages from IIS7 in Windows Vista.

Serving New Content

More flexible deployment options let you decide exactly how you want your Web content served by IIS7.  This tutorial will cover creating your first Web site, Web application and Virtual Directory through the new IIS Manager graphical-user-interface.

Editing Configuration Files

Strongly typed schema written in clear-text XML makes IIS7 configuration files simple to read and edit.  This tutorial covers reading and setting configuration in ApplicationHost.config at the server level and Web.config files at the site and application level.

Troubleshooting Unexpected Issues

Prescriptive detailed errors, automatic failure tracing and more exposed runtime information make IIS7 the simplest and quickest Web server to troubleshoot.  This tutorial will cover debugging site and application failures with the advanced diagnostic features in IIS7.

Setting Up FastCGI for PHP

Improved performance and greater reliability for PHP applications is ensured by the new FastCGI component for IIS7 and previous versions.  This tutorial will cover installing PHP 5.2.1 and the new FastCGI component to IIS7 in Windows Vista.

Delegating Configuration to web.config Files

Distributed, file-based configuration is a powerful new feature of IIS7 that enables delegated management of Web application settings at a very granular level.  This tutorial will cover the structure of IIS and ASP.NET configuration, unlocking IIS configuration for delegation, creating and setting configuration in Web.config files and using location tags.

Using ASP.NET Forms Authentication

HTTP request processing is more integrated in IIS7 allowing ASP.NET features like Forms Authentication to process requests for non-ASP.NET content like ASP, PHP or media files.  This tutorial will cover configuring authentication settings in Web.config, adding users and roles to membership, and configuring authentication for all content types in Integrated Pipeline Mode.

Configuring SSL in IIS Manager

Enabling powerful SSL security to protect your Web applications is simpler to set up with IIS Manager and easier to deploy with self-signed certificates in IIS7.  This tutorial will cover adding self-signed certificates, creating certificates with a Certificate Authority and setting up HTTPS bindings.

Extending Web server Functionality in .NET

Building Web server add-ons and extensions is simpler and less time-consuming because IIS7 supports .NET extensibility through the IHttpModule and IHttpHandler interfaces that ASP.NET developers already know and use today.  This tutorial will cover building a .NET module starting with the Managed Module Kit, implementing the IHttpModule interface, attaching EventHandlers to pipeline events and configuring IIS7 to use the module in the request pipeline.

Improving Performance with Native Output Caching

Dramatically reduce Web application response time by leveraging native HttpCacheModule in IIS7 that stores all application outputs in Kernel mode cache.  This tutorial will cover enabling and configuring user-mode and kernel-mode caching by creating new output caching rules in config and through the IIS Manager GUI.

Posted By: Ben Waggoner | Mar 27th @ 1:37 AM

I've been playing around a bunch with screen recordings lately (as you'll see in my next blog post), and I've noticed a pretty common problem: people leaving ClearType on while recording. ClearType is cool stuff, adding improved anti-aliasing on LCD displays. But the way it works is by taking advantage of the little sub-pixel strips of Red, Green, and Blue each LCD pixel is made of. This improves detail and readability. But, as you can see below, when you zoom in you get color fringing on that small text.

Those little color fringes make the video somewhat harder to encode, since they change with the position of the text. And they also look wrong when the video is played back on a non-LCD display, and really wrong on an oddball LCD display which has a different pattern of the colored strips.

 

ClearType Font Smoothing @ 100% Zoom

ClearType_100 

 

ClearType Font Smoothing @ 800% Zoom

ClearType_800

 

Standard Font Smoothing @ 100% Zoom:

Standard_100

 

Standard Font Smoothing @ 800% Zoom:

Standard_800

 

When doing any screen captures, be it with Camtasia or via an HDMI to SDI bridge like I used for my Expression Encoder 1.0 training, make sure you have ClearType off. In Vista, you do that via the Appearance Settings control panel's Effects button.

If you wind up with source video that has been recorded with ClearType on, turning on Chroma Search can help quality some by enabling the codec to pick up on all hue/saturation changes on text as it moves around. I recommend using the "Full True Chroma" mode if using a tool based on the VC-1 Encoder SDK, like Expression Encoder 2.0.

Posted By: Ben Waggoner | Mar 6th @ 11:10 AM
And here it is. You can watch in Silverlight, or download in WMV or for iPod or Zune.
Posted By: Ben Waggoner | Jan 15th @ 11:02 AM
A few months ago I posted links to my Expression Encoder training. However, I didn't encode the original clips, and being a little obsessive I wanted to see what I could pull off myself with VC-1 for screen shots.

You can see the results of my work here.

Some tips for encoding screen shots:
  1. I-Frame DQuant. While I normally don't recommend DQuant for low bitrates with the VC-1 Encoder SDK, it makes sense to apply it to I-frames for screen recordings, since we don't need very many of them, and with so little motion, that really helps improve the quality of all frames based on the reference frame.
  2. Long GOPs (distance between keyframes) can really help efficiency, since keyframes can take up the majority of bits in the files.
  3. Use B-frames. >1 can pay off a lot in improved efficiency. And more B-frames improve random access when using long GOPs, since B-frames don't need to be decoded when jumping to a particular frame (just the previous I-frame and all P-frames between that and the current frame, plus the following P-frame for a B-Frame). I think I used 3 for these clips.
  4. Chroma Search! A full precision chroma search can pay off for colorful content like screen shots.
  5. In-Loop, but no Overlap filter. The Overlap filter softens the image some, which is better than getting blocky artifacts at low bitrates with natural images, but looks weird with crisp static content like screen shots. The In-Loop filter is okay since it only kicks in when needed.
  6. 2-pass VBR. Since the complexity of screen recordings varies so much, doing an analysis pass as well as letting the codec distribute bits on its own allows for big momentary spikes for quality, while letting the average bitrate stay low.
  7. Quality 90. Using Quality 90, the codec can dip down to QP (Quantization Parameter) 1, letting our reference frames be really crisp, reducing the bits later frames need. A good thing. In the VC-1 Encoder SDK, we can manually set the min and max QP as different parameters, but that feature was added after I did these clips.
  8. Don't be afraid of low bitrates. Using the above, you need a lot fewer bits/pixel for screen recordings than for typical motion content.
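
To put a number on "low," bits per pixel is just the bitrate divided by the pixels pushed per second. The sketch below plugs in the figures from the IIS tutorial encode described in the later post (1024x768, 15 fps, 350 Kbps average video), not the clips from this post, and it assumes 1 Kbps means 1000 bits per second. The function name is made up for illustration.

```python
def bits_per_pixel(bitrate_kbps, width, height, fps):
    """Average bits spent per pixel per frame (assumes 1 Kbps = 1000 bps)."""
    return bitrate_kbps * 1000 / (width * height * fps)


# Figures from the IIS tutorial screen recordings.
screen_bpp = bits_per_pixel(bitrate_kbps=350, width=1024, height=768, fps=15)
print(f"{screen_bpp:.4f} bits/pixel")
```

That works out to roughly three hundredths of a bit per pixel, which only holds up because so little of the screen changes from frame to frame.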
Posted By: Ben Waggoner | Jan 14th @ 4:09 PM
Alex Zambelli has a VC-1 SDK FAQ up, saving me the trouble of doing one :).

If you have any Q's that haven't been FA'ed up there, let him know.

Posted By: Ben Waggoner | Jan 10th @ 7:49 PM

We've just released a hotfix for a recently discovered issue impacting encoding performance when doing multiple bitrates ("Intelligent Streaming"). Essentially, all the encodes happen on a single core, instead of spreading out over up to 4 cores.

http://support.microsoft.com/kb/945170

Recommended for anyone encoding to multiple bitrates. It can improve performance around 4x on a 4-core system.

Posted By: Ben Waggoner | Dec 11th, 2007 @ 1:44 AM

The "Compression with Silverlight" Live Meeting I did with Michael Scherotter has now been posted.

Check it out here at the Synergist blog.

Posted By: Ben Waggoner | Nov 28th, 2007 @ 1:53 PM
I'm doing a Silverlight Media Q&A with Michael S. Scherotter next Tuesday (Dec 4th) at 1pm PST.

Michael is going to be collecting questions, and we'll spend an hour answering them via a Live Meeting session. Forward your questions on to Michael here.
Posted By: Ben Waggoner | Oct 17th, 2007 @ 6:09 PM
David Trescot of Rhozet and I were guests on last week's episode of Derrick Freeman's "The DV Show." David talked about Carbon (a great compression product, soon to support our VC-1 Encoder SDK), and I answered questions about Windows Media, Silverlight, and the VC-1 Encoder SDK.

The podcast is archived here.
Posted By: Ben Waggoner | Oct 14th, 2007 @ 12:05 AM
Alex Zambelli has updated his invaluable WMCmd.vbs yet again. The three main new features are improved support for running multiple versions at once (great with my new 8-core Barcelona workstation), an explicit QP mode for 1-pass VBR, and refactored presets for different compression levels.

The full details are in the Readme, but here's my summary and elaboration, respectively.

For running multiple versions, the script sets the registry keys, starts the encode, and then reverts them immediately after the encode starts. This should reduce the chance of the wrong keys being set during an encode (not that I've had any problems with that in the last six months or so).

QP is a measure of how compressed the image is, with lower numbers being less compressed. For the explicit QP mode, the script now lets you specify the QP you want, instead of providing a 0-100 range and knowing the magical translation table. Why is this useful? Well, most readers probably don't have an intuitive sense of what quantization parameter they want to use, but it's there for those that do. And better yet, it gives you a chance to understand how QP works.

Reading the above, it's clear I need to do a blog post on QP and how to use it. That will reveal the mysteries of the subtle "Quality" control for WMV 1-pass CBR modes.
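
In the meantime, here is a toy illustration of why a lower QP means less compression. It is a plain uniform quantizer, not VC-1's actual quantization, and the coefficient values are invented; the point is only that a bigger step size throws away more detail and leaves more zeros, which are cheap to store.

```python
def quantize(coefficients, qp):
    """Snap each value to the nearest multiple of the step size `qp`."""
    return [round(c / qp) * qp for c in coefficients]


def total_error(coefficients, qp):
    """Sum of absolute differences between the original and quantized values."""
    quantized = quantize(coefficients, qp)
    return sum(abs(c - q) for c, q in zip(coefficients, quantized))


def nonzero_count(coefficients, qp):
    """How many values survive quantization as something other than zero."""
    return sum(1 for q in quantize(coefficients, qp) if q != 0)


# Invented sample values standing in for one block's transform coefficients.
coefficients = [52, -13, 7, 3, -1]

for qp in (1, 2, 8):
    print(qp, quantize(coefficients, qp), total_error(coefficients, qp))
```

At a step size of 1 nothing is lost; as the step grows, the error goes up and more of the small values collapse to zero.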

Lastly, we have preset refactoring, where Alex has cleaned up what combination of settings get applied for different targets for encode time. You can think of these as an extension of the old "Complexity" slider, applying yet more options and getting better results overall. We'll be sharing these recommendations to vendors using our VC-1 Encoder SDK. These new modes are worth some detailed discussion:

fast: Up to 1.5x faster than default with comparable quality.
-v_complexity 2
-v_bframedist 1
-v_lookahead 16
-v_loopfilter 1
-v_overlap 1


Even for the fastest mode, we don't mess with Complexity 1 (the live default in Windows Media Encoder, but very rarely needed even for live encoding on a modern system). And we can use features that help quality a lot without much CPU hit like B-Frames and Lookahead. For any Main or Advanced Profile encode, B-Frames are almost always a big plus. And Lookahead should be used for all 1-pass encodes (there's no downside to having it set for 2-pass encodes; it's ignored).

good: Up to 1.5x slower than default.
-v_complexity 3
-v_bframedist 1
-v_lookahead 16
-v_loopfilter 1


A little higher complexity, and Overlap is off. Overlap causes the image to get softer, so ideally it won't be needed. But for aggressive bitrates, it might be needed with any preset.

better: Up to 2.5x slower than default.
-v_complexity 3
-v_bframedist 1
-v_lookahead 16
-v_loopfilter 1
-v_mslevel 1
-v_msrange 0

We add Integer Chroma Search which can help a lot with animation and motion graphics, and adaptive motion search range, which helps with higher resolutions and higher motion.

best: Up to 4.5x slower than default.
-v_complexity 5
-v_bframedist 1
-v_lookahead 16
-v_loopfilter 1
-v_msrange 0

Complexity jumps from 3 to 5. MSLevel isn't specified because Complexity 5 is a little unique: it has hardcoded amounts of both chroma search and Hadamard motion match that can't otherwise be specified. The nice thing about Complexity 5 is that it can provide some of the quality gains of using registry keys for machines where those can't be set. However, it doesn't set B-frames or Lookahead, so "better" would generally look better and encode faster than a default "Complexity 5."

insane: The slowest and highest quality preset.
-v_complexity 4
-v_bframedist 1
-v_lookahead 16
-v_loopfilter 1
-v_mslevel 2
-v_msrange 0
-v_mmatch 0

And lastly, Insane. Note this goes back down to Complexity 4, which allows us to specify a Full Chroma Search and adaptive SAD/Hadamard Motion Match. This is both better and slower than Complexity 5.

And the above is what I used for most of my encodes, personally.

I'll sometimes use what I think of as "Hyper Insane," which is turning -v_numthreads down to 1 for a very slight further improvement. Also, four single-threaded encodes are faster than a single 4-thread encode on the same hardware. That's why I wind up using multiple instances so much: for a huge batch of files, I'll be done sooner, with slightly better results, running 4 simultaneous single-threaded encodes.
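
For anyone scripting around these presets, the flag lists above can be kept in one table and expanded on demand. This is just a sketch in Python: the flags and values are copied from the lists in this post, but the `PRESETS` table, the `preset_args` helper, and the `single_thread` option are made up for illustration and aren't part of WMCmd.vbs itself.

```python
# Flag values copied from the preset lists in this post.
PRESETS = {
    "fast": {"v_complexity": 2, "v_bframedist": 1, "v_lookahead": 16,
             "v_loopfilter": 1, "v_overlap": 1},
    "good": {"v_complexity": 3, "v_bframedist": 1, "v_lookahead": 16,
             "v_loopfilter": 1},
    "better": {"v_complexity": 3, "v_bframedist": 1, "v_lookahead": 16,
               "v_loopfilter": 1, "v_mslevel": 1, "v_msrange": 0},
    "best": {"v_complexity": 5, "v_bframedist": 1, "v_lookahead": 16,
             "v_loopfilter": 1, "v_msrange": 0},
    "insane": {"v_complexity": 4, "v_bframedist": 1, "v_lookahead": 16,
               "v_loopfilter": 1, "v_mslevel": 2, "v_msrange": 0,
               "v_mmatch": 0},
}


def preset_args(name, single_thread=False):
    """Expand a preset name into a flat list of command-line arguments."""
    args = []
    for flag, value in PRESETS[name].items():
        args += [f"-{flag}", str(value)]
    if single_thread:
        # The "Hyper Insane" tweak: force a single encoding thread.
        args += ["-v_numthreads", "1"]
    return args


print(" ".join(preset_args("insane", single_thread=True)))
```

The resulting list could be appended to whatever command line launches the script; the input and output arguments are left out here on purpose.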

 

Posted By: Ben Waggoner | Sep 8th, 2007 @ 8:56 PM
A couple of months ago the Expression Encoder team invited me to record their official training videos, produced by Total Training. They're now online for free.

It was an interesting production process. I've done a number of training titles over the years, including for Class On Demand, and this is the first time I've been able to do it in HD with digital capture. The workflow was simple: HDMI out of my Toshiba G35 laptop into a Blackmagic Design Intensity HDMI capture card. Full 1280x720 60 fps capture, straight into an NLE.

Unfortunately, I didn't get a chance to encode it myself - the cobbler's children don't have shoes and all that. It was encoded with an older version of our codec without all the usual tweaks. But you can certainly get a feel of this very exciting product. I'll try to get the source and redo it with our new VC-1 Encoder SDK - it'd look a lot better.

Here are the individual files (linked to from the Expression training videos page):

Introduction to Expression Encoder
Enhancing Media
Metadata, Markers, & Silverlight
Live Production
Advanced Expression Encoder

I haven't been on camera enough to get used to watching and in particular hearing myself; to my ear I sound like an overcaffeinated Jim Henson.

UPDATE: I reencoded the clips myself, details here.
Posted By: Ben Waggoner | Sep 8th, 2007 @ 2:36 PM
The birth of the VC-1 Encoder SDKs will reduce the need for these over the next few months, but Alex has updated his WMV PowerToy and also revised our documentation about the registry key options.

Here's the new PowerToy. It mainly removes a few options that we determined weren't in the Format SDK 11 implementation, particularly adaptive chroma search and default adaptive deadzone.

And here's the new, hopefully final registry key documentation, reflecting the above and other useful tidbits we've learned.
Posted By: Ben Waggoner | Sep 7th, 2007 @ 10:19 AM
I've had a number of requests for how to find me and Microsoft at IBC, so here are the details.

First, Microsoft is in its own space - the Topaz Lounge (2nd floor above the IPTV area). Beyond the codec team, we also have big presences from Silverlight, Mediaroom, and the Interactive Media Manager. We have a bar here with complimentary drinks and coffee, and I hope something good will happen with those beer taps later.

The big news for my team is of course the launch of our new VC-1 Encoder SDK. This is an improved VC-1 implementation for all markets (Windows Media, IPTV, HD disc, mobile, etcetera) that'll be incorporated into third party products. Rhozet, Inlet, and Envivio are demoing support for it in their products here at the show. More details are available at
www.microsoft.com/resources/mediaandentertainment/ibc2007/vc-1encodersdk.mspx

I'll be presenting our new VC-1 Encoder at the booth on the below schedule. Feel free to drop by to say hello, and chat about anything digital media related.

Friday: 3:00 pm - 6:00 pm
Saturday: 1:30 pm - 6:00 pm
Sunday: 1:30 pm - 6:00 pm
Monday: 9:30 am - 1:30 pm
Tuesday: 9:30 am - 12:30 pm
We're also going to be doing a daily hour-long presentation of the VC-1 Encoder program. That schedule is:

Friday: 4:30 pm
Saturday: 10:00 am (I'll be giving this one personally)
Sunday: 4:30 pm
Monday: 10:00 am
Tuesday: 11:00 am

We've got a ton of great news - I wish I was going to have more time to blog before next week, but for those of you here in person, I'm glad to discuss it 1:1, and show you the impressive results. We've been blowing people away with our 720p @ 5 Mbps for IPTV demos.
Posted By: Ben Waggoner | Sep 6th, 2007 @ 7:38 PM

It's been a long road, but today was the big day for Silverlight to hit the world. Scott Guthrie's Blog has got the best detailed summary of the ton of stuff that got announced today. There are four big things relevant to digital media I want to highlight before I hop on the plane for IBC. I'll be focusing on Silverlight and its scenarios on the blog for the rest of this month.

Also, drop by the Microsoft booth at IBC if you'd like to say hello and see some demos of both our new codec technologies and Silverlight.

Silverlight 1.0 is out!

The next generation web plugin for digital media is now a released 1.0, so non-early adopters can start installing it and content creators and developers can start authoring to it. Silverlight uses Windows Media for its digital media, so a huge library of compatible content is immediately available. And the current CDN ecosystem for hosting and streaming all works, using Windows Media Services as shipping. Silverlight contains its own implementation of the codecs inside itself, and so doesn't have any dependency on WMP or other OS features, so you'll get an identical experience wherever Silverlight is installed. The set of codecs supported in 1.0 are:

  • Windows Media Video 7
  • Windows Media Video 8
  • Windows Media Video 9
  • Windows Media Video 10 (progressive only)
  • Windows Media Audio "Standard" (tested with WMA 9.2 back to WMA 7 files, but the bitstream has been locked-down since WMA 2)
  • MP3 (in a .mp3 file)

Full details on media features are here.

Silverlight includes robust auto-update features, so support for additional codecs is certainly possible in the future. We'd appreciate any feedback on missing codecs and media features that are keeping anyone from adopting Silverlight today.

Expression Encoder 1.0 is out!

Expression Encoder (formerly Expression Media Encoder) is a new compression tool targeting Silverlight (although its WMV files will play in all the standard players). Beyond being a great WMV encoder, its real killer feature is easy generation of a Silverlight player experience around the media asset, including subtitles/captioning, thumbnail-based visual navigation, etcetera. It's a deep product, and I'm going to be doing a much more in-depth analysis of it here soon.

One important thing to note about Expression Encoder is that it can be used to build the Silverlight player experience around an already encoded WMV file, so it can be used in conjunction with other compression tools.

Expression Encoder is based on the Windows Media Format SDK 11, so all the registry key tweaks I've been discussing on this blog work perfectly with it.

Silverlight for Linux is announced!

Good details on Miguel de Icaza's web log. In essence, we're partnering with Novell on their Moonlight implementation of Silverlight 1.0 and 1.1 (only 1.1 had been announced before). They'll be providing everything but the media codecs, which will be built by my team and provided via Microsoft.

So, Silverlight today has great reach for Windows and Mac (Intel and PowerPC), with Linux on the way.

Lots of content support!

For any new platform, the chicken-and-egg question is always paramount when discussing adoption - why should users install before there's content, and why should content publishers target it before users have installed it? That we're already compatible with a huge volume of WMV content is a big help here, of course. But nothing beats high-profile content companies targeting the format to get the installs out there (which is also a great validation of the unique value Silverlight provides).

Customers already deploying content include: MLB.com (Major League Baseball), Home Shopping Network, World Wrestling Entertainment, and the "Entertainment Tonight" show.

Silverlight is also now deployed on several Microsoft sites, including the Halo 3 preview site (including the HD version - I worked on the encoding workflow for these assets), Tafiti.com, MSN Extra, and MSN Podium '08. Silverlight will also be used in many other new and updated Microsoft properties.

Posted By: Ben Waggoner | Aug 3rd, 2007 @ 3:20 AM

Ancient History

Compression, although an obsession with me since I was 19, didn't appear to be a career option until many years after that. My years at Hampshire College were spent essentially majoring in neuropsychology, minoring in computer science, and spending my evening and weekends helping out my film student buddies. It all seemed hopelessly random to my parents and advisors, but turned out to be the perfect background for what I do now (after all, what's compression but extremely applied neuropsychology?).

After college, with a couple of science internships under my belt, I decided I didn't want to spend my life writing grant proposals or doing lab work, so I started a video production company with my friends, including my recent interviewer Halstead York. The plan was to use emerging technology to be able to produce and post independent films from our own scripts. We thought we had a financing deal lined up back in 1994, and purchased an NLE (a PowerMac 8100/80 with a Radius VideoVision card and a 4GB SledgeHammer RAID) for doing video editing. The idea was we could rent it out before and after post in order to cover some of the costs. Then there were two big problems:

  1. The infamous defective BART chip in those early PowerMacs meant it couldn't keep sync for more than a few minutes.
  2. Our financing fell through.

So, there we were, with a script, no money, a bunch of debt, and an NLE that couldn't edit video. However, we found a nice market using shorter clips with looser sync requirements: CD-ROM video! And so we were launched in the heady early days of multimedia. Journeyman Digital was a full service production company for digital media, and we did all the screenwriting, production, and post that we dreamed of, but not for our own projects. But we kept writing screenplays on the side. We got as far as a few meetings with Sony Pictures on one, but like nearly all screenplays, nothing really happened in the end. And while I liked doing the work, when it came down to the fundamental gut check of moving to LA and rolling the dice, I didn't NEED to do it. Instead I got married and soon enough had three little kids, and rather ran out of time for side projects.

Halstead is only recently married and currently kidless, and had time. So he and many members of the old gang dusted off one of our old screenplays, Temporary Insanity, and darn if they didn't actually shoot the whole thing in HD! Halstead just finished up the trailer. Quite an experience seeing jokes I wrote a decade ago there on the screen. And it's amazing to see how it's finally possible to make movies on a hobbyist's budget, even with high-end techniques. Check out this post on color correction in the home office.

I didn't have time to work on the production itself (I was busy having that third child get born and joining Microsoft), but I certainly wasn't going to let anyone else compress the trailers (now available for download)!

The project

And so, after all that ramble, we're back to talking about hands-on compression.

Halstead had a pretty typical 2x2 matrix for encoding: two formats at two data rates each:

Formats

  • MPEG-4 compatible with QuickTime/AppleTV/iPod
  • Windows Media compatible with Windows Media Player/Flip4Mac/Xbox/Zune/Silverlight

Data rates

  • 3 Mbps for a 720p30 HD version compatible with Xbox360/AppleTV
  • 300 Kbps for a low data rate download, which would also be portable media player compatible (iPod for .mp4, Zune for .wmv)

Workflow

The source was provided as a 720p30 .AVI file using the CineForm Aspect HD codec. It was video-only - audio was provided in a separate .wav file.

HD WMV encoding was easy - I was able to use the source as is. And the current WMCmd.vbs supports specifying a separate .wav file as source for the audio track.

HD .MOV was harder. I wanted to use QuickTime's H.264 encoder to output, since it uses a complexity-constrained mode that is well tuned for computer playback via QuickTime, on both Intel and PPC (and there's a lot of G4 PowerBooks out there among Indie film fans). While it won't offer the same compression efficiency as a highly-tuned H.264 encoder from another vendor, it'll also play back well on more machines.

However, QuickTime, even QuickTime for Windows, can't read AVI files using the standard DirectShow API! Now that we've added support for the QuickTime API in Expression Media Encoder, it's only fair for Apple to support DirectShow now :). So, I used Rhozet Carbon to encode my .avi and .wav source files into a single Photo-JPEG compressed .MOV file that QuickTime could then read (believe it or not, there's no lossless Y'CbCr 4:2:0 encoder in QuickTime for Windows). I wound up doing that compression on my G5, so I could do it in parallel with the WMV encoding on my Windows box.

For the mobile versions, I used VirtualDub to make me a nice 320x180 version of the .AVI and Carbon again to make a 320x180 JPEG .mov.

An alternative (and what I would have done if this was going to be a high-volume process and not just a one-off) would be to use Carbon to encode all four outputs from the single source. Also, using the "multipass" mode with Carbon and tools other than QuickTime Player Pro itself results in very, very slow rendering time, since it reruns preprocessing for the entire clip for each pass, although only a small part of the file might be adjusted per pass. So in a high-volume workflow, probably only the 1-pass mode would have been used.

Windows Media Settings

WMV HD @ 3 Mbps:

cscript "C:\Program Files\Windows Media Components\Encoder\WMCmd.vbs" -input "G:\Temp Insanity\Trailer 1 timed v5 720.avi" -output "Trailer 1 720p 3M 192.wmv" -a_input "G:\Temp Insanity\Trailer 1.wav" -a_codec WMASTD -a_mode 4 -a_setting 128_48_2 -v_codec WVC1 -v_mode 4 -v_keydist 5 -v_bitrate 2870000 -v_peakbitrate 6000000 -v_peakbuffer 4000 -v_performance 80 -v_bframedist 1 -v_dquantoption 2 -v_loopfilter 1 -v_mmatch 0 -v_mslevel 4 -v_msrange 0 -v_percopt 2

Pretty standard stuff, with the same basic settings as my previous encodes. A few items of note:

  1. There wasn't excessive vertical motion and the source is HD, so I didn't bother constraining the number of threads.
  2. Since the source was just stereo, I used WMA instead of WMA Pro, in order to preserve Silverlight 1.0 compatibility.
  3. Note the use of the -a_input flag to specify a different audio source.

WMV mobile @ 300 Kbps:

cscript "C:\Program Files\Windows Media Components\Encoder\WMCmd.vbs" -input "Trailer 1 timed v5 320x180.avi" -output "Trailer 1 280 Zune.wmv" -v_codec WMV9 -v_mode 4 -v_keydist 10 -v_bitrate 235000 -v_peakbitrate 600000 -v_peakbuffer 4000 -v_performance 80 -v_bframedist 1 -v_loopfilter 1 -v_overlap 1 -v_mmatch 0 -v_mslevel 2 -v_msrange 0 -v_percopt 2 -v_numthreads 1 -a_codec WMASTD -a_mode 4 -a_setting 48_44_2 -a_peakbitrate 160000

Pretty much identical to the Zune encoding settings I posted last week, except with lower data rates to hit the 300 Kbps total.

  1. The audio was pretty simple, so 48 Kbps was enough when using VBR mode (again VBR audio is a very underused and very useful feature for downloadable files).
  2. The data rate was so low, I went to the max and used -v_mslevel 2 (full floating point chroma search) and -v_numthreads 1 (single-thread encode). Even with those, this encoded much quicker than the HD version, since the frame size was so much smaller.
  3. Main Profile is required by Zune, and thus I can't use DQuant.
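As a rough sanity check on the two WMV command lines above, the average video and audio rates can be added up and compared with the nominal target. This is only a sketch: the function name is made up for illustration, and the leftover "headroom" is just whatever remains under the round-number target (the per-file container overhead for these encodes isn't stated here).

```python
def bitrate_headroom(target_bps, video_bps, audio_bps):
    """Bits per second left under the target after average video + audio.

    Hypothetical helper for illustration; it does not model container overhead.
    """
    return target_bps - (video_bps + audio_bps)


# Average rates copied from the -v_bitrate and -a_setting values above.
hd_headroom = bitrate_headroom(3_000_000, 2_870_000, 128_000)
mobile_headroom = bitrate_headroom(300_000, 235_000, 48_000)

print(hd_headroom)      # 2000
print(mobile_headroom)  # 17000
```

Both encodes land just under their nominal totals, which is the point of picking the video rate after the audio rate is fixed.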

QuickTime Settings

QuickTime's advanced settings aren't available via command-line, so I'll include screen shots of my MPEG-4 settings.

I matched the WMV settings as closely as appropriate.

MPEG-4 Main Profile @ 3 Mbps

image

image

image

  1. The "Current" mode passes through the source frame size and frame rate (Note it would have said 1280x720 (Current) above - I had a different source loaded when I took the screen shot).
  2. "Optimize for Download" is the equivalent of our 2-pass VBR modes. However it lacks the ability to specify a peak buffer rate or duration.
  3. QuickTime specifies keyframe rate in terms of total frames between keyframes, not total seconds.
  4. The "Better" mode for audio encoding quality is optimal for 16-bit sources. The "Best" mode only improves >16-bit sources.
  5. The Multi-pass mode improves quality, but can make encoding time very unpredictable. The WMV versions encoded quite a bit faster on a similar era machine (Dual 3.4 GHz "NetBurst" Xeon versus dual 2.0 GHz G5). My main compression box, a quad AMD, was busy doing some other work.
  6. QuickTime lacks a true 2-pass VBR audio mode. For MPEG-4 exports, I only get 1-pass CBR. With a QuickTime export, I could have gotten a 1-pass VBR encode, but only in a MP3 style "range" encode, where the final file size could vary substantially. For soundtracks in downloadable files, this makes WMA a more efficient codec.
  7. Main Profile is compatible with AppleTV, and uses B-frames. The "Extended" profile is theoretically for streaming, but it's been grayed out in QuickTime since H.264 support launched in QuickTime 7.0, and I've never seen a H.264 Extended Profile stream in the wild.
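Since WMCmd.vbs takes the keyframe distance in seconds and QuickTime wants it in frames, matching the two means a small conversion. A minimal sketch, assuming a constant frame rate; the function name is hypothetical, and the mobile clip is assumed to keep the source's 30 fps:

```python
def keyframe_interval_frames(seconds, fps):
    """Convert a keyframe distance in seconds to a frame count.

    Hypothetical helper; assumes constant frame rate.
    """
    return round(seconds * fps)


# HD encode: -v_keydist 5 on the 720p30 source.
print(keyframe_interval_frames(5, 30))   # 150

# Mobile encode: -v_keydist 10, assuming the 320x180 version is still 30 fps.
print(keyframe_interval_frames(10, 30))  # 300
```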

The mobile encode was the same, except with the lower video and audio data rate, and its use of the Baseline profile, required for iPod compatibility.

Differences

So, how did the two encodes come out?

For the most part, they both looked and sounded good (or at least accurate - the audio mix will be improved in a later version). The biggest difference was in flatter areas, especially with shadows. That's where the VC-1 Differential Quantization and Perceptual Optimization come in, plus the ability to use different block sizes (4x4, 4x8, 8x4, and 8x8), to better compress the edges and interiors of flat areas. The Baseline and Main Profiles of H.264 are limited to 4x4 blocks only, and H.264 doesn't have an equivalent mechanism to DQuant to compress flat areas of the image less.

Again, another H.264 encoder could have done a better job here, although at the cost of higher decode requirements, by using features like CABAC and multiple reference frames. High Profile, and hence 8x8 blocks, are not compatible with QuickTime's H.264 decoder, nor those in the AppleTV or iPod. The iPod-required Baseline Profile doesn't support B-frames or CABAC.

Here are some samples from the available clips that show different levels of banding. Sorry the luma levels don't quite match - it's surprisingly difficult to get exact level screen grabs out of the QuickTime and DirectShow pipelines. If anything, these minimize the banding seen in the clips when looking at them in QuickTime on a Mac (2.2 to 1.8 gamma correction issue?).

H.264:

Brown wall h264

VC-1:

Brown wall vc1

H.264:

Coffee room  h264

VC-1:

Coffee room  vc1

Posted By: Ben Waggoner | Jul 19th, 2007 @ 4:18 PM

In the pleasures of software development, the joy of removing useless features is a close second to adding useful ones. To that end, Alex Zambelli has released a minor update to WMV9 PowerToy, simplifying a few modes. First, some of the Perceptual Optimization modes that we discovered actually do the same thing have been turned into a single "Adaptive Deadzone" mode. And he's spelled out the ADZ Min Width modes explicitly. And lastly, in Main Profile mode, the DQuant options are not available, as they never did anything there anyway. Below shows you the new Main Profile constrained mode with my typical defaults:

WMV9 PowerToy 1.2.1

Since DQuant can help quality quite a bit with some content, especially gradients like shadows and skies, this is yet another good reason to be using WMV9 Advanced Profile over Main Profile. For content where only DQuant but not ADZ may have been used in AP, it can often make sense to use ADZ in Main Profile in order to reduce banding in flatter areas of the image.

Posted By: Ben Waggoner | Jul 18th, 2007 @ 1:39 PM
Part 2 of my Halstead York interview is now up, wherein we discuss VC-1, Silverlight, frame rate upsampling, and many other things. Part 3 talking about DRM (save the most contentious for last!) will be up soon.

Posted By: Ben Waggoner | Jul 16th, 2007 @ 9:11 PM

So, we've made a couple of HD clips now - how about on the other end of the size spectrum? Let's talk about compressing for the Zune. As usual, we'll use the Elephant's Dream source.

Zune.net has some reasonably detailed compression recommendations for targeting the Zune, and the settings are a little different than what we've seen before. But they're really targeted for consumers. Let's target the important constraints and work back for appropriate settings:

  • Format: Windows Media Video (.wmv)
  • Video codec: Windows Media Video 9 (Simple, Main, and Advanced Profiles), Windows Media Video 9 Screen, Windows Media Video 9 Image Version 2, Windows Media Video 9 VCM.
  • Video resolution: up to 320x240 (QVGA) or 320x180 (16:9 QVGA)
  • Maximum video bit rate: 500 Kbps (recommended for the best balance of optimal battery life and video quality) up to 1.5 Mbps
  • Video peak bit rate: up to 1.5 Mbps
  • Complexity or profile: Main profile, VBR
  • Audio codec: Windows Media Audio (.wma)
  • Maximum audio bit rate: WMA Standard, CBR, 128 Kbps (recommended), up to 192 Kbps, Stereo, 44.1 kHz
  • Maximum total bit rate: 1.692 Mbps, 1.5 Mbps for peak video plus 192 Kbps for audio

    So, for Elephant's Dream, what does the above tell us? Our technical constraints to be able to sync (going beyond these would force a reencode) are:

    • WMV with WMV9 Main Profile and WMA "Standard"
    • Video at 320x180 24p
    • Audio as 44.1 stereo
    • Video with a peak rate of 1500 Kbps and audio peak of 192 Kbps

     

    So, let's look at a couple of scenarios - Maximum quality, and maximized compression efficiency

    • Maximum Efficiency
      • Video average 500 Kbps peak 1500 Kbps
      • Audio average 96 Kbps peak 192 Kbps
    • Maximized Quality
      • Video average 1000 Kbps peak 1500 Kbps
      • Audio 192 Kbps CBR
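The sync constraints and the two scenarios above can be written down as a small check. This is only a sketch of the limits as listed in this post (320x240 maximum frame, 1500 Kbps video peak, 192 Kbps audio peak); the class and function names are hypothetical and nothing here talks to a Zune or to any encoder.

```python
from dataclasses import dataclass


@dataclass
class Encode:
    width: int
    height: int
    video_avg_kbps: int
    video_peak_kbps: int
    audio_peak_kbps: int


def zune_violations(e):
    """Return a list of constraint violations (empty means it should sync).

    Limits are the ones quoted in the post, not an official device spec.
    """
    problems = []
    if e.width > 320 or e.height > 240:
        problems.append("frame size exceeds 320x240")
    if e.video_peak_kbps > 1500:
        problems.append("video peak exceeds 1500 Kbps")
    if e.video_avg_kbps > e.video_peak_kbps:
        problems.append("video average exceeds video peak")
    if e.audio_peak_kbps > 192:
        problems.append("audio peak exceeds 192 Kbps")
    return problems


max_efficiency = Encode(320, 180, 500, 1500, 192)
max_quality = Encode(320, 180, 1000, 1500, 192)

print(zune_violations(max_efficiency))  # []
print(zune_violations(max_quality))     # []
```

Anything that comes back non-empty is a setting that would force a reencode on sync.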

    We could also do CBR video as well, I suppose, but that wouldn't help quality much, and would waste quite a few bits (and joules for playback).

    In our previous examples, we used Windows Media Encoder session files and then WMCmd.vbs scripts. This time around, let's take the third portable option for settings files, and use .PRX files. A .PRX file defines the settings for an encode in a fashion supported by most compression tools that support the Format SDK. The "Windows Media Profile Editor", which creates them, is installed along with Windows Media Encoder.

     

    .PRX for the Maximum Efficiency Scenario:

    500-1500 General 500-1500 Settings

     

    And for Maximum Quality:

    1000-1500 General 1000-1500 Settings

     

    Registry Keys:

    And for registry keys (set via WMV9 PowerToy, of course)? Pretty similar to last time:

    Reg Keys

    We're Main Profile now, so DQuant doesn't apply. So unlike the 2 Mbps sample, I'm going to use Adaptive Deadzone to get better quality in the flat areas (before we discovered that DQuant without Perceptual was better). Since encoding at 320x180 is so fast, we can turn Chroma Search up to max. My rule of thumb is at most one thread per 64 pixels of frame height, so at 180 pixels high we would want at most 2 threads. But it'll be a little more efficient to use 1.
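The thread rule of thumb, read as a ceiling of one thread per 64 rows of frame height, is simple enough to sketch. The function name is hypothetical, and this is just the arithmetic from the paragraph above, not anything the encoder itself reports:

```python
def max_threads(frame_height, rows_per_thread=64):
    """Upper bound on encoder threads under the 64-rows-per-thread rule of thumb."""
    return max(1, frame_height // rows_per_thread)


print(max_threads(180))  # 2
print(max_threads(720))  # 11
```

The result is only a ceiling; as noted above, a single thread can still be the more efficient choice at small frame sizes.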

     

    Looking at the final results, both look and sound pretty good, with an edge to the higher bitrate, of course. More challenging content would benefit more from the higher rates.

    The biggest challenge in the encode is simply going from 1920x1080 down to 320x180 - that's a 36:1 reduction in pixels. The action holds up nicely, but the credits are pretty illegible due to scaling. If I were doing this content for distribution, I would have cropped the left/right more aggressively for the credits, eliminating the black bars on the left and right in order to keep the center of the image larger. Cropping 240 L/R and 135 T/B would do the trick, albeit eliminating some of the groovy animated graphics in the corners. But for that, I leave the proof as an exercise for the reader...
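For anyone taking up the exercise, the scaling arithmetic can be checked quickly. This sketch assumes the crop figures are per side (240 pixels off each of left and right, 135 off each of top and bottom); the helper names are made up for illustration.

```python
from fractions import Fraction


def cropped_size(width, height, crop_lr, crop_tb):
    """Frame size after removing crop_lr from each side and crop_tb from top and bottom."""
    return width - 2 * crop_lr, height - 2 * crop_tb


def pixel_reduction(src, dst):
    """Ratio of source pixel count to destination pixel count."""
    return Fraction(src[0] * src[1], dst[0] * dst[1])


full = (1920, 1080)
target = (320, 180)
cropped = cropped_size(1920, 1080, 240, 135)

print(cropped)                                  # (1440, 810)
print(Fraction(*cropped) == Fraction(16, 9))    # True
print(pixel_reduction(full, target))            # 36
print(float(pixel_reduction(cropped, target)))  # 20.25
```

Under that assumption the crop keeps the 16:9 shape, so no letterboxing is needed, and the credits are shrunk by 4.5x per axis instead of 6x.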

     
    EDIT EDIT: Sorry about the bad link again - here it is fixed: Direct link to the .zip file

    Posted By: Ben Waggoner | Jul 16th, 2007 @ 3:48 PM
    My old colleague Halstead York just interviewed me for his blog. Part 1, focusing on HD DVD is up now.

    Part 1 of Ben Waggoner/Halstead York interview

    There will be two more parts coming over the next week or two, focusing on Silverlight and DRM.
    Posted By: Ben Waggoner | Jun 18th, 2007 @ 4:55 PM
    UPDATE: Didn't work. Try this link

    Sorry this took so long, but here it is:

    The link from this page instead

    Click to play, right-click to download.

    Comments appreciated.
    Posted By: Ben Waggoner | Jun 11th, 2007 @ 2:29 AM
    Rumor has it that ABC.com is going to be offering a 2 Mbps HD download service. I don't know about that rumor, but it sparked some conversations about what kind of quality can be delivered in 2 Mbps. Back again with Elephant's Dream, I took a whack at it. I hope y'all don't get sick of this source - there's not much that's widely available as source to the general public, and I like to use examples that readers can replicate and play around with on their own. I'll probably keep working with this clip until people tell me they're sick of it.

    So, to mix things up a bit, we'll use Alex Zambelli's WMCmd.vbs script for encoding this time. The nice thing about the script is that I can script all the encoding parameters instead of having to juggle WMV9 PowerToy and a compression tool. The script I wound up using was:

    cscript "C:\Program Files\Windows Media Components\Encoder\WMCmd.vbs" -input "G:\Elephant's Dream\ED Lag 1280x720.avi" -output "G:\Elephant's Dream\ED 720p 2M.wmv" -a_codec WMAPRO -a_mode 4 -a_peakbitrate 384000 -a_setting 128_44_6_16 -v_codec WVC1 -v_mode 4 -v_keydist 6 -v_bitrate 1863000 -v_peakbitrate 8000000 -v_peakbuffer 4000 -v_performance 80 -v_bframedist 2 -v_dquantoption 2 -v_dquantstrength 4 -v_loopfilter 1 -v_overlap 1 -v_mmatch 0 -v_mslevel 2 -v_msrange 0 -v_numthreads 1

    So, what does the above mean?

    -a_codec WMAPRO -a_mode 4 -a_peakbitrate 384000 -a_setting 128_44_6_16: This specifies the WMA Pro codec is used, in peak limited 2-pass VBR mode, 5.1 44.1 kHz 16-bit, with 128 Kbps average and a peak bitrate of 384 Kbps. I think WMA Pro is one of the overlooked parts of Windows Media - it might not have as many knobs to twirl, but WMA Pro is an extremely good codec, letting us deliver 5.1 audio at bitrates that would be low for stereo MP3. One of the unique features of WMA is that it supports 2-pass encoding modes for CBR and VBR, while most other codecs are 1-pass only. That lets us use a full 2-pass VBR for download, which is much more efficient than using CBR for audio, by shifting bits away from silent or simple passages to the hardest passages. 384K CBR typically sounds very good, so I used that as the peak.
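The -a_setting string packs several values into one token, so it can help to spell it out. Here's a minimal sketch of a decoder, based only on how the value is described above (bitrate in Kbps, sample rate, channels, optional bit depth). The function is hypothetical, and treating "48" as 48 kHz is an inference by analogy with "44" meaning 44.1 kHz.

```python
def parse_audio_setting(setting):
    """Decode an -a_setting value such as '128_44_6_16'.

    Hypothetical helper; the field order follows the description in the post.
    """
    parts = [int(p) for p in setting.split("_")]
    bitrate_kbps, rate_code, channels = parts[:3]
    # Bit depth is only present in the four-field form.
    bit_depth = parts[3] if len(parts) > 3 else None
    # "44" is shorthand for 44.1 kHz; "48" is assumed to mean 48 kHz.
    sample_rates = {44: 44100, 48: 48000}
    return {
        "bitrate_kbps": bitrate_kbps,
        "sample_rate_hz": sample_rates.get(rate_code, rate_code * 1000),
        "channels": channels,
        "bit_depth": bit_depth,
    }


print(parse_audio_setting("128_44_6_16"))
# {'bitrate_kbps': 128, 'sample_rate_hz': 44100, 'channels': 6, 'bit_depth': 16}
```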

    -v_codec WVC1 -v_mode 4 -v_keydist 6 -v_bitrate 1863000 -v_peakbitrate 8000000 -v_peakbuffer 4000: WMV9 Advanced Profile in peak limited 2-pass VBR mode, with an average bitrate of 1863 Kbps and a peak of 8000 Kbps, over a buffer of 4 seconds, and a keyframe at least every 6 seconds. I picked the data rate so that it, plus the 128 Kbps for audio and 9 Kbps of file overhead, would be exactly 2000 Kbps. And yes, it's a little confusing to have peakbuffer duration be in milliseconds and keyframe rate to be in seconds. I chose a keydist of 6 seconds to save a few bits over the 4 seconds I used for 1080p - lower data rate and frame size means you need fewer keyframes to provide the same random access latency. The peakbitrate and peakbuffer set the maximum bitrate - this means any 4 seconds of the file can be at most 8 Mbps average. I wanted to give the codec enough headroom to spend a lot of bits on the hardest scenes (especially running through the cables). I don't like to use -v_mode 3 (2-