Wednesday, May 2, 2012

The Problem with HTML5 Audio/Video

HTML5 introduced the brand new <audio> and <video> tags, in recognition that the internet needs a way to deliver multimedia that doesn't depend on a resource-intensive browser plugin (i.e. Flash).  This is a good thing.  Except for one part: there are only three video codecs supported in <video> tags and three audio codecs supported in <audio> tags, and not every browser supports all of them.  That means if you want to use these tags, you have to keep your file in every format, which wastes space.
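
To make that concrete, here's roughly what that juggling looks like in practice: one <video> element with one <source> per encode, so each browser can find something it can decode.  The file names and codec strings below are just for illustration.

```typescript
// Minimal sketch: the same clip has to exist in several encodes so that
// each browser can find at least one <source> it knows how to decode.
// File names here are made up for illustration.
function buildVideo(): HTMLVideoElement {
  const video = document.createElement("video");
  video.controls = true;

  const encodes: Array<{ src: string; type: string }> = [
    { src: "clip.mp4",  type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' }, // H.264 + AAC
    { src: "clip.webm", type: 'video/webm; codecs="vp8, vorbis"' },           // VP8 + Vorbis
    { src: "clip.ogv",  type: 'video/ogg; codecs="theora, vorbis"' },         // Theora + Vorbis
  ];

  // The browser walks the <source> list in order and plays the first
  // type it supports -- which is why all three files have to be hosted.
  for (const { src, type } of encodes) {
    const source = document.createElement("source");
    source.src = src;
    source.type = type;
    video.appendChild(source);
  }

  return video;
}

document.body.appendChild(buildVideo());
```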

I look at this, take a step back, and say: "Hey, even though Firefox 9 apparently doesn't have an h264 decoder built in, I have a perfectly good GPU that can do it.  Why can't browsers just use the decoders available on the user's system instead of relying on software decoders bundled with the browser?"

Obviously browsers should still ship the software decoders.  In fact, every browser should ship all of them; there's no excuse not to.  But the browser should then look at your system, see what decoders you have installed, and automatically prefer those over the ones built into the browser.  That way, people like me with lower-end computers, which can still decode h264 just fine thanks to a decent GPU, can actually use what they have available.
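
As it stands, there's no web API that lets a page see which decoders are installed on the user's system; the closest thing available is canPlayType(), which only reports what the browser itself is willing to attempt.  A minimal sketch of probing it (the codec strings are illustrative):

```typescript
// Sketch: a page can't see the user's system codecs, but it can at least
// ask the browser what it claims to handle via canPlayType(), which
// returns "", "maybe", or "probably" for a given MIME type + codec string.
const probe = document.createElement("video");

const candidates = [
  'video/mp4; codecs="avc1.42E01E, mp4a.40.2"', // h264 -- what a GPU could decode
  'video/webm; codecs="vp8, vorbis"',
  'video/ogg; codecs="theora, vorbis"',
];

for (const type of candidates) {
  // e.g. Firefox 9 would report "" for the mp4 line even on a machine
  // whose GPU has a perfectly good h264 decoder sitting idle.
  console.log(`${type} -> "${probe.canPlayType(type)}"`);
}
```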

These days, there's no excuse for not having a decent codec pack installed anyway.  All the people who whine whenever groups switch codecs need to realize that there are all-encompassing codec packs out there, simple to install and configure, that make any codec switch a moot point.  The Combined Community Codec Pack is one such pack, and it comes with my personal recommendation.

I might sound like a hypocrite for saying that in the previous paragraph while still complaining about people who use 10-bit h264, but I'm not.  10-bit is simply a profile of the h264 codec.  It's still the same compression and decompression algorithm, just with support for more bits per color channel.  The issue is that almost nothing else in the systems 10-bit gets played on actually supports 10 bits per color channel, so the video has to be dithered down to 8 bits per channel once it's decoded.  That adds to the overall time it takes to process each frame and contributes to frame dropping on systems that can't handle software decoding smoothly.
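
For a rough idea of what that extra per-sample step costs, here's a sketch.  The naive dither and the numbers are purely illustrative; real renderers use smarter error-diffusion dithering, but the shape of the work is the same.

```typescript
// Illustrative only: the extra per-sample work 10-bit playback adds.
// A decoded 10-bit sample spans 0..1023; the rest of the pipeline wants
// 0..255, so every sample of every frame gets scaled (and dithered) down.
function tenBitToEightBit(samples: Uint16Array): Uint8Array {
  const out = new Uint8Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Add a little noise before truncating the two low bits -- a crude
    // stand-in for the dithering a real renderer does to avoid banding.
    const dithered = samples[i] + Math.random() * 4;
    out[i] = Math.min(255, Math.floor(dithered / 4));
  }
  return out;
}

// A 1080p frame has about 2 million samples per plane -- this loop runs
// for every frame, on top of the normal decode, all on the CPU.
const fakeFrame = new Uint16Array(1920 * 1080).fill(512);
console.log(tenBitToEightBit(fakeFrame)[0]); // 128
```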

And really, if it can be done in hardware, then why the fuck would you ever want to do it in software?  Hardware is going to be so much more efficient at it.  I have numbers to back this up.  Playing an 8-bit h264 video on my system, using the GPU to decode, takes around 30% CPU.  Since the GPU is doing the actual decoding, I can only infer that the CPU is mostly reading the file from disk and feeding the relevant streams to their decoders.  When I play a 10-bit h264 video on my system, it uses anywhere between 60 and 100% CPU.  That's because the GPU can't do it, so it has to be done in software on the CPU.  For relatively low-action scenes this is fine, but when things really get going, the decoder has to start dropping frames (because they're taking too long to decode) just to try to maintain audio/video synchronization, and even then it sometimes fails.

As for decoding audio, no matter what I play in foobar2000 (wav, midi, mp3, ogg, ape, flac), I barely ever see foobar2000 use any CPU at all.  I honestly don't know whether my (integrated Realtek AC'97) sound card does any form of hardware decoding, but regardless, audio takes far less effort to decode than video.  You can see this in how, when a video desyncs, it's always the video that falls behind the audio, never the other way around.
