Of course, this was done by the original format being stolen and then copied - and the holders of the format are now trying to monetise it by issuing separate 'pro' versions.
And lots of weird incompatibilities exist - exacerbated by there being many different older versions.
But in general, as MP3 is to audio, this is to video.
Hollywood is wetting itself.
(MoonShadow): Based on MPEG4 video, IIRC. Basically, one frame every 15-30 or so is encoded in a format similar to JPEG and used as reference; the remaining frames just list coordinates of 8x8, 8x16 and 16x16 blocks in the reference frame and the direction and distance in which they move. The format allows one to overlap the blocks in the target frame, to specify new 8x8 or 16x16 blocks if there isn't one similar enough in the reference frame, and to use two frames as reference rather than just one; but not all encoders implement all of this. The quality of the video largely depends on how good the encoder is at finding the best-matching blocks in the reference frame(s) for each block in the intermediate frames. This takes prohibitively long to do by brute force, so all encoders make some sort of guess. The guessing algorithms are still something of a black art.
Sound is typically encoded as MP3.
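(For the curious, the brute-force block search described above can be sketched roughly like this - block size, search range and all names are illustrative, and real encoders replace the exhaustive loops below with the heuristic guesses mentioned:)

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a size x size block in the
    reference frame (at rx, ry) and one in the current frame (at cx, cy)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(ref[ry + dy][rx + dx] - cur[cy + dy][cx + dx])
    return total

def find_motion_vector(ref, cur, cx, cy, size=8, search=4):
    """Exhaustively search a +/- `search` pixel window in the reference
    frame for the block best matching the block at (cx, cy) in the
    current frame.  Returns (dx, dy, cost)."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx and rx + size <= w and 0 <= ry and ry + size <= h:
                cost = sad(ref, cur, rx, ry, cx, cy, size)
                if cost < best[2]:
                    best = (dx, dy, cost)
    return best

# Toy demo: the "current" frame is the reference shifted right by 2 pixels,
# so the best match for the block at (8, 4) sits 2 pixels to the left.
ref = [[(x + y * 16) % 251 for x in range(16)] for y in range(16)]
cur = [[ref[y][max(x - 2, 0)] for x in range(16)] for y in range(16)]
dx, dy, cost = find_motion_vector(ref, cur, 8, 4)
print(dx, dy, cost)  # -> -2 0 0 (a perfect match, two pixels to the left)
```

Even on this toy frame the cost is 81 SAD evaluations of 64 pixels each for a single block; scale that up to full frames, bigger search windows and thousands of blocks and it's clear why nobody searches exhaustively in practice.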
Yes, based upon the MPEG standards. All that you have described there is a part of the MPEG standards - though you've missed a bit which DivX also uses. As well as keyframes and 'changes from the previous frame' frames, it also allows 'prediction' frames - where pieces of an image are specified as "this part of a frame which *you haven't yet seen*, with this offset".
Um - that's what I meant when I said you can use two reference frames; but I wasn't very clear 'cos I was trying to be brief. Sorry ^^; I'll refactor that later. - MoonShadow
This, naturally, makes it much nastier to decode (since you have to read at least one keyframe ahead) but gives it a major advantage in size.
Makes it interesting in embedded systems where you have limited memory; especially if you're playing a stream so have to remember all the stuff in between the reference frames as well 'cos you can't ask for it to be fetched again.
Also, once you have the movement data, that is then JPEG-compressed again - so that you have still images under FourierTransform and then the time domain given another FourierTransform. This gives a (frankly) insane compression ratio.
Moop. That's new. When I was working with MPEG all we did was quantize the motion vectors and Huffman encode the result ^^; I'd be interested to see how that works..
Well, from the compression results - very very well. From the amount of processing needed to create the streams - exceptionally badly. But from the point of view of the player - not much difference.
Will just add here - obviously just FFT-ing the time domain doesn't give you any compression - you FFT it and then compress. The point is that after the FFT you generally get much easier numbers to compress, as regions of the picture tend to move at the same speed - or at smoothly varying speeds. That's the theory, anyway, and about the limit of my knowledge. I was working more on the data packing structures, which means I didn't much care what the data was, just that all of the data was present and correct after it'd been through the hell that is Viterbi encoding and error correction.
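(A toy illustration of the transform-then-compress point above. For simplicity this sketch uses delta coding as the decorrelating transform rather than an FFT, but the principle is the same: smoothly varying motion data turns into long runs of small numbers, which a generic compressor handles far better. All the data here is made up.)

```python
import zlib

# Hypothetical motion trace: a block accelerating smoothly, so successive
# positions vary slowly and predictably.
positions = [round(0.05 * t * t) for t in range(200)]

# Untransformed: the raw positions, one byte each (wrapping mod 256).
raw = bytes(p % 256 for p in positions)

# Decorrelating transform: store the differences between successive values.
# (The discussion above uses an FFT here; deltas are a simpler stand-in.)
deltas = [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
transformed = bytes(d % 256 for d in deltas)

# The transformed stream compresses noticeably smaller than the raw one.
print(len(zlib.compress(raw, 9)), len(zlib.compress(transformed, 9)))
```

The transform itself adds nothing - both streams are 200 bytes going in - but the deltas are mostly small, repetitive values, so the back-end compressor gets far more mileage out of them.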
Amusingly, most of the local wikizens will be most familiar with DivX as a compression format for Anime. Amusing, since the quality/compression tradeoff is far worse than far simpler methods can provide. Compare JPEG and GIF for photographs and glyphs. Straight lines and blocks of static colour are not well suited to JPEG, and hence to MPEG and hence to DivX. But, like MP3 - it's good enough.
That's fine so long as you start with a digital copy you ripped from a DVD. You're not generally going to get perfect solid blocks of colour if you're recording from a TV signal.
Well, someone has to create the DVD in the first place - and the same 'flat colour' applies. I guess you can just use the 'any encoder you like' principle, though.
Encoders being a black art, I heartily agree with. It's built into the MPEG standard by design: "Here is how you interpret the output, and view it - but how you make that, we have no idea. We guess lots of people will come up with proprietary methods, but we're just gonna give a sample crappy BruteForce? algorithm."
Same for MP3, for that matter, which basically leaves you free to build your own mathematical model of human hearing in the encoder.
Yeah, but MP3 has had longer to settle down (and the Huffman codec has been freely available for a while) - leading to 'anything is possible but this is actual'. Whereas DivX still seems to wander all over the place.