Mired in politics
Here we are running operating systems such as Linux, OS X and Windows that typically require gigabytes of disc space on installation and almost as much RAM on boot up, yet apparently nobody expects the operating system to supply facilities for image decoding, or video decoding, or audio decoding. People seem to be expecting it to be all built into some humungous great monolithic browser "lump" because "we don't want plugins".
Making use of your host operating system's API to render images, render video, render audio, render fonts, handle colour, do printing etc. etc. is exactly what a browser *should* be doing, all the time. Just because the browser recognises the <video> tag and decides it doesn't need to launch an external plugin - and by the way, it could just as easily do that with HTML 4's <object>, for which HTML 5's <video> is really just a shorthand subset - does NOT mean that the browser has to have an entire media playback engine complete with CODECs all *built into itself*. That would be a preposterous solution.
Since the host OS does the decode, whatever video the OS can play, the browser ought to be able to play. HTML doesn't specify image formats and shouldn't specify video formats; it doesn't even specify a scripting language! That's why HTML 4 requires a "type" attribute on the <script> tag (the older "language" attribute is deprecated). As with the likes of JavaScript, JPEG and PNG, it's likely that a de facto video standard will emerge based purely on the popularity of a particular container format and audio+video CODEC choice.
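To make the "ask the OS" idea concrete, here is a rough sketch of a browser looking up a decoder by MIME type instead of carrying its own CODECs. Everything in it is made up for illustration: HostMediaServices, register, find and can_play are hypothetical names, not any real OS or browser API, and a real host would populate the table from its own media framework.

```python
from typing import Callable, Dict, Optional

# A decoder takes raw bytes and returns something displayable.
# The str return type is a placeholder for real decoded frames.
Decoder = Callable[[bytes], str]


class HostMediaServices:
    """Hypothetical stand-in for the decoders a host OS exposes."""

    def __init__(self) -> None:
        self._decoders: Dict[str, Decoder] = {}

    def register(self, mime_type: str, decoder: Decoder) -> None:
        # MIME types are case-insensitive, so normalise before storing.
        self._decoders[mime_type.lower()] = decoder

    def find(self, mime_type: str) -> Optional[Decoder]:
        return self._decoders.get(mime_type.lower())


def can_play(host: HostMediaServices, mime_type: str) -> bool:
    """The browser supports exactly what the host can decode."""
    return host.find(mime_type) is not None
```

The point of the sketch is that the browser's list of playable formats is not fixed at build time: install a new decoder on the host and can_play starts returning True for it, with no change to the browser.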
So why is a mandatory video format being pressed for? Because big companies are involved and the whole thing has become mired in corporate politics. IMHO, this has nothing to do with the engineering abilities of those companies.
Incidentally, just because a browser recognises <video> does not mean it avoids plugins, just as handling <object> does not mean that a plugin must be launched. The browser may choose to simply show a "play" icon placeholder and, when the user activates it, fire up some external helper application or embed a known video handling plugin instead. The helper application option may well be sensible for web browsers where screen real estate is limited (e.g. mobile phones), since it makes it easier for the user to control the video and, in particular, to show it full screen.
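As a rough sketch of that decision, something like the following would do. The Handling names, the choose_handling function and the order of preference are all my own assumptions for illustration; no browser is claimed to work this way.

```python
from enum import Enum


class Handling(Enum):
    INLINE = "inline"            # decode via the host OS, render in the page
    PLUGIN = "plugin"            # embed a known video handling plugin
    HELPER_APP = "helper_app"    # hand off to an external player
    UNSUPPORTED = "unsupported"  # nothing available, show fallback content


def choose_handling(host_can_decode: bool,
                    plugin_available: bool,
                    helper_available: bool,
                    small_screen: bool) -> Handling:
    """Pick a way to play a <video> once the user activates the placeholder."""
    # Assumed policy: on a small screen, prefer the helper application
    # so the user gets full-screen playback and proper controls.
    if small_screen and helper_available:
        return Handling.HELPER_APP
    if host_can_decode:
        return Handling.INLINE
    if plugin_available:
        return Handling.PLUGIN
    if helper_available:
        return Handling.HELPER_APP
    return Handling.UNSUPPORTED
```

For example, choose_handling(True, True, True, small_screen=True) picks the helper application, while the same call with small_screen=False plays the video inline.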