Over the past 3 days I’ve committed a couple of major changes to TagLib#: support for MPEG-1/2 files and support for MPEG-4 video. The latter was pretty easy, but the former was an amazing pain, largely due to the surprising lack of online documentation for this 14-year-old format. As such, here’s my brief explanation of:
HOW MPEG-1/2 FILES LOOK AND HOW TO UNDERSTAND THEM
MPEG contains two key parts, an audio stream and a video stream. If you were to extract the audio stream from an MPEG file, you’d have an MP3 (which could actually be an MP2 or an MP1). The MP3 stream is very well documented online. For my personal favorite, click here. MPEG video is a very different format, but the important thing is that we have two streams, audio and video, and we’re combining them into one.
To do this, MPEG does pretty much the exact same thing as OGG: it takes the two streams, splits them into packets, and shuffles them together like playing cards. When you know what each of these cards looks like, the format stops being scary and becomes manageable. So, what do the cards look like?
Each packet starts off with a simple packet identifier: 0x000001??. The ?? is replaced with a byte explaining the purpose of the packet. These are:
- BA: This packet contains information on the MPEG version, a timestamp, and most likely some other useful information. This packet will always appear before BB, C0, or E0. BA should also be the first packet of the file.
- BB: This contains some more useful information (I’m guessing.)
- BE: A padding packet.
- C0: An audio packet.
- E0: A video packet.
- B9: End of stream notification.
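To make the list above concrete, here is a minimal Python sketch (not TagLib# code) that finds the next 0x000001?? marker in a buffer and labels it. The names PACKET_KINDS, find_packet, and describe are made up for this example, and the descriptions are just the ones from the list.

```python
# Packet identifier bytes from the list above, with the post's own descriptions.
PACKET_KINDS = {
    0xBA: "MPEG version and timestamp",
    0xBB: "more useful information",
    0xBE: "padding",
    0xC0: "audio",
    0xE0: "video",
    0xB9: "end of stream",
}

START_PREFIX = b"\x00\x00\x01"  # the fixed part of 0x000001??


def find_packet(data: bytes, start: int = 0):
    """Return (offset, id_byte) of the next 0x000001?? marker, or None."""
    pos = data.find(START_PREFIX, start)
    if pos == -1 or pos + 3 >= len(data):
        return None
    return pos, data[pos + 3]


def describe(id_byte: int) -> str:
    """Human-readable label for an identifier byte."""
    return PACKET_KINDS.get(id_byte, "unknown (0x%02X)" % id_byte)
```

For example, `find_packet(b"\xff\x00\x00\x01\xba")` returns `(1, 0xBA)`, and `describe(0xBA)` gives the BA description.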
BA is a very important packet because it provides information on time. It appears before each audio or video packet to let you know when it’s happening. If you have the first and last BA packets, you can determine the duration of the movie. It sure beats the guessing in MP3 and doesn’t depend on weird codec-dependent calculations like in OGG.
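The duration trick can be sketched in Python as below. This is not the TagLib# code. The bit layouts are my reading of the MPEG-1 and MPEG-2 specs, where the BA timestamp is a 33-bit clock value counted in 90 kHz ticks with marker bits scattered through it, so treat them as an assumption to check against the standard. The names read_scr and duration_seconds are made up for the example.

```python
SCR_HZ = 90_000  # assumed tick rate of the BA timestamp (90 kHz)


def read_scr(pack: bytes) -> int:
    """Return the 33-bit timestamp from a BA packet that starts at its marker."""
    if len(pack) < 9 or pack[:4] != b"\x00\x00\x01\xba":
        raise ValueError("expected at least 9 bytes starting with 00 00 01 BA")
    b = pack[4:9]
    if b[0] >> 6 == 0b01:
        # Assumed MPEG-2 layout: 01 SSS 1 SS | S x8 | SSSSS 1 SS | S x8 | SSSSS 1 ..
        return (((b[0] >> 3) & 0x07) << 30 | (b[0] & 0x03) << 28 | b[1] << 20
                | (b[2] >> 3) << 15 | (b[2] & 0x03) << 13 | b[3] << 5 | b[4] >> 3)
    if b[0] >> 4 == 0b0010:
        # Assumed MPEG-1 layout: 0010 SSS 1 | S x8 | SSSSSSS 1 | S x8 | SSSSSSS 1
        return (((b[0] >> 1) & 0x07) << 30 | b[1] << 22
                | (b[2] >> 1) << 15 | b[3] << 7 | b[4] >> 1)
    raise ValueError("unrecognized BA packet layout")


def duration_seconds(first_pack: bytes, last_pack: bytes) -> float:
    """Duration implied by the first and last BA packets of a file."""
    return (read_scr(last_pack) - read_scr(first_pack)) / SCR_HZ
```

The same first byte that carries the top timestamp bits also tells the two versions apart, which is why the sketch branches on it before decoding anything else.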
Now that we know the MPEG version and we know the duration, it is time to piece together the audio information. The first audio and video packets SHOULD contain complete headers for their respective formats and should tell us all we need to know. Audio is no problem for us, as we can just scan to the first C0 and sic our FindFrameHeader functions from the MP3 parser on it, and get every bit of information we need to know.
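As a rough stand-in for what FindFrameHeader does (this is not the TagLib# implementation), the Python sketch below scans a C0 packet's contents for the MPEG audio sync pattern and decodes the version, layer, and sample rate. The function name find_audio_header and the returned dictionary are made up for the example, and it only reads the fields shown, so it can be fooled by bytes that merely look like a header.

```python
# Sample rates by version bits (0b11 = MPEG-1, 0b10 = MPEG-2, 0b00 = MPEG-2.5).
SAMPLE_RATES = {
    0b11: (44100, 48000, 32000),
    0b10: (22050, 24000, 16000),
    0b00: (11025, 12000, 8000),
}
VERSIONS = {0b11: "MPEG-1", 0b10: "MPEG-2", 0b00: "MPEG-2.5"}
LAYERS = {0b11: 1, 0b10: 2, 0b01: 3}


def find_audio_header(payload: bytes):
    """Return version, layer, and sample rate of the first plausible frame header."""
    for i in range(len(payload) - 3):
        b1, b2 = payload[i + 1], payload[i + 2]
        # A frame header starts with 11 set bits (0xFF, then the top 3 bits set).
        if payload[i] != 0xFF or b1 & 0xE0 != 0xE0:
            continue
        version_bits = (b1 >> 3) & 0x03
        layer_bits = (b1 >> 1) & 0x03
        bitrate_index = b2 >> 4
        rate_index = (b2 >> 2) & 0x03
        # Skip reserved values; they mean this was not really a header.
        if version_bits == 0b01 or layer_bits == 0b00:
            continue
        if bitrate_index == 0x0F or rate_index == 0x03:
            continue
        return {
            "version": VERSIONS[version_bits],
            "layer": LAYERS[layer_bits],
            "sample_rate": SAMPLE_RATES[version_bits][rate_index],
        }
    return None
```

Scanning rather than reading at a fixed offset matters here because the audio data does not start right at the beginning of the C0 packet.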
Video is a completely different format, but now that we understand MPEG files we should be able to understand this, as it uses exactly the same format: 0x000001??. The first packet should be B3, which we parse with VideoHeader.cs.
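For illustration, here is a small Python sketch of reading the B3 header. It is not a port of VideoHeader.cs. The layout (12 bits of width, 12 bits of height, then a 4-bit aspect code and a 4-bit frame rate code) and the frame rate table are what I understand from the MPEG video spec, so treat them as assumptions. The name parse_sequence_header is made up.

```python
# Assumed frame rate codes from the MPEG video spec.
FRAME_RATES = {
    1: 24000 / 1001, 2: 24.0, 3: 25.0, 4: 30000 / 1001,
    5: 30.0, 6: 50.0, 7: 60000 / 1001, 8: 60.0,
}


def parse_sequence_header(data: bytes):
    """Find the first 0x000001B3 marker and decode width, height, frame rate."""
    pos = data.find(b"\x00\x00\x01\xb3")
    if pos == -1 or pos + 8 > len(data):
        return None
    b = data[pos + 4:pos + 8]
    width = b[0] << 4 | b[1] >> 4            # first 12 bits
    height = (b[1] & 0x0F) << 8 | b[2]       # next 12 bits
    frame_rate = FRAME_RATES.get(b[3] & 0x0F)  # low nibble; None if unknown
    return {"width": width, "height": height, "frame_rate": frame_rate}
```

Passing in the bytes of the first E0 packet should be enough, since the B3 marker sits inside it.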
That just about covers everything except packet size. For all but BA, which is version-specific, the first two bytes following the packet identifier describe the packet’s content length (total packet size - 6). This means you can just skip through the file pretty quickly (about 0x800 bytes at a time, depending).
All the work is in Subversion. Be sure to note that the file sticks an ID3v2.4 and ID3v1 tag at its end.
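The skipping can be sketched like this in Python (again, not the TagLib# code). For packets with a length field it jumps 6 + length bytes. For BA it sidesteps the version-specific size by searching for the next marker. It treats B9 the same way, on my assumption that the end marker carries no length field either. The names walk_packets and NO_LENGTH_FIELD are made up for the example.

```python
import struct

START_PREFIX = b"\x00\x00\x01"
NO_LENGTH_FIELD = {0xBA, 0xB9}  # BA is version-specific; B9 is assumed bare


def walk_packets(data: bytes):
    """Yield (offset, id_byte) for each packet, skipping over packet contents."""
    pos = data.find(START_PREFIX)
    while pos != -1 and pos + 4 <= len(data):
        kind = data[pos + 3]
        yield pos, kind
        if kind in NO_LENGTH_FIELD or pos + 6 > len(data):
            next_pos = pos + 4  # no usable length: resync on the next marker
        else:
            # Two big-endian bytes after the identifier give the content length.
            (length,) = struct.unpack_from(">H", data, pos + 4)
            next_pos = pos + 6 + length
        pos = data.find(START_PREFIX, next_pos)
```

Because the jump skips the contents entirely, a 0x000001B3 sitting inside a video packet is never mistaken for a packet of its own.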