TagLib#: Go Speed Racer…
Posted by Brian on March 12, 2007
Note: The changes discussed in this post have yet to be rolled into Subversion. Things have changed that require me to recode some of the rendering parts.
When I was a young lad, nary a few fortnights ago, Aaron Bockover posted a blog entry comparing TagLib#’s performance to Entagged#’s performance, and this comparison was by no means flattering. I, being entrapped in a stupor of my own greatness, and too busy making things work to consider such trivialities as performance, didn’t pay too much heed to this at the time. That all changed one fateful day, when I committed a mortal sin, revision 71776. Working diligently to remove the vestigial organs in the body that is the library, I went too far, pulling out a stop cap, which led to a flood of file operations, making life unbearable for those souls who do not fancy Id3v2. While fixing this wound, I noticed a rather startling thing: those who abstained from the wicked art of Id3v2 lived quicker and cheerier lives. With the scent of victory in my nostrils, I was off, like a bloodhound after a fox, determined that by sheer will alone I would cure my library of the venom that ailed it. Tearing through the code, piece by piece, I slaughtered my foe, liberating all MP3s from the bondage they were in. Again I struck, and again, until no difference could be seen between the sons of MP3 and the sons of Ogg. There was much rejoicing throughout the land, but it was not enough. For MPEG-4, on its mighty mountain, looking down from afar, was still cursed with the poison. But today, this very day, there should be rejoicing in the kingdom once more. For MPEG-4 has been freed from captivity and may walk once more among his brothers…
I apologize for that. I started typing and then got carried away, but there is much to celebrate today, as I have made another great step forward and reduced the reading time on MPEG-4 (a.k.a. iTunes files) by almost 70%, bringing the average read time to just more than a millisecond. But there is only so much words can say; perhaps this picture could best explain what has happened:

This happened when I tried, one more time, to tackle the problem of why exactly MPEG-4 was taking so long to read. My best guess at the culprit was the many, many tiny file reads I was doing, but I had no really good way of stopping them (due to the whole “box inside a box” style). So I moved the entire thing to reading one big block, but that resulted in only marginal gains. Then I recoded it to read just the “udta” box, and the files loaded at near-Ogg speeds. So I changed the reading system again, this time doing tiny file reads but ignoring a few gigantic boxes I didn’t need to read. This cut the read times in half. It was sweet, but still not good enough. So I decided to go back to the original comparison of TagLib# vs. Entagged#, and peeked at how Entagged# does things. Essentially, it does a very quick recursive lookup of box headers, and for very standard container boxes ({header}{child1}{child2}…{childN}), it recurses through them. This exposes the four main boxes that TagLib# is interested in: ‘meta’, ‘stco/co64’, ‘mvhd’, and ‘stsd’. Using switch cases and our classic Box factory, we can extract the rest of the useful data. Presto! Everything we need, nothing we don’t. Why didn’t I think of that?
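For the curious, the header-only walk described above can be sketched in a few lines. This is an illustration in Python, not the TagLib# or Entagged# code: the names `find_boxes` and `make_box`, and the exact set of boxes treated as plain containers, are assumptions made for the example. It relies only on the standard MPEG-4 box layout (a 4-byte big-endian size followed by a 4-byte type, with a size of 1 meaning a 64-bit size follows and a size of 0 meaning the box runs to the end).

```python
import io
import struct

# Assumed set of plain containers: a header followed directly by child boxes.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"udta"}
# The boxes the post lists as interesting.
WANTED = {b"meta", b"stco", b"co64", b"mvhd", b"stsd"}


def find_boxes(stream, end, found=None):
    """Read box headers up to offset `end`, recursing only into containers.

    Returns a list of (type, start_offset, total_size) for wanted boxes.
    """
    if found is None:
        found = []
    while stream.tell() + 8 <= end:
        start = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            break
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:  # a 64-bit size follows the type
            ext = stream.read(8)
            if len(ext) < 8:
                break
            size = struct.unpack(">Q", ext)[0]
            header_len = 16
        elif size == 0:  # box extends to the end of the enclosing region
            size = end - start
        if size < header_len or start + size > end:
            break  # malformed box; stop rather than guess
        if box_type in WANTED:
            found.append((box_type, start, size))
        elif box_type in CONTAINERS:
            find_boxes(stream, start + size, found)
        # Seek past the payload, so a huge box such as 'mdat' is never read.
        stream.seek(start + size)
    return found


def make_box(box_type, payload=b""):
    """Build a minimal box with a 32-bit size, for demonstration only."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# A tiny fake file: the large 'mdat' is skipped, 'moov' and 'udta' are entered.
demo = (
    make_box(b"ftyp", b"M4A ")
    + make_box(b"mdat", b"\x00" * 1000)
    + make_box(b"moov", make_box(b"mvhd", b"\x00" * 4)
               + make_box(b"udta", make_box(b"meta", b"\x00" * 4)))
)
demo_found = find_boxes(io.BytesIO(demo), len(demo))
```

The point of the sketch is that only 8 (or 16) bytes are read per box; everything else is a seek. The offsets it returns would then be handed to whatever per-box parsing the library already has.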
Now I need to rewrite some of the writing code, but once that is done, I’ll be able to roll it all into Subversion. Look forward to yet another release in the next couple of days!
PS. The results of the new TagLib# vs. Entagged# battle:
File             Reader     Avg.       Total
-----------------------------------------------------
sample.flac      TagLib     0.000322    3.2200
                 Entagged   0.000269    2.6900
sample_v1.mp3    TagLib     0.000213    2.1300
                 Entagged   0.000250    2.5000
sample.m4a       TagLib     0.000896    8.9601
                 Entagged   0.000619    6.1901
sample_v2.mp3    TagLib     0.000717    7.1701
                 Entagged   0.000382    3.8201
sample.wma       TagLib     0.000673    6.7301
                 Entagged   0.000398    3.9801
sample.mpc       TagLib     0.000531    5.3101
                 Entagged   0.000324    3.2400
sample.ogg       TagLib     0.000512    5.1201
                 Entagged   0.003756   37.5606
sample_both.mp3  TagLib     0.000727    7.2701
                 Entagged   0.000469    4.6901
Considering that one of the main differences between TagLib# and Entagged# is that TagLib# reads everything while Entagged# reads just what is necessary, really, really quickly, I don’t think we’ll ever be as good as Entagged# on everything, but I think we’re doing a pretty decent job.









