
TagLib#: Go Speed Racer…

Note: The changes discussed in this post have yet to be rolled into Subversion. Things have changed that require me to recode some of the rendering parts.

When I was a young lad, nary a few fortnights ago, Aaron Bockover posted a blog entry comparing TagLib#’s performance to Entagged#’s performance, and this comparison was by no means flattering. I, being entrapped in a stupor of my own greatness, and too busy making things work to consider such trivialities as performance, didn’t pay too much heed to this at the time. That all changed one fateful day, when I committed a mortal sin, revision 71776. Working diligently to remove the vestigial organs in the body that is the library, I went too far, pulling out a stop cap, which led to a flood of file operations, making life unbearable for those souls who do not fancy ID3v2. While fixing this wound, I noticed a rather startling thing: those who abstained from the wicked art of ID3v2 lived quicker and cheerier lives. With the scent of victory in my nostrils, I was off, like a bloodhound after a fox, determined that by sheer will alone, I would cure my library of the venom that ailed it. Tearing through the code, piece by piece, I slaughtered my foe, liberating all MP3s from the bondage they were in. Again I struck, and again, until no difference could be seen between the sons of MP3 and the sons of Ogg. There was much rejoicing throughout the land, but it was not enough. For MPEG-4, on its mighty mountain, looking down from afar, was still cursed with the poison. But today, this very day, there should be rejoicing in the kingdom once more. For MPEG-4 has been freed from captivity and may walk once more among his brothers…

I apologize for that. I started typing and then got carried away. But there is much to celebrate today: I have made another great step forward and reduced the reading time on MPEG-4 (a.k.a. iTunes) files by almost 70%, bringing the average read time to just over a millisecond. There is only so much words can say, though, so perhaps this picture can best explain what has happened:

Warm file read times for my music directory with my current hacking version.

This happened when I tried, one more time, to work out exactly why MPEG-4 was taking so long to read. My best guess at the culprit was the many, many tiny file reads I was doing, but I had no really good way of avoiding them (due to the whole “box inside a box” style). So I moved the entire thing to reading one big block, which resulted in only marginal gains. Then I recoded it to read just the “udta” box, and the files loaded at near-Ogg speeds. So I changed the reading system again, this time doing tiny file reads but ignoring a few gigantic boxes I didn’t need to read. This cut the read times in half. It was sweet, but still not good enough. So I decided to go back to the original comparison of TagLib# vs. Entagged# and peeked at how Entagged# does things. Essentially, it does a very quick recursive lookup of box headers, and for very standard container boxes ({header}{child1}{child2}…{childN}), it recurses through them. This exposes the four main boxes that TagLib# is interested in: ‘meta’, ‘stco/co64’, ‘mvhd’, and ‘stsd’. Using switch cases and our classic Box factory, we can extract the rest of the useful data. Presto! Everything we need, nothing we don’t. Why didn’t I think of that?
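For the curious, here is a rough sketch of that header-walking idea. It is written in Python for brevity and is neither TagLib#'s nor Entagged#'s actual code. The names `find_boxes`, `box`, `CONTAINERS`, and `WANTED` are made up for illustration, and the set of container boxes is my assumption. The size rules (a 32-bit size, with 1 meaning a 64-bit size follows and 0 meaning "runs to the end") come from the MPEG-4 box layout. The walker reads only headers, seeks past anything it doesn't care about (such as a huge ‘mdat’), and records where the interesting boxes live so a box factory can parse them later.

```python
import io
import struct

# Plain container boxes: {header}{child1}{child2}...{childN}.
# This particular set is an assumption made for the sketch.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"udta"}
# The boxes the post says TagLib# is interested in.
WANTED = {b"meta", b"stco", b"co64", b"mvhd", b"stsd"}


def find_boxes(stream, start, end, found=None):
    """Walk box headers in [start, end) and record (payload offset, payload length)
    for each WANTED box. Payloads of other boxes are never read."""
    if found is None:
        found = {}
    pos = start
    while pos + 8 <= end:
        stream.seek(pos)
        size, box_type = struct.unpack(">I4s", stream.read(8))
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack(">Q", stream.read(8))[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed box: stop rather than loop forever
        if box_type in WANTED:
            found.setdefault(box_type, []).append((pos + header, size - header))
        elif box_type in CONTAINERS:
            find_boxes(stream, pos + header, pos + size, found)
        pos += size  # everything else is skipped without being read
    return found


def box(box_type, payload=b""):
    """Hypothetical helper that builds a box with a 32-bit size, for demos."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


if __name__ == "__main__":
    sample = (
        box(b"ftyp", b"M4A \x00\x00\x00\x00")
        + box(b"mdat", b"\x00" * 1000)  # big box we never read
        + box(b"moov", box(b"mvhd", b"\x00" * 100)
              + box(b"udta", box(b"meta", b"\x00" * 4)))
    )
    print(find_boxes(io.BytesIO(sample), 0, len(sample)))
```

In this sketch ‘meta’ is only located, not descended into; handing its payload to the Box factory would be the next step.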

Now I need to rewrite some of the writing code, but once that is done, I’ll be able to roll it all into Subversion. Look forward to yet another release in the next couple of days!

PS. The results of the new TagLib# vs. Entagged# battle:

              File         Reader      Avg.     Total
-----------------------------------------------------
       sample.flac         TagLib  0.000322    3.2200
                         Entagged  0.000269    2.6900

     sample_v1.mp3         TagLib  0.000213    2.1300
                         Entagged  0.000250    2.5000

        sample.m4a         TagLib  0.000896    8.9601
                         Entagged  0.000619    6.1901

     sample_v2.mp3         TagLib  0.000717    7.1701
                         Entagged  0.000382    3.8201

        sample.wma         TagLib  0.000673    6.7301
                         Entagged  0.000398    3.9801

        sample.mpc         TagLib  0.000531    5.3101
                         Entagged  0.000324    3.2400

        sample.ogg         TagLib  0.000512    5.1201
                         Entagged  0.003756   37.5606

   sample_both.mp3         TagLib  0.000727    7.2701
                         Entagged  0.000469    4.6901

Considering that one of the main differences between TagLib# and Entagged# is that TagLib# reads everything while Entagged# reads just what is necessary, really, really quickly, I don’t think we’ll ever be as fast as Entagged# on everything, but I think we’re doing a pretty decent job.
