The case of a missing tree

Recently, when I downloaded some files from a user, I came across a situation which seemed a bit confusing at first sight. The user (whom I knew before) is from another country and, while he has a fast connection with nice upload speeds, the connection to him is usually not fully trouble-free: my downloads from him usually disconnect every 10-20 minutes with a timeout. This is a small problem, I know, but it will be key to this story as you’ll see later…

So I queued up some folders with larger files (there were no other sources available) and left DC++ to download in the background. Some time later, when I came back, I realized that only a couple of smaller files had been downloaded so far, even though the speed of the file transfer was still as nice as usual. When I checked the current transfer in the Connections tab, I found a very surprising fact: the file was being downloaded in one chunk, and the chunk size was equal to the file size! Checking the Finished Downloads window made the problem even more mysterious: it said that 150% (!) of the current file had already been transferred…

Since the segmented download method was introduced, DC++ does not create one large chunk for a download, except when segmented downloads are disabled. The size of the chunks is adjusted automatically depending on how fast the transfers are and how much of the current download is left. Faster transfers result in bigger chunk sizes, but a chunk never reaches the overall file size unless the file is very small. In this case (as always) I had segmented downloads enabled.
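To give an idea of the kind of logic involved, here is a minimal sketch of such a speed-based chunk-size heuristic. This is an illustration only, not DC++’s actual code: the function name, the one-minute target and the 64 KiB lower bound are all assumptions for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative sketch only; DC++'s real heuristic differs in detail.
// Pick a chunk size proportional to the observed transfer speed, so a
// chunk takes roughly `targetSeconds` to finish, but never let a single
// chunk cover the whole file (unless the file is tiny).
int64_t pickChunkSize(int64_t fileSize, int64_t bytesLeft,
                      int64_t bytesPerSecond, int64_t targetSeconds = 60) {
    const int64_t minChunk = 64 * 1024;             // assumed lower bound
    int64_t chunk = bytesPerSecond * targetSeconds; // ~1 minute of data
    chunk = std::max(chunk, minChunk);
    chunk = std::min(chunk, bytesLeft);             // don't overshoot the end
    if (fileSize > minChunk)
        chunk = std::min(chunk, fileSize / 2);      // never the whole file
    return chunk;
}
```

With a heuristic like this, a faster source simply gets handed bigger chunks, while the cap keeps a single slow or flaky source from monopolizing the entire file.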

As I thought this could be a bug, I disconnected the download manually to see if the chunk sizes would go back to normal after a reconnect. Then came another surprise: the download wasn’t resumed at all; it started from the beginning, still with that huge chunk size! At this point I understood why this download never finished. As I mentioned before, plenty of disconnects happened during downloads from this user, so the actual file (which was a pretty large one) was never able to finish. But… why?!

The user had a fairly new DC client, so incompatibility was ruled out. I checked downloads from other users – they worked as they should: normal chunk sizes and successful resume on reconnect. Then I tried to get more files from this problematic user and… believe it or not: some of the files worked well while others still didn’t! This last strangeness, however, finally started to ring a bell…

I asked the user to rebuild his share and… voilà, things started to work normally right away. The problem was with his hashdata file: it had become partially corrupted. Hash trees of the shared (and queued) files are stored in the hashdata file, so the other client failed to provide the correct tree information.

Now we’ve found the problem, but you may ask: why is the hash tree needed to resume an unfinished download? Or: why is it needed to get smaller parts of a download from multiple sources at the same time?

Before segmented downloading there were two methods in DC++ for resuming a download. Both became more or less obsolete when chunks arrived because, since then, the downloaded part of an unfinished file isn’t contiguous data. There’s no certain point to resume the download from, as there was before. Now the unfinished temporary file is fully allocated, and it contains non-contiguous segments of finished and unfinished data. The size of each segment is equal to, or an exact multiple of, the TTH leaf size (or block size), and when segments finish, their integrity is checked at once using the hash(es) of the block(s). The offsets and lengths of the already finished segments are stored in the download queue.
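The bookkeeping described above can be sketched roughly like this. The class and method names are made up for the example, not DC++’s actual ones; the point is only that finished segments are recorded as (offset, length) pairs and must line up with block boundaries so each one can be hash-checked.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch (names are invented, not DC++'s real classes):
// track which block-aligned segments of a partially downloaded file are
// finished and verified, the way the download queue stores them.
struct Segment {
    int64_t offset;
    int64_t length;
};

class PartialFile {
public:
    PartialFile(int64_t fileSize, int64_t blockSize)
        : fileSize_(fileSize), blockSize_(blockSize) {}

    // A segment may only be marked done if it starts on a block boundary
    // and covers a whole number of blocks (or runs to the end of file),
    // because only then do the stored block hashes cover it exactly.
    bool markDone(int64_t offset, int64_t length) {
        bool aligned = offset % blockSize_ == 0 &&
                       (length % blockSize_ == 0 ||
                        offset + length == fileSize_);
        if (!aligned)
            return false;
        done_.push_back({offset, length});
        return true;
    }

    int64_t bytesDone() const {
        int64_t total = 0;
        for (const Segment& s : done_)
            total += s.length;
        return total;
    }

private:
    int64_t fileSize_;
    int64_t blockSize_;
    std::vector<Segment> done_;
};
```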

Now it’s clear that, to be able to check the integrity of the finished segments, DC++ needs the full Tiger tree of the download (faithful readers of this blog are already familiar with Tiger hashes and hash trees from a very explanatory earlier post). Since compatibility with pre-TTH-era DC clients was dropped, DC++ checks whether the peer supports hashes and gets the full hash tree just before an actual download starts. (The only exception is when a file is smaller than the minimum leaf size – these files are downloaded in one go and checked by their TTH.) One could think that if DC++ is unable to get the full tree then it won’t start to download the file. But actually this isn’t the case…
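For readers wondering where the block size comes from: Tiger tree hashing works on 1024-byte base leaves, but storing a leaf hash per kilobyte of a multi-gigabyte file would be enormous, so in practice the stored tree uses a larger block size. A minimal sketch, assuming a cap on the leaf count (the limit of 512 here is an assumption for the example, not a value taken from DC++):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch: choose the smallest power-of-two multiple of the
// 1024-byte Tiger tree base leaf that keeps the number of leaves for
// this file under a chosen limit. The limit is an assumption here.
int64_t calcBlockSize(int64_t fileSize, int64_t maxLeaves = 512) {
    int64_t blockSize = 1024; // Tiger tree base leaf size
    while ((fileSize + blockSize - 1) / blockSize > maxLeaves)
        blockSize *= 2;
    return blockSize;
}
```

A small file keeps the 1 KiB leaf size, while bigger files get proportionally bigger blocks – which is also why a finished segment is always a multiple of the block size for that particular file.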

Instead, DC++ starts the download in the hope that it’ll find more sources, so it can grab the full tree later from another one. Until then it uses the full file size as the block size and the TTH for checking integrity – which, of course, is only possible once the whole download has finished. This was a good strategy as long as DC++ was able to resume a download without having the hash tree. But since the good old rollback function was removed in 0.699, resuming without the full tree became impossible.
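The consequence of this fallback can be stated in a couple of lines. Again an illustrative sketch with invented names, not DC++ code: when no tree is available, the only hash known is the root TTH, so the effective block size degrades to the whole file, and since only whole verified blocks survive a disconnect, nothing short of the complete file can be kept.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the fallback described above: without the full Tiger tree,
// the file's root TTH is the only known hash, so the effective block
// size is the entire file.
int64_t effectiveBlockSize(int64_t fileSize, bool haveFullTree,
                           int64_t treeBlockSize) {
    return haveFullTree ? treeBlockSize : fileSize;
}

// Only whole verified blocks can be kept across a disconnect, so a
// partial download is resumable only once at least one block is done.
bool canResume(int64_t bytesDone, int64_t blockSize) {
    return bytesDone >= blockSize;
}
```

With the block size equal to the file size, `canResume` can never be true for a partial download – exactly the behaviour observed in this story.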

The possibility of downloading without the tree is a nice feature for small or medium-sized files. They usually download in a short time, they can be checked by the TTH at the end, and that’s it. However, it is a problem for huge files, especially if there’s no additional source to get the tree from. Even if they download all day long, if the download can’t finish in one go then it’s just a waste of time and bandwidth. It will start all over again and again… And even if some other sources with free slots come around later (and the hash tree is successfully grabbed from them), these new sources won’t be used while the full-size segment is running…

About emtee
I started to use DC via DC++ in 2003, when its version number was around 0.260. Since then I've been amazed by the DC network: a professional but still easy-to-use way of P2P file sharing. I was invited to the DC++ team in 2006 where, in the beginning, I did user support and some testing only. A few years later I started to add small contributions to the DC++ code as well, so for many years I did mostly bug fixes, testing, feature proposals and improvements. At the same time I worked on improving the documentation for both DC++ and ADCH++. These days I'm trying to maintain the whole codebase and the infrastructure behind it, to keep this software secure and usable for a prolonged time. My ultimate goal is to help make the DC network as user friendly as possible.

2 Responses to The case of a missing tree

  1. defenderofdc says:

    So what you are saying is that:
    – not only has DC++ destroyed DC by being incompatible with DC (NMDC),
    – and not only is mandatory TTH eating a huge amount of CPU and I/O resources, which directly adds to power consumption and our environmental issues,
    – you are now stating that it’s quite probable that downloading of big files does not work with DC++!

    What are your excuses for ruining the DC? I’d like to read Jacek’s more than others.

  2. poy says:

    defenderofdc, you are missing the point of this article. besides, this problem is quite rare, and only a (solvable) bug after all.
