Mixed-hash DC hubs

They work fine if clients and hubs support both TTH and its successor adequately long.

While transitioning to a TTH successor, currently interoperable clients and hubs all supporting only TTH will diverge. In examining the consequences of such diversity, one can partition concerns into client-hub communication irrelevant to other clients; hub-mediated communication between two clients; and direct client-client communication. In each case, one can look at scenarios with complete, partial, and no supported hash function overlap. Complete overlap defines the all-TTH status quo and, clearly, works without complication for all forms of DC communication, so this post focuses on the remaining situations. In general,

Almost as straightforwardly, ADC but not NMDC client-hub communication irrelevant to other clients requires partial but not complete hash function overlap but only between each individual client/hub pair, and don’t create specific mixed-hash hub problems; otherwise, an ADC hub indicates STA error code 47. For ADC, This category consists of GPA, PAS, PID/CID negotiation (with length caveats as relate to other clients interpreting the resulting CID), and the establishment of a session hash function; NMDC does not depend on hashing at all for analogous functionality. Thus, for NMDC, no problems occur here. ADC’s greater usage of hashing requires correspondingly more care.

Specifically, GPA and PAS require that SUP had established some shared hash function between the client logging in and the hub, but otherwise have no bearing on mixed-hash-function DC hubs. Deriving the CID from the PID involves the session hash algorithm, which as with GPA/PAS merely requires partial hash function support overlap between each separate client and a hub. Length concerns do exist here, but become relevant only with hub-mediated communication between two clients.

Indeed, clients communicating via a hub comprise the bulk of DC client-hub communication. Of these, INF, SCH, and RES directly involve hashed content or CIDs. SCH ($Search) allows one to search by TTH and would also allow one to search by TTH’s successor. Such searches can only return results from clients which support the hash in question, so as before, partial overlap between clients works adequately. However, to avoid incentivizing clients which support both TTH and its successor to broadcast both searches and double auto-search bandwidth, a combined search method containing both hashes might prove useful. Similarly, RES specifies that clients must provide the session hash of their file, but also “are encouraged to supply additional fields if available”, which might include non-session hash functions they happen to support, such that as with the first client-hub communication category, partial hash function support overlap between any pair of clients suffices, but no overlap does not.

A more subtle and ADC-specific issue issue arises via RES’s U-type message header and INF’s ID field whereby ADC software commonly checks for exactly 39-byte CIDs. While clients need not support whatever specific hash algorithm produced a CID, the ADC specification requires that they support variable-length CIDs. Example of other hash function output lengths which, minimally, should be supported include:

Bits Bytes Bytes (base32) Supporting Hashes
192 24 39 Tiger
224 28 45 Skein, Keccak, other SHA-3 finalists, SHA-2
256 32 52 Skein, Keccak, other SHA-3 finalists, SHA-2
384 48 77 Skein, Keccak, other SHA-3 finalists, SHA-2
512 64 103 Skein, Keccak, other SHA-3 finalists, SHA-2

Finally, direct client-client communications introduces CSUP ($Supports), GET/GFI/SND ($Get/$Send) via the TTH/ share root or its successor, and filelists, all of which work if and only if partial hash function support overlap exists. CSUP otherwise fails with error code 54 and some subset of hash roots and hash trees regarding some filelist must be mutually understood, so as with the other cases, partial but not complete hash function support overlap between any given pair of clients is required.

Encouragingly, since together client-hub communication irrelevant to other clients; hub-mediated communication between two clients; and direct client-client communication cover all DC communication, partial hash function support overlap between any given pair of DC clients or servers suffices to ensure that all clients might fully functionally interact with each other. This results in a smooth, usable transition period for both NMDC and ADC so long as clients and hubs only drop TTH support once its successor becomes sufficiently ubiquitous. Further, relative to ADC, poy has observed that “all the hash function changes on NMDC is the file list (already a new, amendable format) and searches (an extension) so a protocol freeze shouldn’t matter there”, which creates an even easier transition than ADC in NMDC.

In service of such an outcome, I suggest two parallel sets of recommendations, one whenever convenient and the other closer to a decision on a TTH replacement. More short-term:

  • Ensure ADC software obeys “Clients must be prepared to handle CIDs of varying lengths.”
  • Create an ADC mechanism by which clients supporting both TTH and its successor can search via both without doubling (broadcast) search traffic. Otherwise, malincentives propagate.
  • Ensure BLOM scales to multiple hash functions.
  • Update phrasing in ADC specification to clarify that all known hashes for a file should be included in RES, not just session hash.

As the  choice of TTH’s successor approaches:

  • Disallow new hash function from being 192 bits to avoid ambiguity with Tiger or TTH hashes. I suggest 224 or 256-bit output; SHA-2 and all SHA-3 finalists (including Keccak and Skein) offer both sizes.
  • Pick either a single filelist with all supported hashes or multiple filelists, each of which only supports one hash. I favor the former; it especially helps during a transition period for even a client downloading via TTH’s successor to be able to autosearch and otherwise interact with clients which don’t yet support the new hash function, without needing to download an entire new filelist.
  • Barring a more dramatic break in Tiger than thus far seen, clients should retain TIGR support until the majority of ADC hubs and NMDC or ADC clients offer support for the successor hash function’s extension.

By doing so, clients both supporting only TTH and both TTH and new hash function should be capable of interacting without problems, transparently to end-users, while over time creating a critical mass of new hash function-supporting clients such that eventually client and hub software might outright drop Tiger and TTH support.

Old DC++ forums restored

DC++ used to have a forum where people would receive help, give suggestions on improvements and discuss protocol features. This forum migrated from SourceForge to the domain dcpp.net (now defunct, don’t use it). The entire site was then attacked and the forum was put offline. This was in 2007, and no forum has yet replaced the old DC++ forum as a whole.

The DCBase.org project is put in place to harmonize different content for Direct Connect. As such, the project host the DCBase forum (previously ADCPortal) where today’s discussions for (primarily but not exclusively) ADC development lies. However, it is also important to look in the past and what has been done and the discussions that were held then. As such, the old DC++ forum is now restored. This forum is now set up similar to the old forum, and the database is migrated as such. The entire forum is locked down (until someone want DC++ to regain that as a forum) so you can’t post anything.

I will probably create posts in the future where the old forum is referenced (in particular NMDC and ADC development and protocol discussions).

If anyone else have a forum, wiki or site that is now defunct, let me know. It is important that the content that we once produced isn’t completely lost.

DC++ 0.811

A new service release of DC++ is out and marked stable. Along with some small fixes the new version improves on a couple of more important areas.

We received reports that the obligatory share format converison at the first start of DC++ 0.810 may take more time than one would expect, especially in cases when  large number of files shared or had been shared before. During the first startup conversion DC++ 0.810 may look unresponsive; worse is that when the anxious user clicks to the taskbar preview or the DC++ startup logo during the conversion period, it results an immediate “Not responding” message shown by Windows. People might get confused with this so the fix comes with DC++ 0.811 and you will never get those messages again. Also there’s a new graphical progress indicator under the DC++ splash logo so from now users are somewhat better informed about what’s happening at the startup.

Note that the first (and only the first) startup can still take unusually long when you uprgade from version 0.802 or older. Some users with extra large shares reported even 15-20 minutes long startups and that really sounds long. But again, you have to keep in mind that it’s still a lot better option than a complete rehash of your entire share.

The other bigger change in version 0.811 is the switch of the used compiler package. We changed from MinGW to MinGW-w64 for now as some tests done with DC++ built using MinGW-w64 resulted better long time stability under extra heavy load.

Version 0.811 is a stable release and immediate upgrade is recommended.

A Decade of TTH: Its Selection and Uncertain Future

NMDC and ADC rely on the Tiger Tree Hash to identify files. DC requires a cryptographic hash function to avoid the previous morass of pervasive similar, but not identical, files. A bare cryptographic hash primitive such as SHA-1 did not suffice because not only did the files need identification as a whole but in separate parts, allowing reliable resuming and multi-source downloading, and per-segment integrity verification (RevConnect unsuccessfully attempted to reliably use multi-source downloading precisely because it could not rely on cryptographic hashes).

Looking for inspiration from other P2P software, I found that BitTorrent used (and uses) piecewise SHA-1 with per-torrent segment sizes. Since the DC share model asks that same hash function work across entire shares, this does not work. eDonkey2000 and eMule, with per-user shares similar to those of DC, resolved this with fixed, 9MB piecewise MD4, but this segment size scaled poorly, ensured that fixing corruption demanded at least 9MB of retransmission, and used the weak and soon-broken MD4. Gnutella, though, had found an elegant, scalable solution in TTH.

This Tiger Tree hash, which I thus copied from Gnutella, scales to both large and small files while depending on what was at the time a secure-looking Tiger hash function. It smoothly, adaptively sizes a hash tree while retaining interoperability between all such sizes of files files on a hub. By 2003, I had released BCDC++ which used TTH. However, the initial version of hash trees implemented by Gnutella and DC used the same hash primitive for leaf and internal tree nodes. This left it open to collisions, fixed by using different leaf and internal hash primitives. Both Gnutella and DC quickly adopted this fix and DC has followed this second version of THEX to specify TTH for the last decade.

Though it has served DC well, TTH might soon need a replacement. The Tiger hash primitive underlying it by now lists as broken due to a combination of a practical 1-bit pseudocollision attack on all rounds, a similarly feasible full collision on all but 5 of its 24 rounds, and full, albeit theoretical, 24-round pre-images (“Advanced Meet-in-the-Middle Preimage Attacks”, 2010, Guo et al). If one can collide or find preimages of Tiger, one can also trivially collide or find preimages of TTH. We are therefore investigating alternative cryptographic hash primitives to which we might transition as Tiger looks increasingly insecure and collision-prone, focusing on SHA-2 and SHA-3.

DC++ 0.810

We’re happy to annouce that a new testing release of DC++ is available from the official download page. Version 0.810 contains numerous fixes and small improvements over the last release but this isn’t going to be a large post with lots of explanations in this case; the changelog items mostly speak for themselves.

I’d like to highlight three improvements that might need more attention:

  • The Plugin API has been evolved further but it’s still not reached the stage where the plugin binaries are possible to distibute with the client. They’re still in a testing phase but binaries available from now to those who interested on testing. As long as the the final plugin distribution format is beeing implemented the nightly plugin binary builds are available in a simple .dll format at the Plugins archive of the DCBase Builds server.
    At the time of writing two fully functional and a test plugin are available. The LUA 5.2.1 client side Script Plugin is able to run all BCDC++ client side scripts. The Dev Plugin offers functions like monitoring the protocol talk, follow search requests (former Search Spy function) and more. Make sure you consult the readme file before downloading any revision of any plugins.
    For those who interested in creating plugins the source code,  SDK and Doxygen documentation is still available in the DCPP Plugins SDK repositories (C and C++).
  • Fixing of two the two following long standing problems required to change the hash data file format: the case insensitive share format that caused problems on non-Windows clients based on the dcpp library and a bug with merging shares  containing same filenames/folders resulted an inconsistent behaviour.
    Because of these fixes if you upgrade from an eralier release the first start of the client might require a bit more time than usual. An automatic share format conversion will make you wait a bit more, depending on how large your share is.
    The bad news is that the automatic conversion is possible only on Windows Vista and later. Due to lack of extended file handling capabilities of the OS, the entire share needs to be rehashed for XP users.
  • The only remarkable and very useful GUI improvement this time is the ability to copy data from every list view to the clipboard. It is possible through the lists’ context menus, using a new ‘Copy’ menu item. You are able to copy the content of one column or the data of all columns in one go.
    There’s also a special new menu item  (called “Copy user information”) for lists containing users; it’s made to easily gather all the available info (not just what’s displayed) about one user or multiple selected users copied to the clipboard.

Things looking good so far; this means that DC++ 0.810 should make stable within days. Upgrade is recommended as always.

ADC 1.0.2 released

A new version of the base ADC protocol is now released, version 1.0.2.

The document may look slightly different, especially with the addition of commands in the table of contents. The document itself (its content) is not that much modified (except for state management, see below).

An important part of the document is a new addition, a terminology section where difficult words or phrases are specified. This list is obviously meant to be much more than mere four items but it’s at least a start.

The STA previously didn’t specify who had the responsibility for action when a STA is sent with the severity Fatal (2). This has always been the originator of the message, and this is now explicit.

The state management is re-worded and restructured. All information about state has now been moved to its own section, allowing an implementator a quick and comprehensive overview on the requirements for the state management. Previously, the state management was sprinkled all across the document, making it difficult for a person to properly implement a state machine in their software. This has meant that state management information is now removed from each command (only thing remaining is an explicit note about in which state each command is used). Certain information is also clarified, such as what to call the parties in a client to client connection (“client party” and “server party”) and state transitions.

Version 1.0.1 of ADC was also ambiguous in state management when it came to one important part: who shall send the first INF in a client to client connection. This is important because it has the ramification that it makes multi-share difficult. The current specification is now not ambiguous, and makes the following stance: the first party to send the INF is the connecting party (“client party”). No known implementation suffer from this explicit note, as all manage this scenario just fine. Basically, this change means that multiple shares (per hub) may not be too far off.

The new version also brings in a new time where we can safely and appropriately update the base document. There was an announcement period when the document was going to be released which meant that developers have had time to adjust their software and give feedback in a timely manner.

DCDev archives published

I previously requested the DCDev archives, a repository of posts from DC developers. I was able to acquire the repository and it is now posted on DCBase.

There’s a lot of stuff in the posts, especially the initial parts with ADC. Enjoy.

Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”

Follow

Get every new post delivered to your Inbox.