ADC? No, cancelled…

Back at the end of the 2000s, when version 1.0 of the ADC protocol was ready and its implementation had started taking shape, the protocol maintainers thought it would be a good idea to add information about this distinct new DC protocol to Wikipedia. Besides links to the technical documents, a brief description of what ADC is and why it exists was added to the new Advanced Direct Connect Protocol page.

The page, professional and made according to the best Wikipedia standards, was improved over time and stayed there for many years, until the end of last year when someone requested its complete removal as an uncontroversial deletion. A revert of this action was requested (thanks to klondike), which means that per the Wikipedia rules the page itself cannot be requested to be removed again. However, a few weeks later, another admin made a lawnmower-style deletion of the best part of the page’s content, citing that the source of the information, this very blog, where most of the content comes from the ADC protocol’s designers, maintainers and implementers, is unreliable.

Of course Wikipedia has its own rules, and they have always been controversial. This time the rules were clearly applied unwisely, and I guess it’s not worth engaging in an add/remove style fight with them anymore. Who knows, maybe cancel culture has reached Wikipedia as well, or it’s just another attempt at making Wikipedia worse for technical purposes.

In any case we decided to preserve the removed document, originally added by Fredrik Ullner, here:


Advanced Direct Connect (ADC) is a peer-to-peer file sharing and chat protocol, using the same network topology, concepts and terminology as the Direct Connect (DC) protocol.

“ADC” is unofficially an acronym for “Advanced Direct Connect”.[1]

Contents

  • 1 History
  • 2 Design and features
  • 3 Protocol
  • 4 See also
  • 5 References
  • 6 External links

History

ADC was created to allow an extensible protocol and to address some shortcomings of the Direct Connect protocol. It was initiated by Jacek Sieka, under the influence of Jan Vidar Krey’s DCTNG draft.[2] The first revision of ADC came in 2004 and the first official version on 2007-12-01.

Design and features

ADC is structured around clients that connect to a central hub, where the clients (users) can chat and download files from other clients (users). The hub provides routing between clients for chat, searches and requests for connections. The actual file transfers are between clients.

The protocol itself is split into two parts: a base protocol that every client and hub must follow, and extensions that are optional. The protocol allows signalling of protocol features (such as bloom filters), and messages can be constructed to only be routed to those who support that particular feature.

Each hub has its own rules and is commonly governed by hub operators.[3] Hubs may define different capabilities for hub operators. The hubs themselves do not regulate discussion and files; the hub operators do. The hub regulates minimum share and the maximum number of simultaneous hubs, things that are sent by the client rather than the user.

Lists of hubs [4] exist where a hub’s name, description, address and rules are specified. With the hub list, users can choose hubs that are similar according to the user’s liking of discussion topics and files.

The peer-to-peer part of the protocol is based on a concept of “slots” [5] (similar to the number of open positions for a job). These slots denote the number of people that are allowed to download from a user at any time. The slots are controlled by the user of the respective client.

ADC requires that all text be sent in UTF-8, which means that users with different system encodings (say, Russian and Chinese) are able to chat using their respective native characters.

The protocol natively supports IPv6.

There are two modes a user can be in: “active” or “passive”. Clients in active mode can download from anyone else on the network. Passive mode users can only download from active users. Passive clients will be sent search results through the hub, while active clients will receive the results directly. An active searcher will receive (at most) 10 results per user and a passive searcher will receive (at most) 5 results per user. NAT traversal exists as a protocol extension,[6] which allows passive users to connect to other passive users.
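The connectivity rules above can be summarized in a few lines. This is only a minimal sketch: the function and constant names are hypothetical and not taken from the ADC specification, and real clients discover the mode and NAT traversal support from the information other users publish.

```python
# Illustrative only: all names here are hypothetical, not part of ADC.
MAX_RESULTS_ACTIVE = 10   # per-user cap for an active searcher (from the text)
MAX_RESULTS_PASSIVE = 5   # per-user cap for a passive searcher (from the text)


def can_connect(a_active: bool, b_active: bool,
                both_support_natt: bool = False) -> bool:
    """Return True if two clients can set up a direct transfer."""
    if a_active or b_active:
        # At least one side can accept incoming connections.
        return True
    # Passive-to-passive only works through the NAT traversal extension.
    return both_support_natt


def max_results_per_user(searcher_active: bool) -> int:
    """Upper bound on search results one user sends to the searcher."""
    return MAX_RESULTS_ACTIVE if searcher_active else MAX_RESULTS_PASSIVE
```

For example, `can_connect(False, False)` is false, while the same call with `both_support_natt=True` is true.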

The base protocol does not require encryption, but extensions exist to provide encryption with TLS.[7]

Files in client connections are identified by their hash, most commonly the Tiger Tree Hash. The hash algorithm is negotiated with the hub and used throughout the client-hub session, as well as subsequent client-client connections.

Protocol

The ADC protocol is a text-based protocol, where commands and their information are sent in clear text, except during password negotiation. The client-server (as well as client-client, where one acts as a “server”) aspect of the protocol stipulates that the client speak first when a connection has been made. For example, when a client connects to a hub’s socket, the client is the first to talk to the hub.

The protocol requires that all text must be sent as UTF-8 encoded Unicode, normalized in form C.

There are no port defaults, for hubs or clients.

Hub addresses are in the following form: adc://example.com:411, where 411 is the port.
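Since there are no port defaults, a client has to reject an address that lacks a port rather than guess one. A minimal sketch of such parsing, using Python's standard `urllib.parse`; the function name `parse_hub_address` is made up for this example, and only the `adc://` scheme mentioned above is accepted.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_hub_address(address: str) -> Tuple[str, int]:
    """Split an adc://host:port address into (host, port)."""
    parts = urlsplit(address)
    if parts.scheme != "adc":
        raise ValueError(f"not an ADC hub address: {address!r}")
    # ADC defines no default port, so a missing port is an error.
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"host and port are both required: {address!r}")
    return parts.hostname, parts.port
```

With the address from the text, `parse_hub_address("adc://example.com:411")` gives `("example.com", 411)`.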

During hub-client protocol information exchange, the client offers a set of hashes it supports. The hub will select one of these hashes, and that hash will be used throughout the hub-client session. If the hub deems that the client doesn’t support an (arbitrary) appropriate hash set, an error is raised.
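The selection step could look roughly like the sketch below. It assumes the hub keeps an ordered preference list, which is an implementation choice and not something the text prescribes. `TIGR` is used as the name for Tiger, while `NEWH` is a made-up placeholder for some other hash, and the exception class is hypothetical too.

```python
from typing import Iterable, Sequence


class NoCommonHashError(Exception):
    """Raised when the client offers no hash the hub accepts (hypothetical)."""


def select_session_hash(client_offers: Iterable[str],
                        hub_preference: Sequence[str]) -> str:
    """Pick the hub's most preferred hash among those the client offers."""
    offered = set(client_offers)
    for name in hub_preference:
        if name in offered:
            return name
    raise NoCommonHashError("client supports none of the hub's hashes")
```

The chosen name would then be used for the rest of the hub-client session.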

The global identification scheme is based on the hash set producing two end-hashes, where one of them depends on the output of the other. During hub-client information exchange, the client will send these end-hashes, encoded with base32, which the hub will confirm to match. One of these base32 encoded hashes will be further sent to other clients in the network. The global identification scheme is this last string. The client may change its end-hashes on a hub-to-hub basis.
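To make the "one end-hash depends on the other" idea concrete, here is a small sketch. It rests on several assumptions: SHA-256 stands in for the negotiated hash because Python's `hashlib` does not guarantee a Tiger implementation, the first value is simply random bytes, and all function names are invented for this example. It is not how any particular client generates its identifiers.

```python
import base64
import hashlib
import os
from typing import Tuple


def base32_nopad(data: bytes) -> str:
    """Base32-encode and strip the trailing '=' padding."""
    return base64.b32encode(data).decode("ascii").rstrip("=")


def make_ids(hash_name: str = "sha256") -> Tuple[str, str]:
    """Return (private, public) identifiers; the public one hashes the private one."""
    private_raw = os.urandom(hashlib.new(hash_name).digest_size)  # assumption: random
    public_raw = hashlib.new(hash_name, private_raw).digest()
    return base32_nopad(private_raw), base32_nopad(public_raw)


def hub_confirms(private_b32: str, public_b32: str,
                 hash_name: str = "sha256") -> bool:
    """Hub-side check that the two base32 strings really belong together."""
    padding = "=" * (-len(private_b32) % 8)  # restore padding before decoding
    private_raw = base64.b32decode(private_b32 + padding)
    return base32_nopad(hashlib.new(hash_name, private_raw).digest()) == public_b32
```

Only the second (public) string would be forwarded to other clients; the first stays between the client and the hub.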

Each user, during a hub session, is assigned a hash that only lasts that particular session. This hash will be used for all client references in that hub. There is no dependency on nicks.

Each client information notification is incrementally sent.

An incoming request for a client-client connection is linked to an actual connection, with the use of a token.

Searches use a token, as well, to identify each result of a search.

There is no out-of-the-box ability for a client to kick or redirect another client from a hub. The hub, however, can kick and redirect arbitrarily. The hub can also require that all other clients in the hub must terminate their transfers with the kicked/redirected client. If a client is redirected to another hub, the redirecting client must use a referrer, similar to the HTTP referrer. The kicked/redirected client is not required to receive a notification message.

The peer-to-peer part of the protocol is based on a concept of “slots” (similar to number of open positions for a job). These slots denote the number of people that are allowed to download from a user at any time. These slots are controlled by the client. Automatic slot allocation is supported by the protocol.

The token in the client-client connection decides who should be allowed to download first.

Downloads are transported using TCP. Searches can be transported using TCP or UDP.

An active client has a listening port for TCP and another for UDP, though the ports don’t depend on each other.

Protocol delimiters are ‘\n’ and ‘ ‘ (space). The character ‘\’ is used as an escape character. Allowed escape sequences are “\n” (new line), “\s” (space) and “\\” (backslash).
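A minimal sketch of the three escape sequences in Python follows. The function names are made up for illustration, and rejecting unknown sequences with an error is an assumption of this sketch, not a statement about how existing clients behave.

```python
def escape(value: str) -> str:
    """Escape a parameter so it contains no raw space or newline."""
    # Escape the backslash first so the ones added below are not doubled.
    return (
        value.replace("\\", "\\\\")
        .replace(" ", "\\s")
        .replace("\n", "\\n")
    )


def unescape(value: str) -> str:
    """Reverse escape(); raise ValueError on an unknown or dangling escape."""
    mapping = {"s": " ", "n": "\n", "\\": "\\"}
    out = []
    chars = iter(value)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)  # character following the backslash, if any
        if nxt not in mapping:
            raise ValueError(f"invalid escape sequence: \\{nxt}")
        out.append(mapping[nxt])
    return "".join(out)
```

Because delimiters never appear unescaped inside a parameter, a receiver can split a message on ‘\n’ and space first and unescape each field afterwards.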

The protocol allows for extensions such as compression with bzip2 or encryption with TLS.[8] While the protocol does not mandate that these extensions be implemented, hubs may require them.

See also

References

  1. Fredrik Ullner (March 2007). “ADC: The run down”. DC++: Just These Guys, Ya Know? blog. Retrieved 2010-12-13.

2. Jan Vidar Krey (August 2006). “ADC: Protocol simplicity”.

3. Jan Vidar Krey. Archived from the original on 2013-01-30. Retrieved 2006-09-23.

4. Fredrik Ullner (March 2006). “Power + Person = Operator”. DC++: Just These Guys, Ya Know? blog. Retrieved 2010-12-13.

5. Fredrik Ullner (January 2007). “The parts of a hub list”. DC++: Just These Guys, Ya Know? blog. Retrieved 2010-12-13.

6. Fredrik Ullner (March 2006). “Slots, slots, slots…”. DC++: Just These Guys, Ya Know? blog. Retrieved 2010-12-13.

7. Fredrik Ullner (December 2010). “ADC Extensions – NATT – NAT traversal”. ADC Project. Retrieved 2010-12-13.

8. Fredrik Ullner (December 2010). “ADC Extensions – ADCS – Symmetrical Encryption in ADC”. ADC Project. Retrieved 2010-12-13.

9. En_Dator (March 2009). “TLS and Encryption”. ADCPortal. Archived from the original on 2011-07-07. Retrieved 2009-03-01.

External links


Click here to see what the original Wikipedia page looked like before the content removal.

We once had a very good overview of ADC on Wikipedia, a brief explanation of what it is all about so that interested people could go further, get in contact, etc. Now we have almost nothing. To compensate for that, I’ll also try to preserve this document another way by adding it to the ADC project site later.

May the people with the power to destroy valid and current information sleep better each time they act this way.

DC++ is 20 years old today

In the beginning there was NMDC, as its name (Neo-Modus) says, a new way of file sharing. It was quite a good, if not revolutionary, idea for its time, but a somewhat clumsy and low-quality implementation of a business model that aimed to earn revenue by displaying ads in its client software. NMDC could be used to share files through a community hub capable of controlling direct file transfers between its online users and also relaying searches and instant messages. This system of direct file sharing built around online communities quickly became a success at the end of the 90s, despite its clumsiness and annoying limitations.

The early years

In the fall of 2001 one DC user, a secondary school teenager, thought he could easily make a much better, ad-free client for this network and that it would be a fun project for improving his C++ programming skills. So DC++, an open source replacement for the original Neo-Modus client, was born exactly 20 years ago today. And the rest is history…

DC++ rapidly became a success. Many users switched to it and enjoyed the new intuitive interface, the slick and fast look-and-feel, and thoughtful new functions like the ability to connect to multiple hubs in parallel. Neo-Modus put out a new version of its client in response, trying to address the limitations of the original one, but the effort was completely futile: by that time DC++ had already become the go-to client for the DC network.

As happens with most open source development, contributors appeared over time and helped by adding their ideas and fixing bugs in DC++. Many of them just came and went, but some remained, giving more and more input and help to the original author to make DC++ better and better. The changelog of DC++ has preserved some of what that early development was like; it is fun to read from the distance of so many years, especially for those who weren’t around DC at that time.

But not all of those outside ideas and directions were accepted into DC++. Many people wanted to go different ways, and this is easily done in open source; soon there was no shortage of DC++ forks, some existing just for the sake of a few additional functions while others went much further in different directions, adding complete sets of new features and optimizations. With few exceptions, though, most of them were still built on the DC++ code as a base. Many forks were short-lived, abandoned within months or years, but a few are still being developed or at least maintained these days.

These were the years when DC flourished as a file sharing network: public hubs with an overall user count in the hundreds of thousands, plus a lot of smaller private communities.

On the pinnacle of file sharing

Once DC++ achieved the initial target of being a fast, full-featured, easy-to-use NMDC replacement, it was time to improve the initial system created by Neo-Modus. The protocol, connections and file transfers were insecure, especially the latter; file identification and corruption problems were an everyday thing in DC. For example, files were identified by their names and sizes only, so searches for other sources of the same file often came up with a different file of the same size, resulting in a corrupted download.

This needed to be fixed, and the fix came in the form of Tiger Tree Hashes, which allowed files to be properly identified, searched for and verified after download, so that no corrupted or arbitrary content would arrive on your computer anymore. It’s still the same today: it comes with the need to hash files before sharing, but it provides the ultimate safety and integrity. Some users and forks hated hashing and stayed behind; eventually, DC++ became incompatible with these old clients and their stubborn users.

An interesting part of the story is that before the old hashless way of transferring files was finally removed in 2006, the team released DC++ v0.674, a version that became quite popular among a large group of DC users, so much so that even today it is still the most widely used old version of DC++ among those stubborn people mentioned above. Yes, this version was moderately stable at the time, the end result of an era in the development of DC++, and still compatible with the old hashless ways. And since big changes were coming in the forthcoming releases, this one remained known as “the best” and “working” DC client for many. Nevertheless, DC++ 0.674 soon became less and less secure, and by today a plethora of vulnerabilities has been discovered in it. Also, having been developed in a different era with the tools of the time, it isn’t that stable on modern Windows versions either. Our favorite support requests are when people demand that we fix these instability issues in a 10+ year old version of the program, when even most of the tools used to build DC++ back then no longer work on today’s operating systems. Of course the fix has been available for a long time, only a version upgrade away.

Still leading the way to be secure

In the meantime, DC’s decline began as torrents became popular in the middle of the 2000s. The development of the Internet as a whole and the way torrents work suited many file sharing users better. With torrents, related groups of files were bundled, client software was easier to set up and use, and community members no longer needed to be online with client software to communicate with each other, as messages were persistent on the web. IRC could be set up and used by those who missed instant messaging, so for many this was a suitable replacement for earlier file sharing methods.

Yet the author of DC++ had his next big thing to realize: a complete change from the old communication protocol of DC, inherited from Neo-Modus, to a brand new one that is professionally designed, defined and documented; a standard protocol that is secure, aims to fix the design issues of the old one and is extensible with features, most notably support for secure encrypted connections. The new protocol was named Advanced Direct Connect (ADC) and the first draft was released in 2006. In parallel, with the help of many contributors, elements of the new protocol started to be built into DC++ and also into its forks.

Thanks to ADC, by the end of the first decade of the new millennium Direct Connect was ready to become a fully standardized file sharing system with safe and secure encrypted communications. Yet ADC never really took off, partly because it came too late, when the focus of file sharing had already moved elsewhere, and partly because of the reluctance of members of the DC network: key hub software developers, hub owners and hub list server maintainers. Many new ADC hub software projects started to appear, written from scratch; some were just hobby projects while others showed promise and were high quality software. Since the DC network was reluctant to adopt ADC, most of the new hub software was soon abandoned, and by now only a few are still maintained. ADC has become popular only within small private DC communities, thanks to its security and advanced integrity.

From development to maintenance

By 2008, DC++ had completely switched to free, open source build tools and libraries, so as not to rely on closed products of big tech companies. Meanwhile, input from the original author of DC++ started to phase out and eventually stopped completely. Under the control of a new lead developer, DC++ started to catch up with other DC clients in user-friendliness: new graphical UI elements, a modern look-and-feel, easier setup, complete documentation of UI elements and functions, and plenty of new functions like automated connectivity setup, secure encrypted private messages between users and so on.

And then, after a few years, the constant development that had characterized DC++ in its first 12 years of existence just ended abruptly. In the following years DC++ slowly switched into maintenance mode, with mostly essential security and small bug fixes added to each release. Some other DC clients are still improving, changing and adding features to DC in their own ways but, at least to this point, remaining mostly compatible with DC++.

And this is where we are at today, 20 years after the start.

The above are just semi-randomly picked important parts of the whole story. There were ups and downs, problems and solutions; you can find many more pieces of the puzzle (mostly the technical aspects) throughout this blog. But the things mentioned here today are enough to show that the key people who created and worked on DC++ have been the most influential on the development of the DC network, at least for the best part of the last two decades. And while others are shaping DC by now, almost everything is still based on the work of the people who have been in and around DC++ in these years.

And all the contributors to DC++, both those who realized plenty of big ideas and those with just small additions, did it mostly to have fun, to learn new things and to improve themselves. They were many; you can find all the names preserved in the About box of DC++.

DC++ is still somewhat popular these days, with around 10k people still showing interest in it over the course of a month. The program is still maintained, albeit at a slower pace and with no ambitious feature updates planned. The people who remain with the project want to provide safety, stability and compatibility, and want to make sure that DC++ at least remains viable for some use cases. Hopefully, this will help users keep having fun using DC++ for many more years.

Happy birthday DC++ and keep on sharing!

DC++ 0.868+1 will require TLS 1.2 or TLS 1.3

In accordance with the published plan, the next DC++ release will increase the minimum supported TLS version from 1.0 to 1.2. This follows Firefox, Chrome, and Fedora doing so as well. As DC++ 0.868 supports TLS 1.3, DC++ will, for ADCS, use only TLS 1.2 or TLS 1.3. Additionally, client-client connections for ADC hubs will default to requiring TLS, also 1.2 or 1.3.

Widely used, currently maintained DC clients interoperably (Russian original) support TLS 1.3 in this manner as part of ADCS, as Delion’s post documents, including DC++ since version 0.868, ApexDC++ since version 1.6.5, AirDC++ since version 3.53, EiskaltDC++ since version 2.2.10, FlylinkDC++ since build 21972, and ncdc.

This DC++ release will, due to practical and efficient chosen-prefix SHA-1 collisions, similarly disallow SHA-1-based TLS ciphersuites. Remaining ciphersuites provide forward secrecy.

Finally, enforcing Diffie-Hellman keys of at least 2048 bits avoids the 1024-bit DH keys to which ADCH++ had previously defaulted, which are vulnerable to well-funded actors and likely already broken by nation-states.

Dropping less secure TLS versions 1.0 and 1.1, along with SHA-1-based ciphersuites and weak DH keys, protects DC++’s and the DC network’s security against current and emerging cryptographic attacks.
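To illustrate what such a policy looks like in code, here is a sketch using Python's standard `ssl` module. It is not DC++'s actual implementation (DC++ is written in C++), the function name is made up, and the cipher string is just one possible way to keep only forward-secret AES-GCM suites for TLS 1.2. The 2048-bit Diffie-Hellman requirement and certificate handling are not covered here.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses TLS 1.0 and 1.1."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Anything older than TLS 1.2 is rejected during the handshake.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # TLS 1.2: only (EC)DHE key exchange with AES-GCM, so no SHA-1 MAC suites
    # remain and every suite provides forward secrecy. TLS 1.3 suites are
    # configured separately by OpenSSL and are unaffected by this string.
    ctx.set_ciphers("ECDHE+AESGCM:DHE+AESGCM")
    return ctx
```

Calling `make_client_context().get_ciphers()` lists the suites that remain enabled on a given system.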

Disabling TLS 1.0 and 1.1 in DC++ by 2020

Following the IETF’s deprecation of TLS 1.0 and TLS 1.1, Chrome, Edge, Firefox, and Safari have announced that they’ll disable both TLS 1.0 and 1.1 during the first half of 2020. GitHub, Stripe, CloudFlare, PayPal, and KeyCDN have all already done so on the server side. The deprecated TLS 1.0 dates from 1999 and TLS 1.1 from 2006.

Meanwhile, TLS 1.2 has now existed since 2008 and been supported by OpenSSL 1.0.1 since 2012. DC++, and therefore its modified versions, has supported TLS 1.2 since version 0.850 in 2015. ncdc likewise has supported TLS 1.2 for many years. ADCH++, uhub, and Luadch all support TLS 1.2 or 1.3.

“Hardening DC++ Cryptography: TLS, HTTPS, and KEYP” and “BEAST, CRIME, BREACH, and Lucky 13: Assessing TLS in ADCS” document vulnerabilities that TLS 1.0 and 1.1 allow or exacerbate, including but not limited to BEAST, Lucky 13, and potential downgrade attacks discovered in the future in TLS 1.0 or TLS 1.1 to which TLS 1.2 is not subject.

As such, DC++ has deprecated TLS 1.0 and 1.1 and will disable both by default in 2020 along with the browsers, while supporting TLS 1.2, 1.3, and newer versions, with an option to re-enable TLS 1.0 and 1.1 should that remain necessary.

BEAST, CRIME, BREACH, and Lucky 13: Assessing TLS in ADCS

1. Summary

Several TLS attacks since 2011 impel a reassessment of the security of ADC’s usage of TLS to form ADCS. While the specific attacks tend not to be trivially replicated in a DC client as opposed to a web browser, remaining conservative with respect to security remains useful, the issues they exploit could cause problems regardless, and ADCS’s best response thus becomes to deprecate SSL 3.0 and TLS 1.0. Ideally, one should use TLS 1.2 with AES-GCM. Failing that, ensuring that TLS 1.1 runs and chooses an AES-based ciphersuite works adequately.

2. HTTP-over-TLS Attacks

BEAST renders practical Rogaway’s 2002 attack on the security of CBC ciphersuites in SSL/TLS by using an SSL/TLS server’s CBC padding MAC acceptance/rejection as a timing oracle. Asking whether each possible byte in each position results in a successful MAC, it decodes an entire message. One can avert BEAST either by avoiding CBC in favor of RC4 or by updating to TLS 1.1 or 1.2, which mitigate the timing oracle and generate new random IVs to undermine BEAST’s sequential attack.

CRIME and BREACH build on a 2002 compression and information leakage of plaintext-based attack. CRIME “requires on average 6 requests to decrypt 1 cookie byte” and, like BEAST, recognizes DEFLATE’s smaller output when it has found a pre-existing copy of the correct plaintext in its dictionary. Unlike BEAST, CRIME and BREACH depend not on TLS version or CBC versus RC4 ciphersuites but merely compression. Disabling HTTP and TLS compression therefore avoids CRIME and BREACH.
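The compression side channel itself is easy to see without any TLS at all, since encryption roughly preserves plaintext length. The following is a deliberately simplified sketch: the secret, the guesses and the function name are all made up, and it compares one fully correct guess against one fully wrong guess instead of recovering the secret byte by byte as the real attacks do.

```python
import zlib

# Hypothetical secret that the victim's plaintext always contains.
SECRET = b"session=7f3a9c1e5b2d4806"


def compressed_len(attacker_guess: bytes) -> int:
    """Length an eavesdropper would observe for secret + injected guess."""
    plaintext = SECRET + b"\n" + attacker_guess
    return len(zlib.compress(plaintext))


right = compressed_len(b"session=7f3a9c1e5b2d4806")  # repeats the secret
wrong = compressed_len(b"session=q8w2z6x0k4m1v9t3")  # same length, no match
# DEFLATE replaces the repeated secret with a short back-reference,
# so the correct guess yields the smaller output.
assert right < wrong
```

The size difference is the whole signal: whoever can inject guesses and observe lengths learns whether a guess matched.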

One backwards-compatible solution thus far involves avoiding compression due to CRIME/BREACH and avoiding BEAST with RC4-based TLS ciphersuites. However, a new attack against RC4 in TLS by AlFardan, Bernstein, et al exploits double-byte ciphertext biases to reconstruct messages using approximately 2^29 ciphertexts; as few as 2^25 achieve a 60+% recovery rate. RC4-based ciphersuites decreasingly inspire confidence as a backwards-compatible yet secure approach to TLS, enough that the IETF circulates an RFC draft prohibiting RC4 ciphersuites.

Thus far treating DC as sufficiently HTTP-like to borrow their threat model, options narrow to TLS 1.1 or TLS 1.2 with an AES-derived ciphersuite. One needs still beware: Lucky 13 weakens even TLS 1.1 and TLS 1.2 AES-CBC ciphers, leaving between it and the RC4 attack no unscathed TLS 1.1 configuration. Instead, AlFardan and Paterson recommend to “switch to using AEAD ciphersuites, such as AES-GCM” and/or “modify TLS’s CBC-mode decryption procedure so as to remove the timing side channel”. They observe that each major TLS library has addressed the latter point, so that AES-CBC might remain somewhat secure; certainly superior to RC4.

3. ADC-over-TLS-specific Concerns

ADCS clients’ and hubs’ vulnerability profiles and relevant threat models regarding each of BEAST, CRIME, BREACH, Lucky 13, and the RC4 break differ from that of a web browser using HTTP. BEAST and AlFardan, Bernstein, et al’s RC4 attack both point to adopting TLS 1.1, a ubiquitously supportable requirement worth satisfying regardless. OpenSSL, NSS, GnuTLS, PolarSSL, CyaSSL, MatrixSSL, BouncyCastle, and Oracle’s standard Java crypto library have all already “addressed” Lucky 13.

ADCS doesn’t use TLS compression, so that aspect of CRIME/BREACH does not apply. The ZLIB extension does operate analogously to HTTP compression. Indeed, the BREACH authors remark that:

there is nothing particularly special about HTTP and TLS in this side-channel. Any time an attacker has the ability to inject their own payload into plaintext that is compressed, the potential for a CRIME-like attack is there. There are many widely used protocols that use the composition of encryption with compression; it is likely that other instances of this vulnerability exist.

ADCS provides an attacker this capability via logging onto a hub and sending CTMs and B, D, and E-type messages. Weaponizing it, however, operates better when these injected payloads can discover cookie-like repeated secrets, which ADC lacks. GPA and PAS operate via a challenge-response system. CTM cookies find use at most once. Private IDs would presumably have left a client-hub connection’s compression dictionary by the time an attack might otherwise succeed and don’t appear in client-client connections. While a detailed analysis of the extent of practical feasibility remains wanting, I’m skeptical CRIME and BREACH much threaten ADCS.

4. Mitigation and Prevention in ADCS

Regardless, some of these attacks could be avoided entirely with specification updates incurring no ongoing cost and hindering implementation on no common platforms. Three distinct categories emerge: BEAST and Lucky 13 attack CBC in TLS; the RC4 break, well, attacks RC4; and CRIME and BREACH attack compression. Since one shouldn’t use RC4 regardless, that leaves AES-CBC attacks and compression attacks.

Disabling compression might incur substantial bandwidth cost for little thus-far demonstrated security benefit, so although ZLIB implementors should remain aware of CRIME and BREACH, continued usage seems unproblematic.

Separately, BEAST and Lucky 13 point to requiring TLS 1.1 and, following draft IETF recommendations for secure use of TLS and DTLS, preferring TLS 1.2 with the TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 or other AES-GCM ciphersuite if supported by both endpoints. cryptlib, CyaSSL, GnuTLS, MatrixSSL, NSS, OpenSSL, PolarSSL, SChannel, and JSSE support both TLS 1.1 and TLS 1.2 and all but Java’s supports AES-GCM.

Suggested responses:

  • Consider how to communicate to ZLIB implementors the hazards and threat model, however minor, presented by CRIME and BREACH.
  • Formally deprecate SSL 3.0 and TLS 1.0 in the ADCS extension specification.
  • Discover which TLS versions and features clients (DC++ and variations, ncdc, Jucy, etc) and hubs (ADCH++, uHub, etc) support. If they use standard libraries, they probably all (except Jucy) already support TLS 1.2 with AES-GCM depending on how they configure their TLS libraries. Depending on results, one might already safely simply disable SSL 3.0 and TLS 1.0 in each such client and hub and prioritize TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 or a similar ciphersuite so that it finds use when mutually available. If this proves possible, the ADCS extension specification should be updated to reflect this.

Mixed-hash DC hubs

They work fine if clients and hubs support both TTH and its successor adequately long.

While transitioning to a TTH successor, currently interoperable clients and hubs all supporting only TTH will diverge. In examining the consequences of such diversity, one can partition concerns into client-hub communication irrelevant to other clients; hub-mediated communication between two clients; and direct client-client communication. In each case, one can look at scenarios with complete, partial, and no supported hash function overlap. Complete overlap defines the all-TTH status quo and, clearly, works without complication for all forms of DC communication, so this post focuses on the remaining situations. In general,

Almost as straightforwardly, ADC but not NMDC client-hub communication irrelevant to other clients requires partial but not complete hash function overlap, but only between each individual client/hub pair, and doesn’t create specific mixed-hash hub problems; otherwise, an ADC hub indicates STA error code 47. For ADC, this category consists of GPA, PAS, PID/CID negotiation (with length caveats as relate to other clients interpreting the resulting CID), and the establishment of a session hash function; NMDC does not depend on hashing at all for analogous functionality. Thus, for NMDC, no problems occur here. ADC’s greater usage of hashing requires correspondingly more care.

Specifically, GPA and PAS require that SUP had established some shared hash function between the client logging in and the hub, but otherwise have no bearing on mixed-hash-function DC hubs. Deriving the CID from the PID involves the session hash algorithm, which as with GPA/PAS merely requires partial hash function support overlap between each separate client and a hub. Length concerns do exist here, but become relevant only with hub-mediated communication between two clients.

Indeed, clients communicating via a hub comprise the bulk of DC client-hub communication. Of these, INF, SCH, and RES directly involve hashed content or CIDs. SCH ($Search) allows one to search by TTH and would also allow one to search by TTH’s successor. Such searches can only return results from clients which support the hash in question, so as before, partial overlap between clients works adequately. However, to avoid incentivizing clients which support both TTH and its successor to broadcast both searches and double auto-search bandwidth, a combined search method containing both hashes might prove useful. Similarly, RES specifies that clients must provide the session hash of their file, but also “are encouraged to supply additional fields if available”, which might include non-session hash functions they happen to support, such that as with the first client-hub communication category, partial hash function support overlap between any pair of clients suffices, but no overlap does not.

A more subtle and ADC-specific issue arises via RES’s U-type message header and INF’s ID field, whereby ADC software commonly checks for exactly 39-byte CIDs. While clients need not support whatever specific hash algorithm produced a CID, the ADC specification requires that they support variable-length CIDs. Examples of other hash function output lengths which, minimally, should be supported include:

Bits   Bytes   Bytes (base32)   Supporting Hashes
192    24      39               Tiger
224    28      45               Skein, Keccak, other SHA-3 finalists, SHA-2
256    32      52               Skein, Keccak, other SHA-3 finalists, SHA-2
384    48      77               Skein, Keccak, other SHA-3 finalists, SHA-2
512    64      103              Skein, Keccak, other SHA-3 finalists, SHA-2
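The base32 column follows directly from the bit length: each base32 character carries 5 bits, so an unpadded encoding needs ceil(bits / 5) characters. A quick sketch to check the table, assuming unpadded base32 (the helper name is made up):

```python
import base64


def base32_length(bits: int) -> int:
    """Characters needed to base32-encode `bits` bits without '=' padding."""
    return -(-bits // 5)  # ceiling division


for bits in (192, 224, 256, 384, 512):
    # Cross-check the formula against the standard library encoder.
    encoded = base64.b32encode(bytes(bits // 8)).rstrip(b"=")
    assert len(encoded) == base32_length(bits)
    print(bits, bits // 8, base32_length(bits))
```

A CID length check written against this formula, rather than against the constant 39, keeps working for any of the listed hashes.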

Finally, direct client-client communication introduces CSUP ($Supports), GET/GFI/SND ($Get/$Send) via the TTH/ share root or its successor, and filelists, all of which work if and only if partial hash function support overlap exists. CSUP otherwise fails with error code 54, and some subset of the hash roots and hash trees in a filelist must be mutually understood, so as with the other cases, partial but not complete hash function support overlap between any given pair of clients is required.

Encouragingly, since together client-hub communication irrelevant to other clients; hub-mediated communication between two clients; and direct client-client communication cover all DC communication, partial hash function support overlap between any given pair of DC clients or servers suffices to ensure that all clients might fully functionally interact with each other. This results in a smooth, usable transition period for both NMDC and ADC so long as clients and hubs only drop TTH support once its successor becomes sufficiently ubiquitous. Further, relative to ADC, poy has observed that “all the hash function changes on NMDC is the file list (already a new, amendable format) and searches (an extension) so a protocol freeze shouldn’t matter there”, which creates an even easier transition than ADC in NMDC.
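
As a rough illustration of the overlap argument, the sketch below treats each client or hub as a set of supported hash names and lists the pairs that share none. TIGR is the existing ADC feature name for Tiger; "NEWH" is a made-up placeholder for whatever successor is eventually chosen, and the helper name is likewise an assumption.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def pairs_without_overlap(peers: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return every pair of peers that has no hash function in common.

    Peers are compared in sorted-name order so the result is deterministic.
    """
    broken = []
    for a, b in combinations(sorted(peers), 2):
        if not (peers[a] & peers[b]):  # empty intersection: no shared hash
            broken.append((a, b))
    return broken


if __name__ == "__main__":
    peers = {
        "old": {"TIGR"},           # supports only Tiger/TTH
        "both": {"TIGR", "NEWH"},  # transition-period client
        "new": {"NEWH"},           # has already dropped Tiger
    }
    print(pairs_without_overlap(peers))
```

In this toy population only the pair that dropped Tiger too early ("new" and "old") cannot interact, which is the situation the recommendation to retain TTH support is meant to avoid.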

In service of such an outcome, I suggest two parallel sets of recommendations, one whenever convenient and the other closer to a decision on a TTH replacement. More short-term:

  • Ensure ADC software obeys “Clients must be prepared to handle CIDs of varying lengths.”
  • Create an ADC mechanism by which clients supporting both TTH and its successor can search via both without doubling (broadcast) search traffic. Otherwise, perverse incentives propagate.
  • Ensure BLOM scales to multiple hash functions.
  • Update phrasing in ADC specification to clarify that all known hashes for a file should be included in RES, not just session hash.

As the choice of TTH’s successor approaches:

  • Disallow a 192-bit output for the new hash function, to avoid ambiguity with Tiger or TTH hashes. I suggest 224 or 256-bit output; SHA-2 and all SHA-3 finalists (including Keccak and Skein) offer both sizes.
  • Pick either a single filelist with all supported hashes or multiple filelists, each of which only supports one hash. I favor the former; it especially helps during a transition period for even a client downloading via TTH’s successor to be able to autosearch and otherwise interact with clients which don’t yet support the new hash function, without needing to download an entire new filelist.
  • Barring a more dramatic break in Tiger than thus far seen, clients should retain TIGR support until the majority of ADC hubs and NMDC or ADC clients offer support for the successor hash function’s extension.

By doing so, clients supporting only TTH and clients supporting both TTH and the new hash function should be capable of interacting without problems, transparently to end-users, while over time creating a critical mass of new hash function-supporting clients such that eventually client and hub software might outright drop Tiger and TTH support.

A Decade of TTH: Its Selection and Uncertain Future

NMDC and ADC rely on the Tiger Tree Hash to identify files. DC requires a cryptographic hash function to avoid the previous morass of pervasive similar, but not identical, files. A bare cryptographic hash primitive such as SHA-1 did not suffice because not only did the files need identification as a whole but in separate parts, allowing reliable resuming and multi-source downloading, and per-segment integrity verification (RevConnect unsuccessfully attempted to reliably use multi-source downloading precisely because it could not rely on cryptographic hashes).

Looking for inspiration from other P2P software, I found that BitTorrent used (and uses) piecewise SHA-1 with per-torrent segment sizes. Since the DC share model asks that the same hash function work across entire shares, this does not work. eDonkey2000 and eMule, with per-user shares similar to those of DC, resolved this with fixed, 9MB piecewise MD4, but this segment size scaled poorly, ensured that fixing corruption demanded at least 9MB of retransmission, and used the weak and soon-broken MD4. Gnutella, though, had found an elegant, scalable solution in TTH.

This Tiger Tree hash, which I thus copied from Gnutella, scales to both large and small files while depending on what was at the time a secure-looking Tiger hash function. It smoothly, adaptively sizes a hash tree while retaining interoperability between all such sizes of files on a hub. By 2003, I had released BCDC++ which used TTH. However, the initial version of hash trees implemented by Gnutella and DC used the same hash primitive for leaf and internal tree nodes. This left it open to collisions, fixed by using different leaf and internal hash primitives. Both Gnutella and DC quickly adopted this fix and DC has followed this second version of THEX to specify TTH for the last decade.
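
To make the leaf/internal distinction concrete, here is a minimal sketch of a THEX-style tree root: 1024-byte segments, leaves hashed with a 0x00 prefix, internal nodes with a 0x01 prefix, and an unpaired node promoted unchanged to the next level. It uses SHA-256 purely as a stand-in, since Tiger is not among the algorithms Python's hashlib guarantees, so the output is not a real TTH; the function names are illustrative.

```python
import hashlib

SEGMENT_SIZE = 1024  # THEX leaf segment size in bytes


def _h(data):
    # Stand-in primitive (assumption): SHA-256 instead of Tiger.
    return hashlib.sha256(data).digest()


def tree_root(data):
    """Compute a THEX-style hash tree root over `data`."""
    segments = [data[i:i + SEGMENT_SIZE]
                for i in range(0, len(data), SEGMENT_SIZE)] or [b""]
    # Leaf nodes: prefix 0x00, so a leaf can never be confused with an internal node.
    level = [_h(b"\x00" + seg) for seg in segments]
    while len(level) > 1:
        nxt = []
        # Internal nodes: prefix 0x01 over the concatenated child hashes.
        for i in range(0, len(level) - 1, 2):
            nxt.append(_h(b"\x01" + level[i] + level[i + 1]))
        if len(level) % 2:
            nxt.append(level[-1])  # odd node out is promoted unchanged
        level = nxt
    return level[0]


if __name__ == "__main__":
    print(tree_root(b"example data").hex())
```

Without the two prefixes, the concatenation of two child hashes could itself be presented as leaf data with the same hash, which is the collision the second THEX revision closed.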

Though it has served DC well, TTH might soon need a replacement. The Tiger hash primitive underlying it is by now listed as broken due to a combination of a practical 1-bit pseudocollision attack on all rounds, a similarly feasible full collision on all but 5 of its 24 rounds, and full, albeit theoretical, 24-round pre-images (“Advanced Meet-in-the-Middle Preimage Attacks”, 2010, Guo et al). If one can collide or find preimages of Tiger, one can also trivially collide or find preimages of TTH. We are therefore investigating alternative cryptographic hash primitives to which we might transition as Tiger looks increasingly insecure and collision-prone, focusing on SHA-2 and SHA-3.

ADC 1.0.2 released

A new version of the base ADC protocol is now released, version 1.0.2.

The document may look slightly different, especially with the addition of commands in the table of contents. The document itself (its content) is not that much modified (except for state management, see below).

An important part of the document is a new addition, a terminology section where difficult words or phrases are specified. This list is obviously meant to be much more than a mere four items, but it’s at least a start.

The STA previously didn’t specify who had the responsibility for action when a STA is sent with the severity Fatal (2). This has always been the originator of the message, and this is now explicit.

The state management is re-worded and restructured. All information about state has now been moved to its own section, giving an implementer a quick and comprehensive overview of the requirements for state management. Previously, the state management was sprinkled all across the document, making it difficult for a person to properly implement a state machine in their software. As a result, state management information is now removed from each command (the only thing remaining is an explicit note about the state in which each command is used). Certain information is also clarified, such as what to call the parties in a client to client connection (“client party” and “server party”) and state transitions.

Version 1.0.1 of ADC was also ambiguous in state management when it came to one important part: who shall send the first INF in a client to client connection. This is important because the ambiguity makes multi-share difficult. The current specification is no longer ambiguous, and takes the following stance: the first party to send the INF is the connecting party (“client party”). No known implementation suffers from this explicit note, as all manage this scenario just fine. Basically, this change means that multiple shares (per hub) may not be too far off.

The new version also marks a point from which we can safely and appropriately update the base document. An announcement period preceded the release, which gave developers time to adjust their software and give feedback in a timely manner.

Splitting IDENTIFY to support multiple share profiles in ADC

ADC does not order the IDENTIFY and NORMAL states precisely enough for ADC clients to properly support multiple share profiles. Several client software-independent observations imply this protocol deficiency:

  • ADC clients define download queue sources by CID, such that if a sharing client presents multiple shares it must be through different CIDs, barring backwards-incompatible and queue-crippling requirements to only connect to a source via the hub through which it was queued.
  • A multiply-sharing ADC client in the server role must know the CTM token associated with a client-client connection to determine unambiguously which shares to present and therefore which CID to present to the non-server client.
  • ADC’s SUP specification, as illustrated by the example client-client connection, states that when “the server receives this message in a client-client connection in the PROTOCOL state, it should reply in kind, send an INF about itself, and move to the IDENTIFY state”; this implies the server client sending its CINF before the non-server client sends the CTM token in the TO field with its CINF.
  • Either the server or non-server client may be the downloader and vice versa. As such, by the time both the server and non-server clients in a client-client connection send their CINF commands, they must know, since either may be a multiply-sharing client about to upload files, with which CTM token to associate the connection.
  • The non-server client can unambiguously track which client-client connections it should associate with each CTM token by locally associating that token with each outbound client-client connection it creates, an association a server client listening for inbound connections cannot reliably create until the non-server client sends it a CINF with a token field.

Together, these ADC properties show that a server client which uploads using multiple share profiles must know which CID to send, but must do so before it has enough information to determine via the CTM token the correct share profile and thus the correct CID. Such a putatively multiply-sharing ADC client cannot, therefore, remain consistent with all of the listed constraints.

Most constraints prove impractical or undesirable to change, but by clarifying the SUP specification and IDENTIFY states, one can fix this ADC oversight while remaining compatible with DC++ and ncdc, with jucy apparently requiring adjustment. In particular, I propose to:

  1. Modify SUP and INF to require that the non-server client, rather than the server client, send the first INF; and
  2. in order to do so, split the IDENTIFY state into SERVER-IDENTIFY and CLIENT-IDENTIFY, whereby
  3. the next state after SUP in a client-client connection is CLIENT-IDENTIFY, which transitions to SERVER-IDENTIFY, which finally transitions as now to NORMAL.

This effectively splits the IDENTIFY state into CLIENT-IDENTIFY and SERVER-IDENTIFY to ensure that the two clients send their CINF commands in an order consistent with the requirement that both know the CTM token when they send their CINF command, finally allowing ADC to reliably support multiple share profiles.
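
The proposed ordering can be sketched from the server client's side as a small state machine: after SUP it waits in CLIENT-IDENTIFY, and only once the peer's CINF has delivered the CTM token does it pick the matching CID and send its own CINF. The class, method names, and the tuple "messages" are all hypothetical simplifications, not ADC wire format or any client's actual code.

```python
from enum import Enum


class State(Enum):
    PROTOCOL = "PROTOCOL"
    CLIENT_IDENTIFY = "CLIENT-IDENTIFY"
    SERVER_IDENTIFY = "SERVER-IDENTIFY"
    NORMAL = "NORMAL"


class ServerParty:
    """Server side of a client-client connection with several share profiles."""

    def __init__(self, cid_by_token):
        # Assumed bookkeeping: CTM token -> CID of the share profile it was issued for.
        self.cid_by_token = dict(cid_by_token)
        self.state = State.PROTOCOL
        self.sent = []  # simplified record of outgoing messages

    def on_sup(self):
        if self.state is not State.PROTOCOL:
            raise ValueError("CSUP is only valid in PROTOCOL")
        self.sent.append(("CSUP",))
        # Do not send CINF yet: the token, and hence the CID, is still unknown.
        self.state = State.CLIENT_IDENTIFY

    def on_inf(self, token):
        if self.state is not State.CLIENT_IDENTIFY:
            raise ValueError("peer CINF is only valid in CLIENT-IDENTIFY")
        cid = self.cid_by_token[token]  # KeyError if the token was never issued
        self.state = State.SERVER_IDENTIFY
        self.sent.append(("CINF", cid))  # reply with the CID for that share profile
        self.state = State.NORMAL


if __name__ == "__main__":
    server = ServerParty({"tok1": "CID-A", "tok2": "CID-B"})
    server.on_sup()
    server.on_inf("tok2")
    print(server.sent, server.state)
```

Because the CID lookup happens only after the token arrives, the same listening client can present a different CID per connection without guessing.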

Such a change appears compatible with both DC++ and ncdc, because both simply respond to CSUP with CINF immediately, regardless of what its partner in a client-client connection does. The only change required in DC++ and ncdc is for the server client to wait for the non-server client to send its CINF before sending a reply CINF rather than replying immediately to the non-server client’s CSUP.

jucy would need adjustment because it currently only triggers a non-server client’s CINF, containing the CTM token, in response to the server client’s pre-token CINF. A server client which waits for a jucy-based non-server client to send the first CINF will wait indefinitely.

Thus, by simply requiring that the non-server client in a client-client connection sends its CINF first, in a manner already compatible with DC++-based clients and ncdc and almost compatible with jucy, ADC-based clients can finally provide reliable multiple share profiles.

ADC Recommendations

A while back (a really long time ago, it appears), I started the document ADC Recommendations. The intent is to create a document that can be reviewed for best-practices, common implementations and other useful information that need not be in the official specification(s).

Also, my intent was to have the document be more frequently updated (once done), so that it can quickly reference the latest software without requiring a new version of the specifications merely to update guidance.

If you want to add more or revise the existing content, leave a comment below or go to the ADCPortal forum post.

Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”