Mixed-hash DC hubs
May 17, 2013
Mixed-hash hubs work fine, provided clients and hubs support both TTH and its successor for an adequately long transition period.
While transitioning to a TTH successor, clients and hubs that are currently interoperable because they all support only TTH will diverge. In examining the consequences of such diversity, one can partition concerns into client-hub communication irrelevant to other clients; hub-mediated communication between two clients; and direct client-client communication. In each case, one can look at scenarios with complete, partial, and no supported hash function overlap. Complete overlap defines the all-TTH status quo and, clearly, works without complication for all forms of DC communication, so this post focuses on the remaining situations.
Almost as straightforwardly, client-hub communication irrelevant to other clients requires, in ADC but not NMDC, partial (not necessarily complete) hash function overlap, and only between each individual client/hub pair, so it doesn’t create problems specific to mixed-hash hubs. Without any overlap, an ADC hub indicates STA error code 47. For ADC, this category consists of GPA, PAS, PID/CID negotiation (with length caveats that relate to other clients interpreting the resulting CID), and the establishment of a session hash function; NMDC does not depend on hashing at all for analogous functionality. Thus, for NMDC, no problems occur here. ADC’s greater usage of hashing requires correspondingly more care.
Specifically, GPA and PAS require that SUP has established some shared hash function between the client logging in and the hub, but otherwise have no bearing on mixed-hash-function DC hubs. Deriving the CID from the PID involves the session hash algorithm, which, as with GPA/PAS, merely requires partial hash function support overlap between each separate client and a hub. Length concerns do exist here, but become relevant only with hub-mediated communication between two clients.
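The client/hub requirement above can be sketched as a simple intersection test. This is a minimal illustration, not the ADC negotiation itself: `TIGR` is the existing Tiger feature name, while `NEWH`, the function names, and the idea that the hub walks its own preference list are assumptions made for the example.

```python
# Status code cited above for "no hash overlap between client and hub".
STA_NO_HASH_OVERLAP_HUB = 47


def negotiate_session_hash(client_hashes, hub_preference):
    """Return the first hub-preferred hash the client also supports, else None.

    client_hashes: iterable of hash feature names the client advertised.
    hub_preference: list of hash feature names, most preferred first
    (the preference ordering is an assumption of this sketch).
    """
    supported = set(client_hashes)
    for name in hub_preference:
        if name in supported:
            return name
    return None


def login(client_hashes, hub_preference):
    """Model the outcome of the hash part of a login attempt."""
    chosen = negotiate_session_hash(client_hashes, hub_preference)
    if chosen is None:
        # No overlap at all: the hub rejects with the status code above.
        return ("STA", STA_NO_HASH_OVERLAP_HUB)
    return ("OK", chosen)


if __name__ == "__main__":
    hub = ["NEWH", "TIGR"]            # hypothetical dual-hash hub
    print(login(["TIGR"], hub))       # TTH-only client still gets in
    print(login(["NEWH", "TIGR"], hub))
    print(login(["NEWH"], ["TIGR"]))  # no overlap with a TTH-only hub
```

Only the pair (this client, this hub) matters here; what other clients on the hub support does not enter the check.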
Indeed, clients communicating via a hub comprise the bulk of DC client-hub communication. Of these messages, INF, SCH, and RES directly involve hashed content or CIDs. SCH ($Search) allows one to search by TTH and would also allow one to search by TTH’s successor. Such searches can only return results from clients which support the hash in question, so as before, partial overlap between clients works adequately. However, to avoid incentivizing clients which support both TTH and its successor to broadcast both searches and double auto-search bandwidth, a combined search method containing both hashes might prove useful. Similarly, RES specifies that clients must provide the session hash of their file, but also “are encouraged to supply additional fields if available”, which might include non-session hash functions they happen to support. As with the first client-hub communication category, then, partial hash function support overlap between any pair of clients suffices, but no overlap does not.
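To make the combined-search idea concrete, here is a rough sketch of how a responding client might match a single search that carries roots for several hash functions. The dictionary layout and the names `TTH` and `NEWH` are purely illustrative assumptions; no such ADC or NMDC message format exists in the specifications discussed here.

```python
def matches(search_roots, file_roots):
    """Return True if any hash known to both sides has an equal root.

    search_roots: dict mapping hash name -> base32 root in the search.
    file_roots: dict mapping hash name -> base32 root the responder
    has computed for one of its shared files.
    """
    shared = search_roots.keys() & file_roots.keys()
    return any(search_roots[name] == file_roots[name] for name in shared)


if __name__ == "__main__":
    # One broadcast carrying both roots (placeholder values, not real hashes).
    search = {"TTH": "AAAA", "NEWH": "BBBB"}

    tth_only_file = {"TTH": "AAAA"}
    new_only_file = {"NEWH": "BBBB"}
    unrelated_file = {"TTH": "CCCC"}

    print(matches(search, tth_only_file))   # TTH-only responder can answer
    print(matches(search, new_only_file))   # successor-only responder can answer
    print(matches(search, unrelated_file))  # different file, no match
```

A single message of this shape reaches both populations, which is the property that removes the incentive to broadcast two separate searches.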
A more subtle and ADC-specific issue arises via RES’s U-type message header and INF’s ID field, whereby ADC software commonly checks for exactly 39-byte CIDs. While clients need not support whatever specific hash algorithm produced a CID, the ADC specification requires that they support variable-length CIDs. Examples of other hash function output lengths which, minimally, should be supported include:
| Bits | Bytes | Bytes (base32) | Supporting Hashes |
|------|-------|----------------|-------------------|
| 224 | 28 | 45 | Skein, Keccak, other SHA-3 finalists, SHA-2 |
| 256 | 32 | 52 | Skein, Keccak, other SHA-3 finalists, SHA-2 |
| 384 | 48 | 77 | Skein, Keccak, other SHA-3 finalists, SHA-2 |
| 512 | 64 | 103 | Skein, Keccak, other SHA-3 finalists, SHA-2 |
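The base32 column follows from base32 carrying 5 bits per character, so an n-bit hash needs ceil(n / 5) characters once padding is stripped. The sketch below checks that arithmetic against Python's standard `base64.b32encode` and shows a length-agnostic CID check; the function names are made up for this example and the check is only a plausibility test, not a full validator.

```python
import base64
import math

BASE32_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ234567")


def base32_length(bits):
    """Number of unpadded base32 characters needed for a hash of `bits` bits."""
    return math.ceil(bits / 5)


def encode_unpadded(raw):
    """Base32-encode raw bytes and strip the trailing '=' padding."""
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def is_plausible_cid(text):
    """Accept any non-empty base32 string instead of insisting on 39 characters."""
    return len(text) > 0 and all(ch in BASE32_ALPHABET for ch in text)


if __name__ == "__main__":
    for bits in (192, 224, 256, 384, 512):
        encoded = encode_unpadded(bytes(bits // 8))  # all-zero dummy digest
        # The formula and the real encoder agree on the length.
        assert len(encoded) == base32_length(bits)
        print(bits, bits // 8, len(encoded), is_plausible_cid(encoded))
```

For 192 bits this yields the familiar 39 characters, which is where the hard-coded length check in existing software comes from.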
Finally, direct client-client communication introduces CSUP ($Supports), GET/GFI/SND ($Get/$Send) via the TTH/ share root or its successor, and filelists, all of which work if and only if partial hash function support overlap exists. Without it, CSUP fails with error code 54; beyond that, some subset of the hash roots and hash trees in a filelist must be mutually understood. So, as with the other cases, partial but not necessarily complete hash function support overlap between any given pair of clients is required.
Encouragingly, since client-hub communication irrelevant to other clients, hub-mediated communication between two clients, and direct client-client communication together cover all DC communication, partial hash function support overlap between any given pair of DC clients or servers suffices to ensure that all clients can interact with each other with full functionality. This results in a smooth, usable transition period for both NMDC and ADC so long as clients and hubs only drop TTH support once its successor becomes sufficiently ubiquitous. Further, relative to ADC, poy has observed that “all the hash function changes on NMDC is the file list (already a new, amendable format) and searches (an extension) so a protocol freeze shouldn’t matter there”, which makes the transition even easier in NMDC than in ADC.
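The pairwise condition can be checked mechanically for a given hub population. The following is a small sketch under assumed inputs: the participant names and the `NEWH` feature name are invented, and each participant is reduced to the set of hash functions it supports.

```python
from itertools import combinations


def pairs_without_overlap(participants):
    """List every pair of participants that shares no hash function.

    participants: dict mapping a participant name to the set of hash
    names it supports. An empty result means every pair can interact.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(participants), 2)
        if not participants[a] & participants[b]
    ]


if __name__ == "__main__":
    transition = {
        "hub": {"TIGR", "NEWH"},
        "old_client": {"TIGR"},
        "dual_client": {"TIGR", "NEWH"},
    }
    print(pairs_without_overlap(transition))  # every pair overlaps

    # A client that dropped TTH too early cannot talk to TTH-only peers.
    premature = dict(transition, eager_client={"NEWH"})
    print(pairs_without_overlap(premature))
```

The second case shows why dropping TTH before its successor is widespread breaks exactly the pairs that have not yet upgraded.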
In service of such an outcome, I suggest two parallel sets of recommendations, one to act on whenever convenient and the other closer to a decision on a TTH replacement. More short-term:
- Ensure ADC software obeys “Clients must be prepared to handle CIDs of varying lengths.”
- Create an ADC mechanism by which clients supporting both TTH and its successor can search via both without doubling (broadcast) search traffic. Otherwise, perverse incentives propagate.
- Ensure BLOM scales to multiple hash functions.
- Update the phrasing in the ADC specification to clarify that all known hashes for a file should be included in RES, not just the session hash.
As the choice of TTH’s successor approaches:
- Disallow a 192-bit output for the new hash function, to avoid ambiguity with Tiger or TTH hashes. I suggest 224-bit or 256-bit output; SHA-2 and all SHA-3 finalists (including Keccak and Skein) offer both sizes.
- Pick either a single filelist with all supported hashes or multiple filelists, each of which only supports one hash. I favor the former; it especially helps during a transition period for even a client downloading via TTH’s successor to be able to autosearch and otherwise interact with clients which don’t yet support the new hash function, without needing to download an entire new filelist.
- Barring a more dramatic break in Tiger than thus far seen, clients should retain TIGR support until the majority of ADC hubs and NMDC or ADC clients offer support for the successor hash function’s extension.
By doing so, clients supporting only TTH and clients supporting both TTH and the new hash function should be capable of interacting without problems, transparently to end-users, while over time a critical mass of clients supporting the new hash function builds up, such that eventually client and hub software might drop Tiger and TTH support outright.
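The recommendation against a 192-bit successor can be illustrated by asking which known hash functions could have produced a root of a given base32 length. This is only a sketch: the names `NEW192` and `NEW256` are hypothetical stand-ins, and real software would rely on negotiated features rather than length alone.

```python
import math


def root_length(bits):
    """Unpadded base32 length of a hash output of `bits` bits."""
    return math.ceil(bits / 5)


def candidates_by_length(root, known):
    """Return the sorted names of hashes whose output length fits `root`.

    known: dict mapping a hash name to its output size in bits.
    """
    return sorted(
        name for name, bits in known.items() if root_length(bits) == len(root)
    )


if __name__ == "__main__":
    root_39 = "A" * 39  # placeholder root with TTH's base32 length
    root_52 = "A" * 52  # placeholder root with a 256-bit hash's length

    # A 192-bit successor is indistinguishable from TTH by length.
    print(candidates_by_length(root_39, {"TTH": 192, "NEW192": 192}))

    # A 256-bit successor is unambiguous.
    print(candidates_by_length(root_39, {"TTH": 192, "NEW256": 256}))
    print(candidates_by_length(root_52, {"TTH": 192, "NEW256": 256}))
```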