SSE3 in DC++

The next DC++ release will require SSE3. Steam’s hardware survey currently lists SSE3 as having 99.96% penetration. All AMD and Intel x86 CPUs since the Athlon 64 X2 in 2005 and Intel Core in January 2006 have supported SSE3. Even earlier, though, all Pentium 4 steppings since Prescott which support the NX bit required by Windows 8 and 10 also support SSE3, which extends the effective Intel support back to 2004. I can’t find an Intel CPU which supports NX (required for Win8/10) but not SSE3. Finally, this effectively affects only 32-bit builds, since 64-bit builds exclusively use SSE for floating-point arithmetic.

This effects two basic transformations, one minor and one major, depending on how well the existing code compiles. The minor improvement derives from functions such as bool SettingsDialog::handleClosing() using one instruction rather than two, from

bool SettingsDialog::handleClosing() {
	dwt::Point pt = getWindowSize();
	SettingsManager::getInstance()->set(SettingsManager::SETTINGS_WIDTH,
    cvttss2si eax,DWORD PTR [esp+0x18] ;; eax is just a temporary
    mov    DWORD PTR [edx+0x87c],eax   ;; which is promptly stored to mem

to

bool SettingsDialog::handleClosing() {
	dwt::Point pt = getWindowSize();
	SettingsManager::getInstance()->set(SettingsManager::SETTINGS_WIDTH,
    fisttp DWORD PTR [edx+0x87c]      ;; no byway through eax (also, less register pressure)

However, sometimes cvttss2si and related SSE/SSE2 instructions don’t fit as well, so g++ had been relying on fistp. These instances previously produced terrible code generation; without SSE3, only using through SSE2, part of void SearchFrame::runSearch() compiles to:

	auto llsize = static_cast(lsize);
    fnstcw WORD PTR [ebp-0x50e]     ;; save FP control word to mem
    movzx  eax,WORD PTR [ebp-0x50e] ;; zero-extend-move it to eax
    mov    ah,0xc                   ;; build new control word
    mov    WORD PTR [ebp-0x510],ax  ;; place control word in mem for fldcw
    fld    QWORD PTR [ebp-0x520]    ;; load lsize from mem (same as below)
    fldcw  WORD PTR [ebp-0x510]     ;; load new control word
    fistp  QWORD PTR [ebp-0x548]    ;; with correct control word, round lsize
    fldcw  WORD PTR [ebp-0x50e]     ;; restore previous control word

All 6 red-highlighted lines just scaffold around the actual fistp doing the floating point-to-int rounding, which can cost 80 cycles or more for this single innocuous-looking line of code. By contrast, using fisttp from SSE3, that same fragment collapses to:

	auto llsize = static_cast(lsize);
    fld    QWORD PTR [ebp-0x520]    ;; same as above; load lsize
    fisttp QWORD PTR [ebp-0x548]    ;; convert it. simple.

This pattern recurs many times through DC++, including void AdcHub::handle(AdcCommand::GET which has a portion halving in size and dramatically increasing in speed from

		// Ideal size for m is n * k / ln(2), but we allow some slack
		// When h >= 32, m can't go above 2^h anyway since it's stored in a size_t.
		if(m > (5 * Util::roundUp((int64_t)(n * k / log(2.)), (int64_t)64)) || (h < 32 && m > static_cast(1U << h))) {
    mov    DWORD PTR [esp+0x1c],edi
    xor    ecx,ecx
    imul   eax,DWORD PTR [esp+0x18]
    movd   xmm0,eax
    movq   QWORD PTR [esp+0x58],xmm0
    fild   QWORD PTR [esp+0x58]
    fdiv   QWORD PTR ds:0xca8
    fnstcw WORD PTR [esp+0x22]     ;; same control word dance as before
    movzx  eax,WORD PTR [esp+0x22]
    mov    ah,0xc                  ;; same control word
    mov    WORD PTR [esp+0x20],ax  ;; but fldcw loads from mem not reg
    fldcw  WORD PTR [esp+0x20]     ;; load C and C++-compatible rounding mode
    fistp  QWORD PTR [esp+0x58]    ;; the actual conversion
    fldcw  WORD PTR [esp+0x22]     ;; restore previous
    mov    eax,DWORD PTR [esp+0x58]
    mov    edx,DWORD PTR [esp+0x5c]

to, using the fisttp SSE3 instruction,

		// Ideal size for m is n * k / ln(2), but we allow some slack
		// When h >= 32, m can't go above 2^h anyway since it's stored in a size_t.
		if(m > (5 * Util::roundUp((int64_t)(n * k / log(2.)), (int64_t)64)) || (h < 32 && m > static_cast(1U << h))) {
    mov    DWORD PTR [esp+0x20],edi
    xor    ecx,ecx
    imul   eax,DWORD PTR [esp+0x1c]
    movd   xmm0,eax
    movq   QWORD PTR [esp+0x58],xmm0
    fild   QWORD PTR [esp+0x58]
    fdiv   QWORD PTR ds:0xca8
    fisttp QWORD PTR [esp+0x58]    ;; replaces all seven red lines
    mov    eax,DWORD PTR [esp+0x58]
    mov    edx,DWORD PTR [esp+0x5c]

This specific control word save/convert float/control word restore pattern recurs 19 other times across the current codebase in the dcpp, dwt, and win32 directories, including DownloadManager::getRunningAverage(); HashBloom::get_m(size_t n, size_t k); QueueItem::getDownloadedBytes(); Transfer::getParams(); UploadManager::getRunningAverage(); Grid::calcSizes(…); HashProgressDlg::updateStats(); TransferView::on(HttpManagerListener::Updated, …); and TransferView::onTransferTick(…).

Know your FPU: Fixing Floating Fast provides microbenchmarks showing just how slow this fistp-based technique can be due to the fnstcw/fldcw 80+-cycle FPU pipeline flush and therefore how much faster code which replaces it can become:

Fixed tests...
Testing ANSI fixed() ... Time = 2974.57 ms
Testing fistp fixed()... Time = 3100.84 ms
Testing Sree fixed() ... Time =  606.80 ms

SSE3 provides not simply some hidden code generation aesthetic quality improvement, but a speed increase across much of DC++.

BEAST, CRIME, BREACH, and Lucky 13: Assessing TLS in ADCS

1. Summary

Several TLS attacks since 2011 impel a reassessment of the security of ADC’s usage of TLS to form ADCS. While the specific attacks tend not to be trivially replicated in a DC client as opposed to a web browser, remaining conservative with respect to security remains useful, the issues they exploit could cause problems regardless, and ADCS’s best response thus becomes to deprecate SSL 3.0 and TLS 1.0. Ideally, one should use TLS 1.2 with AES-GCM. Failing that, ensuring that TLS 1.1 runs and chooses AES-based ciphersuite works adequately.

2. HTTP-over-TLS Attacks

BEAST renders practical Rogaway’s 2002 attack on the security of CBC ciphersuites in SSL/TLS by using an SSL/TLS server’s CBC padding MAC acceptance/rejection as a timing oracle. Asking whether each possible byte in each position results in successful MAC, it decodes an entire message. One can avert BEAST either by avoiding CBC in lieu of RC4 or updating to TLS 1.1 or 1.2, which mitigate the timing oracle and generate new random IVs to undermine BEAST’s sequential attack.

CRIME and BREACH build on a 2002 compression and information leakage of plaintext-based attack. CRIME “requires on average 6 requests to decrypt 1 cookie byte” and, like BEAST, recognizes DEFLATE’s smaller output when it has found a pre-existing copy of the correct plaintext in its dictionary. Unlike BEAST, CRIME and BREACH depend not on TLS version or CBC versus RC4 ciphersuites but merely compression. Disabling HTTP and TLS compression therefore avoids CRIME and BREACH.

One backwards-compatible solution thus far involves avoiding compression due to CRIME/BREACH and avoiding BEAST with RC4-based TLS ciphersuites. However, a new attack against RC4 in TLS by AlFardan, Bernstein, et al exploits double-byte ciphertext biases to reconstruct messages using approximately 229 ciphertexts; as few as 225 achieve a 60+% recovery rate. RC4-based ciphersuites decreasingly inspire confidence as a backwards-compatible yet secure approach to TLS, enough that the IETF circulates an RFC draft prohibiting RC4 ciphersuites.

Thus far treating DC as sufficiently HTTP-like to borrow their threat model, options narrow to TLS 1.1 or TLS 1.2 with an AES-derived ciphersuite. One needs still beware: Lucky 13 weakens even TLS 1.1 and TLS 1.2 AES-CBC ciphers, leaving between it and the RC4 attack no unscathed TLS 1.1 configuration. Instead, AlFardan and Paterson recommend to “switch to using AEAD ciphersuites, such as AES-GCM” and/or “modify TLS’s CBC-mode decryption procedure so as to remove the timing side channel”. They observe that each major TLS library has addressed the latter point, so that AES-CBC might remain somewhat secure; certainly superior to RC4.

3. ADC-over-TLS-specific Concerns

ADCS clients’ and hubs’ vulnerability profiles and relevant threat models regarding each of BEAST, CRIME, BREACH, Lucky 13, and the RC4 break differ from that of a web browser using HTTP. BEAST and AlFardan, Bernstein, et al’s RC4 attack both point to adopting TLS 1.1, a ubiquitously supportable requirement worth satisfying regardless. OpenSSL, NSS, GnuTLS, PolarSSL, CyaSSL, MatrixSSL, BouncyCastle, and Oracle’s standard Java crypto library have all already “addressedLucky 13.

ADCS doesn’t use TLS compression, so that aspect of CRIME/BREACH does not apply. The ZLIB extension does operate analogously to HTTP compression. Indeed, the BREACH authors remark that:

there is nothing particularly special about HTTP and TLS in this side-channel. Any time an attacker has the ability to inject their own payload into plaintext that is compressed, the potential for a CRIME-like attack is there. There are many widely used protocols that use the composition of encryption with compression; it is likely that other instances of this vulnerability exist.

ADCS provides an attacker this capability via logging onto a hub and sending CTMs and B, D, and E-type messages. Weaponizing it, however, operates better when these injected payloads can discover cookie-like repeated secrets, which ADC lacks. GPA and PAS operate via a challenge-reponse system. CTM cookies find use at most once. Private IDs would presumably have left a client-hub connection’s compression dictionary by the time an attack might otherwise succeed and don’t appear in client-client connections. While a detailed analysis of the extent of practical feasibility remains wanting, I’m skeptical CRIME and BREACH much threaten ADCS.

4. Mitigation and Prevention in ADCS

Regardless, some of these attacks could be avoided entirely with specification updates incurring no ongoing cost and hindering implenetation on no common platforms. Three distinct categories emerge: BEAST and Lucky 13 attacks CBC in TLS; the RC4 break, well, attacks RC4; and CRIME and BREACH attack compression. Since one shouldn’t use RC4 regardless, that leaves AES-CBC attacks and compression attacks.

Disabling compression might incur substantial bandwidth cost for little thus-far demonstrated security benefit, so although ZLIB implementors should remain aware of CRIME and BREACH, continued usage seems unproblematic.

Separately, BEAST and Lucky 13 point to requiring TLS 1.1 and, following draft IETF recomendations for secure use of TLS and DTLS, preferring TLS 1.2 with the TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 or other AES-GCM ciphersuite if supported by both endpoints. cryptlib, CyaSSL, GnuTLS, MatrixSSL, NSS, OpenSSL, PolarSSL, SChannel, and JSSE support both TLS 1.1 and TLS 1.2 and all but Java’s supports AES-GCM.

Suggested responses:

  • Consider how to communicate to ZLIB implementors the hazards and threat model, however minor, presented by CRIME and BREACH.
  • Formally deprecate SSL 3.0 and TLS 1.0 in the ADCS extension specification.
  • Discover which TLS versions and features clients (DC++ and variations, ncdc, Jucy, etc) and hubs (ADCH++, uHub, etc) support. If they use standard libraries, they probably all (except Jucy) already support TLS 1.2 with AES-GCM depending on how they configure their TLS libraries. Depending on results, one might already safely simply disable SSL 3.0 and TLS 1.0 in each such client and hub and prioritize TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 or a similar ciphersuite so that it finds use when mutually available. If this proves possible, the the ADCS extension specification should be updated to reflect this.

Mixed-hash DC hubs

They work fine if clients and hubs support both TTH and its successor adequately long.

While transitioning to a TTH successor, currently interoperable clients and hubs all supporting only TTH will diverge. In examining the consequences of such diversity, one can partition concerns into client-hub communication irrelevant to other clients; hub-mediated communication between two clients; and direct client-client communication. In each case, one can look at scenarios with complete, partial, and no supported hash function overlap. Complete overlap defines the all-TTH status quo and, clearly, works without complication for all forms of DC communication, so this post focuses on the remaining situations. In general,

Almost as straightforwardly, ADC but not NMDC client-hub communication irrelevant to other clients requires partial but not complete hash function overlap but only between each individual client/hub pair, and don’t create specific mixed-hash hub problems; otherwise, an ADC hub indicates STA error code 47. For ADC, This category consists of GPA, PAS, PID/CID negotiation (with length caveats as relate to other clients interpreting the resulting CID), and the establishment of a session hash function; NMDC does not depend on hashing at all for analogous functionality. Thus, for NMDC, no problems occur here. ADC’s greater usage of hashing requires correspondingly more care.

Specifically, GPA and PAS require that SUP had established some shared hash function between the client logging in and the hub, but otherwise have no bearing on mixed-hash-function DC hubs. Deriving the CID from the PID involves the session hash algorithm, which as with GPA/PAS merely requires partial hash function support overlap between each separate client and a hub. Length concerns do exist here, but become relevant only with hub-mediated communication between two clients.

Indeed, clients communicating via a hub comprise the bulk of DC client-hub communication. Of these, INF, SCH, and RES directly involve hashed content or CIDs. SCH ($Search) allows one to search by TTH and would also allow one to search by TTH’s successor. Such searches can only return results from clients which support the hash in question, so as before, partial overlap between clients works adequately. However, to avoid incentivizing clients which support both TTH and its successor to broadcast both searches and double auto-search bandwidth, a combined search method containing both hashes might prove useful. Similarly, RES specifies that clients must provide the session hash of their file, but also “are encouraged to supply additional fields if available”, which might include non-session hash functions they happen to support, such that as with the first client-hub communication category, partial hash function support overlap between any pair of clients suffices, but no overlap does not.

A more subtle and ADC-specific issue issue arises via RES’s U-type message header and INF’s ID field whereby ADC software commonly checks for exactly 39-byte CIDs. While clients need not support whatever specific hash algorithm produced a CID, the ADC specification requires that they support variable-length CIDs. Example of other hash function output lengths which, minimally, should be supported include:

Bits Bytes Bytes (base32) Supporting Hashes
192 24 39 Tiger
224 28 45 Skein, Keccak, other SHA-3 finalists, SHA-2
256 32 52 Skein, Keccak, other SHA-3 finalists, SHA-2
384 48 77 Skein, Keccak, other SHA-3 finalists, SHA-2
512 64 103 Skein, Keccak, other SHA-3 finalists, SHA-2

Finally, direct client-client communications introduces CSUP ($Supports), GET/GFI/SND ($Get/$Send) via the TTH/ share root or its successor, and filelists, all of which work if and only if partial hash function support overlap exists. CSUP otherwise fails with error code 54 and some subset of hash roots and hash trees regarding some filelist must be mutually understood, so as with the other cases, partial but not complete hash function support overlap between any given pair of clients is required.

Encouragingly, since together client-hub communication irrelevant to other clients; hub-mediated communication between two clients; and direct client-client communication cover all DC communication, partial hash function support overlap between any given pair of DC clients or servers suffices to ensure that all clients might fully functionally interact with each other. This results in a smooth, usable transition period for both NMDC and ADC so long as clients and hubs only drop TTH support once its successor becomes sufficiently ubiquitous. Further, relative to ADC, poy has observed that “all the hash function changes on NMDC is the file list (already a new, amendable format) and searches (an extension) so a protocol freeze shouldn’t matter there”, which creates an even easier transition than ADC in NMDC.

In service of such an outcome, I suggest two parallel sets of recommendations, one whenever convenient and the other closer to a decision on a TTH replacement. More short-term:

  • Ensure ADC software obeys “Clients must be prepared to handle CIDs of varying lengths.”
  • Create an ADC mechanism by which clients supporting both TTH and its successor can search via both without doubling (broadcast) search traffic. Otherwise, malincentives propagate.
  • Ensure BLOM scales to multiple hash functions.
  • Update phrasing in ADC specification to clarify that all known hashes for a file should be included in RES, not just session hash.

As the  choice of TTH’s successor approaches:

  • Disallow new hash function from being 192 bits to avoid ambiguity with Tiger or TTH hashes. I suggest 224 or 256-bit output; SHA-2 and all SHA-3 finalists (including Keccak and Skein) offer both sizes.
  • Pick either a single filelist with all supported hashes or multiple filelists, each of which only supports one hash. I favor the former; it especially helps during a transition period for even a client downloading via TTH’s successor to be able to autosearch and otherwise interact with clients which don’t yet support the new hash function, without needing to download an entire new filelist.
  • Barring a more dramatic break in Tiger than thus far seen, clients should retain TIGR support until the majority of ADC hubs and NMDC or ADC clients offer support for the successor hash function’s extension.

By doing so, clients both supporting only TTH and both TTH and new hash function should be capable of interacting without problems, transparently to end-users, while over time creating a critical mass of new hash function-supporting clients such that eventually client and hub software might outright drop Tiger and TTH support.

A Decade of TTH: Its Selection and Uncertain Future

NMDC and ADC rely on the Tiger Tree Hash to identify files. DC requires a cryptographic hash function to avoid the previous morass of pervasive similar, but not identical, files. A bare cryptographic hash primitive such as SHA-1 did not suffice because not only did the files need identification as a whole but in separate parts, allowing reliable resuming and multi-source downloading, and per-segment integrity verification (RevConnect unsuccessfully attempted to reliably use multi-source downloading precisely because it could not rely on cryptographic hashes).

Looking for inspiration from other P2P software, I found that BitTorrent used (and uses) piecewise SHA-1 with per-torrent segment sizes. Since the DC share model asks that same hash function work across entire shares, this does not work. eDonkey2000 and eMule, with per-user shares similar to those of DC, resolved this with fixed, 9MB piecewise MD4, but this segment size scaled poorly, ensured that fixing corruption demanded at least 9MB of retransmission, and used the weak and soon-broken MD4. Gnutella, though, had found an elegant, scalable solution in TTH.

This Tiger Tree hash, which I thus copied from Gnutella, scales to both large and small files while depending on what was at the time a secure-looking Tiger hash function. It smoothly, adaptively sizes a hash tree while retaining interoperability between all such sizes of files files on a hub. By 2003, I had released BCDC++ which used TTH. However, the initial version of hash trees implemented by Gnutella and DC used the same hash primitive for leaf and internal tree nodes. This left it open to collisions, fixed by using different leaf and internal hash primitives. Both Gnutella and DC quickly adopted this fix and DC has followed this second version of THEX to specify TTH for the last decade.

Though it has served DC well, TTH might soon need a replacement. The Tiger hash primitive underlying it by now lists as broken due to a combination of a practical 1-bit pseudocollision attack on all rounds, a similarly feasible full collision on all but 5 of its 24 rounds, and full, albeit theoretical, 24-round pre-images (“Advanced Meet-in-the-Middle Preimage Attacks”, 2010, Guo et al). If one can collide or find preimages of Tiger, one can also trivially collide or find preimages of TTH. We are therefore investigating alternative cryptographic hash primitives to which we might transition as Tiger looks increasingly insecure and collision-prone, focusing on SHA-2 and SHA-3.

Splitting IDENTIFY to support multiple share profiles in ADC

ADC insufficiently precisely orders the IDENTIFY and NORMAL states such that ADC clients can properly support multiple share profiles. Several client software-independent observations imply this protocol deficiency:

  • ADC clients define download queue sources by CID, such that if sharing client presents multiple shares it must be through different CIDs, barring backwards-incompatible and queue-crippling requirements to only connect to a source via the hub through which it was queued.
  • A multiply-sharing ADC client in the server role must know the CTM token associated with a client-client connection to determine unambiguously which shares to present and therefore which CID to present to the non-server client.
  • ADC’s SUP specification, as illustrated by the example client-client connection, states that when “the server receives this message in a client-client connection in the PROTOCOL state, it should reply in kind, send an INF about itself, and move to the IDENTIFY state”; this implies the server client sending its CINF before the non-server client sends the CTM token in the TO field with its CINF.
  • Either the server or non-server client may be the downloader and vice versa. As such, by the time both the server and non-server clients in a client-client connection sends their CINF commands, they must know, since either may be a multiply-sharing client about to upload files, which CTM token with which to associate the connection.
  • The non-server client can unambiguously track which client-client connections it should associate with each CTM token by locally associating that token with each outbound client-client connection it creates, an association a server-client listening for inbound connections by cannot reliably create until the non-server client sends it a CINF with a token field.

Together, these ADC properties show that a server client which uploads using multiple share profiles must know which CID to send, but must do so before it has enough information to determine via the CTM token the correct share profile and thus the correct CID. Such a putatively multiply-sharing ADC client cannot, therefore, remain consistent with all of the listed constraints.

Most constraints prove impractical or undesirable to change, but by clarifying the SUP specification and IDENTIFY states, one can fix this ADC oversight while remaining compatible with DC++ and ncdc, with jucy apparently requiring adjustment. In particular, I propose to:

  1. Modify SUP and INF to require rather that the non-server client, rather than the server client, send the first INF; and
  2. in order to do so, split the IDENTIFY state into SERVER-IDENTIFY and CLIENT-IDENTIFY, whereby
  3. the next state after SUP in a client-client connection is CLIENT-IDENTIFY, which transitions to SERVER-IDENTIFY, which finally transitions as now to NORMAL

This effectively splits the IDENTIFY state into CLIENT-IDENTIFY and SERVER-IDENTIFY to ensure that they send their CINF commands in an order consistent with the requirement that both clients know the CTM token when they send their CINF command, finally allowing ADC to reliably support multiple share profiles.

Such a change appears compatible with both DC++ and ncdc, because both simply respond to CSUP with CINF immediately, regardless of what its partner in a client-client connection does. The only change required in DC++ and ncdc is for the server client to wait for the non-server client to send its CINF before sending a reply CINF rather than replying immediately to the non-server client’s CSUP.

jucy would need adjustment because it currently, by only triggering a non-server client’s CINF, containing the CTM token, in response to the server client’s pre-token CINF. A server client which waits for a jucy-based non-server client to send the first CINF will wait indefinitely.

Thus, by simply requiring that the non-server client in a client-client connection sends its CINF first, in a manner already compatible with DC++-based clients and ncdc and almost compatible with jucy, ADC-based can finally provide reliable multiple share profiles.

The GPL is not a EULA

GPL software therefore gain nothing by prompting the user to agree to or disagree with the GPL and DC++ will stop doing so. This holds both for the GPLv2 and GPLv3:

Some software packaging systems have a place which requires you to click through or otherwise indicate assent to the terms of the GPL. This is neither required nor forbidden. With or without a click through, the GPL’s rules remain the same.

Merely agreeing to the GPL doesn’t place any obligations on you. You are not required to agree to anything to merely use software which is licensed under the GPL. You only have obligations if you modify or distribute the software. If it really bothers you to click through the GPL, nothing stops you from hacking the GPLed software to bypass this.

The upcoming version of DC++ therefore does not ask the user to assent or otherwise to the GPL during installation.

Interestingly, several other top-ranking GPL-using SourceForge projects, about half of the tested sample, equally uselessly also require Windows users to agree to the GPL before allowing installation:

Distributing non-GPL DC++ mods infringes copyright

DC++ is licensed under the GPL, which requires that “the source code of the modified version is available to the users”. Courts support the GPL’s binding power in Germany and the US, while companies in additional countries such as the Netherlands (second Dutch case) and Taiwan have found it worthwhile to settle rather than fight. Several DC++ modifications, most prominently GreylinkDC++ and its derivatives, violate the GPL by not releasing source code, and thus lose eligibility for redistribution under the GPL. Absent such explicit GPL-provided allowance, websites and other media distributing them commit copyright infringement.

The DC++ project supports, instead, usage of one of the non-license-violating clients listed at ADCPortal.

bzip2 remains optimal for filelists

Several open-source file compressors are available. Currently, DC++ uses bzip2 with the default 900KiB block size, corresponding to bzip2 -9. Potential free alternatives include other bzip2 block sizes, zlib, xz, XMill, and FreeArc. I benchmarked the compression times and compression efficiencies of each alternative across a test set of 441 filelists; this ADCPortal thread has methodological details and test scripts.

Across the 9 zlib settings, 9 bzip2 settings, 10 XMill settings, 16 xz settings, and 21 FreeArc settings tested, many were both slower than and resulted in larger files than another than those of another compressor/setting pair. I started by filtering these out. Thus, all compressor/setting pairs charted have some advantage over all other such compressor/setting pairs.

Time vs size efficiency scatterplot (all filelists)

Size ratios (all filelists)Time ratios (all filelists)

Because the ratios are relative to the status quo (bzip2 -9 at (1,1) in the scatter plot), all compressor/setting pairs to the left of the points around (1,1) in the scatter plot create larger filelists more quickly than bzip2 -9. The few points to the right, mostly far to the right, are slower but result in slightly smaller files, up to 4% smaller. After filtering out the more space-efficient but 4 times slower (and and dependent on hundreds of MiB of RAM to compress) FreeArc settings on the high-end of the x axis, corresponding to FreeArc settings -5 and above, one is left with a more compact scatter plot:

Time vs size efficiency scatterplot (all filelists, sans high-FreeArc)This allows one to distinguish between all the other points more easily, but demonstrates that across the set of all tested filelists, the status quo at (1, 1) beats all others in compression efficiency. To the left lie mostly zlib and XMill variations which preprocess the XML filelist input then feed it to zlib, as well as a couple of low FreeArc settings. This distribution of compressor time and space efficiency holds true with minor adjustments when dividing filelists into quintiles by raw list size as well:

Time vs size efficiency scatterplot (quintiles)Size ratios (quintiles)Time ratios (quintiles)Those three points clustered around (4.4, 0.97) are, as before, high FreeArc settings. Though this was obscured by the previous charts, they only become advantageous at all with the largest quintile of filelists, those above 3.4MiB. For at least 80% of the 441 tested filelists, they provide negligible advantage of any sort. Because those large lists for which the 4x speed penalty would bite most are also the only filelists they would provide much size advantage to (e.g. 5% of a 5MiB-bzip2’ed list would save 256KiB, but even 5% of a 1MiB-bzip2’d list saves only 51KiB), the more CPU-intensive FreeArc settings are uselessly wasteful for DC filelists. Removing them creates a more readable scatter plot:

Time vs size efficiency scatterplot (quintiles, sans high FreeArc settings) The bzip2 -9 status quo resides, as stated, at (1,1) by construction of ratios. At best, of the speed-wise feasible alternative compressors/settings, one might extract perhaps a 2-3% compressed list size reduction, at the cost of 20% slower compression; having to maintain two compressors indefinitely and thus in practice having to compress twice; and adopting nonstandard or abandoned (XMill), albeit still open-source, compressors.

bzip2 -9, the existing status quo, has proven a remarkably durable compromise between speed and compressed size and even given an opportunity to select a compression algorithm without regard for backwards compatibility, I would choose it again for a DC-like protocol. Combined with the high costs of transitioning to anything else, bzip2 -9 thus remains an the optimal filelist compression algorithm for DC.

Interview series: poy

This is one part of a blog series where influential people in Direct Connect are interviewed about the past, present and future state of Direct Connect.


poy is a developer known for his influence in the development of DC++ and ADCH++. He raised code base standards to a new high, made the UI library transition (in DC++) end up well, have been one of the main developers of DC++ and ADCH++, and more.

What follows are the questions I asked and the answers he gave.

  • What made and when did you start with Direct Connect?

About 8 years ago, back when all I knew of P2P was eMule, KaZaA, Limewire, emerging torrents and other faceless global file sharing solutions, a friend of mine told me about this great gem he had just found, where getting into servers depends upon how well and how much you share yourself.

  • When did you start with the development of DC++?

Looking into the changelog, my first patches landed in version 0.695, released 5 years ago. Before that, I had been spending most of my C++ coding time modding my own version of DC++.

  • What made you interested in the development of DC and DC++?

DC has a unique set of communities that care about sharing in a sense that I fully appreciate. It is therefore always a pleasure to look into its inner workings.

Being reverse-engineered, the initial DC protocol has always left me with a slightly bitter feeling, thus getting me to look for a better, more future-proof alternative. Fortunately, by the time I got interested in that, ADC was already on its way up. It was however far from perfect, especially with regards to the lack of hub programs available. This has led to a constant state of motion, ideas, achievements which quickly appealed to me.

For DC++ specifically, I enjoy its use of many modern programming constructs, its large user base and the fact that there are still so many things to do with it.

  • You are one of the main developers for DC++; What are your goals and ideas?

One of my primary goals is and has always been to be able to replace the mod I used to develop a while ago, which included various features I have yet to see in any other DC client so far. This includes a more evolved chatting facility, configuration for each hub or each hub group, various UI fanciness…

I have many ideas but mostly smaller, short-term ones, such as (just dropping some from my todo list) ways to filter lists with Regular expressions, using the binary and/or city databases of GeoIP, adding right-click accelerators similar to those in Internet Explorer

One untold goal of the UI library that DC++ is using (formerly SmartWin++, now DWT for “DC++ Widget Toolkit”) is to support more than just Windows and to be released separately. Knowing of other applications than DC++ that are able to use DWT would be quite an accomplishment.

Another ongoing goal is to keep DC++ on top of the latest changes in C++ standards. The C++ committee has been quite productive these last few years.

  • You are also involved with the development of ADCH++; What are your goals and ideas?

My ultimate goal for ADCH++ would be a hub that anyone can easily configure, yet expert users can still enjoy. I believe the latter part to be close to reality right now; the former, however, is not quite so according to the feedback we are getting. A GUI would be most welcome. One was in development at one point but it seems to have halted; I hope it gets picked up again.

Aside from that, I consider ADCH++ to be quite complete; its author has made a wonderful job with it. A few features may not yet be available but they are all doable with plugins / scripts, so that doesn’t really fall on ADCH++ itself.

  • Do you have any goals and ideas with the further expansion of NMDC and/or ADC?

I hope ADC can provide a fully secure environment; it is already possible to encrypt ADC traffic, but that doesn’t mean it is completely secure.

I hope ADC can reach a state comparable to other Internet protocols such as HTTP, FTP, etc.

I have been wondering about an ADC extension idea that I would call HFAV, where clients would send the hub addresses they know about to hubs, which in turn would poll all the addresses they have gathered, regularly ping them and dispatch the results to any client that requests them.

  • What other DC projects are you involved in, and what are your goals and ideas with those?

I have written DCBouncer mostly for my own use: it stays connected on a server I own and simulates a presence on some hubs, gathering chat messages (including private ones); then when i log back in, it forwards them to me. Usage scenarios are multiple: logging, hiding one’s actual IP, perhaps in the future even share directly from the server…

  • What do you think will attract more users of DC in general? Would that be desirable?

Internet ads praising the DC idea would, in my opinion, be a great way to attract users. Another is to troll P2P forums.

The main reason I like DC is its elitist philosophy: the better and the faster you share, the more likely you are to get invited to better hubs with users with a better sharing quality. If I wanted some kind of global server where anyone could enter, I would prefer the programs previously mentioned. That is the reason I am not a fan of hubs that strive for a maximum amount of users and disregard the community aspect; and especially why i dislike the DHT feature that some clients have implemented as DHT is taking that idea of faceless hubs to an extreme.

The stuff one can find on global P2P networks / faceless hubs is very different than what is available in elitist DC communities. I believe luring users of the former category into DC would not necessarily be beneficial; it would be more interesting to strive for those who are likely to contribute to the very unique DC sharing spirit.

  • Would you change anything with DC, DC++, ADC etc if you could?

All the ideas I have had so far have been quite realistic; I guess I tend to unconsciously discard those I couldn’t be able to implement myself.

  • How much time do you spend on DC development?

This depends a lot on real life, other Internet groups I frequent and the motivation I have to accomplish a particular task. At times I have spent 6 hours straight; other times, just 15 minutes to do an otherwise awesome change. I try to at least check what’s up once a day or once a week on busy weeks.

  • What would cause you to spend more time with development?

An awesome idea that would have many code ramifications; or a crazily hard-to-find bug.

  • What was the most difficult thing you have done with DC?

I would be tempted to mention the recent crash logger, which led me to fully read the libdwarf, DWARF 2 and PE/COFF documentations. This was a unique achievement so I had no example on how to go about it at first. But in retrospect, this wasn’t quite that difficult.

The hardest thing I can remember is having to track down bug 590651. It was a leak of a graphic object that resulted in all debug reports being useless. We ended up having to ask some testers to run several versions of the program at the same time to try to figure out a scheme to the crash, until one postulated that they were related to the amounts of hub disconnects received by the program. That wasn’t much to go on from but I eventually figured it may have been related to tab icon changes on disconnects. Although the fix was trivial, pinpointing the bug was quite the hardship.

  • What was the most important thing you have done with DC?

I am quite proud of the DC++ window manager, which restores the session to the way it was before the program was closed. This goes with the menu of recent windows in the toolbar.

  • What was the most fun thing you have done with DC?

Hard to say, it’s always fun and a unique sensation of accomplishment that only a developer can know of. :) But I guess ADCH++-PtokaX is pretty fun in its own ironic way.


Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”

Copyright notice for Tiger implementations

I and Jacek began a while back discussing the Tiger implementation in DC++. The implementation is a C++ port of the original C code available on the official Tiger website; http://www.cs.technion.ac.il/~biham/Reports/Tiger/

What was noticable about the C code was that there was no license attached; this means that the implementation fall under default copyright laws (in this case, US laws). As such, any type of modification or derivative (which the C++ implementation might be considered as) is most likely not directly allowed.

I sent an e-mail to the authors to rectify the situation and make sure there is sound lawful ground for DC++, other derivatives and users of the C++ code.

The following is what I sent to Eli Biham, one of the authors;

Under what license is your C implementation of the Tiger (http://www.cs.technion.ac.il/~biham/Reports/Tiger/) algorithm? The source code doesn’t state any license explicitly, nor does the Tiger main page. As it stands now, it is not possible to use your implementation, as is, in an application without getting explicit permission from you.

Biham responded;

Dear Fredrik,

I hereby allow you to use it, provided it will compute Tiger, and state
the names of the authors in it.

Clearly, the usual disclaimers hold, e.g., that it’s use will be legal,
and that it will not be exported to countries banned by law, that the
authors will not be responsible for the code, your software, nor
anything else.

Regards,

Eli

In an effort to create a cleaner phrasing that would suit source code, I rephrased and added the following to the DC++ source (in TigerHash.cpp/h, in DC++ 0.780);

/*
* The Tiger algorithm was written by Eli Biham and Ross Anderson and
* is available on the official Tiger algorithm page .
* The below Tiger implementation is a C++ version of their original C code.
* Permission was granted by Eli Biham to use with the following conditions;
* a) This note must be retained.
* b) The algorithm must correctly compute Tiger.
* c) The algorithm’s use must be legal.
* d) The algorithm may not be exported to countries banned by law.
* e) The authors of the C code are not responsible of this use of the code,
* the software or anything else.
*/

If you are using the C implementation or a derivate (including DC++’s implementation), you must include a similar notice.

Feel free to use my phrasing or write your own (adherring to Biham’s restrictions).

Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”