SSE3 in DC++

The next DC++ release will require SSE3. Steam’s hardware survey currently lists SSE3 as having 99.96% penetration. All AMD and Intel x86 CPUs since the Athlon 64 X2 in 2005 and Intel Core in January 2006 have supported SSE3. Even earlier, though, all Pentium 4 steppings since Prescott which support the NX bit required by Windows 8 and 10 also support SSE3, which extends the effective Intel support back to 2004. I can’t find an Intel CPU which supports NX (required for Win8/10) but not SSE3. Finally, this effectively affects only 32-bit builds, since 64-bit builds exclusively use SSE for floating-point arithmetic.

This effects two basic transformations, one minor and one major, depending on how well the existing code compiles. The minor improvement derives from functions such as bool SettingsDialog::handleClosing() using one instruction rather than two, from

bool SettingsDialog::handleClosing() {
	dwt::Point pt = getWindowSize();
	SettingsManager::getInstance()->set(SettingsManager::SETTINGS_WIDTH,
    cvttss2si eax,DWORD PTR [esp+0x18] ;; eax is just a temporary
    mov    DWORD PTR [edx+0x87c],eax   ;; which is promptly stored to mem

to

bool SettingsDialog::handleClosing() {
	dwt::Point pt = getWindowSize();
	SettingsManager::getInstance()->set(SettingsManager::SETTINGS_WIDTH,
    fisttp DWORD PTR [edx+0x87c]      ;; no byway through eax (also, less register pressure)

However, sometimes cvttss2si and related SSE/SSE2 instructions don’t fit as well, so g++ had been relying on fistp. These instances previously produced terrible code generation; without SSE3, only using through SSE2, part of void SearchFrame::runSearch() compiles to:

	auto llsize = static_cast(lsize);
    fnstcw WORD PTR [ebp-0x50e]     ;; save FP control word to mem
    movzx  eax,WORD PTR [ebp-0x50e] ;; zero-extend-move it to eax
    mov    ah,0xc                   ;; build new control word
    mov    WORD PTR [ebp-0x510],ax  ;; place control word in mem for fldcw
    fld    QWORD PTR [ebp-0x520]    ;; load lsize from mem (same as below)
    fldcw  WORD PTR [ebp-0x510]     ;; load new control word
    fistp  QWORD PTR [ebp-0x548]    ;; with correct control word, round lsize
    fldcw  WORD PTR [ebp-0x50e]     ;; restore previous control word

All 6 red-highlighted lines just scaffold around the actual fistp doing the floating point-to-int rounding, which can cost 80 cycles or more for this single innocuous-looking line of code. By contrast, using fisttp from SSE3, that same fragment collapses to:

	auto llsize = static_cast(lsize);
    fld    QWORD PTR [ebp-0x520]    ;; same as above; load lsize
    fisttp QWORD PTR [ebp-0x548]    ;; convert it. simple.

This pattern recurs many times through DC++, including void AdcHub::handle(AdcCommand::GET which has a portion halving in size and dramatically increasing in speed from

		// Ideal size for m is n * k / ln(2), but we allow some slack
		// When h >= 32, m can't go above 2^h anyway since it's stored in a size_t.
		if(m > (5 * Util::roundUp((int64_t)(n * k / log(2.)), (int64_t)64)) || (h < 32 && m > static_cast(1U << h))) {
    mov    DWORD PTR [esp+0x1c],edi
    xor    ecx,ecx
    imul   eax,DWORD PTR [esp+0x18]
    movd   xmm0,eax
    movq   QWORD PTR [esp+0x58],xmm0
    fild   QWORD PTR [esp+0x58]
    fdiv   QWORD PTR ds:0xca8
    fnstcw WORD PTR [esp+0x22]     ;; same control word dance as before
    movzx  eax,WORD PTR [esp+0x22]
    mov    ah,0xc                  ;; same control word
    mov    WORD PTR [esp+0x20],ax  ;; but fldcw loads from mem not reg
    fldcw  WORD PTR [esp+0x20]     ;; load C and C++-compatible rounding mode
    fistp  QWORD PTR [esp+0x58]    ;; the actual conversion
    fldcw  WORD PTR [esp+0x22]     ;; restore previous
    mov    eax,DWORD PTR [esp+0x58]
    mov    edx,DWORD PTR [esp+0x5c]

to, using the fisttp SSE3 instruction,

		// Ideal size for m is n * k / ln(2), but we allow some slack
		// When h >= 32, m can't go above 2^h anyway since it's stored in a size_t.
		if(m > (5 * Util::roundUp((int64_t)(n * k / log(2.)), (int64_t)64)) || (h < 32 && m > static_cast(1U << h))) {
    mov    DWORD PTR [esp+0x20],edi
    xor    ecx,ecx
    imul   eax,DWORD PTR [esp+0x1c]
    movd   xmm0,eax
    movq   QWORD PTR [esp+0x58],xmm0
    fild   QWORD PTR [esp+0x58]
    fdiv   QWORD PTR ds:0xca8
    fisttp QWORD PTR [esp+0x58]    ;; replaces all seven red lines
    mov    eax,DWORD PTR [esp+0x58]
    mov    edx,DWORD PTR [esp+0x5c]

This specific control word save/convert float/control word restore pattern recurs 19 other times across the current codebase in the dcpp, dwt, and win32 directories, including DownloadManager::getRunningAverage(); HashBloom::get_m(size_t n, size_t k); QueueItem::getDownloadedBytes(); Transfer::getParams(); UploadManager::getRunningAverage(); Grid::calcSizes(…); HashProgressDlg::updateStats(); TransferView::on(HttpManagerListener::Updated, …); and TransferView::onTransferTick(…).

Know your FPU: Fixing Floating Fast provides microbenchmarks showing just how slow this fistp-based technique can be due to the fnstcw/fldcw 80+-cycle FPU pipeline flush and therefore how much faster code which replaces it can become:

Fixed tests...
Testing ANSI fixed() ... Time = 2974.57 ms
Testing fistp fixed()... Time = 3100.84 ms
Testing Sree fixed() ... Time =  606.80 ms

SSE3 provides not simply some hidden code generation aesthetic quality improvement, but a speed increase across much of DC++.

Why DCNF uses HTTPS via Let’s Encrypt

All DCNF web services either use HTTPS or are being transitioned to HTTPS.

The US government’s HTTPS-only standard and Google’s “Why HTTPS Matters” describe how HTTPS enables increased website privacy, security, and integrity in general. ISPs, home routers, and antivirus software have all been caught modifying HTTP traffic, for example, which HTTPS hinders. HTTPS also increases Google’s search ranking and, via HTTP/2, decreases website loading time.

Somewhat more forcefully, Chrome 56 will warn users of non-HTTPS login forms, as does Firefox 50 beta and according to schedule, will Firefox 51. This will become important, for example, for the currently-under-maintenance DCBase forums.

Beyond the obvious advantages of not costing money, Let’s Encrypt provides important reduced friction versus alternatives in automatically and therefore scalably managing certificates for multiple subdomains, as well as ameliorating certificate revocation and security-at-rest importance and thereby HTTPS management overhead by such automation allowing more shorter-lived certificates and more rapid renewal. Additionally, as crypto algorithms gain and lose favor, such quick renewals catalyze agility. These HTTPS, in general, and Let’s Encrypt, specifically, advantages have led to adopting HTTPS using Let’s Encrypt.

DC++ Will Require SSE2

The next version of DC++ will require SSE2 CPU support.

This represents no change for the 64-bit builds since x86-64 includes SSE2. The last widely used CPUs affected, lacking SSE2 support, are Athlon XPs the last of which were released in 2004. As such, not just DC++ but Firefox 49, Chrome on both Windows and Linux since 2014, IE 11 since 2013, and Windows 8 since 2012 all require SSE2. Empirically, Firefox developers found that just 0.4% of their users as of this May lacked SSE2 and Chrome developers measured 0.33% of their Windows stable population lacking SSE2 in 2014, suggesting that to the extent not requiring SSE2 imposes non-negligible development or runtime cost, one might find increasingly thin support for avoiding it.

A straightforward advantage SSE2 provides derives from non-SIMD 32-bit x86 supporting only arguably between 6 and 8 general-purpose 32-bit registers. SSE2 in 32-bit environments adds 8 additional registers, substantially increasing x86’s architecturally named registers.

Furthermore, these additional registers in 32-bit x86 are 128-bit, allowing 64-bit and 128-bit memory moves in single instructions, rather than multiple 32-bit mov instructions, which also enables each reg/mem move to more efficiently align on larger boundaries. Similarly, access to 64-bit arithmetic and comparisons on x86 allow native handling of all those 64-bit arithmetic, logic, and comparison operations which show up both in the Tiger hash code (designed for 64-bit CPUs and it shows) and the 64-bit file position handling pervasive in DC++.

Finally, there’s substantial use of 2-wide SIMD, especially when common patterns such as

foo += bar;
baz += foobar;

via SSE2 packed integer addition (e.g., paddq) or

foo -= bar;
baz -= foobar;

appear, using packed integer subtraction (e.g., psubq).

Putting all this together in one of the more dramatic improvements in generated code quality as a result of this change, one can watch as enabling SSE2 automatically transforms part of TigerHash::update(…) from:

193:dcpp/TigerHash.cpp **** 	}
movl	168(%esp), %edi	 # %sfp, x7
movl	172(%esp), %ebp	 # %sfp, x7
movl	440(%esp), %ebx	 # %sfp, x1
movl	444(%esp), %esi	 # %sfp, x1
movl	%edi, %eax	 # x7, tmp2058
movl	412(%esp), %edx	 # %sfp, x0
xorl	$-1515870811, %eax	 #, tmp2058
movl	%eax, 488(%esp)	 # tmp2058, %sfp
movl	%ebp, %eax	 # x7, tmp2059
movl	%ebx, %ecx	 # x1, tmp2062
xorl	$-1515870811, %eax	 #, tmp2059
movl	%esi, %ebx	 # x1, tmp2063
movl	156(%esp), %esi	 # %sfp, x2
movl	%eax, 492(%esp)	 # tmp2059, %sfp
movl	408(%esp), %eax	 # %sfp, x0
subl	488(%esp), %eax	 # %sfp, x0
sbbl	492(%esp), %edx	 # %sfp, x0
xorl	%eax, %ecx	 # x0, tmp2062
movl	%ecx, 384(%esp)	 # tmp2062, %sfp
xorl	%edx, %ebx	 # x0, tmp2063
movl	384(%esp), %edi	 # %sfp, x1
movl	%ebx, 388(%esp)	 # tmp2063, %sfp
movl	152(%esp), %ebx	 # %sfp, x2
movl	388(%esp), %ebp	 # %sfp, x1
movl	%edi, %ecx	 # x1, tmp2066
notl	%ecx	 # tmp2066
addl	%edi, %ebx	 # x1, x2
movl	%ecx, 496(%esp)	 # tmp2066, %sfp
movl	%ebp, %ecx	 # x1, tmp2067
adcl	%ebp, %esi	 # x1, x2
notl	%ecx	 # tmp2067
movl	%ebx, (%esp)	 # x2, %sfp
movl	%ecx, 500(%esp)	 # tmp2067, %sfp
movl	496(%esp), %ecx	 # %sfp, tmp1093
movl	%esi, 4(%esp)	 # x2, %sfp
movl	500(%esp), %ebx	 # %sfp,
movl	(%esp), %esi	 # %sfp, x2
movl	4(%esp), %edi	 # %sfp,
shldl	$19, %ecx, %ebx	 #, tmp1093,
movl	%esi, %ebp	 # x2, tmp2069
movl	460(%esp), %esi	 # %sfp, x3
sall	$19, %ecx	 #, tmp1093
xorl	%edi, %ebx	 #, tmp2070
xorl	%ecx, %ebp	 # tmp1093, tmp2069
movl	%ebp, 504(%esp)	 # tmp2069, %sfp
movl	%ebx, 508(%esp)	 # tmp2070, %sfp
movl	456(%esp), %ebx	 # %sfp, x3
subl	504(%esp), %ebx	 # %sfp, x3
sbbl	508(%esp), %esi	 # %sfp, x3
movl	%ebx, %edi	 # x3, x3

to something of comparative beauty:

193:dcpp/TigerHash.cpp **** 	}
movl	80(%esp), %eax	 # %sfp, tmp1091
movl	84(%esp), %edx	 # %sfp,
xorl	$-1515870811, %eax	 #, tmp1091
xorl	$-1515870811, %edx	 #,
movd	%eax, %xmm0	 # tmp1091, tmp1885
movd	%edx, %xmm1	 #, tmp1886
punpckldq	%xmm1, %xmm0	 # tmp1886, tmp1885
psubq	%xmm0, %xmm7	 # tmp1885, x0
movdqa	96(%esp), %xmm1	 # %sfp, tmp2253
pxor	%xmm7, %xmm1	 # x0, tmp2253
movdqa	%xmm1, %xmm0	 # x1, tmp1843
psrlq	$32, %xmm0	 #, tmp1843
movd	%xmm1, %edx	 # tmp21, tmp2105
notl	%edx	 # tmp2105
movd	%xmm0, %eax	 #, tmp2106
notl	%eax	 # tmp2106
paddq	%xmm1, %xmm6	 # x1, x2
movl	%edx, 192(%esp)	 # tmp2105, %sfp
movdqa	%xmm1, %xmm3	 # tmp2253, x1
movl	%eax, 196(%esp)	 # tmp2106, %sfp
movl	192(%esp), %eax	 # %sfp, tmp1093
movl	196(%esp), %edx	 # %sfp,
shldl	$19, %eax, %edx	 #, tmp1093,
sall	$19, %eax	 #, tmp1093
movd	%edx, %xmm1	 #, tmp1888
movd	%eax, %xmm0	 # tmp1093, tmp1887
punpckldq	%xmm1, %xmm0	 # tmp1888, tmp1887
pxor	%xmm6, %xmm0	 # x2, tmp1094
psubq	%xmm0, %xmm5	 # tmp1094, tmp2630

The register overflow spill/fills in the non-SSE version from %eax to 492(%esp) back to %edx three instructions later to enable %eax to be reused; from %ecx to 500(%esp) back to %ebx in another three instructions to enable 496(%esp) to be left-shifted a few instructions later; and between %edi, %ecx, and that same 496(%esp) because evidently, there’s not enough space to sort both %ecx and notl %ecx simultaneously with a half-dozen GPRs.

Virtually no spills/fills remain because there are now ample registers; the movdqa from 96(%esp) to %xmm1 replaces multiple 32-bit movl instructions; the ugly addl/adcl and subl/sbbl pairs emulating 64-bit addition and subtraction using 32-bit arithmetic disappear in lieu of natively 64-bit arithmetic; and each pair of 32-bit xorl instructions becomes a single pxor.

While TigerHash.cpp especially shows off SSE2’s advantage over i686-generation 32-bit x86, each of these improvements appears sprinked in thousands of places around DC++, in function prologues, every time certain Boost template functions shows up, every time _builtin_memcpy is called, and in dozens of other mundane yet common situations.

Setting up multiple-subdomain HTTPS with nginx, acme-tiny, and Lets Encrypt

This guide briefly describes aspects of setting up nginx and acme-tiny to automatically register and renew multiple subdomains.

acme-tiny (Debian, Ubuntu, Arch, OpenBSD, FreeBSD, and Python Package Index) provides a more verifiable and more easily customizable than the default Let’s Encrypt client. This proves especially useful in less mainstream contexts where either the main client works magically or fails magically, but tends to offer little between those two outcomes.

The first step is to create a multidomain CSR which informs Let’s Encrypt of which domains it should provide certificates for. When adding or removing subdomains, this needs to be altered:
# OpenSSL configuration to generate a new key with signing requst for a x509v3
# multidomain certificate
#
# openssl req -config bla.cnf -new | tee csr.pem
# or
# openssl req -config bla.cnf -new -out csr.pem
[ req ]
default_bits = 4096
default_md = sha512
default_keyfile = key.pem
prompt = no
encrypt_key = no

# base request
distinguished_name = req_distinguished_name

# extensions
req_extensions = v3_req

# distinguished_name
[ req_distinguished_name ]
countryName = "SE"
stateOrProvinceName = "Sollentuna"
organizationName = "Direct Connect Network Foundation"
commonName = "dcbase.org"

# req_extensions
[ v3_req ]
# https://www.openssl.org/docs/apps/x509v3_config.html
subjectAltName = DNS:dcbase.org,DNS:www.dcbase.org

Then, when one is satisfies with one’s changes:
openssl req -new -key domain.key -config ~/dcbase_openssl.cnf > domain.csr
in the appropriate directory to regenerate a CSR based on this configuration. One does not have to change this CSR unless the set of subdomains or other information contained within also changes. Simply renewing certificates does not require regenerating domain.csr.

Having created a CSR, one then needs to ensure Let’s Encrypt knows where to find it. The ACME protocol Let’s Encrypt uses specifies that this should be /.well-known/acme-challenge/ and per acme-tiny’s documentation:
# https://github.com/diafygi/acme-tiny#step-3-make-your-website-host-challenge-files
location /.well-known/acme-challenge/ {
alias $appropriate_challenge_location;

allow all;
log_not_found off;
access_log off;

try_files $uri =404;
}

Where this needs to be accessible via ordinary HTTP, port 80, to work most conveniently, even if the entire rest of the site is HTTPS-only. Furthermore, this needs to hold even for otherwise dynamically generated sites — e.g., http://build.dcbase.org/.well-known/acme-challenge/, http://builds.dcbase.org/.well-known/acme-challenge/, http://archive.dcbase.org/.well-known/acme-challenge/, and http://forum.dcbase.org/.well-known/acme-challenge/ would all need to point to that same challenge location, even if disparate PHP CMSes generate each or they ordinarily redirect to other sites (such as Google Drive).

If this works, then one sees:
Parsing account key...
Parsing CSR...
Registering account...
Already registered!
Verifying dcbase.org...
dcbase.org verified!
Verifying http://www.dcbase.org...
http://www.dcbase.org verified!
Signing certificate...
Certificate signed!

When running acme-tiny.

Once this works reliably, the whole process should be run automatically as a cron job often enough to stay ahead of Let’s Encrypt’s 90-day cycle. However, one cannot renew too often:

The main limit is Certificates per Registered Domain (20 per week). A registered domain is, generally speaking, the part of the domain you purchased from your domain name registrar. For instance, in the name http://www.example.com, the registered domain is example.com. In new.blog.example.co.uk, the registered domain is example.co.uk. We use the Public Suffix List to calculate the registered domain.

If you have a lot of subdomains, you may want to combine them into a single certificate, up to a limit of 100 Names per Certificate. Combined with the above limit, that means you can issue certificates containing up to 2,000 unique subdomains per week. A certificate with multiple names is often called a SAN certificate, or sometimes a UCC certificate.

Once Let’s Encrypt certificate renewal’s configured, Strong Ciphers for Apache, nginx and Lighttpd and BetterCrypto provide reasonable recommendations, while BetterCrypto’s Crypto Hardening guide discusses more deeply rationales behind these choices.

Finally, SSL Server Test and Analyse your HTTP response headers offer sanity checks for multiple successfully secured subdomains served by nginx over HTTPS using Let’s Encrypt certificates.

Hardening DC++ Cryptography: TLS, HTTPS, and KEYP

BEAST, CRIME, BREACH, and Lucky 13 together left DC++ with no secure TLS support. Since then, the triple handshake attack, Heartbleed, POODLE for both SSL 3 and TLS, FREAK, and Logjam have multiplied hazards.

Fortunately, in the intervening year and a half, in response:

  • poy introduces direct, encrypted private messages in DC++ 0.830.
  • DC++ 0.840 sees substantial, wide-ranging improvements in KEYP and HTTPS support from Crise, anticipating Google sunsetting SHA1 by several months and detecting man-in-the-middle attempts across both KEYP and HTTPS.
  • OpenSSL 1.0.1g, included in DC++ 0.842, fixes Heartbleed.
  • DC++ 0.850 avoids CRIME and BREACH by disabling TLS compression; avoids RC4 vulnerabilities by removing support for RC4; prevents BEAST by supporting TLS 1.1 and 1.2; mitigates Lucky 13 through preferring AES-GCM ciphersuites; removes support for increasingly factorable 512-bit and 1024-bit DH and RSA ephemeral TLS keys; and with all but one ciphersuite, AES128-SHA, deprecated and included for DC++ pre-0.850 compatibility, uses either DHE or ECDHE ciphersuites to provide perfect forward secrecy, mitigating any future Heartbleed-like vulnerabilities.
  • DC++ 0.851 uses a new OpenSSL 1.0.2 API to constrain allowed elliptic curves to those for which OpenSSL provides constant-time assembly code to avoid timing side-channel attacks.

These KEYP, TLS, and HTTPS improvements have not only fixed known weaknesses, but prevent DC++ 0.850 and 0.851 from ever having been vulnerable to either FREAK or Logjam. As with perfect forward secrecy, these changes increase DC++’s ongoing security against yet-unknown cryptographic developments.

The upcoming version switches URLs in documentation, in menu items, and of the GeoIP downloads from HTTP to HTTPS. While these changes do not and cannot prevent attacks perfectly, it should now provide users with improved and still-improving cryptographic security for the benefit of all DC++ users.

DC Development hub revived

Following a two-month-long hiatus, adcs://hub.dcbase.org:16591 hosts the DC development hub again.

BEAST, CRIME, BREACH, and Lucky 13: Assessing TLS in ADCS

1. Summary

Several TLS attacks since 2011 impel a reassessment of the security of ADC’s usage of TLS to form ADCS. While the specific attacks tend not to be trivially replicated in a DC client as opposed to a web browser, remaining conservative with respect to security remains useful, the issues they exploit could cause problems regardless, and ADCS’s best response thus becomes to deprecate SSL 3.0 and TLS 1.0. Ideally, one should use TLS 1.2 with AES-GCM. Failing that, ensuring that TLS 1.1 runs and chooses AES-based ciphersuite works adequately.

2. HTTP-over-TLS Attacks

BEAST renders practical Rogaway’s 2002 attack on the security of CBC ciphersuites in SSL/TLS by using an SSL/TLS server’s CBC padding MAC acceptance/rejection as a timing oracle. Asking whether each possible byte in each position results in successful MAC, it decodes an entire message. One can avert BEAST either by avoiding CBC in lieu of RC4 or updating to TLS 1.1 or 1.2, which mitigate the timing oracle and generate new random IVs to undermine BEAST’s sequential attack.

CRIME and BREACH build on a 2002 compression and information leakage of plaintext-based attack. CRIME “requires on average 6 requests to decrypt 1 cookie byte” and, like BEAST, recognizes DEFLATE’s smaller output when it has found a pre-existing copy of the correct plaintext in its dictionary. Unlike BEAST, CRIME and BREACH depend not on TLS version or CBC versus RC4 ciphersuites but merely compression. Disabling HTTP and TLS compression therefore avoids CRIME and BREACH.

One backwards-compatible solution thus far involves avoiding compression due to CRIME/BREACH and avoiding BEAST with RC4-based TLS ciphersuites. However, a new attack against RC4 in TLS by AlFardan, Bernstein, et al exploits double-byte ciphertext biases to reconstruct messages using approximately 229 ciphertexts; as few as 225 achieve a 60+% recovery rate. RC4-based ciphersuites decreasingly inspire confidence as a backwards-compatible yet secure approach to TLS, enough that the IETF circulates an RFC draft prohibiting RC4 ciphersuites.

Thus far treating DC as sufficiently HTTP-like to borrow their threat model, options narrow to TLS 1.1 or TLS 1.2 with an AES-derived ciphersuite. One needs still beware: Lucky 13 weakens even TLS 1.1 and TLS 1.2 AES-CBC ciphers, leaving between it and the RC4 attack no unscathed TLS 1.1 configuration. Instead, AlFardan and Paterson recommend to “switch to using AEAD ciphersuites, such as AES-GCM” and/or “modify TLS’s CBC-mode decryption procedure so as to remove the timing side channel”. They observe that each major TLS library has addressed the latter point, so that AES-CBC might remain somewhat secure; certainly superior to RC4.

3. ADC-over-TLS-specific Concerns

ADCS clients’ and hubs’ vulnerability profiles and relevant threat models regarding each of BEAST, CRIME, BREACH, Lucky 13, and the RC4 break differ from that of a web browser using HTTP. BEAST and AlFardan, Bernstein, et al’s RC4 attack both point to adopting TLS 1.1, a ubiquitously supportable requirement worth satisfying regardless. OpenSSL, NSS, GnuTLS, PolarSSL, CyaSSL, MatrixSSL, BouncyCastle, and Oracle’s standard Java crypto library have all already “addressedLucky 13.

ADCS doesn’t use TLS compression, so that aspect of CRIME/BREACH does not apply. The ZLIB extension does operate analogously to HTTP compression. Indeed, the BREACH authors remark that:

there is nothing particularly special about HTTP and TLS in this side-channel. Any time an attacker has the ability to inject their own payload into plaintext that is compressed, the potential for a CRIME-like attack is there. There are many widely used protocols that use the composition of encryption with compression; it is likely that other instances of this vulnerability exist.

ADCS provides an attacker this capability via logging onto a hub and sending CTMs and B, D, and E-type messages. Weaponizing it, however, operates better when these injected payloads can discover cookie-like repeated secrets, which ADC lacks. GPA and PAS operate via a challenge-reponse system. CTM cookies find use at most once. Private IDs would presumably have left a client-hub connection’s compression dictionary by the time an attack might otherwise succeed and don’t appear in client-client connections. While a detailed analysis of the extent of practical feasibility remains wanting, I’m skeptical CRIME and BREACH much threaten ADCS.

4. Mitigation and Prevention in ADCS

Regardless, some of these attacks could be avoided entirely with specification updates incurring no ongoing cost and hindering implenetation on no common platforms. Three distinct categories emerge: BEAST and Lucky 13 attacks CBC in TLS; the RC4 break, well, attacks RC4; and CRIME and BREACH attack compression. Since one shouldn’t use RC4 regardless, that leaves AES-CBC attacks and compression attacks.

Disabling compression might incur substantial bandwidth cost for little thus-far demonstrated security benefit, so although ZLIB implementors should remain aware of CRIME and BREACH, continued usage seems unproblematic.

Separately, BEAST and Lucky 13 point to requiring TLS 1.1 and, following draft IETF recomendations for secure use of TLS and DTLS, preferring TLS 1.2 with the TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 or other AES-GCM ciphersuite if supported by both endpoints. cryptlib, CyaSSL, GnuTLS, MatrixSSL, NSS, OpenSSL, PolarSSL, SChannel, and JSSE support both TLS 1.1 and TLS 1.2 and all but Java’s supports AES-GCM.

Suggested responses:

  • Consider how to communicate to ZLIB implementors the hazards and threat model, however minor, presented by CRIME and BREACH.
  • Formally deprecate SSL 3.0 and TLS 1.0 in the ADCS extension specification.
  • Discover which TLS versions and features clients (DC++ and variations, ncdc, Jucy, etc) and hubs (ADCH++, uHub, etc) support. If they use standard libraries, they probably all (except Jucy) already support TLS 1.2 with AES-GCM depending on how they configure their TLS libraries. Depending on results, one might already safely simply disable SSL 3.0 and TLS 1.0 in each such client and hub and prioritize TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 or a similar ciphersuite so that it finds use when mutually available. If this proves possible, the the ADCS extension specification should be updated to reflect this.