DC++ Will Require SSE2

The next version of DC++ will require SSE2 CPU support.

This represents no change for the 64-bit builds since x86-64 includes SSE2. The last widely used CPUs affected, lacking SSE2 support, are Athlon XPs the last of which were released in 2004. As such, not just DC++ but Firefox 49, Chrome on both Windows and Linux since 2014, IE 11 since 2013, and Windows 8 since 2012 all require SSE2. Empirically, Firefox developers found that just 0.4% of their users as of this May lacked SSE2 and Chrome developers measured 0.33% of their Windows stable population lacking SSE2 in 2014, suggesting that to the extent not requiring SSE2 imposes non-negligible development or runtime cost, one might find increasingly thin support for avoiding it.

A straightforward advantage SSE2 provides derives from non-SIMD 32-bit x86 supporting only arguably between 6 and 8 general-purpose 32-bit registers. SSE2 in 32-bit environments adds 8 additional registers, substantially increasing x86’s architecturally named registers.

Furthermore, these additional registers in 32-bit x86 are 128-bit, allowing 64-bit and 128-bit memory moves in single instructions, rather than multiple 32-bit mov instructions, which also enables each reg/mem move to more efficiently align on larger boundaries. Similarly, access to 64-bit arithmetic and comparisons on x86 allow native handling of all those 64-bit arithmetic, logic, and comparison operations which show up both in the Tiger hash code (designed for 64-bit CPUs and it shows) and the 64-bit file position handling pervasive in DC++.

Finally, there’s substantial use of 2-wide SIMD, especially when common patterns such as

foo += bar;
baz += foobar;

via SSE2 packed integer addition (e.g., paddq) or

foo -= bar;
baz -= foobar;

appear, using packed integer subtraction (e.g., psubq).

Putting all this together in one of the more dramatic improvements in generated code quality as a result of this change, one can watch as enabling SSE2 automatically transforms part of TigerHash::update(…) from:

193:dcpp/TigerHash.cpp **** 	}
movl	168(%esp), %edi	 # %sfp, x7
movl	172(%esp), %ebp	 # %sfp, x7
movl	440(%esp), %ebx	 # %sfp, x1
movl	444(%esp), %esi	 # %sfp, x1
movl	%edi, %eax	 # x7, tmp2058
movl	412(%esp), %edx	 # %sfp, x0
xorl	$-1515870811, %eax	 #, tmp2058
movl	%eax, 488(%esp)	 # tmp2058, %sfp
movl	%ebp, %eax	 # x7, tmp2059
movl	%ebx, %ecx	 # x1, tmp2062
xorl	$-1515870811, %eax	 #, tmp2059
movl	%esi, %ebx	 # x1, tmp2063
movl	156(%esp), %esi	 # %sfp, x2
movl	%eax, 492(%esp)	 # tmp2059, %sfp
movl	408(%esp), %eax	 # %sfp, x0
subl	488(%esp), %eax	 # %sfp, x0
sbbl	492(%esp), %edx	 # %sfp, x0
xorl	%eax, %ecx	 # x0, tmp2062
movl	%ecx, 384(%esp)	 # tmp2062, %sfp
xorl	%edx, %ebx	 # x0, tmp2063
movl	384(%esp), %edi	 # %sfp, x1
movl	%ebx, 388(%esp)	 # tmp2063, %sfp
movl	152(%esp), %ebx	 # %sfp, x2
movl	388(%esp), %ebp	 # %sfp, x1
movl	%edi, %ecx	 # x1, tmp2066
notl	%ecx	 # tmp2066
addl	%edi, %ebx	 # x1, x2
movl	%ecx, 496(%esp)	 # tmp2066, %sfp
movl	%ebp, %ecx	 # x1, tmp2067
adcl	%ebp, %esi	 # x1, x2
notl	%ecx	 # tmp2067
movl	%ebx, (%esp)	 # x2, %sfp
movl	%ecx, 500(%esp)	 # tmp2067, %sfp
movl	496(%esp), %ecx	 # %sfp, tmp1093
movl	%esi, 4(%esp)	 # x2, %sfp
movl	500(%esp), %ebx	 # %sfp,
movl	(%esp), %esi	 # %sfp, x2
movl	4(%esp), %edi	 # %sfp,
shldl	$19, %ecx, %ebx	 #, tmp1093,
movl	%esi, %ebp	 # x2, tmp2069
movl	460(%esp), %esi	 # %sfp, x3
sall	$19, %ecx	 #, tmp1093
xorl	%edi, %ebx	 #, tmp2070
xorl	%ecx, %ebp	 # tmp1093, tmp2069
movl	%ebp, 504(%esp)	 # tmp2069, %sfp
movl	%ebx, 508(%esp)	 # tmp2070, %sfp
movl	456(%esp), %ebx	 # %sfp, x3
subl	504(%esp), %ebx	 # %sfp, x3
sbbl	508(%esp), %esi	 # %sfp, x3
movl	%ebx, %edi	 # x3, x3

to something of comparative beauty:

193:dcpp/TigerHash.cpp **** 	}
movl	80(%esp), %eax	 # %sfp, tmp1091
movl	84(%esp), %edx	 # %sfp,
xorl	$-1515870811, %eax	 #, tmp1091
xorl	$-1515870811, %edx	 #,
movd	%eax, %xmm0	 # tmp1091, tmp1885
movd	%edx, %xmm1	 #, tmp1886
punpckldq	%xmm1, %xmm0	 # tmp1886, tmp1885
psubq	%xmm0, %xmm7	 # tmp1885, x0
movdqa	96(%esp), %xmm1	 # %sfp, tmp2253
pxor	%xmm7, %xmm1	 # x0, tmp2253
movdqa	%xmm1, %xmm0	 # x1, tmp1843
psrlq	$32, %xmm0	 #, tmp1843
movd	%xmm1, %edx	 # tmp21, tmp2105
notl	%edx	 # tmp2105
movd	%xmm0, %eax	 #, tmp2106
notl	%eax	 # tmp2106
paddq	%xmm1, %xmm6	 # x1, x2
movl	%edx, 192(%esp)	 # tmp2105, %sfp
movdqa	%xmm1, %xmm3	 # tmp2253, x1
movl	%eax, 196(%esp)	 # tmp2106, %sfp
movl	192(%esp), %eax	 # %sfp, tmp1093
movl	196(%esp), %edx	 # %sfp,
shldl	$19, %eax, %edx	 #, tmp1093,
sall	$19, %eax	 #, tmp1093
movd	%edx, %xmm1	 #, tmp1888
movd	%eax, %xmm0	 # tmp1093, tmp1887
punpckldq	%xmm1, %xmm0	 # tmp1888, tmp1887
pxor	%xmm6, %xmm0	 # x2, tmp1094
psubq	%xmm0, %xmm5	 # tmp1094, tmp2630

The register overflow spill/fills in the non-SSE version from %eax to 492(%esp) back to %edx three instructions later to enable %eax to be reused; from %ecx to 500(%esp) back to %ebx in another three instructions to enable 496(%esp) to be left-shifted a few instructions later; and between %edi, %ecx, and that same 496(%esp) because evidently, there’s not enough space to sort both %ecx and notl %ecx simultaneously with a half-dozen GPRs.

Virtually no spills/fills remain because there are now ample registers; the movdqa from 96(%esp) to %xmm1 replaces multiple 32-bit movl instructions; the ugly addl/adcl and subl/sbbl pairs emulating 64-bit addition and subtraction using 32-bit arithmetic disappear in lieu of natively 64-bit arithmetic; and each pair of 32-bit xorl instructions becomes a single pxor.

While TigerHash.cpp especially shows off SSE2’s advantage over i686-generation 32-bit x86, each of these improvements appears sprinked in thousands of places around DC++, in function prologues, every time certain Boost template functions shows up, every time _builtin_memcpy is called, and in dozens of other mundane yet common situations.

Setting up multiple-subdomain HTTPS with nginx, acme-tiny, and Lets Encrypt

This guide briefly describes aspects of setting up nginx and acme-tiny to automatically register and renew multiple subdomains.

acme-tiny (Debian, Ubuntu, Arch, OpenBSD, FreeBSD, and Python Package Index) provides a more verifiable and more easily customizable than the default Let’s Encrypt client. This proves especially useful in less mainstream contexts where either the main client works magically or fails magically, but tends to offer little between those two outcomes.

The first step is to create a multidomain CSR which informs Let’s Encrypt of which domains it should provide certificates for. When adding or removing subdomains, this needs to be altered:
# OpenSSL configuration to generate a new key with signing requst for a x509v3
# multidomain certificate
# openssl req -config bla.cnf -new | tee csr.pem
# or
# openssl req -config bla.cnf -new -out csr.pem
[ req ]
default_bits = 4096
default_md = sha512
default_keyfile = key.pem
prompt = no
encrypt_key = no

# base request
distinguished_name = req_distinguished_name

# extensions
req_extensions = v3_req

# distinguished_name
[ req_distinguished_name ]
countryName = "SE"
stateOrProvinceName = "Sollentuna"
organizationName = "Direct Connect Network Foundation"
commonName = "dcbase.org"

# req_extensions
[ v3_req ]
# https://www.openssl.org/docs/apps/x509v3_config.html
subjectAltName = DNS:dcbase.org,DNS:www.dcbase.org

Then, when one is satisfies with one’s changes:
openssl req -new -key domain.key -config ~/dcbase_openssl.cnf > domain.csr
in the appropriate directory to regenerate a CSR based on this configuration. One does not have to change this CSR unless the set of subdomains or other information contained within also changes. Simply renewing certificates does not require regenerating domain.csr.

Having created a CSR, one then needs to ensure Let’s Encrypt knows where to find it. The ACME protocol Let’s Encrypt uses specifies that this should be /.well-known/acme-challenge/ and per acme-tiny’s documentation:
# https://github.com/diafygi/acme-tiny#step-3-make-your-website-host-challenge-files
location /.well-known/acme-challenge/ {
alias $appropriate_challenge_location;

allow all;
log_not_found off;
access_log off;

try_files $uri =404;

Where this needs to be accessible via ordinary HTTP, port 80, to work most conveniently, even if the entire rest of the site is HTTPS-only. Furthermore, this needs to hold even for otherwise dynamically generated sites — e.g., http://build.dcbase.org/.well-known/acme-challenge/, http://builds.dcbase.org/.well-known/acme-challenge/, http://archive.dcbase.org/.well-known/acme-challenge/, and http://forum.dcbase.org/.well-known/acme-challenge/ would all need to point to that same challenge location, even if disparate PHP CMSes generate each or they ordinarily redirect to other sites (such as Google Drive).

If this works, then one sees:
Parsing account key...
Parsing CSR...
Registering account...
Already registered!
Verifying dcbase.org...
dcbase.org verified!
Verifying http://www.dcbase.org...
http://www.dcbase.org verified!
Signing certificate...
Certificate signed!

When running acme-tiny.

Once this works reliably, the whole process should be run automatically as a cron job often enough to stay ahead of Let’s Encrypt’s 90-day cycle. However, one cannot renew too often:

The main limit is Certificates per Registered Domain (20 per week). A registered domain is, generally speaking, the part of the domain you purchased from your domain name registrar. For instance, in the name http://www.example.com, the registered domain is example.com. In new.blog.example.co.uk, the registered domain is example.co.uk. We use the Public Suffix List to calculate the registered domain.

If you have a lot of subdomains, you may want to combine them into a single certificate, up to a limit of 100 Names per Certificate. Combined with the above limit, that means you can issue certificates containing up to 2,000 unique subdomains per week. A certificate with multiple names is often called a SAN certificate, or sometimes a UCC certificate.

Once Let’s Encrypt certificate renewal’s configured, Strong Ciphers for Apache, nginx and Lighttpd and BetterCrypto provide reasonable recommendations, while BetterCrypto’s Crypto Hardening guide discusses more deeply rationales behind these choices.

Finally, SSL Server Test and Analyse your HTTP response headers offer sanity checks for multiple successfully secured subdomains served by nginx over HTTPS using Let’s Encrypt certificates.

XML parsing of file lists

Many DC clients (and other software) have their own XML parser for parsing XML files and content. This means the parsers can be heavily specialized for performance (in the case of large file lists for instance) compared to just using a “standard parser” (i.e. one that has been used in multiple projects). However, building one’s own parser also means that the parser may be incorrect to a far greater extent, thereby increasing the risk that a malicious party (e.g. the one sending the file list) may try to remotely crash the receiver by sending incorrect files. Beyond the obvious concern for network security, clients may incorrectly allow files to be read or read incorrect data within those files.

I have compiled a list of potential errors that a file list may have, and generated file lists for each of those occurences. These file lists were then opened in DC++ (0.851) and verified to see what happened. This test should likely be done with all clients that don’t derive their own XML parsing with DC++’s (i.e., all DC++-mods will likely follow the below pattern).

I created multiple file lists (downloadable here), based on this file, generated with this C# snippet. Here are the results (Microsoft Office Excel file).

A summary of the results;

  • DC++ will parse invalid data (e.g. omission of data) and sometimes replace the faulty data with “something sensible”, although this is almost in all cases wrong.
  • In most cases where it is an invalid XML document, DC++ will ignore those sections or ignore the file altogether (this is good).
  • DC++ will not crash on invalid data.

Most of the issues found can be solved by performing a XML-sanitation check before reading the document, by validating against the XSD. DC++’s XML parser does not have any XSD validation, so it couldn’t be done at this point anyway, but should such a validation be implemented, it will cause a (small or big depends on the source file list) performance hit.

While I didn’t test it, parsing of the XML list for version.xml and any hublists will likely have the same issue(s) as mentioned above. At least we won’t crash DC++.

If someone has other software that they can test this with, please feel free to do so and let me know so I can update the Excel sheet. It’s also possible that the resulting files are named incorrectly (e.g. by not requiring a CID in the file name), so just run the snippet code.

(Note: The files in this post may have a file name such as “foo-zip.pdf”, and it is because the file is actually a zip file but this blog software couldn’t handle that, so just change the file-extension to the appropriate one.)

Organization meeting scheduled for 2016-01-10

An upcoming meeting for the Direct Connect Network Foundation is scheduled for 2016-01-10, at 19.00 CET. In the meeting, we will go over items from the past year and what we shall do in the coming year. The agenda will be according to the By-laws (https://www.dcbase.org/bylaws/) § 10.

You can see the previous meeting(s) here: https://www.dcbase.org/meetings/ so you can get a feel for the structure etc of the meeting. Feel free to suggest ways to improve the meeting process.

If you have additional items you wish the meeting to address (that people should think about beforehand), please post in this forum thread.

DCNF Resources

A new resources page at the dcbase.org site has now been created, where you can see all articles etc that relate to DC. Also, the page contains court cases where DC is involved in some way.

Addressing DC++’s service provider, SourceForge

There has been a lot of discussion regarding changes to SourceForge’s hosting practices [1][2][3]. There are two things that SourceForge have done; created an opt-in “revenue program” and begun taking over old or non-updating (or even non-existant) projects.

The opt-in program is DevShare and allow developers (project administrators) to receive revenue based on modified installers. FileZilla is one of the major projects that have done so. The modified installers embed additional programs, thereby acting as ad services. The developers can choose which type of ads/programs are suggested, although they cannot say exactly which may or may not show up. The developers do nothing extra to accomodate this feature. The difference, as noted by Ghacks.net is that SourceForge will change the appearance of the download page to highlight the ad-specific one whilst still having a link to the other one (albiet not as easy to see).

The DC++ administrators were sent an e-mail from SourceForge regarding the DevShare program whether DC++ should or should not also opt-in for the DevShare program. The DC++ administrators declined this offer as the additional revenue was not needed for any basic operation and it felt it might violate the integrety of the installers. This was just as the DevShare program had been announced. No further action for this has been taken and no additional requests from SourceForge have been made.

The second part of SourceForge’s changes are that of modifications to old projects or completely taking over the projects (or even creating them in the first place). This can be seen with e.g. GIMP. As long as DC++ does not become stale or otherwise non-active this will never affect DC++.

All of this have caused us (the developers of DC++) to review our stance with SourceForge. Some facts before I continue:

  • SourceForge have hosted DC++ (and other DC related software) since its inception (i.e. for several years) without any problems in this area.
  • SourceForge provides stable code repositories and website resources. Although the speed of SourceForge network may be questionable, it is able to withstand hard DDoS:ing.
  • DC++ hosts the source code repository, file downloads and website resources on SourceForge.
  • There are other DC related projects that are also hosted on SourceForge.
  • DC++ is considered a “valued projects” in that it has appeared on SourceForge’s project of the month as well as the DevShare offer. DC++ is also among the high-download projects at SourceForge.
  • DC++ will not be directly affected by DevShare as we have not accepted such an offer. (I must stress it is an opt-in offer.)
  • DC++ will not be directly affected by the abondoned projects changes as DC++ continue to be updated and will not qualify for such a change.
  • At least one browser plugin, uBlock, have started to block SourceForge as a whole, thereby potentially restricting users from accessing DC(++) resources.

So, in light of all of this, we have begun to look into other project repositories:

  • Launchpad – Already hosts other features for DC++, such as the bug tracker, but does not provide a sufficient code repository (Bazaar is near-dead), somewhat cumbersome download capabilities and no true website support.
  • Github – No real website support. This is more suited for just the code repository than a full-on project repository. We are more likely to host the source code on Github and proxy that through another service.
  • Bitbucket – Restricts number of contributors, no website support. poy suggests strongly that we do not move to Bitbucket.
  • Google Code – Recently closed registration of new projects. (Lacked anyway certain features.)

There are other project repositories available, although no one of us have experience with most of them.

It is important for us to move forward with this, so here is our plan forward:

  • Move (or at least parts of) source code repositories, websites and download facilities to our own hosting facilities. E.g., Rhodecode is being set up to address this for source code.
  • DC++ will continue to use SourceForge as a minimum as its backup service provider. It is important to note that we have had a relatively pleasant experience with SourceForge – as project administrators.
  • We will continue to monitor any further development in SourceForge management and changes.

We welcome suggestions, both from SourceForge and others, in how we can move forward.

Hardening DC++ Cryptography: TLS, HTTPS, and KEYP

BEAST, CRIME, BREACH, and Lucky 13 together left DC++ with no secure TLS support. Since then, the triple handshake attack, Heartbleed, POODLE for both SSL 3 and TLS, FREAK, and Logjam have multiplied hazards.

Fortunately, in the intervening year and a half, in response:

  • poy introduces direct, encrypted private messages in DC++ 0.830.
  • DC++ 0.840 sees substantial, wide-ranging improvements in KEYP and HTTPS support from Crise, anticipating Google sunsetting SHA1 by several months and detecting man-in-the-middle attempts across both KEYP and HTTPS.
  • OpenSSL 1.0.1g, included in DC++ 0.842, fixes Heartbleed.
  • DC++ 0.850 avoids CRIME and BREACH by disabling TLS compression; avoids RC4 vulnerabilities by removing support for RC4; prevents BEAST by supporting TLS 1.1 and 1.2; mitigates Lucky 13 through preferring AES-GCM ciphersuites; removes support for increasingly factorable 512-bit and 1024-bit DH and RSA ephemeral TLS keys; and with all but one ciphersuite, AES128-SHA, deprecated and included for DC++ pre-0.850 compatibility, uses either DHE or ECDHE ciphersuites to provide perfect forward secrecy, mitigating any future Heartbleed-like vulnerabilities.
  • DC++ 0.851 uses a new OpenSSL 1.0.2 API to constrain allowed elliptic curves to those for which OpenSSL provides constant-time assembly code to avoid timing side-channel attacks.

These KEYP, TLS, and HTTPS improvements have not only fixed known weaknesses, but prevent DC++ 0.850 and 0.851 from ever having been vulnerable to either FREAK or Logjam. As with perfect forward secrecy, these changes increase DC++’s ongoing security against yet-unknown cryptographic developments.

The upcoming version switches URLs in documentation, in menu items, and of the GeoIP downloads from HTTP to HTTPS. While these changes do not and cannot prevent attacks perfectly, it should now provide users with improved and still-improving cryptographic security for the benefit of all DC++ users.