XML parsing of file lists

Many DC clients (and other software) have their own XML parser for parsing XML files and content. This means the parsers can be heavily specialized for performance (in the case of large file lists for instance) compared to just using a “standard parser” (i.e. one that has been used in multiple projects). However, building one’s own parser also means that the parser may be incorrect to a far greater extent, thereby increasing the risk that a malicious party (e.g. the one sending the file list) may try to remotely crash the receiver by sending incorrect files. Beyond the obvious concern for network security, clients may incorrectly allow files to be read or read incorrect data within those files.

I have compiled a list of potential errors that a file list may have, and generated file lists for each of those occurrences. These file lists were then opened in DC++ (0.851) to see what happened. This test should likely be done with all clients that don’t derive their XML parsing from DC++’s (i.e., all DC++ mods will likely follow the pattern below).

I created multiple file lists (downloadable here), based on this file, generated with this C# snippet. Here are the results (Microsoft Office Excel file).

A summary of the results:

  • DC++ will parse invalid data (e.g. omission of data) and sometimes replace the faulty data with “something sensible”, although this is wrong in almost all cases.
  • In most cases where it is an invalid XML document, DC++ will ignore those sections or ignore the file altogether (this is good).
  • DC++ will not crash on invalid data.

Most of the issues found can be solved by performing an XML sanitization check before reading the document, by validating against the XSD. DC++’s XML parser does not have any XSD validation, so this cannot be done at this point anyway, but should such validation be implemented, it will cause a performance hit (small or large, depending on the source file list).
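As a rough illustration of the kind of pre-check meant here, the sketch below uses Python's standard library to reject a file list that is not well-formed or that omits required data, instead of substituting a guess. It is not XSD validation and it is not DC++'s parser. The element and attribute names (FileListing, Directory, File, Name, Size, TTH) follow the usual file list layout; the function name check_file_list and the exact rules, including the 39-character base32 TTH pattern, are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a TTH is written as 39 base32 characters (A-Z, 2-7).
TTH_PATTERN = re.compile(r"^[A-Z2-7]{39}$")
# Plain ASCII digits only, so "-1", "1e3" and "" are rejected.
SIZE_PATTERN = re.compile(r"^[0-9]+$")


def check_file_list(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        # Not well-formed XML: reject the whole document.
        return ["not well-formed: %s" % error]

    problems = []
    if root.tag != "FileListing":
        problems.append("unexpected root element: %s" % root.tag)

    for element in root.iter():
        if element.tag in ("Directory", "File") and not element.get("Name"):
            problems.append("%s without a Name" % element.tag)
        if element.tag == "File":
            size = element.get("Size", "")
            if not SIZE_PATTERN.match(size):
                problems.append("File with invalid Size: %r" % size)
            if not TTH_PATTERN.match(element.get("TTH", "")):
                problems.append("File with invalid TTH")
    return problems
```

A caller would refuse to load the list when the returned list is non-empty. The check reads the whole document once before the real parse, which is where the performance hit mentioned above would come from. It does not address resource-exhaustion inputs such as very large or deeply nested documents.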

While I didn’t test it, parsing of the XML list for version.xml and any hublists will likely have the same issue(s) as mentioned above. At least we won’t crash DC++.

If someone has other software that they can test this with, please feel free to do so and let me know so I can update the Excel sheet. It’s also possible that the resulting files are named incorrectly (e.g. by not requiring a CID in the file name), so just run the snippet code.

(Note: The files in this post may have a file name such as “foo-zip.pdf”, and it is because the file is actually a zip file but this blog software couldn’t handle that, so just change the file-extension to the appropriate one.)

Organization meeting scheduled for 2016-01-10

An upcoming meeting for the Direct Connect Network Foundation is scheduled for 2016-01-10, at 19.00 CET. In the meeting, we will go over items from the past year and what we shall do in the coming year. The agenda will be according to the By-laws (https://www.dcbase.org/bylaws/) § 10.

You can see the previous meeting(s) here: https://www.dcbase.org/meetings/ so you can get a feel for the structure etc of the meeting. Feel free to suggest ways to improve the meeting process.

If you have additional items you wish the meeting to address (that people should think about beforehand), please post in this forum thread.

DCNF Resources

A new resources page at the dcbase.org site has now been created, where you can see all articles etc that relate to DC. Also, the page contains court cases where DC is involved in some way.

Addressing DC++’s service provider, SourceForge

There has been a lot of discussion regarding changes to SourceForge’s hosting practices [1][2][3]. There are two things that SourceForge has done: created an opt-in “revenue program” and begun taking over old or non-updating (or even non-existent) projects.

The opt-in program is DevShare, which allows developers (project administrators) to receive revenue based on modified installers. FileZilla is one of the major projects that have done so. The modified installers embed additional programs, thereby acting as ad services. The developers can choose which types of ads/programs are suggested, although they cannot say exactly which may or may not show up. The developers do nothing extra to accommodate this feature. The difference, as noted by Ghacks.net, is that SourceForge will change the appearance of the download page to highlight the ad-specific installer whilst still having a link to the other one (albeit not as easy to see).

The DC++ administrators were sent an e-mail from SourceForge asking whether DC++ should also opt in to the DevShare program. The DC++ administrators declined this offer as the additional revenue was not needed for any basic operation and they felt it might violate the integrity of the installers. This was just as the DevShare program had been announced. No further action on this has been taken and no additional requests from SourceForge have been made.

The second part of SourceForge’s changes is the modification of old projects or the complete takeover of projects (or even creating them in the first place). This can be seen with e.g. GIMP. As long as DC++ does not become stale or otherwise inactive, this will never affect DC++.

All of this has caused us (the developers of DC++) to review our stance with SourceForge. Some facts before I continue:

  • SourceForge has hosted DC++ (and other DC related software) since its inception (i.e. for several years) without any problems in this area.
  • SourceForge provides stable code repositories and website resources. Although the speed of the SourceForge network may be questionable, it is able to withstand heavy DDoS attacks.
  • DC++ hosts the source code repository, file downloads and website resources on SourceForge.
  • There are other DC related projects that are also hosted on SourceForge.
  • DC++ is considered a “valued project” in that it has appeared as SourceForge’s project of the month and has received the DevShare offer. DC++ is also among the high-download projects at SourceForge.
  • DC++ will not be directly affected by DevShare as we have not accepted such an offer. (I must stress it is an opt-in offer.)
  • DC++ will not be directly affected by the abandoned projects changes as DC++ continues to be updated and will not qualify for such a change.
  • At least one browser plugin, uBlock, has started to block SourceForge as a whole, thereby potentially restricting users from accessing DC(++) resources.

So, in light of all of this, we have begun to look into other project repositories:

  • Launchpad – Already hosts other features for DC++, such as the bug tracker, but does not provide a sufficient code repository (Bazaar is near-dead), has somewhat cumbersome download capabilities and offers no true website support.
  • Github – No real website support. This is more suited for just the code repository than a full-on project repository. We are more likely to host the source code on Github and proxy that through another service.
  • Bitbucket – Restricts number of contributors, no website support. poy suggests strongly that we do not move to Bitbucket.
  • Google Code – Recently closed registration of new projects. (It lacked certain features anyway.)

There are other project repositories available, although none of us has experience with most of them.

It is important for us to move forward with this, so here is our plan forward:

  • Move source code repositories, websites and download facilities (or at least parts of them) to our own hosting facilities. E.g., Rhodecode is being set up to address this for source code.
  • DC++ will continue to use SourceForge as a minimum as its backup service provider. It is important to note that we have had a relatively pleasant experience with SourceForge – as project administrators.
  • We will continue to monitor any further development in SourceForge management and changes.

We welcome suggestions, both from SourceForge and others, on how we can move forward.

Donations for DCNF (April 2015)

A big thank you to the following people who donated to the PayPal account for DCNF (and DC++). Your money will be spent on server and domain upkeep. We will be looking for a way for donors to receive something back.

Valentin B.
R P H.
Åke S.
Patrick H.
Alan D.

The organization now has 293,04 Euros raised from member fees and the donors above.

Direct Connect Network Foundation

In January 2015, a non-profit organization was set up, called Direct Connect Network Foundation (DCNF). The organization aims to provide information and resources for developers and users of Direct Connect. The website dcbase.org was chosen to be the main site for the organization.

DCNF is an actual registered organization in Sweden, with government number 802492-9716. See also the by-laws, and the annual meeting notes.

To become a member, simply donate to the PayPal account and make a note in the forum.

I or the others on the board will periodically make a note here about anyone who donates to the organization.

Team organization structure proposal

Direct Connect is very loosely organized and always has been. There are a few people that control resources, be it websites, software or hubs. However, the idea of a strict hierarchy is appalling to many, and that is why there is no designated ‘boss’ person. Indeed, simply appointing someone is not possible because of the structure of developers and projects. There is no one that decides what to do or what must be done. Everyone is a volunteer and pitches in when they can and want to. That is why I think that we should not focus on a person that ‘delegates’ tasks to people; rather, we should focus on what we want to accomplish and what we want to do in the community.

That is why I propose a new type of organization breakdown, focusing on work area and interest area: ‘teams’ or ‘workgroups’. The intent with the teams is to provide a clear message to others about what we’re working on and how they can help that group. For example, a person that is interested in security can provide information in the ‘security team’ whilst not feeling a requirement to participate in the interoperability of software. The point of the team is not to say “these people have to produce content for this team”, but rather “these people know some stuff that pertains to this subject”, and it serves as an encouragement for them to provide content (documentation or software etc). The purpose of all teams must be to better the DC community in some way. My hope is that participation in a team will spur people into working on that team’s issues and future (i.e. a commitment to oneself). Of course, anyone can submit information and help a team. A clear statement of what the community or team needs can allow new people to help with the community.

My proposal is also to have a team leader that tries to encourage other participants in the team to provide data. Do note that I do not mean “hey, do that” but rather “hey, I think you have knowledge about a subject, could you check out if it’s something you’d be interested in and write anything on the subject”.

By the way, teams aren’t meant to be static: they can change and new ones can be introduced and old ones can be removed.

In addition to the team leader, I propose that teams have a cyclic report on what has been done in the past X. For example, if there has been lots of stuff going on with security (be it documentation or even discussion) in the past week, it should be highlighted in the blog and forum, so we can have summaries of decisions or interesting avenues.

Teams I have thought of:

  • Security – Overarching team addressing security
  • Cryptography – A subteam for security that focuses on math and cryptography itself
  • Software – Overarching team addressing software in terms of clients, hubs and other items
  • Interoperability – A subteam to software that addresses interoperability issues that arise between software
  • Protocol – Addresses the needs of the protocols both in terms of software support but also in terms of documentation and standardization
  • Infrastructure – Addresses needs in the infrastructure of projects

I have only created an initial description of security and cryptography (below) but I will add more later on. Also, I’ve created forums for each of those items at dcbase.org.

Team: Security

Purpose: Investigate and publish items relating to security.

The team will gather information and inform the community on any security related issue that arises directly within DC, or on external data that may be of interest to the DC community.

The following are items addressed:

  • Auditing of attack vectors:
    • Protocol messages
    • Content passed over implementation boundaries
      • File list
      • Hublist
      • Other messages that others must react to (e.g., chat messages relating to magnet URI parsing)
    • External sources of information (URIs)
    • Certificate management
  • Auditing of protocols:
    • Message structure and content
    • Providing information for protocol parsers (wireshark, profilers, sniffers, packet-shapers)
  • Auditing of implementations:
    • Spam/Flood protection mechanisms (and other DoS-related content)
    • XML parsers (file list, hublist, zip-bombs)
    • Protocol message parsers
    • Hammering of hub after kick (reconnect timer etc)
    • External library implementations (SSL, BZIP, XML etc)
    • Source references (RF field etc)
  • Other points for audit:
    • Hublist retrieval (distributed etc)
    • version.xml retrieval (distributed etc)
    • Audit of DC architecture
    • Nature of data broadcasts
    • Hub controls content
    • Hub is fully trusted
    • Distribution of hub/dns etc
  • Post and review of CVEs
  • Security companies contact
  • Audit and review external security related reports that relate to DC

Team: Cryptography

Purpose: Investigate and publish items relating to cryptography.

The team will gather information and inform the community on any cryptographic issue that arises directly within DC, or on external data that may be of interest to the DC community.

The team will provide a description of the cryptographic solutions currently employed. The team will also provide information about cryptographically related content such as hashing, as the latter may benefit from analysis of the former.

Functionality that will be included as protocol extensions will be discussed with the appropriate protocol team.

The team will inform the community as clearly as possible, providing necessary information for cryptologists as well as for the lay person. A report of the current discussions and results will be published on a regular basis (monthly if there has been any new content).

The crypto-team is related to the security team. However, the former’s job is to focus on cryptography while the latter is focused on general security related content. As such, the security team will only address cryptography based on the crypto-team’s reports.

Software that influences or validates the cryptography functionality will also be provided.

The following are items addressed:

  • Certificate management for client – client and client – hub connections
  • Hash management
    • Tiger Hash
    • Potential successors to Tiger (Tree) Hash: SHA-2, SHA-3
    • Protocol support with ADC and NMDC
    • Implementation support
    • Sharing hash databases

Old DC++ forums restored

DC++ used to have a forum where people would receive help, give suggestions on improvements and discuss protocol features. This forum migrated from SourceForge to the domain dcpp.net (now defunct, don’t use it). The entire site was then attacked and the forum was put offline. This was in 2007, and no forum has yet replaced the old DC++ forum as a whole.

The DCBase.org project is put in place to harmonize different content for Direct Connect. As such, the project hosts the DCBase forum (previously ADCPortal) where today’s discussions for (primarily but not exclusively) ADC development take place. However, it is also important to look at the past, at what has been done and the discussions that were held then. As such, the old DC++ forum is now restored. This forum is set up similarly to the old forum, and the database has been migrated accordingly. The entire forum is locked down (until someone wants DC++ to regain it as a forum) so you can’t post anything.

I will probably create posts in the future where the old forum is referenced (in particular NMDC and ADC development and protocol discussions).

If anyone else has a forum, wiki or site that is now defunct, let me know. It is important that the content that we once produced isn’t completely lost.

ADC 1.0.2 released

A new version of the base ADC protocol is now released, version 1.0.2.

The document may look slightly different, especially with the addition of commands in the table of contents. The document itself (its content) is not modified that much (except for state management, see below).

An important part of the document is a new addition, a terminology section where difficult words or phrases are specified. This list is obviously meant to be much more than a mere four items but it’s at least a start.

The STA previously didn’t specify who had the responsibility for action when a STA is sent with the severity Fatal (2). This has always been the originator of the message, and this is now explicit.
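To make the severity handling concrete, here is a minimal sketch of reading the status code of a STA. It assumes the three-digit code format in which the first digit is the severity (0 for success, 1 for recoverable, 2 for fatal) and the remaining two digits are the error code. The names StatusCode and parse_sta_code are hypothetical, and only the code parameter is handled, not a full ADC message.

```python
from typing import NamedTuple

# Severity digit of a STA code, as assumed in the lead-in above.
SEVERITY_NAMES = {0: "success", 1: "recoverable", 2: "fatal"}


class StatusCode(NamedTuple):
    severity: int  # first digit of the code
    error: int     # last two digits of the code

    @property
    def is_fatal(self):
        return self.severity == 2


def parse_sta_code(code):
    """Split a three-digit STA code such as "240" into severity and error."""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError("STA code must be exactly three digits: %r" % code)
    severity = int(code[0])
    if severity not in SEVERITY_NAMES:
        raise ValueError("unknown STA severity: %d" % severity)
    return StatusCode(severity, int(code[1:]))
```

With this, parse_sta_code("240").is_fatal is true. Under the rule above, the party that sends a STA with a fatal severity is the one responsible for acting on it.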

The state management is re-worded and restructured. All information about state has now been moved to its own section, giving an implementer a quick and comprehensive overview of the requirements for the state management. Previously, the state management was sprinkled all across the document, making it difficult for a person to properly implement a state machine in their software. This has meant that state management information is now removed from each command (the only thing remaining is an explicit note about the state in which each command is used). Certain information is also clarified, such as what to call the parties in a client to client connection (“client party” and “server party”) and state transitions.

Version 1.0.1 of ADC was also ambiguous in state management when it came to one important part: who shall send the first INF in a client to client connection. This is important because the ambiguity makes multi-share difficult. The current specification is now not ambiguous, and takes the following stance: the first party to send the INF is the connecting party (“client party”). No known implementation suffers from this explicit note, as all manage this scenario just fine. Basically, this change means that multiple shares (per hub) may not be too far off.
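The INF ordering rule can be expressed as a small check over a log of messages seen on a client to client connection. This is only a sketch: the function names are hypothetical, a message is reduced to a (party, command) pair, and nothing beyond the rule "the client party sends the first INF" is modelled.

```python
CLIENT_PARTY = "client party"  # the connecting side
SERVER_PARTY = "server party"  # the side that accepted the connection


def first_inf_sender(messages):
    """Return the party that sent the first INF, or None if no INF was sent."""
    for party, command in messages:
        if command == "INF":
            return party
    return None


def inf_order_is_valid(messages):
    """True if no INF has been sent yet or the first INF came from the client party."""
    sender = first_inf_sender(messages)
    return sender is None or sender == CLIENT_PARTY


# Illustrative log; the SUP lines are an assumption added for context only.
handshake = [
    (CLIENT_PARTY, "SUP"),
    (SERVER_PARTY, "SUP"),
    (CLIENT_PARTY, "INF"),
    (SERVER_PARTY, "INF"),
]
```

Here inf_order_is_valid(handshake) holds, while a log in which the server party sends its INF first would be rejected.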

The new version also brings in a new time where we can safely and appropriately update the base document. There was an announcement period when the document was going to be released which meant that developers have had time to adjust their software and give feedback in a timely manner.

DCDev archives published

I previously requested the DCDev archives, a repository of posts from DC developers. I was able to acquire the repository and it is now posted on DCBase.

There’s a lot of stuff in the posts, especially the initial parts with ADC. Enjoy.

Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”