Request for video about attacks spawned from DC

In a previous post, the security of DC was discussed, as well as an attack on hublist.org, then the largest hublist for DC. This attack, along with a general view of attacks on the web, was covered in a Finnish television program (its title translates to English as “MOT: Invisible plane hijacking”). The program was broadcast a few years ago (see the link) but wasn’t uploaded to their site.

My intention was to get hold of this video and upload it to the YouTube channel so everyone can view it. I have acquired a copy of it from a video recording, but I cannot simply (re)broadcast it, for obvious legal reasons.

This is a request for anyone who is able to come in contact with YLE.fi (the broadcasting company) and persuade them to either upload the video or allow us to upload it.

I have previously made a request to YLE but was denied, because the video contains a trailer for the movie Die Hard. YLE claims it does not have the necessary resources to edit the film. (I was denied the ability to do the edits myself.)

YLE’s sales department is in charge of the video, and I can provide details of the person(s) I’ve been in contact with if you think you have a better chance.

The road ahead: Post summary

This series, which I called “the road ahead”, is simply my look into how we can improve and continue to develop the DC community and system. The posts are readable on their own but can of course be read in conjunction with each other. The suggestions are not ranked in any priority. Instead, everyone should look at the posts and think about what small part they can play in accomplishing some of the suggestions (or add some of their own).

The posts:

  • Security and Integrity – About security and integrity in DC and how they affect us
  • Protocols – What the protocols should strive for and the tools the protocol community should provide
  • Competition – The different challenges we face in Direct Connect in the battle for users
  • Software – What improvements we can make to software across the board
  • Widening the base – How the different information outlets we have can increase DC knowledge in the world
  • Infrastructure – How the infrastructure in DC can improve

The road ahead: Infrastructure

Direct Connect rests upon three parts: clients, hubs and hublists. If no clients are available, the community stagnates and the appeal for new users diminishes. Hubs must be available, or else there is no way for clients to connect to each other; the hubs provide the very community we have. The hublists give clients a sense of direction, pointing to hubs where other users can be found.

These three parts are, in my mind, equally important and it is imperative that we have them in our infrastructure.

Direct Connect has had problems when hublists go down or become outdated, so our infrastructure should be able to manage these types of problems.

The Direct Connect community is concentrated around providing a straightforward file-sharing service while at the same time being a place for talk and discussion. These two parts are why DC is so great: you can have a discussion while sharing content that you and your peers like.

The infrastructure of DC should allow users to interact even when they are not using a normal client. A website can serve as a client, with, say, one tab for chat and one for downloading.

The infrastructure should give users the ability to browse any type of software or discussion topic around DC, and this is the intent of DCBase.org. The idea is to create a central source for DC content. If we gather information about each client in one place, users don’t need to visit different sites with different appearances; they can go to a single site and, if they choose, continue on to the main page of the software or item of interest. The central source needn’t own each client or host its source code; it simply needs to be able to refer to them.

Basing content around one place will also help keep information unambiguous and free of redundancy. This infrastructure idea should leave developers in control of their own systems while the face of DC stays unified.

An interesting aspect may be to create, say, a real non-profit organization. The organization could be recognized by a government, increasing the perceived validity of DC and potentially drawing attention from new users. The organization could serve as an ‘umbrella’ for donations, for example by distributing them to developers and their infrastructure. It could also open up the possibility of receiving government funds to stabilize the DC infrastructure.

The infrastructure can provide websites as well as build and download repositories. It can allow specialized hubs dedicated to support and development.

The road ahead

The future is to try and merge the many sources of information that needn’t be separated. For instance, the NMDC and ADC projects could simply be one general “DC protocol project”, with minor branching. Whenever we can join resources, there is less time spent managing a multitude of sources and more time doing what we want — the further development of DC. If we can merge different functionality of software, or if we can provide a clear interface for those interested in DC content, all the better.

The road ahead: Widening the base

We have some statistics on DC usage, and while interest has diminished somewhat over the years, there are still a lot of people who use DC on a daily basis. Many believe that we should encourage users to spread the word about DC in various places, so that people who aren’t (regular) DC users become more interested.

A couple of these initiatives have already taken place with Twitter, YouTube and Reddit. No one has created a Facebook account for DC content, and I frankly don’t think we need one at this time.

The intent of the Twitter account is to get out information that isn’t worthy of a full blog post or article. The fact that one is restricted to 140 characters only makes it more interesting.

The YouTube channel is there to give new users a simple yet effective way of getting to know a piece of software. The idea is that we can upload instructional videos for any DC software, and the videos don’t even have to be in English. If you want to provide translations or videos in another language, just let me know.

The DirectConnect subreddit serves as a point of interest for those who are already on Reddit for other reasons. Reddit provides a way of publishing information that needn’t be restricted to the confines of a forum or the like. The subreddit can prove to be a great way of gathering new users.

The road ahead

The idea is that more and more people should be made aware of Direct Connect and what it can offer. Users should be able to go to their own source of media and pick up DC information there if they choose to. Any time someone provides DC information in a new way or in a new place, the base of DC widens.

The road ahead: Software

There is a wide range of applications for Direct Connect: a bunch of clients, each with its own special niche or cool feature. The same applies to hubs.

Many applications promote the use of open source, allowing multiple developers to add their thoughts and ideas to the product. Don’t like where the application is heading? Fork it and create your own. As the DC++ documentation puts it (paraphrasing): “eventually your modification may have a higher user count”.

The software that is produced is most often not code-reviewed or tested very much. There are only so many people who can write code or documentation, review code or test the application. Since people do most Direct Connect work for free in their spare time, there is only so much time available for continued development.

Most companies are structured around specific people writing code, specific people testing, specific people writing documentation, and so on. This isn’t directly possible here, as volunteers can’t be forced to do anything. On the other hand, it means that people do what they like, and they don’t mind putting X hours into a feature if it was fun or interesting to build.

Goals

While building the applications is a feat in itself, it’s unwise not to envision the future. You don’t have to have a “business” sense or the like to explain what you want for your product.

Each application should have a definite goal and specified stages at which certain features are implemented. A goal can simply be “let us fix bugs X, Y and Z by the next version”; it doesn’t have to be “we must have X amount of users by the end of the year.” The goals should also be easy enough for someone else to pick up: if you as a developer don’t have time to complete a feature, then someone else might.

Each application should strive to give its users a certain type of freedom. Lately, this freedom has come in the form of plugins for clients. While many clients have offered this ability in the past, adding it to DC++ will probably increase the diversity of plugins. The current DC++ plugin interface is C, so a goal should be to provide an implementation for C# (through C++/CLI and/or Mono) or Java users. Python and other languages could probably be incorporated as well, through middleware plugins. A clear goal would be to have wizards for Visual Studio, Eclipse or NetBeans, so developers spend little time setting up their environment. Base classes could also be added to help plugin developers with common operations.
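
To illustrate the base class idea, here is a minimal sketch (in Python) of what a middleware layer could offer plugin authors. Everything here is hypothetical: none of the names are taken from the actual DC++ plugin API.

    # Hypothetical convenience base class for a middleware plugin layer.
    # None of these names come from the real DC++ plugin API.
    class PluginBase:
        """No-op defaults so plugin authors only override what they need."""

        def on_load(self, host_version: str) -> None:
            pass

        def on_chat_message(self, hub: str, user: str, text: str) -> bool:
            # Return True to consume the message, False to let the
            # client handle it normally.
            return False

        def on_unload(self) -> None:
            pass

    class Greeter(PluginBase):
        """Example plugin: overrides only the hook it cares about."""

        def on_chat_message(self, hub: str, user: str, text: str) -> bool:
            if text.strip() == "!hello":
                print(f"[{hub}] greeting {user}")
                return True
            return False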

If the software is provided with its source code, a clear goal should be to have clear and concise instructions for building your own version of it. Project files and scripts for automatically downloading and building the software can greatly decrease the turnaround time of development. In fact, you increase your own productivity if you don’t have to do fifty things each time you need to compile or pull down the source code.

Diversity

While it is great that we have a lot of applications for the Windows platform, applications for Linux and Mac-based systems are still missing. Compatibility layers like Wine help on Linux, but only go so far. Having applications and developers for each platform is important.

An important part of today’s society is providing applications for the mobile platforms: phones and tablets. An interesting step would be to create native BlackBerry, Windows Phone, iOS and Android interfaces, allowing users to chat and share files through their mobile devices. The cost of network traffic could of course limit the number of users, but anyone with a flat-rate plan would have no problem.

Mobile devices aside, another avenue may be adding support for Facebook, Spotify or other media directly in clients. Conversely, a DC plugin for Facebook could open up a whole new world of users. Just imagine downloading and sharing photos from your hub straight onto Facebook, even while away from home.

The road ahead

As more and more features get supported in each application, it is important to take a break now and then and make sure that each feature is properly implemented. Every new user is a potential source of questions and requests. The important part is not to bury our heads in the sand, but to provide ample support for users while continuing to develop the product. Any time a product can be extended by its users (either through someone else’s plugin or their own), those users become much more interested in continuing to use the product.

Application signing can be a great way to provide a receipt that the software is genuine and hasn’t been tampered with. While this may cost money, it will increase users’ confidence in the authenticity of the software.

Videos, articles and other media that help users (new and long-time alike) will always be considered useful.

Going through the list of feature requests on a regular basis may provide good insight into what users want.

The road ahead: Competition

Nearly always, when you own a component, resource or item, there is someone else who wants to compete with you and beat you with a better product.

In Direct Connect, we can clearly see this competition in client developers trying to win more users than other client developers. Hub developers vs other hub developers. Hub owners vs other hub owners. Protocol maintainers or proponents vs other protocol maintainers and proponents.

Competition comes from a desire to perform better than one’s counterpart. The end result is that users have more options (in software and elsewhere) to choose from, allowing both niche and general-purpose software and content.

Competition in the present

For example, the initial NMDC client could only connect to one hub at a time, so if you wanted to be in multiple hubs, you had to have multiple instances of the application open. When other clients started popping up, they offered the ability to be connected to multiple hubs at the same time. This created clear competition between the two sets of developers, as one party could say “hey, users, choose us since we do this better and you’ll avoid the hassle”. The end result was that every client was updated with this functionality, so users benefited greatly from the exchange. Today it’s almost unthinkable for a client to only be able to connect to one hub at a time.

Direct competition can even come from those who help your product: if you say “no” to a particular feature, they can create that feature themselves and distribute the changes. This is how most client modifications of DC++ arose: the DC++ developers felt that a feature was too experimental or didn’t suit the “mainstream” user, so the feature wouldn’t get included. Developers who wanted the feature could simply add it to their own version (and, kind as they were, distribute it to others). For example, hashing was an experimental feature that first saw the light of day in the modification BCDC++ but was eventually merged back into DC++.

Competition from the past

An often overlooked problem for software developers is that you do not only compete against the current set of applications or fellow developers; you also compete with yesterday’s products, including your own. DC++ notoriously has this problem: when hashing was introduced and became mandatory, a lot of users simply didn’t upgrade, as they felt the new features were not worth the downsides of hashing. At LANs, for example, where bandwidth is rarely a problem, hashing may (seem to) become meaningless, as you can simply transfer the files so quickly. The problem for the current client is that users don’t want to upgrade to the flashy new version when the old versions are perceived as better. So while you can outsmart your current competitors, you may still be unable to manage your past self.

The competition from the past is also easy to spot when it comes to upgrading: upgrading is (considered) difficult and cumbersome, and it takes time until people have moved from one (possibly insecure) version to another (hopefully better) one. Any form of automatic upgrade management can help here, much like most browsers do today (they upgrade without notification and without any form of user interaction).

An interesting way of handling past implementations is to provide an “in-between” version that incorporates the good from the past and the present. For example, the current DC++ client requires that all files are hashed before they are shared. This is a perceived problem for LAN users (as explained above), but perhaps there are ways around it. What if you hashed, say, the path of each file and called that ‘the file’s temporary hash’? These files and their temporary hashes would then be included in that user’s file list. When a file is properly hashed, the temporary hash goes away. If another client connects and tries to get a file that only has a temporary hash, that file moves up in the priority queue of files waiting to be hashed. Once the file is hashed, the client can send back “hey, this is the new real hash”. That is, let all files be available for browsing and for responding to (text) searches, while not allowing people to actually download a file before it has been hashed. (I understand that this would be an undertaking in the software.) The outcome would be that you no longer compete against yourself.
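
As a rough sketch of the idea (in Python; real clients use Tiger Tree Hashes, but SHA-256 stands in here since it ships with the standard library):

    import hashlib
    from pathlib import Path

    class ShareEntry:
        """A shared file, identified at first only by a temporary hash."""

        def __init__(self, path: Path):
            self.path = path
            # Temporary hash: derived from the path alone, so it is
            # available immediately, before the contents have been read.
            self.hash = hashlib.sha256(str(path).encode()).hexdigest()
            self.is_temporary = True

    def promote(entry: ShareEntry) -> None:
        """Replace the temporary hash with a real content hash."""
        h = hashlib.sha256()
        with open(entry.path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        entry.hash = h.hexdigest()
        entry.is_temporary = False

    def on_download_request(entry: ShareEntry, hash_queue: list) -> bool:
        """Browsable but not downloadable until properly hashed; a
        request bumps the file to the front of the hashing queue."""
        if entry.is_temporary:
            if entry in hash_queue:
                hash_queue.remove(entry)
            hash_queue.insert(0, entry)
            return False  # refuse the download for now
        return True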

Competition between users

In Direct Connect users also compete with other users when it comes to slots and hub bandwidth.

A slot is one potential upload channel: if a user has three slots open, only three other users can download from them. Each user is thus competing with other users for the available slots in the system. Users with fast connections also get their content quickly, which means slots free up more frequently. Ticket systems, and the ability for uploaders to decide whom they grant additional slots, add yet another dimension to the hunt for a slot. Not many other file-sharing systems behave this way, and I believe this is one of the reasons Direct Connect remains prevalent: it promotes fast connections, small files and the ability (for the uploader) to manage their resources (bandwidth), while still allowing for slow connections and large files.
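
A minimal sketch of this slot accounting, purely illustrative and not taken from any particular client:

    class SlotManager:
        """Tracks upload slots; 'granted' users bypass the normal limit."""

        def __init__(self, max_slots: int):
            self.max_slots = max_slots
            self.in_use = set()
            self.granted = set()  # users hand-granted an extra slot

        def try_acquire(self, user: str) -> bool:
            if user in self.granted or len(self.in_use) < self.max_slots:
                self.in_use.add(user)
                return True
            return False  # no free slot; the user must compete again later

        def release(self, user: str) -> None:
            self.in_use.discard(user)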

The hub’s bandwidth is the primary reason that DC scales relatively poorly (compared to e.g. eDonkey): DC rests upon hubs broadcasting information to most or all clients, so the bandwidth of a hub is crucial. If a hub has a 1 Mbit/s upload capability, it is bound to a certain number of users it can manage. Some hubs manage this resource by restricting how often you can perform certain actions. For example, the ability to search is often restricted to once or twice a minute, and sometimes only active-mode users are allowed to search. This means that as an ordinary user, you are competing against other users: if you can search, another user might not be able to. There is relatively little you can do as a user to fix this, beyond perhaps avoiding passive mode and encouraging the hub owner to get a better connection.
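
Such throttling is simple to picture; here is a sketch of a per-user search limiter (the interval is illustrative):

    import time

    class SearchLimiter:
        """Drop searches that arrive faster than one per interval."""

        def __init__(self, interval_seconds: float = 30.0):
            self.interval = interval_seconds
            self.last_search = {}  # user -> time of last allowed search

        def allow(self, user: str) -> bool:
            now = time.monotonic()
            last = self.last_search.get(user)
            if last is not None and now - last < self.interval:
                return False  # too soon; protect the hub's bandwidth
            self.last_search[user] = now
            return True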

Competition from other systems

While the developers of the current system can discuss and argue about internal matters, there is an outside world as well: a variety of other protocols and systems just waiting to (further) push down DC (sometimes unintentionally).

In the past months, we have seen more and more BitTorrent websites use magnet links. These were previously an almost exclusively DC resource; DC clients have owned the magnet link registration. As more and more BitTorrent sites require magnet links, so do the BitTorrent clients, which means that DC clients must now compete against BitTorrent clients for ownership of magnet links. This is a battle I believe we cannot simply win, but I think there are ways we can still come out on top. DC and BitTorrent use different information in their magnet links, and it’s easy to spot the differences. The DC clients that (at least previously) own the link registration should prompt their users about unknown magnet information. If the user can specify that this new magnet information is actually for the BitTorrent client, the DC client can simply redirect those links to the BitTorrent client. That way, DC keeps the resource while those who want to use BitTorrent aren’t left in the dust. Likewise, I believe the same option should exist in the BitTorrent clients: if they discover a DC magnet link, they should try to send it to the installed DC client.
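
Telling the two apart is straightforward: DC magnet links carry a Tiger Tree Hash (urn:tree:tiger:...) in the “xt” field, whereas BitTorrent links carry an info-hash (urn:btih:...). A sketch of such a dispatcher:

    from urllib.parse import urlparse, parse_qs

    def classify_magnet(link: str) -> str:
        """Decide which application a magnet link belongs to."""
        params = parse_qs(urlparse(link).query)
        for xt in params.get("xt", []):
            if xt.startswith("urn:tree:tiger:"):
                return "dc"
            if xt.startswith("urn:btih:"):
                return "bittorrent"
        return "unknown"  # prompt the user, as suggested above

    print(classify_magnet("magnet:?xt=urn:tree:tiger:ABCDEF&xl=42&dn=file.txt"))
    # -> dc
    print(classify_magnet("magnet:?xt=urn:btih:0123abcd&dn=file"))
    # -> bittorrent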

While some systems have no intent of diminishing the DC user count, it may be in the system’s nature to do so: if a user isn’t using DC, they’re using something else.

The road ahead

There are no clear-cut ways of steering clear of competition. The only way to stay ahead is to invent features and come up with ideas before your adversaries do. When it comes to other systems, the key is to provide ways of attracting users while still giving the other system its small part of control.

The road ahead: Protocols

Direct Connect started with the protocol “Neo-Modus Direct Connect” (NMDC), named after the only client and hub available at the time. Over time, the protocol grew as more client and hub developers followed. The protocol was initially sniffed out by various people, as the original system was closed source. Over the years, the number of client and hub developers grew, and discussions commenced about the protocol having become unmaintainable. The protocol was considered bad in various respects, and requests for a new protocol got underway.

The initial discussion was whether the “new” protocol should be binary or text based: a binary protocol is less resource-intensive, as much more care goes into what is being sent, while a text protocol (like NMDC) is easier to read and implement.

The discussion eventually came down to a “my client has the most users, so here’s my protocol and I’ll implement it in my client” call from Jacek Sieka, the DC++ author. The new protocol was called ADC, sometimes referred to as “Advanced Direct Connect”, although this has never been its official name. ADC kept the same fundamental structure of Direct Connect as NMDC, but with the intent of increased usability and extensibility. A few of ADC’s aspects can even be traced back to a protocol suggestion (made around the same time) called “Direct Connect The Next Generation” (DCTNG), as noted on ADC’s website.

ADC eventually grew, and there is now “competition” between NMDC and ADC (although I believe NMDC is still ‘winning’ in user counts).

There are now various resources that people can use to implement and support their own component for the NMDC and ADC protocols, although not enough, in my mind.

The tools

There are, as of writing, very few tools for supporting protocol developers.

The tool Wireshark can provide tremendous support in filtering out what information is actually being sent on the network. Effectively, even without the specification(s), you could create your own implementation by simply looking at the network traffic. Wireshark uses plugins that are protocol-specific dissector implementations, but no plugin has been fully implemented for either NMDC or ADC. The ADC Wireshark plugin attempts to do just this, but it isn’t complete (at the moment you have to compile things on your own, etc). Having a plugin (for either protocol) would be an excellent opportunity for developers to learn the protocol’s raw nature. There are probably other applications similar to Wireshark, but it is probably the most widely known and used tool for providing network information.

There are few ways of actually testing your NMDC or ADC implementation. As a client developer, you need to connect to a “normal” hub and see whether that hub accepts your data and whether other clients can see it. As a hub developer, you need to take “normal” clients, connect them to the hub and see whether you route traffic properly. So while we can point to the specifications as the reference, developers most often need some form of software to actually verify their application. The proper tool for this would be a full reference implementation of a client or hub. This implementation doesn’t have to be fast or able to handle lots of data; it should provide the basic protocol implementation together with a list of things it should send out to test the application (like sending faulty data to verify that the system handles it). Ideally, in my mind, a public hub should be set up to act as a reference implementation — that way you must manage connections that didn’t originate from your own computer or LAN.
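
To give a feel for what such a test client would start with, here is a sketch of the NMDC $Lock-to-$Key transformation as documented by community protocol write-ups (NMDC was never officially specified, so this reflects observed behaviour):

    def lock_to_key(lock: bytes) -> bytes:
        """Compute the $Key reply for the lock from a hub's $Lock line
        (the part after "$Lock " and before " Pk=")."""
        key = bytearray(len(lock))
        key[0] = lock[0] ^ lock[-1] ^ lock[-2] ^ 5
        for i in range(1, len(lock)):
            key[i] = lock[i] ^ lock[i - 1]
        # Swap the nibbles of every byte.
        key = bytearray(((b << 4) & 0xF0) | (b >> 4) for b in key)
        # Escape bytes that would clash with the protocol framing.
        out = bytearray()
        for b in key:
            if b in (0, 5, 36, 96, 124, 126):
                out += b"/%%DCN%03d%%/" % b
            else:
                out.append(b)
        return bytes(out)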

While a reference implementation is the way to go, the next step is a “stress tester”: an application that pushes the software to its limits. For a hub, the stress tester could simulate hundreds or thousands of users and see whether the hub copes with the information. For a client, it could simulate lots of search results, lots of searches and lots of connection attempts. The stress tester could also include faulty data, but the point of the application is to test whether the underlying service can handle a huge amount of data.
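
A toy sketch of the hub side of such a tester: it merely opens and holds many concurrent connections to see whether the hub keeps up. A real tester would also complete the login handshake; host, port and count are placeholders.

    import asyncio

    async def one_connection(host: str, port: int) -> None:
        """Open a connection and hold it for a while."""
        reader, writer = await asyncio.open_connection(host, port)
        try:
            await asyncio.sleep(60)
        finally:
            writer.close()
            await writer.wait_closed()

    async def stress(host: str, port: int, count: int) -> None:
        tasks = [asyncio.create_task(one_connection(host, port))
                 for _ in range(count)]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        failed = sum(1 for r in results if isinstance(r, Exception))
        print(f"{count - failed}/{count} connections survived")

    # asyncio.run(stress("localhost", 411, 1000))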

While we can provide tools for those who decide to implement the protocols themselves, we should also strive to provide reference implementation code. People shouldn’t have to rewrite boring protocol implementations all the time: they should be able to take an already created one and use that. The ADC code snippets project is such an attempt, and my idea is to add further code in the future, such as hash or magnet implementations, in addition to the protocol-specific implementations. The idea is to create a basic foundation for Direct Connect software, similar to how FlowLib is managed. Of course, having general code extends to NMDC as well.

Discussion

The idea is to promote any type of venue where people can interact and discuss further protocol enhancements and issues. The DCBase forum (previously ADCPortal) intends to provide such functionality. There is also the FlexHub forum, which has NMDC and ADC sections as well.

I am not sure about the use of a wiki, as I think much of the content can be written better elsewhere, but I see no problem in having wikis that explain various implementations in more detail, or in the use of “workgroups”.

Regardless of the venue, it is best if we can create a service that gathers protocol issues and content. This blog has served as such an information pool for various topics in the past, but I would not be sad if we had a better place that was easier to manage.

The old DCDev archives provide a good picture of the early discussions around NMDC and ADC, and I’m sure there are a couple of gems in there that should be discussed further or at least lifted out. Not only is that old resource important; the resources we have today matter too: the developer hub can be an important source, and in the future we should perhaps post any protocol discussion to a forum or the like so that others can read it.

Any type of document that describes the DC protocols (or any other incarnation of DC) should be made public.

Documentation

The ADC protocol is (well?) documented, and the main document and its companion (“extensions”) should stand on their own merits. However, I believe it would be good if others suggested how we can make the specification easier to read and less ambiguous. The “recommendations” document is intended to provide information that may not warrant inclusion in the main documents but serves as good-to-know material for people reviewing the protocol. There have also been suggestions for state diagrams of the protocol, as they should provide better insight into the flow of information.

The NMDC protocol was never officially documented. The documentation that exists was scraped together by various people based on the behaviour of the protocol implementations. There are a few resources left today, but I would like everyone to acknowledge the NMDC project, as its intent is to provide the same level of information as the ADC project. While the NMDC specification should describe how implementations should behave, I believe it should also note how implementations have behaved in the past (whether and how they have deviated).

Down the line, perhaps an official RFC is the target for both NMDC and ADC (one for each, obviously).

The road ahead

Documentation of the NMDC and ADC protocols should grow, as should the support we can provide for their implementers. The tools should offer better support both for developing something new and for not having to do the implementation at all.

The road ahead: Security and Integrity

The community we are part of has had its fair share of security threats. The security threats have originated from software bugs, protocol issues, malicious users and even from the developers of the network.

Security and integrity are very broad terms, and my use of them is indeed broad, as I believe they address multiple points and need not necessarily be about simply remotely crashing another user. A system’s security and integrity are tightly coupled and may sometimes overlap.

There is a variety of issues we face.

Issue 1: Software issues
Writing software is hard. Really hard. It’s even more difficult when others can impact your system (client, hub etc) through chat messages, file-sharing exchanges and the like. Direct Connect hinges upon the ability to exchange information with others, so we cannot simply shut that ability down.

A software issue or bug arises differently depending on what type of issue we’re talking about.

The most typical bug is that someone simply miswrote code: “oops, it was supposed to be a 1 instead of a 0 here”.

The more difficult bugs to catch — and consequently fix — are design issues, caused by a fundamental use of a component or by the application’s infrastructure: “oops, we were using an algorithm or library that has fundamental issues”.

A security issue may stem from an actual feature — for instance the ability to double-click magnet links. The bug is then that the software is not resilient enough against a potential attack: there is nothing wrong with the code itself, it simply isn’t built to withstand a malicious user. (Note: this is not a dig at magnet links; they are simply an example.)

A software bug may not only let malicious users or (other) software exploit the system; it may also cause the integrity of content to crumble. For instance, before hashing, matching different files to each other was done via reported name and file size. This was fundamentally flawed, as there was no way of identifying the two files as identical beyond name and size, both of which can easily be faked.

A software issue may be addressed by simply blocking functionality (e.g., redirects to certain addresses, stopping parsing after X characters, etc). While this is the simplest course of action, removing functionality is often not what users want.

Issue 2: Protocol issues or deficiencies
Systems and protocols that allow users to perform certain actions carry a set of potential security issues with them. The problem with writing a protocol is that other people need to follow it: the developers of a piece of software may not be the same as the developers of the protocol. In Direct Connect, there is a very close relationship between the two groups (it’s actually closer to one group at the time of writing), so this issue may not be that severe. However, there will always be a discrepancy between the maintainers of the protocol and of the software. Imagine the scenario where the developers of a piece of software suddenly disappear (or otherwise stop updating it): the protocol developers cannot do anything to actually address issues. In the reverse situation, the software developers can simply decide for themselves (effectively forming their own ‘protocol group’) that things need to be updated, and do so.

Any protocol issue is hard to fix, as you depend on multiple implementations to manage the issue correctly. The protocol should also, as best it can, provide backwards compatibility between its various versions and extensions. Any security issue that comes in between can greatly affect the situation.

A protocol issue may also simply be that there is not enough information about what has happened. For example, the previous DDoS attacks could continue because the protocol had no way for clients, hubs (and web servers etc) to inform each other about what was going on.

The original NMDC had no hashes and thus no integrity verification for files. This was a fundamental issue with the protocol, and extensions were provided later on to manage the (then) new file hashing. This wasn’t so much a bug in the protocol; it was simply a feature NMDC’s founder hadn’t thought of.

When software is told to interact in a certain way according to the protocol, those actions are in effect the protocol’s doing. For example, the (potential) use of regular expressions in searches is not a problem for the protocol itself: the specification of regular expressions in ADC is quite sparse and very simple. The problem with regular expressions is that they are expensive to evaluate, and any client that implements the functionality effectively opens itself up to a world of hurt if people are malicious enough. While the functionality lies in the software’s management of the feature, it is the protocol that mandates its use. (Note: in ADC, regular expressions are an extension, and any extension is up to the developers to implement if they so choose. That is, there is no requirement that clients implement regular expressions. However, those that do implement them are bound by the protocol once they announce support.)

Issue 3: Infrastructure
The infrastructure of the system must withstand security threats and issues.

If a hosting service goes down for a particular piece of software, that software cannot publish updates in response to upcoming issues. Official development simply stops at that point on that service (and the developers need to find another route).

If a hosting service decides to remove old versions (say, because it prunes software after two years, or for legal reasons), someone needs to keep backups of the information.

A large part of the DC infrastructure is the ability to connect to the available hublists. This issue was apparent a few years ago, when the major hublists were offline and various software wasn’t updated. People simply couldn’t connect to hubs, which for beginners is even more off-putting. There are now various mitigation approaches for these scenarios, such as local caching, proxy/cloud caching and even protocol suggestions for handling such scenarios and distribution avenues.
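
Local caching, for example, is simple in principle: prefer a fresh hublist, but fall back to the last good copy when the server is unreachable. A sketch, with URL and cache path as placeholders:

    import urllib.request
    from pathlib import Path

    def fetch_hublist(url: str, cache: Path) -> bytes:
        """Download the hublist, falling back to a cached copy."""
        try:
            data = urllib.request.urlopen(url, timeout=10).read()
            cache.write_bytes(data)        # refresh the cache on success
            return data
        except OSError:
            if cache.exists():
                return cache.read_bytes()  # server down; use the stale copy
            raise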

Infrastructure isn’t simply being able to download software and connect to a hublist; it is also the ability to report bugs, request features and get support for your existing software and resources.

A very difficult problem with infrastructure is that it is often very costly (in money) for the developers to set up. Not only that, it must be done properly, which is also costly (in time) and hard. Moreover, most people aren’t experts at setting up resources of this kind, and there is plenty of information available online on avenues of attack against forums and websites.

Infrastructure issues can be mitigated by moving some services out in a distributed manner (with a set of people maintaining the resources) and by pushing some services out to the users themselves (for example, allowing clients to automatically exchange hublists). Obviously, the services must be there from the start; otherwise there is little one can do.

Issue 4: People

Software, infrastructure and our ideas only get us so far. If a person has the means and the intent, they can cause various problems for the rest of the community. Most of the time, we envision a person trying to cause havoc using a bug in the system (or equivalent), but that is not the only concern we have when it comes to people and their interactions.

While a person with the know-how and the tools can cause tremendous problems, the people who can cause the most harm are those who control key resources within the system. For example, a hub operator may cause problems in a hub by kicking and banning people, but the hub owner can do much more than that, since they control the very resource that people are using.

That means the developers and owners of each resource must guard themselves against the others they share that control with. This is primarily a problem when the two (or more) people who share a resource disagree on an issue and one party decides to shut the resource down. The last instance of this was ADCPortal last year, and similar problems have occurred in the past.

The trouble is that we all need to put trust in others; if we don’t, we can’t share anything, and the community crumbles. A problem with resource ownership and control is a general problem of responsibility: if I own a resource (or have enough control over it), I am expected to keep developing and nurturing it. If I do nothing in response to security issues (or any other issue), that resource eventually needs to be switched out.

The solution is to share resources in a way that allows people to contribute as much as possible. The community should encourage those who are open about content, and try to move away from a “one person controls everything” system. This is extra difficult and puts pressure on all of us.

The road ahead

Security cannot be obtained by ignoring the problems we face. The community gains very little by obfuscating the ‘when’ and ‘how’ of security: not being open about the security issues we face only slows a malicious party down so much.

Disclosure of security issues is an interesting aspect, and the developers owe it to the community to be as direct as possible. It does not help to wait one day, one week or one year to inform people; anyone vigilant enough will discover problems regardless of when and how we announce them. An announcement (or a note in a changelog or elsewhere) shouldn’t cause people to treat the messenger badly. Instead, the key is an open dialogue between developers, hub owners, users and anyone else involved in the community. The higher the severity of the security issue, the more reason to treat any potential forthcoming issue directly and swiftly. I believe it would also be good if someone reviewed past security issues and put them together in an article or document, essentially allowing current and future developers to see the problems that have been encountered and, hopefully, how they were solved (this has been done to a certain extent). Discussing security issues with security experts from various companies may also be a way forward.

The community must be active on security and integrity issues. A common phrase in development is to be “liberal in what you accept and conservative in what you send”. This applies to both software and protocol development.

Software should have clear boundaries where input from another user or client can have an impact.

Protocols should keep pace with the hashing methods, algorithms and general security measures they use. The new SHA-3 standard is interesting in this respect, and it would be good if we switched to something that provides higher security or integrity for us. Direct Connect has gone from a clear-text system to a secure-connection system (via TLS and keyprints). The system could be further extended with the use of Tor or other anonymity services, to provide the anonymity that other systems have.
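
As I understand the keyprint mechanism, the pinned value is a base32-encoded SHA-256 digest of the hub’s certificate, carried in the hub address (e.g. ...?kp=SHA256/<digest>). A verification sketch:

    import base64
    import hashlib
    import ssl

    def keyprint_matches(der_cert: bytes, expected_b32: str) -> bool:
        """Check a certificate against a pinned SHA-256 keyprint."""
        digest = hashlib.sha256(der_cert).digest()
        actual = base64.b32encode(digest).decode().rstrip("=")
        return actual == expected_b32.rstrip("=")

    # Hub address and keyprint below are placeholders:
    # pem = ssl.get_server_certificate(("hub.example.com", 411))
    # der = ssl.PEM_cert_to_DER_cert(pem)
    # assert keyprint_matches(der, pinned_keyprint)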

The security of our system shouldn’t depend on “security by obscurity”: before DC++ added an IP column to its user list, people (incorrectly) believed that their IP was “secret”. Nor should it depend on obfuscating security issues, since they’ll only hit us even harder in the future. There are other cases where the ordinary user doesn’t know enough about security, for example when people disclosed how a hub owner could sniff out all the data from their hub and their users’ interactions. While I strongly believe it’s difficult to educate your users (on any topic, really), you shouldn’t lie to them. Instead, provide ample evidence and reassurance that the information is treated with care and that you, as developers and fellow users, consider security an important point.

Security is tricky because it may sometimes seem like there’s a security issue when in fact there isn’t. This makes it important for us to investigate issues and not rush to a solution. It is also important that people don’t panic and run around yelling “security problem!” as if there’s no tomorrow (I’ve been the source of such a scare, I’ll admit). Equally important, those who know more about security should be the ones deciding on protocol and software aspects, as the topic shouldn’t be subject to whimsical changes “because it makes no sense, right?” (I’ll once again, unfortunately, admit to being the cause of such an issue, regarding ADC — but hopefully it will be rectified soon-ish).

The road ahead is to investigate security issues in a timely but proper manner, to be proactive and to be up front about problems. Time should be spent investigating a component’s weaknesses, and the component should be discarded if the hurdles are too difficult to overcome.

Request for DCDev archives

There used to be a mailing list for the developers of Direct Connect, where people discussed protocol and feature implementations. The mailing list resided at http://3jane.ashpool.org/pipermail/dcdev/ but is no longer accessible. I have been able to traverse the Internet Archive Wayback Machine’s storage, although I haven’t been able to get everything.

I have been able to acquire the pages marked 42-43, 92-99 and 101-113.

Does anyone have the other pages or old mail logs? I think gathering these files could prove very useful, as they describe why things are the way they are: for instance, what spawned ADC and the discussions immediately after.


Addendum: I have acquired logs and will shortly be publishing them.

Old interviews with Jon Hess, the creator of Direct Connect

The creator of Direct Connect, Jon Hess, gave at least two interviews during the years he was active. They are shown below, together with their original links. I’m reproducing the posts here in the (unlikely) event that both sites disappear…


Sharing the Data
by Annalee Newitz

NINETEEN-YEAR-OLD Jon Hess, inventor of the sensational, underground file-sharing program Direct Connect (www.neo-modus.com), is an old-school geek in a cyberpunk world. Unlike many of his peers in UC-Berkeley’s computer science program, Hess doesn’t wear his geekhood like a badge of pride. For him, working with computers isn’t about hacking. It isn’t about being a guru or wearing Matrix-style sunglasses. It’s just something he does for fun–and to make a little pocket change.

Hess talks about writing computer programs in the same way old-time mainframe tweakers talk about their punch-card days back in the late 1960s and ’70s. Those guys weren’t in it for the fame or the IPOs. They were just glad to be allowed to code for a living. When Hess first got into coding as a high school student in the tiny Northern California town of Redding, he had never heard of the Slashdot community or the see-and-be-seen geek event DefCon. “I wasn’t a geek really,” he confessed to me over the phone. “Programming was something I liked to do and I didn’t know anyone else like me.”

So how did an isolated programmer like Hess wind up developing Direct Connect (DC), which is fast becoming a word-of-mouth hit among data-sharing dorks everywhere? “The people I talked to most were folks in my high school calculus class who used the program,” said Hess, who dreamed up DC when he was 17, after getting frustrated with the file-sharing capabilities of Internet Relay Chat (IRC). “I wasn’t hanging out on IRC to chat, but to get files,” Hess recalled. He needed a file-sharing program similar to Napster, but which would work more easily with IRC. Hess also wanted to share more than music files.

Without access to any formal computer science education, Hess picked up the most widely available development tool: Microsoft’s Visual Basic (VB). Sure, Java might have been a better choice, but at 17, Hess didn’t know anything but VB. After I groused at Hess for several minutes about how his program couldn’t be ported to Linux, Hess sighed in a way that made me realize that he’s probably received a zillion flame-saturated emails full of my very same gripe. “This was a pragmatic decision,” he explained. “I hadn’t heard about open source when I started the program in high school. It just wasn’t a thought to me. VB was easy, I could spit something out really fast that worked, and debugging is great. That’s why I picked VB.” (And just for the record, turbo-geeks: he does want to port DC to another operating system. So why don’t you shut up and help out?)

After Hess posted the DC prototype on betanews.com last year, the program got 1,000 downloads in one day. He knew he was on to something and decided to devote himself to the program full time. These days, he has thousands of users who contribute and share everything from MP3s to movies and E-books. Although Hess isn’t advocating piracy, it’s worth noting that DC is a pirate’s dream. Hess wants users to put as much data as possible online so that he can claim DC has a “petabyte” of data (1,000,000 gigabytes). The system currently has an average of 100 terabytes, and a lot of that stuff is not usually available for free.

Some users on DC like to carry on the IRC “no leechers” rule, meaning that they won’t allow you to delve into their data troves unless you can demonstrate that you have 10 gigabytes (or some other huge amount) of data to share with them. Luckily, one of the documents available on DC is called “how to cheat on DC” and teaches you how to make it appear that your hard drive is packed with tons of freely shared data when it isn’t. Hess isn’t worried about that. “I want open distribution of data,” he said emphatically. “People should be able to skip out on rules that are too strict.”

But the best part of all this, for Hess, is that he’s finally making some money at a thing he loves to do. By selling banner ads on DC, he’s able to earn enough to pay for all his expenses outside his college tuition. Hess isn’t interested in selling DC to anyone–he just wants to run his small business so he can go out for pizza or buy CDs. He said, “People flame me for trying to commercialize DC, but I’m still giving out the product for free. I just want to be compensated for the work I’m doing.”


Interview With DirectConnect’s Jon Hess
by Thomas Mennecke

DirectConnect, briefly dubbed FileShare, arrived on the P2P scene in November 1999. Since then, DirectConnect has quickly become an important aspect of the file-sharing community. Although this community uses an older networking architecture, the DirectConnect network continues to expand, as its resources exceed FastTrack. We would like to thank Jon Hess, the sole programmer of DirectConnect, for taking the time to participate in this interview.

Slyck.Com: How do you feel about third party clients such as DC++? Do you feel they have enhanced or diminished the Direct Connect network? What, if anything, have you learned from them?

Jon Hess: At first I was very angry. But now I realize clients like DC++ are good for the network. They encourage competition and are the reason I was able to release so many updates of my version 2.0 client this year. At this point I’m flattered by their presence – the anger is gone.

Slyck.Com: Third party clients such as DC++ have included features that many would like to see in the official client such as: single window interface, bandwidth management, less memory usage and a streamlined GUI. Will we see such features implemented into the official client?

Jon Hess: “Single window interface” and “streamlined GUI” have always been features of Direct Connect 2.0. If the last release of Direct Connect that a user has tried was 1.0, they really need to give the 2.2 client a shot.

We briefly had a bandwidth management feature that allowed users to cap their upload bandwidth. Their download bandwidth would be capped at a multiple of the upload cap (8x). I’ve never received so many heated emails about a feature. Many users were upset over it, so we quickly removed it. There is no technical reason the feature isn’t included – the code is written.

Less memory usage is something the other clients are going to beat us on. It’s a trade off, and it’s worth it. Direct Connect is really split cleanly in two sections. We’ve got a c++ back-end and, on windows, a c# front end. The back-end, which is everything direct connect, isn’t actually using much data – usually about 512 kilobytes. The .NET Framework is however a ton of code that has to get loaded into our application’s address space and we can’t avoid that. But we feel all of this is completely worth it. The .NET Framework is the best way to program windows applications. It lets us add features much faster than if we were coding to Win32 or MFC.

If there is a feature a user is missing in Direct Connect, we want to know – we aren’t clairvoyant. The best thing users can do is tell us how they feel about the program. I love reading user reviews of Direct Connect, even the bad ones, as long as they say what is bad.

Slyck.Com: Many feel the Direct Connect network architecture is, by comparison to newer communities, a bit out dated. Is there any chance of introducing multi-source swarming, hashing or connectivity of servers (more like eDonkey2000)?

Jon Hess: There are other more technically advanced networks. And while those network structures may be different and more adept to certain tasks, I think they miss the point. They forget about the user experience and are after files-files-files. Direct Connect is about the user joining communities, not overlay networks. Now, Direct Connect was developed before the concept of swarming was popular. We shouldn’t forget that Direct Connect is probably the oldest file-sharing program on the scene. Swarming is something that we’ve been eying. We may end up implementing the Tiger Tree Hash found in other clients.

Slyck.Com: MetaMachine has introduced Overnet, a decentralized version of the eDonkey2000 network. Are there any prospects of introducing a decentralized Direct Connect network?

Jon Hess: No. That is counter to the nature of Direct Connect. Direct Connect is about hubs. However, Direct Connect shouldn’t be called centralized. There is no one machine that is required for the operation of Direct Connect. Direct Connect is a decentralized network. A hub may seem ‘centralized’ but the fact that there are thousands of them with users bridging all of them makes the network decentralized.

Slyck.Com: What kind of communications, if any, exist between you and the DC++ developers?

Jon Hess: None. However we do speak with users who have switched from DC++ to Direct Connect 2.0 frequently and always want to know how we can make their experience better. We were often viewed as a closed shop in terms of user input, but now users overwhelmingly influence the client’s development. We have an entire section of neo-modus.com devoted to user control over the development process (http://www.neo-modus.com/?page=Weekly&subPage=WhatIs).

Slyck.Com: Tell us a little bit about the future of Direct Connect. What features will be implemented to this network/client? Any radical departures from the current Direct Connect philosophy planned?

Jon Hess: Right now the focus is on iterating the development of Direct Connect 2.0. I’d like to see a point release happening bi/tri-monthly. The focus will be on this until users find our client clearly ahead of other Direct Connect implementations. We’d also like to implement the extensions to the network that other developers have included in their applications. I’ve also always wanted to explore more decentralized hubs where multiple hubs can run on many machines and appear to users as a large virtual hub. Honestly though, the road-map isn’t set in stone, and all I can assure users of is that the development will be fast-paced for the rest of the year.

Slyck.Com: Let’s talk about the size of the Direct Connect network. The Neo-modus homepage usually states around 200,000 users and 9,000 Terabytes of information (DC++ typically reports more.) How are these numbers calculated, and are they indicative of the entire network?

Jon Hess: We have a tool that profiles the network by periodically visiting the hubs and looking at the user lists of the hubs visited. However, it’s possible that users get counted more than once due to multiple hub connections, and also possible that hubs are never visited due to firewalls or simply being private.

Slyck.Com: Direct Connect development has, at least in the eyes of the P2P community, been slow. How often do you work on Direct Connect? Will the development pace pick up or maintain the course?

Jon Hess: Direct Connect’s development appeared slow from the summer of 2001 to the summer of 2003 because we dropped Direct Connect 1.0 completely and focused on a rewrite of the core application – Direct Connect 2.0. We released a Mac OS X version, a great operating system if you’re thinking of switching, and ported the Mac code to windows. Now, as a testament to the .NET Framework which many users are unjustly afraid of, I was able to build the Direct Connect 2.0 GUI in 4 months, including the time taken to learn C#. This simply would not have been possible with MFC. Now we have a vastly improved code base and anything is possible.

We may have paid a high start up cost to move to Direct Connect 2.0, but now that we’re here, development is steaming right along. We released more than 20 ‘beta’ builds of the 2.20 client between July and January and there is no plan to slow down.

Like many open source projects, we’ve also split our development into two paths. We have the standard stable build of Direct Connect 2.0 available on the download section along with what we call the latest ‘Weekly Build’. The weekly build is frequently updated, and has all the new features. As the weekly build matures, we slowly roll it over to a stable release and continue the cycle again.


If anyone comes across Hess, I’d like to add an additional interview, with the following questions:

  • What were your goals or ambitions with Direct Connect?
  • What can you say to those who think that the NeoModus protocol is bad or ill-formed?
  • What would you have done differently (protocol, application or just in general) if given the chance?
  • You haven’t been involved in Direct Connect for a number of years; will you ever be again?
  • What was the most difficult thing you did with Direct Connect?
  • What was the most important thing you did with Direct Connect?
  • What was the most fun thing you did with Direct Connect?
  • The lock/key combination in the protocol isn’t much used in today’s implementations; why did you feel it was necessary?
  • What were your initial thoughts and response when other implementations of NMDC arose?
  • What do you know of ADC, and do you want to comment on it?
  • Why did you not open-source the NMDC software?
  • What made you think of the name “NeoModus Direct Connect”?
  • Is there anybody “on the street” who knows you wrote Direct Connect?

(And any other questions I may have missed or have asked the others I’ve ‘interviewed’.)


Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”