Archive for May, 2007

Rollback and advanced resume

May 25, 2007

File rollback and advanced resume are the two file integrity checking mechanisms DC++ uses so that one can stop and resume a download whilst ensuring it remains intact, including when switching between different sources. DC++ versions through 0.699 rely by default upon file rollback as their resume integrity-checking mechanism; the version currently in development, by contrast, uses advanced resuming exclusively.

File rollback, having originated prior to TTH, assumes little about the relationship between two putatively equivalent sources. Instead, when resuming a file, it examines the last $ROLLBACK_WINDOW bytes of the already downloaded file, begins downloading the new segment $ROLLBACK_WINDOW bytes before the ostensible resume position, and declares the files intact and fit for resuming if and only if the remote and local $ROLLBACK_WINDOW-sized buffers exactly match. Otherwise, it declares a rollback inconsistency and ceases resuming.

This has several weaknesses: it detects errors in the already downloaded portion of the file only within the usually small rollback window; it spends extra bandwidth transferring that overlapping window; and it creates a loophole in the TTH regime through which transferred files can be corrupted. Further, in recent DC++ versions, file rollback has been destructive, truncating a file by the rollback window size whenever an inconsistency is detected. This can repeat as often as every couple of minutes, whenever DC++ retries and obtains a remote slot; the result is a trade-off between large rollback windows, capable of detecting more transfer corruption, and a greater risk of losing whatever progress one’s client has made.
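To make the mechanism and its blind spot concrete, here is a minimal Python sketch of a rollback check. It assumes the whole file fits in memory and that a hypothetical `fetch(offset)` callable returns the remote bytes from `offset` to the end of the file; `resume_with_rollback` and `RollbackInconsistency` are illustrative names, not DC++’s actual code, and the destructive truncation step is left out.

```python
class RollbackInconsistency(Exception):
    """Raised when the local and remote rollback windows differ."""


def resume_with_rollback(local: bytes, fetch, window: int) -> bytes:
    # Start the new segment `window` bytes before the ostensible resume position.
    start = max(0, len(local) - window)
    remote = fetch(start)        # remote bytes from `start` to the end of the file
    overlap = local[start:]      # the last `window` bytes already downloaded
    if remote[:len(overlap)] != overlap:
        raise RollbackInconsistency(f"mismatch in window starting at byte {start}")
    # Windows match: keep the local data and append only the genuinely new bytes.
    return local + remote[len(overlap):]
```

With a 4-byte window, a local copy corrupted at byte 4 (`b"hellX worl"`) still passes, because only bytes 6 to 9 are compared, and the corruption is silently carried into the finished file; corrupting a byte inside the window raises `RollbackInconsistency` instead.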

Advanced resuming resolves all of these issues by relying not on the limited rollback window but upon the entire file as locally downloaded. As such, it could not become the primary mechanism until hash-capable clients had become ubiquitous. Advanced resuming a file involves re-hashing the locally held portion of the file being resumed, proceeding until it detects either a hash inconsistency or the exhaustion of the local data; it selects this point as the resume position. Because it only resumes files that the root TTH indicates are identical, it need not, unlike rollback, concern itself with cross-checking the local and remote data, which saves bandwidth. Finally, unlike with rollback, subsequent sessions can take advantage of the now-primed TTH checking machinery, so a resumed download is as secure as one completed without resuming.
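A minimal Python sketch of the advanced-resume scan, under loudly stated assumptions: SHA-256 stands in for the Tiger hash that TTH actually uses, the hash tree is reduced to a flat list of trusted per-block leaf hashes (`expected_leaves`, assumed already obtained and checked against the root TTH), and `BLOCK_SIZE` is a hypothetical verification granularity.

```python
import hashlib

BLOCK_SIZE = 1024  # hypothetical verification granularity, in bytes


def leaf_hash(block: bytes) -> bytes:
    # Stand-in: real TTH combines Tiger leaf hashes into a tree, not SHA-256.
    return hashlib.sha256(block).digest()


def advanced_resume_position(local: bytes, expected_leaves: list[bytes],
                             file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Return the byte offset to resume from: the start of the first block that
    is either incomplete locally or fails verification against the trusted
    leaf hashes, or file_size if every block checks out."""
    pos = 0
    for expected in expected_leaves:
        end = min(pos + block_size, file_size)  # the last block may be short
        if len(local) < end:
            break  # local data exhausted: this block was never fully downloaded
        if leaf_hash(local[pos:end]) != expected:
            break  # hash inconsistency: re-download from the start of this block
        pos = end
    return pos
```

Everything before the returned offset has been verified against the trusted hashes, so the client can simply request the remainder from any source sharing the same root TTH; no overlapping bytes need to be transferred or compared.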

Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”

Denying distributed attacks

May 22, 2007

The most unfortunate thing about the past, the present and the future is the ever-growing number of people who use the Internet (DC included) to cause havoc.

This havoc can probably best be summarized as distributed denial of service (DDoS) attacks.

You’ve seen it in action: it caused dcpp.net’s disappearance and Hublist.org’s unresponsiveness during the past year or so.

To tackle the problem, we first need to understand how people perform these attacks using DC.

It’s quite simple. These people control one or more hubs. They have operator status and are able to manipulate two things. First, they can redirect your client to an address that is their target. (Redirecting to example.com causes a bunch of users to connect to that server simultaneously.) Second, they manipulate client-to-client (C-C) connection initiations. When you tell someone else that you want to connect to them, you “say” your IP. (You know, the thing that identifies your computer on the Internet?) This IP can be altered by the hub. So you might have the IP “192.168.0.1” and say so, but the hub will change the message so that the address is “192.168.0.2”, causing the other client to try to connect to the second address (since it believes that’s where you are).

What can we do to prevent this, or at least minimize the damage somewhat?

A few versions back, DC++ added a particularly clever scheme. If it tries to connect to a hub and fails, and it is the first time DC++ has seen that hub (that is, during this session), DC++ will not try to reconnect. Only once a connection has succeeded will DC++ reconnect if the connection later fails. I urge all client developers to add this too, if you haven’t.
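The reconnect rule boils down to a small piece of per-session state. This is a hypothetical Python illustration of the behaviour described above (the class and method names are invented), not DC++’s implementation, and it leaves out the retry timer itself.

```python
class HubReconnectPolicy:
    """Per-session record of hubs that have accepted at least one connection."""

    def __init__(self) -> None:
        self._ever_connected: set[str] = set()

    def on_connected(self, address: str) -> None:
        # Remember every hub we have successfully reached this session.
        self._ever_connected.add(address)

    def should_reconnect(self, address: str) -> bool:
        # Never retry an address that has failed without ever succeeding, such as
        # a web server a malicious hub redirected us to.
        return address in self._ever_connected
```

A redirect to a web server therefore costs that server at most one connection attempt per client per session, instead of an endless retry loop.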

Another protection added to DC++ was to internally block certain addresses when users tried to connect to them (e.g. dcpp.net and hublist.org).

A suggestion pushed fairly hard by Nev (do I need to say Y[n]Hub?) and Jove (the dude behind Aquila) was to block certain redirects, identified by the port number used in the redirect. One proposition was that ports like 25, 80 and other well-known service ports should be blocked. Another proposition was to restrict all port numbers below 1024 completely (as they’re usually reserved for a known service). The suggestion as a whole met with opposition. I didn’t and don’t like the suggestion, but I can understand wanting it. [I have no idea whether the suggestion actually made it into production in either hub software.]
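Both the internal address block and the port-based proposals amount to a check made before following a redirect. The Python sketch below combines them; the service-port set, the `strict` switch for the below-1024 rule and the default hub port 411 are assumptions for illustration, not what any client or hub actually ships.

```python
from urllib.parse import urlsplit

BLOCKED_HOSTS = {"dcpp.net", "hublist.org"}  # the examples named above
KNOWN_SERVICE_PORTS = {21, 22, 23, 25, 53, 80, 110, 143, 443}  # illustrative list
DEFAULT_HUB_PORT = 411  # assumed default when the redirect omits a port


def redirect_allowed(url: str, strict: bool = False) -> bool:
    """Decide whether to follow a hub redirect to `url`.

    strict=True applies the more aggressive proposal of refusing every port
    below 1024; otherwise only well-known service ports are refused."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        port = parts.port if parts.port is not None else DEFAULT_HUB_PORT
    except ValueError:
        return False  # malformed URL or a port outside 0..65535
    if not host:
        return False
    if host in BLOCKED_HOSTS or any(host.endswith("." + h) for h in BLOCKED_HOSTS):
        return False
    if port == 0 or port in KNOWN_SERVICE_PORTS:
        return False
    if strict and port < 1024:
        return False
    return True
```

The non-strict mode reflects the narrower proposal; the strict mode shows why the wider one drew opposition, since it also rejects any legitimate hub that happens to run on a low port.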

Unfortunately, we need to come to grips with the fact that, no matter how much protection we add to new hubs and clients, there will always be those who use old versions of their client or hub of choice. This is exactly how people exploit DC: they take advantage of people’s resistance to upgrading.

We could always force people to upgrade by using DC++’s (now deprecated) BadVersion in version.xml… Or hub operators could block clients that lack the protections mentioned above.

That said, I think hub list operators are the ones with a lot of power. They are able to filter which hubs are displayed in the list. With information on who is running malicious hubs, hub list operators can simply filter those hubs out. This lowers the rogue operators’ leverage.

Sure, one could build a database of “safe and good hubs” (something similar to the phishing-site information browsers use), but I think it’d be expensive and difficult to build.

Decentralizing the centralized network

May 22, 2007

The centralized nature of DC is quite apparent. A client connects to a hub, which in turn is used as the central point for all communication except the raw file transfers. Clients also connect to hub lists, which are essentially a central point as well. (Well, one only accesses the site and downloads a file.)

What can we do, then?

One solution to this problem would be to have multiple hubs acting as one entity, something similar to IRC. There are already ways to do this; have a look at Hublink for NMDC and the “IHUB” initiative started by qhub.

We need to consider what the clients are capable of. During the information exchange for a file transfer, one party is considered the “hub” and the other the “client”. This means that, in an abstract sense, clients can already be thought of as hubs. So, if clients may act as a hub in some limited sense, how can they act (more) as a hub?

The most obvious instance for me is private messages. A PM in NMDC and ADC (BASE, at least) is routed through the hub. This even means that the hub can manipulate the message (or simply not send it). There is a solution to this, though: connect to the other party (as one would when trying to download a file) and avoid having the messages routed through the hub. (This can easily be achieved with ADC.)

Further, in ADC, one might send a private message to an entire group. We can use this to our advantage yet again by having a member of the group acting as the “hub”, so all messages are routed through that client. (Yes, I’m aware that there might be a problem if everyone is in passive mode…)

In a secure environment, we could perhaps even allow certain clients to act as a hub; that is, something similar to “hub link”, but for certain trusted clients only. (QuickDC is apparently able to switch between being a real hub and a true client.)

An important instance, which has possibly received the most attention in the past year or so, is the hub list situation. Take down the address of a hub list, and you cause havoc.

DC++ formed a solution for this by caching the hub lists it has been able to download. This means that we only “need” to access a list once to have a semi-useful list. But how can we take it further? One solution that has been discussed is to use a service similar to Coral, where the actual file would be cached on the net.
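Caching of this kind can be sketched as download-with-fallback. This minimal Python version assumes a caller-supplied `fetch(url)` that raises `OSError` on network failure; the cache file naming and the optional `max_age` freshness window (where a longer re-download interval would plug in) are hypothetical, not DC++’s actual scheme.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Optional


def cache_path(cache_dir: Path, url: str) -> Path:
    # Hypothetical naming scheme: one cached file per hub list URL.
    return cache_dir / (hashlib.sha1(url.encode("utf-8")).hexdigest() + ".xml")


def load_hub_list(url: str, cache_dir: Path, fetch: Callable[[str], bytes],
                  max_age: Optional[float] = None) -> bytes:
    """Return the hub list for `url`, preferring a fresh download but falling
    back to the last cached copy when the list server is unreachable."""
    cached = cache_path(cache_dir, url)
    if (max_age is not None and cached.exists()
            and time.time() - cached.stat().st_mtime < max_age):
        return cached.read_bytes()  # cached copy is recent enough; skip the download
    try:
        data = fetch(url)
    except OSError:
        if cached.exists():
            return cached.read_bytes()  # server down or address taken away
        raise  # nothing cached yet: the failure is real
    cached.write_bytes(data)
    return data
```

Taking a hub list’s address down then only hurts clients that have never downloaded it; everyone else keeps working from a possibly stale but usable copy.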

Another solution would be to invent a new command (a new INF parameter in ADC would suffice, I think), where we’d say “Hey. This hub list address doesn’t work anymore. People, send me a new one!” (E.g. “NHhttp://www.example.com/hublist.xml”, where the address is to be read as “this hub list doesn’t work”.) Clients could even send a hub list (or use some other command) when connecting to the hub, meaning “I have this or these list(s)”. The hub would then respond with “Oh, I have these ones”, where the addresses would be all of the hub lists the hub has gathered from clients.
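The proposed parameter fits into an ordinary ADC INF. This Python sketch builds and parses a BINF carrying one or more `NH` entries; `NH` itself is only the hypothetical parameter from the example above, while the escaping (`\s` for space, `\n` for newline, `\\` for backslash) follows ADC’s rules for parameter values. The function names are illustrative.

```python
_UNESCAPES = {"s": " ", "n": "\n", "\\": "\\"}


def adc_escape(value: str) -> str:
    # Escape backslashes first, so the escapes added afterwards are not doubled.
    return value.replace("\\", "\\\\").replace(" ", "\\s").replace("\n", "\\n")


def adc_unescape(value: str) -> str:
    out = []
    i = 0
    while i < len(value):
        if value[i] == "\\":
            if i + 1 >= len(value) or value[i + 1] not in _UNESCAPES:
                raise ValueError("invalid ADC escape sequence")
            out.append(_UNESCAPES[value[i + 1]])
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def build_dead_list_inf(sid: str, dead_lists: list[str]) -> str:
    # NH = "this hub list address no longer works" (hypothetical parameter).
    return " ".join(["BINF", sid] + ["NH" + adc_escape(url) for url in dead_lists])


def parse_dead_lists(message: str) -> list[str]:
    tokens = message.split(" ")
    if len(tokens) < 2 or tokens[0] != "BINF":
        return []
    # tokens[1] is the sender's SID; named parameters follow it.
    return [adc_unescape(tok[2:]) for tok in tokens[2:] if tok.startswith("NH")]
```

A hub receiving such INFs could keep the reported addresses in a set and answer with the lists it knows still work, as the paragraph suggests; that bookkeeping is left out here.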

Another possibility would be for clients and hubs to agree on another extension, where one party requests the actual hub list file and the other sends it. Of course, a hub list that only circulates and is never re-downloaded will become outdated after a while. So we obviously still need to download the hub list ourselves at intervals. However, with the techniques mentioned, this interval can be much longer than it is now.

Are there other ways you can think of where we can decentralize DC?

The evolution of Direct Connect

May 22, 2007

Direct Connect is quite old. The community has lasted for a little over six years, and I doubt DC will die in the next few years. During this time, we have seen various things pop up, such as file hashing, segmented downloads and ADC, to name just a few.

As DC continues to age, I think we need to start thinking about the next evolutionary step. We’ve gone from identifying files by file name and size to using a hash based on file content. Then we went from single-source downloads to multiple sources. Then from the NMDC protocol to the ADC protocol. And so on. What we need now is the next implementation or idea for improving Direct Connect. By saying this, I don’t mean that creating the next step is, or needs to be, easy; I mean to push people to think about the future.

During the past years, I think there has been a growing dependency on Jacek (on the client and ADC side) and on PPK, Yoshi and Nev (on the hub side) to create the “next thing”. Other people have of course contributed to development, but the people I mention are those with the largest market share and thus the best ability to change things. I believe this should stop. We should stop depending on these people and try to enforce standards in a different way: by getting people (those above included) to use a particular feature or scheme simply because it would be too difficult to resist.

Mind you, this post is only intended as a preface to a series of posts, so if you want to comment, please comment on the related post.

Are there other things we can do to improve Direct Connect?

P.S. No, this isn’t an April Fools’ joke.

ADC as an open messaging protocol

May 17, 2007

Szabolcs Molnar’s recent post advocated hub-side command filtering, specifically of BMSG PMs. I believe this, and its obvious generalisations, to be mistakes. They sap the openness of a protocol capable of routing arbitrary messages between users, subject to bandwidth limits on the hubs and clients involved.

NMDC, as that post observes, implicitly imposes such limitations. Any broadcast message one sends will be interpreted as a main chat message for display by the receiving parties. Similarly, private messages under NMDC, as widely interpreted, contain exclusively user-visible messages, leaving those who attempted protocol innovation either to exploit quirks of buggy parsing or to overload messages such as $SR to achieve their ends. These limitations don’t ultimately help those using a protocol; instead they push it towards a choice between ugly kludges and stagnation.

ADC, among other goals, includes the means to obviate the need for those workarounds and instead to directly implement unanticipated protocol features. To shut this down, as the previous blog post suggests, would merely invite the same harmful cycles seen in NMDC. Instead, an ADC hub should function essentially to authenticate identity, ensuring registered users are who they claim and that messages sent between users contain a correct source CID.

That stated, the motivation behind desiring hub-side filtering of BMSG PMs is real, and a rejection of centralised limitations on them should include a response to that impetus. Rather than specifically targeting BMSG PMs, a system that is both freer and more robust allocates each user a certain portion of a hub’s bandwidth and, under conditions of stress, blocks or deprioritizes traffic beyond that allowance.
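One common way to implement such a per-user allowance is a token bucket; that choice of technique is mine, not something the post prescribes. In this hypothetical Python sketch each user (keyed by SID) accrues `rate` bytes of allowance per second up to `capacity`, and traffic beyond the allowance is tagged low priority only while the hub reports stress. The clock is injectable so the behaviour can be tested.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class _Bucket:
    tokens: float  # bytes this user may still send at normal priority
    stamp: float   # clock reading at the last refill


class HubShaper:
    """Per-user token buckets: traffic within a user's allowance is always
    'normal'; traffic beyond it becomes 'low' only while the hub is stressed."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # bytes of allowance granted per second
        self.capacity = capacity  # largest burst a user can save up
        self.clock = clock
        self._buckets: dict[str, _Bucket] = {}

    def classify(self, sid: str, nbytes: int, under_stress: bool) -> str:
        now = self.clock()
        bucket = self._buckets.setdefault(sid, _Bucket(self.capacity, now))
        # Refill according to elapsed time, never beyond capacity.
        bucket.tokens = min(self.capacity, bucket.tokens + (now - bucket.stamp) * self.rate)
        bucket.stamp = now
        if bucket.tokens >= nbytes:
            bucket.tokens -= nbytes
            return "normal"
        return "low" if under_stress else "normal"
```

Note that the shaper never looks at what a message is: a BMSG PM, a search and an unanticipated extension command all draw on the same allowance, which is the point of preferring this over command filtering.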

Clients, meanwhile, can simply ignore BMSG PMs if they so desire; someone in control of a hub who desires equivalent functionality can use DMSG PMs instead. This allows her to retain a more general bandwidth allocation regime whilst simultaneously allowing free use of the ADC protocol, with individual clients able to choose to ignore BMSG PMs. Such a system, of course, represents a compromise in itself (why should a hub have to lie about BMSGs being DMSGs just so those who control it can get their mass messages displayed?), but unlike the alternatives it doesn’t collapse under the slightest gaming.
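On the client side, ignoring BMSG PMs needs only a look at the message type and its flags. A hypothetical Python helper, assuming one ADC message per line without the trailing newline, in the form `BMSG <SID> <escaped text> PM<group-SID>`:

```python
def ignore_broadcast_pm(message: str) -> bool:
    """Return True for a BMSG carrying a PM flag, which this client chooses to drop.

    Other message types, including DMSG PMs, pass through untouched."""
    tokens = message.split(" ")
    if len(tokens) < 3 or tokens[0] != "BMSG":
        return False
    # Flags follow the escaped text; spaces inside the text travel as \s,
    # so they cannot be mistaken for flag separators.
    return any(flag.startswith("PM") for flag in tokens[3:])
```

Because the decision lives in each client, a user who wants mass PMs simply leaves the option off, and the hub never has to know.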

Legitimate uses of hub-side filtering do exist, primarily where those administering a hub have unique knowledge of a pattern of abuse undetectable via static structural analysis. For example, URL spam tends to be both more dynamic and harder to detect a priori than BMSG PMs, and is therefore more worthy of hub filtering. The general principle I’d identify is that when a DC client can do something autonomously with negligible loss of functionality compared with what an ADC hub could do, the hub should refrain from performing that functionality.

Summary: don’t stunt ADC by reducing it to NMDC’s capabilities when alternatives exist.

Thoughts about an ADC-hubsoftware

May 10, 2007

As you might already know, on the client side ADC works, more or less. You can chat, you can download, you can search, you can browse other users’ file lists. Sure, it has bugs, but it works.

But what about ADC hubs? There are several, but none of them is mature yet; they lack services that NMDC hub owners have grown used to. And there are differences the developers need to think about. Well, I collected some of them. Not to be a know-it-all or whatever, just because I think someone should start :)

OK… Well, let’s see:

  1. Hub software must ensure that the CID matches the PID and must not allow users to enter the hub if they can’t provide a valid CID for their PID.
  2. Hub software must not store, broadcast or make available anyone’s PID to anyone else, including hub owners and scripts. Doing so would weaken the security of the system. To protect their operators and users, people should not use or install hub software that does this.
  3. It’s a good option to register users by their CIDs, but the hub should notify users that their registration will be lost if they modify or lose their PID/CID. Moreover, it’s a good idea to store the last nick for every registration, to prevent other users from connecting and talking in someone else’s name while that user is offline. This protects the users’ reputation.
  4. Filtering commands is the hub’s job, not the client’s. So hubs must ensure that regular users are not allowed to send mass messages to other users, for example by adding a PM flag to their BMSG.
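Points 1 and 2 can be sketched together. In ADC the CID is the base32 encoding of the hash of the PID, and the client supplies both in its first INF as the `ID` and `PD` fields. This hypothetical Python sketch verifies the pair and strips `PD` before anything else sees it. The real hash is Tiger, which Python’s standard library does not provide, so a truncated SHA-256 (`stand_in_hash`) is used purely as a placeholder; a real hub would pass a Tiger implementation as `hash_fn`.

```python
import base64
import hashlib
from typing import Callable


def b32encode(data: bytes) -> str:
    # ADC uses RFC 4648 base32 without '=' padding.
    return base64.b32encode(data).decode("ascii").rstrip("=")


def b32decode(text: str) -> bytes:
    # Restore the padding that ADC omits before decoding.
    return base64.b32decode(text + "=" * (-len(text) % 8))


def stand_in_hash(data: bytes) -> bytes:
    # Placeholder only: ADC actually uses Tiger (192 bits = 24 bytes).
    return hashlib.sha256(data).digest()[:24]


def cid_matches_pid(cid: str, pid: str, hash_fn: Callable[[bytes], bytes]) -> bool:
    try:
        pid_bytes = b32decode(pid)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return b32encode(hash_fn(pid_bytes)) == cid


def admit_inf(params: dict[str, str], hash_fn: Callable[[bytes], bytes]) -> dict[str, str]:
    """Check ID against PD, then drop PD so it is never stored or broadcast."""
    cid, pid = params.get("ID"), params.get("PD")
    if cid is None or pid is None or not cid_matches_pid(cid, pid, hash_fn):
        raise PermissionError("CID does not match PID; refusing login")
    return {key: value for key, value in params.items() if key != "PD"}
```

Only the returned dictionary should reach registration storage, scripts or other users, which is what point 2 asks for.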

Sure, there is a lot more, but I think it’s enough for now. Feel free to comment.

Bug reporting

May 8, 2007

As we don’t have a Bugzilla at our disposal, the only means of reporting bugs in DC++ is by sending an e-mail or contacting someone at DCDev Public.

Well, now you can also comment here on the blog about bugs you encounter.

UPnP file required for compiling

May 7, 2007

At one point in time, DC++ added support for UPnP, which enabled DC++ to configure itself automagically for active mode.

To accomplish this, a specific file was required in the source code. This file is called ‘natupnp.h’. I’m sure people have noticed it: the compile information mentions this file, and if you’ve attempted to compile DC++ without it, the compiler has complained.

(Visual Studio 2005 has this file built in, but if you use Visual Studio 2003, you need this file in your includes.)

The compile information notes that you can get the file in three ways: (1) by downloading the .NET SDK, (2) through our Bugzilla install (which is down, yes) and (3) by contacting one of us, in which case we’d give it to you.

Unfortunately, this presents us with a problem. We might actually be in a pickle if we provide it to you directly, because we don’t have permission from Microsoft to distribute it.

So to save us all some time and trouble, download the SDK provided by Microsoft and you’re all set.

(I don’t know whether this file, or another one, is required for UPnP compilation with compilers other than Microsoft’s.)
