Archive for March, 2007

Detecting your hub software

March 25, 2007

Many hubs exist for NMDC, and for ADC. Although more hubs support the former, the number supporting the latter is increasing.

As hubs get developed, they are intended for an audience, and the hub developer will try to market the software so more people will use the hub.

One of the marketing tricks hub developers use is to broadcast what the hub software is called when your client connects to the hub. You’ve seen it: “This hub is using DCH++” or some other type of message.

However, the target audience is not just the normal users of the hub. It is also a hub list. The hub list can then, through either a website or the actual hub list file, broadcast what hub software is being used. This is great advertisement, as the hub developer doesn’t need to do any active marketing.

However, the hub list isn’t composed of some guy in a basement manually typing the hub software used. Hub lists use programs to determine the hub software. This means that the program needs to look for a specific pattern to identify a hub.

When a client connects to a hub, the hub will broadcast the software. As we have two different protocols, we have two different methods of doing so. In NMDC, the hub list will check the $Lock the hub sends. In ADC, correspondingly, the hub list will look at the VE parameter in the INF.

To do this effectively, Gadget (one of the people behind Hublist.org) published Visual Basic code that parses the information in the lock, so it’s NMDC-only. But it shouldn’t be very difficult for you to replace the “$Lock” stuff with “VE” or some such. The code uses regular expressions, so you need to find the corresponding function in your language to do the same thing.
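To give an idea of what such a program could look like, here is a minimal Python sketch (not Gadget’s code). It assumes the software name sits in the “Pk=” part of the $Lock line and that the VE value uses ADC’s “\s” escape for spaces. The patterns and the sample strings are my own illustrations, so treat them as a starting point only.

```python
import re

# Hypothetical patterns; a real hub list would maintain its own table.
LOCK_PATTERN = re.compile(r"^\$Lock\s+(\S+)\s+Pk=([^|]*)")
VE_PATTERN = re.compile(r"(?:^|\s)VE(\S*)")


def hub_software_nmdc(lock_line):
    """Return the Pk part of a $Lock line, or None if it does not match."""
    match = LOCK_PATTERN.match(lock_line)
    return match.group(2) if match else None


def hub_software_adc(inf_line):
    """Return the VE value of an INF line, or None if there is no VE field."""
    match = VE_PATTERN.search(inf_line)
    if match is None:
        return None
    # Simplified unescaping: only turns "\s" back into spaces.
    return match.group(1).replace("\\s", " ")


# Made-up sample lines, not captured from any real hub.
print(hub_software_nmdc("$Lock EXTENDEDPROTOCOLABC Pk=ExampleHub|"))
print(hub_software_adc("IINF NIExample VEExampleHub\\s1.0"))
```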

Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”

Command extension

March 17, 2007

If you have ever looked at some documentation or discussion about NMDC, you have probably noticed that it’s not extensible (or at least not easily).

But why isn’t it?

First of all, you have to understand that the protocol Jonathan Hess originally wrote wasn’t intended for third parties. There was no one who was supposed to look at the code or the traffic and say “hey! Shouldn’t we add this feature?”

With that in mind, let us get down to the technical stuff. When you send a command in NMDC, the other side “knows” how to parse it because there has been discussion about how commands are supposed to look. Well, not really a discussion. More like “NMDC did it like that and we shouldn’t break backwards compatibility because we’d be breaking a lot of clients and hubs.” So, when a client or hub sees information in a command it doesn’t understand, it will ignore the entire command or, hopefully though not likely, only the bad part. The problem with this is that there is no, and hasn’t been any (as far as I know), discussion concerning extending certain commands. (Though, there’s e.g. $Supports, which has been agreed, or forced by arne, to follow certain rules when being extended.)

So, when you add something in NMDC, you will need to enforce and/or notify other clients and hubs of that change, since they will most likely break with your change. If you add something in ADC, you will not break clients or hubs since there’s a native requirement for recipients to handle unknown data. (However, of course, you will need to have the other clients and hubs understand your command if you intend for them to use it, too. You will never get away from this.)
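As a rough illustration of “handle unknown data”, here is a small Python sketch of a receiver that keeps the named parameters it understands and quietly sets aside the rest. The AN and TO names are borrowed from the ADC search examples on this blog; “XX” is a made-up extension field and the function name is hypothetical.

```python
def split_known_params(params, known_names):
    """Separate two-letter named parameters into understood and ignored ones."""
    understood = {}
    ignored = []
    for param in params:
        name, value = param[:2], param[2:]
        if name in known_names:
            understood.setdefault(name, []).append(value)
        else:
            # Unknown data is skipped instead of rejecting the whole command.
            ignored.append(param)
    return understood, ignored


understood, ignored = split_known_params(
    ["ANmotd", "ANtxt", "TOtoken", "XXsomething"], {"AN", "TO"}
)
print(understood)  # the fields this receiver knows how to use
print(ignored)     # the extension it has never heard of
```

The point is only that the unknown “XX” field does not stop the rest of the command from being used.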

Search the active ADC and the passive NMDC protocol

March 17, 2007

One of the big culprits in DC traffic is searching. For hub owners, this is a crucial item, as it may make or break their love for either protocol. The less traffic, the more overall users.

So, let us jump right in. I’m going to break this up into two posts. The first post (this one) will be about the actual searching, and I’ll bring up the responses later. I’m going to begin with active searching and continue with passive.

NMDC: $Search ip:port F?F?0?1?motd$txt
This is 35 characters. At least. (I used single digits in the IP and port [x.x.x.x:x], and counted the ending pipe. Remember that a lot of users aren’t on single-digit IPs, or using ports 1-9.)

ADC: BSCH BABA ANmotd ANtxt TOsometoken
This is 26 characters. At least. (I used a single character for the token to come up with 26, and left out the ending newline. The token may vary by implementation. It can be shorter or longer, depending on how many searches one would want to have simultaneously. It’ll probably be less than 10 characters though, but one can’t say that for sure.)

Now, the 26 characters bit isn’t entirely true. You see, if you’re an active user you will most likely broadcast your IP and port when connecting. However, as this is done only once, it doesn’t matter much. If we’re going to be picky, it’s 14 additional characters in the INF. (Going by the same rules that I used previously concerning IP and port.)

Moving on to passive searching…
NMDC: $Search Hub:requesternick F?F?0?1?motd$txt
This is 29 characters, excluding the nick (and the ending pipe). Here again, the length of the nick matters.

ADC: FSCH BABA ANmotd ANtxt TOsometoken
This is (also) 26 characters, and the same rule about the token applies here. Mind you, here, the broadcasting in the INF isn’t performed. But other than that, it’s just a matter of swapping a B for an F.

Comparing NMDC and ADC searching in active mode is definitely a clear win for ADC. The initial broadcasting is forgivable since it’s only done once and isn’t rippled through the future searches. (It’s up to the clients to keep track of the IP and port.) If we look at passive searching, ADC will remain the same (well, ‘better’ since there’s no initial broadcast), and NMDC will be slightly better than its active counterpart. However, as the nick of the user is important, and is seldom one or two characters, ADC will win here, too.
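If you want to redo the counting yourself, a few lines of Python will do. This is only a sketch: the helper names are mine, and it always includes the terminator (the pipe in NMDC, the newline in ADC), so a total can come out one higher than a figure that was counted without it.

```python
def nmdc_search(source, pattern="motd$txt"):
    """Build an NMDC search; source is 'ip:port' or 'Hub:nick'."""
    return f"$Search {source} F?F?0?1?{pattern}|"


def adc_search(kind, sid, terms, token):
    """Build an ADC search; kind is 'B' (active) or 'F' (passive)."""
    fields = [kind + "SCH", sid] + ["AN" + t for t in terms] + ["TO" + token]
    return " ".join(fields) + "\n"


active_nmdc = nmdc_search("1.2.3.4:5")   # single-digit IP and port
passive_nmdc = nmdc_search("Hub:")       # nick deliberately left out
active_adc = adc_search("B", "BABA", ["motd", "txt"], "x")
passive_adc = adc_search("F", "BABA", ["motd", "txt"], "x")

for name, command in [("NMDC active", active_nmdc), ("NMDC passive", passive_nmdc),
                      ("ADC active", active_adc), ("ADC passive", passive_adc)]:
    print(name, len(command))
```

Swap in a realistic IP, port, nick and token to see how quickly the NMDC lines grow compared to the ADC ones.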

LockToKey examples

March 14, 2007

The DC++ wiki is down, so all you developers out there are out of luck. Until now… Well, at least for one thing:

The LockToKey examples. Since it’s quite a complicated thing to reverse engineer, here are the examples that could be found through the wiki.

Note that I haven’t written any of them, nor do I have any knowledge about them.

Sorry about .doc, but WordPress didn’t allow .rar. Just rename the file to .rar. (The reason I’m not pointing to the cached Google page is that it might change overnight.)

Time left: 584 million years

March 14, 2007

In the upcoming version, there will be a fix for (as noted by the changelog) “time issues with DC++ running for more than 49 days”. This changelog entry means that if your system has been running for more than 49 days, DC++ will most likely crash. This isn’t a new issue; it has existed for a while, though no one had gotten around to fixing it. (Though, fulDC has had it fixed for a couple of versions.)

When the system starts, a specific variable is set that keeps track of how long ago your computer started. When creating the variable, one must set its size as well. If you set the size of it to 100, and you assign the variable 102, the variable will overflow and nasty things will happen. This is also what has happened with DC++.

The size of this specific variable is 32 bits. This means that it can hold 2^32, or 4294967296, distinct values (0 through 4294967295).

Like I said, this variable keeps track of the time since the system was started, in milliseconds. So if the variable reads 10000, it means that 10 seconds have passed since the start. Let us now convert the maximum amount of milliseconds to the maximum amount of days. This yields 4294967296 / 1000 / 3600 / 24 = 49.7102… days. Now look back at the changelog entry. It says 49 days, too.

So, if this is fixed, how is it fixed?

This is fixed by setting a new maximum size for this variable. “But… Then you’ll just end up having to fix that, too!” Well… That’s true. Sort of. If you’re planning on living for a couple of million years, that is.

The new size is 64 bits. This means that the variable can count 2^64, or 18446744073709551616, milliseconds before it wraps. Let us convert this to days, again. 18446744073709551616 / 1000 / 3600 / 24 = 213503982334.6… days. Seems days aren’t enough. That’s 584942417.3 years (at 365 days per year). Yikes! OK, this means that there are 584 million years until DC++ will crash!
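If you don’t trust my calculator, here is the same arithmetic as a small Python sketch. The function names are mine, a year is taken as 365 days, and the 32-bit counter is imitated by masking, which is an illustration of the wrap-around and not how DC++ itself is written.

```python
MS_PER_DAY = 1000 * 3600 * 24


def days_until_wrap(bits):
    """Days a millisecond counter of the given width runs before wrapping."""
    return 2 ** bits / MS_PER_DAY


def counter_32(elapsed_ms):
    """What a 32-bit millisecond counter shows after elapsed_ms real milliseconds."""
    return elapsed_ms & 0xFFFFFFFF


print(days_until_wrap(32))        # roughly 49.7 days
print(days_until_wrap(64))        # roughly 213.5 billion days
print(days_until_wrap(64) / 365)  # roughly 584.9 million years

# Ten seconds after the 32-bit counter wraps, it claims the system just started.
print(counter_32(2 ** 32 + 10000))
```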

So you see, you don’t need to worry about this issue any more. :)

[If the current DC++ crashes, and you restart it, your system will obviously still have been running for more than 49.7 days. However, as the variable overflows, it is also reset, so you'll have another 49.7 days until DC++ crashes again. Repeat ad infinitum.]

Hashing of files

March 14, 2007

Something that comes up every so often on the (now absent) forum is why DC++ re-hashes some files. People with network drives make up the majority of these users.

There are two reasons why DC++ would (re-)hash a file.
(1) The path to the file has changed. (The file name is included here.)
(2) The file content has changed.

People don’t realize why (1) is important. They think that DC++ could just look at the file name and see “that it’s the same file”. However, this would obviously not work well if you have multiple files named the same (“example.png”) sprinkled through your share.

(2) is obvious if you’re indeed changing the content of a file intentionally. However, there is some software that does this for you “automatically”. You might experience this the most with MP3 files and documents. Certain media players like to change the ID3 tag of MP3s, and various document editors like to leave their own footprint on the files.

People with network shares may see these things regularly. Detaching and re-attaching the network drive may update the files’ timestamps, which may cause DC++ to re-hash the files (2). Sometimes, the path may also change, causing (1) to happen.
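The two rules can be sketched in a few lines of Python. This is not DC++’s code: I’m assuming an index keyed by the full path, and I’m using file size plus modification time as a cheap stand-in for “the content has changed”, which also shows why a touched timestamp alone is enough to trigger a re-hash.

```python
import os


def file_signature(path):
    """Cheap change indicator: size and modification time of the file."""
    info = os.stat(path)
    return (info.st_size, info.st_mtime_ns)


def needs_rehash(path, index):
    """index maps a full path to the signature recorded at the last hashing."""
    recorded = index.get(path)
    if recorded is None:
        return True  # rule (1): unknown path, e.g. the file was moved or renamed
    return recorded != file_signature(path)  # rule (2): content (or timestamp) changed
```

After hashing a file you would store `index[path] = file_signature(path)`, so the next start-up can skip it.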

Protocol chat

March 5, 2007

A key element in DC is the ability to chat. Basic chat is very easily implemented in a client.

As is noted by the previous ADC and NMDC run downs, commands are sent differently, which is why we need to know about each.

Let me create two categories; NMDC chat and ADC chat.

Both categories have chat in the main window. Further, NMDC has “private” messaging, which essentially means that the chat is sent to a particular user. ADC also has “private” messaging. However, while the chat functionality in ADC allows a single user to receive the message, the private message is intended for a ‘group’ of users. This group can be anything, really. It could be only those who use DC++, or those who natively support user commands. What is interesting is that sending a message in ADC to a user is the same as sending to a group, just a replacement of some info, but the length will be the same.

Let us continue, and start with a basic main chat message.

Main chat is rather simple in NMDC, and it requires no really difficult parsing. In this example, my nick is going to be “ullner” and the message is going to be “hello everybody”. This is how that will look:
<ullner> hello everybody|
Which is pretty straightforward. The nick in brackets, followed by the message and an ending pipe. This amounts to 25 characters or bytes.

Moving on to main chat in ADC… The difficulty to parse is raised, although not very much. I’m going to use the same message. However, due to ADC’s user -> SID mapping, I won’t be using a nick, but a SID. The SID is “MD3Z”. This is how that will look;
BMSG MD3Z hello\severybody\n
Which is also pretty straightforward, although not as much as the NMDC example. The \s replaces the space and the last \n is one character (ending the command). This amounts to 27 characters. Not that much difference, really. At least not in our example.

Note that changing nick won’t change the SID. So, NMDC will consume slightly less bandwidth if the nick is 7 characters or fewer; at 8 characters the two are dead even, and above 8 ADC scales better.

Let us continue to “private” messaging.

I’m going to now use the same message, and with the other user being “arne”. arne’s SID is in this case “6DKN”.

Starting with NMDC; $To: arne From: ullner $<ullner> hello everybody|
So, essentially it is the same as a normal main chat message except for the beginning. This amounts to 49 characters or bytes.

In ADC, we will have the following; DMSG MD3Z 6DKN hello\severybody PMMD3Z\n
In this, we changed the initial letter from a B to a D (works with E, too), added arne’s SID before the message and our own SID after (in the PM parameter). This amounts to 39 characters or bytes.

If the two users’ nicks were only one character long, we’d end up with 36 characters. (Though, most users don’t have that, naturally.)

In conclusion: main chat messages are somewhat cheaper in NMDC, but only up to the point where the nick gets long enough for ADC to win, and ADC will in almost every case beat NMDC in private messaging.
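The counts in this post are easy to reproduce. Here is a small Python sketch that builds the four messages and measures them; the helper names are mine, and the ADC escaping is limited to the backslash, space and newline cases.

```python
def adc_escape(text):
    """Escape backslash, space and newline the way ADC parameters expect."""
    return text.replace("\\", "\\\\").replace(" ", "\\s").replace("\n", "\\n")


def nmdc_main(nick, text):
    return f"<{nick}> {text}|"


def nmdc_private(to_nick, from_nick, text):
    return f"$To: {to_nick} From: {from_nick} $<{from_nick}> {text}|"


def adc_main(sid, text):
    return f"BMSG {sid} {adc_escape(text)}\n"


def adc_private(from_sid, to_sid, text):
    return f"DMSG {from_sid} {to_sid} {adc_escape(text)} PM{from_sid}\n"


message = "hello everybody"
print(len(nmdc_main("ullner", message)))             # 25
print(len(adc_main("MD3Z", message)))                # 27
print(len(nmdc_private("arne", "ullner", message)))  # 49
print(len(adc_private("MD3Z", "6DKN", message)))     # 39
```

Since the SID is always four characters, only the NMDC lengths move when you change the nicks.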

Identifying ADC

March 4, 2007

Something very important in ADC is the different client identification schemes. I have already noted something about them all, but I thought I’d dedicate an entire post to them.

Session ID
The session ID (SID) is the unique ID that is used per hub. When a client connects to a hub, the hub will assign a particular SID to that user. The SID is calculated by taking 20 arbitrary random bits and then encoding them with Base32 (to form a 4-byte string). There is only one reserved value that the hub must not assign to a user: “AAAA”. As the hub isn’t considered a “client”, it does not have a SID. However, to simplify client implementation, the client can (artificially) assign the hub the SID AAAA, since the client knows no one else can have that. (Elise and DC++ do this, at least.)

During one session, that is, while a user is logged in, the SID for a user mustn’t change. The user must log out and log in to get re-assigned a SID. The SID is assigned before any real information from the client has been sent. This in turn means that the hub doesn’t care about what kind of information the client sends. If the client’s nick changes (can happen) during the session, the client won’t get a new SID.

Note that the SID is *per hub*. This means that a user with SID “6DKN” on HubA isn’t necessarily the same user as “6DKN” on HubB (it is possible, however). This ID is what is to be used when sending commands.
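A hub-side sketch of the SID assignment could look like this in Python. It is a minimal illustration under my own assumptions (the function names and the “retry until free” loop are not taken from any hub), using the standard Base32 alphabet A-Z and 2-7.

```python
import secrets

BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
RESERVED_SID = "AAAA"  # must never be handed to a user


def encode_sid(value):
    """Encode a 20-bit number as four Base32 characters (5 bits each)."""
    return "".join(
        BASE32_ALPHABET[(value >> shift) & 0x1F] for shift in (15, 10, 5, 0)
    )


def assign_sid(sids_in_use):
    """Pick a random SID that is neither reserved nor already taken in this hub."""
    while True:
        sid = encode_sid(secrets.randbits(20))
        if sid != RESERVED_SID and sid not in sids_in_use:
            sids_in_use.add(sid)
            return sid


hub_sids = set()
print(assign_sid(hub_sids))  # some four-character SID, different on every run
```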

Private ID
This ID is the unique ID that is used to verify your CID (see below). The PID must not be given to another client. Doing so will allow others to claim they’re you. If you’re an operator in a hub, I’m sure you don’t want others to know how to potentially get in and gain that operator status. Of course, there’s always the possibility of rogue hub operators, but I guess you’ll have to trust them in the end.

According to the ADC draft, PIDs should be “generated by hashing the MAC address of the generating client followed by the current time using the Tiger hash algorithm.” Personally, I had a couple of issues with that when I wrote the PID generation for Elise. (1) The MAC address isn’t always that simple to get. If you can’t get to it, just use an arbitrary string that you know will be (at least) semi-unique. (2) The “current time” phrasing is a little fuzzy, since we have no idea what format that “current time” should be in. Seconds since 2000? Time of day? It doesn’t really matter here, either. Make sure you are using something that is (at least) semi-unique. While waiting for a potential rephrasing, use strings that make the final hash likely to be unique in the end. Elise does, at least.

Note that it is the Tiger hash algorithm. Not Tiger Tree. (I have no idea if it actually matters here since the info isn’t very large, but it’s worth noting anyway…) The final PID should be 192 bits and encoded with Base32 (to form a 39-byte string). You can actually change your PID to something you’d like in the Experts Only page… Note that this means that the CID will also change.

Client ID
While having the most ambiguous ID name, it is also the most important. The CID is the ID that will (should) uniquely identify you across the entire DC network. The CID is constructed by taking the unencoded PID hash, hashing it again, and then applying Base32 encoding (also 39 bytes). This ID is what identifies you to your friends, what hubs can tie your status to (registering per CID is a lot better than per nick since people can change nick all the time) and what other people see you as when you are their source.
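To show how the PID and CID hang together, here is a Python sketch. Big caveat: Python’s standard library has no Tiger implementation, so the sketch uses BLAKE2b cut to 192 bits purely as a stand-in. The IDs it produces are therefore NOT valid ADC IDs; only the structure (hash the seed, hash again, Base32 without padding) is the point. The seed format is also my own guess at “MAC address followed by the current time”.

```python
import base64
import hashlib
import time
import uuid


def stand_in_hash(data):
    """Placeholder for Tiger: BLAKE2b truncated to 24 bytes (192 bits)."""
    return hashlib.blake2b(data, digest_size=24).digest()


def base32_no_padding(raw):
    """Base32-encode and drop the trailing '=' padding."""
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def make_pid(seed):
    """PID: hash of something (semi-)unique, e.g. MAC address plus current time."""
    return stand_in_hash(seed)


def make_cid(pid_raw):
    """CID: the unencoded PID hashed once more."""
    return stand_in_hash(pid_raw)


# Hypothetical seed: MAC as hex followed by nanoseconds since the epoch.
seed = f"{uuid.getnode():012x}{time.time_ns()}".encode("ascii")
pid = make_pid(seed)
cid = make_cid(pid)
print(base32_no_padding(pid), len(base32_no_padding(pid)))
print(base32_no_padding(cid), len(base32_no_padding(cid)))
```

A hub can verify a claimed CID by hashing the PID it was given and comparing, which is why changing the PID changes the CID.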

Changing CID and PID is potentially possible during a session, though there’s a rather large chance that the hub will kick you. (ADCH++ will.)

ADC: The run down

March 4, 2007

I previously wrote an extensive post about NMDC. Since we’re moving away from NMDC to ADC, I guess a post about ADC is in order. If I don’t explicitly say that something is different from NMDC, assume that the ADC way is the same as the NMDC way.

While a final version (that is, “1.0”) doesn’t exist, the current draft is mostly what will be in the finalized version. The first draft of ADC saw daylight on the 3rd of December 2003. ADC was spawned from ideas from another replacement draft: DCTNG. The draft mentions that ADC stands for “Advanced DC”, though it isn’t official. (I always thought of ADC as a recursive acronym, ADC Direct Connect, but maybe that’s just me.)

As a network, ADC works the same as NMDC does: with hubs and clients, where the hub is a central part. Everything is routed through the hub, except the actual file transfers. However, a client could claim (to the hub) that it wants to download from another client, the hub allows it, and instead of trying to get a file, the client will start sending other messages (such as chat). Truly private chat.

Contrary to NMDC, the one that does the connecting speaks first. That is, e.g. when connecting to a hub, the hub will wait, after establishing a connection socket, for the client to say “hello. I want to come in”.

In ADC, there are two key characters. The first is a space, used as a delimiter inside commands; the second is a “newline” character, which denotes the end of a command. There’s no starting character. Whatever comes after the newline character is considered to be a totally new command.

Commands are constructed in multiple ways. In all of these ways, an initial four characters (well, five with a space) are required. These characters say (1) how the message should be routed or used (“type”) and (2) what the message is about (“action”). When a client receives a command, it shouldn’t actually even look at the type to determine what it should do. As I said, there are multiple ways to create commands, but you’ll need some more info on ADC.

In ADC, when the client connects to a hub, the hub will assign the client a unique ID for that particular hub. This unique ID is very important since the client will need it to interact with the hub. (This is called a ‘SID’.)

Also, beyond a unique ID per hub, ADC requires that all users in DC have a unique ID for the entire network. That is, I should be able to say “hey, that user is the same user as that one”. This unique ID is broadcast to everyone in the hubs a user frequents (well, it doesn’t have to be, but it most likely will be in most hubs). (This is the ‘CID’, which you can see in DC++…) Further, so that users can’t spoof someone else’s CID, they need to provide another special unique ID (‘PID’) to hubs. The hub will then verify that there’s a match, and let the client continue. You can spot a security issue here: users need to trust hubs not to give out the PID to others.

Let us continue. Each action has a set of parameters that are allowed and/or have to be used. These parameters can either be mandatory or voluntary. If the parameter is voluntary, it is required that it is preceded by a two-character identifier. If the parameter is mandatory, there shouldn’t be an identifier.

Moving on… There are three types of commands. Since the initial bit is always mandatory, I’ll leave it out of these examples. (1) Only the parameters of the action are present. (2) The SID of whom it is from, followed by the parameters. (3) The SID of whom it is from, followed by the SID of whom it is for (“send this to person x only”), followed by the parameters.

In ADC, all commands are uppercase characters and case-sensitive. Voluntary parameters have no particular order; one can send them however they want.
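The three shapes of commands translate into a rather small parser. Here is a Python sketch; the table of how many SIDs follow each type letter is inferred from the examples on this blog (B and F carry the sender’s SID, D and E carry sender and receiver), and treating every other type as “parameters only” is my simplification.

```python
# Number of SIDs that follow the four-character header, per type letter.
# Inferred from the examples on this blog; other letters are assumed to have none.
SIDS_PER_TYPE = {"B": 1, "F": 1, "D": 2, "E": 2}


def parse_command(line):
    """Split one ADC command line into type, action, SIDs and parameters."""
    fields = line.rstrip("\n").split(" ")
    header, rest = fields[0], fields[1:]
    if len(header) != 4:
        raise ValueError("header must be one type letter plus a three-letter action")
    sid_count = SIDS_PER_TYPE.get(header[0], 0)
    if len(rest) < sid_count:
        raise ValueError("command is missing its SID field(s)")
    return {
        "type": header[0],
        "action": header[1:],
        "from": rest[0] if sid_count >= 1 else None,
        "to": rest[1] if sid_count == 2 else None,
        "params": rest[sid_count:],
    }


print(parse_command("BMSG MD3Z hello\\severybody\n"))
print(parse_command("DMSG MD3Z 6DKN hello\\severybody PMMD3Z\n"))
```

Note that the handler for “MSG” only needs the action and the parameters; the type merely decided how many SIDs to peel off first.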

Something else that is interesting in ADC, which NMDC doesn’t do, is that if a parameter needs to have a space in it (like a description for a user), the space is replaced by “\s”. Likewise, “\n” represents a real new line and “\\” represents the character \.
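In code, the escaping is a pair of small functions. This Python sketch covers exactly the three sequences mentioned here; the function names are mine. The one thing to watch out for is that unescaping must walk the string once from left to right, since chained replacements would mangle a literal backslash followed by an “s”.

```python
ESCAPES = {"s": " ", "n": "\n", "\\": "\\"}


def adc_escape(text):
    """Escape backslash first, then space and newline."""
    return text.replace("\\", "\\\\").replace(" ", "\\s").replace("\n", "\\n")


def adc_unescape(text):
    """Reverse adc_escape in a single left-to-right pass."""
    result = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            result.append(ESCAPES[text[i + 1]])
            i += 2
        else:
            result.append(text[i])
            i += 1
    return "".join(result)


print(adc_escape("hello everybody"))
print(adc_unescape("hello\\severybody"))
```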

One of the most interesting aspects of ADC for developers is the ability to create extensions, without trouble. If a client or hub doesn’t understand something, it just ignores it (well, there’s always a possibility of kicking/disconnecting).

Let us get away from this somewhat boring info… In NMDC, a hub is assumed to be running on port 411 and file transfers on 412. However, this assumption is not allowed in ADC; addresses must state the port explicitly.

Contrary to NMDC, chat is rather easy in ADC. In ADC, all chat is assumed to be in UTF-8, meaning that everyone should be able to see everything. Also, there’s no such thing as the “highest number wins”, in transfers.

As NMDC has a protocol specifier (dchub://), ADC has one, too. It is “adc://”. In the future, you may see “adcs://” to denote that the hub is using TLS.

Rather obvious but… ADC natively requires the usage of TTH…

I’ve spoken much about ADC, so make sure you read all of the other posts on it as well.

Detecting your client

March 2, 2007

A client detection mod (CDM) is a client that is run by an operator in a hub. The CDM will gather information about users and try to enforce rules set by the operator. The CDM has various ways of gathering information and using it, some obvious and some not so obvious. We have all seen them: an ‘operator’ is mass-kicking users in a hub because of cheating, slot ratio or some other stuff. CDMs are sometimes (jokingly) referred to as ‘spreading cancer’ because of their nature; they use pure logic and assumptions. For a CDM, you are either good or bad. No grey area. And of course, innocent users will always get caught in the middle…

CDMs can be set to use a ‘white list’ or a ‘black list’. Clients on the white list are the only clients that are allowed in the hub, with no exceptions. If the CDM discovers a client that isn’t part of the white list family, it will be kicked (or banned). Clients on the black list are the only clients that are restricted from the hub. This means that if the CDM discovers your client, and it’s not on the black list, it will be allowed in. From a security point of view, the white list is better. However, from a network point of view, the black list is better since it will allow new clients in, so they have a possibility to grow.

There are various things a CDM checks to conclude the client’s status:
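The difference between the two list types fits in one function. A minimal Python sketch, with a made-up function name and made-up client names:

```python
def is_allowed(client, listed_clients, mode):
    """mode is 'white' (only listed clients get in) or 'black' (listed clients are kept out)."""
    if mode == "white":
        return client in listed_clients
    if mode == "black":
        return client not in listed_clients
    raise ValueError("mode must be 'white' or 'black'")


known = {"ClientA", "ClientB"}  # hypothetical client names
print(is_allowed("BrandNewClient", known, "white"))  # False: unknown clients are kicked
print(is_allowed("BrandNewClient", known, "black"))  # True: unknown clients may enter
```

The brand new client is exactly where the two modes disagree, which is the security versus growth trade-off.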

  • Commands
  • Share
  • Tag

The first is essentially that the CDM will monitor traffic from your client, and if the traffic is, or is not, in the list of (un)approved clients, the CDM will act on it. E.g., you can use your fresh copy of DC++ to detect other DC++ clients; connect to them, and their icon should become blue. This is because DC++ has a set of specific commands it sends, thus increasing the possibility for someone to know which client you’re using.

The second, share, can be divided into a few sub-categories.

  • Number of broadcast bytes
  • File list
  • Normal files

The number of broadcast bytes is a classic. Essentially, the one thing checked is the amount of bytes your client claims you share. If the value is too common, or the entire number shares some common denominator, the CDM will know about it. Most CDMs will e.g. kick if they see someone broadcasting “444444444” bytes, with the message “Too many similar numbers” or something like that. This is only the first frontier, and will most likely flush out the most common and crappy cheaters. (Of course, some normal users may be kicked, though it’s probably rather rare.)
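I don’t know the exact rules any particular CDM uses, but a “too many similar numbers” check could be as simple as this Python sketch. The thresholds (80% identical digits, at least six digits) are invented for the example.

```python
from collections import Counter


def looks_faked(share_bytes, max_repeat_ratio=0.8, min_digits=6):
    """Flag share sizes where one digit dominates the whole number."""
    digits = str(share_bytes)
    if len(digits) < min_digits:
        return False  # too short to say anything meaningful
    most_common_count = Counter(digits).most_common(1)[0][1]
    return most_common_count / len(digits) >= max_repeat_ratio


print(looks_faked(444444444))    # True
print(looks_faked(48213907561))  # False
```

A heuristic like this is also why the occasional honest user gets kicked: nothing stops a real share from landing on a suspicious-looking number.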

Going on to file lists, they are the second frontier and most often the last stop for CDMs regarding share. What the CDM does is download your file list and then: (1) It looks at the amount of broadcast bytes and compares it with the file list’s shared bytes. If they differ too much, you’ll (most likely) be kicked. (2) The CDM will also go through the share, and look at file names and hashes. If one of the files is the same file as a known fake or illegal (as in not allowed in that particular hub) file, you’ll (most likely) be kicked. (3) Also, besides checking for file name and hash, most hubs enforce a “maximum file size” rule, and the CDM will look for that, too.

The last part is verifying normal files, which, to my knowledge, very few CDMs actually do. This means that the CDM will download the file list, and then attempt to download a random file. If the CDM can download the file without trouble, no action is taken. However, if there’s a constant error, like TTH inconsistency (wrong leaves) or ‘no slots available’ etc, the CDM will conclude that the user is faking somehow. This is non-trivial for the CDM because it requires more logic on behalf of the CDM to download a ‘random’ file and then delete it when the download is complete. To pass a CDM of that skill, the client needs to create a correct leaf database for each of the shared files, which is non-trivial.
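Checks (1) to (3) can be sketched in Python like this. The file list is represented as simple (name, size, hash) tuples, the 5% tolerance is an arbitrary number of mine, and the function name is hypothetical; a real CDM would first have to download and parse the actual file list.

```python
def check_file_list(claimed_bytes, files, banned_hashes, max_file_size, tolerance=0.05):
    """files is a list of (name, size_in_bytes, tth) tuples. Returns a list of problems."""
    problems = []

    # (1) Compare the broadcast share size with what the file list adds up to.
    listed_bytes = sum(size for _, size, _ in files)
    if abs(claimed_bytes - listed_bytes) > tolerance * max(claimed_bytes, listed_bytes, 1):
        problems.append("share size mismatch")

    for name, size, tth in files:
        # (2) Known fake or disallowed files, matched by hash.
        if tth in banned_hashes:
            problems.append(f"banned file: {name}")
        # (3) The hub's maximum file size rule.
        if size > max_file_size:
            problems.append(f"file too large: {name}")
    return problems


files = [("motd.txt", 1000, "HASH1"), ("huge.bin", 9000, "HASH2")]
print(check_file_list(10000, files, {"HASH2"}, 5000))
```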

The third part a CDM will look at is the tag. This usually contains (1) client and version, (2) slots, and (3) amount of hubs. Most CDMs use a white list, and the CDM will look at (1) as a means of seeing if that’s an allowed client and version. Sometimes, users are kicked by CDMs because they use a brand new version of the client (this has happened to me several times). The CDM will also look at (2) as a means of figuring out whether the number of slots is acceptable in the hub. The CDM may also run a search, and check the ‘search window’ to see how many slots appear there. (The CDM can search, see that there are plenty of slots available, and try to download a file, but be unable to because of a ‘no slots available’. The CDM can then conclude that the client in question has locked its slots.) And lastly, (3) is used to enforce a “maximum hubs” rule. This rule most often concerns the amount of ‘normal’ hubs you’re in, and not the hubs where you’re registered and/or operator. And of course slot ratio is enforced: the amount of slots you have to have open per amount of hubs you’re in.
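As an illustration, here is a Python sketch that pulls the three pieces out of a tag and applies a “maximum hubs” and a slot ratio rule. The tag layout used here, “<++ V:0.699,M:A,H:3/1/0,S:4>” (client, version, mode, normal/registered/operator hubs, slots), is the DC++-style layout as I know it and isn’t spelled out in this post, so treat the pattern as an assumption; the rule values are made up.

```python
import re

# Assumed DC++-style tag layout; other clients may format their tag differently.
TAG_PATTERN = re.compile(
    r"<(?P<client>\S+) V:(?P<version>[^,>]+),M:(?P<mode>[^,>]+),"
    r"H:(?P<normal>\d+)/(?P<registered>\d+)/(?P<op>\d+),S:(?P<slots>\d+)>"
)


def parse_tag(tag):
    """Return client, version, normal hub count and slots, or None if unparsable."""
    match = TAG_PATTERN.search(tag)
    if match is None:
        return None
    return {
        "client": match.group("client"),
        "version": match.group("version"),
        "normal_hubs": int(match.group("normal")),
        "slots": int(match.group("slots")),
    }


def tag_violations(info, max_hubs, slots_per_hub):
    """Apply a 'maximum hubs' rule and a slot ratio rule to a parsed tag."""
    problems = []
    if info["normal_hubs"] > max_hubs:
        problems.append("too many hubs")
    if info["slots"] < info["normal_hubs"] * slots_per_hub:
        problems.append("slot ratio")
    return problems


info = parse_tag("<++ V:0.699,M:A,H:3/1/0,S:4>")
print(info)
print(tag_violations(info, max_hubs=5, slots_per_hub=1))
```

Only the ‘normal’ hub count is used for the rules here, matching the way the “maximum hubs” rule is usually applied.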