Hub Bandwidth Management

Balancing hub bandwidth between users requires triaging messages such that no individual user can monopolize hub bandwidth. I have repeatedly observed a temptation to implement this by limiting different message types differently. Whilst such an idea fails somewhat even for NMDC, ADC is an open-ended protocol which separates message types from message contents, so there it leads to a more significant design failure.

This post addresses hubs for which the limiting resource is upload bandwidth. Hubs limited by CPU or RAM have separate issues, and those limited by their own download bandwidth are effectively under DoS attack. Within this constraint, the cost to a hub of a user’s message is the upload traffic that message triggers for the hub. Broadcast-type messages should therefore generally dominate hub bandwidth; empirical data bears this out. To a first approximation, then, one can ignore non-broadcast messages, as well as INF, which a hub must handle specially regardless.
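To make the asymmetry concrete, here is a back-of-the-envelope sketch; the function name and numbers are invented purely for illustration:

```cpp
// Illustrative arithmetic only (numbers invented): the upload cost to a
// hub of one message. A broadcast must be copied to every other
// connected user, so its cost scales with hub size; a directed message
// costs a single copy.
#include <cstddef>
#include <iostream>

std::size_t broadcastCost(std::size_t messageBytes, std::size_t userCount) {
    return messageBytes * (userCount - 1); // one copy per other user
}

int main() {
    // A 100-byte SCH on a 1000-user hub costs the hub ~100 KB of upload;
    // the same 100 bytes sent directed costs the hub only 100 bytes.
    std::cout << broadcastCost(100, 1000) << " bytes\n"; // prints 99900
}
```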

Both merely large numbers of users and a smaller number of actively hostile users can strain a hub’s upload bandwidth. Because the former case is subsumed by the latter, and a hub should be able to withstand the latter whilst retaining service, the rest of this post assumes an attack model of actively malicious users only. If a hub can keep itself usable for non-malicious users, in proportion to the bandwidth available per user, even while some portion of users remains hostile, then it can also handle merely large numbers of non-malicious users, rationing resources such that each user has access to a fair amount of bandwidth.

One tactic for selecting which messages to forward towards such an end depends on treating different broadcast messages differently. However, any scheme which does this merely requires an attacker to maximize his damage by spreading it across multiple message types, preferring those relatively least accounted for. For example, if MSG is preferred over RES, over active SCH, over passive SCH, an attacker need only concentrate his attack through MSG as far as other constraints allow, then via RES, then through the SCH variants in order. The net result isn’t necessarily less hub bandwidth usage, just bandwidth usage with different content.

Some messages do occur with different temporal distributions, and a competent hub bandwidth management system should be able to handle those. Such a case plausibly (I don’t have data on this) arises with TTH searches versus filename searches: the former might tend to be more uniformly distributed than the latter, because TTH searches often originate from auto-search. In such circumstances, a hub can instead decide which messages to drop based on a historical moving-average bandwidth measure.
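A minimal sketch of such a moving-average measure, assuming an exponentially decaying per-user rate estimate; the class name, half-life parameterization, and constants are my own invention, not any existing hub’s API:

```cpp
// Hypothetical sketch: an exponentially decaying moving average of the
// upload bandwidth a single user's messages cost the hub. Each sample
// contributes bytes/tau, decaying as exp(-dt/tau), so the estimate
// integrates back to total bytes and reads out in bytes per second.
#include <chrono>
#include <cmath>
#include <cstddef>

class MovingAverageRate {
public:
    explicit MovingAverageRate(double halfLifeSeconds)
        : tau_(halfLifeSeconds / std::log(2.0)) {}

    // Record that this user's traffic just cost the hub `bytes` of upload.
    void add(std::size_t bytes, std::chrono::steady_clock::time_point now) {
        const double dt = std::chrono::duration<double>(now - last_).count();
        rate_ *= std::exp(-dt / tau_);               // decay the old estimate
        rate_ += static_cast<double>(bytes) / tau_;  // fold in the new sample
        last_ = now;
    }

    // Smoothed bytes per second over roughly the chosen half-life.
    double bytesPerSecond(std::chrono::steady_clock::time_point now) const {
        const double dt = std::chrono::duration<double>(now - last_).count();
        return rate_ * std::exp(-dt / tau_);
    }

private:
    double tau_;
    double rate_ = 0.0;
    std::chrono::steady_clock::time_point last_ = std::chrono::steady_clock::now();
};
```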

Only when such a distribution fails to smooth out below a hub’s total available upload bandwidth must a hub escalate from merely delaying or queuing some messages, amortizing them over an overall low average bandwidth, to outright dropping messages. Importantly, precisely the same considerations and arguments apply to any message, SCH or otherwise, given the assumption of a hostile user seeking the most efficient exploit mechanism.
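One plausible shape for that escalation, sketched as an invented token-bucket queue (the names and the one-second burst cap are assumptions, not any hub’s actual design): bursts above the sustainable rate are merely delayed, and messages are dropped only once the queue itself would overflow:

```cpp
// Hypothetical sketch: broadcasts pass through a token bucket sized for
// the hub's sustainable upload rate. Short bursts are merely delayed
// (queued); messages are dropped only when the queue would overflow,
// i.e. when load fails to smooth out below the available bandwidth.
#include <algorithm>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

class BroadcastShaper {
public:
    BroadcastShaper(double bytesPerSecond, std::size_t maxQueuedBytes)
        : rate_(bytesPerSecond), maxQueued_(maxQueuedBytes) {}

    // Returns false when the message must be dropped outright.
    bool submit(std::string message) {
        if (queuedBytes_ + message.size() > maxQueued_)
            return false;                     // sustained overload: drop
        queuedBytes_ += message.size();
        queue_.push_back(std::move(message)); // transient burst: delay
        return true;
    }

    // Called periodically with the seconds elapsed since the last call;
    // releases as many queued messages as the refilled tokens allow.
    template <typename Send>
    void pump(double elapsedSeconds, Send send) {
        tokens_ = std::min(tokens_ + rate_ * elapsedSeconds, rate_); // ~1 s burst cap
        while (!queue_.empty() &&
               tokens_ >= static_cast<double>(queue_.front().size())) {
            tokens_ -= static_cast<double>(queue_.front().size());
            queuedBytes_ -= queue_.front().size();
            send(queue_.front());
            queue_.pop_front();
        }
    }

private:
    double rate_;            // sustainable upload bytes/second
    std::size_t maxQueued_;  // beyond this, delaying becomes dropping
    double tokens_ = 0.0;
    std::size_t queuedBytes_ = 0;
    std::deque<std::string> queue_;
};
```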

SCH might still appear special because it often automatically triggers RES messages. Rather than counting RESes specially, one may simply charge them to the user who actually sends them, rather than attempting to charge the user who sent the search they will often answer. Again, SCH and RES are less unique than they might appear: not only could another such message pair appear in a non-DC++ client, but RESes don’t actually have to come in response to any SCH, even given the search token in ADC. A hub cannot keep track of all searches in progress, some of which clients might take a while to respond to and which thus lie in the somewhat distant past, without maintaining a longer history than might be desirable; and even if it attempted to do so, it might miss searches it was not involved in forwarding from one user to another. In principle, a hub cannot reliably associate searches with search responses, and therefore should charge search responses to the users sending them. Otherwise, once more assuming a hostile adversary, users could just switch to spamming with RESes.

This system, which has been proposed to me at least three separate times by three separate hub developers, contains conceptual flaws that invite its being gamed. Certainly a hub developer or hub owner can respond in arms-race fashion and adjust the relevant heuristics, but that is a suboptimal, unstable outcome.

Instead, a hub should merely account for how much bandwidth any given user’s message, regardless of content but dependent on type (broadcast or non-broadcast, as well as, in ADC, which features it specifies), will consume on broadcast, and charge that user that amount of bandwidth. Each user then has a specified amount of bandwidth available to him, dependent on the number of users on the hub at that time. Whether a message is forwarded, queued, or blocked then depends purely on non-gameable factors: if the dominant cost is upload bandwidth (see the initial assumption), and the hub actually decides whether to forward a message based on upload bandwidth, then the heuristic matches the actual cost and cannot be gamed, regardless of the hostility of users.
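A minimal sketch of that accounting, with invented names throughout: charge each user the actual number of bytes his message causes the hub to upload, give each user an equal share of the hub’s upload budget, and decide purely on those numbers. Because the charge equals the hub’s real cost, switching message types gains an attacker nothing:

```cpp
// Hypothetical sketch: message-agnostic per-user bandwidth accounting.
// A message's charge depends only on its size and on how many
// connections the hub must copy it to, never on what kind of message it
// is, so there is no cheaper message type for an attacker to hide in.
#include <cstddef>
#include <unordered_map>

class FairBroadcastBudget {
public:
    explicit FairBroadcastBudget(double hubUploadBytesPerSecond)
        : hubRate_(hubUploadBytesPerSecond) {}

    // Recompute shares as users join and leave.
    void setUserCount(std::size_t n) { users_ = n; }

    // Each user's fair share of the hub's upload rate, in bytes/second.
    double perUserRate() const {
        return users_ ? hubRate_ / static_cast<double>(users_) : hubRate_;
    }

    // Charge `user` for a message of `bytes` that must be copied to
    // `recipients` connections. Returns false when the user is over his
    // share and the message should be queued or dropped instead.
    bool charge(int user, std::size_t bytes, std::size_t recipients) {
        const double cost = static_cast<double>(bytes * recipients);
        double& spent = spentThisInterval_[user];
        if (spent + cost > perUserRate())
            return false;
        spent += cost;
        return true;
    }

    // Reset once per one-second accounting interval.
    void tick() { spentThisInterval_.clear(); }

private:
    double hubRate_;        // total upload budget, bytes/second
    std::size_t users_ = 0;
    std::unordered_map<int, double> spentThisInterval_;
};
```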

Therefore, instead of flawed message-dependent bandwidth shaping, hubs should aim for a message-agnostic bandwidth management system. Note that this also accommodates unknown messages in ADC, for which my previous, linked blog post argues. The result is a more effective, more robust file-sharing system.

Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”

5 Responses to Hub Bandwidth Management

  1. djoffset says:

    Interesting.

    I just wanted to mention uhub and QuickDC’s built-in hub since these actually prioritize messages in three categories:

    Critical (INF, QUI, STA)
    Normal (all other messages)
    Unimportant (SCH, RES)

    Depending on circumstances, these hubs will drop unimportant messages when the situation is slightly bad (a connection endpoint cannot swallow data fast enough), and even drop normal messages when things are going very badly.
    If critical messages cannot be delivered, the connection is dropped.

  2. cologic says:

    That avoids the worst parts of the system this post describes, yes, by effectively imposing a per-user bandwidth cap using TCP/IP stack feedback. But (1) as you describe it, it doesn’t attempt to prevent a small number of users from monopolizing hub upload bandwidth, but rather notices the effects only when broadcasting to users not responsible for originating the message; and (2) it promotes user behaviour adverse to the hub by encouraging legitimate users to search more than they otherwise might, if they suspect the hub is nondeterministically dropping their searches. Actually, that effect might be worth another blog post.

  3. djoffset says:

    I agree with what you say about (1), and it is perhaps the root cause of (2).
    I think it is absolutely OK for users to expect different results for different searches. On high-volume hubs, users are typically joining and leaving frequently, so after a few minutes it is only fair to assume the search results will be somewhat different.
    However, for clients sending too many searches, we can limit that much as we would implement a blocking behavior for (1).

    I still think prioritizing important messages is better than prioritizing none at all.

  4. cologic says:

    What I’m describing actually leaves your “critical” messages alone; they’re the ones the hub has to process specially regardless. I reject, though, your distinction between “normal” and “unimportant” messages: someone looking to consume hub bandwidth should simply focus on normal-priority messages rather than unimportant ones. Further, a legitimately functioning hub user might actually care more about his searches than his chat messages.

    I agree that searches shouldn’t be expected to return the same results several minutes apart, but the system you describe would return potentially different search results moments apart, creating an incentive simply to retry a search a few seconds later if the first didn’t return the desired results; who knows what the hub did with it, given that it’s “unimportant”?

  5. djoffset says:

    I think we agree here; what we need to do is tackle flood input as it enters the hub.

    Furthermore, I’d like to point out that the behavior I’m talking about in the above-mentioned hub implementations is not to selectively drop messages as the hub pleases. It is done only when strictly needed (though the “need” is configurable).

    I have seen other hubs choose to drop a client completely instead. QuickDC/uhub tries to throttle messages first; only if that doesn’t work out does it drop the client. If the situation resolves, however, bandwidth is saved by not needing to broadcast “quit” and “info” messages for a user who would otherwise be dropped and reconnect.
