Denial of service: I’d like one, please

Responses for searches for the most time is usually non-time consuming or difficult to perform. The techniques may vary from application to application, and depend on whoever wrote the code, what libraries were used etc. While simply a linear search of substrings may be fine, techniques such as regular expressions have made its way to ADC. However, there is a huge problem with regular expressions: they’re CPU consuming. A regular expression that is carefully constructed may, by design of course, force the CPU to spike and force the unsuspecting user’s computer to crash. Basically, anyone that is supporting regular expressions may be victims of a denial of service (DOS) attack.

I just know that some people are thinking “hey, let us restrict the length of the regular expression [hub/client side] and it can’t happen”. Well, that’d be pointless and shortsighted. The length of the command (well, any command for that matter) does not have a direct correlation to its possible damage vector. Additionally, the hub doesn’t really care in this matter. Since it’s as well difficult (impossible?) to parse the expression, evaluate if you really want to perform the expression and be completely guarded against an attacker.

(I’m not aware of any attempts of patching NMDC with regular expressions, but you’d probably end up with the same problem.)

8 Responses to Denial of service: I’d like one, please

  1. quicksilver666 says:

    A simple solution to prevent time Consuming Reg exps would be to forbid the | char … this seems to be the real big cpu consumer..

    Which is quite understandable when we model it in our Mind as NFA ..its simple but modelling it back into a DFA . it becomes quite large..

    Never the less imho any regexp is really too slow.. as using regexp for evaluation needs O(n) time with n the filelist size.

    while all other searches (substring with/without boolean expressions , or Hash) we have in DC in the filelist can be done in O(1) or at least in O(r) with r the result size in any case independent from the filelist size. Which is imho quite important.

  2. quicksilver666 says:

    hmm allthough one can really do nasty things with operators distinguishing explicitly the evaluation …
    like distinguishing between greedy and reluctant quantifiers ..

    again with a very limited number of quantifiers (shouldn’t 2 or 3 be enought).. the next danger is banned..

  3. arnetheduck says:

    one can simply cut the search off if it takes too long…

  4. quicksilver666 says:

    though again DoS is not created by one user but by multiple…

    if several users run searches that might just cost a hundred times of the cpu of a normal search before you cut it …
    well take multiple users like that and you have again your DDoS.

  5. arnetheduck says:

    not if you limit the number of seconds per minute (for example)…sure, there will be searches dropped, but this should work for the vast majority…running the searches in a low prio thread stops those searches that do get through from ruining other computer usage…

  6. fuldc says:

    What DC++ really need is regexp support when openening fillists, just like fuldc !

    It’s extremely usefull!

    spawn a low prio thread that does the sorting … Its not hard to implement …
    If you dont know how, take a peek in the fuldc source! ;)

  7. koninglat says:

    It’s far more effective to just bzip 500 GB of spaces if you want to give soemone a good DoS…

    I think regexp’s aren’t such a good idea to speed up searches: I think it’s far more effective to split the HashIndex.xml in neat memory chunks so you can thread your way through them. Cross-platform and character set regexp’s sound very, very nasty to me ;-)

  8. quicksilver666 says:

    To add a more theoretical remark…

    Theoretically its possible to evaluate most regexp in sublinear time.

    A suffix array data structure containing the filelist could be traversed by a regexp … though I fear thats some functionality the average stdlib has not implemented…

    Probably one has to implement this all by one self… even suffix arrays are not really common for any langauge.. and they are the easyest part…

    Regexp over suffix arrays seem to be common in genetics / gene sequences… may be one could find useable code searching that area…

Leave a comment