Base32 encoding in ADC
June 1, 2011
The ADC protocol specification is well-defined for the most part, but it says very little about how Base32-encoded strings should be handled. Hopefully this post clears up a few things about them.
Base32 encoding is specified in RFC 4648. To convert bytes into a Base32-encoded string, take the first 40 bits (5 bytes) of data, divide them into 8 groups of 5 bits, then convert each group to its character representation using the Base32 lookup table (the letters A-Z followed by the digits 2-7). Repeat this until there is no more data left to encode. If the total number of bits is not a multiple of 40, the final group of 40 bits will be incomplete. In that case, the last 5-bit group that contains any data is padded with 0’s (if needed), and the remaining characters of that 8-character group are set to the padding character (‘=’).
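The padded procedure above can be sketched in Python roughly as follows. This is a minimal illustration written for this post, not code taken from the RFC or from DC++, and the function name b32encode_padded is hypothetical. It packs each 5-byte chunk into a 40-bit integer, emits one character per 5 bits that carry data, and fills the rest of the 8-character group with ‘=’.

```python
# RFC 4648 Base32 alphabet: A-Z followed by 2-7.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def b32encode_padded(data: bytes) -> str:
    """Encode data as padded Base32, one 40-bit (5-byte) group at a time."""
    out = []
    for i in range(0, len(data), 5):
        chunk = data[i:i + 5]
        nbits = len(chunk) * 8
        # Characters that carry data: nbits / 5, rounded up.
        nchars = (nbits + 4) // 5
        # Pack the chunk into a 40-bit integer, zero-filling missing bytes,
        # which also zero-pads the last partial 5-bit group.
        value = int.from_bytes(chunk.ljust(5, b"\x00"), "big")
        for j in range(nchars):
            # Take 5 bits at a time, most significant bits first.
            index = (value >> (35 - 5 * j)) & 0x1F
            out.append(ALPHABET[index])
        # Fill the rest of the 8-character group with the padding character.
        out.append("=" * (8 - nchars))
    return "".join(out)


print(b32encode_padded(b"foobar"))  # MZXW6YTBOI======
```

The expected output matches the RFC 4648 test vector for "foobar".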
The padding character may be omitted, but only if the specification of the standard that refers to the RFC explicitly says so. Without padding, there is no need to process the input in 40-bit chunks: simply take the input 5 bits at a time until the data runs out. If the number of input bits isn’t a multiple of 5, pad the last group with 0’s to make it 5 bits long.
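Without padding, a running bit buffer is all that's needed. The sketch below (the name b32encode_unpadded is hypothetical, written only for illustration) emits a character whenever at least 5 bits are buffered and zero-fills the final partial group. It should give the same result as the padded encoding with the trailing ‘=’ characters removed.

```python
# RFC 4648 Base32 alphabet: A-Z followed by 2-7.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def b32encode_unpadded(data: bytes) -> str:
    """Encode data as Base32 without '=' padding, 5 bits at a time."""
    out = []
    buffer = 0  # bits read from the input but not yet emitted
    nbits = 0   # number of valid bits currently held in buffer
    for byte in data:
        buffer = (buffer << 8) | byte
        nbits += 8
        while nbits >= 5:
            nbits -= 5
            out.append(ALPHABET[(buffer >> nbits) & 0x1F])
        buffer &= (1 << nbits) - 1  # keep only the unemitted bits
    if nbits > 0:
        # Shift the leftover bits up, filling the low bits with 0's.
        out.append(ALPHABET[(buffer << (5 - nbits)) & 0x1F])
    return "".join(out)


print(b32encode_unpadded(b"foobar"))  # MZXW6YTBOI
```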
To convert a Base32-encoded string back to raw bytes, map each character to its 5-bit value (using the Base32 lookup table) to build a binary representation of the string, then take every 8 bits and store them in a byte. If the total number of bits (the string’s length multiplied by 5) isn’t a multiple of 8, the leftover bits at the end are discarded.
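Decoding is the mirror image: feed in 5 bits per character and emit a byte whenever 8 bits are available. In this sketch (b32decode_unpadded is a hypothetical name) any fewer-than-8 bits left at the end are simply dropped, since they are the zero padding the encoder added. It assumes uppercase input and raises KeyError on anything outside the Base32 alphabet.

```python
# RFC 4648 Base32 alphabet: A-Z followed by 2-7.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
# Reverse lookup table: character -> 5-bit value.
DECODE_MAP = {ch: i for i, ch in enumerate(ALPHABET)}


def b32decode_unpadded(text: str) -> bytes:
    """Decode unpadded Base32, discarding leftover bits (< 8) at the end."""
    out = bytearray()
    buffer = 0  # bits read from the string but not yet stored in a byte
    nbits = 0   # number of valid bits currently held in buffer
    for ch in text:
        buffer = (buffer << 5) | DECODE_MAP[ch]  # KeyError on invalid characters
        nbits += 5
        # Each character adds only 5 bits, so at most one byte is ready per step.
        if nbits >= 8:
            nbits -= 8
            out.append((buffer >> nbits) & 0xFF)
            buffer &= (1 << nbits) - 1
    # Whatever remains in buffer (fewer than 8 bits) is discarded.
    return bytes(out)


print(b32decode_unpadded("MZXW6YTBOI"))  # b'foobar'
```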
The only clue in the ADC specification as to whether Base32-encoded strings should be padded is the line ‘base32_character ::= simple_alpha | [2-7]’. Since ‘=’ is not among those characters, padding is implicitly ruled out, but in my mind this does not explicitly state that padding should be omitted; it simply lists which characters may appear and leaves the interpretation up to developers. It’s a small leap, but big enough that it could send people looking through other sources for confirmation.
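One way to read that grammar rule in code is a simple character check. This sketch assumes simple_alpha means the uppercase letters A-Z (matching the RFC 4648 alphabet) and that an encoded string has at least one character; the helper is_adc_base32 is hypothetical. Because ‘=’ is not in the allowed set, any padded string is rejected.

```python
import re

# Assumption: simple_alpha is [A-Z], so base32_character is [A-Z2-7].
# '=' is not allowed, so padded strings never match.
ADC_BASE32 = re.compile(r"[A-Z2-7]+")


def is_adc_base32(text: str) -> bool:
    """Return True if text consists only of ADC base32_character symbols."""
    return ADC_BASE32.fullmatch(text) is not None


print(is_adc_base32("MZXW6YQ"))   # True
print(is_adc_base32("MZXW6YQ="))  # False: padding character present
```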
A quick recap:
When converting from raw bytes to Base32, pad the extra bits with 0’s to complete the final character and omit the padding character(s). When converting from Base32 to raw bytes, discard the extra bits.
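If your language's standard library already implements RFC 4648, the recap can be followed with very little code. Python's base64 module is one such case, and the sketch below (helper names adc_b32encode and adc_b32decode are hypothetical) assumes using it is acceptable for your project. Padding is stripped after encoding and restored before decoding, because base64.b32decode rejects input whose length is not a multiple of 8. For example, 24 bytes (192 bits, the size of a Tiger hash) become 39 characters; decoding those 39 × 5 = 195 bits discards the 3 extra bits.

```python
import base64


def adc_b32encode(data: bytes) -> str:
    # Standard RFC 4648 encoding, then drop the '=' padding.
    return base64.b32encode(data).decode("ascii").rstrip("=")


def adc_b32decode(text: str) -> bytes:
    # Restore '=' so the length is a multiple of 8, as b32decode requires.
    padded = text + "=" * (-len(text) % 8)
    return base64.b32decode(padded)


digest = bytes(range(24))  # stand-in for a 192-bit hash value
encoded = adc_b32encode(digest)
print(len(encoded))  # 39
assert adc_b32decode(encoded) == digest
```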
I pieced this information together from the Base32 specification, the DC++ source code, Googling, guesswork, and trial and error. A few additional footnotes in the ADC protocol specification would go a long way for developers who choose to implement an ADC-compliant application from scratch, without using the DC++ core (which is developed by the author of the ADC protocol).
This post was written by pR0Ps, the author of NetChatLink.