Bitcoin addresses include a checksum. The Base58Check algorithm uses the first four bytes of a hash function as a checksum before applying Base58 encoding.
Ethereum addresses did not include a checksum, but it became apparent later that a checksum was needed. How can you retroactively fit a checksum into an address without breaking backward compatibility? Here’s what Ethereum did in adopting the EIP-55 proposal.
Ethereum addresses are the last 20 bytes (40 hexadecimal characters) of the Keccak-256 hash of the public key. The protocol allowed the letters in hexadecimal addresses to be either upper or lower case. This option provided the wiggle room to retrofit a checksum. You could mix upper and lower case letters deliberately to encode an extra message (i.e. a checksum) on top of the key. This is sorta like stegonogrphy, except that it’s out in the open rather than hidden.
Hexadecimal notation uses 10 digits and 6 letters, and so the probability that a hexadecimal character is a letter is 6/16. On average a string of 40 hexadecimal characters will contain 15 letters. This means you could add 15 bits of information to an Ethereum address, on average, by alternating the case of the letters.
It’s conceivable that an address won’t contain any letters, or will consist entirely as letters. The number of letters in a random string of 40 hexadecimal characters is a binomial random variable with parameters n = 40 and p = 3/8, and this is approximately a normal random variable with mean 15 and standard deviation 3. This says the number of letters in an Ethereum address will be fairly tightly clustered around 15, rarely more than 21 or less than 9.
OK, but how do you take advantage of an average of 15 bits of freedom? You can’t encode a 15-bit number because you might not have 15 bits at your disposal.
Here’s what EIP-55 did. Left-align the address and the Keccek-256 hash of the address (which was itself a hash: there are two hashes going on) and capitalize all letters in the address the align with an address character greater than or equal to 0x8.
As an example, let’s suppose our address is
7341e5e972fc677286384f802f8ef42a5ec5f03b
This address contains 13 letters, which would not be unusual. Now let’s compute its hash.
>>> from Crypto.Hash import keccak >>> kh = keccak.new(digest_bits=256) >>> kh.update(b"7341e5e972fc677286384f802f8ef42a5ec5f03b").hexdigest() 'd8e8fcb225fb835fdb89a5918e736820ec75b3efaf3cbb229f03cdc41bf9f254'
Now we line up the address and its hash.
341e5e972fc677286384f802f8ef42a5ec5f03b d8e8fcb225fb835fdb89a5918e736820ec75b3e...
The characters in the address that line up with a hash character of 0x8 through 0xf are highlighted red. The digits will be left alone, but the red letters will be turned to upper case.
341E5E972fc677286384F802F8ef42a5EC5f03B
Related posts
The post Retrofitting error detection first appeared on John D. Cook.