Archive for January, 2007

Using Puzzles to Make SPAM, and Only SPAM, Expensive To Send

Tuesday, January 23rd, 2007

SPAM, as we know, is a problem. Ultimately, the problem is that SPAM is practically free to send. People have previously proposed schemes where a “micro-charge” is applied to delivering email, e.g., 0.1 cents per message. However, that approach is hard to implement, and, besides, who should get all the money? During a meeting this morning I had an different idea that is simple to implement: Add a “solve this” message to SMTP. Allow me to explain.

Modern mail servers perform a variety of checks on incoming mail, trying to discern SPAM from HAM, i.e., sort the wanted email out from the incoming stream of both good and bad. Unfortunately, a lot of heuristics are involved, and so the result is an educated guess, but a guess all the same. At present, there isn’t much choice but for mail servers to accept a certain amount of SPAM, so that they can be confident that they aren’t rejecting too much HAM.

My idea is when you think that a message might be SPAM, you ask the sender’s email program to solve a puzzle, the difficulty of which is proportional to how likely it is that the message is SPAM. If the mail sender does solve the puzzle, then the message is delivered normally, and we have succeeded in creating an economically measurable cost on the delivery of SPAM, which is produced in a way that consumes resources that spammers have in finite supply, i.e., CPU time. On the other hand, if the mail sender refuses to solve the puzzle, then the receiving mail server can delay, defer or refuse to accept the message, just as they do now (though to be most effective, mail servers should reject such messages — but a transition period would be required). This means that we can use our heuristics to educate the process, and place the vast majority of the computational burden onto spammers, rather than legitimate users and servers. Thus, an economic pressure is created that works to reduce the production rate of SPAM, and, hopefully, this effect is great enough to make SPAM unprofitable.
If, for example, puzzles were chosen so that they required approximately 1 CPU second to solve on a modern computer, then each spam-bot would only be able to send one message per second, regardless of how fast its network connection. This effectively limits the amount of SPAM which can be sent per unit time to a much lower level than at present. Thus, even if SPAM remains profitable, the volume of SPAM would decrease. To maintain the decrease, all that is required is to increase the difficulty of puzzles in line with Moore’s Law.

Note also, that once implemented by some, this scheme introduces a motivation for people to configure their mail servers: (a) to support the puzzle facility; and (b) to minimise their perceived SPAM probabilities, so that they get simpler puzzles to solve, and thus save on CPU resources, and fending complaints from users whose messages get refused or deferred. Therefore it provides a positive social and economic pressure to encourage adoption and good behaviour.
So, all this means we need a class of puzzles which: (a) can be generated rapidly; (b) can be generated so that they require a specified degree of effort; (c) are resistant to replay, caching results or other means of circumvention.

I suggest that a suitable example class would be, given two random binary strings, P and R, find a third string, S, of length <=320 bits, that when appended to P, and subjected to the SHA1 hash function, gives a result with the first len(R) bits that equals the string R. The difficulty of the puzzle is set by specifying the length of R, while security against caching results and replay is by choosing P to be sufficiently long, and sufficiently random: 128 random bits should be sufficient. SHA1 produces a 160 bit random hash for any given input, therefore the search space of each puzzle can be ranged between 2^1 and 2^160, which should be ample for several years to come.
So, my questions are: Is this a new idea? Have I overlooked anything? or does this scheme seem workable? Anyone want to implement a prototype, perhaps by enhancing exim or sendmail?