[Asrg] DNSBL comparison
Daniel Feenberg <feenberg@nber.org> Tue, 08 March 2005 02:18 UTC
Date: Mon, 07 Mar 2005 21:15:52 -0500
From: Daniel Feenberg <feenberg@nber.org>
To: asrg@ietf.org
Message-ID: <Pine.GSO.4.10.10503072109250.14669-100000@nber6.nber.org>

The ASRG has not had too much actual research posted to it in its short
life, and perhaps this is too late to have any effect, but for what it is
worth, here is a quantitative comparison of 16 well known DNSBLs (including
MAPS, which as a subscription service usually escapes examination). I know
many people here will disagree, but I have long felt that the DNSBL is the
long-run best approach to spam suppression, and this test does not
discourage me in that belief.

Quantitative Evaluation of DNSBLs

Here are counts of messages that would be blocked by each of 16 DNSBLs, out
of 86,252 messages to actual users at nber.org during the last week of
February 2005. Messages to non-existent mailboxes are ignored, as they
don't actually inconvenience users. Unlike some seemingly similar charts,
we have queried all the lists for every message, so the consultation order
doesn't affect the result. Lists are queried within a few seconds of mail
receipt.

Table 1
Rejection rates by Blocklist
(higher numbers are better)

   Rejected
     %         #   Blocklist
 65.54    56,529   t1.dnsbl.net.au
 50.23    43,324   blackholes.five-ten-sg.com
 49.19    42,424   sbl-xbl.spamhaus.org
 44.67    38,529   xbl.spamhaus.org
 44.26    38,175   cbl.abuseat.org
 41.87    36,114   dnsbl.sorbs.net
 38.80    33,465   rbl-plus.mail-abuse.org
 35.60    30,710   bl.spamcop.net
 32.21    27,784   unconfirmed.dsbl.org
 31.83    27,457   list.dsbl.org
 31.17    26,885   dsbl.dnsbl.net.au
 24.60    21,216   no-more-funn.moensted.dk
 16.94    14,612   bl.csma.biz
 12.48    10,765   combined-hib.dnsiplists.completewhois.com
 12.04    10,381   dnsbl.njabl.org
  7.89     6,806   l1.spews.dnsbl.sorbs.net

So t1.dnsbl.net.au looks attractive - it blocks 66% of inbound mail,
compared to mail-abuse.org (MAPS) which we currently subscribe to and which
blocks only 39% of incoming mail. But what about false positives?
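
For readers unfamiliar with the mechanics, the per-message lookups behind
Table 1 can be sketched as follows. This is only an illustration of the
usual DNSBL query convention (reverse the IPv4 octets, append the list's
zone, and look for an A record); it is not the Perl code used for the test.
The function names are hypothetical, and treating every lookup failure as
"not listed" is a simplifying assumption.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # raises on a malformed address
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an A record for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # Assumption: any resolution failure (including NXDOMAIN) counts as "not listed".
        return False


def check_all(ip: str, zones: list[str]) -> dict[str, bool]:
    """Query every list for the same address, so consultation order cannot matter."""
    return {zone: is_listed(ip, zone) for zone in zones}
```

Calling check_all() with the connecting address and the 16 zone names from
Table 1 would yield one yes/no answer per list for each message, which is
the shape of data the counts above summarize.
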
We don't have any accurate way of counting incorrectly rejected messages
(there are essentially no complaints), and no way to make users cooperate
in a mass identification, so we decided to take the list of 1,473 persons
invited to our conferences over the last year or so, and check the MX
servers for their addresses. If many of them were blocked, that would be a
red flag indicating that a blocking list was overly enthusiastic. We
realize that some ISPs may use separate servers for incoming and outgoing
mail, so the estimate of blocked servers will be low, but hopefully not
biased among the various DNSBLs.

Our conference participants are Ph.D. economists at universities and
government agencies - we expect that they are less likely than average to
be blacklisted, but they are representative of our most important (to us)
correspondents. These are real people well known to us and with correct
addresses.

Table 2
MX hosts of Actual Correspondents
(lower numbers are better)

 Listed MX hosts
     %      #   List name
  3.18    169   unconfirmed.dsbl.org
  1.32     70   blackholes.five-ten-sg.com
  0.81     43   bl.csma.biz
  0.62     33   no-more-funn.moensted.dk
  0.55     29   l1.spews.dnsbl.sorbs.net
  0.53     28   t1.dnsbl.net.au
  0.38     20   rbl-plus.mail-abuse.org
  0.30     16   dnsbl.sorbs.net
  0.24     13   combined-hib.dnsiplists.completewhois.com
  0.23     12   dnsbl.njabl.org
  0.04      2   list.dsbl.org
  0.04      2   dsbl.dnsbl.net.au
  0.02      1   sbl-xbl.spamhaus.org
  0.00      0   cbl.abuseat.org
  0.00      0   bl.spamcop.net

So 28 (.7%) of our list of participants would be unable to write us if we
use T1 as our blocking list, while MAPS does a bit better - blocking only
20 (.4%) participants. In spite of its controversial reputation, Spamcop
does not seem aggressive in this test, with none of our correspondents
blocked.

Claims by supporters of anti-spam methodologies of very low false positive
rates should be taken with a grain of salt.
Any technique will have a low rate for its developer, but legitimate mail
is much more varied than spam, and casual users are much less proficient at
tuning anti-spam engines. So others will rarely match the near perfect
record nearly all techniques advertise. Furthermore, the denominator of the
error rate will include multiple messages from correspondents whose
messages are correctly accepted, but rejected correspondents presumably
don't write back after being ignored once. This leads to an unrealistically
small quoted error rate. Our measure (which admittedly has other defects)
doesn't have that problem, since each correspondent is counted only once.

There are five lists above with very low false positive rates and all of
these have rejection rates in the 30-50% range. Apparently to get better
spam control we would have to accept a significant number of false
positives. However, Spamhaus looks like a good compromise - blocking 50% of
all mail, but only .02% of good addresses.

On the Spamhaus web page, they suggest that the list should block 65% of
spam. This is consistent with the numbers above if one third of all mail is
good mail.

I have other charts (not included here) showing the effects of all possible
combinations of the 15 lists. It is a lot of data, but in brief - combining
two of the better lists is a lot like taking the higher number from each
DNSBL, and therefore isn't desirable. You might hope it would be the sum -
no such luck. It would protect against a DDOS against one of the DNSBLs,
but that is not a real problem for DNSBL users.

We also have runs where the DNSBLs were consulted several days after the
mail was presented, and the blocking rates are substantially lower. This
was a surprise to us, since the rationale for removing an address is not
obvious.
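
The remark above about combining lists comes down to overlap: a message is
blocked if either list names the sender, so the combined rate is the size
of the union of the two blocked sets, not the sum of the two rates. A
minimal sketch, using invented message IDs rather than the nber.org data,
and assuming (as the text suggests) that the two lists overlap heavily:

```python
def combined_block_rate(blocked_a: set[int], blocked_b: set[int], total: int) -> float:
    """Fraction of all messages blocked when a hit on either list causes rejection."""
    return len(blocked_a | blocked_b) / total


# Hypothetical data: 1,000 messages identified by the integers 0..999.
TOTAL = 1000
list_a = set(range(0, 500))    # list A blocks 50% of messages
list_b = set(range(40, 520))   # list B blocks 48%, mostly the same messages as A

rate_a = len(list_a) / TOTAL
rate_b = len(list_b) / TOTAL
rate_both = combined_block_rate(list_a, list_b, TOTAL)

# With this much overlap the union (0.52) sits just above the better
# single list (0.50) and nowhere near the sum of the two rates (0.98).
print(rate_a, rate_b, rate_both)
```

Only if the two lists caught largely different senders would the combined
rate approach the sum.
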
There is a statistical principle which says that if your detector detects
only a small fraction of events, changes in the observed event rate are
more likely to reflect changes in the detection rate than changes in the
underlying event rate. That would suggest that removing an address just
because it hasn't mailed to a spamtrap lately is probably not justified.

I want to add that we prefer the DNSBL approach to spam control, compared
to content analysis, because we don't feel comfortable dropping messages on
the floor. With a DNSBL it is quite easy to reject a message, and all false
positives will be returned to the sender (rather than disappearing into the
ether). With content analysis, it isn't so easy to reject a message, and we
don't believe that delivery to a spam folder is much help to the user. Of
course, sending a bounce to the (usually) forged return address is out of
the question.

It is also true that content analysis reduces the pressure on ISPs to
discourage outbound spam, and we are loath to reduce that pressure. Various
sender authorization techniques also have that disadvantage.

My thanks to Alex Aminoff for Perl programming and John Reid (of Spamhaus)
for suggesting that we check the lists immediately upon receipt of the
messages rather than waiting several days.

Daniel Feenberg
NBER
feenberg isat nber dotte org
5 March 2005