Re: [Asrg] Unique innovations made to anti-spam system
Richard Clayton <richard@highwayman.com> Mon, 23 January 2006 20:30 UTC
Message-ID: <j97rJ1iUyT1DFAYH@highwayman.com>
Date: Mon, 23 Jan 2006 20:29:08 +0000
To: asrg@ietf.org
From: Richard Clayton <richard@highwayman.com>
Subject: Re: [Asrg] Unique innovations made to anti-spam system
References: <cb84d2fe0601212215i5094f589leef0e29026d5cdcd@mail.gmail.com> <20060122142951.86098.qmail@simone.iecc.com> <cb84d2fe0601220733k592b1e5dn5fedb2035490a403@mail.gmail.com> <1060122182623.ZM26383@candle.brasslantern.com> <cb84d2fe0601221217n60347477i1cc7a4a52a3449a4@mail.gmail.com> <1060122220629.ZM26661@candle.brasslantern.com> <cb84d2fe0601221505r7562a9c4o4ff39785c23386b3@mail.gmail.com>
In-Reply-To: <cb84d2fe0601221505r7562a9c4o4ff39785c23386b3@mail.gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <cb84d2fe0601221505r7562a9c4o4ff39785c23386b3@mail.gmail.com>,
Michael Kaplan <michaelkaplanasrg@gmail.com> writes

> On 1/22/06, Bart Schaefer <schaefer@brasslantern.com> wrote:
>> On Jan 22, 3:17pm, Michael Kaplan wrote:
>> Many reputable businesses send very large volumes of email. If it is
>> economically infeasible for spammers to decode the CAPTCHAs, why do you
>> believe it will be feasible for other businesses?

Is it infeasible? Where is the evidence?

I suggest spammers don't decode CAPTCHAs because they are not yet widely
employed... so there's no point.

As it happens, I think they are missing out unnecessarily... I think the
main difficulty in dealing with CAPTCHAs is more the wide range of
systems offering them, rather than an inherent difficulty in solving
what is on offer today.

I've recently been receiving a lot of C-R response email (the Pharmacy
guys seem to like using my domain for their junk)... and so I have
started looking at how easy it would be to process automatically.

I'm currently corresponding with a handful of C-R users who object to
my responding to the challenges... apparently they don't think I'm
behaving myself in arranging for them to read Pharmacy spam which they
are too lazy to filter for themselves :( One even reported me to my own
abuse@ address!

Anyway, a lot of the C-Rs I am currently receiving merely require 3rd
Grade reading skills and the ability to reply to the email. These could
be trivially automated since there is no perceptible variation in the
text that is presented :(

Some websites provide the challenge as text embedded in the page -- and
that is ever so easy to move to the POST response.

Most websites provide simple images that are trivial to process (there
are several other researchers breaking the trivial ones on a regular
basis, try Google -- at least one of the breakers is selling a service
to soup up your CAPTCHAs using the knowledge they've got from breaking
others).

There was a paper at last year's CEAS showing that the hard part of
breaking text CAPTCHAs was the glyph separation -- after that computers
were better than humans at distinguishing mangled shapes!

Strong CAPTCHAs are currently the exception. However, if I was going to
process a lot of them I think I'd automate as far as possible and then
spend my money in the Third World...

But first don't forget to allow for stupidity -- a large ISP [from which
I have received several hundred C-R emails] has some pretty strong
looking text-based CAPTCHAs... unfortunately they only have 30 of them!
So it's easy to provide a dictionary of responses :-( I'm currently
trying to reverse-engineer how they select amongst the 30 because that
would make it even quicker to respond!

[BTW Kaplan's website has most of this information (though not the story
about the ISP with only 30 images). I also note that his CAPTCHAs are
not text based. I'd need to do some more work to comment as to whether
his stick-figures are genuinely harder to solve. They looked as if they
made some cultural assumptions that might not travel well.]

> On my website I assume that the spammer would spend a tenth of a
> cent to manually decode a CAPTCHA and I demonstrate how this would
> be a crippling expense.

Just to be clear -- the tenth of a cent is the right sort of number.

A primary (grades 1-4) school headmistress in Tamil Nadu (in rural
India) earns about $15 a week and is solidly in the middle classes. The
particular person I was told of (a colleague's relation) owned a nice
home (worth maybe $6000) in a salubrious leafy suburb.

So one could get appropriate skills for about $10 or so a week [labour
rates are higher for towns with broadband]. For a 50 hour week that
means you're paying about 20 cents an hour.

I've never tried solving CAPTCHAs at speed, so I couldn't predict how
fast I could do them for hours on end. But it looks to me that the cost
is definitely going to be in fractions of a cent/solution.

Of course you need to add in the cost of the connectivity and the kit
($100 laptops anybody?) but people who think of CAPTCHAs in terms of the
hourly charging rate of their attorney (or plumber!) are entirely
missing the point.

OK... so let's look at Kaplan's analysis, which initially assumes that
it doesn't matter if all CAPTCHAs are broken for free:

His assumptions are:

  1 The email service provider can filter 95% of spam.
  2 The CAPTCHA is broken 100% of the time.
  3 The spammer has a 3:1 ratio of bogus to real email addresses.
  4 A spammer sends 100 million emails using a valid return address.
  5 Users click on a "This Is Spam" button when spam arrives.

He then shows that the spammer has to send 1600 emails to get one spam
to its destination.

This sum is based on a key assumption which I think is incorrect.

He assumes that the spammer sends 1600 emails, and just 80 get through
the filter. This is not inconsistent with measured values for filters in
the real world. So far so good.

He then assumes that the spammer solves the 80 CAPTCHAs and resends.
This then results in a further attrition of 95% (ie 4 get through) and
only then is it discovered that 3 are bogus addresses and the final 1 is
delivered.

However, this is dumb by the spammer (and/or magical by the filter).

Why does the filter suddenly improve when the email is sent for the
second time (viz: it starts to discard 95% of the email that it approved
earlier)? Or -- same idea but different: why does the spammer send
something that is filterable at the first stage?

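
For anyone who wants to vary the inputs, the labour-rate estimate and
Kaplan's 1600-email sum above can be sketched roughly as follows. This
is not code from Kaplan's website or from this message; the function and
variable names are invented for illustration, and the wage figure counts
labour only (no connectivity or kit).

```python
from fractions import Fraction


def emails_per_delivery(filter_pass, valid_fraction, filter_stages):
    """Emails that must be sent for one spam to reach a real inbox.

    filter_pass    -- fraction of spam surviving one filtering pass
    valid_fraction -- fraction of the spammer's list that is real
    filter_stages  -- number of independent filtering passes assumed
    """
    survival = filter_pass ** filter_stages * valid_fraction
    return 1 / survival


FILTER_PASS = Fraction(5, 100)  # assumption 1: 95% of spam is filtered
VALID = Fraction(1, 4)          # assumption 3: 3:1 bogus to real

# Kaplan's sum: filtered once before the CAPTCHA and once again after.
kaplan_emails = emails_per_delivery(FILTER_PASS, VALID, filter_stages=2)

# Labour cost: about $10 for a 50 hour week.
hourly_wage = Fraction(10) / 50                    # dollars per hour
target_cost = Fraction(1, 1000)                    # $0.001 per CAPTCHA
solves_per_hour = hourly_wage / target_cost
seconds_per_solve = 3600 / solves_per_hour

print(kaplan_emails)                 # 1600
print(float(hourly_wage))            # 0.2
print(solves_per_hour, seconds_per_solve)
```

On these figures a tenth of a cent per solution corresponds to 200
solutions an hour at 20 cents an hour, ie one every 18 seconds.
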
It seems to me that the scheme (which is just filtering and nothing to
do with CAPTCHAs at this point) only ensures that the spammer must send
80 emails to get one delivered (ie: the scheme performs 20x worse than
Kaplan proposes).

Kaplan has a second sum. Assumptions:

  6 A spammer must pay $0.001 per manually solved CAPTCHA.
  7 The spammer wants to successfully deliver one million spam per day.

He then calculates a cost of $80,000/day for getting the one million
spam emails delivered.

However, with the adjustment to the sums that I suggest is more
reasonable [not assuming that the filtering of the two stages is
independent], to deliver one million spams 80 million emails must be
sent and 4 million CAPTCHAs must be solved, costing $4,000/day.

Kaplan multiplies his number by 365 to make it sound even bigger, but
this just obscures things....

... the question is whether the expense of solving the CAPTCHAs can be
afforded by the spammer. Note that sending the emails is essentially
free -- the spammer will use zombies to send the emails via innocent
(insecure) end users, so there are no costs for electricity or bandwidth
to worry about.

There's some consensus around a response rate (up to a couple of years
ago) of about 0.003% for spam (these figures come via journalists from
Laura Betterley and the Iraqi playing cards interviews). So the 1
million delivered maps to about 30 customers a day.

This means that you'd need a profit margin of about $133 per sale to
make spamming worthwhile. That's quite a lot (though if you're selling
fake pills or Rolls Royces then you might still press on).

However, even in Betterley's day there was some filtering and spam
discarding going on -- so we're not comparing like with like. The one
million spams are DELIVERED SPAM -- ie: they have got through the
filters and are sitting there waiting to be opened by the gullible.

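
The two versions of the cost sum can be set side by side in a few lines.
This is only a sketch under the assumptions stated above (5% of spam
passes the filter, one address in four is real, $0.001 per CAPTCHA, a
0.003% response rate on delivered spam); the names are illustrative, and
filter_stages is a made-up knob for whether the filter is counted once
or twice.

```python
from fractions import Fraction

FILTER_PASS = Fraction(5, 100)        # 95% of spam is filtered
VALID = Fraction(1, 4)                # 3:1 bogus to real addresses
COST_PER_CAPTCHA = Fraction(1, 1000)  # $0.001 per manual solve
DELIVERED_PER_DAY = 1_000_000         # spammer's delivery target
RESPONSE_RATE = Fraction(3, 100_000)  # 0.003% of delivered spam


def daily_sums(filter_stages):
    """Return (emails sent, CAPTCHAs solved, dollars spent) per day."""
    survival = FILTER_PASS ** filter_stages * VALID
    sent = DELIVERED_PER_DAY / survival
    # A CAPTCHA is solved for every email that passes the first filter.
    captchas = sent * FILTER_PASS
    return sent, captchas, captchas * COST_PER_CAPTCHA


kaplan = daily_sums(filter_stages=2)     # filter counted twice
corrected = daily_sums(filter_stages=1)  # filter counted once

customers = DELIVERED_PER_DAY * RESPONSE_RATE
margin = corrected[2] / customers        # break-even profit per sale

for label, (sent, captchas, cost) in (("Kaplan", kaplan),
                                      ("corrected", corrected)):
    print(f"{label}: {int(sent):,} sent, "
          f"{int(captchas):,} CAPTCHAs, ${int(cost):,}/day")
print(f"customers/day: {int(customers)}, "
      f"break-even margin: ${float(margin):.2f}")
```

Changing RESPONSE_RATE or VALID shows how sensitive the break-even
margin is to the quality of the spammer's list and message.
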
If we assume the 0.003% came from a time when filters were 50% effective
(ie only about 50% of people had any) then the response rate for
delivered spam doubles and the profit margin necessary drops to about
$67/sale. That's still a lot -- but if the spammer weeds out their list
better (only 25% valid addresses isn't too brilliant) then the required
profit margin would drop again.

Also [and this is key to spammer success], if they improved their
message (hire a Madison Avenue executive to teach them how to make their
advert more compelling) then the abysmal response rate would rise [[for
example, the Iraqi playing cards were 4 times more likely to be ordered
than the spammer's usual fare of pills and toner cartridges]].

BUT this is the spammer working the way the system wants him to. Why on
earth would he do that [even though he may do OK that way with high
profit margin goods, it's still eating into his lifestyle]?

There's a much simpler approach that the clever spammer would take.
Instead of solving CAPTCHAs to send spam, he would solve CAPTCHAs to
acquire a valid sub-address. Once he had this, he would then send as
many different pieces of spam as possible as fast as possible to this
sub-address. He'd advertise pills and mortgages and anatomy enhancers
and lotto winnings and poker sites and... etc. ((or he could just sell
it to fellow spammers and they would send the spam...))

Viz: he'd get more than one email delivered per validated sub-address.

Clearly there are things that could be done to improve end-user software
to counter this, but in the meantime, profitability would be restored.

Bottom line is that I agree that the CAPTCHAs raise spammers' costs, but
I don't agree that they do anything more than freeze out low profit
margin spam (and make the pills more likely to be fake).

Even if challenge-response systems were perfect, I'd not be in favour
because of the damage to innocent third parties.
But they are not (on these assumptions) as effective as claimed :(

Plus of course there are other objections as put forward by others, but
I wanted to concentrate on the economics because I've written about
these before (in the context of proof-of-work schemes)

    http://www.cl.cam.ac.uk/~rnc1/proofwork2.pdf

and most of the analysis carries over just fine.

> Let's assume that over the course of a year Amazon.com emails 10
> million customers. I'll say that 5% of these sub-addresses are
> deactivated without the customers bothering to notify amazon. I'll
> say that it costs Amazon 5 cents to decode a CAPTCHA (fifty times
> as expensive as what I assumed the spammer would have to pay!).

Actually Amazon are experimenting with the Mechanical Turk... so they
might be able to manage Third World rates :)

    http://www.mturk.com

[ah, I see that Bart also spotted the double application of the
filtering stage]

>> Further, I'd dispute that applying two 95%-effective spam filters has
>> a net 99.75% success rate.
>
> Very well

hmm... I think it needs more than that as a reply :(

- --
richard                                                Richard Clayton

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
                                        Benjamin Franklin 11 Nov 1755

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1

iQA/AwUBQ9U8lJoAxkTY1oPiEQLnHACdFqWzT0DPK8AJFjR78jcK2zwoh3EAnj2l
Yg8Crojbjn9/6qgtf+d+q79D
=p+iB
-----END PGP SIGNATURE-----

_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg