At 2:49 AM -0500 3/11/03, Valdis.Kletnieks@vt.edu wrote:
> On Mon, 10 Mar 2003 18:39:57 EST, Kee Hinckley said:
>> I currently have a sample database of 22,000 confirmed spam messages sent to roughly 200 real email accounts. 40% blocked by the country restriction. 4% blocked due to obvious viruses. 14% blocked due to system blacklist. <1% blocked by user blacklists. There's less than three percent overlap between those factors. The
>
> Actually, there's a hidden assumption here that means that there's a lot MORE than 3% overlap. Your 14% system blacklist refers to a blacklist that was tailored thinking "and this list doesn't include anything from .XY because we country-restrict them already".

Yes and no. Country restrictions are per-user, so the blacklist needs to be multi-national. On the other hand, it's generated primarily based on spotting spam that is getting through from fixed sources, and that comes from user feedback. So there is a bias towards countries that people usually receive email from.
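
For a rough sense of scale, the percentages quoted above can be turned into message counts. This is only a back-of-the-envelope sketch: it treats "<1%" as an upper bound of 1%, and it assumes the "less than three percent overlap" is measured against the whole sample, which the post does not actually state.

```python
TOTAL = 22_000  # confirmed spam messages in the sample (from the post)

# Percent of the sample blocked by each factor, as quoted above.
# "<1%" for user blacklists is treated as an upper bound of 1%.
percent_blocked = {
    "country": 40,
    "virus": 4,
    "system_blacklist": 14,
    "user_blacklist": 1,
}

# Assumption: the overlap is at most 3 percentage points of the whole sample.
MAX_OVERLAP_PERCENT = 3

# Message counts per factor (integer arithmetic, no rounding surprises).
counts = {name: TOTAL * pct // 100 for name, pct in percent_blocked.items()}

# If nothing overlapped, the factors would add up to this much of the sample.
naive_percent = sum(percent_blocked.values())

# With up to 3 points of double counting, at least this much is blocked.
min_blocked_percent = naive_percent - MAX_OVERLAP_PERCENT

for name, n in counts.items():
    print(f"{name:18s} {n:6d}")
print(f"blocked overall: between {min_blocked_percent}% and {naive_percent}%")
```

Note that this says nothing about the hidden overlap raised above: if the system blacklist was built only from spam that already got past the country restriction, the measured overlap will understate how much the two would overlap on their own.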

> What percent of mail was tagged with the country restriction but *NOT* tagged as spam by users? (For instance, it would be quite easy to flag all mail from .CN as spam - and although my users would probably tag back 100% of the spam from .CN, they'd not tag 100% of the mail from .CN, as

Initially the false positive rate is a little high, until the user tunes their filters by specifying which countries they regularly get email from. They can also approve a sender even if we are blocking the sender system-wide--user rules win.

They'd okay mail from China, and then possibly have to okay senders from some Chinese ISPs on a per-user basis. (E.g. I'm not sure, but I suspect that 163.net is on our blacklist, even though they are a legit ISP.)
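
The "user rules win" behaviour might look something like the sketch below. Everything here is hypothetical: the names (`UserRules`, `verdict`), the order of the checks after the user's own rules, and the sample blacklist entry are my assumptions for illustration, not a description of the real system. The virus check is left out.

```python
from dataclasses import dataclass, field


@dataclass
class UserRules:
    """Per-user settings (hypothetical structure)."""
    allowed_countries: set = field(default_factory=set)
    approved_senders: set = field(default_factory=set)
    blocked_senders: set = field(default_factory=set)


def verdict(sender, domain, country, rules, system_blacklist):
    """Decide what happens to one message; the user's own rules are checked first."""
    if sender in rules.approved_senders:
        return "deliver"                  # user approval beats every other check
    if sender in rules.blocked_senders:
        return "block:user_blacklist"
    if country not in rules.allowed_countries:
        return "block:country"            # per-user country restriction
    if domain in system_blacklist:
        return "block:system_blacklist"   # system-wide list
    return "deliver"


# Assumed example data: a user who starts out accepting mail only from the US.
system_blacklist = {"163.net"}
rules = UserRules(allowed_countries={"US"})
msg = ("friend@163.net", "163.net", "CN")

first = verdict(*msg, rules, system_blacklist)    # stopped by country restriction
rules.allowed_countries.add("CN")                 # user okays mail from China
second = verdict(*msg, rules, system_blacklist)   # now stopped by system blacklist
rules.approved_senders.add("friend@163.net")      # user okays this one sender
third = verdict(*msg, rules, system_blacklist)    # user rule wins
```

This mirrors the two-step tuning described above: okaying a country is not always enough, and the user may still have to approve individual senders at a blacklisted ISP.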

> Is the "user blacklist" number the percentage caught by pre-established user filters, or is that saying that your other checks were 99% effective in identifying spam and only 1% got through to users for them to report?

Although users can pre-establish a blacklist, they tend not to. Instead we let them blacklist a sender at the time they report a false negative (spam got through). The 1% is the share of subsequent messages blocked by those blacklists.
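
To make it clear what that 1% counts, here is a minimal sketch of a report-time blacklist. The class and method names are made up for illustration; the only behaviour taken from the description above is that a sender gets blacklisted when the user reports a false negative, and that only later messages from that sender count as blocked.

```python
class UserBlacklist:
    """Hypothetical per-user blacklist that is filled in at report time."""

    def __init__(self):
        self.senders = set()
        self.blocked_count = 0  # later messages stopped by this list

    def report_false_negative(self, sender, blacklist_sender=True):
        """User reports spam that got through and may blacklist its sender."""
        if blacklist_sender:
            self.senders.add(sender)

    def check(self, sender):
        """Return True, and count the message, if the sender is blacklisted."""
        if sender in self.senders:
            self.blocked_count += 1
            return True
        return False


bl = UserBlacklist()
got_through = not bl.check("spammer@example.com")    # first message is delivered
bl.report_false_negative("spammer@example.com")      # user reports it
blocked_later = bl.check("spammer@example.com")      # the next one is blocked
```

The message that prompted the report is never counted, only the ones that follow it.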

> Do you have any guesstimates of how much *unreported* spam got through to the 200 accounts?

This gets back to the problem I mentioned earlier on the list. You can't trust users to check the email sitting in their blocked queue.