[Maia-users] PDF spam solutions
Robert LeBlanc
rjl at renaissoft.com
Mon Aug 13 14:41:19 PDT 2007
Like the rest of you, I'm sure, I've been receiving a glut of PDF spam
lately, and I've been experimenting with various tactics for curbing the
onslaught. Some tactics work better than others, naturally, so I
thought I'd share my results here.
(1) SpamAssassin core rules
To deal with PDF spam, the SpamAssassin developers added a new core rule
called TVD_PDF_FINGER01, which identifies emails that have empty bodies
but contain PDF attachments. It works well, but its default score of
1.0 is too low to make it the only tool for the job. Increasing the
score isn't really a good idea, though, since a lot of business users
regularly send PDF attachments with empty mail bodies, and this could
lead to false positives in a hurry.
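If you decide to nudge the score anyway, the override is a one-liner in
local.cf. This is only a sketch of the syntax: the value 1.5 is an
arbitrary illustration, not a figure I've tested or would recommend.

```
# local.cf -- illustrative override only; 1.5 is a placeholder value
score TVD_PDF_FINGER01 1.5
```

Running "spamassassin --lint" afterwards will confirm that the file
still parses cleanly.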
You can certainly get this new rule for any version of SpamAssassin
(newer than 3.1.1) using sa-update, but now that the 3.2.x series
appears to have stabilized I'd also recommend that you upgrade to 3.2.3
to take advantage of the latest rulesets.
(2) PDFInfo plugin
Available from <http://www.rulesemporium.com/plugins.htm>, this plugin
is a step better in that it tries to identify specific PDF spams by
their characteristics--image dimensions, number of images in the file,
image-to-text ratio, filename, and meta-information (e.g. author,
creator, creation/modified date, etc.), as well as fuzzy hashes of the
file itself.
The downside is that it's /too/ specific: every new signature is a new
rule, so you have to download a new version of the pdfinfo.cf file
whenever signatures are added. This makes the plugin very good at
catching PDF spam that's already circulating, but it's not effective at
catching new variants, and updating it is awkward.
(3) PDFText plugin
The PDFText plugin uses the pdftotext and pdfinfo utilities from the xpdf
package to try to extract the text and meta-information from PDF files,
so that they can then be subjected to pattern-based tests for spammy
content. Two versions are currently available:
For SpamAssassin 3.1.x:
<http://www.mail-archive.com/users@spamassassin.apache.org/msg45465.html>
For SpamAssassin 3.2.x:
<http://www.mail-archive.com/users@spamassassin.apache.org/msg45494.html>
Unfortunately this plugin is still a very early alpha--a proof of
concept, really--and needs considerable polishing before it can be
recommended for production use. It also relies on its own wordlist for
scoring, rather than making the extracted text available to the full
battery of SpamAssassin rules, but the author is apparently working on
that, along with experimental support for using GOCR to scan the images
in PDF files.
(4) FuzzyOCR plugin
There's been some discussion about FuzzyOCR's potential role in catching
PDF spam--at least the PDF spam that incorporates images. The plugin's
author is reluctant at best: "actually, I will not try to scan PDFs, the
risk of false positives is too high and PDFs do not have a future for
spammers (in my opinion) as most clients do not display them directly.
Sending PDFs is only a desperate try of spammers to circumvent image
scanners, but I don't think this will be the new "trend", neither do I
think that this kind of spam has any future or success, like image spam
has."
That said, he seems to have relented under the pressure, and some basic
support for this was added recently to the svn version with a lot of
disclaimers ("highly experimental and disabled by default", "Enable this
at your own risk, this might lead to false positives and classify
important documents as spam. YOU HAVE BEEN WARNED.").
Since you need to be using the svn version of FuzzyOCR if you're running
SpamAssassin 3.2.x anyway, you may wish to experiment with the
PDF-scanning support, since it won't cost you any resources you aren't
already spending. If you're /not/ using FuzzyOCR, though, I wouldn't
advise installing it just to solve the PDF spam problem.
(5) Custom rules
Eric A. Hall posted a custom ruleset recently to the SpamAssassin-Users
list that uses the AWL to determine whether the sender of a binary
attachment (major MIME-type of application, image, audio, video, or
model) has sent the recipient mail before. If this is the first email
the recipient has ever received from this sender, and it contains such
an attachment, it gets penalized accordingly for coming from a stranger.
You need to have the MIMEHeader plugin installed, but this is included
by default in the newer SpamAssassin 3.2.x series. The ruleset can be
added easily to your local.cf file:
ifplugin Mail::SpamAssassin::Plugin::MIMEHeader

  # Sub-rules: match the major MIME type of any part of the message
  mimeheader __L_C_TYPE_APP   Content-Type =~ /^application/i
  mimeheader __L_C_TYPE_IMAGE Content-Type =~ /^image/i
  mimeheader __L_C_TYPE_AUDIO Content-Type =~ /^audio/i
  mimeheader __L_C_TYPE_VIDEO Content-Type =~ /^video/i
  mimeheader __L_C_TYPE_MODEL Content-Type =~ /^model/i

  # Each rule fires when the AWL did not hit (sender has no history)
  # and the message carries an attachment of the given type
  meta     L_STRANGER_APP (!AWL && __L_C_TYPE_APP)
  score    L_STRANGER_APP 1.0
  tflags   L_STRANGER_APP noautolearn
  priority L_STRANGER_APP 1001 # defer till after AWL
  describe L_STRANGER_APP Application file sent by a stranger

  meta     L_STRANGER_IMAGE (!AWL && __L_C_TYPE_IMAGE)
  score    L_STRANGER_IMAGE 1.0
  tflags   L_STRANGER_IMAGE noautolearn
  priority L_STRANGER_IMAGE 1001 # defer till after AWL
  describe L_STRANGER_IMAGE Image file sent by a stranger

  meta     L_STRANGER_AUDIO (!AWL && __L_C_TYPE_AUDIO)
  score    L_STRANGER_AUDIO 1.0
  tflags   L_STRANGER_AUDIO noautolearn
  priority L_STRANGER_AUDIO 1001 # defer till after AWL
  describe L_STRANGER_AUDIO Audio file sent by a stranger

  meta     L_STRANGER_VIDEO (!AWL && __L_C_TYPE_VIDEO)
  score    L_STRANGER_VIDEO 1.0
  tflags   L_STRANGER_VIDEO noautolearn
  priority L_STRANGER_VIDEO 1001 # defer till after AWL
  describe L_STRANGER_VIDEO Video file sent by a stranger

  meta     L_STRANGER_MODEL (!AWL && __L_C_TYPE_MODEL)
  score    L_STRANGER_MODEL 1.0
  tflags   L_STRANGER_MODEL noautolearn
  priority L_STRANGER_MODEL 1001 # defer till after AWL
  describe L_STRANGER_MODEL Model file sent by a stranger

endif
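Since the concern here is PDF spam in particular, the same pattern could
be narrowed to PDF attachments. The following is a sketch and not part
of Eric's posted ruleset: the rule names __L_C_TYPE_PDF and
L_STRANGER_PDF are made up for illustration, and the score is a
placeholder to tune locally.

```
ifplugin Mail::SpamAssassin::Plugin::MIMEHeader

  # Hypothetical rule names; not part of the posted ruleset
  mimeheader __L_C_TYPE_PDF Content-Type =~ /^application\/pdf\b/i

  meta     L_STRANGER_PDF (!AWL && __L_C_TYPE_PDF)
  score    L_STRANGER_PDF 1.0   # placeholder value
  tflags   L_STRANGER_PDF noautolearn
  priority L_STRANGER_PDF 1001  # defer till after AWL
  describe L_STRANGER_PDF PDF file sent by a stranger

endif
```

Note that a message matching this rule would also match L_STRANGER_APP,
so the two scores add up. PDFs sent with a generic type such as
application/octet-stream would not match the PDF-specific pattern at
all.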
(6) SaneSecurity signatures
If you use ClamAV (you do, don't you?), another option is to use the
phishing and scam signatures published by SaneSecurity
<http://www.sanesecurity.co.uk/clamav/>. These signatures are updated
multiple times a day and include a lot of PDF spam, making them perhaps
the most responsive solution available at the moment.
These phishing/scam emails get caught by ClamAV rather than
SpamAssassin, so they show up in Maia's "Viruses/Malware" quarantine
instead of the spam quarantine, which is a bit annoying, but that's
something I'll be working to address in future versions.
I can't argue with the effectiveness of SaneSecurity's signatures,
though--they are by far the most effective blockers of PDF spam that
I've found, and I would strongly recommend that you use them.
(7) Other plugins
While rules and plugins that target PDF spam specifically are very
useful, it's worth noting that the bulk of the PDF spam comes from
botnets, so adding the Botnet plugin
<http://people.ucsc.edu/~jrudd/spamassassin/> can catch a lot of these
things on its own, and it provides a nice score supplement to go along
with the PDF-specific rules. The latest version is 0.8, and it just
needs one small patch (courtesy of Mark Martinec):
--- Botnet.pm.orig Mon Aug 6 15:59:16 2007
+++ Botnet.pm Mon Aug 6 16:02:43 2007
@@ -711,5 +711,14 @@
          (defined $max) &&
          ($max =~ /^-?\d+$/) ) {
-      $resolver = Net::DNS::Resolver->new();
+      $resolver = Net::DNS::Resolver->new(
+                        udp_timeout => 5,
+                        tcp_timeout => 5,
+                        retrans => 0,
+                        retry => 1,
+                        persistent_tcp => 0,
+                        persistent_udp => 0,
+                        dnsrch => 0,
+                        defnames => 0,
+                        );
       if ($query = $resolver->search($name, $type)) {
          # found matches
@@ -834,5 +843,14 @@
    my ($ip) = @_;
    my ($query, @answer, $rr);
-   my $resolver = Net::DNS::Resolver->new();
+   my $resolver = Net::DNS::Resolver->new(
+                        udp_timeout => 5,
+                        tcp_timeout => 5,
+                        retrans => 0,
+                        retry => 1,
+                        persistent_tcp => 0,
+                        persistent_udp => 0,
+                        dnsrch => 0,
+                        defnames => 0,
+                        );
    my $name = "";
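To use Botnet as a score supplement rather than as a verdict on its own,
its score could be lowered in local.cf. This is a sketch under two
assumptions: that the plugin's combined rule is named BOTNET, as I
understand it to be in the Botnet.cf distributed with the plugin (check
your copy), and that 2.0 is a sensible starting point, which is an
arbitrary example value rather than a tested one.

```
ifplugin Mail::SpamAssassin::Plugin::Botnet

  # Assumed rule name; verify against the Botnet.cf you installed.
  # 2.0 is an illustrative value only.
  score BOTNET 2.0

endif
```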
-- 
Robert LeBlanc <rjl at renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.maiamailguard.com/>