When BSD meet Linux and Windows: Curbing Image/PDF spam : SpamAssassin

A lot of image/PDF spam has been slipping through my office MXs since this spamming technique became popular, and it was getting really out of hand. I decided to put an end to this madness and experimented with various tactics to curb image/PDF spam. Generally, this can be achieved with spam scoring from SpamAssassin, or with ClamAV using Sanesecurity’s Phishing and Scam Signatures for ClamAV.

In this post, I will share some of the tactics that I have tried with SpamAssassin. With SpamAssassin, fighting image/PDF spam was trivial.

SpamAssassin rules

A) Built-in ruleset

TVD_PDF_FINGER01, which looks for mail matching the standard PDF spam fingerprint (emails that have empty bodies but contain PDF attachments), was added by the SpamAssassin developers. It works well, adding 1.0 point to PDF spam. However, this is too low to catch PDF spam on its own, as the threshold for tagging spam commonly stands at 5.0 to 10.0. Increasing the score is a bad idea, since a lot of lazy users regularly send PDF attachments with empty mail bodies, and this could lead to false positives.
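
To make the arithmetic concrete, here is a minimal shell sketch. It assumes a message is tagged once its rule scores add up to at least the threshold. The helper name would_tag is hypothetical and not part of SpamAssassin; only the 1.0 score and the 5.0 threshold come from the text above.

```shell
#!/bin/sh
# Sketch only: would_tag is a hypothetical helper, not part of SpamAssassin.

# Usage: would_tag THRESHOLD SCORE...
# Succeeds (exit status 0) when the scores add up to at least THRESHOLD.
would_tag() {
    awk 'BEGIN {
        total = 0
        # ARGV[1] is the threshold, ARGV[2..] are the individual rule scores
        for (i = 2; i < ARGC; i++) total += ARGV[i]
        exit !(total >= ARGV[1] + 0)
    }' "$@"
}

# A PDF spam that only hits TVD_PDF_FINGER01 (1.0) stays well below 5.0.
would_tag 5.0 1.0 || echo "TVD_PDF_FINGER01 alone: not tagged"
```

In other words, at a threshold of 5.0 the built-in rule still needs another 4.0 points from other rules before the message is tagged.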

B) Custom rulesets

This one goes to Ditesh, as he wanted to further tighten his server by blocking attachments from strangers. I would suggest using this ruleset with a higher score (blocking outright is not a good idea). This custom ruleset was posted by Eric A. Hall on the SpamAssassin-Users list recently. It uses the AWL (auto-whitelist) to determine whether the sender of a binary attachment is a stranger (image/PDF spammers, of course, are strangers to you ;-)). As the MIMEHeader plugin is included by default in the SpamAssassin 3.2.x series, you can just happily add the ruleset to your local.cf.

ifplugin Mail::SpamAssassin::Plugin::MIMEHeader
mimeheader __L_C_TYPE_APP Content-Type =~ /^application/i
mimeheader __L_C_TYPE_IMAGE Content-Type =~ /^image/i
mimeheader __L_C_TYPE_AUDIO Content-Type =~ /^audio/i
mimeheader __L_C_TYPE_VIDEO Content-Type =~ /^video/i
mimeheader __L_C_TYPE_MODEL Content-Type =~ /^model/i
meta L_STRANGER_APP (!AWL && __L_C_TYPE_APP)
score L_STRANGER_APP 1.0
tflags L_STRANGER_APP noautolearn
priority L_STRANGER_APP 1001 # defer till after AWL
describe L_STRANGER_APP Application file sent by a stranger
meta L_STRANGER_IMAGE (!AWL && __L_C_TYPE_IMAGE)
score L_STRANGER_IMAGE 1.0
tflags L_STRANGER_IMAGE noautolearn
priority L_STRANGER_IMAGE 1001 # defer till after AWL
describe L_STRANGER_IMAGE Image file sent by a stranger
meta L_STRANGER_AUDIO (!AWL && __L_C_TYPE_AUDIO)
score L_STRANGER_AUDIO 1.0
tflags L_STRANGER_AUDIO noautolearn
priority L_STRANGER_AUDIO 1001 # defer till after AWL
describe L_STRANGER_AUDIO Audio file sent by a stranger
meta L_STRANGER_VIDEO (!AWL && __L_C_TYPE_VIDEO)
score L_STRANGER_VIDEO 1.0
tflags L_STRANGER_VIDEO noautolearn
priority L_STRANGER_VIDEO 1001 # defer till after AWL
describe L_STRANGER_VIDEO Video file sent by a stranger
meta L_STRANGER_MODEL (!AWL && __L_C_TYPE_MODEL)
score L_STRANGER_MODEL 1.0
tflags L_STRANGER_MODEL noautolearn
priority L_STRANGER_MODEL 1001 # defer till after AWL
describe L_STRANGER_MODEL Model file sent by a stranger
endif
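
To get a feel for what the __L_C_TYPE_* sub-rules in the ruleset above look at, here is a rough shell sketch that lists the top-level media types found in a saved message. It is only an approximation: attachment_types is a made-up helper name, it greps raw header lines instead of parsing MIME parts the way SpamAssassin does, and it ignores folded headers. The AWL half of each meta rule cannot be reproduced this way.

```shell
#!/bin/sh
# Sketch only: a rough stand-in for the mimeheader tests, not how
# SpamAssassin itself inspects MIME parts. The function name is hypothetical.

# Print the top-level media types (application, image, audio, video, model)
# that appear in Content-Type headers of the message read from stdin.
attachment_types() {
    grep -iE '^Content-Type:[[:space:]]*(application|image|audio|video|model)/' |
        sed -E 's/^[^:]*:[[:space:]]*([A-Za-z]+)\/.*/\1/' |
        tr '[:upper:]' '[:lower:]' |
        sort -u
}

# Example, assuming a message saved as suspect.eml:
#   attachment_types < suspect.eml
```

An empty result means none of the five type sub-rules would be expected to fire, whoever the sender is.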

PDFInfo

Grab PDFInfo.pm and pdfinfo.cf from the PDFInfo plugin site. Place pdfinfo.cf in SpamAssassin’s configuration directory (/usr/local/etc/mail/spamassassin/) and PDFInfo.pm in the SpamAssassin plugin directory (/usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Plugin/). To load the plugin, add loadplugin Mail::SpamAssassin::Plugin::PDFInfo to init.pre (or v310.pre). Alternatively, if you keep PDFInfo.pm in a directory other than the SpamAssassin plugin directory, use loadplugin Mail::SpamAssassin::Plugin::PDFInfo /path/to/your/plugin instead. With that in place, restart SpamAssassin and verify that the PDFInfo plugin was loaded properly by checking SpamAssassin's debug output:

spamassassin --lint -D

You should see lines similar to these:

[32487] dbg: config: read file /usr/local/etc/mail/spamassassin/pdfinfo.cf
[32487] dbg: plugin: loading Mail::SpamAssassin::Plugin::PDFInfo from @INC
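
The two PDFInfo steps above (adding the loadplugin line and checking the debug output) can be scripted. The following is a minimal sketch, not a definitive procedure: the function names ensure_loadplugin and plugin_loaded are hypothetical, and it assumes that init.pre lives in the configuration directory mentioned above and that the debug output goes to stderr.

```shell
#!/bin/sh
# Sketch only: function names and the init.pre location are assumptions.

# Append "loadplugin PLUGIN" to a .pre file unless that exact line exists.
ensure_loadplugin() {
    pre_file=$1
    plugin=$2
    line="loadplugin $plugin"
    # -x matches whole lines only, -F treats the text as a fixed string
    if ! grep -qxF "$line" "$pre_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$pre_file"
    fi
}

# Read lint debug output on stdin; succeed only if PLUGIN was loaded.
plugin_loaded() {
    grep -qF "dbg: plugin: loading $1 from"
}

# Example, run on the mail server:
#   ensure_loadplugin /usr/local/etc/mail/spamassassin/init.pre \
#       Mail::SpamAssassin::Plugin::PDFInfo
#   spamassassin --lint -D 2>&1 |
#       plugin_loaded Mail::SpamAssassin::Plugin::PDFInfo && echo "PDFInfo loaded"
```

Because ensure_loadplugin checks for the line first, running it twice does not leave a duplicate loadplugin entry behind.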

FuzzyOcr

I installed the FuzzyOcr plugin from the FreeBSD ports tree (/usr/ports/mail/p5-FuzzyOcr-devel/). The development version of FuzzyOcr is recommended, as the stable release is way too old, and installing from ports makes it easy to maintain. Manual installation is also relatively easy, as the tarball contains the FuzzyOcr Perl module plugin, configuration files and some sample image/PDF test mails. Just copy FuzzyOcr.cf and FuzzyOcr.words to SpamAssassin’s configuration directory (if you installed from ports, the configuration files are located at /usr/local/share/examples/FuzzyOcr/). I created a directory in /var/db called “fuzzyocr” for all FuzzyOcr databases and the word list. My configuration file looks like this:

focr_enable_image_hashing 2
focr_global_wordlist /var/db/fuzzyocr/FuzzyOcr.words
focr_scansets $gocr -i $pfile, $gocr -l 180 -d 2 -i $pfile, $ocrad -s 0.5 -T 0.5 $pfile
focr_digest_db /var/db/fuzzyocr/FuzzyOcr.hashdb
focr_db_hash /var/db/fuzzyocr/FuzzyOcr.db
focr_db_safe /var/db/fuzzyocr/FuzzyOcr.safe.db
focr_hashing_learn_scanned 1
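
The directory setup behind this configuration could be scripted roughly as follows. This is a sketch under assumptions: setup_fuzzyocr_dir is a hypothetical helper, and it assumes the sample word list is shipped under the name FuzzyOcr.words in the examples directory. The data directory presumably also needs to be writable by the user SpamAssassin runs as, which this sketch does not handle.

```shell
#!/bin/sh
# Sketch only: the function name and its arguments are illustrative.

# Create the FuzzyOcr data directory and seed it with the sample word list.
setup_fuzzyocr_dir() {
    examples_dir=$1   # e.g. /usr/local/share/examples/FuzzyOcr
    data_dir=$2       # e.g. /var/db/fuzzyocr
    mkdir -p "$data_dir" || return 1
    # Leave an existing (possibly customised) word list untouched.
    if [ ! -e "$data_dir/FuzzyOcr.words" ]; then
        cp "$examples_dir/FuzzyOcr.words" "$data_dir/FuzzyOcr.words" || return 1
    fi
}

# Example:
#   setup_fuzzyocr_dir /usr/local/share/examples/FuzzyOcr /var/db/fuzzyocr
```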

Again, verify that the plugin is loaded properly in SpamAssassin.

Other tactics

There are other tactics for fighting image/PDF spam which I have not tried. The ones I’m aware of at the time of writing are PDFText and the Botnet plugin with a patch.

Conclusions

There has been a lot of discussion and experience sharing on the SpamAssassin-users and Maia-users lists. One notable post (titled “[Maia-users] PDF spam solutions”) was made by Robert LeBlanc on the Maia-users list. It is comprehensive enough to give you an edge in fighting image/PDF spam. Nevertheless, new spam tactics evolve day by day. Who knows, we might be seeing M$ Word / PowerPoint spam soon.

Tuesday, November 11, 2008
