Defense Against the Dark Arts: Week 8

This week’s lectures focus on web and email security and were presented by Eric Petersen of McAfee.


  • Spam: Unsolicited email messages sent in bulk.
  • Spamtrap: A honeypot, usually in the form of an email address, that’s used to collect spam.
  • Botnet: A collection of internet-connected devices, each running one or more bots, usually under a single operator's control. Can be used for attacks such as distributed denial of service (DDoS) or for sending spam.
  • Snowshoe spam: A strategy in which spam sending is spread across many domains and IP addresses so that no single source sends enough volume to trip filters.
  • Phishing: Emails sent in hopes of tricking users into revealing personal information.
  • Spear phishing: Phishing that is targeted at particular individuals.
  • Realtime Blackhole List: A list of IP addresses from which spam originates. Useful for blocking.
  • Heuristics: Using common-sense rules drawn from experience.
  • Bayesian logic: Using the knowledge of prior events to predict future events.
  • Fingerprinting: Taking a large item and mapping it to a short string. Like hashing.
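
To make the Bayesian logic entry above concrete, here is a minimal sketch that applies Bayes' rule to a single word. The function name and the message counts are invented for illustration and are not taken from the lecture.

```python
def spam_probability(word_in_spam, total_spam, word_in_ham, total_ham):
    """Estimate P(spam | word) with Bayes' rule from simple message counts.

    word_in_spam / word_in_ham: messages of each class containing the word.
    total_spam / total_ham: total messages of each class (must be > 0).
    """
    if total_spam <= 0 or total_ham <= 0:
        raise ValueError("need at least one spam and one non-spam message")

    total = total_spam + total_ham
    p_spam = total_spam / total            # prior P(spam)
    p_ham = total_ham / total              # prior P(not spam)
    p_word_given_spam = word_in_spam / total_spam
    p_word_given_ham = word_in_ham / total_ham

    # Total probability of seeing the word at all.
    evidence = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    if evidence == 0:
        return p_spam                      # word never seen: fall back to the prior
    return p_word_given_spam * p_spam / evidence


# Hypothetical counts: the word appears in 40 of 100 spam messages
# and in 5 of 400 legitimate messages.
print(round(spam_probability(40, 100, 5, 400), 3))  # 0.889
```

Even though only 20% of the sample is spam, seeing the word raises the estimate to roughly 89%, which is the "prior events predict future events" idea in practice.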

General email classification

There are two main strategies for blocking spam. The first focuses on the reputation of the sender, and the second focuses on the content of the message.


The sender can be evaluated on a variety of metrics, including the IP address and the URL from which the email originated. Once this information is known, it can be compared against lists like the Realtime Blackhole Lists described above. Messages originating from known bad actors are more likely to be spam and can be blocked or filtered.
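
As a rough sketch of how a blackhole-list check usually works: DNS-based lists are conventionally queried by reversing the octets of the sender's IPv4 address and prepending them to the list's zone. The zone `dnsbl.example.org` and both function names below are placeholders, not a real list or anything named in the lecture.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="dnsbl.example.org"):
    """Build the DNS name used to ask a blackhole list about an IPv4 address."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 only")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="dnsbl.example.org"):
    """Return True if the name resolves (listed), False if the lookup fails.

    Needs network access and a real list zone; the default zone is a placeholder.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


print(dnsbl_query_name("192.0.2.99"))  # 99.2.0.192.dnsbl.example.org
```

The same lookup can be done by hand with a DNS tool: an address record in the answer means the IP is listed, and a "no such domain" response means it is not.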


Spam can also be filtered based on the contents of the message. This can range from simple keyword checking (for example, filtering all messages that contain the word “viagra” more than twice) to more complex analysis employing regular expressions, message attributes, or combinations thereof.
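
The keyword example above can be sketched in a few lines. The threshold follows the "more than twice" rule from the text; the combined rule at the end, including its all-caps subject check, is a made-up illustration of mixing content with a message attribute.

```python
import re

# Whole-word, case-insensitive match for the keyword from the example.
KEYWORD = re.compile(r"\bviagra\b", re.IGNORECASE)


def keyword_count(body, pattern=KEYWORD):
    """Count non-overlapping occurrences of the keyword in the body."""
    return len(pattern.findall(body))


def keyword_rule(body, threshold=2):
    """Flag a message whose body contains the keyword more than `threshold` times."""
    return keyword_count(body) > threshold


def combined_rule(subject, body):
    """Hypothetical combination: keyword rule, or a shouting subject plus a link."""
    shouting = subject.isupper()
    has_link = re.search(r"https?://", body, re.IGNORECASE) is not None
    return keyword_rule(body) or (shouting and has_link)


print(keyword_rule("Viagra! Cheap viagra. VIAGRA today."))        # True
print(keyword_rule("Is viagra spam? Yes, viagra is a keyword."))  # False
```

The word-boundary anchors keep the rule from matching inside longer strings, but they also mean simple obfuscations such as "v1agra" slip through, which is one reason more complex patterns are needed.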


Each lecture presents a variety of tools which might be useful in carrying out security research. This week’s tools included the following:

  • DIG: Command-line tool for investigating DNS records.
  • WHOIS: For searching IP/Domain registration information.
  • GREP, SED, AWK, etc.: For data parsing and manipulation.
  • Historical and current reputations based on McAfee data.
  • Authoritative source on reputation data.

Email analysis


SMTP stands for Simple Mail Transfer Protocol. It’s the internet standard for email transmission and uses TCP port 25 by default.


The headers provide information about the email, like who sent it. Unfortunately, SMTP wasn’t designed as a secure protocol, and fields such as the sender of the message are very easy to spoof.
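
A small sketch of reading headers with Python's standard `email` package shows why the sender can't be taken at face value. The sample message, its addresses, and the `sender_summary` helper are all invented for illustration.

```python
from email import message_from_string
from email.utils import parseaddr

# Invented sample message: the visible From claims one domain,
# while the Return-Path points somewhere else.
RAW = """\
Return-Path: <bounce@bulk.example>
Received: from mail.example.net (mail.example.net [203.0.113.7])
\tby mx.example.org with SMTP; Mon, 1 Jan 2018 00:00:00 +0000
From: "Trusted Bank" <support@bank.example>
Subject: Account notice

Please verify your account.
"""


def sender_summary(raw):
    """Pull out sender-related headers and compare their domains."""
    msg = message_from_string(raw)
    _, from_addr = parseaddr(msg.get("From", ""))
    _, return_path = parseaddr(msg.get("Return-Path", ""))
    from_domain = from_addr.rpartition("@")[2].lower()
    return_domain = return_path.rpartition("@")[2].lower()
    return {
        "from": from_addr,
        "return_path": return_path,
        "domains_match": from_domain == return_domain,
        "received_hops": len(msg.get_all("Received", [])),
    }


print(sender_summary(RAW))
```

A mismatch between the two domains is only a signal, not proof of spoofing, since legitimate bulk-mail services also send with a different return path.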


The labs focus on searching through a PostgreSQL database containing 100,000 messages. Our directive was to use regular expressions to search the message text in order to classify each message as spam or not spam. This is similar to the lab from last week in which we were classifying URLs.
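
One way to check whether a regular expression is a good classifier is to run it over a hand-labeled sample and count hits and misses. The sketch below uses an in-memory list in place of database rows; the sample messages, the pattern, and the `evaluate_rule` helper are invented and are not the lab's data or method.

```python
import re


def evaluate_rule(pattern, labeled_messages):
    """Count true/false positives and negatives for one regex rule.

    labeled_messages: iterable of (text, is_spam) pairs.
    """
    rule = re.compile(pattern, re.IGNORECASE)
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for text, is_spam in labeled_messages:
        flagged = rule.search(text) is not None
        if flagged and is_spam:
            counts["tp"] += 1   # spam correctly flagged
        elif flagged:
            counts["fp"] += 1   # legitimate mail wrongly flagged
        elif is_spam:
            counts["fn"] += 1   # spam that slipped through
        else:
            counts["tn"] += 1   # legitimate mail correctly passed
    return counts


# Invented, hand-labeled sample standing in for database rows.
SAMPLE = [
    ("Cheap meds, no prescription needed", True),
    ("You have WON a free cruise", True),
    ("Meeting moved to 3pm", False),
    ("Free lunch in the break room on Friday", False),
]

print(evaluate_rule(r"\bfree\b|no prescription", SAMPLE))
# {'tp': 2, 'fp': 1, 'fn': 0, 'tn': 1}
```

The single false positive here ("Free lunch") shows how a broad keyword catches legitimate mail, which is the trade-off to watch when tightening or loosening a pattern.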

One tool that I find very helpful for writing regular expressions is regexr. I take a sample of what I’m working on, put it in the text box, and then I can see in real-time what my regular expression is matching.


Spam, both printed and digital, is pervasive. Worse, it costs us billions of dollars every year. Figuring out how to stop it is a worthwhile effort.

These efforts are centered around two things: accurate classification, which depends on knowing the data, and automation, which allows us to act on it faster and more efficiently.