As we add complexity to the parsing process, we also need to think about performance. Not every message requires the same degree of analysis. I suggest that the flow looks like this:
- Perform authentication tests first, so that whitelisting can be done accurately.
- If any test triggers a whitelisting result, stop filtering.
- After whitelisting is ruled out, if any test produces a blacklist result, stop filtering.
- After whitelisting and blacklisting are ruled out, if any test produces a quarantine result, stop filtering.
- If all tests complete without producing any of these dispositions, deliver the message.
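The precedence above can be sketched roughly as follows. This is a minimal illustration, not a real implementation: the `Disposition` type and the test-function interface are hypothetical stand-ins.

```python
from enum import Enum, auto

class Disposition(Enum):
    WHITELIST = auto()
    BLACKLIST = auto()
    QUARANTINE = auto()
    DELIVER = auto()

def filter_message(message, auth_tests, content_tests):
    # Phase 1: authentication tests run first so whitelisting is accurate.
    # Each hypothetical test returns a Disposition or None.
    auth_results = [test(message) for test in auth_tests]
    if Disposition.WHITELIST in auth_results:
        return Disposition.DELIVER      # whitelisted: stop filtering, deliver
    if Disposition.BLACKLIST in auth_results:
        return Disposition.BLACKLIST    # blacklisted: stop filtering, reject
    # Phase 2: content tests; the first quarantine result is all we need.
    for test in content_tests:
        if test(message) == Disposition.QUARANTINE:
            return Disposition.QUARANTINE
    # No test triggered a disposition: deliver.
    return Disposition.DELIVER
```

Note that the content tests can short-circuit: once one quarantine result appears, there is no value in running the rest.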
A bit of philosophy relates to this sequence: Content filtering is error-prone, and hostile content indicates a hostile sender who needs to be blocked. So I argue that suspicious content should always be sent to quarantine, so that the false positives can be allowed and the true positives can lead to sender block rules. Blocks should be applied only with certainty, which is based on sender reputation and can be determined early in the filtering process. Collectively, this means that the first quarantine result is the only one that I need.
Back to the appeal of this service:
Traditional content filtering is inefficient because it checks every filter rule against every message. The cost increases linearly with the number of checks being performed, so the feasible number of checks is measured in tens rather than thousands. Cost also increases with the size of the message being checked. This service is interesting because cost is driven by the number of extracted content items, which is limited, not by the number of known threats, which may be very large.
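One way to see the cost difference: once content items are extracted, checking them against a hashed threat set costs time proportional to the number of items, regardless of how many threats are known. The names below are purely illustrative:

```python
# Illustrative only: a large hypothetical set of known-bad indicators.
known_threats = {f"bad-{i}.example" for i in range(100_000)}

def match_extracted_items(items):
    """Hash lookups make the cost O(len(items)), independent of the size
    of known_threats -- unlike rule-by-rule scanning, whose cost grows
    with the number of rules."""
    return [item for item in items if item in known_threats]

hits = match_extracted_items(["good.example", "bad-42.example"])
```

Two extracted items cost two lookups whether the threat set holds a hundred entries or a hundred thousand.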
Nonetheless, it could be a big waste of time to run this full suite of tests on highly trusted messages. Most or all of the threats come from fewer than 10% of all messages (particularly messages from previously unknown senders). Everything related to message filtering improves if you can distinguish prior senders from new senders.
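That prior-sender split can be sketched with a hypothetical sender store; in practice this might be a database keyed by authenticated sender identity, but the shape of the decision is simple:

```python
# Hypothetical store of senders we have already delivered mail from.
prior_senders = {"alice@example.com", "billing@vendor.example"}

def needs_full_suite(sender):
    """New senders get the full (expensive) suite of content tests;
    prior senders can take a cheaper path."""
    return sender not in prior_senders

def record_delivery(sender):
    """After a clean delivery, remember the sender for next time."""
    prior_senders.add(sender)
```

The important precondition is that "sender" here means an authenticated identity; without authentication, a new sender can trivially impersonate a prior one.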