Email Filtering - An Impossible quest?
Problem reported by Douglas Foster - Yesterday at 8:28 PM
Submitted
I am more discouraged than ever before about the challenges of email filtering.   One DNS label can be up to 63 characters, which works out to roughly 10^99 possibilities, close to "one googol" (10^100)!    One googol underneath ".com", another googol underneath ".net", another googol underneath "@outlook.com", another googol underneath "@gmail.com", etc.   Infrastructure providers like Sendgrid.net and Appriver.com provide more ways for attackers to catch us by surprise.  At some point the whole notion of a block list falls apart because there are more addresses to list than any list can hold.   Maybe someday the automated generation of DNS domains breaks DNS itself.
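As a rough check on that scale, here is a small sketch of the arithmetic. It assumes hostname-style labels built from 37 symbols (letters, digits, hyphen, case-insensitive) and the 63-character DNS label limit; the alphabet size is an assumption for illustration, and the point is only the order of magnitude.

```python
# Rough count of possible hostname-style DNS labels.
# Assumptions: 37 usable symbols (a-z, 0-9, hyphen), case-insensitive,
# label lengths 1..63. The rule that a label may not begin or end with
# a hyphen is ignored, so this slightly overcounts.
ALPHABET_SIZE = 37
MAX_LABEL_LENGTH = 63


def count_labels(alphabet_size: int = ALPHABET_SIZE,
                 max_length: int = MAX_LABEL_LENGTH) -> int:
    """Total number of strings of length 1..max_length over the alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


total = count_labels()
digits = len(str(total))  # number of decimal digits in the count
print(f"possible labels: a number with {digits} decimal digits")
```

Under these assumptions the count lands within a couple of orders of magnitude of a googol, for every single parent domain.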

I might add that country blocking is not as useful as I hoped.   I just finished matching our inbound mail to the MaxMind database, and the results were shocking.   We are not a multi-national business, but we receive email from servers all over the world.   I never know when Outlook.com will decide to contact me from Japan, so can I block Japan if the 3 messages received so far were unimportant?

And there is the capacity planning problem:   What happens if your inbound message volume increases 10-fold this year?   Nothing good.   Who determines if that happens?   The spammers, not you. 

I am weary of pretending that it is sufficient to block attackers after the first attacks sneak through my filters.   I am weary of pretending that I can protect my organization by hoping that someone else will be the first victim and that I will be protected by subscribing to the right RBL or the right filtering company.

The only feasible way to prevent infection is to have a short list of trusted senders, and to quarantine or block everything else.   But evaluating a big quarantine is a big labor cost, and my mind was breaking today as I was working through our collection of new quarantine items.

My thoughts about spam filtering are not about features right now.   I have built a good set of features (because I could not buy them), and I admit to pride of authorship for having done so.  But today it is about feasibility.   I have started grappling with the idea that I am trying to do the impossible.  Some spam gets through my filters every day.  Fortunately, my user base is pretty savvy.  But sooner or later, the spammers will win and I will lose, because my security strategy for email is to trust agents that are not vetted as being trustworthy.  From a security viewpoint that posture is indefensible, except that everybody does it.
Brother, I hear you.  I am so irritated recently, and it's not just the spammers, but also clients that ultimately shoot themselves in the foot with Google and Gmail.
We have our server set to block IP Addresses that send too many emails to accounts that don't exist on our server - "Harvesting". 
I constantly get users complaining that they are not getting emails from Gmail. So I try to figure out what IP addresses Gmail is using to send and see if "it" is blocked... well, it is not an "it" IP address. This is the current list of all of the IP address RANGES that Google uses.
I simply can't keep up. This list represents tens of millions of IP addresses that Google uses.
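For what it is worth, the "is this sender inside one of the published ranges" check can be scripted rather than done by eye. Below is a minimal sketch using Python's standard ipaddress module. The ranges shown are reserved documentation blocks used as placeholders, not Google's real ranges; those would have to be loaded from whatever list the provider currently publishes, and the function name is made up for illustration.

```python
import ipaddress

# Placeholder ranges only (reserved documentation blocks),
# NOT any provider's real sending ranges.
PROVIDER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def in_provider_ranges(ip_text, ranges=PROVIDER_RANGES):
    """Return True if the address falls inside any of the listed ranges."""
    ip = ipaddress.ip_address(ip_text)
    # Compare only against networks of the same IP version.
    return any(ip.version == net.version and ip in net for net in ranges)


print(in_provider_ranges("192.0.2.55"))
print(in_provider_ranges("203.0.113.9"))
```

A check like this could run before a harvesting ban is applied, so that a ban on a large shared provider is at least a deliberate decision.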

What has been happening is a domain admin deleted the email address of a person that was with the company for 25+ years. So there are thousands of people that have this person's email address, and the address was also part of multiple legitimate email lists from Google Groups.
So now that email address does not exist on our servers, BUT all of these people and email lists still try to send emails to that person. SOOOOO many from Gmail that Gmail IP addresses start getting banned by our server. And that ban is server wide, not just a single domain.

So the 30 or so non profits that we work with and provide hosting for are now not getting emails from various state agencies that use Google Groups as their mailing lists. The non profits essentially banned their own inbound email because they killed off previously heavily used email addresses, and our server now counts all of the legitimate inbound email as harvesting and bans the sender.
So what did the non profits do to get around this? Now they created email addresses on Gmail - "UserName.Agency@gmail.com" - and are having those email addresses forward email to our servers that have a matching username.
It just increased the traffic load from Google.

It seems like if I want my users to get email from Google / Gmail / Google Groups, my only recourse is to disable that "feature" of banning spammers that send to email addresses that do not exist. I don't know how else to fix this.
www.HawaiianHope.org - Providing technology services to non profit organizations, low income families, homeless shelters, clean and sober houses and prisoner reentry programs. Since 2015, We have refurbished over 11,000 Computers !
Douglas Foster Replied
A lot of things have been broken because Internet usage is unmetered.   When attackers have universal connectivity and essentially zero marginal cost, what outcome would you expect?    It shifts tremendous costs to recipient organizations, and because those defenses are inadequate, it leads to even higher costs as a result of breaches.   A similar problem has been created for voice telephony, because the free internet has made international calling free for attackers, and knowing the true source of a call has become proportionately difficult.

But it seems worth elaborating on the problem by pointing out obstacles in the defense process.   The first obstacle is obtaining an effective and efficient message review tool, so that malicious messages can be identified.   The second obstacle is interpreting a questionable message to determine if it is acceptable, unwanted, or malicious.   In some cases, the appropriate response is an unsubscribe, other times a block, and sometimes a block with an abuse report to the attacker's infrastructure provider.  Once a decision is made, the filtering rules need to be updated to make an acceptable message allowed next time, or to make an unacceptable message blocked next time.   All of that decision-making is labor-intensive.  I don't trust the "trust us" vendors, and the assemble-it-yourself toolkits seem to stop well short of a complete workflow.

But as a starting point for anyone who wants to build a better workflow, here is my description of the "whom to block" problem:

 
Spam Filtering – Deciding what to block
Assume you have one or more confirmed-hostile messages, so something needs to be blocked.   How do you decide what identifiers should be blocked?    Identifiers can be treated as a hierarchy, in descending order:
  • Reply-To address(es)
  • Message From address
  • SMTP Mail From address
  • Helo server name
  • Reverse DNS server name
  • IP address

Identify Infrastructure identifiers

One identifier will represent the entity primarily responsible for the message.  Identifiers above that entity will either be impersonations or identifiers controlled by the attacker.  Identifiers below that entity will be infrastructure providers that offer services to many clients.  Infrastructure services are expected to have a mix of acceptable and unacceptable clients, so they are generally not punished for misbehavior by a single client.   Exceptions to this rule will be discussed later.
Working from the top down, assess each identifier role, to evaluate whether it is an infrastructure identifier or an entity responsible for the message.  Infrastructure identifiers include:
  • Internet Service Providers that assign IP addresses to client organizations
  • Bare-metal hosting services that allocate their assigned IP addresses to their clients.   IP allocations tend to be shared or fluid, so an IP address may not reliably indicate the hosting service’s client organization.
  • Email hosting services that serve multiple clients on multiple domains.
  • Email service providers (ESPs) that send messages on behalf of clients (Sendgrid.Net, ConstantContact.com).  The ESP domain is indicated in the SMTP Mail From address, while the message From address indicates the client.
  • Mailbox providers (Gmail.com, Hotmail.com)
  • Outbound gateway service providers (Mimecast.com, ProofPoint.com)
(Note that Outlook.com falls into most of these categories)
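
The top-down walk described above can be sketched roughly as follows. This is one reading of the procedure, not a definitive implementation: the role names, the hand-maintained infrastructure set, and the function names are all assumptions for illustration. Mailbox providers are left out of the set here because, for them, the full address rather than the domain identifies the sender.

```python
# Identifier roles in the descending order of the hierarchy.
HIERARCHY = ["reply_to", "message_from", "smtp_mail_from",
             "helo", "reverse_dns", "ip"]

# Hypothetical, hand-maintained set of infrastructure domains.
INFRASTRUCTURE_DOMAINS = {"sendgrid.net", "constantcontact.com",
                          "hosting.example"}


def domain_of(identifier):
    """Domain part of an address, or the host name itself, lowercased."""
    return identifier.rsplit("@", 1)[-1].lower().rstrip(".")


def is_infrastructure(identifier, infra=INFRASTRUCTURE_DOMAINS):
    """True if the identifier's domain is, or sits under, an infrastructure domain."""
    d = domain_of(identifier)
    return any(d == i or d.endswith("." + i) for i in infra)


def find_responsible(identifiers, impersonated=frozenset(),
                     infra=INFRASTRUCTURE_DOMAINS):
    """Walk top down; return the last non-impersonated identifier seen
    before the first infrastructure identifier, or None if the sender
    stays anonymous."""
    responsible = None
    for role in HIERARCHY:
        value = identifiers.get(role)
        if not value or role in impersonated:
            continue  # missing or impersonated identifiers are ignored
        if is_infrastructure(value, infra):
            break  # this identifier and those below it are infrastructure
        responsible = (role, value)
    return responsible


message = {
    "reply_to": "offers@client.example",
    "message_from": "news@client.example",
    "smtp_mail_from": "bounce@em123.sendgrid.net",
    "helo": "o1.sendgrid.net",
    "reverse_dns": "o1.sendgrid.net",
    "ip": "192.0.2.10",
}
print(find_responsible(message))
```

For this made-up ESP message, the walk stops at the SMTP Mail From address, leaving the message From address as the responsible identifier.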

Assess Impersonation

Some identifiers within a malicious message may be impersonations.  These are ignored, as they are neither responsible for the message nor an infrastructure provider for the message.
Some spammers use a bare-metal hosting provider that does not restrict outbound email.  That allows them to impersonate both the message From address and the SMTP Mail From.  If the server names represent the hosting service, the attacker remains completely anonymous.
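Server names may indicate the hosting service, the SMTP organization, or obfuscation.   Forward-confirmed reverse DNS (the reverse DNS name resolves back to the connecting IP address) can rule out obfuscation, while names that do not fall under any suffix on the Public Suffix List (PSL) are confirmed obfuscation.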
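
A forward-confirmed reverse DNS check can be sketched with Python's standard socket module. This is a minimal illustration under assumptions: the function names are made up, the resolver functions are passed in as parameters so the logic can be exercised without network access, and a production filter would also need timeouts and caching.

```python
import ipaddress
import socket


def reverse_names(ip):
    """Names from the PTR lookup for an IP address; empty on failure."""
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)
    except OSError:
        return []
    return [name, *aliases]


def forward_addresses(name):
    """Address strings that a host name resolves to; empty on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def _normalize(addr):
    # Drop any IPv6 zone suffix, then parse so textual variants compare equal.
    return ipaddress.ip_address(addr.split("%", 1)[0])


def forward_confirmed_names(ip, reverse=reverse_names, forward=forward_addresses):
    """Reverse DNS names for ip that resolve forward to the same address."""
    target = _normalize(ip)
    confirmed = []
    for name in reverse(ip):
        if any(_normalize(a) == target for a in forward(name)):
            confirmed.append(name)
    return confirmed
```

An empty result means the server name could not be confirmed, so it should be treated as possible obfuscation rather than as evidence of who operates the server.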

Considerations when an email address needs to be blocked:

  • You need to determine whether the domain is a mailbox provider or not.  If it is a mailbox provider, you block the full email address.  Otherwise, you block the domain name.
  • In almost every case, a domain name should be blocked at the organizational domain level, because you have determined that the domain is controlled by malicious actors.    For example, user@mail.badguys.com is blocked using “badguys.com”, not “mail.badguys.com”.
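
The two considerations above can be sketched as a small routine that picks what to put on the block list. The mailbox-provider set and the tiny suffix set are stand-ins for illustration only; a real filter would need a maintained provider list and the full Public Suffix List, and the function names are assumptions.

```python
# Illustrative stand-ins, not complete lists.
MAILBOX_PROVIDERS = {"gmail.com", "hotmail.com", "outlook.com"}
PUBLIC_SUFFIXES = {"com", "net", "org", "co.uk"}


def organizational_domain(domain, suffixes=PUBLIC_SUFFIXES):
    """Longest matching public suffix plus one label, or None if unknown."""
    labels = domain.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # i = 0 tries the longest candidate first
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the domain is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched


def block_key(address):
    """Full address for mailbox providers, organizational domain otherwise."""
    domain = address.rsplit("@", 1)[-1]
    org = organizational_domain(domain)
    if org in MAILBOX_PROVIDERS:
        # Lowercasing the local part is a simplification for matching.
        return address.lower()
    return org or domain.lower()


print(block_key("user@mail.badguys.com"))
print(block_key("Spammer@gmail.com"))
```

Blocking at the organizational domain means the attacker cannot dodge the rule simply by moving to a new subdomain.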

Considerations when the server needs to be blocked:

  • You need to determine if the server name represents the attacker or a hosting service with a malicious actor.   If the server name represents the malicious actor, block the server organizational domain as well as the IP address.  If the server names represent a hosting service, block the IP address and send an abuse report to the hosting service.
  • In the rare case that the hosting service is determined to be fully untrusted, block the hosting service by organizational domain and known IP address ranges.
  • If the hosting service is determined to be legitimate but too frequently used by malicious clients, configure messages from that server organizational domain to be quarantined by default.  Then as messages arrive from that company’s clients, create rules to allow acceptable clients and block unacceptable ones.
  • MaxMind.com and IPInfo.io can be useful for determining IP address ownership and IP address blocks assigned to a single company.
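
The server-side decisions above can be summarized as a lookup from the assessment to a list of actions. This is a sketch only: the category names and action labels are invented for illustration, and deciding which category a server belongs to is still the labor-intensive human step.

```python
from enum import Enum


class ServerOwner(Enum):
    """Who the server name was judged to represent (hypothetical labels)."""
    ATTACKER = "attacker"
    HOSTING_LEGITIMATE = "hosting_legitimate"
    HOSTING_OVERUSED = "hosting_overused"    # legitimate, but too many bad clients
    HOSTING_UNTRUSTED = "hosting_untrusted"  # the rare fully untrusted host


def server_actions(owner, ip, server_org_domain, known_ranges=()):
    """Return (action, target) pairs for a confirmed-hostile message."""
    if owner is ServerOwner.ATTACKER:
        return [("block_domain", server_org_domain), ("block_ip", ip)]
    if owner is ServerOwner.HOSTING_UNTRUSTED:
        return ([("block_domain", server_org_domain)]
                + [("block_range", r) for r in known_ranges])
    if owner is ServerOwner.HOSTING_OVERUSED:
        # Later rules then allow acceptable clients and block the rest.
        return [("quarantine_by_default", server_org_domain)]
    # Legitimate hosting service with one malicious client.
    return [("block_ip", ip), ("abuse_report", server_org_domain)]


print(server_actions(ServerOwner.HOSTING_LEGITIMATE,
                     "192.0.2.10", "hosting.example"))
```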
