Additional protection with an expanding CSS dataset

Posted on: November 02, 2022
Author: Spamhaus Technology Team
Read time: 3 mins

Spam DNS Blocklist Data Query Service API

In this guide

Introduction
Meet Robert
Are these “bad guys” really bad?
Why the “slowly but surely” approach?
We’d like you to share your CSS-related observations
You spoke. We stopped listing. Now we are back on!

Introduction

As of Wednesday, November 9th, the eagle-eyed amongst you will notice the CSS dataset starting to swell. The growth will be imperceptible at first, but slowly and surely, we anticipate the addition of 1.5 million listings over the next 4-6 months; that’s approximately a 100% increase! The goal? Increased protection and insight for all users of this dataset. Whether you use it to filter email via the Data Query Service (DQS) or for intelligence on IPs through the Spamhaus Intelligence API, you will benefit. And ultimately, so will the broader internet community.
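For readers who filter email through DQS, here is a minimal sketch of how a DNS blocklist lookup name is typically built: the IPv4 octets are reversed and prepended to the blocklist zone. The zone layout, the placeholder key, and the CSS return code used here are assumptions not stated in this article, so check them against the current Spamhaus documentation before relying on them.

```python
import ipaddress

# Assumptions (not from this article): the DQS zone layout and the CSS
# return code. Verify both against the current Spamhaus documentation.
DQS_KEY = "YOUR_DQS_KEY"  # hypothetical placeholder, not a real key
ZONE = f"{DQS_KEY}.zen.dq.spamhaus.net"
CSS_RETURN_CODE = "127.0.0.3"


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_css_listed(answers: list[str]) -> bool:
    """Return True if any A-record answer matches the assumed CSS code."""
    return CSS_RETURN_CODE in answers


# 192.0.2.10 is a documentation-only address (RFC 5737).
print(dnsbl_query_name("192.0.2.10", ZONE))
```

The sketch deliberately stops short of sending the DNS query; the resulting name can be passed to whichever resolver your mail filter already uses, and the returned A records fed to `is_css_listed`.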

Meet Robert

I could bore you with all the blurb about “continuous improvement,” but you know that Spamhaus researchers aren’t going to rest on their laurels while miscreants constantly change their modus operandi to evade detection. So, I’ll quickly move on and introduce you to Robert.

Robert is one of the Spamhaus Project’s data scientists; boy, he “gets” data. Recently, Robert’s been beavering away and has identified additional areas of malicious activity. How? Good question – but not one we’ll be answering. After all, we don’t want to tell the bad guys how we identify them!

Are these “bad guys” really bad?

We know that some individuals and organizations get listed due to naivety relating to domain and IP reputation. For example, some domain owners don’t realize that if you unwittingly host your domain on infrastructure shared with cybercriminals, you may suddenly find yourself listed due to the mess they’re making of your shared IP space.

However, almost all this new intelligence is focused on those who are purposefully abusing the internet. The Project’s researchers will list IP addresses spewing out spam, not because of a compromised device or a proxy but because of outright malevolent behavior.

Why the “slowly but surely” approach?

The research team will introduce these listings at approximately 50,000 per day. Understandably, some of you may still be wondering why we’re going with a “slowly but surely” methodology. If we’re seeing badness, why not list it all immediately?

The answer is that it isn’t always possible to test IP and domain reputation to the point of being certain that no false positive will arise. Of course, due diligence is undertaken, and the analysis, signals, and rules used to curate these listings are tested, checked, and retested. But threat hunting isn’t black and white, and until the data is used in the wild, researchers can’t be 100% sure.

Therefore, it’s vital to release in small bursts, monitor, and then continue with the next release. On the occasions when the Project’s threat hunters and researchers haven’t observed this process, the feedback from the wider internet community has been a loud and resounding “Please, don’t do that” (or sometimes language with a little more color to it!).
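As a rough sanity check on the figures quoted above, the short calculation below works out how many release days the stated daily rate implies. The article does not say how release days are spaced, so the 30-day month and the idea that the remaining days go to monitoring are assumptions made purely for illustration.

```python
TOTAL_NEW_LISTINGS = 1_500_000  # from the article
LISTINGS_PER_DAY = 50_000       # from the article
WINDOW_MONTHS = (4, 6)          # from the article
DAYS_PER_MONTH = 30             # assumption: rough month length

# Number of days on which a full daily batch would be released.
release_days = TOTAL_NEW_LISTINGS // LISTINGS_PER_DAY


def release_share(window_months: int) -> float:
    """Fraction of the window's days that would carry a release."""
    return release_days / (window_months * DAYS_PER_MONTH)


print(release_days)  # 30
for months in WINDOW_MONTHS:
    # Prints "4 months: 25%" and "6 months: 17%"
    print(f"{months} months: {release_share(months):.0%}")
```

Under these assumptions, only about a sixth to a quarter of the days in the window would need to carry a release, which is consistent with a "release, monitor, continue" cadence rather than an uninterrupted daily feed.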

You spoke. We stopped listing. Now we are back on!

Following the introduction of the long-term behaviour rules to the CSS list, as promised, we started introducing new listings…slowly. However, as we reached approximately 1.4 million listings, some of you reached out (thank you!) to say, “hey, we are observing a few false positives”. In early February, we stopped.

Over the next month, our data scientist wizard, Robert, meticulously reviewed, researched, and refined the long-term behaviour rules to eradicate the false positives you were seeing. Today, the rules are back on, listing over 900,000 IPs and increasing daily!

Keep sharing your CSS list experiences – your insight is invaluable!
