WHOIS: identification or correlation?

Posted by Carel Bitter on 7 Dec 2023

Recently, an industry peer pointed out that WHOIS data made it possible to uncover a large cluster of domains. The domains were used for a fake URL-shortener scheme and a massive SMS phishing operation, known as Prolific Puma. Of course, this particular method of correlation is not new. Since the arrival of GDPR, however, the technique has lost much of its power because registries now redact ownership records. And this is why she mentioned it: WHOIS correlation is becoming so rare that any success deserves mention.

WHOIS correlation: A success story

Let’s take a deeper dive into the specifics of this case. The original research from Infoblox on Prolific Puma highlights a powerful case of correlating a large number of malicious domains via WHOIS domain owner records. Unfortunately, this is far less common these days.

In this particular case, the choice of TLD by the Prolific Puma operator definitely helped. The domains were all registered under the .us TLD, in theory the official TLD for the United States. Two things set .us apart from many other TLDs. First, its policy forbids WHOIS proxy services, meaning whatever registrant info is on file will appear in the public record. Second, and often overlooked but almost equally important in a case involving thousands of domain names, the data is reasonably accessible for research: the WHOIS service has usable rate limits and responds quickly with the data you want.

Why mention this? Because this certainly isn’t the case for every TLD or registrar that maintains and provides thick WHOIS data.
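To make "accessible for research" concrete, here is a minimal sketch of what bulk WHOIS collection can look like. It assumes a plain WHOIS service on TCP port 43 (as described in RFC 3912) that returns "Key: value" lines. The function names, the one-second delay and the server hostname you pass in are our own assumptions for illustration, not any registry's documented limits or interface.

```python
import socket
import time


def parse_whois(text):
    """Parse 'Key: value' lines into a dict of lists, with lower-cased keys."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        # Skip comments, notices and fields that carry no value.
        if not sep or not key or not value or key.startswith(("%", "#", ">")):
            continue
        record.setdefault(key, []).append(value)
    return record


def query_whois(domain, server, timeout=10.0):
    """Send one WHOIS query: the domain followed by CRLF, over TCP port 43."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def lookup_many(domains, server, delay_seconds=1.0):
    """Query domains one by one, pausing between requests (assumed rate limit)."""
    results = {}
    for i, domain in enumerate(domains):
        if i:
            time.sleep(delay_seconds)
        results[domain] = parse_whois(query_whois(domain, server))
    return results
```

With a responsive server and sane rate limits, a loop like `lookup_many` gets through thousands of domains in hours. When responses are slow, throttled or redacted, the same loop becomes impractical, which is the point made above.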

Using WHOIS data for correlation

When talking about WHOIS, policy debate typically focuses on identification: GDPR and the privacy implications of publishing ownership data. The fact that this same data allows for large-scale correlation regrettably receives much less airtime.

When researching cybercrime, the ownership data of malicious domain names is often fake (made up) or stolen (the owner may exist, but they did not register that specific domain). While there is attribution value in some of the data, the real value is in the correlation or clustering that WHOIS data can fuel. Once you can achieve this at scale, preventive left-of-bang action becomes a reality for most types of online crime that rely on multiple domain names.
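As a rough illustration of why fake data still correlates, the sketch below groups domains that share any registrant value, even a made-up one. The record layout and field names (`registrant_name`, `registrant_email`, `registrant_phone`) are hypothetical, and the union-find approach is just one simple way to do this, not a description of how any particular research team works.

```python
from collections import defaultdict

# Hypothetical field names; real WHOIS output varies per registry and registrar.
FIELDS = ("registrant_name", "registrant_email", "registrant_phone")


def normalise(value):
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(str(value or "").lower().split())


def cluster_domains(records, fields=FIELDS):
    """Group domains that share at least one normalised registrant value."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # (field, value) -> first domain seen with that value
    for rec in records:
        domain = rec["domain"]
        parent.setdefault(domain, domain)
        for field in fields:
            value = normalise(rec.get(field))
            if not value:
                continue
            key = (field, value)
            if key in first_seen:
                # Merge this domain's cluster with the earlier domain's cluster.
                parent[find(domain)] = find(first_seen[key])
            else:
                first_seen[key] = domain

    clusters = defaultdict(set)
    for domain in parent:
        clusters[find(domain)].add(domain)
    # Largest clusters first, domains sorted for stable output.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))
```

Note that nothing here depends on the registrant being real. An invented name or a stolen phone number links domains just as well, as long as the operator reuses it.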

Correlating new domains to ‘good’ clusters

Using WHOIS data for correlation rather than identification has another use case. While we care about finding malicious domain names, we are also interested in identifying benign ones. After all, domain reputation is a spectrum which has a good end, too.

Established businesses register new domains for a variety of reasons. Over time, this can build up a portfolio of thousands, or even tens of thousands, of domains. Being able to easily correlate a new domain name to a cluster of existing benign domains is incredibly valuable, allowing defenders to focus on finding potentially malicious domains in the middle of the spectrum.
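A minimal sketch of that idea, assuming we already hold registrant records for known benign portfolios: index the known values, then check whether a newly observed domain shares any of them. The field names and the `portfolio` label are hypothetical. Because registrant data can be stolen or copied, a match should be read as one reputation signal among several, not as proof.

```python
from collections import Counter

# Hypothetical field names used for matching.
FIELDS = ("registrant_org", "registrant_email")


def normalise(value):
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(str(value or "").lower().split())


def build_index(known_records, fields=FIELDS):
    """Map each (field, normalised value) pair to its known portfolio label."""
    index = {}
    for rec in known_records:
        for field in fields:
            value = normalise(rec.get(field))
            if value:
                index[(field, value)] = rec["portfolio"]
    return index


def match_portfolio(record, index, fields=FIELDS):
    """Return the portfolio sharing the most field values, or None if no match."""
    votes = Counter()
    for field in fields:
        value = normalise(record.get(field))
        label = index.get((field, value)) if value else None
        if label:
            votes[label] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Domains that match a well-established portfolio can be deprioritised, leaving more analyst and compute time for the unmatched ones.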

Downfall of WHOIS data collection

In light of the above, ICANN's recently launched Registration Data Request Service (RDRS) is of questionable use. Because it requires manual work per domain, it is irrelevant to the large-scale processing workflows that are often used to identify security threats within the domain name space. That said, it is not unlike the current state of WHOIS data collection, where policy and technical implementation make it harder, not easier, to get to the valuable data in the registry.

In the absence of at-scale access to this data, those who need it have developed different ways to do correlation. While some of these methods can help identify relationships that can’t be found via WHOIS, they are often slower and much more computationally expensive. Unfortunately, these approaches are not true replacements, as there is simply no good alternative to a comprehensive domain ownership registry.

Towards a solution for correlation

As you might imagine, it is beyond frustrating for researchers that a treasure trove of useful data is still out there but, in a practical sense, inaccessible. Yes, RDRS is a positive step forward. However, it does not address the scale issue. Implementing a public identifier accessible at scale that uniquely correlates an owner across a registrar, while not perfect, would go a long way. It would enable correlation without revealing actual PII, helping prevent cybercrime damage instead of cleaning it up afterwards.
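How such an identifier would be built is an open policy and engineering question. Purely as an illustration, one possible construction is a keyed hash (HMAC-SHA256) of the normalised registrant details, computed by the registrar with a secret key that it never publishes. The function name, field names and truncation length below are our own assumptions, not an existing standard or proposal.

```python
import hashlib
import hmac


def owner_identifier(registrant, secret_key, fields=("name", "email", "phone")):
    """Derive a stable pseudonymous identifier from registrant details.

    The same registrant details always yield the same identifier, so domains
    can be correlated, but the details cannot be read back from it without
    the registrar's secret key.
    """
    # Normalise each field so casing and spacing differences do not matter.
    canonical = "\n".join(
        " ".join(str(registrant.get(field) or "").lower().split())
        for field in fields
    )
    digest = hmac.new(secret_key, canonical.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; 32 hex characters is 128 bits.
    return digest.hexdigest()[:32]


# Hypothetical usage: the key would be held privately by the registrar.
key = b"registrar-private-key"
print(owner_identifier({"name": "Jane Doe", "email": "jane@example.com"}, key))
```

A keyed hash is used rather than a plain hash because registrant values such as email addresses are guessable, and an unkeyed hash could be reversed by simply hashing candidate values. The trade-off is that identifiers from different registrars would not match unless they shared a key, which is consistent with the "not perfect" caveat above.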

To make this happen, the security, fraud prevention and IP fields need to work together to drive the necessary change in policies and practices. It will not be easy, but it can be done.

