About the data

How is the data generated?

For 23 years, we have been developing a vast network of sensors to provide the broadest possible view of online networks. Data sources range from government organizations around the world to industry-leading internet providers, specialized researchers, analysts, honeypots, and beyond.

This large volume of raw data is then analyzed using machine learning, heuristics, and manual investigation to compile listings of malicious activity into IP, domain, and content-based datasets. The listings are objective, based on policies that have been carefully defined over years with the wider industry, and are now relied on by billions of people every day.

IP-based datasets

Our IP datasets contain information on IP addresses that have been observed sending or hosting spam, or that are connected to hijacked servers or to computers infected with botnet malware. The IP data is broken down into 3 primary datasets:

  1. Spamhaus Blocklist (SBL) – contains IP addresses observed to be involved in sending spam or snowshoe spamming, or associated with botnet command and control servers (C&Cs), bulletproof hosting companies, or hijacked IP space. Indicative size: 30 million listings
  2. eXploits Blocklist (XBL) – lists individual IPs (/32s) that are infected with malware, worms, or Trojans; that host third-party exploits, such as open proxies; or that belong to devices controlled by botnets. Indicative size: 4 million listings
  3. Policy Blocklist (PBL) – IP address ranges for end-user devices, such as home routers, smart TVs, and other Internet of Things (IoT) devices, from which email should never be sent. Indicative size: 1.2 billion listings

The following datasets, which are subsets of the 3 primary datasets, are also available:

  • Auth Blocklist (AuthBL) – lists IP addresses known to host bots using brute force or stolen SMTP-AUTH credentials to send spam, phishing, and malware emails
  • Extended XBL (eXBL) – containing live and historical metadata to bring context to XBL listings
  • Botnet Controller List (BCL) – an advisory “drop all traffic” list consisting of single IPv4 addresses that cybercriminals use to control infected computers (bots)
  • eBCL – containing live and historical metadata to bring context to BCL listings
  • CSS – specific to SMTP traffic, only listing port-25 based detections. These target spam and other low-reputation sources.
  • eCSS – containing live and historical metadata to bring context to CSS listings
  • DROP – Don’t Route Or Peer – advisory “drop all traffic” lists, consisting of netblocks that are “hijacked” or leased by professional spam or cyber-crime operations
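
As a rough illustration of how a netblock-based list such as DROP might be applied, the sketch below checks whether an address falls inside any listed range. The netblocks shown are documentation ranges (RFC 5737), not real listings, and `is_dropped` is a hypothetical helper name rather than part of any Spamhaus tooling.

```python
import ipaddress

# Placeholder netblocks taken from the RFC 5737 documentation ranges.
# These are NOT real listings; a real deployment would load the list
# from whichever delivery method is in use.
EXAMPLE_NETBLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_dropped(ip: str, netblocks=EXAMPLE_NETBLOCKS) -> bool:
    """Return True if `ip` falls inside any of the given netblocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in netblocks)
```

A linear scan like this is fine for a short list; for large lists a prefix tree or sorted-interval lookup would likely be a better fit.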

Content-based datasets

Our content datasets contain information relating to content that has been connected to malicious activity – be that compromised domains, cryptocurrency wallet addresses (Bitcoin, for example), malware, or email addresses. The content data is broken down into 3 primary datasets:

  1. Domain Blocklist (DBL) – Domains owned by spammers and used for spam or other malicious purposes. This blocklist also contains domains owned by non-spammers which are used for legitimate purposes, but have been hijacked by spammers. Indicative size: 350k listings
  2. Zero Reputation Domain (ZRD) – lists newly registered domains for 24 hours, because domains that have just been registered are rarely used by legitimate organizations immediately. Indicative size: 200k listings
  3. Hash Blocklist (HBL) – covers the following content areas: cryptowallets (Bitcoin etc.), malware, and email addresses. Indicative size: 1.1 million listings

The extended DBL (eDBL), containing live and historical metadata to bring context to DBL listings, is also available via an API.

How is the data delivered?

There are a variety of options, depending on your use case. The data can be delivered via:

  • Rsync – to consume all the data, bearing in mind the potential drawbacks of batch processing and synchronization
  • Realtime Updates – if you need all the data but require updates in real time, 24/7
  • Data Query Service – to make individual queries rather than consuming all the data. Data is accurate to the second
  • Spamhaus Intelligence API – the eXBL and eCSS datasets are also available via API, with additional datasets being added

Or if you have an alternative integration method in mind, or require delivery by API beyond the data listed, please get in touch with your request.
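
For individual queries, IP blocklists are commonly consulted using the DNSBL convention: the IPv4 octets are reversed and prepended to a zone name, and an answer in 127.0.0.0/8 indicates a listing. The sketch below assumes that convention applies here. The zone `dnsbl.example.net` is a placeholder, and `dnsbl_query_name` and `is_listed` are hypothetical helper names; the actual zone names, any account key, and the meaning of specific return codes should be taken from the service documentation.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build a DNSBL-style query name: reversed IPv4 octets plus the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on invalid input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with a 127.x.x.x address for `ip`.

    Any resolution failure (including NXDOMAIN) is treated as "not listed",
    which is a simplification; a production client would distinguish errors.
    """
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return answer.startswith("127.")


# Assumed placeholder zone; replace with the zone from your account documentation.
print(dnsbl_query_name("192.0.2.1", "dnsbl.example.net"))
```

Only `dnsbl_query_name` runs without network access; `is_listed` performs a live DNS lookup and depends on the resolver in use.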

Who is the data relevant for?

Typically, data for integration is adopted by Product Managers, who incorporate it into end-user products, or by Security Operations Managers and Security Incident Event Managers, who use it for internal threat intelligence operations.