Product Details
What data is listed?
- Approximately 7.5 million listings in total
- Up to 75,000 newly observed IPs added every 24 hours
- An average of 1 million “refreshed” listings added every 24 hours
This dataset contains IP addresses that are exhibiting compromised behavior, including:
- Malware infections
- Worm infections
- Trojan infections
- Devices controlled by botnet command and control (C&C) servers
- Third-party exploits, such as open proxies
Every IP address has associated metadata, including bot names, first seen date, and valid until date. Both historical and live data are accessible to provide additional insight.
Benefits of Spamhaus Intelligence API
- Breadth: Includes both live and historical data relating to IPs showing signs of compromise, with access to 20 different fields per infected IP.
- Control & convenience: API access makes the data easily consumable across multiple applications, without the requirement to download the entire data set.
- Real-time updates: Threats are included in this dataset as soon as researchers observe them.
Data limits
Free usage of the beta service is limited to 1,000 queries per day and 20,000 queries per month. The free beta service ends on 17th March 2021.
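To stay within these limits, a client can keep its own count before sending each query. The following is a minimal sketch of such a counter. The `QuotaTracker` class and its `try_consume` method are hypothetical helpers, not part of the API, and the sketch assumes that the daily and monthly allowances reset on calendar boundaries, which may not match how the service actually counts.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

DAILY_LIMIT = 1_000     # free beta allowance per day, as stated above
MONTHLY_LIMIT = 20_000  # free beta allowance per month, as stated above


@dataclass
class QuotaTracker:
    """Client-side query counter (hypothetical helper, not part of the API)."""
    daily_limit: int = DAILY_LIMIT
    monthly_limit: int = MONTHLY_LIMIT
    _day: Optional[date] = None
    _day_count: int = 0
    _month: Optional[Tuple[int, int]] = None
    _month_count: int = 0

    def try_consume(self, today: date) -> bool:
        """Count one query if both budgets allow it; return False otherwise."""
        # Assumption: counters reset when the calendar day or month changes.
        if self._day != today:
            self._day, self._day_count = today, 0
        month = (today.year, today.month)
        if self._month != month:
            self._month, self._month_count = month, 0
        if (self._day_count >= self.daily_limit
                or self._month_count >= self.monthly_limit):
            return False
        self._day_count += 1
        self._month_count += 1
        return True
```

A caller would check `tracker.try_consume(date.today())` before each request and skip or defer the query when it returns `False`.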
Included Fields
ipaddress
The IP address identified as the source of the bot-generated traffic. Always provided.
botname
The bot name associated with the detected activity. Where detection can’t be clearly associated, “unknown” will be returned. Always provided.
seen
Unix timestamp of the last detected event for the given IP and the given bot name. Always provided.
first seen
Unix timestamp (rounded to the minute) of the first detection event for this IP and bot name combination. This will match the value of “seen” if it’s the first sighting of this type on this particular IP. When there has been no activity for this given combination for a month, this field is reset. Always provided.
listed
The Unix timestamp (rounded to the minute) of when the entry reached our database. Usually, this is very close to the value of “seen” unless the data is coming from batched processes. Always provided.
valid_until
Unix timestamp (rounded to the minute) of when the given entry will be considered “expired” from our dataset. Always provided.
detection
Human-readable text briefly describing how the data was collected. This field only appears when the heuristic can collect the data in more than one way.
rule
An internal ID pointing to the rule operating the detection. Detections operated by different means or rules will show different IDs, even when they refer to the same detection. Always provided.
dstport
The destination port of the traffic triggering the detection. Not always disclosed/available.
helo
When detection results from SMTP traffic, this is the HELO string used in the SMTP session triggering the detection.
helos
An array enumerating all the HELO strings involved in the detection. Appears only in records for the MPD heuristic.
heuristic
The heuristic applied to generate the detection. This returns a limited number of possible values.
asn
The Autonomous System Number (ASN) announcing the IP, predominantly obtained from routeviews data.
lat
Geographic latitude of the IP. Only provided when geolocation data is available.
lon
Geographic longitude of the IP. Only provided when geolocation data is available.
cc
The ISO Country Code of the nation where the IP resides. Only provided when geolocation data is available.
protocol
IP protocol of the traffic triggering the detection. Usually either UDP or TCP.
srcip
Source IP of the traffic triggering the detection. Except in rare cases, this matches the IP address of the listing.
uri
Specific to the “SINKHOLE” heuristic, and to HTTP sinkhole detections only. This is the URI of the HTTP request triggering the listing. Not always available.
useragent
Specific to the “SINKHOLE” heuristic, and to HTTP sinkhole detections only. This is the User-Agent header of the HTTP request triggering the listing. Not always available.
domain
Mostly specific to the “SINKHOLE” heuristic, and to HTTP sinkholes in particular. This is the domain or hostname that the traffic triggering the detection is trying to reach, i.e., the sinkholed domain. Often obtained from the “Host” header of the HTTP request triggering the listing. Not always available.
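Because only some fields are always provided and the timestamps are Unix values, client code has to treat most keys as optional and convert the times explicitly. The sketch below illustrates one way to do that. It assumes that each record arrives as a dictionary keyed by the field names listed above; the helper functions and the sample record are made up for illustration and are not taken from the API. The “first seen” field is left out of the required set because its exact key spelling is not shown here.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple

# Fields described above as "Always provided" ("first seen" omitted, see lead-in).
ALWAYS_PROVIDED = ("ipaddress", "botname", "seen", "listed", "valid_until", "rule")


def missing_required(record: Dict[str, Any]) -> List[str]:
    """Return the always-provided keys that are absent from a record."""
    return [key for key in ALWAYS_PROVIDED if key not in record]


def to_utc(ts: int) -> datetime:
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def is_expired(record: Dict[str, Any], now: Optional[datetime] = None) -> bool:
    """True once the record's valid_until time has been reached."""
    now = now or datetime.now(timezone.utc)
    return to_utc(record["valid_until"]) <= now


def location(record: Dict[str, Any]) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) only when geolocation data is present."""
    if "lat" in record and "lon" in record:
        return (record["lat"], record["lon"])
    return None


# Made-up sample record: values are illustrative, not real listing data.
sample = {
    "ipaddress": "192.0.2.10",   # documentation-only address range
    "botname": "unknown",
    "seen": 1614591030,
    "listed": 1614591060,
    "valid_until": 1615195860,
    "rule": "example-rule-id",   # hypothetical ID format
    "heuristic": "SINKHOLE",
    "domain": "example.com",
}
```

Reading optional fields through helpers such as `location(sample)` or `sample.get("helo")` avoids a `KeyError` when a field is absent from a given record.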