ReversingLabs and Sophos released a dataset of 20 million Windows Portable Executable (PE) files, including 10 million malware samples. The dataset is intended to advance security research across the IT industry. It provides features, labels, and metadata for the included files and lets interested researchers download the available malware samples for further study. The publicly accessible dataset contains a labeled, curated set of samples and their associated metadata.
Machine learning (ML) models depend on training data, yet the security field has lacked a large-scale, standard dataset that users at every level can access, and this gap has hindered progress in the field.