Collection #1 - massive data breach that contains 773 million records

A security researcher found a massive number of users' email addresses and unhashed passwords on the MEGA cloud service

87 GB of data contained emails and passwords. Troy Hunt discovered Collection #1 on the web; it contained 773 million emails and 21 million unique passwords.

Troy Hunt, a security researcher, Microsoft Regional Director and the owner of Have I Been Pwned,[1] discovered[2] unprotected data that contained 772,904,991 unique email addresses and 21,222,975 unique, unencrypted passwords. This mega breach is one of the largest ever announced, surpassing even Equifax but falling short of Yahoo's email hack in 2013.[3]

The compiled data came from thousands of different data breaches and was placed on a cloud-based service called MEGA. Named “Collection #1,” the 87 GB container consisted of 12,000 separate files and 2,692,818,238 rows. The researcher received a tip from a colleague that led him to a popular hacking forum where the data was promoted as “a collection of 2000+ dehashed databases and Combos stored by topic.”

According to Hunt, users can enter their email addresses and passwords on Have I Been Pwned to find out whether or not they are affected. There are 1,160,253,228 unique combinations of email addresses and passwords, although much of the raw data was discarded after the researcher cleaned it up:

This also includes some junk because hackers being hackers, they don't always neatly format their data dumps into an easily consumable fashion. (I found a combination of different delimiter types including colons, semicolons, spaces and indeed a combination of different file types such as delimited text files, files containing SQL statements and other compressed archives.)
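The cleanup Hunt describes can be pictured with a small sketch. The article does not show his tooling, so everything here is an assumption for illustration: the line format (an email, then one or more delimiter characters, then a password), the function names `parse_line` and `summarize`, and the sample data.

```python
import re

# Assumed line format (not taken from the article):
# "<email><delimiter(s)><password>", where a delimiter is a colon,
# a semicolon, or whitespace. Everything after the delimiter run is
# treated as the password.
LINE_RE = re.compile(r"^([^\s:;]+@[^\s:;]+)[:;\s]+(.+)$")


def parse_line(line):
    """Return (email, password), or None when the line is junk."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    email, password = match.groups()
    # Email addresses are compared case-insensitively here.
    return email.lower(), password


def summarize(lines):
    """Count unique combinations, emails, and passwords in a dump."""
    combos = set()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            combos.add(parsed)
    emails = {email for email, _ in combos}
    passwords = {password for _, password in combos}
    return {
        "combos": len(combos),
        "emails": len(emails),
        "passwords": len(passwords),
    }


# Made-up sample mixing delimiters and one SQL line that counts as junk.
sample = [
    "alice@example.com:hunter2",
    "ALICE@example.com;hunter2",
    "bob@example.com letmein",
    "bob@example.com:hunter2",
    "INSERT INTO users VALUES (1);",
]
print(summarize(sample))
```

The same idea explains why the number of unique combinations is larger than the number of unique email addresses: one address can appear with several different passwords.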

Soon after the discovery, the cloud service MEGA took down the hosted information.

The origin of the data is unknown

Collection #1 is the largest accumulation of breached data found in one place. While Hunt recognized many of the breaches the information came from, the origin of the data is still unclear, and some of the listed services might not have been breached at all.

Nevertheless, he notes that, after checking his own past passwords and email addresses, he confirmed that the data was accurate. The expert also says that many of the uncovered passwords were hashed when they were originally stolen. However, hackers managed to “dehash” them, that is, crack the hashes and recover the plain text.
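A hash cannot be reversed directly, so “dehashing” in practice means guessing candidate passwords and comparing their hashes with the leaked ones. The sketch below illustrates that idea only. The article does not say which algorithms the breached sites used, so unsalted SHA-1 is an assumption, and the names `sha1_hex` and `crack_unsalted` are hypothetical.

```python
import hashlib


def sha1_hex(text):
    """Hex SHA-1 digest of a string (unsalted, assumed for illustration)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def crack_unsalted(leaked_hashes, candidates):
    """Map each leaked hash to the candidate password that produces it."""
    lookup = {sha1_hex(word): word for word in candidates}
    return {h: lookup[h] for h in leaked_hashes if h in lookup}


# One common password and one long random one, both stored as hashes.
leaked = [sha1_hex("password"), sha1_hex("x9!Lr#27qTz")]
wordlist = ["123456", "password", "qwerty"]

recovered = crack_unsalted(leaked, wordlist)
print(recovered)
```

Only the common password is recovered, which is why weak or widely used passwords are the ones most likely to end up in plain text in a dump like this.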

Hunt referred to the data as “random” when interviewed by Wired:[4]

It just looks like a completely random collection of sites purely to maximize the number of credentials available to hackers. There’s no obvious patterns, just maximum exposure.

Shield yourself from data compromise

Hunt urges people to go and change their passwords immediately if they are included in the HIBP database, as “one or more passwords you've previously used are floating around for others to see.”
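For readers who would rather not type a password into a web form, a lookup can also be scripted. The article does not describe how the check works, so the following is a minimal sketch that assumes the publicly documented Pwned Passwords range endpoint: it receives only the first five characters of the password's SHA-1 hash and returns matching hash suffixes with counts. The function names and the User-Agent string are hypothetical.

```python
import hashlib
import urllib.request

# Assumed endpoint of the Pwned Passwords range lookup.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password):
    """Return the (5-character prefix, 35-character suffix) of the SHA-1 hash."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_response(suffix, body):
    """Find the suffix in a 'SUFFIX:COUNT' response body; 0 if absent."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def pwned_count(password):
    """Query the range endpoint; only the hash prefix leaves the machine."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-check-sketch"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_response(suffix, body)
```

Calling `pwned_count("some password")` would return how many times that password appears in the service's data, with 0 meaning it was not found. Any non-zero result is a reason to change the password wherever it is used.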

Credential stuffing is a popular technique among cybercriminals, and judging by how untidy the database was, Collection #1 seems to have been created just for that purpose. The collection holds roughly 2.7 billion rows, all of which can be used for credential stuffing.

The technique relies on users reusing old passwords or keeping the same password for multiple accounts. After acquiring credentials from one data breach, a criminal can try them against other accounts. For that reason, using a unique password for every account and storing passwords in a password manager is vital when it comes to cybersecurity, as explained by Hunt:

Perhaps your personal data is on this list because you signed up to a forum many years ago you've long since forgotten about, but because its subsequently been breached and you've been using that same password all over the place, you've got a serious problem.
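The reuse problem Hunt describes is easy to audit on one's own accounts. Below is a minimal sketch that groups accounts sharing a password; the `find_reused` helper and the sample accounts are made up for illustration, and a real password manager would typically offer a similar report.

```python
from collections import defaultdict


def find_reused(accounts):
    """Return sorted groups of account names that share one password."""
    by_password = defaultdict(list)
    for site, password in accounts.items():
        by_password[password].append(site)
    return sorted(
        sorted(sites) for sites in by_password.values() if len(sites) > 1
    )


# Hypothetical accounts: the old forum password is reused twice.
accounts = {
    "old-forum": "hunter2",
    "email": "hunter2",
    "bank": "c0rrect-h0rse",
    "shop": "hunter2",
}
print(find_reused(accounts))
```

If the forum in this example were breached, the email and shop accounts would be exposed to credential stuffing as well, while the bank account with its unique password would not.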

According to security experts, 2018 was a “year of the data breach tsunami,”[5] which brought such notorious cases as Cambridge Analytica, Quora, Marriott,[6] and many others. Data harvesting is evidently a lucrative business for criminals, so adequate measures should be taken to protect sensitive information in 2019.

About the author
Alice Woods

Alice Woods is the News Editor at 2-spyware. She has been sharing her knowledge and research data with 2-spyware readers since 2014.
