Exposed MongoDB database compromises the data of 202 million Chinese job seekers

A security researcher discovered a database that was not protected by a password

An unprotected MongoDB database exposed personal data of 202 million Chinese job seekers

Bob Diachenko from HackenProof discovered[1] a MongoDB database that was exposed online and contained the CVs of over 202 million Chinese job seekers. MongoDB is a cross-platform NoSQL[2] database that stores data as documents. The exposed instance was hosted on an American server.

The disclosed information included highly sensitive personal details, such as full names, dates of birth, phone numbers, emails, home addresses, driver's licenses, and even political views, along with work experience and skills. The database contained 202,730,434 records totaling 854.8 GB.
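For a sense of scale, the two figures above imply an average record size of roughly 4 kB. The short Python sketch below works that out. It assumes "GB" means decimal gigabytes (10^9 bytes), which the report does not specify, and the function and variable names are illustrative only.

```python
# Figures reported for the exposed database.
RECORD_COUNT = 202_730_434
TOTAL_SIZE_GB = 854.8

# Assumption: "GB" is decimal (10**9 bytes); the report does not say.
BYTES_PER_GB = 10**9


def average_record_bytes(total_gb: float, records: int) -> float:
    """Return the mean size of one record in bytes."""
    return total_gb * BYTES_PER_GB / records


avg = average_record_bytes(TOTAL_SIZE_GB, RECORD_COUNT)
print(f"Average record size: about {avg / 1000:.1f} kB")
```

A few kilobytes per record is consistent with each entry being a text CV plus contact fields rather than scanned documents, though that reading is an inference and not something the researcher stated.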

Most alarmingly, none of the exposed data was protected by a password or any other form of authentication, so anybody on the internet could view it as long as they had the correct address.
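A minimal Python sketch of the kind of check that can flag such a setup is shown below. It assumes the server's configuration has already been parsed into a dictionary (for instance from a `mongod.conf` YAML file). `security.authorization`, `net.bindIp`, and `net.bindIpAll` are real MongoDB configuration options, while the function name `audit_mongod_config` and the sample dictionaries are hypothetical. Nothing is known about how the exposed server was actually configured.

```python
from typing import Any

# Addresses that make mongod listen on every network interface.
WILDCARD_ADDRESSES = {"0.0.0.0", "::"}


def audit_mongod_config(config: dict[str, Any]) -> list[str]:
    """Return warnings for settings that leave a MongoDB server open.

    `config` is assumed to mirror the structure of a parsed mongod.conf.
    """
    warnings = []

    security = config.get("security") or {}
    if security.get("authorization") != "enabled":
        warnings.append("access control is off: clients can connect without credentials")

    net = config.get("net") or {}
    bind_ips = {ip.strip() for ip in str(net.get("bindIp", "")).split(",")}
    if net.get("bindIpAll") or bind_ips & WILDCARD_ADDRESSES:
        warnings.append("server listens on all interfaces")

    return warnings


# Hypothetical configurations for illustration only.
open_config = {"net": {"bindIp": "0.0.0.0", "port": 27017}}
locked_config = {
    "security": {"authorization": "enabled"},
    "net": {"bindIp": "127.0.0.1", "port": 27017},
}

for name, cfg in [("open", open_config), ("locked", locked_config)]:
    print(name, audit_mongod_config(cfg))
```

Enabling access control on its own would stop anonymous reads. Restricting the bind address is an additional layer that keeps the server off the public internet in the first place.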

The database has since been secured and is no longer accessible, although it is still unclear who took it down. The researcher found that over 12 IP addresses had accessed the database before its removal.

An unknown web app was responsible for gathering data from multiple job seeker sites

Upon discovery, the researcher tried to contact the owner of the server. However, he was not able to connect the database to a specific service or company, so its ownership remained unclear. According to Diachenko, a GitHub repository helped clear some things up:[1]

The origin of the data remained unknown until one of my Twitter followers pointed to a GitHub repository (page is no longer available but it is still saved in Google cache) which contained a web app source code with identical structural patterns as those used in the exposed resumes

The tool, named data-import, was created three years ago and was most likely designed to harvest data from multiple Chinese job seeker sites. The pattern of information in the GitHub repository also matches the data found in the leaky database.

Nevertheless, representatives of one such site denied any association with the discovered app, saying:[1]

We have searched all over the database of us and investigated all the other storage, turned out that the sample data is not leaked from us.

It seems that the data is leaked from a third party who scrape data from many CV websites.

The sensitive information of millions of people was disclosed for about a week before it was taken down

According to the CEO of the security firm Comforte,[3] the data was exposed for no more than a week before it was closed down. Considering that in many previous instances the information was accessible for months or even years, this incident might seem fairly minor.

However, the number of people affected was enormous: 202 million people and their very private details, a bit less than half the size of the Marriott data breach,[4] the most extensive data leak of 2018.

The incident shows that information stored in databases is not always secure and can be exposed at any time. Therefore, more effort should be put into protecting the sensitive data of innocent users, as regular users place their trust in companies and organizations in the hope of full protection.

Unfortunately, considering that 2018 was the year of a data breach tsunami,[5] it is evident that corporations and businesses have a long way to go before that trust can be justified.

About the author
Jake Doevan
Jake Doevan - Computer technology expert

Jake Doevan is a news editor. He graduated from Washington and Jefferson College, where he studied Communication and Journalism.
