DeepSeek’s Data Breach: Could ProjectDiscovery’s Cloud Have Prevented the Hack?

Last week, DeepSeek made headlines with R1, an open-source model positioned as a competitor to OpenAI’s o1. What set R1 apart wasn’t just its capabilities; it was the claim that DeepSeek trained it with significantly fewer GPUs. That claim challenged conventional wisdom about the compute required for large-scale model training and even triggered a small stock market selloff.
Amid the hype, a security incident surfaced: researchers gained access to DeepSeek’s internal databases. The breach reportedly exposed over a million lines of log data, including chat histories, secret keys, API secrets, backend details, and operational metadata. Once DeepSeek resolved the issue, security researcher Nagli from Wiz hopped on X to share exactly how he found the vulnerability, showing how open-source tools from ProjectDiscovery played a part in identifying the misconfiguration.
If you're mainly interested in learning how to secure your own assets with ProjectDiscovery, feel free to jump to the end of this blog post.
What Went Wrong
Before diving into the technical details of how security researchers uncovered this vulnerability, let’s first understand ClickHouse—the database at the center of the issue. ClickHouse is an open-source, columnar database designed for high-performance analytics. It excels in real-time data processing, making it a popular choice for data warehousing, business intelligence, and interactive dashboards where large datasets need to be queried efficiently using SQL.
In DeepSeek’s case, their internal ClickHouse instance was left exposed to the internet with no authentication, no access controls, and no login requirements. This misconfiguration meant that anyone could access the database simply by navigating to the /play endpoint on publicly accessible instances. From there, a malicious actor could execute SQL queries and retrieve sensitive data in plain text with ease.
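To make the exposure concrete, here is a minimal Python sketch of how you might check one of your own ClickHouse hosts for this condition. It assumes the host serves ClickHouse's HTTP interface and that an unauthenticated SELECT 1 returning 1 indicates missing access controls. The function names and defaults are illustrative and are not part of any ProjectDiscovery tool.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
from urllib.error import URLError


def build_probe_url(host: str, port: int = 8123) -> str:
    """Build a URL that asks the ClickHouse HTTP interface to run SELECT 1."""
    return f"http://{host}:{port}/?{urlencode({'query': 'SELECT 1'})}"


def looks_unauthenticated(status: int, body: str) -> bool:
    """True when the query ran without credentials and returned its result."""
    return status == 200 and body.strip() == "1"


def probe(host: str, port: int = 8123, timeout: float = 5.0) -> bool:
    """Send the probe. Only run this against hosts you are authorized to test."""
    try:
        with urlopen(build_probe_url(host, port), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return looks_unauthenticated(resp.status, body)
    except (URLError, OSError):
        # Refused connections, timeouts, and HTTP errors (e.g. 401/403) land here.
        return False
```

Keeping the URL builder and the response check separate from the network call makes the decision logic easy to review and test without touching a live database.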
In the next section, we'll take a closer look at how the researchers used ProjectDiscovery’s open-source tools to identify DeepSeek's exposed ClickHouse instance.
Inside the Breach: A Technical Breakdown
Effective reconnaissance always starts with identifying a company’s external attack surface. Here, researchers enumerated known subdomains belonging to DeepSeek by running subfinder, which queries multiple public DNS sources across the internet.

Running subfinder is a passive form of enumeration, since it only gathers names that already appear in public data. In addition, the researcher used puredns to fuzz potential domain names using common wordlists to discover subdomains that may not appear in public data. This active discovery process could also have been completed using ProjectDiscovery’s shuffledns.
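The idea behind wordlist-based active enumeration can be sketched in a few lines of Python. This is an illustration of the technique, not how puredns or shuffledns are implemented: it relies on the system resolver, makes no attempt at wildcard DNS filtering or mass resolution, and all function names are hypothetical.

```python
import socket
from typing import Iterable, Iterator


def candidates(domain: str, words: Iterable[str]) -> Iterator[str]:
    """Yield candidate hostnames by prefixing each wordlist entry to the domain."""
    seen = set()
    for word in words:
        label = word.strip().lower()
        if label and label not in seen:
            seen.add(label)
            yield f"{label}.{domain}"


def resolves(hostname: str) -> bool:
    """Return True if the system resolver finds an address for the hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def brute_force(domain: str, words: Iterable[str]) -> list[str]:
    """Active enumeration: keep only the candidates that actually resolve."""
    return [name for name in candidates(domain, words) if resolves(name)]
```

Calling brute_force("example.com", ["dev", "api", "staging"]) would return whichever of those three names resolve at the time of the call.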
Once subdomains have been enumerated, it is best practice to scan the resulting hosts for web servers and other services that may be running on them. In this case, researchers used naabu to scan for open ports and httpx to probe for HTTP status codes.
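As a rough illustration of what a port scan does at its simplest, the Python sketch below attempts a full TCP connection to each port and records the ones that accept it. It is a plain connect scan under a hypothetical function name, far slower and less capable than naabu, and is only meant to show the principle.

```python
import socket
from typing import Iterable


def open_ports(host: str, ports: Iterable[int], timeout: float = 1.0) -> list[int]:
    """Return the ports on host that accept a TCP connection (a connect scan)."""
    found = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            # Closed, filtered, or unreachable: treat all of these as not open.
            continue
    return found
```

For example, open_ports("127.0.0.1", [80, 443, 8123, 9000]) lists whichever of those ports are listening on your own machine. Only scan hosts you are authorized to test.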

While most results appeared fairly standard – public HTTP servers for DeepSeek’s chatbot, API endpoints, and documentation – four targets stood out:

These hosts were exposing services on ports 8123 and 9000, which are more commonly used for internal or development services than for public-facing ones (they are also ClickHouse's default HTTP and native-protocol ports).
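Spotting outliers like these is easy to automate. The Python sketch below filters a list of host:port results down to those on ports you would not expect to see exposed publicly. The expected-port set and the function name are assumptions for illustration; adjust them to your own environment.

```python
from typing import Iterable

# Assumption: ports you expect on public web hosts. Adjust for your environment.
EXPECTED_WEB_PORTS = frozenset({80, 443})


def unusual_services(targets: Iterable[str], expected=EXPECTED_WEB_PORTS) -> list[str]:
    """Return host:port targets whose port is outside the expected set."""
    flagged = []
    for target in targets:
        host, _, port = target.rpartition(":")
        # Entries without a numeric port are skipped rather than guessed at.
        if host and port.isdigit() and int(port) not in expected:
            flagged.append(target)
    return flagged
```

Fed a list of scan results, this returns only the entries that deserve a closer look, such as anything listening on 8123 or 9000.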
Researchers then ran Nuclei against these endpoints to scan for exploitable vulnerabilities and common misconfigurations. Nuclei, another one of ProjectDiscovery’s tools, is an open-source vulnerability scanner that relies on community-contributed, YAML-based detection templates. The nuclei-templates repository contains more than 9,500 predefined checks for common misconfigurations, exposures, and vulnerabilities, along with the latest CVEs.
In this instance, the clickhouse-unauth template matched against the hosts dev.deepseek.com:9000 and oauth2callback.deepseek.com:9000, suggesting that unauthenticated ClickHouse instances were exposed on these hosts.
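Template-based detection pairs a request with matchers that decide whether a response counts as a finding. The Python sketch below mimics that idea in a heavily simplified form. It is not the clickhouse-unauth template and not Nuclei's implementation; the classes, the template ID, and the matcher fields are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Matcher:
    """Simplified stand-in for a template matcher: all conditions must hold."""
    status: int
    words: list[str] = field(default_factory=list)

    def matches(self, status: int, body: str) -> bool:
        return status == self.status and all(w in body for w in self.words)


@dataclass
class Template:
    template_id: str
    matcher: Matcher


def run(template: Template, responses: dict[str, tuple[int, str]]) -> list[str]:
    """Return the targets whose recorded response satisfies the template.

    `responses` maps target -> (status, body). Fetching is left out so the
    matching logic stays easy to follow.
    """
    return [target for target, (status, body) in responses.items()
            if template.matcher.matches(status, body)]
```

Because each check is just data plus a matcher, adding coverage for a new misconfiguration means writing a new template rather than changing the scanner, which is what makes a community-maintained template repository practical.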

After this detection, the researchers validated the exposure by navigating to the /play interface on these ClickHouse instances. They were then able to execute SQL queries to list tables and read sensitive data in plaintext, proving that an attacker could retrieve logs containing API keys, chat histories, and more. In this case, over a million lines of log data were publicly accessible.

Asset Discovery and Continuous Monitoring with ProjectDiscovery Cloud
Every step of the DeepSeek reconnaissance process described above can be fully automated with ProjectDiscovery Cloud, which could have prevented the leak by alerting the team as soon as it detected a misconfigured ClickHouse instance.
Our platform integrates httpx, dnsx, naabu, nuclei, and seven other open-source tools we built to enumerate, probe, and scan your infrastructure on a daily basis.
Beyond checking for unauthorized database access, our nuclei-templates repository includes checks for exploitable vulnerabilities, CVEs, default credentials, exposed panels, cloud platform misconfigurations, and much more – helping teams stay ahead of security risks and out of the headlines.
You can enable continuous discovery on the cloud platform to automatically identify new assets and turn on Real-Time Scans to detect vulnerabilities as new templates are added.
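At its core, continuous discovery comes down to comparing what you see today with what you saw before. The Python sketch below shows that comparison for two snapshots of discovered assets. It illustrates the general idea only and does not describe how ProjectDiscovery Cloud works internally; the function name and snapshot format are assumptions.

```python
def diff_assets(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Compare two discovery snapshots and report what appeared or disappeared."""
    return {
        "new": sorted(current - previous),
        "gone": sorted(previous - current),
    }
```

Anything in the "new" list is a candidate for an immediate scan, since a newly exposed asset is exactly where a misconfiguration is most likely to go unnoticed.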

ProjectDiscovery Cloud also comes with:
- Real-time alerts via email, Slack, Microsoft Teams, and webhook
- Automatic scanning of trending exploits with newly released Nuclei templates
- Internal network scanning
- Compliance-friendly reporting for frameworks like SOC 2 and PCI
Want to see how companies like Asana, Chipotle, Elastic, Vercel, and PepsiCo use ProjectDiscovery to secure their perimeter from similar exploits?
Takeaways & Conclusion
A big thanks to Wiz for sharing the details of this incident. It's a good reminder of how a single misconfiguration can lead to widespread data exposure. Even the most advanced AI startups can face critical security failures if basic security hygiene, such as automated scanning and continuous monitoring, is neglected.
At ProjectDiscovery, we focus on equipping organizations with the tools to proactively manage security risks. Our open-source tools, combined with the automation capabilities of ProjectDiscovery Cloud, enable teams to continuously monitor assets, detect vulnerabilities early, and prevent small misconfigurations from turning into major breaches. To make security accessible to everyone, ProjectDiscovery Cloud offers flexible tiers—from free for individuals to team and enterprise plans—ensuring that organizations of all sizes can strengthen their security posture without barriers to access.