Copilot Can Still Surface GitHub Repositories That Were Once Public and Are Now Private
Security researchers are sounding the alarm about a serious issue: data exposed on the internet, even for a brief moment, can persist in online generative AI chatbots like Microsoft Copilot. This lingering data poses a significant risk to privacy and intellectual property, as highlighted by recent findings from Lasso, a cybersecurity firm based in Israel that focuses on emerging generative AI threats.
Data Exposure Risks in Generative AI
Recent research has found that thousands of once-public GitHub repositories, including some belonging to major companies such as Microsoft, are affected. Lasso co-founder Ophir Dror told TechCrunch that data can remain accessible through AI tools even after a repository has been made private.
Discovery of Cached Data
Dror explained that Lasso discovered content from its own GitHub repository in Microsoft Copilot due to indexing and caching by Bing, Microsoft’s search engine. The repository, which had been mistakenly made public for a short time, was later set to private, resulting in a “page not found” error on GitHub. However, the data remained accessible through Copilot.
- Data Accessibility: Users can retrieve sensitive information by simply asking Copilot the right questions.
- Investigation Findings: Lasso compiled a list of repositories that were public at any point in 2024 and found more than 20,000 that had since been set to private yet still had data retrievable through Copilot.
- Organizations Affected: More than 16,000 organizations, including major players like Amazon Web Services, Google, and IBM, are impacted.
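The filtering step described in the findings above can be illustrated with a minimal sketch. This is not Lasso's actual tooling: the `RepoRecord` type, its field names, and the sample repositories are all hypothetical, and the sketch assumes the visibility of each repository (in 2024 and today) has already been collected by some other means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoRecord:
    """Hypothetical record; the fields are assumptions for illustration only."""
    full_name: str            # "owner/name"
    was_public_in_2024: bool  # public at any point during 2024
    is_public_now: bool       # currently reachable as a public repository


def find_exposed_candidates(records):
    """Return repos that were public in 2024 but are no longer public."""
    return [r for r in records if r.was_public_in_2024 and not r.is_public_now]


def affected_organizations(records):
    """Collect the distinct owners (the part before the slash)."""
    return {r.full_name.split("/", 1)[0] for r in records}


# Made-up sample data, not real repositories.
SAMPLE = [
    RepoRecord("example-org/internal-tools", True, False),
    RepoRecord("example-org/docs", True, True),
    RepoRecord("another-org/prototype", True, False),
    RepoRecord("another-org/old-private", False, False),
]

candidates = find_exposed_candidates(SAMPLE)
orgs = affected_organizations(candidates)
print(len(candidates), sorted(orgs))
```

Each candidate found this way would still need to be checked against Copilot itself; a repository that changed visibility is only potentially exposed until its content is shown to be retrievable.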
Implications for Affected Companies
According to Lasso, confidential GitHub archives belonging to some organizations, containing sensitive corporate data, intellectual property, access keys, and tokens, could be unintentionally exposed through Copilot. For example, Lasso used Copilot to retrieve data from a deleted GitHub repository that had hosted a tool for creating “offensive and harmful” AI images using Microsoft’s cloud services.
Response from Affected Companies
Lasso contacted every company that was significantly affected by this exposure and advised them to rotate or revoke any compromised keys. However, none of the companies, including Microsoft, has publicly responded to the findings.
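To decide which keys need rotating, an organization first has to know which credentials appeared in the exposed content. The sketch below shows one rough way to flag candidates in text, assuming two widely documented token shapes: AWS access key IDs beginning with `AKIA` and classic GitHub personal access tokens beginning with `ghp_`. The pattern set and the function name are illustrative assumptions, are far from exhaustive, and do not describe a method Lasso used.

```python
import re

# Assumed patterns for two common credential formats; real scanners cover many more.
TOKEN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_candidate_secrets(text):
    """Return (line_number, label, match) for every pattern hit in the text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in TOKEN_PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((line_no, label, match.group(0)))
    return hits


# Placeholder values only: the AWS string is the example key from AWS documentation,
# and the GitHub-style token is synthetic.
FAKE_GITHUB_TOKEN = "ghp_" + "A1b2" * 9
SAMPLE_TEXT = "\n".join([
    "aws_key = AKIAIOSFODNN7EXAMPLE",
    "nothing sensitive here",
    "token = " + FAKE_GITHUB_TOKEN,
])

for hit in find_candidate_secrets(SAMPLE_TEXT):
    print(hit)
```

Any credential flagged in content that was ever public should be treated as compromised and rotated or revoked, regardless of whether the repository is private today.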
Microsoft’s Assessment and Future Precautions
After Lasso notified Microsoft in November 2024, the company classified the issue as “low severity,” asserting that the caching behavior was acceptable. Microsoft also stated that it stopped including links to Bing’s cache in search results from December 2024 onwards. Nonetheless, Lasso says that even though the caching feature has been disabled, Copilot can still access previously exposed data, which suggests the change is a temporary measure rather than a comprehensive fix.