Unveiling Hidden Gems: Accessing Private GitHub Repositories Through Copilot
Security researchers are warning that data exposed on the internet, even briefly, can live on in generative AI chatbots like Microsoft Copilot. Sensitive information that was visible for only a short time can remain retrievable long after it is taken down, as recent findings from Lasso, a cybersecurity firm focused on the emerging threats of generative AI, demonstrate.
Data Exposure Risks in AI Chatbots
According to Lasso, data from thousands of once-public GitHub repositories belonging to major corporations, including Microsoft, remains exposed. The issue stems from content being indexed and cached by Microsoft's Bing search engine, which makes it retrievable through AI tools long after the repositories were made private.
Case Study: Lasso’s Own Repository
Ophir Dror, co-founder of Lasso, shared insights with TechCrunch regarding their findings. He noted that content from a GitHub repository, which had been briefly public, was still accessible via Copilot. “If I browsed the web, I wouldn’t see this data. Yet, anyone could ask Copilot the right question and retrieve it,” Dror explained.
Investigation Findings
Upon realizing the potential for data exposure via GitHub, Lasso conducted a thorough investigation. They compiled a list of repositories that were public at any point during 2024 and identified those that have since been deleted or made private. Through Bing’s caching system, Lasso discovered that over 20,000 private GitHub repositories still had data accessible via Copilot, impacting more than 16,000 organizations.
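Lasso's exact tooling is not described, but the core check in such an investigation — is a once-public repository now unreachable? — can be sketched against GitHub's public REST API. The endpoint and status codes below are real GitHub behavior; the helper names are illustrative, not Lasso's:

```python
import urllib.error
import urllib.request

def classify(status_code: int) -> str:
    """Map a GitHub API status code to a repository state.

    GitHub returns 404 both for deleted repositories and for private
    ones viewed without credentials, so the two cases look identical
    to an unauthenticated caller.
    """
    if status_code == 200:
        return "public"
    if status_code == 404:
        return "private-or-deleted"
    return "unknown"

def repo_status(owner: str, repo: str) -> str:
    """Query api.github.com for a repository and classify the result."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    req = urllib.request.Request(url, headers={"User-Agent": "repo-status-check"})
    try:
        with urllib.request.urlopen(req) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        return classify(err.code)
```

A repository that was public at some point in 2024 but now reports "private-or-deleted" is precisely the kind Lasso found still cached and reachable via Copilot.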
- Companies affected include major players such as Amazon Web Services, Google, IBM, PayPal, Tencent, and Microsoft.
- Amazon later clarified that it is not impacted by this issue.
- Lasso has since removed references to AWS following legal advice.
Potential Consequences of Data Exposure
The implications for affected organizations are significant. Copilot could disclose confidential repository contents, including:
- Intellectual property
- Sensitive corporate data
- Access keys and tokens
In one instance, Lasso used Copilot to retrieve the contents of a previously deleted GitHub repository that hosted a tool for creating “offensive and harmful” AI images utilizing Microsoft’s cloud AI service.
Recommendations for Affected Organizations
Dror said Lasso contacted the organizations most severely affected by the exposure, advising them to rotate or revoke any compromised keys to mitigate the risk.
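Rotating keys presupposes knowing which keys leaked. A minimal sketch of scanning exposed content for candidate secrets is below; the two patterns shown are well-known token formats (AWS access key IDs and GitHub personal access tokens), but this is deliberately simplistic — production scanners such as gitleaks or truffleHog ship hundreds of rules, and the function name here is my own:

```python
import re

# Illustrative patterns only: AWS access key IDs and GitHub fine-format PATs.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

def find_leaked_keys(text: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_token) pairs for every candidate secret."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((name, match))
    return hits
```

Every hit from a scan like this over a formerly public repository's history should be treated as compromised and rotated, regardless of whether the repository has since been made private.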
Responses from Affected Companies
As of now, none of the organizations listed by Lasso have responded to inquiries from TechCrunch. Microsoft has also remained silent regarding the matter.
In November 2024, Lasso informed Microsoft of its findings, which Microsoft classified as "low severity," asserting that the caching behavior was "acceptable." Notably, Microsoft stopped including links to Bing's cache in search results starting December 2024. However, Lasso contends that Copilot can still retrieve the cached data even though the links no longer appear in search results, meaning the underlying problem persists.