Archives containing the source code of Yandex projects have appeared online. The company confirmed their authenticity and acknowledged that the materials had been stolen from an internal repository. However, it denied any suggestion that it had been hacked.
The attackers published more than 44.7 GB of archives (.tar.bz2). The hackers claim to have obtained the source code of Yandex projects, with the exception of the anti-spam rules. According to the data thieves, this happened in July 2022. The archives contain materials in Python, C++, Go and TypeScript, as well as methods for working with Protocol Buffers, YAML and JSON data, according to a publication on Habr. Among the oddities of the archives are a large amount of Python 2.7 support code and a single date on all folders and files, “2022-02-24”. This contradicts the hackers' statements.
Yandex representatives acknowledged the authenticity of the published materials but stated that there had been no hack. “The Yandex security service has found code fragments from an internal repository publicly available. They differ from the current version of the repository used in Yandex services.”
The company also emphasized that the repositories are not intended to store users’ personal data, and that an investigation into the incident is underway: “Although we are still investigating the source code fragments that became publicly available, we don’t believe there is any risk to users’ data or platform performance.” Sources familiar with the situation stated that the source code of Yandex projects was stolen from the company network by an employee.
The leaked source code is interesting mainly as study material; it is unlikely that anyone could use it directly to launch “their own Yandex”. It contains many solutions, some of them tailored to Yandex’s infrastructure. For the AI projects, the most important pieces are missing: there are no trained neural networks and no datasets to train them.