As described by Kavita Philip in Your Computer is On Fire, the Alpha60 project measures media (tv/film/video/data) flows on peer-to-peer networks, and uses that information to create a public and reception-focused index of peer swarm behavior: cumulative size, geographies for the biggest swarms, rate of decline, day-over-day growth, etc. The goal is to make global media flows visible, and to conceptualize analytical methods for media distribution and reception that are explicitly global in nature and unbound from existing models that require proprietary information or access.
The transparency organization Distributed Denial of Secrets operates a mailing list that notifies the general public of leaks, and an associated website that catalogs and distributes the same leaks, often using the BitTorrent protocol to transfer files. In 2019, it distributed the first Russian-focused leak, The Dark Side of the Kremlin. This continued through 2022: the first leak after the February 24th Russian invasion was the limited-distribution Pravada Soldier leak. On March 11, the first public cyberwar leak, Roskomnadzor, was distributed.
On that same day, the Alpha60 project started sampling the peer swarm activity from the Roskomnadzor leak. Since then, as Distributed Denial of Secrets announces new public leaks in the cyberwar category, they are added to the existing set of UKR-RUS leaks being sampled. The Alpha60 project defines the collection data set of all leak files at the time of writing (currently 63) as the BTIHA (for BitTorrent Info Hash Array) for this paper.
In a nutshell, the method is to oversample peer-to-peer swarms (each leak every 4 minutes), collect peer and seed information, and serialize it. Next, a caching pass removes duplicates and calculates unique peer and seed information for a given hour, day, or week. Then, an analysis pass uses the intermediate cache files to compute geolocation, persistence, and other information.
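
The caching pass described above can be illustrated with a minimal sketch. It is not the Alpha60 code: the record layout (timestamp, info hash, IP address, seed flag), the function names, and the sample addresses (reserved documentation ranges) are all assumptions made for illustration. It shows how oversampled observations collapse into unique peer and seed counts per hour, day, or week.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample records: (UTC timestamp, info hash, IP, is_seed).
# This layout is an assumption, not the project's serialization format.
SAMPLES = [
    (datetime(2022, 3, 11, 12, 0, tzinfo=timezone.utc), "hash-a", "192.0.2.10", False),
    (datetime(2022, 3, 11, 12, 4, tzinfo=timezone.utc), "hash-a", "192.0.2.10", False),
    (datetime(2022, 3, 11, 12, 8, tzinfo=timezone.utc), "hash-a", "198.51.100.7", True),
    (datetime(2022, 3, 11, 13, 0, tzinfo=timezone.utc), "hash-a", "192.0.2.10", False),
    (datetime(2022, 3, 12, 9, 0, tzinfo=timezone.utc), "hash-a", "203.0.113.5", False),
]


def bucket_key(ts, granularity):
    """Map a timestamp to an hour, day, or ISO-week label."""
    if granularity == "hour":
        return ts.strftime("%Y-%m-%dT%H")
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "week":
        iso = ts.isocalendar()
        return f"{iso[0]}-W{iso[1]:02d}"
    raise ValueError(f"unknown granularity: {granularity}")


def unique_counts(samples, granularity):
    """Return {(info_hash, bucket): (unique_peers, unique_seeds)}."""
    peers = defaultdict(set)
    seeds = defaultdict(set)
    for ts, info_hash, ip, is_seed in samples:
        key = (info_hash, bucket_key(ts, granularity))
        # Sets discard the repeats produced by sampling every few minutes.
        (seeds if is_seed else peers)[key].add(ip)
    keys = sorted(set(peers) | set(seeds))
    return {k: (len(peers[k]), len(seeds[k])) for k in keys}


if __name__ == "__main__":
    for granularity in ("hour", "day", "week"):
        print(granularity, unique_counts(SAMPLES, granularity))
```

Keying the sets by info hash and time bucket keeps each leak's counts separate, so the same address seen in two different swarms is counted once per swarm rather than once overall.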
The sample collection and data cleaning and analysis tools are free software running on Linux and are available in this GitHub repository.
Results (through 2022-08-11)
- 63 individual leak files, 154 days
- 825,006 peers
- 58,937 seeds
- 144 PB (143,905 TB) maximum transferred (if all peers completed)
- $887,527 USD bandwidth cost (at $0.005/GB)
During the sample period, the cumulative breakdown of peers/seeds by country location is as follows for the top ten countries (using ISO-3 country codes).
|Rank||Peer Size||Peer Country||Seed Size||Seed Country|
The cumulative distribution in shades of gray on the world.
Largest/Smallest in the aggregate (BTIHA).
Within the cyberwar collection, the leak with the largest two swarms was the first leak and longest sample, the Roskomnadzor emails and databases. The top five are as below.
Results in Context
In the same time frame as the cyberwar leaks, the Alpha60 project sampled several other media objects. Coincident with the Roskomnadzor leak on March 11, the film Turning Red was released and the first digital copies of the film Spider-Man: No Way Home were leaked. The streaming series Tokyo Vice was released on April 7th, the first digital copies of the film Everything Everywhere All at Once were leaked on May 18th, and the streaming series Stranger Things v4 and Obi-Wan Kenobi debuted new episodes on May 27th and then continued into July.
Results from these (ongoing) samples can be found here, and serve as a useful control group for the cyberwar collection. The cyberwar collection swarms are about 3% the size of a huge global media sensation like Spider-Man: No Way Home.
This swarm size difference is quite apparent in cartographic representations. Some experimental geolocation visualizations are here, and can be compared at the same scale with the equivalent cyberwar collection (colored in shades of gray), below.
Although the peer-to-peer network topology is constantly changing due to country-level filtering and network infrastructure decisions, there has not been a decrease in peer-to-peer activity in Russia year over year. Pre-war and current swarm activity do not show any systemic network interference. In fact, due to economic sanctions and the removal of western content from Russian media markets, the expectation is that swarm activity in Russia will rise over 2021 levels.
Due to issues cited by Karaganis, Russia is and has been a historically dominant peer-to-peer power, so even though the peer ratio (30%) for this specific collection skews high, this is in a range that has been seen previously when specific film and television texts are especially popular in Russia, such as Witcher (30%) and Don’t Look Up (32%). This is most probably a reflection of the contents of this specific collection, which is in the Russian language, concerns Russian organizations, and is likely to be of interest to people living in … Russia.
Of note, however, is the small seed/peer ratio. Comparing this collection to other collections from media texts (such as the two above) shows a 5.6x smaller number of seeds than would be expected. This may be explained (at least in part) by the larger file sizes for these leaks, which are quite a bit larger than the usual media object digital file. Another part of the explanation may be the legal, ethical, and emotional register of participating in document sharing (and the perceived higher participation risk) in the middle of a highly partisan land war.
| Collection Name | Peer/Seed Ratio |
|---|---|
| UKR-RUS cyberwar leaks | 14 |
| Don’t Look Up | 2.2 |
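
As a worked check, the peer/seed ratio for the cyberwar collection can be recomputed from the cumulative totals reported in the results (825,006 peers and 58,937 seeds). The sketch below is illustrative only, and the function name is hypothetical. It also compares against the single Don’t Look Up row in the table; the 5.6x figure in the text presumably draws on more comparison collections than that one row, so the sketch does not try to reproduce it.

```python
def peer_seed_ratio(peers, seeds):
    """Peers per seed for a collection (hypothetical helper)."""
    return peers / seeds


# Cumulative totals reported in the results section.
cyberwar_ratio = peer_seed_ratio(825_006, 58_937)

# Ratio listed in the table for Don't Look Up.
dont_look_up_ratio = 2.2

# How many times fewer seeds per peer than this one comparison text.
relative = cyberwar_ratio / dont_look_up_ratio

if __name__ == "__main__":
    print(f"cyberwar peer/seed ratio: {cyberwar_ratio:.1f}")
    print(f"relative to Don't Look Up: {relative:.1f}x")
```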
Outside of Russia, of interest is the major lack of participation in either peer or seed swarms from India, a country that is also a dominant peer-to-peer power. This may be a reflection of India’s neutrality in this conflict. The imbalance between South Korea’s very active peer swarms and very sleepy seed swarms is also unusual, but unexplained.
- VPN/Tor usage
Some of our research questions are only answerable with the ability to categorize IP addresses as from known VPN ranges, specific cloud providers, or known nation-states. Although some IP addresses correlate with known Tor exit nodes, and a stubborn 2-6% of all IP addresses cannot be resolved at all with MaxMind’s free versions, identifying known VPN ranges and segmenting our data by VPN use has proved elusive. We are soliciting advice on the best methods to accomplish this. (We have also approached NetAcuity about using their geolocation options for analysis.)
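
If usable range lists were available, the categorization step itself would be simple. The following is a minimal sketch using Python's standard `ipaddress` module. The range lists, labels, and Tor exit set are hypothetical placeholders built from reserved documentation addresses (RFC 5737), not real VPN, cloud, or Tor data; obtaining trustworthy lists is the open problem described above.

```python
import ipaddress
from collections import Counter

# Hypothetical range lists keyed by category. The CIDR blocks are
# documentation ranges (RFC 5737), not real provider data.
KNOWN_RANGES = {
    "vpn": ["192.0.2.0/24"],
    "cloud": ["198.51.100.0/24"],
}
# Hypothetical set of Tor exit node addresses.
TOR_EXITS = {"203.0.113.9"}

# Parse the CIDR strings once so lookups do not re-parse them.
NETWORKS = {
    label: [ipaddress.ip_network(cidr) for cidr in cidrs]
    for label, cidrs in KNOWN_RANGES.items()
}


def categorize(ip_text):
    """Return the first matching category label, or 'unclassified'."""
    ip = ipaddress.ip_address(ip_text)
    if ip_text in TOR_EXITS:
        return "tor_exit"
    for label, networks in NETWORKS.items():
        if any(ip in net for net in networks):
            return label
    return "unclassified"


def tally(ip_list):
    """Count how many addresses fall into each category."""
    return Counter(categorize(ip) for ip in ip_list)


if __name__ == "__main__":
    observed = ["192.0.2.44", "198.51.100.1", "203.0.113.9", "203.0.113.10"]
    print(tally(observed))
```

A linear scan over the networks is adequate for short lists; a large list would likely call for a prefix tree or a sorted interval lookup instead.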
- Unintended consequences
What’s best practice for releasing public geolocation data? What granularity, what time scale? The current option is country-level only. How can one research and publish on this internet phenomenon without destroying it? How many peers/seeds does a leak need to be considered permanently public?
- Are auditable leak protocols useful for assessing the bias of the leak source?
A currently unsolved issue for transparency organizations is distributing a leak that has been planted as part of a misinformation or propaganda campaign (see WikiLeaks, 2016 Democratic National Committee email leak). Given that Distributed Denial of Secrets is publishing leaks using the BitTorrent protocol, techniques such as the ones outlined above can be used to characterize the distribution of the published leak, which may reveal nation-state hosting or other unusual behavior. Is this helpful for free and open communication on the internet?
- What about non-cyberwar leaks?
Would a set of non-Russian, non-cyberwar leaks be a useful control group for cyberwar category leaks? What about leaks from other organizations? Media object leaks from the TV/film world seem very different in terms of behavior.
HOPE 2022, Using Topic Models to Organize Huge Leaks, 2022-07-23
Bodó, Balázs, The Genesis of Library Genesis: The Birth of a Global Scholarly Shadow Library, Shadow Libraries, MIT Press, 2018
De Kosnik, Abigail, Piracy is the Future of Culture: Speculating about Media Preservation after Collapse, Third Text, 2019
De Kosnik, Benjamin and De Kosnik, Abigail. Network Activity Monitoring Service, US10911337B1
Ensafi, Roya, A look at router geolocation in public and commercial databases, IMC, 2017
Guerrero-Saade, Juan Andrés. Hacktivism and State-Sponsored Knock-Offs | Attributing Deceptive Hack-and-Leak Operations, 2022
Li, Jinying, Pirate cosmopolitanism and the undercurrents of flow, Transnational Convergence of East Asian Pop Culture, Routledge, 2021
Madory, Doug. Rerouting of Kherson follows familiar gameplan, 2022-08-09
Madory, Doug. Internet Impacts Due to the War in Ukraine, NANOG 86, 2022-10-25
McLaughlin, Jenna. How a nonprofit group has become the biggest repository for hacked Russian data, NPR, 2022-07-05
Karaganis, Joe, Access from Above, Access from Below, Shadow Libraries, MIT Press, 2018
Karaganis, Joe and Renkema, Lennart, Copy Culture in the US & Germany, 2013
Philip, Kavita. The Internet Is A Leaky Pipe Made of Imperial Rubble, Your Computer Is On Fire!, MIT Press, 2019
Spinielli, Enrico and Olive, Xavier and Rivière, Philippe “Impacts de l’invasion de l’Ukraine sur l’aviation civile”, visionscarto.net
Streibelt, Lindorfer, Gürses, Gañán, Fiebig, We have to go back: A Historic IP Attribution Service for Network Measurement, arxiv.org, 2022-11-12