Intro
The Alpha60 project’s manifesto is to conceptualize methods and metrics to rigorously compare media flows around the world. One of the types of files that are popular on free peer networks is the category of leaks: usually text files, email messages, and databases obtained through a variety of means that are made public in the press after being shared in peer swarms.
Wikileaks distributed several leaks on Bittorrent, and the successor organization Distributed Denial of Secrets (abbreviated as DDoSecrets) distributes a majority of its leaks on Bittorrent. Before the advent of streaming, many film and TV media objects produced in the USA were leaked from copies sent to voting members of award shows with the tag “DVD SCREENER.” Journalistic interest in leaks spiked after the Wikileaks/Guccifer leak during the 2016 Presidential election in the United States of America, when hacked email from the Hillary Clinton campaign was leaked to news outlets, timed to give advantage to her opponent, Donald Trump.
This post attempts to look at leaks in a systematic fashion over the last six years, make a first pass at the data, and attempt to notice patterns and similarities of leaks as a genre as compared to the other types of files (which also include audio, video, books, magazines, pornography, proprietary software, 3D models) shared on peer networks. Is there a way to characterize leak swarms (in real time) as unusually supported by one nation state?
Method
The transparency organization Distributed Denial of Secrets operates a mailing list that notifies the general public of leaks, and an associated website that catalogs and distributes the same leaks, often using the Bittorrent protocol to transfer files. In 2019, it distributed its first Russian-focused leak, The Dark Side of the Kremlin. This continued through 2022; the first leak after the February 24th Russian invasion was the limited-distribution Pravada Soldier leak. On March 11, the first public cyberwar leak, Roskomnadzor, was distributed.
On that same day, the Alpha60 project started sampling the peer swarm activity from the first Roskomnadzor leak. Since then, as Distributed Denial of Secrets announces new public leaks in the cyberwar category, they are added to the existing set of UKR-RUS leaks being sampled. The Alpha60 project defines the collection data set of all leak files at the time of writing (currently 63) as the BTIHA (for BitTorrent Info Hash Array) for this paper.
In a nutshell, the method is to oversample peer-to-peer swarms (each leak every 4 minutes), collect peer and seed information, and serialize it. Next, a caching pass removes duplicates and calculates unique peer and seed information for a given hour, day, or week. Then, an analysis pass uses the intermediate cache files to compute geolocation, persistence, and other information.
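The caching pass described above can be sketched roughly as follows. This is a minimal illustration, not the Alpha60 tooling itself: the record layout `(unix_timestamp, info_hash, ip, is_seed)` and the function names are assumptions made for the example, and it counts peers and seeds separately, so an address seen in both roles within one bucket is counted in both.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample record: (unix_timestamp, info_hash, ip, is_seed).
# The real serialized format is not described in this post.

def bucket_key(ts, granularity):
    """Return a label for the UTC hour, day, or ISO week containing ts."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    if granularity == "hour":
        return dt.strftime("%Y-%m-%dT%H")
    if granularity == "day":
        return dt.strftime("%Y-%m-%d")
    if granularity == "week":
        year, week, _ = dt.isocalendar()
        return f"{year}-W{week:02d}"
    raise ValueError(f"unknown granularity: {granularity}")

def unique_counts(samples, granularity="day"):
    """Collapse oversampled sightings into unique peer/seed counts per bucket."""
    peers = defaultdict(set)
    seeds = defaultdict(set)
    for ts, info_hash, ip, is_seed in samples:
        key = (info_hash, bucket_key(ts, granularity))
        # A set per (leak, bucket) drops repeat sightings of the same address.
        (seeds if is_seed else peers)[key].add(ip)
    keys = sorted(set(peers) | set(seeds))
    return {k: {"peers": len(peers[k]), "seeds": len(seeds[k])} for k in keys}
```

With a 4-minute sampling interval, one long-lived peer can appear hundreds of times per day; after this pass it contributes exactly one to that day's count.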
The sample collection and data cleaning and analysis tools are free software running on Linux and are available in this GitHub repository.
Initial Results (2022-03-11 through 2022-08-11)
General measures
- 63 individual leak files, 154 days
- 825,006 peers
- 58,937 seeds
- 144 PB (143,905 TB) maximum transferred (if all peers completed)
- $887,527 USD bandwidth cost (at $0.005/GB)
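One way to read the last two measures: the maximum transferred is an upper bound that assumes every observed peer downloads its leak completely, and the cost is that volume multiplied by a flat per-GB rate. The sketch below shows the arithmetic only. The leak names, peer counts, and file sizes are made-up placeholders, not the collection's data, and whether seeds are included in the bound is not stated in this post.

```python
# Hypothetical per-leak figures: (peers observed, file size in GB).
# Placeholder numbers for illustration only.
leaks = {
    "leak_a": (1_000, 50.0),
    "leak_b": (250, 400.0),
}

RATE_USD_PER_GB = 0.005  # the flat per-GB rate quoted in the list above

def max_transfer_gb(leaks):
    """Upper bound on volume: every peer is assumed to finish its download."""
    return sum(peers * size_gb for peers, size_gb in leaks.values())

def bandwidth_cost_usd(total_gb, rate=RATE_USD_PER_GB):
    """Cost of moving total_gb at a flat per-GB rate."""
    return total_gb * rate

total_gb = max_transfer_gb(leaks)
cost_usd = bandwidth_cost_usd(total_gb)
print(total_gb, cost_usd)
```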
Geographic characteristics
During the sample period, the cumulative breakdown of peers/seeds by country location is as follows for the top ten countries (using ISO-3 country codes).
Rank | Peer Size | Peer Country | Seed Size | Seed Country |
01 | 265k | RUS | 9k | UKR |
02 | 80k | USA | 5.8k | USA |
03 | 57k | KOR | 5.5k | RUS |
04 | 37k | CHN | 3.2k | CHN |
05 | 25k | UKR | 2.5k | DEU |
06 | 25k | FRA | 2.3k | NLD |
07 | 22k | NLD | 1.7k | JPN |
08 | 18k | GBR | 1.6k | CZE |
09 | 17k | JPN | 1.5k | POL |
10 | 16k | CAN | 1.3k | GBR |
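A ranking like the one in the table can be produced with a simple tally over the unique addresses. The following is a sketch under assumptions: `lookup` stands in for whatever geolocation call is used (it is a hypothetical callable that returns an ISO-3 code, or `None` for an address that cannot be resolved), and the unresolved share is reported separately rather than dropped silently.

```python
from collections import Counter

def rank_countries(unique_ips, lookup, top_n=10):
    """Count unique IPs per country; return the top_n and the unresolved share.

    `lookup` is a hypothetical callable mapping an IP string to an ISO-3
    country code, or None when the address cannot be resolved.
    """
    counts = Counter()
    unresolved = 0
    for ip in unique_ips:
        code = lookup(ip)
        if code is None:
            unresolved += 1
        else:
            counts[code] += 1
    total = len(unique_ips)
    share = unresolved / total if total else 0.0
    return counts.most_common(top_n), share
```

Running it once over peer addresses and once over seed addresses gives the two halves of the table.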
The cumulative distribution, in shades of gray on a world map.
Largest/Smallest in the aggregate (BTIHA).
Within the cyberwar collection, the two largest swarms belonged to the first leak and longest sample, the Roskomnadzor emails and databases. The top five are below.
Rank | Leak | Peers | Seeds |
1 | Roskomnadzor | 33,472 | 7,043 |
2 | Roskomnadzor_Databases | 32,390 | 2,946 |
3 | Socarenergoresource.ru | 24,486 | 415 |
4 | Central_Bank_of_Russia | 23,888 | 5,132 |
5 | Admblag.ru | 22,873 | 939 |
Results in Context
In this same time frame as the cyberwar leak, the Alpha60 project sampled several other media objects. Coincident with the Roskomnadzor leak on March 11, the film Turning Red was released and the first digital copies of the film Spider-Man: No Way Home were leaked. The streaming series Tokyo Vice was released on April 7th, the first digital copies of the film Everything Everywhere All at Once were leaked on May 18th, and the streaming series Stranger Things v4 and Obi-Wan Kenobi debuted new episodes on May 27th and then continued into July.
Results from these (ongoing) samples can be found here, and serve as a useful control group for the cyberwar collection. The cyberwar collection swarms are about 3% the size of a huge global media sensation like Spider-Man: No Way Home.
This swarm size difference is quite apparent in cartographic representations. Some experimental geolocation visualizations are here, and can be compared at the same scale with the equivalent cyberwar collection (colored in shades of gray), below.
Commentary
Although the peer to peer network topology is constantly changing due to country-level filtering and network infrastructure decisions, there has not been a decrease in peer to peer activity in Russia year over year. Pre-war and current swarm activity do not show any systemic network interference. In fact, due to economic sanctions and the removal of western content from Russian media markets, the expectation is that swarm activity in Russia will rise over 2021 levels.
Due to issues cited by Karaganis, Russia is and has historically been a dominant peer to peer power, so even though the peer ratio (30%) for this specific collection skews high, it is in a range that has been seen previously when specific film and television texts are especially popular in Russia, such as Witcher (30%) and Don’t Look Up (32%). This is most probably a reflection of the contents of this specific collection, which is in the Russian language, concerns Russian organizations, and is likely to be of interest to people living in … Russia.
Of note, however, is the small seed/peer ratio. Comparing this collection to other collections from media texts (such as the two above) shows a 5.6x smaller number of seeds than would be expected. This may be explained (at least in part) by the larger file sizes for these leaks, which are quite a bit larger than the usual media object digital file. Another part of the explanation may be the legal, ethical, and emotional register of participating in document sharing (and the perceived higher participation risk) in the middle of a highly partisan land war.
Collection Name | Peer/Seed Ratio |
UKR-RUS cyberwar leaks | 14 |
Witcher 2 | 2.5 |
Don’t Look Up | 2.2 |
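The ratio for the cyberwar collection can be checked against the cumulative totals under General measures. A small worked example, using only figures already given in this post:

```python
# Cumulative totals reported under "General measures".
peers = 825_006
seeds = 58_937

ratio = peers / seeds  # peers per seed for the cyberwar collection

# Ratios for the two comparison titles, as given in the table.
baseline = {"Witcher 2": 2.5, "Don't Look Up": 2.2}

print(f"cyberwar peer/seed ratio: {ratio:.1f}")
for name, r in baseline.items():
    print(f"vs {name}: {ratio / r:.1f}x fewer seeds per peer")
```

This gives a ratio of about 14.0, which is 5.6x the Witcher 2 ratio and about 6.4x the Don’t Look Up ratio, so the 5.6x figure quoted above corresponds to the Witcher 2 comparison.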
Outside of Russia, of interest is the major lack of participation in either peer or seed swarms from India, a country that is also a dominant peer to peer power. This may be a reflection of India’s neutrality in this conflict. The imbalance between South Korea’s very active peer swarms and very sleepy seed swarms is also unusual, but unexplained.
Questions
- VPN/Tor usage/IP characteristics
Some of our research questions are only answerable with the ability to categorize IP addresses as coming from known VPN ranges, specific cloud providers, or known nation-state actors. Although some IP addresses correlate with known Tor exit nodes, and a stubborn 2-6% of all IP addresses cannot be resolved at all with MaxMind’s free versions, identifying known VPN ranges and segmenting our data by VPN use have proved elusive. We are soliciting advice and support on the best methods to accomplish this.
- Update, 2023-09-24.
- Access for academic use to NetAcuity’s database license is mid to high five figures. Research use has been proposed but discussion is currently stalled. Deep-pocketed restriction-free donations earmarked for this cause are welcome.
- Access for academic use to TeleGeography’s telecom databases is low five figures. Medium-pocketed restriction-free donations earmarked for this cause are also welcome.
- Whois analysis, ASN lookup implemented and cached. This (non-default path) uses whois to look up IP addresses that error out of MaxMind, but could be expanded to the full set of peer IPs. However, this many requests to a public whois server may be misconstrued as abuse, and the originating IP may subsequently be denied service or rate-limited. Commercial services to do this are expensive and of unknown accuracy.
- MaxMind has granted a license for research use on three sites, for access to the least accurate geolocation database.
- Some doubt has been raised as to IP address purity. In particular, foreign-located VPN usage using (re-using?) the existing USA domestic consumer IP space. See PAM/Active monitoring 2022 talk. There may be no known method to pick apart non-USA VPN traffic from domestic use IPs.
- Current research interests include
- whois Registration Country vs. Physical Switch Location Country, starting with FRA, GBR, USA, CHN, RUS registered companies operating on the African continent. Perhaps this should be called ASN analysis?
- Tor exit node saturation rank. What percentage of the total swarm (BTIHA) is each node trafficking? Experimental data indicates this is different for leaks.
- CHN, RUS network blocks, text and image filtering, media censorship. Subsequent pirate activity.
- Unintended consequences
What’s best practice for releasing public geolocation data? What granularity, what time scale? Current option is country-level only. How to research and publish on this internet phenomenon without destroying it? How many peers/seeds does a leak need to be considered permanently in public?
- Update, 2023-09-24.
- Buy a license.
- Per-day, per-week, each duration itself and cumulative.
- Time frame was two years, is now extended to five years.
- Initial results were shared at HOT FOCI in 2022. Any final analysis will be published in public after a negotiated settlement ending hostilities has been signed by Ukraine and Russia of their own free will.
- Are auditable leak protocols useful for assessing the bias of the leak source?
A currently unsolved issue for transparency organizations is distributing a leak that has been planted as part of a misinformation or propaganda campaign (See WikiLeaks, 2016 Democratic National Committee email leak). Given that Distributed Denial of Secrets is publishing leaks using the BitTorrent protocol, techniques such as the ones outlined above can be used to characterize the distribution of the published leak, which may reveal nation-state hosting, or other unusual behavior. If this is helpful for free and open communication on the internet, how can it be published and archived for future research by others?
- What about non-cyberwar leaks?
Would a set of non-Russian, non-cyberwar leaks be a useful control group for cyberwar category leaks? What about other organizations? Media object leaks from the TV/film world seem very different in terms of behavior.
- Update, 2023-09-24. The following super-set of leaks is defined (to answer the questions immediately above) as
- screener leaks 2017-2018
- distributed denial of secrets corporate
- distributed denial of secrets cyberwar rus ukr
- distributed denial of secrets leaks usa
- distributed denial of secrets iran
- yandex leak
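Returning to the Tor exit node saturation question listed under current research interests, one possible reading of "saturation" is the share of all distinct swarm memberships in the collection that a given exit node accounts for. The sketch below implements that reading only. The function name, the `(info_hash, ip)` input shape, and the `exit_nodes` set are assumptions for illustration, and how the exit node list is obtained is out of scope here.

```python
from collections import Counter

def saturation_rank(observations, exit_nodes):
    """Rank Tor exit nodes by their share of distinct (info_hash, ip) pairs.

    observations: iterable of (info_hash, ip) pairs, one per swarm sighting.
    exit_nodes:   set of IP strings treated as known Tor exit nodes
                  (hypothetical input for this sketch).
    """
    pairs = set(observations)  # drop repeated sightings of the same membership
    total = len(pairs)
    if total == 0:
        return []
    hits = Counter(ip for _, ip in pairs if ip in exit_nodes)
    # Highest share first; each share is a fraction of all memberships.
    return [(ip, n / total) for ip, n in hits.most_common()]
```

Computing the same ranking for a leak collection and for a film or television collection would allow the two to be compared directly.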
Bibliography
HOPE 2022, Using Topic Models to Organize Huge Leaks, 2022-07-23
HOPE 2022, Leaks and Hacks: Four Years of DDoSecrets, (youtube), 2022-07-23
DEFCON 2022, Leak the Planet, (slides), Emma Best & Xan North, 2022-08-12
DEFCON 2022, Computer Hacks in the RUS-UKR War, (paper, slides), Kenneth Geers, 2022-08
Bodó, Balázs, The Genesis of Library Genesis: The Birth of a Global Scholarly Shadow Library, Shadow Libraries, MIT Press, 2018
De Kosnik, Abigail, Piracy is the Future of Culture: Speculating about Media Preservation after Collapse, Third Text, 2019
De Kosnik, Benjamin and De Kosnik, Abigail. Network Activity Monitoring Service, US10911337B1
Ensafi, Roya, A look at router geolocation in public and commercial databases, 2017, IMC
Guerrero-Saade, Juan Andrés. Hacktivism and State-Sponsored Knock-Offs | Attributing Deceptive Hack-and-Leak Operations, 2022
Li, Jinying, Pirate cosmopolitanism and the undercurrents of flow, Transnational Convergence of East Asian Pop Culture, Routledge, 2021
Madory, Doug. Rerouting of Kherson follows familiar gameplan, 2022-08-09
Madory, Doug. Internet Impacts Due to the War in Ukraine, NANOG 86, 2022-10-25
McLaughlin, Jenna. How a nonprofit group has become the biggest repository for hacked Russian data, NPR, 2022-07-05
Karaganis, Joe, Access from Above, Access from Below, Shadow Libraries, MIT Press, 2018
Karaganis, Joe and Renkema, Lennart, Copy Culture in the US & Germany, 2013
Philip, Kavita. The Internet Is A Leaky Pipe Made of Imperial Rubble, Your Computer Is On Fire!, MIT Press, 2019
Spinielli, Enrico and Olive, Xavier and Rivière, Philippe “Impacts de l’invasion de l’Ukraine sur l’aviation civile”, visionscarto.net
Streibelt, Lindorfer, Gürses, Gañán, Fiebig, We have to go back: A Historic IP Attribution Service for Network Measurement, arxiv.org, 2022-11-12
Toler, Aric, From Discord to 4chan: The Improbable Journey of a US Intelligence Leak, 2023-04-09, Bellingcat
Nershi, Karen and Grossman, Shelby, Assessing the Political Motivations Behind Ransomware Attacks, July 13, 2023
Chen, Adrian. “The Agency,” NYT, 2015-06-02
The Tactics & Tropes of the Internet Research Agency, United States Senate Select Committee on Intelligence (SSCI), 2019
U.K. Says Russia Has Targeted Lawmakers and Others in Cyberattacks for Years, NYT, 2023-12-07
The Leak as Genre: An Introduction
Russian trolls target U.S. support for Ukraine, Kremlin documents show, The Washington Post, 2024-04-08, (target Truth Social)
Online Reaction to the Death of Alexei Navalny, Russian Opposition Leader, Open Measures Newsletter, 2024-03-19 (analyze Truth Social)