Leak Distribution Analysis Part One

Dee Kay
2022-08-11, 2025-08-04
access, alpha60, bittorrent, data, data viz, Distributed Denial of Secrets, geolocation, government, hackers, hacking, infrastructure, internet, leaks, location, metadata, open access, russia, scholarly literature, ukraine, usa, VPNs

Intro

The Alpha60 project’s manifesto is to conceptualize methods and metrics to rigorously compare media flows around the world. One of the types of files that are popular on free peer networks is the category of leaks: usually text files, email messages, and databases obtained through a variety of means that are made public in the press after being shared in peer swarms.

WikiLeaks distributed several leaks on BitTorrent, and the successor organization Distributed Denial of Secrets (abbreviated as DDoSecrets) distributes a majority of its leaks on BitTorrent. Before the advent of streaming, many film, TV, and streaming media objects produced in the USA were leaked to voting members of award shows with the tag “DVD SCREENER.” Journalistic interest in leaks spiked after the WikiLeaks/Guccifer leak during the 2016 presidential election in the United States of America, when hacked emails from the Hillary Clinton campaign were leaked to news outlets, timed to give advantage to her opponent, Donald Trump.

This post looks at leaks in a systematic fashion over the last six years, makes a first pass at the data, and attempts to identify patterns and similarities of leaks as a genre, as compared to the other types of files shared on peer networks (which also include audio, video, books, magazines, pornography, proprietary software, and 3D models). Is there a way to characterize leak swarms (in real time) as unusually supported by one nation state?

 

Method

The transparency organization Distributed Denial of Secrets operates a mailing list that notifies the general public of leaks, and an associated website that catalogs and distributes the same leaks, often using the BitTorrent protocol to transfer files. In 2019, it distributed the first Russian-focused leak, The Dark Side of the Kremlin. This continued through 2022; the first leak after the February 24th Russian invasion was the limited-distribution Pravada Soldier leak. On March 11, the first public cyberwar leak, Roskomnadzor, was distributed.

On that same day, the Alpha60 project started sampling the peer swarm activity from the first Roskomnadzor leak. Since then, as Distributed Denial of Secrets announces new public leaks in the cyberwar category, they are added to the existing set of UKR-RUS leaks being sampled. The Alpha60 project defines the collection data set of all leak files at the time of writing (currently 63) as the BTIHA (for BitTorrent Info Hash Array) for this paper.

In a nutshell, the method is to oversample peer-to-peer swarms (each leak every 4 minutes), collect peer and seed information, and serialize it. Next, a caching pass removes duplicates and calculates unique peer and seed information for a given hour, day, or week. Then, an analysis pass uses the intermediate cache files to compute geolocation, persistence, and other information.
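The caching pass described above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the `Sample` record, its field names, and the `unique_per_bucket` helper are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    """One sighting of one swarm member (hypothetical record layout)."""
    info_hash: str      # BitTorrent info hash identifying the leak
    ip: str             # address of the peer or seed
    is_seed: bool       # True if the member reports holding the full file
    seen_at: datetime   # time the sample was taken


def bucket_key(ts: datetime, granularity: str) -> str:
    """Map a timestamp to an hour, day, or ISO-week bucket label."""
    if granularity == "hour":
        return ts.strftime("%Y-%m-%dT%H")
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    raise ValueError(f"unknown granularity: {granularity}")


def unique_per_bucket(samples, granularity="hour"):
    """Collapse repeated sightings into sets of unique IPs per bucket.

    Returns {(info_hash, bucket): {"peers": set_of_ips, "seeds": set_of_ips}}.
    """
    cache = defaultdict(lambda: {"peers": set(), "seeds": set()})
    for s in samples:
        key = (s.info_hash, bucket_key(s.seen_at, granularity))
        cache[key]["seeds" if s.is_seed else "peers"].add(s.ip)
    return dict(cache)
```

Because sampling every 4 minutes sees the same address many times per hour, storing sets of addresses per bucket keeps the unique counts stable no matter how often the swarm is polled.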

The sample collection and data cleaning and analysis tools are free software running on Linux and are available in this GitHub repository.

 

Initial Results (2022-03-11 through 2022-08-11)

General measures

  • 63 individual leak files, 154 days
  • 825,006 peers
  • 58,937 seeds
  • 144 PB (143,905 TB) maximum transferred (if all peers completed)
  • $887,527 USD bandwidth cost (at $0.005/GB)
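The "maximum transferred" measure is an upper bound: it assumes every observed peer downloads the complete file. A minimal sketch of that arithmetic follows. The peer counts and file sizes are hypothetical placeholders rather than the measured values, and the helper names are invented for the example; only the flat per-gigabyte rate comes from the list above.

```python
def max_transferred_gb(leaks):
    """Upper bound on data moved, assuming every peer completes the download.

    `leaks` is an iterable of (unique_peer_count, file_size_gb) pairs.
    """
    return sum(peers * size_gb for peers, size_gb in leaks)


def bandwidth_cost_usd(total_gb, usd_per_gb=0.005):
    """Cost of moving `total_gb` at a flat per-gigabyte rate."""
    return total_gb * usd_per_gb


# Hypothetical inputs for illustration only; not the measured values.
example_leaks = [(1_000, 50.0), (250, 820.0)]
total_gb = max_transferred_gb(example_leaks)
print(f"{total_gb:,.0f} GB, ${bandwidth_cost_usd(total_gb):,.2f}")
```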

Geographic characteristics

During the sample period, the cumulative breakdown of peers and seeds by country for the top ten countries is as follows (using ISO 3166-1 alpha-3 country codes).

Rank  Peer Size  Peer Country  Seed Size  Seed Country
01    265k       RUS           9k         UKR
02    80k        USA           5.8k       USA
03    57k        KOR           5.5k       RUS
04    37k        CHN           3.2k       CHN
05    25k        UKR           2.5k       DEU
06    25k        FRA           2.3k       NLD
07    22k        NLD           1.7k       JPN
08    18k        GBR           1.6k       CZE
09    17k        JPN           1.5k       POL
10    16k        CAN           1.3k       GBR
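A ranking like the one above can be produced by tallying unique addresses per country. The sketch below assumes a hypothetical `lookup` callable that maps an IP address to an alpha-3 country code (or `None` when the address cannot be resolved); it does not use any particular geolocation library's API.

```python
from collections import Counter


def top_countries(ips, lookup, n=10):
    """Rank countries by number of unique IPs.

    `lookup` is a hypothetical callable mapping an IP string to an
    ISO 3166-1 alpha-3 code, or None when the address cannot be resolved.
    Returns (ranked_list, unresolved_count).
    """
    counts = Counter()
    unresolved = 0
    for ip in set(ips):              # count each address only once
        country = lookup(ip)
        if country is None:
            unresolved += 1
        else:
            counts[country] += 1
    return counts.most_common(n), unresolved
```

Tracking the unresolved count separately keeps addresses that fail geolocation visible instead of silently dropping them from the totals.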

The cumulative distribution, shown in shades of gray on a world map.

Largest/Smallest in the aggregate (BTIHA).

Within the cyberwar collection, the two largest swarms belong to the first leak and longest sample, the Roskomnadzor emails and databases. The top five are listed below.

Rank  Leak                     Peers   Seeds
1     Roskomnadzor             33,472  7,043
2     Roskomnadzor_Databases   32,390  2,946
3     Socarenergoresource.ru   24,486  415
4     Central_Bank_of_Russia   23,888  5,132
5     Admblag.ru               22,873  939

 

Results in Context

In the same time frame as the cyberwar leaks, the Alpha60 project sampled several other media objects. Coincident with the Roskomnadzor leak on March 11, the film Turning Red was released and the first digital copies of the film Spider-Man: No Way Home were leaked. The streaming series Tokyo Vice was released on April 7th, the first digital copies of the film Everything Everywhere All at Once were leaked on May 18th, and the streaming series Stranger Things v4 and Obi-Wan Kenobi debuted new episodes on May 27th and then continued into July.

Results from these (ongoing) samples can be found here, and serve as a useful control group for the cyberwar collection. The cyberwar collection swarms are about 3% the size of a huge global media sensation like Spider-Man: No Way Home.

This swarm size difference is quite apparent in cartographic representations. Some experimental geolocation visualizations are here, and can be compared at the same scale with the equivalent cyberwar collection (colored in shades of gray), below.

 

Commentary

Although the peer-to-peer network topology is constantly changing due to country-level filtering and network infrastructure decisions, there has not been a decrease in peer-to-peer activity in Russia year over year. Pre-war and current swarm activity do not show any systemic network interference. In fact, due to economic sanctions and the removal of western content from Russian media markets, the expectation is that swarm activity in Russia will rise over 2021 levels.

Due to issues cited by Karaganis, Russia is and has historically been a dominant peer-to-peer power, so even though the peer ratio (30%) for this specific collection skews high, it is in a range that has been seen previously when specific film and television texts are especially popular in Russia, such as Witcher (30%) and Don’t Look Up (32%). This is most probably a reflection of the contents of this specific collection, which is in Russian, concerns Russian organizations, and is likely to be of interest to people living in … Russia.

Of note, however, is the small seed/peer ratio. Comparing this collection to other collections from media texts (such as the two above) shows a 5.6x smaller number of seeds than would be expected. This may be explained (at least in part) by the larger file sizes for these leaks, which are quite a bit larger than the usual media object digital file. Another part of the explanation may be the legal, ethical, and emotional register of participating in document sharing (and the perceived higher participation risk) in the middle of a highly partisan land war.

 

Collection Name          Peer/Seed Ratio
UKR-RUS cyberwar leaks   14
Witcher 2                2.5
Don’t Look Up            2.2
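The cyberwar ratio in this table can be reproduced from the peer and seed totals reported under General measures, and the 5.6x figure appears to correspond to the comparison with Witcher 2. A short worked check follows; the function and variable names are illustrative, and the two media ratios are taken directly from the table.

```python
def peer_seed_ratio(peers, seeds):
    """Number of peers observed for every seed observed."""
    return peers / seeds


# Totals from the General measures list.
cyberwar = peer_seed_ratio(825_006, 58_937)

# Ratios for the two media collections, as given in the table.
witcher_2 = 2.5
dont_look_up = 2.2

print(f"cyberwar peer/seed ratio: {cyberwar:.1f}")
print(f"vs Witcher 2: {cyberwar / witcher_2:.1f}x fewer seeds per peer")
print(f"vs Don't Look Up: {cyberwar / dont_look_up:.1f}x fewer seeds per peer")
```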

Outside of Russia, of interest is the major lack of participation in either peer or seed swarms from India, a country that is also a dominant peer-to-peer power. This may be a reflection of India’s neutrality in this conflict. The imbalance between South Korea’s very active peer swarms and very sleepy seed swarms is also unusual, but unexplained.

 

Questions

  1. VPN/Tor usage/IP characteristics
    Some of our research questions are only answerable with the ability to categorize IP addresses as belonging to known VPN ranges, specific cloud providers, or known nation-states. Although some IP addresses correlate with known Tor exit nodes, and a stubborn 2-6% of all IP addresses cannot be resolved at all with MaxMind’s free versions, identifying known VPN ranges and segmenting our data by VPN use has proved elusive. We are soliciting advice and support on the best methods to accomplish this.
    • Update, 2023-09-24.
      • Access for academic use to NetAcuity’s database license is mid to high five figures. Research use has been proposed but discussion is currently stalled. Deep-pocketed restriction-free donations earmarked for this cause are welcome.
      • Access for academic use to TeleGeography’s telecom databases is low five figures. Medium-pocketed restriction-free donations earmarked for this cause are also welcome.
      • Whois analysis, ASN lookup implemented and cached. This (non-default path) uses whois to look up IP addresses that error out of MaxMind, but could be expanded to the full set of peer IPs. However, this many requests to a public whois server may be misconstrued as abuse, and the originating IP may subsequently be denied service or rate-limited. Commercial services to do this are expensive and of unknown accuracy.
      • MaxMind has granted a license for research use on three sites, for access to the least accurate geolocation database.
      • Some doubt has been raised as to IP address purity. In particular, foreign-located VPN services may be using (re-using?) the existing USA domestic consumer IP space. See PAM/Active monitoring 2022 talk. There may be no known method to pick apart non-USA VPN traffic from domestic use IPs.
      • Current research interests include
        • whois Registration Country vs. Physical Switch Location Country, starting with FRA, GBR, USA, CHN, RUS registered companies operating on the African continent. Perhaps this should be called ASN analysis?
        • Tor exit node saturation rank. What percentage of the total swarm (BTIHA) is each node trafficking? Experimental data indicates this is different for leaks.
        • CHN, RUS network blocks, text and image filtering, media censorship. Subsequent pirate activity.
  2. Unintended consequences
    What’s best practice for releasing public geolocation data? What granularity, what time scale? Current option is country-level only. How to research and publish on this internet phenomenon without destroying it? How many peers/seeds does a leak need to be considered permanently in public?
    • Update, 2023-09-24.
      • Buy a license.
      • Per-day, per-week, each duration itself and cumulative.
      • Time frame was two years, is now extended to five years.
      • Initial results were shared at HOT FOCI in 2022. Any final analysis will be published in public after a negotiated settlement ending hostilities has been signed by Ukraine and Russia of their own free will.
  3. Are auditable leak protocols useful for assessing the bias of the leak source?
    A currently unsolved issue for transparency organizations is distributing a leak that has been planted as part of a misinformation or propaganda campaign (see WikiLeaks, 2016 Democratic National Committee email leak). Given that Distributed Denial of Secrets is publishing leaks using the BitTorrent protocol, techniques such as the ones outlined above can be used to characterize the distribution of the published leak, which may reveal nation-state hosting or other unusual behavior. If this is helpful for free and open communication on the internet, how can it be published and archived for future research by others?
  4. What about non-cyberwar leaks?
    Would a set of non-Russian, non-cyberwar leaks be a useful control group for cyberwar category leaks? What about other organizations? Media object leaks from the TV/film world seem very different in terms of behavior.
    • Update, 2023-09-24. The following super-set of leaks is defined (to answer the questions immediately above) as
      • screener leaks 2017-2018
      • distributed denial of secrets corporate
      • distributed denial of secrets cyberwar rus ukr
      • distributed denial of secrets leaks usa
      • distributed denial of secrets iran
      • yandex leak
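The Tor exit node saturation rank raised under question 1 could be computed along the following lines. This is a sketch under stated assumptions: the set of exit-node addresses is supplied by the caller, `sightings` is a hypothetical list of (info hash, IP) observations across the whole collection, and the function name is invented for the example.

```python
from collections import Counter


def exit_node_saturation(sightings, exit_nodes):
    """Rank Tor exit nodes by their share of all swarm sightings.

    `sightings` is an iterable of (info_hash, ip) pairs across the whole
    collection; `exit_nodes` is a set of IPs known to be Tor exits.
    Returns [(ip, share_of_total), ...] sorted from highest share down.
    """
    sightings = list(sightings)
    total = len(sightings)
    if total == 0:
        return []
    hits = Counter(ip for _, ip in sightings if ip in exit_nodes)
    return [(ip, count / total) for ip, count in hits.most_common()]
```

Running the same function over a leak collection and over a media collection would give two rankings that can be compared directly.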

 

Bibliography

HOPE 2022, Using Topic Models to Organize Huge Leaks, 2022-07-23

HOPE 2022, Leaks and Hacks: Four Years of DDoSecrets, (youtube), 2022-07-23

DEFCON 2022, Leak the Planet, (slides), Emma Best & Xan North, 2022-08-12

DEFCON 2022, Computer Hacks in the RUS-UKR War, (paper, slides), Kenneth Geers, 2022-08

Bodó, Balázs, The Genesis of Library Genesis: The Birth of a Global Scholarly Shadow Library, Shadow Libraries, MIT Press, 2018

Bodó, Balázs and Poort, Joost, Bellwethers of Change: Book Piracy in Context, 2023-11-13. Amsterdam Law School Research Paper, forthcoming in: Schüller-Zwierlein, André (Ed.), “Age of Access? Grundfragen der Informationsgesellschaft” / “Age of Access? Fundamental Issues of Information Society”, de Gruyter. Available at SSRN: https://ssrn.com/abstract=4631288

De Kosnik, Abigail, Piracy is the Future of Culture: Speculating about Media Preservation after Collapse, Third Text, 2019

De Kosnik, Benjamin and De Kosnik, Abigail. Network Activity Monitoring Service, US10911337B1

Ensafi, Roya, A look at router geolocation in public and commercial databases, IMC, 2017

Guerrero-Saade, Juan Andrés. Hacktivism and State-Sponsored Knock-Offs | Attributing Deceptive Hack-and-Leak Operations, 2022

Li, Jinying, Pirate cosmopolitanism and the undercurrents of flow, Transnational Convergence of East Asian Pop Culture, Routledge, 2021

Madory, Doug. Rerouting of Kherson follows familiar gameplan, 2022-08-09

Madory, Doug. Internet Impacts Due to the War in Ukraine, NANOG 86, 2022-10-25

McLaughlin, Jenna. How a nonprofit group has become the biggest repository for hacked Russian data, NPR, 2022-07-05

Karaganis, Joe, Access from Above, Access from Below, Shadow Libraries, MIT Press, 2018

Karaganis, Joe and Renkema, Lennart, Copy Culture in the US & Germany, 2013

Philip, Kavita. The Internet Is A Leaky Pipe Made of Imperial Rubble, Your Computer Is On Fire!, MIT Press, 2019

Spinielli, Enrico and Olive, Xavier and Rivière, Philippe, Impacts de l’invasion de l’Ukraine sur l’aviation civile, visionscarto.net

Streibelt, Lindorfer, Gürses, Gañán, Fiebig, We have to go back: A Historic IP Attribution Service for Network Measurement, arxiv.org, 2022-11-12

Toler, Aric, From Discord to 4chan: The Improbable Journey of a US Intelligence Leak, 2023-04-09, Bellingcat

Nershi, Karen and Grossman, Shelby, Assessing the Political Motivations Behind Ransomware Attacks, July 13, 2023

Chen, Adrian. “The Agency,” NYT, 2015-06-02

The Tactics & Tropes of the Internet Research Agency, United States Senate Select Committee on Intelligence (SSCI), 2019

U.K. Says Russia Has Targeted Lawmakers and Others in Cyberattacks for Years, NYT, 2023-12-07

The Leak as Genre: An Introduction

Russian trolls target U.S. support for Ukraine, Kremlin documents show, The Washington Post, 2024-04-08, (target Truth Social)

Online Reaction to the Death of Alexei Navalny, Russian Opposition Leader, Open Measures Newsletter, 2024-03-19 (analyze Truth Social)

How to negotiate hacked and leaked data, International Consortium of Investigative Journalists, 2024

The New Propaganda War, Anne Applebaum, Atlantic, June 2024, archive

Stauffer, Brian, Disrupted, Throttled, and Blocked: State Censorship, Control, and Increasing Isolation of Internet Users in Russia. Human Rights Watch. 2025-07-30

