An Empirical Study of the I2P Anonymity Network and its Censorship Resistance
In this blog post, we summarize the outcomes of our project entitled “An Empirical Study of the I2P Anonymity Network and its Censorship Resistance” supported by the Open Technology Fund - Information Controls Fellowship Program.
The Invisible Internet Project (I2P) is one of the most well-known and widely used anonymity networks. Privacy-conscious Internet users can use I2P to protect their online privacy, and censored users can use it to bypass censorship imposed by local Internet regimes. Despite its wide usage, several aspects of I2P have not been comprehensively studied. For instance, network properties, including the population of I2P relays, their geographical distribution, and relay types, have not been investigated systematically. In addition, a more important aspect of the I2P network that has yet to be investigated is its censorship resistance.
In this project, we aim to fill this research gap by conducting empirical measurements of the I2P network. We hope that our study will not only help popularize I2P among both academic researchers and Internet users, but also contribute to the understanding of I2P network properties in general. With these goals in mind, we are interested in answering the following questions.
What are the population, geographical distribution, and properties of I2P relays?
What blocking strategies may censors use to block access to I2P?
How can I2P be made more resilient to censorship, especially against the blocking techniques found above?
To that end, we have conducted several measurements to build an I2P Metrics portal, and identified blocking methods that censors use to hinder access to I2P. Finally, we have also implemented solutions to make the reseeding process more resistant to blockage.
1. I2P Metrics portal
I2P Metrics collects and analyzes historical infrastructure data from the I2P network. The portal aims to facilitate understanding of relays in the I2P network by visualizing several network properties, including the population of relays, their geographical distribution, and relay types. These metrics are aggregated and updated on a daily basis. To assist future studies and increase the reproducibility of plots, anonymized data of the network infrastructure is made publicly available. The portal is also accessible via I2P hidden service (eepsite) at this base32 I2P address:
1.1 Relay population:
The I2P relay population is estimated from the number of unique cryptographic identifiers that our measurement infrastructure observes in the network database (netDb). It is not the absolute, real-time number of all relays in the network: the dynamic, decentralized, and high-churn nature of I2P makes it impossible to obtain an exact count of active relays in real time. For instance, a relay that joins the network for only a couple of minutes and then leaves is less likely to be observed by our measurement infrastructure, and thus may not be counted. The daily estimated population gives us an idea of the size of the network and could also be used to detect anomalies, e.g., Sybil attacks. Since the launch of I2P Metrics, we have consistently observed at least 20,000 relays in the network on a daily basis.
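As a rough illustration of this counting, the sketch below deduplicates relay identifiers per day. It is only a minimal sketch and not the portal's actual pipeline: the `(day, router_id)` input format and the function name are assumptions made for the example.

```python
from collections import defaultdict

def daily_population(observations):
    """Count unique relay identifiers per day.

    `observations` is an iterable of (day, router_id) pairs; the same
    relay may be observed many times on one day. (Hypothetical format.)
    """
    seen = defaultdict(set)
    for day, router_id in observations:
        seen[day].add(router_id)  # a set keeps each identifier once
    return {day: len(ids) for day, ids in seen.items()}

# Hypothetical observations: routerA is seen twice on the same day.
sample = [
    ("2019-03-01", "routerA"),
    ("2019-03-01", "routerA"),
    ("2019-03-01", "routerB"),
    ("2019-03-02", "routerB"),
]
population = daily_population(sample)
print(population)
```

A relay that is never observed on a given day simply does not appear in that day's set, which is why the result is an estimate and not an exact count.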
1.2 Relays by country:
I2P Metrics maps the IP addresses of relays observed in the netDb to their countries and aggregates the number of relays per country/region, identified by its ISO 3166 two-letter code. Knowing the number of daily relays from a particular region can help to infer potential Internet censorship events or network attacks happening in that region. For example, I2P Metrics has been observing around 200 relays from China, and more than 5,000 relays from 30 countries with a poor Press Freedom Score (i.e., greater than 50), on a daily basis. This observation shows that I2P is actually being used in regions where the Internet is often censored.
Note that the numbers shown here are not absolute numbers of relays from a particular country/region. The I2P router software allows a user to set up a relay as “hidden”, so that the relay’s IP address is not shared publicly. As a result, the geographical information of such relays is unknown. Currently, I2P Metrics does not aggregate or publish countries whose number of daily I2P relays is less than 30. This threshold was discussed in, and adopted from, a geographic survey of I2P relays by David Swanlund.
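The aggregation and the publication threshold described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the IP-to-country lookup is assumed to have happened already, hidden relays are represented as `None`, and the function and variable names are hypothetical.

```python
from collections import Counter

MIN_RELAYS = 30  # countries with fewer daily relays are not published

def relays_by_country(country_codes, min_relays=MIN_RELAYS):
    """Aggregate relays per ISO 3166 two-letter code.

    `country_codes` holds one entry per relay; None stands for a hidden
    relay whose IP address (and therefore country) is unknown.
    """
    counts = Counter(code for code in country_codes if code is not None)
    # Keep only countries that reach the publication threshold.
    return {code: n for code, n in counts.items() if n >= min_relays}

# Hypothetical daily snapshot: 10 hidden relays, one country below 30.
sample = ["US"] * 45 + ["DE"] * 30 + ["IS"] * 29 + [None] * 10
published = relays_by_country(sample)
print(published)
```

In this sample the country with 29 relays is suppressed and the hidden relays are never counted, which matches the caveat that the published numbers are not absolute.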
1.3 Relays by type:
Depending on the role, connectivity status, and shared bandwidth, relays are classified into different types. I2P Metrics learns about these types from the “flag” information of each relay. Knowing the population of different relay types is critical in improving the network security and performance. For instance, we learn that there has been a consistent number of about 2,500 “floodfill” relays in the network. This group of relays plays a vital role by maintaining the decentralized network database of the I2P network.
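To make the idea of counting relay types concrete, here is a minimal sketch. It assumes each relay's flag information is a string of single-letter capabilities, with `f` marking a floodfill relay and one or more of the letters K, L, M, N, O, P, X marking the shared-bandwidth class from lowest to highest. The letter ordering, the choice to take the highest advertised class, and all names are assumptions for illustration, not the portal's actual code.

```python
from collections import Counter

BANDWIDTH_TIERS = "KLMNOPX"  # assumed order, lowest to highest bandwidth
FLOODFILL_FLAG = "f"         # assumed marker for floodfill relays

def classify(flags):
    """Return (bandwidth_tier or None, is_floodfill) for one flag string."""
    tiers = [c for c in flags if c in BANDWIDTH_TIERS]
    # If several bandwidth letters are present, keep the highest one.
    tier = max(tiers, key=BANDWIDTH_TIERS.index) if tiers else None
    return tier, FLOODFILL_FLAG in flags

def summarize(flag_strings):
    """Count relays per bandwidth tier and count floodfill relays."""
    tier_counts = Counter()
    floodfills = 0
    for flags in flag_strings:
        tier, is_floodfill = classify(flags)
        if tier is not None:
            tier_counts[tier] += 1
        floodfills += is_floodfill
    return tier_counts, floodfills

# Hypothetical flag strings for six relays.
sample = ["LR", "LU", "LfR", "OfR", "PfR", "LR"]
tier_counts, floodfills = summarize(sample)
print(tier_counts.most_common(1), floodfills)
```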
Of the different relay types, we observed that "L" is the most dominant. This observation aligns with the fact that the I2P router software uses 12-48 KBps as the default shared bandwidth. As far as we are concerned, this default shared bandwidth could be one of the causes of the performance issues in web browsing activities in I2P. We have raised this concern with the I2P development team so that they can investigate ways to change this default setting, because (1) most users do not change the default configuration, and (2) Internet speeds have improved. Since then, we have noticed that the Network Diagnostic Tool of M-Lab has been introduced in the I2P router console to help users determine which bandwidth sharing option is suitable for them.
1.4 Dataset usage:
Together with the metrics portal, anonymized data that can be used to reproduce all plots and conduct further study can be downloaded from Google Drive.
Since its launch in October 2018, the metrics portal has provided useful data for other researchers who need to verify their findings about the network or use the data as an initial input for their own studies. Moreover, the geographical distribution of relays gives the I2P developers some insight into relays in the network and helps them consider whether relays from certain countries should be put into hidden mode by default. There are other analyses that future researchers can conduct on our dataset to further understand the network, e.g., how quickly I2P router software updates are adopted, based on the software version stored in the record of each relay.
2. Measuring I2P censorship at a global scale
In the second stage of this project, we conducted measurements to shed light on the current situation of I2P blockage around the world. From early March until April 2019, we conducted measurements from 1.7K network locations in 164 countries to examine the accessibility of four different I2P services: the official homepage, its mirror site, reseed servers, and active relays in the network. Below are some highlights of our findings.
Analyzing data collected during this period, we could identify blocking attempts in five countries. China consistently hinders access to I2P by poisoning DNS resolutions of the I2P homepage and reseed servers; SNI-based blocking was detected in Oman and Qatar when accessing the I2P homepage over HTTPS; TCP packet injection was detected in Iran, Oman, Qatar, and Kuwait when visiting the mirror site via HTTP; and explicit block pages were discovered when visiting the mirror site from Oman, Qatar, and Kuwait.
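One of these techniques, DNS poisoning, is commonly spotted by comparing the answers received at a vantage point with answers from a trusted control resolver. The sketch below shows only that comparison. It is a simplified heuristic and not necessarily the method used in our measurements; the function name and the documentation-range IP addresses are hypothetical.

```python
def looks_poisoned(control_ips, observed_ips):
    """Flag a DNS answer as suspicious.

    Returns True when the vantage point received at least one address
    and none of the addresses appear in the trusted control set.
    """
    observed = set(observed_ips)
    return bool(observed) and observed.isdisjoint(control_ips)

# Hypothetical data using documentation-only address ranges.
control = {"192.0.2.10", "192.0.2.11"}
from_censored_network = ["203.0.113.7"]
from_open_network = ["192.0.2.11"]

print(looks_poisoned(control, from_censored_network))  # True
print(looks_poisoned(control, from_open_network))      # False
```

A real measurement would need extra checks, since sites served from many addresses can legitimately return answers that differ from the control set.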
While conducting our measurements, we also detected leakage of DNS injection by the Great Firewall at two networks in South Korea. In addition, among the many abused IP addresses that the Great Firewall uses in its falsified DNS responses, we identified many new IP addresses belonging to Facebook that were not observed in previous studies.
Although we showed in previous work that an IP harvesting attack can be conducted at a relatively low cost to hinder access to I2P, we did not detect such blockage in the wild. Our findings from the second stage of the project were presented at the 9th USENIX Workshop on Free and Open Communications on the Internet.
3. A more censorship-resistant I2P reseeding
One way to hinder network filtering is to make the cost of collateral damage as high as possible. In the second stage of the project, we found that most censors prevent users from using I2P by blocking access to the download homepage and reseed servers. To cope with this problem, we mirror the latest I2P installation packages and reseed bundle on major cloud providers. The installation packages will be updated as soon as a new version is released, while the reseed bundle will be updated periodically to provide censored users with currently active relays in the network. As a result, censors would have to bear high collateral damage, blocking access to all of these cloud service providers, to prevent I2P users from manually reseeding. While censors can also block access to the active I2P relays contained in the reseed bundle, we did not notice such blocking in the wild in the second stage of the project.
Censored users can download the installation packages and the latest reseed bundle from these cloud storage providers:
- Box: https://app.box.com/s/aednqugd5zf07mlg65wjeafay3b1qqbg
- Dropbox: https://www.dropbox.com/sh/3w9pn8l4269ky01/AACo-l7GpK2TYji5y5vOyQR7a
- Google Drive: https://drive.google.com/drive/folders/1PD7trHv1K0uQvGLmmcG1AfZys9oh3kIl
- OneDrive: https://1drv.ms/u/s!Agjj2p_MBHA0aZF3LWIYBEWbGDg
In addition to hosting on cloud storage services, we also distribute the installation packages and the reseed bundle on the InterPlanetary File System (IPFS), an emerging technology that powers the Distributed Web. Files stored on IPFS can be located by Content Identifiers (CIDs). To fetch files from IPFS, users can install IPFS on their machine or use one of the public IPFS gateways, which can be found here. Users can refer to this blog post for more technical details.
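For readers who prefer a gateway over a local IPFS install, the sketch below builds path-style gateway URLs of the form `<gateway>/ipfs/<CID>/<file>` for a given CID. The CID and file name are placeholders, not the identifiers of our published files, and the two gateways are merely examples of public gateways; any gateway a user can reach would work the same way.

```python
from urllib.parse import quote

def gateway_urls(cid, filename, gateways):
    """Build one path-style URL per gateway for a file under a CID."""
    return [
        f"{gateway.rstrip('/')}/ipfs/{cid}/{quote(filename)}"
        for gateway in gateways
    ]

# Placeholder values: substitute the real CID and file name.
EXAMPLE_CID = "EXAMPLE_CID"
EXAMPLE_FILE = "i2pseeds.su3"  # hypothetical file name
GATEWAYS = ["https://ipfs.io", "https://dweb.link/"]

urls = gateway_urls(EXAMPLE_CID, EXAMPLE_FILE, GATEWAYS)
for url in urls:
    print(url)
```

Trying several gateways in turn is useful under censorship, because a censor would need to block every reachable gateway to cut off access to the same content.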
4. Closing Remarks
Throughout this project, we have built an I2P metrics portal through which we hope to provide useful data for other researchers and facilitate the understanding of the I2P network infrastructure. We have also identified blocking methods that censors use to block access to I2P around the world, and implemented solutions to make the I2P reseeding process more resistant to blockage and more accessible to end users who need the tool to circumvent Internet censorship and online surveillance.
As the Internet censorship arms race continues to escalate, we believe that continuous network measurements are necessary to ensure that censorship circumvention tools are accessible to end users. We thus plan to keep the I2P metrics portal running while periodically conducting censorship measurements using the platform we created.
We would like to thank all members of the I2P team for their collaboration, and the anonymous reviewers of our IMC '18 and FOCI '19 papers for their constructive feedback and advice. Last but not least, we also thank the members of the OTF Advisory Council and the Research Director (Adam Lynn) of the Information Controls Fellowship Program for their initial feedback and for guiding us throughout this fruitful project. This blog post was improved thanks to valuable feedback from Dan McDevitt, the Communications Coordinator of OTF.