We made a fast geolocation library, and now we’re open sourcing it.

Geolocating users is useful for many websites and applications. In some cases (such as ours), it is even required. To identify where a user is based, MaxMind’s GeoLite databases are a popular choice, as they contain geolocation info linked to IP addresses. They are often accessed using the geolocation library ‘GeoIP-lite’, but this library has a lot of overhead. For use cases where you need to geolocate only one or a few IP addresses, it increases memory usage and startup time too much. As such, we built Fast-GeoIP: a faster, low-memory library for mapping IPs to geographical information.

Why we needed fast-GeoIP to geolocate users

At Onramper, we are building a widget that allows users from all around the world to buy cryptocurrencies from various ‘fiat gateways’. However, not all fiat gateways can provide services in all countries. Some jurisdictions must be excluded due to imposed sanctions, such as Iran and North Korea. Other jurisdictions require specific licenses at the federal or state level, such as the United States.

The Onramper widget needs to load fast. IP location is only needed for users who use the widget to buy cryptocurrency, so the IP geolocation process is instantiated upon user contact. Here’s where we ran into an issue with GeoIP-lite. Concretely, on startup GeoIP-lite reads the whole database from disk, parses it and puts it all in memory. This increases startup time by about 233 ms and the memory used by the process by around 110 MB, in exchange for any subsequent queries being resolved with sub-millisecond latencies (~0.02 ms).

This works if you have a long-running process that needs to geolocate a lot of IP addresses and you don’t care about the increases in memory usage or startup time. But if, for example, your use case requires geolocating only a single IP, these trade-offs don’t make much sense, as only a small part of the database is needed to answer that query. Clearly, we needed to figure out a way to look up the location of users without adding roughly 230 ms of lag to our widget.

Optimizing for low-volume IP address lookups

GeoIP-lite was too slow for us because it has to read the entire IP geolocation database, parse it, and load it into memory. While that is great for high-volume queries, we figured that a combination of indexing and filesystem optimization strategies would allow us to significantly improve both loading speed and memory usage.

The Fast-GeoIP geolocation library we built tries to provide a solution for these use-cases by separating the database into chunks and building an indexing tree around them so that IP lookups only have to read the parts of the database that are needed for the query at hand. With this in mind, the library uses a set of multi-level indexes that are constructed with the following process:

  1. Get all the IP networks in the database and sort them
  2. Split the sorted list into a set of chunks
  3. Take the first IP network in each chunk and build a sorted list out of them (called first index)
  4. Split the newly created list into another set of chunks and build another sorted list out of them by grouping the first IP in each chunk
  5. Store this last list in another file; this will be our root index.
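
The construction steps above can be sketched in a few lines. This is a minimal in-memory illustration, not the library's actual build script: the function names, the `[startIp, data]` entry shape and the chunk size of 2 are all assumptions chosen for readability, and each chunk that would be written to its own file is simply kept in an array here.

```javascript
// Hypothetical sketch: names, entry shape and chunk size are assumptions,
// not fast-geoip's real build code.

// Split a list into consecutive chunks of at most `size` items.
function chunk(list, size) {
  const chunks = [];
  for (let i = 0; i < list.length; i += size) {
    chunks.push(list.slice(i, i + size));
  }
  return chunks;
}

// Build a two-level index over networks given as [startIp, data] pairs,
// where startIp is the first address of the network as an integer.
function buildIndexes(networks, chunkSize) {
  // Step 1: sort the networks by starting address.
  const sorted = [...networks].sort((a, b) => a[0] - b[0]);
  // Step 2: split the sorted list into data chunks (one file each on disk).
  const dataChunks = chunk(sorted, chunkSize);
  // Step 3: the first index holds the first start address of every data chunk.
  const firstIndex = dataChunks.map((c) => c[0][0]);
  // Step 4: split the first index into chunks of its own.
  const indexChunks = chunk(firstIndex, chunkSize);
  // Step 5: the root index holds the first address of every index chunk.
  const rootIndex = indexChunks.map((c) => c[0]);
  return { rootIndex, indexChunks, dataChunks };
}

// Made-up networks with small integers standing in for IP addresses.
const networks = [
  [50, 'E'], [10, 'A'], [30, 'C'], [20, 'B'], [40, 'D'],
  [60, 'F'], [70, 'G'], [80, 'H'], [90, 'I'],
];
const built = buildIndexes(networks, 2);
console.log(built.rootIndex); // [ 10, 50, 90 ]
```

A real chunk size would be far larger than 2; the tiny value just makes every level visible in the output.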

After this process is done, it’s possible to look for an IP in the database by taking the following steps:

  1. Load the root index into memory and perform a binary search on it
  2. Load the index file associated with the IP that was obtained in the previous step
  3. Repeat the process on that index file (binary search, then load the matching file) to get a data file (a portion of the original IP list)
  4. Apply binary search on that file to find the details associated with the IP net that contains the queried IP
  5. Obtain extra location data from the locations.json file by using a pointer provided in the previous step
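
To make the lookup steps concrete, here is a minimal sketch that walks the same path over in-memory arrays. Everything in it is hypothetical: the `store` layout, the helper names and the sample networks are invented for illustration, and each array stands in for a file that the library would read from disk.

```javascript
// Hypothetical in-memory sketch; names, layout and data are made up.

// Convert a dotted IPv4 string to an integer (plain arithmetic avoids
// the signed 32-bit results that bitwise shifts would give).
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Binary search: position of the last item whose key is <= target, or -1.
function lastAtOrBelow(items, target, key = (x) => x) {
  let lo = 0;
  let hi = items.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (key(items[mid]) <= target) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

const net = (start, end, location) =>
  ({ start: ipToInt(start), end: ipToInt(end), location });

// Each array below stands in for one file on disk.
const store = {
  chunkSize: 2,
  rootIndex: [ipToInt('10.0.0.0'), ipToInt('192.168.0.0')],
  indexChunks: [
    [ipToInt('10.0.0.0'), ipToInt('100.0.0.0')],
    [ipToInt('192.168.0.0')],
  ],
  dataChunks: [
    [net('10.0.0.0', '10.255.255.255', 0), net('50.0.0.0', '50.0.0.255', 1)],
    [net('100.0.0.0', '100.0.255.255', 1), net('150.0.0.0', '150.0.0.255', 0)],
    [net('192.168.0.0', '192.168.255.255', 2)],
  ],
  locations: [{ city: 'Alpha' }, { city: 'Beta' }, { city: 'Gamma' }],
};

function lookup(ip, db) {
  const target = ipToInt(ip);
  // Step 1: binary search on the root index.
  const rootPos = lastAtOrBelow(db.rootIndex, target);
  if (rootPos === -1) return null; // below the first known network
  // Steps 2-3: take the matching index chunk and search it to pick a data chunk.
  const indexChunk = db.indexChunks[rootPos];
  const indexPos = lastAtOrBelow(indexChunk, target);
  const dataChunk = db.dataChunks[rootPos * db.chunkSize + indexPos];
  // Step 4: binary search inside the data chunk for the containing network.
  const entry = dataChunk[lastAtOrBelow(dataChunk, target, (n) => n.start)];
  if (target > entry.end) return null; // falls in a gap between networks
  // Step 5: follow the pointer into the locations table.
  return { range: [entry.start, entry.end], ...db.locations[entry.location] };
}

console.log(lookup('100.0.3.7', store).city); // Beta
console.log(lookup('60.0.0.1', store)); // null
```

Only three small arrays are touched per query, which is the property the chunked layout is designed to provide.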

This algorithm runs in O(log n) time. In practice, it is much more efficient than other O(log n) approaches, such as a simple binary search over the whole database, because it localizes the information being searched. It gets a performance boost from cache hits while minimizing the amount of data that has to be read from disk.
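
A rough back-of-the-envelope model shows why the chunked layout helps even though the number of comparisons stays about the same. The sketch below assumes three levels (root index, one index chunk, one data chunk) of roughly equal size, and uses an illustrative entry count of one million; neither figure is taken from the actual database.

```javascript
// Rough cost model; the three equally sized levels are an assumption.
function lookupCost(n) {
  const chunkSize = Math.round(Math.cbrt(n)); // ~n^(1/3) entries per chunk
  return {
    chunkSize,
    // Entries read when the whole database is loaded before searching.
    loadEverythingEntriesRead: n,
    // Root index + one index chunk + one data chunk.
    chunkedEntriesRead: 3 * chunkSize,
    // Comparisons for a binary search over the full sorted list.
    flatComparisons: Math.ceil(Math.log2(n)),
    // Comparisons for one binary search per level.
    chunkedComparisons: 3 * Math.ceil(Math.log2(chunkSize)),
  };
}

console.log(lookupCost(1000000));
```

Under these assumptions, the comparison counts are nearly identical (20 versus 21), while the number of entries that must be read drops from 1,000,000 to 300.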

As some of you might know, optimization is a rabbit hole. Once you have gone down it, it is difficult to stop chasing the best possible speed. In this blog post, however, we will spare you the details of how we optimized file and index size, as well as (block) storage and retrieval.

The result: Fast-GeoIP is a fast geolocation library

As a result of the query method described above, we have made a geolocation library that has a much lower memory overhead while being faster for low-volume IP lookups. The first query takes around 9 ms and subsequent ones that hit the disk cache take around 0.7 ms, while memory consumption is kept at around 0.7 MB.

[Figure: speed of fast-geoip, with the number of sequential calls on the x-axis and time on the y-axis]

[Figure: memory consumption of fast-geoip, which is more than 100x lower than that of geoip-lite]

[Figure: file sizes of fast-geoip compared to geoip-lite, showing a more than 3x reduction]

As shown, fast-GeoIP is faster, with smaller file sizes and more than 100x less memory consumption for low volume IP lookups.

Open sourcing the Fast-GeoIP geolocation library

At Onramper, we believe strongly in the power of open source development. Our company exists to make life easier for developers who want their users to be able to buy cryptocurrency. If, in doing so, we happen to build code that has uses beyond what we are building ourselves, we are happy to open source it.

And, without further ado, here’s a code snippet of library usage:

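const geoip = require('fast-geoip');

// lookup() is awaited, so the call has to live inside an async function
// (top-level await is not available in a CommonJS script).
(async () => {
  const ip = "207.97.227.239";
  const geo = await geoip.lookup(ip);

  console.log(geo);
  // Prints:
  // {
  //   range: [ 3479298048, 3479300095 ],
  //   country: 'US',
  //   region: 'TX',
  //   eu: '0',
  //   timezone: 'America/Chicago',
  //   city: 'San Antonio',
  //   ll: [ 29.4969, -98.4032 ],
  //   metro: 641,
  //   area: 1000
  // }
})();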

This article is a slimmed-down version of the README written for fast-geoip’s GitHub page. You can find the full version of the article, as well as tests, benchmarks, and the source code, there.