Parsing Millions of URLs per Second

| input string | `https://7-Eleven.com/Home/../P/Montréal` |
|---|---|
| PHP | unchanged |
| Python | unchanged |
| WHATWG URL | `https://xn--7eleven-506c.com/Home/P/Montr%C3%A9al` |
| curl 7.87 | `https://7-Eleven.com/P/Montr%C3%A9al` |
| Go runtime (`net/url`) | `https://7-Eleven.com/Home/../P/Montr%C3%A9al` |

![center](simdjsonlogo.png)

---

- Long URLs: `http://nodejs.org:89/docs/latest/api/foo/bar/qua/13949281/0f28b//5d49/b3020/url.html#test?payload1=true&payload2=false&test=1&benchmark=3&foo=38.38.011.293&bar=1234834910480&test=19299&3992&key=f5c65e1e98fe07e648249ad41e1cfdb0`

Most browsers and JavaScript runtimes follow the WHATWG URL standard; curl and many runtime libraries follow RFC 3986.

PHP (`parse_url`): naive processing (no validation, no normalization)

# How long are URLs?

![w:800 h:500](input_size.png)

https://github.com/ada-url/url-various-datasets/tree/main/top100

---

# How long does it take to parse a URL on average?

curl 7.81.0 (RFC 3986), written in C:

- 18 000 instructions/URL
- 7 100 cycles/URL
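To relate these counts to throughput, here is a back-of-the-envelope conversion, assuming a 3 GHz processor and a single core (the clock frequency is an assumption, not a measured figure):

$$
\frac{18\,000\ \text{instructions/URL}}{7\,100\ \text{cycles/URL}} \approx 2.5\ \text{instructions/cycle},
\qquad
\frac{3 \times 10^{9}\ \text{cycles/s}}{7\,100\ \text{cycles/URL}} \approx 4.2 \times 10^{5}\ \text{URLs/s}.
$$

At that assumed clock rate, curl parses fewer than half a million URLs per second per core.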

Compilers may do it for you, but not always.

---

# URL parsing no longer a bottleneck in Node 20

| Node version | requests/second (simple) | requests/second (href) | gap |
|--------------|--------------------------|------------------------|-----|
| 20.1         | 61k                      | 59k                    | 3%  |


Software performance

State of Node.js Performance 2023

Structure of a URL

Examples

WHATWG URL

Assumptions

HTTP Benchmark

URL parsing was a bottleneck in Node 18.15

Wrote a C++ library (called Ada)

Trick 1: perfect hashing

Trick 2: use memoization (tables)

Trick 3: use vectorization

Efficient C++/JavaScript bridge

JavaScript Benchmark

JavaScript Results

The Ada C++ library is safe and efficient

Ada is available in the language of your choice

Links