Package proxies

Package stores

All package repositories are technically similar and unfancy. Technical boredom is a guarantee of stability.

On a website, packages are indexed per release, per version, per architecture (and sometimes per group, like “security” or “backport”).
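For example, Debian's layout exposes the release, group (component), and architecture directly in the index path (URL given as an illustration):

```text
dists/<release>/<component>/binary-<arch>/Packages.gz
e.g. http://deb.debian.org/debian/dists/bookworm/main/binary-amd64/Packages.gz
```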

Lots of technical details are explained in the previous blog post (sorry for my French).

Lake Geneva seen from Thonon-les-Bains

Faster, closer, and more efficient

If you manage a large fleet of servers or a busy continuous integration service, you have to cache package repositories (to avoid being banned, throttled, or earning bad karma).

Package integrity is handled independently of the distribution channel, with signatures. At least, it should be.

The distribution channel (HTTP, most of the time) is therefore not a security problem.

Old-school providers sign the index of the packages (including their hashes); clients verify the signature with a trusted public key.
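The principle can be sketched locally with plain shell (file names are made up for the demo):

```shell
# Integrity is a property of the signed index, not of the transport.
mkdir -p /tmp/repo-demo && cd /tmp/repo-demo
echo "fake package content" > hello.deb
sha256sum hello.deb > Packages   # the index lists each package's hash
# (the provider then signs "Packages" with its private key; clients verify
#  the signature with the trusted public key shipped in the distribution)
sha256sum -c Packages            # a client re-checks the hash after download
# → hello.deb: OK
```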

The (almost) universal signing tool for open-source code is Sigstore.

With signatures, package integrity is guaranteed, so caching or mirroring is safe.

Repositories can be mirrored, but beware of the synchronization periodicity.

Some package providers handle their own mirrors, with explicit or implicit GeoDNS and cache proxies. They can broadcast updates to minimize mirror lag.

Bring your own cache

Cache can be fancy, with a lot of features and complexity, or crude, just a plain old HTTP cache proxy.

Fancy caches

Here is a short selection of specialized proxies:

Some specialized proxies handle different kinds of packages, like Pulp.

apt-proxy (which also handles yum and apk) is designed to cope with flaky Internet connections and to find the best mirror behind it.

Old school caches

Squid is boring; let's use Nginx's cache.

A proxy cache for your repositories

Dinosaurs remember the time when a simple HTTP_PROXY environment variable plugged a proxy in front of the target site.

Such a proxy only works with plain HTTP. HTTPS encrypts the connection end to end, so it can't be cached.

Nowadays, all repositories use HTTPS by default (kudos to HTTPS Everywhere). The proxy must act as a "man in the middle" and intercept the queries.

Each kind of package manager must be configured specifically.

Setting a proxy for all package families can be done with a config file. It’s explicit, but not universal.

Developers and CI can use different private caches.

Some package managers can be configured with a few environment variables; for the others, a mix of configuration files and environment variables does the trick.
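A minimal sketch, assuming a hypothetical cache reachable at 192.168.1.35:8082 (adapt to your SERVER_IP); pip and npm honor their environment variables, apt needs a file:

```shell
# Hypothetical cache endpoint; replace with your own SERVER_IP
CACHE=http://192.168.1.35:8082
export PIP_INDEX_URL="$CACHE/pypi"        # pip reads PIP_* environment variables
export npm_config_registry="$CACHE/npm"   # npm reads npm_config_* variables
# apt has no environment knob; it needs a file (run as root):
# echo "Acquire::http::Proxy \"$CACHE/\";" > /etc/apt/apt.conf.d/01proxy
```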

Demo time

One cache to rule them all.

Setting up a private DNS and TLS is boring; life is short. The easiest and laziest way is to mix local and containerized development with a bare IP and a containerized cache server.

Please don’t do that in a real production environment.

Caching for a few package families is explained later in this blog post.

These examples use Docker to build images.

The adaptation for local development is trivial.

Docker has the ONBUILD instruction in its Dockerfile, very useful for setting variables in the build process.

The easiest way to build an image that can use the cache is to build it atop the main language image.

The layers of the cake are:

  • base image
  • cachable image
  • project image

Nginx cache

The official Nginx image doesn’t have the subs-filter module.

Let’s build a new image with it.

docker build \
  --build-arg ENABLED_MODULES="subs-filter" \
  -t nginx-subs \
  https://raw.githubusercontent.com/nginx/docker-nginx/refs/heads/master/mainline/alpine/Dockerfile

Each cached package family uses path-based routing; when that’s impossible, hostname-based routing is used (the client is configured to use the cache as a proxy).

Nginx configuration is minimalistic, tune it if you wish.

Pick your favorite resolver. I use 193.110.81.0 (dns0.eu); 8.8.8.8 (Google) is “déjà vu”.

Cache size needs your attention: set a sensible value, neither too small nor too huge.

One server, one port, no hostname.

worker_rlimit_nofile 8192;

events {
    worker_connections 4096;
}

http {
    log_format proxy '$remote_addr '
                        '"$request" $status $bytes_sent bytes'
                        ' -> "$upstream_addr" "$http_location" '
                        ' "$http_user_agent" ';

    error_log /dev/stdout info;
    access_log /dev/stdout proxy;

    include /etc/nginx/mime.types;

    index index.html;
    resolver 193.110.81.0; # dns0.eu
    default_type application/octet-stream;

    tcp_nopush on;
    server_names_hash_bucket_size 128;

    proxy_cache_path /data/cache keys_zone=fat_cache:10m max_size=1g inactive=60m use_temp_path=off;

    server {
        listen 80;
    }
}

The server section is empty; location blocks will be added later.

Pick a private network interface, and its IP, for publishing the cache service. The cache server must be reachable from the containers and from your workstation.

The IP is stored as SERVER_IP:

SERVER_IP=$(ip -f inet addr show docker0 | grep "inet " | sed -E "s#.*inet (.*)/.*#\1#g")

On macOS, it’s dirtier: docker0 lives inside the VM, so the laptop’s IP is used instead (check your firewall settings first).

SERVER_IP=$(ifconfig -v en0 | grep "inet " | cut -w -f 3)

You can now run the cache server (with its SERVER_IP).

mkdir -p data/cache
docker run --rm \
    -v `pwd`/data/cache:/data/cache \
    -v `pwd`/nginx.conf:/etc/nginx/nginx.conf:ro \
    -p ${SERVER_IP}:8082:80 \
    nginx-subs

On Linux, you should run it as a user with your UID; a cache owned by root is unorthodox.

Debian (and Ubuntu)

Debian uses URLs starting with /debian.

server {
    location ~^/debian(.*)$ {
        proxy_cache fat_cache;
        proxy_cache_background_update on;
        proxy_cache_lock on;
        proxy_pass http://deb.debian.org/debian$1;
    }
}

Cache image:

FROM debian:bookworm-slim

ONBUILD ARG APT_MIRROR=""
ONBUILD RUN if [ -z "$APT_MIRROR" ] ; \
    then \
    echo 'No mirror'; \
    else echo "Acquire::http::Proxy \"$APT_MIRROR\";" \
    > /etc/apt/apt.conf.d/cache.conf; \
fi

Build the cache image:

docker build -f Dockerfile.debian-mirror -t deb-mirror .

Debian demo image:

FROM deb-mirror

RUN apt-get update \
    && apt-get install -y --no-install-suggests --no-install-recommends \
        cowsay \
    && rm -rf /var/lib/apt/lists/*

CMD ["cowsay", "Through the Looking-Glass"]

Build it, with a specific cache server:

docker build \
    -f Dockerfile.debian \
    -t debian-demo \
    --build-arg APT_MIRROR=http://192.168.1.35:8082/ \
    .

You should see a flow of package URLs in the Nginx cache terminal.

Run it:

docker run --rm debian-demo

The Ubuntu variant needs only a few modifications.

Ubuntu doesn’t use prefixed URLs; the hostname must be used.

server {
    server_name ~^(.*)\.ubuntu.com$;
    listen 80;

    location / {
        proxy_cache fat_cache;
        proxy_cache_background_update on;
        proxy_cache_lock on;
        proxy_pass https://$1.ubuntu.com;
    }
}

Alpine

Nginx conf:

location /alpine/ {
    proxy_cache fat_cache;
    proxy_cache_background_update on;
    proxy_cache_lock on;
    proxy_pass https://dl-cdn.alpinelinux.org/alpine/;
}

Cache image:

FROM alpine:latest

ONBUILD ARG HTTP_PROXY=""
ONBUILD RUN if [ -z "$HTTP_PROXY" ] ; \
    then \
        echo 'No mirror'; \
    else \
        sed -i 's/https/http/g' \
        /etc/apk/repositories; \
    fi

There is a trick here: the HTTP_PROXY variable is used directly by apk, so the repositories only need to be switched from https to http.

Build cache image:

docker build \
    -f Dockerfile.alpine-with-cache \
    -t alpine-with-cache \
    .

Demo image:

FROM alpine-with-cache

RUN apk add --no-cache figlet

CMD ["figlet", "Carpe diem"]

Build demo image:

docker build \
    -f Dockerfile.alpine \
    -t alpine-demo \
    --build-arg HTTP_PROXY=$(SERVER_IP):8082 \
    .

Run demo:

docker run --rm alpine-demo

Pypi

Nginx conf for pypi:

location ~ ^/pypi/(.*)$ {
    proxy_cache fat_cache;
    proxy_pass https://pypi.org/simple/$1;
    proxy_cache_background_update on;
    proxy_cache_lock on;
    proxy_ssl_protocols TLSv1.2;
    proxy_ssl_session_reuse off;
    proxy_ssl_server_name on;
    proxy_ssl_name pypi.org;
}

pip can use an index without https, but the host needs to be trusted.
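For reference, the pip.conf written by the image below looks like this (with a hypothetical cache at 192.168.1.35:8082):

```ini
[global]
index-url = http://192.168.1.35:8082/pypi
trusted-host = 192.168.1.35
```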

Cache image:

FROM python:3.13-slim

ONBUILD ARG PYPI_CACHE=""
ONBUILD RUN if [ -z "$PYPI_CACHE" ] ; \
    then \
        echo 'No mirror'; \
    else \
        printf '[global]\nindex-url = http://%s/pypi\ntrusted-host = %s\n' \
        "${PYPI_CACHE}" "$(echo ${PYPI_CACHE} | cut -d : -f 1)" \
        > /etc/pip.conf ;\
    fi

Build cache image:

docker build \
    -f Dockerfile.python-with-cache \
    -t python-with-cache \
    .

Demo image:

FROM python-with-cache

RUN python3 -m venv /demo \
    && /demo/bin/pip install cowsay

CMD ["/demo/bin/cowsay", \
    "-c", "tux", \
    "-t", "\"Welcome to the thunder dome\"" ]

Build example:

docker build \
    -f Dockerfile.python \
    -t python-demo \
    --build-arg PYPI_CACHE=$(SERVER_IP):8082 \
    .

Run example:

docker run --rm python-demo

npm

Nginx conf:

server {
    server_name registry.npmjs.org;
    listen 80;

    location / {
        proxy_cache fat_cache;
        proxy_cache_background_update on;
        proxy_cache_lock on;
        proxy_pass https://registry.npmjs.org/;
        proxy_ssl_protocols TLSv1.2;
        proxy_ssl_session_reuse off;
        proxy_ssl_server_name on;
        proxy_ssl_name registry.npmjs.org;
    }
}

Cache image:

FROM node:24-alpine

ONBUILD ARG NPM_CACHE=""
ONBUILD RUN if [ -z "$NPM_CACHE" ] ; \
    then \
        echo 'No mirror'; \
    else \
        npm set proxy "http://${NPM_CACHE}/" --location global && \
        npm set https-proxy "http://${NPM_CACHE}/" --location global &&\
        npm set registry http://registry.npmjs.org/; \
    fi

Build cache image:

docker build \
    -f Dockerfile.npm-with-cache \
    -t node-with-cache \
    .

Demo image:

FROM node-with-cache

# npm should be somewhere, not /
RUN mkdir -p /opt/demo \
    && cd /opt/demo \
    && npm --verbose install cowsay

CMD ["/opt/demo/node_modules/.bin/cowsay", \
    "-e", "xx", \
    "\"With a little help from my friends\"" ]

Build demo image:

docker build \
    -f Dockerfile.node \
    -t node-demo \
    --build-arg NPM_CACHE=$(SERVER_IP):8082 \
    .

Run demo:

docker run --rm node-demo

Docker

Docker daemon can use a mirror.

Docker Hub now enforces pull quotas; if you don’t want to be blocked, use a mirror.

A private registry is mandatory to deploy your images.

All registries can be used as a proxy cache for another public or private registry.

Harbor, graduated by CNCF, can, like all registries, be a proxy cache.

Using Nginx for caching Docker Hub should not be done in production, but it’s fun to cache everything with one server.

Docker daemon configuration.

Add this line in the daemon.json config file:

"registry-mirrors": ["http://_server_ip_:8082/docker/"]
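Put together, a minimal, complete daemon.json could look like this (assuming a cache at 192.168.1.35; merge with any keys you already have):

```json
{
  "registry-mirrors": ["http://192.168.1.35:8082/docker/"]
}
```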

On Linux the path is /etc/docker/daemon.json, but, before breaking something, RTFM the Docker configuration file.

With Docker Desktop (OSX or Windows), the configuration is in the tab “Docker Engine.”

Docker can use a containerized mirror, but pull and build the image BEFORE using it.

Nginx conf:

location /docker/ {
    rewrite ^/docker/(.*)$ /$1 break;

    proxy_cache fat_cache;
    proxy_cache_background_update on;
    proxy_cache_lock on;
    proxy_pass https://registry.hub.docker.com/;
}

GitHub Demo Project

Copy-pasting is boring; the package-cache repository contains all the files cited in this post, with a useful Makefile.

Split your terminal (with tmux maybe), and build and run the cache server.

make cache

Run all demos:

make demo

For the Docker cache demo, you have to tweak your Docker daemon configuration and pull a fresh image, not one already cached in your local registry.

Cache all the things

CI can share a private local cache between steps.

Useful, but this cache is not shared across projects (cache poisoning is very dangerous).

The cache proxy is safe: users can’t write data (and poison it), and the cache is shared between all project builds.

Have fun with the Nginx demo, but sooner or later you will use specialized caches.
