What's the most cost-effective way to serve lots of large files

omegastick@lemmy.ml · 1 year ago

What's the most cost-effective way to serve lots of large files

Oliver Lowe@lemmy.sdf.org · 1 year ago

If the files are not going to be changing much, then what is typically done is to use a CDN service (e.g. Cloudflare, Akamai, Fastly). The idea is you have an “origin” which could be any old server which serves your files over HTTP (even a VPS running nginx). The CDN is configured to proxy requests to the origin, building up a cache of the files it serves. The CDN can serve files from cache on their own (very large) infrastructure. See also What is a CDN?

Oliver Lowe@lemmy.sdf.org · edit-2 1 year ago

So I got curious and wondered how HuggingFace hosts their files. It’s AWS CloudFront:

$ curl -LI https://huggingface.co/facebook/musicgen-large/resolve/main/state_dict.bin

HTTP/2 302 
content-type: text/plain; charset=utf-8
content-length: 1198
location: https://cdn-lfs.huggingface.co/repos/81/...
date: Fri, 23 Jun 2023 07:09:39 GMT
x-powered-by: huggingface-moon
x-request-id: Root=1-64954533-1eb79eed4ea500203f6435cb
access-control-allow-origin: https://huggingface.co
vary: Origin, Accept
access-control-expose-headers: X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,ETag,Link,Accept-Ranges,Content-Range
x-repo-commit: c19300a6b2b62d29b345ae9eb7b163278e65238a
accept-ranges: bytes
x-linked-size: 6514108999
x-linked-etag: "1f0cf17b5e65c5dd8ba71767371c377f174e4ce1db44bc4d6657825769f26ffd"
x-cache: Miss from cloudfront
via: 1.1 65c7d0c3355767ac8658c2122c8280b6.cloudfront.net (CloudFront)
x-amz-cf-pop: SYD1-C1
x-amz-cf-id: XAU8X4yneUeudylCi_9MeAYZmISCr8OHiBcgjAGcBQT-edrBF6wGCA==

HTTP/2 200 
content-type: application/octet-stream
content-length: 6514108999
date: Fri, 23 Jun 2023 07:06:20 GMT
last-modified: Thu, 08 Jun 2023 19:05:02 GMT
etag: "44ef1b51c0cc2200e29fed5cddbf8e27-408"
x-amz-storage-class: INTELLIGENT_TIERING
x-amz-server-side-encryption: AES256
x-amz-version-id: upVN9_QvGmQZMDWfVECqbytHgWjHTz4t
content-disposition: attachment; filename*=UTF-8''state_dict.bin; filename="state_dict.bin";
accept-ranges: bytes
server: AmazonS3
x-cache: Hit from cloudfront
via: 1.1 dff3fc94ddb54b32b708edf2668b23d2.cloudfront.net (CloudFront)
x-amz-cf-pop: SIN52-P1
x-amz-cf-id: JWTqKUDp8bVPD7Tt4DBySj8zbgT8G60sCA5BMC1cYF4vK9nx45f1Iw==
age: 200
vary: Origin