What would happen if instead of users swarming existing servers when a fediverse service was put in the spotlight, each user spun up their own micro-instance and tried to federate with existing servers?
There’s always the odd person who decides to host a personal fediverse service in their homelab for themselves, but would the fediverse work if that was actually the primary mode of interaction? Or would it fail in a similar way to now where the servers which receive the most federation requests need to scale up?
Presumably federation load is easier to scale than browser requests, since federation is an async process.
Possibly failure, because setup isn’t just a simple out-of-the-box plop. And I can’t see how pings from 5000 microservers would be better than 5000 users looking to register. But that’s more of a question than an informed opinion.
That Ansible playbook works great; it’s just a bash script away from regular-user DIY.
I’ve watched people who never used a computer install blockchain nodes and miners (including the networks). If someone wants to do it, they WILL figure it out.
Sure, I’m not saying they won’t; I’m saying there aren’t that many people who ‘want’ to go beyond the effort of clicking install.
Maybe I should clarify with “each user successfully spun up…”. I’m mostly curious whether 5000 microservers trying to federate is a more sustainable access pattern than 5000 users hitting the website.
Since federation is an async process, it can be optimized on both ends in a way that user browser requests cannot.
At the same time, federation would result in more bandwidth being used overall, because not every user wants to view every post in the frontend.
Sustainable in what sense?
It’s way more sustainable in the sense of “one website is not controlling the entirety of the experience of a given type of service for 5000 users”, for example. I think it’s important to talk about specific kinds of sustainability, and specific threats to it.
Things to consider (apart from bandwidth-related considerations):
- technical knowledge necessary to safely and securely run and maintain a service
- space, time, and resources (including financial) to do so
- ability, willingness, and energy to moderate a service (this is where Big Tech platforms are falling flat on their faces, for example, and where smaller fedi communities work pretty damn well)
But instance federation is an async process that happens constantly. A user on your instance may be a realtime load, but it’s only sporadic (on a per-user basis). Basically, me spinning up an instance is a constant burden on the network, but me browsing is just a temporary load on a single server.
My understanding is that the best situation is a good number of powerful machines running instances, with users distributed evenly among them.
You also have to account for another type of “ping” if a user lives in a cave 300 meters below sea level.
I don’t think so. Take a community hosted on beehaw.org, for example. It could have, say, 1000 subscribers from lemmy.ml, but it only needs to send content to lemmy.ml once as it comes in. All 1000 subscribers see the cached copy on lemmy.ml, and a message is only sent back to beehaw.org for comments, votes, etc. If everyone had their own instance, beehaw.org would have to send updates to each one instead of sending one update to one instance and having 1000 users see it. A good level to strive for is many small communities of a few thousand users each (1-5 thousand or so). That way no single server gets too massive, but federation requests aren’t overwhelming instances either.
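A rough way to see the difference: for each new post or comment, the hosting instance makes one delivery per distinct subscribing instance, not one per subscriber. This sketch assumes a simplified model where a subscriber is just a (username, home instance) pair; `deliveries_per_event` and the `*.example` domains are hypothetical, not Lemmy code.

```python
from typing import Iterable, Tuple

# Simplified model (an assumption, not Lemmy's schema): (username, home_instance).
Subscriber = Tuple[str, str]


def deliveries_per_event(subscribers: Iterable[Subscriber]) -> int:
    """Outbound deliveries the hosting instance makes for one new event.

    Content goes once to each subscribing instance; users on that instance
    then read the locally cached copy.
    """
    return len({instance for _user, instance in subscribers})


# 1000 subscribers who all live on lemmy.ml.
shared = [(f"user{i}", "lemmy.ml") for i in range(1000)]

# 1000 subscribers who each run a hypothetical single-user instance.
solo = [(f"user{i}", f"user{i}.example") for i in range(1000)]

print(deliveries_per_event(shared))  # 1
print(deliveries_per_event(solo))    # 1000
```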
So let’s say we want to scale up to several million users - what would that look like?
Well, if we wanted, say, 50 million users at 5000 users per instance, we would have 10,000 instances. If we wanted 1 billion users, we would have 200,000 instances.
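The arithmetic above, as a tiny helper. It assumes every instance holds the same number of users, which real instances don't, so treat the result as an order of magnitude; `instances_needed` is just an illustrative name.

```python
def instances_needed(total_users: int, users_per_instance: int) -> int:
    # Ceiling division: a partly filled instance still has to exist.
    return -(-total_users // users_per_instance)


print(instances_needed(50_000_000, 5_000))     # 10000
print(instances_needed(1_000_000_000, 5_000))  # 200000
```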
What you’re describing is no longer federation but full P2P. From a purely technical point of view, it may work, but the biggest problem will be abuse (spam, excessive resource use, illegal content). When a new instance shows up, how do you know if it’s a spammer or not? And if an instance is blocked by another instance, whose side should you be on?
It wouldn’t really be full P2P: I’d expect moderated communities to act as a funnel through which everyone interacts with each other. I wasn’t really considering the hypothetical micro-instances to be like a normal server, since even when federated, they’re unlikely to consume as much federation bandwidth as a large instance. Most people wouldn’t run a community, simply because they don’t want to moderate it.
Realistically, the abuse problems you mention can already happen if someone wants them to. It’s easier to make an account on an existing server with a fresh email, spam a bit, and get banned than it is to register a new domain ($) and federate before doing the same. I think social networks would have a lot less spam if every time you wanted to send an abusive message, you had to spend $10 to burn a domain name.
Most of the content would still live on larger servers, so you end up moderating in the same place. Not much difference between banning an abusive user on your instance and banning an abusive single-user instance.
The way ActivityPub works is that each community has a list of every server that has at least one subscriber to that community.
Every time someone does something in that community, the community sends all those servers a message that tells them what just happened.
So instead of the few hundred servers it might have to inform of your one upvote on a post, it would basically have to inform every user (every user’s server).
It would be bad, it’s not designed to do that.
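The “list of every server with at least one subscriber” could be modelled as a per-server subscriber count. This is a toy sketch, not Lemmy’s implementation: `Community`, `subscribe`, `unsubscribe`, and `announce` are hypothetical names, and a “message” is just a string.

```python
from typing import Dict, List


class Community:
    """Toy model of a community's delivery list (hypothetical, not Lemmy's code).

    Remote servers are stored with a count of how many of their users
    subscribe, so a server stays on the list while any subscriber remains.
    """

    def __init__(self) -> None:
        self.subscribers_per_server: Dict[str, int] = {}

    def subscribe(self, server: str) -> None:
        self.subscribers_per_server[server] = self.subscribers_per_server.get(server, 0) + 1

    def unsubscribe(self, server: str) -> None:
        count = self.subscribers_per_server.get(server, 0)
        if count <= 1:
            # Last subscriber from that server left: stop delivering to it.
            self.subscribers_per_server.pop(server, None)
        else:
            self.subscribers_per_server[server] = count - 1

    def announce(self, activity: str) -> List[str]:
        """One outgoing message per subscribed server for this activity."""
        return [f"{server} <- {activity}" for server in sorted(self.subscribers_per_server)]


community = Community()
for _ in range(300):
    community.subscribe("lemmy.ml")       # 300 users on one big server
community.subscribe("alice.example")      # one hypothetical single-user instance
print(len(community.announce("upvote")))  # 2 messages, not 301
```

If every subscriber ran their own instance, the dictionary would grow to one entry per user, and every upvote would fan out to all of them.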
So you’re saying that there’s a sweet spot between the number of servers being federated and the number of users per server. I wonder what the optimal network distribution would look like.
Not a great range, but I’m going to guess between 1,000 and 10,000 users per node.
This is usually the point where midrange servers can be used successfully and the operation is manageable by normal people. It also groups people enough that they aren’t spamming the network with more requests than necessary to sync with their friends.
I would be shocked if it worked well, seeing as it wasn’t designed for that.
Even if it did though, where would we be having this conversation? It would work more like a texting app than any kind of community.
I’m running my personal instance, I haven’t had any issue interacting.
AFAIK it would help spread the load, since my instance just asks for/receives each activity once from other instances and then aggregates everything locally.
So each time I access a post, I need to ask: How many upvotes does it have? How many comments? Which ones are new? How many upvotes does each of those comments have? Which ones are replies to others? Also, get me the pfp of each user.
If I change the sorting, of either the main feed or the comments on a post, I need to ask in what order they should be displayed.
All of these queries are done only against my own instance, with my instance’s DB.
In this case, beehaw.org just sends “Hey, this post got an upvote”, and my instance figures out how that affects the rest of the posts in my feed.
Also, right now lemmy.ml is struggling with all the new users; it takes a while to refresh the page and get any update, but on my instance I can keep scrolling and reading the data it has already received from lemmy.ml or any other instance.
I’m also running my own instance with very few users, and everything is really fast because I’m not on the same instance as all those users.
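The “receive once, aggregate locally” idea, sketched with a toy in-memory store rather than Lemmy’s real database schema. `Post`, `LocalStore`, `apply_activity`, and the activity dict shape are all hypothetical; the point is that incoming activities only bump local counters, and every re-sort is answered without contacting any other instance.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Post:
    post_id: str
    title: str
    score: int = 0
    comment_count: int = 0


@dataclass
class LocalStore:
    """Hypothetical local cache of posts received via federation."""
    posts: Dict[str, Post] = field(default_factory=dict)

    def apply_activity(self, activity: dict) -> None:
        # A remote instance says "this post got an upvote/comment";
        # we only update our local copy.
        post = self.posts.get(activity["post_id"])
        if post is None:
            return  # not a post we have cached
        if activity["type"] == "upvote":
            post.score += 1
        elif activity["type"] == "downvote":
            post.score -= 1
        elif activity["type"] == "comment":
            post.comment_count += 1

    def feed(self, sort: str = "top") -> List[Post]:
        # Re-sorting is a purely local query; no remote requests are made.
        keys: Dict[str, Callable[[Post], int]] = {
            "top": lambda p: p.score,
            "most_comments": lambda p: p.comment_count,
        }
        if sort not in keys:
            raise ValueError(f"unknown sort: {sort}")
        return sorted(self.posts.values(), key=keys[sort], reverse=True)


store = LocalStore()
store.posts["a"] = Post("a", "First")
store.posts["b"] = Post("b", "Second")
store.apply_activity({"type": "upvote", "post_id": "b"})
store.apply_activity({"type": "comment", "post_id": "a"})
store.apply_activity({"type": "comment", "post_id": "a"})
print([p.post_id for p in store.feed("top")])            # ['b', 'a']
print([p.post_id for p in store.feed("most_comments")])  # ['a', 'b']
```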
I think there is a tipping point somewhere.
I think the connection count is
n * (n - 1) / 2
(at least, that’s what it is for a full-mesh network), so 1000 servers would be handling ~500k connections in total, 999 per server.
That would be for 1000 users, one per server.
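The formula, evaluated for both layouts. This assumes a full mesh where every server federates with every other one; real instances only talk to servers they share subscriptions with, so these are upper bounds.

```python
def mesh_connections(servers: int) -> int:
    """Links in a full mesh where every server peers with every other one."""
    return servers * (servers - 1) // 2


# 1000 users, each running a single-user server.
print(mesh_connections(1000))  # 499500 in total, 999 peers per server

# The same 1000 users grouped 50 per server, i.e. 20 servers.
print(mesh_connections(20))    # 190 in total, 19 peers per server
```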
Those connections might be more lightweight, but there are significantly more of them (you might even run into OS issues with that many open connections).
If each server handled 50 users instead, those 1000 users would fit on 20 servers, and the mesh would have just 190 connections.
50 users should be a blip wrt server load, and 190 mesh connections is far more manageable.
At the same time, those graph connections don’t need to be persistent network connections. You could easily cycle through connected nodes and batch update events without issue, and in that case the primary constraint is bandwidth to the connected graph, not network connections.
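The batching idea could look roughly like this: pending events are grouped by destination server and flushed periodically as one batch per peer, with no persistent connection in between. This is a hypothetical sketch; ActivityPub itself delivers each activity as its own POST to an inbox, so `BatchingOutbox` and its batch format would be a custom extension, and `send` stands in for whatever transport you use.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class BatchingOutbox:
    """Hypothetical outbox that groups pending events by destination server."""

    def __init__(self, send: Callable[[str, List[str]], None]) -> None:
        self._send = send  # called once per server per flush
        self._pending: Dict[str, List[str]] = defaultdict(list)

    def enqueue(self, server: str, event: str) -> None:
        self._pending[server].append(event)

    def flush(self) -> int:
        """Send one batch per server with pending events; return batches sent."""
        pending, self._pending = self._pending, defaultdict(list)
        for server, events in pending.items():
            self._send(server, events)
        return len(pending)


sent = []
outbox = BatchingOutbox(lambda server, events: sent.append((server, len(events))))
for i in range(100):
    outbox.enqueue("lemmy.ml", f"vote {i}")
outbox.enqueue("beehaw.org", "comment 1")
print(outbox.flush())  # 2
print(sent)            # [('lemmy.ml', 100), ('beehaw.org', 1)]
```

A hundred votes become one request to lemmy.ml per flush interval, which is why bandwidth, not connection count, ends up being the limit.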
It’s a similar concept to email, so I would imagine there will always be big players who will have a reputation of trustworthiness/reliability.
The whole concept here seems to favor spinning up your own “cache” instance between you and the content you want (similar to how old email clients worked, downloading emails from the mail server and never live-fetching them), which is fabulous for distributing the load. Discovery takes a back seat when doing that, but it’s still pretty doable.
I think the main difference between fediverse and email WRT cache instances is that if you create a cache instance for email, you’re only caching your personal emails. If you create a cache instance for a lemmy community, you’re caching every event on the community.
My intuition says there’s probably a breakpoint in community size where the cost of federating all events to the users who subscribe to them becomes greater than the cost of individually serving API requests to them on demand. Primarily because you’ll be caching a far greater amount of content than you actually consume, unlike with email.
Edit: That said, scaling out async work queues is a heck of a lot easier than scaling out web servers and databases. That fact alone might skew the breakpoint far enough that only communities with millions of subscribers see a flip in the cost equation…
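The breakpoint intuition can be made concrete with a deliberately crude toy model. Everything here is an assumption for illustration: every subscriber runs a single-user instance, each subscriber generates `events_per_subscriber` federated events (posts, comments, votes) per day, each reads `reads_per_subscriber` items per day, and the two `cost_*` weights stand in for how much cheaper an async delivery is than a synchronous API request. None of the numbers are measurements.

```python
def push_cost(subscribers: int, events_per_subscriber: float, cost_per_delivery: float) -> float:
    """Cost of federating every event to every subscriber's own instance."""
    events = subscribers * events_per_subscriber
    return events * subscribers * cost_per_delivery


def pull_cost(subscribers: int, reads_per_subscriber: float, cost_per_request: float) -> float:
    """Cost of serving only what each subscriber actually reads, on demand."""
    return subscribers * reads_per_subscriber * cost_per_request


def breakpoint_subscribers(events_per_subscriber: float, reads_per_subscriber: float,
                           cost_per_delivery: float, cost_per_request: float) -> float:
    """Community size where push_cost == pull_cost.

    e * S**2 * c_push == S * R * c_pull  =>  S == R * c_pull / (e * c_push)
    """
    return (reads_per_subscriber * cost_per_request) / (events_per_subscriber * cost_per_delivery)


# Made-up parameters: 2 events and 200 reads per subscriber per day,
# with an async delivery assumed to cost a tenth of an API request.
print(breakpoint_subscribers(events_per_subscriber=2, reads_per_subscriber=200,
                             cost_per_delivery=1, cost_per_request=10))  # 1000.0

# Push is cheaper below the breakpoint and more expensive above it.
for s in (100, 1_000, 10_000):
    print(s, push_cost(s, 2, 1), pull_cost(s, 200, 10))
```

Push cost grows with the square of community size while pull cost grows linearly, which is where the breakpoint comes from; making async delivery cheaper (a smaller `cost_per_delivery`) pushes it to larger communities, matching the edit above.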
Maybe I’m wrong (I’ve only been on Lemmy since yesterday morning), but if you host your own instance, you’re only caching the communities you’re interested in. If you never cared about a community or interacted with an instance, that data will never reach your instance. Federated doesn’t imply full redundancy.
This is correct, and it’s also worth noting that the remote comments are not “backfilled”, so you don’t get to read all the old stuff
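A toy model of the behaviour described in these last two comments, assuming a simplified instance that only stores what arrives after it starts following a community. `PersonalInstance`, `follow`, and `receive` are hypothetical names; in practice a remote server wouldn’t send activities for unfollowed communities at all, so the `False` branch just makes that filtering explicit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PersonalInstance:
    """Toy model of what a small instance stores (hypothetical, not Lemmy's code)."""
    followed: Set[str] = field(default_factory=set)
    cache: Dict[str, List[str]] = field(default_factory=dict)

    def follow(self, community: str) -> None:
        # Following starts the flow of new activities; past posts are not backfilled.
        self.followed.add(community)
        self.cache.setdefault(community, [])

    def receive(self, community: str, post: str) -> bool:
        """Store an incoming post only if we follow its community."""
        if community not in self.followed:
            return False
        self.cache[community].append(post)
        return True


inst = PersonalInstance()
inst.receive("selfhosted", "posted before we followed")     # returns False
inst.follow("selfhosted")
inst.receive("selfhosted", "posted after we followed")      # returns True
inst.receive("news", "from a community we never followed")  # returns False
print(inst.cache)  # {'selfhosted': ['posted after we followed']}
```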