Any post or community can be accessed through a theoretically limitless number of instances, which also means a theoretically limitless number of URLs.

Will this keep Lemmy from ever reaching the mainstream? If I type any topic into Google, I get a Reddit thread about it. Can something like that ever happen for Lemmy?

  • marsara9@lemmy.world
    English · 1 upvote · edited · 1 year ago

    I’m using the public API to grab every post and comment, then I reduce each one to just its unique words. When you search, it looks in my database for any post or comment that contains the words you typed in, and sorts the results by number of upvotes.
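The indexing approach described above can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not the author's Rust and Postgres implementation. The names `unique_words`, `Entry`, and `SearchIndex` are hypothetical, and it assumes that a match requires every query word to be present, which the comment does not spell out.

```python
import re
from dataclasses import dataclass

# Hypothetical tokenizer: runs of lowercase letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def unique_words(text):
    """Lowercase the text and return the set of distinct words in it."""
    return set(WORD_RE.findall(text.lower()))


@dataclass
class Entry:
    url: str
    upvotes: int
    words: set


class SearchIndex:
    """Keeps only the unique words of each post or comment."""

    def __init__(self):
        self.entries = []

    def add(self, url, text, upvotes):
        # The original text is dropped; only its unique words are stored.
        self.entries.append(Entry(url, upvotes, unique_words(text)))

    def search(self, query):
        terms = unique_words(query)
        if not terms:
            return []
        # ASSUMPTION: an entry matches only if it contains every query word.
        hits = [e for e in self.entries if terms <= e.words]
        # Rank by upvotes, highest first.
        hits.sort(key=lambda e: e.upvotes, reverse=True)
        return [e.url for e in hits]
```

Storing a set per entry keeps the index small, at the cost of losing word order and frequency, so phrase search and relevance scoring are not possible with this layout.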

    Right now it only crawls the specific instance you point it at. As long as that instance is federated, it /should/ get everything. Eventually I plan to use that instance’s list of federated instances to scan everything, which would also lighten the load on any one instance.
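A single-instance crawl like the one described could look something like the sketch below. It is an assumption-heavy illustration in Python rather than the author's code. The endpoint path `/api/v3/post/list`, its query parameters, and the response field names are taken from Lemmy's v3 HTTP API as I understand it and should be checked against the target instance's version. `crawl_posts` and `fetch_json` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """Fetch a URL and decode its JSON body (stdlib only, no retries)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def crawl_posts(instance, fetch=fetch_json, limit=50, max_pages=1000):
    """Yield (url, text, upvotes) for each post the instance lists.

    ASSUMPTION: endpoint, parameters, and field names follow Lemmy's
    v3 HTTP API; they may differ between server versions.
    """
    for page in range(1, max_pages + 1):
        query = urllib.parse.urlencode(
            {"type_": "All", "sort": "New", "page": page, "limit": limit}
        )
        data = fetch(f"https://{instance}/api/v3/post/list?{query}")
        posts = data.get("posts", [])
        if not posts:
            return  # an empty page is treated as the end of the listing
        for view in posts:
            post = view["post"]
            # The body is optional, so fall back to an empty string.
            text = f"{post.get('name', '')} {post.get('body') or ''}".strip()
            yield post["ap_id"], text, view["counts"]["upvotes"]
```

Passing the fetcher in as a parameter makes it easy to add rate limiting later, or to swap in a stub when testing without touching a live instance.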

    Edit: I thought about tapping into the existing database, but it’s geared towards serving content, not searching it. The database I’m building is searchable, but I drop so much of the original data that it’s worthless for serving content.

    • ATwig@kbin.social
      1 upvote · 1 year ago

      Now I’m curious: what’s your stack? Are you using Elasticsearch? Have you considered using something like Azure Cognitive Search (hosted Elastic with AI/ML functions to add some NLP to your data/queries)? Bing uses it as part of their backend.

      • marsara9@lemmy.world
        3 upvotes · 1 year ago

        HTML + JavaScript frontend. Rust backend with a postgres database.

        It’ll be open sourced once I can get the MVP ready.