My small non-profit team produces a lot of content: blog posts, presentations, graphics, and MP3 and MP4 files. We are looking for a tool that can classify this content and let us search it to find relevant information by topic. The goal is to get the most out of the IP we’ve already developed. Are any of you using #foss tools to do this? Bonus points if it supports natural-language querying or generative AI.

  • The Hobbyist
    1 year ago

    I suppose you can split your content into three categories:

    • text
    • audio
    • image

    For text, you can use LangChain, which lets you generate embeddings from text (read more here: https://js.langchain.com/docs/modules/data_connection/text_embedding/).
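    To make the idea of embedding-based search concrete, here is a minimal sketch in Python. It is not LangChain's API: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the file names and descriptions are made up for illustration. The point is only the general flow: turn each item into a vector, then rank items by cosine similarity to the query.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, index: Dict[str, Counter], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k items ranked by similarity to the query."""
    q = embed(query)
    scored = [(name, cosine(q, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical content items; for audio or video you would index a transcript.
docs = {
    "blog-fundraising.md": "Tips for fundraising and donor outreach for small non-profits",
    "talk-volunteers.pptx": "Slides on recruiting and training volunteers",
    "podcast-ep3.mp3": "Transcript: interview about grant writing and fundraising strategy",
}

# Build the index once, then query it as often as needed.
index = {name: embed(text) for name, text in docs.items()}
results = search("fundraising strategy", index)

for name, score in results:
    print(f"{score:.3f}  {name}")
```

    With a real embedding model in place of `embed`, the same ranking step would also match related wording rather than only exact words, which is what makes natural-language queries workable.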

    For images, you can use CLIP, an open-source model from OpenAI. You can read more about it here: https://github.com/openai/CLIP

    For audio, I don’t know of anything off the top of my head, but you are likely to find something similar to the above, possibly even open source.

  • Skedaddle@beehaw.org
    1 year ago

    An internal wiki like DokuWiki or Wiki.js might suit your needs. Although they won’t automatically categorize or classify anything, a wiki could be a useful searchable repository (especially if you can train your team to standardize descriptions, tags, categories, etc.).
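    As a rough illustration of why standardized tags matter, here is a small Python sketch. It is not tied to either wiki; the page titles, tags, and the `normalize_tag` convention (lowercase, hyphen-separated) are assumptions made up for the example. Once tags are normalized, looking up everything on a topic becomes a simple index lookup.

```python
from collections import defaultdict
from typing import Dict, List, Set


def normalize_tag(tag: str) -> str:
    """Apply one hypothetical house convention: lowercase, hyphen-separated."""
    return "-".join(tag.strip().lower().split())


def build_tag_index(pages: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalized tag to the set of page titles that carry it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for title, tags in pages.items():
        for tag in tags:
            index[normalize_tag(tag)].add(title)
    return index


# Hypothetical pages with inconsistently written tags.
pages = {
    "2023 Donor Webinar": ["Fundraising", "donor outreach"],
    "Grant Writing Slides": [" fundraising ", "Grant  Writing"],
    "Volunteer Handbook": ["Volunteers"],
}

tag_index = build_tag_index(pages)
print(sorted(tag_index["fundraising"]))
```

    Without the normalization step, "Fundraising" and " fundraising " would end up as separate tags and the lookup would miss one of the two pages.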

    • astromd@beehaw.orgOP
      1 year ago

      Interesting suggestion. I’ll see if there are any existing workflows along these lines.