Folks in the field of AI like to make predictions for AGI. I have thoughts, and I’ve always wanted to write them down. Let’s do that.

Since this isn’t something I’ve touched on in the past, I’ll start by doing my best to define what I mean by “general intelligence”: a generally intelligent entity is one that achieves a special synthesis of three things:

1. A way of interacting with and observing a complex environment. Typically this means embodiment: the ability to perceive and interact with the natural world.
2. A robust world model covering the environment. This is the mechanism which allows an entity to perform quick inference with a reasonable accuracy. World models in humans are generally referred to as “intuition”, “fast thinking” or “system 1 thinking”.
3. A mechanism for performing deep introspection on arbitrary topics. This is thought of in many different ways – it is “reasoning”, “slow thinking” or “system 2 thinking”.

If you have these three things, you can build a generally intelligent agent. Here’s how:

1. First, you seed your agent with one or more objectives.
2. Have the agent use system 2 thinking in conjunction with its world model to start ideating ways to optimize for its objectives.
3. It picks the best idea and builds a plan.
4. It uses this plan to take an action on the world.
5. It observes the result of this action and compares that result with the expectation it had based on its world model.
6. It might update its world model here with the new knowledge gained.
7. It uses system 2 thinking to make alterations to the plan (or idea).

Rinse and repeat.
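The cycle above can be sketched as a toy loop. This is only a minimal illustration under loud assumptions, not a description of how a real agent would be built: the one-dimensional `Environment`, the `WorldModel` with a single `step_gain` parameter, and the brute-force `plan` function are all hypothetical stand-ins invented here for “the world”, “system 1” and “system 2”.

```python
from dataclasses import dataclass


@dataclass
class WorldModel:
    """Hypothetical 'system 1': a fast guess at how far one unit of action moves the agent."""
    step_gain: float = 1.0  # initial belief about the environment (deliberately wrong)

    def predict(self, position: float, action: int) -> float:
        return position + self.step_gain * action

    def update(self, position: float, action: int, observed: float, lr: float = 0.5) -> None:
        # Nudge the belief toward what was actually observed.
        if action != 0:
            actual_gain = (observed - position) / action
            self.step_gain += lr * (actual_gain - self.step_gain)


class Environment:
    """Toy 1-D world whose true dynamics the agent does not know."""
    TRUE_GAIN = 0.5

    def __init__(self) -> None:
        self.position = 0.0

    def act(self, action: int) -> float:
        self.position += self.TRUE_GAIN * action
        return self.position


def plan(model: WorldModel, position: float, target: float) -> int:
    """Hypothetical 'system 2': deliberate over candidate actions using the world model."""
    candidates = range(-3, 4)
    return min(candidates, key=lambda a: abs(model.predict(position, a) - target))


def run_agent(target: float, max_cycles: int = 50, tolerance: float = 0.25):
    """Run the plan/act/observe/update cycle until the objective is met or cycles run out."""
    env, model = Environment(), WorldModel()
    position = env.position
    for cycle in range(1, max_cycles + 1):
        action = plan(model, position, target)      # ideate and pick the best idea
        expected = model.predict(position, action)  # expectation from the world model
        observed = env.act(action)                  # act on the world
        if abs(observed - expected) > 1e-9:         # surprised: revise the world model
            model.update(position, action, observed)
        position = observed
        if abs(position - target) <= tolerance:
            return cycle, position, model.step_gain
    return max_cycles, position, model.step_gain


if __name__ == "__main__":
    print(run_agent(target=5.0))
```

The point of the sketch is the shape of the loop: the plan is rebuilt every cycle from the current world model, and the world model is corrected whenever an observation disagrees with the expectation.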

My definition for general intelligence is an agent that can coherently execute the above cycle repeatedly over long periods of time, thereby being able to attempt to optimize any objective.

The capacity to actually achieve arbitrary objectives is not a requirement. Some objectives are simply too hard. Adaptability and coherence are the key: can the agent use what it knows to synthesize a plan, and can it act continuously towards a single objective over long time periods?

So with that out of the way – where do I think we are on the path to building a general intelligence?

World Models

We’re already building world models with autoregressive transformers, particularly of the “omnimodel” variety. How robust they are is up for debate. There’s good news, though: in my experience, scale improves robustness and humanity is currently pouring capital into scaling autoregressive models. So we can expect robustness to improve.

With that said, I suspect the world models we have right now are sufficient to build a generally intelligent agent.

Side note: I also suspect that robustness can be further improved via the interaction of system 2 thinking and observing the real world. This is a paradigm we haven’t really seen in AI yet, but happens all the time in living things. It’s a very important mechanism for improving robustness.

When LLM skeptics like Yann say we haven’t yet achieved the intelligence of a cat – this is the point that they are missing. Yes, LLMs still lack some basic knowledge that every cat has, but they could learn that knowledge – given the ability to self-improve in this way. And such self-improvement is doable with transformers and the right ingredients.

Reasoning

There is not a well-known way to achieve system 2 thinking, but I am quite confident that it is possible within the transformer paradigm with the technology and compute we have available to us right now. I estimate that we are 2-3 years away from building a mechanism for system 2 thinking which is sufficiently good for the cycle I described above.

Embodiment

Embodiment is something we’re still figuring out with AI, but I am once again quite optimistic about near-term advancements. There is a convergence currently happening between the field of robotics and LLMs that is hard to ignore.

Robots are becoming extremely capable – able to respond to very abstract commands like “move forward”, “get up”, “kick ball”, “reach for object”, etc. For example, see what Figure is up to or the recently released Unitree H1.

On the opposite end of the spectrum, large Omnimodels give us a way to map arbitrary sensory inputs into commands which can be sent to these sophisticated robotics systems.

I’ve been spending a lot of time lately walking around outside talking to GPT-4o while letting it observe the world through my smartphone camera. I like asking it questions to test its knowledge of the physical world. It’s far from perfect, but it is surprisingly capable. We’re close to being able to deploy systems which can carry out coherent strings of actions in the environment and observe (and understand) the results. I suspect we’re going to see some really impressive progress in the next 1-2 years here.

This is the field of AI I am personally most excited about, and I plan to spend most of my time working on it over the coming years.

TL;DR

In summary – we’ve basically solved building world models, have 2-3 years on system 2 thinking, and 1-2 years on embodiment. The latter two can be done concurrently. Once all of the ingredients have been built, we need to integrate them and build the cycling algorithm I described above. I’d give that another 1-2 years.

So my current estimate is 3-5 years for AGI. I’m leaning towards 3 for something that looks an awful lot like a generally intelligent, embodied agent (which I would personally call an AGI). Then a few more years to refine it to the point that we can convince the Gary Marcuses of the world.
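For what it’s worth, the 3-5 figure follows from the ranges given in the summary if the concurrent workstreams are bounded by the slower one and integration is added on top. A small sketch of that arithmetic (the `total_estimate` helper is just an illustrative name, and treating the ranges as simple low/high bounds is an assumption):

```python
# Ranges in years as (low, high), taken from the estimates stated in the post.
system2 = (2, 3)
embodiment = (1, 2)
integration = (1, 2)


def total_estimate(parallel, sequential):
    """Parallel work finishes when the slowest stream does; sequential work adds on top."""
    low = max(r[0] for r in parallel) + sum(r[0] for r in sequential)
    high = max(r[1] for r in parallel) + sum(r[1] for r in sequential)
    return low, high


print(total_estimate([system2, embodiment], [integration]))  # prints (3, 5)
```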

Really excited to see how this ages. 🙂

  • stingpie@lemmy.world
    5 months ago

    Why do the leaders in AI know so little about it? Transformers are completely incapable of maintaining any internal state, yet techbros somehow think it will magically have one. Sometimes, machine learning can be more of an art than a science, but they seem to think it’s alchemy. They think they’re making pentagrams out of noncyclic graphs, but are really just summoning a mirror into their own stupidity.

    It’s really unfortunate, since they drown out all the news about novel and interesting methods of machine learning. KANs, DNCs, MAMBA, they all have a lot of promise, but can’t get any recognition because transformers are the laziest and most dominant methods.

    Honestly, I think we need another winter. All this hype is drowning out any decent research, and so all we are getting are bogus tests and experiments that are irreproducible because they’re so expensive. It’s crazy how unscientific these ‘research’ organizations are. And OpenAI is being paid by Microsoft to basically jerk-off Sam Altman. It’s plain shameful.

    • BlueMonday1984@awful.systems
      5 months ago

      Honestly, I think we need another winter. All this hype is drowning out any decent research, and so all we are getting are bogus tests and experiments that are irreproducible because they’re so expensive. It’s crazy how unscientific these ‘research’ organizations are. And OpenAI is being paid by Microsoft to basically jerk-off Sam Altman. It’s plain shameful.

      If an AI winter does happen, I expect it’ll be particularly lengthy/severe. Unlike previous AI hype cycles, this particular cycle has come with some serious negative externalities (large-scale copyright infringement, climate change/water consumption, the flood of AI slop, disinformation, etc).

      Said externalities have turned the public strongly against AI, to the point where refusing to use it has become a viable marketing strategy.

      You want my suspicion, any further AI research will probably be viewed with immediate distrust, at least for a while.

      • David Gerard@awful.systems
        5 months ago

        The key difference is that in previous AI springs, the customer was the DoD, and winter set in when they declined to set more money on fire for an approach that wasn’t working.

    • scruiser@awful.systems
      5 months ago

      I am probably giving most of them too much credit, but I think some of them took the Bitter Lesson and learned the wrong things from it. LLMs performed better than originally expected just off context, and (apparently) scaled better with bigger models and more training than expected, so now they think they just need to crank up the size and tweak things slightly (i.e. “prompt engineering” and RLHF) and don’t appreciate the limits built into the entire approach.

      The annoying thing about another winter is that it would probably result in funding being cut for other research. And laymen don’t appreciate all the academic funding that goes into research for decades before an approach becomes interesting and viable enough to scale up and commercialize (and then overhyped and oversold before some more modest practical usages become common, and relabeled as something other than AI).

      Edit: or more cynically, the leaders and hype-men know that algorithmic advances aren’t an automatic “dump money in, get disruptive product out” process, so they don’t bother putting as much monetary investment or hype into algorithmic advances. Like compare the attention paid towards Yann LeCun talking about algorithmic developments vs. Sam Altman promising grad student level LLMs (as measured by a spurious benchmark) in two years.

  • Sailor Sega Saturn@awful.systems
    5 months ago

    Robots are becoming extremely capable – able to respond to very abstract commands like “move forward”, “get up”, “kick ball”, “reach for object”, etc.

    Computers now have the cutting edge ability to-- *checks notes* -- parse Zork text adventure style user input.

    If you’ve done any reading on the history of AI you may recognize these sorts of commands as “blocks world”, which goes back to the '60s. Sure there’s been a ton of advancements since then, but just the basic act of a computer understanding “kick ball” is not exactly groundbreaking.

  • lurklurk@lemmy.world
    5 months ago

    It uses system 2 thinking to make alterations to the plan (or idea). Rinse and repeat.

    They probably meant to write system 1 thinking here.

  • David Gerard@awful.systems
    5 months ago

    I’ve been spending a lot of time lately walking around outside talking to GPT-4o

    and it loves me very much, I just need to send it money for its ticket over via western union