• 1 Post
  • 12 Comments
Joined 1 year ago
Cake day: July 13th, 2023





  • Using tools from physics to create something that is popular but unrelated to physics is enough for the nobel prize in physics?

    If only, it’s not even that! Neither Boltzmann machines nor Hopfield networks led to anything used in the modern spam- and deepfake-generating AI, nor in image recognition AI, or the like. This is the kind of stuff that struggles to get above 60% accuracy on MNIST (handwritten digits).

    Hinton went on to do some different stuff based on backpropagation and gradient descent, on newer computers than were available to the people who came up with it long before him, and he got the Turing Award for that. It’s a wee bit controversial because of the whole “people did it before, but on worse computers, and so they didn’t get any award” thing. But at least that award is for work on the path leading to modern AI, and not for work from the vast list of things that just didn’t work, where it’s extremely hard to explain why you would even think they would work in the first place.






  • The counting failure in general is even clearer and lacks the excuse of unfavorable tokenization. The AI hype would have you believe just an incremental improvement in multi-modality or scaffolding will overcome this, but I think they need to make more fundamental improvements to the entire architecture they are using.

    Yeah.

    I think the failure could be extremely fundamental - maybe local optimization of a highly parametrized model is fundamentally unable to properly learn counting (other than via memorization).

    After all, there’s a very large number of ways in which a highly parametrized model can do a good job of predicting the next token without any actual counting. What makes counting special compared to memorization is that it is a relatively compact representation, but there’s no reason for a neural network to favor compact representations.

    The “correct” counting may just be a very tiny local minimum, with tall hills all around it and no valley leading there. If that’s the case, then local optimization will never find it.
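
To make the "tiny minimum behind tall hills" picture concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the 1D loss, the constants, and the names `loss`, `grad` and `descend` are assumptions, and it says nothing about the real loss landscape of any language model. It only shows that plain gradient descent reaches a narrow, walled-off minimum from very few starting points, even when that minimum is the lowest point overall.

```python
import math

# Toy 1D loss (all constants are illustrative assumptions):
#   - a broad, shallow basin around x = 0 (easy to reach from anywhere)
#   - a hill centred on x = 3
#   - a narrow, deep well at x = 3, which is the lowest point overall
def loss(x):
    u = x - 3.0
    basin = 0.05 * x ** 2
    hill = 1.5 * math.exp(-(u / 0.6) ** 2)
    well = -3.0 * math.exp(-(u / 0.1) ** 2)
    return basin + hill + well

# Analytic derivative of loss(x), term by term.
def grad(x):
    u = x - 3.0
    d_basin = 0.1 * x
    d_hill = 1.5 * math.exp(-(u / 0.6) ** 2) * (-2.0 * u / 0.36)
    d_well = -3.0 * math.exp(-(u / 0.1) ** 2) * (-2.0 * u / 0.01)
    return d_basin + d_hill + d_well

# Plain gradient descent with a fixed step size.
def descend(x, lr=0.001, steps=5000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Evenly spaced starting points on [-5, 5].
starts = [-5.0 + 0.1 * i for i in range(101)]
finals = [descend(x) for x in starts]

# Count the runs that ended inside the narrow well.
found = sum(abs(x - 3.0) < 0.05 for x in finals)
print(f"well is the global minimum: {loss(3.0) < loss(0.0)}")
print(f"runs that found it: {found} of {len(starts)}")
```

Only the starts that already sit inside the walls of the well end up there; every other run slides down the outside of the hill and drifts toward the broad basin instead. In one dimension you could fix that with random restarts, which is exactly what doesn't scale to billions of parameters.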



  • Another thing to add to this is that there’s just one or two people on the train providing service for hundreds of other people or millions of dollars’ worth of goods. Automating those people away is simply not economical, not even in terms of the headcount replaced vs the headcount that has to be hired to maintain the automation software and hardware.

    Unless you’re a techbro, who deeply resents labor, someone who would rather hire 10 software engineers than 1 train driver.