Yesterday Framework unveiled a small-form-factor desktop based on Halo Strix.
Halo Strix seems to require high-bandwidth memory, specifically a 256-bit LPDDR5X interface, according to the specs.
Allegedly, the company said they tried to find a way to use modular memory (e.g. LPCAMM), but it did not work out signal-integrity-wise (@36:10 in the unveiling video above, and here).
So I’m wondering exactly, why not?
It seems LPCAMM2 offers a 128-bit bus and can currently scale up to 7500-8500 MT/s.
This would offer 7500 x 128 / 8 = 120 GB/s. Would it not have been possible to simply place two LPCAMM2 modules to cover the full 256-bit bus and reach 256 GB/s, using the 8000 MT/s configuration?
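For reference, here's my back-of-the-envelope arithmetic as a quick sketch (the numbers are just the ones from this post, nothing official):

```python
# Peak memory bandwidth: (transfers per second) x (bus width in bytes).
# MT/s x bits / 8 gives MB/s; divide by 1000 for GB/s (decimal units).
def bandwidth_gbs(mt_per_s: int, bus_width_bits: int) -> float:
    return mt_per_s * bus_width_bits / 8 / 1000

print(bandwidth_gbs(7500, 128))  # one LPCAMM2 module at 7500 MT/s -> 120.0 GB/s
print(bandwidth_gbs(8000, 256))  # full 256-bit bus at 8000 MT/s -> 256.0 GB/s
```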
Did they run into signal integrity issues because they tried to reach those speeds using only one LPCAMM2 stick? That would indeed have been impossible. Maybe two LPCAMM2 modules cannot be combined (perhaps due to space, since they are using the mini-ITX motherboard format)? Or am I missing something?
I think you may be confused about how bus widths work.
If you connected 2 separate memory modules, each with a 128-bit bus width… that doesn’t add up to a combined/total bus width of 256 bits for the buses together.
It means you have 2 separate memory modules, and more memory overall, but each of them is still going to top out at 128-bit data transfer standards.
It’s like… if you have one tunnel with a speed limit of 128 kph, and then you build a second, adjacent tunnel with a 128 kph speed limit… you can now move twice as many vehicles (theoretically), but they’re still driving through the tunnels at the same speed.
EDIT: Maybe a more accurate example would be: if you have two tunnels, each 128 lanes wide, neither of them is going to be able to fit a 256-lane-wide monster truck.
Also, the max supported memory of a Halo Strix 395+ is 128 GB.
So… even with your (I think flawed) example, you’d have to use 2 x 64 GB LPDDR5X modules… because adding a second 128 GB module to an existing 128 GB module would be completely useless; the CPU would just ignore the additional 128 GB of memory, or do undefined things with it.
Also, the Halo Strix CPU bandwidth isn’t 100% going to the memory. Some of it is reserved for the L1, L2, and L3 caches, and probably other stuff as well.
…
Beyond that…
The extremely new ‘hotshit’ LPDDR5X RAM is technically:
LPDDR5X-8533
Whereas Framework is using:
LPDDR5X-8000
… where the latter number indicates MT/s of throughput.
Why isn’t Framework using the fancier hotshit stuff?
They were not able to negotiate any of it at a reasonable price from any manufacturer; all the output of the hotshit stuff is only available in prebuilt laptops, it’s all going to other vendors.
You can’t just buy LPDDR5X-8533 on the open market right now, as a consumer, to the best of my knowledge. It’s B2B only: purchased from Samsung or Micron or whoever, assembled into laptops by HP or Dell or w/e, sold to Best Buy or Amazon sellers, and then sold to end consumers.
The people making laptops with LPDDR5 8533 memory are likely paying premiums and/or making huge volume orders… Framework almost certainly doesn’t have the money or overall organizational size to do something like that.
…
Why isn’t Framework using LPCAMM or LPCAMM2 memory?
The answer is right there in the Linus vid you linked, but maybe you misunderstood it.
The Halo Strix wasn’t designed to work with LPCAMM/2 memory. The 256-bit bus on the CPU wasn’t designed to work with the narrower buses on LPCAMM/2 memory. It would be sending, and expecting to receive, data to and from the memory on lanes that the memory does not possess.
It would be somewhat analogous to trying to run a 64-bit program on a 32-bit OS. The program won’t work because it is sending, and attempting to receive, data via pathways that do not exist in the OS.
Except that in that example, you can have a software translation compatibility layer, at least if the analogy is reversed and it’s a 32-bit program on a 64-bit OS.
But… you can’t do that in hardware without basically a massive chipset overhaul, and it might just end up being impossible anyway.
AMD would be the only people likely capable of developing that as a feature upgrade, and Framework would likely have had to cajole AMD, and pay them a significant amount of money, to even attempt to develop it, which would have taken a lot of time and might have resulted in failure anyway.
So, Framework decided to push out a product that is actually viable now.
I was trying to reason from how GPUs occasionally use a so-called clamshell design where, if I understand correctly, they split their bus to reach double the number of memory chips. The chips are paired and respond to the same addresses, but each provides part of the data, which is then combined.
Your vehicle example got me confused, because, as you point out, if you double the number of lanes while keeping the speed the same, you do effectively double the number of vehicles passing per unit of time, which is the bandwidth we are trying to achieve.
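To make my mental model concrete, here’s a toy sketch of what I mean by clamshell (pure illustration: the function and the dicts are made up, not any real memory controller interface):

```python
# Toy model: two narrower chips share one address; each drives half the
# data lanes, and the controller concatenates the halves into one wide word.
def read_wide(address: int, chip_a: dict, chip_b: dict) -> bytes:
    lo = chip_a[address]  # 16 bytes (128 bits) from chip A's lanes
    hi = chip_b[address]  # 16 bytes (128 bits) from chip B's lanes
    return lo + hi        # presented to the processor as one 256-bit word

chip_a = {0x0: bytes(range(16))}      # holds the low half of each word
chip_b = {0x0: bytes(range(16, 32))}  # holds the high half
word = read_wide(0x0, chip_a, chip_b)
print(len(word) * 8)  # 256 bits fetched in one paired access
```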
I’m sorry if I’m missing some important details but I am still rather confused.
PS: as for the specific Framework memory speed specs, the Halo Strix chip maxes out at 8000 MT/s, so 8533 is not supported, per the specs I linked in the post.
https://usercomp.com/news/1238286/gddr5-clamshell-mode-and-bandwidth
Clamshell mode is relevant to GDDR5 memory.
Memory that is a part of the GPU, part of a GPU board.
LPCAMM/2 and LPDDR5(x) memory is system memory, aka, RAM sticks.
Completely different kinds of memory.
…
You do not understand correctly.
There are many different kinds of memory chips, and they are not all directly compatible with all other kinds of processors.
You know how pcpartpicker or sites like that will tell you hey, you can’t plug DDR4 RAM into a mobo that only has slots for DDR5 RAM?
Imagine that but 1000x more complicated for trying to mix and match things at not the ‘hardware component’ level, but the ‘hardware subcomponents of hardware components’ level.
Clamshell mode only works on GPUs because the processor(s) on a GPU is only ever going to directly interface with the GDDR memory that is part of that GPU, and it is intentionally designed to work specifically with that kind of GDDR memory… so you don’t need to worry about making the GDDR memory directly interface with other parts of the system.
A GPU is an add-in board. It’s much closer to an entire miniature PC itself, on one board, and everything it communicates to the rest of the system goes through the PCIe x16 slot on the motherboard.
That allows a GPU to have more specialized tech within itself, as you’re not going to be customizing or manually modifying the memory or other components of your GPU.
System memory directly connects to the motherboard, and thus has to be directly compatible with anything else that plugs directly into the motherboard.
Because system memory is directly connected to a potentially much greater variety of other hardware via the mobo, it cannot be so specialized; otherwise less advanced components that directly connect to the mobo wouldn’t be able to interface with it.
…
The Halo Strix is technically not a CPU.
It is an APU.
That means it’s a single hardware component that is a hybrid of CPU-style processor cores and GPU-style processor cores.
That means the system its plugged into has to use a kind of system memory that is both generally compatible with other hardware components on the mobo, but is also compatible with the more specialized GPU style processor core demands.
This is why, in general, there are architecture differences between laptop/smartphone-style memory and PC-style system memory, and why you can’t usually plug laptop RAM into a PC, or vice versa.
…
I’m sorry if this is complicated and confusing; there may be a more straightforward way to explain it, but computer hardware really is quite complicated at this level of detail.
If my example/analogies are still confusing and inadequate, then abandon them and try this:
Bus widths are data transfer standards and protocols… in that sense they are kind of like a language.
A 256-bit ‘language’ speaking 256-bit ‘words’ at the same speed as a 128-bit ‘language’ may be able to push twice as much meaningful information in the same amount of time… but this is useless if there isn’t an instantaneous translation between the 128-bit and 256-bit ‘speakers’.
If your standard is expecting ‘words’ that are all 128 ‘letters’ long, and then you try to send a 256-letter word… you’re gonna have a problem. Half the letters won’t get through.
You would have to have some specialization on the sender side that breaks its 256-letter words into 2 separate 128-letter words, and some specialization on the receiver side that takes 2 concurrent 128-letter words, combines them into the 256-letter word, and then decodes that into the actual intended meaning (instruction set) of the 256-letter word.
At that point, your memory now also has its own ‘translator’ or, less metaphorically, processor of some kind… which is silly and wasteful.
It makes more sense to just use memory that can ‘speak the same language’ as the processor.
… If this still doesn’t make sense, then go to Wikipedia or find an instructional course or video series that actually, properly explains computer hardware design, including what a bus actually is and how it works.
…
The AMD specs you link actually don’t specify what you are saying here.
They say LPDDR5X, which is a blanket term. A container term.
The ‘x’ stands in for all the specific LPDDR5X-(number) speed grades with their different throughputs.
It certainly does support LPDDR5X-8000.
It likely also supports LPDDR5X-8533…
…though AMD does not actually confirm or deny this on your cited source page, as they use the vaguer catch-all term.
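For what it’s worth, the difference between the two grades on a 256-bit bus is modest; here’s the simple arithmetic (the speed grades listed are common LPDDR5X bins, not anything taken from AMD’s page):

```python
# Peak throughput per LPDDR5X speed grade on a 256-bit bus:
# MT/s x 256 bits / 8 = MB/s; divide by 1000 for GB/s.
BUS_BITS = 256

for grade in (7500, 8000, 8533):
    gbs = grade * BUS_BITS / 8 / 1000
    print(f"LPDDR5X-{grade}: {gbs:.1f} GB/s")
# 8000 MT/s lands at 256 GB/s; 8533 MT/s would only add about 17 GB/s.
```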