ByteDance has officially launched its latest Doubao large model, 1.5 Pro (Doubao-1.5-pro), which demonstrates outstanding comprehensive capabilities across various fields, surpassing the well-known GPT-4o and Claude 3.5 Sonnet. The release of this model marks an important step forward for ByteDance in the field of artificial intelligence. Doubao 1.5 Pro adopts a novel sparse MoE (Mixture of Experts) architecture, activating only a small subset of parameters during pre-training. This design's innovation...
The DeepSeek referred to here seems to be V3, not R1. While the linked article didn't seem to have info on parameter size, the fact that they state it is a sparse MoE architecture suggests it should run pretty quickly compared to other models with a similar total parameter count, so that's cool.
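For anyone unfamiliar with why sparse activation makes MoE models fast for their size, here's a minimal sketch of a top-k routed MoE layer in PyTorch. The expert count, top_k=2, and FFN shape are illustrative assumptions, not Doubao's or DeepSeek's actual configuration; the point is just that each token only pays the compute cost of k experts, not all of them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Minimal top-k mixture-of-experts layer: only k of n experts
    run per token, so activated parameters << total parameters."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                           # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # pick k experts per token
        weights = F.softmax(weights, dim=-1)              # normalize over chosen k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = SparseMoE(dim=64)
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64]); only 2 of 8 experts ran per token
```

With these toy numbers, each token touches 2/8 of the expert parameters per forward pass, which is why an MoE model can match a much larger dense model's capacity while keeping inference cost closer to that of its activated-parameter count.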