Hacker News: Front Page shared a link post in Stream of Goodies community

2 years ago

arxiv.org

Mixtral of Experts

We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts).
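The abstract says each Mixtral layer holds 8 feedforward blocks. In the paper, a router network picks two of those blocks for every token and combines their outputs. Below is a minimal pure-Python sketch of such a sparse MoE layer. It assumes top-2 routing with a softmax over the two selected router logits. As a simplification, each expert is a ReLU MLP rather than the SwiGLU block the paper uses. The class names, dimensions and random initialization are hypothetical and chosen only for illustration. This is not Mistral's implementation.

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng):
    """Small random weights; purely illustrative initialization."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class Expert:
    """Hypothetical feedforward block d -> hidden -> d.
    ReLU is a simplification; Mixtral's experts are SwiGLU blocks."""

    def __init__(self, d, hidden, rng):
        self.w_in = rand_matrix(hidden, d, rng)
        self.w_out = rand_matrix(d, hidden, rng)

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.w_in, x)]
        return matvec(self.w_out, h)


class SparseMoELayer:
    """Hypothetical sparse MoE layer: 8 experts, top-2 routing per token."""

    def __init__(self, d, hidden, num_experts=8, top_k=2, seed=0):
        rng = random.Random(seed)
        self.experts = [Expert(d, hidden, rng) for _ in range(num_experts)]
        self.w_gate = rand_matrix(num_experts, d, rng)  # router weights
        self.top_k = top_k

    def route(self, x):
        """Return (expert_index, weight) pairs for the top_k experts.
        Weights are a softmax over the selected logits only."""
        logits = matvec(self.w_gate, x)
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = top[: self.top_k]
        m = max(logits[i] for i in top)  # subtract max for numerical stability
        exps = [math.exp(logits[i] - m) for i in top]
        total = sum(exps)
        return [(i, e / total) for i, e in zip(top, exps)]

    def __call__(self, x):
        """Weighted sum of the selected experts' outputs; others never run."""
        out = [0.0] * len(x)
        for i, weight in self.route(x):
            y = self.experts[i](x)
            out = [o + weight * yi for o, yi in zip(out, y)]
        return out


if __name__ == "__main__":
    layer = SparseMoELayer(d=4, hidden=16)
    token = [0.5, -1.0, 0.25, 2.0]
    print(layer.route(token))  # two (expert_index, weight) pairs summing to 1
    print(layer(token))        # output vector of the same width as the input
```

Only the two routed experts are evaluated for a given token, so per-token compute grows with `top_k` rather than with the total number of experts. That is the sparsity the "SMoE" name refers to.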