Cerebras Launches the World’s Fastest AI Inference
20X the performance at 1/5th the price of GPUs, available today. Developers can now leverage the power of wafer-scale compute for AI inference via a simple API.

SUNNYVALE, Calif. – Today, Cerebras Systems, the pioneer in high-performance AI compute, announced Cerebras Inference, the fastest AI inference solution in the world. Delivering 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 …
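The release only says developers can reach the service "via a simple API" without showing one. As a minimal sketch, assuming an OpenAI-compatible chat-completions endpoint (the URL `api.cerebras.ai/v1/chat/completions`, the model name `llama3.1-8b`, and the bearer-token header are all assumptions, not confirmed by this announcement), a request could be built like this:

```python
import json
import urllib.request

# Hypothetical endpoint -- the press release only mentions "a simple API";
# this sketch assumes an OpenAI-compatible chat-completions interface.
API_URL = "https://api.cerebras.ai/v1/chat/completions"  # assumption

def build_request(prompt: str, model: str = "llama3.1-8b") -> urllib.request.Request:
    """Build a POST request for a single chat completion (not yet sent)."""
    payload = {
        "model": model,  # assumed model identifier for Llama 3.1 8B
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
        },
        method="POST",
    )

# Sending the request would be: urllib.request.urlopen(build_request(...))
req = build_request("Summarize wafer-scale inference in one sentence.")
print(req.full_url)
```

Separating request construction from dispatch keeps the sketch testable without network access; a real client would simply pass the built request to `urllib.request.urlopen` and parse the JSON response.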