Hacker News — vinext + Cloudflare Workers

▲Gemma 4 on Cerebras - The Fastest Inference Is Now Multimodal (cerebras.ai)

23 points by Tiberium 1 day ago | 8 comments

tmanchester 1 day ago [-]

Okay, this is actually pretty cool. Gemma 4 is a nice little model and I've really enjoyed playing around with it. At 1800 tok/s, turns are essentially instant; it's a bit of a trip.

simianwords 18 hours ago [-]

I just tried it on their website and it is extremely fast. I wonder what the value prop is, though. Where would I want:

1. a smaller model

2. that is also non-local, hosted in the cloud

I can't think of any case.

johntash 14 hours ago [-]

OCR is a decent use-case for smaller models. I've had good experience using gemma for OCR'ing handwritten stuff that tesseract doesn't do so well on.

But for 2, probably only useful if you have a huge batch workload you want to get done quicker and don't want the local hardware for it?

jamesponddotco 14 hours ago [-]

A voice assistant comes to mind. Ideally, it'd be local, but if you don't have the hardware you'll go with the cloud, in which case, the faster, the better.

anthonypasq 17 hours ago [-]

speed is always better. if you have ever used a coding agent at 1000 tps, going back to 50 feels like walking through sludge. for a simple question, i hate waiting 2 minutes for opus to loop 50 times just to read some files and answer.

it's not necessarily about gemma 4 specifically, but in a year or 2, when we have opus-class models at 2000 tps, imagine the productivity.

simianwords 17 hours ago [-]

Of course I think speed is preferable, but I don’t see myself paying for a fast Gemma.

anthonypasq 17 hours ago [-]

i mean, i can imagine a million different apps that use ai and want cheap multimodal capabilities with low latency.

simianwords 18 hours ago [-]

Answering myself: fancy autocomplete in my IDE?

Text autocorrect on my phone? Like give it all the context about me and so on.
