AI & Tech Developments - Mar 25

  • 00:39 — A user plans to test TurboQuant with Qwen3.5, hoping it will allow running much larger models. @0xSero
  • 01:05 — LiteParse is launched as a fast, free, non-VLM based document parser that provides high-quality context to AI agents. @jerryjliu0
  • 01:07 — WanGP ships a fully local LLM agent for video generation, built on Qwen 3.5VL and needing just 8GB of VRAM. @cocktailpeanut
  • 01:10 — A user explains how AI chatbots store each conversation in a temporary KV cache; on a large model like Llama 70B, a long conversation's cache alone can consume 40GB of GPU memory. @AnishA_Moonka
  • 20:00 — Google introduces TurboQuant, a compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup with zero accuracy loss. @GoogleResearch
  • 20:52 — TurboQuant is implemented in MLX: Qwen3.5-35B-A3B scores 6/6 on needle-in-a-haystack across 8.5K–64.2K context lengths, with a 4.9x smaller KV cache at 2.5 bits. @Prince_Canuma
  • 21:23 — OpenAI was burning $10-15 million per day on Sora; now they’re reallocating those expensive GPUs towards ChatGPT, Codex and tools that people use daily. @shiri_shh
  • 22:04 — Claude Dispatch, a way for you to talk to Claude Cowork & Claude Code on your computer, is now available on all Teams plans. @felixrieseberg

📱 Source Tweets

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: http://goo.gle/4bsq2qI

@GoogleResearch
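TurboQuant's actual algorithm is described in the linked blog; as a generic illustration of the underlying idea only (not Google's method), here is a minimal per-group asymmetric quantizer for a KV tensor. Group size, bit width, and the uint8 packing are illustrative assumptions.

```python
import numpy as np

# Generic low-bit KV quantization sketch (NOT TurboQuant's actual algorithm):
# quantize values to n-bit integers with one scale/offset per group of values.
def quantize(x, bits=4, group=64):
    x = x.reshape(-1, group)
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on flat groups
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo, shape):
    return (q * scale + lo).reshape(shape)

rng = np.random.default_rng(0)
kv = rng.standard_normal((128, 128)).astype(np.float32)  # toy KV-cache slice
q, scale, lo = quantize(kv, bits=4)
kv_hat = dequantize(q, scale, lo, kv.shape)
err = np.abs(kv - kv_hat).max()  # bounded by ~scale/2 per group
```

Storing 4-bit codes instead of 32-bit floats cuts memory roughly 8x before accounting for the small per-group scale/offset overhead; real schemes like TurboQuant pack codes tightly and are engineered to keep accuracy loss at zero.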

Just implemented Google’s TurboQuant in MLX and the results are wild! Needle-in-a-haystack using Qwen3.5-35B-A3B across 8.5K, 32.7K, and 64.2K context lengths: → 6/6 exact match at every quant level → TurboQuant 2.5-bit: 4.9x smaller KV cache → TurboQuant 3.5-bit: 3.8x

@Prince_Canuma
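The reported 4.9x at 2.5 bits is less than the naive 16/2.5 = 6.4x one might expect against a 16-bit cache; the gap is consistent with per-group quantization metadata (scales and zero points). The group size and metadata width below are illustrative assumptions, not MLX's actual storage layout.

```python
# Why low-bit KV quantization compresses less than bits alone suggest:
# each group of values also stores metadata (scale + zero point).
# group_size and meta_bits are illustrative assumptions.
def compression_ratio(bits, group_size=32, meta_bits=32, baseline_bits=16):
    stored_bits_per_value = bits + meta_bits / group_size
    return baseline_bits / stored_bits_per_value

print(round(compression_ratio(2.5), 2))  # ~4.57 under these assumptions
print(round(compression_ratio(3.5), 2))  # ~3.56 under these assumptions
```

With a 32-bit scale/offset amortized over 32 values, a "2.5-bit" code really costs 3.5 bits per value, which lands near the observed 4.9x and 3.8x figures.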

Testing this tomorrow, will report back if it works on Qwen3.5. Might be able to run much larger models if this works.

@0xSero

There’s not that many fast, free, non-VLM document parsers out there: there’s PyPDF, PyMuPDF, Markitdown, OpenDataLoader. Last week, we launched LiteParse: a fast, free, and non-VLM based document parser that provides the highest quality context to AI agents compared to

@jerryjliu0

Hello, I just built an open-source, fully self-hosted alternative to Getquin and Finary that uses Enable Banking to sync accounts. It's an MVP, but honestly I'm happy to have syncing for private use; I'm tired of paying for subscriptions.

@Zoeillle

Every time you message an AI chatbot, the model stores your entire conversation in temporary memory called a KV cache (a cheat sheet so it doesn’t re-read everything from scratch). On a large model like Llama 70B running a long conversation, that cache alone eats 40GB of GPU

@AnishA_Moonka
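The 40GB figure is plausible with a back-of-the-envelope estimate. The architecture numbers below (80 layers, 8 grouped-query KV heads, head dim 128, fp16 storage) are assumptions based on published Llama-70B-class configs, not from the tweet itself.

```python
# Back-of-the-envelope KV-cache size for a Llama-70B-like model.
# Assumed architecture: 80 layers, 8 KV heads (GQA), head dim 128, fp16 (2 bytes).
def kv_cache_bytes(seq_len, n_layers=80, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # 2x for keys AND values, stored per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

per_token = kv_cache_bytes(1)                  # cache bytes added per token
tokens_for_40gb = 40 * 1024**3 // per_token    # tokens that fill 40 GiB

print(f"{per_token / 1024:.0f} KiB per token")       # 320 KiB
print(f"~{tokens_for_40gb:,} tokens to fill 40 GiB")  # ~131,072 tokens
```

At 320 KiB of cache per token, a conversation in the low hundreds of thousands of tokens fills 40GB on its own, which is exactly the pressure that KV-cache quantization schemes like TurboQuant target.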

Fully Local Video Generation Agent!! WanGP now ships with a native LOCAL LLM agent. Yes, it's FULLY local, making use of @Alibaba_Qwen 3.5VL. Just talk to Deepy and he will fill out the Gradio UI and run inference automatically. Just 8GB VRAM needed. Get it on pinokio.

@cocktailpeanut
