Shital Shah (@sytelus) / X

Shital Shah

7,298 posts

@sytelus

Mostly research and code. If the universe is an optimizer, what is its loss function? All opinions are my own.

Redmond, WA

Joined July 2007

Pinned
Shital Shah
@sytelus
Jun 3
We are so happy to announce our new model Aion 1.0 today! Our team at AI Frontiers Lab at Microsoft Research has been cooking hard on this for quite a while. Aion 1.0 is a 14B model that can run locally with reasoning + tool calling capabilities. You can choose whatever agentic
94K
Shital Shah
@sytelus
Dec 2, 2022
ChatGPT was dropped on us just a bit over 24 hours ago. It's like you wake up to the news of the first nuclear explosion and you don't know yet what to think about it, but you know the world will never be the same again. Here are some interesting snapshots of this "explosion"🧵:
Shital Shah
@sytelus
Nov 3, 2024
Now that we are done with counting r in strawberry…
422K
Shital Shah
@sytelus
Jan 27, 2025
Do people even understand that the majority of chip buys are for inference and not training? Inference needs are going to grow exponentially YoY no matter how much juice we try to squeeze out.
574K
Shital Shah
@sytelus
Sep 14, 2024
Terence Tao’s grading: GPT-4o: completely incompetent graduate student. o1-preview: mediocre but not completely incompetent graduate student. A step change.
304K
Shital Shah
@sytelus
Jan 4, 2024
So, this robot was made for under $32k. It’s driven by a cheap laptop with a mobile 3070 Ti. It has 2 low-res cameras on the wrists and one front-facing camera (+ proprioception from the arms). Models are tiny ResNet18 backbones. The key insight is that co-training improves performance! 1/3
Zipeng Fu
@zipengfu
Jan 3, 2024
Introducing 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 -- Learning! With 50 demos, our robot can autonomously complete complex mobile manipulation tasks: - cook and serve shrimp🦐 - call and take an elevator🛗 - store a 3 lb pot in a two-door cabinet Open-sourced! Co-led @tonyzzhao, @chelseabfinn
422K
Shital Shah
@sytelus
Oct 21, 2023
The RL community should be in awe and shock from the Eureka paper🫨. The idea here is that you feed the source code of the environment to GPT-4 and ask it to write code for the reward function itself! Then you evaluate this reward function in simulation and provide your evaluation results
Jim Fan
@DrJimFan
Oct 20, 2023
Can GPT-4 teach a robot hand to do pen spinning tricks better than you do? I'm excited to announce Eureka, an open-ended agent that designs reward functions for robot dexterity at super-human level. It’s like Voyager in the space of a physics simulator API! Eureka bridges the
940K
Shital Shah
@sytelus
Feb 26, 2025
After we learned that DeepSeek folks were using undocumented PTX instructions, now we are learning that they are using stuff that probably even NVIDIA people don't know. 🫡
162K
Shital Shah
@sytelus
Dec 13, 2023
Mistral-7B is cool but you know what's cooler? A more powerful model at just 1/3rd of the size! Welcome to Phi-2. This is something our team at Microsoft Research has been tirelessly working on and now we have more numbers comparing it with Llama-7B, 13B, 70B and Gemini Nano. 👇
568K
Shital Shah
@sytelus
Sep 12, 2024
wow.... so ChatGPT o1 is getting 80% on my privately held benchmark. The previous best was 30% by Sonnet 3.5 and 20% by GPT-4o. Before folks jump to the conclusion that there is some simple new algo waiting to be replicated, let's take time to appreciate that this was a research
235K
Shital Shah
@sytelus
Dec 13, 2024
Are you ready for an early Christmas present from our team at Microsoft Research? Introducing the most powerful smol model ever built in the world! Welcome to Phi-4! 👇
216K
Shital Shah
@sytelus
Dec 9, 2023
I think we haven’t fully grasped the impact of the Mamba paper that dropped this week. From the results so far, it is very likely that Mamba might just be the architecture that finally unseats attention from its long-held grip on the throne.🧵
426K
Shital Shah
@sytelus
Oct 21, 2024
DeepMind's chess paper has sharply divided the AI community: some are pointing to it as evidence that LLMs can do reasoning and planning while others say it's just a lookup table/memorization. In reality, I think the paper uncovers something else if you look into the details! 🧵
411K
Shital Shah
@sytelus
Jan 6, 2023
Just about to wrap up my day and saw VALL-E! Wow!! This model takes a 3-second speech sample from a person and can synthesize text-to-speech in the same voice with unbelievable fidelity. It can even maintain the emotion and acoustic environment of the sample. valle-demo.github.io
249K