Week 1162

My second week in NYC went by in a blur. I spent most of the week catching up with friends and wasn't as productive as I would've liked. Justin and I have been exploring several ideas with the intention of picking one to investigate more deeply later. It feels weird to me to be working at such an abstract level because it goes against my natural instincts to understand the entire stack of a problem. But it makes sense to optimize for breadth instead of depth right now given how early we are in the ideation phase. It's physically impossible to go deep on every single idea, so we need to prioritize. Hearing all the cool things our founder friends are working on just gives me a lot of FOMO and an itch to start building stuff.

We've been taking three different approaches to ideation.

Alongside ideation, we've continued to read more papers and explore the latest in AI. One of the coolest papers that came out this week was the Mamba paper by Albert Gu and Tri Dao [tweet][arxiv][github]. They introduce a new sequence model architecture which they claim is much faster for training and inference than transformers while maintaining the same modeling capabilities. If true, this could result in a world where training high quality LLMs becomes much cheaper and more accessible. They trained some small LLMs (2.8B params) and released the weights, but we won't know if Mamba will be transformative until someone invests the resources to train a massive model based on it.
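
The core idea, as I understand it, is a recurrence over a fixed-size state instead of attention over the whole context. Here's a toy numpy sketch of a plain linear state-space step (made-up dimensions, and not the paper's actual selective scan) just to illustrate why per-token inference cost stays constant instead of growing like a transformer's KV cache:

```python
import numpy as np

# Toy linear state-space recurrence, NOT Mamba's selective scan:
# every token updates a fixed-size hidden state, so per-token work
# and memory stay constant (no growing KV cache).

d_state = 16   # size of the recurrent state (made up)
d_model = 4    # feature dimension of the tokens (made up)

rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(d_state, d_state))  # state transition
B = rng.normal(scale=0.1, size=(d_state, d_model))  # input projection
C = rng.normal(scale=0.1, size=(d_model, d_state))  # output projection

def ssm_step(h, x):
    """Consume one token x, update the state h, emit an output y."""
    h = A @ h + B @ x
    y = C @ h
    return h, y

h = np.zeros(d_state)
for x in rng.normal(size=(10, d_model)):  # a 10-token toy "sequence"
    h, y = ssm_step(h, x)                 # O(1) work per token
```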

Mistral released a mixture-of-experts model composed of eight 7B-param experts with a sequence length of 32k tokens. GPT4 apparently derives a lot of its performance gains from its mixture-of-experts architecture, so it's amazing that we now have a pre-trained open source version for the community to try out.
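
To make the architecture concrete, here's a toy numpy sketch of top-2 routing over eight experts (dimensions and "experts" are made up, and this is not Mistral's actual code): a small router scores the experts for each token, only the two highest-scoring ones run, and their outputs are blended by the softmaxed router scores, so most of the parameters sit idle on any given token.

```python
import numpy as np

# Toy top-2 mixture-of-experts routing for a single token (not the real
# implementation): a router scores 8 experts, the best 2 run, and their
# outputs are combined with softmax weights.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 8, 2

router = rng.normal(size=(d_model, n_experts))                    # gating weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    scores = x @ router                 # one score per expert
    top = np.argsort(scores)[-top_k:]   # indices of the 2 best experts
    w = np.exp(scores[top])
    w /= w.sum()                        # softmax over just the chosen 2
    # Only the chosen experts do any work, so most params are idle per token.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_forward(rng.normal(size=d_model))
```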

Google put out a press release for Gemini, claiming better performance than GPT4. However, the gains aren't huge, so there's been speculation that we're nearing the performance limits of LLMs. Among the bunch of capabilities they announced, the one that stood out to me was that the Gemini Nano model can run on Pixel phones. This could be the start of a trend towards LLM inference on edge devices.

My friend Varun launched a project called latentverse where you can use text to generate a 3D scene and pan around in it. I looked into the implementation and it led me down a rabbit hole of latent consistency models, image generation LoRAs, and threejs. It’s so inspirational to see friends work on cool projects like this. His blog post on open source AI is also a great read.
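
As a side note on the latent consistency model part of that rabbit hole: LCMs can produce an image in just a handful of denoising steps. Here's a rough sketch of what that looks like with the diffusers library, assuming a recent release that can load an LCM checkpoint like SimianLuo/LCM_Dreamshaper_v7 directly (this is not latentverse's actual pipeline):

```python
import torch
from diffusers import DiffusionPipeline

# Rough sketch: few-step text-to-image with a latent consistency model.
# Assumes a diffusers version recent enough to resolve this checkpoint to
# its LCM pipeline; not latentverse's actual implementation.
pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
)
pipe.to("cuda")

image = pipe(
    prompt="a cozy cabin in a snowy forest, isometric view",
    num_inference_steps=4,   # LCMs need only ~4-8 steps instead of ~50
    guidance_scale=8.0,
).images[0]
image.save("scene.png")
```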

--------

Outside of work, I tried exploring the city of New York as much as I could. I wandered through the parks and neighborhoods, ate a lot of good food, and even reactivated my Hinge for a day. I don't have any strong opinions about the city as of now. Most things seem about the same to me as SF besides the weather (way colder) and the number of homeless people (way fewer). The most relevant difference I've observed is that tech is not the main industry in NYC. SF is filled with mostly tech people while NYC has a lot more diversity in professions. There are pros and cons to each, and I'm not yet sure which one fits me better. I'll reflect on it more once I land back in SF and have some time to decompress from this trip. It's been fun and eye-opening, but I can't wait to sleep in my own bed again.