Week 1171
I forgot to post last week since I didn’t do much besides code, so I’m going to batch it together with this week. We spent some time reviewing our learnings from ghiblify to figure out what to work on next. Some of those lessons were:
- Storytelling is critical. We can make things go viral as long as we can present a compelling story.
- User retention is hard. Ghiblify was designed to tell a specific but short-lived story. Users won't come back to a product that has no new stories to tell them.
- The image generation space is huge. People have come up with a remarkable variety of use cases, and it's still only getting started. Human creativity is infinite.
Based on these lessons, we feel the right thing to work on next is a generalized image generation product. We do realize there are already a gazillion competitors tackling this space. We’ve heard all the caveats like:
- “the market is too saturated”
- “Midjourney is too far ahead”
- “you don’t have enough ML background”
Blah blah blah. The most likely scenario is that this idea fails, but that’s fine as long as we learn from it. All that matters right now is we shoot our shot, listen to user feedback, and ship cool shit. We’ll launch a beta soon so stay tuned!
--------
Other stuff:
- Neal Agarwal launched Infinite Craft and I got inspired to recreate the concept for images. I tweeted a demo of a canvas that blends images together, but it's slower than I would like (usually 3-6 seconds per generation). To get the demo out quickly, I used moondream to caption the images and passed the captions through an LCM model for fast image generation. When I have more time I want to turn this into a true image-to-image pipeline with no intermediate captioning step. I also want to dig deeper into StreamDiffusion and OneDiff to figure out how they speed up inference.
- I recently checked out a bunch of O'Reilly-style books from the SF public library. They're quite helpful for filling in gaps in my technical knowledge, and I wish I had started reading them earlier.
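The caption-then-generate blend pipeline from the demo above could be sketched roughly like this. `blend_prompt` is a hypothetical helper of my own invention, and the moondream and LCM calls are shown only as illustrative comments (with assumed model names) so the sketch stays self-contained:

```python
# Rough sketch of the two-stage image-blending pipeline: caption each
# input image with a small VLM, merge the captions into one prompt,
# then generate with a latent consistency model (LCM).

def blend_prompt(caption_a: str, caption_b: str) -> str:
    """Merge two image captions into a single text prompt for the LCM stage."""
    return f"a single cohesive scene blending {caption_a} with {caption_b}"

# Stage 1 (captioning) would call a small VLM such as moondream, e.g.:
#   caption_a = model.answer_question(encoded_img_a, "Describe this image.", tokenizer)
#
# Stage 2 (generation) would run an LCM checkpoint; LCMs need only ~4
# denoising steps, which is where the speed comes from, e.g. with diffusers:
#   pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")
#   image = pipe(blend_prompt(caption_a, caption_b), num_inference_steps=4).images[0]

print(blend_prompt("a red fox", "a snowy forest"))
# → a single cohesive scene blending a red fox with a snowy forest
```

The intermediate caption is lossy (details of each image get flattened into text), which is why a direct image-to-image pipeline would be the better long-term version.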
--------
Bookmarks:
- Explaining the SDXL latent space by Timothy Vass
- Beyond Self-Attention by Shyam Pather
- Representation Engineering Mistral-7B by Theia Vogel
- Lilian Weng on high quality data
- Someone finetuned Mixtral 8x7B locally on their M3 Macbook
- OpenAI system prompt was leaked
- Aceternity UI Library: tailwind + framer motion components
- Came across these two articles discussing technical challenges similar to ones I worked on at Retool. I'm a big fan of both these companies and it was cool to see how they approached them.
- The end of my childhood by Vitalik
- Dan Wang’s 2023 letter
- The reality of the Danish fairytale by DHH