Week 1172 -- Why AI?

Why are we working on AI? Justin and I get this question surprisingly often. Besides how fun it is, we’re convinced that the biggest opportunities in this next decade will be in AI. Yes, our background from Retool would suggest we’re better positioned to tackle something like vertical SaaS. Yes, neither of us has much experience compared to the Ph.D.s who’ve been working on this stuff for years. But what we do have is a voracious curiosity and enough self-awareness to recognize the skill gaps we need to fill. These early days and the feeling of playing catch-up will just be a footnote in the entire journey.

In the short term, I honestly don’t think it matters what we work on. It’s fine if our projects seem trivial or look like toy products. It’s fine if we let ourselves get distracted by new tech. Everyone in Silicon Valley preaches focus and market sizing, but that doesn’t feel like the right approach given how quickly the space is evolving and the fact that we’re still in exploration mode. The most important thing is that we learn and iterate quickly. Maybe the thing we once thought of as “too small” ends up being the huge opportunity everyone missed. Nobody really knows, and that’s the beauty (and fun) of it all.

--------

For the past two weeks we’ve been working on a generalized image generation product called Lightjourney. We’ve learned a ton in the process, and I’ll try to chronicle some of those learnings here.

Our initial goal was to generate images with Midjourney-like quality but 90% faster. We used latent consistency models (LCMs), which are trained to denoise images in as few as 4 steps instead of the 20+ steps most current models require. However, after some initial user testing, we realized this wasn’t the right tradeoff to make. For this type of product, users are willing to wait a bit longer to get a higher-quality image. Generation speed certainly matters, just not as much for this use case.
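
For reference, here’s a minimal sketch of what few-step LCM generation looks like with Hugging Face diffusers (assuming a recent version with native LCM support); the checkpoint, prompt, and settings are illustrative rather than our production setup:

```python
# Minimal sketch: few-step image generation with a latent consistency model.
# Checkpoint, prompt, and settings below are illustrative, not our pipeline.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",  # a publicly available LCM checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# LCMs trade denoising steps for speed: 4 steps here vs. the usual 20-50.
image = pipe(
    prompt="a lighthouse at dusk, cinematic lighting",
    num_inference_steps=4,
    guidance_scale=8.0,  # LCMs use a distilled guidance mechanism; ~8 is common
).images[0]
image.save("lcm_sample.png")
```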

We then pivoted toward designing an experience in between Midjourney and ComfyUI. Midjourney generates images that are good by default, but offers little customizability. ComfyUI provides the most customization options, but it has a steep learning curve and it’s very easy to output bad-quality images. We wanted to create a UX in the middle, with guardrails to prevent bad images but enough customization that advanced users won’t “graduate” themselves to ComfyUI.

However, we found that this product already exists. There’s an open source project called Fooocus which accomplishes exactly what we set out to do. It provides good default settings but still exposes a ton of implementation details that users can modify for their specific use case. The prompting experience and the backend logic are really well designed. The two downsides of Fooocus are that you need to run it on your own machine and that the Gradio frontend is kinda ugly. Maybe there’s an opportunity for us to solve those via Lightjourney, but we’ve paused working on the idea for now.

We then decided to productionize the image blending demo I tweeted a while back. We switched the captioning model to BLIP and made a few more UX optimizations so it feels as smooth as possible. Please try it out at ligma.art!
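
If you’re curious what the BLIP captioning step looks like, here’s a rough sketch via Hugging Face transformers; the checkpoint and input path are illustrative, not our exact setup:

```python
# Minimal sketch: captioning an image with BLIP via Hugging Face transformers.
# The checkpoint and input path are illustrative, not our production config.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("input.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Generate a short caption; the text can then feed a downstream blending prompt.
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```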

--------

Bookmarks: