
Kaizen via AI - Part I
A couple of weeks ago I caught up with Mark Ghiasy for lunch and, as is wont to happen when a couple of nerds catch up for lunch these days, the conversation rolled around to AI.
- Mark: “So, are you using it much?”
- Me: “Yeah, I dabble. Bits & pieces here and there”
- Mark: “How are you using it to improve your processes or productivity?”
- Me: “I mean… I… It’s often that I remember too late. Like ‘ah! I could have delegated or augmented that task with an AI tool!’”
- Mark: “So when have you scheduled yourself to change that behaviour?”
- Me: “uuhhhh…”
- Mark: “Come on man, you know how this works. If you actually want to change the behaviour you need to be intentional and committed to making it happen. When are you doing this? Literally, what time every day?”
- Me: ”…”
- Mark: 🤨
- Me: “Ok ok! I’m putting it in my calendar right now to review everything I could have or should have done differently and make a plan for the next day”
And so for the past couple of weeks I’ve been making a serious effort to review my daily activities and look for areas where I could have improved the way I did things by incorporating one or more AI tools. And “improved” is maybe setting too high a bar there? Really it’s about forcing myself to have more exposure every single day, to make sure I’m keeping across developments and building a clear understanding of the areas where it’s useful and the areas where that’s not realistic. The reality has been that, outside of whenever I’m on a call or in a meeting, I’d estimate about 90% of my time has been spent in some heavily AI-driven or augmented workflow. Even when I am on a call, the vast majority of those are recorded, transcribed, and summarised by an AI tool.
My plan is to publicly document my journey here. Partly to keep myself honest and accountable, but also to share the process so others can either replicate it or just take a shortcut by copying the things that are working for me.
So without further ado, here’s part 1 in this series on how I’m making constant incremental improvements through the use of AI.
Desktop, Mobile, and Local LLMs
I’m a very happy (paying) user of Kagi for search and had already largely dropped Google. I have also found myself leaning more on ChatGPT for answers to certain questions like “How do I do X in Screenflow on Mac?”. I don’t always know the correct name for the feature I’m looking for, which can make finding the answer in the official docs difficult. There are often a dozen or more videos explaining how to do what I want, but that can mean scrubbing through 15 minutes of video just to find the name or location of the one menu item I’m after. I don’t want a whole tutorial, I just can’t find the thing! These combined search + summary answers for a specific, well-defined context have been great. I’ve also observed other people lean heavily on ChatGPT and Claude as their primary interface for answering questions (i.e., not using a search engine at all). It’s definitely felt like I’m witnessing a generational divide happen in real time as the kids these days shun their grandparents’ search engine approach to finding answers.
To try and help me push past this I decided I would make accessing LLMs easier and more prominent on my systems. So I installed:
- ChatGPT Desktop & Mobile
- Claude Desktop & Mobile
- Ollama (and the `gemma3:4b` model)
The latter because I felt like being able to run a model locally would become a useful thing, especially in situations where I might have data (e.g., my own personal or financial data) that I’m not comfortable sharing with a hosted and shared service. So far I’ve not done more than play with it, but I think those use cases may still present themselves in the future.
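For anyone curious what “playing with it” looks like in practice, here’s a minimal sketch (in TypeScript) of sending a prompt to the locally running Ollama server. It assumes Ollama is listening on its default port and that the model has already been pulled with `ollama pull gemma3:4b`; the `askLocalModel` helper is just an illustrative name, not part of any real setup.

```typescript
// Minimal sketch: prompt a locally running Ollama instance.
// Assumes Ollama is serving on its default port (11434) and that
// `ollama pull gemma3:4b` has already been run.
async function askLocalModel(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gemma3:4b",
      prompt,
      stream: false, // return one JSON object instead of a token stream
    }),
  });
  const data = await res.json();
  return data.response;
}

// Example: summarise a note that never has to leave my machine.
askLocalModel("Summarise this note about my personal finances: ...").then(console.log);
```

The appeal is that nothing in that request ever leaves the machine, which is exactly the property I want for the personal and financial data scenarios above.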
My hope with the mobile apps is that if I’m out walking and inspiration strikes I can use them to dictate my barely-filtered thoughts and then summarise or apply some structure to them later. That’s another scenario that is aspirational at this point and not something that’s actually been added into my routine.
What has become wedged into my routine is using the desktop apps! Looking at my chat history I’m clearly giving preference to Claude here, though I don’t have a rationale or data-driven explanation for why that’s the case. The answers from both, for the types of questions I’m asking, seem fine and not as though one is objectively better than the other. What has been interesting to observe is that I’m not using either of them at all like a replacement for search. The chat history shows a list of seemingly random and scattered ideas across a vast range of areas. They’re functioning almost like a dumping ground for me to put whatever distraction pops into my head that I’d love to come back to but probably don’t need to spend cycles thinking about right now. Like an idea I had for a hardware project that would involve a Raspberry Pi, an NFC reader, and some custom 3D printing. I was able to ask “What are the ways I could design and build a system that (detail the requirements/features/etc). Include in the options how and where this process could be outsourced in addition to how I would build it myself”. It’s immediately out of my head, and while Claude was generating a page of options for me I could get back to work. Later that night I came back to read what it had produced and it seemed like a pretty good outline and actually suggested a feasible way for me to get what I want!
Another use case that’s becoming increasingly common has been to ask for help on using these AI tools. Sometimes that’s because a blank canvas is overwhelming and it’s not obvious how or where to start (Claude again provided some great tips when I asked how I should use the mobile app to be most productive). The recurring example is when I’m just feeling mentally fried and need a little help with my own thinking. “What questions should I ask about X?”, “I’m writing a blog post about Y and need an image for the top of the page. What are some prompts I can use to generate an interesting image?”. There’s been something valuable about continuity across days with this approach too. It’s no surprise that this brain fade is most likely to happen at the end of a long day. A tip I was given a long time ago when writing is to leave your last sentence of the day unfinished, as you’ll find it much easier the next day to pick back up where you left off. In a similar way, going back to my chat history first thing in the morning and acting on the last result has made it feel easy to get straight back into things each day.
Vibe Coding
I used Copilot in the beta and it felt like magic. I keep hearing about how amazing Cursor and Windsurf are. Even though I’m not writing code day-to-day, I figured I owed it to myself to try the latest tools and see how far things had come. Maybe it’s just a frequency illusion because now I’m paying attention, but it also seems like “Vibe Coding” has hit almost meme levels of commentary over the past 3 weeks, so my timing here definitely feels exciting and like fun things are happening in the general vicinity.
🤩 And wow, this stuff is amazing! 🤯
I was seriously blown away by how good the output was from my first attempt at using the AI coding agent in Cursor. I am not a Swift developer, and at no point in my career have I been. I gave Cursor a brief on a desktop app I wanted to help automate some personal tasks, and ended the brief with “Ask me any additional questions you need answers to before implementing anything”. It asked for a whole bunch of extra information that was obviously valuable to know when designing a desktop app but not something I’d considered, given it’s not my area of expertise. I answered those questions and within a minute I had a fully working app! Sure, it had a couple of rough edges, and we iterated and fixed some of them simply by saying things like “Change the icon of X to be something more visually appropriate for the content”. For an app for personal use, and something that took literally minutes to create, it was incredible.
Next up was a website to serve as an asset for some product discovery and validation. This was another mind-blowing experience. I had specific opinions on what I wanted here, so I pointed Cursor at the framework and component libraries I wanted to use, gave it the brief, instructed it to ask me questions, answered them, and again within minutes I had a result that I think may have taken me a day or more to roll myself if I’d gone that path. It wasn’t perfect, but even some of the imperfections were quickly addressed through prompting. At one point I gave it an instruction like “The way the sidebar navigation is collapsed and expanded looks gross. Come up with a better UX that maintains the functionality I requested” and it did! A really nice UX! One that is better than I would have done myself (I might have opinions on what I like, but I’m not great at the design parts!).
The whole experience made me think this is such a huge game changer for Product Managers & Designers. No more click-through wireframes. No more having to find non-existent slack in a schedule to get an engineer or two to help spike out a really speculative prototype. It’s now totally feasible to get real people to use something that looks and feels real. The quality of the feedback and learning we can get from that is incredible. The conviction we can build before committing an engineering team to building a new feature will be amazing. There’s just sssoooo much iteration and learning that can happen on really quick cycles with prototypes we can happily throw away because they took us minutes to build.
The other use I’ve had over the past week or so was getting my blog back to a publishable state. It turns out it’s been a couple of years since my last post (where did that time go?!) and a series of dependency issues meant I couldn’t actually build my site any more. Upgrading to the latest version of Gatsby was proving to be difficult given some of those dependencies no longer worked, and overall it felt like everything about my tooling and pipeline was overkill for what should be a static site. I tasked Cursor with taking the markdown content from my current site and making it work with Astro instead of Gatsby. And… it did?! You’re reading the result right now! It was pretty magical. It created a new working directory, pulled in the dependencies, wrote a migration script for my markdown and MDX files, got me to run it and then start the server, automatically detected the error the server hit parsing some of the migrated content, re-wrote the script and had me re-run it, and repeated that loop a couple more times until the server started successfully. I came in afterwards, updated some of the styling and component choices, pushed my changes, and here we are. Again, probably just a few minutes of waiting and pushing the occasional “run” button and it did everything else automatically.
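For the curious, the migration script Cursor wrote was doing roughly what the sketch below does. To be clear, this is a reconstruction of the idea rather than the actual code it produced: the paths, the `gray-matter` dependency, and the frontmatter field names (Gatsby’s `date` becoming Astro’s `pubDate`) are all illustrative assumptions.

```typescript
// Rough sketch of a Gatsby-to-Astro content migration (illustrative only).
// Walks the old content directory, rewrites each post's frontmatter to match
// an Astro content-collection schema, and writes the result into src/content.
import fs from "node:fs";
import path from "node:path";
import matter from "gray-matter"; // assumed dependency for frontmatter parsing

const SRC = "old-gatsby-site/content/blog"; // hypothetical paths
const DEST = "src/content/blog";

fs.mkdirSync(DEST, { recursive: true });

for (const file of fs.readdirSync(SRC)) {
  if (!file.endsWith(".md") && !file.endsWith(".mdx")) continue;

  const raw = fs.readFileSync(path.join(SRC, file), "utf8");
  const { data, content } = matter(raw);

  // Example transformation: the old Gatsby frontmatter used `date`,
  // while the Astro blog schema expects `pubDate`.
  const frontmatter = {
    title: data.title,
    description: data.description ?? "",
    pubDate: data.date,
  };

  fs.writeFileSync(path.join(DEST, file), matter.stringify(content, frontmatter));
}
```

The real value wasn’t the script itself, it was that Cursor wrote it, reacted to the errors the dev server reported, and kept revising it until the site built.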
It’s not all unicorns and rainbows though!
The deeper I got into my projects, the more that first-use experience seemed to fade into a distant memory. In typical AI fashion, I’d find Cursor (or more specifically, I guess, the Claude Sonnet 3.7 LLM it was using) would be confidently wrong if and when things didn’t work as expected. It seemed to get confused about which version of the docs it should use for the frameworks involved (Svelte and Tailwind CSS in this case) and would regularly implement features the way earlier versions worked. When presented with the error it would then often take drastic actions like pinning everything to earlier versions and reimplementing basically the entire site (and it still wouldn’t work!). I’ve quickly developed a muscle of reverting any fixes if they don’t work the first time and just going through the debug and resolution process manually.
Similar frustrations emerged when I wanted to replicate existing functionality but in a new way. “I want to allow users to create a new Company. Add a new company page that captures the required details, and base the design and form input on the existing New Person page”. A minute later and I have the New Company page… I also have a whole new library for managing form inputs, a page that looks nothing like anything else on the site, and, inexplicably, the main site navigation has been reimplemented from a sidebar on the left to running across the top of the page with half the items removed. The next habits to develop are to be trigger happy in committing things to version control so that it’s easy to roll back, to read every single suggested diff very carefully, and to not bother with AI at all if the basic functionality of what I want already exists in a similar form. It’s easier to just copy the “New Person” page and make the required field changes than it is to try and prompt engineer my way to the desired end state.
My observation on the state of “Vibe Coding” right now is that it is truly at its best when the AI coding agent is able to take an incredible amount of license in terms of implementation and has very few constraints on what needs to be done and how. As things grow, though, there is less room for that creative freedom. Nobody wants every page of a site to be an eclectic mess of different design choices (well, except for AWS 😝). Internal consistency, both in terms of coding implementation choices and end user experience, becomes more important than simply having it “done”. It felt like a wrestling match to get things to play nice as the constraints grew. The need to constantly instruct it to not do more than I asked (don’t change the navigation! I only want to change the animation on a form button) became tiresome.
All that said, it’s still an amazing improvement in productivity and one I’m confident is now a permanent part of my workflow. It’s also incredible to think that this is the state of things now: this glimpse of capabilities is the worst it will ever be, and things will only continue to improve. Whatever frustrations I have now will continue to be solved and disappear over time.
Image Generation
Image generation is probably one of the most ubiquitous AI use cases at this point. It’s certainly difficult to escape it, and this post is no exception. I’ve been happily using DiffusionBee on my Mac for a couple of years now. I’ve played with Midjourney, and being able to observe the prompts other people use is educational, but I just don’t love the whole UX of sitting in Discord to produce images. I prefer being able to run the process locally, switch out models as required, tweak the parameters depending on the speed/variations/quality I need to move things forward, etc.
As I called out earlier, what has changed over the past week is the way I find inspiration for the prompts. I’ve been leaning on Claude heavily to either generate or expand on my own ideas. Then I’ve been feeding multiple prompts into a queue in DiffusionBee and letting it run in the background while I finish writing things up. It produces a dozen or so variations for each prompt, and then I pick one to upscale and add to my social posts, blog post, whatever.
Similar to the observation with coding agents, this stuff really shines when there’s a huge amount of scope for letting the model be creative and when there isn’t much of an objective definition of “correct”. Where things struggle is when you can look at a picture and quickly spot things that are wrong. Hands and fingers on people are still often a challenge (some models being better than others), and I’m also not using it to generate anything with writing on it. Sometimes the system will take it upon itself to smatter writing over things, and it’s always nonsensical. Occasionally that’s fine because it’s an irrelevant enough detail to be overlooked. A lot of the time, though, it turns out to be an unwelcome distraction and the image isn’t usable.
What’s next?
So what’s next? I’ve no idea! I’ve installed a time tracking app to help me see how and where I’m spending my time. From there I’m hopeful I’ll spot some more low-hanging fruit where a slight change in my workflow rapidly increases either the speed or quality of what I can achieve.
Though I’m not sure how much more there is to optimise. I feel like there’s a class of work where there’s value in the process of doing the actual work. And others where I need to be certain the outputs are correct and up to my standard, but verifying that takes the same amount of time as doing the actual work.
I guess we’ll see what happens. 🤖
Published: 09/04/2025
I've spent most of my career working with or at startups. I'm currently the VP of Product / GTM @ Ockam where I'm helping developers build applications and systems that are secure-by-design. It's time we started securely connecting apps, not networks.
Previously I led the Terraform product team @ HashiCorp, where we launched Terraform 1.0, Terraform Cloud, and a whole host of amazing capabilities that set the stage for a successful IPO. Prior to that I was part of the Startup Team @ AWS, and earlier still an early employee @ Heroku. I've also invested in a couple of dozen early stage startups.