TL;DR:
- AI coding tools and the LLMs that drive them can be powerful, but they’re not security-aware by default.
- Outdated packages and insecure practices can creep in. Don’t assume LLMs "just know."
- Clear, specific prompts matter more than you think.
- Don’t skip automated linters, SAST/DAST tools, or dependency checkers. AI doesn’t replace them.
- Code review still matters. A lot.
Throughout my career, I’ve brought a healthy skepticism to the hype that comes with each new wave of technology, and AI code generation is no different. Even now, working at a company building tools to improve security and productivity for AI-assisted developers, I’ve had my doubts. Can LLMs actually produce high-quality, secure code? Can they be trusted with real-world applications, especially for people like me who aren't full-time developers?
That skepticism shapes how I've approached my exploration of these tools. I’m not just curious about what they can do out of the box; I want to understand how to use them effectively. Where do AI coding assistants and LLMs genuinely help? Where do they fall short? And how can we guide them to get better results?
It quickly became clear that AI can help write code but doesn’t take on responsibility for it. That still falls on us, especially when it comes to keeping things secure. LLMs don’t “just get it.” You're still responsible for ensuring the code you deploy is secure, even if an AI writes it. "Vibe coding" with an AI won’t save you when it generates an insecure login page or uses deprecated packages.
A real-world experiment
For context: I have a computer science degree and spent time as a developer early in my career. I quickly pivoted into infrastructure, where I focused heavily on automation, and now work in technical marketing with a recent focus on AI-assisted coding. I’ve kept my foundational understanding of app structure and security risks, but I relied on AI tools here both to teach myself how to work with them and to fill in the gaps in my Python knowledge.
I recently dusted off a very old PHP-based web app. It’s a small tool for checking in attendees at community meetups, printing name badges, and picking winners for door prizes. Not mission-critical by any means, but I was curious how an LLM like Claude could help modernize it.
So I used Cline and asked Claude 3.7 Sonnet to rebuild it in Python/Django. A few prompts and iterations later, I had a working app. Victory, right?
Not quite.
The problems
- Old versions: Claude defaulted to Django 4, even though Django 5 was released well before its knowledge cutoff. Claude probably made this choice because most public examples during its training were still based on Django 4. LLMs generate code based on what they’ve seen most often, not necessarily what’s most current. I had to explicitly ask for Django 5 before it even considered using it.
- Outdated dependencies: Most of the libraries it picked were outdated (by years in some cases) even within the model’s supposed knowledge window.
- Password security fail: For the admin login, Claude implemented MD5 hashing for password storage. In 2025. Yikes. (A sketch of the fix follows below.)
- OWASP Top 10? Nope: When I asked it to review its own code against the OWASP Top 10, it found glaring issues: insecure cookies, XSS risks, poor session handling.
These weren’t edge cases or niche scenarios. They were textbook mistakes.
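For the curious, here’s roughly what fixing the password and cookie issues looks like. This is a minimal sketch rather than the code from my project: the function names are mine, and it assumes it lives inside a configured Django project. The real lesson is to lean on what the framework already provides instead of letting the model reinvent it.

```python
# The anti-pattern the model produced: a fast, unsalted hash that is
# trivial to crack. Never store passwords this way.
import hashlib

def insecure_hash(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()  # MD5 is broken for passwords

# What Django provides out of the box: salted, iterated PBKDF2 by default.
from django.contrib.auth.hashers import make_password, check_password

def store_password(raw_password: str) -> str:
    return make_password(raw_password)           # e.g. "pbkdf2_sha256$..."

def verify_password(raw_password: str, stored: str) -> bool:
    return check_password(raw_password, stored)

# settings.py hardening for the cookie/session issues the OWASP review flagged
SESSION_COOKIE_SECURE = True    # only send the session cookie over HTTPS
SESSION_COOKIE_HTTPONLY = True  # keep it out of reach of JavaScript
CSRF_COOKIE_SECURE = True       # same treatment for the CSRF cookie
```

None of this is exotic; it’s the default, documented path that the model skipped in favor of what it had seen most often.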
What I took away
This experience clarified a few things for me:
- Prompt engineering really matters. If I had started by specifying which versions to use, which tools to integrate, and what security standards to follow, I would have gotten better results. Instead, I gave a casual, open-ended prompt and got casual, open-ended code in return.
- Security tooling is essential, arguably even more so in AI-assisted workflows. Using an LLM to generate code doesn't reduce the need for linters, scanners, or security checks; it makes them more critical. Tools like `pylint` (linting), `bandit` (security-focused static analysis for Python), and `trivy` (dependency and vulnerability scanning) help here; basic invocations are shown after this list. Guidance from resources like the OWASP Top 10 rounds out the picture. These aren’t optional; they're lifelines.
- Code review is still critical. Just because it compiles and works doesn’t mean it’s good. Experience helps you see what automated tools (including LLMs) don’t.
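If you want the quick version of what that tooling looks like in practice, the basic invocations are below. Treat this as a starting point: `pylint` and `bandit` install into your virtualenv, `trivy` is installed separately, and `myproject/` is a placeholder for your own package.

```bash
pip install pylint bandit   # linting and Python security static analysis
pylint myproject/           # code quality and common bug patterns
bandit -r myproject/        # recursively scan Python source for insecure patterns
trivy fs .                  # scan the project tree for vulnerable dependencies
```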
A checklist for safer AI coding
If you’re using AI to help write or refactor apps:
- Be specific: Name your preferred versions, frameworks, and security practices in the prompt (see the example prompt after this list).
- Break it into smaller pieces: For migration projects, split the codebase into manageable units (DB, business logic, API, etc.) and prompt the AI layer-by-layer. This reduces context overload and makes review easier.
- Automate review: Use linters, SAST/DAST tools, and SBOM scanners in your CI/CD pipeline. Don’t ship without them.
- Don’t skip human review: Nothing replaces experienced eyes on code.
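To make the “be specific” point concrete, here’s the kind of prompt I wish I had started with. The versions and requirements are illustrative; substitute your own stack.

```text
Rebuild this PHP app in Python using Django 5.x and the latest stable
releases of all dependencies, pinned in requirements.txt. Use Django's
built-in auth for the admin login (no custom password hashing). Follow
the OWASP Top 10: secure session and CSRF cookie settings, template
auto-escaping, and parameterized queries only.
```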
Final thought
AI-assisted development is powerful, but it’s not magic. It still requires thoughtfulness, review, and good security hygiene. LLMs can save time writing and reviewing code, but those savings can quickly disappear if you’re cleaning up after a security incident. A little extra effort up front is still the best defense against costly surprises later.
If you've tried AI coding tools, what surprises or pitfalls have you run into? Let me know in the comments.