How CodeRabbit AI Pull Request Reviews Caught Critical Security Vulnerabilities in AI-Generated Code

Version 2 of this article. I checked the article into source control and CodeRabbit found yet another bug in it, so it's been updated. If you're interested in the diff, ask me.

This article was produced the same way I have AugmentCode write features: in a branch that I opened as a PR on GitHub, where CodeRabbit reviews the work to find security, correctness, and optimization issues.

A case study on the importance of AI-powered code review in catching security flaws that slip through initial development

Introduction

In the rapidly evolving world of AI-assisted development, tools like Augment Code are revolutionizing how we write software. However, as this case study demonstrates, even sophisticated AI code generation tools can produce code with serious security vulnerabilities. This is where AI-powered code review tools like CodeRabbit become invaluable, serving as a critical safety net in the development process.

The Project: Redacted SSH Key Management

The Redacted project is a Flutter application designed for redacted with SSH connectivity and cloud server management. The application includes sophisticated SSH key management functionality, which handles sensitive cryptographic operations and secure data storage.

Initially, much of the security-related code was generated using Augment Code, an advanced AI coding assistant. While the generated code was functionally correct and followed good architectural patterns, it contained several critical security vulnerabilities that could have compromised user data and system security.

Critical Security Issues Discovered by CodeRabbit

1. Insecure XOR Cipher Implementation

The Problem:

// Original AI-generated code
final dataBytes = utf8.encode(data);
final keyBytes = utf8.encode(key);
final encrypted = <int>[];

for (int i = 0; i < dataBytes.length; i++) {
  encrypted.add(dataBytes[i] ^ keyBytes[i % keyBytes.length] ^ salt[i % salt.length]);
}

CodeRabbit’s Analysis: CodeRabbit immediately flagged this as a critical security vulnerability, noting that XOR ciphers are cryptographically weak and easily breakable. The AI reviewer pointed out that this implementation:

  • Uses a simple XOR operation that can be easily reversed
  • Lacks proper authentication
  • Doesn’t provide forward secrecy
  • Is vulnerable to known-plaintext attacks (see the sketch below)
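
To make the known-plaintext point concrete, here's a small standalone Dart sketch (not code from the project) showing how a single known message is enough to recover the repeating keystream and read anything else encrypted with it:

import 'dart:convert';

// Toy demonstration (not project code): a repeating-key XOR leaks its
// keystream to anyone who knows one plaintext/ciphertext pair.
List<int> xorEncrypt(List<int> data, List<int> keystream) =>
    [for (int i = 0; i < data.length; i++) data[i] ^ keystream[i % keystream.length]];

void main() {
  final keystream = utf8.encode('secret-key'); // stands in for key ^ salt
  final known = utf8.encode('known message!');
  final knownCipher = xorEncrypt(known, keystream);

  // XOR the known plaintext with its ciphertext to recover the keystream bytes...
  final recovered = [for (int i = 0; i < known.length; i++) known[i] ^ knownCipher[i]];

  // ...then decrypt a completely different message with them
  // (works for any message no longer than the known one).
  final otherCipher = xorEncrypt(utf8.encode('another secret'), keystream);
  print(utf8.decode(xorEncrypt(otherCipher, recovered))); // prints: another secret
}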

The Fix:

import 'dart:convert';
import 'dart:math';
import 'dart:typed_data';
import 'package:cryptography/cryptography.dart';

// Secure AES-GCM implementation with proper KDF
Future<Map<String, dynamic>> encryptData(String data, String password) async {
  // Generate cryptographically secure salt and nonce
  final salt = Uint8List.fromList(List.generate(16, (_) => Random.secure().nextInt(256)));
  final nonce = Uint8List.fromList(List.generate(12, (_) => Random.secure().nextInt(256)));

  // Derive key using PBKDF2 with secure parameters
  final pbkdf2 = Pbkdf2(
    macAlgorithm: Hmac.sha256(),
    iterations: 100000, // OWASP recommended minimum
    bits: 256, // 32 bytes for AES-256
  );

  final secretKey = await pbkdf2.deriveKey(
    secretKeyData: SecretKeyData(utf8.encode(password)),
    nonce: salt,
  );

  // Encrypt using AES-GCM (authenticated encryption)
  final algorithm = AesGcm.with256bits();
  final secretBox = await algorithm.encrypt(
    utf8.encode(data),
    secretKey: secretKey,
    nonce: nonce,
  );

  // Return encrypted data with salt and nonce for storage
  return {
    'ciphertext': base64Encode(secretBox.cipherText),
    'salt': base64Encode(salt),
    'nonce': base64Encode(nonce),
    'mac': base64Encode(secretBox.mac.bytes),
  };
}

Security Note: This implementation uses PBKDF2 with a cryptographically secure salt to derive the encryption key from the password, preventing rainbow table attacks. The AES-GCM mode provides authenticated encryption (AEAD), ensuring both confidentiality and integrity. The salt and nonce are stored alongside the ciphertext and must be preserved for decryption. Never truncate passwords directly for key derivation as this weakens security significantly.
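
For completeness, here's what the matching decryption side would roughly look like (a sketch, not code from the PR), assuming the same cryptography package calls and the map layout returned by encryptData above:

import 'dart:convert';
import 'package:cryptography/cryptography.dart';

// Sketch of the decrypt counterpart to encryptData above (assumed, not from
// the PR): re-derive the key from the stored salt using the same PBKDF2
// parameters, then let AES-GCM verify the MAC while decrypting. Decryption
// throws if the ciphertext or MAC has been tampered with.
Future<String> decryptData(Map<String, dynamic> stored, String password) async {
  final salt = base64Decode(stored['salt'] as String);
  final nonce = base64Decode(stored['nonce'] as String);

  final pbkdf2 = Pbkdf2(
    macAlgorithm: Hmac.sha256(),
    iterations: 100000, // must match the value used during encryption
    bits: 256,
  );
  final secretKey = await pbkdf2.deriveKey(
    secretKeyData: SecretKeyData(utf8.encode(password)),
    nonce: salt,
  );

  final secretBox = SecretBox(
    base64Decode(stored['ciphertext'] as String),
    nonce: nonce,
    mac: Mac(base64Decode(stored['mac'] as String)),
  );

  final algorithm = AesGcm.with256bits();
  final clearBytes = await algorithm.decrypt(secretBox, secretKey: secretKey);
  return utf8.decode(clearBytes);
}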

2. Insecure Private Key Storage

The Problem:

// Original AI-generated code
final keyData = {
  'n': privateKey.n.toString(),
  'e': privateKey.exponent.toString(),
  'd': privateKey.privateExponent.toString(),
  'p': privateKey.p.toString(),
  'q': privateKey.q.toString(),
};

final keyJson = jsonEncode(keyData);
final keyBase64 = base64Encode(utf8.encode(keyJson));

CodeRabbit’s Analysis: The AI reviewer identified this as a severe security vulnerability, explaining that:

  • Private key components were stored as plain JSON
  • No proper ASN.1 encoding was used
  • The format was easily readable and extractable
  • It violated cryptographic standards (PKCS#1/PKCS#8)

The Fix:

// Proper PKCS#1 ASN.1 encoding
final sequence = ASN1Sequence();
sequence.add(ASN1Integer(BigInt.zero)); // Version
sequence.add(ASN1Integer(privateKey.n!)); // modulus
sequence.add(ASN1Integer(privateKey.exponent!)); // publicExponent
sequence.add(ASN1Integer(privateKey.privateExponent!)); // privateExponent
// ... additional PKCS#1 components
final derBytes = sequence.encodedBytes;
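
The DER bytes still need to be wrapped in PEM armor before being written out. Here's a minimal sketch of that step (assuming the derBytes variable above); PEM is just base64 split into 64-character lines between BEGIN/END markers:

import 'dart:convert';
import 'dart:math';

// Sketch (not from the PR): wrap DER-encoded PKCS#1 bytes in PEM armor.
String toPem(List<int> derBytes) {
  final b64 = base64Encode(derBytes);
  final lines = <String>[];
  for (var i = 0; i < b64.length; i += 64) {
    lines.add(b64.substring(i, min(i + 64, b64.length)));
  }
  return '-----BEGIN RSA PRIVATE KEY-----\n'
      '${lines.join('\n')}\n'
      '-----END RSA PRIVATE KEY-----\n';
}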

3. Invalid SSH Public Key Format

The Problem:

// Original AI-generated code
final keyData = {
  'n': publicKey.n.toString(),
  'e': publicKey.exponent.toString(),
};

final keyJson = jsonEncode(keyData);
final keyBase64 = base64Encode(utf8.encode(keyJson));
return 'ssh-rsa $keyBase64 $comment';

CodeRabbit’s Analysis: CodeRabbit caught that this implementation:

  • Generated invalid SSH keys that wouldn’t work with standard SSH tools
  • Used JSON encoding instead of SSH wire format (RFC 4253)
  • Lacked proper MPINT encoding for RSA components
  • Would be rejected by SSH servers and clients

The Fix:

// Proper SSH wire-format encoding
final buffer = <int>[];
final algorithmBytes = utf8.encode('ssh-rsa');
_writeSSHString(buffer, algorithmBytes);

final exponentBytes = _bigIntToBytes(publicKey.exponent!);
_writeSSHMpint(buffer, exponentBytes);

final modulusBytes = _bigIntToBytes(publicKey.n!);
_writeSSHMpint(buffer, modulusBytes);

final keyBase64 = base64Encode(buffer);
return 'ssh-rsa $keyBase64 $comment';
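
The _writeSSHString, _writeSSHMpint, and _bigIntToBytes helpers aren't shown in the diff, so here's roughly what they have to do (my sketch, not the project's exact implementation). Per RFC 4251, a string is a 4-byte big-endian length followed by the bytes, and an mpint additionally needs a leading zero byte when the high bit of the magnitude is set so the value isn't read as negative:

// Sketches of the helpers referenced above (not the project's exact code).

void _writeUint32(List<int> buffer, int value) {
  buffer.addAll([
    (value >> 24) & 0xff,
    (value >> 16) & 0xff,
    (value >> 8) & 0xff,
    value & 0xff,
  ]);
}

void _writeSSHString(List<int> buffer, List<int> bytes) {
  _writeUint32(buffer, bytes.length);
  buffer.addAll(bytes);
}

void _writeSSHMpint(List<int> buffer, List<int> magnitude) {
  // Prepend 0x00 when the most significant bit is set, keeping the mpint positive.
  final padded =
      magnitude.isNotEmpty && (magnitude.first & 0x80) != 0 ? [0, ...magnitude] : magnitude;
  _writeUint32(buffer, padded.length);
  buffer.addAll(padded);
}

// Big-endian magnitude bytes of a non-negative BigInt.
List<int> _bigIntToBytes(BigInt value) {
  var v = value;
  final out = <int>[];
  while (v > BigInt.zero) {
    out.insert(0, (v & BigInt.from(0xff)).toInt());
    v = v >> 8;
  }
  return out.isEmpty ? [0] : out;
}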

4. Insufficient Key Validation

The Problem:

// Original AI-generated code
if (!privateKey.contains('BEGIN') || !privateKey.contains('END')) {
  return Result.failure(SSHException.invalidKey());
}

if (!publicKey.startsWith('ssh-rsa') && !publicKey.startsWith('ssh-ed25519')) {
  return Result.failure(SSHException.invalidKey());
}

CodeRabbit’s Analysis: The AI reviewer noted that this validation was superficial and missed:

  • Actual PEM structure validation
  • Base64 content verification
  • Key strength requirements (minimum key sizes)
  • SSH wire format validation
  • Key correspondence verification

The Fix:

// Comprehensive validation with proper parsing
final validationResult = _validatePrivateKeyPEM(privateKey);
if (validationResult.isFailure) return validationResult;

final publicKeyValidation = _validatePublicKeySSH(publicKey);
if (publicKeyValidation.isFailure) return publicKeyValidation;

final strengthValidation = _validateKeyStrength(privateKey, publicKey);
if (strengthValidation.isFailure) return strengthValidation;
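
The validation helpers themselves aren't shown in the PR, but to give a flavor of the difference, here's a rough sketch of what _validatePublicKeySSH might check, assuming the project's Result and SSHException types (and a Result.success constructor). The key improvement over the original check is that it actually decodes the base64 blob and confirms the embedded algorithm name matches the text prefix, rather than only looking at the start of the string:

import 'dart:convert';

// Rough sketch only; assumes the project's Result/SSHException types.
Result<void> _validatePublicKeySSH(String publicKey) {
  final parts = publicKey.trim().split(RegExp(r'\s+'));
  if (parts.length < 2) return Result.failure(SSHException.invalidKey());

  const allowed = ['ssh-rsa', 'ssh-ed25519'];
  if (!allowed.contains(parts[0])) return Result.failure(SSHException.invalidKey());

  try {
    final blob = base64Decode(parts[1]);
    if (blob.length < 4) return Result.failure(SSHException.invalidKey());

    // The blob begins with a length-prefixed algorithm name (RFC 4253);
    // it must match the plaintext prefix.
    final nameLen = (blob[0] << 24) | (blob[1] << 16) | (blob[2] << 8) | blob[3];
    if (blob.length < 4 + nameLen) return Result.failure(SSHException.invalidKey());
    final embeddedName = utf8.decode(blob.sublist(4, 4 + nameLen));
    if (embeddedName != parts[0]) return Result.failure(SSHException.invalidKey());

    return Result.success(null);
  } on FormatException {
    return Result.failure(SSHException.invalidKey());
  }
}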

The Impact of CodeRabbit’s Review

Security Improvements Achieved

  1. Encryption Security: Moved from an easily breakable XOR cipher to authenticated AES-256-GCM
  2. Key Storage: Implemented industry-standard PKCS#1 encoding
  3. SSH Compatibility: Generated RFC 4253-compliant SSH keys
  4. Validation Robustness: Added comprehensive security checks

Metrics of Improvement

  • Vulnerability Count: Reduced from 4 critical security flaws to 0
  • Cryptographic Strength: Improved from trivially breakable to industry-standard
  • Standards Compliance: Achieved full compliance with SSH and cryptographic standards
  • Test Coverage: Added 23 comprehensive test cases covering all security fixes

Lessons Learned

1. AI Code Generation Limitations

While AI tools like Augment Code excel at:

  • Generating functionally correct code
  • Following architectural patterns
  • Implementing complex business logic
  • Maintaining code consistency

They can struggle with:

  • Cryptographic security best practices
  • Industry-specific standards compliance
  • Subtle security vulnerabilities
  • Context-aware security decisions

2. The Critical Role of AI Code Review

CodeRabbit’s AI-powered review proved invaluable by:

  • Catching What Humans Miss: Identifying subtle cryptographic flaws
  • Providing Context: Explaining why each issue was a security risk
  • Suggesting Solutions: Offering specific remediation strategies
  • Ensuring Standards Compliance: Verifying adherence to security standards

3. The Importance of Layered AI Tools

This case demonstrates the power of using multiple AI tools in the development pipeline:

  • Generation Phase: Augment Code for rapid development
  • Review Phase: CodeRabbit for security and quality assurance
  • Testing Phase: AI-assisted test generation for comprehensive coverage

Best Practices for AI-Assisted Secure Development

1. Never Skip Code Review for Security-Critical Code

Even AI-generated code should undergo thorough review, especially for:

  • Cryptographic operations
  • Authentication mechanisms
  • Data storage and transmission
  • Input validation and sanitization

2. Use Specialized AI Tools for Security

Consider using AI tools specifically trained on security patterns:

  • CodeRabbit for comprehensive code review
  • Security-focused static analysis tools
  • AI-powered penetration testing tools

3. Implement Comprehensive Testing

Always include:

  • Unit tests for cryptographic functions
  • Integration tests for security workflows
  • Penetration testing for real-world scenarios
  • Compliance testing against industry standards

4. Stay Updated on Security Standards

Ensure your AI tools and development practices stay current with:

  • Latest cryptographic standards
  • Industry best practices
  • Emerging threat vectors
  • Regulatory requirements

Conclusion

This case study highlights both the promise and the limitations of AI-assisted development. While tools like Augment Code can dramatically accelerate development and generate sophisticated code, they are not infallible when it comes to security.

CodeRabbit’s AI-powered code review proved to be an essential safety net, catching critical security vulnerabilities that could have had serious real-world consequences. The combination of AI code generation followed by AI code review creates a powerful development pipeline that maximizes both speed and security.

The key takeaway is that AI tools should complement, not replace, security-conscious development practices. By leveraging multiple AI tools in a layered approach—generation, review, and testing—development teams can achieve both rapid development velocity and robust security posture.

As AI continues to evolve and become more sophisticated, we can expect these tools to become even better at generating secure code. However, the principle of defense in depth remains crucial: multiple layers of AI-assisted review and validation will always be more effective than relying on any single tool, no matter how advanced.

The future of secure software development lies not in choosing between human expertise and AI assistance, but in thoughtfully combining both to create development workflows that are faster, more reliable, and more secure than either could achieve alone.

Steve Jobs comparing teams to rock tumblers

In this video, I noticed Steve Jobs rubbed his right eye when he said he knew the old man a bit. Jobs uses a rock tumbler as a metaphor for teams in this video. Two things in it are deeply meaningful to me as a human. I hate this itchy eye and nose shit I experience; sometimes it's meaningful, sometimes it's BS, like my Poppop said. A bullsh*t nose lie detector, but I hate itchy skin.

https://www.linkedin.com/posts/mariyavaleva-yourscalingpartner_comfort-never-built-anything-worth-remembering-activity-7352298683847114753-T7gz?utm_source=share&utm_medium=member_android&rcm=ACoAAABSwZsB6ufLHH7_-GSWNQHRwLARCYpwJOA

Processing a million pending domains

I mentioned big PRs before. I can also do normal-size ones.

I'm having fun on a secret project and also on the Internet Directory project today. After having my code reviewed by CodeRabbit, I'll let the pending-domain crawler/categorizer script process all the rest of the pending domains.

Status Update For Internet Directory, Other Projects, And Still Seeking Paid Work

I'm about halfway through my latest PR review, using CodeRabbit as a reviewer for Internet Directory: 89 commits committed thus far, with 169 total comments on the PR. This PR contains what is in the summary above, mainly adding AI and vector search. AugmentCode doesn't generate secure or optimized code, so CodeRabbit resolves that.

I also started a Dart/Flutter project yesterday to learn Flutter, since I see some jobs asking for it and I had a project idea waiting to be worked on.

I'm also working on making Electric Sheep, a JavaScript three.js game, into a native Android app playable offline. I was able to add achievements and a leaderboard to the native app, and I had AI create a zip file of my E-Sheep achievements to upload to Google so I could add them quickly.

Responding to CodeRabbit is the most boring dev stuff I do, but I think it’s essential.

I was actually having AI do its coding on all 3 of those projects at once. I'm so happy my Mac can handle it.

My resume is available at this link:
https://raw.githack.com/andytriboletti/publicfiles/main/resume/triboletti_andy_resume-latest.pdf

Here is how I start a new project:

I write a roughly 100-line README. I hooked up CodeRabbit for the repo; I check the README in and get a review (reviews are unlimited). Then I ask AugmentCode to create the architecture and roadmap docs. AugmentCode has a task manager built in, and I also ask it to keep the roadmap up to date as far as in-progress/to-do items. Once I review the docs and CodeRabbit reviews them, I respond to the review comments and then ask Augment to get started. Every 50 tool calls it asks, "Should I continue?" That's good by me: it means I'm getting my money's worth, since they only charge for messages I send to the agent.

Flesh eating bacteria and Putin make me mad and sad

Have you heard of the flesh-eating bacteria in the ocean, like at the Outer Banks and in Florida? Will that be permanent now due to global warming? That is a real bummer! Can we no longer get in the ocean? What can we do to destroy that bacteria in the ocean? It's sad not to go to the ocean anymore.

I saw that the prime minister of India called Putin his friend, and that Putin said he will end the war in exchange for part of Ukraine. That makes me so mad.

My Jenkins Pipeline for Internet Directory

I worked on adding stages to Jenkins today. They’re all passing. If you can’t figure out a Jenkins problem, let me know. I could share my Jenkinsfile publicly.

By the way, I watched “Zero Day” on Netflix recently, and it is a great show. I highly recommend it. Lots of relevant tech and political stuff!

How Jenkins CI Caught a Critical Git Configuration Bug That Local Testing Missed

A real-world case study in why continuous integration is essential for catching environment-specific issues

The Problem: “It Works on My Machine”

We’ve all been there. Your code runs perfectly in your local development environment, all tests pass, and everything seems ready for production. But then your CI pipeline fails with a cryptic error that makes no sense given your local success.

This exact scenario happened to us recently while working on the Internet Directory project, and it perfectly illustrates why robust CI/CD pipelines are invaluable for catching issues that local development simply cannot detect.

The Mysterious Jenkins Failure

Our Jenkins CI pipeline started failing with this error:

ModuleNotFoundError: No module named 'app.models.enums'

The confusing part? Everything worked perfectly locally:

  • ✅ All imports succeeded
  • ✅ Tests passed
  • ✅ Application ran without issues
  • ✅ The file app/models/enums.py clearly existed

The Investigation

When faced with a “works locally, fails in CI” situation, the first instinct is often to blame the CI environment. But Jenkins was actually doing us a huge favor by exposing a critical configuration issue.

Here’s what we discovered through systematic debugging:

Step 1: Reproducing the Issue Locally

# Test the exact import that was failing in Jenkins
python -c "from app.models.enums import SiteSource"
# ✅ Success locally

# Check if file exists
ls app/models/enums.py
# ✅ File exists

# Check git status
git status
# ✅ Working tree clean

Everything looked normal, which made the Jenkins failure even more puzzling.

Step 2: The Git Discovery

The breakthrough came when we checked what files were actually tracked by git:

git ls-files | grep enums
# ❌ No output - file not tracked!

Despite the file existing locally and git status showing a clean working tree, the critical enums.py file was never committed to the repository.

Step 3: The Root Cause

The culprit was hiding in our .gitignore file:

# Model files and checkpoints
models/

This innocent-looking line was designed to ignore machine learning model files, but it had an unintended consequence: it was also ignoring our SQLAlchemy model directory at backend/app/models/.
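
The difference comes down to how .gitignore patterns match: a directory pattern with no leading slash matches at any depth in the tree, while a leading slash anchors it to the directory containing the .gitignore. In other words:

# Unanchored: ignores a models/ directory anywhere in the tree,
# including backend/app/models/
models/

# Anchored: ignores only the models/ directory at the repository root
/models/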

Why Jenkins Caught This But Local Testing Didn’t

This is a perfect example of why CI environments are so valuable:

Local Environment Characteristics:

  • Persistent state: Files created during development stay around
  • Incremental changes: You rarely start from a completely clean slate
  • Developer assumptions: You know what files “should” be there

CI Environment Characteristics:

  • Fresh checkout: Every build starts with a clean git clone
  • Only committed files: If it’s not in git, it doesn’t exist
  • No assumptions: The environment only knows what’s explicitly defined

Jenkins was essentially performing a clean room test that revealed our git configuration was broken.

The Broader Implications

This incident highlighted several critical issues that could have caused problems in production:

  1. Deployment Failures: Production deployments would have failed with the same missing file error
  2. Team Collaboration Issues: New team members cloning the repository would be unable to run the application
  3. Backup/Recovery Problems: Disaster recovery procedures would fail due to missing critical files

The Fix and Lessons Learned

Immediate Fix:

# Fix the .gitignore to be more specific
- models/
+ /models/
+ backend/ml_models/

# Add the missing files
git add backend/app/models/enums.py
git add backend/app/models/pending_domain.py
git commit -m "Fix missing model files"

Long-term Lessons:

  1. CI is Your Safety Net: Never skip CI checks, even for “simple” changes
  2. Test Fresh Environments: Regularly test in clean environments that mirror your CI
  3. Be Specific with .gitignore: Overly broad patterns can cause unexpected issues
  4. Trust Your CI: When CI fails but local works, investigate thoroughly rather than assuming CI is wrong

Creating a Local Jenkins Simulation

To prevent this in the future, we created a simple test script that simulates the Jenkins environment:

#!/bin/bash
# Simulate the Jenkins environment: clone into a throwaway directory so that,
# as in CI, only files committed to git exist.
repo_root=$(git rev-parse --show-toplevel)
rel_dir=$(git rev-parse --show-prefix)
tmpdir=$(mktemp -d)
git clone --quiet "$repo_root" "$tmpdir"
cd "$tmpdir/$rel_dir"

# Clear any Python cache and test imports in a fresh session
find . -name "__pycache__" -type d -exec rm -rf {} +
python -c "
import sys
sys.path.insert(0, '.')
from app.models.enums import SiteSource
print('✅ Import successful')
"

This allows developers to catch git configuration issues before they reach CI.

Conclusion

This incident perfectly demonstrates why continuous integration is not just about running tests—it’s about validating that your entire development workflow is sound. Jenkins didn’t just catch a bug; it caught a process failure that could have caused significant problems down the line.

The next time your CI fails with a mysterious error while everything works locally, don’t dismiss it as a “CI problem.” Instead, treat it as valuable feedback about potential issues in your development process, git configuration, or environment assumptions.

Your CI pipeline is often the first line of defense against the dreaded “it works on my machine” syndrome. Trust it, investigate thoroughly, and you’ll often discover issues that would have been much more expensive to fix in production.


Have you experienced similar “works locally, fails in CI” situations? What did they teach you about your development process? Share your stories in the comments below.