James Duffy

My Best Practices for DevOps and Infrastructure Engineering

I recently put together a GitHub repository with some of the best practices, guidelines, and documentation that I’ve developed over the years as a DevOps and Infrastructure Engineer. I found myself writing similar documentation at each place I worked and decided it was time to stop rewriting it. It is available on GitHub for anyone to use, modify, or contribute to.

Check it out here: jamesduffy/documentation on GitHub.

# What’s Included

This repository includes:

  • Incident Response: An incident response plan, postmortem templates, and tabletop scenarios to help teams handle and learn from incidents.
  • On-call Handbook: A guide for on-call engineers, including pager setup, general preparation tips, and advice on optimizing rotations.
  • Runbook: Guidance on building runbooks so that common issues can be handled as smoothly as possible.
  • Terraform Best Practices: My thoughts on how to use Terraform efficiently, covering some of the lessons I’ve learned from managing infrastructure at scale.

# Why I Made This

The goal is to make it easier to build, maintain, and scale reliable infrastructure, and I’m hoping these resources will help others in the community. Feel free to browse, use, and adapt anything you find helpful. I also welcome suggestions or contributions.

I would love any feedback or thoughts on what could be added or improved.