My Best Practices for DevOps and Infrastructure Engineering
I recently put together a GitHub repository of the best practices, guidelines, and documentation I’ve developed over the years as a DevOps and Infrastructure Engineer. I found myself writing similar documentation at each place I worked and decided it was time to stop rewriting it. Anyone is welcome to use it, modify it, or contribute to it.
Check it out here: GitHub - jamesduffy/documentation.
# What’s Included
This repository includes:
- Incident Response: An incident response plan, postmortem templates, and tabletop scenarios to help teams handle and learn from incidents.
- On-call Handbook: A guide for on-call engineers, including pager setup, general preparation tips, and advice on optimizing rotations.
- Runbook: Information about building runbooks to make handling common issues as smooth as possible.
- Terraform Best Practices: My thoughts on how to use Terraform efficiently, covering some of the lessons I’ve learned from managing infrastructure at scale.
# Why I Made This
The goal is to make it easier to build, maintain, and scale reliable infrastructure, and I’m hoping these resources will help others in the community. Feel free to browse, use, and adapt anything you find helpful.
Suggestions, contributions, and feedback on what could be added or improved are all welcome.