One of the good parts: "A very important skill to learn is the ability to remain calm in the face of disaster. It’s not always easy, especially with a client on the phone, but panicking will only make a situation worse. Yes, the most critical server in the infrastructure may have just completely failed without a backup. Instead of focusing on what will happen as a result of the crisis, focus on what needs to be done to bring the system back up. Deal with the results later, after fixing the immediate failure. The fallout of the crisis might be terrible, but it will almost certainly be worse if the immediate problem isn’t fixed. A calm mind can carefully analyze a situation to determine the best solution. Panic responses do not benefit from the same calculating rationality."
I've been a sysadmin for about 20 years. I currently work on a large system with about a million paying customers. I think I'm technically competent. Troubleshooting broken stuff is something I gravitate toward, and I've put a lot of effort into learning and developing processes for doing it effectively.
However, being able to "remain calm in the face of disaster" is something I struggle with to this day. I think I have gotten quite good at hiding it from my peers, but it's the one thing that sometimes makes me question my choice of career.
What follows the quote above is also very good advice.
Different people will adapt to handling crisis situations in different ways. Some will adopt the detached, analytical calm of a surgeon. Others will take a few deep breaths to calm themselves before digging in to analyze the problem. The ability to stay calm in the face of disaster is more important than the method by which calm is achieved. It will take practice to reach the point of reacting to a disaster calmly.
On the topic of learning ops, I want to repeat my recommendation from another thread: "The Practice of System and Network Administration" and "The Practice of Cloud System Administration".
These books focus on principles, rather than specific technologies, and I love them for it.
I welcome this, though. Too often the importance of good sysadmin and good ops gets lost among other topics like language advocacy, product announcements, VC rounds, bug bounty humblebrags, or <insert stereotypical HN subject matter here>.
Absolutely amazing work. That must have taken a great deal of time, effort, and patience.
I'm the founder of a company (https://commando.io) that provides a web-based distributed ssh interface with auditing, scheduling, and GitHub deployments all baked in, but the biggest hurdle for us is teaching ops. Projects like opsschool help bridge this gap. Thanks!
Great job, and very interesting for us. I'm the founder of https://cloudacademy.com, and we are working to make part of our content, specifically on Amazon Web Services, available for free. I agree that an e-book would be useful.