FIRST 1-2 HOURS (Active Response)
Immediate Actions (0-30 min):
• Acknowledge & Get Help - Message senior engineers immediately
• Execute Safe Runbook Steps - Follow procedures, escalate if unclear
• Be Eyes and Ears - Monitor dashboards, report changes to seniors
• Communication Hub - Update stakeholders, maintain timeline
⚠️ IMPORTANT! ⚠️
‼️ Clear communication and staying calm
‼️ Clear communication -> say what you are doing, when you are doing it, and why
‼️ Staying calm -> steer the conversation away from finger-pointing and blame, and back to solving the issue at hand
Actual Role:
• Information Gathering - Logs, monitoring, user reports
• Safe Config Changes - Timeouts, feature flags, cache clearing
• Stakeholder Management - Customer support, status page updates
• Documentation - Timeline, decisions, what was tried
• Coordination Support - Bridge calls, incident channels
What You DON'T Do:
• Database restarts, deployments, infrastructure changes
• Complex troubleshooting alone - escalate quickly
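The split between safe changes and off-limits changes can be sketched as a simple lookup. This is only an illustration: the action names and the `check_action` helper are hypothetical labels for the items listed above, not part of any real tool, and treating unknown actions as escalations is an assumption in the spirit of "escalate if unclear".

```python
# Hypothetical action labels mirroring the lists above (assumed names).
SAFE_ACTIONS = {"timeout_change", "feature_flag_toggle", "cache_clear"}
OFF_LIMITS_ACTIONS = {"database_restart", "deployment", "infrastructure_change"}


def check_action(action: str) -> str:
    """Say whether a support engineer should proceed or escalate."""
    if action in SAFE_ACTIONS:
        return "proceed"
    if action in OFF_LIMITS_ACTIONS:
        return "escalate: outside support scope"
    # Assumption: anything not explicitly listed as safe is escalated.
    return "escalate: unknown action, ask before acting"


if __name__ == "__main__":
    for name in ("cache_clear", "database_restart", "kernel_patch"):
        print(name, "->", check_action(name))
```

Defaulting to escalation keeps the list of safe actions short and explicit, so a new kind of change is never treated as safe by accident.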
POST-MORTEM (24-48 Hours Later)
• Provide Detailed Timeline - You have the best notes of what happened when
• User Impact Analysis - You understand customer experience better than infrastructure teams
• Process Improvements - Suggest communication, documentation, escalation improvements
• Runbook Updates - Help update procedures based on what actually worked
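A detailed timeline is easier to hand over if notes are captured in a consistent shape while the incident is running. The sketch below is one possible way to do that; `TimelineEntry`, `IncidentTimeline`, and the three entry kinds are hypothetical names chosen for illustration, and UTC timestamps are an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class TimelineEntry:
    at: datetime
    kind: str  # assumed kinds: "observation", "action", "decision"
    note: str


@dataclass
class IncidentTimeline:
    entries: List[TimelineEntry] = field(default_factory=list)

    def record(self, kind: str, note: str, at: Optional[datetime] = None) -> None:
        """Add an entry; defaults to the current UTC time."""
        when = at or datetime.now(timezone.utc)
        self.entries.append(TimelineEntry(when, kind, note))

    def render(self) -> str:
        """Return the entries in chronological order, one per line."""
        ordered = sorted(self.entries, key=lambda e: e.at)
        return "\n".join(
            f"{e.at:%H:%M} UTC [{e.kind}] {e.note}" for e in ordered
        )


if __name__ == "__main__":
    timeline = IncidentTimeline()
    timeline.record("observation", "Error rate rising on checkout")
    timeline.record("action", "Cleared cache per runbook")
    print(timeline.render())
```

Recording what was tried and what was decided, not only what was observed, gives the post-mortem the "what happened when" view without reconstructing it from chat history.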
Reality Check:
You're the coordination and communication expert who enables senior engineers to focus on technical fixes.
Incidents fail when communication breaks down, not just when technology breaks.
WHO LEADS
• Support Engineer Leads When:
Runbook exists and is working
Issue is within your scope (config changes, user management, cache clearing)
You're making measurable progress
No complex debugging required
• Senior Engineer Takes Over When:
No runbook exists for this issue
Runbook procedures fail
Fix requires code changes, database administration, or infrastructure changes
Complex root cause analysis needed
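The two checklists above can be read as a single decision rule. The sketch below assumes that the support engineer leads only while all four of their conditions hold, and that any one of the senior-engineer conditions triggers a handover; the source does not state this explicitly. `IncidentState` and `who_leads` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class IncidentState:
    runbook_exists: bool
    runbook_working: bool
    within_support_scope: bool  # config changes, user management, cache clearing
    making_progress: bool
    needs_complex_debugging: bool


def who_leads(state: IncidentState) -> str:
    """Return "support" or "senior" based on the checklists above."""
    # Assumption: every support condition must hold at the same time.
    support_can_lead = (
        state.runbook_exists
        and state.runbook_working
        and state.within_support_scope
        and state.making_progress
        and not state.needs_complex_debugging
    )
    return "support" if support_can_lead else "senior"


if __name__ == "__main__":
    healthy = IncidentState(True, True, True, True, False)
    stalled = IncidentState(True, False, True, False, False)
    print(who_leads(healthy))
    print(who_leads(stalled))
```

The rule is meant to be re-evaluated throughout the incident: a runbook that stops working partway through flips the answer to a handover.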
APPLICATION SUPPORT ROLE DEFINITION
• Reactive Role - Responds to issues, monitors systems, troubleshoots problems
• Limited Production Access - Can modify configs, feature flags, and user accounts, but cannot deploy code
• User-Facing Expertise - Understands business workflows, user impact, customer experience
• Operational Knowledge - Knows monitoring tools, runbooks, escalation procedures inside-out
• Communication Bridge - Translates technical issues for business stakeholders and customer support
• Incident Coordination - Manages communication, documentation, stakeholder updates during outages
• Tools: Monitoring dashboards, admin panels, log analysis, ticketing systems