2026-01-18
Technical Phone Screen Questions for DevOps Engineers
The phone screen is your first real conversation with a DevOps candidate. It's where you separate engineers with hands-on experience from those who've only read the documentation. Unlike generic developer interviews, DevOps screening demands specific, scenario-based questions that expose practical knowledge.
This guide gives you 35+ battle-tested phone screen questions organized by topic, complete with what to listen for in their answers. Whether you're sourcing your first DevOps hire or scaling a platform team, these questions will help you qualify candidates before investing interview time.
Why Phone Screening DevOps Engineers Differs From Other Technical Roles
DevOps roles sit at the intersection of software development, systems administration, and production operations. A candidate might have deep cloud knowledge but weak containerization skills, or vice versa. Generic "tell me about your experience" questions won't reveal these gaps.
What makes DevOps screening unique:
- Candidates must think in systems, not just code
- Troubleshooting ability matters more than theoretical knowledge
- Production incident experience is a major differentiator
- Tool proficiency varies wildly across the industry
- Communication skills are critical (DevOps bridges engineering and operations)
A strong DevOps phone screen takes 30-45 minutes and covers 4-5 core domains. You're not looking for perfect answers—you're listening for clear thinking, problem-solving approach, and honest acknowledgment of knowledge gaps.
Infrastructure and Cloud Platform Questions
These questions assess whether the candidate understands distributed systems, cloud architecture, and infrastructure design patterns.
1. Walk me through how you'd design infrastructure for a web application that needs to handle 10x traffic growth in the next quarter.
What you're listening for:
- Do they ask clarifying questions? (application type, current architecture, budget)
- Do they mention scalability patterns (horizontal vs. vertical scaling)?
- Do they discuss databases, caching, and load balancing, or just compute?
- Do they mention monitoring and capacity planning?
Red flag: Answer that jumps straight to "just add more servers" without understanding the application.
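A candidate who mentions capacity planning should be able to do back-of-the-envelope math like the sketch below. It assumes the application scales horizontally and that per-instance throughput was measured in a load test. The 500 rps, 200 rps, and 30% headroom figures are hypothetical. The sketch deliberately ignores the database, caches, and other shared dependencies, and a strong candidate will raise those next.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float, headroom: float = 0.3) -> int:
    """Instances needed to serve peak_rps while keeping `headroom` spare capacity.

    rps_per_instance should come from a load test, not a guess; headroom=0.3
    keeps each instance at or below 70% of that measured capacity.
    """
    if rps_per_instance <= 0 or not 0 <= headroom < 1:
        raise ValueError("need rps_per_instance > 0 and 0 <= headroom < 1")
    usable_per_instance = rps_per_instance * (1 - headroom)
    return math.ceil(peak_rps / usable_per_instance)


# Hypothetical numbers: today's peak is 500 rps; one instance handled 200 rps in a load test.
today = instances_needed(500, 200)              # ceil(500 / 140)  -> 4
after_growth = instances_needed(500 * 10, 200)  # ceil(5000 / 140) -> 36
```

Going from 4 to 36 instances is where the follow-up conversation starts: can the database, connection limits, and budget absorb a 9x fleet?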
2. Explain the difference between IaaS, PaaS, and SaaS. When would you recommend each to a startup?
What you're listening for:
- Clear understanding of the abstraction layers
- Real examples (AWS EC2 vs. Heroku vs. Salesforce)
- Cost-benefit tradeoffs for early-stage companies
- When to accept managed services vs. build custom infrastructure
Red flag: Confusing definitions or inability to explain when managed services make sense.
3. Your application's database is running on a single EC2 instance. It's becoming a bottleneck. Walk me through your options.
What you're listening for:
- Do they ask about read/write patterns before proposing solutions?
- Do they mention read replicas, sharding, or managed database services?
- Do they discuss connection pooling and query optimization?
- Do they acknowledge the difference between scaling reads vs. writes?
Red flag: Jumping to solutions without understanding the actual bottleneck.
4. What's the difference between vertical and horizontal scaling? When would you use each?
What you're listening for:
- Clear definitions with examples
- Understanding of cost tradeoffs
- When database sharding becomes necessary
- How load balancing fits into horizontal scaling
Red flag: Treating them as interchangeable or not understanding database scaling complexity.
5. Describe a time you had to recover from infrastructure failure. What went wrong and what did you change?
What you're listening for:
- Honest account of a real incident
- Root cause analysis (not just symptoms)
- What monitoring/alerting gaps existed
- What they changed afterward
- Humble tone (admitting mistakes, not blaming others)
Red flag: Never experienced a production incident, or blames everyone except themselves.
CI/CD and Deployment Pipeline Questions
These questions test whether the candidate understands modern software delivery, automation, and risk mitigation.
6. Walk me through your ideal CI/CD pipeline. What stages would it include?
What you're listening for:
- Do they mention build, test, security scanning, staging, and production?
- Do they discuss gating criteria (e.g., tests must pass before deployment)?
- Do they mention artifact management and versioning?
- Do they discuss rollback strategies?
- Do they mention monitoring post-deployment?
Red flag: Pipeline with no automated tests or security checks.
7. How would you implement blue-green deployments? What are the tradeoffs?
What you're listening for:
- Clear understanding of running two identical production environments
- How to switch traffic between them
- Zero-downtime deployment benefits
- Cost implications (running double infrastructure)
- When this is overkill vs. necessary
Red flag: Confusion between blue-green and canary deployments.
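The mechanics fit in a few lines. This is a toy in-memory model, not a deployment tool. In practice the "switch" is a load balancer target group, a DNS record, or a Kubernetes Service selector, and the class and method names here are hypothetical. The point a strong candidate makes is that cutover and rollback are the same single switch. That is what separates blue-green from canary, where traffic shifts gradually by percentage.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BlueGreenRouter:
    """Toy model: two environments, one live; deploy to the idle one, then switch."""
    live: str = "blue"
    versions: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": None, "green": None})

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        # The release goes to the environment that is NOT serving traffic,
        # so it can be smoke-tested before any user sees it.
        self.versions[self.idle] = version

    def cut_over(self) -> None:
        # One switch moves all traffic; the previous environment keeps running.
        if self.versions[self.idle] is None:
            raise RuntimeError("nothing deployed to the idle environment")
        self.live = self.idle

    def rollback(self) -> None:
        # Rollback is the same switch in reverse, valid while the old environment is still up.
        self.cut_over()

    def serving(self) -> Optional[str]:
        return self.versions[self.live]
```

Usage: with `router = BlueGreenRouter(versions={"blue": "v1", "green": None})`, calling `router.deploy_to_idle("v2")` and then `router.cut_over()` serves v2, and `router.rollback()` returns to v1. The cost tradeoff shows up here too: both environments stay provisioned the whole time.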
8. What's the difference between continuous deployment and continuous delivery?
What you're listening for:
- Continuous delivery: every change is automatically built, tested, and kept production-ready; a human decides when to release
- Continuous deployment: every change that passes the pipeline goes to production automatically
- Understanding of organizational risk tolerance
- When each approach makes sense
Red flag: Treating them as the same thing.
9. Your CI pipeline is taking 45 minutes to run. What's your troubleshooting approach?
What you're listening for:
- Do they measure where the time goes before optimizing?
- Do they mention parallelizing test suites?
- Do they discuss caching (dependencies, build artifacts)?
- Do they think about test categorization (unit vs. integration vs. e2e)?
- Do they consider removing unnecessary steps?
Red flag: Vague answers or immediately suggesting throwing hardware at it.
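Parallelizing is where "measure first" pays off. With per-file timings from previous runs, you can balance shards by duration instead of by file count. The sketch uses a greedy longest-first heuristic, which is a standard approximation and not an optimal split. The file names and timings are hypothetical. Some CI systems offer built-in timing-based test splitting that works along similar lines.

```python
import heapq


def shard_tests(durations: dict[str, float], shards: int) -> list[list[str]]:
    """Assign test files to parallel CI shards: longest first, each to the least-loaded shard."""
    heap = [(0.0, i) for i in range(shards)]  # (total minutes so far, shard index)
    assignment: list[list[str]] = [[] for _ in range(shards)]
    for name, minutes in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)  # currently least-loaded shard
        assignment[idx].append(name)
        heapq.heappush(heap, (total + minutes, idx))
    return assignment


def wall_time(durations: dict[str, float], assignment: list[list[str]]) -> float:
    """The test stage's wall time is set by the slowest shard."""
    return max(sum(durations[name] for name in shard) for shard in assignment)


# Hypothetical per-file timings (minutes) collected from previous CI runs.
timings = {"e2e_checkout": 12, "e2e_search": 9, "integration_db": 8,
           "integration_api": 7, "unit": 4}
shard_plan = shard_tests(timings, shards=3)
```

Run serially, these files take 40 minutes. Across three balanced shards the slowest shard takes 15. If 15 is still too long, the timing data shows the next target: the 12-minute e2e file.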
10. How do you handle secrets management in your CI/CD pipeline?
What you're listening for:
- Awareness that secrets should never be in code or logs
- Use of secret management tools (AWS Secrets Manager, HashiCorp Vault, etc.)
- How they inject secrets into containers or deployments
- How they rotate secrets
- How they audit secret access
Red flag: Committing secrets to git (for example, in .env files or pipeline configuration), or no clear strategy.
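Here is a concrete pattern to listen for. The pipeline fetches secrets from a store such as Vault or AWS Secrets Manager and injects them at runtime. The application then reads them without hard-coding or logging them.

The sketch assumes injection through environment variables. Mounted files are a common alternative, since environment variables are inherited by child processes. `DB_PASSWORD` is a hypothetical name. Many CI systems also mask registered secrets in job logs; the `redact` helper only illustrates the same idea in application code.

```python
import os


class MissingSecret(RuntimeError):
    """Raised when a required secret was not injected into the environment."""


def get_secret(name: str) -> str:
    """Read a secret injected at runtime (e.g. by the CI system or orchestrator).

    Fails fast instead of falling back to a hard-coded default, and never
    puts the secret value in the error message.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingSecret(f"{name} is not set; check the secret-store integration")
    return value


def redact(text: str, secret_values: list[str]) -> str:
    """Mask known secret values before a line is written to logs."""
    for value in secret_values:
        if value:
            text = text.replace(value, "****")
    return text
```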
Containerization and Orchestration Questions
DevOps engineers today almost always work with containers. These questions separate hands-on experience from theoretical knowledge.
11. Walk me through what happens when you run docker run on an image.
What you're listening for:
- Pulling the image if it doesn't exist locally
- Creating a container from the image (a writable layer on top of the read-only image layers)
- Namespace and cgroup isolation
- Port mapping and volume mounting, if specified
- Running the specified command/entrypoint
Red flag: Vague answer that doesn't demonstrate understanding of images vs. containers.
12. How would you optimize a Docker image for production use?
What you're listening for:
- Multi-stage builds to reduce final image size
- Choosing appropriate base images (alpine vs. full distros)
- Layer caching and order of Dockerfile commands
- Only copying necessary files
- Running as non-root user
- Scanning for vulnerabilities
Red flag: No mention of image size, security, or caching.
13. Describe a Kubernetes deployment. What problem does it solve compared to raw Docker?
What you're listening for:
- Orchestration across multiple machines
- Self-healing (restarting failed pods)
- Rolling updates and rollbacks
- Service discovery and load balancing
- Persistent storage management
- Resource limits and scheduling
Red flag: Confusion about Kubernetes benefits or treating it as just "Docker management."
14. You have a Kubernetes pod that's in a CrashLoopBackOff state. How do you debug it?
What you're listening for:
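- Using kubectl logs (with --previous to see output from the last crashed container)
- Checking pod events with kubectl describe pod
- Understanding that CrashLoopBackOff means the container keeps exiting and Kubernetes waits progressively longer before each restart
- Checking resource limits that might cause OOMKilled terminations
- Checking liveness and startup probe configuration (a failing liveness probe restarts the container; a failing readiness probe only removes the pod from Service endpoints)
- Using kubectl exec to inspect the container, if it stays up long enough to attach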
Red flag: No systematic debugging approach.
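It also helps to know why the pod seems to sit idle between attempts. According to the Kubernetes documentation, the kubelet restarts a crashed container with an exponential back-off (10s, 20s, 40s, ...) capped at five minutes. The timer resets once the container has run cleanly for 10 minutes. The sketch reproduces those documented defaults; the values may differ in newer releases, so check the docs for your cluster's version.

```python
def crashloop_delays(restarts: int, base_s: int = 10, cap_s: int = 300) -> list[int]:
    """Wait (seconds) before each successive restart: base_s, doubling, capped at cap_s."""
    return [min(base_s * 2 ** attempt, cap_s) for attempt in range(restarts)]


delays = crashloop_delays(7)   # [10, 20, 40, 80, 160, 300, 300]
seconds_to_cap = sum(delays[:6])  # 610 s: just over 10 minutes of crashing to reach the cap
```

This explains a common debugging surprise: after a fix is rolled out, the next restart of an already-crashing container can be several minutes away.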
15. What's the difference between StatefulSets and Deployments in Kubernetes?
What you're listening for:
- Deployments are for stateless applications
- StatefulSets maintain stable pod identity (predictable names such as db-0, db-1) and stable storage
- Examples: Deployments for web apps, StatefulSets for databases or caches
- Understanding of ordered pod creation/deletion
- Per-pod persistent volume claims created from volumeClaimTemplates
Red flag: Treating them as interchangeable.
16. How would you handle database backups in a containerized environment?
What you're listening for:
- Do they mention persistent volumes/persistent volume claims?
- Do they discuss backup tools (pg_dump, mysqldump, cloud-native backups)?
- Do they mention testing restore procedures?
- Do they discuss backup frequency and retention policies?
- Do they mention offsite backups or disaster recovery?
Red flag: No clear understanding of how data persists in containers.
Infrastructure as Code (IaC) Questions
IaC is now table stakes for DevOps roles. These questions assess practical experience with declarative infrastructure.
17. Walk me through how you'd use Terraform to spin up a VPC with public and private subnets.
What you're listening for:
- Creating VPC resource
- Creating subnets with CIDR blocks
- Understanding public subnets (internet gateway) vs. private (NAT gateway)
- Route table configuration
- Security groups for access control
- Understanding of tfstate file management
Red flag: Never used Terraform, or vague understanding of networking concepts.
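The CIDR arithmetic behind that answer is worth seeing once. A Terraform configuration would typically compute it with `cidrsubnet()`. The Python sketch below does the same carving with the standard `ipaddress` module, assuming a hypothetical 10.0.0.0/16 VPC, two availability zones, and /24 subnets. What makes a subnet "public" is its route table (a default route to an internet gateway rather than a NAT gateway), not the CIDR itself.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, azs: list[str], new_prefix: int = 24) -> dict[str, list[tuple[str, str]]]:
    """Carve one public and one private subnet per AZ out of the VPC range."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan: dict[str, list[tuple[str, str]]] = {"public": [], "private": []}
    for tier in ("public", "private"):
        for az in azs:
            block = next(blocks, None)
            if block is None:
                raise ValueError(f"{vpc_cidr} is too small for this layout")
            plan[tier].append((az, str(block)))
    return plan


def usable_ips_in_aws_subnet(cidr: str) -> int:
    # AWS reserves 5 addresses in every subnet (the first four and the last one).
    return ipaddress.ip_network(cidr).num_addresses - 5


layout = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
```

Public subnets get 10.0.0.0/24 and 10.0.1.0/24, and private subnets get 10.0.2.0/24 and 10.0.3.0/24, each with 251 usable addresses on AWS.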
18. How do you manage Terraform state in a team environment?
What you're listening for:
- Remote state backend (S3, Terraform Cloud, etc.)
- State locking to prevent concurrent modifications
- State encryption at rest and in transit
- Access control to state files (they contain secrets)
- Backup and versioning of state
- Using workspaces for multiple environments
Red flag: Storing tfstate in git or no strategy for team collaboration.
19. What's the difference between a Terraform root module and a child module?
What you're listening for:
- The root module is the working directory where you run terraform plan and apply
- Child modules are reusable configurations called from another module with a module block, sourced from a local path, a registry, or a Git repository
- How to parameterize modules with variables and outputs
- When to create modules (DRY principle)
- Module versioning (pinning a version or Git ref)
Red flag: Treating them as the same thing.
20. You need to change infrastructure code but the change breaks something in production. How would you handle rollback?
What you're listening for:
- Do they understand version control for IaC?
- Can they revert to the previous code and re-apply it, rather than hand-editing state?
- Do they mention testing changes in non-prod first?
- Do they discuss separating terraform plan from terraform apply (reviewing the plan before applying)?
- Do they mention code review processes?
Red flag: No clear rollback strategy or understanding of IaC versioning.
Monitoring, Logging, and Observability Questions
Production issues reveal themselves through data. These questions assess observability maturity.
21. Walk me through how you'd design a monitoring strategy for a web application.
What you're listening for:
- Metrics (response time, error rate, throughput, resource utilization)
- Logs (application events, errors, access logs)
- Traces (request flow across services)
- Dashboards for visibility
- Alerting rules and thresholds
- On-call runbooks
Red flag: Only monitoring CPU/memory without understanding application health.
22. What's the difference between metrics, logs, and traces?
What you're listening for:
- Metrics: quantitative measurements (numbers) aggregated over time
- Logs: discrete event records with context
- Traces: request flow through distributed systems
- When to use each (metrics for trends and alerting, logs for event details, traces for following a single request across services)
- Tools: Prometheus for metrics, ELK/Splunk for logs, Jaeger for traces
Red flag: Treating them as interchangeable.
23. You're getting paged about high CPU on a production server at 2am. Walk me through your debugging approach.
What you're listening for:
- Staying calm and methodical
- Checking processes consuming CPU with top or htop
- Checking application logs for errors
- Checking recent deployments or changes
- Reviewing metrics history to understand when it started
- Deciding whether to scale, restart, or dig deeper
- Communicating with the on-call team or manager
Red flag: Panicking, blindly restarting services, or not gathering data.
24. How would you set up alerting rules so you're not overwhelmed by noise?
What you're listening for:
- Awareness that alert fatigue is a real problem
- Alerting on percentiles (e.g., p99 latency) rather than averages, which hide slow outliers
- Combining multiple conditions (high error rate AND high latency)
- Understanding of alert severity levels
- Runbooks linked from alerts
- Alert grouping and deduplication
- Regular alert review to tune thresholds
Red flag: Setting alerts for everything, or no strategy to prevent noise.
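This sketch shows why percentiles and combined conditions cut noise. The average of a window can look healthy while the tail is terrible. Paging only when tail latency and error rate are both bad avoids waking someone for a single slow blip. The thresholds and the sample window are hypothetical. In a Prometheus setup, the same idea would usually be an alerting rule using `histogram_quantile` over histogram buckets rather than application code.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def should_page(latencies_ms: list[float], errors: int, requests: int,
                p99_limit_ms: float = 800, error_rate_limit: float = 0.05) -> bool:
    """Page only when tail latency AND error rate are both bad in the window."""
    if requests == 0 or not latencies_ms:
        return False
    return (percentile(latencies_ms, 99) > p99_limit_ms
            and errors / requests > error_rate_limit)


# Hypothetical one-minute window: 98 fast requests and 2 very slow ones.
window = [100.0] * 98 + [2000.0] * 2
average = sum(window) / len(window)  # 138.0 ms: looks healthy
p99 = percentile(window, 99)         # 2000.0 ms: the slowest requests take 2 seconds
```

With a 1% error rate, `should_page(window, errors=1, requests=100)` stays quiet. At 10% errors it pages.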
25. What's the difference between pull-based and push-based monitoring?
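What you're listening for:
- Push: the application sends metrics to the monitoring system
- Pull: the monitoring system scrapes metrics from the application (Prometheus)
- Tradeoffs: push suits short-lived jobs that may exit before a scrape (Prometheus offers the Pushgateway for these); pull makes a down target visible immediately (the scrape fails) and keeps collection centrally controlled
- Examples of each approach
- Understanding of service discovery for pull-based systems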
Red flag: No awareness that different approaches exist.
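For a pull-based system, "exposing metrics" means serving a page that the scraper can read. The sketch below serves counters in the Prometheus text exposition format using only the standard library. It is illustrative only: the metric name is hypothetical, and a real service would use the official `prometheus_client` library, which also handles metric types, labels, and escaping.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical counter; request-handling code would increment it.
REQUEST_COUNT = {"http_requests_total": 0}


def render_metrics(counters: dict[str, int]) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves /metrics for a scraper to pull; everything else is 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNT).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, run `HTTPServer(("", 8000), MetricsHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) and point a scrape job at port 8000. In the push model, the application would instead send the same numbers to a collector on its own schedule.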
Linux and System Administration Foundations
Even cloud-focused DevOps engineers need solid Linux fundamentals.
26. How would you find all processes listening on port 8080?
What you're listening for:
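- Using lsof -i :8080 or netstat -tlnp | grep 8080 (lsof -iTCP:8080 -sTCP:LISTEN narrows lsof to listening sockets only)
- Understanding of listening vs. established connections
- Possibly ss -tlnp, the modern replacement for netstat on newer systems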
- Following up by checking the process
Red flag: No systematic approach to network troubleshooting.
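Here is what those tools actually read. On Linux, TCP sockets are listed in /proc/net/tcp (IPv4; /proc/net/tcp6 for IPv6), with ports in hex and state `0A` meaning LISTEN. The sketch parses that file, then maps socket inodes to PIDs by scanning /proc/<pid>/fd, which is roughly how lsof and ss -p attribute sockets to processes. It is Linux-only, needs root to see other users' processes, and is meant as a teaching aid rather than a replacement for ss. The function names are hypothetical.

```python
import os

LISTEN = "0A"  # TCP LISTEN state code as it appears in /proc/net/tcp


def listening_inodes(proc_net_tcp: str, port: int) -> list[int]:
    """Socket inodes in LISTEN state on `port`, parsed from /proc/net/tcp text."""
    inodes = []
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 10:
            continue
        local_port = int(fields[1].split(":")[1], 16)  # e.g. "00000000:1F90" -> 8080
        if fields[3] == LISTEN and local_port == port:
            inodes.append(int(fields[9]))  # field 9 is the socket inode
    return inodes


def pids_for_inode(inode: int) -> list[int]:
    """PIDs holding a file descriptor that points at socket:[inode]."""
    target = f"socket:[{inode}]"
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            for fd in os.listdir(fd_dir):
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break
        except OSError:
            continue  # process exited, or we lack permission to inspect it
    return pids
```

Usage: read `/proc/net/tcp`, pass its contents to `listening_inodes(text, 8080)`, and call `pids_for_inode` on each result.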
27. Your application is running out of disk space. How do you find what's consuming space?
What you're listening for:
- Using du to find large directories
- Using df to see overall disk usage
- Checking log files and their rotation
- Checking Docker image/volume storage
- Understanding of inode exhaustion (separate from disk space; check with df -i)
- Possibly using ncdu for interactive exploration
Red flag: No familiarity with standard Linux tools.
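The inode point is easy to demonstrate. A filesystem can refuse writes while it still has plenty of free blocks if it has run out of inodes, typically from millions of small files. This sketch reads both numbers with `os.statvfs` (Unix only), the same data that df and df -i report. The function name is hypothetical.

```python
import os


def usage_percent(path: str = "/") -> tuple[float, float]:
    """Return (block usage %, inode usage %) for the filesystem holding `path`."""
    st = os.statvfs(path)
    used_blocks = st.f_blocks - st.f_bfree
    # Like df's Use%, exclude root-reserved blocks from the denominator,
    # so this can read 100% while root can still write.
    visible = used_blocks + st.f_bavail
    block_pct = 100.0 * used_blocks / visible if visible else 0.0
    # Some filesystems report no fixed inode table (f_files == 0).
    inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return block_pct, inode_pct
```

If the first number is low but the second is near 100%, du won't find a large directory. The fix is finding where all the small files are accumulating.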
28. Walk me through how you'd set up log rotation for an application that generates 100GB of logs daily.
What you're listening for:
- Using logrotate with compression
- Setting rotation by size or time
- Archiving logs to S3 or other storage
- Ensuring the application keeps writing after rotation (signaling it to reopen its log file in a postrotate script, or using copytruncate)
- Retention policies
- Centralized logging as alternative
Red flag: Only cleaning up logs manually or no strategy for retention.
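At this volume, sizing matters. 100GB/day is about 4.2GB per hour, so rotating once a day produces unwieldy files. logrotate also only checks sizes when it runs, and many distributions run it once a day, so size-based rotation needs a more frequent schedule.

The sketch does the arithmetic. The 5GB file cap, seven-day local retention, and 10:1 compression ratio are assumptions. Text logs often compress well, but you should measure your own ratio.

```python
import math


def rotation_plan(daily_gb: float, max_file_gb: float, retention_days: int,
                  compression_ratio: float, uncompressed_days: int = 1) -> dict[str, float]:
    """Rough local-disk sizing for size-based log rotation.

    compression_ratio=10 means compressed files are 1/10 of the original size.
    uncompressed_days models how much recent data is still uncompressed
    (the live file plus rotated files not yet compressed).
    """
    rotations_per_day = math.ceil(daily_gb / max_file_gb)
    compressed_days = retention_days - uncompressed_days
    local_gb = daily_gb * uncompressed_days + daily_gb * compressed_days / compression_ratio
    return {"rotations_per_day": rotations_per_day, "local_disk_gb": local_gb}


log_plan = rotation_plan(daily_gb=100, max_file_gb=5, retention_days=7, compression_ratio=10)
# {'rotations_per_day': 20, 'local_disk_gb': 160.0}
```

Twenty rotations a day and 160GB of local disk is a strong hint that shipping logs off the box (to S3 or a centralized logging system) belongs in the answer.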
29. What's the difference between a process and a daemon?
What you're listening for:
- Process: a running instance of a program
- Daemon: a background process that runs without a controlling terminal
- Daemons are typically children of PID 1 (init or systemd) and run in the background
- Examples: sshd, nginx, database servers
- How systemd manages daemons
Red flag: Treating them as the same thing.
30. How would you create a systemd service to run a custom application?
What you're listening for:
- Creating a .service file in /etc/systemd/system/
- Defining ExecStart, User, and restart policy
- Using systemctl enable so the service starts at boot
- Using systemctl start/stop/restart
- Checking status and logs with systemctl status and journalctl
- Understanding dependencies and startup order (After=, Wants=, Requires=)
Red flag: Only familiar with older init.d style, or no experience managing services.
Security and Compliance Questions
Security is no longer optional in DevOps roles.
31. Walk me through your approach to securing a Kubernetes cluster.
What you're listening for:
- Network policies to restrict traffic
- Pod Security Admission or other admission controllers (PodSecurityPolicy was removed in Kubernetes 1.25)
- RBAC for access control
- Secrets management (not in environment variables)
- Container image scanning
- Node security (OS patching, SSH access)
- Audit logging
- Regular security updates
Red flag: No security strategy or thinking security is someone else's job.
32. How do you handle vulnerability scanning in your CI/CD pipeline?
What you're listening for:
- Scanning container images for CVEs
- Scanning dependencies for vulnerabilities
- Integration with pipeline (blocking builds on high severity)
- Tools: Trivy, Snyk, Anchore
- How to handle known vulnerabilities you can't immediately patch
- False positive management
Red flag: No vulnerability scanning in pipeline.
33. Describe how you'd implement encryption for sensitive data at rest and in transit.
What you're listening for:
- TLS/HTTPS for data in transit
- Database encryption or field-level encryption at rest
- Key management (not storing keys in code)
- Familiarity with standard choices (AES-256 for data at rest, TLS 1.2 or 1.3 for data in transit)
- Encryption overhead and performance impact
- Regulatory requirements (GDPR, HIPAA, PCI-DSS)
Red flag: Vague understanding of encryption or no experience with it.
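For data in transit, a candidate should know that encryption without certificate verification offers little protection. This standard-library sketch builds a client context that verifies certificates and hostnames (the defaults of `ssl.create_default_context`) and refuses anything older than TLS 1.2. The function name is hypothetical.

```python
import ssl
from typing import Optional


def make_client_tls_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS context: verify certificates and hostnames, require TLS 1.2+."""
    # create_default_context sets CERT_REQUIRED and check_hostname=True;
    # cafile lets you trust a private CA in addition to the system store.
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.0 and 1.1
    return ctx
```

Usage: `urllib.request.urlopen("https://example.com", context=make_client_tls_context())`. A red-flag answer in code form is a context with `verify_mode = ssl.CERT_NONE`, which added "to make the error go away."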
34. How would you handle a security incident in your infrastructure?
What you're listening for:
- Having an incident response plan
- Isolating affected systems
- Collecting evidence/logs
- Communicating with stakeholders
- Post-incident review (blameless)
- Learning and preventing recurrence
Red flag: Never thought about incident response.
Platform-Specific Questions
Ask one or two of these based on the job requirements.
35. What's the difference between AWS regions and availability zones?
What you're listening for:
- Regions are geographically separate areas (us-east-1, eu-west-1)
- Availability zones are isolated locations within a region, each made up of one or more data centers
- Latency implications of region choice
- Disaster recovery across regions
- Cost variations by region
- Understanding of multi-AZ deployments
Red flag: Confusing terminology.
36. Walk me through how you'd set up auto-scaling for an application on AWS.
What you're listening for:
- Auto Scaling Groups with min/max/desired capacity
- Launch templates specifying instance configuration
- Scaling policies (target tracking, step scaling)
- Metrics for scaling decisions
- Cooldown periods to prevent flapping
- Load balancing across scaled instances
- Testing scaling behavior
Red flag: Unfamiliar with AWS scaling concepts.
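The core of target tracking is a proportional rule: when the metric is above target, grow the group in proportion. As a stand-in, the sketch uses the formula Kubernetes documents for its Horizontal Pod Autoscaler, desired = ceil(current × metric / target). It clamps the result to min/max capacity and adds a cooldown. AWS manages its own target-tracking algorithm, which is not necessarily identical. All names and numbers are hypothetical.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Proportional rule ceil(current * metric / target), clamped to [min_size, max_size]."""
    if current == 0:
        return min_size
    raw = math.ceil(current * metric / target)
    return max(min_size, min(max_size, raw))


class CooldownScaler:
    """Applies desired_capacity but holds steady during a cooldown window after a change."""

    def __init__(self, min_size: int, max_size: int, target: float, cooldown_s: float = 300):
        self.min_size, self.max_size, self.target = min_size, max_size, target
        self.cooldown_s = cooldown_s
        self.last_change = float("-inf")

    def step(self, now: float, current: int, metric: float) -> int:
        desired = desired_capacity(current, metric, self.target, self.min_size, self.max_size)
        if desired != current and now - self.last_change >= self.cooldown_s:
            self.last_change = now
            return desired
        return current  # no change needed, or still cooling down after the last change
```

With a 50% CPU target, 4 instances at 80% CPU scale to 7. A second spike 60 seconds later is ignored until the 300-second cooldown expires, which is the flapping protection the question is probing for.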
37. How would you implement a disaster recovery plan?
What you're listening for:
- RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
- Backup frequency
- Testing restores regularly
- Cross-region replication
- Failover automation
- Cost-benefit of recovery speed
- Communication plan for outages
Red flag: No DR strategy or never tested recovery.
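RTO and RPO only mean something when they are checked against the actual setup. This simplified sketch treats worst-case data loss as the backup interval plus replication lag. For recovery time, it uses the restore time measured in the last drill rather than an estimate, which is why regular restore testing matters. The figures (hourly backups, 10-minute lag, 90-minute restore) and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrPlan:
    backup_interval_min: float  # how often a restorable copy is taken
    replication_lag_min: float  # delay before that copy is available in the DR region
    restore_drill_min: float    # measured in the last restore test, not estimated


def check_objectives(plan: DrPlan, rpo_min: float, rto_min: float) -> dict[str, bool]:
    """Simplified check: worst-case data loss vs. RPO, measured restore time vs. RTO."""
    worst_case_loss_min = plan.backup_interval_min + plan.replication_lag_min
    return {
        "meets_rpo": worst_case_loss_min <= rpo_min,
        "meets_rto": plan.restore_drill_min <= rto_min,
    }


result = check_objectives(DrPlan(60, 10, 90), rpo_min=60, rto_min=120)
# {'meets_rpo': False, 'meets_rto': True}
```

Hourly backups look like they satisfy a one-hour RPO until replication lag is counted. Strong candidates catch this kind of gap.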
How to Conduct an Effective Phone Screen
Now that you have the questions, here's how to use them effectively:
Preparation (10 minutes before)
- Review the candidate's resume and GitHub profile
- Have Zumo open to see their actual code contributions
- Identify 2-3 questions based on their claimed experience
- Have a scorecard ready to fill in right after the call
The Call Structure (30-45 minutes)
- Introduction (5 min): Explain the role, ask what they know about your company
- Technical questions (20-30 min): Ask 4-6 questions from different domains
- Behavioral question (5 min): Ask about a production incident or challenge
- Their questions (5 min): Let them ask about your company, team, or role
- Next steps (2 min): Explain what comes next
What to Listen For
Beyond right answers, pay attention to:
- Clear communication: Can they explain technical concepts clearly?
- Curiosity: Do they ask clarifying questions?
- Humility: Can they admit knowledge gaps without shame?
- Problem-solving: Do they think through problems systematically?
- Experience level: Does their experience match the level of the role?
Red Flags
- Talking too much without pausing (not listening)
- Overconfidence about things they clearly don't know
- Blame-shifting during incident stories
- No hands-on experience, only theoretical knowledge
- Dismissive of tools or approaches they haven't used
Evaluating Answers: A Scoring Framework
Use this simple 3-point scale for each question:
| Score | Meaning | Example |
|---|---|---|
| 1 | Clear knowledge gap | Can't explain basic concepts, confused between related terms |
| 2 | Acceptable knowledge | Knows the concept, some implementation details, honest about gaps |
| 3 | Strong knowledge | Clear explanation, real examples, understands tradeoffs |
Aim for an average score of 2.0+ to move a candidate to the next round. A single 1 on a must-have skill (for example, no Kubernetes experience when the role requires it) might be a disqualifier depending on seniority level.
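If several people screen candidates, writing the rule down keeps decisions consistent. This sketch applies the 2.0 average and treats any 1 on a must-have skill as disqualifying. That is stricter than the guidance above, where it depends on seniority. The question keys are hypothetical.

```python
def screen_decision(scores: dict[str, int], must_have: set[str],
                    threshold: float = 2.0) -> tuple[bool, str]:
    """Advance if the average is >= threshold and no must-have skill scored 1."""
    if not scores or any(s not in (1, 2, 3) for s in scores.values()):
        raise ValueError("need at least one score, each 1, 2, or 3")
    gaps = sorted(q for q in must_have if scores.get(q) == 1)
    if gaps:
        return False, "scored 1 on must-have: " + ", ".join(gaps)
    average = sum(scores.values()) / len(scores)
    if average >= threshold:
        return True, f"average {average:.2f}"
    return False, f"average {average:.2f} is below {threshold}"


scores = {"ci_cd": 3, "kubernetes": 2, "terraform": 2, "linux": 1}
decision = screen_decision(scores, must_have={"kubernetes", "ci_cd"})
# (True, 'average 2.00')
```

The same scores fail if Linux is a must-have for the role, which is the point of deciding must-haves before the call.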
Adjusting Questions by Experience Level
For junior/entry-level DevOps (0-2 years):
- Focus on fundamentals: Linux, containers, basic CI/CD
- Ask them to walk through simple scenarios
- Acceptable if they don't know orchestration platforms deeply
For mid-level DevOps (2-5 years):
- Expect hands-on experience with primary tools
- Ask about scale and production incident experience
- Should understand IaC and monitoring fundamentals
For senior DevOps (5+ years):
- Ask about architectural decisions and tradeoffs
- Focus on leadership and mentoring (if applicable)
- Expect strong production incident experience
- Should have opinions on tool selection
Common Mistakes When Phone Screening DevOps Engineers
Mistake 1: Asking only "tell me about" questions
These are too vague. Candidates prepare for these and you won't learn much.
Better: Ask specific scenario questions or ask them to walk through a technical concept.
Mistake 2: Accepting "I haven't used it but I could learn" too easily
For mid and senior roles, this is weak. They should have hands-on experience with core tools.
Better: Ask "what platforms have you used?" and focus on their actual experience.
Mistake 3: Not asking about production incidents
This is where you learn whether someone's theory matches reality.
Better: Always include at least one "Tell me about a time you..." question about a real incident.
Mistake 4: Treating all roles the same
A platform engineer role looks different from a build engineer role, even when both are called "DevOps."
Better: Customize questions to your actual job requirements.
Mistake 5: Not asking follow-up questions
If someone gives a weak answer, dig deeper. Ask "walk me through that" or "how would you debug that?"
Better: Treat phone screens like conversations, not checklists.
Integrating Phone Screens with Your Sourcing Strategy
Phone screening works best when combined with sourcing candidates who actually have the experience you need. Zumo's GitHub-based sourcing lets you find DevOps engineers based on their actual contributions—contributions to Kubernetes projects, Terraform modules, Prometheus monitoring tools, CI/CD platforms, and infrastructure code.
You can search by:
- Languages: Python, Go, Bash, TypeScript
- Tools and frameworks: Terraform, CloudFormation, Helm, Docker Compose, Ansible
- Platform activity: Commits to infrastructure projects, DevOps tool contributions
- Recency: Recently active engineers (more likely to engage)
Sourcing candidates with proven experience in your specific tech stack means your phone screens focus on depth and fit, not whether they have basic familiarity.
FAQ
How long should a DevOps phone screen take?
A good technical phone screen takes 30-45 minutes. With less than 30 minutes, you won't have time to dig into responses; with more than 45, you're interviewing instead of screening.
Should I ask coding questions in a phone screen?
Only if coding is a core part of the role (infrastructure-as-code, build tool development, etc.). Most DevOps roles focus more on systems thinking than algorithm implementation.
How do I handle candidates who freeze up on phone calls?
Some great engineers are nervous on calls. Offer to slow down, give them time to think, and ask clarifying questions to help draw out their thinking. If they're genuinely paralyzed, you might offer a take-home instead.
What if a candidate doesn't know the answer?
That's fine. Listen for how they respond: "I haven't used that but here's how I'd approach learning it" is good. "I have no idea" without any problem-solving is weaker. Growth mindset matters more than perfect knowledge for early-to-mid career roles.
Should I tell candidates which questions are coming?
No. The point is to see how they think through problems, not how well they prepared answers. What you can do: tell them you'll ask technical questions, so they should be in a quiet space without distractions.
How do I screen for DevOps engineers who are good teachers/mentors?
Ask: "Tell me about a time you helped a junior engineer or non-technical person understand a complex infrastructure concept." Listen for patience, clarity, and whether they enjoyed the experience.
Start Screening with Confidence
Phone screening DevOps engineers separates hiring signal from noise. The questions here are tested in the field—they reveal who has hands-on experience, who thinks systematically through problems, and who'd be a good fit for your team.
The next step is sourcing candidates who've actually built the systems you need. Using Zumo, you can find engineers based on their real GitHub contributions to infrastructure and DevOps projects, then use these questions to qualify them. That combination—smart sourcing plus rigorous screening—is how you hire strong DevOps teams.
Start with the questions in the domain most relevant to your open role. Listen carefully to how candidates think, not just whether they know the answer. And always ask follow-ups.