Python for DevOps Interview Questions 2026: The Complete Guide
Python is the scripting language of DevOps. From AWS automation with Boto3 to writing Kubernetes operators to building internal tooling, Python proficiency is expected at every seniority level. Here are the 20 questions that come up most in DevOps and SRE interviews.
Section 1: Python Scripting Fundamentals for DevOps
1. How do you run shell commands from Python and when do you choose each approach?
Why they ask this: Shell execution is a core DevOps scripting pattern. They want to see you know the safe options.
Ideal answer:
subprocess.run() (recommended):
import subprocess
result = subprocess.run(
['kubectl', 'get', 'pods', '-n', 'production'],
capture_output=True,
text=True,
check=True # raises CalledProcessError on non-zero exit
)
print(result.stdout)
Why subprocess over os.system():
- os.system() runs through the shell (injection risk) and returns only the exit code
- subprocess.run() captures stdout/stderr, raises exceptions on failure, and doesn't need a shell
shell=True — use sparingly:
# Needed for shell features (pipes, redirects, glob)
result = subprocess.run('ps aux | grep nginx', shell=True, capture_output=True, text=True)
Never use shell=True with user-controlled input — command injection risk.
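If shell=True is genuinely unavoidable with a dynamic value, shlex.quote can neutralize it. A minimal sketch — the hostile input here is purely illustrative:

```python
import shlex
import subprocess

user_input = "nginx; rm -rf /"  # hostile value — interpolated raw, it would run a second command
safe = shlex.quote(user_input)  # wraps it in single quotes: 'nginx; rm -rf /'
result = subprocess.run(f"echo {safe}", shell=True, capture_output=True, text=True)
print(result.stdout.strip())    # the string is echoed literally; nothing else executes
```

Prefer the list form without shell=True whenever possible; quoting is the fallback, not the default.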
When to use Fabric/Paramiko instead: For running commands on remote servers over SSH. subprocess only executes locally.
2. How do you use Boto3 to automate AWS operations?
Why they ask this: Boto3 is the standard AWS automation library for Python. If you work with AWS, you need to know this.
Ideal answer:
Boto3 is the official AWS SDK for Python. It provides clients (low-level API) and resources (higher-level object-oriented API).
import boto3
# Client (low-level, maps directly to AWS API)
ec2 = boto3.client('ec2', region_name='us-east-1')
# Describe running instances
response = ec2.describe_instances(
Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
)
for reservation in response['Reservations']:
for instance in reservation['Instances']:
print(instance['InstanceId'], instance['PrivateIpAddress'])
# Resource (higher-level, more Pythonic)
s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
for obj in bucket.objects.all():
print(obj.key)
Authentication order: Environment variables (AWS_ACCESS_KEY_ID) → ~/.aws/credentials → EC2/ECS instance metadata (IAM role). Always use IAM roles for code running in AWS — never hardcode credentials.
Pagination: Many AWS APIs are paginated. Always use paginators:
paginator = ec2.get_paginator('describe_instances')
for page in paginator.paginate():
for reservation in page['Reservations']:
    for instance in reservation['Instances']:
        print(instance['InstanceId'])  # process each instance here
3. How do you write a CLI tool in Python for DevOps automation?
Ideal answer:
Use the click library for production-quality CLI tools (preferred over argparse for DevOps tooling).
import click
import boto3
@click.group()
@click.option('--region', default='us-east-1', help='AWS region')
@click.pass_context
def cli(ctx, region):
ctx.ensure_object(dict)
ctx.obj['region'] = region
@cli.command()
@click.argument('environment')
@click.option('--dry-run', is_flag=True)
@click.pass_context
def scale(ctx, environment, dry_run):
"""Scale ASG in ENVIRONMENT."""
region = ctx.obj['region']
click.echo(f"Scaling {environment} in {region} (dry_run={dry_run})")
if not dry_run:
# actual scaling logic
pass
if __name__ == '__main__':
cli()
Why Click:
- Automatic help generation (--help)
- Type coercion and validation
- Nested command groups (like git subcommands)
- Colorized output, prompts, and progress bars
4. How do you handle YAML and JSON in Python for DevOps workflows?
Ideal answer:
JSON (built-in):
import json
# Parse JSON response
data = json.loads(response_text)
instance_id = data['Reservations'][0]['Instances'][0]['InstanceId']
# Write JSON
with open('output.json', 'w') as f:
json.dump(data, f, indent=2)
YAML (PyYAML or ruamel.yaml):
import yaml
# Read Kubernetes manifest
with open('deployment.yaml') as f:
manifest = yaml.safe_load(f) # ALWAYS use safe_load, never load()
manifest['spec']['replicas'] = 3
manifest['spec']['template']['spec']['containers'][0]['image'] = f'myapp:{new_tag}'
# Write back
with open('deployment.yaml', 'w') as f:
yaml.dump(manifest, f, default_flow_style=False)
ruamel.yaml: Preserves comments and formatting when round-tripping YAML — important when editing Kubernetes manifests or Helm values files.
Security: yaml.load() with arbitrary input can execute code — always use yaml.safe_load().
5. How do you use Python's `requests` library to interact with REST APIs?
Ideal answer:
import requests
# Basic GET with authentication
response = requests.get(
'https://api.github.com/repos/org/repo/pulls',
headers={'Authorization': f'token {github_token}'},
params={'state': 'open', 'per_page': 100}
)
response.raise_for_status() # raises HTTPError on 4xx/5xx
pulls = response.json()
# POST with JSON body
response = requests.post(
'https://jenkins.example.com/job/deploy/buildWithParameters',
auth=('user', jenkins_token),
json={'VERSION': new_version, 'ENVIRONMENT': 'production'}
)
Session objects: Reuse TCP connections for multiple requests to the same host:
session = requests.Session()
session.headers.update({'Authorization': f'Bearer {token}'})
# All requests from this session share the auth header and connection pool
Retry with backoff: Use urllib3.util.Retry with an HTTPAdapter for resilient API calls in automation scripts.
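A sketch of that retry pattern — the retry counts, backoff factor, and status codes are illustrative defaults, not prescriptions:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

retry = Retry(
    total=5,
    backoff_factor=1,                              # sleeps ~1s, 2s, 4s, ... between attempts
    status_forcelist=[429, 500, 502, 503, 504],    # retry on throttling and server errors
    allowed_methods=['GET', 'PUT', 'DELETE'],      # idempotent methods only
)
session = requests.Session()
session.mount('https://', HTTPAdapter(max_retries=retry))
session.mount('http://', HTTPAdapter(max_retries=retry))
# session.get(...) now retries transiently failing calls with exponential backoff
```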
Section 2: Python for Cloud & Infrastructure Automation
6. How do you use the Kubernetes Python client?
Ideal answer:
The official kubernetes Python client lets you interact with K8s clusters programmatically.
from kubernetes import client, config, watch
# Load kubeconfig (local) or in-cluster config
config.load_kube_config() # local
# config.load_incluster_config() # inside a pod
v1 = client.CoreV1Api()
apps_v1 = client.AppsV1Api()
# List pods in a namespace
pods = v1.list_namespaced_pod(namespace='production')
for pod in pods.items:
print(pod.metadata.name, pod.status.phase)
# Update deployment image
body = {'spec': {'template': {'spec': {'containers': [
{'name': 'app', 'image': f'myapp:{new_tag}'}
]}}}}
apps_v1.patch_namespaced_deployment(
name='my-app',
namespace='production',
body=body
)
# Watch for events
w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod, namespace='production'):
print(event['type'], event['object'].metadata.name)
7. How do you write Python scripts that are safe and maintainable for production use?
Ideal answer:
Error handling — be specific:
import boto3
from botocore.exceptions import ClientError, BotoCoreError
try:
ec2.terminate_instances(InstanceIds=[instance_id])
except ClientError as e:
error_code = e.response['Error']['Code']
if error_code == 'InvalidInstanceID.NotFound':
print(f"Instance {instance_id} already gone")
else:
raise # Re-raise unexpected errors
Logging over print:
import logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s %(levelname)s %(message)s'
)
logger = logging.getLogger(__name__)
logger.info("Starting instance termination: %s", instance_id)
logger.error("Failed: %s", e, exc_info=True) # includes stack trace
Type hints for maintainability:
from typing import Optional
def get_instance_ip(instance_id: str, region: str = 'us-east-1') -> Optional[str]:
...
Environment variables for config:
import os
DATABASE_URL = os.environ.get('DATABASE_URL')
if not DATABASE_URL:
    raise ValueError("DATABASE_URL not set")  # fail fast; note `x or raise ...` is not valid Python
8. How do you use Python context managers in DevOps scripts?
Ideal answer:
Context managers (with statements) handle resource lifecycle — acquiring and releasing resources safely even if exceptions occur.
Built-in examples:
# File handling — file auto-closes even on exception
with open('config.yaml') as f:
config = yaml.safe_load(f)
# Temporary directory
import tempfile
with tempfile.TemporaryDirectory() as tmpdir:
# tmpdir auto-deleted after block
subprocess.run(['git', 'clone', repo_url, tmpdir])
Custom context manager for deployment locking:
from contextlib import contextmanager
import boto3
@contextmanager
def deployment_lock(lock_name: str):
dynamodb = boto3.client('dynamodb')
try:
dynamodb.put_item(
TableName='DeploymentLocks',
Item={'lock_name': {'S': lock_name}},
ConditionExpression='attribute_not_exists(lock_name)'
)
yield
finally:
dynamodb.delete_item(
TableName='DeploymentLocks',
Key={'lock_name': {'S': lock_name}}
)
with deployment_lock('production-deploy'):
deploy_application()
9. How do you work with environment variables and configuration in Python DevOps scripts?
Ideal answer:
Direct env vars (simple scripts):
import os
token = os.environ['GITHUB_TOKEN'] # raises KeyError if missing
token = os.environ.get('GITHUB_TOKEN', 'default') # returns default
python-dotenv (for local development):
from dotenv import load_dotenv
load_dotenv() # loads .env file in development, ignored in production
Pydantic settings (production-grade config validation):
from pydantic_settings import BaseSettings, SettingsConfigDict
class Settings(BaseSettings):
aws_region: str = 'us-east-1'
slack_webhook_url: str # required — validation error if missing
deployment_timeout: int = 300
model_config = SettingsConfigDict(env_file='.env')  # pydantic v2 style; SettingsConfigDict also comes from pydantic_settings
settings = Settings()What not to do: Hardcode credentials, commit .env files, use config files with secrets in version control.
10. How do you use Python for log analysis and alerting?
Ideal answer:
import re
from collections import defaultdict
from datetime import datetime, timedelta
def analyze_error_rate(log_file: str, window_minutes: int = 5) -> dict:
error_counts = defaultdict(int)
total_counts = defaultdict(int)
cutoff = datetime.now() - timedelta(minutes=window_minutes)
with open(log_file) as f:
for line in f:
# Parse nginx access log
match = re.match(
r'(?P<ip>\S+) .* \[(?P<time>[^\]]+)\] "\S+ (?P<path>\S+) \S+" (?P<status>\d+)',
line
)
if not match:
continue
timestamp = datetime.strptime(match.group('time'), '%d/%b/%Y:%H:%M:%S %z')
path = match.group('path')
status = int(match.group('status'))
total_counts[path] += 1
if status >= 500:
error_counts[path] += 1
return {
path: error_counts[path] / total_counts[path]
for path in total_counts
if total_counts[path] > 10 # ignore low-traffic endpoints
}
Alerting to Slack:
import requests
from datetime import datetime
def slack_alert(webhook_url: str, message: str, level: str = 'warning'):
color = {'critical': '#FF0000', 'warning': '#FFA500', 'info': '#36a64f'}[level]
requests.post(webhook_url, json={
'attachments': [{'color': color, 'text': message, 'ts': int(datetime.now().timestamp())}]
})
Section 3: Advanced Python for DevOps
11. How do you write async Python for DevOps tooling?
Ideal answer:
Async Python is useful for DevOps scripts that perform many I/O-bound operations concurrently — checking health endpoints, querying multiple APIs, waiting on multiple resources.
import asyncio
import aiohttp
async def check_endpoint(session: aiohttp.ClientSession, url: str) -> dict:
try:
async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as response:
return {'url': url, 'status': response.status, 'healthy': response.status == 200}
except Exception as e:
return {'url': url, 'status': None, 'healthy': False, 'error': str(e)}
async def check_all_services(urls: list[str]) -> list[dict]:
async with aiohttp.ClientSession() as session:
tasks = [check_endpoint(session, url) for url in urls]
return await asyncio.gather(*tasks)
results = asyncio.run(check_all_services([
'https://service-a.example.com/health',
'https://service-b.example.com/health',
'https://service-c.example.com/health',
]))
Checking 100 endpoints sequentially might take 100 seconds. Async does it in ~5 seconds (bounded by the slowest response).
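With hundreds of endpoints you usually also want to cap how many requests are in flight at once. A minimal sketch with asyncio.Semaphore — asyncio.sleep stands in for the real aiohttp call, and the limit of 10 is illustrative:

```python
import asyncio

async def check(sem: asyncio.Semaphore, name: str) -> str:
    async with sem:                  # at most 10 checks run concurrently
        await asyncio.sleep(0.01)    # stand-in for the real HTTP request
        return f"{name}: ok"

async def check_all(names: list[str]) -> list[str]:
    sem = asyncio.Semaphore(10)
    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(check(sem, n) for n in names))

results = asyncio.run(check_all([f"svc-{i}" for i in range(25)]))
print(len(results))  # 25
```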
12. How do you use Python to build a simple health-check automation service?
Ideal answer:
import asyncio
import aiohttp
import boto3
from dataclasses import dataclass
@dataclass
class HealthCheck:
name: str
url: str
expected_status: int = 200
async def run_health_checks(checks: list[HealthCheck]) -> None:
sns = boto3.client('sns')
async with aiohttp.ClientSession() as session:
while True:
for check in checks:
try:
async with session.get(check.url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
if resp.status != check.expected_status:
sns.publish(
TopicArn='arn:aws:sns:us-east-1:123456789:alerts',
Subject=f'Health Check Failed: {check.name}',
Message=f'{check.url} returned {resp.status}'
)
except Exception as e:
sns.publish(
TopicArn='arn:aws:sns:us-east-1:123456789:alerts',
Subject=f'Health Check Error: {check.name}',
Message=str(e)
)
await asyncio.sleep(60) # Check every minute
13. How do you write Python Lambda functions for AWS automation?
Ideal answer:
Python is the most popular Lambda runtime. Common DevOps Lambda patterns:
Auto-remediation Lambda (triggered by CloudWatch Alarm):
import boto3
import json
def lambda_handler(event, context):
# Parse SNS notification from CloudWatch Alarm
message = json.loads(event['Records'][0]['Sns']['Message'])
alarm_name = message['AlarmName']
if 'high-memory' in alarm_name:
# Extract instance ID from alarm dimensions
dimensions = message['Trigger']['Dimensions']
instance_id = next(d['value'] for d in dimensions if d['name'] == 'InstanceId')
ec2 = boto3.client('ec2')
# Reboot the instance
ec2.reboot_instances(InstanceIds=[instance_id])
return {'action': 'rebooted', 'instance': instance_id}
Best practices:
- Use environment variables for configuration (not hardcoded ARNs)
- Keep handler function thin — put business logic in importable modules
- Use aws_lambda_powertools for logging, tracing, and metrics
- Set appropriate memory and timeout — profile locally with aws-lambda-rie
14. How do you test Python DevOps scripts?
Ideal answer:
Unit tests with mocking (for AWS/cloud code):
import pytest
from unittest.mock import MagicMock, patch
from datetime import datetime, timedelta, timezone
def terminate_old_instances(ec2_client, age_hours=24):
response = ec2_client.describe_instances()
# ... terminate instances older than age_hours
def test_terminate_old_instances():
mock_ec2 = MagicMock()
mock_ec2.describe_instances.return_value = {
'Reservations': [{
'Instances': [{
'InstanceId': 'i-1234',
'LaunchTime': datetime.now(timezone.utc) - timedelta(hours=48)
}]
}]
}
terminate_old_instances(mock_ec2, age_hours=24)
mock_ec2.terminate_instances.assert_called_once_with(InstanceIds=['i-1234'])
moto (AWS mock library):
from moto import mock_aws # moto ≥5 replaced per-service decorators like mock_ec2 with mock_aws
@mock_aws
def test_with_real_boto3():
ec2 = boto3.client('ec2', region_name='us-east-1')
# moto intercepts all boto3 calls — no real AWS needed
ec2.run_instances(ImageId='ami-12345678', MinCount=1, MaxCount=1)
Integration tests: Use a dedicated test AWS account or GCP project. Tag all test resources for easy cleanup. Use IAM boundaries to prevent test code from affecting production.
15. How do you use Python virtual environments and dependency management in DevOps?
Ideal answer:
Virtual environments isolate project dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Poetry (modern dependency management):
poetry init
poetry add boto3 click pydantic-settings
poetry add --group dev pytest moto[ec2]
poetry install
In Docker / CI (pinned dependencies for reproducibility):
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
requirements.txt best practices:
- Pin exact versions in production (boto3==1.34.0) for reproducibility
- Separate dev dependencies (requirements-dev.txt)
- Use pip-compile (pip-tools) to generate pinned requirements from abstract dependencies
2026 tooling: uv (Rust-based Python package manager) is 10-100x faster than pip and becoming standard for CI/CD pipelines.
16. How do you use Python to automate Slack notifications in a DevOps pipeline?
Ideal answer:
import requests
from datetime import datetime
def send_deployment_notification(
webhook_url: str,
service: str,
version: str,
environment: str,
success: bool,
deployer: str = 'CI/CD'
) -> None:
color = '#36a64f' if success else '#FF0000'
status = 'Deployed Successfully' if success else 'Deployment Failed'
requests.post(webhook_url, json={
'attachments': [{
'color': color,
'title': f'{status}: {service}',
'fields': [
{'title': 'Service', 'value': service, 'short': True},
{'title': 'Version', 'value': version, 'short': True},
{'title': 'Environment', 'value': environment, 'short': True},
{'title': 'Deployed by', 'value': deployer, 'short': True},
],
'footer': 'InterviewDrill Deploy Bot',
'ts': int(datetime.now().timestamp())
}]
})
Slack SDK for complex interactions:
from slack_sdk import WebClient
client = WebClient(token=slack_bot_token)
client.chat_postMessage(
channel='#deployments',
blocks=[...] # Block Kit for rich formatting
)
17. How do you use Python for infrastructure cost analysis?
Ideal answer:
import boto3
from datetime import datetime, timedelta
def get_weekly_cost_by_service() -> dict:
ce = boto3.client('ce', region_name='us-east-1')
end = datetime.now().strftime('%Y-%m-%d')
start = (datetime.now() - timedelta(days=7)).strftime('%Y-%m-%d')
response = ce.get_cost_and_usage(
TimePeriod={'Start': start, 'End': end},
Granularity='DAILY',
GroupBy=[{'Type': 'DIMENSION', 'Key': 'SERVICE'}],
Metrics=['BlendedCost']
)
service_costs = {}
for result in response['ResultsByTime']:
for group in result['Groups']:
service = group['Keys'][0]
cost = float(group['Metrics']['BlendedCost']['Amount'])
service_costs[service] = service_costs.get(service, 0) + cost
return dict(sorted(service_costs.items(), key=lambda x: x[1], reverse=True))
costs = get_weekly_cost_by_service()
for service, cost in list(costs.items())[:10]:
print(f"{service}: ${cost:.2f}")
18. Common Python gotchas in DevOps scripts?
Why they ask this: Real-world experience question — they want to see you've hit production issues, not just written toy scripts.
1. Mutable default arguments:
# WRONG — list is created once and shared across all calls
def add_tag(instance_id, tags=[]):
tags.append({'Key': 'managed', 'Value': 'true'}) # bug!
# RIGHT
def add_tag(instance_id, tags=None):
if tags is None:
tags = []
2. Not handling pagination:
Most AWS APIs return max 100-1000 results. Scripts that don't paginate silently miss data.
3. Catching too broadly:
try:
do_something()
except Exception:
pass # swallows every error including programming mistakes
4. Timezone-naive datetimes:
# AWS timestamps are timezone-aware — comparing with naive datetime raises TypeError
from datetime import timezone
now = datetime.now(timezone.utc) # always use timezone-aware
5. String formatting with secrets:
logger.info(f"Connecting to DB: {connection_string}") # leaks credentials to logs
logger.info("Connecting to DB: %s", masked_url) # use parameterized logging
19. How do you structure a Python DevOps project?
Ideal answer:
my-devops-tool/
├── src/
│ └── mytools/
│ ├── __init__.py
│ ├── aws/
│ │ ├── ec2.py
│ │ └── s3.py
│ ├── k8s/
│ │ └── deployments.py
│ └── cli.py # Click CLI entry points
├── tests/
│ ├── unit/
│ └── integration/
├── pyproject.toml # Poetry / modern packaging
├── Dockerfile
└── .github/
└── workflows/
└── test.yml
Entry points in pyproject.toml:
[tool.poetry.scripts]
mytools = "mytools.cli:cli"
After pip install -e ., users run mytools deploy --env production instead of python -m mytools.cli deploy.
CI setup: Run ruff (linting), mypy (type checking), and pytest with coverage in CI before merging.
20. How do you manage Python in a Docker-based DevOps environment?
Ideal answer:
Multi-stage Dockerfile for Python tools:
FROM python:3.12-slim AS base
# Install uv for fast dependency resolution
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
FROM base AS builder
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project
COPY src/ src/
RUN uv sync --frozen --no-dev
FROM base AS final
WORKDIR /app
COPY --from=builder /app /app
ENV PATH="/app/.venv/bin:$PATH"
ENTRYPOINT ["mytools"]
Key patterns:
- Use python:3.12-slim, not python:3.12 — reduces image size from ~1GB to ~130MB
- Pin the Python version in the image tag — python:3.12 updates; python:3.12.3 doesn't
- Use uv instead of pip in CI for significantly faster installs
- Run as non-root: USER 1001
- Don't include test dependencies in production images
