Python for DevOps Interview Questions 2026: The Complete Guide
Python · 16 min read · Apr 20, 2026
By InterviewDrill Team

Python is the scripting language of DevOps. From AWS automation with Boto3 to writing Kubernetes operators to building internal tooling, Python proficiency is expected at every senior level. Here are the 20 questions that come up most in DevOps and SRE interviews.


Section 1: Python Scripting Fundamentals for DevOps

1. How do you run shell commands from Python and when do you choose each approach?

Why they ask this: Shell execution is a core DevOps scripting pattern. They want to see you know the safe options.

Ideal answer:

subprocess.run() (recommended):

import subprocess

result = subprocess.run(
    ['kubectl', 'get', 'pods', '-n', 'production'],
    capture_output=True,
    text=True,
    check=True  # raises CalledProcessError on non-zero exit
)
print(result.stdout)

Why subprocess over os.system():

  • os.system() runs through the shell (injection risk), returns only exit code
  • subprocess.run() captures stdout/stderr, raises exceptions on failure, doesn't need shell

shell=True — use sparingly:

# Needed for shell features (pipes, redirects, glob)
result = subprocess.run('ps aux | grep nginx', shell=True, capture_output=True, text=True)

Never use shell=True with user-controlled input — command injection risk.
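If a shell string must be built anyway, quoting untrusted values with the stdlib shlex.quote keeps metacharacters inert. A minimal sketch (the hostile value is invented):

```python
import shlex

user_ns = "prod; rm -rf /"            # hostile, user-controlled value
safe = shlex.quote(user_ns)           # wraps it as a single quoted token
cmd = f"kubectl get pods -n {safe}"
print(cmd)  # kubectl get pods -n 'prod; rm -rf /'
```

The semicolon and flags survive as literal text inside one argument instead of being interpreted by the shell.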

When to use Fabric/Paramiko instead: For running commands on remote servers over SSH. subprocess only executes locally.


2. How do you use Boto3 to automate AWS operations?

Why they ask this: Boto3 is the standard AWS automation library for Python. If you work with AWS, you need to know this.

Ideal answer:

Boto3 is the official AWS SDK for Python. It provides clients (low-level API) and resources (higher-level object-oriented API).

import boto3

# Client (low-level, maps directly to AWS API)
ec2 = boto3.client('ec2', region_name='us-east-1')

# Describe running instances
response = ec2.describe_instances(
    Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
)
for reservation in response['Reservations']:
    for instance in reservation['Instances']:
        print(instance['InstanceId'], instance['PrivateIpAddress'])

# Resource (higher-level, more Pythonic)
s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
for obj in bucket.objects.all():
    print(obj.key)

Authentication order: Environment variables (AWS_ACCESS_KEY_ID) → ~/.aws/credentials → EC2/ECS instance metadata (IAM role). Always use IAM roles for code running in AWS — never hardcode credentials.

Pagination: Many AWS APIs are paginated. Always use paginators:

paginator = ec2.get_paginator('describe_instances')
for page in paginator.paginate():
    for reservation in page['Reservations']:
        for instance in reservation['Instances']:
            print(instance['InstanceId'])  # process each instance

3. How do you write a CLI tool in Python for DevOps automation?

Ideal answer:

Use the click library for production-quality CLI tools (preferred over argparse for DevOps tooling).

import click
import boto3

@click.group()
@click.option('--region', default='us-east-1', help='AWS region')
@click.pass_context
def cli(ctx, region):
    ctx.ensure_object(dict)
    ctx.obj['region'] = region

@cli.command()
@click.argument('environment')
@click.option('--dry-run', is_flag=True)
@click.pass_context
def scale(ctx, environment, dry_run):
    """Scale ASG in ENVIRONMENT."""
    region = ctx.obj['region']
    click.echo(f"Scaling {environment} in {region} (dry_run={dry_run})")
    if not dry_run:
        # actual scaling logic
        pass

if __name__ == '__main__':
    cli()

Why Click:

  • Automatic help generation (--help)
  • Type coercion and validation
  • Nested command groups (like git subcommands)
  • Colorized output, prompts, progress bars

4. How do you handle YAML and JSON in Python for DevOps workflows?

Ideal answer:

JSON (built-in):

import json

# Parse JSON response
data = json.loads(response_text)
instance_id = data['Reservations'][0]['Instances'][0]['InstanceId']

# Write JSON
with open('output.json', 'w') as f:
    json.dump(data, f, indent=2)

YAML (PyYAML or ruamel.yaml):

import yaml

# Read Kubernetes manifest
with open('deployment.yaml') as f:
    manifest = yaml.safe_load(f)  # ALWAYS use safe_load, never load()

manifest['spec']['replicas'] = 3
manifest['spec']['template']['spec']['containers'][0]['image'] = f'myapp:{new_tag}'

# Write back
with open('deployment.yaml', 'w') as f:
    yaml.dump(manifest, f, default_flow_style=False)

ruamel.yaml: Preserves comments and formatting when round-tripping YAML — important when editing Kubernetes manifests or Helm values files.

Security: yaml.load() with arbitrary input can execute code — always use yaml.safe_load().
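To see the difference concretely (assuming PyYAML is installed), safe_load refuses Python object tags outright instead of constructing arbitrary objects:

```python
import yaml

# A document that tries to smuggle in a Python object constructor
payload = "!!python/object/apply:os.system ['echo pwned']"

try:
    yaml.safe_load(payload)
    print("loaded (unexpected)")
except yaml.YAMLError as e:
    # safe_load only constructs plain scalars, lists, and dicts
    print("rejected:", type(e).__name__)
```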


5. How do you use Python's `requests` library to interact with REST APIs?

Ideal answer:

import requests

# Basic GET with authentication
response = requests.get(
    'https://api.github.com/repos/org/repo/pulls',
    headers={'Authorization': f'token {github_token}'},
    params={'state': 'open', 'per_page': 100}
)
response.raise_for_status()  # raises HTTPError on 4xx/5xx
pulls = response.json()

# POST with JSON body
response = requests.post(
    'https://jenkins.example.com/job/deploy/buildWithParameters',
    auth=('user', jenkins_token),
    json={'VERSION': new_version, 'ENVIRONMENT': 'production'}
)

Session objects: Reuse TCP connections for multiple requests to the same host:

session = requests.Session()
session.headers.update({'Authorization': f'Bearer {token}'})
# All requests from this session share the auth header and connection pool

Retry with backoff: Use urllib3.util.Retry with an HTTPAdapter for resilient API calls in automation scripts.
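A sketch of that retry setup; the parameter values here are illustrative, not canonical:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

retry = Retry(
    total=5,                                      # up to 5 attempts overall
    backoff_factor=1,                             # exponential backoff between retries
    status_forcelist=[429, 500, 502, 503, 504],   # retry these HTTP codes
    allowed_methods=["GET", "PUT", "DELETE"],     # idempotent methods only
)
session = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
session.mount('https://', adapter)
session.mount('http://', adapter)
# session.get(...) now retries transient failures automatically
```

Mounting on both schemes means every request through this session inherits the policy; per-call code stays unchanged.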


Section 2: Python for Cloud & Infrastructure Automation

6. How do you use the Kubernetes Python client?

Ideal answer:

The official kubernetes Python client lets you interact with K8s clusters programmatically.

from kubernetes import client, config, watch

# Load kubeconfig (local) or in-cluster config
config.load_kube_config()  # local
# config.load_incluster_config()  # inside a pod

v1 = client.CoreV1Api()
apps_v1 = client.AppsV1Api()

# List pods in a namespace
pods = v1.list_namespaced_pod(namespace='production')
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)

# Update deployment image
body = {'spec': {'template': {'spec': {'containers': [
    {'name': 'app', 'image': f'myapp:{new_tag}'}
]}}}}
apps_v1.patch_namespaced_deployment(
    name='my-app',
    namespace='production',
    body=body
)

# Watch for events
w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod, namespace='production'):
    print(event['type'], event['object'].metadata.name)

7. How do you write Python scripts that are safe and maintainable for production use?

Ideal answer:

Error handling — be specific:

import boto3
from botocore.exceptions import ClientError, BotoCoreError

try:
    ec2.terminate_instances(InstanceIds=[instance_id])
except ClientError as e:
    error_code = e.response['Error']['Code']
    if error_code == 'InvalidInstanceID.NotFound':
        print(f"Instance {instance_id} already gone")
    else:
        raise  # Re-raise unexpected errors

Logging over print:

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s'
)
logger = logging.getLogger(__name__)

logger.info("Starting instance termination: %s", instance_id)
logger.error("Failed: %s", e, exc_info=True)  # includes stack trace

Type hints for maintainability:

from typing import Optional

def get_instance_ip(instance_id: str, region: str = 'us-east-1') -> Optional[str]:
    ...

Environment variables for config:

import os

# `x or raise ...` is a syntax error in Python — fail fast explicitly
DATABASE_URL = os.environ.get('DATABASE_URL')
if not DATABASE_URL:
    raise ValueError("DATABASE_URL not set")

8. How do you use Python context managers in DevOps scripts?

Ideal answer:

Context managers (with statements) handle resource lifecycle — acquiring and releasing resources safely even if exceptions occur.

Built-in examples:

# File handling — file auto-closes even on exception
with open('config.yaml') as f:
    config = yaml.safe_load(f)

# Temporary directory
import tempfile
with tempfile.TemporaryDirectory() as tmpdir:
    # tmpdir auto-deleted after block
    subprocess.run(['git', 'clone', repo_url, tmpdir])

Custom context manager for deployment locking:

from contextlib import contextmanager
import boto3

@contextmanager
def deployment_lock(lock_name: str):
    dynamodb = boto3.client('dynamodb')
    try:
        dynamodb.put_item(
            TableName='DeploymentLocks',
            Item={'lock_name': {'S': lock_name}},
            ConditionExpression='attribute_not_exists(lock_name)'
        )
        yield
    finally:
        dynamodb.delete_item(
            TableName='DeploymentLocks',
            Key={'lock_name': {'S': lock_name}}
        )

with deployment_lock('production-deploy'):
    deploy_application()

9. How do you work with environment variables and configuration in Python DevOps scripts?

Ideal answer:

Direct env vars (simple scripts):

import os
token = os.environ['GITHUB_TOKEN']  # raises KeyError if missing
token = os.environ.get('GITHUB_TOKEN', 'default')  # returns default

python-dotenv (for local development):

from dotenv import load_dotenv
load_dotenv()  # loads .env file in development, ignored in production

Pydantic settings (production-grade config validation):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file='.env')

    aws_region: str = 'us-east-1'
    slack_webhook_url: str  # required — validation error if missing
    deployment_timeout: int = 300

settings = Settings()

What not to do: Hardcode credentials, commit .env files, use config files with secrets in version control.


10. How do you use Python for log analysis and alerting?

Ideal answer:

import re
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def analyze_error_rate(log_file: str, window_minutes: int = 5) -> dict:
    error_counts = defaultdict(int)
    total_counts = defaultdict(int)
    # Log timestamps carry a UTC offset, so the cutoff must be timezone-aware
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=window_minutes)

    with open(log_file) as f:
        for line in f:
            # Parse nginx access log
            match = re.match(
                r'(?P<ip>\S+) .* \[(?P<time>[^\]]+)\] "\S+ (?P<path>\S+) \S+" (?P<status>\d+)',
                line
            )
            if not match:
                continue
            timestamp = datetime.strptime(match.group('time'), '%d/%b/%Y:%H:%M:%S %z')
            if timestamp < cutoff:
                continue  # outside the analysis window
            path = match.group('path')
            status = int(match.group('status'))

            total_counts[path] += 1
            if status >= 500:
                error_counts[path] += 1

    return {
        path: error_counts[path] / total_counts[path]
        for path in total_counts
        if total_counts[path] > 10  # ignore low-traffic endpoints
    }

Alerting to Slack:

import requests
from datetime import datetime

def slack_alert(webhook_url: str, message: str, level: str = 'warning'):
    color = {'critical': '#FF0000', 'warning': '#FFA500', 'info': '#36a64f'}[level]
    requests.post(webhook_url, json={
        'attachments': [{'color': color, 'text': message, 'ts': int(datetime.now().timestamp())}]
    })

Section 3: Advanced Python for DevOps

11. How do you write async Python for DevOps tooling?

Ideal answer:

Async Python is useful for DevOps scripts that make many I/O-bound operations concurrently — checking health endpoints, querying multiple APIs, waiting for multiple resources.

import asyncio
import aiohttp

async def check_endpoint(session: aiohttp.ClientSession, url: str) -> dict:
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as response:
            return {'url': url, 'status': response.status, 'healthy': response.status == 200}
    except Exception as e:
        return {'url': url, 'status': None, 'healthy': False, 'error': str(e)}

async def check_all_services(urls: list[str]) -> list[dict]:
    async with aiohttp.ClientSession() as session:
        tasks = [check_endpoint(session, url) for url in urls]
        return await asyncio.gather(*tasks)

results = asyncio.run(check_all_services([
    'https://service-a.example.com/health',
    'https://service-b.example.com/health',
    'https://service-c.example.com/health',
]))

Checking 100 endpoints sequentially at one second each takes about 100 seconds. With asyncio.gather the checks run concurrently, so the total is bounded by the slowest response (here capped by the 5-second timeout).
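The speedup is easy to demonstrate with the stdlib alone, using asyncio.sleep as a stand-in for network latency:

```python
import asyncio
import time

async def fake_check(name: str, latency: float) -> str:
    await asyncio.sleep(latency)  # stand-in for an HTTP round trip
    return name

async def main() -> float:
    start = time.perf_counter()
    # 20 "checks" of 0.1s each run concurrently, not back-to-back
    await asyncio.gather(*(fake_check(f"svc-{i}", 0.1) for i in range(20)))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"20 concurrent checks took {elapsed:.2f}s")  # roughly 0.1s, not 2s
```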


12. How do you use Python to build a simple health-check automation service?

Ideal answer:

import asyncio
import aiohttp
import boto3
from dataclasses import dataclass

@dataclass
class HealthCheck:
    name: str
    url: str
    expected_status: int = 200

async def run_health_checks(checks: list[HealthCheck]) -> None:
    sns = boto3.client('sns')

    async with aiohttp.ClientSession() as session:
        while True:
            for check in checks:
                try:
                    async with session.get(check.url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
                        if resp.status != check.expected_status:
                            sns.publish(
                                TopicArn='arn:aws:sns:us-east-1:123456789:alerts',
                                Subject=f'Health Check Failed: {check.name}',
                                Message=f'{check.url} returned {resp.status}'
                            )
                except Exception as e:
                    sns.publish(
                        TopicArn='arn:aws:sns:us-east-1:123456789:alerts',
                        Subject=f'Health Check Error: {check.name}',
                        Message=str(e)
                    )
            await asyncio.sleep(60)  # Check every minute

13. How do you write Python Lambda functions for AWS automation?

Ideal answer:

Python is the most popular Lambda runtime. Common DevOps Lambda patterns:

Auto-remediation Lambda (triggered by CloudWatch Alarm):

import boto3
import json

def lambda_handler(event, context):
    # Parse SNS notification from CloudWatch Alarm
    message = json.loads(event['Records'][0]['Sns']['Message'])
    alarm_name = message['AlarmName']

    if 'high-memory' in alarm_name:
        # Extract instance ID from alarm dimensions
        dimensions = message['Trigger']['Dimensions']
        instance_id = next(d['value'] for d in dimensions if d['name'] == 'InstanceId')

        ec2 = boto3.client('ec2')
        # Reboot the instance
        ec2.reboot_instances(InstanceIds=[instance_id])
        return {'action': 'rebooted', 'instance': instance_id}

Best practices:

  • Use environment variables for configuration (not hardcoded ARNs)
  • Keep handler function thin — put business logic in importable modules
  • Use aws_lambda_powertools for logging, tracing, metrics
  • Set appropriate memory and timeout — profile locally with aws-lambda-rie
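The "thin handler" point can be sketched in one file: pure decision logic separated from the Lambda entry point so it unit-tests without any AWS calls (function names here are illustrative):

```python
import json

def decide_action(alarm_name: str) -> str:
    """Pure business logic: trivially unit-testable, no AWS clients."""
    if 'high-memory' in alarm_name:
        return 'reboot'
    return 'ignore'

def lambda_handler(event, context):
    # Thin entry point: parse the event, delegate, return
    message = json.loads(event['Records'][0]['Sns']['Message'])
    return {'action': decide_action(message['AlarmName'])}
```

In a real project decide_action would live in an importable module; the handler stays a few lines long.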

14. How do you test Python DevOps scripts?

Ideal answer:

Unit tests with mocking (for AWS/cloud code):

from datetime import datetime, timedelta, timezone
from unittest.mock import MagicMock

def terminate_old_instances(ec2_client, age_hours=24):
    response = ec2_client.describe_instances()
    # ... terminate instances older than age_hours

def test_terminate_old_instances():
    mock_ec2 = MagicMock()
    mock_ec2.describe_instances.return_value = {
        'Reservations': [{
            'Instances': [{
                'InstanceId': 'i-1234',
                'LaunchTime': datetime.now(timezone.utc) - timedelta(hours=48)
            }]
        }]
    }
    terminate_old_instances(mock_ec2, age_hours=24)
    mock_ec2.terminate_instances.assert_called_once_with(InstanceIds=['i-1234'])

moto (AWS mock library):

from moto import mock_aws  # moto 5.x; earlier versions used per-service mock_ec2

@mock_aws
def test_with_real_boto3():
    ec2 = boto3.client('ec2', region_name='us-east-1')
    # moto intercepts all boto3 calls — no real AWS needed
    ec2.run_instances(ImageId='ami-12345678', MinCount=1, MaxCount=1)

Integration tests: Use a dedicated test AWS account or GCP project. Tag all test resources for easy cleanup. Use IAM boundaries to prevent test code from affecting production.
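A sketch of tag-based cleanup under those conventions (the tag key and value are illustrative; the client is injected so it works with a mock too):

```python
def cleanup_test_instances(ec2_client) -> list:
    """Terminate every instance tagged as an integration-test resource."""
    response = ec2_client.describe_instances(
        Filters=[{'Name': 'tag:purpose', 'Values': ['integration-test']}]
    )
    ids = [
        inst['InstanceId']
        for res in response['Reservations']
        for inst in res['Instances']
    ]
    if ids:
        ec2_client.terminate_instances(InstanceIds=ids)
    return ids
```

Running this as a post-test step (or a scheduled Lambda) keeps the test account from accumulating orphaned resources.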


15. How do you use Python virtual environments and dependency management in DevOps?

Ideal answer:

Virtual environments isolate project dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Poetry (modern dependency management):

poetry init
poetry add boto3 click pydantic-settings
poetry add --group dev pytest moto[ec2]
poetry install

In Docker / CI (pinned dependencies for reproducibility):

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

requirements.txt best practices:

  • Pin exact versions in production (boto3==1.34.0) for reproducibility
  • Separate dev dependencies (requirements-dev.txt)
  • Use pip-compile (pip-tools) to generate pinned requirements from abstract dependencies

2026 tooling: uv (Rust-based Python package manager) is 10-100x faster than pip and becoming standard for CI/CD pipelines.


16. How do you use Python to automate Slack notifications in a DevOps pipeline?

Ideal answer:

import requests
from datetime import datetime

def send_deployment_notification(
    webhook_url: str,
    service: str,
    version: str,
    environment: str,
    success: bool,
    deployer: str = 'CI/CD'
) -> None:
    color = '#36a64f' if success else '#FF0000'
    status = 'Deployed Successfully' if success else 'Deployment Failed'

    requests.post(webhook_url, json={
        'attachments': [{
            'color': color,
            'title': f'{status}: {service}',
            'fields': [
                {'title': 'Service', 'value': service, 'short': True},
                {'title': 'Version', 'value': version, 'short': True},
                {'title': 'Environment', 'value': environment, 'short': True},
                {'title': 'Deployed by', 'value': deployer, 'short': True},
            ],
            'footer': 'InterviewDrill Deploy Bot',
            'ts': int(datetime.now().timestamp())
        }]
    })

Slack SDK for complex interactions:

from slack_sdk import WebClient

client = WebClient(token=slack_bot_token)
client.chat_postMessage(
    channel='#deployments',
    blocks=[...]  # Block Kit for rich formatting
)

17. How do you use Python for infrastructure cost analysis?

Ideal answer:

import boto3
from datetime import datetime, timedelta

def get_weekly_cost_by_service() -> dict:
    ce = boto3.client('ce', region_name='us-east-1')

    end = datetime.now().strftime('%Y-%m-%d')
    start = (datetime.now() - timedelta(days=7)).strftime('%Y-%m-%d')

    response = ce.get_cost_and_usage(
        TimePeriod={'Start': start, 'End': end},
        Granularity='DAILY',
        GroupBy=[{'Type': 'DIMENSION', 'Key': 'SERVICE'}],
        Metrics=['BlendedCost']
    )

    service_costs = {}
    for result in response['ResultsByTime']:
        for group in result['Groups']:
            service = group['Keys'][0]
            cost = float(group['Metrics']['BlendedCost']['Amount'])
            service_costs[service] = service_costs.get(service, 0) + cost

    return dict(sorted(service_costs.items(), key=lambda x: x[1], reverse=True))

costs = get_weekly_cost_by_service()
for service, cost in list(costs.items())[:10]:
    print(f"{service}: ${cost:.2f}")

18. Common Python gotchas in DevOps scripts?

Why they ask this: Real-world experience question — they want to see you've hit production issues, not just written toy scripts.

1. Mutable default arguments:

# WRONG — list is created once and shared across all calls
def add_tag(instance_id, tags=[]):
    tags.append({'Key': 'managed', 'Value': 'true'})  # bug!

# RIGHT
def add_tag(instance_id, tags=None):
    if tags is None:
        tags = []

2. Not handling pagination:

Most AWS APIs return max 100-1000 results. Scripts that don't paginate silently miss data.
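When a paginator isn't available, the manual loop follows NextToken until the API stops returning one. A sketch (client injected, so it works with any boto3-style client):

```python
def list_all_instances(ec2_client) -> list:
    """Collect instances across every page, not just the first."""
    instances = []
    kwargs = {}
    while True:
        page = ec2_client.describe_instances(**kwargs)
        for reservation in page['Reservations']:
            instances.extend(reservation['Instances'])
        token = page.get('NextToken')
        if not token:
            break  # last page reached
        kwargs['NextToken'] = token
    return instances
```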

3. Catching too broadly:

try:
    do_something()
except Exception:
    pass  # swallows every error including programming mistakes

4. Timezone-naive datetimes:

# AWS timestamps are timezone-aware — comparing with naive datetime raises TypeError
from datetime import timezone
now = datetime.now(timezone.utc)  # always use timezone-aware

5. String formatting with secrets:

logger.info(f"Connecting to DB: {connection_string}")  # leaks credentials to logs
logger.info("Connecting to DB: %s", masked_url)  # use parameterized logging
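One stdlib way to produce a masked URL for logging (a sketch; mask_url is an invented helper name):

```python
from urllib.parse import urlsplit, urlunsplit

def mask_url(url: str) -> str:
    """Replace the password component of a URL with *** for safe logging."""
    parts = urlsplit(url)
    if parts.password:
        netloc = parts.netloc.replace(parts.password, '***', 1)
        parts = parts._replace(netloc=netloc)
    return urlunsplit(parts)

print(mask_url('postgres://deploy:s3cret@db.internal:5432/app'))
# -> postgres://deploy:***@db.internal:5432/app
```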

19. How do you structure a Python DevOps project?

Ideal answer:

my-devops-tool/
├── src/
│   └── mytools/
│       ├── __init__.py
│       ├── aws/
│       │   ├── ec2.py
│       │   └── s3.py
│       ├── k8s/
│       │   └── deployments.py
│       └── cli.py          # Click CLI entry points
├── tests/
│   ├── unit/
│   └── integration/
├── pyproject.toml          # Poetry / modern packaging
├── Dockerfile
└── .github/
    └── workflows/
        └── test.yml

Entry points in pyproject.toml:

[tool.poetry.scripts]
mytools = "mytools.cli:cli"

After pip install -e ., users run mytools deploy --env production instead of python -m mytools.cli deploy.

CI setup: Run ruff (linting), mypy (type checking), and pytest with coverage in CI before merging.


20. How do you manage Python in a Docker-based DevOps environment?

Ideal answer:

Multi-stage Dockerfile for Python tools:

FROM python:3.12-slim AS base

# Install uv for fast dependency resolution
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

FROM base AS builder
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project
COPY src/ src/
RUN uv sync --frozen --no-dev

FROM base AS final
WORKDIR /app
COPY --from=builder /app /app
ENV PATH="/app/.venv/bin:$PATH"
ENTRYPOINT ["mytools"]

Key patterns:

  • Use python:3.12-slim not python:3.12 — reduces image size from ~1GB to ~130MB
  • Pin Python version in the image tag — python:3.12 updates; python:3.12.3 doesn't
  • Use uv instead of pip in CI for significantly faster installs
  • Run as non-root: USER 1001
  • Don't include test dependencies in production images

Reading helps. Practicing wins interviews.

Practice these exact questions with an AI interviewer that pushes back. First session completely free.

Start Practicing Free →