Learn Python Series (#39) - Testing Your Code Part 2

Repository

What will I learn?

  • You will learn why mocking exists and what problem it solves in testing;
  • the mental model behind test isolation and dependency replacement;
  • what test coverage actually measures and what it doesn't tell you;
  • how test-driven development changes the way you write code;
  • the difference between testing behavior vs testing implementation.

Requirements

  • A working modern computer running macOS, Windows or Ubuntu;
  • An installed Python 3(.11+) distribution;
  • The ambition to learn Python programming.

Difficulty

  • Intermediate

Curriculum (of the Learn Python Series):

GitHub Account

https://github.com/realScipio

Learn Python Series (#39) - Testing Your Code Part 2

In episode #38, we covered the fundamentals of testing - assertions, fixtures, test organization. But we tested simple functions that don't depend on anything external. Real code isn't that isolated.

Your code makes HTTP requests. It reads files. It queries databases. It uses the current time. It sends emails. How do you test code that depends on systems you don't control?

This is the problem mocking solves.

Nota bene: This episode is about test isolation - understanding WHY and WHEN to fake external dependencies, not just HOW to use the mock library.

The isolation problem

Imagine testing a function that fetches cryptocurrency prices from an API:

import requests

def get_bitcoin_price():
    response = requests.get("https://api.example.com/price/bitcoin")
    return response.json()["price"]

If a test calls this function directly, it makes a real HTTP request. What happens?

The test is slow. Network requests take hundreds of milliseconds. Run 100 tests, wait 10+ seconds. Fast tests are crucial for rapid feedback.

The test is unreliable. The API might be down. Your network might be offline. Rate limits might block you. The test fails not because YOUR code broke, but because something external failed. This is called a "flaky test" - sometimes passes, sometimes fails, unpredictably.

The test affects external state. What if this function sends an email, or charges a credit card, or posts to social media? Running tests would trigger real actions. Dangerous.

The test depends on specific data. The API returns whatever the current Bitcoin price is. Your assertion assert price == 70000 fails tomorrow when the price changes. You're testing the API's behavior, not your code's behavior.

The core insight: you want to test YOUR CODE, not the external systems it depends on. The solution: replace those external dependencies with fakes you control.

Mocking: controlled fakes

A mock is a fake object that simulates the behavior of a real object. You configure it to return specific values, raise specific exceptions, track how it was called. Then you swap it in place of the real object during testing.

The mental model: think of mocks like stunt doubles in movies. The real actor is expensive, difficult to schedule, sometimes risky. The stunt double looks similar enough and follows the script you give them. The movie treats them as the real person, but you control them completely.

In testing, the "script" is: "When someone calls this method, return this value." The test verifies your code behaves correctly given that scripted response.

Test doubles: the vocabulary

Before diving into Python's mock library, understand the terminology:

Mock: An object that records how it was used and lets you make assertions about those interactions. "Was this method called? How many times? With what arguments?"

Stub: A fake that returns predefined values. You configure: "When get_price() is called, return 70000." No interaction tracking, just canned responses.

Spy: A wrapper around a real object that records calls while still delegating to the real implementation. Useful for verifying interactions without changing behavior.

Fake: A working implementation with simplified behavior. Example: an in-memory database instead of PostgreSQL. It works, but takes shortcuts for testing purposes.

Python's unittest.mock library provides Mock objects that can act as mocks, stubs, or spies depending on how you use them. The distinction matters conceptually more than practically.
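
As a minimal sketch (using a hypothetical fake_client object, not any real API client), the same Mock can serve as a stub and as a mock at once:

from unittest.mock import Mock

fake_client = Mock()
fake_client.get_price.return_value = 70000   # stub behavior: a canned response

assert fake_client.get_price("bitcoin") == 70000

# mock behavior: inspect how the object was used
fake_client.get_price.assert_called_once_with("bitcoin")
assert fake_client.get_price.call_count == 1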

Patching: temporary replacement

The patch mechanism temporarily replaces an object (function, class, method) with a mock for the duration of a test. After the test, the original is restored.

Think of it like prop substitution in theater. For the scene where the character breaks a vase, you swap the real expensive vase with a breakable prop. After the scene, the real vase goes back on stage.

In code:

from unittest.mock import patch

@patch('requests.get')
def test_get_bitcoin_price(mock_get):
    mock_get.return_value.json.return_value = {"price": 70000}

    price = get_bitcoin_price()

    assert price == 70000

During this test, requests.get doesn't make a real HTTP request. It's replaced with mock_get, which you've configured to return a fake response. Your function calls requests.get() thinking it's the real thing, gets the fake response, and proceeds normally. The test verifies your code correctly extracts the price from the response structure.

After the test, requests.get is the real function again. No permanent changes.
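
Because mock_get records its calls, the same approach can also verify the interaction, not just the return value. A possible extension of the test above:

from unittest.mock import patch

@patch('requests.get')
def test_get_bitcoin_price_calls_the_api(mock_get):
    mock_get.return_value.json.return_value = {"price": 70000}

    get_bitcoin_price()

    # Verify the request went to the expected endpoint exactly once.
    mock_get.assert_called_once_with("https://api.example.com/price/bitcoin")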

What to mock, what not to mock

Mocking is powerful but can be misused. The guideline: mock external dependencies you don't control, not your own code.

Mock these:

  • HTTP requests to external APIs (requests.get, httpx.get)
  • Database connections and queries
  • File system operations that aren't part of what you're testing
  • Email sending (SMTP)
  • Current time/date when timing matters
  • Random number generators when you need deterministic tests
  • Third-party library calls you don't control

Don't mock these:

  • Your own functions and classes (test them directly)
  • Simple data structures (lists, dicts - just use real ones)
  • Built-in Python operations that are fast and reliable
  • The code under test itself (defeats the purpose)

Mocking your own code creates fragile tests coupled to implementation details. If you refactor, tests break even though behavior didn't change. Mock at boundaries - where your code talks to external systems.
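
One way to apply this guideline, sketched here by refactoring the earlier get_bitcoin_price around a hypothetical extract_price helper: push the pure logic into its own function so it can be tested with a plain dict, and keep the mocked boundary as thin as possible.

import requests

def extract_price(payload):
    # Pure logic: test this directly with a real dict, no mock required.
    return payload["price"]

def get_bitcoin_price():
    # Boundary: only this thin wrapper needs requests.get to be mocked.
    response = requests.get("https://api.example.com/price/bitcoin")
    return extract_price(response.json())

def test_extract_price():
    assert extract_price({"price": 70000}) == 70000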

Testing exceptions and error paths

Real systems fail. APIs return errors. Databases disconnect. Files don't exist. Your code needs error handling, and you need to test that handling.

Mocks let you simulate failures on demand:

import requests
from unittest.mock import patch

@patch('requests.get')
def test_handles_api_timeout(mock_get):
    mock_get.side_effect = requests.Timeout("Connection timed out")

    result = get_bitcoin_price_with_fallback()

    assert result is None

The side_effect attribute makes the mock raise an exception instead of returning a value. This tests your error handling path without waiting for a real timeout or manually breaking your network.
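
The test assumes a get_bitcoin_price_with_fallback function that swallows the timeout. It isn't shown in this episode, but a minimal hypothetical version could look like this:

import requests

def get_bitcoin_price_with_fallback():
    # Hypothetical sketch: return None instead of propagating network errors.
    try:
        response = requests.get("https://api.example.com/price/bitcoin", timeout=5)
        return response.json()["price"]
    except requests.Timeout:
        return None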

You can verify your code handles failures gracefully: returns defaults, logs errors, retries appropriately, whatever your error handling strategy is.
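
side_effect also accepts an iterable: each call consumes the next item, and any item that is an exception is raised instead of returned. A small sketch, handy when testing retry behavior:

import requests
from unittest.mock import Mock

flaky_get = Mock(side_effect=[requests.Timeout("first attempt"), "second attempt"])

# The first call raises, the second call returns normally.
try:
    flaky_get()
except requests.Timeout:
    pass
assert flaky_get() == "second attempt"
assert flaky_get.call_count == 2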

Test coverage: what it measures

Test coverage tools count which lines of your code execute during tests. The metric: "What percentage of statements were run?"

Install pytest-cov and run:

pytest --cov=myproject

You'll see output like: "calculator.py: 85% coverage". This means 85% of lines in calculator.py executed during your test suite.

But here's the critical insight: coverage measures execution, not correctness. A line that executed isn't necessarily tested properly.

Consider:

def divide(a, b):
    result = a / b
    return result

This test achieves 100% coverage:

def test_divide():
    divide(10, 2)

Every line runs. But there's no assertion! The test verifies nothing. The function could return wrong results and the test would still pass.
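
For contrast, tests of the same function with meaningful assertions, including the error path (pytest.raises is the standard way to assert that an exception is raised):

import pytest

def test_divide_returns_quotient():
    assert divide(10, 2) == 5

def test_divide_by_zero_raises():
    with pytest.raises(ZeroDivisionError):
        divide(10, 0)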

Coverage tells you what code ISN'T tested (0% coverage lines definitely have no tests). It doesn't tell you what code IS tested well. Use coverage to find untested code, not as proof of quality.

Aim for high coverage, but focus on meaningful assertions. 100% coverage with weak tests is worse than 80% coverage with strong tests.

Test-driven development: tests first, code second

TDD inverts the normal workflow. Instead of: write code, then write tests, you do: write test, then write code.

The cycle:

1. Red: Write a test for behavior that doesn't exist yet. Run it. It fails (red).

2. Green: Write the minimal code to make that test pass. Run it. It succeeds (green).

3. Refactor: Improve the code without changing behavior. Tests stay green.

Repeat this cycle for every small piece of functionality.
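
As a tiny illustration (using a hypothetical slugify function, not code from this series), one pass through the cycle might look like this:

# Red: this test is written first and fails, because slugify doesn't exist yet.
def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("Learn Python Series") == "learn-python-series"

# Green: the minimal implementation that makes the test pass.
def slugify(text):
    return text.lower().replace(" ", "-")

# Refactor: tidy up (e.g. strip surrounding whitespace) while the test stays green.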

Why does this matter? TDD forces you to think about interface before implementation. The test describes WHAT you want, forcing you to clarify requirements. Then you implement HOW.

This often leads to simpler, more focused code. You only write what's needed to pass the test. No speculative features, no over-engineering. The tests become executable specifications of expected behavior.

TDD isn't always appropriate. For exploratory work where you don't know what you're building yet, write code first. For well-defined features with clear requirements, TDD shines.

Behavior vs implementation testing

This distinction is subtle but crucial.

Behavior testing: Verify what the code DOES from the outside. "Given input X, it returns Y." "When the database is unavailable, it raises DatabaseError." These tests describe user-visible behavior.

Implementation testing: Verify HOW the code works internally. "It calls the cache before querying the database." "It uses binary search instead of linear search."

Behavior tests are resilient to refactoring. You can completely rewrite the internals - change algorithms, restructure classes, swap dependencies - and behavior tests still pass if the external behavior is unchanged.

Implementation tests are fragile. Change how the code works and tests break even though the behavior is identical.

Prefer behavior testing. Mock external dependencies at boundaries, but test your own code's behavior, not its implementation details. This lets you refactor confidently.
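
To make the contrast concrete, here is a hypothetical PriceService with an internal cache, tested both ways; only the first test survives a change of caching strategy:

from unittest.mock import Mock

class PriceService:
    def __init__(self, client):
        self._client = client
        self._cache = {}

    def price(self, symbol):
        if symbol not in self._cache:
            self._cache[symbol] = self._client.fetch(symbol)
        return self._cache[symbol]

def test_price_behavior():
    # Behavior test: given a client that returns 70000, price() returns 70000.
    client = Mock()
    client.fetch.return_value = 70000
    assert PriceService(client).price("bitcoin") == 70000

def test_price_caches_lookups():
    # Implementation test: asserts the caching detail, so it breaks if the
    # cache strategy changes even though the results are identical.
    client = Mock()
    client.fetch.return_value = 70000
    service = PriceService(client)
    service.price("bitcoin")
    service.price("bitcoin")
    client.fetch.assert_called_once_with("bitcoin")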

Practical example: testing with time

Code that depends on current time is hard to test. The time changes every second. How do you write reproducible tests?

Mock the time:

# mymodule.py
from datetime import datetime

def is_weekend():
    return datetime.now().weekday() >= 5

# test_mymodule.py
from datetime import datetime
from unittest.mock import patch

from mymodule import is_weekend

@patch('mymodule.datetime')
def test_is_weekend_on_saturday(mock_datetime):
    mock_datetime.now.return_value = datetime(2026, 2, 14)  # a Saturday
    assert is_weekend() is True

@patch('mymodule.datetime')
def test_is_weekend_on_tuesday(mock_datetime):
    mock_datetime.now.return_value = datetime(2026, 2, 10)  # a Tuesday
    assert is_weekend() is False

By controlling time, you make tests deterministic. They pass today, tomorrow, and five years from now, because the time is fixed during the test.

Configuration and markers

For real projects, configure pytest behavior. Add to pyproject.toml:

[tool.pytest.ini_options]
testpaths = ["tests"]
markers = [
    "slow: tests that take >1 second",
    "integration: tests requiring external services",
]

Mark tests by category:

@pytest.mark.slow
def test_large_computation():
    pass

@pytest.mark.integration
def test_database_connection():
    pass

Then run selectively:

pytest -m "not slow"

This runs only fast tests - useful during development when you want instant feedback. Run the full suite (including slow and integration tests) in CI before merging.

What you should remember

In this episode, we covered advanced testing concepts:

  • Why mocking exists: isolating code from unreliable, slow, or dangerous external dependencies
  • The mental model of test doubles and patching
  • What to mock (external boundaries) vs what to test directly (your own code)
  • Test coverage measures execution, not correctness - use it to find gaps, not prove quality
  • Test-driven development: write tests first to clarify requirements and drive design
  • Behavior testing vs implementation testing: test WHAT code does, not HOW it does it
  • Mocking time and other environmental factors for deterministic tests

Testing isn't about achieving metrics. It's about building confidence that your code works correctly and continues working as you change it.

Thanks, and see you next time!

@scipio