Learn Python Series (#39) - Testing Your Code Part 2

Repository
What will I learn?
- You will learn why mocking exists and what problem it solves in testing;
- the mental model behind test isolation and dependency replacement;
- what test coverage actually measures and what it doesn't tell you;
- how test-driven development changes the way you write code;
- the difference between testing behavior vs testing implementation.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- An installed Python 3(.11+) distribution;
- The ambition to learn Python programming.
Difficulty
- Intermediate
Curriculum (of the Learn Python Series):
- Learn Python Series - Intro
- Learn Python Series (#2) - Handling Strings Part 1
- Learn Python Series (#3) - Handling Strings Part 2
- Learn Python Series (#4) - Round-Up #1
- Learn Python Series (#5) - Handling Lists Part 1
- Learn Python Series (#6) - Handling Lists Part 2
- Learn Python Series (#7) - Handling Dictionaries
- Learn Python Series (#8) - Handling Tuples
- Learn Python Series (#9) - Using Import
- Learn Python Series (#10) - Matplotlib Part 1
- Learn Python Series (#11) - NumPy Part 1
- Learn Python Series (#12) - Handling Files
- Learn Python Series (#13) - Mini Project - Developing a Web Crawler Part 1
- Learn Python Series (#14) - Mini Project - Developing a Web Crawler Part 2
- Learn Python Series (#15) - Handling JSON
- Learn Python Series (#16) - Mini Project - Developing a Web Crawler Part 3
- Learn Python Series (#17) - Roundup #2 - Combining and analyzing any-to-any multi-currency historical data
- Learn Python Series (#19) - PyMongo Part 2
- Learn Python Series (#20) - PyMongo Part 3
- Learn Python Series (#21) - Handling Dates and Time Part 1
- Learn Python Series (#22) - Handling Dates and Time Part 2
- Learn Python Series (#23) - Handling Regular Expressions Part 1
- Learn Python Series (#24) - Handling Regular Expressions Part 2
- Learn Python Series (#25) - Handling Regular Expressions Part 3
- Learn Python Series (#26) - pipenv & Visual Studio Code
- Learn Python Series (#27) - Handling Strings Part 3 (F-Strings)
- Learn Python Series (#28) - Using Pickle and Shelve
- Learn Python Series (#29) - Handling CSV
- Learn Python Series (#30) - Data Science Part 1 - Pandas
- Learn Python Series (#31) - Data Science Part 2 - Pandas
- Learn Python Series (#32) - Data Science Part 3 - Pandas
- Learn Python Series (#33) - Data Science Part 4 - Pandas
- Learn Python Series (#34) - Working with APIs in 2026: What's Changed
- Learn Python Series (#35) - Working with APIs Part 2: Beyond GET Requests
- Learn Python Series (#36) - Type Hints and Modern Python
- Learn Python Series (#37) - Virtual Environments and Dependency Management
- Learn Python Series (#38) - Testing Your Code Part 1
- Learn Python Series (#39) - Testing Your Code Part 2 (this post)
GitHub Account
Learn Python Series (#39) - Testing Your Code Part 2
In episode #38, we covered the fundamentals of testing - assertions, fixtures, test organization. But we tested simple functions that don't depend on anything external. Real code isn't that isolated.
Your code makes HTTP requests. It reads files. It queries databases. It uses the current time. It sends emails. How do you test code that depends on systems you don't control?
This is the problem mocking solves.
Nota bene: This episode is about test isolation - understanding WHY and WHEN to fake external dependencies, not just HOW to use the mock library.
The isolation problem
Imagine testing a function that fetches cryptocurrency prices from an API:
import requests

def get_bitcoin_price():
    response = requests.get("https://api.example.com/price/bitcoin")
    return response.json()["price"]
A straightforward test would just call this function and assert on the result - which means every test run makes a real HTTP request. What happens?
The test is slow. Network requests take hundreds of milliseconds. Run 100 tests, wait 10+ seconds. Fast tests are crucial for rapid feedback.
The test is unreliable. The API might be down. Your network might be offline. Rate limits might block you. The test fails not because YOUR code broke, but because something external failed. This is called a "flaky test" - sometimes passes, sometimes fails, unpredictably.
The test affects external state. What if this function sends an email, or charges a credit card, or posts to social media? Running tests would trigger real actions. Dangerous.
The test depends on specific data. The API returns whatever the current Bitcoin price is. Your assertion assert price == 70000 fails tomorrow when the price changes. You're testing the API's behavior, not your code's behavior.
The core insight: you want to test YOUR CODE, not the external systems it depends on. The solution: replace those external dependencies with fakes you control.
Mocking: controlled fakes
A mock is a fake object that simulates the behavior of a real object. You configure it to return specific values, raise specific exceptions, track how it was called. Then you swap it in place of the real object during testing.
The mental model: think of mocks like stunt doubles in movies. The real actor is expensive, difficult to schedule, sometimes risky. The stunt double looks similar enough and follows the script you give them. The movie treats them as the real person, but you control them completely.
In testing, the "script" is: "When someone calls this method, return this value." The test verifies your code behaves correctly given that scripted response.
Test doubles: the vocabulary
Before diving into Python's mock library, understand the terminology:
Mock: An object that records how it was used and lets you make assertions about those interactions. "Was this method called? How many times? With what arguments?"
Stub: A fake that returns predefined values. You configure: "When get_price() is called, return 70000." No interaction tracking, just canned responses.
Spy: A wrapper around a real object that records calls while still delegating to the real implementation. Useful for verifying interactions without changing behavior.
Fake: A working implementation with simplified behavior. Example: an in-memory database instead of PostgreSQL. It works, but takes shortcuts for testing purposes.
Python's unittest.mock library provides Mock objects that can act as mocks, stubs, or spies depending on how you use them. The distinction matters conceptually more than practically.
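To make this concrete, here is a minimal sketch (the price_source object and get_price method are purely illustrative names, not part of any earlier example) showing how one Mock object can act as a stub first and as a spy afterwards:

from unittest.mock import Mock

# Stub behavior: configure a canned response up front.
price_source = Mock()
price_source.get_price.return_value = 70000

# Code under test would call the fake exactly as it would the real object.
assert price_source.get_price("bitcoin") == 70000

# Spy behavior: afterwards, inspect how the fake was used.
price_source.get_price.assert_called_once_with("bitcoin")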
Patching: temporary replacement
The patch mechanism temporarily replaces an object (function, class, method) with a mock for the duration of a test. After the test, the original is restored.
Think of it like prop substitution in theater. For the scene where the character breaks a vase, you swap the real expensive vase with a breakable prop. After the scene, the real vase goes back on stage.
In code:
from unittest.mock import patch

@patch('requests.get')
def test_get_bitcoin_price(mock_get):
    mock_get.return_value.json.return_value = {"price": 70000}
    price = get_bitcoin_price()
    assert price == 70000
During this test, requests.get doesn't make a real HTTP request. It's replaced with mock_get, which you've configured to return a fake response. Your function calls requests.get() thinking it's the real thing, gets the fake response, and proceeds normally. The test verifies your code correctly extracts the price from the response structure.
After the test, requests.get is the real function again. No permanent changes.
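Because the mock also records every call made to it, you could extend the same test with an interaction check - for instance, verifying that your code requested the expected URL:

mock_get.assert_called_once_with("https://api.example.com/price/bitcoin")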
What to mock, what not to mock
Mocking is powerful but can be misused. The guideline: mock external dependencies you don't control, not your own code.
Mock these:
- HTTP requests to external APIs (requests.get, httpx.get)
- Database connections and queries
- File system operations that aren't part of what you're testing
- Email sending (SMTP)
- Current time/date when timing matters
- Random number generators when you need deterministic tests
- Third-party library calls you don't control
Don't mock these:
- Your own functions and classes (test them directly)
- Simple data structures (lists, dicts - just use real ones)
- Built-in Python operations that are fast and reliable
- The code under test itself (defeats the purpose)
Mocking your own code creates fragile tests coupled to implementation details. If you refactor, tests break even though behavior didn't change. Mock at boundaries - where your code talks to external systems.
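One way to keep that boundary explicit is to separate the network call from the logic around it. A rough sketch (the function names here are illustrative, not taken from the earlier example):

import requests

def fetch_price_data(symbol):
    # Boundary: the only place that touches the network. Mock this in tests.
    response = requests.get(f"https://api.example.com/price/{symbol}")
    return response.json()

def extract_price(payload):
    # Pure logic: test it directly with plain dicts, no mocking needed.
    return payload["price"]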
Testing exceptions and error paths
Real systems fail. APIs return errors. Databases disconnect. Files don't exist. Your code needs error handling, and you need to test that handling.
Mocks let you simulate failures on demand:
import requests
from unittest.mock import patch

@patch('requests.get')
def test_handles_api_timeout(mock_get):
    mock_get.side_effect = requests.Timeout("Connection timed out")
    result = get_bitcoin_price_with_fallback()
    assert result is None
The side_effect attribute makes the mock raise an exception instead of returning a value. This tests your error handling path without waiting for a real timeout or manually breaking your network.
You can verify your code handles failures gracefully: returns defaults, logs errors, retries appropriately, whatever your error handling strategy is.
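For context, here is a minimal sketch of what a fallback function like get_bitcoin_price_with_fallback might look like - the test above only references its name, so this structure is an assumption:

import requests

def get_bitcoin_price_with_fallback():
    try:
        response = requests.get("https://api.example.com/price/bitcoin", timeout=5)
        return response.json()["price"]
    except requests.Timeout:
        # Graceful degradation: report "no price available" instead of crashing.
        return None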
Test coverage: what it measures
Test coverage tools count which lines of your code execute during tests. The metric: "What percentage of statements were run?"
Install pytest-cov and run:
pytest --cov=myproject
You'll see output like: "calculator.py: 85% coverage". This means 85% of lines in calculator.py executed during your test suite.
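If you also want to see which lines were missed rather than just the percentage, pytest-cov can list them per file:

pytest --cov=myproject --cov-report=term-missing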
But here's the critical insight: coverage measures execution, not correctness. A line that executed isn't necessarily tested properly.
Consider:
def divide(a, b):
    result = a / b
    return result
This test achieves 100% coverage:
def test_divide():
    divide(10, 2)
Every line runs. But there's no assertion! The test verifies nothing. The function could return wrong results and the test would still pass.
Coverage tells you what code ISN'T tested (0% coverage lines definitely have no tests). It doesn't tell you what code IS tested well. Use coverage to find untested code, not as proof of quality.
Aim for high coverage, but focus on meaningful assertions. 100% coverage with weak tests is worse than 80% coverage with strong tests.
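As a sketch, a stronger version of the earlier test keeps the same coverage but actually pins down behavior, including the failure case:

import pytest

def test_divide_returns_quotient():
    assert divide(10, 2) == 5

def test_divide_by_zero_raises():
    with pytest.raises(ZeroDivisionError):
        divide(10, 0)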
Test-driven development: tests first, code second
TDD inverts the normal workflow. Instead of: write code, then write tests, you do: write test, then write code.
The cycle:
1. Red: Write a test for behavior that doesn't exist yet. Run it. It fails (red).
2. Green: Write the minimal code to make that test pass. Run it. It succeeds (green).
3. Refactor: Improve the code without changing behavior. Tests stay green.
Repeat this cycle for every small piece of functionality.
Why does this matter? TDD forces you to think about interface before implementation. The test describes WHAT you want, forcing you to clarify requirements. Then you implement HOW.
This often leads to simpler, more focused code. You only write what's needed to pass the test. No speculative features, no over-engineering. The tests become executable specifications of expected behavior.
TDD isn't always appropriate. For exploratory work where you don't know what you're building yet, write code first. For well-defined features with clear requirements, TDD shines.
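Here is a sketch of a single red-green iteration, using a hypothetical slugify helper: the test is written first and fails because the function doesn't exist yet, then the minimal implementation makes it pass:

# Red: at first this test fails with a NameError, because slugify doesn't exist yet.
def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("Learn Python Series") == "learn-python-series"

# Green: the smallest implementation that makes the test pass.
def slugify(text):
    return text.lower().replace(" ", "-")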
Behavior vs implementation testing
This distinction is subtle but crucial.
Behavior testing: Verify what the code DOES from the outside. "Given input X, it returns Y." "When the database is unavailable, it raises DatabaseError." These tests describe user-visible behavior.
Implementation testing: Verify HOW the code works internally. "It calls the cache before querying the database." "It uses binary search instead of linear search."
Behavior tests are resilient to refactoring. You can completely rewrite the internals - change algorithms, restructure classes, swap dependencies - and behavior tests still pass if the external behavior is unchanged.
Implementation tests are fragile. Change how the code works and tests break even though the behavior is identical.
Prefer behavior testing. Mock external dependencies at boundaries, but test your own code's behavior, not its implementation details. This lets you refactor confidently.
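A sketch of the contrast, using a hypothetical PriceStore class that happens to cache lookups internally:

class PriceStore:
    def __init__(self, prices):
        self._prices = prices
        self._cache = {}

    def lookup(self, symbol):
        # Internal detail: results are cached after the first lookup.
        if symbol not in self._cache:
            self._cache[symbol] = self._prices[symbol]
        return self._cache[symbol]

# Behavior test: only the observable result matters; refactoring the cache
# away would not break it.
def test_lookup_returns_stored_value():
    store = PriceStore({"bitcoin": 70000})
    assert store.lookup("bitcoin") == 70000

# Implementation test (fragile): breaks if the caching detail changes,
# even though lookup() still returns exactly the same results.
def test_lookup_checks_cache_first():
    store = PriceStore({"bitcoin": 70000})
    store.lookup("bitcoin")
    assert "bitcoin" in store._cache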
Practical example: testing with time
Code that depends on current time is hard to test. The time changes every second. How do you write reproducible tests?
Mock the time:
from datetime import datetime
from unittest.mock import patch

# Assume is_weekend() lives in mymodule.py and is imported into the test file:
# from mymodule import is_weekend
def is_weekend():
    return datetime.now().weekday() >= 5

@patch('mymodule.datetime')
def test_is_weekend_on_saturday(mock_datetime):
    mock_datetime.now.return_value = datetime(2026, 2, 14)  # a Saturday
    assert is_weekend() is True

@patch('mymodule.datetime')
def test_is_weekend_on_tuesday(mock_datetime):
    mock_datetime.now.return_value = datetime(2026, 2, 10)  # a Tuesday
    assert is_weekend() is False
By controlling time, you make tests deterministic. They pass today, tomorrow, and five years from now, because the time is fixed during the test.
Configuration and markers
For real projects, configure pytest behavior. Add to pyproject.toml:
[tool.pytest.ini_options]
testpaths = ["tests"]
markers = [
    "slow: tests that take >1 second",
    "integration: tests requiring external services",
]
Mark tests by category:
import pytest

@pytest.mark.slow
def test_large_computation():
    pass

@pytest.mark.integration
def test_database_connection():
    pass
Then run selectively:
pytest -m "not slow"
This runs only fast tests - useful during development when you want instant feedback. Run the full suite (including slow and integration tests) in CI before merging.
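Marker expressions can also be combined. For example, to skip both slow and integration tests during development, or to run only the integration tests before a release:

pytest -m "not slow and not integration"
pytest -m integration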
What you should remember
In this episode, we covered advanced testing concepts:
- Why mocking exists: isolating code from unreliable, slow, or dangerous external dependencies
- The mental model of test doubles and patching
- What to mock (external boundaries) vs what to test directly (your own code)
- Test coverage measures execution, not correctness - use it to find gaps, not prove quality
- Test-driven development: write tests first to clarify requirements and drive design
- Behavior testing vs implementation testing: test WHAT code does, not HOW it does it
- Mocking time and other environmental factors for deterministic tests
Testing isn't about achieving metrics. It's about building confidence that your code works correctly and continues working as you change it.