Python Logging: A Stroll Through the Source Code

by Brad Solomon · May 22, 2019

The Python logging package is a lightweight but extensible package for keeping better track of what your own code does. Using it gives you much more flexibility than just littering your code with superfluous print() calls.

However, Python’s logging package can be complicated in certain spots. Handlers, loggers, levels, namespaces, filters: it’s not easy to keep track of all of these pieces and how they interact.

One way to tie up the loose ends in your understanding of logging is to peek under the hood at its CPython source code. The Python code behind logging is concise and modular, and reading through it can help you get that aha moment.

This article is meant to complement the logging HOWTO document as well as Logging in Python, which is a walkthrough on how to use the package.

By the end of this article, you’ll be familiar with the following:

  • Logging levels and how they work
  • Thread safety versus process safety in logging
  • The design of logging from an OOP perspective
  • Logging in libraries versus applications
  • Best practices and design patterns for using logging

For the most part, we’ll go line-by-line down the core module in Python’s logging package in order to build a picture of how it’s laid out.

How to Follow Along

Because the logging source code is central to this article, you can assume that any code block or link is based on a specific commit in the Python 3.7 CPython repository, namely commit d730719. You can find the logging package itself in the Lib/ directory within the CPython source.

Within the logging package, most of the heavy lifting occurs within logging/__init__.py, which is the file you’ll spend the most time on here:

cpython/
│
├── Lib/
│   ├── logging/
│   │   ├── __init__.py
│   │   ├── config.py
│   │   └── handlers.py
│   ├── ...
├── Modules/
├── Include/
...
... [truncated]

With that, let’s jump in.

Preliminaries

Before we get to the heavyweight classes, the top hundred lines or so of __init__.py introduce a few subtle but important concepts.

Preliminary #1: A Level Is Just an int!

Objects like logging.INFO or logging.DEBUG can seem a bit opaque. What are these variables internally, and how are they defined?

In fact, the uppercase constants from Python’s logging are just integers, forming an enum-like collection of numerical levels:

CRITICAL = 50
FATAL = CRITICAL
ERROR = 40
WARNING = 30
WARN = WARNING
INFO = 20
DEBUG = 10
NOTSET = 0

Why not just use the strings "INFO" or "DEBUG"? Levels are int constants to allow for the simple, unambiguous comparison of one level with another. They are given names as well to lend them semantic meaning. Saying that a message has a severity of 50 may not be immediately clear, but saying that it has a level of CRITICAL lets you know that you’ve got a flashing red light somewhere in your program.
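
Because levels are just integers, you can compare them directly, which is exactly the operation logging performs when it checks a message against a threshold:

>>> import logging
>>> logging.ERROR > logging.WARNING  # 40 > 30
True
>>> logging.getLevelName(logging.CRITICAL)
'CRITICAL'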

Now, technically, you can pass just the str form of a level in some places, such as logger.setLevel("DEBUG"). Internally, this will call _checkLevel(), which ultimately does a dict lookup for the corresponding int:

_nameToLevel = {
    'CRITICAL': CRITICAL,
    'FATAL': FATAL,
    'ERROR': ERROR,
    'WARN': WARNING,
    'WARNING': WARNING,
    'INFO': INFO,
    'DEBUG': DEBUG,
    'NOTSET': NOTSET,
}

def _checkLevel(level):
    if isinstance(level, int):
        rv = level
    elif str(level) == level:
        if level not in _nameToLevel:
            raise ValueError("Unknown level: %r" % level)
        rv = _nameToLevel[level]
    else:
        raise TypeError("Level not an integer or a valid string: %r" % level)
    return rv

Which should you prefer? I’m not too opinionated on this, but it’s notable that the logging docs consistently use the form logging.DEBUG rather than "DEBUG" or 10. Also, passing the str form isn’t an option in Python 2, and some logging methods such as logger.isEnabledFor() will accept only an int, not its str cousin.
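
Here's a quick REPL sketch showing both forms side by side (the logger name "demo" is arbitrary):

>>> import logging
>>> logger = logging.getLogger("demo")
>>> logger.setLevel("DEBUG")  # The str form works here...
>>> logger.level
10
>>> logger.isEnabledFor(logging.DEBUG)  # ...but this method needs an int
True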

Preliminary #2: Logging Is Thread-Safe, but Not Process-Safe

A few lines down, you’ll find the following short code block, which is sneakily critical to the whole package:

import threading

_lock = threading.RLock()

def _acquireLock():
    if _lock:
        _lock.acquire()

def _releaseLock():
    if _lock:
        _lock.release()

The _lock object is a reentrant lock that sits in the global namespace of the logging/__init__.py module. It makes pretty much every object and operation in the entire logging package thread-safe, enabling threads to do read and write operations without the threat of a race condition. You can see in the module source code that calls to _acquireLock() and _releaseLock() appear throughout the module and its classes.
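
For example, here's roughly how Logger.addHandler() guards its critical section, lightly condensed from the module source:

def addHandler(self, hdlr):
    """Add the specified handler to this logger."""
    _acquireLock()
    try:
        if hdlr not in self.handlers:
            self.handlers.append(hdlr)
    finally:
        _releaseLock()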

There’s something not accounted for here, though: what about process safety? The short answer is that the logging module is not process-safe. This isn’t inherently a fault of logging: generally, two processes can’t write to the same file without a lot of proactive effort on the part of the programmer first.

This means that you’ll want to be careful before using classes such as logging.FileHandler with multiprocessing involved. If two processes want to read from and write to the same underlying file concurrently, then you can run into a nasty bug halfway through a long-running routine.

If you want to get around this limitation, there’s a thorough recipe in the official Logging Cookbook. Because this entails a decent amount of setup, one alternative is to have each process log to a separate file based on its process ID, which you can grab with os.getpid().
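
Here's a minimal sketch of that per-process approach, assuming that /tmp is writable and that the myapp file prefix is just a placeholder:

import logging
import os

# Each process gets its own file, so no two processes ever
# contend for the same underlying stream.
handler = logging.FileHandler(f"/tmp/myapp-{os.getpid()}.log")
logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.warning("Hello from process %d", os.getpid())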

Package Architecture: Logging’s MRO

Now that we’ve covered some preliminary setup code, let’s take a high-level look at how logging is laid out. The logging package uses a healthy dose of OOP and inheritance. Here’s a partial look at the method resolution order (MRO) for some of the most important classes in the package:

object
│
├── LogRecord
├── Filterer
│   ├── Logger
│   │   └── RootLogger
│   └── Handler
│       ├── StreamHandler
│       └── NullHandler
├── Filter
└── Manager

The tree diagram above doesn’t cover all of the classes in the module, just those that are most worth highlighting.
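
You can confirm any branch of this tree yourself by inspecting a class's __mro__ attribute:

>>> import logging
>>> logging.Logger.__mro__
(<class 'logging.Logger'>, <class 'logging.Filterer'>, <class 'object'>)
>>> logging.StreamHandler.__mro__
(<class 'logging.StreamHandler'>, <class 'logging.Handler'>,
 <class 'logging.Filterer'>, <class 'object'>)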

This litany of classes is typically one source of confusion because there’s a lot going on, and it’s all jargon-heavy. Filter versus Filterer? Logger versus Handler? It can be challenging to keep track of everything, much less visualize how it fits together. A picture is worth a thousand words, so here’s a diagram of a scenario where one logger with two handlers attached to it writes a log message with level logging.INFO:

[Figure: Flow of logging objects (Image: Real Python)]

In Python code, everything above would look like this:

import logging
import sys

logger = logging.getLogger("pylog")
logger.setLevel(logging.DEBUG)
h1 = logging.FileHandler(filename="/tmp/records.log")
h1.setLevel(logging.INFO)
h2 = logging.StreamHandler(sys.stderr)
h2.setLevel(logging.ERROR)
logger.addHandler(h1)
logger.addHandler(h2)
logger.info("testing %d.. %d.. %d..", 1, 2, 3)

There’s a more detailed map of this flow in the Logging HOWTO. What’s shown above is a simplified scenario.

Your code defines just one Logger instance, logger, along with two Handler instances, h1 and h2.

When you call logger.info("testing %d.. %d.. %d..", 1, 2, 3), the logger object serves as a filter because it also has a level associated with it. Only if the message level is severe enough will the logger do anything with the message. Because the logger has level DEBUG, and the message carries a higher INFO level, it gets the go-ahead to move on.
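
Continuing in a REPL session with the logger defined above, you can check this gate directly:

>>> logger.getEffectiveLevel()
10
>>> logger.isEnabledFor(logging.INFO)  # 20 >= 10
True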

Internally, logger calls logger.makeRecord() to put the message string "testing %d.. %d.. %d.." and its arguments (1, 2, 3) into a bona fide class instance of a LogRecord, which is just a container for the message and its metadata.

The logger object looks around for its handlers (instances of Handler), which may be tied directly to logger itself or to its parents (a concept that we’ll touch on later). In this example, it finds two handlers:

  1. One with level INFO that dumps log data to a file at /tmp/records.log
  2. One that writes to sys.stderr but only if the incoming message is at level ERROR or higher

At this point, another round of tests kicks in. Because the LogRecord and its message only carry level INFO, the record gets written out by Handler 1 (green arrow), but not to Handler 2’s stderr stream (red arrow). For handlers, writing the LogRecord to their stream is called emitting it, which happens in their .emit() method.
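
Internally, this per-handler test lives in Logger.callHandlers(), which walks up the logger hierarchy and compares the record's level against each handler's level. Here's a condensed paraphrase of that method from the module source:

def callHandlers(self, record):
    c = self
    while c:
        for hdlr in c.handlers:
            # Emit only if the record is at least as severe as
            # the handler's own level.
            if record.levelno >= hdlr.level:
                hdlr.handle(record)
        # Stop climbing if propagation is turned off.
        c = c.parent if c.propagate else None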

Next, let’s further dissect everything from above.

The LogRecord Class

What is a LogRecord? When you log a message, an instance of the LogRecord class is the object you send to be logged. It’s created for you by a Logger instance and encapsulates all the pertinent info about that event. Internally, it’s little more than a wrapper around a dict that contains attributes for the record. A Logger instance sends a LogRecord instance to zero or more Handler instances.

The LogRecord contains some metadata, such as the following:

  1. A name
  2. The creation time as a Unix timestamp
  3. The message itself
  4. Information on what function made the logging call

Here’s a peek into the metadata that it carries with it. One way to introspect it is to step through a logging.error() call with the pdb module.

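Another, quicker way is to build a LogRecord by hand and dump its attribute dict with vars(), as in this minimal sketch (the name and pathname values are just placeholders):

import logging

# Normally Logger.makeRecord() constructs this for you.
record = logging.LogRecord(
    name="root",
    level=logging.ERROR,
    pathname="<stdin>",
    lineno=1,
    msg="testing %d.. %d.. %d..",
    args=(1, 2, 3),
    exc_info=None,
)

# A LogRecord is little more than a wrapper around this dict.
for attr, value in sorted(vars(record).items()):
    print(f"{attr:>15}: {value!r}")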