How to create a built-in `frame` object in Python?

Recently, I ran into a problem while using pdb in bytefall (a Python virtual machine implemented in Python). It’s not really a bug, just something that triggered my curiosity.

pdb works fine in bytefall, but once pdb.set_trace() is called in a user script, the internal execution flow of the virtual machine itself is revealed as well. That can be annoying for users who don’t want to see that information.

Then a question came to mind:
Is it possible to add a switch for running pdb with or without revealing the internals of the bytefall virtual machine?

While developing this feature, I found that a pyframe.Frame object cannot be used as a duck-typed frame object when running the ll command in pdb. The error we get is: TypeError: module, class, method, function, traceback, frame, or code object was expected, got Frame.

Quack, you should give me a frame object

Here is the simplified traceback of that error:

pdb.py::do_longlist
-> pdb.py::getsourcelines
-> inspect.py::findsource
-> inspect.py::getsourcefile
-> inspect.py::getfile

# Inside `inspect.py::getfile`, `inspect.py::isframe` is called.
# And this is how `inspect.py::isframe` is implemented:
def isframe(object):
    return isinstance(object, types.FrameType)
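
To see the check in action, here is a minimal sketch. `FakeFrame` is a hypothetical stand-in (it is not bytefall's actual `pyframe.Frame`) that only mimics a few frame attributes; the real frame is obtained through `sys._getframe()`.

```python
import inspect
import sys


class FakeFrame:
    """Hypothetical duck-typed stand-in that mimics a few frame attributes."""

    def __init__(self, code, f_globals, f_locals):
        self.f_code = code
        self.f_globals = f_globals
        self.f_locals = f_locals
        self.f_lineno = code.co_firstlineno
        self.f_back = None


def sample():
    # Return the real frame object of this call.
    return sys._getframe()


real = sample()
fake = FakeFrame(sample.__code__, globals(), {})

print(inspect.isframe(real))  # True
print(inspect.isframe(fake))  # False

# inspect.getfile() rejects the duck-typed object, which is what `ll` runs into.
try:
    inspect.getfile(fake)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```

However many frame-like attributes the stand-in provides, `isinstance` only looks at the type, so duck typing cannot satisfy it.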

As we know, an object passes an isinstance(obj, SomeType) check if its class inherits from SomeType. For example:

class MyList(list):
    ...

print(isinstance(MyList(), list))
# Output: True

But we are not allowed to do the same thing for frame.

import types

# The class statement itself raises the TypeError,
# so the print() below is never reached.
class MyFrame(types.FrameType):
    ...

print(isinstance(MyFrame(), types.FrameType))
# Got `TypeError: type 'frame' is not an acceptable base type`

Why? After googling, I found a related post on Stack Overflow about this exception. In short, Py_TPFLAGS_BASETYPE is not set on the frame type in its C implementation, so it cannot be subclassed. We can see that in cpython/Objects/frameobject.c.

And here is the definition of that flag:

  • Py_TPFLAGS_BASETYPE
    This bit is set when the type can be used as the base type of another type. If this bit is clear, the type cannot be subtyped (similar to a “final” class in Java).
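
The flag can also be inspected from Python, since CPython exposes a type's flags as `__flags__`. The sketch below assumes the flag is bit 10, as defined in CPython's `Include/object.h`; `is_subclassable` is a hypothetical helper name used only for illustration.

```python
import types

# Value of Py_TPFLAGS_BASETYPE in CPython's Include/object.h (bit 10).
PY_TPFLAGS_BASETYPE = 1 << 10


def is_subclassable(tp):
    """Return True if the type's flags include Py_TPFLAGS_BASETYPE."""
    return bool(tp.__flags__ & PY_TPFLAGS_BASETYPE)


print(is_subclassable(list))             # True
print(is_subclassable(types.FrameType))  # False
print(is_subclassable(bool))             # False
```

`bool` is included as another familiar built-in that refuses to be subclassed for the same reason.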

(further reading: PEP 253 – Subtyping Built-in Types, Python history - Metaclasses and extension classes (a.k.a “The Killer Joke”))

It’s not time to give up yet

Though this was frustrating news, I kept searching with keywords like “Python, create builtin object”. Then something interesting showed up: How to create a traceback object.

As the answerer @abarnert said in that post:

The PyTraceBack type is not part of the public API. But (except for being defined in the Python directory instead of the Object directory) it’s built as a C API type, just not documented. … well, there’s no PyTraceBack_New, but there is a PyTraceBack_Here that constructs a new traceback and swaps it into the current exception info.

It reminded me of one thing I missed before: “If one thing is an object, then there (usually) should be a constructor.”

And, yeah, there is a function called PyFrame_New.

Next, we need to figure out how to call PyFrame_New() from the Python side.

Since it’s a C function, we can try to access it through ctypes.pythonapi. Roughly speaking, this is what we want to do:

import ctypes

frame = ctypes.pythonapi.PyFrame_New(
    ...,  # thread state
    ...,  # a code object
    ...,  # a dict of globals
    ...   # a dict of locals
)

Play with ctypes

There are a few things worth noting:

  1. Before calling a C function through ctypes, its argtypes and restype should be set.

  2. According to the signature of PyFrame_New, a pointer to a PyThreadState object must be passed. However, that isn’t an object we can access from Python directly.

  3. As @abarnert mentioned:

    Also, both are CPython-specific, require not just using the C API layer but using undocumented types and functions that could change at any moment, and offer the potential for new and exciting opportunities to segfault your interpreter.

    Compatibility and robustness of our implementation should be taken care of.
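
As a small illustration of the first point, here is a sketch that declares `argtypes` and `restype` for a much simpler C API function, `Py_GetVersion`, whose C signature is `const char *Py_GetVersion(void)`. It is used here only as a harmless stand-in for `PyFrame_New`.

```python
import ctypes
import sys

get_version = ctypes.pythonapi.Py_GetVersion
get_version.argtypes = []               # the function takes no arguments
get_version.restype = ctypes.c_char_p   # otherwise ctypes assumes a C int result

# With the declared restype, ctypes converts the returned C string to bytes.
version = get_version().decode()
print(version.split()[0])
```

Without the `restype` declaration, ctypes would treat the returned pointer as a C `int`, which is not a usable value and may be truncated on 64-bit platforms.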

Let’s do this step by step. To avoid confusion caused by changes between versions, we take CPython 3.7 as the runtime here:

  1. According to point 1, we should rewrite the code above into this:

    import ctypes

    ctypes.pythonapi.PyFrame_New.argtypes = (
        ...,  # PyThreadState*
        ...,  # PyCodeObject*
        ...,  # PyObject*
        ...   # PyObject*
    )
    ctypes.pythonapi.PyFrame_New.restype = (
        ...   # PyFrameObject*
    )

    frame = ctypes.pythonapi.PyFrame_New(
        ...,  # thread state
        ...,  # a code object
        ...,  # a dict of globals
        ...   # a dict of locals
    )

    But there is a problem: apart from ctypes.py_object, ctypes defines no types for the other internal objects, that is, nothing like py_threadstate, py_codeobject, or py_frameobject.

    Typically, we would have to define classes inheriting from ctypes.Structure, with _fields_ listing every member of those internal types, and then assign those classes to argtypes and restype. Taking PyThreadState as an example, we would have to deal with THESE THINGS.

    OK, that sounds like a lot of work, but there is actually a shortcut. Let’s take a look at the signature of PyFrame_New again:

    PyFrameObject*
    PyFrame_New(PyThreadState *tstate, PyCodeObject *code,
    PyObject *globals, PyObject *locals)
    { /* ... */ }


    From C’s point of view, all we have to do is pass pointers to objects into the function. Therefore, we can use ctypes.POINTER(...) as the type for PyThreadState* and PyCodeObject*. (reminder: we just need to use ctypes.py_object for PyObject*)

    According to the documentation of ctypes.POINTER(...), it takes a type defined in ctypes as its argument. But which type should we use?

    A pointer is just a container that stores a memory address, and we never dereference it from Python, so the pointee type is only a placeholder. Here we pick an integer type that is meant to match the address width: ctypes.c_ulong for x64 and ctypes.c_uint for x86. (Strictly speaking, the size of the pointer itself does not depend on the pointee type, and c_ulong is only 4 bytes on 64-bit Windows, so ctypes.c_void_p would work just as well.)

    This keeps the same code usable on both architectures. Our progress so far is shown below:

    import ctypes

    # Check whether we are on a x64 or x86 platform by checking the size of `void*`
    # 8-byte for x64, 4-byte for x86
    P_SIZE = ctypes.sizeof(ctypes.c_void_p)
    IS_X64 = P_SIZE == 8

    P_MEM_TYPE = ctypes.POINTER(ctypes.c_ulong if IS_X64 else ctypes.c_uint)

    ctypes.pythonapi.PyFrame_New.argtypes = (
        P_MEM_TYPE,        # PyThreadState *tstate
        P_MEM_TYPE,        # PyCodeObject *code
        ctypes.py_object,  # PyObject *globals
        ctypes.py_object   # PyObject *locals
    )
    # We can use `ctypes.py_object` for this. Because we are going to
    # manipulate it in Python instead of C.
    ctypes.pythonapi.PyFrame_New.restype = ctypes.py_object  # PyFrameObject*

    frame = ctypes.pythonapi.PyFrame_New(
        ...,  # thread state
        ...,  # a code object
        ...,  # a dict of globals
        ...   # a dict of locals
    )

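
As a quick check on the pointer reasoning in this step, the sketch below compares the sizes of a few ctypes pointer types. It only illustrates that the pointee type passed to `ctypes.POINTER(...)` does not affect the size of the pointer itself; the printed numbers depend on the platform.

```python
import ctypes

pointer_size = ctypes.sizeof(ctypes.c_void_p)

# Whatever the pointee type is, the pointer has the platform's pointer size.
sizes = [
    ctypes.sizeof(ctypes.POINTER(pointee))
    for pointee in (ctypes.c_char, ctypes.c_uint, ctypes.c_ulong)
]
print(all(size == pointer_size for size in sizes))  # True

# Platform-dependent values, shown for reference only.
print("pointer:", pointer_size, "bytes")
print("c_ulong:", ctypes.sizeof(ctypes.c_ulong), "bytes")
```
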
  2. Now we are going to pass arguments to PyFrame_New().
    To keep things easy to follow, we define a simple function greet() whose code object will serve as the 2nd argument, and pass globals() and locals() directly as the 3rd and 4th arguments respectively. The first argument, tstate, is covered in the next step.

    import ctypes

    P_SIZE = ctypes.sizeof(ctypes.c_void_p)
    IS_X64 = P_SIZE == 8
    P_MEM_TYPE = ctypes.POINTER(ctypes.c_ulong if IS_X64 else ctypes.c_uint)

    ctypes.pythonapi.PyFrame_New.argtypes = (
        P_MEM_TYPE,        # PyThreadState *tstate
        P_MEM_TYPE,        # PyCodeObject *code
        ctypes.py_object,  # PyObject *globals
        ctypes.py_object   # PyObject *locals
    )
    ctypes.pythonapi.PyFrame_New.restype = ctypes.py_object  # PyFrameObject*

    # A simple function for demonstration
    def greet():
        print('hello')

    frame = ctypes.pythonapi.PyFrame_New(
        ...,  # thread state
        ctypes.cast(id(greet.__code__), P_MEM_TYPE),  # a code object
        globals(),  # a dict of globals
        locals()    # a dict of locals
    )

    Notice the 2nd argument of PyFrame_New() above. Remember that we declared the type of the 2nd argument as P_MEM_TYPE, which is a pointer type, so passing greet.__code__ directly is invalid and raises an error like the following:

    ctypes.ArgumentError: argument 2: <class 'TypeError'>: expected LP_c_ulong instance instead of code

    To meet the requirement declared in PyFrame_New.argtypes, we have to turn greet.__code__ into a C pointer. Luckily, in CPython, id() returns the memory address of a Python object. After that, we just need ctypes.cast() to cast that address to the P_MEM_TYPE defined above.
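
The `id()` plus `ctypes.cast()` trick can be checked in isolation. This is a minimal sketch, relying on the CPython-specific behaviour that `id()` returns the object's address; it casts the address to a generic pointer and then back to a Python object to confirm that both refer to the same code object.

```python
import ctypes


def greet():
    print('hello')


code = greet.__code__
address = id(code)  # CPython-specific: id() is the object's memory address

# Cast the raw address to a generic pointer, as is done for PyFrame_New's arguments.
ptr = ctypes.cast(address, ctypes.c_void_p)
print(ptr.value == address)  # True

# Reinterpret the same address as a Python object to confirm it is `code`.
same = ctypes.cast(address, ctypes.py_object).value
print(same is code)  # True
```
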

  3. Nice! We are about to finish the function call.
    Like PyFrameObject, a PyThreadState object cannot be created directly from Python. Note also that a PyThreadState is the interpreter’s internal, C-level state for a thread, not a threading.Thread object. (further reading: Thread State and the Global Interpreter Lock)

    To get the current PyThreadState, we call PyThreadState_Get(). Since it’s part of the C API, we have to set argtypes and restype for it, too.

    According to its signature, it takes no arguments and returns a pointer to a PyThreadState.

    PyThreadState *
    PyThreadState_Get(void)
    { /* ... */}

    Following the same idea as in the previous step, the configuration is:

    ctypes.pythonapi.PyThreadState_Get.argtypes = None
    ctypes.pythonapi.PyThreadState_Get.restype = P_MEM_TYPE

    Finally, the whole script for creating a frame object will be:

    import ctypes

    P_SIZE = ctypes.sizeof(ctypes.c_void_p)
    IS_X64 = P_SIZE == 8

    P_MEM_TYPE = ctypes.POINTER(ctypes.c_ulong if IS_X64 else ctypes.c_uint)

    ctypes.pythonapi.PyFrame_New.argtypes = (
        P_MEM_TYPE,        # PyThreadState *tstate
        P_MEM_TYPE,        # PyCodeObject *code
        ctypes.py_object,  # PyObject *globals
        ctypes.py_object   # PyObject *locals
    )
    ctypes.pythonapi.PyFrame_New.restype = ctypes.py_object  # PyFrameObject*

    ctypes.pythonapi.PyThreadState_Get.argtypes = None
    ctypes.pythonapi.PyThreadState_Get.restype = P_MEM_TYPE

    def greet():
        print('hello')

    frame = ctypes.pythonapi.PyFrame_New(
        ctypes.pythonapi.PyThreadState_Get(),  # thread state
        ctypes.cast(id(greet.__code__), P_MEM_TYPE),  # a code object
        globals(),  # a dict of globals
        locals()    # a dict of locals
    )

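
To tie this back to the `isinstance` check that started the whole journey, the steps above can be wrapped in a small helper and verified. This is a sketch, not a definitive implementation: `new_frame` is a hypothetical helper name, an empty dict is passed as the locals purely for illustration, and the code relies on undocumented, CPython-specific behaviour that may differ between versions.

```python
import ctypes
import inspect
import types

# Placeholder pointer type; the pointee is never dereferenced from Python.
P_MEM_TYPE = ctypes.POINTER(ctypes.c_ulong)

PyFrame_New = ctypes.pythonapi.PyFrame_New
PyFrame_New.argtypes = (
    P_MEM_TYPE,        # PyThreadState *tstate
    P_MEM_TYPE,        # PyCodeObject *code
    ctypes.py_object,  # PyObject *globals
    ctypes.py_object,  # PyObject *locals
)
PyFrame_New.restype = ctypes.py_object  # PyFrameObject*

PyThreadState_Get = ctypes.pythonapi.PyThreadState_Get
PyThreadState_Get.argtypes = []
PyThreadState_Get.restype = P_MEM_TYPE


def new_frame(code, f_globals, f_locals):
    """Hypothetical helper: build a frame object for `code` via PyFrame_New."""
    return PyFrame_New(
        PyThreadState_Get(),                # current thread state
        ctypes.cast(id(code), P_MEM_TYPE),  # address of the code object
        f_globals,
        f_locals,
    )


def greet():
    print('hello')


frame = new_frame(greet.__code__, globals(), {})

# The result is a genuine frame object, so inspect's check accepts it.
print(inspect.isframe(frame))
print(isinstance(frame, types.FrameType))
print(frame.f_code is greet.__code__)
```

Unlike a duck-typed class, this object is an instance of the built-in frame type, so it is the kind of object `inspect.getfile()` expects.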

Anything fun to do with the created frame?

Yeah! Coming back to the problem mentioned at the beginning, we can now start playing with pdb.
We will talk about that in the next article.