How to create a built-in `frame` object in Python?
Recently, I ran into a problem while using `pdb` in `bytefall` (a Python virtual machine implemented in Python). It’s not really a bug, just something that piqued my curiosity.

`pdb` works fine in `bytefall`, but the whole internal execution flow of the virtual machine is revealed once `pdb.set_trace()` is called in a user script. That can be annoying for users who don’t want to see that information.

Then a question came to mind:

Is it possible to add a switch that runs `pdb` with or without revealing the internals of the `bytefall` virtual machine?

While developing this feature, I found that the `pyframe.Frame` object cannot be used as a duck-typed `frame` object when running the `ll` command in `pdb`. The error we get is: `TypeError: module, class, method, function, traceback, frame, or code object was expected, got Frame`.

Quack, you should give me a `frame` object

Here is the simplified traceback of that error:

    pdb.py::do_longlist
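The failure can be reproduced outside of `bytefall` with a minimal sketch. The `Frame` class below is a hypothetical stand-in for `pyframe.Frame` (its attributes are chosen for illustration only); it mimics a few attributes of a real frame but is not one, so `inspect` rejects it with the same kind of error that `pdb` reports for `ll`.

```python
import inspect


class Frame:
    """Hypothetical duck-typed frame; NOT bytefall's actual pyframe.Frame."""

    def __init__(self, code, f_globals, f_locals):
        self.f_code = code
        self.f_globals = f_globals
        self.f_locals = f_locals
        self.f_lineno = code.co_firstlineno


def greet():
    print('hello')


fake = Frame(greet.__code__, globals(), locals())

try:
    # Listing source code for a frame goes through `inspect`, which only
    # accepts a fixed set of built-in types.
    inspect.getsourcelines(fake)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Having the right attributes is not enough here: `inspect.isframe()` is an `isinstance` check against the built-in frame type.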
As we know, we can make an object pass the check `isinstance(obj, SomeType)` by making the class of `obj` inherit from `SomeType`, e.g.

    class MyList(list):
        ...
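As a complete, runnable illustration of that idea (the variable name `items` is just an example):

```python
class MyList(list):
    """A list subclass with no extra behaviour."""


items = MyList([1, 2, 3])

# An instance of a subclass also counts as an instance of the base class
print(isinstance(items, list))
# ...while its concrete type is still the subclass
print(type(items) is MyList)
```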
But we are not allowed to do the same thing for `frame`.

    import types
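A small sketch of what happens when we try, assuming CPython (the class name `MyFrame` is hypothetical). The class statement itself fails, before any instance could be created:

```python
import types

try:
    # Attempt to derive from the built-in frame type
    class MyFrame(types.FrameType):
        pass
except TypeError as exc:
    subclass_error = str(exc)
    print(subclass_error)
```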
Why? After some googling, I found a related post on Stack Overflow discussing this exception. In short, `Py_TPFLAGS_BASETYPE` is not set in the implementation of `PyFrameObject`, so it cannot be subclassed. We can see that in cpython/Objects/frameobject.c.

Here is the definition of that flag:

- `Py_TPFLAGS_BASETYPE`
  This bit is set when the type can be used as the base type of another type. If this bit is clear, the type cannot be subtyped (similar to a “final” class in Java).

(further reading: PEP 253 – Subtyping Built-in Types, Python history - Metaclasses and extension classes (a.k.a “The Killer Joke”))
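The flag can also be inspected from Python without reading the C source. The sketch below relies on the CPython-specific `__flags__` attribute of type objects and on the value `1 << 10` that CPython's `Include/object.h` assigns to `Py_TPFLAGS_BASETYPE`; the helper name `is_subclassable` is made up for this example.

```python
import types

# Value of Py_TPFLAGS_BASETYPE in CPython's Include/object.h (1UL << 10)
PY_TPFLAGS_BASETYPE = 1 << 10


def is_subclassable(tp):
    """Return True if the BASETYPE bit is set in the type's flags."""
    return bool(tp.__flags__ & PY_TPFLAGS_BASETYPE)


print(is_subclassable(list))             # True
print(is_subclassable(types.FrameType))  # False
```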
It’s not time to give up yet

Though it was frustrating news, I started searching with keywords like “Python, create builtin object”. Then something interesting showed up: How to create a traceback object.

As the answerer @abarnert said in that post:

The `PyTraceBack` type is not part of the public API. But (except for being defined in the Python directory instead of the Object directory) it’s built as a C API type, just not documented. … well, there’s no `PyTraceBack_New`, but there is a `PyTraceBack_Here` that constructs a new traceback and swaps it into the current exception info.

It reminded me of one thing I had missed before: “If something is an object, then there (usually) should be a constructor.”

And, yeah, there is a function called `PyFrame_New`.

Next, we need to figure out how to call `PyFrame_New()` from the Python side.

Since it’s a C function, we can try to access it through `ctypes.pythonapi`. Roughly speaking, this is what we want to do:

    import ctypes
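As a first step, here is a minimal sketch of the lookup alone, assuming a CPython build that exports the symbol. Nothing is called yet; the code only checks that `PyFrame_New` can be reached through `ctypes.pythonapi`.

```python
import ctypes

# ctypes.pythonapi wraps the C API of the running interpreter; attribute
# access looks the exported symbol up by name (AttributeError if missing).
c_func = getattr(ctypes.pythonapi, 'PyFrame_New', None)

if c_func is None:
    print('PyFrame_New is not exported by this interpreter')
else:
    print('found:', c_func)
```

Until `argtypes` and `restype` are configured, ctypes falls back to its defaults (in particular, a C `int` return value), which is not what a function returning a pointer needs.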
Play with ctypes
There are a few things worth noting:
- Before calling a `c_func`, its `argtypes` and `restype` should be set.
- According to the signature of `PyFrame_New`, a pointer to a `PyThreadState` object has to be passed. However, it isn’t an object that we can access from Python directly.
- As @abarnert mentioned:

  Also, both are CPython-specific, require not just using the C API layer but using undocumented types and functions that could change at any moment, and offer the potential for new and exciting opportunities to segfault your interpreter.

  So we should take care of the compatibility and robustness of our implementation.
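The first and third notes can be tried out on a harmless function before touching frames. The sketch below is not part of the original script: it uses the documented C API function `Py_GetVersion` as a stand-in, declares its signature through `argtypes` and `restype`, and refuses to run on anything other than CPython.

```python
import ctypes
import platform
import sys

# ctypes.pythonapi only makes sense on CPython, so guard against other runtimes
if platform.python_implementation() != 'CPython':
    raise RuntimeError('this sketch relies on the CPython C API')

# const char *Py_GetVersion(void)
get_version = ctypes.pythonapi.Py_GetVersion
get_version.argtypes = ()              # takes no arguments
get_version.restype = ctypes.c_char_p  # returned as `bytes` on the Python side

version = get_version().decode()
print(version)
```

Without the `restype` line, the returned pointer would be interpreted as a C `int`, which is exactly the kind of silent mismatch that can lead to a crash later.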
Let’s do this step by step (to avoid confusion caused by changes between versions, we take CPython 3.7 as the runtime here):

- According to point 1, we should rewrite the code above into this:

    import ctypes

    ctypes.pythonapi.PyFrame_New.argtypes = (
        ...,    # PyThreadState*
        ...,    # PyCodeObject*
        ...,    # PyObject*
        ...     # PyObject*
    )
    ctypes.pythonapi.PyFrame_New.restype = (
        ...     # PyFrameObject*
    )

    frame = ctypes.pythonapi.PyFrame_New(
        ...,    # thread state
        ...,    # a code object
        ...,    # a dict of globals
        ...     # a dict of locals
    )

But there is a problem: apart from `ctypes.py_object`, no ctypes types such as `py_threadstate`, `py_codeobject` or `py_frameobject` are defined for the other kinds of Python objects.

Typically, we would have to define classes inheriting from `ctypes.Structure`, with `_fields_` listing all members of those internal types, and then assign those classes to `argtypes` and `restype`. Taking `PyThreadState` as an example, we have to deal with THESE THINGS.

OK, that sounds like complicated work, but there is actually a shortcut. Let’s take a look at the signature of `PyFrame_New` again:

    PyFrameObject*
    PyFrame_New(PyThreadState *tstate, PyCodeObject *code,
                PyObject *globals, PyObject *locals)
    { /* ... */ }

From the C point of view, all we have to do is pass pointers to objects to the function. Therefore, we can use `ctypes.POINTER(...)` as the type for `PyThreadState*` and `PyCodeObject*`. (reminder: we just need to use `ctypes.py_object` for `PyObject*`)

According to the documentation of `ctypes.POINTER(...)`, it takes a type defined in `ctypes` as its argument. But which type should the pointer point to?

Since a pointer is just a container storing a memory address, and we never dereference it from Python, the pointee type only serves as a placeholder. Here we pick `ctypes.c_ulong` for x64 and `ctypes.c_uint` for x86, based on the size of `void*`. (Strictly speaking, every `ctypes.POINTER(...)` type already has the platform’s pointer size whatever it points to, so `ctypes.c_void_p` would work as well.)

Checking the pointer size at runtime also keeps the script usable on both architectures. Our progress so far is shown below:

    import ctypes

    # Check whether we are on an x64 or x86 platform by checking the size of `void*`:
    # 8 bytes for x64, 4 bytes for x86
    P_SIZE = ctypes.sizeof(ctypes.c_void_p)
    IS_X64 = P_SIZE == 8

    P_MEM_TYPE = ctypes.POINTER(ctypes.c_ulong if IS_X64 else ctypes.c_uint)

    ctypes.pythonapi.PyFrame_New.argtypes = (
        P_MEM_TYPE,         # PyThreadState *tstate
        P_MEM_TYPE,         # PyCodeObject *code
        ctypes.py_object,   # PyObject *globals
        ctypes.py_object    # PyObject *locals
    )

    # We can use `ctypes.py_object` for this, because we are going to
    # manipulate it in Python instead of C.
    ctypes.pythonapi.PyFrame_New.restype = ctypes.py_object     # PyFrameObject*

    frame = ctypes.pythonapi.PyFrame_New(
        ...,                # thread state
        ...,                # a code object
        ...,                # a dict of globals
        ...                 # a dict of locals
    )

- Now we are going to pass arguments to the function call `PyFrame_New()`.
To make it easier to understand, we define a simple function `greet()` here to set up the 2nd argument later, and directly use `globals()` and `locals()` as the 3rd and 4th arguments respectively. As for the first argument `tstate`, we will talk about it in the next step.

    import ctypes

    P_SIZE = ctypes.sizeof(ctypes.c_void_p)
    IS_X64 = P_SIZE == 8

    P_MEM_TYPE = ctypes.POINTER(ctypes.c_ulong if IS_X64 else ctypes.c_uint)

    ctypes.pythonapi.PyFrame_New.argtypes = (
        P_MEM_TYPE,         # PyThreadState *tstate
        P_MEM_TYPE,         # PyCodeObject *code
        ctypes.py_object,   # PyObject *globals
        ctypes.py_object    # PyObject *locals
    )
    ctypes.pythonapi.PyFrame_New.restype = ctypes.py_object     # PyFrameObject*

    # A simple function for demonstration
    def greet():
        print('hello')

    frame = ctypes.pythonapi.PyFrame_New(
        ...,                                            # thread state
        ctypes.cast(id(greet.__code__), P_MEM_TYPE),    # a code object
        globals(),                                      # a dict of globals
        locals()                                        # a dict of locals
    )

See the 2nd argument of `PyFrame_New()` above? Remember that we defined the type of the 2nd argument as `P_MEM_TYPE`, which is actually a pointer. So passing `greet.__code__` directly is invalid, and we would get an error like the following one:

    ctypes.ArgumentError: argument 2: <class 'TypeError'>: expected LP_c_ulong instance instead of code

To meet the requirement defined in `PyFrame_New.argtypes`, we have to cast `greet.__code__` into a C pointer. Luckily, in CPython, we can get the memory address of a Python object through `id()`. After that, we just need to use `ctypes.cast()` to cast it into the `P_MEM_TYPE` defined above.

- Nice! We are about to finish the function call.
Like `PyFrameObject`, we are not able to create a `PyThreadState` object directly. Besides, a `PyThreadState` object is an internal structure that the interpreter keeps for each thread running Python code; it is not the `Thread` object created by the `threading` module. (further reading: Thread State and the Global Interpreter Lock)

To access a `PyThreadState` object, we have to call `PyThreadState_Get()`. Since it’s part of the C API, we have to set `argtypes` and `restype` for it, too.

According to its signature, it takes no arguments and returns a pointer to `PyThreadState`.

    PyThreadState *
    PyThreadState_Get(void)
    { /* ... */}

Following the same idea as in the previous step, this is the configuration:

    ctypes.pythonapi.PyThreadState_Get.argtypes = None
    ctypes.pythonapi.PyThreadState_Get.restype = P_MEM_TYPE

Finally, the whole script for creating a `frame` object will be:

    import ctypes

    P_SIZE = ctypes.sizeof(ctypes.c_void_p)
    IS_X64 = P_SIZE == 8

    P_MEM_TYPE = ctypes.POINTER(ctypes.c_ulong if IS_X64 else ctypes.c_uint)

    ctypes.pythonapi.PyFrame_New.argtypes = (
        P_MEM_TYPE,         # PyThreadState *tstate
        P_MEM_TYPE,         # PyCodeObject *code
        ctypes.py_object,   # PyObject *globals
        ctypes.py_object    # PyObject *locals
    )
    ctypes.pythonapi.PyFrame_New.restype = ctypes.py_object     # PyFrameObject*

    ctypes.pythonapi.PyThreadState_Get.argtypes = None
    ctypes.pythonapi.PyThreadState_Get.restype = P_MEM_TYPE

    def greet():
        print('hello')

    frame = ctypes.pythonapi.PyFrame_New(
        ctypes.pythonapi.PyThreadState_Get(),           # thread state
        ctypes.cast(id(greet.__code__), P_MEM_TYPE),    # a code object
        globals(),                                      # a dict of globals
        locals()                                        # a dict of locals
    )
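To check that the result is accepted where the duck-typed object was not, the steps above can be wrapped in a small helper. This is only a sketch under a few assumptions: `make_frame` is a hypothetical name, `ctypes.c_void_p` is used as a simpler stand-in for `P_MEM_TYPE` (so raw addresses are passed as plain integers), and the calls are CPython-specific and undocumented, so behaviour may differ between versions.

```python
import ctypes
import inspect
import types

# PyThreadState *PyThreadState_Get(void)
ctypes.pythonapi.PyThreadState_Get.argtypes = ()
ctypes.pythonapi.PyThreadState_Get.restype = ctypes.c_void_p

# PyFrameObject *PyFrame_New(PyThreadState *, PyCodeObject *,
#                            PyObject *, PyObject *)
ctypes.pythonapi.PyFrame_New.argtypes = (
    ctypes.c_void_p,    # PyThreadState *tstate (raw address)
    ctypes.c_void_p,    # PyCodeObject *code (raw address from id())
    ctypes.py_object,   # PyObject *globals
    ctypes.py_object,   # PyObject *locals
)
ctypes.pythonapi.PyFrame_New.restype = ctypes.py_object


def make_frame(code, f_globals, f_locals):
    """Hypothetical helper: build a real frame object for `code` (CPython only)."""
    tstate = ctypes.pythonapi.PyThreadState_Get()
    # In CPython, id() returns the address of the object
    return ctypes.pythonapi.PyFrame_New(tstate, id(code), f_globals, f_locals)


def greet():
    print('hello')


frame = make_frame(greet.__code__, globals(), {})

print(type(frame))
print(isinstance(frame, types.FrameType))
print(inspect.isframe(frame))
print(frame.f_code is greet.__code__)
```

Because the object really is an instance of the built-in frame type, checks such as `inspect.isframe()` now succeed, which is what the duck-typed `Frame` could not achieve.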
Anything funny to do with this created `frame`?

Yeah! Going back to the problem mentioned at the beginning, we can now start playing with `pdb`. We will talk about that in the next article.