How to debug the project
To debug our research project, we often want to run a single file within the Waf framework repeatedly, or we want to drop into the debugger when an error occurs. Normally, this is not possible because Waf controls the execution and places `bld` and `src` on the `PYTHONPATH`. If we execute a single file on its own, an `ImportError` is raised because `bld.project_paths` cannot be imported. Adding the paths manually is a little hacky, and the problem can be avoided more elegantly. In addition, even if we insert a debug statement into the file and the code reaches this line, Waf hides the debugger prompt from the user. The build then appears to hang forever, because the debugger keeps waiting for input that never arrives.
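To see why the import fails, here is a minimal, self-contained sketch. It creates a throwaway directory containing a hypothetical `bld/project_paths.py`, shows that importing it fails while that directory is not on `sys.path`, and then applies the manual `sys.path` workaround mentioned above. The layout and module contents are assumptions for illustration only.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(name):
    """Return True if `name` can be imported, False if an ImportError is raised."""
    try:
        importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False
    return True


with tempfile.TemporaryDirectory() as root:
    # Hypothetical, minimal project layout: <root>/bld/project_paths.py
    bld_dir = Path(root) / "bld"
    bld_dir.mkdir()
    (bld_dir / "__init__.py").write_text("")
    (bld_dir / "project_paths.py").write_text(
        "def project_paths_join(*parts):\n    return '/'.join(parts)\n"
    )

    # The project root is not on sys.path, so the import fails,
    # just like running a single file outside of Waf.
    before = can_import("bld.project_paths")

    # The "hacky" fix: put the project root on sys.path by hand.
    sys.path.insert(0, root)
    importlib.invalidate_caches()  # cheap precaution before importing again
    try:
        after = can_import("bld.project_paths")
    finally:
        # Undo the change so the demo leaves no traces behind.
        sys.path.remove(root)
        sys.modules.pop("bld.project_paths", None)
        sys.modules.pop("bld", None)

print("import without project root:", before)
print("import with project root:   ", after)
```

Editing `sys.path` like this has to be repeated in every file, which is exactly what the package-based approach avoids.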
Make `bld` and `src` importable
To place `bld` and `src` on the `PYTHONPATH`, we turn the project into a Python package. This is done by placing a file called `setup.py` in the root directory of the project. This file is the entry point of every other Python package you have ever installed with `pip`. For our project, the file contains only the minimum information, since we will never upload our research project to PyPI. Here is what the file looks like:
```python
from setuptools import setup

setup(
    name="project_name",
    packages=["bld", "src"],
)
```
That is all. `name` is the name of the package, which we use to install or remove it. `packages` lists the directories (Python packages) that become importable, which effectively puts `bld` and `src` on the `PYTHONPATH`.
To install the package, we do not use `pip install .`, because that copies the project in its current state into the environment, and later changes would not be picked up. Instead, we want the installed package to follow our changes to the project. This is achieved with an editable install, which links the installed package to the project directory instead of copying it. For the editable install, go to the root folder of the project, where `setup.py` lies, and type

```console
$ pip install -e .
```
That is all. Now you can run every single file within the project on its own.
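A quick way to check the editable install is to ask Python where it would import the packages from. The sketch below uses `importlib.util.find_spec`, which returns `None` for top-level modules it cannot find. After `pip install -e .`, the locations printed for `bld` and `src` should point into your project folder rather than into a copy inside `site-packages`. The helper name `locate` is hypothetical.

```python
import importlib.util


def locate(name):
    """Return where Python would import the top-level module `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable at all
    if spec.has_location:
        return spec.origin  # e.g. .../project/src/__init__.py
    if spec.submodule_search_locations:
        # Namespace packages (no __init__.py) have no origin file.
        return list(spec.submodule_search_locations)[0]
    return spec.origin  # e.g. "built-in" for modules compiled into Python


if __name__ == "__main__":
    for name in ("bld", "src"):
        print(name, "->", locate(name))
```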
Debugging
As an example, let’s say we have a file called `src/data_management/create_dataset.py` with the following content:
```python
import pandas as pd

from bld.project_paths import project_paths_join as ppj


def main():
    df = pd.read_stata(ppj("IN_DATA", "example.dta"))
    df.AGE = df.AGE.astype(int)
    df.to_pickle(ppj("OUT_DATA", "example.pkl"))


if __name__ == "__main__":
    main()
```
The file loads `example.dta`, turns the variable `AGE` into an integer and saves the data as a pickle file. Assume that running the program raises an error (an `AttributeError`, since `df.AGE` does not exist) because the variable `AGE` is not defined in the data and is instead called `ALTER` (the German word for age). Waf then aborts the execution and returns a more or less readable report of the error, which in this case is quite clear. How can we jump into the debugger to inspect the state of the program?
The first method is to insert a debug statement before the line where the error occurs. Starting with Python 3.7, this is even simpler thanks to the built-in `breakpoint()`.
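```python
import pandas as pd

from bld.project_paths import project_paths_join as ppj


def main():
    df = pd.read_stata(ppj("IN_DATA", "example.dta"))
    breakpoint()  # Python >= 3.7
    # import pdb; pdb.set_trace()  # use this line instead for Python < 3.7
    df.AGE = df.AGE.astype(int)
    # ... rest of the file unchanged
```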
Then you can start to debug your program. For more information on how to use the Python debugger `pdb`, visit this tutorial.
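Because Waf hides the debugger prompt, a `breakpoint()` call that is accidentally left in the code makes the build hang. Since Python 3.7, `breakpoint()` consults the `PYTHONBREAKPOINT` environment variable every time it is called, and the value `0` turns every call into a no-op. The sketch below demonstrates this inside a single process; the function `compute` is a hypothetical stand-in for your own code.

```python
import os


def compute():
    breakpoint()  # with the default settings, this would stop in pdb
    return 42


# Disable all breakpoint() calls for this process. The variable is read
# on every call, so setting it at runtime takes effect immediately.
os.environ["PYTHONBREAKPOINT"] = "0"
result = compute()  # runs straight through without entering the debugger
print(result)
```

In practice you would rather set the variable in the shell (for example `export PYTHONBREAKPOINT=0`) before starting the build, and unset it again when you want your breakpoints back.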
The second method is to start the debugger directly from the command line. Type

```console
$ python -m pdb -c continue src/data_management/create_dataset.py
```

to enter the debugger as soon as an exception occurs (post-mortem debugging). If you leave out `-c continue`, you will jump into the debugger right at the start of the script.
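If you prefer to keep the post-mortem behaviour inside the script instead of typing the `-m pdb` flags each time, the standard library's `pdb.post_mortem()` can be called from an exception handler. This is only a sketch: `run_with_post_mortem` is a hypothetical helper, and the `debugger` parameter exists only so that another post-mortem function can be plugged in. Since Waf hides the prompt, use it only when running the file directly.

```python
import pdb
import sys


def run_with_post_mortem(func, debugger=pdb.post_mortem):
    """Call func(); if it raises, open `debugger` on the traceback and re-raise.

    `debugger` defaults to pdb.post_mortem and receives the traceback object.
    """
    try:
        return func()
    except Exception:
        # sys.exc_info()[2] is the traceback of the exception being handled.
        debugger(sys.exc_info()[2])
        raise  # keep the original error so the failure is still reported
```

In `create_dataset.py`, you would then call `run_with_post_mortem(main)` instead of `main()` under `if __name__ == "__main__":`.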
Using a different debugger
The default debugger is not very visually appealing. Instead, we can use ipdb, the IPython debugger, which offers tab completion, syntax highlighting, etc. Install it with

```console
$ pip install ipdb
```

Then use it with `import ipdb; ipdb.set_trace()`, or register it as the default debugger for `breakpoint()` by setting the environment variable `PYTHONBREAKPOINT`:

```console
$ export PYTHONBREAKPOINT=ipdb.set_trace  # Unix shells (bash, zsh)
> $env:PYTHONBREAKPOINT="ipdb.set_trace"  # Windows PowerShell
```
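Setting the variable in the shell affects every Python process started from that session. As an alternative, a script can choose the debugger itself. The sketch below points `PYTHONBREAKPOINT` at `ipdb.set_trace` when ipdb is installed and falls back to the standard `pdb.set_trace` otherwise, so the same `breakpoint()` calls work on machines without ipdb. The helper name `configure_breakpoint` is hypothetical; it relies only on the `PYTHONBREAKPOINT` lookup that `breakpoint()` performs.

```python
import importlib.util
import os


def configure_breakpoint():
    """Make breakpoint() use ipdb if it is installed, otherwise pdb.

    A value already set in the environment is left untouched.
    Returns the value PYTHONBREAKPOINT ends up with.
    """
    if importlib.util.find_spec("ipdb") is not None:
        hook = "ipdb.set_trace"
    else:
        hook = "pdb.set_trace"
    # breakpoint() reads this variable on every call, so setting it near the
    # top of a script is enough; an explicit value from the shell wins.
    os.environ.setdefault("PYTHONBREAKPOINT", hook)
    return os.environ["PYTHONBREAKPOINT"]
```

Call `configure_breakpoint()` once near the top of a file such as `create_dataset.py`, before any `breakpoint()` is reached.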
Alternatively, run the whole file under ipdb:

```console
$ python -m ipdb -c continue src/data_management/create_dataset.py
```