callmonitor -- A Simple Tool to Monitor and Log Function Calls

pip install callmonitor

or clone this repo and:

python setup.py install

It's simple to use: just decorate any function with the @intercept decorator.
Eg:
from callmonitor import intercept
@intercept
def test_fn_2(x, y=2, z=3):
    pass

This will save the inputs (args, kwargs and argspec) along with a call
database (callmonitor.DB) to: call-monitor/test_fn_2/<invocation count>.

callmonitor Doesn't Overwrite Output

If the call-monitor folder already exists (eg. from a previous run), then a new
folder call-monitor-1, or call-monitor-2, and so on, is created. See the
sections on Data Structure for more details on how this data is saved.
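
The naming rule described above can be sketched in a few lines. This is an illustration of the rule only, not callmonitor's code; the function name `next_free_root` and its `parent` parameter are hypothetical.

```python
from pathlib import Path


def next_free_root(base="call-monitor", parent="."):
    """Return the first of base, base-1, base-2, ... that does not exist yet."""
    candidate = Path(parent) / base
    n = 0
    while candidate.exists():
        n += 1
        candidate = Path(parent) / f"{base}-{n}"
    return candidate
```

On a clean directory this returns `call-monitor`; once that folder exists, the next call returns `call-monitor-1`, and so on.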
To prevent different processes from writing to the same location, callmonitor
appends -tid=<N> to the root (call-monitor) folder. Currently callmonitor
supports mpi4py out of the box: if mpi4py.MPI.COMM_WORLD.Get_rank() > 1,
callmonitor automatically assumes that it's running in multi-threaded mode
and appends -tid=<Get_rank()> to the output. If your program is
multi-threaded with another framework (eg. concurrent.futures) then you need
to tell callmonitor your thread ID using callmonitor.Settings:
from callmonitor import Settings
settings = Settings()
settings.enable_multi_threading(THREAD_ID)

Do this before the first invocation of intercept: the database is created on
disk when it is first needed, and callmonitor.Settings is read at that point.
Any changes made to callmonitor.Settings afterwards will only take effect if
the database is recreated using callmonitor.CONTEXT.new.
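
Why the ordering matters is easiest to see in a toy model of lazy initialization. The classes below are hypothetical stand-ins written for this example (they are not callmonitor.Settings or callmonitor.CONTEXT, and the attribute names are assumptions); they only mimic the behaviour described above, where settings are read once, on first use.

```python
class Settings:
    """Hypothetical settings holder (not callmonitor.Settings)."""

    def __init__(self):
        self.thread_id = None

    def enable_multi_threading(self, thread_id):
        self.thread_id = thread_id


class LazyContext:
    """Resolves the root folder name on first use, then caches it."""

    def __init__(self, settings, root="call-monitor"):
        self.settings = settings
        self.root = root
        self._resolved = None

    @property
    def db_root(self):
        if self._resolved is None:  # the settings are read only here
            tid = self.settings.thread_id
            self._resolved = self.root if tid is None else f"{self.root}-tid={tid}"
        return self._resolved

    def new(self):
        """Forget the cached name so the next access re-reads the settings."""
        self._resolved = None


settings = Settings()
ctx = LazyContext(settings)
settings.enable_multi_threading(3)
print(ctx.db_root)  # call-monitor-tid=3
settings.enable_multi_threading(7)
print(ctx.db_root)  # still call-monitor-tid=3: the change came too late
ctx.new()
print(ctx.db_root)  # call-monitor-tid=7
```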

Handlers

Sometimes pickle just won't cut it in terms of saving function inputs, eg.
when we need to save our own fancy data types. callmonitor provides a way of
building your own argument handlers and registering them in the global
callmonitor.REGISTRY. The registry is queried every time function inputs are
processed, so if you build your own ArgHandler and add it using
callmonitor.REGISTRY.add, it will process any arguments of the associated
datatype from that point forward. Eg, numpy provides its own save/load
functions. We have already built (and registered) a numpy argument handler
like so:
import numpy as np
from os.path import join
from callmonitor import Handler, REGISTRY
class NPHandler(Handler):

    def load(self, path):
        # Read the array back from arg_<target>.npy inside the call folder
        self.data = np.load(join(path, f"arg_{self.target}.npy"))

    def save(self, path):
        # Write the array to arg_<target>.npy inside the call folder
        np.save(join(path, f"arg_{self.target}.npy"), self.data)

    @classmethod
    def accumulator_done(cls):
        pass

REGISTRY.add(np.ndarray, NPHandler)

Remember that callmonitor.REGISTRY.add needs to be called before the
first invocation of @intercept that needs this particular Handler. A
custom handler needs to inherit from the callmonitor.Handler class and define
save, load, and accumulator_done (the last one being a @classmethod).
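
The type-based dispatch behind a registry like this can be sketched without callmonitor or numpy. Everything below is a hypothetical stand-in: the `Handler` base class, its constructor arguments, the plain-dict `REGISTRY`, and `handler_for` are assumptions made for this example, and `fractions.Fraction` simply plays the role of a "fancy" data type.

```python
from fractions import Fraction
from os.path import join


class Handler:
    """Hypothetical base class: wraps one argument identified by `target`."""

    def __init__(self, target, data=None):
        self.target = target
        self.data = data


class FractionHandler(Handler):

    def save(self, path):
        # Store the value as text, eg. "3/4", in arg_<target>.txt
        with open(join(path, f"arg_{self.target}.txt"), "w") as f:
            f.write(f"{self.data.numerator}/{self.data.denominator}")

    def load(self, path):
        with open(join(path, f"arg_{self.target}.txt")) as f:
            self.data = Fraction(f.read())

    @classmethod
    def accumulator_done(cls):
        pass


REGISTRY = {Fraction: FractionHandler}  # stand-in for callmonitor.REGISTRY


def handler_for(label, value):
    """Return a handler instance for value, or None to fall back to pickle."""
    for dtype, handler_cls in REGISTRY.items():
        if isinstance(value, dtype):
            return handler_cls(label, value)
    return None
```

Because the lookup happens each time an argument is processed, a handler only applies to calls made after it was added to the registry, which matches the ordering requirement stated above.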
callmonitor.load(<path>) will load a database at <path> (see section on
Data Structure). Eg:
from callmonitor import load
db = load("call-monitor")

We can now get the data of individual function calls from the database using DB.get:

args, kwargs = db.get("function_name", invocation_count)

This will also automatically load .npy files and apply any custom handlers;
remember to register these in callmonitor.REGISTRY before executing db.get.

Remember: invocation_count starts at 1. Therefore, to access the first call to test_np_1, run:

In [4]: db.get("test_np_1", 1)
Out[4]: ([10, array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])], {})

callmonitor

We try to enable top-level summaries of the following user-facing classes:
REGISTRY, DB, DB.get_args, and Args
via the __str__ and __repr__ functions. Eg, callmonitor.REGISTRY shows
which datatype/handler pairs are configured:

In [2]: callmonitor.REGISTRY
Out[2]:
{
    <class 'numpy.ndarray'>: <class 'callmonitor.handler.NPHandler'>
}

Likewise, the DB object displays a summary of functions called and how often.

In [3]: db = callmonitor.load("call-monitor")
In [4]: db
Out[4]:
{
    Locked: True
    test_np_1: {
        calls: 2
        args: ['x', 'n']
        defaults: None
    }
}

Args Container

Picking apart args, kwargs, and argspec.defaults can be very tedious,
especially if you're trying to find out the value of a specific argument. Hence
callmonitor.DB provides an additional getter, get_args, which returns an
Args object. callmonitor.Args are container classes that store each input
argument by name as an attribute. Eg:

In [3]: args = db.get_args("test_fn_1", 1)
In [4]: args
Out[4]: dict_keys(['x', 'y', 'z'])
In [5]: args.x
Out[5]: 1

Note: the callmonitor.Args constructor will fill in any arguments not in
args and kwargs from the FullArgSpec defaults. If you just want to
recreate the original function call, the args and kwargs returned by
callmonitor.DB.get should be enough.

Data Structure

While not technically a database, let's call the directories generated by
callmonitor a database for lack of a better term. Each database consists
of a db.pkl file (containing metadata), as well as one folder per function,
with one numbered subfolder per function call. Eg:

call-monitor
├── db.pkl
├── test_fn_1
│   ├── 1
│   │   └── input_descriptor.pkl
│   └── 2
│       └── input_descriptor.pkl
└── test_fn_2
    └── 1
        └── input_descriptor.pkl

Special attention is given to numpy inputs: these are saved as
arg_<label>.npy, where <label> is either the index of the input argument,
or the keyword for kwargs. Eg:

call-monitor
├── db.pkl
└── test_np_1
    ├── 1
    │   ├── arg_1.npy
    │   └── input_descriptor.pkl
    └── 2
        ├── arg_n.npy
        └── input_descriptor.pkl

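
Since the layout is plain folders, it can be inspected with the standard library alone. The helper below is a hypothetical sketch (it is not part of callmonitor and does not read db.pkl); it assumes only the layout shown above, with one folder per function and one numbered subfolder per call.

```python
from pathlib import Path


def count_calls(root):
    """Return {function_name: number_of_recorded_calls} for a database folder."""
    summary = {}
    for fn_dir in sorted(Path(root).iterdir()):
        if not fn_dir.is_dir():
            continue  # skips db.pkl and any other top-level files
        calls = [p for p in fn_dir.iterdir() if p.is_dir() and p.name.isdigit()]
        summary[fn_dir.name] = len(calls)
    return summary
```

For the first tree shown in this section, `count_calls("call-monitor")` would report two calls for test_fn_1 and one for test_fn_2.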
Full consideration was given to saving all call data in a single data structure -- maybe even a real database ;) -- but to do this efficiently at scale is not easy, and might make this package cumbersome. Future versions will include the ability to fuse multiple small function calls into a single accumulator object to avoid large numbers of small files.
Version 0.3.0 brings many enhancements to callmonitor. We therefore could no
longer maintain native backward compatibility. A tool that can convert a version
0.2.0 database to version 0.3.0 (or later) is currently being developed.
Versions pre-dating 0.2.0 are no longer supported.