Function Annotations & Decorators
Justin Post
There are a few more extremely useful techniques we can apply when creating or running our functions. We’ll cover two of those here:
- Function Annotations: Help users by improving messages or describing the types of inputs and outputs we should use/expect
- Function Decorators: Add extra behavior to a function without modifying the function’s source code
Note: These types of webpages are built from Jupyter notebooks (.ipynb files). You can access your own versions of them by clicking here. It is highly recommended that you go through and run the notebooks yourself, modifying and rerunning things where you’d like!
Annotations
We can describe two major types of annotations:
- parameter (input) annotations
- return value annotations
Let’s start by discussing how we might use parameter annotations to improve usability of our functions.
Parameter Annotation
Consider the basic function below that finds a trimmed mean from a list of values we created a while back (of course we’d prefer to use numpy arrays now that we know them, but let’s just consider this function for now).
from math import floor
def find_mean(y, method = None, p = 0):
    """
    Quick function to find the mean or trimmed mean
    Assumes we have a list with only numeric type data
    If method is set to Trim, will remove outer most p values off the data
    before finding the mean
    p should be a number between 0 and 0.5
    """
    if method == "Trim":
        sort_y = sorted(y)
        to_remove = floor(p*len(sort_y))
        y = sort_y[to_remove:(len(sort_y)-to_remove)]
    return sum(y)/len(y)
Providing annotations for our parameters (y, method, and p) can help the user to understand what our function expects those inputs to be. For instance, we can state that y should be a list of numeric values, method should be a string, and p should be a numeric value. This augments the use of the docstring.
Parameter annotations take the form of optional expressions that follow the parameter name:
def foo(a: expression, b: expression = 5):
So we want to put the expression prior to any default values. Let’s put some annotations in our function definition.
- list[float] implies that the first argument should be a list containing floats (integers work too)
- None | str implies that the second argument should be the special value None or a string
- float specifies that p should be a float
Note: The only one of these I am able to get working with Colab is the third one. The others work with mypy though. If you are interested in that, check here or stop by office hours and we can chat!
from math import floor
def find_mean(y: list[float], method: None | str = None, p: float = 0):
"""
Quick function to find the mean or trimmed mean
Assumes we have a list with only numeric type data
If method is set to Trim, will remove outer most p values off the data
before finding the mean
p should be a number between 0 and 0.5
"""
if method == "Trim":
sort_y = sorted(y)
to_remove = floor(p*len(sort_y))
y = sort_y[to_remove:(len(sort_y)-to_remove)]
return sum(y)/len(y)This doesn’t actually change how the code executes or anything like that.
find_mean([1, 3, 10, 21, 500], method = None)
107.0
find_mean([1, 3, 10, 21, 500], method = 'Trim', p = 0.2)
11.333333333333334
What it does is give us an alternative way to do type checking. We have an additional __annotations__ attribute on our function. This is a mutable dictionary!
find_mean.__annotations__
{'y': list[float], 'method': None | str, 'p': float}
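Since this is an ordinary dictionary, we can look entries up or even change them on the fly. A quick illustration of that mutability is below; there is rarely a good reason to modify annotations like this in practice.
find_mean.__annotations__['p']          #look up a single annotation
find_mean.__annotations__['p'] = int    #entries can be reassigned like in any dict
find_mean.__annotations__['p'] = float  #put it back the way it was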
In Colab, we can enable type checking when we run functions by:
- going to Tools -> Settings -> Editor
- scrolling down to Code diagnostics and selecting Syntax and Type Checking
Now when we run code that has annotations, some of them are checked first! Again, I can only get the third one to work in Colab…
find_mean([1, 3, 10, 21, 500], method = 'Trim', p = 0.2) #works fine
11.333333333333334
find_mean([1, 3, 10, 21, 500], method = 'Trim', p = '20%') #p should be a float!
TypeError: must be real number, not str
We can see the error notes that the input shouldn’t be a str but a real number.
find_mean(y = "cat") #should throw a different TypeError than we see here... alas
TypeError: unsupported operand type(s) for +: 'int' and 'str'
This does throw an error but not the one it should. This works with other methods for type checking (say with mypy) but not with Colab.
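If we want a clearer error at run time, one option is to check the arguments against the __annotations__ dictionary ourselves before calling the function. Below is a rough sketch of such a helper (a check_annotations() function made up for illustration, not something built into python or Colab); it only looks at the outer type, so the values inside the list are not checked.
import types
from typing import get_args, get_origin

def check_annotations(func, **kwargs):
    """
    Rough sketch: compare keyword arguments against func.__annotations__
    """
    for name, value in kwargs.items():
        expected = func.__annotations__.get(name)
        if expected is None:
            continue #parameter was not annotated
        if isinstance(expected, types.UnionType):
            allowed = get_args(expected) #None | str -> (NoneType, str)
        else:
            allowed = (get_origin(expected) or expected,) #list[float] -> list, float -> float
        if not isinstance(value, allowed):
            raise TypeError(f"{name} should be {expected}, not {type(value).__name__}")

check_annotations(find_mean, y = "cat")
TypeError: y should be list[float], not str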
Return Value Annotation
We can also note the type of the value to be returned. The syntax for this is
def foo() -> expression:
For our find_mean() function, we can specify that a float should be returned.
from math import floor
def find_mean(y: list[float], method: None | str = None, p: float = 0) -> float:
"""
Quick function to find the mean or trimmed mean
Assumes we have a list with only numeric type data
If method is set to Trim, will remove outer most p values off the data
before finding the mean
p should be a number between 0 and 0.5
"""
if method == "Trim":
sort_y = sorted(y)
to_remove = floor(p*len(sort_y))
y = sort_y[to_remove:(len(sort_y)-to_remove)]
return sum(y)/len(y)Again, this doesn’t change how the code executes but is useful to have when understanding what a function should do. When we start looking at the syntax of pyspark, we’ll see these types of hints and they can really help to understand how to use things!
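One small note: the return annotation gets recorded as well, stored in the same __annotations__ dictionary under the key 'return'.
'return' in find_mean.__annotations__
True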
Decorators
Decorators take in a function, add some functionality to it, and then output a modified function. In this process, the code for the original function isn’t changed. However, we are then able to call the function with the added functionality. Decorators are pretty general: the same decorator can be applied to multiple functions to give all of those functions that extra behavior.
Let’s see a basic example! We could add functionality to our functions that tells us how long each function takes to run.
To do this we import the wraps object from functools (which just helps the wrapper function keep the attributes, like the name and docstring, of the function it wraps) along with the time module so we have access to the relevant functions for timing our function execution.
We start by defining our new ‘decorator’ function (timeit()) and then create an inner function to do our bidding.
import time
from functools import wraps
def timeit(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        elapsed = end - start
        print(f"{func.__name__} ran in {elapsed:.6f} seconds")
        return result
    return wrapper
Now we have a function that takes in another function, sets a start time of execution, runs the function, sets the end time of execution, and prints out the elapsed time the function ran. It then returns the results of the function call.
So essentially, we are just gaining additional functionality if we utilize this!
Let’s apply it to our finding a mean function. We do so using the special @ notation you see below.
@timeit
def find_mean(y: list[float], method: None | str = None, p: float = 0) -> float:
"""
Quick function to find the mean or trimmed mean
Assumes we have a list with only numeric type data
If method is set to Trim, will remove outer most p values off the data
before finding the mean
p should be a number between 0 and 0.5
"""
if method == "Trim":
sort_y = sorted(y)
to_remove = floor(p*len(sort_y))
y = sort_y[to_remove:(len(sort_y)-to_remove)]
return sum(y)/len(y)Now when we call the function it will also print out the time it took to run!
find_mean([1, 3, 10, 21, 500], method = 'Trim', p = 0.2)
find_mean ran in 0.000010 seconds
11.333333333333334
find_mean([i*3 for i in range(10000)], method = 'Trim', p = 0.2)
find_mean ran in 0.000138 seconds
14998.5
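One nice side effect of using wraps() inside the decorator: even though the wrapper is what actually runs, the decorated function still reports its own name and docstring. A quick check of that is below.
find_mean.__name__ #still 'find_mean' rather than 'wrapper', thanks to wraps
'find_mean'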
The @ notation is equivalent to writing
find_mean = timeit(find_mean)
The beauty of this is that we can easily apply it to other functions as well, adding the functionality there too! The use of *args and **kwargs in our inner function definition allows the decorator to work with functions that take any number of arguments!
Suppose we wanted to apply this to our print key/value pairs function we made a while back. We just add @timeit prior to defining the function.
@timeit
def print_key_value_pairs(**kwargs):
"""
key word args can be anything
"""
print(type(kwargs), kwargs)
for x in kwargs:
print(x + " : " + str(kwargs.get(x))) #cast the value to a string for printingprint_key_value_pairs(name = "Justin",
job = "Professor",
phone = 9195150637)<class 'dict'> {'name': 'Justin', 'job': 'Professor', 'phone': 9195150637}
name : Justin
job : Professor
phone : 9195150637
print_key_value_pairs ran in 0.000070 seconds
Nice! Now if we want to apply this to an already existing function, it takes a little extra effort.
Suppose we want to apply this to the random() method of a numpy random number generator (created with np.random.default_rng()). Let’s set a seed and create the augmented version of the function.
import numpy as np
#set a seed for the random number generator.
rng = np.random.default_rng(10)
@timeit
def timed_random(*args, **kwargs):
    return rng.random(*args, **kwargs)
timed_random(10)
timed_random ran in 0.000020 seconds
array([0.95600171, 0.20768181, 0.82844489, 0.14928212, 0.51280462,
       0.1359196 , 0.68903648, 0.84174772, 0.425509 , 0.956926 ])
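Since timeit() is just a function that returns a function, we don’t even need a new def here. A small variation on the same idea is below (timed_random2 is just a name made up for this example); it wraps the existing method directly.
timed_random2 = timeit(rng.random) #same effect as the @timeit version above
timed_random2(10) #should print something like 'random ran in ... seconds' since that is the wrapped method's name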
Recap
Function annotations allow us to describe inputs and outputs more clearly. There are ways to enforce the types specified (for instance, with mypy) and, eventually, this may become something python can do without any extra packages.
Function decorators allow us to modify the behavior of a function while not changing the source code.
We’ll end up seeing these here and there when we get into Spark, so they’re useful to know about. For instance, if we check out some source code for the pyspark.pipelines.api module, we’d see some of the following (excerpts below):
from typing import Callable, Dict, List, Optional, Union, overload
@overload
def table(
    *,
    query_function: None = None,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
) -> Callable[[QueryFunction], None]:
    ...
We can see the use of annotations to help us understand what the inputs and outputs for this function should be. A decorator is used and adds whatever functionality overload gives!
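To get a feel for what overload looks like outside of pyspark, here is a minimal sketch with a made-up double() function (not from the pyspark source). The @overload-decorated stubs only describe the allowed signatures for type checkers; the single undecorated definition at the end is what actually runs.
from typing import overload

@overload
def double(x: int) -> int: ...
@overload
def double(x: str) -> str: ...

def double(x):
    #the one real implementation; the stubs above are only used by type checkers
    return x * 2

double(4)    #8
double("ab") #'abab'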
If you are on the course website, use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!
If you are on Google Colab, head back to our course website for our next lesson!