Useful helper functions from the fast.ai library for processing and visualising structured data.
draw_tree(t, df, size=10, ratio=0.6, precision=0)
Draws a representation of a decision tree t (for example, one estimator of a random forest) in IPython.
get_sample(df, n)
Gets a random sample of n rows from df, without replacement.
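
The idea can be sketched with plain pandas. This is not the fast.ai implementation: sample_rows and its seed parameter are hypothetical names added for illustration.

```python
import pandas as pd

def sample_rows(df, n, seed=None):
    # Hypothetical stand-in for get_sample: n distinct rows, drawn without replacement.
    return df.sample(n=n, replace=False, random_state=seed)

df = pd.DataFrame({"a": range(10), "b": list("abcdefghij")})
sub = sample_rows(df, 3, seed=0)
```

Because sampling is without replacement, n must not exceed the number of rows in df.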
add_datepart(df, fldnames, drop=True, time=False, errors='raise')
add_datepart converts a datetime64 column of df into many columns containing
the information from the date. The changes are applied in place.
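
A reduced sketch of the same idea, assuming only pandas. The helper name add_date_columns, the four attributes it extracts, and the column naming scheme are assumptions for illustration and may differ from what add_datepart produces.

```python
import pandas as pd

def add_date_columns(df, fld, drop=True):
    # Hypothetical, reduced version of add_datepart: extracts a few date attributes.
    col = pd.to_datetime(df[fld])
    for attr in ("year", "month", "day", "dayofweek"):
        df[f"{fld}_{attr}"] = getattr(col.dt, attr)
    if drop:
        # Remove the original datetime column, modifying df in place.
        df.drop(columns=[fld], inplace=True)

df = pd.DataFrame({"saledate": pd.to_datetime(["2000-03-11", "2000-03-12"])})
add_date_columns(df, "saledate")
```

After the call, df holds the derived columns and no longer holds saledate.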
train_cats(df)
Changes any columns of strings in a pandas DataFrame to columns of
categorical values. The changes are applied in place.
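
A minimal sketch of this conversion with plain pandas. The name strings_to_categories is hypothetical, and marking the categories as ordered is an assumption made here so that the codes have a stable order.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

def strings_to_categories(df):
    # Hypothetical stand-in for train_cats: convert string columns in place.
    for name in df.columns:
        if is_string_dtype(df[name]):
            df[name] = df[name].astype("category").cat.as_ordered()

df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "a"]})
strings_to_categories(df)
```

Numeric columns such as col1 are left untouched.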
apply_cats(df, trn)
Changes any columns of strings in df into categorical variables using trn as
a template for the category codes.
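
The point of a template is that the same string maps to the same code in both data frames. A sketch under that assumption, using only pandas; reuse_categories is a hypothetical name, not the library function.

```python
import pandas as pd

def reuse_categories(df, trn):
    # Hypothetical stand-in for apply_cats: reuse trn's categories for matching columns.
    for name in df.columns:
        if name in trn.columns and trn[name].dtype.name == "category":
            df[name] = pd.Categorical(
                df[name], categories=trn[name].cat.categories, ordered=True
            )

trn = pd.DataFrame({"col2": pd.Categorical(["a", "b", "c"], ordered=True)})
val = pd.DataFrame({"col2": ["c", "b"]})
reuse_categories(val, trn)
codes = list(val["col2"].cat.codes)
```

Converting val on its own would assign codes from its two observed values only; with trn as the template, "c" and "b" keep the codes they have in trn.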
fix_missing(df, col, name, na_dict)
Fill missing data in a column of df with the median, and add a {name}_na column
which specifies if the data was missing.
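
A sketch of median filling with an indicator column, using plain pandas. The name fill_with_median is hypothetical. Treating na_dict as a record of the fill value per column, so that the same value can be reused on another data set, is an assumption about its role.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def fill_with_median(df, name, na_dict):
    # Hypothetical stand-in for fix_missing, acting on column `name` of df in place.
    col = df[name]
    if is_numeric_dtype(col) and (col.isnull().any() or name in na_dict):
        # Record which rows were missing before they are filled.
        df[name + "_na"] = col.isnull()
        # Reuse a previously stored fill value if one exists, else use the median.
        filler = na_dict.get(name, col.median())
        df[name] = col.fillna(filler)
        na_dict[name] = filler
    return na_dict

df = pd.DataFrame({"col1": [1.0, None, 3.0], "col2": [5, 2, 2]})
na_dict = fill_with_median(df, "col1", {})
```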
numericalize(df, col, name, max_n_cat)
Changes the column col from a categorical type to its integer codes.
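
A sketch with plain pandas. The name to_codes is hypothetical. Two details are assumptions about the intended behaviour rather than statements from the text above: codes are shifted by one so that missing values map to 0, and a column is only converted when max_n_cat is None or the column has more than max_n_cat categories.

```python
import pandas as pd

def to_codes(df, name, max_n_cat=None):
    # Hypothetical stand-in for numericalize, acting on column `name` of df in place.
    col = df[name]
    if col.dtype.name == "category" and (
        max_n_cat is None or len(col.cat.categories) > max_n_cat
    ):
        # pandas codes missing values as -1; adding 1 moves them to 0.
        df[name] = col.cat.codes + 1

df = pd.DataFrame({"col2": pd.Categorical(["a", "b", None, "a"])})
to_codes(df, "col2")
```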
scale_vars(df, mapper)
Standardize numerical features by removing the mean and scaling to unit variance.
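
Standardization can be sketched without any fitted scaler object. Here a plain dict of (mean, standard deviation) pairs stands in for mapper; that substitution, and the name standardize, are assumptions for illustration. The population standard deviation (ddof=0) is used.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def standardize(df, stats=None):
    # Hypothetical stand-in for scale_vars; `stats` plays the role of mapper.
    if stats is None:
        # Learn the statistics from this data frame (typically the training set).
        stats = {n: (df[n].mean(), df[n].std(ddof=0))
                 for n in df.columns if is_numeric_dtype(df[n])}
    for n, (mean, std) in stats.items():
        df[n] = (df[n] - mean) / std
    return stats

train = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
stats = standardize(train)
test = pd.DataFrame({"x": [2.0, 4.0]})
standardize(test, stats)  # reuse the training statistics
```

Passing the training statistics to the second call keeps both data sets on the same scale.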
proc_df(df, y_fld=None, skip_flds=None, ignore_flds=None, do_scale=False, na_dict=None, preproc_fn=None, max_n_cat=None, subset=None, mapper=None)
proc_df takes a data frame df and splits off the response variable, and
changes the df into an entirely numeric dataframe. For each column of df
which is not in skip_flds nor in ignore_flds, na values are replaced by the
median value of the column.
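
The overall flow can be sketched as follows. This is a much-reduced, hypothetical version (to_numeric_frame) that ignores most of the parameters in the signature above; it only splits off the response, fills numeric gaps with the median, and turns the remaining columns into integer codes.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def to_numeric_frame(df, y_fld):
    # Hypothetical, much-reduced version of proc_df; works on a copy of df.
    df = df.copy()
    y = df.pop(y_fld).values  # split off the response variable
    for name in list(df.columns):
        col = df[name]
        if is_numeric_dtype(col):
            if col.isnull().any():
                df[name + "_na"] = col.isnull()
                df[name] = col.fillna(col.median())
        else:
            # Non-numeric columns become integer codes (0 is reserved for missing).
            df[name] = col.astype("category").cat.codes + 1
    return df, y

raw = pd.DataFrame(
    {"area": [1.0, None, 3.0], "kind": ["a", "b", "a"], "price": [10, 20, 30]}
)
x, y = to_numeric_frame(raw, "price")
```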
rf_feat_importance(m, df)
Create a pandas.DataFrame of feature importances.
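
A sketch assuming m is a fitted scikit-learn estimator that exposes feature_importances_. The function name feature_importance_table, the column names cols and imp, and the synthetic data are choices made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def feature_importance_table(m, df):
    # Pair each column name with the fitted model's importance, largest first.
    return pd.DataFrame(
        {"cols": df.columns, "imp": m.feature_importances_}
    ).sort_values("imp", ascending=False)

# Synthetic data: the target depends on `signal` and not on `noise`.
rng = np.random.default_rng(0)
X = pd.DataFrame({"signal": rng.normal(size=200), "noise": rng.normal(size=200)})
y = 3 * X["signal"] + 0.1 * rng.normal(size=200)

m = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)
fi = feature_importance_table(m, X)
```

The importances reported by scikit-learn sum to 1, so each value can be read as a share of the total.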