Useful helper functions from the library for processing and visualising structured data.
Parameters: ..., df, size, ratio, precision
Draws a representation of a random forest in IPython.
Parameters: ..., n
Gets a random sample of n rows from df, without replacement.
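Sampling without replacement can be sketched in plain pandas. The helper name sample_rows and its seed argument are hypothetical, chosen for illustration; this is not the library's implementation.

```python
import numpy as np
import pandas as pd

def sample_rows(df: pd.DataFrame, n: int, seed=None) -> pd.DataFrame:
    """Return n distinct rows of df chosen at random (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # A permutation lists every row position exactly once, so its first n
    # entries can never select the same row twice.
    idxs = np.sort(rng.permutation(len(df))[:n])
    # Sorting the positions keeps the sampled rows in their original order.
    return df.iloc[idxs].copy()

df = pd.DataFrame({"a": range(10), "b": list("abcdefghij")})
sample = sample_rows(df, 4, seed=0)
```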
Parameters: ..., fldnames, drop, time, errors
add_datepart converts a datetime64 column of df into many columns containing
the information from the date. The changes are applied in place.
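The idea of expanding one date column into several derived columns can be sketched with the pandas .dt accessor. The helper expand_date, its arguments, and the generated column names are assumptions made for this example and may differ from what add_datepart produces.

```python
import pandas as pd

def expand_date(df: pd.DataFrame, fldname: str, drop: bool = True) -> None:
    """Add date-derived columns to df in place (hypothetical helper)."""
    fld = df[fldname]
    # Each attribute of the .dt accessor becomes its own column.
    for attr in ("year", "month", "day", "dayofweek", "dayofyear"):
        df[f"{fldname}_{attr}"] = getattr(fld.dt, attr)
    df[f"{fldname}_is_month_end"] = fld.dt.is_month_end
    if drop:
        # Remove the original datetime column once its parts are extracted.
        df.drop(columns=fldname, inplace=True)

df = pd.DataFrame({"saledate": pd.to_datetime(["2011-03-31", "2012-07-04"])})
expand_date(df, "saledate")
```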
Changes any columns of strings in a pandas DataFrame to columns of
categorical values. The changes are applied in place.
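A minimal sketch of the string-to-categorical conversion, assuming that ordered categories are wanted. The helper name strings_to_categories is hypothetical.

```python
import pandas as pd

def strings_to_categories(df: pd.DataFrame) -> None:
    """Convert every string column of df to an ordered categorical, in place."""
    # Iterate over a snapshot of the column names because columns are
    # reassigned inside the loop.
    for name in list(df.columns):
        if pd.api.types.is_string_dtype(df[name]):
            df[name] = df[name].astype("category").cat.as_ordered()

df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "a"]})
strings_to_categories(df)
```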
Parameters: ..., trn
Changes any columns of strings in df into categorical variables using trn as
a template for the category codes.
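Reusing the categories of a template frame keeps the integer codes consistent between two frames. The sketch below assumes that values absent from the template should become missing; apply_categories is a hypothetical name.

```python
import pandas as pd

def apply_categories(df: pd.DataFrame, trn: pd.DataFrame) -> None:
    """Re-encode columns of df using the categories found in trn, in place."""
    for name in list(df.columns):
        if name in trn.columns and trn[name].dtype.name == "category":
            cats = trn[name].cat.categories
            # Values unseen in the template are set to missing explicitly
            # (an assumption of this sketch).
            values = df[name].where(df[name].isin(cats))
            df[name] = pd.Categorical(values, categories=cats, ordered=True)

trn = pd.DataFrame(
    {"col2": pd.Categorical(["a", "b", "a"], categories=["a", "b"], ordered=True)}
)
df = pd.DataFrame({"col2": ["b", "a", "c"]})
apply_categories(df, trn)
```

After the call, "b" and "a" share the codes they have in trn, and the unseen value "c" receives the missing-value code -1.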
Parameters: ..., col, name, na_dict
Fill missing data in a column of df with the median, and add a {name}_na column
which specifies if the data was missing.
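Median imputation with an indicator column can be sketched as follows. The helper fill_with_median is hypothetical, and treating na_dict as a record of the fill value used per column is an assumption of this example.

```python
import pandas as pd

def fill_with_median(df: pd.DataFrame, name: str, na_dict: dict) -> dict:
    """Fill missing numeric values in df[name] in place and flag them."""
    col = df[name]
    if pd.api.types.is_numeric_dtype(col) and (col.isna().any() or name in na_dict):
        # Record which rows were missing before they are overwritten.
        df[name + "_na"] = col.isna()
        # Reuse a previously stored fill value if there is one.
        filler = na_dict.get(name, col.median())
        df[name] = col.fillna(filler)
        na_dict[name] = filler
    return na_dict

df = pd.DataFrame({"col1": [1.0, None, 3.0], "col2": [5, 2, 2]})
na_dict = fill_with_median(df, "col1", {})
```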
Parameters: ..., col, name, max_n_cat
Changes the column col from a categorical type to its integer codes.
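Replacing a categorical column with its integer codes can be sketched with the pandas .cat.codes accessor. The helper to_codes is hypothetical and ignores max_n_cat; it also keeps the raw pandas codes, where -1 marks a missing value.

```python
import pandas as pd

def to_codes(df: pd.DataFrame, name: str) -> None:
    """Replace a categorical column of df with its integer codes, in place."""
    col = df[name]
    if col.dtype.name == "category":
        # pandas assigns codes 0..k-1 to the categories and -1 to missing values.
        df[name] = col.cat.codes

df = pd.DataFrame({"col2": pd.Categorical(["a", "b", "a", None], categories=["a", "b"])})
to_codes(df, "col2")
```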
Parameters: ..., mapper
Standardize numerical features by removing the mean and scaling to unit variance.
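Standardization subtracts the column mean and divides by the standard deviation. The sketch below uses plain pandas with the population standard deviation (ddof=0); the helper standardize and its stats argument are hypothetical stand-ins, not the library's mapper.

```python
import pandas as pd

def standardize(df: pd.DataFrame, stats: dict = None) -> dict:
    """Scale numeric columns of df in place; return the statistics used."""
    if stats is None:
        num_cols = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]
        stats = {c: (df[c].mean(), df[c].std(ddof=0)) for c in num_cols}
    for c, (mean, std) in stats.items():
        # Guard against constant columns, whose standard deviation is zero.
        df[c] = (df[c] - mean) / (std if std > 0 else 1.0)
    return stats

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "label": ["a", "b", "c"]})
stats = standardize(df)
```

Passing the returned stats to a later call applies the same mean and scale to another frame, which is the usual reason for keeping them.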
Parameters: ..., y_fld, skip_flds, ignore_flds, do_scale, na_dict, preproc_fn, max_n_cat, subset, mapper
proc_df takes a data frame df, splits off the response variable, and
changes df into an entirely numeric dataframe. For each column of df
that is in neither skip_flds nor ignore_flds, NA values are replaced by the
median value of the column.
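The overall flow described here (split off the response, fill missing values with the median, make every column numeric) can be sketched in a few lines. The helper to_numeric_frame is hypothetical and covers only a small part of what proc_df accepts: it assumes skip_flds means columns to drop, and it ignores ignore_flds, do_scale, preproc_fn, max_n_cat, subset and mapper.

```python
import pandas as pd

def to_numeric_frame(df: pd.DataFrame, y_fld: str, skip_flds=()):
    """Return (numeric frame, response array, fill values) without touching df."""
    df = df.drop(columns=list(skip_flds))  # drop returns a new frame
    y = df.pop(y_fld).to_numpy()           # split off the response variable
    na_dict = {}
    for name in list(df.columns):
        col = df[name]
        if pd.api.types.is_numeric_dtype(col):
            if col.isna().any():
                # Flag the missing rows, then fill them with the column median.
                df[name + "_na"] = col.isna()
                na_dict[name] = col.median()
                df[name] = col.fillna(na_dict[name])
        else:
            # Non-numeric columns become integer category codes.
            df[name] = col.astype("category").cat.codes
    return df, y, na_dict

raw = pd.DataFrame({
    "size": [1.0, None, 3.0],
    "kind": ["a", "b", "a"],
    "id": [10, 11, 12],
    "price": [100, 200, 300],
})
X, y, na_dict = to_numeric_frame(raw, "price", skip_flds=["id"])
```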
Parameters: ..., df
Create a pandas.DataFrame of feature importances.
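A feature-importance table can be built from any fitted model that exposes a feature_importances_ attribute, as scikit-learn tree ensembles do. The helper name, the column names "cols" and "imp", and the stand-in FakeModel class below are assumptions made for this example.

```python
import pandas as pd

def feature_importance_frame(model, df: pd.DataFrame) -> pd.DataFrame:
    """Pair each column of df with its importance, most important first."""
    return (
        pd.DataFrame({"cols": df.columns, "imp": model.feature_importances_})
        .sort_values("imp", ascending=False)
        .reset_index(drop=True)
    )

class FakeModel:
    """Stand-in for a fitted estimator; only the attribute used here exists."""
    feature_importances_ = [0.2, 0.7, 0.1]

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
fi = feature_importance_frame(FakeModel(), df)
```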