df_to_convergence_df

post_processors.df_to_convergence_df(df_in, err_name='errors', time_name='timings', algorithm_name='algorithm', other_names=None, max_time=inf, groups=True, groups_names=None, filters=None)

Convert a compact pandas DataFrame whose errors column contains lists into a long format in which these lists are unfolded. This is useful as a post-processor for @run_and_track, making it easy to draw convergence plots with plotly. For instance, using @run_and_track to compute and store results:

>>> @run_and_track(n=[1,2], algorithm_names=["goodalg"])
>>> def myfun(n=1):
>>>     costs = [11]
>>>     time = [0]
>>>     for i in range(10):
>>>         # here a dummy cost function over 10 iterations
>>>         costs.append(10-i)
>>>         time.append(i)
>>>     return {"errors": costs, "timings": time}
>>> # the above code stores the results in a dataframe df
>>> df = np.load(...)
>>> # Now we want to plot errors vs time: we use df_to_convergence_df to convert df into a long format for this
>>> vars = ["n"]
>>> df_conv = df_to_convergence_df(df, other_names=vars, groups_names=vars)
>>> # Convergence plots can be easily done with plotly at this stage
>>> import plotly.express as px
>>> px.line(df_conv, y="errors", x="timings", facet_col="n")

This function drops every column of df other than the error, timing, seed, and algorithm columns, and adds an iteration counter. To keep other columns, specify their names with the other_names option.
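To make the compact-to-long conversion concrete, here is a minimal sketch in plain pandas of what unfolding list columns looks like. It is not the library's implementation: the dataframe contents are made up, and the iteration column name `it` is a hypothetical choice for illustration only.

```python
import pandas as pd

# Hypothetical compact dataframe, shaped like the one described above:
# one row per run, with list-valued "errors" and "timings" columns.
df_compact = pd.DataFrame({
    "algorithm": ["goodalg", "goodalg"],
    "n": [1, 2],
    "seed": [0, 0],
    "errors": [[11, 10, 9], [11, 10, 9]],
    "timings": [[0, 0, 1], [0, 0, 1]],
})

# Unfold both list columns in lockstep: one row per (run, iteration).
# Multi-column explode requires pandas >= 1.3.
df_long = df_compact.explode(["errors", "timings"], ignore_index=True)

# Add an iteration counter that restarts at 0 for each original run.
# The column name "it" is an assumption, not the library's naming.
df_long["it"] = df_long.groupby(["algorithm", "n", "seed"]).cumcount()
```

Each original row becomes as many rows as its lists have entries, which is the shape plotting libraries such as plotly express expect for line plots.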

Parameters:
  • df_in (pandas dataframe) – A dataframe with lists in columns to be unfolded, typically generated by @run_and_track

  • err_name (str, optional) – name of the column in df containing the lists to unfold. Useful if you customized the error names, or if several metrics have been stored in df. By default errors

  • time_name (str, optional) – name of the column in df containing the time lists to unfold. Useful if you customized the timings names. By default timings

  • algorithm_name (str, optional) – name of the column containing the algorithm names, by default algorithm

  • other_names (list of str, optional) – a list containing the names of columns to keep in the unfolded dataframe, by default None

  • max_time (float, optional) – maximum time value after which the dataframe is truncated, by default np.inf

  • groups (bool, optional) – When drawing line plots with plotly, if other_names has been provided, the lines may loop back from the end to the beginning of the plot. This can be solved by providing a group entry in plotly plots. Groups contains seed by default if seeds were provided in df. By default True

  • groups_names (list of str, optional) – the names of the columns of df to group by, by default None

  • filters (dict, optional) – a dictionary of {column of df: value in that column} pairs, used to compute the convergence plot only for the rows of df whose values match the filter values. By default None
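The filters and max_time options can be pictured with the following plain-pandas sketch. It only illustrates the selection semantics described above and is not the library's code: the data is made up, and the assumptions that filters test for equality and that truncation keeps samples with time less than or equal to max_time are mine.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format dataframe (one row per iteration).
df_long = pd.DataFrame({
    "algorithm": ["a", "a", "a", "b", "b", "b"],
    "n": [1, 1, 1, 2, 2, 2],
    "timings": [0.0, 0.5, 1.5, 0.0, 0.4, 0.9],
    "errors": [3.0, 2.0, 1.0, 3.0, 2.5, 2.0],
})

filters = {"n": 1}   # keep rows where column "n" equals 1
max_time = 1.0       # drop samples recorded after this time (assumed inclusive)

# Build a boolean mask that requires every filter to match.
mask = np.ones(len(df_long), dtype=bool)
for column, value in filters.items():
    mask &= (df_long[column] == value).to_numpy()

# Combine the filter mask with the time truncation.
df_sel = df_long[mask & (df_long["timings"] <= max_time).to_numpy()]
```

Here only the first two rows survive: they match n equal to 1 and were recorded no later than max_time.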

Returns:

df – A new dataframe in a long format with each row containing a single time, iteration, error value, seed and algorithm_name. Easy to use for plotting convergence plots with plotly.

Return type:

pandas dataframe