Package bar_chart_race
from ._make_chart import bar_chart_race, load_dataset, prepare_wide_data, prepare_long_data
__version__ = "0.0.7"
__all__ = ['bar_chart_race', 'load_dataset', 'prepare_wide_data', 'prepare_long_data']
Functions
def bar_chart_race(df, filename=None, orientation='h', sort='desc', n_bars=None, fixed_order=False, fixed_max=False, steps_per_period=10, period_length=500, interpolate_period=False, label_bars=True, bar_size=0.95, period_label=True, period_fmt=None, period_summary_func=None, perpendicular_bar_func=None, figsize=(6, 3.5), cmap='dark24', title=None, title_size=None, bar_label_size=7, tick_label_size=7, shared_fontdict=None, scale='linear', writer=None, fig=None, dpi=144, bar_kwargs=None, filter_column_colors=False)
-
Create an animated bar chart race using matplotlib. Data must be in 'wide' format where each row represents a single time period and each column represents a distinct category. Optionally, the index can label the time period.
Bar height and location change linearly from one time period to the next.
If no filename is given, an HTML string is returned; otherwise the animation is saved to disk.
You must have ffmpeg installed on your machine to save files to disk. Get ffmpeg here: https://www.ffmpeg.org/download.html
To save .gif files, you'll need to install ImageMagick.
This is resource intensive. Start with just a few rows of data to test.
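Because saving depends on external tools, it can help to check for them before rendering. The following is a minimal sketch and not part of bar_chart_race: the helper name find_animation_tools is hypothetical, and the executable names (ffmpeg, and magick or convert for ImageMagick) are assumptions about a typical installation.

```python
import shutil


def find_animation_tools():
    """Return the paths of ffmpeg and ImageMagick if they are on PATH, else None."""
    return {
        "ffmpeg": shutil.which("ffmpeg"),
        # Assumption: ImageMagick 7 installs `magick`; older releases install `convert`.
        "imagemagick": shutil.which("magick") or shutil.which("convert"),
    }


if __name__ == "__main__":
    for tool, path in find_animation_tools().items():
        print(f"{tool}: {path or 'not found'}")
```

On some systems an unrelated program named convert exists, so a hit for that name is only a hint that ImageMagick is present.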
Parameters
df
: pandas DataFrame
- Must be a 'wide' DataFrame where each row represents a single period of time. Each column contains the values of the bars for that category. Optionally, use the index to label each time period. The index can be of any type.
filename
: None or str, default None
- If None, return the animation as an HTML5 string. If a string, save the animation to that filename location. Use .mp4, .gif, .html, .mpeg, .mov, or any other extension supported by ffmpeg or ImageMagick.
orientation
: 'h' or 'v', default 'h'
- Bar orientation: horizontal or vertical.
sort
: 'desc' or 'asc', default 'desc'
- Choose how to sort the bars. Use 'desc' to put the largest bars on top and 'asc' to place the largest bars on the bottom.
n_bars
: int, default None
- Choose the maximum number of bars to display on the graph. By default, use all bars. New bars entering the race will appear from the edge of the axes.
fixed_order
: bool or list, default False
- When False, bar order changes every time period to correspond with sort. When True, bars remain fixed according to their final value, ordered by sort. Otherwise, provide a list of the exact order of the categories for the entire duration.
fixed_max
: bool, default False
- Whether to fix the maximum value of the axis containing the values. When False, the axis maximum (xlim/ylim) sits just beyond the largest bar of the current time period and changes along with the data. When True, the axis maximum remains constant for the duration of the animation. For example, in a horizontal bar chart, if the largest bar has a value of 100 for the first time period and 10,000 for the last time period, the xlim maximum will be 10,000 for each frame.
steps_per_period
: int, default 10
- The number of steps to go from one time period to the next. The bars will grow linearly between each period.
period_length
: int, default 500
- Number of milliseconds to animate each period (row). Default is 500 ms (half a second).
interpolate_period
: bool, default False
- Whether to interpolate the period. Only valid for datetime or numeric indexes. For example, when set to True, the two consecutive periods 2020-03-29 and 2020-03-30 with steps_per_period set to 4 would yield a new index of
2020-03-29 00:00:00
2020-03-29 06:00:00
2020-03-29 12:00:00
2020-03-29 18:00:00
2020-03-30 00:00:00
label_bars
: bool, default True
- Whether to label the bars with their value on their right.
bar_size
: float, default .95
- Height/width of bars for horizontal/vertical bar charts. Use a number between 0 and 1. It represents the fraction of space that each bar takes up. When equal to 1, no gap remains between the bars.
period_label
: bool or dict, default True
- If True or a dict, use the index as a large text label on the axes whose value changes each period. Use a dictionary to supply the exact position of the period label along with any valid parameters of the matplotlib text method. At a minimum, you must supply both 'x' and 'y' in axes coordinates.
Example: {'x': .99, 'y': .8, 'ha': 'right', 'va': 'center'}
If False, don't place a label on the axes.
period_fmt
: str, default None
- Either a string with date directives or a new-style (Python 3.6+) format string.
For a string with date directives, find the complete list here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
Example of a string with date directives: '%B %d, %Y' will change 2020/03/29 to March 29, 2020.
For a new-style format string, use curly braces and the variable x, which will be passed the current period's index value. Example: 'Period {x:10.2f}'
Date directives will only be used for datetime indexes.
period_summary_func
: function, default None
- Custom text added to the axes each period. Create a user-defined function that accepts two pandas Series of the current time period's values and ranks. It must return a dictionary containing, at a minimum, the keys "x", "y", and "s", which will be passed to the matplotlib text method.
Example:
    def func(values, ranks):
        total = values.sum()
        s = f'Worldwide deaths: {total}'
        return {'x': .85, 'y': .2, 's': s, 'ha': 'right', 'size': 11}
perpendicular_bar_func
: function or str, default None
- Creates a single bar perpendicular to the main bars that spans the length of the axis.
Use either a string that the DataFrame agg method understands ('mean', 'median', 'max', 'min', etc.) or a user-defined function.
The function is passed two pandas Series of the current time period's data and ranks. It must return a single value.
Example:
    def func(values, ranks):
        return values.quantile(.75)
figsize
: two-item tuple of numbers, default (6, 3.5)
- matplotlib figure size in inches. Will be overridden if a figure is supplied to fig.
cmap
: str, matplotlib colormap instance, or list of colors, default 'dark24'
- Colors to be used for the bars. All matplotlib and plotly colormaps are available by string name. Colors will repeat if there are more bars than colors.
title
: str, default None
- Title of the plot.
title_size
: number or str, default plt.rcParams['axes.titlesize']
- Size of the title in points, or a relative size string. See Font Help below.
bar_label_size
: number or str, default 7
- Size in points, or a relative size string, of the numeric labels just outside of the bars. See Font Help below.
tick_label_size
: number or str, default 7
- Size of the tick labels in points, or a relative size string. See Font Help below.
shared_fontdict
: dict, default None
- Dictionary of font properties shared across the tick labels, bar labels, period labels, and title. The only property not shared is size; it will be ignored if you try to set it.
Possible keys are: 'family', 'weight', 'color', 'style', 'stretch', 'variant'
Here is an example dictionary: {'family': 'Helvetica', 'weight': 'bold', 'color': 'rebeccapurple'}
scale
: 'linear' or 'log', default 'linear'
- Type of scaling to use for the axis containing the values.
writer
: str or matplotlib Writer instance
- This argument is passed to the matplotlib FuncAnimation.save method.
By default, the writer will be 'ffmpeg', unless creating a gif, in which case it will be 'imagemagick', or an html file, in which case it will be 'html'.
Find all of the available writers:
    from matplotlib import animation
    animation.writers.list()
You must have ffmpeg or ImageMagick installed to use those writers.
fig
: matplotlib Figure, default None
- For greater control over the aesthetics, supply your own figure.
dpi
: int, default 144
- Dots per inch of the matplotlib figure.
bar_kwargs
: dict, default None (alpha=.8)
- Other keyword arguments (within a dictionary) forwarded to the matplotlib barh/bar function. If no value for 'alpha' is given, it is set to .8 by default. Some examples:
ec - edgecolor, the color of the edge of the bar. Default is 'white'.
lw - width of the edge in points. Default is 1.5.
alpha - opacity of the bars, 0 to 1.
filter_column_colors
: bool, default False
- When setting n_bars, it's possible that some columns never appear in the animation. Regardless, all columns get assigned a color by default.
For instance, suppose you have 100 columns in your DataFrame, set n_bars to 10, and 15 different columns make at least one appearance in the animation. Even if your colormap has at least 15 colors, it's possible that many bars will be the same color, since each of the 100 columns is assigned one of the colormap's colors.
Setting this to True will map your colormap to just those columns that make an appearance in the animation, helping avoid duplication of colors.
Setting this to True will also have the (possibly unintended) consequence of changing the color of each bar every time a new integer for n_bars is used.
EXPERIMENTAL: This parameter is experimental and may be changed/removed in a later version.
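The color duplication described for filter_column_colors can be illustrated with a small sketch. It assumes colors are handed out by cycling through the palette in column order, which the documentation implies ("Colors will repeat") but does not spell out; assign_colors and the column and color names are hypothetical.

```python
from itertools import cycle


def assign_colors(columns, palette):
    """Pair each column with a palette color, repeating the palette if needed."""
    return dict(zip(columns, cycle(palette)))


columns = [f"col{i}" for i in range(100)]
palette = [f"color{i}" for i in range(15)]
# Hypothetical set of 15 columns that appear in the animation at least once.
appearing = columns[::5][:15]  # col0, col5, col10, ..., col70

all_colors = assign_colors(columns, palette)          # every column gets a color
filtered_colors = assign_colors(appearing, palette)   # only the visible columns do

distinct_unfiltered = len({all_colors[c] for c in appearing})
distinct_filtered = len({filtered_colors[c] for c in appearing})
print(distinct_unfiltered, distinct_filtered)
```

With this particular choice of columns, the 15 visible bars share only 3 colors when every column is assigned a color, and get 15 distinct colors when the palette is mapped to the visible columns alone.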
Returns
When filename is left as None, an HTML5 video is returned as a string. Otherwise, a file of the animation is saved and None is returned.
Notes
It is possible for some bars to be out of order momentarily during a transition, since both height and location change linearly and not directly with respect to their current value. This keeps all the transitions identical.
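The momentary mismatch can be seen in a plain-Python sketch of one transition. This is an illustration under stated assumptions and not the library's code: the helper names and the two bars are hypothetical, and rank 1 denotes the largest bar.

```python
def lerp(start, end, t):
    """Linear interpolation between start and end for t in [0, 1]."""
    return start + (end - start) * t


def transition(values_start, values_end, ranks_start, ranks_end, steps):
    """Yield (t, values, ranks) for each frame of one period-to-period transition."""
    for step in range(steps + 1):
        t = step / steps
        values = {k: lerp(values_start[k], values_end[k], t) for k in values_start}
        ranks = {k: lerp(ranks_start[k], ranks_end[k], t) for k in ranks_start}
        yield t, values, ranks


# Two hypothetical bars: A overtakes B during the period.
frames = list(transition(
    values_start={"A": 10, "B": 20}, values_end={"A": 60, "B": 30},
    ranks_start={"A": 2, "B": 1}, ranks_end={"A": 1, "B": 2},
    steps=10,
))

t, values, ranks = frames[4]  # 40% of the way through the transition
print(t, values, ranks)
```

At this frame A already has the larger value, but B still holds the better interpolated rank, because the ranks cross at the midpoint of the transition regardless of when the values cross.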
Examples
Use the load_dataset function to get an example dataset to create an animation.
    import bar_chart_race as bcr

    df = bcr.load_dataset('covid19')
    bcr.bar_chart_race(
        df=df,
        filename='covid19_horiz_desc.mp4',
        orientation='h',
        sort='desc',
        n_bars=8,
        fixed_order=False,
        fixed_max=True,
        steps_per_period=10,
        period_length=500,
        interpolate_period=False,
        label_bars=True,
        bar_size=.95,
        period_label={'x': .99, 'y': .8, 'ha': 'right', 'va': 'center'},
        period_fmt='%B %d, %Y',
        period_summary_func=lambda v, r: {'x': .85, 'y': .2,
                                          's': f'Total deaths: {v.sum()}',
                                          'ha': 'right', 'size': 11},
        perpendicular_bar_func='median',
        figsize=(5, 3),
        dpi=144,
        cmap='dark24',
        title='COVID-19 Deaths by Country',
        title_size='',
        bar_label_size=7,
        tick_label_size=7,
        shared_fontdict={'family': 'Helvetica', 'weight': 'bold', 'color': '.1'},
        scale='linear',
        writer=None,
        fig=None,
        bar_kwargs={'alpha': .7},
        filter_column_colors=False)
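The example above passes period_fmt='%B %d, %Y'. The two styles of period_fmt can be sketched as follows. This is a hypothetical helper, not the library's implementation; in particular, deciding between the two styles by looking for "{x" is an assumption made for illustration.

```python
from datetime import datetime


def format_period(x, period_fmt):
    """Format a period label from a strftime pattern or a '{x}' format string."""
    if "{x" in period_fmt:
        # New-style format string: the index value is passed in as `x`.
        return period_fmt.format(x=x)
    if isinstance(x, datetime):
        # Date directives only apply to datetime values.
        return x.strftime(period_fmt)
    return str(x)


print(format_period(datetime(2020, 3, 29), "%B %d, %Y"))  # month name depends on locale
print(format_period(3.14159, "Period {x:10.2f}"))
```

A pandas Timestamp is a datetime subclass, so the same check would apply to values taken from a datetime index.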
Font Help
Font size can also be a string: 'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large', 'smaller', 'larger'. These sizes are relative to plt.rcParams['font.size'].
def bar_chart_race(df, filename=None, orientation='h', sort='desc', n_bars=None,
                   fixed_order=False, fixed_max=False, steps_per_period=10,
                   period_length=500, interpolate_period=False, label_bars=True,
                   bar_size=.95, period_label=True, period_fmt=None,
                   period_summary_func=None, perpendicular_bar_func=None,
                   figsize=(6, 3.5), cmap='dark24', title=None, title_size=None,
                   bar_label_size=7, tick_label_size=7, shared_fontdict=None,
                   scale='linear', writer=None, fig=None, dpi=144, bar_kwargs=None,
                   filter_column_colors=False):
    # Docstring omitted here; it is identical to the documentation above.
    bcr = _BarChartRace(df, filename, orientation, sort, n_bars, fixed_order, fixed_max,
                        steps_per_period, period_length, interpolate_period, label_bars,
                        bar_size, period_label, period_fmt, period_summary_func,
                        perpendicular_bar_func, figsize, cmap, title, title_size,
                        bar_label_size, tick_label_size, shared_fontdict, scale,
                        writer, fig, dpi, bar_kwargs, filter_column_colors)
    return bcr.make_animation()
def load_dataset(name='covid19')
-
Return a pandas DataFrame suitable for immediate use in bar_chart_race(). You must be connected to the internet.
Parameters
name
: str, default 'covid19'
- Name of the dataset to load. Either 'covid19' or 'urban_pop'.
Returns
pandas DataFrame
import pandas as pd

def load_dataset(name='covid19'):
    # Docstring omitted here; it is identical to the documentation above.
    url = f'https://raw.githubusercontent.com/dexplo/bar_chart_race/master/data/{name}.csv'
    index_dict = {'covid19_tutorial': 'date',
                  'covid19': 'date',
                  'urban_pop': 'year'}
    index_col = index_dict[name]
    return pd.read_csv(url, index_col=index_col, parse_dates=[index_col])
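The name lookup in the listing above can be tried without a network connection. The sketch below reuses the URL pattern and the index mapping from the source; the helper name dataset_location and the ValueError with a list of valid names are additions for illustration (the original raises a KeyError for unknown names).

```python
BASE_URL = "https://raw.githubusercontent.com/dexplo/bar_chart_race/master/data"
INDEX_COLUMNS = {"covid19_tutorial": "date", "covid19": "date", "urban_pop": "year"}


def dataset_location(name="covid19"):
    """Return (url, index_col) for a dataset name, without downloading anything."""
    if name not in INDEX_COLUMNS:
        valid = ", ".join(sorted(INDEX_COLUMNS))
        raise ValueError(f"Unknown dataset {name!r}; expected one of: {valid}")
    return f"{BASE_URL}/{name}.csv", INDEX_COLUMNS[name]


print(dataset_location("urban_pop"))
```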
def prepare_long_data(df, index, columns, values, aggfunc='sum', orientation='h', sort='desc', n_bars=None, interpolate_period=False, steps_per_period=10, compute_ranks=True)
-
Prepares 'long' data for bar chart animation. Returns two DataFrames: the interpolated values and the interpolated ranks.
You (currently) cannot pass long data to bar_chart_race() directly. Use this function to create your wide data first before passing it to bar_chart_race().
Parameters
df
: pandas DataFrame
- Must be a 'long' pandas DataFrame where one column contains the period, another the categories, and the third the values of each category for each period.
This DataFrame will be passed to the pivot_table method to turn it into a wide DataFrame. It will then be passed to the prepare_wide_data() function.
index
: str
- Name of the column used for the time period. It will be placed in the index.
columns
: str
- Name of the column containing the categories for each time period. This column will get pivoted so that each unique value is a column.
values
: str
- Name of the column holding the values for each time period of each category. This column will become the values of the resulting DataFrame.
aggfunc
: str or aggregation function, default 'sum'
- String name of an aggregation function ('sum', 'min', 'mean', 'max', etc.) or an actual function (np.sum, np.min, etc.). Categories that have multiple values for the same time period must be aggregated for the animation to work.
orientation
: 'h' or 'v', default 'h'
- Bar orientation: horizontal or vertical.
sort
: 'desc' or 'asc', default 'desc'
- Choose how to sort the bars. Use 'desc' to put the largest bars on top and 'asc' to place the largest bars on the bottom.
n_bars
: int, default None
- Choose the maximum number of bars to display on the graph. By default, use all bars. New bars entering the race will appear from the bottom or top.
interpolate_period
: bool, default False
- Whether to interpolate the period. Only valid for datetime or numeric indexes. For example, when set to True, the two consecutive periods 2020-03-29 and 2020-03-30 with steps_per_period set to 4 would yield a new index of
2020-03-29 00:00:00
2020-03-29 06:00:00
2020-03-29 12:00:00
2020-03-29 18:00:00
2020-03-30 00:00:00
steps_per_period
: int, default 10
- The number of steps to go from one time period to the next. The bars will grow linearly between each period.
compute_ranks
: bool, default True
- When True, return both the interpolated values and ranks DataFrames. Otherwise, return just the values.
Returns
A tuple of DataFrames. The first is the interpolated values and the second is the interpolated ranks. When compute_ranks is False, only the values DataFrame is returned.
Examples
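index, columns, and values are required; 'period', 'category', and 'value' below are placeholder column names.
    df_values, df_ranks = bcr.prepare_long_data(df, index='period', columns='category', values='value')
    bcr.bar_chart_race(df_values, steps_per_period=1, period_length=50)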
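The long-to-wide step that this function performs before interpolating is a pandas pivot_table call. A minimal sketch with made-up data follows; the column names period, category, and value are placeholders, not names the library requires.

```python
import pandas as pd

# Hypothetical long data: one row per (period, category) observation.
df_long = pd.DataFrame({
    "period": ["2020-03-29", "2020-03-29", "2020-03-29", "2020-03-30", "2020-03-30"],
    "category": ["A", "B", "B", "A", "B"],
    "value": [10, 4, 3, 12, 9],
})

# Category B has two rows for 2020-03-29, so aggfunc='sum' combines them into one cell.
df_wide = df_long.pivot_table(index="period", columns="category",
                              values="value", aggfunc="sum")
print(df_wide)
```

The result has one row per period and one column per category, which is the wide shape that bar_chart_race() expects.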
def prepare_long_data(df, index, columns, values, aggfunc='sum', orientation='h',
                      sort='desc', n_bars=None, interpolate_period=False,
                      steps_per_period=10, compute_ranks=True):
    # Docstring omitted here; it is identical to the documentation above.
    df_wide = df.pivot_table(index=index, columns=columns,
                             values=values, aggfunc=aggfunc)
    return prepare_wide_data(df_wide, orientation, sort, n_bars, interpolate_period,
                             steps_per_period, compute_ranks)
def prepare_wide_data(df, orientation='h', sort='desc', n_bars=None, interpolate_period=False, steps_per_period=10, compute_ranks=True)
-
Prepares 'wide' data for bar chart animation. Returns two DataFrames - the interpolated values and the interpolated ranks
There is no need to use this function directly to create the animation. You can pass your DataFrame directly to bar_chart_race().
This function is useful if you want to view the prepared data without creating an animation.
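When viewing the prepared data, it helps to know how many rows to expect. Going by the reindexing in the source listing further down (original rows are spaced steps_per_period apart and the gaps are filled in), the count can be sketched as below; the helper name prepared_row_count is hypothetical.

```python
def prepared_row_count(n_periods, steps_per_period=10):
    """Rows in the prepared data: the original rows plus the steps between them."""
    if n_periods < 1:
        raise ValueError("need at least one period")
    return (n_periods - 1) * steps_per_period + 1


print(prepared_row_count(3, steps_per_period=10))  # 21
```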
Parameters
df
: pandas DataFrame
- Must be a 'wide' pandas DataFrame where each row represents a single period of time. Each column contains the values of the bars for that category. Optionally, use the index to label each time period.
orientation
: 'h' or 'v', default 'h'
- Bar orientation: horizontal or vertical.
sort
: 'desc' or 'asc', default 'desc'
- Choose how to sort the bars. Use 'desc' to put the largest bars on top and 'asc' to place the largest bars on the bottom.
n_bars
: int, default None
- Choose the maximum number of bars to display on the graph. By default, use all bars. New bars entering the race will appear from the bottom or top.
interpolate_period
: bool, default False
- Whether to interpolate the period. Only valid for datetime or numeric indexes. For example, when set to True, the two consecutive periods 2020-03-29 and 2020-03-30 with steps_per_period set to 4 would yield a new index of
2020-03-29 00:00:00
2020-03-29 06:00:00
2020-03-29 12:00:00
2020-03-29 18:00:00
2020-03-30 00:00:00
steps_per_period
: int, default 10
- The number of steps to go from one time period to the next. The bars will grow linearly between each period.
compute_ranks
: bool, default True
- When True, return both the interpolated values and ranks DataFrames. Otherwise, return just the values.
Returns
A tuple of DataFrames. The first is the interpolated values and the second is the interpolated ranks. When compute_ranks is False, only the values DataFrame is returned.
Examples
df_values, df_ranks = bcr.prepare_wide_data(df)
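The effect of interpolate_period on a datetime index can be reproduced for a single pair of periods with the standard library. This is a sketch of the idea only; the helper name interpolate_periods is hypothetical, and the library itself works on the whole index at once.

```python
from datetime import datetime


def interpolate_periods(first, last, steps_per_period):
    """Evenly spaced timestamps from first to last, inclusive."""
    delta = (last - first) / steps_per_period
    return [first + delta * i for i in range(steps_per_period + 1)]


for ts in interpolate_periods(datetime(2020, 3, 29), datetime(2020, 3, 30), 4):
    print(ts)
```

With steps_per_period set to 4, the printed timestamps are six hours apart, matching the index shown in the interpolate_period description.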
import pandas as pd

def prepare_wide_data(df, orientation='h', sort='desc', n_bars=None,
                      interpolate_period=False, steps_per_period=10,
                      compute_ranks=True):
    # Docstring omitted here; it is identical to the documentation above.
    if n_bars is None:
        n_bars = df.shape[1]

    # Space the original rows steps_per_period apart, leaving empty rows between them
    df_values = df.reset_index()
    df_values.index = df_values.index * steps_per_period
    new_index = range(df_values.index[-1] + 1)
    df_values = df_values.reindex(new_index)

    # Fill in the period column (the former index) for the new rows
    if interpolate_period:
        if df_values.iloc[:, 0].dtype.kind == 'M':
            first, last = df_values.iloc[[0, -1], 0]
            dr = pd.date_range(first, last, periods=len(df_values))
            df_values.iloc[:, 0] = dr
        else:
            df_values.iloc[:, 0] = df_values.iloc[:, 0].interpolate()
    else:
        # ffill() replaces fillna(method='ffill'), which newer pandas versions deprecate
        df_values.iloc[:, 0] = df_values.iloc[:, 0].ffill()

    df_values = df_values.set_index(df_values.columns[0])
    if compute_ranks:
        df_ranks = df_values.rank(axis=1, method='first', ascending=False).clip(upper=n_bars + 1)
        if (sort == 'desc' and orientation == 'h') or (sort == 'asc' and orientation == 'v'):
            df_ranks = n_bars + 1 - df_ranks
        df_ranks = df_ranks.interpolate()

    df_values = df_values.interpolate()
    if compute_ranks:
        return df_values, df_ranks
    return df_values
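The rank step in the listing above (rank descending, clip to n_bars + 1, then flip for some orientations) can be restated for a single row in plain Python. This is a sketch for illustration only: bar_positions is a hypothetical name, missing values are ignored, and the reading that clipped bars fall outside the visible range is an assumption.

```python
def bar_positions(row, n_bars, sort="desc", orientation="h"):
    """Compute plotting positions for one row of values, mirroring the rank logic above."""
    # Rank 1 is the largest value; sorted() is stable, so ties keep column order
    # (comparable to method='first' in the pandas call).
    ordered = sorted(row, key=lambda name: -row[name])
    ranks = {name: min(i + 1, n_bars + 1) for i, name in enumerate(ordered)}
    if (sort == "desc" and orientation == "h") or (sort == "asc" and orientation == "v"):
        # Flip so the largest bar gets the highest position.
        ranks = {name: n_bars + 1 - r for name, r in ranks.items()}
    return ranks


print(bar_positions({"A": 5, "B": 9, "C": 1, "D": 7}, n_bars=2))
```

With n_bars=2, the two largest bars (B and D) receive positions 2 and 1, while A and C are both clipped to the same position of 0.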