import altair as alt
from vega_datasets import data

source = data.movies.url
Adding a median line to a strip chart in Altair
I find it a bit tricky to add a mark representing an aggregate value to strip plots in Altair. Below is an example that uses mark_tick to show the median IMDB_Rating within each movie genre, using the movies data from Vega datasets.
The chart will have two layers, one for each mark type (points and ticks). Both layers use the same data source, so we can set up a shared “base.”
base = alt.Chart(source, height=alt.Step(25), width=500).transform_calculate(
    jitter="sqrt(-2*log(random()))*cos(2*PI*random())"
)
I define the jitter field here because it seems to interfere with the sorting if I define it as part of the points layer below.
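The jitter expression has the form of the Box-Muller transform, which turns two uniform random draws into one roughly standard-normal draw, so the points cluster near the center of each row. Here is a minimal plain-Python sketch of the same formula. The function name `box_muller`, the fixed seed, and the use of `1 - random()` to avoid `log(0)` are my own choices for illustration and are not part of the chart code.

```python
import math
import random
import statistics


def box_muller(rng: random.Random) -> float:
    """Return one approximately standard-normal draw from two uniform draws."""
    # 1 - random() lies in (0, 1], which avoids taking log(0).
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


rng = random.Random(0)  # fixed seed so the check is repeatable
samples = [box_muller(rng) for _ in range(10_000)]

# For a standard normal, the mean should be near 0 and the spread near 1.
sample_mean = statistics.fmean(samples)
sample_sd = statistics.stdev(samples)
print(f"mean={sample_mean:.3f}, sd={sample_sd:.3f}")
```

A mean near 0 and a standard deviation near 1 suggest the offsets are symmetric around each genre's row, which is what makes the strips look evenly scattered.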
The X field is the same for sorting and for the points/ticks x-encoding, so I’ll make a variable to store and re-use it. Also, I’ll define how the chart is sorted and some axis settings to be applied to both the X- and Y-axis.
x_field = "IMDB_Rating"
type_ = "Q"
# Sort by an aggregate field value.
sort = alt.EncodingSortField(field=x_field, op="median", order="descending")
axis_kwargs = {"titleFontSize": 14, "titlePadding": 15, "labelFontSize": 12}
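To make the sort concrete, here is a small plain-Python sketch of what "sort genres by median rating, descending" amounts to. It only illustrates the idea and is not how Vega-Lite computes it internally; the `rows` data is made up for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (genre, rating) pairs, made up for illustration.
rows = [
    ("Drama", 7.1), ("Drama", 6.4), ("Drama", 8.0),
    ("Comedy", 5.9), ("Comedy", 6.2),
    ("Horror", 4.8), ("Horror", 5.5), ("Horror", 6.1),
]

# Group the ratings by genre.
ratings_by_genre = defaultdict(list)
for genre, rating in rows:
    ratings_by_genre[genre].append(rating)

# One median per genre, then order the genres from highest to lowest median.
medians = {genre: median(values) for genre, values in ratings_by_genre.items()}
genre_order = sorted(medians, key=medians.get, reverse=True)
print(genre_order)  # → ['Drama', 'Comedy', 'Horror']
```

The per-genre medians computed here are the same kind of value the median tick marks, so the rows end up ordered by where their ticks fall.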
Below are the chart layers. In the ticks layer, we use the aggregate option to calculate the median value within each movie genre.
points = base.mark_circle(
    size=20, stroke="steelblue", strokeOpacity=0.5, fill="steelblue", fillOpacity=0.15
).encode(
    y=alt.Y("Major_Genre:N").sort(sort).axis(title=None, **axis_kwargs),
    x=alt.X(f"{x_field}:{type_}"),
    yOffset=alt.YOffset("jitter:Q"),
)

ticks = base.mark_tick(stroke="firebrick", strokeOpacity=0.85, thickness=1.5).encode(
    y=alt.Y("Major_Genre:N").sort(sort),
    x=alt.X(f"{x_field}:{type_}", aggregate="median").axis(
        title="IMDB Rating", **axis_kwargs
    ),
)

points + ticks