Plotly Express provides several built-in datasets (data packages) for demonstration, educational, and testing purposes.
We can use these datasets to create plots with any of the Plotly submodules: graph_objects, figure_factory, and express.
Importing the data sets
import plotly.express as px
df = px.data.<PACKAGE NAME>()
Various data sets
plotly.express.data.carshare()
Each row represents the availability of car-sharing services near the centroid of a zone in Montreal over a month-long period.
- Returns: [‘centroid_lat’, ‘centroid_lon’, ‘car_hours’, ‘peak_hour’].
- Return type: A pandas.DataFrame with 249 rows.
plotly.express.data.election()
Each row represents voting results for an electoral district in the 2013 Montreal mayoral election.
- Returns: [‘district’, ‘Coderre’, ‘Bergeron’, ‘Joly’, ‘total’, ‘winner’, ‘result’, ‘district_id’].
- Return type: A pandas.DataFrame with 58 rows.
plotly.express.data.experiment(indexed=False)
Each row in this wide dataset represents the results of 100 simulated participants on three hypothetical experiments, along with their gender and control/treatment group.
If indexed is True, the data frame index is named “participant”.
- Returns: [‘experiment_1’, ‘experiment_2’, ‘experiment_3’, ‘gender’, ‘group’].
- Return type: A pandas.DataFrame with 100 rows.
plotly.express.data.gapminder()
Each row represents a country in a given year.
Gapminder takes the following arguments:
- datetimes: Default value is False. If datetimes is True, the ‘year’ column will be a datetime column.
- centroids: Default value is False. If centroids is True, two new columns are added: [‘centroid_lat’, ‘centroid_lon’].
- year: Default value is None. If year is an integer, the dataset will be filtered for that year.
- pretty_names: Default value is False. If pretty_names is True, the column names are prettified.
Returns: [‘country’, ‘continent’, ‘year’, ‘lifeExp’, ‘pop’, ‘gdpPercap’, ‘iso_alpha’, ‘iso_num’].
Return type: A pandas.DataFrame with 1704 rows.
plotly.express.data.election_geojson()
Each feature represents an electoral district in the 2013 Montreal mayoral election.
Returns: A GeoJSON-formatted dict with 58 polygon or multi-polygon features whose id is an electoral district numerical ID and whose district property is the ID and district name.
plotly.express.data.iris()
Each row represents a flower.
- Returns: [‘sepal_length’, ‘sepal_width’, ‘petal_length’, ‘petal_width’, ‘species’, ‘species_id’].
- Return type: A pandas.DataFrame with 150 rows.
plotly.express.data.medals_long(indexed=False)
This dataset represents the medal table for Olympic Short Track Speed Skating for the top three nations as of 2020.
If indexed is True, the ‘nation’ column is used as the index.
- Returns: [‘nation’, ‘medal’, ‘count’].
- Return type: A pandas.DataFrame with 9 rows.
plotly.express.data.medals_wide(indexed=False)
This dataset represents the medal table for Olympic Short Track Speed Skating for the top three nations as of 2020.
If indexed is True, the ‘nation’ column is used as the index and the column index is named ‘medal’.
- Returns: [‘nation’, ‘gold’, ‘silver’, ‘bronze’].
- Return type: A pandas.DataFrame with 3 rows.
plotly.express.data.stocks(indexed=False, datetimes=False)
Each row in this wide dataset represents closing prices from 6 tech stocks in 2018/2019.
If indexed is True, the ‘date’ column is used as the index and the column index is named ‘company’.
If datetimes is True, the ‘date’ column will be a datetime column.
- Returns: [‘date’, ‘GOOG’, ‘AAPL’, ‘AMZN’, ‘FB’, ‘NFLX’, ‘MSFT’].
- Return type: A pandas.DataFrame with 100 rows.
plotly.express.data.tips()
Each row represents a restaurant bill.
- Returns: [‘total_bill’, ‘tip’, ‘sex’, ‘smoker’, ‘day’, ‘time’, ‘size’].
- Return type: A pandas.DataFrame with 244 rows.
plotly.express.data.wind()
Each row represents a level of wind intensity in a cardinal direction, and its frequency.
- Returns: [‘direction’, ‘strength’, ‘frequency’].
- Return type: A pandas.DataFrame with 128 rows.
Getting insight into any data set
We can also look at the first few rows of a data set using the head() method.
import plotly.express as px
print(px.data.carshare().head())
Output
   centroid_lat  centroid_lon    car_hours  peak_hour
0     45.471549    -73.588684  1772.750000          2
1     45.543865    -73.562456   986.333333         23
2     45.487640    -73.642767   354.750000         20
3     45.522870    -73.595677   560.166667         23
4     45.453971    -73.738946  2836.666667         19