
Plotly Express Data Packages

Plotly Express provides various data packages (i.e. built-in datasets) for demonstration, educational, and testing purposes.

Each package is returned as ordinary data (a pandas DataFrame, or a dict for GeoJSON), so we can use it to create plots with any of the Plotly submodules: graph_objects, figure_factory, and express.

Importing the data sets

import plotly.express as px

# Each dataset is loaded by calling its function: px.data.iris(), px.data.tips(), ...
df = px.data.iris()
Various data sets

plotly.express.data.carshare()

Each row represents the availability of car-sharing services near the centroid of a zone in Montreal over a month-long period.

  • Returns: a DataFrame with the columns ['centroid_lat', 'centroid_lon', 'car_hours', 'peak_hour'].
  • Return type: pandas.DataFrame with 249 rows.

plotly.express.data.election()

Each row represents voting results for an electoral district in the 2013 Montreal mayoral election.

  • Returns: a DataFrame with the columns ['district', 'Coderre', 'Bergeron', 'Joly', 'total', 'winner', 'result', 'district_id'].
  • Return type: pandas.DataFrame with 58 rows.

plotly.express.data.experiment(indexed=False)

Each row in this wide dataset represents the results of 100 simulated participants on three hypothetical experiments, along with their gender and control/treatment group.

If indexed is True, the DataFrame index is named ‘participant’.

  • Returns: a DataFrame with the columns ['experiment_1', 'experiment_2', 'experiment_3', 'gender', 'group'].
  • Return type: pandas.DataFrame with 100 rows.

plotly.express.data.gapminder(datetimes=False, centroids=False, year=None, pretty_names=False)

Each row represents a country in a given year.

gapminder() takes the following arguments:

  • datetimes: Default value is False. If datetimes is True, the 'year' column will be a datetime column.
  • centroids: Default value is False. If centroids is True, two new columns are added: ['centroid_lat', 'centroid_lon'].
  • year: Default value is None. If year is an integer, the dataset will be filtered for that year.
  • pretty_names: Default value is False. If pretty_names is True, the column names are replaced with human-readable labels (for example, 'lifeExp' becomes 'Life Expectancy').

  • Returns: a DataFrame with the columns ['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap', 'iso_alpha', 'iso_num'].
  • Return type: pandas.DataFrame with 1704 rows.

plotly.express.data.election_geojson()

Each feature represents an electoral district in the 2013 Montreal mayoral election.

  • Returns: a GeoJSON-formatted dict with 58 polygon or multi-polygon features whose id is the electoral district's numerical ID and whose district property holds the ID and district name.

plotly.express.data.iris()

Each row represents a flower.

  • Returns: a DataFrame with the columns ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species', 'species_id'].
  • Return type: pandas.DataFrame with 150 rows.

plotly.express.data.medals_long(indexed=False)

This dataset represents the medal table for Olympic Short Track Speed Skating for the top three nations as of 2020.

If indexed is True, the ‘nation’ column is used as the index.

  • Returns: a DataFrame with the columns ['nation', 'medal', 'count'].
  • Return type: pandas.DataFrame with 9 rows.

plotly.express.data.medals_wide(indexed=False)

This dataset represents the medal table for Olympic Short Track Speed Skating for the top three nations as of 2020.

If indexed is True, the ‘nation’ column is used as the index and the column index is named ‘medal’.

  • Returns: a DataFrame with the columns ['nation', 'gold', 'silver', 'bronze'].
  • Return type: pandas.DataFrame with 3 rows.

plotly.express.data.stocks(indexed=False, datetimes=False)

Each row in this wide dataset represents closing prices from 6 tech stocks in 2018/2019.

If indexed is True, the ‘date’ column is used as the index and the column index is named ‘company’.
If datetimes is True, the ‘date’ column will be a datetime column.

  • Returns: a DataFrame with the columns ['date', 'GOOG', 'AAPL', 'AMZN', 'FB', 'NFLX', 'MSFT'].
  • Return type: pandas.DataFrame with 100 rows.

plotly.express.data.tips()

Each row represents a restaurant bill.

  • Returns: a DataFrame with the columns ['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'].
  • Return type: pandas.DataFrame with 244 rows.

plotly.express.data.wind()

Each row represents a level of wind intensity in a cardinal direction, and its frequency.

  • Returns: a DataFrame with the columns ['direction', 'strength', 'frequency'].
  • Return type: pandas.DataFrame with 128 rows.

Get insight into any dataset

We can also look at the values of a dataset using the pandas head() method, which returns the first five rows by default.

import plotly.express as px

print(px.data.carshare().head())
Output
   centroid_lat  centroid_lon    car_hours  peak_hour
0     45.471549    -73.588684  1772.750000          2
1     45.543865    -73.562456   986.333333         23
2     45.487640    -73.642767   354.750000         20
3     45.522870    -73.595677   560.166667         23
4     45.453971    -73.738946  2836.666667         19