This lesson requires a premium membership to access.
Premium membership includes unlimited access to all courses, quizzes, downloadable resources, and future content updates.
Categorical variables are those that represent a qualitative property of the data element, such as sector or industry in a financial context. Understanding how these non-numerical data are distributed is important, as they often hold key insights into the structure and segmentation of your dataset.
To do some analysis on categorical data, we will load a new data set (sp_500_constituents.csv). It’s a list of S&P 500 companies and contains these companies’ sector and industry.