📝 Exercise M1.01
Imagine we are interested in predicting penguin species based on two of their body measurements: culmen length and culmen depth. First, we want to do some data exploration to get a feel for the data.
What are the features? What is the target?
The data is located in `../datasets/penguins_classification.csv`; load it with `pandas` into a `DataFrame`.
# Write your code here.
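One possible way to do the loading step is sketched below. So that the snippet runs on its own, a few made-up rows held in an in-memory buffer stand in for the real file; the column names ("Culmen Length (mm)", "Culmen Depth (mm)", "Species") are an assumption about the CSV header and the values are illustrative only.

```python
import io

import pandas as pd

# Made-up rows standing in for the real file; the column names are an
# assumption about the CSV header, not taken from the exercise text.
toy_csv = io.StringIO(
    "Culmen Length (mm),Culmen Depth (mm),Species\n"
    "39.1,18.7,Adelie\n"
    "46.5,17.9,Chinstrap\n"
    "50.0,15.2,Gentoo\n"
)

# With the real dataset, pass the path instead of the buffer:
# penguins = pd.read_csv("../datasets/penguins_classification.csv")
penguins = pd.read_csv(toy_csv)
print(type(penguins))
```

`pd.read_csv` accepts either a file path or a file-like object, which is why the in-memory buffer can replace the path here.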
Show a few samples of the data.
How many features are numerical? How many features are categorical?
# Write your code here.
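A minimal sketch of how these two questions could be approached, using a small hand-made `DataFrame` in place of the real data. The column names and the choice of "Species" as the target are assumptions for illustration.

```python
import pandas as pd

# Hand-made stand-in for the penguins data (illustrative values only).
penguins = pd.DataFrame(
    {
        "Culmen Length (mm)": [39.1, 39.5, 46.5, 50.0],
        "Culmen Depth (mm)": [18.7, 17.4, 17.9, 15.2],
        "Species": ["Adelie", "Adelie", "Chinstrap", "Gentoo"],
    }
)

# Show the first few samples.
print(penguins.head())

# Assumed target column; every other column is treated as a feature.
target_name = "Species"
features = penguins.drop(columns=[target_name])

# Split the feature columns by data type.
numerical = features.select_dtypes(include="number").columns.tolist()
categorical = features.select_dtypes(exclude="number").columns.tolist()
print(f"{len(numerical)} numerical, {len(categorical)} categorical")
```

Inspecting `penguins.dtypes` directly gives the same information column by column.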
What are the different penguin species available in the dataset, and how many samples of each species are there? Hint: select the right column and use the `value_counts` method.
# Write your code here.
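The hint can be sketched as follows on a toy `DataFrame`. The "Species" column name is an assumption, and the counts shown by this snippet describe the made-up rows, not the real dataset.

```python
import pandas as pd

# Toy stand-in: only the (assumed) target column is needed here.
penguins = pd.DataFrame(
    {"Species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"]}
)

# Select the column, then count how many rows carry each distinct value.
counts = penguins["Species"].value_counts()
print(counts)
```

`value_counts` returns a `Series` indexed by the distinct values, sorted from the most to the least frequent.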
Plot histograms for the numerical features.
# Write your code here.
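One way to draw the histograms is the `DataFrame.hist` method, which needs matplotlib installed. The sketch below uses made-up values and assumed column names; the figure size is an arbitrary choice.

```python
import pandas as pd

# Toy stand-in for the penguins data (illustrative values only).
penguins = pd.DataFrame(
    {
        "Culmen Length (mm)": [39.1, 39.5, 40.3, 46.5, 49.0, 50.0],
        "Culmen Depth (mm)": [18.7, 17.4, 18.0, 17.9, 19.5, 15.2],
        "Species": ["Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo"],
    }
)

# DataFrame.hist draws one histogram per numerical column and skips
# non-numerical columns such as "Species".
axes = penguins.hist(figsize=(8, 4))
```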
Show the feature distributions for each class. Hint: use `seaborn.pairplot`.
# Write your code here.
Looking at these distributions, how hard do you think it would be to classify the penguins using only "culmen depth" and "culmen length"?