Visualizing Healthcare Data Using Matplotlib and Seaborn
In this article, we will explore fundamental visualization techniques using the Matplotlib and Seaborn libraries, both of which are highly regarded in the data science and analytics fields.
- Matplotlib: This library is well suited to basic plots that need a high degree of customization. It works directly with Pandas and NumPy objects and makes it easy to place several plots in one figure.
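- Seaborn: Known for its aesthetic appeal, Seaborn is a powerful visualization tool built on Matplotlib that works well with Pandas DataFrames. It offers attractive themes for plots, but some of its heavier plots (for example, a pairplot over many columns) can be slow or run into Out Of Memory (OOM) issues on large datasets.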
Below are some examples of visualizations created with Matplotlib and Seaborn:
Matplotlib Library
To begin visualizing data, we first need to import it using the Pandas library.
import pandas as pd
Now let's read the healthcare data:
# Read the CSV file with read_csv in Pandas
df = pd.read_csv('healthcare.csv')
To view the data:
df.head()
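If `healthcare.csv` is not at hand, a small synthetic stand-in can be used to try the plots in this article. The sketch below is an assumption, not the real dataset: it generates random values for the column names that appear in the later examples (`Age`, `BMI`, `Glucose`, `SkinThickness`, `Outcome`), and the value ranges are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for healthcare.csv: random values, not real patient data.
rng = np.random.default_rng(seed=0)
n_rows = 200
df = pd.DataFrame({
    "Age": rng.integers(21, 80, size=n_rows),
    "BMI": rng.normal(32, 7, size=n_rows).round(1),
    "Glucose": rng.normal(120, 30, size=n_rows).round(0),
    "SkinThickness": rng.normal(20, 10, size=n_rows).clip(min=0).round(0),
    "Outcome": rng.integers(0, 2, size=n_rows),  # assumed to be a 0/1 label
})

print(df.head())      # first five rows
print(df.dtypes)      # column types, useful before choosing a plot
print(df.describe())  # count, mean, std, min, quartiles, and max per column
```

Checking `dtypes` and `describe()` before plotting helps decide which columns are numeric (histograms, box plots) and which are categorical (bar plots, grouping variables).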
Boxplot
Boxplots show the median and quartiles of a numeric feature and mark potential outliers, which makes them a quick tool for descriptive analysis.
import matplotlib.pyplot as plt
# Check for outliers by drawing a box plot for each numeric column
for column in df:
    if df[column].dtype in ['int64', 'float64']:
        plt.figure()
        df.boxplot(column=[column])
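To make the outlier check concrete, here is a minimal sketch of the numbers behind a box plot, computed by hand. It assumes Matplotlib's default whisker rule of 1.5 times the interquartile range and uses a hypothetical toy series rather than the healthcare data.

```python
import pandas as pd

# Hypothetical toy values; 50 is deliberately far from the rest.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

q1 = values.quantile(0.25)  # lower edge of the box
q3 = values.quantile(0.75)  # upper edge of the box
iqr = q3 - q1               # height of the box

# Points beyond 1.5 * IQR from the box are drawn as individual outlier markers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, q3, iqr)
print(outliers.tolist())
```

The same filter can be applied to any numeric column of `df` to list the rows that a box plot would flag.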
Histogram of All Features
Histograms help us analyze the distribution of our data.
df.hist()
Single Histogram for One Feature
To plot a histogram for a specific feature:
# Plot a histogram for a single feature
plt.hist(df['BMI'])
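A histogram is usually easier to read with an explicit bin count and axis labels. The sketch below shows one way to do that; the `bmi` array is hypothetical random data standing in for `df['BMI']`, and the choice of 20 bins is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical BMI-like values standing in for df['BMI'].
rng = np.random.default_rng(seed=1)
bmi = rng.normal(32, 7, size=500)

fig, ax = plt.subplots()
# hist returns the count per bin, the bin edges, and the drawn bar patches.
counts, bin_edges, _ = ax.hist(bmi, bins=20, edgecolor="black")
ax.set_xlabel("BMI")
ax.set_ylabel("Number of records")
ax.set_title("Distribution of BMI")
```

The returned `counts` and `bin_edges` are handy when the binned values are needed for a report as well as for the picture.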
Scatter Plot of Two Features
Scatter plots illustrate the relationship between two variables.
# Compare two features on a scatter plot
x = df['Age']
y = df['Glucose']
plt.scatter(x, y)
plt.xlabel('Age')
plt.ylabel('Glucose')
plt.title('Age vs Glucose')
plt.show()
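A third variable can be encoded as color to see whether the relationship differs between groups. This is a sketch on hypothetical random data; it assumes `Outcome` is a 0/1 label, which the original dataset may or may not match.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data; Outcome is assumed to be a 0/1 label.
rng = np.random.default_rng(seed=2)
df = pd.DataFrame({
    "Age": rng.integers(21, 80, size=150),
    "Glucose": rng.normal(120, 30, size=150),
    "Outcome": rng.integers(0, 2, size=150),
})

fig, ax = plt.subplots()
# c= maps each point's Outcome value to a color from the chosen colormap.
points = ax.scatter(df["Age"], df["Glucose"], c=df["Outcome"],
                    cmap="coolwarm", alpha=0.7)
ax.set_xlabel("Age")
ax.set_ylabel("Glucose")
ax.set_title("Age vs Glucose, colored by Outcome")
fig.colorbar(points, ax=ax, label="Outcome")
```

Setting `alpha` below 1 makes overlapping points visible, which helps when many records share similar values.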
Bar Plot
Bar plots are effective for visualizing categorical variable counts.
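counts = df['Outcome'].value_counts()  # number of rows in each category
plt.bar(counts.index.astype(str), counts.values)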
Scatter Plot of All Features as Subplots
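# Create a grid of axes first (the 2x2 size is an assumption), then fill each panel
fig, ax = plt.subplots(2, 2, figsize=(10, 8))
ax[0, 0].scatter(x=df['Age'], y=df['BMI'])
ax[0, 0].set_xlabel("Age")
ax[0, 0].set_ylabel("BMI")
ax[0, 1].scatter(x=df['Age'], y=df['SkinThickness'])
ax[0, 1].set_xlabel("Age")
ax[0, 1].set_ylabel("SkinThickness")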
# Continue for other feature comparisons...
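Writing one block per panel gets repetitive as the number of features grows. A loop over the feature names is one possible alternative; the sketch below uses hypothetical random data and an assumed 2x2 grid, and hides any panel that is left empty.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data with the column names used in this article.
rng = np.random.default_rng(seed=3)
df = pd.DataFrame({
    "Age": rng.integers(21, 80, size=100),
    "BMI": rng.normal(32, 7, size=100),
    "SkinThickness": rng.normal(20, 10, size=100),
    "Glucose": rng.normal(120, 30, size=100),
})

features = [col for col in df.columns if col != "Age"]
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
panels = axes.ravel()  # flatten the 2x2 grid into a simple sequence

# Plot Age against each remaining feature, one panel per feature.
for ax, feature in zip(panels, features):
    ax.scatter(df["Age"], df[feature], s=10)
    ax.set_xlabel("Age")
    ax.set_ylabel(feature)

# Hide any panels that did not receive a feature.
for ax in panels[len(features):]:
    ax.set_visible(False)

fig.tight_layout()
```

With this layout, adding a feature means adding a column to the DataFrame (and enlarging the grid if needed) rather than copying another block of plotting calls.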
Seaborn Library
Begin by importing the library:
import seaborn as sns
Joint Plot
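A joint plot combines a scatter plot of two variables with a histogram of each variable along the margins.
# 'size' was renamed to 'height' in Seaborn 0.9
sns.jointplot(x='Age', y='Glucose', data=df, height=5)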
Boxplot with Seaborn
sns.boxplot(x="Outcome", y="Age", data=df)
Violin Plot
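Violin plots resemble box plots but also show an estimate of the probability density of the data at each value.
# violinplot has no 'size' parameter, so it is omitted here
sns.violinplot(x="Outcome", y="Age", data=df)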
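Placing the two plot types side by side makes the difference easier to see. The following sketch draws a box plot and a violin plot of the same hypothetical data on a shared y-axis, using the `ax` parameter that both Seaborn functions accept.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data; Outcome is assumed to be a 0/1 label.
rng = np.random.default_rng(seed=4)
df = pd.DataFrame({
    "Outcome": rng.integers(0, 2, size=200),
    "Age": rng.integers(21, 80, size=200),
})

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
sns.boxplot(x="Outcome", y="Age", data=df, ax=ax_box)
sns.violinplot(x="Outcome", y="Age", data=df, ax=ax_violin)
ax_box.set_title("Box plot")
ax_violin.set_title("Violin plot")
fig.tight_layout()
```

The box plot summarizes each group with five numbers, while the violin plot shows where the values are concentrated within each group.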
Pairplot
A pairplot displays the pairwise relationships among the numeric variables in a single figure, with the distribution of each variable on the diagonal.
aa = sns.pairplot(df)
Conclusion
Visualization serves as an effective means to observe relationships between features and derive insights. While there are numerous plotting techniques available, this article focused on a selection of them.
I hope you found this article helpful. Feel free to connect with me on LinkedIn and Twitter.