Data Analysis and Visualization
Section 7: Data Analysis and Visualization
Lesson 1: Introduction to Pandas
1.1 Working with DataFrames
Pandas is a Python library for data manipulation and analysis. Its central data structure is the DataFrame, a two-dimensional table with labeled rows and columns in which each column can hold a different data type.
Example:
# Working with DataFrames
import pandas as pd
# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
# Displaying the DataFrame
print(df)
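Once a DataFrame exists, a few attributes and selectors help confirm its structure. The following is a minimal sketch that rebuilds the same sample data so it runs on its own; the variable names `ages` and `first_row` are illustrative and not part of the lesson.

```python
import pandas as pd

# Rebuild the sample data from the example above so this block is self-contained
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple
print(df.shape)

# Column labels, in the order they were defined
print(list(df.columns))

# Selecting a single column returns a Series
ages = df['Age']
print(ages.tolist())

# iloc selects by integer position; here, the first row
first_row = df.iloc[0]
print(first_row['Name'])
```

Checking `shape` and `columns` right after loading or building a DataFrame is a quick way to catch missing or misnamed columns before any analysis.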
1.2 Data Manipulation and Analysis
Pandas provides extensive functionality for data manipulation and analysis, including filtering rows with boolean conditions, sorting by column values, and aggregation (for example, computing a mean).
Example:
# Data manipulation and analysis (uses the df created in 1.1)
# Filtering data
young_people = df[df['Age'] < 30]
# Sorting data
sorted_df = df.sort_values(by='Age')
# Aggregation
average_age = df['Age'].mean()
print("Young People:")
print(young_people)
print("\nSorted DataFrame:")
print(sorted_df)
print("\nAverage Age:", average_age)
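The sample DataFrame has one row per city, so grouped aggregation is easier to see with a couple of extra rows. The sketch below is illustrative only: the rows for Dana and Evan are invented, and the names `people`, `in_range`, `oldest_first`, and `mean_age_by_city` are assumptions rather than part of the lesson.

```python
import pandas as pd

# Hypothetical data: the first three rows match the lesson, the last two are invented
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana', 'Evan'],
    'Age': [25, 30, 22, 35, 28],
    'City': ['New York', 'San Francisco', 'Los Angeles',
             'New York', 'San Francisco'],
})

# Combine two conditions with &; each condition needs its own parentheses
in_range = people[(people['Age'] >= 25) & (people['Age'] < 35)]

# Sort from oldest to youngest
oldest_first = people.sort_values(by='Age', ascending=False)

# Group rows by city, then compute the mean age within each group
mean_age_by_city = people.groupby('City')['Age'].mean()

print(in_range['Name'].tolist())
print(oldest_first['Name'].tolist())
print(mean_age_by_city)
```

Filtering and sorting return new DataFrames and leave `people` unchanged, while `groupby(...).mean()` returns a Series indexed by city.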
Lesson 2: Data Visualization with Matplotlib
2.1 Creating Plots and Charts
Matplotlib is a popular library for creating static, animated, and interactive visualizations in Python.
Example:
# Data Visualization with Matplotlib (uses the df created in 1.1)
import matplotlib.pyplot as plt
# Creating a bar chart
plt.bar(df['Name'], df['Age'])
plt.xlabel('Name')
plt.ylabel('Age')
plt.title('Age by Name')
plt.show()
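The same bar chart can also be built with Matplotlib's object-oriented interface, which is convenient when a figure needs to be saved to a file instead of shown on screen. This is a sketch under stated assumptions: the `Agg` backend is chosen so the script runs without a display, and the file name `age_by_name.png` and its location in the system temp directory are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the relevant columns of the sample data so this block is self-contained
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22]})

# plt.subplots() returns a Figure and a single Axes to draw on
fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(df['Name'], df['Age'])
ax.set_xlabel('Name')
ax.set_ylabel('Age')
ax.set_title('Age by Name')

# Hypothetical output path in the system temp directory
out_path = os.path.join(tempfile.gettempdir(), 'age_by_name.png')
fig.savefig(out_path, dpi=100)
plt.close(fig)  # release the figure's memory once it has been saved
```

In an interactive session, the `matplotlib.use('Agg')` line can be dropped and `plt.show()` called in place of `fig.savefig(...)`.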
Matplotlib can be used for various types of plots, including line charts, scatter plots, histograms, and more.
Example (Line Chart):
# Creating a line chart (the x-axis is categorical, so points are connected in row order)
plt.plot(df['Name'], df['Age'], marker='o')
plt.xlabel('Name')
plt.ylabel('Age')
plt.title('Age Trend')
plt.show()
Example (Scatter Plot):
# Creating a scatter plot (City is categorical, so each city gets its own position on the y-axis)
plt.scatter(df['Age'], df['City'])
plt.xlabel('Age')
plt.ylabel('City')
plt.title('Age vs City')
plt.show()
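Histograms, mentioned above, show how values are distributed across ranges. Three ages are too few to make a meaningful histogram, so the sketch below uses a small list of invented ages; the data, the bin edges, and the `Agg` backend setting are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Hypothetical ages, invented for illustration
ages = [22, 25, 25, 28, 30, 31, 34, 35, 41, 47]

fig, ax = plt.subplots()

# Explicit bin edges give the ranges [20, 30), [30, 40), and [40, 50]
counts, edges, patches = ax.hist(ages, bins=[20, 30, 40, 50], edgecolor='black')
ax.set_xlabel('Age')
ax.set_ylabel('Number of people')
ax.set_title('Age Distribution')

# hist returns the count per bin alongside the bin edges
print(counts)
print(edges)
```

Passing explicit edges to `bins` makes the ranges predictable; passing an integer instead lets Matplotlib divide the data range into that many equal-width bins.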
In Section 7, we introduced data analysis and visualization in Python, focusing on the Pandas library for data manipulation and the Matplotlib library for data visualization.
Pandas provides a user-friendly interface for working with tabular data through its DataFrame structure. We explored creating DataFrames, filtering, sorting, and aggregating data.
Matplotlib provides functions for bar charts, line charts, scatter plots, and other plot types, allowing you to visualize patterns and trends in your data.
As you advance in data analysis and visualization, you can explore more advanced techniques, integrate other libraries like Seaborn or Plotly, and work with larger datasets. These skills are valuable for anyone involved in exploring and communicating insights from data.