Data Analysis and Visualization

Section 7: Data Analysis and Visualization

Lesson 1: Introduction to Pandas

1.1 Working with DataFrames

Pandas is a powerful library for data manipulation and analysis in Python. Its central data structure is the DataFrame: a two-dimensional table with labeled rows and columns, where each column can hold a different data type.

Example:

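# Working with DataFrames
import pandas as pd

# Creating a DataFrame from a dictionary: each key becomes a column name,
# and each list holds that column's values
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

df = pd.DataFrame(data)

# Displaying the DataFrame (pandas adds a default integer row index: 0, 1, 2)
print(df)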
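
Before analyzing a DataFrame, it usually helps to check its size, column names, and data types. The sketch below rebuilds the sample data from the example above so it runs on its own. `shape`, `columns`, `dtypes`, `head()`, and `loc` are standard pandas DataFrame attributes and methods; the exact `dtypes` output may differ between pandas versions.

```python
import pandas as pd

# Same sample data as the example above
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

# (number of rows, number of columns)
print(df.shape)  # (3, 3)

# Column labels as a plain list
print(list(df.columns))  # ['Name', 'Age', 'City']

# Data type pandas inferred for each column
print(df.dtypes)

# First two rows; head() with no argument returns up to 5 rows
print(df.head(2))

# A single column (a Series) and a single row selected by its index label
print(df['City'])
print(df.loc[1])  # the row for Bob
```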


1.2 Data Manipulation and Analysis

Pandas provides extensive functionality for data manipulation and analysis, including filtering, sorting, and aggregation.

Example:

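# Data manipulation and analysis
# (uses the df DataFrame created in the previous example)

# Filtering: keep only the rows where the condition is True
young_people = df[df['Age'] < 30]

# Sorting: returns a new DataFrame ordered by Age (ascending by default)
sorted_df = df.sort_values(by='Age')

# Aggregation: mean of the Age column
average_age = df['Age'].mean()

print("Young People:")
print(young_people)
print("\nSorted DataFrame:")
print(sorted_df)
print("\nAverage Age:", average_age)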
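
The same operations extend to compound filters, descending sorts, and per-group aggregation. This is a minimal sketch under two assumptions: it recreates the sample `df` so it is self-contained, and the `sales` table used for the `groupby()` step is hypothetical data invented only for illustration.

```python
import pandas as pd

# Recreate the sample data from Lesson 1 so this block runs on its own
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
young_in_ny = df[(df['Age'] < 30) & (df['City'] == 'New York')]
print(young_in_ny)  # only Alice's row

# isin() keeps rows whose value appears in the given list
west_coast = df[df['City'].isin(['San Francisco', 'Los Angeles'])]
print(west_coast['Name'].tolist())  # ['Bob', 'Charlie']

# ascending=False reverses the default ascending sort
oldest_first = df.sort_values(by='Age', ascending=False)
print(oldest_first['Name'].tolist())  # ['Bob', 'Alice', 'Charlie']

# Several summary statistics of one column in a single call
stats = df['Age'].agg(['min', 'max', 'mean'])
print(stats)

# Hypothetical sales data, invented only to illustrate grouping
sales = pd.DataFrame({'Region': ['East', 'West', 'East', 'West', 'East'],
                      'Units': [10, 4, 6, 8, 2]})

# Split the rows by Region, then total and average Units within each group
per_region = sales.groupby('Region')['Units'].agg(['sum', 'mean'])
print(per_region)
```

`groupby()` follows a split-apply-combine pattern: rows are split by the values of a key column, an aggregation is applied to each group, and the results are combined into a new table indexed by the group keys.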


Lesson 2: Data Visualization with Matplotlib

2.1 Creating Plots and Charts

Matplotlib is a popular library for creating static, animated, and interactive visualizations in Python.

Example:

# Data Visualization with Matplotlib
# (uses the df DataFrame from Lesson 1)
import matplotlib.pyplot as plt

# Creating a bar chart: one bar per person, bar height = age
plt.bar(df['Name'], df['Age'])
plt.xlabel('Name')
plt.ylabel('Age')
plt.title('Age by Person')
plt.show()
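
The calls above use pyplot's implicit interface, which draws on whichever figure is currently active. Matplotlib also provides an explicit, object-oriented interface: `plt.subplots()` returns a Figure and an Axes, and plotting methods are called on the Axes directly. The sketch below draws a bar chart of the sample data in that style; the helper name `age_bar_chart` is hypothetical, chosen for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Recreate the sample data from Lesson 1 so this block runs on its own
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)


def age_bar_chart(frame):
    """Hypothetical helper: draw one bar per person on a new Figure/Axes."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(frame['Name'], frame['Age'])
    ax.set_xlabel('Name')
    ax.set_ylabel('Age')
    ax.set_title('Age by Person')
    fig.tight_layout()  # keep labels from being clipped at the figure edges
    return fig, ax


if __name__ == '__main__':
    fig, ax = age_bar_chart(df)
    plt.show()
```

Returning `fig` and `ax` from a helper keeps each chart's state separate, which tends to matter once a script produces several figures; calling `fig.savefig('ages.png')` would write the chart to a file instead of displaying it.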

Matplotlib can be used for various types of plots, including line charts, scatter plots, histograms, and more.

Example (Line Chart):

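# Creating a line chart
# Note: a line connecting the points implies an order along the x-axis,
# so line charts suit ordered data such as time series; here the names
# are simply plotted in the order they appear in the DataFrame
plt.plot(df['Name'], df['Age'], marker='o')
plt.xlabel('Name')
plt.ylabel('Age')
plt.title('Age by Person (Line Chart)')
plt.show()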
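
Line charts are most meaningful when the x-axis has a natural order, such as time. As a sketch, assuming a hypothetical series of yearly values (invented here, not taken from the lesson), the helper `trend_chart` (also a hypothetical name) plots one point per year and connects them in order:

```python
import matplotlib.pyplot as plt

# Hypothetical yearly values, invented for illustration only
years = [2019, 2020, 2021, 2022, 2023]
values = [3.1, 3.4, 3.3, 3.9, 4.2]


def trend_chart(x, y):
    """Hypothetical helper: plot y against an ordered x-axis, one marker per point."""
    fig, ax = plt.subplots()
    ax.plot(x, y, marker='o')
    # One tick per year, instead of automatic ticks that may fall between years
    ax.set_xticks(x)
    ax.set_xlabel('Year')
    ax.set_ylabel('Value')
    ax.set_title('Value over Time')
    return fig, ax


if __name__ == '__main__':
    fig, ax = trend_chart(years, values)
    plt.show()
```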

Example (Scatter Plot):

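# Creating a scatter plot
# City holds strings, so Matplotlib treats the y-axis as categorical
# and gives each distinct city its own position
plt.scatter(df['Age'], df['City'])
plt.xlabel('Age')
plt.ylabel('City')
plt.title('Age vs City')
plt.show()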
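
Histograms, mentioned earlier as another common plot type, show how numeric values are distributed across ranges (bins). `ax.hist()` returns the count in each bin, the bin edges, and the drawn bars. The ages below are hypothetical values invented for illustration, and `age_histogram` is a hypothetical helper name:

```python
import matplotlib.pyplot as plt

# Hypothetical ages, invented for illustration only
ages = [22, 25, 30, 31, 35, 35, 38, 41, 44, 47, 52, 58]


def age_histogram(values, bins=4):
    """Hypothetical helper: histogram of values using equal-width bins."""
    fig, ax = plt.subplots()
    # hist() returns the count per bin, the bin edges, and the drawn bar patches
    counts, edges, _bars = ax.hist(values, bins=bins, edgecolor='black')
    ax.set_xlabel('Age')
    ax.set_ylabel('Count')
    ax.set_title('Age Distribution')
    return fig, ax, counts, edges


if __name__ == '__main__':
    fig, ax, counts, edges = age_histogram(ages)
    print(counts)
    print(edges)
    plt.show()
```

With `bins=4`, the range from the smallest to the largest value is split into four equal-width bins, so the edges array has five entries and the counts add up to the number of values.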


In Section 7, we introduced data analysis and visualization in Python, focusing on the Pandas library for data manipulation and the Matplotlib library for data visualization.

Pandas provides a user-friendly interface for working with tabular data through its DataFrame structure. We explored creating DataFrames, filtering, sorting, and aggregating data.

Matplotlib, on the other hand, offers a versatile set of tools for creating various types of plots and charts, allowing you to visualize patterns and trends in your data.

As you advance in data analysis and visualization, you can explore more advanced techniques, integrate other libraries like Seaborn or Plotly, and work with larger datasets. These skills are valuable for anyone involved in exploring and communicating insights from data.