Pandas (Python Data Analysis Library) is an open-source Python library for data manipulation and analysis. It provides two primary data structures:
- Series: A one-dimensional labeled array.
- DataFrame: A two-dimensional labeled table of rows and columns, where each column can hold a different data type.
Pandas is well suited to structured data (tables, CSV files, database query results) and offers a wide range of tools for data cleaning, exploration, and analysis.
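To make the relationship between the two structures concrete, here is a minimal sketch. The names `ages`, `people`, and the sample values are illustrative assumptions, not part of any real dataset.

```python
import pandas as pd

# A Series: one-dimensional values, each addressed by an index label.
# (The labels and values here are made up for illustration.)
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'], name='Age')

# Look up a value by its label rather than by its position.
bob_age = ages['Bob']

# A DataFrame: several columns sharing one row index.
# The plain list is aligned positionally with the Series' index.
people = pd.DataFrame({
    'Age': ages,
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

# Selecting a single column gives back a Series.
age_column = people['Age']
print(type(age_column).__name__)
```

In this sketch the row labels of `people` come from the index of `ages`, which is why label-based lookup works on both objects.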
Key Features:
- Series and DataFrames: Used for representing and manipulating data.
- Data manipulation: Filtering, grouping, joining, and reshaping data.
- Handling missing data: Detecting, dropping, and filling missing (NaN/NA) values.
- Reading/writing data: Supports reading from and writing to multiple formats (CSV, Excel, SQL, etc.).
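The missing-data and reading/writing features listed above can be sketched together as follows. The CSV text is invented for illustration and is read from an in-memory buffer (`io.StringIO`) so that the example needs no file on disk; with a real file you would pass a path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical CSV content with two gaps: Bob's age and Charlie's city.
csv_text = "Name,Age,City\nAlice,25,New York\nBob,,Chicago\nCharlie,35,\n"

# Read the CSV; empty fields are parsed as missing values (NaN).
df_in = pd.read_csv(io.StringIO(csv_text))

# Count the missing values in each column.
missing_per_column = df_in.isna().sum()

# Option 1: drop every row that has at least one missing value.
complete_rows = df_in.dropna()

# Option 2: fill the gaps, using a different rule per column.
filled = df_in.fillna({'Age': df_in['Age'].mean(), 'City': 'Unknown'})

# Write the cleaned data back out as CSV text, without the row index.
buffer = io.StringIO()
filled.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Whether dropping or filling is appropriate depends on the analysis; filling with the column mean is only one possible choice and is used here purely as an example.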
Example:
Creating a DataFrame:
import pandas as pd
# Creating a DataFrame from a dictionary
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
Basic DataFrame Operations:
# Selecting a column
print(df['Name'])
# Filtering data
filtered_df = df[df['Age'] > 30]
print(filtered_df)
# Adding a new column
df['Salary'] = [50000, 60000, 70000, 80000]
print(df)
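# Grouping data: average the numeric columns per city
# ('Name' is text, so it is left out; calling mean() on it raises a TypeError in pandas 2.x)
grouped = df.groupby('City')[['Age', 'Salary']].mean()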
print(grouped)
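In the table above every city appears only once, so each group contains a single row. The following sketch uses a hypothetical `staff` table with repeated cities to show what grouping does when groups have several rows, computing a different summary per column with named aggregation.

```python
import pandas as pd

# Hypothetical data in which each city appears twice.
staff = pd.DataFrame({
    'City': ['New York', 'Chicago', 'New York', 'Chicago'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
})

# Named aggregation: output_column=(input_column, aggregation_function).
summary = staff.groupby('City').agg(
    mean_age=('Age', 'mean'),
    total_salary=('Salary', 'sum'),
    headcount=('Age', 'count'),
)
print(summary)
```

The result has one row per city, indexed by `City`, with the three summary columns named in the `agg` call.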
Advanced Example:
# Merging two DataFrames on 'ID'; an outer join keeps IDs found in either frame and fills the gaps with NaN
df1 = pd.DataFrame({
'ID': [1, 2, 3],
'Name': ['A', 'B', 'C']
})
df2 = pd.DataFrame({
'ID': [1, 2, 4],
'City': ['X', 'Y', 'Z']
})
merged_df = pd.merge(df1, df2, on='ID', how='outer')
print(merged_df)
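The `how` argument controls which rows survive the merge. As a rough comparison, the sketch below repeats the same two tables under the assumed names `left` and `right` and contrasts three join types; `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Same shape of data as the merge example, under illustrative names.
left = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['A', 'B', 'C']})
right = pd.DataFrame({'ID': [1, 2, 4], 'City': ['X', 'Y', 'Z']})

# Inner join: only IDs present in both tables (1 and 2).
inner = pd.merge(left, right, on='ID', how='inner')

# Left join: every row of `left`; City is NaN where `right` has no match (ID 3).
left_join = pd.merge(left, right, on='ID', how='left')

# Outer join with an indicator column showing the origin of each row:
# 'both', 'left_only', or 'right_only'.
outer = pd.merge(left, right, on='ID', how='outer', indicator=True)
print(outer)
```

An inner join is the default when `how` is omitted, so the outer join has to be requested explicitly when unmatched rows should be kept.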
Pandas is well suited to analyzing structured data and to operations such as filtering, grouping, and merging.