## Introduction

In this blog post, we look at descriptive statistics, focusing on measures of central tendency and variability. We explore two scenarios: calculating summary statistics for a numeric variable grouped by a categorical variable, and examining statistical details for specific categories in a dataset. By the end, you will know how to compute these fundamental measures with pandas and how to read the results.

## 1. Summary Statistics Grouped by Categorical Variables

To calculate summary statistics (mean, median, minimum, maximum, and standard deviation) for a numeric variable grouped by a categorical variable, consider the example dataset 'Loan-payments-data.csv'. We will summarize the 'age' variable within each 'education' group. Note that pandas' `describe()` reports the median as the `50%` column. Here's the code to perform the analysis:

```python
import pandas as pd

# Read the dataset from the CSV file
df = pd.read_csv('Loan-payments-data.csv')

# Calculate the summary statistics for 'age' grouped by 'education'
age_summary = df.groupby('education')['age'].describe()

# Collect the mean, median (50%), min, max, and std values into one flat list
age_values = (
    age_summary['mean'].tolist()
    + age_summary['50%'].tolist()
    + age_summary['min'].tolist()
    + age_summary['max'].tolist()
    + age_summary['std'].tolist()
)

# Print the summary statistics
print('Summary statistics for age grouped by education:')
print(age_summary)
print('')

# Print the list of numeric values
print('Numeric values for age grouped by education:')
print(age_values)
```
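If you only need the five statistics named above, `agg` lets you request them explicitly instead of pulling columns out of `describe()`. The following is a minimal sketch, not the method used on the real dataset: the small DataFrame is a hypothetical stand-in for the loan data, and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the loan data; the values are made up for illustration
df = pd.DataFrame({
    'education': ['College', 'College', 'College', 'Bachelor', 'Bachelor', 'Bachelor'],
    'age': [25, 30, 35, 28, 32, 45],
})

# Request only the statistics of interest, naming 'median' explicitly
age_stats = df.groupby('education')['age'].agg(['mean', 'median', 'min', 'max', 'std'])

print(age_stats)
```

Keep in mind that pandas computes `std` as the sample standard deviation (`ddof=1`) by default, so the result differs slightly from the population standard deviation for small groups.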


## 2. Statistical Details for Specific Categories

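Next, let's examine statistical details for specific categories in a dataset. For this example, we will use the 'Iris (1).csv' dataset. We will calculate the percentiles, mean, and standard deviation of the measurements for the species 'Iris-setosa', 'Iris-versicolor', and 'Iris-virginica'. Calling `describe()` on each filtered subset summarizes only the numeric columns, reporting the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum. Here's the code to perform the analysis: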

```python
import pandas as pd

# Read the dataset from the CSV file
df = pd.read_csv('Iris (1).csv')

# Filter the dataset for the specified species
setosa_data = df[df['Species'] == 'Iris-setosa']
versicolor_data = df[df['Species'] == 'Iris-versicolor']
virginica_data = df[df['Species'] == 'Iris-virginica']

# Calculate the statistical details for each species
setosa_stats = setosa_data.describe()
versicolor_stats = versicolor_data.describe()
virginica_stats = virginica_data.describe()

# Display the statistical details
print("Statistical details for Iris-setosa:")
print(setosa_stats)
print()

print("Statistical details for Iris-versicolor:")
print(versicolor_stats)
print()

print("Statistical details for Iris-virginica:")
print(virginica_stats)
```
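The three manual filters can also be collapsed into a single `groupby`, and `describe()` accepts a `percentiles` argument if you want cut points other than the default quartiles. The sketch below assumes a tiny, hypothetical miniature of the Iris data: the `SepalLengthCm` column name and all values are assumptions for illustration, not taken from 'Iris (1).csv'.

```python
import pandas as pd

# Hypothetical miniature of the Iris data; column name and values are assumptions
df = pd.DataFrame({
    'Species': ['Iris-setosa'] * 4 + ['Iris-versicolor'] * 4,
    'SepalLengthCm': [5.0, 5.2, 4.8, 5.0, 6.0, 6.4, 5.6, 6.0],
})

# Percentiles to report for every species (the median, 50%, is always included)
percentiles = [0.1, 0.5, 0.9]

# One groupby call produces the statistics for all species at once
species_stats = df.groupby('Species')['SepalLengthCm'].describe(percentiles=percentiles)

# Show only the mean, standard deviation, and requested percentiles
print(species_stats[['mean', 'std', '10%', '50%', '90%']])
```

This form scales to any number of categories without adding a filter per species, at the cost of putting all results in one table rather than three.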


## Summary

Descriptive statistics, such as measures of central tendency (mean, median) and variability (standard deviation), are powerful tools for summarizing and analyzing datasets. By calculating summary statistics for a numeric variable grouped by a categorical variable and examining statistical details for specific categories, you can gain valuable insights into your data. These statistical measures serve as a foundation for further analysis and decision-making in various fields, including finance, economics, and social sciences.

