EXPLORING DESCRIPTIVE STATISTICS: MEASURES OF CENTRAL TENDENCY AND VARIABILITY

Introduction

In this blog post, we look at descriptive statistics, focusing on measures of central tendency and variability. We work through two scenarios: calculating summary statistics for a numeric variable grouped by a categorical variable, and examining statistical details for specific categories in a dataset. Both use pandas, and both rely on the same handful of measures: mean, median, minimum, maximum, percentiles, and standard deviation.



1. Summary Statistics Grouped by Categorical Variables

To calculate summary statistics (mean, median, minimum, maximum, and standard deviation) for a numeric variable grouped by a categorical variable, let's consider an example dataset called 'Loan-payments-data.csv'. We will calculate summary statistics for the 'age' variable grouped by the 'education' variable. The pandas describe() method reports the median under the column label '50%', alongside the count and the 25th and 75th percentiles. Here's the code to perform the analysis:

```python
import pandas as pd

# Read the dataset from the CSV file
df = pd.read_csv('Loan-payments-data.csv')

# Calculate the summary statistics for 'age' grouped by 'education'
age_summary = df.groupby('education')['age'].describe()

# Flatten the mean, median ('50%'), min, max, and std columns into one list
age_values = (
    age_summary['mean'].tolist()
    + age_summary['50%'].tolist()
    + age_summary['min'].tolist()
    + age_summary['max'].tolist()
    + age_summary['std'].tolist()
)

# Print the summary statistics
print('Summary statistics for age grouped by education:')
print(age_summary)
print()

# Print the list of numeric values
print('Numeric values for age grouped by education:')
print(age_values)
```

Output
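
The exact figures depend on the contents of the CSV file. As a rough illustration of the shape of the result, the sketch below runs the same grouping on a small made-up DataFrame. The six rows and their values are hypothetical, not taken from the loan dataset, and `agg` is used in place of `describe()` only to keep the five statistics named above.

```python
import pandas as pd

# Hypothetical stand-in for the loan data: six made-up rows with the same
# two columns the example above uses ('education' and 'age').
df = pd.DataFrame({
    'education': ['college', 'college', 'college',
                  'High School or Below', 'High School or Below',
                  'High School or Below'],
    'age': [25, 30, 35, 28, 40, 52],
})

# One row per education level, one column per requested statistic
age_summary = df.groupby('education')['age'].agg(
    ['mean', 'median', 'min', 'max', 'std']
)
print(age_summary)
```

Each education level gets one row. The `std` column is the sample standard deviation, since pandas divides by n - 1 by default.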



2. Statistical Details for Specific Categories

Next, let's examine statistical details for specific categories in a dataset. For this example, we will use the 'Iris (1).csv' dataset. We will calculate the percentiles, mean, and standard deviation for each of the species 'Iris-setosa', 'Iris-versicolor', and 'Iris-virginica'. Here's the code to perform the analysis:

```python
import pandas as pd

# Read the dataset from the CSV file
df = pd.read_csv('Iris (1).csv')

# Filter the dataset for the specified species
setosa_data = df[df['Species'] == 'Iris-setosa']
versicolor_data = df[df['Species'] == 'Iris-versicolor']
virginica_data = df[df['Species'] == 'Iris-virginica']

# Calculate the statistical details for each species
setosa_stats = setosa_data.describe()
versicolor_stats = versicolor_data.describe()
virginica_stats = virginica_data.describe()

# Display the statistical details
print("Statistical details for Iris-setosa:")
print(setosa_stats)
print()

print("Statistical details for Iris-versicolor:")
print(versicolor_stats)
print()

print("Statistical details for Iris-virginica:")
print(virginica_stats)
```

Output
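
The numbers printed depend on the CSV file. The sketch below repeats the filter-then-describe pattern on a small made-up DataFrame so the layout of the result is easy to see. The column name `PetalLengthCm` and all values are assumptions for illustration, not rows from the real file.

```python
import pandas as pd

# Hypothetical measurements: made-up values, not rows from the real Iris file.
df = pd.DataFrame({
    'Species': ['Iris-setosa'] * 3 + ['Iris-versicolor'] * 3,
    'PetalLengthCm': [1.0, 1.5, 2.0, 4.0, 4.5, 5.0],
})

# Same boolean filter as in the example above, applied to one species
setosa_data = df[df['Species'] == 'Iris-setosa']

# describe() reports count, mean, std, min, the 25th/50th/75th
# percentiles, and max for every numeric column
setosa_stats = setosa_data.describe()
print(setosa_stats)

# Percentiles other than the defaults can be requested explicitly
p90 = setosa_data['PetalLengthCm'].quantile(0.9)
print(p90)
```

The rows labelled 25%, 50%, and 75% in the `describe()` table are the percentiles, and the 50% row is the median. The non-numeric `Species` column is left out of the table by default.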



Summary

Descriptive statistics such as the mean and median (central tendency) and the standard deviation (variability) summarize a dataset in a handful of numbers. Calculating them for a numeric variable grouped by a categorical variable, or for specific categories one at a time, shows how groups differ in both typical value and spread. These measures are the starting point for further analysis and decision-making in fields such as finance, economics, and the social sciences.

