Pandas DataFrame: Counting unique values in each group

Updated: February 21, 2024 By: Guest Contributor

Table Of Contents

1 Overview

2 Introduction to Grouping in Pandas

3 Basic Example: Counting Unique Values

4 Intermediate Example: Custom Groups and Multiple Columns

5 Advanced Example: Using agg with Custom Functions

6 Grouping and Counting Unique Values with Complex Conditions

7 Visualizing the Counts of Unique Values

8 Conclusion

Overview

Working with Pandas DataFrames is a fundamental skill for any data scientist or analyst. A common operation when analyzing data is splitting it into groups and calculating statistics on each group. In this tutorial, we will focus on how to count unique values in each group of a Pandas DataFrame. This operation is useful when you need to know how many distinct values each group contains.

We’ll begin with basic examples and gradually move on to more advanced scenarios. Examples will cover a range of techniques from using groupby and nunique methods to applying more complex functions for deeper insights into your grouped data.

Introduction to Grouping in Pandas

Before we delve into counting unique values, let’s quickly review how to group data in a Pandas DataFrame. Grouping involves one or more keys by which the data is split into groups. Each group can then be aggregated or transformed independently.

import pandas as pd

# Sample DataFrame
data = {'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'C'],
        'Values': [1, 2, 2, 3, 3, 1, 2, 3]}
df = pd.DataFrame(data)

The simplest way to group data is by using the groupby method. For example, if we want to group by the ‘Category’ column:

grouped_df = df.groupby('Category')
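
The grouped object does not compute anything until you ask it to. As a minimal sketch (reusing the sample data above; the name group_sizes is only illustrative), you can inspect what the split looks like before aggregating:

```python
import pandas as pd

# Same sample data as above
data = {'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'C'],
        'Values': [1, 2, 2, 3, 3, 1, 2, 3]}
df = pd.DataFrame(data)

grouped_df = df.groupby('Category')

# Number of rows that fall into each group
group_sizes = grouped_df.size()
print(group_sizes)

# Iterating yields (group key, sub-DataFrame) pairs
for key, group in grouped_df:
    print(key, group['Values'].tolist())
```

Each sub-DataFrame holds only the rows for one category, which is the unit that later aggregations such as nunique operate on.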

Basic Example: Counting Unique Values

Counting unique values in each group can be achieved using the nunique method. Called on a single grouped column, it returns a Series holding the number of distinct values in each group; missing values (NaN) are excluded by default:

unique_counts = grouped_df['Values'].nunique()
print(unique_counts)

This results in:

Category
A    2
B    2
C    2
Name: Values, dtype: int64

As you can see, each category has two unique values in the ‘Values’ column.
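
If you want to check which values are behind each count, one option is the unique method on the grouped column. This is a small sketch using the same sample data; the name distinct_values is an illustrative choice:

```python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'C'],
        'Values': [1, 2, 2, 3, 3, 1, 2, 3]}
df = pd.DataFrame(data)

# One array of distinct values per category, in order of first appearance
distinct_values = df.groupby('Category')['Values'].unique()
print(distinct_values)

# The length of each array matches what nunique reports (no NaN in this data)
lengths = distinct_values.map(len)
print(lengths)
```

Here the lengths agree with nunique because the data has no missing values.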

Intermediate Example: Custom Groups and Multiple Columns

Let’s explore how to count unique values in custom groupings and across multiple columns. Suppose we have an additional ‘Subcategory’ column and we want to group by both ‘Category’ and ‘Subcategory’:

data['Subcategory'] = ['X', 'X', 'Y', 'Y', 'X', 'Y', 'X', 'Y']
df = pd.DataFrame(data)
grouped_df = df.groupby(['Category', 'Subcategory'])

To count unique values across multiple columns, use:

unique_counts = grouped_df[['Values', 'Subcategory']].nunique()
print(unique_counts)

This yields:

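                      Values  Subcategory
Category Subcategory
A        X                 1            1
         Y                 2            1
B        X                 2            1
C        X                 1            1
         Y                 1            1

Notice how the count of unique ‘Values’ changes depending on the group: (A, Y) and (B, X) each contain two distinct values, while the other groups contain one. The ‘Subcategory’ count is always 1 because ‘Subcategory’ is itself one of the grouping keys, so every group holds exactly one subcategory. There is no (B, Y) row because the data contains no such combination.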
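
If the two-level index is hard to scan, one possible variation is to pivot the inner key into columns with unstack. This is a sketch under the same sample data; the name wide and the choice of filling missing combinations with 0 are assumptions made for illustration:

```python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'C'],
        'Values': [1, 2, 2, 3, 3, 1, 2, 3],
        'Subcategory': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'X', 'Y']}
df = pd.DataFrame(data)

# Unique 'Values' per (Category, Subcategory) pair
per_pair = df.groupby(['Category', 'Subcategory'])['Values'].nunique()

# Move 'Subcategory' into columns; combinations absent from the data become 0
wide = per_pair.unstack(fill_value=0)
print(wide)
```

The result has one row per category and one column per subcategory, which makes gaps such as the missing (B, Y) combination easy to spot.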

Advanced Example: Using agg with Custom Functions

For even more control over how unique values are counted, you can use the agg method with custom functions. This is especially useful when you want to apply different aggregations for different columns or process the aggregated results further.

def count_unique(series):
    return len(series.unique())

unique_counts = grouped_df.agg({
    'Values': [count_unique, 'nunique'],
    'Subcategory': 'nunique'
})
print(unique_counts)

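The output has a two-level column header: under ‘Values’ it shows the result of the custom count_unique function next to the built-in nunique method for comparison, and under ‘Subcategory’ the nunique result. On this sample data the two ‘Values’ columns are identical.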
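
The custom function and the built-in method are not always interchangeable. Series.unique() keeps NaN as a value, whereas nunique() ignores it by default, so the two can disagree when data is missing. A minimal sketch with a small made-up DataFrame (df_nan is hypothetical and not part of the tutorial's data):

```python
import numpy as np
import pandas as pd

def count_unique(series):
    # len(unique()) counts NaN as one of the distinct values
    return len(series.unique())

# Hypothetical data with one missing value in group A
df_nan = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Values': [1.0, np.nan, 2.0, 2.0],
})

comparison = df_nan.groupby('Category')['Values'].agg([count_unique, 'nunique'])
print(comparison)
```

For group A the custom function reports 2 while nunique reports 1; for group B both report 1. Which behavior you want depends on whether a missing value should count as its own category.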

Grouping and Counting Unique Values with Complex Conditions

For scenarios where the data needs to be filtered before counting, the filter method of a groupby object keeps or drops whole groups based on a function that returns True or False for each group.

custom_group = df.groupby('Category').filter(lambda x: x['Values'].nunique() > 1)
print(custom_group.groupby('Category')['Values'].nunique())

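This keeps only the rows that belong to groups with more than one unique value and then counts the unique values in the remaining groups. In the sample data every category has two unique values, so no group is dropped; with a stricter condition, or with different data, whole categories would disappear from the result.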
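
To see the filter remove something, here is a sketch on a small made-up DataFrame (df_demo is hypothetical) in which only one category has more than one distinct value. It also shows an alternative that computes the counts once and filters the resulting Series:

```python
import pandas as pd

# Hypothetical data: A has two distinct values, B and C have one each
df_demo = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C'],
    'Values': [1, 2, 5, 5, 7],
})

# Keep only rows from groups with more than one distinct value
kept = df_demo.groupby('Category').filter(lambda g: g['Values'].nunique() > 1)
kept_counts = kept.groupby('Category')['Values'].nunique()
print(kept_counts)

# Alternative: count first, then filter the counts with a boolean mask
all_counts = df_demo.groupby('Category')['Values'].nunique()
same_result = all_counts[all_counts > 1]
print(same_result)
```

Both approaches leave only category A. The second avoids grouping twice, while filter is the better fit when you need the surviving rows themselves and not just their counts.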

Visualizing the Counts of Unique Values

In addition to numerical analysis, visual representations can provide intuitive insights into the distribution of unique counts across groups. Utilizing libraries like Matplotlib or Seaborn, we can easily plot the counts of unique values for a more comprehensive understanding.

import matplotlib.pyplot as plt
import seaborn as sns

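# Count distinct 'Values' per category and name the count column explicitly
counts = df.groupby('Category')['Values'].nunique().reset_index(name='UniqueValues')
sns.barplot(x='Category', y='UniqueValues', data=counts)
plt.ylabel('Number of unique values')
plt.show()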

Conclusion

Counting unique values within groups of a Pandas DataFrame shows how many distinct values each subset of the data contains. As we’ve explored from basic to advanced examples, there are several ways to approach this operation: calling nunique on a grouped column, combining it with other aggregations through agg, and filtering groups by their unique counts before further analysis or plotting.
