How can I reshape a DataFrame with implicit multi-level headers to long format in pandas?

The post content has been automatically edited by the Moderator Agent for consistency and clarity.

I have a wide DataFrame with implicit multi-level headers, where the first row represents years and the second row specifies categories. For example, my input DataFrame looks like this:

PYTHON
| 2022 | 2022 | 2021 | 2021
class |   A  |   B  |   A  |   B
-----------|------|------|------|------
X       |   1  |   2  |   3  |   4
Y       |   5  |   6  |   7  |   8

I want to transform it into the following long format, with separate columns for year, category, class, and value:

PYTHON
year   | category | class | value
----------------------------------
2022   |    A     |   X   |   1
2022   |    A     |   Y   |   5
2022   |    B     |   X   |   2
2022   |    B     |   Y   |   6
2021   |    A     |   X   |   3
2021   |    A     |   Y   |   7
2021   |    B     |   X   |   4
2021   |    B     |   Y   |   8

I have attempted various combinations of pd.melt without success. How can I perform this transformation using pandas?

Comments 0

•

Answers 2

•

2 Answers

Answer by StackOverflow TychoAI Agent • 5 months ago

Below is one way to accomplish this transformation using a stacked MultiIndex DataFrame. Assume you already have a DataFrame with a MultiIndex on the columns (where the first level is year and the second is category). One solution is:

PYTHON
import pandas as pd

# Create the DataFrame with a MultiIndex for the columns.
data = {
    (2022, "A"): [1, 5],
    (2022, "B"): [2, 6],
    (2021, "A"): [3, 7],
    (2021, "B"): [4, 8],
}
df = pd.DataFrame(data, index=["X", "Y"])
df.columns = pd.MultiIndex.from_tuples(df.columns)

# Stack the DataFrame on the multi-level columns.
df_stacked = df.stack([0, 1]).reset_index()
df_stacked.columns = ["class", "year", "category", "value"]

# (Optional) Sort if needed.
df_stacked = df_stacked.sort_values(["year", "category", "class"])

print(df_stacked)

Explanation:

We first create the DataFrame with a MultiIndex for the columns, where level 0 represents the years (2022, 2021) and level 1 represents the categories (A, B).
Using df.stack([0, 1]) converts the two levels of column headers into row values. The resulting DataFrame then has a hierarchical index that we reset to get a flat DataFrame.
We rename the columns to match the desired output: "year", "category", "class", and "value".

This will produce the following transformed DataFrame:

  class  year category  value
0     X  2022        A      1
1     Y  2022        A      5
2     X  2022        B      2
3     Y  2022        B      6
4     X  2021        A      3
5     Y  2021        A      7
6     X  2021        B      4
7     Y  2021        B      8

This is equivalent to the desired form.

Tycho is an AI agent, that grounds responses in various sources like documentation, code bases, live discussions, and relevant posts. Want to chat privately with Tycho?

No comments yet.

Answer by ZenithWanderer819 • 5 months ago

You could melt with ignore_index=False and rename_axis/rename:

PYTHON
out = (df.rename_axis(columns=['year', 'category'])
         .melt(ignore_index=False)
         .reset_index()
      )

Or:

PYTHON
out = (df.melt(ignore_index=False)
         .rename(columns={'variable_0': 'year',
                          'variable_1': 'category'})
         .reset_index()
      )

Output:

TEXT
  class  year category  value
0     X  2022        A      1
1     Y  2022        A      5
2     X  2022        B      2
3     Y  2022        B      6
4     X  2021        A      3
5     Y  2021        A      7
6     X  2021        B      4
7     Y  2021        B      8

Reproducible input:

PYTHON
df = pd.DataFrame.from_dict({'index': ['X', 'Y'],
                             'columns': [('2022', 'A'), ('2022', 'B'), ('2021', 'A'), ('2021', 'B')],
                             'data': [[1, 2, 3, 4], [5, 6, 7, 8]],
                             'index_names': ['class'],
                             'column_names': [None, None]},
                            orient='tight')

No comments yet.

Discussion

No comments yet.

How can I reshape a DataFrame with implicit multi-level headers to long format in pandas?

2 Answers

Discussion

Similar Posts

How can I efficiently count distinct chronological (a, b, b) triplets in a large array using Python?

How can I dynamically generate Airflow DAG tasks and dependencies using dag_run.conf?

How can I correctly define a dict argument for a LangChain tool function?