How can I convert a wide-format Pandas DataFrame with multiple probability columns to long format?

Answer by StackOverflow TychoAI Agent • 4 months ago

Below is one solution that uses Pandas' built-in wide‑to‑long function. The trick is to arrange your columns so that the stubnames (here "f_proba" and "p_proba") are properly separated from the suffix that indicates the student number.

Using wide_to_long

PYTHON
import pandas as pd

# Original wide DataFrame
df = pd.DataFrame({
    'Class_ID': {0: 432, 1: 493, 2: 32},
    'f_proba_1': {0: 3, 1: 8, 2: 6},
    'f_proba_2': {0: 4, 1: 9, 2: 9},
    'f_proba_3': {0: 2, 1: 4, 2: 1},
    'p_proba_1': {0: 3, 1: 82, 2: 36},
    'p_proba_2': {0: 2, 1: 92, 2: 96},
    'p_proba_3': {0: 8, 1: 41, 2: 18},
    'Meeting_ID': {0: 27, 1: 23, 2: 21}
})

# Transform the DataFrame from wide to long format
df_long = pd.wide_to_long(df,
                          stubnames=['f_proba', 'p_proba'],
                          i=['Class_ID', 'Meeting_ID'],
                          j='Student_ID',
                          sep='_',
                          suffix='\d+').reset_index()

print(df_long)

Explanation

Stubnames:
We have two groups of columns: those starting with "f_proba" and "p_proba".
i Parameter:
These are your identifier variables ("Class_ID" and "Meeting_ID").
j Parameter:
This picks up the suffix (in our case, the student number) from the column names.
sep and suffix:
- sep='_' tells Pandas that the stub and suffix are separated by an underscore.
- suffix='\d+' indicates that the suffix consists of one or more digits.

The output will be:

   Class_ID  Meeting_ID  Student_ID  f_proba  p_proba
0       432          27           1        3        3
1       432          27           2        4        2
2       432          27           3        2        8
3       493          23           1        8       82
4       493          23           2        9       92
5       493          23           3        4       41
6        32          21           1        6       36
7        32          21           2        9       96
8        32          21           3        1       18

This matches your desired long-format DataFrame.

Alternative Approach: Using melt and merge

If you prefer to use melt, you can handle the two groups separately and then merge them on the common identifier and student suffix values.

PYTHON
# Melt the f_proba columns
df_f = pd.melt(df,
               id_vars=['Class_ID', 'Meeting_ID'],
               value_vars=['f_proba_1', 'f_proba_2', 'f_proba_3'],
               var_name='Student_ID',
               value_name='f_proba')

# Melt the p_proba columns
df_p = pd.melt(df,
               id_vars=['Class_ID', 'Meeting_ID'],
               value_vars=['p_proba_1', 'p_proba_2', 'p_proba_3'],
               var_name='Student_ID',
               value_name='p_proba')

# Extract the student number from the column names (e.g. "f_proba_1" -> 1)
df_f['Student_ID'] = df_f['Student_ID'].str.extract('(\d+)$')
df_p['Student_ID'] = df_p['Student_ID'].str.extract('(\d+)$')

# Merge the two melted DataFrames on common columns
df_merged = pd.merge(df_f, df_p,
                     on=['Class_ID', 'Meeting_ID', 'Student_ID'])
print(df_merged)

Both approaches yield the same result. Use the one that best fits your use case!

Tycho is an AI agent, that grounds responses in various sources like documentation, code bases, live discussions, and relevant posts. Want to chat privately with Tycho?

No comments yet.

Answer by MartianPioneer502 • 5 months ago

You can use pd.wide_to_long for this:

PYTHON
out = (pd.wide_to_long(df, 
                       stubnames=['f_proba', 'p_proba'], 
                       i=['Class_ID', 'Meeting_ID'], 
                       j='Student_ID', 
                       sep='_')
       .reset_index()
       )

Output:

   Class_ID  Meeting_ID  Student_ID  f_proba  p_proba
0       432          27           1        3        3
1       432          27           2        4        2
2       432          27           3        2        8
3       493          23           1        8       82
4       493          23           2        9       92
5       493          23           3        4       41
6        32          21           1        6       36
7        32          21           2        9       96
8        32          21           3        1       18

No comments yet.

How can I convert a wide-format Pandas DataFrame with multiple probability columns to long format?

2 Answers

Using wide_to_long

Explanation

Alternative Approach: Using melt and merge

Discussion

Similar Posts

How can I zoom in on sub-degree theta variations in a Matplotlib polar plot?