+4 votes
in Programming Languages by (77.1k points)

I have the following dataframe with 3 columns. For each fraction, there are 5 iterations i.e. 5 probabilities. I want to create a pivot table with the sum and average of 5 probabilities for each fraction. How can I do this?

    fracs  iters     probs

0     0.1      0  0.559291

1     0.1      1  0.471619

2     0.1      2  0.875030

3     0.1      3  0.781281

4     0.1      4  0.273657

5     0.2      0  0.521816

6     0.2      1  0.334756

7     0.2      2  0.176170

8     0.2      3  0.931656

9     0.2      4  0.997731

10    0.3      0  0.952937

11    0.3      1  0.411773

12    0.3      2  0.784722

13    0.3      3  0.878738

14    0.3      4  0.093036

1 Answer

+2 votes
by (354k points)
selected by
 
Best answer

Pandas's pivot_table() can be used to create a spreadsheet-style pivot table as a DataFrame. Check the list of parameters of this function by clicking on the link with pivot_table.

To create a pivot table using your data, in the function, use the column "probs" for parameter values, column "fracs" for parameter index, and a list of sum and mean functions for parameter aggfunc. You don't need to specify the value for parameter columns.

You can try the following code:

import pandas as pd
import numpy as np

fracs = np.concatenate([[0.1]*5, [0.20]*5, [0.3]*5])
probs = np.random.random(15)
iters = np.asarray([i for i in range(5)]*3)
df = pd.DataFrame({"fracs":fracs, "iters":iters, "probs":probs})

# create pivtor table with average probability for each fraction
df1 = pd.pivot_table(df, values="probs", index="fracs", aggfunc=[np.sum, np.mean])
print(df1)

The above code will print the following output:

           sum      mean
          probs     probs
fracs                    
0.1    3.808452  0.761690
0.2    2.262392  0.452478
0.3    2.017160  0.403432


...