+1 vote
in Operating Systems by (72.7k points)

I have a TSV file with the following columns: "rec_id true_label ml_label predicted_prob calibrated_prob positive_count". The columns are separated by tabs. I want to sort the file in descending order by the predicted_prob column. How can I use the "sort" command to do this?

1 Answer

+2 votes
by (77.2k points)
Best answer

Sorting a TSV file by a given column is straightforward. In the list of columns you gave, predicted_prob is the 4th column, so the following command sorts the file by that column in descending order:

sort -t$'\t' -k4,4gr data.tsv -o sorted_data.tsv

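Here, data.tsv is the input file and sorted_data.tsv is the output file; the -o option writes the result to that file instead of standard output. Note that sort treats every line as data, so if the file starts with a header line, that line is sorted along with the rows and can end up away from the top.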

-t$'\t': Sets the tab character (\t) as the field delimiter. The $'\t' form is ANSI-C quoting, which Bash and similar shells expand to a literal tab.

-k4,4: Specifies that the sort key starts and ends at the 4th column (predicted_prob), so no other column takes part in the comparison.

g: General numeric sort. The values are compared as floating-point numbers, so scientific notation such as 1e-3 is handled as well.

r: Sorts in reverse order (descending).
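
If data.tsv begins with a header line (the question does not say so, so this is an assumption), one possible approach is to copy the first line through unchanged and sort only the remaining rows. The sketch below builds a small sample file with made-up values so that it can be run as is. It passes the tab via printf instead of $'\t' so that it also works in shells without ANSI-C quoting, and it sets LC_ALL=C so that "." is read as the decimal point.

```shell
# Sample input with a header row; the data values are made up for illustration.
printf 'rec_id\ttrue_label\tml_label\tpredicted_prob\tcalibrated_prob\tpositive_count\n' > data.tsv
printf 'a\t1\t1\t0.25\t0.30\t3\n' >> data.tsv
printf 'b\t0\t1\t9.1e-01\t0.88\t7\n' >> data.tsv
printf 'c\t1\t0\t0.5\t0.52\t1\n' >> data.tsv

# A literal tab character, obtained portably.
tab=$(printf '\t')

# Write the header first, then the data rows sorted by column 4, descending.
{
  head -n 1 data.tsv
  tail -n +2 data.tsv | LC_ALL=C sort -t "$tab" -k4,4gr
} > sorted_data.tsv

cat sorted_data.tsv
```

In sorted_data.tsv the header stays on the first line, and the rows follow in descending order of predicted_prob. The value 9.1e-01 is compared as 0.91 because of the g modifier.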


...