Channel: UNIX and Linux Forums

Find most and second most abundant value

I would like to convert the most frequent and second most frequent duplet in each row to 1 and -1, respectively, and everything else to 0. Please assist.

Only AA, CC, GG and TT count as duplets.


Code:

- C1 C2 C3 C4 C5
R1 AA AA - - CC
R2 AC AA AA CC CC
R3 AT AT TT TT TT
R5 AT TT AA AA AA

Desired result


Code:

- C1 C2 C3 C4 C5
R1 1 1 0 0 -1
R2 0 1 1 -1 -1
R3 0 0 1 1 1
R5 0 -1 1 1 1

My attempt

Code:

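awk 'NR == 1 { print; next }
{
  split("", x); max = max2 = ""
  # count only duplets (both letters equal), keyed by value, per row
  for (i = 2; i <= NF; i++) if (substr($i,1,1) == substr($i,2)) x[$i]++
  # rank left to right; assumes a tie goes to the duplet seen first (as in R2)
  for (i = 2; i <= NF; i++) if (($i in x) && $i != max && $i != max2) {
    if (max == "" || x[$i] > x[max]) { max2 = max; max = $i }
    else if (max2 == "" || x[$i] > x[max2]) max2 = $i
  }
  for (i = 2; i <= NF; i++) $i = ($i == max) ? 1 : ($i == max2) ? -1 : 0
  print
}' file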
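
To make the target checkable, the sample input and the desired result from this post can be written to two files, so that any candidate command can be compared against them. This is only a sketch: `file` is the input name the command above reads, while `expected` is a name assumed here. Note that in R2 both AA and CC occur twice and AA gets the 1, so ties appear to go to the duplet seen first.

```shell
# Sample input from the post, saved under the name the awk command reads.
cat > file <<'EOF'
- C1 C2 C3 C4 C5
R1 AA AA - - CC
R2 AC AA AA CC CC
R3 AT AT TT TT TT
R5 AT TT AA AA AA
EOF

# Desired result from the post; "expected" is an assumed file name.
cat > expected <<'EOF'
- C1 C2 C3 C4 C5
R1 1 1 0 0 -1
R2 0 1 1 -1 -1
R3 0 0 1 1 1
R5 0 -1 1 1 1
EOF
```

A candidate can then be checked with something like `some_command file | diff - expected`, where `some_command` is a placeholder. diff prints nothing when the output matches line for line, spacing included.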
