I would like to convert the most frequent and second most frequent duplet in each row to 1 and -1 respectively, and everything else to 0. Please assist.
A duplet is only AA, CC, GG or TT.
Code:
- C1 C2 C3 C4 C5
R1 AA AA - - CC
R2 AC AA AA CC CC
R3 AT AT TT TT TT
R5 AT TT AA AA AA
Desired result
Code:
- C1 C2 C3 C4 C5
R1 1 1 0 0 -1
R2 0 1 1 -1 -1
R3 0 0 1 1 1
R5 0 -1 1 1 1
My attempt
Code:
awk 'NR>1{ for (i=2;i<=NF;i++) { if ( substr($i,1,1)==substr($i,2) ) {x[i]++ ; for (c in x ) { if ( c > max ) c=max ; else if ( c < max ) max2=c } else $i="0"} END {print max, max2}' file
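The attempt above has unbalanced braces, tallies by column number (`x[i]`) rather than by duplet value, and `c=max` assigns in the wrong direction. One possible way to get the desired result is sketched below. It is only a sketch under some assumptions: the function names `recode_duplets`, `is_duplet` and `ahead` are made up for illustration, the first line is treated as a header and printed unchanged, and a tie in counts is broken in favour of the duplet that appears first in the row (inferred from row R2 of the desired result, where AA and CC both occur twice and AA gets 1).

```shell
recode_duplets() {
  awk '
    # A duplet is exactly one of AA, CC, GG, TT (taken from the post).
    function is_duplet(s) { return s == "AA" || s == "CC" || s == "GG" || s == "TT" }

    # True if duplet a ranks ahead of duplet b: higher count, or on a tie
    # the one seen first in the row (assumption inferred from row R2).
    function ahead(a, b) {
      return cnt[a] > cnt[b] || (cnt[a] == cnt[b] && first[a] < first[b])
    }

    NR == 1 { print; next }              # header line passes through unchanged

    {
      split("", cnt); split("", first)   # reset the per-row tallies
      for (i = 2; i <= NF; i++)
        if (is_duplet($i)) {
          if (!($i in cnt)) first[$i] = i
          cnt[$i]++
        }

      # Pick the most frequent duplet, then the best of the remaining ones.
      top = ""; second = ""
      for (d in cnt) if (top == "" || ahead(d, top)) top = d
      for (d in cnt) if (d != top && (second == "" || ahead(d, second))) second = d

      # Recode every data field; the row label in field 1 is left alone.
      for (i = 2; i <= NF; i++) {
        if ($i == top) $i = 1
        else if ($i == second) $i = -1
        else $i = 0
      }
      print
    }
  ' "$@"
}
```

Calling it as `recode_duplets file` on the input shown above should reproduce the desired result. Rows with only one kind of duplet (such as R3) simply get no -1.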