Channel: UNIX and Linux Forums

Average within reps reformat according to second file

Please help me with the following.

I have a matrix file:

Code:

S2 S1 S3 S4 S5
G1 11 12 13 14 15
G2 21 22 23 24 25
G3 31 32 33 34 35
G4 41 42 43 44 45

a data file:

Code:

Sample Loc Rep T1 T2 T3 RC1 RC2 RC3
S1 L1 1 1.5 NULL 45 R F T
S1 L1 2 2.5 2 NULL 35 F G
S1 L2 1 4 3 NULL F T R
S2 L1 1 56 45 24 F G Y
S2 L2 1 10 5 NULL G F Y
S2 L2 2 20 NULL 34 F G T
S3 L1 1 3.4 NULL 32 F T Y
S3 L2 1 4.6 3 21 D D R

and a query file:

Code:

T1
T2
T3
T4

I would like to:

1. Average the query columns over the Reps, grouped by Sample and Loc. The column order is not fixed: Sample is column 1 in this example but might be column 5 in the real data, so the columns should be located dynamically by their header names (e.g. 'Sample') in the data file. Missing values are marked NULL and should be left out of the average, so a group whose reps are all NULL stays NULL.
If an entry in the query file, such as T4, is not present in the data, that column name can simply be ignored.


2. Write the matrix and the data to separate output files so that both contain only the samples common to the two files, arranged in the same sequence.
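Step 1 can be sketched in POSIX awk roughly as follows. This is a minimal sketch under some assumptions: the files are whitespace-separated, the first line of the data file is the header, and the awk program is wrapped in a shell function whose name, `avg_reps`, is hypothetical. Columns are looked up by header name, NULL values are skipped, and query names that are missing from the header (T4) simply drop out. It prints one line per Sample, Loc and query column.

```shell
# avg_reps QUERY DATA  (hypothetical helper name)
# Averages the query columns over the reps of each Sample/Loc group.
# Columns are found by header name, so their position does not matter.
avg_reps() {
    awk '
    # First file: query column names; names absent from the data (T4) drop out below
    FNR == NR { want[$1]; next }

    # Data header: map every column name to its field number
    FNR == 1 {
        for (i = 1; i <= NF; i++) {
            col[$i] = i
            if ($i in want) q[++nq] = $i      # query columns present, in header order
        }
        next
    }

    # Data rows: accumulate sum and count per (Sample, Loc, column), skipping NULL
    {
        key = $col["Sample"] SUBSEP $col["Loc"]
        if (!(key in seen)) { seen[key]; order[++nk] = key }   # keep first-seen order
        for (j = 1; j <= nq; j++) {
            v = $col[q[j]]
            if (v == "NULL") continue
            sum[key, q[j]] += v
            cnt[key, q[j]]++
        }
    }

    # One line per Sample, Loc, column: the mean, or NULL if every rep was NULL
    END {
        for (k = 1; k <= nk; k++) {
            split(order[k], f, SUBSEP)
            for (j = 1; j <= nq; j++) {
                id = order[k] SUBSEP q[j]
                print f[1], f[2], q[j], ((id in cnt) ? sum[id] / cnt[id] : "NULL")
            }
        }
    }' "$1" "$2"
}
```

Usage: `avg_reps QUERY.TXT data.txt`. The Rep column itself is never read: averaging over all rows that share Sample and Loc is the same as averaging within the reps of that group.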

Output

Code:

  S2 S1 S3
G1 11 12 13
G2 21 22 23
G3 31 32 33
G4 41 42 43
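The filtered matrix above can be produced by a sketch along these lines. It assumes whitespace-separated files and a matrix header that lists only sample names, with no cell above the row labels G1, G2, ..., so header field i corresponds to field i+1 of each matrix row. The helper name `common_matrix` is hypothetical. Samples are kept in the matrix's own header order.

```shell
# common_matrix DATA MATRIX  (hypothetical helper name)
# Keeps only the matrix columns whose sample also occurs in the Sample
# column of DATA, in the matrix's own column order.
common_matrix() {
    awk '
    # First file (data): find the Sample column by name, collect sample names
    FNR == NR {
        if (FNR == 1) {
            for (i = 1; i <= NF; i++) if ($i == "Sample") sc = i
        } else {
            have[$sc]
        }
        next
    }

    # Matrix header has no cell above the row labels, so header field i
    # describes field i+1 of each matrix row
    FNR == 1 {
        hdr = ""
        for (i = 1; i <= NF; i++)
            if ($i in have) { keep[++nk] = i + 1; hdr = hdr " " $i }
        print hdr
        next
    }

    # Matrix rows: row label followed by the kept columns
    {
        row = $1
        for (k = 1; k <= nk; k++) row = row " " $keep[k]
        print row
    }' "$1" "$2"
}
```

Usage: `common_matrix data.txt mat.txt > out1.txt`. S4 and S5 are dropped because they never occur in the Sample column of the data file.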

Output2

Code:

  S2 S1 S3
T1_L1 56 2 3.4
T1_L2 15 4 4.6
T2_L1 45 2 NULL
T2_L2 5 3 3
T3_L1 24 45 32
T3_L2 34 NULL 21

Please note that the sample columns must be in the same order in both output files. The order itself doesn't matter as long as the two files are in sync: {S2,S1,S3} in both is fine, and {S1,S2,S3} in both is also acceptable.
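A minimal awk sketch for Output2, assuming whitespace-separated files, a data file whose first line is the header, and a matrix whose header line has no cell above the row labels. The function name `pivot_data` is hypothetical. It averages each query column per Sample and Loc (skipping NULL), then prints one row per column/Loc pair, taking the samples in matrix-header order restricted to those present in the data file.

```shell
# pivot_data QUERY DATA MATRIX  (hypothetical helper name)
# One row per <column>_<Loc>; sample columns follow the matrix header order,
# restricted to samples that occur in DATA. Means skip NULL reps.
pivot_data() {
    awk '
    FILENAME == ARGV[1] { want[$1]; next }            # query column names

    FILENAME == ARGV[2] && FNR == 1 {                 # data header: name -> field
        for (i = 1; i <= NF; i++) {
            col[$i] = i
            if ($i in want) q[++nq] = $i              # query columns present
        }
        next
    }

    FILENAME == ARGV[2] {                             # data rows
        s = $col["Sample"]; lc = $col["Loc"]
        have[s]
        if (!(lc in seenloc)) { seenloc[lc]; loc[++nl] = lc }   # Loc in first-seen order
        for (j = 1; j <= nq; j++) {
            v = $col[q[j]]
            if (v == "NULL") continue                 # missing rep
            sum[s, lc, q[j]] += v
            cnt[s, lc, q[j]]++
        }
        next
    }

    FILENAME == ARGV[3] && FNR == 1 {                 # matrix header: sample order
        for (i = 1; i <= NF; i++)
            if ($i in have) smp[++ns] = $i
    }

    END {
        hdr = ""
        for (k = 1; k <= ns; k++) hdr = hdr " " smp[k]
        print hdr
        for (j = 1; j <= nq; j++)
            for (m = 1; m <= nl; m++) {
                line = q[j] "_" loc[m]
                for (k = 1; k <= ns; k++) {
                    id = smp[k] SUBSEP loc[m] SUBSEP q[j]
                    line = line " " ((id in cnt) ? sum[id] / cnt[id] : "NULL")
                }
                print line
            }
    }' "$1" "$2" "$3"
}
```

Usage: `pivot_data QUERY.TXT data.txt mat.txt > out2.txt`. Because the sample order is read from the matrix header, it matches a matrix filtered in its own header order (S2 S1 S3 here), which keeps the two output files in sync.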

I tried this

Code:

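awk '
  # QUERY.TXT: remember the query column names (T1, T2, ...)
  FILENAME == "QUERY.TXT" { cols[$1]; next }

  # data.txt header: locate Sample, Loc, Rep and the query columns by name.
  # FNR is the line number within the current file; NR keeps counting
  # across files, so NR==1 never matches in data.txt.
  FILENAME == "data.txt" && FNR == 1 {
      for (i = 1; i <= NF; i++) {
          if ($i == "Sample") s[1] = i
          if ($i == "Loc")    s[2] = i
          if ($i == "Rep")    s[3] = i
          if ($i in cols)     q[++p] = i   # query columns present; T4 is skipped
      }
      next
  }

  # data.txt rows: print Sample, Loc, Rep and the query columns only
  FILENAME == "data.txt" {
      st = $s[1] " " $s[2] " " $s[3]
      for (j = 1; j <= p; j++)
          st = st " " $q[j]
      print st
  }
' QUERY.TXT data.txt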

