Channel: UNIX and Linux Forums

Find key pattern and print selected lines for each record

Hi,

I need help with a complicated file I am working on. I want to extract the important information from a very large, space-delimited file that contains hundreds of thousands of records. Here is an example of the input file:

Code:

##
ID    Ser402            Old;        23 mins .
ACC  P669GM;
DAT  MAY-2014, the old episode.
TOS  Japanes Anime. one piece
TMA  Pirates; animation; cartoon.
POT  DownloadID=5445;
HEW  StreamID=792; watchop (eu).
HEW  AnotherOnlineID=823; narutowire (same).
COM  -@- Simple Comment: Ace died and Luffy is miserable.
COM      None of his nakama was with him {SOV:000250}.
COM  -@- Full Comment: Host channel {SOV:000305}; Multi-chanel
COM      streaming {SOV:000305}.
COM  -@- Another Comment: Belongs to the same server.
COM      {SOV:000305}.
COM  -----------------------------------------------------------------------
COM  Can be watched online, see http://www.watchop.eu
DOR  Data; packet; -; Unknown; Anime.
DOR  TDP; TDP:0034; PPQ:host for sub channel; ASA:Subchannel.
DOR  TDP; TDP:0021; PPQ:internal channel; ASA:Unknown.
PPE  Torrent unapplicable;
KAW  Complete episode; Early release; Host channel;
KAW  Repeat; subchannel; subchannel host.
FEA  link          1    20        unavailable
FEA                                /F3184.
FEA  TOP_CHAN      1      1      unavailable (will be determined).
FEA  SUBCHAN      2      18      at 9 (confirmed!).
FEA  TOP_CHAN      19    117    unavailable (No info).
FEA  SUBCHAN      118    138    at 10 (confirmed!).
FEA  TOP_CHAN      139    145    unavailable (will be determined).
FEA  SUBCHAN      146    166    at 12 (confirmed!).
FEA  TOP_CHAN      167    269    unavailable (the source is unknown).
FEA  REP          1      146    A.
FEA  CAD          75    75      by host.
FEA                                {undetermined}.
SYN  synopsis for this episode is unavailable.
##
ID    MOV10              NewMov;        90 mins.
ACC  PPDFB1;
TOS  Japanes Anime. Naruto shippuden
TMA  Ninja; shinobi, konoha; hokage; Pain.
CC    Distributed under the Creative License
CC  -----------------------------------------------------------------------
DOR  Data; packet; -; Unknown; Anime movie.
DOR  movie; new movie; 90 mins only
DOR  MOVID; 299; -.
DOR  MOV3D; -; 1.
PPE  10; torrent
KAW  new movie; Complete movie.
FEA  Null        1    683        Unknown
FEA                                /F82.
FEA  mov      62    124      (SOV:005).
FEA  mov      155    259      (SOV:005).
FEA  mov      346    376      (SOV:025).
SYN  In this episode, Dresrossa has been surrounded by a cage known as birdcage by doflamingo.
      Luffy is moving towards the palace to defeat Doflamingo.
##

All records in this file are separated by “##”. I need output that shows only selected information from the records whose KAW lines match the pattern “subchannel” or “subchannel host”. In the example input, only the first record matches, so the output should look like this:

Code:

##
ID      Ser402
ACC          P669GM
TOS    Japanes Anime. one piece
TMA    Pirates; animation; cartoon.
COM    -@- Full Comment: Host channel {SOV:000305}; Multi-chanel
COM      streaming {SOV:000305}.

DOR    TDP; TDP:0034; PPQ:host for sub channel; ASA:Subchannel.
DOR    TDP; TDP:0021; PPQ:internal channel; ASA:Unknown.

KAW    Complete episode; Early release; Host channel;
KAW    Repeat; subchannel; subchannel host.
FEA      link          1    20        unavailable
FEA                                /F3184.
FEA      TOP_CHAN    1      1      unavailable (will be determined).
FEA      SUBCHAN      2      18      at 9 (confirmed!).
FEA      TOP_CHAN    19    117    unavailable (No info).
FEA      SUBCHAN      118    138    at 10 (confirmed!).
FEA      TOP_CHAN      139    145    unavailable (will be determined).
FEA      SUBCHAN        146    166    at 12 (confirmed!).
FEA      TOP_CHAN      167    269    unavailable (the source is unknown).
FEA      REP                    1      146    A.
FEA      CAD                  75    75      by host.
FEA                                                  {undetermined}.
TT        3
##

As shown above, of the lines starting with COM I only want the one containing “-@- Full Comment” plus the COM line that follows it, if any. Of the DOR lines, I only want those followed by TDP. Finally, a new line named “TT” should be added at the end of the record, and its value should be the total number of occurrences of the pattern “FEA SUBCHAN”.
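A minimal sketch of these rules as a single awk pass, wrapped in a shell function. The function name extract_subchannel_info is hypothetical, and the sketch assumes: each record starts with a line whose first field is "##"; a record is selected when any KAW line contains "subchannel" (which also covers "subchannel host"); a COM continuation line is any COM line whose second field does not start with "-"; and ID and ACC are reduced to their first value, as in the sample output. Spacing of ID, ACC and TT is normalized with printf, and the blank separator lines of the sample output are not reproduced.

```shell
# Hypothetical helper: print the selected lines of every record whose KAW
# lines mention "subchannel", followed by a TT line counting "FEA SUBCHAN".
extract_subchannel_info() {
    awk '
    # Emit the buffered lines of the finished record if it matched,
    # append the TT count, then reset the per-record state.
    function flush() {
        if (hit) {
            printf "##\n%s", out
            printf "%-6s%d\n", "TT", subchan
            printed = 1
        }
        out = ""; hit = 0; subchan = 0; incom = 0
    }
    $1 == "##" { flush(); next }
    # Any line that is not COM ends a Full Comment block.
    $1 != "COM" { incom = 0 }
    # ID and ACC: keep only the first value (ACC without its semicolon).
    $1 == "ID" { out = out sprintf("%-6s%s\n", "ID", $2); next }
    $1 == "ACC" {
        acc = $2; sub(/;$/, "", acc)
        out = out sprintf("%-6s%s\n", "ACC", acc); next
    }
    $1 == "TOS" || $1 == "TMA" { out = out $0 "\n"; next }
    # COM: start at "-@- Full Comment", keep its continuation lines, and
    # stop at the next "-@-" comment or dashed separator (assumption).
    $1 == "COM" {
        if (index($0, "-@- Full Comment")) incom = 1
        else if ($2 ~ /^-/) incom = 0
        if (incom) out = out $0 "\n"
        next
    }
    # DOR: keep only the lines whose second field is "TDP;".
    $1 == "DOR" && $2 == "TDP;" { out = out $0 "\n"; next }
    # KAW decides whether the record is selected; all KAW lines are kept.
    $1 == "KAW" { if (/subchannel/) hit = 1; out = out $0 "\n"; next }
    # FEA: keep every line and count the SUBCHAN entries for TT.
    $1 == "FEA" { if ($2 == "SUBCHAN") subchan++; out = out $0 "\n"; next }
    END { flush(); if (printed) print "##" }
    ' "$@"
}
```

Usage: extract_subchannel_info inputfile > output. Only the filtered lines of the current record are buffered, so memory use stays bounded by one record even with hundreds of thousands of records.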

I have no idea how to print only the selected lines. I used the code below to find the key pattern, but it prints every line of each matching record, whereas I need only the selected lines shown in the sample output above.

Code:

awk '/##/{if(l)print s;l=0;s=$0;next}/subchannel/{l=1}{s=s RS $0}END{if(l)print s}' inputfile

I would appreciate your kind help. Thanks.
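The one-liner above appends every line to the buffer (s=s RS $0), so a matching record is always printed in full. It also tests /subchannel/ against every line and /##/ anywhere in a line, not only against KAW lines and the separator. A minimal sketch of the same buffering approach that lets only KAW lines decide the match, assuming records are delimited by lines whose first field is "##" (the function name select_subchannel_records is hypothetical):

```shell
# Hypothetical helper: print whole records whose KAW lines mention "subchannel".
select_subchannel_records() {
    awk '
    # Print the buffered record (with its "##" header) if it matched.
    function flush() {
        if (hit) { printf "##\n%s", buf; printed = 1 }
        buf = ""; hit = 0
    }
    $1 == "##" { flush(); next }
    # Only KAW lines decide whether a record is selected.
    $1 == "KAW" && /subchannel/ { hit = 1 }
    { buf = buf $0 "\n" }
    # Flush the last record and close the output with "##".
    END { flush(); if (printed) print "##" }
    ' "$@"
}
```

To print only chosen lines instead of whole records, the unconditional buffering rule would be replaced by per-prefix rules that append just the wanted lines.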
