Hi,
I need help with a complicated file that I am working on. I want to extract the important information from a very large, space-delimited file that holds hundreds of thousands of records. An example of the input file's content is shown below:
All records in this file are separated by ##. What I need is output that shows only the needed information for records whose KAW line matches the pattern subchannel or subchannel host. In the example input, only the first record has these patterns, so the output should look like the sample output below:
In that output, of the lines starting with COM, I want only the one containing -@- Full Comment and any COM line that continues it (bold in blue color). I also need to print only the DOR lines that are followed by TDP (bold in red color). Finally, a new last line named TT should be added, whose value is the total number of occurrences of the pattern FEA SUBCHAN in the record.
I have no idea how to print only selected lines. I used the code below to find the key pattern, but it prints every line of each matched record. I need only the selected lines, as in the sample output.
I would appreciate your kind help. Thanks.
Code:
##
ID Ser402 Old; 23 mins .
ACC P669GM;
DAT MAY-2014, the old episode.
TOS Japanes Anime. one piece
TMA Pirates; animation; cartoon.
POT DownloadID=5445;
HEW StreamID=792; watchop (eu).
HEW AnotherOnlineID=823; narutowire (same).
COM -@- Simple Comment: Ace died and Luffy is miserable.
COM None of his nakama was with him {SOV:000250}.
COM -@- Full Comment: Host channel {SOV:000305}; Multi-chanel
COM streaming {SOV:000305}.
COM -@- Another Comment: Belongs to the same server.
COM {SOV:000305}.
COM -----------------------------------------------------------------------
COM Can be watched online, see http://www.watchop.eu
DOR Data; packet; -; Unknown; Anime.
DOR TDP; TDP:0034; PPQ:host for sub channel; ASA:Subchannel.
DOR TDP; TDP:0021; PPQ:internal channel; ASA:Unknown.
PPE Torrent unapplicable;
KAW Complete episode; Early release; Host channel;
KAW Repeat; subchannel; subchannel host.
FEA link 1 20 unavailable
FEA /F3184.
FEA TOP_CHAN 1 1 unavailable (will be determined).
FEA SUBCHAN 2 18 at 9 (confirmed!).
FEA TOP_CHAN 19 117 unavailable (No info).
FEA SUBCHAN 118 138 at 10 (confirmed!).
FEA TOP_CHAN 139 145 unavailable (will be determined).
FEA SUBCHAN 146 166 at 12 (confirmed!).
FEA TOP_CHAN 167 269 unavailable (the source is unknown).
FEA REP 1 146 A.
FEA CAD 75 75 by host.
FEA {undetermined}.
SYN synopsis for this episode is unavailable.
##
ID MOV10 NewMov; 90 mins.
ACC PPDFB1;
TOS Japanes Anime. Naruto shippuden
TMA Ninja; shinobi, konoha; hokage; Pain.
CC Distributed under the Creative License
CC -----------------------------------------------------------------------
DOR Data; packet; -; Unknown; Anime movie.
DOR movie; new movie; 90 mins only
DOR MOVID; 299; -.
DOR MOV3D; -; 1.
PPE 10; torrent
KAW new movie; Complete movie.
FEA Null 1 683 Unknown
FEA /F82.
FEA mov 62 124 (SOV:005).
FEA mov 155 259 (SOV:005).
FEA mov 346 376 (SOV:025).
SYN In this episode, Dresrossa has been surrounded by a cage known as birdcage by doflamingo.
Luffy is moving towards the palace to defeat Doflamingo.
##
Code:
##
ID Ser402
ACC P669GM
TOS Japanes Anime. one piece
TMA Pirates; animation; cartoon.
COM -@- Full Comment: Host channel {SOV:000305}; Multi-chanel
COM streaming {SOV:000305}.
DOR TDP; TDP:0034; PPQ:host for sub channel; ASA:Subchannel.
DOR TDP; TDP:0021; PPQ:internal channel; ASA:Unknown.
KAW Complete episode; Early release; Host channel;
KAW Repeat; subchannel; subchannel host.
FEA link 1 20 unavailable
FEA /F3184.
FEA TOP_CHAN 1 1 unavailable (will be determined).
FEA SUBCHAN 2 18 at 9 (confirmed!).
FEA TOP_CHAN 19 117 unavailable (No info).
FEA SUBCHAN 118 138 at 10 (confirmed!).
FEA TOP_CHAN 139 145 unavailable (will be determined).
FEA SUBCHAN 146 166 at 12 (confirmed!).
FEA TOP_CHAN 167 269 unavailable (the source is unknown).
FEA REP 1 146 A.
FEA CAD 75 75 by host.
FEA {undetermined}.
TT 3
##
Code:
awk '/##/{if(l)print s;l=0;s=$0;next}/subchannel/{l=1}{s=s RS $0}END{if(l)print s}' inputfile
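One possible way to extend that attempt is to buffer only the wanted lines of each record and print the buffer when the next ## arrives. The following is a minimal sketch, not a tested solution for the real file. The function name extract_subchannel is made up for illustration. It assumes the tag (ID, COM, KAW, ...) is always the first field, that ID and ACC are trimmed the way the sample output shows, that TOS and TMA are kept because they appear in the sample output, and that a Full Comment continues until the next COM line starting with -@-.

```shell
# Hypothetical helper: print selected lines of every record whose KAW
# line contains "subchannel", followed by a TT line counting FEA SUBCHAN.
extract_subchannel() {
  awk '
    # Print the buffered record if it matched, then reset the state.
    function emit() {
      if (keep) {
        printf "%s", "##\n" buf "TT " n "\n"
        printed = 1
      }
      buf = ""; keep = 0; n = 0; full = 0
    }
    /^##/       { emit(); next }                        # record separator
    $1 == "ID"  { buf = buf $1 " " $2 "\n"; next }      # keep the ID only
    $1 == "ACC" { a = $2; sub(/;$/, "", a)              # drop trailing ";"
                  buf = buf $1 " " a "\n"; next }
    $1 == "TOS" || $1 == "TMA" { buf = buf $0 "\n"; next }
    $1 == "COM" {
      # A "-@-" line starts a new comment; only "Full Comment" is wanted.
      if ($2 == "-@-") full = ($3 == "Full" && $4 == "Comment:")
      if (full) buf = buf $0 "\n"
      next
    }
    $1 == "DOR" && $2 == "TDP;" { buf = buf $0 "\n"; next }
    $1 == "KAW" {
      if ($0 ~ /subchannel/) keep = 1                   # also covers "subchannel host"
      buf = buf $0 "\n"; next
    }
    $1 == "FEA" {
      if ($2 == "SUBCHAN") n++                          # counted per record
      buf = buf $0 "\n"; next
    }
    END { emit(); if (printed) print "##" }             # closing separator
  ' "$1"
}
```

It would be called as extract_subchannel inputfile. Lines with any other tag (DAT, POT, HEW, PPE, SYN, CC) are skipped because no rule matches them, and the match is case-sensitive, so "Host channel" or "ASA:Subchannel" alone does not select a record.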