Editing a GTF file

I need to extract only the ENSEMBL non-chromosomal pseudogenes from a given GTF file, add an additional attribute field "filtered" with the value "manually" to each of the annotated pseudogenes, and save the result as a new file. In other words, I have to keep the lines that contain "ENSEMBL" and "pseudogene" and do not contain "chr", save them to a new file, and append the additional property (filtered: manually) to the last column. Could you tell me how I can do this, preferably using awk or sed?

##description: evidence-based annotation of the human genome (GRCh38), version 29 (Ensembl 94)
##provider: GENCODE
##contact: gencode-help@ebi.ac.uk
##format: gtf
##date: 2018-08-30
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; level 2; transcript_support_level "1"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; transcript_support_level "1"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; transcript_support_level "1"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 13221 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; exon_number 3; exon_id "ENSE00002312635.1"; level 2; transcript_support_level "1"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA transcript 12010 13670 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; level 2; transcript_support_level "NA"; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000002844.2";
chr1 HAVANA exon 12010 12057 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; exon_number 1; exon_id "ENSE00001948541.1"; level 2; transcript_support_level "NA"; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000002844.2";
chr1 HAVANA exon 12179 12227 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; exon_number 2; exon_id "ENSE00001671638.2"; level 2; transcript_support_level "NA"; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000002844.2";
chr1 HAVANA exon 12613 12697 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "transcribed_unp

  • What have you already tried?
    – Didier Trosset
    Nov 19 '18 at 13:46

  • What is the expected output?
    – zx8754
    Nov 19 '18 at 14:30

  • Which lines in the example describe an ENSEMBL non-chromosomal pseudogene, and why (what are the related strings)?
    – Jay jargot
    Nov 19 '18 at 14:35

  • These are the lines that match the patterns: ENSEMBL exon 169224 169502 . - . gene_id "ENSG00000284215.2"; transcript_id "ENST00000639764.2"; gene_type "pseudogene"; gene_name "AC245056.4"; transcript_type "pseudogene"; transcript_name "AC245056.4-201"; exon_number 2; exon_id "ENSE00003804365.1"; level 3; tag "basic"; Filtered: manually;
    – Sergei
    Nov 19 '18 at 15:05

  • Actually, I have managed to do this, but maybe there is a better solution using only awk?
    – Sergei
    Nov 19 '18 at 15:05

regex bash awk sed bioinformatics

edited Nov 19 '18 at 14:29
zx8754

asked Nov 19 '18 at 13:25
Sergei

1 Answer

If you are using Awk anyway, you don't need grep at all.

Also, less crucially, modifying $0 is mildly wasteful. print lets you specify precisely what you want to print.

awk '!/##/ && !/chr/ && /pseudogene/ && /ENSEMBL/ {
    print $0 " Filtered: manually;"
}' gencode.v29.chr_patch_hapl_scaff.basic.annotation.gtf > gencode.v29.filtered.gtf
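
The whole-line patterns above match "chr", "ENSEMBL", or "pseudogene" anywhere in a record, including inside attribute values. As a possible refinement, here is a minimal sketch of a field-aware variant. It assumes the usual GTF column layout (sequence name in column 1, source in column 2) and that the new attribute should follow GTF's key "value"; form instead of the "Filtered: manually;" text. The function name filter_pseudogenes and the sample records are hypothetical and only illustrate the idea.

```shell
#!/bin/sh
# Sketch: keep ENSEMBL pseudogene records that are not on a chr* sequence
# and append a GTF-style attribute. Column positions are an assumption
# based on the standard GTF layout ($1 = sequence name, $2 = source).
filter_pseudogenes() {
    awk '
        /^#/ { next }    # skip header/comment lines
        $1 !~ /^chr/ && $2 == "ENSEMBL" && /gene_type "[^"]*pseudogene"/ {
            print $0 " filtered \"manually\";"
        }
    ' "$@"
}

# Made-up sample records (not real GENCODE data), piped in on stdin.
printf '%s\n' \
    '##format: gtf' \
    'chr1 HAVANA gene 11869 14409 . + . gene_id "G1"; gene_type "transcribed_unprocessed_pseudogene";' \
    'scaffoldA ENSEMBL gene 100 200 . - . gene_id "G2"; gene_type "pseudogene";' \
    'scaffoldA ENSEMBL gene 300 400 . + . gene_id "G3"; gene_type "protein_coding";' |
    filter_pseudogenes
```

Only the "G2" record passes all three tests and is printed with the extra attribute. On a real file the function takes the file name as an argument, for example filter_pseudogenes gencode.v29.chr_patch_hapl_scaff.basic.annotation.gtf > gencode.v29.filtered.gtf. Awk's default field splitting handles both tab-separated and space-separated input for the first two columns.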





answered Nov 19 '18 at 15:36
tripleee