Asked 1 month ago by VoidRover634
Extract file names (third field) from each tab-delimited column header using Bash or AWK
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
Asked 1 month ago by VoidRover634
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I have a file with a header row where each column contains a file path. My goal is to extract and print just the file name (the third delimited value) for each column. There are over 100 columns.
E.g. Input header row:
AAF2Y7VM5-8/cnv/F04_reads.tsv AAF2Y7VM5-7/cnv/D04_reads.tsv AAF2Y7VM5-6/cnv/E04_reads.tsv
Desired output header row:
F04_reads.tsv D04_reads.tsv E04_reads.tsv
I tried the following command:
BASHawk -F '[/|\t]' '{if (NR==1) {for(i=1;i<=NF;i++) printf $i"\t"}}' ZScores.txt
This produces all delimited values for every column, but I only need the third value (the file name) from each entry. I would appreciate any solutions in awk, bash, or sed that focus on extracting that specific field.
Comments 0
•Answers 8
•Views 432
Answer by TychoAI Agent • 1 month ago
Below is one way to do this using awk. This command processes only the header row (NR==1), splits each column on “/” to extract the file name (the last element), and prints them out as tab‐separated values:
BASHawk 'NR==1 { for(i = 1; i <= NF; i++) { n = split($i, parts, "/") printf "%s%s", parts[n], (i < NF ? "\t" : "\n") } }' ZScores.txt
parts
using "/" as the delimiter. The function returns the number of pieces in n
.parts[n]
, which is the file name) followed by a tab (or newline for the last column).This should give you an output header row like:
F04_reads.tsv D04_reads.tsv E04_reads.tsv
Tycho is an AI agent, that grounds responses in various sources like documentation, code bases, live discussions, and relevant posts. Want to chat privately with Tycho?
No comments yet.
Answer by CometCommander391 • 1 month ago
I would exploit GNU AWK for this task following way. Let file.txt
content be TAB-sheared file with following content:
AAF2Y7VM5-8/cnv/F04_reads.tsv AAF2Y7VM5-7/cnv/D04_reads.tsv AAF2Y7VM5-6/cnv/E04_reads.tsv
something something something
something something something
Then
AWKawk 'BEGIN{FS="/";RS="[\t\n]";ORS="\t"}{print $3}RT=="\n"{exit}' file.txt
gives output
F04_reads.tsv D04_reads.tsv E04_reads.tsv
Explanation: I inform GNU AWK that record are separated by TAB or newline character and fields are separated by /
and print
value should be suffixed with \t
, rather than newline. I instruct GNU AWK to print
3rd field and if row terminator (RT
) is newline I instruct GNU AWK to stop (exit
). Output will have trailing TAB and not newline, which is consistent with your original code.
(tested in GNU Awk 5.3.1)
No comments yet.
Answer by CometTracker638 • 1 month ago
a non-awk solution
BASH$ sed 1q file | tr -s ' ' \n | cut -d/ -f3 | paste -sd' '
extract first row, transpose to column, cut the 3rd field, serialize back to a row
No comments yet.
Answer by QuasarCommander868 • 1 month ago
To just extract first line:
Bash (replace tabs):
BASH( IFS=$'\t' read -ra cols <file; echo "${cols[@]##*/}" )
Bash (retain tabs):
BASH( shopt -s extglob IFS= read -r cols echo "${cols//+([!$'\t'])\/}" ) <file
Sed (replace tabs):
SEDsed -E 's|[^ ]+/||g; y|\t| |; q' file
Sed (retain tabs):
SEDsed -E 's|[^ ]+/||g; q' file
If the intention is to also retain the whole file as tsv:
Bash: append cat
after echo
in the "retain tabs" version:
BASH( shopt -s extglob IFS= read -r cols echo "${cols//+([!$'\t'])\/}" cat ) <file
Sed: prefix s
command with 1
and elide the q
from "retain tabs" version:
SEDsed -E '1s|[^ ]+/||g' file
No comments yet.
Answer by AsteroidDiscoverer380 • 1 month ago
KISS:
BASH$ echo $(head -n1 file | tr ' ' '\n' | cut -d/ -f3) F04_reads.tsv D04_reads.tsv E04_reads.tsv
or
BASH$ echo $(head -n1 file | tr ' ' '\n' | awk -F/ 'NF{printf "%s " ,$3}') F04_reads.tsv D04_reads.tsv E04_reads.tsv
No comments yet.
Answer by CosmicPioneer164 • 1 month ago
Tweaking OP's current code to print every 3rd field:
BASH$ awk -F '[/|\t]' '{if (NR==1) {for(i=3;i<=NF;i+=3) printf $i"\t"}}' ZScores.txt F04_reads.tsv D04_reads.tsv E04_reads.tsv
NOTE: there's a trailing \t
on that output; also, the line does not end with a \n
Removing the trailing \t
, adding a trailing \n
, and skipping processing of rest of file:
BASH$ awk -F '[/|\t]' 'NR==1 { for (i=3;i<=NF;i+=3) { printf "%s%s", sep, $i; sep="\t" }; print ""; exit }' ZScores.txt F04_reads.tsv D04_reads.tsv E04_reads.tsv
Where:
sep
is blank for first pass through loop, then set to \t
for remaining passes through the loopprint ""
- terminate the printf
line of output with a \n
(default output record separator)exit
- to keep from reading (and in this case ignoring) rest of fileNOTE: OP's code places a tab (\t
) between output values but the expected output shows a single space between values; if OP wishes to separate the output with single spaces then replace sep="\t"
with sep=" "
No comments yet.
Answer by EclipseAstronaut558 • 1 month ago
Using any awk if your fields are tab-separated as they appear to be:
BASH$ awk 'NR==1{gsub("[^ ]+/","")} 1' file F04_reads.tsv D04_reads.tsv E04_reads.tsv
Otherwise, using any POSIX awk:
BASH$ awk 'NR==1{gsub("[^[:space:]]+/,"")} 1' file F04_reads.tsv D04_reads.tsv E04_reads.tsv
Change [^[:space:]]
to [^ \t]
if you don't have a POSIX awk but - get a new awk.
The above assumes your fields cannot contain the space characters that separate your fields. If they can then you need to edit your question to tell us how to identify spaces within fields from spaces between fields.
No comments yet.
Answer by MartianSurveyor810 • 1 month ago
1st solution: With your shown samples please try following.
AWK{ while(match($0,/(\/[^\/]*\/)([^.]*\.tsv)/,arr)){ val=(val?val OFS:"") arr[2] $0=substr($0,RSTART+RLENGTH) } $0=val } 1 ' Input_file
2nd solution: if ok with perl onliner solution
PERL-nle 'print join(" ", /([^\/]+_reads\.tsv)/g)' Input_file
No comments yet.
No comments yet.