给定一个文件 file.txt,转置它的内容。

你可以假设每行列数相同,并且每个字段由 ' ' 分隔。

示例:

假设file.txt有如下内容:

name age
alice 21
ryan 30

你的脚本应该显示第10行:

name alice ryan
age 21 30

有请Linux三剑客(awk,sed,grep)之一:

awk命令:

awk是一个报告生成器,它拥有强大的文本格式化的能力。它是由Alfred Aho 、Peter Weinberger 和 Brian Kernighan这三个人创造的,awk由这个三个人的姓氏的首个字母组成。

NAME
       gawk - pattern scanning and processing language

SYNOPSIS
       gawk  [ POSIX or GNU style options ] -f program-file [ -- ]
       file ...
       gawk [ POSIX or GNU style options ]  [  --  ]  program-text
       file ...

DESCRIPTION

   Built-in Variables
       Gawk's built-in variables are:

       ARGC        The  number of command line arguments (does not
                   include  options  to  gawk,  or   the   program
                   source).

       ARGIND      The  index  in  ARGV  of the current file being
                   processed.
       ARGV        Array of command line arguments.  The array  is
                   indexed from 0 to ARGC - 1.  Dynamically chang‐
                   ing the contents of ARGV can control the  files
                   used for data.

       BINMODE     On non-POSIX systems, specifies use of “binary”
                   mode for all file I/O.  Numeric values of 1, 2,
                   or  3,  specify that input files, output files,
                   or all files, respectively, should  use  binary
                   I/O.  String values of "r", or "w" specify that
                   input files,  or  output  files,  respectively,
                   should  use  binary I/O.  String values of "rw"
                   or "wr"  specify  that  all  files  should  use
                   binary  I/O.  Any other string value is treated
                   as "rw", but generates a warning message.

       CONVFMT     The conversion format for numbers,  "%.6g",  by
                   default.

       ENVIRON     An  array  containing the values of the current
                   environment.  The array is indexed by the envi‐
                   ronment variables, each element being the value
                   of that variable (e.g.,  ENVIRON["HOME"]  might
                   be  "/home/arnold").   Changing this array does
                   not affect the  environment  seen  by  programs
                   which  gawk  spawns via redirection or the sys‐
                   tem() function.

       ERRNO       If a system error occurs either doing  a  redi‐
                   rection for getline, during a read for getline,
                   or during a close(), then ERRNO will contain  a
                   string describing the error.  The value is sub‐
                   ject to translation in non-English locales.

       FIELDWIDTHS A whitespace separated list  of  field  widths.
                   When  set, gawk parses the input into fields of
                   fixed width, instead of using the value of  the
                   FS   variable  as  the  field  separator.   See
                   Fields, above.

       FILENAME    The name of the  current  input  file.   If  no
                   files  are  specified  on the command line, the
                   value of FILENAME is “-”.  However, FILENAME is
                   undefined  inside the BEGIN rule (unless set by
                   getline).

       FNR         The input record number in  the  current  input
                   file.
       FPAT        A regular expression describing the contents of
                   the fields in a record.  When set, gawk  parses
                   the  input  into fields, where the fields match
                   the regular expression, instead  of  using  the
                   value  of  the FS variable as the field separa‐
                   tor.  See Fields, above.

       FS          The input field separator, a space by  default.
                   See Fields, above.

       FUNCTAB     An array whose indices and corresponding values
                   are the names of all the user-defined or exten‐
                   sion  functions  in the program.  NOTE: You may
                   not use the delete statement with  the  FUNCTAB
                   array.

       IGNORECASE  Controls  the  case-sensitivity  of all regular
                   expression and string operations.   If  IGNORE‐
                   CASE  has a non-zero value, then string compar‐
                   isons and  pattern  matching  in  rules,  field
                   splitting  with  FS and FPAT, record separating
                   with RS, regular expression matching with ~ and
                   !~, and the gensub(), gsub(), index(), match(),
                   patsplit(), split(), and sub()  built-in  func‐
                   tions   all  ignore  case  when  doing  regular
                   expression operations.  NOTE: Array  subscript‐
                   ing  is not affected.  However, the asort() and
                   asorti() functions are affected.
                   Thus, if IGNORECASE is not equal to zero,  /aB/
                   matches  all  of  the strings "ab", "aB", "Ab",
                   and "AB".  As with all AWK variables, the  ini‐
                   tial  value of IGNORECASE is zero, so all regu‐
                   lar expression and string operations  are  nor‐
                   mally case-sensitive.

       LINT        Provides  dynamic  control of the --lint option
                   from within an AWK program.   When  true,  gawk
                   prints  lint warnings. When false, it does not.
                   When assigned the string  value  "fatal",  lint
                   warnings  become  fatal  errors,  exactly  like
                   --lint=fatal.  Any other true value just prints
                   warnings.

       NF          The  number  of  fields  in  the  current input
                   record.

       NR          The total number of input records seen so far.

       OFMT        The  output  format  for  numbers,  "%.6g",  by
                   default.

       OFS         The output field separator, a space by default.

       ORS         The  output record separator, by default a new‐
                   line.

       PREC        The working precision  of  arbitrary  precision
                   floating-point numbers, 53 by default.

awk的基本语法是awk [options] 'Pattern{Action}' file

从一个最简单的命令作为start,省略[options]Pattern,将Action设置成最简单的print

$ echo abc > test.txt
$ awk '{print}' test.txt
abc

这就是最最简单的awk用法:使用awk命令对文本test.txt(的每一行)进行处理,处理的动作是print


本题解答

awk '{
    for (i=1;i<=NF;i++){
        if (NR==1){  # 天坑!之前这里写成了NF,怎么输出每行开头都多1个空格
            res[i]=$i
        }
        else{
            res[i]=res[i]" "$i
        }
    }
}END{
    for(j=1;j<=NF;j++){
        print res[j]
    }
}' file.txt

解析

awk是一行一行地处理文本文件,运行流程是:

  1. 先运行BEGIN后的{Action},相当于表头
  2. 再运行{Action}中的文件处理主体命令
  3. 最后运行END后的{Action}中的命令

有几个经常用到的awk常量:NF是当前行的field字段数;NR是正在处理的当前行数。

注意到是转置,假如原始文本有mn列(字段),那么转置后的文本应该有nm列,即原始文本的每个字段都对应新文本的一行。我们可以用数组res来储存新文本,将新文本的每一行存为数组res的一个元素。

END之前我们遍历file.txt的每一行,并做一个判断:在第一行时,每碰到一个字段就将其按顺序放在res数组中;从第二行开始起,每碰到一个字段就将其追加到对应元素的末尾(中间添加一个空格)。

文本处理完了,最后需要输出。在END后遍历数组,输出每一行。注意printf不会自动换行,而print会自动换行。