13. File slimming: stripping comments from Linux shell scripts


【Motivation】

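A common reason for stripping comments from a script is to reduce its size. Recently I also found that some complex scripts fail to compile into an executable with the shc tool while the comments are present, but compile cleanly once the comments are removed. How can the slimming logic be factored out so that it works on any Linux shell script?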

【Intent】

Put the comment-stripping logic in an awk script that a Linux shell script can call whenever needed.
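
As a minimal sketch of that idea (not the script developed below), the following shell function embeds a tiny awk program. The name `strip_line_comments` is made up for this illustration. It only keeps a first-line shebang and drops blank lines and whole-line comments; it knows nothing about heredocs, keep flags, or trailing comments.

```shell
#!/bin/bash
# Hypothetical helper for illustration only; the name is not from the original post.
# Reads a script from stdin (or from the files given as arguments)
# and writes the slimmed text to stdout.
strip_line_comments() {
    awk '
        NR == 1 && /^#!/ { print; next }   # keep the shebang line
        /^[ \t]*$/       { next }          # drop blank lines
        /^[ \t]*#/       { next }          # drop whole-line comments
                         { print }         # everything else is code
    ' "$@"
}
```

A call such as `strip_line_comments < some_script.sh` prints the reduced script. A heredoc body containing lines that start with # would be damaged by this naive version, which is the problem the full script has to solve.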

【Implementation】

Create a file named slim.awk with the following script:


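#
    # I. Purpose: slim down a Linux shell script (strip its comments), which covers
        #   1. Deleting comment blocks.
                # Comment block: once the keyword (flag) is determined, skip every line from the current line
                # up to the line that contains only the keyword, including both boundary lines
                # Format of the opening line:    
                    #   :(colon optional) <<  flag this is tail...
                # Properties: 
                    #   1. Around the colon, and between << and the flag, there may be 0-n spaces. 
                    #   2. If the flag on the opening line needs spaces at its start, end, or middle, it can be
                    #       wrapped in 0-3 pairs of single (or double) quotes. For example:
                    #       1) '    Name  '
                    #       2) '''     Name          '''
                    #       The closing flag must then contain the same spaces as the opening line; more on this below.
                    #   3. When the : is absent, nothing may follow the flag, not even a space; otherwise the script itself fails
                    #   So searching from the first character is faster
                # Note:
                    #   Comment blocks come in two more forms, but in my view they are essentially source code
                    #   and hard to tell apart, so stripping them is not recommended: 
                    # 1. The null command with an argument:
                    #     : "   the tail is automatically a comment, no # needed
                    # Opening line: at least one space between the colon and the (single/double) quote
                    # first 
                    #     second
                    #     "
                    # 2. Run when the condition holds, or the condition holds and nothing runs
                    # [    '
                    #     hello
                    #     world
                    # ' ] && echo "hey"
                    # [ 'wol' ]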

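        #   2. When cat is used to write a file, every line inside it (blank lines, lines starting with #, etc.) must be kept
                # As far as I know, cat statements come in two kinds, plain printing and file writing; comments inside the latter must not be deleted, because they are part of the input content
                # Expanded onto one line, the format may look like this:
                # 1) cat abc.txt        (output as an ordinary code line)
                    # For the extreme case, a command split across lines is handled as well:
                    # cat\
                    #   abc.txt
                # 2) cat   >> mnk.tso   <<- AMD    
                    #     one
                    #     two
                    #     # this comment is important
                    #     woo
                    #     foo 
                    # AMD      
                    # The AMD marker has the same properties as a comment-block flag, i.e. it may contain spaces when needed; more on this below.
                    # Likewise, an opening line split across lines is already handled:
                    # cat\
                    #    >>\
                    #    mnk.tso\
                    #    <<-\
                    #    AMD
                    # This also has to be taken into account.
                    #
                #  3) Other cat forms are not used yet; to be added when needed......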
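        #   3. When invoking awk, the string variable -v keep_flags may specify one or more flag strings (separated by spaces)
                #   A line that starts with such a flag is forcibly kept in the slimmed file; the variable defaults to keep_line. This can be useful
                #   when some code should run only in the slimmed file, or when certain comments must be kept.
                #   See the code comments at BEGIN{ } for details.
        #   4. Deleting trailing comments on code lines wherever possible
                # Trailing comments on code lines:
                # 1. They are fairly complicated because of single/double quotes, escaped quotes, and nested quotes in the code 
                # 2. A carelessly written regex could delete a legitimate # and everything after it, e.g. inside nested, escaped single/double quotes.
                # 3. One case, with the regular expression /^\s*[^ #]+#/, has been merged into line handling to keep the branches simple:
                #   1) the # is in the middle of other characters or at the very end
                #   2) there is at least one space to the left of the #
                #   3) there is no single or double quote to the right of the #
                #   For example: 
                #   echo hello "'one # \"two '' ;' "' # ' three" \\' r'\four \' " five"  " # \r \n \' here,  " , ' and # is very important 
                #   Run it once and you will see that the real comment starts at the 3rd # from the left, whereas rule 3 above matches the 4th #. 
                #   Some comment text is left behind, but nothing is deleted by mistake. Good enough.

        #   5. Deleting single-line comments and blank lines
                # 1) "Deleting" is implemented by skipping the print call, so deleting single-line comments/blank lines relies on
                #   handling only the lines that do not match /^\s*(\s*|#.*)$/.
                # 2) The negation of that regex matches not only ordinary code lines (possibly with trailing comments) but also comment blocks and cat-style code, i.e. it
                #   overlaps with the regexes for comment blocks/cat-style code. It must therefore come last

        #   6. Keeping the shebang line on the first line
                # 1) The shebang regex is incompatible with both comment blocks and cat-style code, so in principle the check could go anywhere, but to reduce the number of bare print
                #   statements it is placed last, together with item 4.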
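    # II. Some unusual comment blocks and cat write-to-file markers
        #   1. Assuming the # below stands for the start of the line, the following comment block looks odd but does not affect script execution; note that the marker on the last line is followed by a few spaces:
        #    :  <<   """      Com  me   nt    """  the  rest   words 
        #     this is a comment content
        #       Com  me   nt    
        #   2. Similar to 1, cat has a corresponding form, except that nothing may follow the marker on the opening line, not even a space. Whether the marker is wrapped in quotes 
        #       also makes a difference ( # stands for the start of the line):
        #       cat  >> $file <<- """       EOF"""
        #           this is content of $file
        #                     one 
        #             two         three
        #           #       five six.
        #               123
        #       EOF
        #       All of the odd comment blocks and cat command blocks above are handled by the program. 
    # III. Other:
        #   1. A multi-line echo string that contains lines starting with # is not handled specially; by the logic above those lines are deleted as comment lines. For example:
            #       echo "
            #           # this is a comment
            #           echo \"hello world\"
            #       "
        #   As a remedy, the keep_flags variable can be used to keep them, for example:
        #    awk -v keep_flags="echo_inner_comment" ...
            #       echo "
            #           # echo_inner_comment# this is a comment
            #           echo \"hello world\"
            #       "
#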

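# The @include statement currently resolves paths relative to the shell entry script rather than the awk caller, which is counterintuitive and makes it almost pointless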
# @include "../../library/awk/function/ttt.awk"


function debug(desc, value){
    if (__RELEASE__ == 0)
        printf "%s:<%s>\n", desc, value > "/dev/stderr" 
}

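# For a given comment-block line or cat command line, get the flag string (which may contain spaces) that the closing line must match
# Returns an empty string if the line is not a valid block or is some other statement
# Parameter: the first line of the block, or the conceptual first line (e.g. a cat opening line written across several lines, joined into one line here)
function getflag(line){
    result=""  
    start_idx = match(line, /<<-?/)  # start position of << or <<- in the string
    # debug("position of <<", start_idx)
    if (start_idx != 0){
        temp = substr(line, start_idx)            
        start_idx = match(temp, /[^-< ]/)  # position of the first non-space character after << or <<-
        # debug("position of first non-space character after <<",start_idx)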
        if (start_idx != 0){
            result=substr(temp,start_idx)
            char = substr(result,1,1)       
            if (char == "'" || char == "\""){
                start_idx = match(result, /[^'"]/)    
                result = substr(result, start_idx)
                end_idx = match(result, /['"]/)        
                result = substr(result, 1, end_idx-1)
            }else{
                end_idx = match(result, /[ ]/)
                if (end_idx != 0){
                    result = substr(result, 1, end_idx-1)
                }
            }
        }
    }
    return result
}

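# Handle a comment block or a cat command block; exit with an error if the block is incomplete
# Parameters: 
#   flag_desc  a short description of the block
#   show  whether to print the block, for example
#       a cat block is printed in full, even if it contains comments; 
#       a comment block is hidden, even if it contains code
#   line  the first line, or the conceptual first line, of the code
function handle_block(flag_desc, show, line){
    flag = getflag(line)
    debug(flag_desc,flag)
    # Consume the lines within the block (hidden unless show is set)
    start_line=FNR
    while($0 != flag){
        success=getline
        # 1: a line was read successfully. 0: end of file (EOF) was reached. -1: an error occurred.
        if (success != 1){
            error_msg=creat_error(flag_desc " flag <" flag "> has no matching closing line,",start_line)
            exit
        }
        if (show != 0)
            print 
    }
}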


function creat_error(msg,line){
    return sprintf("%s\nLine number: %d", msg, line)
}

# Build, from the given string, the regular expression used to force matching comment lines to be kept in the slimmed file
function reg_by_keep_flags(flags){
    if (flags == "")
        flags = "keep_line" 
    debug("flag", flags)
    split(flags, arr_flag, " ")

    reg = "^\\s*#\\s*("
    for(idx in arr_flag){
        reg = sprintf("%s|%s", reg, arr_flag[idx])
    }
    reg = sprintf("%s)", reg)

    # The first | must be dropped
    sub(/\|/, "", reg)
    return reg
}




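# Parameters:
#   keep_flags  A line that starts with one of the flags in this string is forcibly kept (with the # and the flag removed). 
#       Defaults to keep_line. Multiple flags are separated by spaces, so a single flag cannot contain a space; 
#       otherwise it is treated as several flags. 
#       The script itself must make sure the flag does not affect its execution. There may be 0 or more spaces before and after the #. For example, after calling
#           awk -v keep_flags=preserved_comment -f <...> , the following comments
#           #preserved_comment # comment1
#               #  preserved_comment   # comment2
#             #    preserved_comment # comment3
#       and even the one below are valid, although it reads poorly:
#           #preserved_comment#comment4
#       In these four cases the resulting code lines are (the starting column within the line is whatever remains after the flag is removed):
#       # comment1, # comment2, # comment3, and #comment4   
#   append_remark  Whether to append a summary of the operation (success or failure) at the end. Note that if the operation fails, the failure summary is always shown on the console
#       Any non-empty string means append; anything else means do not append. Defaults to the empty string. 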
BEGIN{
    #test()
    __RELEASE__ = 1
    reg_keep = reg_by_keep_flags(keep_flags)

    debug("reg_keeping",reg_keep)

    #exit
    error_msg = ""
}{  
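    # Note: awk does not support lookahead/lookbehind, so match has to locate things step by step from left to right
    if ($0 ~ /^\s*:?\s*<<\s*[^< ]+\s*[^ ]+\s*.*$/){
        # Comment block: 
        handle_block("comment block", 0, $0)
    }
    else if ($0 ~ /^\s*cat\s*/){
        # cat statement: $0 accumulates the complete cat command, which may be continued with \
        # If cat writes a file, its body, comments included, must be protected
        # Format:  cat   >> mnk.tso   <<- AMD  (the - is optional); >> mnk.tso may come last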
        print $0
        while ($0 ~ /\\\s*$/){
            sub(/\\\s*$/, " ", $0)
            prev = $0
            getline
            print $0
            $0 = sprintf("%s%s", prev, $0)
        }
        if ($0 ~ /<<-?/){      
            handle_block("cat command", 1, $0)
        }         
    }
    else if ( $0 ~ reg_keep){
        sub(reg_keep, "")
        print
    }
    else if (NR == 1 && $0 ~ /^\s*#\s*!\s*\/bin\/bash\s*$/ || $0 !~ /^\s*(\s*|#.*)$/){
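        # Single-line handling:
        #   1. Shebang on the first line, non-blank: print
        #   2. Blank line, i.e. the part of the regex before the |, /^\s*\s*$/, which simplifies to /^\s*$/: skip
        #   3. Code line, any remaining line that does not match the regex (possibly with a trailing comment): conservatively delete the trailing comment (by replacing the tail with an empty string; see the notes above)

        # The substitution only affects case 3 (code lines). 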
        sub(/\s+#[^'"]*$/, "") 
        print 
    }
}END{
    cur_time = strftime("%Y-%m-%d %H:%M:%S", systime()) 
    info = error_msg
    if (error_msg == "")
        info = "All  comments of script are cleared."
    summary = sprintf("# [%s] %s. \n# source: %s.", cur_time, info, FILENAME)

    # Whenever an error occurred, the summary must be printed to the console
    if (error_msg != "")
        print summary > "/dev/stderr" 

    if (append_remark != "")
        print summary          
}

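The script is thoroughly commented, so I will not repeat the details here. The basic principle is to cover the various kinds of comments without deleting legitimate lines that start with #, such as content inside a cat command. The order of the if / else if / else branches was also arranged to keep the logic compact.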
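
To illustrate why the branch order matters, here is a reduced sketch (not the author's script) that protects the body of a cat heredoc before the comment rule gets a chance to run. It assumes an unquoted, single-word delimiter, no line continuations, and a closing line that equals the delimiter exactly. The function name `slim_keep_heredoc` is made up for this example.

```shell
#!/bin/bash
# Hypothetical, reduced illustration; slim.awk handles many more cases.
slim_keep_heredoc() {
    awk '
        # 0) Keep a shebang on the very first line.
        NR == 1 && /^#!/ { print; next }
        # 1) A cat heredoc: print the opening line, then copy the body
        #    verbatim up to and including the delimiter line.
        /^[ \t]*cat[ \t].*<<-?[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*$/ {
            print
            delim = $NF                 # last whitespace-separated field
            sub(/^<<-?/, "", delim)     # covers "<<EOF" written without a space
            while ((getline line) > 0) {
                print line
                if (line == delim) break
            }
            next
        }
        # 2) Only now is it safe to drop blank lines and whole-line comments.
        /^[ \t]*(#.*)?$/ { next }
        { print }
    ' "$@"
}
```

If rule 2 were tested first, the `#` lines inside the heredoc would be dropped as comments, which is exactly the mistake the real script avoids by checking comment blocks and cat blocks before ordinary lines.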

Create a file named compile.sh and add the function create_slim:

#!/bin/bash

:<<COMMENT
    Strip comments; see doc/compile.txt for details. 
    Usage:
    create_slim --source-path(-s)     <=self_entry> \
                --target-path(-t)     <=''> \
                --keep-flags(-k)       <='keep_line'> \
                --append-remark(-a)   ?
    Simple use with all defaults (static-style call, prints to the console only):
        Scripter.create_slim  
COMMENT
create_slim(){
    local source_path target_path keep_flags
    local to_where
    eval $PASX

    params  "'source-path s' '$(readlink -f $0)'" \
            "'target-path t'" \
            "'keep-flags k' keep_line" 
    local apd
    param_appear append_remark a && apd=y
    ensure_par "$target_path"
    [ "$target_path" ] && to_where="$target_path" || to_where="/dev/stdout"     
    awk -v keep_flags="$keep_flags" -v append_remark="$apd" \
        -f "$(dirname "${BASH_SOURCE[0]}")/awk/slim.awk" \
        "$source_path"  > "$to_where"
}
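
create_slim relies on helpers that are not shown in this post (`params`, `param_appear`, `ensure_par`, and `$PASX`). For readers without that library, the following is a hedged stand-alone sketch using plain positional arguments. The name `slim_script` and its argument order are assumptions, and creating the parent directory with `mkdir -p` is only a guess at what `ensure_par` does.

```shell
#!/bin/bash
# Hypothetical stand-alone variant; name and argument order are assumptions.
# usage: slim_script <awk program> <source.sh> [target.sh] [keep flags] [append_remark]
slim_script() {
    local awk_file="$1" source_path="$2" target_path="${3:-}"
    local keep_flags="${4:-keep_line}" apd="${5:-}"

    if [ ! -r "$awk_file" ] || [ ! -r "$source_path" ]; then
        echo "slim_script: cannot read awk program or source file" >&2
        return 1
    fi

    if [ -n "$target_path" ]; then
        # Assumption: make sure the parent directory exists before writing.
        mkdir -p "$(dirname "$target_path")" || return 1
        awk -v keep_flags="$keep_flags" -v append_remark="$apd" \
            -f "$awk_file" "$source_path" > "$target_path"
    else
        # No target given: print to the console.
        awk -v keep_flags="$keep_flags" -v append_remark="$apd" \
            -f "$awk_file" "$source_path"
    fi
}
```

For example, `slim_script awk/slim.awk bingo.sh slim_result.sh "keep_line another_keep_line" y` would mirror the call made in the test below. Note that slim.awk uses `\s` and `strftime`, which are not part of POSIX awk, so it presumably needs gawk.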

【Test】

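First create a shell file, bingo.sh, containing every kind of comment (including odd ones that do not affect execution but that you may never use in your life):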

#!/bin/bash

#        keep_line   #   calling me using keep_line

   #   keep_line  #calling you using keep_line

   :  <<   """   Co  mm    ent1   """      tail tail ...
    this is a comment content
   Co  mm    ent1   
miracle1(){
    echo "mirocle comment1   :  <<  'XXX'  , with tail"
}

   :  <<-   '''   Co  mm    ent2   '''      tail tail ...
    this is a comment content
   Co  mm    ent2   
miracle2(){
    echo "mirocle comment2   :  <<-   'YYY'  , with tail "
}
#
  #     self_compile_in_slim_file#  call compile self
  <<   '''   Co  mm    ent3   '''     
    this is a comment content
   Co  mm    ent3   
miracle3(){
    echo "mirocle comment3   << ' PPP'  , without tail except spaces"
}

  # another_keep_line  # hwo are you by another_keep

  <<-   '''   Co  mm    ent4   '''      
    this is a comment content
   Co  mm    ent4   
miracle4(){
    echo "mirocle comment4  <<- 'QQQ'   , without tail except spaces"
}

#tip "-------------------------"

   :  <<  Comment5    tail tail ...
    this is a comment content
Comment5
normal1(){
    echo "normal comment1  :  <<  MMM , with tail."
}

   :  <<-  Comment6    tail tail ...
    this is a comment content
Comment6
normal2(){
    echo "normal comment2   : <<  NNN , without tail "
}

     <<  Comment7     
    this is a comment content
Comment7
normal3(){
    echo "normal comment3  :  <<  TTT, without tail except spaces."
}

     <<-  Comment8      
    this is a comment content
Comment8
normal4(){
    echo "normal comment4   : <<  RRR , without tail except spaces"
}

bingo(){
    way1(){
        local file=$ROOT_DIR_PATH/test/test.txt
        ensure_par "$file"
        cat    >>   $file        \
             <<-  \
                     """            EOF    """
        # this is a comment, whick must not be deleted
                this is a content of $file
                            one 
                    two         three
                #       five six.
                123
            EOF    
        echo " this is the first line ,
        the second line
# keep_comment_in_echo        # You must rely on \"--keep_flags\" in order to retain this comment
        the third line
        "
    }


    way1

    # micracle testing
    # miracle1
    # miracle2
    # miracle3
    # miracle4

    # # normal testing 
    # normal1
    # normal2             # some other tail comment
    #  normal3
    #  normal4

    unset -f way1 
}

bingo

Now create a test file, test.sh, and enter the following code:

way1(){
    create_slim  --source-path "$ROOT_DIR_PATH/test/lnx-code/bingo.sh" \
                 --target-path "$ROOT_DIR_PATH/test/slim_result.sh" \
                 --keep-flags "self_compile_in_slim_file keep_line 
                 another_keep_line keep_comment_in_echo" -a    
}
clear
way1

The resulting slimmed file, slim_result.sh, looks like this:

#!/bin/bash
   #   calling me using keep_line
  #calling you using keep_line
miracle1(){
    echo "mirocle comment1   :  <<  'XXX'  , with tail"
}
miracle2(){
    echo "mirocle comment2   :  <<-   'YYY'  , with tail "
}
#  call compile self
miracle3(){
    echo "mirocle comment3   << ' PPP'  , without tail except spaces"
}
  # hwo are you by another_keep
miracle4(){
    echo "mirocle comment4  <<- 'QQQ'   , without tail except spaces"
}
normal1(){
    echo "normal comment1  :  <<  MMM , with tail."
}
normal2(){
    echo "normal comment2   : <<  NNN , without tail "
}
normal3(){
    echo "normal comment3  :  <<  TTT, without tail except spaces."
}
normal4(){
    echo "normal comment4   : <<  RRR , without tail except spaces"
}
bingo(){
    way1(){
        local file=$ROOT_DIR_PATH/test/test.txt
        ensure_par "$file"
        cat    >>   $file        \
             <<-  \
                     """            EOF    """
        # this is a comment, whick must not be deleted
                this is a content of $file
                            one 
                    two         three
                #       five six.
                123
            EOF    
        echo " this is the first line ,
        the second line
        # You must rely on \"--keep_flags\" in order to retain this comment
        the third line
        "
    }
    way1
    unset -f way1 
}
bingo
# [2025-05-27 17:17:11] All  comments of script are cleared.. 
# source: /mnt/d/Estate/asset/OS/linux/application/blogging/test/lnx-code/bingo.sh.

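Note:

#!/bin/bash is kept;
Comment lines marked with a flag defined through keep_flags are kept, for example to protect text starting with # inside a multi-line echo string from being deleted by mistake;
Statements starting with # inside a cat command are kept; note that this does not depend on keep_flags;
Two summary lines are appended at the end (optional) to indicate that slimming succeeded (enabled with the --append-remark or -a option).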
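
The keep-flag behavior noted above can be sketched in isolation as follows. This is an illustration of the idea rather than the code of slim.awk: the function name `keep_marked_lines` is hypothetical, and the alternation is built with an indexed loop so that the flag order is deterministic.

```shell
#!/bin/bash
# Hypothetical illustration of the keep-flag idea; not the code of slim.awk.
# $1: space-separated flags (defaults to keep_line); input is read from stdin.
keep_marked_lines() {
    awk -v keep_flags="$1" '
        BEGIN {
            n = split(keep_flags, flag, " ")
            if (n == 0) n = split("keep_line", flag, " ")
            alt = flag[1]
            for (i = 2; i <= n; i++) alt = alt "|" flag[i]
            # optional spaces, "#", optional spaces, then one of the flags
            keep_re = "^[ \t]*#[ \t]*(" alt ")"
        }
        $0 ~ keep_re { sub(keep_re, ""); print; next }  # strip "# flag", keep the rest
        /^[ \t]*#/   { next }                           # ordinary comment: drop
                     { print }                          # code line: keep
    '
}
```

Given the line `# keep_line echo kept`, the function prints ` echo kept`, so a marked comment turns into code (or into a plain comment, if the remainder itself starts with #) in the slimmed file.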

Of course, omitting the --target-path / -t option prints the result to the console instead of creating a file. For brevity, that test is omitted.

Thanks for reading!
