如何快速获得一个文件的类型和所使用的编码信息

前文iconv批量转换字符集编码的利器, 说到通过UltraEdit来得知CSV的编码是Unicode(对于小文件,Notepad也可以代劳),那么有什么更简便的办法获得文件的编码,甚至文件类型(Mime-type)呢?

Linux下有个非常实用的file command, 现在我把它移植到Windows中来了。

下载地址:file-win32-5.28.zip

官网及源码下载:Fine Free File Command

使用方法非常简单,这里举例如下,

file test_utf16le.txt
test_utf16le.txt: Little-endian UTF-16 Unicode text, with no line terminators


file --mime-encoding test_utf16le.txt
test_utf16le.txt: utf-16le

 

file file.exe
file.exe: PE32 executable (console) Intel 80386, for MS Windows

详细使用说明,

Usage: file [OPTION...] [FILE...]
Determine type of FILEs.

      --help                 display this help and exit
  -v, --version              output version information and exit
  -m, --magic-file LIST      use LIST as a colon-separated list of magic
                               number files
  -z, --uncompress           try to look inside compressed files
  -Z, --uncompress-noreport  only print the contents of compressed files
  -b, --brief                do not prepend filenames to output lines
  -c, --checking-printout    print the parsed form of the magic file, use in
                               conjunction with -m to debug a new magic file
                               before installing it
  -e, --exclude TEST         exclude TEST from the list of test to be
                               performed for file. Valid tests are:
                               apptype, ascii, cdf, compress, elf, encoding,
                               soft, tar, text, tokens
  -f, --files-from FILE      read the filenames to be examined from FILE
  -F, --separator STRING     use string as separator instead of `:'
  -i, --mime                 output MIME type strings (--mime-type and
                               --mime-encoding)
      --apple                output the Apple CREATOR/TYPE
      --extension            output a slash-separated list of extensions
      --mime-type            output the MIME type
      --mime-encoding        output the MIME encoding
  -k, --keep-going           don't stop at the first match
  -l, --list                 list magic strength
  -n, --no-buffer            do not buffer output
  -N, --no-pad               do not pad output
  -0, --print0               terminate filenames with ASCII NUL
  -p, --preserve-date        preserve access times on files
  -P, --parameter            set file engine parameter limits
                               indir        15 recursion limit for indirection
                               name         30 use limit for name/use magic
                               elf_notes   256 max ELF notes processed
                               elf_phnum   128 max ELF prog sections processed
                               elf_shnum 32768 max ELF sections processed
  -r, --raw                  don't translate unprintable chars to \ooo
  -s, --special-files        treat special (block/char devices) files as
                             ordinary ones
  -C, --compile              compile file specified by -m
  -d, --debug                print debugging messages

Report bugs to http://bugs.gw.com/

P.S.

SimplMagic 是一个java实现版本,使用相同的Magic files。