How to quickly determine a file's type and encoding
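The previous post, "iconv: a handy tool for batch-converting character encodings", mentioned using UltraEdit to find out that a CSV file was encoded as Unicode (for small files, Notepad can do the job too). Is there a simpler way to get a file's encoding, or even its file type (MIME type)?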
Linux has a very handy file command, and I have now ported it to Windows.
Download: file-win32-5.28.zip
Official site and source download: Fine Free File Command
Usage is very simple. Here are a few examples:
file test_utf16le.txt
test_utf16le.txt: Little-endian UTF-16 Unicode text, with no line terminators

file --mime-encoding test_utf16le.txt
test_utf16le.txt: utf-16le

file file.exe
file.exe: PE32 executable (console) Intel 80386, for MS Windows
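When the encoding is needed inside a script (for example, to decide which source encoding to pass to iconv), the --mime-encoding output can be captured programmatically. The following is a minimal Python sketch and not part of the port itself. It assumes file is on the PATH, and the helper names build_command and detect_encoding are made up for illustration.

```python
import subprocess


def build_command(path, brief=True):
    """Build the argument list for `file --mime-encoding`."""
    cmd = ["file", "--mime-encoding"]
    if brief:
        # --brief (-b) drops the "filename:" prefix from the output line
        cmd.append("--brief")
    cmd.append(path)
    return cmd


def detect_encoding(path):
    """Return the encoding reported by `file`, e.g. "utf-16le".

    Assumes the `file` executable can be found on the PATH.
    """
    result = subprocess.run(
        build_command(path),
        capture_output=True,  # collect stdout instead of printing it
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

With the sample file from the examples above, detect_encoding("test_utf16le.txt") should return the same utf-16le string that the command line prints.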
Detailed usage:
Usage: file [OPTION...] [FILE...]
Determine type of FILEs.

      --help                 display this help and exit
  -v, --version              output version information and exit
  -m, --magic-file LIST      use LIST as a colon-separated list of magic
                               number files
  -z, --uncompress           try to look inside compressed files
  -Z, --uncompress-noreport  only print the contents of compressed files
  -b, --brief                do not prepend filenames to output lines
  -c, --checking-printout    print the parsed form of the magic file, use in
                               conjunction with -m to debug a new magic file
                               before installing it
  -e, --exclude TEST         exclude TEST from the list of test to be
                               performed for file. Valid tests are:
                               apptype, ascii, cdf, compress, elf, encoding,
                               soft, tar, text, tokens
  -f, --files-from FILE      read the filenames to be examined from FILE
  -F, --separator STRING     use string as separator instead of `:'
  -i, --mime                 output MIME type strings (--mime-type and
                               --mime-encoding)
      --apple                output the Apple CREATOR/TYPE
      --extension            output a slash-separated list of extensions
      --mime-type            output the MIME type
      --mime-encoding        output the MIME encoding
  -k, --keep-going           don't stop at the first match
  -l, --list                 list magic strength
  -n, --no-buffer            do not buffer output
  -N, --no-pad               do not pad output
  -0, --print0               terminate filenames with ASCII NUL
  -p, --preserve-date        preserve access times on files
  -P, --parameter            set file engine parameter limits
                               indir        15 recursion limit for indirection
                               name         30 use limit for name/use magic
                               elf_notes   256 max ELF notes processed
                               elf_phnum   128 max ELF prog sections processed
                               elf_shnum 32768 max ELF sections processed
  -r, --raw                  don't translate unprintable chars to \ooo
  -s, --special-files        treat special (block/char devices) files as
                               ordinary ones
  -C, --compile              compile file specified by -m
  -d, --debug                print debugging messages

Report bugs to http://bugs.gw.com/
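One practical use of these options on Windows: full paths such as C:\data\a.csv already contain a colon, so splitting the output on the default `:' separator is ambiguous. The sketch below combines --separator, --no-pad and --mime-type from the list above to get output that is easier to parse. It is only an illustration under a few assumptions: file is on the PATH, `|` is used as the separator because it cannot appear in Windows file names, and the names parse_line and mime_types are hypothetical.

```python
import subprocess

# Assumption: "|" is chosen because it is not allowed in Windows file names.
SEPARATOR = "|"


def parse_line(line, separator=SEPARATOR):
    """Split one output line of `file` into (filename, description)."""
    name, _, description = line.partition(separator)
    return name, description.strip()


def mime_types(paths):
    """Return {path: MIME type} for the given paths using one `file` call.

    Assumes the `file` executable can be found on the PATH.
    """
    cmd = ["file", "--mime-type", "--separator", SEPARATOR, "--no-pad", *paths]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # One non-empty output line is expected per examined file
    return dict(parse_line(line) for line in result.stdout.splitlines() if line)
```

For a long list of files, the same idea should work with --files-from, which reads the names from a text file instead of the command line.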
P.S.
SimpleMagic is a Java implementation that uses the same magic files.