如何快速获得一个文件的类型和所使用的编码信息
前文iconv批量转换字符集编码的利器, 说到通过UltraEdit来得知CSV的编码是Unicode(对于小文件,Notepad也可以代劳),那么有什么更简便的办法获得文件的编码,甚至文件类型(Mime-type)呢?
Linux下有个非常实用的file command, 现在我把它移植到Windows中来了。
下载地址:file-win32-5.28.zip
官网及源码下载:Fine Free File Command
使用方法非常简单,这里举例如下,
file test_utf16le.txt test_utf16le.txt: Little-endian UTF-16 Unicode text, with no line terminators file --mime-encoding test_utf16le.txt test_utf16le.txt: utf-16le
file file.exe file.exe: PE32 executable (console) Intel 80386, for MS Windows
详细使用说明,
Usage: file [OPTION...] [FILE...] Determine type of FILEs. --help display this help and exit -v, --version output version information and exit -m, --magic-file LIST use LIST as a colon-separated list of magic number files -z, --uncompress try to look inside compressed files -Z, --uncompress-noreport only print the contents of compressed files -b, --brief do not prepend filenames to output lines -c, --checking-printout print the parsed form of the magic file, use in conjunction with -m to debug a new magic file before installing it -e, --exclude TEST exclude TEST from the list of test to be performed for file. Valid tests are: apptype, ascii, cdf, compress, elf, encoding, soft, tar, text, tokens -f, --files-from FILE read the filenames to be examined from FILE -F, --separator STRING use string as separator instead of `:' -i, --mime output MIME type strings (--mime-type and --mime-encoding) --apple output the Apple CREATOR/TYPE --extension output a slash-separated list of extensions --mime-type output the MIME type --mime-encoding output the MIME encoding -k, --keep-going don't stop at the first match -l, --list list magic strength -n, --no-buffer do not buffer output -N, --no-pad do not pad output -0, --print0 terminate filenames with ASCII NUL -p, --preserve-date preserve access times on files -P, --parameter set file engine parameter limits indir 15 recursion limit for indirection name 30 use limit for name/use magic elf_notes 256 max ELF notes processed elf_phnum 128 max ELF prog sections processed elf_shnum 32768 max ELF sections processed -r, --raw don't translate unprintable chars to \ooo -s, --special-files treat special (block/char devices) files as ordinary ones -C, --compile compile file specified by -m -d, --debug print debugging messages Report bugs to http://bugs.gw.com/
P.S.
SimplMagic 是一个java实现版本,使用相同的Magic files。