Summary of encoding issues when importing and exporting in SAS

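The SAS default encoding (the session encoding) is actually determined by the configuration file loaded at startup, found under the nls directory of the installation, for example: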

"D:\Program Files\SASHome9.4\SASFoundation\9.4\sas.exe" -CONFIG "d:\Program Files\SASHome9.4\SASFoundation\9.4\nls\zh\sasv9.cfg"

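Once SAS has started, the session encoding cannot be changed. If you try to set it with the statement below, you get a warning: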

option encoding='utf-8';

WARNING 30-12: SAS option ENCODING is valid only at startup of the SAS System. The SAS option is ignored.

To check the current session encoding, use either of the following:

%put &sysencoding;

*or;

data _null_;
  val=GETOPTION('encoding');
  put val=;
run;
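
As a further check, PROC OPTIONS can print the same setting to the log. This is a minimal sketch that assumes nothing beyond a running SAS session. The second step lists the whole language-related option group, which includes ENCODING and LOCALE.

```sas
/* Print the ENCODING option and its current value to the log */
proc options option=encoding;
run;

/* List the language-related options, such as ENCODING and LOCALE */
proc options group=languagecontrol;
run;
```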

Since the session encoding is fixed, we can instead specify the encoding of the file being imported or exported.

For example, to import a CSV file that is encoded in UTF-8 and has Chinese variable names, use the following code:

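* allow non-ASCII (for example Chinese) variable names;
options validvarname=any;

* declare the encoding of the external file on the fileref;
FILENAME nls "X:\job\test1.csv" ENCODING="utf-8";

PROC IMPORT OUT= WORK.TEST2 
            DATAFILE= nls
            DBMS=CSV REPLACE;
     GETNAMES=YES;
     DATAROW=2; 
RUN;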

The corresponding export to a UTF-8 encoded file:

FILENAME export "X:\job\test2.csv" ENCODING="utf-8";

PROC EXPORT DATA= TEST2
            OUTFILE= export
            DBMS=csv REPLACE;
RUN;
The same reasoning applies when a SAS data set is created from an external file. Suppose the external file is encoded in UTF-8 and the current SAS session encoding is Wlatin1. By default, SAS assumes that the external file is in the same encoding as the session encoding, which causes the character data to be written to the new SAS data set incorrectly.
To tell SAS what encoding to use when reading the external file, specify the ENCODING= option. When you tell SAS that the external file is in UTF-8, SAS then transcodes the external file from UTF-8 to the current session encoding when writing to the new SAS data set. Therefore, the data is written to the new data set correctly in Wlatin1.
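
The case described above can be sketched with a DATA step that sets ENCODING= on the INFILE statement. This is only an illustration: the fileref extfile, the output data set name, and the column names are hypothetical, and the file is assumed to be comma-delimited with a header row.

```sas
/* Hypothetical fileref; the path is the CSV file used earlier */
filename extfile "X:\job\test1.csv";

data work.from_utf8;
  /* ENCODING= declares the file as UTF-8, so SAS transcodes it
     to the session encoding while reading */
  infile extfile encoding="utf-8" dsd firstobs=2 truncover;
  /* Hypothetical columns; adjust names and lengths to the real file */
  input name :$40. city :$40. amount;
run;
```

Characters that have no representation in the session encoding cannot be transcoded, so a UTF-8 file containing Chinese text still needs a session encoding that supports Chinese.
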
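If no encoding is specified, SAS assumes that imported and exported files use the same encoding as the session encoding.
We can also specify an encoding for SAS data sets.
For example, to convert the encoding of SAS data sets:
*convert an entire library (directory);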
libname inlib cvp 'c:\temp';
libname outlib 'c:\' outencoding='UTF-8';
proc copy noclone in=inlib out=outlib;
run;

*convert a specific data set;

libname inlib cvp 'c:\temp';
libname outlib 'c:\' outencoding='UTF-8';
proc copy noclone in=inlib out=outlib;
 select dataset_name;
run;
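
In the code above, the CVP engine on the input library pads character variable lengths so that values are less likely to be truncated when they grow during transcoding, and NOCLONE lets the copies take the OUTENCODING= value instead of inheriting the source attributes. To confirm the result, a quick check along these lines may help. It reuses the placeholder dataset_name from above and assumes the copy step has already run.

```sas
/* The Encoding attribute in the output should report UTF-8 */
proc contents data=outlib.dataset_name;
run;
```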