csv2table makes importing CSV into PostgreSQL much less of a hassle
Importing CSV into PostgreSQL with the COPY command is fast, but you first have to analyze the CSV and create the table yourself. With csv2table, the table is created automatically and the data imported in one go.
usage: csv2table [-h] --file FILE [--copy] [--backslash]
                 [--delimiter DELIMITER] [--quote QUOTE] [--table TABLE]
                 [--schema SCHEMA] [--create-schema] [--integer] [--numeric]
                 [--timestamp] [--date] [--tz] [--mogrify] [--lower] [--drop]
                 [--truncate] [--skip-parsing SKIP_PARSING] [--big-integer]
                 [--no-create] [--fix-duplicates] [--transaction] [--temporary]
                 [--file-column FILE_COLUMN] [--redshift REDSHIFT]
                 [--redshift-bucket REDSHIFT_BUCKET] [--redshift-upload]
                 [--gzip] [--missing-headers] [--cine] [--keep-going N]
                 [--primary-key PRIMARY_KEY] [--serial-column SERIAL_COLUMN]
                 [--array]

optional arguments:
  -h, --help            show this help message and exit
  --file FILE, -f FILE  csv file
  --copy, -y            issue a copy statement, for import
  --backslash, -s       when issuing copy, use \copy
  --delimiter DELIMITER, -d DELIMITER
                        csv delimiter
  --quote QUOTE, -q QUOTE
                        csv quote
  --table TABLE, -t TABLE
                        table name
  --schema SCHEMA       schema name
  --create-schema       create the named schema
  --integer, -i         integer columns
  --numeric, -n         numeric parsing
  --timestamp, -p       timestamp parsing
  --date                date parsing
  --tz                  use timestamptz instead of timestamp
  --mogrify, -m         clean names
  --lower, -l           lowercase names
  --drop, -x            drop table first
  --truncate            truncate table first
  --skip-parsing SKIP_PARSING
                        skip type parsing for cols
  --big-integer, -b     use bigint instead of int
  --no-create, -z       don't issue create table
  --fix-duplicates      handle duplicate column names by appending an
                        increasing suffix
  --transaction, -1     run in a transaction
  --temporary           create a temporary table
  --file-column FILE_COLUMN
                        add a column to the created table which will contain
                        the name of the file the data was loaded from
  --redshift REDSHIFT   generate a copy command for redshift. takes an
                        optional argument, a file to look for s3
                        credentials/bucket in. To try and read from the
                        environment, specify
  --redshift-bucket REDSHIFT_BUCKET
                        when generating a copy command for redshift, using
                        this option specifies the bucket to store the file in,
                        if requested. This will override any bucket found by
                        the --redshift option.
  --redshift-upload     upload the specified file to redshift
  --gzip                the file is in gzip format
  --missing-headers     file is missing headers, make up column names
  --cine                create if not exists when creating a table
  --keep-going N        read the N rows instead of the first two rows, trading
                        performance for better type detection
  --primary-key PRIMARY_KEY
                        column(s) to make part of the primary key
  --serial-column SERIAL_COLUMN
                        add serial column of the given name
  --array               attempt to detect and create array columns
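To give a feel for how the options combine, here is a hedged example invocation. It assumes csv2table writes the generated SQL (CREATE TABLE plus a \copy statement) to standard output so it can be piped into psql; the file, table and database names are placeholders I made up, and only flags listed in the help above are used:

# hypothetical: infer types, clean/lowercase column names, and emit CREATE TABLE + \copy
# sample.csv, sample_table and mydb are placeholder names
csv2table -f sample.csv -t sample_table --mogrify --lower --integer --numeric --copy --backslash | psql mydb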
P.S.
Actually, for large CSV files, I sometimes run a command like
head -50 file.csv > sample.csv
to grab a sample for analysis, read it into R and import it from there (which creates the table automatically), and then use COPY to load the full file quickly.
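A rough sketch of that workflow is below; the R step is only indicated by a comment, and big.csv, mytable and mydb are placeholder names:

# take the header plus the first rows as a sample
head -50 big.csv > sample.csv
# ...analyze sample.csv and create the table, e.g. from R or with csv2table as above...
# then bulk-load the full file with psql's \copy (client-side file, server-side COPY speed)
psql mydb -c "\copy mytable FROM 'big.csv' CSV HEADER"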