csv2table takes the hassle out of importing CSV files into PostgreSQL
Importing a CSV with PostgreSQL's COPY command is fast, but you first have to inspect the CSV and create a matching table by hand. With csv2table, the table is created automatically and the data is imported in one go.
usage: csv2table [-h] --file FILE [--copy] [--backslash]
                 [--delimiter DELIMITER] [--quote QUOTE] [--table TABLE]
                 [--schema SCHEMA] [--create-schema] [--integer] [--numeric]
                 [--timestamp] [--date] [--tz] [--mogrify] [--lower] [--drop]
                 [--truncate] [--skip-parsing SKIP_PARSING] [--big-integer]
                 [--no-create] [--fix-duplicates] [--transaction]
                 [--temporary] [--file-column FILE_COLUMN]
                 [--redshift REDSHIFT] [--redshift-bucket REDSHIFT_BUCKET]
                 [--redshift-upload] [--gzip] [--missing-headers] [--cine]
                 [--keep-going N] [--primary-key PRIMARY_KEY]
                 [--serial-column SERIAL_COLUMN] [--array]
optional arguments:
  -h, --help            show this help message and exit
  --file FILE, -f FILE  csv file
  --copy, -y            issue a copy statement, for import
  --backslash, -s       when issuing copy, use \copy
  --delimiter DELIMITER, -d DELIMITER
                        csv delimiter
  --quote QUOTE, -q QUOTE
                        csv quote
  --table TABLE, -t TABLE
                        table name
  --schema SCHEMA       schema name
  --create-schema       create the named schema
  --integer, -i         integer columns
  --numeric, -n         numeric parsing
  --timestamp, -p       timestamp parsing
  --date                date parsing
  --tz                  use timestamptz instead of timestamp
  --mogrify, -m         clean names
  --lower, -l           lowercase names
  --drop, -x            drop table first
  --truncate            truncate table first
  --skip-parsing SKIP_PARSING
                        skip type parsing for cols
  --big-integer, -b     use bigint instead of int
  --no-create, -z       don't issue create table
  --fix-duplicates      handle duplicate column names by appending an
                        increasing suffix
  --transaction, -1     run in a transaction
  --temporary           create a temporary table
  --file-column FILE_COLUMN
                        add a column to the created table which will contain
                        the name of the file the data was loaded from
  --redshift REDSHIFT   generate a copy command for redshift. takes an
                        optional argument, a file to look for s3
                        credentials/bucket in. To try and read from the
                        environment, specify
  --redshift-bucket REDSHIFT_BUCKET
                        when generating a copy command for redshift, using
                        this option specifies the bucket to store the file
                        in, if requested. This will override any bucket found
                        by the --redshift option.
  --redshift-upload     upload the specified file to redshift
  --gzip                the file is in gzip format
  --missing-headers     file is missing headers, make up column names
  --cine                create if not exists when creating a table
  --keep-going N        read the N rows instead of the first two rows,
                        trading performance for better type detection
  --primary-key PRIMARY_KEY
                        column(s) to make part of the primary key
  --serial-column SERIAL_COLUMN
                        add serial column of the given name
  --array               attempt to detect and create array columns
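
Combining a few of the options above, a typical invocation could look something like the sketch below. The flags are taken straight from the help text, but the file name data.csv and the database name mydb are made up, and piping the output to psql assumes csv2table writes the generated SQL to stdout:

# hypothetical file and database names; assumes the generated SQL goes to stdout
csv2table -f data.csv --mogrify --lower --integer --numeric --timestamp --copy --backslash | psql mydb

With --copy plus --backslash the import statement uses psql's \copy, which reads the file on the client side, so no server-side file access is needed.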
P.S.
Actually, for large CSV files, I sometimes take a sample with a command like
head -50 file.csv > sample.csv
analyze that sample, read it into R and import it from there (which creates the table automatically), and then use COPY to load the full file quickly.
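
A rough sketch of that manual route, in case it helps. The file names, the table name big, the database mydb, and the DBI/RPostgres packages are all just assumptions for illustration; here R only imports the sample's column structure, not its rows, so nothing gets loaded twice:

# take the first 50 lines as a sample
head -50 big.csv > sample.csv

# let R infer the column types and create the (empty) table from the sample
Rscript -e 'library(DBI); con <- dbConnect(RPostgres::Postgres(), dbname = "mydb"); dbWriteTable(con, "big", read.csv("sample.csv")[0, ])'

# then bulk-load the full file with psql's client-side copy
psql mydb -c "\copy big FROM 'big.csv' WITH (FORMAT csv, HEADER)"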