csv2table makes importing CSV into PostgreSQL much less of a hassle
Importing CSV into PostgreSQL with the COPY command is fast, but you first have to analyze the CSV and create the table yourself. With csv2table, the table is created automatically and the data imported in one go.
usage: csv2table [-h] --file FILE [--copy] [--backslash]
                 [--delimiter DELIMITER] [--quote QUOTE] [--table TABLE]
                 [--schema SCHEMA] [--create-schema] [--integer] [--numeric]
                 [--timestamp] [--date] [--tz] [--mogrify] [--lower] [--drop]
                 [--truncate] [--skip-parsing SKIP_PARSING] [--big-integer]
                 [--no-create] [--fix-duplicates] [--transaction] [--temporary]
                 [--file-column FILE_COLUMN] [--redshift REDSHIFT]
                 [--redshift-bucket REDSHIFT_BUCKET] [--redshift-upload]
                 [--gzip] [--missing-headers] [--cine] [--keep-going N]
                 [--primary-key PRIMARY_KEY] [--serial-column SERIAL_COLUMN]
                 [--array]

optional arguments:
  -h, --help            show this help message and exit
  --file FILE, -f FILE  csv file
  --copy, -y            issue a copy statement, for import
  --backslash, -s       when issuing copy, use \copy
  --delimiter DELIMITER, -d DELIMITER
                        csv delimiter
  --quote QUOTE, -q QUOTE
                        csv quote
  --table TABLE, -t TABLE
                        table name
  --schema SCHEMA       schema name
  --create-schema       create the named schema
  --integer, -i         integer columns
  --numeric, -n         numeric parsing
  --timestamp, -p       timestamp parsing
  --date                date parsing
  --tz                  use timestamptz instead of timestamp
  --mogrify, -m         clean names
  --lower, -l           lowercase names
  --drop, -x            drop table first
  --truncate            truncate table first
  --skip-parsing SKIP_PARSING
                        skip type parsing for cols
  --big-integer, -b     use bigint instead of int
  --no-create, -z       don't issue create table
  --fix-duplicates      handle duplicate column names by appending an
                        increasing suffix
  --transaction, -1     run in a transaction
  --temporary           create a temporary table
  --file-column FILE_COLUMN
                        add a column to the created table which will contain
                        the name of the file the data was loaded from
  --redshift REDSHIFT   generate a copy command for redshift. takes an
                        optional argument, a file to look for s3
                        credentials/bucket in. To try and read from the
                        environment, specify
  --redshift-bucket REDSHIFT_BUCKET
                        when generating a copy command for redshift, using
                        this option specifies the bucket to store the file in,
                        if requested. This will override any bucket found by
                        the --redshift option.
  --redshift-upload     upload the specified file to redshift
  --gzip                the file is in gzip format
  --missing-headers     file is missing headers, make up column names
  --cine                create if not exists when creating a table
  --keep-going N        read the N rows instead of the first two rows, trading
                        performance for better type detection
  --primary-key PRIMARY_KEY
                        column(s) to make part of the primary key
  --serial-column SERIAL_COLUMN
                        add serial column of the given name
  --array               attempt to detect and create array columns
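To give a feel for how the options combine, here is a hedged example invocation. It assumes csv2table writes the generated SQL (CREATE TABLE plus a \copy statement) to standard output so it can be piped into psql; the file, table and database names are placeholders I made up, and only flags listed in the help above are used:

# hypothetical: infer types, clean/lowercase column names, and emit CREATE TABLE + \copy
# sample.csv, sample_table and mydb are placeholder names
csv2table -f sample.csv -t sample_table --mogrify --lower --integer --numeric --copy --backslash | psql mydb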
P.S.
Actually, for large CSV files, I sometimes run a command like
head -50 file.csv > sample.csv
to grab a sample for analysis, read it into R and import it from there (which creates the table automatically), and then use COPY to load the full file quickly.
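A rough sketch of that workflow is below; the R step is only indicated by a comment, and big.csv, mytable and mydb are placeholder names:

# take the header plus the first rows as a sample
head -50 big.csv > sample.csv
# ...analyze sample.csv and create the table, e.g. from R or with csv2table as above...
# then bulk-load the full file with psql's \copy (client-side file, server-side COPY speed)
psql mydb -c "\copy mytable FROM 'big.csv' CSV HEADER"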