Change delimiter in a csv file and remove line breaks in fields
Linux
Python
Coding
I wrote a script to convert delimiters in CSV files, eg, commas to pipes. I prefer pipe-delimited files because the the pipe-delimiter (|) will not clash data in the different fields 99.999% of the time. I also added the option to convert newline () and carriage return ()̊ characters in the data fields to spaces. This comes in handy when I use PROC IMPORT
in SAS as line breaks cause it to choke.
Here's my csvconvert.py
script:
```{python}
#! /usr/bin/env python
#### Command line arguments
import argparse
= argparse.ArgumentParser(description="Convert delimited file from one delimiter to another; defaults to converting CSV to pipe-delimited.")
parser "--dlm-input", action="store", dest="dlm_in", default=",", required=False, help="delimiter of the input file; defaults to comma (,)", nargs='?', metavar="','")
parser.add_argument("--dlm-output", action="store", dest="dlm_out", default="|", required=False, help="delimiter of the output file; defaults to pipe (|)", nargs='?', metavar="'|'")
parser.add_argument("--remove-line-char", action="store_true", dest="remove_line_char", default=False, help="remove \\n and \\r characters in fields and replace with spaces")
parser.add_argument("--quote-char", action="store", dest="quote_char", default='"', required=False, help="quote character; defaults to double quote (\")", nargs='?', metavar="\"")
parser.add_argument("-i", "--input", action="store", dest="input", required=False, help="input file; if not specified, take from standard input.", nargs='?', metavar="file.csv")
parser.add_argument("-o", "--output", action="store", dest="output", required=False, help="output file; if not specified, write to standard output", nargs='?', metavar="file.pipe")
parser.add_argument("-v", "--verbose", action="store_true", dest="verbose", default=False, help="increase verbosity")
parser.add_argument(= parser.parse_args()
args # print args
# http://snipplr.com/view/45759/convert-csv-file-to-pipe-delineated-file/
import argparse
import csv
import sys
from signal import signal, SIGPIPE, SIG_DFL # http://stackoverflow.com/questions/14207708/ioerror-errno-32-broken-pipe-python
## no error when exiting a pipe like less
signal(SIGPIPE,SIG_DFL)
if args.input:
= csv.reader(open(args.input, 'rb'), delimiter=args.dlm_in, quotechar=args.quote_char)
csv_reader else:
= csv.reader(sys.stdin, delimiter=args.dlm_in, quotechar=args.quote_char)
csv_reader
if args.output:
= open(args.output, 'wb')
h_outfile else:
= sys.stdout
h_outfile
for row in csv_reader:
= args.dlm_out.join(row)
row if args.remove_line_char:
= row.replace('\n', ' ').replace('\r', ' ')
row "%s\n" % (row))
h_outfile.write(
h_outfile.flush()# print row
```
Help description:
usage: csvconvert.py [-h] [--dlm-input [',']] [--dlm-output ['|']]
[--remove-line-char] [--quote-char ["]] [-i [file.csv]]
[-o [file.pipe]] [-v]
Convert delimited file from one delimiter to another; defaults to converting
CSV to pipe-delimited.
optional arguments:
-h, --help show this help message and exit
--dlm-input [','] delimiter of the input file; defaults to comma (,)
--dlm-output ['|'] delimiter of the output file; defaults to pipe (|)
--remove-line-char remove \n and \r characters in fields and replace with
spaces
--quote-char ["] quote character; defaults to double quote (")
-i [file.csv], --input [file.csv]
input file; if not specified, take from standard
input.
-o [file.pipe], --output [file.pipe]
output file; if not specified, write to standard
output
-v, --verbose increase verbosity
Usage:
cat myfile.csv | csvconvert.py --remove-line-char > myfile.pipe