13.2.5. LOAD DATA INFILE
Syntax
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name
'
[REPLACE | IGNORE]
INTO TABLE tbl_name
[FIELDS
[TERMINATED BY 'string
']
[[OPTIONALLY] ENCLOSED BY 'char
']
[ESCAPED BY 'char
']
]
[LINES
[STARTING BY 'string
']
[TERMINATED BY 'string
']
]
[IGNORE number
LINES]
[(col_name_or_user_var
,...)]
[SET col_name
= expr
,...)]
The LOAD DATA INFILE
statement reads rows
from a text file into a table at a very high speed. The filename
must be given as a literal string.
The syntax for the FIELDS
and
LINES
clauses also applies to the
SELECT ... INTO OUTFILE
statement, as
described later in this section. (See also
Section 13.2.7, “SELECT
Syntax”.)
For more information about the efficiency of
INSERT
versus LOAD DATA
INFILE
and speeding up LOAD DATA
INFILE
, see Section 7.2.16, “Speed of INSERT
Statements”.
The character set indicated by the
character_set_database
system variable is
used to interpret the information in the file. SET
NAMES
and the setting of
character_set_client
do not affect
interpretation of input.
Note that it is currently not possible to load data files that
use the ucs2
character set.
As of MySQL 5.1.6, the
character_set_filesystem
system variable
controls the interpretation of the filename.
You can also load data files by using the
mysqlimport utility; it operates by sending a
LOAD DATA INFILE
statement to the server. The
--local
option causes
mysqlimport to read data files from the
client host. You can specify the --compress
option to get better performance over slow networks if the
client and server support the compressed protocol. See
Section 8.12, “mysqlimport — A Data Import Program”.
If you use LOW_PRIORITY
, execution of the
LOAD DATA
statement is delayed until no other
clients are reading from the table.
If you specify CONCURRENT
with a
MyISAM
table that satisfies the condition for
concurrent inserts (that is, it contains no free blocks in the
middle), other threads can retrieve data from the table while
LOAD DATA
is executing. Using this option
affects the performance of LOAD DATA
a bit,
even if no other thread is using the table at the same time.
The LOCAL
keyword, if specified, is
interpreted with respect to the client end of the connection:
If LOCAL
is specified, the file is read
by the client program on the client host and sent to the
server. The file can be given as a full pathname to specify
its exact location. If given as a relative pathname, the
name is interpreted relative to the directory in which the
client program was started.
-
If LOCAL
is not specified, the file must
be located on the server host and is read directly by the
server. The server uses the following rules to locate the
file:
If the filename is an absolute pathname, the server uses
it as given.
If the filename is a relative pathname with one or more
leading components, the server searches for the file
relative to the server's data directory.
If a filename with no leading components is given, the
server looks for the file in the database directory of
the default database.
Note that, in the non-LOCAL
case, these rules
mean that a file named as ./myfile.txt
is
read from the server's data directory, whereas the file named as
myfile.txt
is read from the database
directory of the default database. For example, if
db1
is the default database, the following
LOAD DATA
statement reads the file
data.txt
from the database directory for
db1
, even though the statement explicitly
loads the file into a table in the db2
database:
LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table;
Windows pathnames are specified using forward slashes rather
than backslashes. If you do use backslashes, you must double
them.
For security reasons, when reading text files located on the
server, the files must either reside in the database directory
or be readable by all. Also, to use LOAD DATA
INFILE
on server files, you must have the
FILE
privilege. See
Section 5.7.3, “Privileges Provided by MySQL”.
Using LOCAL
is a bit slower than letting the
server access the files directly, because the contents of the
file must be sent over the connection by the client to the
server. On the other hand, you do not need the
FILE
privilege to load local files.
LOCAL
works only if your server and your
client both have been enabled to allow it. For example, if
mysqld was started with
--local-infile=0
, LOCAL
does
not work. See Section 5.6.4, “Security Issues with LOAD DATA LOCAL
”.
On Unix, if you need LOAD DATA
to read from a
pipe, you can use the following technique (here we load the
listing of the /
directory into a table):
mkfifo /mysql/db/x/x
chmod 666 /mysql/db/x/x
find / -ls > /mysql/db/x/x
mysql -e "LOAD DATA INFILE 'x' INTO TABLE x" x
The REPLACE
and IGNORE
keywords control handling of input rows that duplicate existing
rows on unique key values.
If you specify REPLACE
, input rows replace
existing rows. In other words, rows that have the same value for
a primary key or unique index as an existing row. See
Section 13.2.6, “REPLACE
Syntax”.
If you specify IGNORE
, input rows that
duplicate an existing row on a unique key value are skipped. If
you do not specify either option, the behavior depends on
whether the LOCAL
keyword is specified.
Without LOCAL
, an error occurs when a
duplicate key value is found, and the rest of the text file is
ignored. With LOCAL
, the default behavior is
the same as if IGNORE
is specified; this is
because the server has no way to stop transmission of the file
in the middle of the operation.
If you want to ignore foreign key constraints during the load
operation, you can issue a SET
FOREIGN_KEY_CHECKS=0
statement before executing
LOAD DATA
.
If you use LOAD DATA INFILE
on an empty
MyISAM
table, all non-unique indexes are
created in a separate batch (as for REPAIR
TABLE
). Normally, this makes LOAD DATA
INFILE
much faster when you have many indexes. In some
extreme cases, you can create the indexes even faster by turning
them off with ALTER TABLE ... DISABLE KEYS
before loading the file into the table and using ALTER
TABLE ... ENABLE KEYS
to re-create the indexes after
loading the file. See Section 7.2.16, “Speed of INSERT
Statements”.
LOAD DATA INFILE
is the complement of
SELECT ... INTO OUTFILE
. (See
Section 13.2.7, “SELECT
Syntax”.) To write data from a table to a file,
use SELECT ... INTO OUTFILE
. To read the file
back into a table, use LOAD DATA INFILE
. The
syntax of the FIELDS
and
LINES
clauses is the same for both
statements. Both clauses are optional, but
FIELDS
must precede LINES
if both are specified.
If you specify a FIELDS
clause, each of its
subclauses (TERMINATED BY
,
[OPTIONALLY] ENCLOSED BY
, and
ESCAPED BY
) is also optional, except that you
must specify at least one of them.
If you specify no FIELDS
clause, the defaults
are the same as if you had written this:
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
If you specify no LINES
clause, the defaults
are the same as if you had written this:
LINES TERMINATED BY '\n' STARTING BY ''
In other words, the defaults cause LOAD DATA
INFILE
to act as follows when reading input:
Look for line boundaries at newlines.
Do not skip over any line prefix.
Break lines into fields at tabs.
Do not expect fields to be enclosed within any quoting
characters.
Interpret occurrences of tab, newline, or
‘\
’ preceded by
‘\
’ as literal characters
that are part of field values.
Conversely, the defaults cause SELECT ... INTO
OUTFILE
to act as follows when writing output:
Write tabs between fields.
Do not enclose fields within any quoting characters.
Use ‘\
’ to escape instances
of tab, newline, or ‘\
’ that
occur within field values.
Write newlines at the ends of lines.
Backslash is the MySQL escape character within strings, so to
write FIELDS ESCAPED BY '\\'
, you must
specify two backslashes for the value to be interpreted as a
single backslash.
Note: If you have generated the
text file on a Windows system, you might have to use
LINES TERMINATED BY '\r\n'
to read the file
properly, because Windows programs typically use two characters
as a line terminator. Some programs, such as
WordPad, might use \r
as a
line terminator when writing files. To read such files, use
LINES TERMINATED BY '\r'
.
If all the lines you want to read in have a common prefix that
you want to ignore, you can use LINES STARTING BY
'prefix_string
'
to skip
over the prefix, and anything before it. If
a line does not include the prefix, the entire line is skipped.
Suppose that you issue the following statement:
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test
FIELDS TERMINATED BY ',' LINES STARTING BY 'xxx';
If the data file looks like this:
xxx"abc",1
something xxx"def",2
"ghi",3
The resulting rows will be ("abc",1)
and
("def",2)
. The third row in the file will be
skipped because it does not contain the prefix.
The IGNORE number
LINES
option can be used to ignore lines at the start
of the file. For example, you can use IGNORE 1
LINES
to skip over an initial header line containing
column names:
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test IGNORE 1 LINES;
When you use SELECT ... INTO OUTFILE
in
tandem with LOAD DATA INFILE
to write data
from a database into a file and then read the file back into the
database later, the field- and line-handling options for both
statements must match. Otherwise, LOAD DATA
INFILE
will not interpret the contents of the file
properly. Suppose that you use SELECT ... INTO
OUTFILE
to write a file with fields delimited by
commas:
SELECT * INTO OUTFILE 'data.txt'
FIELDS TERMINATED BY ','
FROM table2;
To read the comma-delimited file back in, the correct statement
would be:
LOAD DATA INFILE 'data.txt' INTO TABLE table2
FIELDS TERMINATED BY ',';
If instead you tried to read in the file with the statement
shown following, it wouldn't work because it instructs
LOAD DATA INFILE
to look for tabs between
fields:
LOAD DATA INFILE 'data.txt' INTO TABLE table2
FIELDS TERMINATED BY '\t';
The likely result is that each input line would be interpreted
as a single field.
LOAD DATA INFILE
can be used to read files
obtained from external sources. For example, many programs can
export data in comma-separate values (CSV) format, such that
lines have fields separated by commas and enclosed within double
quotes. If lines in such a file are terminated by newlines, the
statement shown here illustrates the field- and line-handling
options you would use to load the file:
LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
Any of the field- or line-handling options can specify an empty
string (''
). If not empty, the
FIELDS [OPTIONALLY] ENCLOSED BY
and
FIELDS ESCAPED BY
values must be a single
character. The FIELDS TERMINATED BY
,
LINES STARTING BY
, and LINES
TERMINATED BY
values can be more than one character.
For example, to write lines that are terminated by carriage
return/linefeed pairs, or to read a file containing such lines,
specify a LINES TERMINATED BY '\r\n'
clause.
To read a file containing jokes that are separated by lines
consisting of %%
, you can do this
CREATE TABLE jokes
(a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
joke TEXT NOT NULL);
LOAD DATA INFILE '/tmp/jokes.txt' INTO TABLE jokes
FIELDS TERMINATED BY ''
LINES TERMINATED BY '\n%%\n' (joke);
FIELDS [OPTIONALLY] ENCLOSED BY
controls
quoting of fields. For output (SELECT ... INTO
OUTFILE
), if you omit the word
OPTIONALLY
, all fields are enclosed by the
ENCLOSED BY
character. An example of such
output (using a comma as the field delimiter) is shown here:
"1","a string","100.20"
"2","a string containing a , comma","102.20"
"3","a string containing a \" quote","102.20"
"4","a string containing a \", quote and comma","102.20"
If you specify OPTIONALLY
, the
ENCLOSED BY
character is used only to enclose
values from columns that have a string data type (such as
CHAR
, BINARY
,
TEXT
, or ENUM
):
1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a \" quote",102.20
4,"a string containing a \", quote and comma",102.20
Note that occurrences of the ENCLOSED BY
character within a field value are escaped by prefixing them
with the ESCAPED BY
character. Also note that
if you specify an empty ESCAPED BY
value, it
is possible to inadvertently generate output that cannot be read
properly by LOAD DATA INFILE
. For example,
the preceding output just shown would appear as follows if the
escape character is empty. Observe that the second field in the
fourth line contains a comma following the quote, which
(erroneously) appears to terminate the field:
1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a " quote",102.20
4,"a string containing a ", quote and comma",102.20
For input, the ENCLOSED BY
character, if
present, is stripped from the ends of field values. (This is
true regardless of whether OPTIONALLY
is
specified; OPTIONALLY
has no effect on input
interpretation.) Occurrences of the ENCLOSED
BY
character preceded by the ESCAPED
BY
character are interpreted as part of the current
field value.
If the field begins with the ENCLOSED BY
character, instances of that character are recognized as
terminating a field value only if followed by the field or line
TERMINATED BY
sequence. To avoid ambiguity,
occurrences of the ENCLOSED BY
character
within a field value can be doubled and are interpreted as a
single instance of the character. For example, if
ENCLOSED BY '"'
is specified, quotes are
handled as shown here:
"The ""BIG"" boss" -> The "BIG" boss
The "BIG" boss -> The "BIG" boss
The ""BIG"" boss -> The ""BIG"" boss
FIELDS ESCAPED BY
controls how to write or
read special characters. If the FIELDS ESCAPED
BY
character is not empty, it is used to prefix the
following characters on output:
The FIELDS ESCAPED BY
character
The FIELDS [OPTIONALLY] ENCLOSED BY
character
The first character of the FIELDS TERMINATED
BY
and LINES TERMINATED BY
values
ASCII 0
(what is actually written
following the escape character is ASCII
‘0
’, not a zero-valued byte)
If the FIELDS ESCAPED BY
character is empty,
no characters are escaped and NULL
is output
as NULL
, not \N
. It is
probably not a good idea to specify an empty escape character,
particularly if field values in your data contain any of the
characters in the list just given.
For input, if the FIELDS ESCAPED BY
character
is not empty, occurrences of that character are stripped and the
following character is taken literally as part of a field value.
The exceptions are an escaped ‘0
’
or ‘N
’ (for example,
\0
or \N
if the escape
character is ‘\
’). These
sequences are interpreted as ASCII NUL (a zero-valued byte) and
NULL
. The rules for NULL
handling are described later in this section.
For more information about
‘\
’-escape syntax, see
Section 9.1, “Literal Values”.
In certain cases, field- and line-handling options interact:
If LINES TERMINATED BY
is an empty string
and FIELDS TERMINATED BY
is non-empty,
lines are also terminated with FIELDS TERMINATED
BY
.
-
If the FIELDS TERMINATED BY
and
FIELDS ENCLOSED BY
values are both empty
(''
), a fixed-row (non-delimited) format
is used. With fixed-row format, no delimiters are used
between fields (but you can still have a line terminator).
Instead, column values are written and read using the
display widths of the columns. For example, if a column is
declared as INT(7)
, values for the column
are written using seven-character fields. On input, values
for the column are obtained by reading seven characters.
LINES TERMINATED BY
is still used to
separate lines. If a line does not contain all fields, the
rest of the columns are set to their default values. If you
do not have a line terminator, you should set this to
''
. In this case, the text file must
contain all fields for each row.
Fixed-row format also affects handling of
NULL
values, as described later. Note
that fixed-size format does not work if you are using a
multi-byte character set.
Handling of NULL
values varies according to
the FIELDS
and LINES
options in use:
For the default FIELDS
and
LINES
values, NULL
is
written as a field value of \N
for
output, and a field value of \N
is read
as NULL
for input (assuming that the
ESCAPED BY
character is
‘\
’).
If FIELDS ENCLOSED BY
is not empty, a
field containing the literal word NULL
as
its value is read as a NULL
value. This
differs from the word NULL
enclosed
within FIELDS ENCLOSED BY
characters,
which is read as the string 'NULL'
.
If FIELDS ESCAPED BY
is empty,
NULL
is written as the word
NULL
.
With fixed-row format (which is used when FIELDS
TERMINATED BY
and FIELDS ENCLOSED
BY
are both empty), NULL
is
written as an empty string. Note that this causes both
NULL
values and empty strings in the
table to be indistinguishable when written to the file
because both are written as empty strings. If you need to be
able to tell the two apart when reading the file back in,
you should not use fixed-row format.
Some cases are not supported by LOAD DATA
INFILE
:
Fixed-size rows (FIELDS TERMINATED BY
and
FIELDS ENCLOSED BY
both empty) and
BLOB
or TEXT
columns.
-
If you specify one separator that is the same as or a prefix
of another, LOAD DATA INFILE
cannot
interpret the input properly. For example, the following
FIELDS
clause would cause problems:
FIELDS TERMINATED BY '"' ENCLOSED BY '"'
If FIELDS ESCAPED BY
is empty, a field
value that contains an occurrence of FIELDS
ENCLOSED BY
or LINES TERMINATED
BY
followed by the FIELDS TERMINATED
BY
value causes LOAD DATA
INFILE
to stop reading a field or line too early.
This happens because LOAD DATA INFILE
cannot properly determine where the field or line value
ends.
The following example loads all columns of the
persondata
table:
LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata;
By default, when no column list is provided at the end of the
LOAD DATA INFILE
statement, input lines are
expected to contain a field for each table column. If you want
to load only some of a table's columns, specify a column list:
LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata (col1,col2,...);
You must also specify a column list if the order of the fields
in the input file differs from the order of the columns in the
table. Otherwise, MySQL cannot tell how to match input fields
with table columns.
The column list can contain either column names or user
variables. With user variables, the SET
clause enables you to perform transformations on their values
before assigning the result to columns.
User variables in the SET
clause can be used
in several ways. The following example uses the first input
column directly for the value of t1.column1
,
and assigns the second input column to a user variable that is
subjected to a division operation before being used for the
value of t1.column2
:
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, @var1)
SET column2 = @var1/100;
The SET
clause can be used to supply values
not derived from the input file. The following statement sets
column3
to the current date and time:
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, column2)
SET column3 = CURRENT_TIMESTAMP;
You can also discard an input value by assigning it to a user
variable and not assigning the variable to a table column:
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, @dummy, column2, @dummy, column3);
Use of the column/variable list and SET
clause is subject to the following restrictions:
Assignments in the SET
clause should have
only column names on the left hand side of assignment
operators.
You can use subqueries in the right hand side of
SET
assignments. A subquery that returns
a value to be assigned to a column may be a scalar subquery
only. Also, you cannot use a subquery to select from the
table that is being loaded.
Lines ignored by an IGNORE
clause are not
processed for the column/variable list or
SET
clause.
User variables cannot be used when loading data with
fixed-row format because user variables do not have a
display width.
When processing an input line, LOAD DATA
splits it into fields and uses the values according to the
column/variable list and the SET
clause, if
they are present. Then the resulting row is inserted into the
table. If there are BEFORE INSERT
or
AFTER INSERT
triggers for the table, they are
activated before or after inserting the row, respectively.
If an input line has too many fields, the extra fields are
ignored and the number of warnings is incremented.
If an input line has too few fields, the table columns for which
input fields are missing are set to their default values.
Default value assignment is described in
Section 11.1.4, “Data Type Default Values”.
An empty field value is interpreted differently than if the
field value is missing:
For string types, the column is set to the empty string.
For numeric types, the column is set to
0
.
For date and time types, the column is set to the
appropriate “zero” value for the type. See
Section 11.3, “Date and Time Types”.
These are the same values that result if you assign an empty
string explicitly to a string, numeric, or date or time type
explicitly in an INSERT
or
UPDATE
statement.
TIMESTAMP
columns are set to the current date
and time only if there is a NULL
value for
the column (that is, \N
), or if the
TIMESTAMP
column's default value is the
current timestamp and it is omitted from the field list when a
field list is specified.
LOAD DATA INFILE
regards all input as
strings, so you cannot use numeric values for
ENUM
or SET
columns the
way you can with INSERT
statements. All
ENUM
and SET
values must
be specified as strings.
When the LOAD DATA INFILE
statement finishes,
it returns an information string in the following format:
Records: 1 Deleted: 0 Skipped: 0 Warnings: 0
If you are using the C API, you can get information about the
statement by calling the mysql_info()
function. See Section 25.2.3.34, “mysql_info()
”.
Warnings occur under the same circumstances as when values are
inserted via the INSERT
statement (see
Section 13.2.4, “INSERT
Syntax”), except that LOAD DATA
INFILE
also generates warnings when there are too few
or too many fields in the input row. The warnings are not stored
anywhere; the number of warnings can be used only as an
indication of whether everything went well.
You can use SHOW WARNINGS
to get a list of
the first max_error_count
warnings as
information about what went wrong. See
Section 13.5.4.28, “SHOW WARNINGS
Syntax”.