The following example uses `parquet` for the `cloudFiles.format` option. Use `csv`, `avro`, or `json` for other file sources. All other read and write settings stay the same and use the default behaviors for each format.
```python
(spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "parquet")
  # The schema location directory keeps track of your data schema over time
  .option("cloudFiles.schemaLocation", "<path-to-checkpoint>")
  .load("<path-to-source-data>")
  .writeStream
  .option("checkpointLocation", "<path-to-checkpoint>")
  .start("<path-to-target>")
)
```
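As a minimal sketch of switching formats, the same pipeline can read CSV by changing only the `cloudFiles.format` value; the placeholder paths below are illustrative, not actual locations.

```python
# Same Auto Loader pipeline as above, reading CSV instead of Parquet.
# Only the cloudFiles.format option changes; the placeholder paths are
# assumptions for illustration and should be replaced with real locations.
(spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "csv")
  .option("cloudFiles.schemaLocation", "<path-to-checkpoint>")
  .load("<path-to-source-data>")
  .writeStream
  .option("checkpointLocation", "<path-to-checkpoint>")
  .start("<path-to-target>")
)
```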