The following example uses parquet for cloudFiles.format. Use csv, avro, or json for other file sources. All other read and write settings stay the same; each format uses its default behaviors.
(spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "parquet")
  # The schema location directory keeps track of your data schema over time
  .option("cloudFiles.schemaLocation", "<path-to-checkpoint>")
  .load("<path-to-source-data>")
  .writeStream
  .option("checkpointLocation", "<path-to-checkpoint>")
  .start("<path-to-target>")
)
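As a sketch of the same pattern with a different source format, the snippet below swaps in csv; the paths are placeholders and the header option is an assumption about the source files, not a required setting.

(spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "csv")
  # Assumption: the source CSV files include a header row
  .option("header", "true")
  # The schema location directory keeps track of your data schema over time
  .option("cloudFiles.schemaLocation", "<path-to-checkpoint>")
  .load("<path-to-source-data>")
  .writeStream
  .option("checkpointLocation", "<path-to-checkpoint>")
  .start("<path-to-target>")
)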