Final answer:
To create a Delta table in Databricks with PySpark, import the required PySpark SQL classes and types, create a SparkSession, define a schema, build a DataFrame from your data, and save it as a Delta table with df.write.format("delta").save().
Step-by-step explanation:
To create a Delta table in Databricks using PySpark, follow the steps outlined below:
- Start by importing the necessary PySpark SQL classes and schema types:
- from pyspark.sql import SparkSession
- from pyspark.sql.types import StructType, StructField, StringType, IntegerType
- Next, create a SparkSession:
- spark = SparkSession.builder.appName("DeltaTableExample").getOrCreate()
- Define the schema for your table:
- schema = StructType([
      StructField("name", StringType(), True),
      StructField("age", IntegerType(), True),
      StructField("city", StringType(), True)
  ])
- Create a DataFrame with some data:
- data = [("John Doe", 30, "New York"), ("Jane Smith", 25, "Los Angeles")]
- # Apply the schema to the data and create a DataFrame
- df = spark.createDataFrame(data, schema)
- To save the DataFrame as a Delta table, use df.write with .format("delta") and call .save() with the path where the table should be stored:
- df.write.format("delta").save("/delta/events")
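- To verify the write, you can read the same path back as a Delta table and display its rows (a quick check reusing the /delta/events path from the step above):
- events_df = spark.read.format("delta").load("/delta/events")
- events_df.show()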
After performing these steps, you will have successfully created a Delta table in Databricks using PySpark.
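Putting it all together, a minimal end-to-end sketch could look like the following. It assumes Delta Lake is available on your cluster (it is by default on Databricks Runtime) and reuses the example path /delta/events; adjust the path for your own workspace.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # On Databricks a SparkSession named `spark` already exists; getOrCreate() reuses it
    spark = SparkSession.builder.appName("DeltaTableExample").getOrCreate()

    # Schema and sample rows
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
        StructField("city", StringType(), True)
    ])
    data = [("John Doe", 30, "New York"), ("Jane Smith", 25, "Los Angeles")]

    # Build the DataFrame and write it out in Delta format
    df = spark.createDataFrame(data, schema)
    df.write.format("delta").save("/delta/events")

    # Read the Delta table back to confirm it was created
    spark.read.format("delta").load("/delta/events").show()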