How to create a Delta table in Databricks using PySpark

1 Answer


Final answer:

To create a Delta table in Databricks with PySpark, import the PySpark SQL classes, create a SparkSession, define the schema, build a DataFrame with your data, and save it as a Delta table with df.write.format("delta").save().

Step-by-step explanation:

To create a Delta table in Databricks using PySpark, follow the steps outlined below:

  • Start by importing the necessary PySpark SQL classes and types:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

  • Next, create a SparkSession:

    spark = SparkSession.builder.appName("DeltaTableExample").getOrCreate()

  • Define the schema for your table:

    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
        StructField("city", StringType(), True)
    ])

  • Create a DataFrame with some data:

    data = [("John Doe", 30, "New York"), ("Jane Smith", 25, "Los Angeles")]

    # Apply the schema to the data and create a DataFrame
    df = spark.createDataFrame(data, schema)

  • To save the DataFrame as a Delta table, call .write.format("delta") and then .save() with a storage path (a managed-table alternative is sketched after this list):

    df.write.format("delta").save("/delta/events")
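
If you prefer a managed table registered in the metastore instead of a path-based table, a minimal alternative sketch is shown below; the table name "events" is a hypothetical choice, and on Databricks Delta is the default table format:

    # Register the DataFrame as a managed Delta table (hypothetical name "events")
    df.write.format("delta").mode("overwrite").saveAsTable("events")

    # The managed table can then be queried with SQL
    spark.sql("SELECT * FROM events").show()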

After performing these steps, you will have successfully created a Delta table in Databricks using PySpark.
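
To verify the result, you can read the Delta table back from storage; a minimal sketch, assuming the /delta/events path used above:

    # Load the Delta table from the path it was written to
    events_df = spark.read.format("delta").load("/delta/events")

    # Inspect the contents
    events_df.show()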
