PYTHON
# NOTE: SparklithEntryPoint and OrchesteraSparkSession come from the
# Orchestera SDK; this import path is illustrative.
from sparklith import SparklithEntryPoint, OrchesteraSparkSession

class IcebergS3Example(SparklithEntryPoint):
    def run(self):
        bucket = "<your-bucket-name>"
        # Iceberg warehouse location on S3 (presumably consumed by the
        # session's catalog configuration)
        warehouse_path = f"s3a://{bucket}/iceberg-warehouse"
        with OrchesteraSparkSession(
            app_name="IcebergS3Example",
            executor_instances=4,
            executor_cores=2,
            executor_memory="8g",
        ) as spark:
            spark.sparkContext.setLogLevel("ERROR")
            # Read sample data from the publicly available Ookla open-data bucket
            df = spark.read.parquet(
                "s3a://ookla-open-data/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet"
            ).limit(1000)
            # Create the Iceberg namespace and table, then write the data
            table_name = "local.example.ookla_performance"
            spark.sql("CREATE NAMESPACE IF NOT EXISTS local.example")
            df.writeTo(table_name).createOrReplace()
            # Read back from the Iceberg table
            iceberg_df = spark.table(table_name)
            iceberg_df.show(5)
            # Show table history (time travel metadata)
            spark.sql(f"SELECT * FROM {table_name}.history").show()
            # Show table snapshots
            spark.sql(f"SELECT * FROM {table_name}.snapshots").show()
Data pipelines break, APIs fail, networks flake, and services crash. That's not your problem anymore. Managing reliability shouldn't mean constant firefighting.
The Orchestera platform scales with your data and your compute needs. It also scales down when you don't need it, so you only pay for what you use, billed directly in your AWS account with no compute markup.
You simply write your data processing logic in the programming languages you already use with our native SDKs and deploy it to the Orchestera platform.
Orchestera handles the rest, including scaling your compute and data storage. It's even optimized to avoid unnecessary data transfers across availability zones and rebalances your pipelines across regions to maintain your SLAs.
Write your data processing logic in the programming languages you already use with our native SDKs. Your days of writing boilerplate code are over.
Infrastructure breaks, nodes crash, jobs time out. Sparklith automatically handles all failures and keeps your data processing running.
Auto-scale from 1 to 1000+ nodes based on workload. Built-in cost optimization ensures you never overspend.
Years of production tuning, built in. Minimize network penalties, optimize I/O, and prevent disk spill.
No more hunting through logs. See the exact state of every job, every cluster, every step.
From model training to data prep, Orchestera keeps your AI workflows resilient and repeatable.
Process data at scale with automatic failure recovery and cost optimization.
Transform petabytes of data reliably. One failed node won't break your entire pipeline.
Train models on massive datasets using Spark without worrying about infrastructure failures. Iterate on model development with Spark in Jupyter notebooks.
Run complex analytics jobs that can recover from any failure and continue where they left off.
Process large datasets overnight with confidence. Wake up to completed jobs, not failures.
Built by engineers who've designed and scaled production systems.
It sounds like magic; we promise it's not.
© 2026 Orchestera Software Services. All Rights Reserved.