Spark
An example of the Spark task type in PyDolphinScheduler, followed by a dive into its API.
Example
"""A example workflow for task spark."""
from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks.spark import DeployMode, ProgramType, Spark
with ProcessDefinition(name="task_spark_example", tenant="tenant_exists") as pd:
task = Spark(
name="task_spark",
main_class="org.apache.spark.examples.SparkPi",
main_package="spark-examples_2.12-3.2.0.jar",
program_type=ProgramType.JAVA,
deploy_mode=DeployMode.LOCAL,
)
pd.run()
Dive Into
Task Spark.
- class pydolphinscheduler.tasks.spark.DeployMode[source]
Bases: str
Spark deploy mode; for now it contains only LOCAL, CLIENT, and CLUSTER. A short usage sketch follows the member list below.
- CLIENT = 'client'
- CLUSTER = 'cluster'
- LOCAL = 'local'
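A minimal sketch of choosing a different deploy mode, modeled on the example above; the workflow and task names here are illustrative assumptions.

from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks.spark import DeployMode, ProgramType, Spark

with ProcessDefinition(name="task_spark_deploy_mode", tenant="tenant_exists") as pd:
    # Run the driver inside the cluster; DeployMode.CLIENT would keep the
    # driver on the submitting machine, DeployMode.LOCAL runs everything locally.
    task = Spark(
        name="task_spark_cluster",  # illustrative task name
        main_class="org.apache.spark.examples.SparkPi",
        main_package="spark-examples_2.12-3.2.0.jar",
        program_type=ProgramType.JAVA,
        deploy_mode=DeployMode.CLUSTER,
    )
    pd.run()

Because DeployMode subclasses str, the members serialize as the plain strings 'local', 'client', and 'cluster'.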
- class pydolphinscheduler.tasks.spark.Spark(name: str, main_class: str, main_package: str, program_type: Optional[ProgramType] = 'SCALA', deploy_mode: Optional[DeployMode] = 'cluster', spark_version: Optional[SparkVersion] = 'SPARK2', app_name: Optional[str] = None, driver_cores: Optional[int] = 1, driver_memory: Optional[str] = '512M', num_executors: Optional[int] = 2, executor_memory: Optional[str] = '2G', executor_cores: Optional[int] = 2, main_args: Optional[str] = None, others: Optional[str] = None, *args, **kwargs)[source]
Bases: Engine
Task Spark object, declaring the behavior of a Spark task submitted to DolphinScheduler; a fuller configuration sketch follows the attribute list below.
- _downstream_task_codes: Set[int]
- _task_custom_attr: set = {'app_name', 'deploy_mode', 'driver_cores', 'driver_memory', 'executor_cores', 'executor_memory', 'main_args', 'num_executors', 'others', 'spark_version'}
- _task_relation: Set[TaskRelation]
- _upstream_task_codes: Set[int]
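As a sketch of the optional parameters in the constructor signature above, the example below tunes driver and executor resources and passes an argument string to the main class. The specific names and values are illustrative assumptions, not recommended settings; the defaults noted in the comments come from the signature itself.

from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks.spark import DeployMode, ProgramType, Spark

with ProcessDefinition(name="task_spark_tuned", tenant="tenant_exists") as pd:
    task = Spark(
        name="task_spark_tuned",
        main_class="org.apache.spark.examples.SparkPi",
        main_package="spark-examples_2.12-3.2.0.jar",
        program_type=ProgramType.JAVA,
        deploy_mode=DeployMode.CLUSTER,
        app_name="spark-pi-tuned",  # application name passed to Spark (assumed value)
        driver_cores=2,             # default is 1
        driver_memory="1G",         # default is 512M
        num_executors=4,            # default is 2
        executor_memory="4G",       # default is 2G
        executor_cores=4,           # default is 2
        main_args="1000",           # argument string passed to the main class
    )
    pd.run()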