
Best practices for PySpark orchestration and code packaging [closed]


I'm trying to find the best practices for orchestrating and packaging PySpark ETL jobs, and I have read a lot of articles on the subject. For orchestration, most articles recommend using a main.py entry point passed to the spark-submit command. But at that point every article's definition of what main.py has to do differs. Below are my doubts:

1. Some articles suggest reading the config file in main.py itself, while others suggest passing it as an argument to the spark-submit command, like in this link. Which is the better approach, and why is one better than the other? (See the first sketch after this list.)

2. I want to run a number of ETL jobs sequentially, each one starting only on the success of the previous one. If a job aborts, an email has to be sent, and when the pipeline is resubmitted it should resume from the point of failure. Should I include this functionality as part of the main.py program, or should it live somewhere else? (See the second sketch after this list.)
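To make doubt 1 concrete, here is a minimal sketch of the "pass the config as an argument" approach I mean. The file name, config keys, and job body are all made up for illustration:

    import argparse
    import json

    from pyspark.sql import SparkSession

    def main():
        # The config file is resolved at submit time, not hard-coded in main.py
        parser = argparse.ArgumentParser(description="Generic PySpark ETL entry point")
        parser.add_argument("--config", required=True, help="path to a JSON config file")
        args = parser.parse_args()

        with open(args.config) as f:
            config = json.load(f)  # e.g. {"app_name": ..., "source_path": ..., "target_path": ...}

        spark = SparkSession.builder.appName(config.get("app_name", "etl-job")).getOrCreate()

        # Hypothetical job body: read, transform, write
        df = spark.read.parquet(config["source_path"])
        df.write.mode("overwrite").parquet(config["target_path"])

        spark.stop()

    if __name__ == "__main__":
        main()

This would be submitted with something like: spark-submit main.py --config etl_config.json (and, I assume, with --files etl_config.json in cluster mode so the driver can open the file from its working directory). The alternative I keep seeing is for main.py to open a fixed, hard-coded config path itself.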
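And for doubt 2, here is a rough sketch of what I imagine as an orchestrator kept outside main.py: it runs the jobs sequentially, records the last successful step so a resubmission resumes from the point of failure, and emails on abort. The job names, checkpoint file, and SMTP details are all hypothetical:

    import json
    import smtplib
    import subprocess
    from email.message import EmailMessage
    from pathlib import Path

    JOBS = ["extract.py", "transform.py", "load.py"]  # hypothetical job scripts
    CHECKPOINT = Path("pipeline_checkpoint.json")

    def load_checkpoint():
        # Index of the first job that has not yet succeeded
        if CHECKPOINT.exists():
            return json.loads(CHECKPOINT.read_text())["next_job_index"]
        return 0

    def save_checkpoint(index):
        CHECKPOINT.write_text(json.dumps({"next_job_index": index}))

    def send_failure_email(job):
        msg = EmailMessage()
        msg["Subject"] = f"ETL pipeline aborted at {job}"
        msg["From"] = "etl@example.com"      # hypothetical addresses
        msg["To"] = "oncall@example.com"
        msg.set_content(f"Job {job} failed; resubmit to resume from this step.")
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)

    def main():
        start = load_checkpoint()
        for i, job in enumerate(JOBS[start:], start=start):
            result = subprocess.run(["spark-submit", job])
            if result.returncode != 0:
                save_checkpoint(i)           # resume here on resubmission
                send_failure_email(job)
                raise SystemExit(1)
            save_checkpoint(i + 1)
        CHECKPOINT.unlink(missing_ok=True)   # clean run: clear the checkpoint

    if __name__ == "__main__":
        main()

Is something along these lines reasonable, or does this logic belong inside main.py (or in a dedicated scheduler) instead?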

Your response is appreciated.

