
I'm using Spark 3 and I'm observing that the cached partitions are getting dropped from memory. This is what I'm doing:

  1. caching a df (df1)
  2. applying a filter to the cached df, assigning the result to a new df (df2), and caching that as well

```python
from pyspark.sql.functions import col

df1 = spark.table("table1").cache()                    # fraction cached reaches 100% and then drops down
df2 = df1.filter(col("id").isin(1, 2, 3, 4)).cache()   # fraction cached slowly climbs back up
```

There is sufficient memory for both execution and storage: no spills observed, no evictions observed. Still, after the cached fraction of df1 reaches 98-100%, it starts dropping, and the percentage slowly picks up again as the second df is being cached. It looks as if Spark is recomputing the cache of the first df.
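For reference, this is roughly how I am materializing and inspecting the caches between the two steps (the `count()` actions and the table/column names are placeholders, not my exact production code):

```python
from pyspark.sql.functions import col

# Hypothetical sketch of the workflow: table/column names and the count()
# actions are illustrative only.
df1 = spark.table("table1").cache()
df1.count()              # action that fully materializes df1's cached partitions

print(df1.storageLevel)  # confirms df1 is persisted (cache() defaults to memory + disk)

df2 = df1.filter(col("id").isin(1, 2, 3, 4)).cache()
df2.count()              # materializes df2; I expected this to read from df1's cache
```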

Why are the cached partitions getting dropped? Why is it recomputing again? Any ideas?
