I am using the h2o-3 Java repo to load the frame shown below, but I keep running into memory issues with constant GC pressure.
The frame itself is only 3.31 GB according to the H2O logs, but peak JVM usage reaches about 60G while the frame is being read.
Loaded H2O frame: Frame key: TestData.csv
cols: 64
rows: 16375903
chunks: 64
size: 3552977559
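As a sanity check on the size reported above, here is a minimal sketch that converts the byte count to GiB and works out the average bytes per row and per cell. The class and method names (`FrameSizeCheck`, `bytesToGiB`, `bytesPerRow`) are hypothetical helpers for illustration and are not part of h2o-3; the numbers are copied from the frame summary.

```java
// Hypothetical helper for illustration only; not part of the h2o-3 API.
final class FrameSizeCheck {

    /** Converts a byte count to GiB (1 GiB = 1024^3 bytes). */
    static double bytesToGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    /** Average number of bytes used per row. */
    static double bytesPerRow(long bytes, long rows) {
        return (double) bytes / rows;
    }

    public static void main(String[] args) {
        // Values taken from the "Loaded H2O frame" summary above.
        long sizeBytes = 3552977559L;
        long rows = 16375903L;
        int cols = 64;

        double perRow = bytesPerRow(sizeBytes, rows);
        System.out.printf("frame size: %.2f GiB%n", bytesToGiB(sizeBytes));
        System.out.printf("avg bytes per row: %.1f%n", perRow);
        System.out.printf("avg bytes per cell: %.2f%n", perRow / cols);
    }
}
```

This only restates the figures in the frame summary; it says nothing about how much temporary memory the parser needs while building the frame.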
[4572.372s][info ][gc,heap ] GC(751) PSYoungGen: 37597471K(37605888K)->133505K(37607424K) Eden: 37463552K(37463552K)->0K(37465600K) From: 133919K(142336K)->133505K(141824K)
[4572.372s][info ][gc,heap ] GC(751) ParOldGen: 37023812K(75497472K)->37023812K(75497472K)
[4572.372s][info ][gc,metaspace ] GC(751) Metaspace: 149267K(150464K)->149267K(150464K) NonClass: 128211K(128896K)->128211K(128896K) Class: 21056K(21568K)->21056K(21568K)
[4572.372s][info ][gc ] GC(751) Pause Young (Allocation Failure) 72872M->36286M(110454M) 34.508ms
[4572.372s][info ][gc,cpu ] GC(751) User=0.41s Sys=0.01s Real=0.04s
[4573.512s][info ][gc,start ] GC(752) Pause Young (Allocation Failure)
[4573.547s][info ][gc,heap ] GC(752) PSYoungGen: 37599105K(37607424K)->134243K(37606912K) Eden: 37465600K(37465600K)->0K(37465600K) From: 133505K(141824K)->134243K(141312K)
[4573.547s][info ][gc,heap ] GC(752) ParOldGen: 37023812K(75497472K)->37025404K(75497472K)
[4573.547s][info ][gc,metaspace ] GC(752) Metaspace: 149267K(150464K)->149267K(150464K) NonClass: 128211K(128896K)->128211K(128896K) Class: 21056K(21568K)->21056K(21568K)
[4573.547s][info ][gc ] GC(752) Pause Young (Allocation Failure) 72873M->36288M(110453M) 34.940ms
[4573.547s][info ][gc,cpu ] GC(752) User=0.42s Sys=0.01s Real=0.04s
[4574.694s][info ][gc,start ] GC(753) Pause Young (Allocation Failure)
[4574.730s][info ][gc,heap ] GC(753) PSYoungGen: 37599843K(37606912K)->140646K(37603840K) Eden: 37465600K(37465600K)->0K(37463040K) From: 134243K(141312K)->140646K(140800K)
[4574.730s][info ][gc,heap ] GC(753) ParOldGen: 37025404K(75497472K)->37030581K(75497472K)
[4574.730s][info ][gc,metaspace ] GC(753) Metaspace: 149268K(150464K)->149268K(150464K) NonClass: 128211K(128896K)->128211K(128896K) Class: 21056K(21568K)->21056K(21568K)
[4574.730s][info ][gc ] GC(753) Pause Young (Allocation Failure) 72876M->36300M(110450M) 36.442ms
[4574.730s][info ][gc,cpu ] GC(753) User=0.45s Sys=0.00s Real=0.03s
[4575.808s][info ][gc,start ] GC(754) Pause Young (Allocation Failure)
[4575.844s][info ][gc,heap ] GC(754) PSYoungGen: 37603686K(37603840K)->135682K(37605888K) Eden: 37463040K(37463040K)->0K(37463040K) From: 140646K(140800K)->135682K(142848K)
[4575.844s][info ][gc,heap ] GC(754) ParOldGen: 37030581K(75497472K)->37038668K(75497472K)
[4575.844s][info ][gc,metaspace ] GC(754) Metaspace: 149268K(150464K)->149268K(150464K) NonClass: 128211K(128896K)->128211K(128896K) Class: 21056K(21568K)->21056K(21568K)
[4575.844s][info ][gc ] GC(754) Pause Young (Allocation Failure) 72885M->36303M(110452M) 36.014ms
[4575.844s][info ][gc,cpu ] GC(754) User=0.44s Sys=0.01s Real=0.03s
[4576.901s][info ][gc,start ] GC(755) Pause Young (Allocation Failure)
[4576.936s][info ][gc,heap ] GC(755) PSYoungGen: 37598450K(37605888K)->134747K(37606912K) Eden: 37462768K(37463040K)->0K(37464576K) From: 135682K(142848K)->134747K(142336K)
[4576.936s][info ][gc,heap ] GC(755) ParOldGen: 37038668K(75497472K)->37040005K(75497472K)
[4576.936s][info ][gc,metaspace ] GC(755) Metaspace: 149268K(150464K)->149268K(150464K) NonClass: 128211K(128896K)->128211K(128896K) Class: 21056K(21568K)->21056K(21568K)
[4576.936s][info ][gc ] GC(755) Pause Young (Allocation Failure) 72887M->36303M(110453M) 34.523ms
[4576.936s][info ][gc,cpu ] GC(755) User=0.42s Sys=0.00s Real=0.04s
[4577.987s][info ][gc,start ] GC(756) Pause Young (Allocation Failure)
[4578.021s][info ][gc,heap ] GC(756) PSYoungGen: 37599323K(37606912K)->135186K(37606400K) Eden: 37464576K(37464576K)->0K(37464576K) From: 134747K(142336K)->135186K(141824K)
[4578.021s][info ][gc,heap ] GC(756) ParOldGen: 37040005K(75497472K)->37040005K(75497472K)
[4578.021s][info ][gc,metaspace ] GC(756) Metaspace: 149268K(150464K)->149268K(150464K) NonClass: 128211K(128896K)->128211K(128896K) Class: 21056K(21568K)->21056K(21568K)
[4578.021s][info ][gc ] GC(756) Pause Young (Allocation Failure) 72889M->36303M(110453M) 34.372ms
[4578.021s][info ][gc,cpu ] GC(756) User=0.42s Sys=0.01s Real=0.03s
[4579.074s][info ][gc,start ] GC(757) Pause Young (Allocation Failure)
[4579.109s][info ][gc,heap ] GC(757) PSYoungGen: 37599762K(37606400K)->136628K(37607424K) Eden: 37464576K(37464576K)->0K(37466112K) From: 135186K(141824K)->136628K(141312K)
[4579.109s][info ][gc,heap ] GC(757) ParOldGen: 37040005K(75497472K)->37040005K(75497472K)
[4579.109s][info ][gc,metaspace ] GC(757) Metaspace: 149268K(150464K)->149268K(150464K) NonClass: 128211K(128896K)->128211K(128896K) Class: 21056K(21568K)->21056K(21568K)
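To make the pattern in these logs easier to see, here is a minimal sketch that parses the `Pause Young (Allocation Failure)` summary lines and reports how much heap each collection frees and how much is still occupied afterwards. The class name `GcPauseSummary` and its methods are hypothetical, and the regular expression assumes the exact line layout shown above; the sample lines are copied from the log.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; assumes the log layout shown above.
final class GcPauseSummary {

    // Matches the per-collection summary line, e.g.
    // "... GC(751) Pause Young (Allocation Failure) 72872M->36286M(110454M) 34.508ms"
    private static final Pattern PAUSE = Pattern.compile(
            "GC\\((\\d+)\\) Pause Young \\(Allocation Failure\\) "
            + "(\\d+)M->(\\d+)M\\((\\d+)M\\) ([0-9.]+)ms");

    /** Heap freed by one collection in M, or -1 if the line is not a pause summary. */
    static long reclaimedMb(String line) {
        Matcher m = PAUSE.matcher(line);
        if (!m.find()) {
            return -1;
        }
        return Long.parseLong(m.group(2)) - Long.parseLong(m.group(3));
    }

    /** Heap still occupied after the collection in M, or -1 if not a pause summary. */
    static long liveAfterMb(String line) {
        Matcher m = PAUSE.matcher(line);
        if (!m.find()) {
            return -1;
        }
        return Long.parseLong(m.group(3));
    }

    public static void main(String[] args) {
        // Sample lines copied from the log above; the gc,start line is skipped.
        String[] lines = {
            "[4572.372s][info ][gc ] GC(751) Pause Young (Allocation Failure) 72872M->36286M(110454M) 34.508ms",
            "[4573.512s][info ][gc,start ] GC(752) Pause Young (Allocation Failure)",
            "[4573.547s][info ][gc ] GC(752) Pause Young (Allocation Failure) 72873M->36288M(110453M) 34.940ms",
        };
        for (String line : lines) {
            long reclaimed = reclaimedMb(line);
            if (reclaimed >= 0) {
                System.out.println("reclaimed " + reclaimed + "M, occupied after GC "
                        + liveAfterMb(line) + "M");
            }
        }
    }
}
```

Read this way, the excerpt shows a young collection roughly every second, each freeing about 36G while about 36G stays occupied, and ParOldGen holding steady at roughly 37,000,000K.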
Given the above logs, what configuration can I use to relieve the memory pressure on the JVM? I am parsing this large data frame in H2O on EMR Serverless, with 108G of memory and 16 cores on the executor, but I still run into memory issues while parsing this single large file.