I really need some help here:
We are using Spark 3.1.2 on a standalone cluster. Ever since we started using the s3a directory committer, the stability and performance of our Spark jobs have improved significantly!
Lately, however, we have spent days troubleshooting an issue with this s3a directory committer and are completely baffled. Do you have any idea what is going on?
Our Spark jobs fail because of a Java OOM (or rather, a process limit) error:
An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.base/java.lang.Thread.start0(Native Method)
at java.base/java.lang.Thread.start(Thread.java:803)
at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937)
at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343)
at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118)
at java.base/java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:714)
at org.apache.spark.rpc.netty.DedicatedMessageLoop.$anonfun$new$1(MessageLoop.scala:174)
at org.apache.spark.rpc.netty.DedicatedMessageLoop.$anonfun$new$1$adapted(MessageLoop.scala:173)
at scala.collection.immutable.Range.foreach(Range.scala:158)
at org.apache.spark.rpc.netty.DedicatedMessageLoop.<init>(MessageLoop.scala:173)
at org.apache.spark.rpc.netty.Dispatcher.liftedTree1$1(Dispatcher.scala:75)
at org.apache.spark.rpc.netty.Dispatcher.registerRpcEndpoint(Dispatcher.scala:72)
at org.apache.spark.rpc.netty.NettyRpcEnv.setupEndpoint(NettyRpcEnv.scala:136)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:231)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:394)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:189)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)
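The "process/resource limits reached" part of the message suggests that an OS limit, rather than heap memory, may be involved. As a minimal sketch, assuming a Unix-like driver host, the per-user process/thread limit can be read as follows. The helper name `thread_limits` is made up for illustration, and this is only one of several limits that could produce such an error:

```python
import resource


def thread_limits():
    """Return the (soft, hard) RLIMIT_NPROC values for the current user.

    On Linux, threads count toward this limit. A value equal to
    resource.RLIM_INFINITY means no limit is enforced.
    """
    return resource.getrlimit(resource.RLIMIT_NPROC)


soft, hard = thread_limits()
print("soft limit:", soft)
print("hard limit:", hard)
```

Running this as the same user that launches the Spark driver gives the values that apply to the driver process.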
The Spark thread dump shows over 5000 committer threads on the Spark driver! Here is an example:
Thread ID Thread Name Thread State Thread Locks
1047 s3-committer-pool-0 WAITING
1449 s3-committer-pool-0 WAITING
1468 s3-committer-pool-0 WAITING
1485 s3-committer-pool-0 WAITING
1505 s3-committer-pool-0 WAITING
1524 s3-committer-pool-0 WAITING
1529 s3-committer-pool-0 WAITING
1544 s3-committer-pool-0 WAITING
1549 s3-committer-pool-0 WAITING
1809 s3-committer-pool-0 WAITING
1972 s3-committer-pool-0 WAITING
1998 s3-committer-pool-0 WAITING
2022 s3-committer-pool-0 WAITING
2043 s3-committer-pool-0 WAITING
2416 s3-committer-pool-0 WAITING
2453 s3-committer-pool-0 WAITING
2470 s3-committer-pool-0 WAITING
2517 s3-committer-pool-0 WAITING
2534 s3-committer-pool-0 WAITING
2551 s3-committer-pool-0 WAITING
2580 s3-committer-pool-0 WAITING
2597 s3-committer-pool-0 WAITING
2614 s3-committer-pool-0 WAITING
2631 s3-committer-pool-0 WAITING
2726 s3-committer-pool-0 WAITING
2743 s3-committer-pool-0 WAITING
2763 s3-committer-pool-0 WAITING
2780 s3-committer-pool-0 WAITING
2819 s3-committer-pool-0 WAITING
2841 s3-committer-pool-0 WAITING
2858 s3-committer-pool-0 WAITING
2875 s3-committer-pool-0 WAITING
2925 s3-committer-pool-0 WAITING
2942 s3-committer-pool-0 WAITING
2963 s3-committer-pool-0 WAITING
2980 s3-committer-pool-0 WAITING
3020 s3-committer-pool-0 WAITING
3037 s3-committer-pool-0 WAITING
3055 s3-committer-pool-0 WAITING
3072 s3-committer-pool-0 WAITING
3127 s3-committer-pool-0 WAITING
3144 s3-committer-pool-0 WAITING
3163 s3-committer-pool-0 WAITING
3180 s3-committer-pool-0 WAITING
3222 s3-committer-pool-0 WAITING
3242 s3-committer-pool-0 WAITING
3259 s3-committer-pool-0 WAITING
3278 s3-committer-pool-0 WAITING
3418 s3-committer-pool-0 WAITING
3435 s3-committer-pool-0 WAITING
3452 s3-committer-pool-0 WAITING
3469 s3-committer-pool-0 WAITING
3486 s3-committer-pool-0 WAITING
3491 s3-committer-pool-0 WAITING
3501 s3-committer-pool-0 WAITING
3508 s3-committer-pool-0 WAITING
4029 s3-committer-pool-0 WAITING
4093 s3-committer-pool-0 WAITING
4658 s3-committer-pool-0 WAITING
4666 s3-committer-pool-0 WAITING
4907 s3-committer-pool-0 WAITING
5102 s3-committer-pool-0 WAITING
5119 s3-committer-pool-0 WAITING
5158 s3-committer-pool-0 WAITING
5175 s3-committer-pool-0 WAITING
5192 s3-committer-pool-0 WAITING
5209 s3-committer-pool-0 WAITING
5226 s3-committer-pool-0 WAITING
5395 s3-committer-pool-0 WAITING
5634 s3-committer-pool-0 WAITING
5651 s3-committer-pool-0 WAITING
5668 s3-committer-pool-0 WAITING
5685 s3-committer-pool-0 WAITING
5702 s3-committer-pool-0 WAITING
5722 s3-committer-pool-0 WAITING
5739 s3-committer-pool-0 WAITING
6144 s3-committer-pool-0 WAITING
6167 s3-committer-pool-0 WAITING
6289 s3-committer-pool-0 WAITING
6588 s3-committer-pool-0 WAITING
6628 s3-committer-pool-0 WAITING
6645 s3-committer-pool-0 WAITING
6662 s3-committer-pool-0 WAITING
6675 s3-committer-pool-0 WAITING
6692 s3-committer-pool-0 WAITING
6709 s3-committer-pool-0 WAITING
7049 s3-committer-pool-0 WAITING
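To quantify a dump like the one above, the rows can be tallied by thread name and state. This is a minimal sketch, assuming the dump has been saved as plain text with whitespace-separated columns and that thread names contain no spaces; the function name `count_threads_by_name` is hypothetical:

```python
from collections import Counter


def count_threads_by_name(dump_text):
    """Count rows of a text thread dump, keyed by (thread name, state)."""
    counts = Counter()
    for line in dump_text.splitlines():
        parts = line.split()
        # Data rows look like "<id> <name> <state>"; the header row is
        # skipped because its first token is not a number.
        if len(parts) >= 3 and parts[0].isdigit():
            counts[(parts[1], parts[2])] += 1
    return counts


sample = """Thread ID Thread Name Thread State Thread Locks
1047 s3-committer-pool-0 WAITING
1449 s3-committer-pool-0 WAITING
1468 s3-committer-pool-0 WAITING
"""

for (name, state), n in count_threads_by_name(sample).most_common():
    print(name, state, n)
```

Comparing the tallies from dumps taken at different times would show whether the number of committer threads keeps growing.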
This is despite the fact that our settings should not allow more than 100 threads... or perhaps we are misunderstanding something...
Here are our configurations and settings:
fs.s3a.threads.max 100
fs.s3a.connection.maximum 1000
fs.s3a.committer.threads 16
fs.s3a.max.total.tasks 5
fs.s3a.committer.name directory
fs.s3a.fast.upload.buffer disk
io.file.buffer.size 1048576
mapreduce.outputcommitter.factory.scheme.s3a org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
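For reference, Hadoop options like these are commonly passed through Spark by prefixing each key with `spark.hadoop.`. The sketch below only illustrates that convention and is an assumption about how the settings might be supplied, not a record of our exact setup; the helper name `to_spark_conf` is hypothetical:

```python
def to_spark_conf(hadoop_settings):
    """Prefix Hadoop keys with 'spark.hadoop.' so that Spark copies them
    into the Hadoop Configuration used by the s3a connector."""
    return {f"spark.hadoop.{key}": str(value)
            for key, value in hadoop_settings.items()}


# Values copied from the settings listed above.
settings = {
    "fs.s3a.threads.max": 100,
    "fs.s3a.connection.maximum": 1000,
    "fs.s3a.committer.threads": 16,
    "fs.s3a.max.total.tasks": 5,
    "fs.s3a.committer.name": "directory",
    "fs.s3a.fast.upload.buffer": "disk",
    "io.file.buffer.size": 1048576,
    "mapreduce.outputcommitter.factory.scheme.s3a":
        "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
}

# Print the settings as spark-submit style arguments.
for key, value in to_spark_conf(settings).items():
    print(f"--conf {key}={value}")
```

The resulting dictionary could equally be applied with `SparkConf.set` or `SparkSession.builder.config` in a PySpark application.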
We have tried different versions of the Spark hadoop-cloud library, but the issue is consistently the same.
We would really appreciate it if you could point us in the right direction.