Yang, Tong 3f5759eed2 MRS comp-lts 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2023-01-19 17:08:45 +00:00


<a name="mrs_01_1700"></a>
<h1 class="topictitle1">Why Does an ArrayIndexOutOfBoundsException Occur During FileInputFormat Split Computation?</h1>
<div id="body8662426"><div class="section" id="mrs_01_1700__en-us_topic_0000001173949232_sd6cfe94a8277481e9cebd708c59d735f"><h4 class="sectiontitle">Question</h4><p id="mrs_01_1700__en-us_topic_0000001173949232_ac1d0f85c88c549a19f990308df4feef7">When the getSplits method of FileInputFormat is called (in the following log, by Spark's HadoopRDD while computing partitions), ArrayIndexOutOfBoundsException: 0 is thrown:</p>
<pre class="screen" id="mrs_01_1700__en-us_topic_0000001173949232_s7645825abdf242d89795e426b4163991">java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:708)
at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:675)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:210)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)</pre>
</div>
<div class="section" id="mrs_01_1700__en-us_topic_0000001173949232_sbb1b6b67f3d8491f8579feeb968fbd5c"><h4 class="sectiontitle">Answer</h4><p id="mrs_01_1700__en-us_topic_0000001173949232_a159ffeeaa3a84a83b3d03bac23ce2660">For the affected block, the rack topology entries are as follows: /default/rack0/:,/default/rack0/datanodeip:port. A healthy entry has the form /default/rack0/datanodeip:port, whereas the first entry contains only the colon separator, with no DataNode IP address or port.</p>
<p id="mrs_01_1700__en-us_topic_0000001173949232_a3f193adfb98e4d8ba2e63a645109fb87">This problem occurs when a block is corrupted or lost, which leaves the IP address and port of the machine hosting the block empty. When it occurs, run <strong id="mrs_01_1700__en-us_topic_0000001173949232_b13503171315181">hdfs fsck</strong> to check the health of the file's blocks, delete the corrupted blocks or restore the missing ones, and then rerun the task.</p>
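<p>The following Java sketch illustrates why an entry such as /default/rack0/: can lead to ArrayIndexOutOfBoundsException: 0. It assumes, as in the Apache Hadoop source we are aware of, that FileInputFormat.identifyHosts uses the last component of each topology path as the node name and strips the port with split(":")[0]; the code in your MRS version may differ. The class TopologyPathDemo and its methods, including the hasEmptyHost pre-check, are hypothetical and are not part of the Hadoop API.</p>

```java
// Simplified sketch, NOT the Hadoop source: it assumes the node name is the last
// component of a topology path and that the port is stripped with split(":")[0].
class TopologyPathDemo {

    // Returns the last path component, e.g. "datanodeip:port" or ":".
    static String leafName(String topologyPath) {
        int slash = topologyPath.lastIndexOf('/');
        return topologyPath.substring(slash + 1);
    }

    // Strips the port as the assumed split(":")[0] logic does. For the leaf ":",
    // split() removes both trailing empty strings and returns an empty array,
    // so reading index 0 throws ArrayIndexOutOfBoundsException.
    static String hostWithoutPort(String topologyPath) {
        return leafName(topologyPath).split(":")[0];
    }

    // Hypothetical pre-check: true when no host name precedes the colon.
    static boolean hasEmptyHost(String topologyPath) {
        String[] parts = leafName(topologyPath).split(":");
        return parts.length == 0 || parts[0].isEmpty();
    }

    public static void main(String[] args) {
        String[] paths = {"/default/rack0/datanodeip:port", "/default/rack0/:"};
        for (String path : paths) {
            if (hasEmptyHost(path)) {
                System.out.println(path + ": DataNode IP address and port are missing");
            } else {
                System.out.println(path + ": host " + hostWithoutPort(path));
            }
        }
        try {
            hostWithoutPort("/default/rack0/:");
        } catch (ArrayIndexOutOfBoundsException e) {
            // The message text differs between JDK versions, so print the class name only.
            System.out.println("Reproduced " + e.getClass().getSimpleName());
        }
    }
}
```

<p>Java's String.split drops trailing empty strings, so ":".split(":") returns an empty array and index 0 is out of bounds, while an intact entry yields the host name. Checking for such entries only confirms the symptom; the corrupted or missing block still has to be repaired as described above.</p>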
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1690.html">FAQ</a></div>
</div>
</div>