
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries

optionbox 2020. 10. 18. 09:20

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries (Spark in Eclipse on Windows 7)


I'm not able to run a simple Spark job in the Scala IDE (a Maven Spark project) installed on Windows 7.

The Spark core dependency has been added.

val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
val sc = new SparkContext(conf)
val logData = sc.textFile("File.txt")
logData.count()

Error:

16/02/26 18:29:33 INFO SparkContext: Created broadcast 0 from textFile at FrameDemo.scala:13
16/02/26 18:29:34 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
    at com.org.SparkDF.FrameDemo$.main(FrameDemo.scala:14)
    at com.org.SparkDF.FrameDemo.main(FrameDemo.scala)

Here is a good explanation of the problem, with a solution:

  1. Download winutils.exe from http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe.
  2. Set the HADOOP_HOME environment variable at the OS level, or programmatically (see the sketch after this list):

    System.setProperty("hadoop.home.dir", "full path to the folder containing winutils");

  3. Enjoy.
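
Tying this back to the question's snippet, a minimal Scala sketch might look like the following. The path C:\winutils is an assumption; substitute whatever folder contains bin\winutils.exe:

    import org.apache.spark.{SparkConf, SparkContext}

    object DemoDF {
      def main(args: Array[String]): Unit = {
        // Assumption: winutils.exe lives at C:\winutils\bin\winutils.exe.
        // This must run before the SparkContext is created, because
        // org.apache.hadoop.util.Shell reads hadoop.home.dir in a static initializer.
        System.setProperty("hadoop.home.dir", "C:\\winutils")

        val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
        val sc = new SparkContext(conf)
        val logData = sc.textFile("File.txt")
        println(logData.count())
        sc.stop()
      }
    }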


  1. Download winutils.exe.
  2. Create a folder, say C:\winutils\bin.
  3. Copy winutils.exe into C:\winutils\bin.
  4. Set the environment variable HADOOP_HOME to C:\winutils (a quick sanity check for this layout is sketched below).
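
As a hedged sanity check (not part of the original answer), a few lines of Scala can confirm the layout before the SparkContext is created:

    import java.io.File

    // HADOOP_HOME (or the hadoop.home.dir property) should point at a folder
    // that contains bin\winutils.exe, e.g. C:\winutils.
    val hadoopHome = sys.env.getOrElse("HADOOP_HOME",
      sys.props.getOrElse("hadoop.home.dir", ""))
    val winutils = new File(hadoopHome, "bin/winutils.exe")
    if (!winutils.isFile)
      println(s"winutils.exe not found at ${winutils.getAbsolutePath}")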

Follow this:

  1. Create a bin folder in any directory (to be used in step 3).

  2. Download winutils.exe and place it in the bin directory.

  3. Now add System.setProperty("hadoop.home.dir", "PATH/TO/THE/DIR"); in your code.


If you see the issue below:

ERROR Shell: Failed to locate the winutils binary in the hadoop binary path

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

then follow these steps:

  1. Download winutils.exe from http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe.
  2. Keep it under the bin folder of a folder you create, e.g. C:\Hadoop\bin.
  3. In the program, add the following line before creating the SparkContext or SparkConf: System.setProperty("hadoop.home.dir", "C:\\Hadoop");

On Windows 10, you should add two different entries:

(1) Add a new variable HADOOP_HOME with the path as its value (e.g. C:\Hadoop) under System Variables.

(2) Add/append a new entry to the "Path" variable: "C:\Hadoop\bin".

The above worked for me.


Setting the Hadoop_Home environment variable in system properties didn't work for me. But this did:

  • Set the Hadoop_Home in the Eclipse Run Configurations environment tab.
  • Follow the 'Windows Environment Setup' from here

I got the same problem while running unit tests. The following workaround allowed me to get rid of the message:

    import java.io.File;

    // Point hadoop.home.dir at the current working directory, then create an
    // empty bin/winutils.exe there so Hadoop's Shell class finds a file to resolve.
    File workaround = new File(".");
    System.getProperties().put("hadoop.home.dir", workaround.getAbsolutePath());
    new File("./bin").mkdirs();
    new File("./bin/winutils.exe").createNewFile(); // declares IOException

from: https://issues.cloudera.org/browse/DISTRO-544


You can alternatively download winutils.exe from GitHub:

https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin

Replace hadoop-2.7.1 with the version you want, and place the file in D:\hadoop\bin.

If you do not have access rights to the environment variable settings on your machine, simply add the below line to your code:

System.setProperty("hadoop.home.dir", "D:\\hadoop");

1) Download winutils.exe from https://github.com/steveloughran/winutils.
2) Create a directory in Windows: C:\winutils\bin.
3) Copy winutils.exe into the above bin folder.
4) Set the property in code (note that hadoop.home.dir takes a plain path, not a file:// URI):
  System.setProperty("hadoop.home.dir", "C:\\winutils");
5) Create a folder C:\temp and give it 777 permissions.
6) Add the config property to the SparkSession, as in the sketch below: .config("spark.sql.warehouse.dir", "file:///C:/temp")
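
A minimal Scala sketch combining these steps; the object name is a hypothetical placeholder, and the paths assume the folders created above:

    import org.apache.spark.sql.SparkSession

    object WinutilsDemo {
      def main(args: Array[String]): Unit = {
        // Assumption: winutils.exe is at C:\winutils\bin\winutils.exe (steps 1-3).
        System.setProperty("hadoop.home.dir", "C:\\winutils")

        val spark = SparkSession.builder()
          .appName("WinutilsDemo") // hypothetical app name
          .master("local[*]")
          .config("spark.sql.warehouse.dir", "file:///C:/temp") // step 6
          .getOrCreate()

        spark.range(5).show()
        spark.stop()
      }
    }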

On top of setting the HADOOP_HOME environment variable to C:\winutils on Windows, you also need to make sure you are an administrator of the machine. If not, and adding environment variables prompts you for admin credentials (even under USER variables), then those variables will only take effect once you start your command prompt as administrator.


I have also faced a similar problem, with the following setup: Java 1.8.0_121, Spark spark-1.6.1-bin-hadoop2.6, Windows 10, and Eclipse Oxygen. When I ran my WordCount.java in Eclipse using HADOOP_HOME as a system variable, as mentioned in the previous post, it did not work. What worked for me is:

System.setProperty("hadoop.home.dir", "PATH/TO/THE/DIR");

where PATH/TO/THE/DIR/bin contains winutils.exe. This works whether you run within Eclipse as a Java application or via spark-submit from cmd, using:

spark-submit --class groupid.artifactid.classname --master local[2] <path to the jar file created using maven> <path to a demo test file> <path to output directory>

Example: go to the bin folder of your Spark home and execute spark-submit as mentioned:

D:\BigData\spark-2.3.0-bin-hadoop2.7\bin>spark-submit --class com.bigdata.abdus.sparkdemo.WordCount --master local[1] D:\BigData\spark-quickstart\target\spark-quickstart-0.0.1-SNAPSHOT.jar D:\BigData\spark-quickstart\wordcount.txt


That's a tricky one... your drive letter must be a capital letter, for example "C:\...".

Reference: https://stackoverflow.com/questions/35652665/java-io-ioexception-could-not-locate-executable-null-bin-winutils-exe-in-the-ha
