16/09/25 12:45:35 ERROR spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File does not exist: hdfs://hdfs1:9000/spark-event
        at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
        at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
        at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:93)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:516)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
        at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
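The stack trace shows the failure happens in EventLoggingListener.start: event logging is enabled and points at hdfs://hdfs1:9000/spark-event, but that directory does not exist on HDFS. A setup like this typically comes from conf/spark-defaults.conf; a plausible sketch of the entries involved (only the /spark-event path is confirmed by the error above, the rest is assumed):

# conf/spark-defaults.conf -- hypothetical entries that would produce this error
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://hdfs1:9000/spark-event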
The fix is simply to create the expected directory on HDFS: hadoop fs -mkdir /spark-event
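If parent path components might be missing as well, the -p flag creates them too, and a listing confirms the directory now exists (standard hadoop fs usage, added here for completeness):

hadoop fs -mkdir -p /spark-event
hadoop fs -ls /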
Re-run spark-shell; after it spends some time loading, you will be dropped into the shell prompt:
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/09/25 13:05:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/25 13:06:21 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://192.168.223.151:4040
Spark context available as 'sc' (master = spark://hdfs1:7077, app id = app-20160925130549-0002).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
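The word count in the next step reads hdfs://hdfs1:9000/spark/input/helloworld.txt, so that file must already be on HDFS. If it is not, it can be uploaded first (a sketch; the local file name helloworld.txt is an assumption):

hadoop fs -mkdir -p /spark/input
hadoop fs -put helloworld.txt /spark/input/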
Enter the following code in the shell:
// Read the file from HDFS, split each line into words, map each word to (word, 1), and sum the counts per word
val wordcount = sc.textFile("hdfs://hdfs1:9000/spark/input/helloworld.txt").flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).collect()
wordcount.foreach(println)
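(Note: in the original one-liner the result of foreach(println) was assigned to a var, which only holds Unit; collecting into a val first keeps the counts around for inspection.)

For reference, the same job can also be packaged as a standalone application instead of being typed into the shell. Below is a minimal sketch against the Spark 2.0 API; the object name WordCount and the hard-coded input path mirror this tutorial but are otherwise illustrative assumptions:

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Reuse an existing session if one is running, otherwise create one.
    val spark = SparkSession.builder().appName("WordCount").getOrCreate()
    val sc = spark.sparkContext

    // Same pipeline as the shell code above.
    sc.textFile("hdfs://hdfs1:9000/spark/input/helloworld.txt")
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)

    spark.stop()
  }
}

Packaged as a jar, it would be submitted to the same cluster with spark-submit --master spark://hdfs1:7077.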