Two ways to load mysql tables into hdfs via spark
There are two ways to load MySQL tables into HDFS via Spark and then process the data.

- Load MySQL tables: use JdbcRDD directly
package org.apache.spark.examples.sql

import java.sql.{Connection, DriverManager, ResultSet}
import java.util.HashMap
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

object LoadFromMysql {
  // Make a column value safe for tab-separated output.
  // (The original also chained a replace on "&" whose target string was
  // garbled in extraction, so that call is omitted here.)
  def escape(ori: String) = {
    if (ori != null) {
      ori.replace("\t", " ").replace("\n", " ")
    } else {
      ori
    }
  }

  def main(args: Array[String]) {
    if (args.length != 6) {
      // The argument names in the original usage string were lost in extraction;
      // the placeholders below are illustrative only.
      System.err.println("Usage: LoadFromMysql <arg1> <arg2> <arg3> <arg4> <arg5> <arg6>")
      System.exit(1)
    }
    // The rest of main (the JdbcRDD construction) was truncated in the source.
  }
}
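The body of `main` that actually builds the JdbcRDD was cut off above. As a minimal sketch of what that step looks like with the Spark 1.x `JdbcRDD` API: each partition binds its id range into the two `?` placeholders of the query, and the rows are written to HDFS as tab-separated text. The URL, table name, columns, bounds, and output path here are assumptions for illustration, not values from the original post.

```scala
import java.sql.DriverManager
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

object JdbcRddSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("JdbcRddSketch"))
    val url = "jdbc:mysql://host:3306/db" // assumed connection URL

    val rdd = new JdbcRDD(
      sc,
      () => DriverManager.getConnection(url, "user", "password"),
      // The two '?' placeholders receive each partition's id range.
      "SELECT id, name FROM some_table WHERE id >= ? AND id <= ?",
      1L,       // lower bound of the partition column
      1000000L, // upper bound of the partition column
      10,       // number of partitions (parallel JDBC reads)
      r => (r.getLong(1), r.getString(2)))

    // Serialize each row as one tab-separated line and write to HDFS.
    rdd.map { case (id, name) => s"$id\t$name" }
       .saveAsTextFile("hdfs:///path/to/output")
  }
}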
- Load MySQL tables: SQLContext.load, then save the table in Parquet format

The SQLContext way is also based on JdbcRDD; Spark just provides more Parquet support in SQLContext.
package org.apache.spark.examples.sql

import java.sql.{Connection, DriverManager, ResultSet}
import java.util.HashMap
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.SQLContext

/**
 * @author ChenFangFang
 */
object LoadFromMysql_SqlContext {
  def main(args: Array[String]) {
    if (args.length != 6) {
      // The argument names in the original usage string were lost in extraction;
      // the placeholders below are illustrative only.
      System.err.println("Usage: LoadFromMysql_SqlContext <arg1> <arg2> <arg3> <arg4> <arg5> <arg6>")
      System.exit(1)
    }

    // The part that loads from MySQL and saves as Parquet was truncated in the
    // source; sc is the SparkContext created in that missing section.
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    // Read the Parquet output back and query it.
    val df = sqlContext.parquetFile(...).toDF()
    df.registerTempTable("parquetTable")
    sqlContext.sql("SELECT * FROM parquetTable WHERE id = 1").collect().foreach(println)
  }
}
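The step that loads the MySQL table and writes Parquet was also truncated above. A minimal sketch of it with the Spark 1.3-era API (`SQLContext.load` with the `"jdbc"` source, then `DataFrame.saveAsParquetFile`); the URL, table name, partition column, bounds, and output path are assumptions for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SaveMysqlTableAsParquet {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("SaveMysqlTableAsParquet"))
    val sqlContext = new SQLContext(sc)

    // Load the table through the JDBC data source; reads are split into
    // numPartitions ranges over partitionColumn between lowerBound and upperBound.
    val df = sqlContext.load("jdbc", Map(
      "url" -> "jdbc:mysql://host:3306/db?user=user&password=password", // assumed URL
      "dbtable" -> "some_table",  // assumed table name
      "partitionColumn" -> "id",  // assumed numeric column
      "lowerBound" -> "1",
      "upperBound" -> "1000000",
      "numPartitions" -> "10"))

    // Write to HDFS in Parquet format (Spark 1.3 API; later versions use df.write.parquet).
    df.saveAsParquetFile("hdfs:///path/to/parquet_output")
  }
}
```

Once saved as Parquet, the data can be read back and queried with `sqlContext.parquetFile`, as the preceding snippet shows.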