
Run the sample script to import the sample data and cube metadata:

/usr/local/service/kylin/bin/sample.sh

In the Kylin web UI, open the learn_kylin project, select the sample cube named kylin_sales_cube, and choose Actions > Build, picking an end date later than 2014-01-01 (so that all 10,000 sample records are covered).
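If you prefer to script this step, the same build can be triggered through Kylin's REST API. The sketch below is illustrative, assuming Kylin listens on its default port 7070 with the default ADMIN/KYLIN account; the end time (epoch milliseconds) corresponds to 2015-01-01, a date later than 2014-01-01:

# Trigger a full build of kylin_sales_cube via the REST API
curl -X PUT \
  -H "Authorization: Basic QURNSU46S1lMSU4=" \
  -H "Content-Type: application/json" \
  -d '{"startTime": 0, "endTime": 1420070400000, "buildType": "BUILD"}' \
  http://localhost:7070/kylin/api/cubes/kylin_sales_cube/build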



Once the build finishes, you can verify the cube with a query such as:

select part_dt, sum(price) as total_sold, count(distinct seller_id) as sellers
from kylin_sales
group by part_dt
order by part_dt
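The same query can also be issued programmatically through Kylin's query REST API; a minimal sketch, again assuming the default port and ADMIN/KYLIN credentials:

# Run the verification query against the learn_kylin project
curl -X POST \
  -H "Authorization: Basic QURNSU46S1lMSU4=" \
  -H "Content-Type: application/json" \
  -d '{"sql": "select part_dt, sum(price) as total_sold, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt", "project": "learn_kylin"}' \
  http://localhost:7070/kylin/api/query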
Set the kylin.env.hadoop-conf-dir property in kylin.properties so that Kylin can locate the Hadoop client configuration:

kylin.env.hadoop-conf-dir=/usr/local/service/hadoop/etc/hadoop
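As a quick sanity check (a sketch assuming the default EMR layout), that directory should contain the usual Hadoop client files:

# Kylin reads the Hadoop client configs from this directory
ls /usr/local/service/hadoop/etc/hadoop
# expect files such as core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml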
Kylin embeds a Spark binary (v2.1.2) under $KYLIN_HOME/spark, and all Spark configuration properties prefixed with kylin.engine.spark-conf. can be managed in $KYLIN_HOME/conf/kylin.properties. These properties are picked up and applied when a Spark job is submitted; for example, if you configure kylin.engine.spark-conf.spark.executor.memory=4G, Kylin will pass --conf spark.executor.memory=4G to spark-submit.

kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1000
kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
kylin.engine.spark-conf.spark.yarn.queue=default
kylin.engine.spark-conf.spark.driver.memory=2G
kylin.engine.spark-conf.spark.executor.memory=4G
kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
kylin.engine.spark-conf.spark.executor.cores=1
kylin.engine.spark-conf.spark.network.timeout=600
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
#kylin.engine.spark-conf.spark.executor.instances=1
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.hadoop.dfs.replication=2
kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress=true
kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec
kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
## uncomment for HDP
#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
On HDP, hdp.version needs to be passed to the Yarn containers as a Java option, so uncomment the last three lines of the snippet above.

To avoid re-uploading the Spark jars to Yarn for every build, you can package them once and put the archive on HDFS:

jar cv0f spark-libs.jar -C $KYLIN_HOME/spark/jars/ .
hadoop fs -mkdir -p /kylin/spark/
hadoop fs -put spark-libs.jar /kylin/spark/
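A quick way to confirm the archive landed where expected (a hypothetical check, not part of the original steps):

# spark-libs.jar should be listed with a non-zero size
hadoop fs -ls /kylin/spark/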
Then add the following to kylin.properties (sandbox.hortonworks.com:8020 is the example NameNode address from the upstream Kylin documentation; substitute your own cluster's NameNode host and port):

kylin.engine.spark-conf.spark.yarn.archive=hdfs://sandbox.hortonworks.com:8020/kylin/spark/spark-libs.jar
All kylin.engine.spark-conf.* parameters can also be overridden at the Cube or Project level, which gives users extra flexibility.
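For instance, a single memory-hungry cube could be given larger executors without touching the server-wide defaults; a hypothetical overwrite added on that cube's Configuration Overwrites page might look like:

kylin.engine.spark-conf.spark.executor.memory=8G
kylin.engine.spark-conf.spark.executor.cores=2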
3. Create and modify the sample cube
Run sample.sh to create the sample cube, then start the Kylin server:

/usr/local/service/kylin/bin/sample.sh
/usr/local/service/kylin/bin/kylin.sh start
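Kylin serves its web UI on port 7070 by default; while it starts up you can follow the log (the path assumes the default installation directory):

# Watch the startup log; the UI becomes available at http://<server-ip>:7070/kylin
tail -f /usr/local/service/kylin/logs/kylin.log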
In the Kylin web UI, edit the cube named kylin_sales and, on the Advanced Setting page, change its Cube Engine from MapReduce to Spark (Beta):

On the Configuration Overwrites page, add the property kylin.engine.spark.rdd-partition-cut-mb with a value of 500.
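As it would appear among the cube's configuration overwrites (per the upstream Kylin documentation, this value is the split size in MB used when cutting cuboid data into Spark RDD partitions; the sample raises it so that size-overestimated measures such as COUNT DISTINCT and TOP_N do not produce too many small partitions):

kylin.engine.spark.rdd-partition-cut-mb=500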


