
Setup PyCharm For SystemML with PySpark

Install PySpark

1.1 Download Spark package

1.2 Set up SPARK_HOME at ~/.bash_profile

export SPARK_HOME="/Users/mac/Documents/Software/spark-2.1.1-bin-hadoop2.7"
export PATH="$SPARK_HOME/bin:$PATH"

1.3 Activate SPARK_HOME

source ~/.bash_profile
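After sourcing, it is worth confirming the variable is actually visible in the current shell. A quick check (the fallback path is the one from step 1.2 and is illustrative — adjust it to your install):

```shell
# Reload the profile so the exports from step 1.2 take effect
source ~/.bash_profile 2>/dev/null || true

# Fall back to the step-1.2 path if the profile did not set it
SPARK_HOME="${SPARK_HOME:-/Users/mac/Documents/Software/spark-2.1.1-bin-hadoop2.7}"
echo "SPARK_HOME=$SPARK_HOME"
```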

Install PyCharm

Create a Python project

Add PySpark library into the interpreter

Preferences -> Project -> Project Interpreter -> Project Interpreter settings (Figure 1) -> Show paths for the selected interpreter (Figure 2) -> Add the PySpark libraries (Figure 3). The paths to add are $SPARK_HOME/python and the py4j source zip under $SPARK_HOME/python/lib.

Figure 1:
Figure 2:
Figure 3:
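The GUI steps above are equivalent to extending sys.path in code, which can be handy if you want the project to work outside PyCharm as well. A sketch (the Spark path is the one from step 1.2 and is an assumption — adjust it):

```python
import glob
import os
import sys

# Assumed install location from step 1.2; change to your own path
spark_home = "/Users/mac/Documents/Software/spark-2.1.1-bin-hadoop2.7"

# Same effect as PyCharm's "Add path": make pyspark importable
sys.path.append(os.path.join(spark_home, "python"))
# py4j ships inside Spark; the version in the zip name varies by release
sys.path.extend(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))
```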

Create a python file to prepare SparkContext & SparkSession

import os

# Must be set before importing pyspark so it can locate the Spark install
os.environ["SPARK_HOME"] = "/Users/mac/Documents/Software/spark-2.1.1-bin-hadoop2.7"

from pyspark import SparkContext
from pyspark.sql import SparkSession, SQLContext

# Local mode with 4 worker threads
sc = SparkContext(master="local[4]", appName="SystemML_Learning")
# Reuses the SparkContext created above
spark = SparkSession.builder.getOrCreate()
sqlCtx = SQLContext(sc)
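If the hard-coded SPARK_HOME path is wrong, the pyspark import tends to fail with an opaque py4j error. A small, hypothetical fail-fast guard placed before the imports makes the problem obvious (the path is again the step-1.2 install location — adjust it):

```python
import os

# Assumed Spark install location from step 1.2; change to your own path
spark_home = "/Users/mac/Documents/Software/spark-2.1.1-bin-hadoop2.7"
os.environ.setdefault("SPARK_HOME", spark_home)

# Warn early instead of letting "import pyspark" fail later
if not os.path.isdir(os.environ["SPARK_HOME"]):
    print("Warning: SPARK_HOME does not exist:", os.environ["SPARK_HOME"])
```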

Import it into your SystemML program's Python file (this assumes the setup file above is saved as src/pyspark_sc.py and that src is an importable package):

from src.pyspark_sc import *
