setup local development environment for pyspark in Windows

in #spark6 years ago
  1. Install Java(8+)
    https://www.java.com/en/download/help/download_options.xml#windows
    https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

    Add to Environment Variables
    JAVA_HOME C:\Programs Files\jdk1.8.0_192
    add to Path C:\Program Files\Java\jdk1.8.0_191\bin
    C:\Program Files\Java\jre1.8.0_191\bin

    1. Setup Hadoop winutils:
      Download from this GitHub repo https://github.com/steveloughran/winutils
      Copy bin folder under Hadoop 2.7.1 to your location. For example C:\ProgramData\wintuils
      Add to Environment Variables
      HADOOP_HOME C:\ProgramData\winutils
      add to Path %HADOOP_HOME%\bin

    2. Install Anaconda(5.3)
      https://www.anaconda.com/download/
      Downgrade python to 3.6.5 since python 3.7 may not compatible with some packages

    3. Install Spark(2.3.2 recommended) with Hadoop 2.7
      Download .tgz from https://www.apache.org/dyn/closer.lua/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz

    You might need to install 7zip tp unzip .tgz, move unzipped folder to a separate location, for example, C:\

    Add to Environment Variables
    SPARK_HOME C:\spark-2.3.2-bin-hadoop2.7
    Add to Path %SPARK_HOME%\bin
    To verify it, open Command Prompt, cd to bin folder of SPARK_HOME, and type pyspark

    1. Install Pycharm
      https://www.jetbrains.com/pycharm/download/#section=windows
      set up interpreter in pycharm, using Conda Environment and adding packages by click "+"

    2. For command line run, using pip install -r requirements.txt
      Using pip freeze > requirements.txt when updating dependencies
      Or open Pycharm the Settings/Preferences dialog (Ctrl+Alt+S) and select Tools | Python Integrated Tools.
      In the Package requirements file field, type the name of the requirements file or click the browse button and locate the desired file.

    Click OK to save the changes.

Sort:  

Congratulations @weileng! You received a personal award!

1 Year on Steemit

Click here to view your Board of Honor

Support SteemitBoard's project! Vote for its witness and get one more award!

Congratulations @weileng! You received a personal award!

Happy Birthday! - You are on the Steem blockchain for 2 years!

You can view your badges on your Steem Board and compare to others on the Steem Ranking

Vote for @Steemitboard as a witness to get one more award and increased upvotes!