Apache Spark Code Tool

The Apache Spark Code tool is a code editor that creates an Apache Spark context and executes Apache Spark commands directly from Designer. This tool uses the R programming language.

For additional information, see Apache Spark Direct, Apache Spark on Databricks, and Apache Spark on Microsoft Azure HDInsight.

Connect to Apache Spark

Option 1

Connect to your Apache Spark cluster directly.

Drag a Connect In-DB Tool or Data Stream In Tool onto the canvas.
Select the Connection Name dropdown arrow and select Manage connection.

Option 2

Alternatively, connect directly with the Apache Spark Code tool.

Drag the Apache Spark Code tool onto the canvas.
Under Data Connection, select the Connection Name dropdown arrow and select Manage connection.

Both methods bring up the Manage In-DB Connections window. In Manage In-DB Connections, select a Data Source.

Code Editor

With an Apache Spark Direct connection established, the Code Editor activates. Use Insert Code to generate template functions in the code editor.

Scala

Import Library creates an import statement.

import package

Read Data creates a readAlteryxData function to return the incoming data as an Apache SparkSQL DataFrame.

val dataFrame = readAlteryxData(1)

Write Data creates a writeAlteryxData function to output an Apache SparkSQL DataFrame.

writeAlteryxData(dataFrame, 1)

Log Message creates a logAlteryxMessage function to write a string to the log as a message.

logAlteryxMessage("Example message")

Log Warning creates a logAlteryxWarning function to write a string to the log as a warning.

logAlteryxWarning("Example warning")

Log Error creates a logAlteryxError function to write a string to the log as an error.

logAlteryxError("Example error")

Python

Import Library creates an import statement.

from module import library

Read Data creates a readAlteryxData function to return the incoming data as an Apache SparkSQL DataFrame.

dataFrame = readAlteryxData(1)

Write Data creates a writeAlteryxData function to output an Apache SparkSQL DataFrame.

writeAlteryxData(dataFrame, 1)

Log Message creates a logAlteryxMessage function to write a string to the log as a message.

logAlteryxMessage("Example message")

Log Warning creates a logAlteryxWarning function to write a string to the log as a warning.

logAlteryxWarning("Example warning")

Log Error creates a logAlteryxError function to write a string to the log as an error.

logAlteryxError("Example error")
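
The Python templates above can be combined into one short script. The following is a minimal sketch, not the tool's own implementation. It assumes the integer argument identifies the input or output connection, and it takes the tool's functions as parameters so the logic can also be exercised outside Designer. The helper name pass_through and its parameter names are hypothetical.

```python
def pass_through(read_data, write_data, log_message, log_error, connection=1):
    """Read one input, log the outcome, and write the same data back out.

    read_data, write_data, log_message, and log_error stand in for
    readAlteryxData, writeAlteryxData, logAlteryxMessage, and logAlteryxError.
    Treating `connection` as a connection number is an assumption.
    """
    try:
        data_frame = read_data(connection)
    except Exception as exc:
        # Report the failure through the error log, then re-raise it.
        log_error("Could not read input {}: {}".format(connection, exc))
        raise
    log_message("Read input {}".format(connection))
    write_data(data_frame, connection)
    return data_frame
```

Inside the Apache Spark Code tool, the call would presumably look like pass_through(readAlteryxData, writeAlteryxData, logAlteryxMessage, logAlteryxError).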

R

Import Library creates an import statement.

library(jsonlite)

Read Data creates a readAlteryxData function to return the incoming data as an Apache SparkSQL DataFrame.

dataFrame <- readAlteryxData(1)

Write Data creates a writeAlteryxData function to output an Apache SparkSQL DataFrame.

writeAlteryxData(dataFrame, 1)

Log Message creates a logAlteryxMessage function to write a string to the log as a message.

logAlteryxMessage("Example message")

Log Warning creates a logAlteryxWarning function to write a string to the log as a warning.

logAlteryxWarning("Example warning")

Log Error creates a logAlteryxError function to write a string to the log as an error.

logAlteryxError("Example error")

Import Code

Use Import Code to pull in code created externally.

From File opens a File Explorer to browse to your file.
From Jupyter Notebook opens a File Explorer to browse to your file.
From URL provides a field to type or paste a file location.
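
As an illustration of what importing from a Jupyter notebook involves, the sketch below collects the source of the code cells in a notebook file. It relies only on the public notebook JSON layout (a top-level cells list whose entries carry cell_type and source). How the tool itself performs the import is not described here, and the function name notebook_code is hypothetical.

```python
import json


def notebook_code(notebook_text):
    """Return the code-cell sources of a notebook (JSON text) as one string."""
    notebook = json.loads(notebook_text)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format allows source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)
    # Separate cells with a blank line so they stay readable in an editor.
    return "\n\n".join(chunks)
```

To try it, read a .ipynb file as text and pass the contents to notebook_code; the result is plain code that can be pasted into the editor.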

Click the gear icon to change cosmetic aspects of the code editor.

Use the Text Size buttons to increase or decrease the size of the text in the editor.
Use Color Theme to toggle between a dark and light color scheme.
Select Wrap Long Lines to keep long lines visible within the code editor window instead of requiring a horizontal scroll.
Select Show Line Numbers to see line numbers for the editor.

Output Metainfo

Select the output channel metainfo you want to manage. Manually change the Apache Spark Data Type of existing data.

Select the plus icon to add a data row.

Enter the Field Name.
Select the Apache Spark Data Type.
Enter the Size in bits.
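
The three values entered for each data row can be pictured as a small record. The sketch below is only an illustration of that shape. The class name OutputField, its attribute names, and the two validation rules are assumptions rather than anything the tool exposes, and "StringType" is used simply as an example of a Spark SQL type name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputField:
    """One hypothetical metainfo row: field name, Spark data type, size in bits."""

    name: str
    spark_type: str
    size_bits: int

    def __post_init__(self):
        # Assumed sanity checks; the tool's real validation is not documented here.
        if not self.name:
            raise ValueError("field name must not be empty")
        if self.size_bits <= 0:
            raise ValueError("size in bits must be positive")


# Example row, with illustrative values only.
example_field = OutputField(name="customer_name", spark_type="StringType", size_bits=256)
```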