Valuestocols Transform

Note

Transforms are a part of the underlying language, which is not directly accessible to users. This content is maintained for reference purposes only. For more information on the user-accessible equivalent to transforms, see Transformation Reference.

For each unique value in a column, a separate column is created. For each row that contains the value in the source column, an indicator value is inserted in the new column.

This indicator value can be a literal value or the output of a function.
If no indicator value is generated, a null value is written.

This transform is used to generate indicator columns, which can be used in statistical analysis.

It evaluates entire cell values for uniqueness. It does not scan for individual elements in Object or Array data.
If a row in the source column contains a missing value, an indicator value is added in a new Empty column.
It is not appropriate for tabulating counts of strings or patterns in a column. SeeCountpattern Transform.

Optionally, you can specify a default value, which is applied to all non-indicator value cells in the new column.

Note

When this transform is applied in the data grid, it only identifies the unique values in the current sample. If there are other unique values in the entire dataset, new columns are not created for them when the transform is executed across the entire dataset.

Basic Usage

Source:

Data
Happy
Dog
Happy Happy Dog

Transformation:

valuestocols col:Data value:'X'

Results:

Data	Happy	Dog	Happy_Happy_Dog
Happy	X
Dog		X
Happy Happy Dog			X

Syntax and Parameters

valuestocols col:column_ref value:(expression) default:(expression) [limit:int_num]

Token	Required?	Data Type	Description
values tocols	Y	transform	Name of the transform
col	Y	string	Name of source column
value	Y	string	String literal, column, or function call that defines the value to use as the indicator value in any newly generated column
default	N	string	String literal, column, or function call that defines the value to use to indicate a false match in any newly generated column
limit	N	integer (positive)	Maximum number of columns to generate. Default is `50`.

For more information on syntax standards, see Language Documentation Syntax Notes.

col

Identifies the column to which to apply the transform. You can specify only one column.

Usage Notes:

Required?	Data Type
Yes	String (column name)

value

For the valuestocols transform, this parameter specifies the value to insert in each row of a generated column where the column name of the generated column appears in the same row of the source column. This value can be a string literal, a column reference, or a function.

Usage Notes:

Required?	Data Type
Yes	String literal, column reference, or function call

default

Optionally, this parameter can be used to specify the value to insert in each row of a generated column where the column name of the generated column does not appear in the same row of the source column. This value can be a string literal, a column reference, or a function.

If this parameter is not specified, a missing value is inserted.

Usage Notes:

Required?	Data Type
No	String literal, column reference, or function call

limit

The limit parameter defines the maximum number of columns to create from the unique values detected in the source column. If not specified, the limit is 50.

Note

Be careful setting this parameter too high. In some cases, the application can run out of memory generating the results, and your results can fail.

Usage Notes:

Required?	Data Type
No. Default value is `50`.	Integer (positive)

Examples

Tip

For additional examples, see Common Tasks.

Example - Basic valuestocols

Source:

This dataset contains milestones for three employees who joined the company at the same time. The milestones were recorded and organized by date as individual items, so it's not easy to verify that all five milestones have been checked off for each employee:

Orientation
Contact Info
Acquire Computer
HR Policies Training
Product Training

Date	Name	Checklist
4/4/16	Bowie Kuhn	Orientation
4/4/16	Happy Chandler	Contact Info
4/4/16	Bowie Kuhn	Contact Info
4/4/16	Bowie Kuhn	Acquire Computer
4/4/16	Bud Selig	Product Training
4/4/16	Bud Selig	Orientation
4/5/16	Happy Chandler	HR Policies Training
4/5/16	Happy Chandler	Orientation
4/5/16	Happy Chandler	Acquire Computer
4/5/16	Bowie Kuhn	HR Policies Training
4/5/16	Bud Selig	HR Policies Training
4/5/16	Bud Selig	Contact Info
4/6/16	Happy Chandler	Product Training

Transformation:

The following transform creates columns for each of the values in the Checklist column and adds a yes value where there is a match for the row:

Transformation Name	`Convert values to columns`
Parameter: Column	onboardingChecklist
Parameter: Fill when present	'yes'

Results:

In the generated columns, you can quickly assess whether all three employees have completed an individual checklist item:

Bud Selig has not acquired his computer.
Bowie Kuhn has not had product training.

Date	Name	Checklist	Orientation	Contact_Info	Acquire_Computer	Product_Training	HR_Policies_Training
4/4/16	Bowie Kuhn	Orientation	yes
4/4/16	Happy Chandler	Contact Info		yes
4/4/16	Bowie Kuhn	Contact Info		yes
4/4/16	Bowie Kuhn	Acquire Computer			yes
4/4/16	Bud Selig	Product Training				yes
4/4/16	Bud Selig	Orientation	yes
4/5/16	Happy Chandler	HR Policies Training					yes
4/5/16	Happy Chandler	Orientation	yes
4/5/16	Happy Chandler	Acquire Computer			yes
4/5/16	Bowie Kuhn	HR Policies Training					yes
4/5/16	Bud Selig	HR Policies Training					yes
4/5/16	Bud Selig	Contact Info		yes
4/6/16	Happy Chandler	Product Training				yes

Example - Magazine subscriptions

This example shows how you can break out a column of nested values into separate rows and columns of data.

Source:

The following data covers magazine subscriptions for individual customers. Their subscriptions are stored in an array of values. You are interested in who is subscribing to each magazine.

CustId	Subscriptions
Anne Aimes	["Little House and Garden","Sporty Pants","Life on the Range"]
Barry Barnes	["Sporty Pants","Investing Smart"]
Cindy Compton	["Cakes n Pies","Powerlifting Plus","Running Days"]
Darryl Diaz	["Investing Smart","Cakes n Pies"]

Transformation:

When this data is loaded into the Transformer, you might need to apply a header to it. If it is in CSV format, you might need to apply some replace transformations to clean up the Subscriptions column so it looks like the above.

When the Subscriptions column contains cleanly formatted arrays, the column is re-typed as Array type. You can then apply the following transformation:

Transformation Name	`Expand Array into rows`
Parameter: Column	Subscriptions

Each CustId/Subscription combination is now written to a separate row. You can use this new data structure to break out instances of magazine subscriptions. Using the following transformation, you can add the corresponding CustId value to the column:

Transformation Name	`Convert values to columns`
Parameter: Column	Subscriptions
Parameter: Fill when present	CustId

Delete the two source columns:

Transformation Name	`Delete columns`
Parameter: Columns	CustId,Subscriptions
Parameter: Action	Delete selected columns

Results:

Little_House_and_Garden	Sporty_Pants	Life_on_the_Range	Investing_Smart	Cakes_n_Pies	Powerlifting_Plus	Running_Days
Anne Aimes
	Anne Aimes
		Anne Aimes
	Barry Barnes
			Barry Barnes
				Cindy Compton
					Cindy Compton
						Cindy Compton
			Darryl Diaz
				Darry Diaz

In this section:

Valuestocols Transform

Basic Usage

Syntax and Parameters

col

value

default

limit

Examples

Example - Basic valuestocols

Example - Magazine subscriptions

Search results