Extractlist Transform
Anmerkung
Transforms are a part of the underlying language, which is not directly accessible to users. This content is maintained for reference purposes only. For more information on the user-accessible equivalent to transforms, see Transformation Reference.
Extracts a set of values based on a specified pattern from a source column of any data type. The generated column contains an array of occurrences of the specified pattern.
While the new column contains array data, the data type of the new column is sometimes inferred as String.
Basic Usage
Your source column (myWidgetInventory
) is formatted in the following manner:
{ "red":"100","white":"1300","blue":"315","purple":"55"}
The following step extracts the raw inventory contents of each color:
extractlist col: myWidgetInventory on:`{digit}+`
Output: The generated column contains data that looks like the following array:
["100","1300","315","55"]
Syntax and Parameters
extractlist: col:column_ref on:string_literal_pattern delimiter:string_literal_pattern [quote:'quoted_string'] [as:'new_column_name']
Token | Required? | Data Type | Description |
---|---|---|---|
extractlist | Y | transform | Name of the transform |
col | Y | string | Source column name |
on | Y | string | String literal or pattern that identifies the values to extract from the source column |
delimiter | Y | string | String literal or pattern that identifies the separator between the values to extract |
quote | N | string | Specifies a quoted object that is omitted from matching delimiters |
as | N | string | Name of the newly generated column |
For more information on syntax standards, see Language Documentation Syntax Notes.
col
Identifies the column to which to apply the transform. You can specify only one column.
Usage Notes:
Required? | Data Type |
---|---|
Yes | String (column name) |
on
Identifies the pattern to match, which can be a string literal, Wrangle , or regular expression.
Tipp
You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/
. For example, /\u0013/
represents Unicode character 0013
(carriage return). For more information, see Supported Special Regular Expression Characters.
For the extractlist
tranform, all instances that match this pattern in the source column are extracted into the array list in the new column. Each occurrence in the generated array corresponds to an individual instance in the source; the new column can contain duplicate values.
To create array elements based only on the delimiter
parameter, set the following regular expression:
on:`/+/`
Usage Notes:
Required? | Data Type |
---|---|
Yes | String (literal, regular expression, orAlteryx pattern ) |
delimiter
Specifies the character or pattern that defines the end of a key-value pair. This value can be specified as a String literal, regular expression, or Wrangle.
In the following:
{ key1=value1,key2=value2 }
The delimiter is the comma ( ','
). The final key-value pair does not need a delimiter.
For this transform, this parameter defines the pattern that separates the values that you want to extract into the array.
Tipp
You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/
. For example, /\u0013/
represents Unicode character 0013
(carriage return). For more information, see Supported Special Regular Expression Characters.
Usage Notes:
Required? | Data Type |
---|---|
Yes | String (literal, regular expression, orAlteryx pattern ) |
quote
extractlist col: MySourceValues on:`{alpha}+` delimiter:';' quote:'\"'
Output: Extracts from the MySourceValues column each instance of a string value that occurs before the delimiter. Values between double-quotes are considered string literals and are not processed according to the delimiters defined in the transform.
Can be used to specify a string as a single quoted object. This parameter value can be one or more characters.
Usage Notes:
Required? | Data Type |
---|---|
No | String |
The
quote
value can appear anywhere in the column value. It is not limited by the constraints of any other parameters.
as
Name of the new column that is being generated. If the as
parameter is not specified, a default name is used.
Usage Notes:
Required? | Data Type |
---|---|
No | String (column name) |
Examples
Tipp
For additional examples, see Common Tasks.
Extract hashtag values
In this example, you extract one or more values from a source column and assemble them in an Array column.
Suppose you need to extract the hashtags from customer tweets to another column. In such cases, you can use the {hashtag}
Alteryx pattern to extract all hashtag values from a customer's tweets into a new column.
Source:
The following dataset contains customer tweets across different locations.
User Name | Location | Customer tweets |
---|---|---|
James | U.K | Excited to announce that we’ve transitioned Wrangler from a hybrid desktop application to a completely cloud-based service! #dataprep #businessintelligence #CommitToCleanData # London |
Mark | Berlin | Learnt more about the importance of identifying issues in your data—early and often #CommitToCleanData #predictivetransformations #realbusinessintelligence |
Catherine | Paris | Clean data is the foundation of your analysis. Learn more about what we consider the five tenets of sound #dataprep, starting with #1a prioritizing and setting targets. #startwiththeuser #realbusinessintelligence #Paris |
Dave | New York | Learn how #NewYorklife onboarded as part of their #bigdata #dataprep initiative to unlock hidden insights and make them accessible across departments. |
Christy | San Francisco | How can you quickly determine the number of times a user ID appears in your data?#dataprep #pivot #aggregation#machinelearning initiatives #SFO |
Transformation:
The following transformation extracts the hashtag messages from customer tweets.
Transformation Name | |
---|---|
Parameter: Column | customer_tweets |
Parameter: Pattern matching elements in the list | `{hashtag}` |
Parameter: New column name | Hashtag tweets |
Then, the source column can be deleted.
Results:
User Name | Location | Hashtag tweets |
---|---|---|
James | U.K | ["#dataprep", "#businessintelligence", "#CommitToCleanData", " # London"] |
Mark | Berlin | ["#CommitToCleanData", "#predictivetransformations", "#realbusinessintelligence", "0"] |
Catherine | Paris | ["#dataprep", "#startwiththeuser","#realbusinessintelligence", "# Paris"] |
Dave | New York | ["#NewYorklife", "dataprep", "bigdata", "0"] |
Christy | SanFrancisco | [ "dataprep", "#pivot", "#aggregation", "#machinelearning"] |
Extract query parameters from URLs
Source:
In this example, a list of URLs identifies the items in the shopping carts of visitors to your web site. You want to extract the shopping cart information embedded in the query parameters of the URL.
Username | cartURL |
---|---|
joe.robinson | http://example123.com/cart.asp?prodid=1001&qty=2 |
steph.schmidt | http://example123.com/cart.asp?prodid=1005&qty=4 |
jack.holmes | http://example123.com/cart.asp?prodid=2102&qty=1 |
tina.jones | http://example123.com/cart.asp?prodid=10412&qty=2 |
Transformation:
The following transformation extracts the list of query values from the URL. Note that the equals sign is included in the matching pattern so that you don't accidentally pick up numeric values from the non-parameter part of the URL:
Transformation Name | |
---|---|
Parameter: Column | cartURL |
Parameter: Pattern matching elements in list | `=[digit]+` |
The two query parameter values have been extracted into an array of values, including the equals sign, which must be removed:
Transformation Name | |
---|---|
Parameter: Column | cartURL |
Parameter: Find | '=' |
Parameter: Replace with | '' |
Parameter: Match all occurrences | true |
You can now unnest these values into separate columns:
Transformation Name | |
---|---|
Parameter: Column | extractlist_cartURL |
Parameter: Paths to elements | '[0]','[1]' |
After you rename the two columns to prodId
and Qty
, you can delete the column generated by the first transformation.
Results:
Username | cartURL | prodId | Qty |
---|---|---|---|
joe.robinson | http://example123.com/cart.asp?prodid=1001&qty=2 | 1001 | 2 |
steph.schmidt | http://example123.com/cart.asp?prodid=1005&qty=4 | 1005 | 4 |
jack.holmes | http://example123.com/cart.asp?prodid=2102&qty=1 | 2102 | 1 |
tina.jones | http://example123.com/cart.asp?prodid=10412&qty=2 | 10412 | 2 |
Extract counts from a ragged array using extractlist
This example illustrates how to extract values from a column.
Source:
The following dataset contains counts of support emails processed by each member of the support team for individual customers over a six-month period. In this case, you are interested in the total number of emails processed for each customer.
Unfortunately, the data is ragged, as there are no entries for a support team member if he or she has not answered an email for a customer.
custId | startDate | endDate | supportEmailCount |
---|---|---|---|
C001 | 7/15/2015 | 12/31/2015 | ["Max":"2","Ted":"0","Sally":"12","Jack":"6","Sue":"4"] |
C002 | 7/15/2015 | 12/31/2015 | ["Sally":"4","Sue":"3"] |
C003 | 7/15/2015 | 12/31/2015 | ["Ted":"12","Sally":"2"] |
C004 | 7/15/2015 | 12/31/2015 | ["Jack":"7","Sue":"4","Ted":"5"] |
If the data is imported from a CSV file, you might need to make some simple Replace Text or Pattern transformations to clean up the data to look like the above example.
Transformation:
Use the following transformation to extract just the numeric values from the supportEmailCount
array:
Transformation Name | |
---|---|
Parameter: Column | supportEmailCount |
Parameter: Pattern matching elements in list | `{digit}+` |
You should now have a column extractlist_supportEmailCount
containing a ragged array. You can use the following transformations to convert this data to a comma-separated list of values:
Transformation Name | |
---|---|
Parameter: Column | extractlist_supportEmailCount |
Parameter: Find | `[` |
Parameter: Replace with | '' |
Parameter: Match all occurrences | true |
Transformation Name | |
---|---|
Parameter: Column | extractlist_supportEmailCount |
Parameter: Find | `]` |
Parameter: Replace with | '' |
Parameter: Match all occurrences | true |
Transformation Name | |
---|---|
Parameter: Column | extractlist_supportEmailCount |
Parameter: Find | `"` |
Parameter: Replace with | '' |
Parameter: Match all occurrences | true |
Convert the column to String data type.
You can now split out the column into separate columns containing individual values in the modified source. The limit
parameter specifies the number of splits to create, resulting in 5 new columns, which is the maximum number of entries in the source arrays.
Transformation Name | |
---|---|
Parameter: Column | extractlist_supportEmailCount |
Parameter: Option | On pattern |
Parameter: Match pattern | ',' |
Parameter: Number of columns to create | 4 |
You might have to set the type for each generated column to Integer. If you try to use a New Formula transformation to calculate the sum of all of the generated columns, it only returns values for the first row because the missing rows are null values.
In the columns containing null values, select the missing value bar in the data histogram. Select the Replace suggestion card, and modify the transformation to write a 0
in place of the null value, as follows:
Transformation Name | |
---|---|
Parameter: Columns | extractlist_supportEmailCount3 |
Parameter: Formula | '0' |
Parameter: Group rows by | ismissing([extractlist_supportEmailCount3]) |
Repeat this step for any other column containing null values.
You can now use the following to sum the values in the generated columns:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | (extractlist_supportEmailCount1 + extractlist_supportEmailCount2 + extractlist_supportEmailCount3 + extractlist_supportEmailCount4 + extractlist_supportEmailCount5) |
Results:
After renaming the generated column to totalSupportEmails
and dropping the columns used to create it, your dataset should look like the following:
custId | startDate | endDate | supportEmailCount | totalSupportEmails |
---|---|---|---|---|
C001 | 7/15/2015 | 12/31/2015 | ["Max":"2","Ted":"0","Sally":"12","Jack":"6","Sue":"4"] | 24 |
C002 | 7/15/2015 | 12/31/2015 | ["Sally":"4","Sue":"3"] | 7 |
C003 | 7/15/2015 | 12/31/2015 | ["Ted":"12","Sally":"2"] | 14 |
C004 | 7/15/2015 | 12/31/2015 | ["Jack":"7","Sue":"4","Ted":"5"] | 16 |