Data Cleanse Pro Tool
AMP Only
Data Cleanse Pro is available via the AMP engine only. Unlike the original Data Cleansing tool, which is a macro, Data Cleanse Pro is built natively for the AMP engine and provides significantly better performance.
(New): This label indicates enhancements over the original Data Cleansing tool.
Use Data Cleanse Pro to fix common data quality issues. You can replace null values, remove punctuation and HTML tags, modify capitalization, and more!
重要
The existing Data Cleansing tool is still available and will continue to work in existing workflows.
Tool Components

The Data Cleanse Pro tool has 2 anchors.
Input anchor: Use the input anchor to connect the data you want to clean up.
Output anchor: The output anchor outputs the cleaned data.
Configure the Tool
Use the Data Cleanse Pro Configuration window to determine how data quality issues are managed.
Remove Null Data
Select either or both of the checkboxes to remove entire rows and columns of null data.
重要
These options are applied after all other cleansing steps to ensure consistent output when combined with the Replace with Null options. If you want to remove Null Data from the initial input, do not use the Remove Null Data and Replace with Null options at the same time.
Null Rows:
Remove all rows with a null value in every column.
This option doesn't remove rows with empty string values only.
A message shows how many rows were removed.
Null Columns:
Remove all columns with a null value in every row.
This option doesn't remove columns with empty string values only.
A message shows how many columns were removed.
Select Columns to Cleanse
Use this section to select the columns that you want to cleanse. These columns are populated from the tool's input connection. Column identifies the column name and Type identifies the data type of the column (V_String, Byte, Double, Int32, etc.)
You can use the select all checkbox to select all of your columns or the checkboxes associated with the individual columns to select specific columns to clean.
Use the Search input field to search for specific columns by name.
Select the filter icon next to Type to filter this table for columns with a specific data type.
By default, all columns are selected and (new) data types are included. (New) The Unknown column is also selected by default and allows new columns in the data.
Remove Characters
Use this section to select the specific character categories that you want to remove from the column that you selected in the previous section. Several of the below examples use this as the input:
Input | Content |
---|---|
1 | |
2 | |
3 | |
Leading and Trailing Whitespace: This option is checked by default. Remove whitespace, including tabs and line breaks, at the beginning and end of a string.
Output
Content
1
Hello , Alteryx Community!
2
W e ekly Challe nges
3
Help Documentation !
Tabs, Line Breaks, and Duplicate Whitespace: Replace all consecutive spaces, tabs, and other whitespace characters at any position in a string, with a single space.
Output
Content
1
Hello , Alteryx Community!
2
W e ekly Challe nges
3
Help Documentation !
All WhiteSpace: Remove all whitespace at any position in a string.
Output
Content
1
Hello,AlteryxCommunity!
2
WeeklyChallenges
3
HelpDocumentation!
(New) HTML Tag: Remove HTML, XML, and other tags enclosed in angle brackets. Only the plain text content within the bracket remains. For example, an input of
<h1>Alteryx</h1>
results inAlteryx
.Example Input:
<div class="ExternalClass683BD308D77E4320B52BF37E72FF5C00"><html> <p>HD - Request to update several receiver fields - agreed to unlock file to allow for receiver edits - changes will be noted in notes field<br></p> </html></div>
Example Output:
HD - Request to update several receiver fields - agreed to unlock file to allow for receiver edits - changes will be noted in notes field
重要
Text that includes angle brackets (< and >) is interpreted as containing tags. This means even non-HTML content written using
< >
, for example, comparisons or math expressions, might be affected.For example...
Input:
3 < 5 and 3 > 1
Output:
3 1
: The text< 5 and 3 >
is treated as a tag and removed.
To prevent this, consider replacing angle brackets with alternative symbols (for example,
<
and>
or use parentheses).(New) Invisible Characters: Remove all characters that are invisible formatting markers, such as invisible separators, word-break control, and the Unicode encodings U+200B (zero-width space) and U+00AD (soft hyphen). This option is useful when fields appear empty but are not truly blank.
This is particularly useful when the data quality indicator in your Results window displays empty values that are otherwise not noticeable.
Letters: Remove all letters, including non-Latin alphabet letters like,
A b Z À é ö
. Use the (new) Except field to enter letters that should not be removed, if any (case-sensitive, no separators). Review Unicode ICU alphabetic categories:Uppercase Letter: A, B, C, Α, Б
Lowercase Letter: a, b, c, α, б
Titlecase Letter: Dž, Lj, Nj (special titlecase letters)
Modifier Letter: ʰ, ˠ, ˡ (phonetic or modifier letters)
Other Letter: क, 日, ש (letters not uppercase or lowercase)
Numbers: Remove all numbers. Use the (new) Except field to enter numbers that should not be removed, if any. Review Unicode ICU numeric values:
Decimal Digit Number: 0-9, ١, ۲ (standard digits)
Letter Number: Ⅳ, Ⅶ, ↀ (Roman numerals, etc.)
Other Number: ½, ², ¾ (fractions, superscripts)
Punctuation and Special Characters: Remove all punctuation and special characters based on Unicode punctuation and symbol definitions used by the ICU library. Use the (new) Except field to enter punctuation symbols that should not be removed, if any.
In Unicode, all punctuation characters fall under the punctuation category:
Connector Punctuation: _ , ‿ , ⁀ (underscore, etc.)
Dash Punctuation: - , — , ‒ (hyphen, dash)
Open Punctuation: ( , [ , { (opening brackets)
Close Punctuation: ) , ] , } (closing brackets)
Initial Quote Punctuation: " , ‘ (left quotes)
Final Quote Punctuation: " , ’ (right quotes)
Other Punctuation: ! , ? , ; (miscellaneous)
Characters are categorized under the symbol general category, which includes:
Math Symbols: Characters like +, −, =, and ∞.
Currency Symbols: Characters like $, €, ¥, and £.
Modifier Symbols: Characters like ^, ˜, and accents used for phonetic transcription.
Other Symbols: Various symbols that include ©, ®, and emoji characters.
Replace String Columns
Determine how to handle data quality issues in string columns. The checkbox is selected by default but this operation is optional.
Replace Null with Blanks: Replace null values with a blank string value. A blank registers as " " rather than [Null]. This option is selected by default.
(New) Replace Blanks with Nulls: Replace blank string values with null values. A blank registers as " " rather than [Null]. This is the inverse of the previous option and helps standardize empty columns as true nulls.
Replace Numeric Columns
Determine how to handle data quality issues in numeric columns. The checkbox is selected by default but this operation is optional.
Replace Null with 0: Replace null values with a 0 (zero). This option is selected by default.
(New) Replace 0 with Nulls: Replace 0's (zeros) with null values. This is the inverse of the previous option.
Modify Case
Check the Modify Case checkbox to activate the modify case dropdown. Select one of these options:
Upper Case: Capitalize all letters in a string.
Lower Case: Convert all letters in a string to lowercase.
Title Case: Capitalize the 1st letter of all words in a string.