Key-Value Pair Extraction
Laboratory Tool
This is a Laboratory tool and isn't for use in production. It might have documented known issues, might not be feature complete, and is subject to change.
A key-value pair links 2 data elements. The key is a unique identifier that names the category of data (for example, person, place, or thing), and the value is the data identified by that key. Examples of key-value pairs:
Person: John
Place: Bank
Thing: Check
The Key-Value Pair Extraction tool identifies key-value pair structures in your documents. The tool uses the Google Tesseract library and fuzzy matching to find key-value pairs. The Key-Value Pair Extraction tool isn’t intended for tabular data. For tabular data, use the Image Template tool.
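To illustrate the general idea of fuzzy key matching, here is a minimal Python sketch. It is not the tool's implementation: it assumes the text has already been extracted by OCR into a list of lines, it uses the standard-library difflib module as a stand-in matcher, and the function name find_key_values and the 0.8 threshold are hypothetical choices made for this example.

```python
from difflib import SequenceMatcher


def find_key_values(ocr_lines, keys, threshold=0.8):
    """Return {key: [values]} for lines whose label fuzzily matches a key.

    Hypothetical helper for illustration only; the threshold is an assumption.
    """
    results = {key: [] for key in keys}
    for line in ocr_lines:
        # Split on the first colon; lines without a colon are skipped.
        label, sep, value = line.partition(":")
        if not sep:
            continue
        label = label.strip().lower()
        for key in keys:
            # Case-insensitive similarity between the line's label and the key.
            score = SequenceMatcher(None, label, key.lower()).ratio()
            if score >= threshold:
                results[key].append(value.strip())
    return results


# "Persom" simulates an OCR misread of "Person".
lines = ["Persom: John", "Place: Bank", "Random text", "Thing: Check"]
found = find_key_values(lines, ["Person", "Place", "Thing"])
print(found)
```

In this sketch, the misread label still resolves to the Person key because its similarity score is above the threshold, while the line without a delimiter is ignored.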
If you pass noisy documents to the Key-Value Pair Extraction tool, pre-process the images with the OCR Optimization feature in the Image Processing tool to improve results. The OCR Optimization feature cleans up documents that have non-white backgrounds, watermarks, and other noise.
Important
This tool is part of Alteryx Intelligence Suite. Intelligence Suite requires a separate license and add-on installer to Designer. After you install Designer, install Intelligence Suite and start your free trial.
Language Support
The Key-Value Pair Extraction tool supports English, Chinese (Simplified), French, German, Italian, Japanese, Portuguese, and Spanish as inputs. We recommend that your key and value are in the same language.
Tool Components
The Key-Value Pair Extraction tool has 3 anchors:
D anchor: Use the D anchor to pass the image data you want to analyze.
K anchor: Use the K anchor to pass the keys you want to identify.
Output anchor: Use the output anchor to pass the key-value pairs downstream.
Configure the Tool
Add a Key-Value Pair Extraction tool to the canvas.
Use the anchors to connect the Key-Value Pair Extraction tool to the image data and keys you want to use in the workflow.
Select the column containing the Image data.
Select the Language of the text within the image data.
Select the column containing the Keys. Tip: You can use the Text Input tool to enter your keys within the workflow.
Run the workflow.
Output
The Key-Value Pair Extraction tool outputs the incoming columns in addition to columns named after each identified key. The column for each key contains the associated values in a single cell. If there is more than 1 value per key, the tool separates the values with a space (for example, value1 value2 value3). If a key appears in more than 1 location, the tool creates a column for each instance (for example, key1, key2, key3).
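The output layout described above can be pictured with a short Python sketch. This is an illustration under assumptions, not the tool's code: the function build_output_row is hypothetical, and the numbering scheme (appending 1, 2, 3 to a key that occurs more than once, and leaving single-occurrence keys unnumbered) is one reading of the "key1, key2, key3" example.

```python
def build_output_row(incoming, matches):
    """Combine incoming columns with one column per key occurrence.

    incoming: dict of the columns passed into the tool.
    matches:  list of (key, [values]) in document order, one entry per
              location where the key was found (hypothetical structure).
    """
    row = dict(incoming)

    # Count how many times each key occurs in the document.
    counts = {}
    for key, _ in matches:
        counts[key] = counts.get(key, 0) + 1

    seen = {}
    for key, values in matches:
        seen[key] = seen.get(key, 0) + 1
        # Assumption: number the column only when the key occurs more than once.
        column = f"{key}{seen[key]}" if counts[key] > 1 else key
        # Multiple values for one key share a single cell, separated by spaces.
        row[column] = " ".join(values)
    return row


row = build_output_row(
    {"file": "invoice.png"},
    [
        ("Name", ["Libby"]),
        ("Address", ["123 Main Street", "Some City"]),
        ("Name", ["John"]),
    ],
)
print(row)
```

Here the two occurrences of Name become the columns Name1 and Name2, and the two Address values are joined into one cell.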
FAQ
For best results, we recommend that keys match the document as closely as possible. However, the Key-Value Pair Extraction tool can find keys with different cases or key-value pairs with different delimiters (for example, [KEY: value] and [key, value]).
In general, you can use the tool with images that have black text on white backgrounds. However, if you are dealing with documents that have a non-white background, the OCR Optimization feature in the Image Processing tool can correct this.
We recommend using the OCR Optimization feature in the Image Processing tool first because it automatically converts images to grayscale in the background, which removes the need for manual grayscale adjustments.
You can’t connect the Key-Value Pair Extraction tool with the Image Template tool. Note that the Key-Value Pair Extraction tool identifies all instances of your specified keys and returns their corresponding values, regardless of their positions in a document. This removes the need to create bounding boxes and annotations.
Delete any empty rows in your list of keys, then run the workflow again.
The Key-Value Pair Extraction tool is not optimized for handwriting.
Ideally, structure the key-value pairs like this:
Structure
<Key>: <Value>
Example 1
Company: Alteryx
Example 2
Name: Libby
The tool can also recognize keys with multi-line values, as long as no ruled lines (such as the cell borders of a table) separate the values:
Structure
<Key>: <Value Line 1>
<Value Line 2>
<Value Line 3>
Example 1
Shipping Address: ABC Company
123 Main Street
Some City, New York 12345
Example 2
Billing Address: XYZ Vendor
456 Pleasant Street
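The single-line and multi-line structures above can be sketched as a small Python parser. This is a simplified illustration, not how the tool works internally: it assumes plain text lines as input and a colon delimiter, and the function name parse_pairs is hypothetical.

```python
def parse_pairs(lines):
    """Group lines into (key, [value lines]) pairs.

    A line of the form "<Key>: <Value>" starts a new pair. A line without a
    colon is treated as a continuation of the previous pair's value.
    Limitation of this sketch: a continuation line that itself contains a
    colon would be mistaken for a new key.
    """
    pairs = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition(":")
        if sep and key.strip():
            # New key; keep the first value line if one follows the colon.
            pairs.append((key.strip(), [value.strip()] if value.strip() else []))
        elif pairs:
            # No delimiter: append to the most recent key's value lines.
            pairs[-1][1].append(line)
    return pairs


pairs = parse_pairs([
    "Shipping Address: ABC Company",
    "123 Main Street",
    "Some City, New York 12345",
    "Name: Libby",
])
for key, values in pairs:
    print(key, "->", values)
```

In this sketch, the three address lines are collected under Shipping Address, and Name starts a new pair with a single value.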