Key-Value Pair Extraction

Laboratory Tool

This is a Laboratory tool and isn't for use in production. It might have documented known issues, might not be feature complete, and is subject to change.

A key-value pair links 2 data elements. The key is a unique identifier that defines the dataset (for example, person, place, thing) and the value is the identified data. Examples of key-value pairs:

  • Person: John

  • Place: Bank

  • Thing: Check

The Key-Value Pair Extraction tool identifies key-value pair structures in your documents. The tool leverages the Google Tesseract library and fuzzy matching to find key-value pairs. The Key-Value Pair Extraction tool isn’t intended for tabular data. For tabular data, use the Image Template tool.
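
To illustrate what fuzzy matching of keys can look like, here is a minimal Python sketch. It is not the tool's implementation: it assumes the OCR step has already produced plain text lines, and it uses the standard-library difflib module as a stand-in matcher. The function name find_key and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def find_key(line, keys, threshold=0.8):
    """Return the key that best matches the start of an OCR text line, or None.

    Hypothetical helper: the similarity measure and the threshold are
    assumptions for illustration, not the tool's actual matching logic.
    """
    best_key, best_score = None, 0.0
    for key in keys:
        # Compare the key with the same-length prefix of the line, ignoring case.
        candidate = line[: len(key)].lower()
        score = SequenceMatcher(None, key.lower(), candidate).ratio()
        if score > best_score:
            best_key, best_score = key, score
    return best_key if best_score >= threshold else None
```

With this sketch, a line with a small OCR error such as "Persan: John" still resolves to the key "Person", while a line that resembles none of the keys returns None.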

If you are passing noisy documents to the Key-Value Pair Extraction tool, try pre-processing the images with the OCR Optimization feature in the Image Processing tool to improve results. The OCR Optimization feature cleans up documents that have non-white backgrounds, watermarks, and other noise.

Important

This tool is part of Alteryx Intelligence Suite. Intelligence Suite requires a separate license and add-on installer to Designer. After you install Designer, install Intelligence Suite and start your free trial.

Language Support

The Key-Value Pair Extraction tool supports English, Chinese (Simplified), French, German, Italian, Japanese, Portuguese, and Spanish as inputs. We recommend that your key and value are in the same language.

Tool Components

The Key-Value Pair Extraction tool has 3 anchors:

  • D anchor: Use the D anchor to pass the image data you want to analyze.

  • K anchor: Use the K anchor to pass the keys you want to identify.

  • Output anchor: Use the output anchor to pass the key-value pairs downstream.

Configure the Tool

  1. Add a Key-Value Pair Extraction tool to the canvas.

  2. Use the anchors to connect the Key-Value Pair Extraction tool to the image data and keys you want to use in the workflow.

  3. Select the column containing the Image data.

  4. Select the Language of the text within the image data.

  5. Select the column containing the Keys. Tip: You can use the Text Input tool to enter your keys within the workflow.

  6. Run the workflow.

Output

The Key-Value Pair Extraction tool outputs the incoming columns in addition to columns named after each identified key. The column for each key contains the associated values in a single cell. If there is more than 1 value per key, the tool separates the values with a space (for example, value1 value2 value3). If a key appears at more than 1 location, the tool creates a column for each instance (for example, key1, key2, key3).
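
The output layout described above can be sketched in Python as follows. This is an illustration only: the matches structure and the function name build_output_row are hypothetical, and the exact column naming (a key found once keeps its plain name, repeated keys get a numeric suffix) is an assumption based on the key1, key2, key3 example.

```python
from collections import Counter


def build_output_row(matches):
    """Turn (key, values) matches into one output row.

    `matches` is a hypothetical intermediate: a list of
    (key, list_of_values) tuples in document order.
    """
    totals = Counter(key for key, _ in matches)
    seen = Counter()
    row = {}
    for key, values in matches:
        seen[key] += 1
        # Assumption: a key found once keeps its name; repeats get a numeric suffix.
        column = key if totals[key] == 1 else f"{key}{seen[key]}"
        # Multiple values for one key share a single cell, separated by spaces.
        row[column] = " ".join(values)
    return row
```

For example, two matches for "Address" and one for "Name" would yield the columns Name, Address1, and Address2, with each Address cell holding its values joined by spaces.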

FAQ

For best results, we recommend that keys match the document as closely as possible. However, the Key-Value Pair Extraction tool can find keys with different cases or key-value pairs with different delimiters (for example, [KEY: value] and [key, value]).
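
As a rough illustration of tolerating different cases and delimiters, the following Python sketch matches a key case-insensitively and accepts either a colon or a comma between key and value. The function name extract_value and the delimiter set are assumptions for this example, not the tool's actual rules.

```python
import re


def extract_value(text, key, delimiters=":,"):
    """Return the value that follows `key` on a line, ignoring case, or None.

    Hypothetical helper: the accepted delimiters are an assumption
    for illustration, not the tool's actual list.
    """
    pattern = re.compile(
        rf"^\s*{re.escape(key)}\s*[{re.escape(delimiters)}]\s*(.+?)\s*$",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(text)
    return match.group(1) if match else None
```

Under these assumptions, both "COMPANY: Alteryx" and "company, Alteryx" return "Alteryx" for the key "Company".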

In general, you can use the tool with images that have black text on white backgrounds. However, if you are dealing with documents that have a non-white background, the OCR Optimization feature in the Image Processing tool can correct this.

We recommend using the OCR Optimization feature in the Image Processing tool first, as it automatically converts images to grayscale in the background and removes the need for manual grayscale adjustments.

You can’t connect the Key-Value Pair Extraction tool with the Image Template tool. Note that the Key-Value Pair Extraction tool identifies all instances of your specified keys and returns their corresponding values, regardless of their positions in a document. This removes the need to create bounding boxes and annotations.

Delete any empty rows in your list of keys, then run the workflow again.

Ideally, structure the key-value pairs like this:

Structure

<Key>: <Value>

Example 1

Company: Alteryx

Example 2

Name: Libby

The tool can also recognize keys with multi-line values as long as no ruled lines, such as the borders of table cells, separate the values:

Structure

<Key>: <Value Line 1>

<Value Line 2>

<Value Line 3>

Example 1

Shipping Address: ABC Company

123 Main Street

Some City, New York 12345

Example 2

Billing Address: XYZ Vendor

456 Pleasant Street
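
The multi-line structure above can be sketched in Python as follows. This is an illustration on already-extracted text lines, not the tool's logic: the function name collect_multiline_value is hypothetical, and the stop conditions (a blank line or the start of another known key) are assumptions made for this sketch.

```python
def collect_multiline_value(lines, key, all_keys):
    """Collect the value lines that follow `key:` until a blank line or another key.

    Hypothetical helper operating on plain text lines; the stop
    conditions are assumptions made for this sketch.
    """
    prefixes = tuple(k.lower() + ":" for k in all_keys)
    value_lines = []
    collecting = False
    for line in lines:
        stripped = line.strip()
        lowered = stripped.lower()
        if not collecting:
            if lowered.startswith(key.lower() + ":"):
                collecting = True
                # Keep whatever follows the delimiter on the key's own line.
                first = stripped[len(key) + 1:].strip()
                if first:
                    value_lines.append(first)
        elif not stripped or lowered.startswith(prefixes):
            break  # A blank line or the next key ends the value.
        else:
            value_lines.append(stripped)
    return value_lines
```

Applied to the two address examples above as consecutive lines, the sketch returns three value lines for "Shipping Address" and two for "Billing Address".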