Skip to main content

Window Transformations

A window transformation performs calculations on a row based on row values that are related to it. Windowing functions can perform calculations based on time, relative row positions, and rolling windows.

For example, you might wish to calculate the average percentage of CPU usage over 24-hour intervals based on log entries. From the rows of data, you can create a window function that calculates the average value in the CPU usage column over the 24-hour period, as defined based on date values for each log entry.

Key distinction:

  • In a window function, the output of each row's calculation is specific to the row.

  • In an aggregate function, the output for a row is the same value for all rows that are used in the calculation.

  • For more information on aggregation, see Create Aggregations.

Basic Structure

You can use windowing functions with the following transformation types:

  • window - creates a new column called window

  • New formula - creates a new column that you name

  • Edit with formula - modifies the values in a column based on a formula that you specify.

Group by parameter

You can use the Group by parameter to define the column of values by which rows of data are grouped for calculation purposes. For example, if your Group by column contains months, your calculations are computed for each month represented in the column values.

注意

Transforms that use the group parameter can result in non-deterministic re-ordering in the data grid. However, you should apply the group parameter, particularly on larger datasets, or your job may run out of memory and fail. To enforce row ordering, you can use the sort transform. For more information, see Sort Transform.

Order by parameter

When using window functions, you can use the Order by parameter to specify the column or columns by which to sort the output.

Source:

The following table contains the sales data of a company for all the four regions in the last three months.

Month

Sales

Region

2021-01-01

800

East

2021-01-01

1500

West

2021-01-01

1000

North

2021-01-01

2000

South

2021-02-01

1250

East

2021-02-01

800

West

2021-02-01

1100

North

2021-02-01

700

South

2021-03-01

900

East

2021-03-01

1000

West

2021-03-01

1400

North

2021-03-01

800

South

Transformation:

In the following transformation, you can calculate the rolling average of sales. You apply the ROLLINGAVERAGE and specify that the results are to be ordered by the Sales column.

Transformation Name

Window

Parameter: Formulas

ROLLINGAVERAGE (Sales, 0,1)

Parameter: Order by

Sales

Results:

The following dataset shows the ROLLINGAVERAGE ordered by Sales column.

Month

Sales

Region

RollingAverage

2021-02-01

700

South

750

2021-01-01

800

East

800

2021-02-01

800

West

800

2021-03-01

800

South

850

2021-03-01

900

East

950

2021-01-01

1000

North

1000

2021-03-01

1000

West

1050

2021-02-01

1100

North

1175

2021-02-01

1250

East

1325

2021-03-01

1400

North

1450

2021-01-01

1500

West

1750

2021-01-01

2000

South

2000

Compute over Time Windows

You may need to create windows of time within your data that are not cleanly segmented by basic units of time measurement. For example, you may need to create a custom time period, called a session, based on timestamps recorded in event-based data.

A session is usually defined as a group of events that occur within a given time frame. For example, you may need to perform calculations based on five-minute intervals within your logging data. If a user opens your shopping website, logs in, searches items, and then logs out within a five-minute interval, that can be grouped under a single session. However, if the user's interaction lasted six minutes, the logged events may span multiple windowed sessions in the data.

You can use the SESSION function to create time boxes based on a time period that you specify. When the function is applied to your column of timestamp values, the application assigns an ID to events that belong to the same session.

From the following example, you can create a Session ID. After you create the session ID, you can find the volume of data consumed by the individual user.

Source:

User Name

TimeStamp

Activity

Volume (in Kb)

Bob

02/11/21 08:01:13

Read

1024

William

02/11/21 08:01:00

Read

1024

John

02/11/21 08:01:17

Read

1024

Christy

02/11/21 08:01:17

Read

1024

William

02/11/21 08:03:33

Read

520

Christy

02/11/21 08:02:01

Password change

1024

Bob

02/11/21 08:07:23

Adding items to cart

2048

William

02/11/21 08:05:45

Read

520

William

02/11/21 08:11:56

Account settings

2048

John

02/11/21 08:15:11

Password change

2048

Bob

02/11/21 08:34:00

Proceeding to payment

2048

Bob

02/11/21 08:43:03

logout

2048

Christy

02/11/21 09:03:43

Read

1024

Christy

02/11/21 09:10:00

logout

1024

Transformation:

Transformation Name

Window

Parameter: Formulas

SESSION (TimeStamp, 5, minute)

Parameter: Group by

User Name

Parameter: Order by

TimeStamp

Since the new column is named window, you should rename it:

Transformation Name

Rename columns

Parameter: Option

Manual rename

Parameter: Column

window

Parameter: New column name

SESSIONID

With this session ID, you can calculate the maximum volume of data consumed by each session ID and by each user.

Transformation Name

New formula

Parameter: Formula type

Multiple row formula

Parameter: Formula

MAX(Volume (in Kb))

Parameter: Sort rows by

SessionID

Parameter: Group rows by

User Name, SessionID

Parameter: New column name

Volume_Consumed (in Kb)

Results:

User Name

TimeStamp

Activity

Volume (in Kb)

SessionID

max_Volume (in Kb)

William

02/11/21 08:01:00

Read

1024

1

1024

William

02/11/21 08:03:33

Read

520

1

1024

William

02/11/21 08:05:45

Read

520

1

1024

William

02/11/21 08:11:56

Account settings

2048

2

2048

Bob

02/11/21 08:01:13

Read

1024

1

1024

Bob

02/11/21 08:07:23

Adding items to cart

2048

2

2048

Bob

02/11/21 08:34:00

Proceeding to payment

2048

3

2048

Bob

02/11/21 08:43:03

logout

2048

4

2048

Christy

02/11/21 08:01:17

Read

1024

1

1024

Christy

02/11/21 08:02:01

Password change

1024

1

1024

Christy

02/11/21 09:03:43

Read

1024

2

1024

Christy

02/11/21 09:10:00

logout

1024

3

1024

John

02/11/21 08:01:17

Read

1024

1

1024

John

02/11/21 08:15:11

Password change

2048

2

2048

Calculate over preceding and following rows

The PREV and NEXT functions enable you to fetch data from a previous row or a subsequent row, which is helpful for identifying relative changes or trends in your data.

Source:

The following dataset contains orders for different product types over a given time period. You can apply the PREV and NEXT functions to calculate the previous orders and the next orders to analyze the trend of orders and derive the average of orders for a product group.

Product_Type

Order_date

Order

Laptop

2021-01-05

300

Laptop

2021-01-26

1780

Laptop

2021-01-09

500

Laptop

2021-01-31

1200

SmartPhone

2021-01-24

1400

SmartPhone

2021-01-26

2200

SmartPhone

2021-01-07

700

Tablet

2021-01-21

600

Tablet

2021-01-23

900

Transformation:

You can also calculate the percentage of change in orders over time. The following transformation calculates the change between the current order and the previous one and then divides that value over the previous value to calculate the percent change between the rows:

Transformation Name

Window

Parameter: Formulas

(Order - PREV(Order, 1)) / PREV(Order, 1) * 100

Parameter: Group by

Product_Type

Parameter: Order by

Order

After you rename the column to ChangeinOrder, you can apply the NUMFORMAT function to clean up and format the ChangeinOrder values. The following transformation reformats the ChangeinOrder column to display two decimal places.

Transformation Name

Edit with formula

Parameter: Column

ChangeinOrder

Parameter: Formula

NUMFORMAT(ChangeinOrder, '##.##')

Similarly, you can apply the NEXT function and calculate the Change in orders for upcoming months.

Results:

Product_Type

Order_date

Order

NEXTOrder

ChangeinOrder

Laptop

2021-01-05

300

500

Laptop

2021-01-09

500

1200

66.67

Laptop

2021-01-31

1200

1780

140

Laptop

2021-01-26

1780

48.33

SmartPhone

2021-01-07

700

400

SmartPhone

2021-01-24

1400

2200

100

SmartPhone

2021-01-26

2200

57.14

Tablet

2021-01-21

600

900

Tablet

2021-01-23

900

50

Fill Empty Values

You can use the FILL function to fill empty or null values in your data with the last non-empty value in the group.

Source:

For example, the following dataset contains the daily orders received. Note the missing values due to weekends. You can assume that the no orders were received for Saturday and Sunday ,

Date

DayOfWeek

OrdersDay

OrdersTotal

2021-03-10

Wednesday

100

100

2021-03-11

Thursday

112

212

2021-03-12

Friday

320

532

2021-03-13

Saturday

2021-03-14

Sunday

2021-03-15

Monday

300

832

Transformation:

You have to clean up the data to fill the values for OrdersDay column. You can use the following function to fill the empty and null values. This function tests the the OrdersDay column to check if the column is empty or null. If so, the value '0' is written in the column, else the value of the column ($col) is written.

Transformation Name

Edit with formula

Parameter: Column

OrdersDay

Parameter: Formula

IF(OrdersDay == '' || ISNULL(OrdersDay), '0', $col)

You can see the values of Friday is taken for Saturday and Sunday and filled it accordingly as per theFILLfunction.

Transformation Name

Edit with formula

Parameter: Column

OrdersTotal

Parameter: Formula

IF (OrdersDay == '0', FILL (OrdersTotal, -1,0),$col)

Parameter: Order by

Date

Results:

Date

DayOfWeek

OrdersDay

OrdersTotal

2021-03-10

Wednesday

100

100

2021-03-11

Thursday

112

212

2021-03-12

Friday

320

532

2021-03-13

Saturday

0

532

2021-03-14

Sunday

0

532

2021-03-15

Monday

300

832

Calculate Rank

The RANK function enables you to create rankings in your data based on calculations by returning a ranking value for each row with the specified group of values. When used, some rows might receive the same value as other rows. For example, if there are three tie values in a group, the same rank is assigned to the rows and the next three ranks are skipped.

The DENSERANK function enables you to generate a ranked order of values within a group. If there are tie values in a group, it does not skip rank in case of tie values. For example, if two rows are listed as rank 2, then the fourth row receives rank 3.

Source:

The following dataset contains total Sales information by quarter. You can use the RANK and DENSERANK to identify the quarters with the highest sales.

Year

Quarter

Sales

2018

1

1000

2018

2

2000

2018

3

3000

2018

4

2000

2019

1

1000

2019

2

500

2019

3

9000

2019

4

3000

2020

1

500

2020

2

500

2020

3

200

2020

4

400

Transformation:

RANK:

Transformation Name

Window

Parameter: Formula type

Multiple row formula

Parameter: Formula

RANK()

Parameter: Sort rows by

Sales

Parameter: New column name

SalesRank

DENSERANK:

Transformation Name

Window

Parameter: Formula type

Multiple row formula

Parameter: Formula

DENSERANK()

Parameter: Sort rows by

Sales

Parameter: New column name

SalesDenseRank

Results:

For the RANK function, when multiple rows share the same rank, the next rank is not consecutive, whereas for the DENSERANK function, the next rank is consecutive.

Year

Quarter

Sales

SalesDenseRank

SalesRank

2020

3

200

1

1

2020

4

400

2

2

2020

2

500

3

3

2020

1

500

3

3

2019

2

500

3

3

2019

1

1000

4

6

2018

1

1000

4

6

2018

4

2000

5

8

2018

2

2000

5

8

2019

4

3000

6

10

2018

3

3000

6

10

2019

3

9000

7

12

Calculate Rolling Functions

Rolling calculations enable you to compute a function over a changing set of rows. Rolling calculations are useful for computing the current state of a measure within your data.

For example, in the above sample data, you can find the rolling sum and rolling average of the sales for the year. You can use the above example data to find the rolling sum and rolling average.

Source:

From the following dataset, you can calculate the rolling calculations such as ROLLINGSUM, ROLLINGAVERAGE, ROLLINGMAX, and ROLLINGMIN.

Year

Quarter

Sales

2018

1

1000

2018

2

2000

2018

3

3000

2018

4

2000

2019

1

1000

2019

2

500

2019

3

9000

2019

4

3000

2020

1

500

2020

2

500

2020

3

200

2020

4

400

Transformation:

Transformation Name

Window

Parameter: Formulas

ROLLINGSUM (Sales, 0,1)

Parameter: Formulas

ROLLINGAVERAGE (Sales, 0,1)

Parameter: Formulas

ROLLINGMAX (Sales, 0, 1)

Parameter: Formulas

ROLLINGMIN (Sales, 0,1)

Parameter: Order by

Sales

You can rename the required columns accordingly.

Results:

Year

Quarter

Sales

RollingSumSales

RollingAverageSales

RollingMinSales

RollingMaxSales

2020

3

200

600

300

200

400

2020

4

400

900

450

400

500

2020

2

500

1000

500

500

500

2020

1

500

1000

500

500

500

2019

2

500

1500

750

500

1000

2019

1

1000

2000

1000

1000

1000

2018

1

1000

3000

1500

1000

2000

2018

4

2000

4000

2000

2000

2000

2018

2

2000

5000

2500

2000

3000

2019

4

3000

6000

3000

3000

3000

2018

3

3000

12000

6000

3000

9000

2019

3

9000

9000

9000

9000

9000

Rolling date functions

The Rolling date functions enable you to calculate forward or backward of the current row within the specified column. For example, when dealing with business calendars, you might want to know if the date falls on a holiday or weekend; based on that, you can roll the date forward or backward according to the business calendar.

Source:

The following example dataset shows the order date, order quantity that belongs to a product group. You are interested in finding the rolling minimum and maximum dates for the product group, as well as the rolling mode value. You can useROLLINGMINDATE, ROLLINGMAXDATE, and ROLLINGMODEDATE functions.

Order_date

Order_quantity

Product_Group

2021-04-14

750

PG001

2021-07-13

1500

PG001

2021-08-31

355

PG002

2021-02-16

2000

PG002

2021-05-13

867

PG002

2021-06-18

1010

PG002

2021-11-15

909

PG003

2021-10-16

200

PG003

2021-09-09

200

PG004

2021-01-01

900

PG004

2021-12-07

707

PG004

Transformation:

Transformation Name

Window

Parameter: Formulas

ROLLINGSUM (Sales, 0,1)

Parameter: Formulas

ROLLINGMAXDATE (Order_date, 0,1)

Parameter: Formulas

ROLLINGMINDATE (Order_date, 0, 1)

Parameter: Formulas

ROLLINGMODEDATE (Order_date, 0,1)

Parameter: Order by

Order_date

Results:

Order_date

Order_quantity

Product_Group

RollingMaxdate

RollingMindate

RollingModedate

2021-01-01

900

PG004

2021-02-16

2021-01-01

2021-01-01

2021-02-16

2000

PG002

2021-04-14

2021-02-16

2021-02-16

2021-04-14

750

PG001

2021-05-13

2021-04-14

2021-04-14

2021-05-13

867

PG002

2021-06-18

2021-05-13

2021-05-13

2021-06-18

1010

PG002

2021-07-13

2021-06-18

2021-06-18

2021-07-13

1500

PG001

2021-08-31

2021-07-13

2021-07-13

2021-08-31

355

PG002

2021-09-09

2021-08-31

2021-08-31

2021-09-09

200

PG004

2021-10-16

2021-09-09

2021-09-09

2021-10-16

200

PG003

2021-11-15

2021-10-16

2021-10-16

2021-11-15

909

PG003

2021-12-07

2021-11-15

2021-11-15

2021-12-07

707

PG004

2021-12-07

2021-12-07

2021-12-07