Review – Analyzing Groups within Data

Calculations on Groups in a Dataset

>> gs = groupsummary(tble,groupvars,method,datavars)

`gs`	A table of calculations for each group, including a column for group counts

`tble`	A table of data
`groupvars`	The variables you want to group by. Can be a string vector (or scalar) of variable names, integer vector of indices, or logical vector indicating the grouping variables
`method`	The aggregation method
`datavars`	The variables you want to apply the aggregation method to
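
As a minimal sketch of the syntax above, the following computes a mean per group. The table contents and the names `station` and `temp` are hypothetical and chosen only for illustration.

```matlab
% Hypothetical data: five temperature readings from two stations
station = categorical(["A";"A";"B";"B";"B"]);
temp = [10; 14; 20; 22; 24];
tble = table(station,temp);

% Mean of temp for each station; the result also contains a GroupCount column
gs = groupsummary(tble,"station","mean","temp")
```

With this data, `gs` should have one row per station, with the variables `station`, `GroupCount`, and `mean_temp`.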

You can also specify group bins for the grouping variable, which can be especially helpful with datetime variables.

>> gs = groupsummary(tble,groupvars,groupbins,method,datavars)

`gs`	A table of calculations for each group, including a column for group counts

`tble`	A table of data
`groupvars`	The variables you want to group by. Can be a string vector (or scalar) of variable names, integer vector of indices, or logical vector indicating the grouping variables
`groupbins`	The binning scheme. For `datetime` variables, this can be a unit of time, like `"minute"` or `"decade"`.
`method`	The aggregation method
`datavars`	The variables you want to apply the aggregation method to
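
The binned syntax can be sketched as follows, grouping a `datetime` variable by month. The variable names `t` and `sales` and the values are assumptions made for this example.

```matlab
% Hypothetical data: five sales records across two months of 2023
t = datetime(2023,[1;1;2;2;2],[5;20;3;14;28]);
sales = [100; 150; 200; 50; 50];
tble = table(t,sales);

% Bin the datetime variable t by month, then sum sales within each bin
gs = groupsummary(tble,"t","month","sum","sales")
```

Here the rows are grouped by calendar month rather than by each distinct timestamp, so `gs` should contain two rows.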

The output of `groupsummary` sometimes requires post-processing, such as removing or renaming variables with `removevars` and `renamevars`, respectively.
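
A short sketch of this kind of post-processing, assuming a hypothetical table with variables `station` and `temp`; the new name `AvgTemp` is also just an illustration.

```matlab
% Hypothetical data, summarized by station
station = categorical(["A";"A";"B";"B";"B"]);
temp = [10; 14; 20; 22; 24];
tble = table(station,temp);
gs = groupsummary(tble,"station","mean","temp");

% Drop the group counts and give the aggregated variable a friendlier name
gs = removevars(gs,"GroupCount");
gs = renamevars(gs,"mean_temp","AvgTemp")
```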

Organize Aggregated Data by Grouping Variable

After calculating group statistics with multiple variables, you may want to unstack the aggregated data into a new table with the grouping variables defining the rows and columns.

>> tbl2 = unstack(tble,datavars,colvar)

`tbl2`	A new table with data from `datavars` grouped into columns defined by `colvar`

`tble`	A table of data
`datavars`	The variables you want to be in the entries of the table
`colvar`	The variable that defines the new columns
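
As an illustration of the `unstack` syntax, the sketch below spreads already-aggregated values into one column per station. The names `station`, `yr`, and `meanTemp` and the data are hypothetical.

```matlab
% Hypothetical aggregated data: one mean temperature per station and year
station = categorical(["A";"A";"B";"B"]);
yr = [2021; 2022; 2021; 2022];
meanTemp = [12; 13; 22; 23];
tble = table(station,yr,meanTemp);

% One row per year, one new column per station (A and B)
tbl2 = unstack(tble,"meanTemp","station")
```

Because `yr` is neither the data variable nor the column variable, `unstack` treats it as a grouping variable by default, so it defines the rows of `tbl2`.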

Unstack Table Variables Task

You can use the Unstack Table Variables task to unstack aggregated data from a table interactively. You should specify the following:

Input table: Specify the input, or original, data.
Output table name: This is the name of the output table.
Names of new table variables: This is the variable that identifies the new column names. It is often a categorical.
Values in new table variables: This is the variable to be unstacked. It will become the entries in the new table.
Aggregator for new table variable values: Specify the aggregation function to be applied to the Values in new table variables.
For each remaining variable, specify how to include it in the output table. The options are:
- Group by: This variable will be used as a grouping variable and stored in the first column.
- Keep first: The first entry of this variable for the specified group will be returned in the output table.
- Discard: This variable will not appear in the output table.

Combining Aggregated Data and Initial Conditions

Using the Unstack Table Variables task, you can specify which variables to group by, which to aggregate, and which to keep only the first entry of. In the output table, each type of variable is stored together. From left to right:
1. grouping variables
2. “keep first” variables
3. the new variables containing aggregated data
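
A roughly equivalent command-line sketch uses the name-value arguments of `unstack`. The table and the names `site`, `startDepth`, `phase`, and `flow` are hypothetical. Mapping "Group by" to `GroupingVariables` and "Keep first" to `ConstantVariables` is an assumption about how the task options correspond to the function's arguments.

```matlab
% Hypothetical data: flow measurements per site and phase, plus an
% initial condition (startDepth) that is constant within each site
site = categorical(["N";"N";"N";"S";"S";"S"]);
startDepth = [5; 5; 5; 8; 8; 8];
phase = categorical(["wet";"wet";"dry";"wet";"dry";"dry"]);
flow = [2; 4; 1; 6; 3; 5];
tble = table(site,startDepth,phase,flow);

% Group by site, keep the first startDepth per site, and average flow
% into one new column per phase
tbl2 = unstack(tble,"flow","phase", ...
    "GroupingVariables","site", ...
    "ConstantVariables","startDepth", ...
    "AggregationFunction",@mean)
```

The resulting table should follow the column order listed above: `site` first, then `startDepth`, then the new `dry` and `wet` columns holding the mean flow.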