Review – Analyzing Groups within Data

Calculations on Groups in a Dataset

>> gs = groupsummary(tble,groupvars,method,datavars)
gs: A table of calculations for each group, including a column for group counts.
tble: A table of data.
groupvars: The variables you want to group by. Can be a string vector (or scalar) of variable names, an integer vector of indices, or a logical vector indicating the grouping variables.
method: The aggregation method.
datavars: The variables you want to apply the aggregation method to.
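
As a minimal sketch of the syntax above, the following computes the mean of one variable for each group. The table contents and the variable names Store and Sales are made up for illustration.

```matlab
% Hypothetical sample data: sales recorded at two stores
Store = categorical(["A"; "A"; "B"; "B"; "B"]);
Sales = [10; 20; 5; 15; 40];
tble  = table(Store, Sales);

% Mean of Sales for each Store
gs = groupsummary(tble, "Store", "mean", "Sales");
disp(gs)
% gs has one row per store, with the variables Store, GroupCount, and mean_Sales
```

The result has a GroupCount of 2 for store A and 3 for store B, and the aggregated variable is named by joining the method and the data variable name (mean_Sales).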

You can also specify group bins for the grouping variable, which can be especially helpful with datetime variables.

>> gs = groupsummary(tble,groupvars,groupbins,method,datavars)
gs: A table of calculations for each group, including a column for group counts.
tble: A table of data.
groupvars: The variables you want to group by. Can be a string vector (or scalar) of variable names, an integer vector of indices, or a logical vector indicating the grouping variables.
groupbins: The binning scheme. For datetime variables, this can be a unit of time, like "minute" or "decade".
method: The aggregation method.
datavars: The variables you want to apply the aggregation method to.
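
A short sketch of binning a datetime grouping variable by month is shown below. The data and the variable names Time and Temp are hypothetical.

```matlab
% Hypothetical sample data: temperature readings on five dates
% (Jan 1, Jan 11, Feb 10, Feb 15, and Mar 12 of 2023)
Time = datetime(2023, 1, 1) + days([0; 10; 40; 45; 70]);
Temp = [1; 3; 5; 7; 10];
tble = table(Time, Temp);

% Group the rows by month of Time and take the maximum Temp in each month
gs = groupsummary(tble, "Time", "month", "max", "Temp");
disp(gs)
```

Here the five readings fall into three monthly bins, so gs has three rows, with the largest reading of each month in max_Temp.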

The output of groupsummary sometimes requires post-processing, such as removing or renaming variables with removevars and renamevars, respectively.
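
For instance, one possible clean-up drops the group counts and gives the aggregated variable a shorter name. The data and the new name AvgSales are made up for illustration.

```matlab
% Hypothetical sample data
Store = categorical(["A"; "A"; "B"; "B"; "B"]);
Sales = [10; 20; 5; 15; 40];
tble  = table(Store, Sales);

gs = groupsummary(tble, "Store", "mean", "Sales");

% Remove the group-count column, then rename the aggregated variable
gs = removevars(gs, "GroupCount");
gs = renamevars(gs, "mean_Sales", "AvgSales");
disp(gs)
```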

Organize Aggregated Data by Grouping Variable

After calculating group statistics with multiple variables, you may want to unstack the aggregated data into a new table with the grouping variables defining the rows and columns.

>> tbl2 = unstack(tble,datavars,colvar)
tbl2: A new table with data from datavars grouped into columns defined by colvar.
tble: A table of data.
datavars: The variables you want to be in the entries of the table.
colvar: The variable that defines the new columns.
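
The sketch below unstacks a small table of aggregated values so that each product becomes its own column. The table is hypothetical and stands in for the output of groupsummary with two grouping variables.

```matlab
% Hypothetical aggregated data: one row per Store/Product combination
Store      = categorical(["A"; "A"; "B"; "B"]);
Product    = categorical(["Pens"; "Ink"; "Pens"; "Ink"]);
mean_Sales = [10; 20; 30; 40];
tble = table(Store, Product, mean_Sales);

% Spread mean_Sales into one column per Product.
% The remaining variable (Store) defines the rows.
tbl2 = unstack(tble, "mean_Sales", "Product");
disp(tbl2)
```

The output has one row per store and the new variables Ink and Pens, each holding the corresponding mean_Sales values.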

Unstack Table Variables Task

You can use the Unstack Table Variables task to unstack aggregated data from a table interactively. You should specify the following:

  1. Input table: Specify the input, or original, data.
  2. Output table name: This is the name of the output table.
  3. Names of new table variables: This is the variable that identifies the new column names. It is often a categorical.
  4. Values in new table variables: This is the variable to be unstacked. It will become the entries in the new table.
  5. Aggregator for new table variable values: Specify the aggregation function to be applied to the Values in new table variables.
  6. How to include each remaining variable in the output table. The options are:
    • Group by: This variable will be used as a grouping variable and stored in the first column.
    • Keep first: The first entry of this variable for the specified group will be returned in the output table.
    • Discard: This variable will not appear in the output table.

Combining Aggregated Data and Initial Conditions

Using the Unstack Table Variables task, you can specify which variables to group by, which to aggregate, and which to keep only the first entry of. In the output table, each type of variable is stored together. From left to right:
  1. grouping variables
  2. “keep first” variables
  3. the new variables containing aggregated data
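
A comparable result can be sketched at the command line with unstack. This assumes that the task's Group by, Keep first, and aggregator settings correspond to the GroupingVariables, ConstantVariables, and AggregationFunction name-value arguments of unstack. The data and variable names are hypothetical.

```matlab
% Hypothetical data: repeated heart-rate measurements per patient,
% plus an initial condition (StartWeight) that is constant for each patient
Patient     = categorical(["P1"; "P1"; "P1"; "P2"; "P2"; "P2"]);
StartWeight = [70; 70; 70; 82; 82; 82];
Phase       = categorical(["Rest"; "Rest"; "Run"; "Rest"; "Run"; "Run"]);
HeartRate   = [60; 64; 120; 70; 130; 134];
tble = table(Patient, StartWeight, Phase, HeartRate);

% Group by Patient, keep the first StartWeight of each group,
% and average HeartRate into one column per Phase
tbl2 = unstack(tble, "HeartRate", "Phase", ...
    "GroupingVariables", "Patient", ...
    "ConstantVariables", "StartWeight", ...
    "AggregationFunction", @mean);
disp(tbl2)
```

In this sketch the output columns are Patient (grouping variable), StartWeight ("keep first" variable), and then Rest and Run (the new variables containing aggregated data).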