# summarize

### Problem

Place tuples in groups and calculate aggregate values for each group.

Example: *For every city where we have stores, how big is the total revenue per city?*

### Description

The first argument specifies attributes to group by, the second argument is a hash with attribute names as keys and aggregation operations as values.

Like the `group`

operation, `summarize`

uses a number of *grouping attributes* to slice up the input relation. However, `summarize`

does not create a nested relation. Instead, `summarize`

extends the output relation with one or several attributes containing *aggregated values* computed over tuples that match the grouping tuples on all their attributes. (An important difference: while the attributes passed to `group`

are the *grouped attributes*, `summarize`

expects the *grouping attributes*.)

A simple implementation of the count aggregation might look like this:

Let’s see how this works step by step. We start with the `supplies`

relations (described in the Overview page):

We now want to know, *How many products does each supplier stock, and what is the total quantity of items in stock for each supplier?*.

First we project on the `sid`

attribute:

sid |
---|

S1 |

S2 |

S3 |

S4 |

Then we extend by the matching tuples from the original `supplies`

:

And then we can extend the relation with count (number of different products) and the total quantity (the sum of `qty`

in each group):

sid | count | sum |
---|---|---|

S1 | 6 | 1300 |

S2 | 2 | 700 |

S3 | 1 | 200 |

S4 | 3 | 900 |

This result can be obtained through Bmg like this:

`supplies.summarize([:sid], :pid => :count, :qty => :sum)`

(But note that the aggregate attributes will be named `:pid`

and `:qty`

instead of `count`

and `sum`

.)

### Requirements

### Supported aggregation operations

Each of the the aggregation operations below operates on values taken from a single attribute of the tuples in a nested relation, as described above.

Only some of the operations can be compiled to SQL, as indicated in the table below. To understand how mixing compilable and non-combinable operations works, see the With a SQL database backend page.

Name | SQL? | Numerical? | Description |
---|---|---|---|

avg | ✔ | ✔ | Numerical average |

collect | — | Creates an array containing all values (including duplicates) | |

concat | — | String concatenation | |

count | ✔ | Numer of values (including duplicates) | |

distinct | — | Like `collect, with duplicates removed | |

distinct_count | ✔ | Like `count` , but duplicates not counted | |

first | — | ??? | |

last | — | ??? | |

max | ✔ | picks out the largest value | |

min | ✔ | picks out the largest value | |

multiple | — | ??? | |

percentile | — | ✔ | ? |

stddev | — | ✔ | ? |

sum | ✔ | ✔ | The sum of all values |

value_by | — | ? | |

variance | — | ✔ | ? |

### Examples

*Consult the Overview page for the data model used in this example.*