
Aggregation Policy

In some cases (especially where a service workload consists of small, short-running jobs), very large numbers of accounting records may be generated, which can slow report generation. The aggregation policy generates aggregated records that combine similar records run within a short time period. This speeds up report generation at the expense of some loss of detail. Each accounting record is mapped to the time period in which it completes, so plots against time will be distorted where they include jobs that cross period boundaries.

The aggregated table should contain StartedTimestamp and CompletedTimestamp fields and should duplicate the relevant fields from the master table. By default, numerical fields are aggregated, while non-numerical fields are treated as key fields: each distinct combination of key values generates a separate aggregation record for each time period. Date-type properties are always ignored by the aggregation process.
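The split between key fields and aggregated fields can be sketched in Java as follows. This is only an illustration, not the Grid-SAFE implementation: the class FieldRoleSketch, the representation of a record as a Map, and the choice of summation as the aggregation operation are all assumptions made for the example. Time periods and date-type properties are left out to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from the Grid-SAFE code base.
class FieldRoleSketch {

    /**
     * Groups records by their key fields and sums the remaining numeric fields.
     * A field counts as a key field if its value is non-numeric, or if its
     * name is listed in forcedKeys (a numeric field forced to act as a key).
     * Summation is an assumption; the source only says fields are "aggregated".
     */
    static Map<Map<String, Object>, Map<String, Double>> aggregate(
            List<Map<String, Object>> records, Set<String> forcedKeys) {
        Map<Map<String, Object>, Map<String, Double>> out = new LinkedHashMap<>();
        for (Map<String, Object> rec : records) {
            Map<String, Object> key = new TreeMap<>();
            Map<String, Double> values = new TreeMap<>();
            for (Map.Entry<String, Object> e : rec.entrySet()) {
                Object v = e.getValue();
                if (v instanceof Number && !forcedKeys.contains(e.getKey())) {
                    values.put(e.getKey(), ((Number) v).doubleValue());
                } else {
                    key.put(e.getKey(), v);
                }
            }
            // One aggregate record per distinct combination of key values.
            Map<String, Double> total = out.computeIfAbsent(key, k -> new TreeMap<>());
            values.forEach((name, value) -> total.merge(name, value, Double::sum));
        }
        return out;
    }
}
```

With an empty forcedKeys set, two records for the same user collapse into one aggregate whose numeric fields are summed. Adding a numeric field name to forcedKeys splits the aggregate wherever that field's value differs.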

Any fields from the original record that have no equivalent field in the aggregated table will not be available from the aggregated table. Derived property definitions are inherited automatically, provided they do not use date-type properties.

Multiple aggregation tables are allowed in order to support different aggregation period lengths. However, in most cases daily aggregation should be sufficient.

The default behavior is to aggregate only records with similar run periods: jobs are aggregated together only if they all start in the same aggregation period and all end in the same aggregation period. This somewhat increases the number of aggregate records but improves the fidelity of time-use profiles generated from them. Optionally, the aggregation can be set to ignore the start time and to consider for aggregation all records that complete in the same aggregation period.
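The difference between the two modes can be illustrated with a minimal Java sketch. It assumes daily periods on UTC day boundaries, a single key field (user), and one summed numeric field. The class AggregationPeriodSketch, its Job type, and the useEndOnly flag are hypothetical names invented for this example and are not part of Grid-SAFE.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Grid-SAFE code base.
class AggregationPeriodSketch {

    /** Simplified accounting record with one key field and one numeric field. */
    static final class Job {
        final String user;
        final Instant started;
        final Instant completed;
        final long cpuSeconds;

        Job(String user, Instant started, Instant completed, long cpuSeconds) {
            this.user = user;
            this.started = started;
            this.completed = completed;
            this.cpuSeconds = cpuSeconds;
        }
    }

    /** The UTC day containing the instant; stands in for a daily aggregation period. */
    static LocalDate period(Instant t) {
        return t.atZone(ZoneOffset.UTC).toLocalDate();
    }

    /**
     * Grouping key for a job. The completion period is always part of the key.
     * The start period is included too unless useEndOnly is set.
     */
    static String groupKey(Job j, boolean useEndOnly) {
        String key = j.user + "|" + period(j.completed);
        return useEndOnly ? key : key + "|" + period(j.started);
    }

    /** Sums cpuSeconds over all jobs that share a grouping key. */
    static Map<String, Long> aggregate(List<Job> jobs, boolean useEndOnly) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Job j : jobs) {
            totals.merge(groupKey(j, useEndOnly), j.cpuSeconds, Long::sum);
        }
        return totals;
    }
}
```

For three jobs by one user that all complete on the same day, one of which started the previous day, the default mode yields two aggregate records, while end-time-only mode yields one.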

This policy is configured using the following configuration properties:

  • AggregationPolicy.table-name: a comma-separated list of the aggregated tables that should be generated.
  • master.aggregated-table-name: the name of the table being aggregated.
  • key.aggregation-table-name.prop-name: a boolean value that forces a numeric property to be treated as a key property.
  • aggregated-table-name.aggregate-using_end: a boolean value that enables end-time-only aggregation.
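
As an illustration, the properties above might be set as in the following fragment. The table names JobRecord (master) and DailyJobRecord (aggregate) and the property name NodeCount are hypothetical, and the fragment assumes that table-name in the first property refers to the master table. Check both against your own installation.

```properties
# Hypothetical names: JobRecord is the master table, DailyJobRecord the aggregate.
# Assumes the suffix of AggregationPolicy.* is the master table name.
AggregationPolicy.JobRecord=DailyJobRecord
master.DailyJobRecord=JobRecord
# Treat the numeric NodeCount property (hypothetical) as a key, not a summed value.
key.DailyJobRecord.NodeCount=true
# Group on completion period only, ignoring the start period.
DailyJobRecord.aggregate-using_end=true
```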

The handler class for the aggregation table must also be set to a subclass of uk.ac.ed.epcc.safe.accounting.aggregation.AggregateUsageRecordFactory, and any table cross-references must also be defined. Currently two subclasses are available:

  • uk.ac.ed.epcc.safe.accounting.aggregation.DailyAggregateUsageRecordFactory which aggregates on day boundaries.
  • uk.ac.ed.epcc.safe.accounting.aggregation.HourlyAggregateUsageRecordFactory which aggregates on hour boundaries.

 

Grid-SAFE is funded by JISC and managed by EPCC at the University of Edinburgh