Statistics

Objective

Usage and web statistics are an important part of every application. They show how the system is used, whether new features are adopted, where the users come from, and more.

Applications usually collect statistical data at runtime, per request. With every user request, statistical information can be written at many points in the application and in the background systems it uses. Each component of a software system should be able to write its own statistical information, containing the data that this particular component is interested in. We should define statistic scopes, so that each component can write its statistic data within its own scope and it is clear which data belongs to which scope.
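A minimal Python sketch of scoped statistic writing, to make the idea concrete. The names StatisticRecord and StatisticWriter, the record fields and the scope names are assumptions made for illustration; they are not the framework's actual interface, and a real system would persist the records instead of keeping them in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class StatisticRecord:
    """One raw statistic entry written by a component (hypothetical shape)."""
    scope: str                   # statistic scope the record belongs to
    parameters: Dict[str, str]   # data the writing component is interested in
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class StatisticWriter:
    """Collects raw records in memory (assumption: no persistence here)."""

    def __init__(self) -> None:
        self._records: List[StatisticRecord] = []

    def write(self, scope: str, **parameters: str) -> StatisticRecord:
        # Every record carries its scope, so records stay distinguishable.
        record = StatisticRecord(scope=scope, parameters=dict(parameters))
        self._records.append(record)
        return record

    def records_for(self, scope: str) -> List[StatisticRecord]:
        # Return only the records written within the given scope.
        return [r for r in self._records if r.scope == scope]


# Two components write into their own (hypothetical) scopes.
writer = StatisticWriter()
writer.write("publication-management", page="search", user_country="DE")
writer.write("publication-management", page="detail", user_country="FR")
writer.write("authentication", event="login")
```

Because the scope is stored with every record, each component can choose its own parameters without colliding with the data of other components.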

Afterwards, application administrators, users, etc. want to access the statistical data to extract different kinds of information from it. Usually, only parts of the large pool of statistic data have to be aggregated to obtain the required information. This can lead to performance problems, because the pool of statistic data may be a very large database table, a very big logfile, etc., and gathering the data needed for one particular statistic takes a long time.

This leads to the idea that if we knew which kinds of aggregated information are to be delivered to the administrator or the end user, we could preprocess the large data pool and aggregate the data into many smaller pools, i.e. database tables. These database tables can then be queried much faster to obtain the desired statistics. Again, each of these tables with aggregated data would belong to a specific statistic scope. To manage the different database tables with aggregated data, we should have so-called aggregation definitions in the system. With these aggregation definitions we could define:

  • the database table into which we want to write the aggregated data.
  • what kind of data is needed from the big pool for this aggregation.
  • how to aggregate the data.
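The three parts of an aggregation definition listed above could be sketched as follows, using Python and an in-memory SQLite database. The AggregationDefinition class, the run_aggregation helper, the table names raw_statistics and agg_page_hits, and the column layout are all assumptions made for this example, not the framework's actual schema.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregationDefinition:
    """Hypothetical aggregation definition for one statistic scope."""
    scope: str          # statistic scope the aggregated table belongs to
    target_table: str   # the table the aggregated data is written into
    create_sql: str     # DDL that creates the target table
    aggregate_sql: str  # which raw data is needed and how it is aggregated


def run_aggregation(conn: sqlite3.Connection, d: AggregationDefinition) -> None:
    """Rebuild the aggregated table from the raw pool (e.g. once per night)."""
    conn.execute(d.create_sql)
    conn.execute(f"DELETE FROM {d.target_table}")  # start from an empty table
    conn.execute(d.aggregate_sql, (d.scope,))
    conn.commit()


# Assumed layout of the raw pool: one row per written statistic record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_statistics (scope TEXT, page TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO raw_statistics VALUES (?, ?, ?)",
    [
        ("publication-management", "search", "2024-01-01"),
        ("publication-management", "search", "2024-01-01"),
        ("publication-management", "detail", "2024-01-01"),
        ("authentication", "login", "2024-01-01"),
    ],
)

page_hits = AggregationDefinition(
    scope="publication-management",
    target_table="agg_page_hits",
    create_sql=(
        "CREATE TABLE IF NOT EXISTS agg_page_hits "
        "(day TEXT, page TEXT, hits INTEGER)"
    ),
    aggregate_sql=(
        "INSERT INTO agg_page_hits (day, page, hits) "
        "SELECT day, page, COUNT(*) FROM raw_statistics "
        "WHERE scope = ? GROUP BY day, page"
    ),
)

run_aggregation(conn, page_hits)
aggregated = conn.execute(
    "SELECT page, hits FROM agg_page_hits ORDER BY page"
).fetchall()
```

In this sketch the aggregated table is rebuilt from scratch on every run, which keeps the example short; an incremental update per day would be another option.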

These database tables containing the aggregated data could then be queried very efficiently to produce a statistic report. Queries against these tables again do not have to return all of the data; they might return only parts of it, or aggregate the data further for the report. We could even think about querying several tables containing aggregated data at once to get a special kind of statistic.

To manage the different queries against the tables with aggregated data, we should have report definitions that hold this information and that are kept in a persistence layer of the framework. Each report definition holds the name of the statistic (e.g., Publication Management Page Statistics) and the query. Again, each report definition would belong to one specific statistic scope.

This leads to the following duties for the Statistic Manager Component:

  • writing statistic records during runtime of the system by providing an interface to the applications.
  • aggregating the raw statistic data for different kinds of statistics in a nightly process, depending on aggregation definitions that are held by the framework. These aggregation definitions can be created, retrieved and deleted through interface calls to the framework.
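A report definition as described above (a name, a scope and a query against aggregated tables) could look like the following Python sketch. ReportDefinition, ReportDefinitionStore, run_report and the agg_page_hits table are hypothetical names; the in-memory dictionary only stands in for the persistence layer, and the create/retrieve/delete methods are an assumption modelled on what the text says about aggregation definitions.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReportDefinition:
    """Hypothetical report definition: statistic name, scope and query."""
    name: str
    scope: str
    sql: str   # query against one or more aggregated tables


class ReportDefinitionStore:
    """In-memory stand-in for the persistence layer (assumption)."""

    def __init__(self) -> None:
        self._definitions: Dict[str, ReportDefinition] = {}

    def create(self, definition: ReportDefinition) -> None:
        self._definitions[definition.name] = definition

    def retrieve(self, name: str) -> ReportDefinition:
        return self._definitions[name]

    def delete(self, name: str) -> None:
        del self._definitions[name]


def run_report(conn: sqlite3.Connection, d: ReportDefinition) -> List[Tuple]:
    """Execute the stored query and return the report rows."""
    return conn.execute(d.sql).fetchall()


# Assumed aggregated table, as a nightly aggregation might have filled it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agg_page_hits (day TEXT, page TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO agg_page_hits VALUES (?, ?, ?)",
    [
        ("2024-01-01", "search", 2),
        ("2024-01-02", "search", 3),
        ("2024-01-01", "detail", 1),
    ],
)

store = ReportDefinitionStore()
store.create(
    ReportDefinition(
        name="Publication Management Page Statistics",
        scope="publication-management",
        sql=(
            "SELECT page, SUM(hits) FROM agg_page_hits "
            "GROUP BY page ORDER BY page"
        ),
    )
)

report_rows = run_report(
    conn, store.retrieve("Publication Management Page Statistics")
)
```

The report query here aggregates the already aggregated rows once more (summing daily hits per page), which is the case the text mentions where a report does not return the table as-is.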

Current Status

Alpha.

Documentation

Additional Files