Model Selection for Linear Regression Under Data Aggregation
Last modified: 2024-04-17
Abstract
Aggregating over individuals belonging to different groups is sometimes unavoidable, for example when data from different views are merged. In linear regression, aggregation is known to induce a so-called aggregation bias in the ordinary least-squares (OLS) coefficient estimates relative to those obtained without aggregation. The effect of this aggregation bias on common model selection procedures is, however, poorly understood. Using simulations based on the matrix-variate normal distribution, we examine how common selection procedures behave under aggregation, evaluating them with a variety of metrics.
Keywords
aggregation bias, matrix-variate normal, ordinary least-squares, model selection