Statistical methodology for regression model with measurement error

Saqr, Anwar A. Mohamad (2013) Statistical methodology for regression model with measurement error. [Thesis (PhD/Research)]

This thesis primarily deals with the estimation of the slope parameter of the simple linear regression model in the presence of measurement errors (ME), or errors-in-variables, in both the explanatory and response variables. It is a very old and difficult problem which has been considered by a host of authors since the third quarter of the nineteenth century. ME poses a serious problem in fitting the regression line, as it directly affects the estimators and their standard errors (see e.g. Fuller, 2006, p. 3). The standard linear regression methods, including least squares and maximum likelihood, work when the explanatory variable is measured without error. In practice, however, there are many situations where the variables can only be measured with ME. For example, medical variables such as blood pressure and blood chemistries, and agricultural variables such as soil nitrogen and rainfall, can hardly be measured accurately. The observed data represent the manifest variable, which measures the actual unobservable latent variable with ME.
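The effect of ME on the least squares slope can be illustrated with a small simulation. The sketch below is not taken from the thesis: it uses Python rather than Matlab, and the sample size, parameter values, and function names (`ols_slope`, `simulate`) are illustrative assumptions. With normal errors in the explanatory variable, the slope fitted to the manifest variable shrinks toward zero by roughly the factor var(ξ) / (var(ξ) + var(δ)), where δ denotes the ME in the explanatory variable.

```python
import random


def ols_slope(x, y):
    """Least squares slope of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta0=1.0, beta1=2.0,
             sd_xi=1.0, sd_delta=1.0, sd_eps=0.5, seed=0):
    """Draw latent xi, manifest x = xi + delta, and response y (assumed settings)."""
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, sd_xi) for _ in range(n)]
    x = [v + rng.gauss(0.0, sd_delta) for v in xi]               # measured with error
    y = [beta0 + beta1 * v + rng.gauss(0.0, sd_eps) for v in xi]
    return xi, x, y


xi, x, y = simulate()
slope_latent = ols_slope(xi, y)     # regression on the (normally unobservable) latent variable
slope_manifest = ols_slope(x, y)    # regression on the observed, error-prone variable
print(f"latent: {slope_latent:.3f}  manifest: {slope_manifest:.3f}")
```

With these assumed settings the attenuation factor is 1 / (1 + 1) = 0.5, so the slope fitted to the manifest variable settles near 1 rather than near the true value 2.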

The ME model is divided into two general classes: (i) the functional model, if the explanatory variable (ξ) is an unknown constant, and (ii) the structural model, if ξ is an independent and identically distributed random variable (cf. Kendall, 1950, 1952). The most important characteristic of the normal structural model is that the parameters are not identifiable without prior information about the error variances, such as the ratio of error variances (λ) (see Cheng and Van Ness, 1999, p. 6). However, the non-normal structural model is identifiable without any prior information. The normal and non-normal structural models with ME in both response and explanatory variables are considered in this research.

There are a number of commonly used methods to estimate the slope parameter of the ME model, but none of them solves the estimation problem across all situations. A summary of the well-known methods is provided in Table 1.

Table 1: A summary of commonly used methods to handle the ME model problem

The first two chapters of this thesis cover an introduction to the ME problem, background, and motivation of the study. From Chapter 3 we provide a new methodology to fit the regression line using the reflection of the explanatory variable about the fitted regression line with the manifest variables. The asymptotic consistency and the mean absolute error (MAE) criteria are used to compare the new estimators and the relevant existing estimators under different conditions.

One of the most commonly used methods to deal with the ME model is the instrumental variable (IV) method. However, it is difficult to find a valid IV, that is, one that is highly correlated with the explanatory variable but uncorrelated with the error term. Therefore, in Chapter 4 we propose a new method to find a good IV based on the reflection of the explanatory variable. The new method is easy to implement, and performs much better than the existing methods. The superiority of this method is demonstrated both analytically and via numerical as well as graphical illustrations under certain assumptions.
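For reference, the textbook IV slope estimator is cov(z, y) / cov(z, x) for an instrument z. The sketch below does not implement the reflection-based IV of Chapter 4. It assumes, purely for illustration, that a second independent measurement of ξ is available and uses it as the instrument; all names and parameter values are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def iv_slope(z, x, y):
    """Textbook IV slope: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


rng = random.Random(1)
n = 20000
beta0, beta1 = 1.0, 2.0                                   # assumed true parameters
xi = [rng.gauss(0.0, 1.0) for _ in range(n)]              # latent explanatory variable
x = [v + rng.gauss(0.0, 1.0) for v in xi]                 # manifest variable with ME
z = [v + rng.gauss(0.0, 1.0) for v in xi]                 # assumed instrument: a replicate measurement
y = [beta0 + beta1 * v + rng.gauss(0.0, 0.5) for v in xi]

slope_ols = cov(x, y) / cov(x, x)    # attenuated by the ME in x
slope_iv = iv_slope(z, x, y)         # uses only the variation shared with z
print(f"OLS: {slope_ols:.3f}  IV: {slope_iv:.3f}")
```

The replicate measurement is correlated with x through ξ but independent of both error terms, which is what makes it a valid instrument in this assumed setup.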

In Chapter 5, a commonly used method to deal with the normal structural model, namely orthogonal regression (OR), is discussed under the assumption of known λ; OR is the same as the maximum likelihood solution when λ = 1. However, the OR method does not work well if λ is misspecified (in which case it is inconsistent) and/or the sample size is small. We provide an alternative method based on the reflection method (RM) of estimation for the measurement error model. The RM uses a new transformed explanatory variable which is derived from the reflection formula. This method is equivalent or asymptotically equivalent to the orthogonal regression method, and nearly asymptotically unbiased and efficient, under the assumption that λ is equal to one and the sample size is large. If λ is misspecified, the RM is better than the OR method under the MAE criterion, even if the sample size is small.
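As a point of reference for the OR discussion, the following sketch computes the classical errors-in-variables slope for a known λ (often called Deming regression), which reduces to OR when λ = 1. It assumes λ is defined as the ratio of the response error variance to the explanatory error variance; the thesis may use a different convention. It does not implement the RM. Function names and simulation settings are illustrative assumptions.

```python
import math
import random


def deming_slope(x, y, lam):
    """Errors-in-variables slope for known lam = var(eps) / var(delta) (assumed convention)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    d = syy - lam * sxx
    return (d + math.sqrt(d * d + 4.0 * lam * sxy * sxy)) / (2.0 * sxy)


rng = random.Random(2)
n = 20000
xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [v + rng.gauss(0.0, 1.0) for v in xi]                # var(delta) = 1
y = [1.0 + 2.0 * v + rng.gauss(0.0, 1.0) for v in xi]    # var(eps) = 1, so the true lam is 1

slope_or = deming_slope(x, y, lam=1.0)             # orthogonal regression, lam correctly specified
slope_misspecified = deming_slope(x, y, lam=0.25)  # deliberately wrong lam
print(f"lam = 1: {slope_or:.3f}  lam = 0.25: {slope_misspecified:.3f}")
```

In this assumed setup the fit with λ = 1 should land near the true slope 2, while the fit with the misspecified λ = 0.25 converges to a different value, which illustrates the inconsistency under misspecification.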

Chapter 6 considers the Wald method (the two-group method), which is still widely used in spite of increasing criticism of the efficiency of the estimator. To address this problem, we introduce a new grouping method based on the reflection grouping (RG) approach. The proposed method provides a new grouping process that modifies the Wald method in order to increase its efficiency. The RG method introduces a new way of dividing the data using the rank of the reflection of the explanatory variable. The method recommends different grouping criteria depending on whether λ is equal to, greater than, or less than one. The RG method significantly increases the efficiency of the Wald method; it is more precise than the other competing methods and works well for different sample sizes and for different values of λ. Moreover, the RG method also removes the shortcomings of the maximum likelihood method when λ is misspecified and the sample size is small.
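The classical two-group estimator that the RG method modifies can be sketched as follows. This is the standard form, with the groups formed by the rank of the observed explanatory variable; it is not the RG method, whose reflection-based ranking is not reproduced here. The function name and the toy data are illustrative assumptions.

```python
def wald_two_group(x, y):
    """Classical two-group fit: the line through the centroids of the lower and upper halves."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])   # rank observations by observed x
    half = n // 2
    lower = order[:half]
    upper = order[n - half:]                       # the middle point is dropped when n is odd
    x_lo = sum(x[i] for i in lower) / half
    x_hi = sum(x[i] for i in upper) / half
    y_lo = sum(y[i] for i in lower) / half
    y_hi = sum(y[i] for i in upper) / half
    slope = (y_hi - y_lo) / (x_hi - x_lo)
    intercept = sum(y) / n - slope * sum(x) / n    # line passes through the overall centroid
    return slope, intercept


# Error-free toy data on the line y = 1 + 2x, so the fit should recover that line.
x = [float(v) for v in range(10)]
y = [1.0 + 2.0 * v for v in x]
slope_wald, intercept_wald = wald_two_group(x, y)
print(slope_wald, intercept_wald)
```

One commonly cited limitation of this classical form is that grouping on the observed explanatory variable is generally not independent of the measurement errors.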

Geometric mean (GM) regression is covered in Chapter 7. The GM method is widely used in many disciplines, including medicine, pharmacology, astrometry, oceanography, and fisheries research. This method is known by many names, such as reduced major axis, standardized major axis, and line of organic correlation. We introduce a new estimator of the slope parameter when both variables are subject to ME. The weighted geometric mean (WGM) estimator is constructed based on the reflection and the mathematical relationship between the vertical and orthogonal distances between the observed points and the regression line of the manifest model. The WGM estimator possesses better statistical properties than the geometric mean estimator and the OLS-bisector estimator. The WGM estimator is stable and works well for different values of λ and for different sample sizes.
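The unweighted GM slope is the geometric mean of the least squares slope of y on x and the reciprocal of the least squares slope of x on y, which simplifies to sign(s_xy) · sqrt(s_yy / s_xx). The sketch below shows only this standard estimator; the WGM weighting is not reproduced. The function name and the small data set are made up for illustration.

```python
import math


def gm_slope(x, y):
    """Standard geometric mean (reduced major axis) slope, plus the two OLS-based slopes."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b_yx = sxy / sxx              # OLS slope of y on x
    inv_b_xy = syy / sxy          # reciprocal of the OLS slope of x on y
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * math.sqrt(syy / sxx), b_yx, inv_b_xy


# Hypothetical data, roughly on a line with slope 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope_gm, b_yx, inv_b_xy = gm_slope(x, y)
print(f"y on x: {b_yx:.4f}  GM: {slope_gm:.4f}  inverse x on y: {inv_b_xy:.4f}")
```

For positively correlated data the GM slope always lies between the two OLS-based slopes.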

The properties of the proposed reflection estimators are investigated in Chapters 3-7, and these estimators are compared with the relevant existing estimators by simulation studies. The computer package Matlab is used for all computations and preparation of graphs. Based on the asymptotic consistency and MAE criteria, the proposed reflection estimators perform better than the existing estimators, in some cases even when the standard assumptions on λ and sample size are violated.
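The MAE criterion can be sketched as the average of |estimate − true slope| over repeated simulated samples. The example below is a minimal illustration in Python rather than Matlab, comparing only two existing estimators (least squares and OR with λ = 1); the reflection estimators are not reproduced, and the sample size, number of replications, and parameter values are assumptions.

```python
import math
import random


def moments(x, y):
    """Centred sums of squares and cross-products."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x, y):
    sxx, _, sxy = moments(x, y)
    return sxy / sxx


def or_slope(x, y):
    """Orthogonal regression slope (error variance ratio assumed equal to one)."""
    sxx, syy, sxy = moments(x, y)
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)


def mean_absolute_error(estimator, true_slope=2.0, n=50, reps=500, seed=0):
    """Average |estimate - true_slope| over simulated samples with ME in both variables."""
    rng = random.Random(seed)       # same seed gives both estimators the same samples
    total = 0.0
    for _ in range(reps):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x = [v + rng.gauss(0.0, 1.0) for v in xi]
        y = [1.0 + true_slope * v + rng.gauss(0.0, 1.0) for v in xi]
        total += abs(estimator(x, y) - true_slope)
    return total / reps


mae_ols = mean_absolute_error(ols_slope)
mae_or = mean_absolute_error(or_slope)
print(f"MAE OLS: {mae_ols:.3f}  MAE OR: {mae_or:.3f}")
```

Because both error variances are set to one here, the λ = 1 assumption behind OR holds, so OR is expected to show the smaller MAE in this particular setup.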

Chapter 8 provides some concluding remarks.

Statistics for USQ ePrint 27812
Item Type: Thesis (PhD/Research)
Item Status: Live Archive
Additional Information: Doctor of Philosophy
Faculty/School / Institute/Centre: Historic - Faculty of Health, Engineering and Sciences - School of Agricultural, Computational and Environmental Sciences (1 Jul 2013 - 5 Sep 2019)
Supervisors: Khan, Professor Shahjahan; Langlands, Dr Trevor
Date Deposited: 10 Aug 2016 01:32
Last Modified: 15 Aug 2016 05:52
Uncontrolled Keywords: Matlab; measurement errors; ME; estimation of the slope parameter; simple linear regression; regression methods; statistical methodology
Fields of Research (2008): 01 Mathematical Sciences > 0104 Statistics > 010401 Applied Statistics
01 Mathematical Sciences > 0104 Statistics > 010403 Forensic Statistics
Fields of Research (2020): 49 MATHEMATICAL SCIENCES > 4905 Statistics > 490501 Applied statistics
49 MATHEMATICAL SCIENCES > 4905 Statistics > 490504 Forensic evaluation, inference and statistics
