Statistical Calibration Overview
Calibration allows estimates from one survey to be expressed in the units, or "currency," of another. It is often a critical part of transitioning to a new survey design.
Calibration is a statistical process that accounts for the sources of variation that can lead to differences among estimates produced by different surveys. While there is no "one size fits all" calibration method, the purpose of calibration is always the same: to allow estimates from one survey design to be expressed in the units, or "currency," of another.
Statisticians around the world use calibration to:
- Make meaningful comparisons between data derived from multiple independent surveys.
- Maintain one consistent time series when survey methods change.
In the United States, for example, calibration has been used to compare statistics from different household surveys (PDF, 3 pages) and agricultural surveys (PDF, 6 pages). In New Zealand and South Africa, calibration has been used to make sure the implementation of survey design changes did not mask trends in adolescent health and population demographics. NOAA Fisheries has used calibration to transition to new bottom trawl survey techniques and, more recently, to account for the Marine Recreational Information Program survey improvements described below.
The Need for Calibration in Fisheries Science and Management
Stock assessors and fisheries managers base their work on continuous, uninterrupted time series of fisheries statistics. The implementation of new recreational fishing surveys or the modification of existing survey designs can disrupt this time series by producing new estimates that aren't directly comparable to legacy estimates.
Sometimes, a new survey is designed to address bias in an existing survey. In these cases, the new survey's estimates are likely to be more accurate. Other times, a new survey is designed to meet new data needs. (This often occurs when estimates are needed faster, more frequently, or at finer spatial resolutions than an existing survey can provide.) In these cases, any differences between new and existing estimates can likely be explained by unidentified non-sampling errors affecting one or both surveys. This outcome is to be expected: no survey is error-free, and different survey designs are often impacted by different types of non-sampling errors that can drive estimates apart. For this reason, calibrating one set of estimates to another does not mean that one survey is "better" than another. Instead, calibration is only intended to reconcile differences between two sets of estimates.
Whenever existing and new surveys produce estimates that are systematically different from one another, calibration is an essential step that must occur before the new estimates can be used in science and management. Not calibrating when it’s necessary to do so can:
- Ignore inherent differences between existing and new monitoring programs.
- Create abrupt, unaccounted-for changes in recreational fisheries statistics that mask real trends in a fishery or fish stock. These trends must be tracked for effective assessment and management, and an abrupt change in a time series can misinform fisheries managers.
- Create a mismatch between new estimates (which may be used to monitor catch) and existing quotas (which may be based on legacy estimates), risking the allowance of overfishing or the setting of unnecessarily restrictive management measures.
All of these outcomes would disadvantage anglers.
NOAA Fisheries has produced recreational fisheries statistics since 1981. In recent years, the agency's data collection and estimation methods have undergone several significant changes. Since the Marine Recreational Information Program (MRIP) replaced the Marine Recreational Fisheries Statistics Survey (MRFSS) in 2008, the agency has:
- Improved its Access Point Angler Intercept Survey (APAIS);
- Developed the Fishing Effort Survey (FES) to replace the Coastal Household Telephone Survey (CHTS); and
- Implemented more advanced estimation methods (PDF, 61 pages).
In July 2018, NOAA Fisheries calibrated its entire time series of recreational catch and effort estimates to account for these changes. The calibrated time series shows us what our historical estimates would have looked like if our new and improved survey designs had been in place all along.
Transitioning to the Fishing Effort Survey
How Did We Calibrate?
Generally speaking, the Fishing Effort Survey produces significantly higher estimates of shore and private boat fishing effort than the Coastal Household Telephone Survey. From 2015 through 2017, the CHTS and FES were conducted side-by-side. This three-year benchmarking period allowed us to study differences between the two sets of estimates and work with expert statistical consultants to develop a calibration model to convert between the CHTS and FES "currencies." Before the calibration model was used, it was refined through an independent expert peer review.
The calibration model assumes that both CHTS and FES estimates reflect real trends in recreational fishing activity over time, but that errors unique to each survey are driving differences between the time series. The model uses one set of variables to represent real trends in fishing activity—which influence both the CHTS and FES time series—and another set of variables to represent three kinds of survey errors that are likely at play: sampling error, systematic non-sampling error, and variable non-sampling error. More information about these model variables can be found in Section 8.1 of the MRIP Survey Design and Statistical Methods Manual.
The model can be run in two directions:
- Converting historical estimates (in the CHTS currency) into "FES-like" estimates; or
- Converting recent FES estimates into "CHTS-like" estimates.
We continue to produce CHTS-like estimates for those stocks whose catch advice is still set in the CHTS currency. These CHTS-like estimates will no longer be needed once all of the affected stocks have been assessed using FES and FES-like estimates, and their catch advice has been set using the calibrated FES-like time series.
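The structure of the calibration model described above can be written schematically. The notation below is only an illustrative sketch: the symbols, the additive form, and the idea of working on a single (possibly transformed) scale are assumptions made here for readability, not the specification given in the MRIP manual.

```latex
% Illustrative notation only (assumed, not taken from the MRIP manual).
% T_t : real trend in fishing activity at time t, shared by both surveys
% b^s : systematic non-sampling error of survey s
% v^s_t : variable non-sampling error of survey s at time t
% e^s_t : sampling error of survey s at time t
\begin{align*}
\hat{Y}^{\mathrm{CHTS}}_t &= T_t + b^{\mathrm{CHTS}} + v^{\mathrm{CHTS}}_t + e^{\mathrm{CHTS}}_t \\
\hat{Y}^{\mathrm{FES}}_t  &= T_t + b^{\mathrm{FES}}  + v^{\mathrm{FES}}_t  + e^{\mathrm{FES}}_t
\end{align*}
% Running the model in two directions: estimate the shared trend,
% then express it in either survey's currency.
\begin{align*}
\text{FES-like}_t  &= \hat{T}_t + \hat{b}^{\mathrm{FES}} \\
\text{CHTS-like}_t &= \hat{T}_t + \hat{b}^{\mathrm{CHTS}}
\end{align*}
```

Under this simplified reading, the shared trend term is what both time series have in common, and the survey-specific terms are what separate the two currencies.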
Why Did We Choose These Methods?
When feasible, NOAA Fisheries' transition procedure recommends that calibration involve three steps: benchmarking, conducting research to evaluate differences between new and legacy estimates, and developing a model to relate the two sets of estimates to one another.
Transitioning to an Improved Access Point Angler Intercept Survey
How Did We Calibrate?
While logistical constraints prevented us from conducting a large-scale benchmarking study of our improved angler intercept survey, we did complete a benchmarking study in North Carolina to compare the MRFSS and MRIP intercept survey designs. Differences between the MRFSS-APAIS and MRIP-APAIS estimates varied depending on species, catch type, fishing mode, and other factors.
To calibrate the MRFSS-APAIS and MRIP-APAIS estimates, we used a statistical technique called raking. This process involves repeatedly and incrementally adjusting the sample weights of one set of estimates until their distributions align with a target. In our case, we adjusted the sample weights of the MRFSS-APAIS estimates until the distribution of their totals aligned with those of the MRIP-APAIS estimates. Before this calibration approach was used, it was refined through an independent expert peer review.
Raking reconciled differences between the MRFSS-APAIS and MRIP-APAIS survey designs. For example, adjusting the sample weights of the MRFSS-APAIS estimates accounted for a key difference in sampling distributions: While the MRFSS-APAIS limited sampling to peak daytime fishing activity—with little to no coverage of early morning or late evening trips—the MRIP-APAIS expanded sampling to cover a full 24-hour day.
It is important to note that sample weighting was not part of the MRFSS-APAIS estimation methods before 2004. Therefore, this calibration process also involved estimating sample weights for those earlier years based on relative changes in sampling intensity over time. To make these adjustments, MRFSS-APAIS and MRIP-APAIS angler interview data were divided into domains, or groups with shared characteristics. These domains were defined by state, sampling wave, and fishing mode, as well as additional characteristics that our research suggests drove differences between the two sets of estimates. More information about these adjustments can be found in Section 8.2 of the MRIP Survey Design and Statistical Methods Manual.
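The raking idea described above (repeatedly adjusting sample weights until weighted totals align with a target) can be sketched in a few lines of Python. This is a generic, minimal illustration of raking and not the procedure documented in the MRIP manual: the function names, the two dimensions ("time_of_day" and "mode"), the interview records, and the target totals are all hypothetical.

```python
from collections import defaultdict


def weighted_totals(records, weights, dim):
    """Sum the weights within each category of one raking dimension."""
    totals = defaultdict(float)
    for record, weight in zip(records, weights):
        totals[record[dim]] += weight
    return dict(totals)


def rake(records, targets, max_iter=500, tol=1e-10):
    """Adjust weights one dimension at a time until every weighted
    category total matches its target (iterative proportional fitting).

    records: list of dicts with a "weight" key and one key per dimension.
    targets: {dimension: {category: target weighted total}}.
    Returns a new list of weights; the input records are not modified.
    """
    weights = [float(r["weight"]) for r in records]
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for dim, dim_targets in targets.items():
            current = weighted_totals(records, weights, dim)
            for i, record in enumerate(records):
                # Scale each weight by target total / current total
                # for the category this record belongs to.
                factor = dim_targets[record[dim]] / current[record[dim]]
                weights[i] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        # Stop once a full pass leaves the weights essentially unchanged.
        if largest_adjustment < tol:
            break
    return weights


# Hypothetical interview records (made-up values, for illustration only).
interviews = [
    {"time_of_day": "day", "mode": "shore", "weight": 1.0},
    {"time_of_day": "day", "mode": "shore", "weight": 1.0},
    {"time_of_day": "day", "mode": "boat", "weight": 1.0},
    {"time_of_day": "night", "mode": "shore", "weight": 1.0},
    {"time_of_day": "night", "mode": "boat", "weight": 1.0},
]
# Hypothetical target totals; each dimension's targets sum to the same total.
target_totals = {
    "time_of_day": {"day": 60.0, "night": 40.0},
    "mode": {"shore": 50.0, "boat": 50.0},
}

raked = rake(interviews, target_totals)
for dim in target_totals:
    print(dim, weighted_totals(interviews, raked, dim))
```

Because the weights are adjusted on the individual records rather than on each published estimate, any estimate later computed from the re-weighted records inherits the calibration. That is consistent with the text's point about handling many estimates within one framework.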
Why Did We Choose These Methods?
We used raking to calibrate the MRFSS-APAIS and MRIP-APAIS estimates because of the large number of estimates that needed to be calibrated. While the FES produces a handful of effort estimates each year, the APAIS produces hundreds of catch-per-trip estimates by species, catch type, fishing mode, area fished, state, geographic region, and sampling wave. It would have been impossible to develop a single model, or even a small number of models, to calibrate all of these estimates. The sample weight adjustment approach allowed us to accomplish numerous calibrations within a single framework, preserving both the comparability of our estimates and our public-use datasets.
Transitioning to State Surveys in the Gulf of Mexico
NOAA Fisheries has worked in close partnership with Louisiana, Mississippi, Alabama, and Florida to develop, test, implement, and foster the transition toward state data collection programs capable of meeting region-specific management needs for red snapper and other species. While the resulting state estimates are being used to monitor red snapper catch, they are systematically different from the MRIP estimates that are currently used to inform catch limits. (These differences are likely due to unidentified sources of non-sampling error.) Before state survey estimates can be considered by regional managers, appropriate calibrations will be needed to convert state estimates into a common currency, ensuring estimates that inform stock assessments and management decisions are consistent across time, space, fishing mode, and species.
The transition to these Gulf of Mexico state surveys is still underway, and the final calibration methods have yet to be determined. This process is complex for several reasons.
- First, many federal stock assessments need regional estimates. However, state data cannot produce one integrated regional estimate until we have evaluated and accounted for the differences among the state survey designs. Therefore, the only sound way to incorporate the state surveys into the current stock assessment process is to calibrate state data to a common currency, whether it is one of the states' currencies, the MRIP currency, or another common standard.
- Second, the transition from MRFSS APAIS-CHTS to MRIP APAIS-FES is still underway for a number of stock assessments in the Southeast. Until MRIP APAIS-FES estimates have been incorporated into all stock assessments, the modeled CHTS-like time series described above is still being used to inform catch limits. Therefore, calibration methods are needed to convert between the state survey estimates and the MRIP APAIS-FES currency (the MRIP APAIS-FES surveys have been conducted side by side with state surveys in the field), as well as between the state survey estimates and the MRFSS APAIS-CHTS currency (the currency in which numerous catch limits are currently set).
Faced with these challenges, NOAA Fisheries and its state and regional partners collectively decided that, in the near term, the best way to incorporate state data into the federal science and management process would be to develop simple ratio-based calibrations to convert each of the four state currencies into the MRIP APAIS-FES currency, the MRFSS APAIS-CHTS currency, or both. The states developed their own unique ratio-based calibration methods with input from NOAA Fisheries. Expert statistical consultants peer reviewed those methods in 2020.
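A ratio-based calibration can be as simple as one scaling factor estimated from a period in which two surveys ran side by side. The sketch below shows that general idea only. Each state's actual method is not described here, so the ratio-of-totals form, the function names, and every number are hypothetical.

```python
def ratio_calibration_factor(source_estimates, target_estimates):
    """Ratio of summed target-currency estimates to summed source-currency
    estimates over the same overlapping periods."""
    if len(source_estimates) != len(target_estimates):
        raise ValueError("Estimates must cover the same periods.")
    source_total = sum(source_estimates)
    if source_total == 0:
        raise ValueError("Source estimates sum to zero; ratio is undefined.")
    return sum(target_estimates) / source_total


def calibrate(estimates, factor):
    """Express estimates in another currency by applying a ratio factor."""
    return [estimate * factor for estimate in estimates]


# Hypothetical side-by-side estimates for three overlapping periods.
state_overlap = [100.0, 120.0, 80.0]
mrip_overlap = [250.0, 300.0, 200.0]

# Factor that converts the state currency into the MRIP currency.
factor = ratio_calibration_factor(state_overlap, mrip_overlap)

# Convert a later (hypothetical) state estimate into the MRIP currency,
# then back again using the reciprocal of the factor.
to_mrip = calibrate([90.0], factor)
back = calibrate(to_mrip, 1.0 / factor)
print(factor, to_mrip, back)
```

A single ratio treats the difference between two surveys as a constant proportion. That keeps it simple, and it is one reason more sophisticated methods may be worth developing as the drivers of those differences become better understood.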
As we learn more about the drivers of differences between state and MRIP surveys, and among the state surveys themselves, we may be able to develop more sophisticated calibration methods. A Gulf of Mexico sub-group of the MRIP Transition Team is working to develop a transition plan to support the incorporation of state survey estimates into the federal stock assessment and management process.