# Survey Statistics Overview

Sampling populations, calculating weighted estimates, and preventing and accounting for errors are important steps in our work to produce recreational catch statistics.

The Marine Recreational Information Program (MRIP) strives to be transparent about its work to collect recreational fishing data and produce accurate recreational catch statistics. Below, we’ve outlined the fundamental mathematical concepts behind survey statistics, from sampling populations and calculating weighted estimates to preventing and accounting for errors.

## Sampling

We use both census and sampling approaches in our recreational fishing surveys. Where a **census** collects information from all members of a target population, a **sample** collects information from a randomly selected and representative subset of a population to determine the characteristics of an entire population. Sampling is an effective method of collecting information when it’s not possible or practical to conduct a census.

There are two broad categories of sampling: **probability sampling** and **non-probability sampling**.

- In a probability sample, each member of the target population has a known, non-zero probability of being included in the sample. Probability samples are designed to be representative of the target population and to result in unbiased estimates, as long as the sample design is accounted for in the estimation process.
- In a non-probability sample, the relationship between the sample and the target population is unknown, and the probability of selecting a particular member of a target population cannot be accurately determined. Because the selection probability is unknown, there is no guarantee that a non-probability sample can produce unbiased estimates. Examples of non-probability samples include convenience sampling, in which members of a population are sampled based on the relative ease of reaching them; snowball sampling, in which each individual who is sampled refers an acquaintance to be sampled next; and volunteer or opt-in sampling, in which sample members self-select into the survey.

When you know the probability that a particular member of a target population will be included in a sample, you can use a statistical method called weighting to produce unbiased estimates of population totals. For this reason, probability samples can be used to reveal characteristics of an entire population, even though only a subset of that population is surveyed.

### Sample Size

A sample size describes the number of units measured in a sample survey. If, for example, you drew 10 marbles at random from a bag of 100 black and white marbles to estimate the number of each color that is in the bag, your sample size would be 10.
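The marble example can be sketched in a few lines of Python. This is only an illustration: the 60/40 split of black and white marbles, the function name `estimate_color_counts`, and the fixed random seed are assumptions made for the sketch, not values from our surveys.

```python
import random

def estimate_color_counts(bag, sample_size, seed=None):
    """Estimate how many marbles of each color are in the bag
    from a simple random sample drawn without replacement."""
    rng = random.Random(seed)
    sample = rng.sample(bag, sample_size)
    # Each sampled marble stands in for this many marbles in the bag.
    scale = len(bag) / sample_size
    counts = {}
    for color in sample:
        counts[color] = counts.get(color, 0) + 1
    return {color: n * scale for color, n in counts.items()}

# Hypothetical bag: the true 60/40 split is unknown to the person sampling.
bag = ["black"] * 60 + ["white"] * 40
estimates = estimate_color_counts(bag, sample_size=10, seed=1)
print(estimates)
```

The estimated counts always add up to the 100 marbles in the bag, but the split between colors will vary from one random draw to the next.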

The more units you include in your sample, the more precise your estimate will generally be. However, increasing your sample size can require increased resources. When we certify a recreational fishing survey, we work with our partners and stakeholders to determine the level of sampling necessary to provide the level of precision that will meet science and management needs.
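One way to see the link between sample size and precision is to repeat the marble draw many times and measure how much the estimates vary. The sketch below is a simulation under assumed conditions (a hypothetical bag of 60 black and 40 white marbles, 2,000 repetitions, and a helper named `spread_of_estimates`); it is not how precision targets are set for an actual survey.

```python
import random
import statistics

def spread_of_estimates(bag, sample_size, repetitions=2000, seed=0):
    """Repeat the sampling exercise many times and return the standard
    deviation of the estimated number of black marbles in the bag."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repetitions):
        sample = rng.sample(bag, sample_size)  # draw without replacement
        share_black = sample.count("black") / sample_size
        estimates.append(share_black * len(bag))
    return statistics.stdev(estimates)

# Hypothetical bag used only for this illustration.
bag = ["black"] * 60 + ["white"] * 40
for n in (10, 20, 40):
    print(n, round(spread_of_estimates(bag, n), 2))
```

The spread shrinks as the sample size grows, but each additional marble buys a smaller improvement than the last, which is why precision has to be balanced against cost.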

## Weighting

**Weighting** is a statistical method that ensures each sampled unit is properly represented in a final estimate.

In basic weighting, the assigned weight of a sample unit is equal to the inverse of the probability that the unit will be included in a sample. If, for example, you drew 10 marbles at random from a bag of 100 marbles, each marble would have a 10 in 100 chance of being selected for the sample. Its assigned weight, therefore, would be 100 divided by 10, or 10. If you drew 20 marbles from the same bag, each marble would have a 20 in 100 chance of being selected for the sample, and each sampled marble would carry a weight of five.
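The arithmetic above can be written as a small function. This is a minimal sketch of basic inverse-probability weighting for a simple random sample; the name `base_weight` is chosen for illustration, and real survey weights typically involve further adjustments.

```python
from fractions import Fraction

def base_weight(population_size, sample_size):
    """Return the basic weight of a sampled unit: the inverse of its
    probability of being included in a simple random sample."""
    inclusion_probability = Fraction(sample_size, population_size)
    return 1 / inclusion_probability

print(base_weight(100, 10))  # 10
print(base_weight(100, 20))  # 5
```

In both cases the weights of all sampled marbles add up to the size of the bag (10 marbles with a weight of 10, or 20 marbles with a weight of 5), which is what allows the sample to represent the whole population.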

Now, imagine you had two bags of 100 marbles. If you drew 10 marbles from the first bag and 20 marbles from the second bag, your two samples would carry different weights. When you ignore differences between the weights of multiple samples, you assume the target populations represented by each sample are the same. If that assumption is incorrect, the resulting estimates will be biased. When you recognize this difference, however, you can ensure your estimates of these target populations are accurate.
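The two-bag scenario can be worked through with made-up numbers. In the sketch below, the counts of black marbles found in each sample (8 of 10 and 4 of 20) are hypothetical, as are the function names; the point is only to show how ignoring the weights changes the answer.

```python
def weighted_total(samples):
    """Estimate the total number of black marbles across all bags,
    weighting each sampled marble by the inverse of its selection probability."""
    total = 0.0
    for s in samples:
        weight = s["population_size"] / s["sample_size"]
        total += weight * s["black_in_sample"]
    return total

def unweighted_total(samples):
    """Pool the samples as if every marble had the same weight."""
    pooled_black = sum(s["black_in_sample"] for s in samples)
    pooled_n = sum(s["sample_size"] for s in samples)
    pooled_population = sum(s["population_size"] for s in samples)
    return pooled_black * pooled_population / pooled_n

# Hypothetical results: 8 of 10 marbles from the first bag were black,
# and 4 of 20 marbles from the second bag were black.
samples = [
    {"population_size": 100, "sample_size": 10, "black_in_sample": 8},
    {"population_size": 100, "sample_size": 20, "black_in_sample": 4},
]
print(weighted_total(samples))
print(unweighted_total(samples))
```

With these numbers, the weighted estimate is 100 black marbles (80 from the first bag plus 20 from the second), while the unweighted estimate is 80. The unweighted figure is pulled toward the second bag simply because more marbles were drawn from it.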

### MRIP Guide to Weighting Data

In this video, we visit a tackle shop to see how weighting is used to accurately estimate anglers’ catch.

## Accounting for Errors

All surveys include some amount of error. Properly designed surveys attempt to minimize error through planning, testing, and analysis.

### Sampling Error

Because a sample does not include all members of a population, an estimate based on a sample is likely to differ from the actual population value. Indeed, **sampling error** is inherent in all sample statistics. The size of the sampling error depends on the size of the sample, the design of the sample, and natural variability within the population being sampled. (Increasing the sample size, for example, generally decreases the sampling error.)

The most common measure of sampling error is **precision**, which measures the spread of independent sample estimates around a true population value. Precision is commonly expressed as a **standard error** or a **confidence interval**. We account for standard error in our recreational fishing estimates by ensuring these estimates are made up of two parts: a **point estimate**, which represents our estimate of total recreational catch, and a **percent standard error**, which represents our confidence in this value and is similar to the margin of error used in polling. The lower the percent standard error, the higher our confidence that an estimate is close to the actual population value.
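As a rough illustration, the sketch below assumes that percent standard error is the standard error expressed as a percentage of the point estimate, and that an approximate 95 percent confidence interval can be formed as the point estimate plus or minus 1.96 standard errors. The catch figures and function names are hypothetical.

```python
def percent_standard_error(point_estimate, standard_error):
    """Express the standard error as a percentage of the point estimate
    (assumed definition for this sketch)."""
    return 100.0 * standard_error / point_estimate

def approximate_confidence_interval(point_estimate, standard_error, z=1.96):
    """Return an approximate 95% confidence interval, assuming the
    estimate is roughly normally distributed."""
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical estimate: 50,000 fish with a standard error of 5,000 fish.
pse = percent_standard_error(50_000, 5_000)
low, high = approximate_confidence_interval(50_000, 5_000)
print(pse)
print(round(low), round(high))
```

Here the percent standard error is 10, and the interval runs from roughly 40,200 to 59,800 fish. Halving the standard error would halve both the percent standard error and the width of the interval.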

### Non-sampling Errors

**Non-sampling errors** include all errors that aren’t sampling errors. A non-sampling error that results in a systematic difference between an estimate and the true population value is commonly referred to as **bias**. Common non-sampling errors include:

- **Coverage error**, which occurs when members of a target population are omitted, duplicated, or wrongly included in a sample frame. Omissions from a sample frame, known as undercoverage, result in bias, particularly when those left out of a sample frame have different characteristics than those included. Duplicating or including out-of-scope population members in a sample frame can also lead to bias.
- **Measurement error**, which occurs when a respondent provides an incorrect response to a survey question. Measurement error can occur when a survey question is ambiguous, poorly worded, or inconsistently asked; when a respondent can’t recall an activity or an event; or when a respondent intentionally misreports his or her response.
- **Nonresponse error**, which occurs when a respondent is unable or unwilling to respond to a survey. Nonresponse error results in bias when those who do not respond have different characteristics than those who do.
- **Data processing error**, which can occur while entering, coding, editing, or otherwise preparing survey data.
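To make the nonresponse case concrete, the sketch below uses an invented population of anglers in which avid anglers respond more often than casual ones. All group sizes, trip counts, response rates, and function names are assumptions chosen for illustration, and the calculation uses expected numbers of respondents rather than a random draw.

```python
def population_mean(groups):
    """True average number of trips per angler across the whole population."""
    anglers = sum(g["size"] for g in groups)
    trips = sum(g["size"] * g["trips"] for g in groups)
    return trips / anglers

def respondent_mean(groups):
    """Average number of trips among the anglers expected to respond."""
    responding = sum(g["size"] * g["response_rate"] for g in groups)
    trips = sum(g["size"] * g["response_rate"] * g["trips"] for g in groups)
    return trips / responding

# Hypothetical population: avid anglers fish more and respond more often.
groups = [
    {"name": "avid", "size": 200, "trips": 10, "response_rate": 0.8},
    {"name": "casual", "size": 800, "trips": 2, "response_rate": 0.4},
]
print(round(population_mean(groups), 2))
print(round(respondent_mean(groups), 2))
```

Because the respondents over-represent avid anglers, their average overstates the true average. If both groups responded at the same rate, the two averages would match, which is why nonresponse causes bias only when respondents and nonrespondents differ.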