Methodology
This methodology describes data collection, weighting, and imputation for the Small Business Credit Survey (SBCS). It covers both employer and nonemployer firms.
Data collection
The SBCS uses a convenience sample of small businesses. A diverse set of partner organizations that serve the small business community contact businesses by email.¹ The Federal Reserve Banks also directly contact prior SBCS participants and other small businesses on select email lists. The survey instrument is an online questionnaire that typically takes 6 to 12 minutes to complete, depending on the intensity of a firm’s search for financing. The questionnaire uses branching logic, so the questions a respondent sees depend on that respondent’s earlier answers. For example, financing applicants receive a different line of questioning than nonapplicants. As a result, the number of observations varies from question to question, depending on how many firms receive and complete each one.
Weighting
The SBCS sample is not selected randomly, so the SBCS may be subject to biases that are not present in surveys that do select firms randomly. For example, some firms are likely absent from our contact lists, which may lead to noncoverage bias. To control for potential biases, the sample data are weighted so that the weighted distribution of firms in the SBCS matches the distribution of the firm population in the United States.
Because of the structure and availability of the US Census Bureau data used to compute the weights, employer firms and nonemployer firms are weighted in separate processes. Both employers and nonemployers are weighted by age, industry, geographic location (urban or rural), gender of owner(s), and race or ethnicity of owner(s). Employer firms are additionally weighted by firm size, measured by number of employees. For each survey year, we first restrict the sample to either employer or nonemployer firms. We then post-stratify respondents by their firm characteristics. Using a statistical technique known as “raking,” we compare the share of employer or nonemployer businesses in each category of each stratum (for example, within the industry stratum, the share of firms in the sample that are manufacturers) with the share of employer or nonemployer firms nationwide that fall in that category. As a result, underrepresented firms are upweighted and overrepresented firms are downweighted. We repeat this adjustment across the strata for several iterations to derive a sample weight for each respondent. This weighting methodology was developed in collaboration with the National Opinion Research Center (NORC) at the University of Chicago. The population data used for weighting come from the US Census Bureau.
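The raking step can be illustrated with a minimal Python sketch of generic raking (iterative proportional fitting). This is not the SBCS or NORC production code. The respondents, categories, population shares, iteration limit, and tolerance below are hypothetical, and only two weighting variables are used instead of the full set listed in the tables. Each pass rescales the weights so that one variable's weighted shares match the population shares, then moves on to the next variable. Cycling until the adjustment factors stop changing lets all margins match at once.

```python
from collections import defaultdict

def rake(respondents, targets, max_iter=50, tol=1e-8):
    """Raking (iterative proportional fitting): return one weight per respondent.

    respondents: list of dicts mapping each weighting variable to a category.
    targets: {variable: {category: population share}}; shares sum to 1.
    """
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        largest_change = 0.0
        for var, shares in targets.items():
            # Weighted total of each category of this variable, plus the grand total.
            totals = defaultdict(float)
            for w, r in zip(weights, respondents):
                totals[r[var]] += w
            grand_total = sum(weights)
            # Upweight underrepresented categories and downweight overrepresented ones.
            factors = {cat: shares[cat] * grand_total / t for cat, t in totals.items()}
            for i, r in enumerate(respondents):
                weights[i] *= factors[r[var]]
            largest_change = max(largest_change, *(abs(f - 1.0) for f in factors.values()))
        if largest_change < tol:  # every margin already matches: stop iterating
            break
    return weights

# Hypothetical respondents and population shares, for illustration only.
sample = [
    {"industry": "manufacturing", "geography": "urban"},
    {"industry": "manufacturing", "geography": "urban"},
    {"industry": "retail", "geography": "urban"},
    {"industry": "retail", "geography": "rural"},
]
population_shares = {
    "industry": {"manufacturing": 0.25, "retail": 0.75},
    "geography": {"urban": 0.80, "rural": 0.20},
}
weights = rake(sample, population_shares)
```

In this toy sample, manufacturers are half of the respondents but, by assumption, a quarter of the population. The weights therefore converge to roughly 0.5, 0.5, 2.2, and 0.8: the two manufacturers and the overrepresented rural firm are downweighted, and the urban retailer is upweighted.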
Race/ethnicity and gender imputation
Not every respondent provides complete information on the gender, race, and/or ethnicity of their business’s owner(s). We need this information to correct for differences between the sample and the population data. To avoid losing these observations, we use a series of statistical models to impute the missing data. Generally, we use the models’ predicted values for the missing data when the models predict with an accuracy of around 80 percent in out-of-sample tests.² When the model outcomes are less certain, the data are not imputed and the responses are dropped. After imputation, we compare descriptive statistics for key survey questions with and without the imputed data to ensure that the estimates are stable. In the final sample, five percent of nonemployer firm observations have imputed values for the gender, race, or ethnicity of the firm’s ownership.
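The split-half out-of-sample test described in the endnotes, together with the accuracy cutoff, can be sketched in Python. This is a toy illustration, not the SBCS models. The predictor values "A" and "B", the data, the most-common-category model, and the rule of choosing the lowest predicted-probability cutoff that reaches 80 percent test accuracy are all hypothetical stand-ins for details the methodology does not publish.

```python
import random
from collections import Counter, defaultdict

# Hypothetical data: (predictor value, owner gender). "A"/"B" stand in for
# whatever firm traits the real models use, which are not specified here.
observed = ([("A", "women-owned")] * 40 + [("A", "men-owned")] * 4
            + [("B", "men-owned")] * 30 + [("B", "women-owned")] * 26)
missing = ["A"] * 6 + ["B"] * 9  # respondents who skipped the gender question

def fit(rows):
    """Toy model: most common category per predictor value and its share."""
    counts = defaultdict(Counter)
    for x, y in rows:
        counts[x][y] += 1
    return {x: (c.most_common(1)[0][0], c.most_common(1)[0][1] / sum(c.values()))
            for x, c in counts.items()}

def predict(model, x):
    # Predictor values unseen in training get no label and zero confidence.
    return model.get(x, (None, 0.0))

def choose_cutoff(model, test, target=0.80):
    """Lowest predicted-probability cutoff whose kept test predictions meet the target accuracy."""
    scored = [(predict(model, x), y) for x, y in test]
    for cutoff in sorted({p for (_, p), _ in scored}):
        kept = [label == y for (label, p), y in scored if p >= cutoff]
        if kept and sum(kept) / len(kept) >= target:
            return cutoff
    return None  # no cutoff is accurate enough, so nothing is imputed

# Split-half test: a random half of the nonmissing data fits the model,
# and the other half checks its predictions.
rows = observed[:]
random.Random(0).shuffle(rows)
train, test = rows[: len(rows) // 2], rows[len(rows) // 2 :]
model = fit(train)
cutoff = choose_cutoff(model, test)

# Impute only sufficiently confident predictions; the rest are dropped.
imputed, dropped = [], []
for x in missing:
    label, p = predict(model, x)
    if cutoff is not None and label is not None and p >= cutoff:
        imputed.append((x, label))
    else:
        dropped.append(x)
```

Choosing the lowest qualifying cutoff keeps as many imputations as possible while still meeting the accuracy target. A stricter rule would drop more responses.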
Comparisons across survey years
Changes to the questionnaire and weighting methodology limit the comparability of certain metrics over time. The time series data in the most recent SBCS reports supersede, and are not comparable with, the time series data in earlier publications. Please see the methodology sections of the annual releases of the Report on Employer Firms for a more complete explanation of the survey methodology and of changes to the questionnaire by survey year.
Employer Firms
| Source | Weighting variable | Strata |
|---|---|---|
| US Census Bureau Business Dynamics Statistics (BDS) | Age | 0-2 years, 3-5 years, 6-10 years, 11-20 years, 21+ years |
| US Census Bureau Annual Business Survey (ABS) | Race/ethnicity | Hispanic, non-Hispanic Asian, non-Hispanic Black or African American, non-Hispanic Native American, non-Hispanic white |
| | Gender | Equally owned by men and women, men-owned, women-owned |
| US Census Bureau County Business Patterns (CBP) | Industry | Business support and consumer services, finance and insurance, healthcare and education, leisure and hospitality, manufacturing, nonmanufacturing goods production and associated services, professional services and real estate, retail |
| | Geography | Rural, urban |
| | Firm size | 1-4 employees, 5-9 employees, 10-19 employees, 20-49 employees, 50-499 employees |
Nonemployer Firms
| Source | Weighting variable | Strata |
|---|---|---|
| US Census Bureau Survey of Business Owners (SBO) | Age | 0-2 years, 3-4 years, 5-12 years, 13+ years |
| US Census Bureau Nonemployer Statistics by Demographics (NES-D) | Race/ethnicity | Hispanic, non-Hispanic Asian, non-Hispanic Black or African American, non-Hispanic Native American, non-Hispanic white |
| | Gender | Equally owned or men-owned, women-owned |
| US Census Bureau Nonemployer Statistics (NES) | Industry | Business support and consumer services, finance and insurance, healthcare and education, leisure and hospitality, manufacturing, nonmanufacturing goods production and associated services, professional services and real estate, retail |
| | Geography | Rural, urban |
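Post-stratification requires mapping each respondent's reported characteristics onto the categories in the weighting tables. The following minimal Python sketch covers the age and firm-size strata. It assumes that age is reported in whole years and firm size as a whole number of employees, and the constant and function names are hypothetical. The cut points themselves come from the tables above.

```python
from bisect import bisect_right

# Each scheme: lower bound of every stratum, its label, and an optional upper limit.
# Bounds and labels follow the weighting tables; the names are hypothetical.
EMPLOYER_AGE = ([0, 3, 6, 11, 21],
                ["0-2 years", "3-5 years", "6-10 years", "11-20 years", "21+ years"], None)
NONEMPLOYER_AGE = ([0, 3, 5, 13],
                   ["0-2 years", "3-4 years", "5-12 years", "13+ years"], None)
EMPLOYER_SIZE = ([1, 5, 10, 20, 50],
                 ["1-4 employees", "5-9 employees", "10-19 employees",
                  "20-49 employees", "50-499 employees"], 499)

def stratum(value, scheme):
    """Return the stratum label for a whole-number age (years) or employee count."""
    bounds, labels, upper = scheme
    i = bisect_right(bounds, value) - 1  # index of the last lower bound <= value
    if i < 0 or (upper is not None and value > upper):
        raise ValueError(f"{value} falls outside the weighting strata")
    return labels[i]

print(stratum(4, EMPLOYER_AGE), "|", stratum(4, NONEMPLOYER_AGE))  # 3-5 years | 3-4 years
```

A four-year-old firm falls into a different age stratum depending on whether it is an employer or a nonemployer, because the two tables draw on different Census sources with different cut points.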
Endnotes
1. For a full list of partner organizations, please visit our Partner Organizations page.
2. Out-of-sample tests are used to set thresholds for imputing the missing information. To test each model’s performance, half of the sample of nonmissing data is randomly assigned to a test group, and the other half is used to estimate the model’s coefficients. The model’s predictions for the test group are then compared with the test group’s actual data. Only predicted probabilities associated with an accuracy of around 80 percent are used, although this threshold varies slightly depending on the number of observations being imputed.