Claritas Methodology

THE CLARITAS DEMOGRAPHIC UPDATE METHODOLOGY

Claritas Inc.

May 1997

Nationwide sets of small area demographic estimates and projections were pioneered by the private sector over 20 years ago, and such "updates" are still a unique product of the private data suppliers. The Claritas Demographic Updates program traces its history to the industry's earliest years, and is well into its third decade in the hands of the industry's most experienced and expert demographers and statisticians.

Claritas' current demographic team did the groundbreaking work in this area, and its contributions extend beyond the information industry to the field of applied demography. It is a team that is always looking ahead to new methods and data sources, and already has spent over seven years actively participating in the planning of the 2000 census.

The Claritas Annual Demographic UPDATE

The annual UPDATE is a shorthand term for a massive set of demographic estimates and projections. Estimates are data prepared for the current year, and projections (sometimes called forecasts) are prepared for dates five years in the future.

The Updates are produced for many geographic levels including national, state, county, place (city/town), census tract, and block group. Some items are even estimated at the block level--the smallest geographic level for which any census data are reported. Data are also available for commonly used areas such as metropolitan areas (MSAs and PMSAs), media areas and ZIP Codes. Because the Updates are produced for such small areas, they can be easily aggregated to custom geographic areas specified by the user.
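As a rough illustration of this kind of aggregation, the sketch below sums block group counts into a user-defined area. The block group identifiers, field names, and figures are hypothetical and are not Claritas data. Only additive counts are summed, since medians and averages cannot simply be added across areas.

```python
from collections import Counter


def aggregate_counts(block_group_counts, custom_area):
    """Sum additive counts (e.g., population, households) over the
    block groups that make up a user-defined area."""
    totals = Counter()
    for block_group_id in custom_area:
        # Counter.update adds the values of matching keys.
        totals.update(block_group_counts[block_group_id])
    return dict(totals)


# Hypothetical block group records (illustrative values only).
block_group_counts = {
    "BG-001": {"population": 1200, "households": 450},
    "BG-002": {"population": 800, "households": 310},
    "BG-003": {"population": 1500, "households": 600},
}

# A custom trade area defined as a list of block group identifiers.
trade_area = ["BG-001", "BG-003"]
trade_area_totals = aggregate_counts(block_group_counts, trade_area)
```

Derived measures such as average household size would be recomputed from the aggregated counts rather than averaged across block groups.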

The Update starts with the estimation and projection of "base counts," including total population, household population, group quarters population, households, families, and housing units. Characteristics related to these base counts are then updated. Population characteristics include age, sex, race and Hispanic ethnicity; households are estimated by age of householder, income and wealth; families are estimated by income; and owner occupied housing units are estimated by value.

Updates are prepared first for large geographic areas, then for progressively smaller areas, with adjustments ensuring consistency from one level to the next. In order to take full advantage of methodological refinements and new data resources, each set of Updates begins, not with the previous year's estimates, but with detailed base year data from the most recent decennial census. The target estimation and projection date is April 1 of the relevant year.

The Census Benchmark

The U.S. decennial census is mandated by the Constitution for the purpose of enumerating the nation's population, and determining the number of Representatives each state may send to the U.S. House of Representatives. By law, census data must also be used to define Congressional Districts, and in the distribution of funds from numerous federal programs.

The census also serves as the foundation of the nation's statistical system. Because of its extraordinary coverage, the census provides the most comprehensive, precise, and authoritative statistical portrait of the nation's population and housing stock--especially for small neighborhood areas. Marketers and planners throughout the U.S. base billions of dollars' worth of decisions on small-area demographic data--much of it derived from the decennial census.

The Need for Annually Updated Data

The decennial census is such an enormous undertaking that it takes years for the data to be processed, tabulated, and then released in the form of data products. Because of this lag, and the fact that the census is taken only once a decade, the "most recent" census data are sometimes more than 10 years old.

Demographic change in the 10 years between censuses can be dramatic. For instance, between 1979 and 1989, median household income increased by 78.4 percent, from $16,846 to $30,056. The number of households increased by 14.4 percent during the same period, while average household size decreased by 4.2 percent, and median age rose 2.9 years, to 32.9.

Change can be especially dramatic in small areas such as census tracts and ZIP Codes, where population size and demographic characteristics can change rapidly due to migration patterns driven by residential and commercial development, the local economy, and many other factors.

But the need for precise demographic data for marketing and planning purposes remains constant throughout the decade.

Since no government agency provides annual small area demographic estimates for the entire U.S., marketers and planners rely on private companies for this service. These companies have developed a variety of approaches to annual demographic estimation, and the results for small areas can vary widely. Therefore, users are well advised to familiarize themselves with the methods used to produce such data.

The Claritas Demographic Estimation Program

The present day Claritas was created in 1991 by the merger of Claritas Corporation (founded in 1971) and National Planning Data Corporation (founded in 1970). Subsequently, Claritas acquired Strategic Mapping, Inc. (formerly Donnelley Marketing Information Services) and Urban Decision Systems. All four companies pioneered the development of census based marketing information products, and the development of nationwide small area demographic estimates and projections.

Claritas Demography builds on the experience and strengths of these four programs, and is distinguished by a number of unique strengths.

Number and variety of data sources: The Claritas methodology uses a larger number and greater variety of post-census population and household data sources than any other company. The objective is to use the best source available for each area.

Local data sources: Claritas maintains an extensive network of local government sources for current small area demographic data.

Consumer household database sources: Claritas maintains a time series of household counts from the Donnelley Marketing residential database geocoded to the block group level.

20+ Years' Experience: Claritas has developed its own demographic estimates and projections for more than 20 years, and now draws upon the experience of four companies which established the original small area estimation programs in the 1970s.

Professional Staff: No other company comes close to the expertise, experience, and professional stature of Claritas' demographers and statisticians. This is a staff that pioneered small area demographic estimation methods. Their reputations extend beyond the data industry to the field of applied demography.

Evaluated Accuracy: The high quality of the Claritas estimates has been evident in several independent tests, and the Claritas Updates are supported by a rigorous internal program of research, evaluation and testing. The accuracy of the Claritas 1990 population and household estimates was thoroughly evaluated (as described in a detailed paper), and new methods and data sources are tested before being used in production.

The Claritas Annual Demographic Update includes current year estimates and five year projections of the following data items.

Base Counts

Population

Households (occupied housing units)

Families (households with two or more related persons)

Group Quarters Population (e.g., dormitories, military barracks, prisons)

Housing Units (house, apartment, or group of rooms intended as separate living quarters)

Population Characteristics

Population by age

Population by sex

Population by race

Population by Hispanic ethnicity

Population by age, sex, race and Hispanic ethnicity

Median age (by sex)

Per capita income

Household Characteristics

Households by Income

Median household income

Average household income

Average household size (persons per household)

Households by size (number of persons)

Age of householder

Median age of householder

Income by age of householder

Median income by age of householder

Households by wealth

Median household wealth

Average household wealth

Households by income-producing assets

Family Characteristics

Family households by income

Median family income

Average family income

Housing Characteristics

Total specified owner occupied units

Value of specified owner occupied units

Median value of specified owner occupied units

Geography: The Spatial Framework for Data Analysis

Geography provides demographic data with its spatial dimension. Population is not distributed evenly across the landscape, and specific population segments (such as households with young children) have distinct spatial distributions of their own. Consumer demand is strongly associated with demographic characteristics, and therefore follows this complex pattern of "geo-demographic" distributions.

The link between geography and demography is the basis of "geo-demographic" data, and geography provides the consistent spatial framework for demographic and market analysis.

Census geographies, such as census tracts and block groups, are designed for the reporting and analysis of data. They cover the entire U.S., nest neatly within each other, and remain constant during the 10 years between each census. Large government units, such as states and counties, also cover the entire U.S., and (with occasional exceptions) are stable over time. However, other commonly used areas are subject to change, and do not always respect other geographic boundaries. These include Metropolitan Areas (MAs), ZIP Codes, ZIP+4s, carrier routes, and media areas (such as DMAs).

The chart below indicates the basic structure of census and other common geographic units.

Basic Geographic Hierarchy

Geographic level        Number of areas

Nation                  1
Regions                 4
Divisions               9
States                  51
Counties                3,141
MCD/CCD                 35,298
Places                  23,435
Tracts/BNAs             61,258
Block Groups            226,399
Blocks                  6,961,697

Metropolitan Areas      324
ZIP Codes               42,494
Carrier Routes          (no count shown)
ZIP+4s                  (no count shown)

Identifying changes in these geographic levels, and the relationships between them, is an important part of the annual Update process. For this purpose, Claritas maintains comprehensive rosters of geography, as well as updated cross-reference files indicating the correspondence between the various levels of geography.

In addition to the core geographic levels identified in the chart, Claritas also maintains rosters and cross-reference files enabling the production of estimates and projections for areas including:

Designated Market Areas (DMAs)

Congressional Districts

Telephone service areas

NPA/NXXs

Wire Centers

Cable Television Franchise Areas

Yellow Pages Directory Areas

Postal Carrier Routes

ZIP+4 (PRIZM)

Special Note About Block Group Parts

Many Claritas methods are executed at what is technically the "block group and block group part" level of geography. Block group parts are defined where block groups are split by place and/or MCD boundaries, and census data reported for block groups are reported for block group parts as well. Thus, block group parts function as an intermediate geographic level, between block group and block. Because it is more familiar, the term "block group level" is used throughout this document. However, it is worth keeping in mind that Claritas "block group level" applications usually refer to data and methodologies executed for block groups and block group parts.

Data Sources

Even the best demographic estimation methods are only as good as the data used as input, so the selection and incorporation of accurate input is critical at all levels of geography.

Among the sources contributing data to the Claritas estimates are:

City planning agencies

Regional planning agencies

Donnelley Marketing

ADVO

National Association of Realtors

National Center for Health Statistics

Defense Manpower Data Center

U.S. Bureau of the Census

U.S. Bureau of Labor Statistics

U.S. Bureau of Economic Analysis

U.S. Postal Service

Geographic Data Technology, Inc. (a leading cartographic data firm)

Some specific data sources are described below.

1990 Census Summary Tape Files

The decennial census is an enormously ambitious effort that seeks to count every individual and household in the U.S. It provides the most comprehensive and accurate statistical benchmark of the U.S. population, and provides the foundation for the Claritas estimates and projections. Summary tape files 1 and 3 (STF 1 and STF 3) from the 1990 census contain detailed counts and demographic characteristics for census geographies, and serve as the base data for the estimates of many demographic characteristics.

Census MARS Files

A special set of "Modified Age/Race/Sex" (MARS) 1990 census data was produced by the Census Bureau to make the race data consistent with federal guidelines established by the Office of Management and Budget (see "Race and Hispanic Definitions" later in this document for a description of OMB consistent race), and to correct for age misreporting in the 1990 census. Because the Census Bureau's standard MARS products were produced only down to the census tract level, Claritas contracted for a special nationwide tabulation of the MARS data at the block group level, and uses these modified files as base data in estimating population by age, sex, race and Hispanic origin.

Local Estimates

The most accurate and authoritative sources of updated small area population and household counts are often local governments, and other local organizations that monitor population change. Over the past 20 years, Claritas has developed contacts with more than 1,600 of these organizations, and acquired small area estimates and other current data from them for use in the Annual Update. This effort, which is unique to the industry, provides up-to-date census tract level data for many of the most rapidly growing and difficult-to-estimate parts of the country.

Donnelley Marketing

Household counts from the Donnelley Marketing residential database are used in many areas where local estimates are not available. The Donnelley database includes approximately 90 percent of total U.S. households coded to the block group level, and has a strong track record as input to small area estimates. The fact that Donnelley counts are available for years back to 1990 makes them a uniquely valuable measure of post censal household change.

ADVO

Residential mail delivery counts from ADVO, Inc. provide another measure of household change. As part of its direct mail services business, ADVO works with the U.S. Postal Service to update its nationwide household address list on a monthly basis. ADVO counts by ZIP Code are remarkably current and complete, and therefore provide the basis for the Claritas ZIP Code household and population estimates.

Current Population Survey

The Current Population Survey (CPS) is a monthly survey of 60,000 households nationwide, conducted by the Census Bureau in collaboration with the Bureau of Labor Statistics. Each year, the March CPS includes an annual demographic supplement, which provides current year detail on the characteristics of U.S. households and household population. The CPS provides valuable checks on the Claritas estimates for large areas, such as the state and national levels, as well as insights into important demographic trends related to average household size, household composition, and household "headship" rates.

Census Bureau Estimates

The Census Bureau produces excellent population estimates at the national, state, county and place/MCD levels of geography. Although their effective dates lag behind those of the Claritas Update, these Census Bureau estimates are a standard for quality, and are used as input for these larger geographic levels.

Bureau of Economic Analysis

Each year, the BEA produces income estimates for all U.S. counties. The estimates are detailed by type of income, and an annual series dating back to 1969 provides rich detail on recent and historical income change. These data contribute to the Claritas income estimates at the county level and above.

Geographic Data Technology (GDT)

GDT is the source of numerous geographic files and capabilities that are critical to the Update process. These include ZIP Code boundaries, which are used in mapping and the specification of the relationships between ZIP Codes and small area census geographies. GDT files also provide updated address coding capabilities, which enable the geocoding of address lists to census and other geographic units, as well as the assignment of latitude/longitude coordinates to specific addresses.

The Claritas Methodology

Claritas has developed a multi-step and multi-source methodology to produce the annual demographic Update. The basic approach is that of adapting standard demographic methods for use with the best data available at each geographic level.

For example, Claritas tracks neighborhood level growth and decline through the annual acquisition of current small area data from across the nation. Sources include estimates from local governments, consumer database counts, and postal delivery statistics. Such sources allow a "bottom-up" methodology grounded in authoritative, local sources. Claritas also uses Census Bureau estimates and other federal data to produce highly accurate totals for larger areas such as cities, counties and states. These independent estimates are used as control totals for the small area estimates, thus providing the internal consistency of a "top-down" process.

Claritas has refined this methodology over the past two decades, and each year evaluates new data sources and new techniques to ensure the greatest possible accuracy. The methodology described in this document reflects the sources and procedures applied to the 1997 estimates and 2002 projections. It is a snapshot of the state of the art as practiced by Claritas in 1997.

Base Counts

Base counts include basic totals such as population, household population, group quarters population, households, families and housing units.

Total U.S.

Total population is estimated using the Census Bureau's estimates of total U.S. resident population. Resident population includes all persons residing in the U.S., regardless of citizenship, and excludes the armed forces population stationed outside the U.S. The Census Bureau estimates are published regularly on their Internet web site, and lag by only a few months. The Claritas estimate is a short regression based projection through the Bureau's most recent estimates to the Claritas estimate date.

Group quarters population, which the Census Bureau estimates at the state and county levels, is similarly projected to current year, and is the basis for estimating household population (persons in households).

Household Population = total population - group quarters population
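The two steps above can be sketched as follows. This is a minimal illustration that assumes an ordinary least-squares straight line through a handful of recent estimates; the source does not specify the functional form or the number of points used, and the function names and sample figures are hypothetical.

```python
def linear_projection(times, values, target_time):
    """Fit a least-squares line through (times, values) and evaluate it
    at target_time. A straight line is an assumption of this sketch."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two matched observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("observation times must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return mean_v + slope * (target_time - mean_t)


def household_population(total_population, group_quarters_population):
    """Household population = total population - group quarters population."""
    return total_population - group_quarters_population


# Hypothetical recent estimates, with time measured in years
# relative to an arbitrary origin (illustrative values only).
times = [0.0, 1.0, 2.0]
total_pop = [100.0, 110.0, 120.0]
group_quarters = [5.0, 5.0, 5.0]

projected_total = linear_projection(times, total_pop, 3.0)
projected_gq = linear_projection(times, group_quarters, 3.0)
projected_hh_pop = household_population(projected_total, projected_gq)
```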

Total households, which the Census Bureau estimates at the state level, are summed to national level, and projected to current year. Average household size (persons per household) is then checked against independent Census Bureau estimates as well as the most recent Current Population Survey.

Total families are estimated by projecting the most recent trends in the ratio of family households to total households to current year. The estimated ratio is applied to estimated total households to produce the national estimate of total families.

Total housing units are estimated by projecting the estimated ratio of housing units to households to current year, and applying the result to the Claritas household estimate. This ratio is determined by summing Census Bureau housing unit and household estimates from state to national level, and projecting to the Claritas estimate date.

Note: While all base counts are estimated at the national level, only total population and group quarters population serve as exact control totals. Others serve as targets. For example, household estimates are developed at the county level by applying estimated persons per household to estimated household population. If necessary, the county persons per household estimates are revised to bring the resulting household estimates into conformity with the state and national targets.
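The county household example in the note can be sketched as below. The source does not say how persons per household (PPH) values are revised, so the uniform scaling and the tolerance threshold used here are assumptions made for illustration, as are the function names and sample figures.

```python
def households_from_pph(household_population, pph):
    """Households = household population / persons per household, by county."""
    return {c: household_population[c] / pph[c] for c in household_population}


def revise_pph_to_target(household_population, pph, target_households,
                         tolerance=0.01):
    """Return PPH values revised so that the implied households sum to the
    target. PPH is left unchanged if the implied total is already within
    the (hypothetical) relative tolerance of the target."""
    households = households_from_pph(household_population, pph)
    total = sum(households.values())
    if abs(total - target_households) <= tolerance * target_households:
        return dict(pph)
    # Scaling every PPH by total/target scales every household count by
    # target/total, so the revised households sum exactly to the target.
    factor = total / target_households
    return {c: pph[c] * factor for c in pph}


# Hypothetical county inputs (illustrative values only).
hh_pop = {"County A": 1000.0, "County B": 2000.0}
pph = {"County A": 2.5, "County B": 2.0}

revised_pph = revise_pph_to_target(hh_pop, pph, target_households=1500.0)
revised_households = households_from_pph(hh_pop, revised_pph)
```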

Five-year projections of the national base counts are produced with similar methods targeted at the five year projection date. The Census Bureau's official national level population projections are used as a guideline, but not a rigid control, since these projections can be several years out of date, and may not reflect trends identified in more recent estimates.

State

State level base count estimates are based on Census Bureau estimates produced as part of the Federal-State Cooperative Program for Estimates. The Census Bureau series provides annual state level estimates of population, group quarters population, households and housing units. Estimated household population, persons per household, and housing unit occupancy rates are derivatives of these estimates. As at the national level, the ratio of families to total households is projected to produce estimates and projections of total families.

The state estimates of base counts are short projections to current year through the Census Bureau's most recent estimates. A range of estimates is produced for total population, reflecting growth scenarios based on long versus short term change, as reflected in the Census Bureau's annual estimates. The mid range population estimate is used, and all other base counts are geared to this total. Only total population and group quarters population are formally adjusted to national level, and are in turn used as state control totals. The other estimated base counts serve as state level targets.

Five year projections of total population are produced by taking the average of two projected growth scenarios--one based on recent short term growth (or decline) and the other based on the state's longer term trend. The resulting projections reflect the assumption that recent short term trends will eventually be influenced by long-term forces. A good example is the resumption of growth in several "sunbelt" markets which experienced short-term decline during the mid 1980s. The other base counts are projected five years through the current year estimate, and reconciled to the total population projection.
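A minimal sketch of averaging two growth scenarios follows. It assumes that each trend is summarized as a compound annual growth rate, which the source does not state; the function names and sample figures are hypothetical.

```python
def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate between two estimates."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(value, rate, years):
    """Carry a value forward at a constant compound annual rate."""
    return value * (1.0 + rate) ** years


def averaged_projection(current, short_rate, long_rate, years=5):
    """Average of the short-term and long-term growth scenarios."""
    short_scenario = project(current, short_rate, years)
    long_scenario = project(current, long_rate, years)
    return 0.5 * (short_scenario + long_scenario)


# Hypothetical state with recent short-term decline but long-term growth.
current_population = 1_000_000.0
short_rate = annual_growth_rate(1_020_000.0, 1_000_000.0, 2)   # recent decline
long_rate = annual_growth_rate(900_000.0, 1_000_000.0, 10)     # longer trend
five_year_projection = averaged_projection(
    current_population, short_rate, long_rate
)
```

In this example the averaged projection falls between the declining short-term scenario and the growing long-term scenario.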

As at the national level, the Census Bureau's state population projections are used as a frame of reference, but not as input, since they are often dated, and do not account for the impact of recent trends.

County

County level base count estimates start with the Census Bureau's county population estimates produced as part of the Federal-State Cooperative Program for Estimates. Non-Census estimates are occasionally used in selected counties as warranted. A range of long and short-term growth rates is used to produce alternative projections to current year, and the median (or mid range) growth scenario is used as the final population estimate. Evaluations against the 1990 census confirmed that this method produced a highly accurate set of county population estimates in 1990.
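The selection of a mid range scenario can be illustrated with the sketch below. The set of growth rates, the compound-growth form, and the function name are assumptions of this example; the source does not list the specific periods from which the rates are drawn.

```python
from statistics import median


def mid_range_estimate(latest_estimate, growth_rates, years_forward):
    """Project the latest estimate forward under each candidate annual
    growth rate and return the median of the resulting scenarios."""
    scenarios = [
        latest_estimate * (1.0 + rate) ** years_forward
        for rate in growth_rates
    ]
    return median(scenarios)


# Hypothetical county: annual rates measured over short, medium, and
# long periods (illustrative values only).
candidate_rates = [0.00, 0.01, 0.05]
county_estimate = mid_range_estimate(1000.0, candidate_rates, years_forward=1)
```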

The Census Bureau's estimates of county group quarters population (adjusted to state level) are used to estimate household population. The household population estimates are then divided by estimated persons per household to produce county estimates of total households. Family households and total housing units are derived by estimating family/household and units/households ratios, based on estimated change in these ratios at the state level. The estimated ratios are applied to the estimate of total households to produce estimated families and housing units.

Total and group quarters population estimates are ratio-adjusted to the state estimates described above. The other base counts are summed to state level, and checked for conformity with independent targets at that level.
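Ratio adjustment to a control total can be sketched as a proportional scaling, as below. The function name and the sample figures are hypothetical.

```python
def ratio_adjust(estimates, control_total):
    """Scale a set of estimates proportionally so that they sum to the
    control total for the larger area."""
    total = sum(estimates.values())
    if total == 0:
        raise ValueError("cannot ratio-adjust estimates that sum to zero")
    factor = control_total / total
    return {area: value * factor for area, value in estimates.items()}


# Hypothetical preliminary county populations and a state control total.
county_population = {"County A": 100.0, "County B": 300.0}
adjusted = ratio_adjust(county_population, control_total=440.0)
```

Each county keeps its share of the total, while the adjusted values sum to the state control.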

Five year projections are produced first for population, by averaging the result of projections using short and long-range growth rates. Other base counts are projected through the estimate year, and adjusted for conformity with the population projection.

Place Level

Every other year, the Census Bureau produces population estimates for all government units (incorporated places and functioning minor civil divisions). These "place level" geographies range in size from cities as large as New York and Los Angeles to the thousands of small towns and MCDs with no more than a few hundred people. They are the only subcounty population estimates provided by the Census Bureau.

These place level population estimates are projected to current year to provide population control totals below the county level. The unincorporated portions of counties (area not included in a place or MCD) are treated as a separate "place level" record in this control process, thus enabling the adjustment of current year "place level" population controls to the county population estimates described above. Place level population controls are not produced for the five year projections.

Census Tract Level

Background:

Nationwide sets of small area estimates are unheard of outside the data industry, but many local governments produce estimates for the census tracts in their jurisdictions. Because such data are often the best information available on small area trends, Claritas contacts 1,600 local agencies each year in an unparalleled effort to obtain, review, and incorporate the quality work being done by local demographers and planners. If completed local estimates are not available, administrative data indicating change (such as housing unit counts or utility hookups) are sometimes acquired.

The local data do not come in a neat package. Methodologies vary, as do estimation dates and even storage media. Some sources estimate population, others households, and still others housing units. For these reasons, the data are meticulously reviewed, and prepared for input to programs that account for these differences. In all cases, estimates of tract level average household size (persons per household) are critical to tying the varied input (population, households and housing units) together into a consistent set of current year base count estimates.

In areas where suitable local data are unavailable, alternative sources are used. The primary alternative is list count data from the Donnelley Marketing database of over 90 million households. Donnelley data--available to Claritas for the first time this year--have an impressive track record as input to census tract estimates. Donnelley is the only list supplier able to provide 1990 list counts needed to measure list coverage (list households/census households) and change since the census year. Using a modified version of the "housing unit method," Claritas measures tract specific rates of household change in the Donnelley list, and applies these rates to 1990 census household counts to establish preliminary tract level household estimates. The resulting estimates are then used--along with local estimates--as input to the standard Claritas estimation process.
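The core arithmetic of applying list-based rates of change to census counts can be sketched as follows. This is a simplified illustration and not the modified housing unit method itself, whose details are not given here; the function names and sample figures are hypothetical.

```python
def list_coverage(list_households_1990, census_households_1990):
    """List coverage = list households / census households in the census year."""
    return list_households_1990 / census_households_1990


def preliminary_households(census_households_1990, list_households_1990,
                           list_households_current):
    """Apply the rate of household change observed in the list to the
    1990 census household count for the same tract."""
    rate_of_change = list_households_current / list_households_1990
    return census_households_1990 * rate_of_change


# Hypothetical tract (illustrative values only).
coverage = list_coverage(900.0, 1000.0)
estimate = preliminary_households(
    census_households_1990=1000.0,
    list_households_1990=900.0,
    list_households_current=990.0,
)
```

Because only the change in the list is used, a list that covers 90 percent of a tract's households can still support an estimate of the full household count.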

In areas lacking suitable local or Donnelley data, ZIP Code delivery counts from ADVO provide another resource. ADVO, a major direct mail services company, works closely with the U.S. Postal Service to continually update its nationwide household address list. The ADVO time series from 1990 to current year is combined with tract specific intercensal trends to estimate tract specific household rates of change to current year. These rates of change are used to establish tract level household estimates which are used--with local and Donnelley based estimates--as input to the Claritas estimation process.

Because estimates of group quarters population are rare, many suppliers have held group quarters population constant at 1990 levels. However, Claritas provides estimates of change since 1990. A few local areas provide tract group quarters estimates, and change in military group quarters is estimated through the "base closing" checks described below. The Census Bureau now estimates group quarters population for counties, thus providing a valuable basis for adjusting 1990 small area totals.

Family households are estimated by projecting change in tract specific family/household ratios to current year based on estimated change in this ratio at the county level. Housing units are similarly estimated by projecting change in tract specific units/households ratios to current year based on estimated change in this ratio at the county level.
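One way to picture this ratio approach is sketched below. It assumes that the county's proportional change in a ratio is applied multiplicatively to each tract's 1990 ratio, and it adds simple bounds (families cannot exceed households, and housing units cannot fall below households). Both the multiplicative form and the bounds are assumptions of this example, and the names and figures are hypothetical.

```python
def project_tract_ratio(tract_ratio_base, county_ratio_base,
                        county_ratio_current, lower=0.0, upper=None):
    """Move a tract's base-year ratio by the county's proportional change
    in the same ratio, then clamp the result to the given bounds."""
    county_change = county_ratio_current / county_ratio_base
    ratio = tract_ratio_base * county_change
    ratio = max(ratio, lower)
    if upper is not None:
        ratio = min(ratio, upper)
    return ratio


# Hypothetical tract with 500 estimated households (illustrative values).
tract_households = 500.0

# Family/household ratio: bounded above by 1.0.
family_ratio = project_tract_ratio(0.80, 0.70, 0.665, upper=1.0)
tract_families = family_ratio * tract_households

# Units/household ratio: bounded below by 1.0.
unit_ratio = project_tract_ratio(1.10, 1.08, 1.09, lower=1.0)
tract_housing_units = unit_ratio * tract_households
```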

Event Tracts and Military Base Closings

It is during the acquisition and review of local demographic estimates that Claritas demographers account for events such as earthquakes, fires and hurricanes that can have a dramatic impact on the population of selected areas. Local, Donnelley, and ADVO data are critical to this effort, as is contact with local demographers.

In recent years, military base closings and realignments have had similarly dramatic effects on a number of communities. For this reason, Claritas tracks base closings and realignments, and estimates their impact at the census tract level. This effort is accomplished with military and civilian employment data from the Defense Manpower Data Center, which indicates the timing and magnitude of downsizing by installation, and a special tabulation of 1990 census commuting data, which indicates the extent to which neighboring tracts have depended on employment at the installation.

Tract level population counts are ratio adjusted to the place/MCD level estimates described above. Household estimates are controlled indirectly to this level by applying estimated persons per household to adjusted household population. Because families, housing units, and group quarters population are not estimated at the place/MCD level, the tract estimates of these items are adjusted to the county level controls.
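Ratio adjustment, as used throughout this document, scales a set of small area values so that they sum to a control total for the larger area. The following is a minimal sketch; the function name is hypothetical.

```python
def ratio_adjust(values, control_total):
    """Scale values proportionally so they sum to control_total."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot ratio adjust values that sum to zero")
    factor = control_total / total
    return [v * factor for v in values]


# Three tract population estimates adjusted to a place/MCD control of 1,100.
adjusted = ratio_adjust([100, 300, 600], 1100)
print([round(v, 1) for v in adjusted])
```

Each tract keeps its share of the total; only the overall level changes.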

Five year projections of tract level base counts are produced as nonlinear projections through the current year estimates. Rapid rates of growth and decline are moderated into the future to reflect the assumption that extreme rates of net migration are unlikely to be sustained over long periods of time. Event tracts, such as those described above, are projected separately, in order to reflect the extent of rebuilding or recovery from the relevant event. Ratio-adjustments for the five year tract projections are made to county level control totals.
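The document does not give the functional form of the nonlinear projection. One way to moderate rapid growth or decline is sketched below, purely as an assumption: the average annual rate observed since 1990 is shrunk by a fixed damping factor in each projection year. The function name, the geometric damping, and the value 0.8 are all hypothetical.

```python
def project_damped(base_1990, current, years_elapsed, horizon=5, damping=0.8):
    """Project a base count forward with a growth rate that decays each year.

    damping=1.0 reproduces a constant-rate extrapolation; smaller values
    pull extreme rates of growth or decline toward zero (an assumed rule).
    """
    annual_rate = (current / base_1990) ** (1.0 / years_elapsed) - 1.0
    value = current
    for _ in range(horizon):
        annual_rate *= damping          # moderate the rate before applying it
        value *= 1.0 + annual_rate
    return value


# A tract that grew from 1,000 to 1,500 households in seven years.
print(round(project_damped(1000, 1500, 7)))
```

The projection passes through the current year estimate and continues in the same direction, but by less than a straight extrapolation of the 1990 to current year rate.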

Block Group

Block group estimates and input data are scarce, so it has been standard industry practice to distribute tract population and household estimates to block groups using 1990 census ratios. However, block group change can be volatile and contrary to tract level trends. For this reason, Claritas now uses block group specific trends in the Donnelley Marketing list as direct input to block group household estimates. This application is an ambitious industry first, providing a direct measure of which block groups in a tract are changing most rapidly. Further block group precision is provided by the fact that Claritas maps the Census Bureau place estimates (including those for places with just a few hundred people) to "block group part" geographies.

Block

The seven million census blocks in the U.S. are the smallest units of census geography. The census reports only data from the complete count (or short form) census at this level.

Claritas produces estimates of population, households, and population age 18+ at the block level. However, given the absence of input data at this micro level of geography, users should be aware that the block estimates are block group estimates proportioned to blocks based on 1990 census percentages from STF 1.
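Proportioning a block group estimate to blocks can be sketched as follows. The function name and the even split used when a block group had no 1990 population are assumptions for illustration; the document does not say how that case is handled.

```python
def allocate_to_blocks(block_group_estimate, block_pop_1990):
    """Distribute a block group estimate to blocks by 1990 census shares.

    block_pop_1990 maps block id -> 1990 census population.
    """
    total = sum(block_pop_1990.values())
    if total == 0:
        # Assumed fallback: no 1990 shares exist, so split evenly.
        share = block_group_estimate / len(block_pop_1990)
        return {block: share for block in block_pop_1990}
    return {block: block_group_estimate * pop / total
            for block, pop in block_pop_1990.items()}


print(allocate_to_blocks(330, {"b1": 100, "b2": 200, "b3": 0}))
```

A block with no 1990 population receives nothing under this rule, which is why users are cautioned that block estimates carry no independent information about change since the census.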

Other Geographic Levels:

ZIP Code

Claritas ZIP Code estimates begin with the construction of a block group-to-ZIP Code correspondence file for current year. Supplied by Geographic Data Technology (GDT), this correspondence is determined by identifying the location of block centroids (latitude/longitude points) within current year ZIP Code boundaries established by GDT. If a block's centroid falls within a ZIP Code boundary, it is allocated to that ZIP Code. These block-to-ZIP allocations define which block groups (or partial block groups) make up a given ZIP Code. For block groups allocated to more than one ZIP Code, percent inclusion factors are based on 1990 census block population counts. The resulting correspondence file establishes a geographic definition for each ZIP Code, and is the basis for reconfiguring block group data to ZIP Codes defined for current year.
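The percent inclusion factors can be sketched as below, assuming each block has already been assigned to a ZIP Code by its centroid. The record layout and function name are hypothetical, and the ZIP Codes in the example are arbitrary.

```python
from collections import defaultdict


def inclusion_factors(blocks):
    """Share of each block group's 1990 population falling in each ZIP Code.

    blocks is an iterable of (block_group_id, zip_code, pop_1990) records,
    one per block, with zip_code taken from the block centroid assignment.
    """
    bg_zip_pop = defaultdict(float)
    bg_pop = defaultdict(float)
    for block_group, zip_code, pop in blocks:
        bg_zip_pop[(block_group, zip_code)] += pop
        bg_pop[block_group] += pop
    return {key: pop / bg_pop[key[0]]
            for key, pop in bg_zip_pop.items() if bg_pop[key[0]] > 0}


blocks = [("BG1", "19103", 60), ("BG1", "19103", 20),
          ("BG1", "19104", 20), ("BG2", "19104", 50)]
print(inclusion_factors(blocks))
```

Here block group BG1 is split 80/20 between two ZIP Codes, while BG2 falls entirely within one.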

Research, and years of experience, have convinced Claritas that counts from meticulously compiled mailing lists provide the basis for authoritative estimates of the number of households receiving mail at selected ZIP Codes. In particular, such counts can be more accurate than those obtained by reconfiguring block group data to ZIP Codes based on mechanically derived correspondence files--as described above.

For this reason, the initial Claritas ZIP Code estimates are compared to ADVO residential delivery counts--themselves adjusted for vacancies and seasonal occupancies. Discrepancies between the initial estimates and ADVO delivery counts are used to refine the original block group-to-ZIP Code correspondence file. Specifically, blocks are re-assigned from ZIPs where the initial household estimates exceed the ADVO count to adjacent ZIPs where the household estimate needs to be increased. The resulting block group-to-ZIP Code correspondence file produces ZIP Code estimates in closer conformity with the ADVO household delivery counts.

The refined correspondence file is used to produce both the current year ZIP Code estimates and the five year projections. Even the 1980 and 1990 census data are reconfigured to current ZIP Code definitions based on the refined correspondence file. In fact, it is only by applying the same correspondence file to data for all years that full trendability is achieved between 1980, 1990, current year estimate and five year projections--all with reference to the most current ZIP Code definitions.

A Note on "Hexagon," or P.O. Box ZIP Codes:

In rural areas that do not have residential mail delivery, residents typically pick up their mail at a nearby post office. Although residents usually live in the surrounding area, their addresses are boxes at the post office. Thus, it is impossible to define a spatial dimension for such ZIP Codes based on the addresses served. For mapping purposes, Claritas identifies such ZIP Codes with a hexagon, but the preparation of data is more involved.

Demographic data for P.O. Box ZIP Codes are produced by associating such ZIP Codes with census blocks and block groups near the relevant post office. This is accomplished using the latitude/longitude coordinates of the post office, and radiating out to identify those blocks whose centroids are closest to this point. Blocks are added until the cumulative population conforms with a predetermined target based on residential delivery statistics derived from the U.S. Postal Service and ADVO.
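The "radiating out" step can be sketched as a nearest-first accumulation of blocks. This is an illustration under stated assumptions: great-circle distance is used to rank blocks, and accumulation stops as soon as the cumulative 1990 population reaches the target. The document does not specify the distance measure or the exact stopping rule, and all names below are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def blocks_for_po_box_zip(post_office, blocks, target_population):
    """Add blocks nearest the post office until the target is reached.

    post_office is (lat, lon); each block is a dict with keys
    "id", "lat", "lon" and "pop_1990".
    """
    ranked = sorted(blocks, key=lambda b: haversine_km(
        post_office[0], post_office[1], b["lat"], b["lon"]))
    chosen, cumulative = [], 0
    for block in ranked:
        if cumulative >= target_population:
            break
        chosen.append(block["id"])
        cumulative += block["pop_1990"]
    return chosen, cumulative


blocks = [
    {"id": "A", "lat": 40.01, "lon": -100.0, "pop_1990": 50},
    {"id": "B", "lat": 40.05, "lon": -100.0, "pop_1990": 80},
    {"id": "C", "lat": 40.20, "lon": -100.0, "pop_1990": 200},
    {"id": "D", "lat": 40.02, "lon": -100.0, "pop_1990": 30},
]
print(blocks_for_po_box_zip((40.0, -100.0), blocks, 120))
```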

The blocks associated with the P.O. Box ZIP Codes, and their 1990 census population totals, define a block group-to-ZIP Code correspondence for these P.O. Box ZIPs. This correspondence is then used (as described for conventional ZIP Codes above) to produce demographic data for all years--including 1980, 1990, current year estimate and five year projection.

Cable TV, Yellow Pages, etc.

Claritas maintains detailed and comprehensive geographic cross-reference files, which serve as the basis for creating estimates and projections for additional geographic levels including:

Designated Market Areas (DMAs)

Congressional Districts

Telephone service areas

NPA/NXXs

Wire Center

Cable Television Franchise Areas

Yellow Pages Directory Areas

Postal Carrier Routes

Demographic Characteristics

Age, Sex, Race and Ethnicity

The estimation and projection of population by age, sex, race and Hispanic ethnicity involves complex methods that produce a full set of population numbers crosstabulated by age, sex, race and Hispanic ethnicity. A review of some basic definitions will make the methodology easier to follow.

Race and Hispanic Definitions

There are no universally accepted definitions of race and Hispanic ethnicity. The census currently defines Hispanic origin as an ethnicity, not a race. Hispanic origin is a separate census question, and in census tabulations, persons of Hispanic ethnicity can be of any race. Because Hispanics are included in each race category, the race categories alone sum to total population.

The standard Claritas age/sex/race and Hispanic Updates include this overlap of race and Hispanic ethnicity. However, a separate set of race and Hispanic data (minus the age/sex crosstabulation) is available for the nonoverlapping categories below.

White (Non-Hispanic)

Black (Non-Hispanic)

American Indian/Eskimo/Aleut (Non-Hispanic)

Asian or Pacific Islander (Non-Hispanic)

Other (Non-Hispanic)

Hispanic

Full age/sex/race and Hispanic distributions for the nonoverlapping categories are produced, and are available through custom delivery.

Another set of race definitions is sometimes referred to as "OMB consistent" because of its conformity with federal data standards established by the Office of Management and Budget (OMB). Because the OMB definitions do not provide for an "Other" race category, the Census Bureau produced a separate set of "modified" race data, which reassigned persons of "Other" race to specified race categories. As part of this effort, the Census Bureau took the opportunity to make corrections for age misreporting identified in the original 1990 census results. The final product of this work is known as the 1990 census MARS (Modified Age/Race/Sex) data.

The 1990 MARS tabulations serve as the base data for the Claritas age/sex/race and Hispanic estimates. However, because the Census Bureau produced no MARS products below the tract level, Claritas paid for a special tabulation of the MARS data at the block group level. Furthermore, to enable enhanced methodological applications, Claritas had the block group MARS data tabulated separately for group quarters and household population.

Age/Sex/Race and Hispanic at County Level and Above

Race and Hispanic Origin:

At the county level and above, Claritas race and Hispanic estimates and projections are based on state and county level race and Hispanic estimates from the Census Bureau. Claritas uses these Census Bureau estimates only as a source of percent race and Hispanic composition at the county level and above. At each level, the Census-based estimates and projections are adjusted for conformity with the Claritas estimates and projections of total population.

Although produced by age/sex detail, the Census Bureau estimates are the source of race/Hispanic control totals only. County, state and national age/sex totals are produced separately, as described below.

Age/Sex Composition:

Estimated and projected age/sex composition is produced with a modified cohort survival method described in more detail in the Tract and Block Group Level section below. These procedures start with 1990 census population by detailed age/sex/race/Hispanic composition, and use race and Hispanic specific survival probabilities to estimate and project age/sex/race composition to the current and projection years. The resulting age/sex/race/Hispanic distributions are then adjusted for conformity with the Census-based race and Hispanic estimates described above.

Age/Sex/Race and Hispanic Origin at the Tract and Block Group Levels

Because the subcounty age/sex/race and Hispanic procedures are so elaborate, the full method is first used to produce estimates at the census tract level, which are adjusted to the county totals described above. Only then are the methods applied to block groups, and the results adjusted to the tract level estimates. Because detailed age/sex/race and Hispanic data can be so thin at these levels, full decimal detail is retained throughout the computation process. The estimates are rounded to integers (i.e., "whole persons") only as a final step.

Age/Sex Composition:

Population by age/sex composition is estimated and projected using cohort survival methods. Cohort survival is a major factor in changing age structures, and is driven by the reality that, for example, persons age 35 in 1990 who survive another five years, will be age 40 in 1995. Accordingly, most populations with a large proportion of 35 year olds in 1990 can expect to have large proportions of 40 year olds in 1995. It is this process that has swelled the U.S. age structure at progressively older age categories as the baby boom cohort has aged.

Cohort survival methods involve the application of age and sex specific survival rates to population data broken down by age and sex. The application of these rates sets the cohort survival process into motion.

Claritas cohort survivals are judiciously executed starting with 1990 census population by single year of age at the block group level. The 1990 age/sex data are further cross classified by race and Hispanic ethnicity, and include the "modified age/race/sex" (MARS) improvements described above.

Note that the block group level MARS data were not part of any standard census product, and became available only because Claritas paid the Census Bureau to produce this special tabulation.

Single year survival probabilities, specified by single year of age, sex, race and Hispanic ethnicity were derived from life tables used by the Census Bureau in the development of its official population projections. Age and sex are always critical to such probabilities, but differential survival probabilities by race and Hispanic origin are a reality in our society, and contribute to the Claritas method.

Each round of cohort survival ages the population of each block group ahead one year by applying the single year survival probability to the number of persons in each age/sex/race/Hispanic category. For example, the process projects the number of 35 year olds who will survive to become 36 year olds, and so forth. Single year survivals are performed sequentially until the estimate and projection years are reached.
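A single round of cohort survival for one sex/race/Hispanic category can be sketched as follows. The treatment of the oldest age as an open-ended group, and the names used, are assumptions for illustration; the survival probabilities in the example are invented.

```python
def survive_one_year(pop_by_age, survival_prob, births=0.0):
    """Age a population ahead one year.

    pop_by_age[a] is the number of persons age a; the last index is treated
    as an open-ended terminal group (an assumption). survival_prob[a] is the
    probability that a person age a survives one more year.
    """
    last = len(pop_by_age) - 1
    aged = [0.0] * len(pop_by_age)
    aged[0] = births                      # fills the vacated "age 0" category
    for age, persons in enumerate(pop_by_age):
        survivors = persons * survival_prob[age]
        aged[min(age + 1, last)] += survivors
    return aged


# Three age groups only, to keep the example small.
print(survive_one_year([100, 200, 300], [0.99, 0.98, 0.90], births=110))
```

Calling the function repeatedly, once per year, carries the 1990 base forward to the estimate and projection years. Decimal detail is kept throughout, consistent with the practice of rounding only as a final step.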

Accounting for Births:

As part of each cohort survival application, the population less than one year of age is "survived" to age "1." An estimate of births is required to fill the vacated "age 0" category. Births are estimated using the child/woman ratio--defined as the population "age 0" divided by females age 15-44 (childbearing age). The child/woman ratio is an indirect measure of fertility specific to each small area, and in the Claritas application, to race and Hispanic ethnicity as well.

Most important, the measure is sensitive to changes in the number of women of child bearing age--which is itself influenced by the cohort survival process, and is critical to anticipating changes in total births. An increase in the number of child bearing women will result in an increased number of births even if fertility rates (or the child-woman ratio) remain constant. Because fertility rates have been relatively stable in recent years, child/woman ratios from the 1990 census MARS files are used for both the estimate and projection intervals. The use of the MARS corrections is important because age misreporting in the 1990 census was most problematic in the "age 0" category.
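The birth estimate can be sketched in a few lines, holding the 1990 child/woman ratio constant as described above. The function names and the example figures are hypothetical.

```python
def child_woman_ratio(age0_1990, females_15_44_1990):
    """1990 population "age 0" divided by females age 15-44."""
    if females_15_44_1990 <= 0:
        raise ValueError("no women of childbearing age in the base data")
    return age0_1990 / females_15_44_1990


def estimate_births(ratio, females_15_44_survived):
    """Apply the constant 1990 ratio to the survived female population."""
    return ratio * females_15_44_survived


ratio = child_woman_ratio(60, 1000)
print(round(estimate_births(ratio, 1200), 1))
```

With the ratio unchanged, a 20 percent increase in women age 15-44 produces a 20 percent increase in estimated births.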

Exceptions to Cohort Survival:

The cohort survival process is at work in all areas, but its effects are complicated by migration. Claritas methods incorporate the implicit assumption that the age/sex/race and Hispanic composition of population gained or lost through migration is similar to the block group's "survived" population for the estimate year. This assumption is refined, however, through adjustment to control totals.

Furthermore, the cohort survival process is often not applicable to populations living in group quarters facilities such as dormitories, military barracks, prisons and nursing homes. These populations have high turnover, and age/sex compositions which tend to be stable, reflecting the nature of the facility. For this reason, cohort survivals are applied only to the population living in households. Group quarters populations are estimated separately, and their age/sex/race/Hispanic compositions are held constant with those measured in the previous census.

Claritas also identifies segments of the household population (such as concentrations of college students in off-campus housing) for which cohort survival is not applicable. Concentrations of these "hidden group quarters" populations are identified through their distinctive imprint on small area age compositions, and are similarly exempted from the cohort survival process.

Ironically, the exemption of group quarters populations from the survival process improves results for small areas, but causes minor distortions at the national level. The reason is that such populations do "survive" to become older--they just do so in other areas. Consequently, the Claritas method includes a procedure to survive persons in group quarters, and add them to the household population in non-group quarters areas.

For both the estimate and projection years, preliminary age/sex distributions (along with race and Hispanic ethnicity) are ratio-adjusted to the independently derived estimates at the census tract and county levels.

Race and Hispanic Ethnicity:

Because Claritas performs cohort survival procedures separately for each category of race and Hispanic ethnicity, the results capture an important component of changing race/Hispanic composition. Specifically, the procedure captures differences traced to the effects of differential fertility and mortality rates by race and ethnicity. For example, the relatively high fertility rates of the Hispanic population contribute to their increasing percent of total population in many areas. To factor in the effects of migration, the tract and block group level cohort survival results are adjusted to the census-based county estimates described above.

Income Estimates

All Claritas income estimates are expressed in current year dollars using the "money income" definition reported in the 1990 census. In contrast to the 1990 census, which reported income for the previous calendar year (1989), Claritas income estimates are for the calendar year relevant to each set of estimates and projections. For example, 1996 income is estimated for 1996 households.

As with the demographic estimates and projections, data are produced first at the national level, then for progressively smaller areas, with successive ratio adjustments ensuring consistency between levels.

Per capita and aggregate income are estimated first. Aggregate income is the total of all income for all persons in an area, and per capita is the average income per person--or aggregate income divided by total estimated population. Income earned by persons in group quarters facilities is estimated separately, and subtracted from aggregate income to derive aggregate household income--or the total income earned by persons living in households. Aggregate household income divided by total estimated households is the estimate of average household income.
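The relationships among these income measures can be written out directly. The sketch below uses hypothetical names and invented figures.

```python
def income_summary(aggregate_income, group_quarters_income, total_population, households):
    """Derive per capita, aggregate household, and mean household income."""
    per_capita = aggregate_income / total_population
    aggregate_household_income = aggregate_income - group_quarters_income
    mean_household_income = aggregate_household_income / households
    return per_capita, aggregate_household_income, mean_household_income


# An area with $50 million aggregate income, $2 million of it in group quarters,
# 2,500 persons and 1,000 households.
print(income_summary(50_000_000, 2_000_000, 2500, 1000))
```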

Household Income Distribution

Household income includes income earned by all persons living together in a housing unit (i.e., all household members). Claritas estimates household income for all 25 income categories reported in the 1990 census, as indicated below.

Households with income less than $5,000

Households with income of $ 5,000 to $ 9,999

Households with income of $ 10,000 to $ 12,499

Households with income of $ 12,500 to $ 14,999

Households with income of $ 15,000 to $ 17,499

Households with income of $ 17,500 to $ 19,999

Households with income of $ 20,000 to $ 22,499

Households with income of $ 22,500 to $ 24,999

Households with income of $ 25,000 to $ 27,499

Households with income of $ 27,500 to $ 29,999

Households with income of $ 30,000 to $ 32,499

Households with income of $ 32,500 to $ 34,999

Households with income of $ 35,000 to $ 37,499

Households with income of $ 37,500 to $ 39,999

Households with income of $ 40,000 to $ 42,499

Households with income of $ 42,500 to $ 44,999

Households with income of $ 45,000 to $ 47,499

Households with income of $ 47,500 to $ 49,999

Households with income of $ 50,000 to $ 54,999

Households with income of $ 55,000 to $ 59,999

Households with income of $ 60,000 to $ 74,999

Households with income of $ 75,000 to $ 99,999

Households with income of $100,000 to $124,999

Households with income of $125,000 to $149,999

Households with income of $150,000 and over.

Additional Income Ranges Calculated by Claritas

In addition to the standard census income categories, Claritas estimates detail for the "$150,000 and over" category. The "extended" income categories are indicated below.

Households with income of $150,000 to $249,999

Households with income of $250,000 to $499,999

Households with income of $500,000 and over.

Although few households had 1989 incomes in these ranges, this detail is important for analyses in affluent markets. Moreover, due to inflation, incomes in excess of $150,000 will be more common by the time the next census is taken. In fact, the Claritas projections already extend to years when the extended income categories will be more widely relevant.

The extended income categories are estimated first for 1989 (1990 census), and become part of the 1990 census base data from which the current year estimates and five year projections are produced. Pareto methods, which involve an assumption of exponential decay, are applied to the 1990 census income distribution in each block group to estimate the number of households in each of the extended income categories.
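The document does not describe the Pareto calculation in detail. One common form of Pareto tail estimation is sketched below as an assumption: the tail exponent is fitted from the households above $125,000 and above $150,000, and the fitted curve is then used to split the "$150,000 and over" count. The function name and the choice of fitting points are hypothetical.

```python
import math


def extend_top_income(hh_125_to_150, hh_150_plus):
    """Split the "$150,000 and over" households into three extended categories.

    Assumes a Pareto tail: households above income x are proportional to
    x ** -alpha, with alpha fitted from the two highest census thresholds.
    """
    if hh_150_plus <= 0:
        return {"150-250k": 0.0, "250-500k": 0.0, "500k+": 0.0}
    if hh_125_to_150 <= 0:
        raise ValueError("need households in $125,000-$149,999 to fit the tail")
    above_125 = hh_125_to_150 + hh_150_plus
    alpha = math.log(above_125 / hh_150_plus) / math.log(150_000 / 125_000)

    def above(x):
        # Households with income above x, for x >= $150,000.
        return hh_150_plus * (150_000 / x) ** alpha

    above_250, above_500 = above(250_000), above(500_000)
    return {"150-250k": hh_150_plus - above_250,
            "250-500k": above_250 - above_500,
            "500k+": above_500}


print({k: round(v, 1) for k, v in extend_top_income(40, 60).items()})
```

The three extended categories always sum to the original "$150,000 and over" count, so the published census total for that category is preserved.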

Income Estimation Method

At the national level, results from the most recent Current Population Survey are used to ensure the reasonableness of the major components of the Claritas income estimates.

At the state and county level, per capita income estimates produced annually by the Bureau of Economic Analysis (BEA) are the basis for estimating income change since the 1990 census. Specifically, 1989 (census year) BEA estimates and those for recent years are adjusted to reflect the census "money income" definition, and the observed rates of change are extended to the Claritas target date. These state and county specific BEA rates are then applied to the 1990 census base data to produce a current year estimate of per capita income. Internal Claritas research has demonstrated the effectiveness of BEA county data in estimating income growth from 1979 to 1989, and confirmed that the reconfiguration to "money income" enhanced this performance significantly.

As described above, estimated aggregate income is adjusted for group quarters, and divided by total estimated households to derive the estimate of average, or mean, household income. The 1990 census household income distribution is then statistically advanced to reflect the estimated current year mean income for each area. This procedure involves the estimation of the number of households advancing from one income category to another--based on the specific area's estimated rate of income growth.

Income change at the census tract level is estimated through the analysis of intercensal income growth. Tract rates of income growth differ largely due to differences in demographic composition. For example, tracts with highly educated populations or concentrations of professional employment may experience higher rates of income growth than those with concentrations of single parent families. Consequently, income growth exceeds the county rate in some tracts, but lags behind in others. Claritas measures this compositional effect during the intercensal period, and applies it to the estimation period to establish unique rates of change in mean income at the census tract level. As at the county level, 1990 census households by income distributions are statistically advanced to reflect estimated mean income.

Block group income is estimated by applying the tract level rates of change to all component block groups, and statistically advancing the 1990 census distribution to the target mean. Iterative proportional fitting to tract level income and block group total households completes the estimates.

Five year projections of income are produced by projecting national level mean household income ahead five years. Mean income for smaller areas is projected based on performance relative to the larger area. As in the income estimates, areas which have tended to outperform on income growth, will continue to do so. Once mean income is projected, the current year estimated income distributions are statistically advanced to reflect the projected means. Again, iterative multidimensional adjustments ensure consistency through all levels of geography.

Family Income

A family household is one in which the householder is related to one or more other persons living in the household. Family households also include any other non-related persons living in the same housing unit. Family household income includes all income of all persons living in a family household. In contrast, family income includes only the income of family members, or persons related to the householder.

Family household income is estimated by subtracting the 1990 census non-family household income table from the household income table. This provides a 25-cell income table that reflects the income distribution of family households. This table is then extended to 27 categories using the same methods applied to household income.

Household income growth rates from 1989 to current year are then used to estimate mean family household income for current year, and the 1990 census distribution is statistically advanced to reflect the target mean. Five year projections are produced by trending the estimated mean out five years, and advancing the current year distribution to the projection year. For both estimate and projection years, family household income distributions are adjusted to conform to both total family households estimated for the specific area, and the family household income distribution for the next higher geographic level.

Income by Age of Householder

The crosstabulation of household income by age of householder is valuable because income and life cycle stage, together, are so strongly associated with consumer needs and behavior. The Claritas "income by age" updates are produced after the estimates of population by age and households by income have been completed. The data constitute a 132 cell table defined by 12 categories of household income and 11 categories of householder age. The row and column totals from these tables (the "income" and "age" totals) are commonly referred to as the "marginal totals."

The estimates of households by income serve as the income "marginals," but population by age estimates must be converted to householder by age for use as the age "marginals." For each area estimated, 1990 census data are used to determine age specific "headship rates," or the percent of persons in specific age categories who are householders. Trends in the Current Population Survey are applied to the small area data to estimate headship rates for current year. The estimated headship rates are then applied to estimated population by age to produce estimated householders by age. A final adjustment to total households ensures consistency with the critical base count.
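The conversion from population by age to householders by age can be sketched as follows. The age categories, rates, and names are invented for illustration, and the trending of headship rates from the Current Population Survey is not shown.

```python
def householders_by_age(pop_by_age, headship_rates, total_households):
    """Apply headship rates to population by age, then control to households.

    pop_by_age and headship_rates are dicts keyed by the same age categories.
    """
    preliminary = {age: pop_by_age[age] * headship_rates[age] for age in pop_by_age}
    prelim_total = sum(preliminary.values())
    if prelim_total == 0:
        raise ValueError("headship rates produce no householders")
    # Final adjustment so householders sum to the household base count.
    factor = total_households / prelim_total
    return {age: hh * factor for age, hh in preliminary.items()}


pop = {"15-24": 1000, "25-34": 2000}
rates = {"15-24": 0.10, "25-34": 0.50}
print(householders_by_age(pop, rates, 1210))
```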

With the income and age (row and column) marginal totals estimated, the final step is to estimate the full crosstabulation of income by age of householder. In other words, values must be determined for each of the 132 income by age categories, or cells. Block group level income by age cell values from the 1990 census (expanded by Claritas to the full 132 cell extended income configuration) provide the initial input. Within each age category, the 1990 census income distributions are advanced to reflect the block group's (previously) estimated rate of income growth. This adjustment expresses the 1990 census income by age distribution in current dollar values. Iterative proportional fitting is then used to simultaneously adjust the resulting table to the income and age of householder marginal totals estimated for current year. The iterative proportional fitting method is described in more detail in the Appendix.

The 1990 census income by age tables reflect the statistical relationship between income and age for individual block groups. Iterative fitting not only adjusts the 1990 census table to conform simultaneously with the household by income and householder by age estimates (the marginal totals), but does so in a way that preserves the statistical relationship between income and age as measured for the specific block group.
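A minimal sketch of two-dimensional iterative proportional fitting is given below, using a 2 by 2 table in place of the full 132 cell table. The function name, iteration limit, and tolerance are assumptions, and the marginal totals must sum to the same grand total for the fit to converge.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Fit a seed table to row and column marginal totals.

    seed is a list of rows; rows are scaled to row_targets, then columns
    to col_targets, repeating until the row totals also agree.
    """
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [v * target / row_sum for v in table[i]]
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
        # Columns are exact after the column pass; check the rows.
        row_error = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if row_error < tol:
            break
    return table


seed = [[40.0, 10.0], [20.0, 30.0]]        # base year income (rows) by age (columns)
fitted = ipf(seed, row_targets=[60, 60], col_targets=[70, 50])
print([[round(v, 2) for v in row] for row in fitted])
```

Because each pass only multiplies whole rows or whole columns by a constant, the cross-product ratios of the seed table are unchanged. This is the sense in which the fitted table preserves the relationship between income and age measured in the base data.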

The income by age estimates are produced at the county, tract, and block group levels, with adjustments ensuring consistency between levels.

Five year projections are produced using similar methods. Projected households by income serve as the income marginal totals, and the current year headship rates are used to convert projected population by age to projected householders by age. The income by age table estimated for current year is then adjusted to projected dollar values, and then iteratively adjusted to the projected marginal totals.

Households by Size

Estimates of households by size (number of persons) are produced for the categories 1, 2, 3, 4, 5, 6 and 7 or more persons. The distribution of households by size from the 1990 census serves as the base from which the current year estimates are derived. The 1990 distribution is advanced to current year based on estimated change in persons per household (average household size). Iterative proportional fitting is then used to ensure consistency with estimated household totals and average household size.

Projections of households by size are based on the 1990 census and current year estimated distributions of households by size. The current year distribution is shifted to reflect the growth or decline in average household size during the projection interval. Iterative proportional fitting is then used to ensure consistency with projected household totals and average household size.

Housing Value

Value is estimated and projected for specified owner-occupied housing units, and is based on the 1990 census measure, which reflects the census respondent's estimate of how much the house would sell for, or the asking price if it were currently for sale. Median value is estimated and projected, as well as the distribution of units among the categories below.

Less than $15,000

$15,000 to $19,999

$20,000 to $24,999

$25,000 to $29,999

$30,000 to $34,999

$35,000 to $39,999

$40,000 to $44,999

$45,000 to $49,999

$50,000 to $59,999

$60,000 to $74,999

$75,000 to $99,999

$100,000 to $124,999

$125,000 to $149,999

$150,000 to $174,999

$175,000 to $199,999

$200,000 to $249,999

$250,000 to $299,999

$300,000 to $399,999

$400,000 to $499,999

$500,000 or more

The base count of specified owner-occupied housing units is derived by applying 1990 census housing percentages to the completed estimate of total housing units.

Change in value since 1990 is estimated based on time series median sales price data supplied for major metropolitan areas by the National Association of Realtors. Although the price of units sold may not reflect the value of all housing in an area, Claritas research on the National Association of Realtors data confirmed that change in median sales price was a strong indicator of change in value over the 1980 to 1990 period. Therefore, post 1990 sales price data are used to determine change in value at the metropolitan area level. Estimated change in specific counties, tracts, and block groups reflects the trend in the broader market, but factors in differential income growth at the neighborhood level. Also, in markets where sales prices have declined in recent years, block group PRIZM cluster codes are used to identify neighborhoods where housing is most likely to retain its value.

Once median value is estimated, the 1990 census distribution is advanced to reflect the new median. Five year projections of value are produced in a similar manner by projecting median value five years beyond the current year estimate, and advancing the current year distribution to reflect the projected median.

Income Producing Assets and Wealth

The census does not collect information on wealth or income producing assets (IPA), so estimates of these items are unique in that they are not census based. The basis for estimates of wealth and income producing assets is the Market Audit database, a proprietary survey of consumer financial behavior. As an on-going telephone survey, the Market Audit obtains data on more than 90,000 households per year. To obtain the most accurate estimates possible, Claritas uses multiple years' worth of survey data; over 450,000 completed interviews were used to create the IPA model. The challenge is to extend these survey results into accurate income producing assets estimates for households across the entire country.

Unlike traditional measures of assets and net worth, IPA is a measure of the ownership of assets that generate income and includes:

liquid deposits (checking and savings accounts)

time deposits (CDs)

retirement savings and

the market value of all securities, such as stocks, bonds and mutual funds.

IPA is a measure of the money and assets that make more money. Passive assets, which are relatively illiquid and do not generate an income flow (such as a house or art collection), may appreciate over time but are not included in the calculation of IPA.

Because of the size and sampling plan of the Market Audit, it provides both the depth and stability necessary to capture overall patterns of IPA, as well as regional market level differences. The Market Audit provides detailed patterns of financial holdings for over 60 distinct regions across the country. This survey is also weighted to reflect the age and income distributions of households within each market area.

The fundamental building block of the Claritas IPA estimates is household level data of age, income and home ownership. A full breakout of this household data allows Claritas to provide 112 unique cells for classifying households for every category of IPA. Because of the robust size of the Market Audit database, a loglinear model can be created to estimate the proportion of households in each of these 112 cells for every IPA category. This model is then applied to updated census demographics to yield estimates of the number of households within the IPA categories for various levels of geography. The estimates are further controlled by the market areas to allow for regional differences across the country.
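The application step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the cell labels, the proportions, and the household counts are hypothetical, only two of the 112 cells and two collapsed IPA categories are shown, and the loglinear model itself and the market area controls are not reproduced.

```python
# Hypothetical model output: for each household cell (age, income, home
# ownership), the proportion of households falling in each IPA category.
model_proportions = {
    ("under 45", "under $50,000", "renter"): {"under $100,000": 0.95, "$100,000 and over": 0.05},
    ("65 and over", "$50,000 and over", "owner"): {"under $100,000": 0.40, "$100,000 and over": 0.60},
}

# Hypothetical current year households by cell for one small area.
households_by_cell = {
    ("under 45", "under $50,000", "renter"): 400,
    ("65 and over", "$50,000 and over", "owner"): 100,
}

# Apply the cell proportions to the updated household counts and sum by category.
ipa_households = {}
for cell, count in households_by_cell.items():
    for category, proportion in model_proportions[cell].items():
        ipa_households[category] = ipa_households.get(category, 0.0) + count * proportion

print(ipa_households)
```

Because the proportions within each cell sum to one, the category estimates sum to the area's total households before any further control or rounding.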

Wealth is a measure of a household’s net worth as defined by assets minus liabilities. Assets include:

present value of home

motor vehicles

retirement savings and

the market value of all securities, such as stocks, bonds and mutual funds.

Liabilities include:

mortgages and home equity loans

auto loans

credit card balances

secured and unsecured personal loans

The modeling strategy of wealth is similar to that of IPA. A loglinear model is fitted to the Market Audit survey data. The predictors include household income, age of householder, home ownership, and IPA. The model is applied to the current year estimates of the predictors to yield the estimates of wealth, with the controls for the various market areas.

The five categories of Income Producing Assets are:

Households with IPA of less than $100,000

Households with IPA of $100,000 to $249,999

Households with IPA of $250,000 to $499,999

Households with IPA of $500,000 to $999,999

Households with IPA of $1,000,000 and over

The six categories of Wealth are:

Households with Wealth of less than $25,000

Households with Wealth of $25,000 to $49,999

Households with Wealth of $50,000 to $99,999

Households with Wealth of $100,000 to $249,999

Households with Wealth of $250,000 to $499,999

Households with Wealth of $500,000 and over

Smoothed Data

In addition to the annual demographic estimates and projections, Claritas provides a series of detailed 1990 census tables which are ratio-adjusted, or "smoothed," to relevant current year totals. For example, the 1990 census table on marital status is adjusted for conformity with estimated population age 15 and above by sex. These "smoothed" tables are not estimates, and do not purport to show anything beyond the effect of applying 1990 census distributions to estimated base count totals at the block group level.

Nevertheless, such data can be quite valuable. While percent distributions of characteristics have not been estimated beyond 1990, the totals within specific categories will often be more accurate than those of the 1990 census--especially in areas experiencing rapid population growth or decline. Moreover, because the "smoothed" data are produced at the block group level on a "bottom-up" basis, percent distributions for aggregations (any area including more than one block group) will differ from those observed in 1990. This bottom-up effect can be quite advantageous. For example, if the most rapidly growing block groups in a county tend to have relatively high concentrations of married couple households, the "smoothed" result will indicate an increased proportion of married couple households in that county for current year.
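The bottom-up effect can be illustrated with a small sketch. The two block groups, their 1990 counts, and their current year household totals are hypothetical, and rounding is omitted for clarity.

```python
# Hypothetical county made up of two block groups. BG1 has doubled in size
# since 1990 (100 to 200 households); BG2 is unchanged at 100 households.
block_groups = {
    "BG1": {"census_1990": {"married couple": 80, "other": 20}, "current_households": 200},
    "BG2": {"census_1990": {"married couple": 40, "other": 60}, "current_households": 100},
}

def smooth(census_counts, current_total):
    """Apply the 1990 percent distribution to a current year base count."""
    base = sum(census_counts.values())
    return {key: current_total * value / base for key, value in census_counts.items()}

def share(tables, category):
    """Share of a category after summing several block group tables."""
    total = sum(sum(table.values()) for table in tables)
    return sum(table[category] for table in tables) / total

smoothed = {name: smooth(bg["census_1990"], bg["current_households"])
            for name, bg in block_groups.items()}

share_1990 = share([bg["census_1990"] for bg in block_groups.values()], "married couple")
share_current = share(list(smoothed.values()), "married couple")
print(share_1990, share_current)
```

Within each block group the percent distribution is unchanged from 1990, yet the county share of married couple households rises from 60 percent to about 67 percent because the faster growing block group carries more weight in the aggregate.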

Therefore, taken for what they are, and used with an understanding of their limitations, the Claritas "smoothed" data are a legitimate, and highly valuable component of the annual demographic Update.

The list of "smoothed" data items is indicated below:

Persons 15 years old and over by sex and marital status

Households by household size and household type

Households by age of household members and household type

Households by household type and household size

Persons in group quarters by group quarters type

Occupied housing units by tenure

Housing units by units in structure

Persons by ancestry

Workers 16 years old and over by place of work

Workers 16 years old and over by means of transportation to work

Workers 16 years old and over by travel time to work

Workers 16 years old and over who did not work at home by aggregate travel time to work

Persons 25 years old and over by educational attainment

Persons 25 years old and over by race and educational attainment

Persons of Hispanic origin 25 years old and over by educational attainment

Persons 16 years old and over by sex and employment status

Employed persons 16 years old and over by industry

Employed persons 16 years old and over by occupation

Employed persons 16 years old and over by class of worker

Households by race of householder and household income

Hispanic households by household income

Aggregate household income by type of income

Families by number of workers in family

Families by poverty status, family type and presence and age of children

Housing units by year structure built

Occupied housing units by year householder moved into unit

Occupied housing units by tenure and vehicles available

Workplace Demographics

Standard census tabulations reflect population and demographic characteristics according to place of residence, and do not reflect the fact that many workers journey away from home during the day--taking their consumer behaviors with them. In order to capture this important reality, Claritas has developed products to reflect Workplace Demographics.

Serious workplace estimates are relatively new, and considerably more ambitious than conventional residence based data. In contrast to standard UPDATE items, the Workplace products have no published 1990 counts upon which to build. Therefore, the Workplace data are currently prepared for current year only--there are no corresponding 1990 census data, and no five year projections. Also, because the geocoding of workplace addresses is less precise than that for residential addresses, Workplace products are presently available only down to the census tract level. Because these products are still at an early stage of development, improvements can be expected in the near future.

Workplace Population

The Claritas Workplace Population estimates provide information on businesses and employment, and thereby address the need for what is sometimes called "daytime population" data. The Claritas Workplace Population estimates are unique in that they go beyond private sector employment to include public sector employment, persons in the armed forces, and persons working at home. The estimates are produced down to the census tract level, and allocations to the block group level enable the retrieval of these estimates for geometric areas such as circles and polygons.

The major components of the Workplace Population estimates include the following:

Private Sector Employment

Private Sector Employment by Major Industry

Private Sector Business Locations

Private Sector Business Locations by Major Industry

Public Sector Employment

Military Employment

Persons Working at Home

Estimates by major industry (SIC) include the following categories.

1) Agriculture, Forestry, and Fisheries

2) Mining

3) Construction

4) Manufacturing, Nondurable Goods

5) Manufacturing, Durable Goods

6) Transportation

7) Communications and Other Public Utilities

8) Wholesale Trade

9) Retail Trade

10) Finance, Insurance, and Real Estate

11) Business and Repair Services

12) Personal Services

13) Entertainment and Recreation Services

14) Professional and Related Health Services

15) Professional and Related Educational Services

16) Other Professional and Related Services

A "total employment" estimate is computed by summing private sector employment, public sector employment, military employment, and persons working at home. However, users should be aware that some self-employed workers may not be captured by these estimates. Users should also note that "total employment" cannot be equated with "total daytime population," which would include persons not in the labor force, who reside in and remain in the area during the daytime.

Methodology

The Workplace Population estimates are produced using different resources at different levels of geography. At the state and national levels, employment estimates from the Bureau of Labor Statistics provide control totals for private sector employment by major industry and public sector employment by level of government. Employees per establishment ratios from the Census Bureau's County Business Patterns files provide the basis for estimating business establishments for the private sector categories.

At the county level, the Census Bureau's County Business Patterns provides an authoritative basis for estimating private sector employment and establishments by major industry. Federal public sector employment is estimated based on county level federal employment statistics provided by the federal Office of Personnel Management. State and local government employment are distributed from the previously computed state controls to county level based on county-to-state patterns exhibited in the 1990 census for state and local government employees.

Data from a major business list compiler are used to distribute the county level establishment estimates to the census tract level. Specifically, the tract within county distribution of businesses on the compiled list (as geocoded by Claritas) is used to distribute the county level establishment estimates to the tract level--for both total establishments and establishments by SIC.

Ratios of employees per business location from the compiled list are used to produce preliminary tract level employment estimates, which are then adjusted to conform with the previously estimated county and state employment totals. Public sector employment is similarly distributed to the tract level based on the tract within county distribution of public sector employees on the compiled list.
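The tract level distribution described above might be sketched as follows. All figures are hypothetical, a single industry is shown, and the further adjustment to state totals is omitted.

```python
# Hypothetical county controls for one industry.
county_establishments = 500
county_employment = 6000

# Hypothetical compiled list counts by tract: (businesses on list, employees on list).
compiled_list = {"T1": (120, 1500), "T2": (60, 300), "T3": (20, 200)}

list_businesses = sum(businesses for businesses, _ in compiled_list.values())

# Distribute county establishments by each tract's share of listed businesses.
tract_establishments = {
    tract: county_establishments * businesses / list_businesses
    for tract, (businesses, _) in compiled_list.items()
}

# Preliminary employment: establishments times employees per listed business.
preliminary_employment = {
    tract: tract_establishments[tract] * employees / businesses
    for tract, (businesses, employees) in compiled_list.items()
}

# Ratio-adjust the preliminary figures to the county employment control.
factor = county_employment / sum(preliminary_employment.values())
tract_employment = {tract: value * factor for tract, value in preliminary_employment.items()}
print(tract_establishments, tract_employment)
```

In this example the preliminary tract figures sum to 5,000, so each is multiplied by 1.2 to conform with the county control of 6,000.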

Establishments versus Business Locations

Because County Business Patterns counts a business with multiple locations (e.g., storefronts) as a single "establishment," a final adjustment is made to convert the Claritas estimates from "establishments" to total "locations." This adjustment is made using SIC specific location/establishment ratios using the compiled list as the source of locations and County Business Patterns as the source of establishments.

Military Employment

Estimates of military employment are based on data provided by the Defense Manpower Data Center, and coded to census tract by Claritas.

Persons Working at Home

Because place of work and place of residence are the same for these workers, they are estimated from the 1990 census journey to work data, which identified about 3.4 million persons "working at home." The estimation of this component is important because of the growing popularity of "telecommuting," and because it captures at least some of the "self-employed" workforce not covered by establishment based data sources. Persons working at home are estimated by applying the percent of workers "working at home" in the 1990 census to the Claritas "smoothed" estimate of total workers age 16 and above for current year.

Workplace PRIZM Distribution

The objective of Workplace PRIZM is to specify the PRIZM lifestyle composition of the population working in an area by identifying the census tracts of residence of persons commuting to specific tracts of employment. Workplace PRIZM does not suggest a collective lifestyle of persons working in an area, but rather the mix of lifestyles brought to the workplace by commuters from diverse residential neighborhoods.

Workplace PRIZM is based on a special tabulation of tract-to-tract commuting data from the 1990 census. For each census tract of employment, the tabulation lists the census tracts of residence for workers commuting to that tract of employment, and the total number of workers in each tract-to-tract commuting flow.

For each census tract of employment, Workplace PRIZM is designed to present the percent distribution of the residential PRIZM cluster codes being brought into the tract by workers commuting to jobs in that tract. These percent distributions are also applied to the "total employment" numbers from the Workplace Population estimates (see above) to suggest the number of workers in an area by residential lifestyle cluster.

Workplace PRIZM distributions are produced by carrying weighted averages of block group cluster codes along with commuters to their census tracts of work. The more commuters coming from neighborhoods with a given PRIZM code, the more heavily represented that code is in the tract's Workplace PRIZM distribution. Although the commuting file is limited to the tract level, block group PRIZM codes are fed into the commuting streams, weighted by the percent of a tract's workers in its component block groups. Thus, if a block group has a large proportion of a tract's workers, its PRIZM code is more heavily weighted in the tract flow, while PRIZM codes for block groups with relatively few workers contribute relatively little.
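A minimal sketch of this weighting, under stated assumptions, is shown below. The tract and block group identifiers, the cluster codes "A" and "B", and all counts are hypothetical, and the function name is illustrative rather than a Claritas routine.

```python
from collections import defaultdict

# Hypothetical residence tracts: block group -> (PRIZM cluster code, workers age 16+).
residence_tracts = {
    "R1": {"BG1": ("A", 300), "BG2": ("B", 100)},
    "R2": {"BG3": ("B", 200)},
}

# Hypothetical commuting flows into one tract of employment: residence tract -> workers.
flows_into_work_tract = {"R1": 80, "R2": 20}

def workplace_distribution(flows, tracts):
    """Percent distribution of residential cluster codes among commuters."""
    weighted = defaultdict(float)
    for residence_tract, commuters in flows.items():
        block_groups = tracts[residence_tract]
        tract_workers = sum(workers for _, workers in block_groups.values())
        for code, workers in block_groups.values():
            # Each block group's code is weighted by its share of the tract's workers.
            weighted[code] += commuters * workers / tract_workers
    total = sum(weighted.values())
    return {code: value / total for code, value in weighted.items()}

distribution = workplace_distribution(flows_into_work_tract, residence_tracts)

# Apply the percent distribution to a hypothetical total employment figure.
total_employment = 1000
workers_by_cluster = {code: total_employment * pct for code, pct in distribution.items()}
print(distribution, workers_by_cluster)
```

Here 75 percent of the 80 commuters from R1 are attributed to cluster A and the rest to cluster B, giving a workplace distribution of 60 percent A and 40 percent B.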

The tract-to-tract commuting patterns are fixed as of the 1990 census, but Workplace PRIZM is updated annually. Specifically, the weight assigned to each block group PRIZM code in a tract-to-tract commuting stream is based on total workers age 16+. Workers age 16+ are updated annually as a "smoothed" data item (see the section on smoothed data), and are very much influenced by the annual block group estimates of total population age 16+. Thus, the weights assigned to PRIZM codes within commuting streams are updated annually. Furthermore, the number of workers at each tract of employment is updated annually as part of the Workplace Population estimates. Thus, while the commuting patterns remain fixed, the residential population feeding into these streams, and the workplace population emerging from them are updated annually, and contribute a significant update to the Workplace PRIZM distribution.

APPENDIX

Allocation of 1980 Population to 1990 Geography

A new census provides not only new data, but a new small area geographic framework as well. At the county level and above, geographic units are quite stable, but a new census brings numerous changes at the census tract level and below. Consequently, 1980 census data must be allocated to 1990 census geographic units in order to identify the intercensal trends that are presented in demographic "Trend" reports and that sometimes contribute to estimation methods. This allocation ensures "apples-to-apples" comparisons.

1980 census items converted to 1990 geography include:

Population

Group Quarters Population

Households

Families

Housing Units

Per Capita Income

Average Household Income

Average Family Income

Households by Income

Families by Income

Population by Age/Sex/Race and Hispanic ethnicity

Median Age by Sex, Race and Hispanic ethnicity

Population by Race and Hispanic ethnicity

Householders by Age

Households by Income and age of Householder

Median Household Income by Age of Householder

The allocation of 1980 census data to 1990 geography was accomplished through the creation of a 1980-to-1990 tract/MCD level geographic cross-reference file. The cross-reference file was developed from TIGER/Line files. TIGER is the Census Bureau's enormous and detailed computerized map of the entire U.S., and enables the geographic cross-reference by indicating both 1980 and 1990 geographic codes with all street segment and feature information in a given area. TIGER was used to confirm instances where 1980 tract boundaries were unchanged in 1990, or split cleanly into two or more 1990 tracts. Also identified were the relatively infrequent occurrences of two or more 1980 tracts combining to form one tract in 1990, and the common occurrence of complex changes not involving clean splits or combinations.

1980-to-1990 correspondence was determined, on a county by county basis, by assigning each 1990 block to one 1980 census tract or MCD based on a majority relationship. In most cases, the 1990 census blocks are entirely contained within a single 1980 tract or MCD. Where TIGER indicated that a 1990 block was split between two or more tracts or MCDs, the block was assigned to the 1980 tract that contained the majority of the block's faces (a block face being one side of a street segment).
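The assignment rule for split blocks can be sketched as a simple count of block faces. The tract codes are hypothetical, and the handling of ties (here, the code encountered first wins) is an assumption, since the text does not say how ties were resolved.

```python
from collections import Counter

def assign_block(face_codes):
    """Return the 1980 tract or MCD code holding the most faces of a 1990 block.

    face_codes lists one 1980 code per block face of the 1990 block.
    """
    counts = Counter(face_codes)
    return counts.most_common(1)[0][0]

# Hypothetical 1990 block with four faces, three of them in 1980 tract 0101.
assigned = assign_block(["0101", "0101", "0102", "0101"])
print(assigned)
```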

The initial results were validated to ensure the coverage of all 1980 tracts and MCDs and all 1990 blocks. Clerical follow up included the assignment of missing blocks to 1980 geographies by a variety of methods including:

matching the entire six-character 1990 tract code associated with each block to the 1980 tract code.

matching the 1990 MCD code associated with each block with the 1980 MCD code in non-tracted areas.

matching the first four characters of the 1990 tract code associated with each block to the 1980 tract code.

finding the census tract or MCD whose centroid is closest (within the county) to the 1990 block centroid.
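The last of these fallback methods, the nearest centroid search, might look like the sketch below. It assumes planar coordinates and straight line distance, uses hypothetical codes and coordinates, and expects the candidate list to be limited to tracts or MCDs in the block's own county.

```python
import math

def nearest_unit(block_centroid, unit_centroids):
    """Return the code of the 1980 tract or MCD whose centroid is closest."""
    return min(
        unit_centroids,
        key=lambda code: math.dist(block_centroid, unit_centroids[code]),
    )

# Hypothetical centroids for two 1980 tracts in the same county.
county_units = {"0101": (0.0, 0.0), "0102": (5.0, 5.0)}
closest = nearest_unit((1.0, 1.0), county_units)
print(closest)
```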

1980 census tracts and MCDs that failed to have a single 1990 census block assigned were resolved manually. These tended to be shipboard tracts or geographies with zero population in 1980.

A cross-reference file between 1990 blocks and 1980 block groups and enumeration districts was not created because the tract/MCD was the smallest unit of 1980 census geography represented nationwide in TIGER. 1980 block group codes were presented for urban areas, but were known to contain errors. The 1980 enumeration district codes relevant to all other areas were not provided.

Consistency of Complete Count and Sample Census Totals

Because much census information was collected on a sample basis, using the census "long form," the Census Bureau used sophisticated weighting techniques to estimate a complete count result. These weighted sample totals often differ from the complete count totals by small amounts. In other words, complete counts from STF 1 tables often differ from weighted sample counts from STF 3. For example, a census tract with 1,200 households might have an income table summing to 1,206 or 1,197 households. The differences are statistically inconsequential, but as part of the Update process, Claritas produces 1990 census tables with sample census totals adjusted for conformity with those of the complete count.

For those preferring the original 1990 census numbers, they are available in Claritas' full offering of STF 1 and STF 3 tables.

Adjustment Techniques

The adjustment process is essential to the production of estimates which use the most accurate input available at each geographic level, and are consistent across all levels of geography. The Claritas Updates are geographically consistent, meaning that for each data item, block group data always sum to tract totals, which always sum in turn to county, state and national totals. Adjustment techniques also ensure that characteristic distributions sum to base count totals (e.g., households by income always sums to total households). Adjustments are sometimes critical to the estimation process itself--as in the iterative proportional fitting used to produce the updates of income by age of householder. The basic techniques are described below.

Ratio Adjustment

Ratio adjustment is used when estimates for a group of smaller geographic units must sum to the estimate for the larger unit that contains them. It is also the method used to make a characteristic distribution (such as age) sum to within rounding error of the total base count (such as population).

The procedure is:

1) Add up the data for the smaller geographic units (or parts of distribution), and call it "S."

2) Let "L" be the estimate for the larger geographic unit (or total to be added up).

3) Calculate the adjustment factor "R" as L divided by S.

4) Multiply the data for each of the smaller units (or each element of the distribution) by the adjustment factor R.

5) Round adjusted data for each smaller unit to the nearest integer (e.g., 27.8 households rounds to 28 households).

Once adjusted results are rounded to integers (step 5 above), ratio adjustment only guarantees that the sum of the estimates will be close to (within rounding error of) the desired total L. Further processing is needed to make the sum of the estimates equal L. Claritas refers to this process as "sprinkling."
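Steps 1 through 5 can be sketched as a short function. The function name and example figures are illustrative, and rounding halves upward is an assumption about what "nearest integer" means at exactly .5.

```python
import math

def ratio_adjust(values, target_total):
    """Scale values so they sum to approximately target_total, then round."""
    s = sum(values)                      # step 1: S, the sum of the smaller units
    r = target_total / s                 # steps 2-3: R = L / S
    adjusted = [v * r for v in values]   # step 4: multiply each unit by R
    # Step 5: round to the nearest integer (halves rounded up, by assumption).
    return [math.floor(a + 0.5) for a in adjusted]

# Hypothetical distribution summing to 100 that must conform to a total of 101.
rounded = ratio_adjust([40, 35, 25], 101)
print(rounded, sum(rounded))
```

The adjusted values are 40.4, 35.35 and 25.25, which round to 40, 35 and 25. The rounded sum is 100 rather than 101, which is the kind of residual that must still be placed.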

Sprinkling

Sprinkling is the placement of rounding error in a distribution such that the probability that the "error" is placed in a particular element of the distribution is proportional to the size of the element. The term "error" refers to the difference between the sum of the adjusted distribution and the desired sum.

For example, if the distribution were initially 100, 200, and 300, and the sum needed to be 601, then the extra 1 (the "rounding error") would need to be placed in one of the three categories of the distribution. Using sprinkling, the initial estimate of 300 would have a 50 percent chance of being changed to 301, 200 would have a 33.3 percent chance of becoming 201, and 100 would have a 16.7 percent chance of being changed to 101.

Note that sprinkling has the effect of spreading the extra (or deficit) count in the most gentle way possible. By using actual percent distributions to determine the probabilities for assigning rounding error, the impact on percent distributions is minimized.
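A minimal sketch of sprinkling follows. It places the error one unit at a time and recomputes the selection weights after each placement, which is an assumption; the text specifies only that the probability is proportional to the size of the element. The function name and the seeded random generator are illustrative.

```python
import random

def sprinkle(values, target_total, rng=random):
    """Place rounding error so that the values sum exactly to target_total."""
    values = list(values)
    error = target_total - sum(values)
    step = 1 if error > 0 else -1
    for _ in range(abs(error)):
        # Choose one element with probability proportional to its current size.
        # Elements of zero are never chosen, so no value can go negative.
        index = rng.choices(range(len(values)), weights=values, k=1)[0]
        values[index] += step
    return values

# The example from the text: 100, 200 and 300 must sum to 601.
result = sprinkle([100, 200, 300], 601, random.Random(0))
print(result, sum(result))
```

In this call the element 300 has a 50 percent chance of receiving the extra unit, 200 a 33.3 percent chance, and 100 a 16.7 percent chance, matching the example above. A distribution consisting entirely of zeros cannot be sprinkled this way and would need separate handling.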

Iterative Proportional Fitting

Iterative Proportional Fitting (IPF) methods are an elaborate form of ratio-adjustment, and are used when estimates must be adjusted to conform simultaneously with two sets of "marginal" control totals--often referred to as the two dimensions of a two-dimensional table. Income by age of householder is a good example. The estimates must sum to both households by income and householders by age. In another example, block group level race estimates must sum to both total population for the block group, and population within each race category at the census tract level.

IPF methods begin with a two-dimensional table with target row and column totals, referred to as the row and column marginal totals. For example, one might have 12 categories of households by income as the row totals and 11 categories of householders by age as the column totals established for a 132 cell (12 x 11) table. The objective is to produce estimates for the table's 132 cells that sum to both the row and column marginal totals.

The execution of IPF methods requires an initial set of cell values, often called the "seed" values. In the case of income by age of householder, the seed values are obtained from the 1990 census. The arrangement of households in the income by age table reflects an intricate set of probabilities defining the relationship between household income and age of householder--as measured for the specific geography in the census. However, as 1990 census figures, these values sum to neither estimated households by income nor estimated householders by age.

Iterative proportional fitting achieves this conformity through a series of ratio adjustments to the row and column marginal totals. Each round (or iteration) of row and column adjustments brings the seed values closer to conformity with the marginal totals. The number of iterations required varies by area, but the values eventually "converge" on a result that sums, within rounding error, to the marginal totals. Two-dimensional "sprinkling" is then used to eliminate rounding error. The resulting estimates not only sum to the desired marginal totals, but preserve the statistical relationship between the two variables (income and age) measured for the area by the census.
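The alternating row and column adjustments can be sketched as follows. This is a generic two-dimensional IPF on a hypothetical 2 x 2 table, not the Claritas production routine; the tolerance, iteration limit, and all figures are assumptions, and the final two-dimensional sprinkling step is omitted.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Fit seed cell values to row and column marginal totals."""
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Ratio-adjust each row to its target total.
        for i, row in enumerate(table):
            row_sum = sum(row)
            if row_sum > 0:
                factor = row_targets[i] / row_sum
                table[i] = [cell * factor for cell in row]
        # Ratio-adjust each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                factor = target / col_sum
                for row in table:
                    row[j] *= factor
        # Columns now match exactly; stop once the rows also match.
        if all(abs(sum(row) - t) <= tol for row, t in zip(table, row_targets)):
            break
    return table

# Hypothetical seed (rows = income, columns = age) and estimated marginal totals.
seed = [[40.0, 10.0], [20.0, 30.0]]
fitted = ipf(seed, row_targets=[60.0, 40.0], col_targets=[55.0, 45.0])
print(fitted)
```

One way to see that the relationship between the two variables is preserved: the cross-product ratio of the seed, (40 x 30) / (10 x 20) = 6, is unchanged in the fitted table, because each adjustment multiplies an entire row or column by a single factor.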

This procedure is often adapted to simultaneously adjust an initial set of estimates to a fixed set of row and column "marginal" totals.

Income Distributions

A source of occasional confusion is the fact that the 1990 census reported income earned during calendar year 1989. This is the case whether the data are described as "1989 income" or "1990 census income." The one year census lag is logical, since no one had yet received their 1990 income in April 1990 when the census was taken. The Claritas Updates are not constrained by this reporting limitation, and therefore present income for the calendar year corresponding to the household estimate or projection. For example, the 1996 Update includes estimates of 1996 households by income earned in 1996. When comparing such estimates against the census, note that total households represent a six year change since 1990, while income represents a seven year change since 1989.

Extended Income and Pareto Interpolation

Income tabulations from the 1990 census top out at the "$150,000 or more" category. This reporting limit made sense for standard census products since in 1989, only 1.6 percent of all households had incomes in excess of $150,000. However, this reporting limit is not appropriate for affluent neighborhoods (even in 1989), and will become less appropriate nationwide as incomes increase through the decade. Claritas has therefore "extended" the 1990 census income distributions to include categories of: $150,000 to $249,999, $250,000 to $499,999, and $500,000 and over.

Vilfredo Pareto (1848-1923), better known for the "80/20 rule," is also credited with a method used to approximate the upper end of an income distribution. Under Pareto's distribution, the proportion of households with incomes above a given level declines as a power of that level.

The Pareto distribution is typically used to extend income ranges for very large areas, such as whole countries, where income distributions are regular and smooth. The application of Pareto methods for small areas, where distributions can have irregular shapes, requires some care. For this reason, extended income categories are produced and sequentially controlled starting with the national level, followed by states, counties, tracts and block groups. At each level, 1990 census tabulations specifying the aggregate income of households with incomes exceeding $150,000 were used to check and refine the Pareto results.
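A textbook form of the Pareto extension is sketched below, assuming the tail follows P(X > x) = (k / x)^alpha above a base income k. The household counts are hypothetical, the function names are illustrative, and this is not presented as the exact Claritas procedure, which also applies sequential geographic controls.

```python
import math

def pareto_alpha(n_above_lower, n_above_upper, lower, upper):
    """Estimate the Pareto exponent from household counts above two income levels."""
    return math.log(n_above_lower / n_above_upper) / math.log(upper / lower)

def extend_top_category(n_top, alpha, base, cuts):
    """Split n_top households with income of base or more at the given cut points."""
    above = [n_top * (base / cut) ** alpha for cut in cuts]   # households above each cut
    bounds = [n_top] + above
    categories = [bounds[i] - bounds[i + 1] for i in range(len(cuts))]
    categories.append(above[-1])                              # open-ended top category
    return categories

# Hypothetical area: 400 households at $100,000 or more, 160 at $150,000 or more.
alpha = pareto_alpha(400, 160, 100_000, 150_000)
extended = extend_top_category(160, alpha, 150_000, [250_000, 500_000])

# Mean income implied for the $150,000 or more group (valid only when alpha > 1).
# A figure like this could be compared with the census aggregate income for the group.
implied_mean = 150_000 * alpha / (alpha - 1)
print(round(alpha, 3), [round(c, 1) for c in extended], round(implied_mean))
```

The three extended categories ($150,000 to $249,999, $250,000 to $499,999, and $500,000 and over) always sum to the original count in the open-ended census category.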

Claritas applies the Pareto extension to the 1990 census income data only. Estimated and projected income for the extended categories is produced with standard methods applied to the extended 1990 census base.

Inflation and Income

A common question is how the effect of inflation is accounted for in the Claritas income estimates.

Inflation, as commonly measured by the Consumer Price Index, reflects changing prices, and a corresponding change in the value of a dollar. For example, items that would have cost $100 in 1983, would have cost about $147 by 1993--a 47 percent inflation in prices. Thus $100 was not the same in 1993 as it was in 1983.

Inflation is not a measure of income change, but the two are related. Some income sources (such as Social Security and some union contracts) are "indexed" by inflation, and workers typically require and demand more pay to cover the increased costs of living. Although income tends to follow inflation, it does not move at the same rate. There are periods when income growth outpaces inflation, and periods when it lags behind. These income changes relative to inflation are referred to as "real" income growth.

The Claritas income estimates and projections are expressed in current dollar values--which reflect how many dollars are being received at the relevant year. As such, they reflect both "real" income growth (or decline) and the change due to the effect of inflation. Rather than estimating the effects separately, Claritas measures the combined or net effect through input sources (such as the Bureau of Economic Analysis income estimates) which themselves estimate income change in current dollars. The inflation effect measured in these estimates is implicitly incorporated into the Claritas estimates. Note that accounting for inflation in this manner is different from controlling for inflation--which requires removing the effect of inflation, to produce estimates in constant dollar values.
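The difference between current and constant dollars can be shown with a short calculation. The price change is the 1983 to 1993 example given earlier in this section; the two income figures are hypothetical.

```python
# Price change from the example above: $100 in 1983 cost about $147 in 1993.
price_ratio = 147 / 100

# Hypothetical average household income in current dollars.
income_1983 = 30_000
income_1993 = 48_000

# Growth in current dollars combines real growth and inflation.
current_dollar_growth = income_1993 / income_1983 - 1

# Controlling for inflation: restate 1993 income in constant 1983 dollars.
income_1993_in_1983_dollars = income_1993 / price_ratio
real_growth = income_1993_in_1983_dollars / income_1983 - 1

print(f"{current_dollar_growth:.1%} in current dollars, {real_growth:.1%} real")
```

In this example, income grows 60 percent in current dollars but only about 8.8 percent in real terms. The Claritas estimates correspond to the first figure.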

Acquiring the Claritas Demographic Estimates and Projections

The Claritas annual demographic estimates and projections can be obtained in a variety of forms.

Hard-copy reports ordered via an 800 number.

Data licensed in our desktop marketing systems, such as Compass, or via the company's online data access and analysis system, Claritas Connect.

Diskette, magnetic tape, or CD-ROM.

Annually updated demographics are also used to update Claritas' segmentation systems such as the PRIZM lifestyle cluster system, and the P$YCLE financial consumer segmentation system.

For information about obtaining these data, call 1-800-284-4868 or the nearest Claritas office.