1. Overview

The demographics table contains basic descriptive information about the people found in VDW tables. It should also serve as the lookup dataset for MRNs: every MRN that appears in any other VDW file should appear in Demographics, even if nothing substantive is known about the person.
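To make the lookup rule concrete, here is a minimal SAS sketch of a completeness check that lists MRNs present in another VDW file but missing from Demographics. The dataset names demog and other_vdw_file and their rows are hypothetical stand-ins. A real check would point at the site's actual VDW files and repeat for each of them.

```sas
/* Toy stand-ins for a site demographics file and one other VDW file.
   Dataset names and values are hypothetical. */
data demog;
  input mrn $ gender $;
  datalines;
A001 F
A002 M
;
run;

data other_vdw_file;
  input mrn $;
  datalines;
A001
A002
A003
;
run;

/* Every MRN in another VDW file should exist in demographics;
   any row written to orphan_mrns breaks that rule. */
proc sql;
  create table orphan_mrns as
  select distinct o.mrn
  from other_vdw_file as o
  where o.mrn not in (select d.mrn from demog as d);
quit;
```

Here orphan_mrns ends up holding the single MRN A003, which would need a (possibly all-unknown) row in Demographics.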

2. Data Dictionary

Variable name Type(Length) Preferred Format Description Valid Values Comments
MRN char(varies) n/a
Description: An arbitrary identifier unique to an individual within a site.
Valid values: Any, so long as each value uniquely identifies one person.
Comments: Used to link people across files within a site. May or may not contain the site's own official person identifier (e.g., the "medical record number").

Regardless of whether it contains the official local identifier, this variable should never leave the site. Projects that need to move individual-level data should create a study-specific person identifier and substitute it for MRN in any data that is to leave the site (see, e.g., the %DeIDDset() standard macro).
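The substitution idea can be illustrated with a small SAS sketch. It is not %DeIDDset() and assumes nothing about that macro's parameters. It simply builds a site-only crosswalk that assigns each distinct MRN a sequential study ID in random order, then joins it back so the shareable file carries study_id instead of mrn. The dataset names, variables, and seed are hypothetical.

```sas
/* Hypothetical project extract keyed by MRN */
data project_data;
  input mrn $ dx $;
  datalines;
A001 E11
A002 I10
A001 I10
;
run;

/* One row per distinct MRN */
proc sql;
  create table mrn_list as
  select distinct mrn from project_data;
quit;

/* Attach a random sort key (the seed is arbitrary) */
data mrn_list;
  set mrn_list;
  if _n_ = 1 then call streaminit(20240101);
  sort_key = rand('uniform');
run;

proc sort data=mrn_list;
  by sort_key;
run;

/* Crosswalk: sequential study IDs handed out in random order.
   This dataset links back to MRN and must stay at the site. */
data crosswalk;
  set mrn_list;
  study_id = _n_;
  drop sort_key;
run;

/* Shareable file: study_id replaces mrn */
proc sql;
  create table project_deid as
  select c.study_id, p.dx
  from project_data as p
  inner join crosswalk as c
    on p.mrn = c.mrn;
quit;
```

Only the crosswalk ties study_id back to MRN, so it is the piece that must never leave the site.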

birth_date numeric(4) mmddyy10
Description: Date of birth.
Valid values: SAS date.
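Because birth_date is a SAS date (a count of days since 1 January 1960), date arithmetic works on it directly. The sketch below, with hypothetical records and an arbitrary reference date, computes age in completed years with INTCK using the continuous ('c') method, which counts full years elapsed since the birth date.

```sas
/* Hypothetical records: birth_date is read from text into a SAS date
   and stored as a 4-byte numeric displayed with mmddyy10. */
data demog_dates;
  length mrn $8 birth_date 4;
  format birth_date mmddyy10.;
  input mrn $ birth_date :mmddyy10.;
  datalines;
A001 03/15/1950
A002 12/31/1999
;
run;

/* Age in completed years at an arbitrary reference date.
   The c (continuous) method of INTCK counts full years elapsed. */
data ages;
  set demog_dates;
  ref_date = '01JAN2020'd;
  format ref_date mmddyy10.;
  age_at_ref = intck('year', birth_date, ref_date, 'c');
run;
```

With these rows, A001 is 69 at the reference date (the 2020 birthday has not yet occurred) and A002 is 20.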
gender char(1) n/a
Description: Gender or sex of the person.
Valid values:
M = Male
F = Female
U = Unknown
O = Other: transsexual, transgendered, or anything else that does not fit into one of the prior categories
Comments: This variable holds whatever the health plan collects as GENDER (role) or SEX (biology), in that order of preference. Transgendered and transsexual individuals should be coded as "Other" (O). Sites able to provide reliable data on transgendered and transsexual persons should document the availability of this information on their site dataset implementation page on the portal.
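A site's load program typically has to translate local source codes into these four values. The SAS sketch below does that with a character format. The source values (MALE, FEMALE, TRANSGENDER, OTHER) are hypothetical, and sending blanks and unrecognized codes to U is a design choice of this sketch rather than part of the specification.

```sas
/* Hypothetical source-system values; replace with the codes the site
   actually collects (GENDER preferred over SEX) */
proc format;
  value $src2gender
    'MALE'                 = 'M'
    'FEMALE'               = 'F'
    'TRANSGENDER', 'OTHER' = 'O'
    other                  = 'U';   /* blank, UNKNOWN, or unrecognized */
run;

data gender_src;
  input mrn $ src_gender :$12.;
  datalines;
A001 female
A002 MALE
A003 Transgender
A004 .
;
run;

/* Normalize case and blanks before applying the format */
data demog_gender;
  set gender_src;
  length gender $1;
  gender = put(upcase(strip(src_gender)), $src2gender.);
run;
```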

Race1, Race2, Race3, Race4, Race5 char(2) n/a
Description: Race(s) of the person.
Valid values:
HP = Native Hawaiian or Other Pacific Islander
IN = American Indian/Alaska Native
AS = Asian
BA = Black or African American
WH = White
MU = More than one race, particular races unknown or not reported
UN = Unknown or Not Reported
Comments: See Note 1 below.
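A simple data-quality check can confirm that every race variable holds one of the codes listed above. This SAS sketch uses hypothetical sample rows (including a deliberately invalid XX) and writes one row per offending variable.

```sas
/* Hypothetical demographics rows with race1-race5 */
data demog_race;
  input mrn $ (race1-race5) (:$2.);
  datalines;
A001 WH UN UN UN UN
A002 HP WH UN UN UN
A003 XX UN UN UN UN
;
run;

/* Flag any race value outside the valid two-letter codes */
data bad_race;
  set demog_race;
  array r{5} $ race1-race5;
  length bad_var $6;
  do i = 1 to 5;
    if r{i} not in ('HP', 'IN', 'AS', 'BA', 'WH', 'MU', 'UN') then do;
      bad_var = vname(r{i});
      output;
    end;
  end;
  keep mrn bad_var;
run;
```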
hispanic char(1) n/a
Description: Hispanic origin (ethnicity).
Valid values:
Y = Yes
N = No
U = Unknown
Comments: Nulls are no longer allowed; code unknown ethnicity as U.
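Since nulls are not allowed, a load step should send every value it cannot classify to U. In this SAS sketch the source codes 1, 2, and 9 are hypothetical stand-ins for whatever a site's source system uses.

```sas
/* Hypothetical source ethnicity codes, including a missing value */
data eth_src;
  input mrn $ src_hisp $;
  datalines;
A001 1
A002 2
A003 9
A004 .
;
run;

data demog_eth;
  set eth_src;
  length hispanic $1;
  select (src_hisp);
    when ('1') hispanic = 'Y';
    when ('2') hispanic = 'N';
    otherwise  hispanic = 'U';  /* missing or anything else: never null */
  end;
run;
```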
primary_language char(3) n/a
Description: Primary language spoken at last contact.
Valid values: As defined in the ISO 639-2 specification, or 'unk' for unknown. Partial list:
eng = English
spa = Spanish
Comments:
  • This value can change over time; only the most recent information is of interest.
  • People whose languages are not known should have the value 'und' (undetermined) or 'unk' (unknown). The former is part of the official ISO 639-2 specification; the latter is our own devising. The two codes should be interpreted as equivalent.
  • People whose languages are listed as 'other' in source data should be mapped to 'und'.
  • Clarity sites may find this information in patient.language_c.
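The mapping rules above can be sketched as a SAS character format. The source strings are hypothetical, and this tiny map covers only two languages. A real map would need an entry for every language the source can contain, or recognized languages would wrongly fall through to unk. The last step treats und and unk as equivalent, as the comments above require.

```sas
/* Hypothetical source strings; a real map needs every language the source can hold */
proc format;
  value $src2lang
    'ENGLISH' = 'eng'
    'SPANISH' = 'spa'
    'OTHER'   = 'und'   /* "other" in source data maps to und */
    other     = 'unk';  /* blank or unrecognized: unknown */
run;

data lang_src;
  input mrn $ src_lang :$12.;
  datalines;
A001 English
A002 spanish
A003 Other
A004 .
;
run;

data demog_lang;
  set lang_src;
  length primary_language $3;
  primary_language = put(upcase(strip(src_lang)), $src2lang.);
  /* und and unk are interpreted as equivalent: both mean not known */
  language_known = (primary_language not in ('und', 'unk'));
run;
```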
needs_interpreter char(1) n/a
Description: Whether the person needs an interpreter to communicate with an English-only speaker.
Valid values:
Y = Yes
N = No
U = Unknown
Comments:
  • This value can change over time; only the most recent information is of interest.
  • Clarity sites may find this information in pat_enc.interpreter_need_yn.
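Keeping only the most recent information usually means picking each person's latest encounter. The SAS sketch below does that with BY-group processing on hypothetical encounter data. It takes the latest record literally, so a missing latest value becomes U; a site might instead prefer the most recent non-missing value, which this sketch does not attempt.

```sas
/* Hypothetical encounter-level interpreter flags, several per person */
data enc_interp;
  input mrn $ enc_date :date9. interp_yn $;
  format enc_date date9.;
  datalines;
A001 05MAR2018 N
A001 12JUN2019 Y
A002 20JAN2017 Y
A002 02FEB2017 N
A003 10OCT2020 .
;
run;

proc sort data=enc_interp;
  by mrn enc_date;
run;

/* Keep the latest encounter per person and recode to Y, N, or U */
data demog_interp;
  set enc_interp;
  by mrn;
  length needs_interpreter $1;
  if last.mrn;
  if interp_yn in ('Y', 'N') then needs_interpreter = interp_yn;
  else needs_interpreter = 'U';
  keep mrn needs_interpreter;
run;
```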

3. Notes

Note 1: Race

Our goal is to have the most complete, reliable, and detailed race and ethnicity information possible in the demographics file. SDMs should gather race information from all sources permitted at their site, for example tumor registry data, state birth and death data, and regular large-scale social surveys. If there is more than one source of race information for a set of individuals, SDMs should give preference to self-reported sources.

Where multiple sources (or multiple measures from a single source) conflict as to the race of a given person, and the SDM has no reason to prefer one source to another (e.g., no single source is known to be the most trustworthy), code all races indicated by any source.
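The conflict rule amounts to taking the union of races across sources. This SAS sketch assumes two hypothetical long-format sources (race_ehr and race_registry) with no preferred source. It also assumes that UN should be dropped for anyone who has at least one other code, which seems consistent with the goal of complete information but is not stated in this note.

```sas
/* Hypothetical long-format race data from two sources at one site */
data race_ehr;
  input mrn $ race $;
  datalines;
A001 WH
A002 BA
A003 UN
;
run;

data race_registry;
  input mrn $ race $;
  datalines;
A001 WH
A002 IN
A003 AS
;
run;

proc sql;
  /* Union of every race any source reports (no source preferred) */
  create table race_all as
  select distinct mrn, race from race_ehr
  union
  select distinct mrn, race from race_registry;

  /* Assumption: drop UN for anyone who has at least one other code */
  create table race_combined as
  select a.mrn, a.race
  from race_all as a
  where a.race ne 'UN'
     or not exists (select 1 from race_all as b
                    where b.mrn = a.mrn and b.race ne 'UN');
quit;
```

Here A002 keeps both BA and IN, and A003 keeps only AS.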

For data sources that treat Hispanic ethnicity as a value of race (so that, e.g., you know the person is Hispanic but not their race), code the person's race as "Unknown" (UN).
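For a source of that kind, the recode might look like the following SAS sketch; the source values WHITE, BLACK, and HISPANIC are hypothetical.

```sas
/* Hypothetical source that lists Hispanic as a race category */
data src_race;
  input mrn $ src_race :$10.;
  datalines;
A001 WHITE
A002 HISPANIC
A003 BLACK
;
run;

data race_from_src;
  set src_race;
  length race $2;
  select (upcase(src_race));
    when ('WHITE')    race = 'WH';
    when ('BLACK')    race = 'BA';
    when ('HISPANIC') race = 'UN';  /* Hispanic known, race not known */
    otherwise         race = 'UN';
  end;
run;
```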

Format

Race is now coded in accordance with the NIH Policy And Guidelines On The Inclusion Of Women And Minorities As Subjects In Clinical Research.

 

Example Race Coding                          Race1  Race2  Race3  Race4  Race5
White only                                   WH     UN     UN     UN     UN
White and Pacific-Islander                   HP     WH     UN     UN     UN
No Race Known                                UN     UN     UN     UN     UN
African-American and Native-American         IN     BA     UN     UN     UN
Multi-racial, particular races not reported  MU     UN     UN     UN     UN

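The examples appear to list codes in the same order as the valid-values list (HP, IN, AS, BA, WH, MU) and pad unused slots with UN. Assuming that ordering is intended, the following SAS sketch turns hypothetical long-format race data (one row per person per race) into Race1 through Race5. Because there are only five specific race codes, five slots are enough for any combination of them.

```sas
/* Hypothetical long-format input: one row per person per race */
data race_long;
  input mrn $ race :$2.;
  datalines;
A001 WH
A002 WH
A002 HP
A003 BA
A003 IN
A004 UN
;
run;

/* Assumption: order codes as in the valid-values list */
data race_long_ordered;
  set race_long;
  race_order = whichc(race, 'HP', 'IN', 'AS', 'BA', 'WH', 'MU', 'UN');
run;

proc sort data=race_long_ordered;
  by mrn race_order;
run;

/* Fill race1-race5 in order, leaving unused slots as UN */
data race_wide;
  set race_long_ordered;
  by mrn;
  array r{5} $2 race1-race5;
  retain race1-race5 n_race;
  if first.mrn then do;
    do i = 1 to 5;
      r{i} = 'UN';
    end;
    n_race = 0;
  end;
  if race ne 'UN' and n_race < 5 then do;
    n_race + 1;
    r{n_race} = race;
  end;
  if last.mrn then output;
  keep mrn race1-race5;
run;
```

The four output rows reproduce the first four example codings in the table.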
Recoding From The Prior Scheme

The format below can be used to translate values from the prior race coding scheme into the new scheme.

      proc format;
         /* Translate prior-scheme race codes to the current two-letter codes */
         value $Race
            '01' = 'WH'
            '02' = 'BA'
            '03' = 'IN'
            '04', '05', '06', '08', '09',
            '10', '11', '12', '13', '14',
            '96' = 'AS'
            '07', '20', '21', '22', '25',
            '26', '27', '28', '30', '31',
            '32', '97' = 'HP'
            Other = 'UN'
         ;
      run;
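Applying the format to legacy data takes one PUT call per variable. The sketch below repeats the format so it runs on its own and recodes hypothetical legacy Race1 through Race5 values in place.

```sas
/* Same $Race format as above, repeated so this block runs on its own */
proc format;
  value $Race
    '01' = 'WH'
    '02' = 'BA'
    '03' = 'IN'
    '04', '05', '06', '08', '09', '10', '11', '12', '13', '14', '96' = 'AS'
    '07', '20', '21', '22', '25', '26', '27', '28', '30', '31', '32', '97' = 'HP'
    other = 'UN';
run;

/* Hypothetical legacy rows coded in the prior scheme */
data legacy;
  input mrn $ (race1-race5) (:$2.);
  datalines;
A001 01 99 99 99 99
A002 07 01 99 99 99
A003 04 05 99 99 99
;
run;

/* Recode each race variable in place */
data recoded;
  set legacy;
  array r{5} $ race1-race5;
  do i = 1 to 5;
    r{i} = put(r{i}, $Race.);
  end;
  drop i;
run;
```

Note the third row: two different prior codes ('04' and '05') both become AS, so recoded data can contain duplicate codes, and the order may not match the examples in the table above. A de-duplication and re-ordering pass may be needed afterward.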
    

4. Future Plans

We plan to revisit the coding of sex (biology), gender (role), race, and ethnicity as our health plans become more sophisticated in collecting this information from their members.

5. Contact Information

Please contact the VDW Enrollment and Demographics working group for questions and suggestions. Working group contact information can be found on the VDW Operations section of the CRN Portal.
