Release Notes: Version 0.1.0
This first release of the ALS TDI ARC Study mapped to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) restructures a subset of the ARC Natural History Study into OMOP CDM tables and maps a subset of the data to standardized vocabularies.
This is part of a larger harmonization effort with Answer ALS and the Critical Path Institute.
Note about EHR data: We are actively working to include electronic health record (EHR) data in future releases. Tools in use for EHR integration:
- CDAtransformer: Parse C-CDA and FHIR files into structured tables.
- RWDExchange: Evaluate exchangeability of real-world data (EHR, registries) for external comparator trials.
Complete Data Set and Documentation
CDM Version
- OMOP CDM v5.4 (Documentation)
Participant Summary
- Total participants: 1,665
- People with ALS
- Asymptomatic carriers
- Healthy controls
Participant type is available in the Person table (`participant_source`).
Note: not all participants answered all surveys.
This version includes:
- Self-reported surveys
- ALSFRS-R data
- Laboratory results from blood samples
Citation
ALS Therapy Development Institute (ALS TDI). (2023). ALS Research Collaborative (ARC) [Data set]. ALS Therapy Development Institute. https://doi.org/10.71944/C3NA-9124
Domain Mappings
Person
(Person Domain)
- Year of birth
- Sex
- Race
- Ethnicity
- IDs sequentialized; prefixes `CASE_`, `CONTROL_`, `ASYMP_` retained
- Unknown/multiple race/ethnicity/sex → `concept_id = 0`
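The two Person rules above can be illustrated with a minimal Python sketch. The concept IDs 8507 (male) and 8532 (female) are the standard OMOP gender concepts; everything else here is an assumption made for illustration, including the function names, the raw ID format, and the choice to number IDs separately within each prefix. The release notes do not describe how sequentialization was actually performed.

```python
# Standard OMOP gender concept IDs; 0 means "no matching concept".
SEX_CONCEPTS = {"male": 8507, "female": 8532}
ID_PREFIXES = ("CASE_", "CONTROL_", "ASYMP_")


def sex_to_concept_id(source_value):
    """Map a raw sex value to a concept ID, falling back to 0 when unknown."""
    if source_value is None:
        return 0
    return SEX_CONCEPTS.get(source_value.strip().lower(), 0)


def sequentialize(source_ids):
    """Replace each ID's suffix with a running number, keeping its prefix.

    Assumption: numbering restarts for each prefix. A single global
    counter would be an equally plausible reading of the release notes.
    """
    counters = {prefix: 0 for prefix in ID_PREFIXES}
    mapping = {}
    for source_id in source_ids:
        prefix = next((p for p in ID_PREFIXES if source_id.startswith(p)), None)
        if prefix is None:
            raise ValueError(f"unrecognized ID prefix: {source_id!r}")
        if source_id not in mapping:
            counters[prefix] += 1
            mapping[source_id] = f"{prefix}{counters[prefix]}"
    return mapping
```

For example, `sex_to_concept_id("multiple")` falls through to `0`, which matches the rule for unknown or multiple values.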
Observation
(Observation Domain)
- Validated, self-reported ALSFRS-R (mapped using custom concepts; more details in the custom concepts section of this page)
- Speech
- Salivation
- Swallowing
- Handwriting
- Cutting Food
- Dressing and Hygiene
- Turning in Bed
- Walking
- Climbing Stairs
- Dyspnea
- Orthopnea
- Respiratory Insufficiency
- Total Score
- Self-reported ALS diagnosis (mapped using custom concepts; more details in the custom concepts section of this page)
- El Escorial Criteria (revised) categories
- Definite
- Possible
- Probable - Lab Supported
- Suspected
- Medical history
- Family medical history
- Personal medical history
- History of head injury
- ALS symptom onset
- Anatomical site of symptom onset
- Lifestyle: tobacco use
- Occupation/industry
- Military service
Measurement
(Measurement Domain)
- Laboratory measurements: lab-provided measurements from blood draws, including
- A/G Ratio
- Albumin (g/dL)
- Alkaline Phosphatase (U/L)
- Basophils (%)
- Basophils Abs (10^3/mm3)
- Bilirubin Total (mg/dL)
- BUN (mg/dL)
- BUN/Creatinine Ratio
- Calcium (mg/dL)
- Chloride (mmol/L)
- CO2 (mmol/L)
- Creatinine (mg/dL)
- eGFR (mL/min/1.73 sq m)
- Eosinophils (%)
- Eosinophils Abs (10^3/mm3)
- Globulin (g/dL)
- Glucose (mg/dL)
- Hematocrit (%)
- Hemoglobin (%)
- Lymphocytes (%)
- Lymphocytes Abs (10^3/mm3)
- MCH (pg)
- Monocytes (%)
- Monocytes Abs (10^3/mm3)
- Neutrophils (%)
- Neutrophils Abs (10^3/mm3)
- Platelet Count (10^3/mm3)
- Potassium (mmol/L)
- RDW (%)
- Red Blood Cell Count (10^6/mm3)
- SGOT (AST) (U/L)
- SGPT (ALT) (U/L)
- Sodium (mmol/L)
- Total Protein (g/dL)
- White Blood Cell Count (10^3/mm3)
- Self-reported ALS-linked genetic mutations
- PFN1
- SOD1
- SPG11
- FUS
- TARDBP
- C9ORF72
- VCP
- NEK1
Drug Exposure
(Drug Exposure Domain)
- Self-reported medications and supplements
- Ingredient-level mapping (source terms with frequency ≥20 mapped; others assigned `concept_id = 0`)
- Dosage not calculated; source values retained
- Missing start date → dummy `1900-01-01`
- Missing end date → start date reused
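The missing-date rules above can be expressed as a short Python sketch. The function name and the order of operations are assumptions: the start date is filled first, so a record missing both dates ends up with `1900-01-01` for both. The release notes do not say how that case was handled.

```python
from datetime import date

# Dummy value used when a medication or supplement has no start date.
DUMMY_START = date(1900, 1, 1)


def fill_exposure_dates(start, end):
    """Return (start, end) with the missing-date rules applied."""
    if start is None:
        start = DUMMY_START
    if end is None:
        # Reuse the (possibly dummy) start date as the end date.
        end = start
    return start, end
```

When filtering on `drug_exposure` dates, rows dated `1900-01-01` are therefore best treated as "start unknown" rather than as real exposures from that year.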
Mortality
(Death Domain)
- Date of death (month/day set to `12-31` for privacy)
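As an illustration of the privacy rule for death dates, the sketch below keeps only the year and sets month and day to December 31. The function name is hypothetical, and the sketch assumes the year itself is left unchanged.

```python
from datetime import date


def coarsen_death_date(death_date):
    """Keep the year of death and set month/day to 12-31."""
    return date(death_date.year, 12, 31)
```

A practical consequence is that survival intervals computed from the Death table are accurate only to the year.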
Dates and Timing
- Dates may be shifted for de-identification (Hripcsak et al., JAMIA 2016).
- If missing:
- Survey date
- Dummy `1900-01-01`
- Approximate date applied
- Observation period:
- Start = first survey date
- End = last event or death date
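The observation period rule can be sketched as follows. This is one reading of "last event or death date", not a description of the actual pipeline: it assumes the end is the latest date among all surveys, other events, and the death date when one exists. The function and parameter names are hypothetical.

```python
from datetime import date


def observation_period(survey_dates, event_dates, death_date=None):
    """Return (start, end) for one participant.

    start: earliest survey date.
    end:   latest date among surveys, other events, and death (if known).
    """
    start = min(survey_dates)
    candidates = list(survey_dates) + list(event_dates)
    if death_date is not None:
        candidates.append(death_date)
    return start, max(candidates)
```

Because death dates are set to `12-31`, a period that ends at death under this reading would extend to the end of the year of death.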
Missing Data
- Not collected → excluded unless required by OMOP CDM.
- Not all participants have complete data.
- Controls and asymptomatic carriers often have fewer entries.
Custom Concepts
Some ALS-specific variables lacked standardized OMOP vocabularies; custom/local concepts (concept IDs above 2,000,000,000) were created.
- Anatomical site of symptom onset → `2000000396`
- El Escorial Criteria → `2000000061`
El Escorial Harmonization
- Harmonized with Answer ALS & C-Path.
- Self-reported, validated by ALS TDI staff.
| El Escorial Status | Custom Concept ID |
|---|---|
| Suspected | 2000000062 |
| Possible | 2000000058 |
| Probable Laboratory Supported | 2000000060 |
| Probable | 2000000059 |
| Definite | 2000000057 |
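For analysts decoding these values, the table above can be carried into code as a simple lookup. The concept IDs are copied from the table; the helper names are hypothetical, and the threshold check follows the convention that custom concepts have IDs above 2,000,000,000.

```python
# El Escorial status -> custom concept ID, copied from the table above.
EL_ESCORIAL_CONCEPTS = {
    "Suspected": 2000000062,
    "Possible": 2000000058,
    "Probable Laboratory Supported": 2000000060,
    "Probable": 2000000059,
    "Definite": 2000000057,
}

CUSTOM_CONCEPT_FLOOR = 2_000_000_000


def is_custom_concept(concept_id):
    """True when a concept ID falls in the custom/local range."""
    return concept_id > CUSTOM_CONCEPT_FLOOR


def el_escorial_status(concept_id):
    """Return the El Escorial status for a concept ID, or None if unmapped."""
    for status, cid in EL_ESCORIAL_CONCEPTS.items():
        if cid == concept_id:
            return status
    return None
```

Custom concepts will not appear in OHDSI Athena, so a local lookup of this kind is needed to interpret them.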
CDM Summary Counts
| Domain | Person IDs | Records | Primary Concept Field | # Unique Concepts |
|---|---|---|---|---|
| person | 1,665 | 1,665 | N/A | N/A |
| death | 586 | 586 | N/A | N/A |
| observation_period | 1,665 | 1,665 | N/A | N/A |
| visit_occurrence | 1,665 | 32,340 | visit_concept_id | 0 |
| condition_occurrence | 1,410 | 1,410 | condition_concept_id | 1 |
| measurement | 383 | 39,271 | measurement_concept_id | 42 |
| observation | 1,595 | 340,371 | observation_concept_id | 34 |
| drug_exposure | 872 | 4,881 | drug_concept_id | 115 |
(Other OMOP domains not populated in this release)
Guidance for Data Use
- Review `*_source_value` and `*_source_concept_id` columns to trace original survey responses.
- Explore concept definitions with OHDSI Athena.
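One way to locate the source columns in a table is to filter its column names by suffix. The sketch below uses a subset of the OMOP CDM v5.4 `observation` columns as sample input; the function name is hypothetical.

```python
# A subset of the OMOP CDM v5.4 observation table columns, for illustration.
OBSERVATION_COLUMNS = [
    "observation_id",
    "person_id",
    "observation_concept_id",
    "observation_date",
    "value_as_string",
    "observation_source_value",
    "observation_source_concept_id",
    "unit_source_value",
    "qualifier_source_value",
    "value_source_value",
]

SOURCE_SUFFIXES = ("_source_value", "_source_concept_id")


def source_columns(columns):
    """Return the columns that preserve original (source) values."""
    return [c for c in columns if c.endswith(SOURCE_SUFFIXES)]
```

Applied to the sample list, `source_columns(OBSERVATION_COLUMNS)` returns the five columns whose names end in one of the two suffixes.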
Surveys Included
The dataset integrates the following surveys:
- Enrollment
- General Information
- Family History
- Geography
- Lifestyle
- Occupation
- Medical History - Hospitalization
- Medical History - Injuries
- Medical History - Clinical Trials
- Medical History - Conditions
- Your ALS Experience
- Medications
- Supplements
Survey Questionnaires
Summaries of survey forms (not all mapped to OMOP in this release).
Enrollment
- DOB, phone, address, gender, ethnicity, race
- Marital status, education
- Height/weight (current, at age 40)
ALS Diagnostic Status
- Possible, Lab-Supported Probable, Probable, Definite, Asymptomatic Carrier, PLS
Timeline
- First symptom date & site
- First neurology visit
- First possible diagnosis
- Formal diagnosis
Physician Information
- Primary care physician, neurologist
Health & Function
- Devices: tracheostomy, feeding tube, CPAP, DPS
- Comorbidities, family ALS, genetic testing, medications, trial participation, bleeding disorders
- Functional ability: stairs, arm raise, wheelchair use
Emergency Contact
- Name, relation, phone, email
Family History
- Relatives, conditions (ALS, Alzheimer's, MS, autoimmune, etc.)
Geography
- Birthplace, long-term residences, farm/ranch history
Lifestyle
- Smoking history
- Physical activity history
Occupation
- Employment history, industry, job titles
- Military service, deployments
Your ALS Experience
- Diagnosis details, age at diagnosis
- Health events since onset (pneumonia, falls, clots)
- Symptom progression (cramps, twitching, swallowing, speech, bowel/bladder)
Medical History - Hospitalization
- ER visits/hospital stays past 3 months
Medical History - Injuries
- Head/neck injuries by cause
- Age, severity, associated conditions
Medical History - Conditions
- Physician-diagnosed: ALS, Alzheimer's, asthma, Crohn's, fibromyalgia, dystrophy, neuropathy, psoriasis, rheumatoid arthritis, lupus, thyroid disease, TIA, ulcerative colitis, etc.
Clinical Trials
- Trial name, start/end, Gov ID, sponsor, phase, type, treatment, enrollment size
Medications
- Drug name, dosage, start/end dates, frequency
Supplements
- Product name, brand, start/end dates, frequency, serving size
For full OMOP domain details, see the OMOP CDM v5.4 Reference Guide.