
New to RWE? Start Here

These curated, freely accessible resources provide foundational knowledge, practical examples, and step-by-step instructions for navigating real-world evidence (RWE):

Book

Guide to Real-World Data for Clinical Research
By Danielle Boyce (ALS TDI) and Pavel Goriacko (Montefiore)
Visit rwd.guide

Online Course

Introduction to OMOP: Your Frequently Asked Questions Answered
Taught by Danielle Boyce (ALS TDI) and Pavel Goriacko (Montefiore)
Enroll in the course

Train-the-Trainer (OHDSI/OMOP)

👉 Open the Train-the-Trainer materials

A curated, hands-on curriculum covering OMOP/CDM fundamentals, vocabularies, Atlas, and exercises.

Curated Resource Overview

OHDSI, OMOP, and FHIR for Neurodegenerative Disease Researchers
Created by Danielle Boyce (ALS TDI)
Access the resource

ALS Geospatial Hub

The ALS Geospatial Hub brings together authoritative data from federal agencies, research institutions, and non-profit organizations, organizing it by geography.
Access the resource

📚 Additional Resources to Support the STARDUSTT Framework

🔧 Git & GitHub

🗄️ Relational Databases

📘 Data Science Handbook

🧰 Data Management Tools & Guidance

💻 Programming & Data Science Resources

Core Platforms:
- Project Jupyter: interactive computing.
- What is the Jupyter Notebook? A beginner-friendly guide.
- NIAID NIH Informatics Resources.

Software Carpentry (free, hands-on lessons):
- Python
- R
- Databases & SQL

Additional Learning:
- DataCamp: interactive coding lessons.
- Khan Academy: SQL Basics.
- Codecademy: Learn Python.
- Python Data Science Handbook by Jake VanderPlas.
- R for Data Science by Hadley Wickham & Garrett Grolemund.
- NIH "All of Us" Jupyter & Programming Docs.

OMOP/OHDSI

This page contains practical resources for working with Observational Health Data Science and Informatics (OHDSI) community resources and Observational Medical Outcomes Partnership (OMOP) data, including an interactive OMOP data dictionary, code snippets for common analytic tasks using a variety of software, and examples of observational research projects suited to OMOP/OHDSI frameworks. For a good overview of OHDSI/OMOP, please see the resources in the chapter, "New to RWE? Start Here."

OMOP Scavenger Hunt

In addition to the essential resources listed on the "New to RWE? Start Here" page, jump-start your expertise by exploring these resources early in your OHDSI journey.


Join the Community

☐ Introduce yourself on the OHDSI Forums in the "Welcome to OHDSI" thread

☐ Follow OHDSI on LinkedIn

☐ Subscribe to the OHDSI Newsletter

☐ Learn about OHDSI Workgroups

☐ Attend an OHDSI Community Call

☐ Review past & upcoming OHDSI events

☐ Join the OHDSI Microsoft Teams environment

Reading and Reference

☐ Bookmark the Book of OHDSI

☐ Bookmark NIH All of Us OMOP documentation

OMOP Training & Tutorials

☐ Enroll in the EHDEN Academy

☐ Watch OHDSI tutorials & workshops

The OMOP CDM

☐ Bookmark the OMOP Common Data Model site

☐ Read the OMOP CDM FAQ

Standardized Vocabularies

☐ Search vocabularies in Athena: look up individual concepts of interest to you

Data

☐ Download the MIMIC-IV demo OMOP dataset

Software & Tools

☐ Review OHDSI software tools

☐ Explore the Atlas Demo


📊 OMOP CDM Basic Data Dictionary

For a sample interactive OMOP data dictionary detailing the fields in the OMOP CDM, please click on the thumbnail below. For the specific ARC study data dictionary, visit the Neuromine Data Portal.

OMOP Data Dictionary Thumbnail

📈 Projects Best Suited for Observational Research and OHDSI Network Studies


🧪 Analytic Use Cases and Examples

| Analytic Use Case | Type | Structure | Example |
| --- | --- | --- | --- |
| Clinical Characterization | Disease Natural History | Amongst patients who are diagnosed with <insert your disease of interest>, what are the patient's characteristics from their medical history? | Amongst patients with rheumatoid arthritis, what are their demographics (age, gender), prior conditions, medications, and health service utilization behaviors? |
| | Treatment Utilization | Amongst patients who have <insert your disease of interest>, which treatments were patients exposed to amongst <list of treatments for disease> and in which sequence? | Amongst patients with depression, which treatments were patients exposed to SSRI, SNRI, TCA, bupropion, esketamine and in which sequence? |
| | Outcome Incidence | Amongst patients who are new users of <insert your drug of interest>, how many patients experienced <insert your known adverse event of interest from the drug profile> within <time horizon following exposure start>? | Amongst patients who are new users of methylphenidate, how many patients experienced psychosis within 1 year of initiating treatment? |
| Population-level Effect Estimation | Safety Surveillance | Does exposure to <insert your drug of interest> increase the risk of experiencing <insert an adverse event> within <time horizon following exposure start>? | Does exposure to ACE inhibitor increase the risk of experiencing angioedema within 1 month after exposure start? |
| | Comparative Effectiveness | Does exposure to <insert your drug of interest> have a different risk of experiencing <insert any outcome (safety or benefit)> within <time horizon following exposure start>, relative to <insert your comparator treatment>? | Does exposure to ACE inhibitor have a different risk of experiencing acute myocardial infarction while on treatment, relative to thiazide diuretic? |
| Patient-level Prediction | Disease Onset and Progression | For a given patient who is diagnosed with <insert your disease of interest>, what is the probability that they will go on to have <another disease or related complication> within <time horizon from diagnosis>? | For a given patient who is newly diagnosed with atrial fibrillation, what is the probability that they will go on to have ischemic stroke in the next 3 years? |
| | Treatment Response | For a given patient who is a new user of <insert your chronically-used drug of interest>, what is the probability that they will <insert desired effect> in <time window>? | For a given patient with T2DM who starts on metformin, what is the probability that they will maintain HbA1C <6.5% after 3 years? |
| | Treatment Safety | For a given patient who is a new user of <insert your drug of interest>, what is the probability that they will experience <insert adverse event> within <time horizon following exposure>? | For a given patient who is a new user of warfarin, what is the probability that they will have GI bleed in 1 year? |

Source: OHDSI. (2023). Save Our Sisyphus Challenge Slides (PDF)
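
To make the "Outcome Incidence" question pattern concrete, here is a minimal sketch in Python using the standard-library sqlite3 module and a tiny in-memory dataset. The table and column names follow the OMOP CDM, but the concept IDs (1001, 2001), the patients, and the helper name `outcome_incidence` are made up for illustration. Treating a person's first recorded exposure as "new use" is a simplification: a real study would usually also require a period of prior observation.

```python
import sqlite3

# Hypothetical concept IDs, used only for this toy example.
DRUG_CONCEPT_ID = 1001      # stands in for the drug of interest
OUTCOME_CONCEPT_ID = 2001   # stands in for the adverse event of interest

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drug_exposure (
    person_id INTEGER,
    drug_concept_id INTEGER,
    drug_exposure_start_date TEXT
);
CREATE TABLE condition_occurrence (
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
-- Toy rows: persons 1-3 take the drug of interest, person 4 takes another drug.
INSERT INTO drug_exposure VALUES
    (1, 1001, '2020-01-10'),
    (1, 1001, '2020-03-10'),
    (2, 1001, '2020-02-01'),
    (3, 1001, '2020-05-15'),
    (4, 9999, '2020-01-01');
-- Person 1 has the outcome within a year, person 2 has it much later,
-- person 3 has an unrelated condition.
INSERT INTO condition_occurrence VALUES
    (1, 2001, '2020-06-01'),
    (2, 2001, '2022-01-01'),
    (3, 5555, '2020-06-01');
""")

def outcome_incidence(conn, drug_id, outcome_id, horizon_days=365):
    """Return (exposed persons, exposed persons with the outcome in the horizon)."""
    sql = """
    WITH first_exposure AS (
        -- Index date = earliest recorded exposure to the drug per person.
        SELECT person_id, MIN(drug_exposure_start_date) AS index_date
        FROM drug_exposure
        WHERE drug_concept_id = ?
        GROUP BY person_id
    )
    SELECT COUNT(DISTINCT fe.person_id),
           COUNT(DISTINCT co.person_id)
    FROM first_exposure fe
    LEFT JOIN condition_occurrence co
           ON co.person_id = fe.person_id
          AND co.condition_concept_id = ?
          AND julianday(co.condition_start_date) - julianday(fe.index_date)
              BETWEEN 0 AND ?
    """
    return conn.execute(sql, (drug_id, outcome_id, horizon_days)).fetchone()

exposed, with_outcome = outcome_incidence(conn, DRUG_CONCEPT_ID, OUTCOME_CONCEPT_ID)
print(exposed, with_outcome)
```

Whether an outcome recorded on the index date itself (day 0) should count is a study design choice; this sketch includes it.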


🧭 Current CDM

CDM54 Image

Source: OHDSI Common Data Model

  • 🔗 Interactive (Select) OMOP Data Dictionary
    https://github.com/DBJHU/DBJHU.github.io/blob/main/SelectOMOPDataDictionaryInteractivev2.html

🗂️ Commonly Used CDM Tables Overview

The OMOP Common Data Model (CDM) is a relational data model made up of tables that are linked to each other by foreign keys (columns named XXXX_ID; e.g., PERSON_ID or PROVIDER_ID). The OMOP tables in your data export are as follows:

| Table | Description |
| --- | --- |
| Person | Contains basic demographic information describing a participant, including biological sex, birth date, race, and ethnicity. |
| Visit_occurrence | Captures encounters with healthcare providers or similar events. Contains the type of visit a person has (outpatient care, inpatient care, or long-term care), as well as the date and duration information. Rows in other tables can reference this table, for example, condition_occurrences related to a specific visit. |
| Condition_occurrence | Indicates the presence of a disease or medical condition stated as a diagnosis, a sign, or symptom, which is either observed by a provider or reported by the patient. |
| Drug_exposure | Captures records about the utilization of a medication. Drug exposures include prescription and over-the-counter medicines, vaccines, and large-molecule biologic therapies. Radiological devices ingested or applied locally do not count as drugs. Drug exposure is inferred from clinical events associated with orders, prescriptions written, pharmacy dispensing, procedural administrations, and other patient-reported information. |
| Measurement | Contains both orders and results of a systematic and standardized examination or testing of a participant or participant's sample, including laboratory tests, vital signs, quantitative findings from pathology reports, etc. |
| Procedure_occurrence | Contains records of activities or processes ordered by or carried out by a healthcare provider on the patient to have a diagnostic or therapeutic purpose. |
| Observation | Captures clinical facts about a person obtained in the context of an examination, questioning, or a procedure. Any data that cannot be represented by another domain, such as social and lifestyle facts, medical history, and family history, are recorded here. |
| Device_exposure | Captures information about a person's exposure to a foreign physical object or instrument which is used for diagnostic or therapeutic purposes. Devices include implantable objects, blood transfusions, medical equipment and supplies, other instruments used in medical procedures, and material used in clinical care. |
| Death | Contains the clinical events surrounding how and when a participant dies. |
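
As an illustration of how these tables link through their XXXX_ID keys, the following sketch builds three heavily trimmed tables in an in-memory SQLite database and joins them. The column names shown exist in the OMOP CDM, but each table is reduced to a few columns, and all rows and concept IDs (111, 222) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed-down versions of three CDM tables (real tables have more columns).
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    year_of_birth INTEGER
);
CREATE TABLE visit_occurrence (
    visit_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    visit_start_date TEXT
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    visit_occurrence_id INTEGER REFERENCES visit_occurrence(visit_occurrence_id),
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
-- Invented rows for illustration only.
INSERT INTO person VALUES (1, 1950), (2, 1985);
INSERT INTO visit_occurrence VALUES
    (10, 1, '2021-03-01'),
    (11, 2, '2021-04-15');
INSERT INTO condition_occurrence VALUES
    (100, 1, 10, 111, '2021-03-01'),
    (101, 2, NULL, 222, '2021-05-01');  -- condition not tied to a visit
""")

# PERSON_ID links each condition to a person; VISIT_OCCURRENCE_ID optionally
# links it to the visit where it was recorded, hence the LEFT JOIN.
rows = conn.execute("""
SELECT p.person_id, p.year_of_birth, v.visit_start_date, c.condition_concept_id
FROM condition_occurrence c
JOIN person p ON p.person_id = c.person_id
LEFT JOIN visit_occurrence v ON v.visit_occurrence_id = c.visit_occurrence_id
ORDER BY c.condition_occurrence_id
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps condition records that have no associated visit, which an inner join on VISIT_OCCURRENCE_ID would silently drop.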

✅ OMOP Data Quality


🔧 ETL Basics

  • PDF: https://www.ohdsi.org/wp-content/uploads/2019/09/OMOP-Common-Data-Model-Extract-Transform-Load.pdf
  • Book: https://ohdsi.github.io/TheBookOfOhdsi/ExtractTransformLoad.html

🛠️ ETL Steps

  1. Dataset profiling and documentation
    1. Create data model documentation, sample data, data dictionaries, code lists, and other relevant information (23-Aug)
    2. Execute database profiling scan (WhiteRabbit) on source database
    3. Prepare mapping approach/documents based on scan reports from database profiling scan
  2. Generation of the ETL design
    1. Mapping workshop with all relevant parties to:
      • Understand the source
      • Define the scope of source data to be transformed
      • Define acceptance criteria for OMOP output
      Output: draft mapping document
    2. Finalize mapping document:
      • Integrate all notes/documentation from workshop
      • Work through mappings and verify, update, fill in gaps
      • Meetings/emails with data contact/technical contact (TC) as needed
  3. Source data integrations and semantic mapping
    1. Source code mapping:
      • Identify which codes are already mapped to standard vocabulary
      • Identify code types for codes that need to be mapped
      • Translation of code description/phrases to English, if/as needed
      • Create proposed code mappings
    2. Generate mappings for data coming out of flowsheets (together with consortium)
    3. Review/approval of code mappings (often by medical experts with the Data Owner)
    4. Identify imaging & waveform data; map using consortium-defined guidelines
    5. Use OHNLP to extract OMOP data from unstructured sources
  4. Technical architecture design
    1. CI/CD strategy & version control
    2. OHDSI ecosystem needs & infrastructure design
  5. Technical ETL development
    1. Implement ETL (preferred language/structure)
    2. Update ETL based on testing/QA/feedback (see steps 8 and 9)
  6. Setting up infrastructure
    1. Deploy core servers and services based on the design from step 4
    2. Install OHDSI tools
      • Database server, Achilles/DQD/Ares, Atlas/WebAPI, RStudio Server, HADES, notebooks & other site-specific tools
  7. ETL testing and validation
    1. Test ETL on sample/dev data, then DO data
    2. Verify & document QA
    3. Submit Achilles/DQD/AresIndexer results regularly
    4. Plan & manage ETL development
  8. Data quality assessment
    1. QA/Acceptance testing for mapping accuracy & completeness
    2. Review & approval by Data Owner
  9. Documentation
    • Mapping Documentation, Themis checks, and technical/transform documentation
  10. Project management throughout
    • Organize tasks, milestones, and follow-up
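
The source code mapping step ("identify which codes are already mapped to standard vocabulary") can be sketched as a lookup against the CONCEPT and CONCEPT_RELATIONSHIP vocabulary tables, following the 'Maps to' relationship from a source concept to a standard concept. The example below is a toy under stated assumptions: the vocabularies ("ToySource", "ToyStandard"), the codes, the concept IDs, and the helper name `triage_source_codes` are all invented, and the tables are trimmed to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed vocabulary tables; real ones carry more columns.
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_code TEXT,
    vocabulary_id TEXT,
    standard_concept TEXT      -- 'S' marks a standard concept
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER,
    concept_id_2 INTEGER,
    relationship_id TEXT
);
-- Made-up rows: two source concepts, one standard concept, one mapping.
INSERT INTO concept VALUES
    (1, 'SRC-A', 'ToySource', NULL),
    (2, 'SRC-B', 'ToySource', NULL),
    (3, 'STD-A', 'ToyStandard', 'S');
INSERT INTO concept_relationship VALUES (1, 3, 'Maps to');
""")

MAPS_TO_SQL = """
SELECT target.concept_id
FROM concept source
JOIN concept_relationship cr
  ON cr.concept_id_1 = source.concept_id
 AND cr.relationship_id = 'Maps to'
JOIN concept target
  ON target.concept_id = cr.concept_id_2
 AND target.standard_concept = 'S'
WHERE source.vocabulary_id = ?
  AND source.concept_code = ?
"""

def triage_source_codes(conn, vocabulary_id, codes):
    """Split source codes into those with a standard mapping and those without."""
    mapped, unmapped = {}, []
    for code in codes:
        row = conn.execute(MAPS_TO_SQL, (vocabulary_id, code)).fetchone()
        if row is not None:
            mapped[code] = row[0]   # standard concept_id to use in the CDM
        else:
            unmapped.append(code)   # needs a proposed mapping and review
    return mapped, unmapped

mapped, unmapped = triage_source_codes(conn, "ToySource", ["SRC-A", "SRC-B", "SRC-C"])
print(mapped)
print(unmapped)
```

Codes in the unmapped list are the ones that would go on to the later sub-steps: proposing mappings and having them reviewed and approved. The sketch keeps only the first match per code, although a source code can map to more than one standard concept.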

🧪 OHDSI Analysis Tools

R, SQL, Python, or any preferred data analysis software.
Reference: The Book of OHDSI, Chapter 9: SQL and R


📘 Data Science Handbook

Open, rigorous and reproducible research: A practitioner's handbook (Stanford Data Science)


🧰 Data Management Tools & Resources

  • DMP Tool: https://dmptool.org/
  • NIH DMS Policy Planning: https://sharing.nih.gov/data-management-and-sharing-policy/planning-and-budgeting-for-data-management-and-sharing/writing-a-data-management-and-sharing-plan#after

💻 Programming Resources (Jupyter, Python, SQL, R)

Software Carpentry (free lessons):
- Programming with Python
- Programming with R
- Databases and SQL

Additional resources:
- DataCamp
- Khan Academy: SQL Basics
- Codecademy: Learn Python 2
- Python Data Science Handbook
- R for Data Science
- NIH "All of Us" documentation: Jupyter & programming


🌐 OHDSI Resources


⭐ Special Topic: Clinical Registries Using OHDSI

OHDSI and Clinical Registries: Sanity for Health Systems (Aug. 22 Community Call)

💻 OMOP Code Snippets

We provide a publicly available set of OMOP code snippets used in the I-LEARN Course to help learners explore and analyze OMOP Common Data Model datasets using tools like R, SQL, and Python.

🔗 Repository: BoyceLab/OMOP-Code-Snippets-for-I-LEARN-Course

🧰 What You'll Find in this Repository

The repository contains example scripts and templates to:

  • Query OMOP data using SQL
  • Analyze OMOP-mapped data using R
  • Connect and run queries via RPostgreSQL
  • Explore how standard concepts relate to source codes

📂 Folder Highlights

  • SQL/: Ready-to-use SQL queries for common OMOP domains (e.g., drug exposure, observation).
  • R/: R scripts that demonstrate how to load, analyze, and visualize OMOP data.
  • concepts/: Examples for working with concept_id and concept_relationship tables.
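
For a sense of what a domain-level query of this kind can look like, here is a small self-contained sketch that counts drug exposure records and distinct persons per drug concept. It is not taken from the repository: it uses Python's built-in sqlite3 module rather than RPostgreSQL so that it runs without a database server, and the rows, concept IDs, and concept names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed tables with invented rows, for illustration only.
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_name TEXT
);
CREATE TABLE drug_exposure (
    person_id INTEGER,
    drug_concept_id INTEGER
);
INSERT INTO concept VALUES (1, 'Toy drug A'), (2, 'Toy drug B');
INSERT INTO drug_exposure VALUES (1, 1), (1, 1), (2, 1), (2, 2);
""")

# Join each exposure to its concept to get a readable name, then summarize.
drug_summary = conn.execute("""
SELECT c.concept_name,
       COUNT(*) AS n_records,
       COUNT(DISTINCT d.person_id) AS n_persons
FROM drug_exposure d
JOIN concept c ON c.concept_id = d.drug_concept_id
GROUP BY c.concept_name
ORDER BY n_records DESC, c.concept_name
""").fetchall()

for name, n_records, n_persons in drug_summary:
    print(name, n_records, n_persons)
```

The same join-to-CONCEPT pattern applies to other domains by swapping in the relevant table and its concept ID column (for example, observation and observation_concept_id).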

📘 Use Cases

These snippets are designed for:
- Learners in the Tufts CTSI I-LEARN course
- Researchers new to OHDSI/OMOP
- Analysts working with OMOP-formatted ALS datasets