Analysis of Diabetes Health Indicators using SQL

5/1/2025 · 5 min read

In this project, I'm analyzing a diabetes health indicators dataset from Kaggle to understand which demographic, lifestyle, and healthcare-access factors are associated with diabetes status. Using SQL aggregation, grouping, and conditional logic, I aim to uncover patterns that could inform public health policy and healthcare resource allocation. The dataset contains over 250,000 survey responses from the CDC's Behavioral Risk Factor Surveillance System, a sample large enough to support meaningful comparisons between groups.

The original dataset can be found here: Kaggle - Diabetes Health Indicators Dataset

You can learn more about the original data source here: CDC Behavioral Risk Factor Surveillance System

The Healthcare Context

Diabetes affects over 37 million Americans, with healthcare costs exceeding $327 billion annually. For healthcare organizations, understanding the demographic and lifestyle patterns associated with diabetes isn't just about patient care—it's about strategic resource planning, preventive care programs, and cost management.

Hospitals and health systems need to identify high-risk populations to:

  • Optimize screening programs by targeting resources where they'll have maximum impact

  • Design prevention interventions based on modifiable risk factors

  • Predict healthcare utilization to ensure adequate staffing and capacity

  • Address health equity by identifying disparities in care access and outcomes

Early identification of prediabetic populations represents a critical intervention opportunity, as lifestyle modifications can delay or prevent progression to type 2 diabetes.

Database Structure

I'm using Oracle SQL to query the DIABETES_HEALTH_INDICATORS table, which contains health and demographic data including diabetes status, BMI, physical activity, age group, sex, healthcare access, and lifestyle factors such as alcohol consumption.
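As a minimal sketch of how the columns used in this walkthrough could be loaded into a table, the example below assumes the table mirrors the Kaggle CSV headers (Diabetes_012, BMI, PhysActivity, HvyAlcoholConsump, AnyHealthcare, NoDocbcCost, Sex, Age). It uses SQLite through Python's built-in sqlite3 module instead of Oracle, and the two sample rows are invented, not real survey responses.

```python
import csv
import io
import sqlite3

# Columns relied on below. Names are ASSUMED to match the Kaggle CSV headers;
# an Oracle table built from the same file may name or type them differently.
COLUMNS = ["Diabetes_012", "BMI", "PhysActivity", "HvyAlcoholConsump",
           "AnyHealthcare", "NoDocbcCost", "Sex", "Age"]

# Two invented rows in CSV form (not real survey responses).
SAMPLE_CSV = """Diabetes_012,BMI,PhysActivity,HvyAlcoholConsump,AnyHealthcare,NoDocbcCost,Sex,Age
0.0,24.0,1.0,0.0,1.0,0.0,0.0,5.0
2.0,33.0,0.0,0.0,1.0,1.0,1.0,10.0
"""

def load_table(csv_text):
    """Create an in-memory table and insert only the named columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE diabetes_health_indicators ("
        + ", ".join(f"{c} REAL" for c in COLUMNS) + ")"
    )
    # DictReader selects fields by header name, so extra CSV columns are ignored.
    reader = csv.DictReader(io.StringIO(csv_text))
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(
        f"INSERT INTO diabetes_health_indicators VALUES ({placeholders})",
        ([float(r[c]) for c in COLUMNS] for r in reader),
    )
    return conn

conn = load_table(SAMPLE_CSV)
print(conn.execute("SELECT COUNT(*) FROM diabetes_health_indicators").fetchone())
```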

Analysis 1: Diabetes Category Breakdown

The Foundation Query

First, let's understand the overall distribution of diabetes status in our population. This baseline analysis will inform all subsequent investigations.
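A minimal sketch of a breakdown query of this kind follows. It runs against an in-memory SQLite table via Python's sqlite3 module rather than Oracle, assumes the status column is named Diabetes_012 as in the Kaggle CSV (0 = no diabetes, 1 = prediabetes, 2 = diabetes), and uses a handful of made-up rows, so its numbers are not results from the dataset.

```python
import sqlite3

# Made-up Diabetes_012 values for illustration only; NOT dataset values.
ROWS = [(0,), (0,), (0,), (0,), (0,), (0,), (1,), (2,), (2,), (0,)]

QUERY = """
SELECT
    CASE Diabetes_012
        WHEN 0 THEN 'No Diabetes'
        WHEN 1 THEN 'Prediabetes'
        WHEN 2 THEN 'Diabetes'
    END AS diabetes_category,
    COUNT(*) AS respondents,
    -- share of all respondents, using a scalar subquery for the total
    ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM diabetes_health_indicators), 2) AS pct_of_total
FROM diabetes_health_indicators
GROUP BY Diabetes_012
ORDER BY Diabetes_012
"""

def category_breakdown(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diabetes_health_indicators (Diabetes_012 INTEGER)")
    conn.executemany("INSERT INTO diabetes_health_indicators VALUES (?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in category_breakdown(ROWS):
    print(row)
```

The scalar subquery keeps the percentage calculation portable; in databases that support window functions, SUM(COUNT(*)) OVER () is an alternative way to get the grand total.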


This query reveals the critical intervention opportunity: the prediabetic population represents individuals who could benefit from targeted prevention programs before progressing to full diabetes.

Analysis 2: BMI Patterns Across Diabetes Groups

Understanding the Weight-Diabetes Connection

Body Mass Index is a well-established risk factor for diabetes, but how does average BMI progress across our three categories?
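One way to compute this is to group by diabetes status and average BMI within each group. The sketch below assumes columns named Diabetes_012 and BMI (as in the Kaggle CSV), runs on SQLite through Python's sqlite3 module instead of Oracle, and uses invented rows.

```python
import sqlite3

# Invented (Diabetes_012, BMI) pairs for illustration only; NOT dataset values.
ROWS = [(0, 24.0), (0, 26.0), (1, 29.0), (1, 31.0), (2, 33.0), (2, 35.0)]

QUERY = """
SELECT
    Diabetes_012,
    COUNT(*)           AS respondents,
    ROUND(AVG(BMI), 1) AS avg_bmi,
    MIN(BMI)           AS min_bmi,
    MAX(BMI)           AS max_bmi
FROM diabetes_health_indicators
GROUP BY Diabetes_012
ORDER BY Diabetes_012
"""

def bmi_by_category(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diabetes_health_indicators (Diabetes_012 INTEGER, BMI REAL)")
    conn.executemany("INSERT INTO diabetes_health_indicators VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in bmi_by_category(ROWS):
    print(row)
```

Reporting MIN and MAX next to the average is a cheap check that a few extreme BMI values are not driving the group means.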

The progression of BMI across categories is consistent with weight management as a prevention strategy. Because BRFSS is a cross-sectional survey, this shows an association rather than proof of cause, but it still supports the business case for employer wellness programs and preventive care investments.

Analysis 3: Physical Activity Impact

The Exercise Paradox

Does physical activity truly correlate with lower diabetes rates across our large population sample?
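A sketch of this comparison uses conditional aggregation: SUM(CASE WHEN ...) counts the cases in each group and dividing by COUNT(*) turns that into a rate. It assumes a PhysActivity column coded 1 = active and 0 = inactive, as in the Kaggle CSV, uses SQLite via Python's sqlite3 instead of Oracle, and the rows are invented.

```python
import sqlite3

# Invented (PhysActivity, Diabetes_012) rows; NOT dataset values.
ROWS = [(1, 0), (1, 0), (1, 0), (1, 2), (0, 0), (0, 2), (0, 2), (0, 1)]

QUERY = """
SELECT
    CASE PhysActivity WHEN 1 THEN 'Active' ELSE 'Inactive' END AS activity_group,
    COUNT(*) AS respondents,
    -- 100.0 forces floating-point division so the rates are not truncated
    ROUND(100.0 * SUM(CASE WHEN Diabetes_012 = 1 THEN 1 ELSE 0 END) / COUNT(*), 1) AS prediabetes_pct,
    ROUND(100.0 * SUM(CASE WHEN Diabetes_012 = 2 THEN 1 ELSE 0 END) / COUNT(*), 1) AS diabetes_pct
FROM diabetes_health_indicators
GROUP BY PhysActivity
ORDER BY PhysActivity DESC
"""

def activity_rates(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diabetes_health_indicators (PhysActivity INTEGER, Diabetes_012 INTEGER)")
    conn.executemany("INSERT INTO diabetes_health_indicators VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in activity_rates(ROWS):
    print(row)
```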

This analysis quantifies the association between physical activity and diabetes prevalence, providing evidence for community-based exercise programs and workplace wellness initiatives.

Analysis 4: Gender Disparities

Uncovering the Gender Gap

Are there significant differences in diabetes prevalence between men and women that could inform targeted screening programs?
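A sketch of a prevalence-by-sex query is below, reporting both raw case counts and rates so that differences in group size stay visible. It assumes a Sex column coded 0 = female and 1 = male, as in the Kaggle CSV, runs on SQLite via Python's sqlite3 rather than Oracle, and the rows are invented.

```python
import sqlite3

# Invented (Sex, Diabetes_012) rows; NOT dataset values.
ROWS = [(0, 0), (0, 0), (0, 2), (0, 0), (0, 1),
        (1, 2), (1, 0), (1, 2), (1, 0), (1, 0)]

QUERY = """
SELECT
    CASE Sex WHEN 0 THEN 'Female' WHEN 1 THEN 'Male' END AS sex_label,
    COUNT(*) AS respondents,
    SUM(CASE WHEN Diabetes_012 = 2 THEN 1 ELSE 0 END) AS diabetes_cases,
    ROUND(100.0 * SUM(CASE WHEN Diabetes_012 = 2 THEN 1 ELSE 0 END) / COUNT(*), 1) AS diabetes_pct
FROM diabetes_health_indicators
GROUP BY Sex
ORDER BY Sex
"""

def rates_by_sex(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diabetes_health_indicators (Sex INTEGER, Diabetes_012 INTEGER)")
    conn.executemany("INSERT INTO diabetes_health_indicators VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in rates_by_sex(ROWS):
    print(row)
```

The label is aliased as sex_label rather than sex so that ORDER BY Sex unambiguously refers to the numeric column, not the text alias.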

Gender-specific patterns could indicate hormonal factors, healthcare-seeking behaviors, or screening disparities that warrant further investigation and targeted interventions.

Analysis 5: Age Group Analysis

The Aging Effect

How does diabetes prevalence change across age groups, and where are the critical intervention windows?
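A CASE expression can collapse fine-grained age codes into broader bands before grouping. The sketch assumes the Kaggle coding of Age as 1 through 13 in five-year steps (1 = 18-24, ..., 9 = 60-64, 13 = 80 or older); the three bands chosen here are illustrative, not the author's. It uses SQLite via Python's sqlite3 instead of Oracle, with invented rows.

```python
import sqlite3

# Invented (Age code, Diabetes_012) rows; NOT dataset values.
ROWS = [(1, 0), (3, 0), (5, 0), (4, 2),
        (6, 0), (7, 2), (9, 0), (8, 2),
        (10, 2), (12, 2), (13, 0), (11, 2)]

# The CTE assigns each respondent a band; grouping on the CTE column avoids
# GROUP BY on a SELECT alias, which older Oracle versions do not allow.
QUERY = """
WITH banded AS (
    SELECT
        CASE
            WHEN Age BETWEEN 1 AND 5   THEN '18-44'
            WHEN Age BETWEEN 6 AND 9   THEN '45-64'
            WHEN Age BETWEEN 10 AND 13 THEN '65+'
            ELSE 'Unknown'
        END AS age_band,
        Diabetes_012
    FROM diabetes_health_indicators
)
SELECT
    age_band,
    COUNT(*) AS respondents,
    ROUND(100.0 * SUM(CASE WHEN Diabetes_012 = 2 THEN 1 ELSE 0 END) / COUNT(*), 1) AS diabetes_pct
FROM banded
GROUP BY age_band
ORDER BY age_band
"""

def rates_by_age_band(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diabetes_health_indicators (Age INTEGER, Diabetes_012 INTEGER)")
    conn.executemany("INSERT INTO diabetes_health_indicators VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in rates_by_age_band(ROWS):
    print(row)
```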


Age-stratified analysis reveals when diabetes risk accelerates, informing the timing of screening programs and preventive interventions for maximum cost-effectiveness.

Analysis 6: Alcohol Consumption Patterns

The Complex Relationship with Alcohol

Heavy alcohol consumption can affect diabetes risk through multiple pathways. Let's examine this relationship:
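A cross-tab built with conditional aggregation puts all three diabetes categories side by side for each drinking group. The sketch assumes a HvyAlcoholConsump column coded 1 = heavy drinker and 0 = not, as in the Kaggle CSV, runs on SQLite via Python's sqlite3 instead of Oracle, and the rows are invented.

```python
import sqlite3

# Invented (HvyAlcoholConsump, Diabetes_012) rows; NOT dataset values.
ROWS = [(1, 0), (1, 0), (1, 1), (1, 2),
        (0, 0), (0, 0), (0, 0), (0, 2), (0, 2)]

# One output column per diabetes category (a manual pivot).
QUERY = """
SELECT
    HvyAlcoholConsump,
    SUM(CASE WHEN Diabetes_012 = 0 THEN 1 ELSE 0 END) AS no_diabetes,
    SUM(CASE WHEN Diabetes_012 = 1 THEN 1 ELSE 0 END) AS prediabetes,
    SUM(CASE WHEN Diabetes_012 = 2 THEN 1 ELSE 0 END) AS diabetes
FROM diabetes_health_indicators
GROUP BY HvyAlcoholConsump
ORDER BY HvyAlcoholConsump
"""

def alcohol_crosstab(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diabetes_health_indicators (HvyAlcoholConsump INTEGER, Diabetes_012 INTEGER)")
    conn.executemany("INSERT INTO diabetes_health_indicators VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in alcohol_crosstab(ROWS):
    print(row)
```

Raw counts are shown here; dividing each column by the row total would turn them into within-group rates.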


Understanding alcohol's role in diabetes risk helps healthcare providers address lifestyle counseling and identify populations needing substance abuse screening alongside diabetes care.

Analysis 7: Healthcare Access Analysis

The Access-Outcome Connection

Healthcare access is a critical social determinant of health. How does access to care correlate with diabetes outcomes?
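Access can be looked at along two dimensions at once by grouping on both columns. This sketch assumes the Kaggle columns AnyHealthcare (1 = has some coverage) and NoDocbcCost (1 = could not see a doctor because of cost); whether the original analysis used NoDocbcCost is an assumption. It runs on SQLite via Python's sqlite3 instead of Oracle, with invented rows.

```python
import sqlite3

# Invented (AnyHealthcare, NoDocbcCost, Diabetes_012) rows; NOT dataset values.
ROWS = [(1, 0, 0), (1, 0, 2), (1, 1, 0), (1, 1, 0),
        (0, 1, 2), (0, 1, 0), (0, 0, 0), (0, 0, 0)]

# Grouping on two columns yields one row per access combination.
QUERY = """
SELECT
    AnyHealthcare,
    NoDocbcCost,
    COUNT(*) AS respondents,
    ROUND(100.0 * SUM(CASE WHEN Diabetes_012 = 2 THEN 1 ELSE 0 END) / COUNT(*), 1) AS diabetes_pct
FROM diabetes_health_indicators
GROUP BY AnyHealthcare, NoDocbcCost
ORDER BY AnyHealthcare, NoDocbcCost
"""

def rates_by_access(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE diabetes_health_indicators "
        "(AnyHealthcare INTEGER, NoDocbcCost INTEGER, Diabetes_012 INTEGER)"
    )
    conn.executemany("INSERT INTO diabetes_health_indicators VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in rates_by_access(ROWS):
    print(row)
```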


This analysis reveals the relationship between healthcare access and diabetes outcomes, providing evidence for policy discussions about healthcare expansion and early intervention programs.

Key Findings Summary

From this comprehensive SQL analysis, several critical insights emerge:

Population Distribution:

  • The majority of the population falls into the "No Diabetes" category

  • A significant prediabetic population represents intervention opportunities

  • Full diabetes cases show concerning demographic clustering

BMI Relationships:

  • Clear BMI progression across diabetes categories

  • Weight management emerges as a critical prevention strategy

  • Early intervention in overweight populations could prevent progression

Lifestyle Factor Impact:

  • Physical activity shows protective effects against diabetes

  • Age-related risk acceleration occurs at predictable intervals

  • Healthcare access correlates with better diabetes outcomes

Demographic Patterns:

  • Gender differences suggest the need for targeted screening approaches

  • Age-stratified risk reveals optimal intervention timing

  • Alcohol consumption patterns indicate the need for integrated care approaches

Healthcare Policy Implications:

  • Healthcare access disparities affect diabetes outcomes

  • Prevention programs could be more cost-effective than treatment

  • Multi-factor risk assessment enables precision public health approaches

Technical Implementation Notes

This analysis relied on several core SQL techniques:

  • CASE WHEN statements for categorical analysis and risk stratification

  • GROUP BY with multiple dimensions for comprehensive demographic analysis

  • Aggregate functions (COUNT, AVG) for population-level insights

  • ORDER BY for logical result presentation

  • A single-table design that can be extended with JOIN operations to additional datasets
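To illustrate the JOIN extension, here is a minimal sketch that attaches readable labels from a second table. The lookup table age_group_lookup and its columns are hypothetical, the age codes follow the assumed Kaggle coding (1 = 18-24, 2 = 25-29, 13 = 80 or older), and the example uses SQLite via Python's sqlite3 instead of Oracle with invented rows.

```python
import sqlite3

# Hypothetical lookup rows; only three codes are included to keep it short.
AGE_LABELS = [(1, "18-24"), (2, "25-29"), (13, "80 or older")]

# Invented (Age, Diabetes_012) rows; NOT dataset values.
ROWS = [(1, 0), (1, 2), (2, 0), (13, 2), (13, 2)]

QUERY = """
SELECT
    l.age_label,
    COUNT(*) AS respondents,
    SUM(CASE WHEN d.Diabetes_012 = 2 THEN 1 ELSE 0 END) AS diabetes_cases
FROM diabetes_health_indicators d
JOIN age_group_lookup l ON l.age_code = d.Age
GROUP BY l.age_code, l.age_label
ORDER BY l.age_code
"""

def cases_by_age_label(rows, labels):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diabetes_health_indicators (Age INTEGER, Diabetes_012 INTEGER)")
    conn.execute("CREATE TABLE age_group_lookup (age_code INTEGER PRIMARY KEY, age_label TEXT)")
    conn.executemany("INSERT INTO diabetes_health_indicators VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO age_group_lookup VALUES (?, ?)", labels)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(cases_by_age_label(ROWS, AGE_LABELS))
```

An inner JOIN drops respondents whose code has no match in the lookup table; a LEFT JOIN from the survey table would keep them with a NULL label.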

Business Impact and Next Steps

This analysis provides healthcare organizations with:

Immediate Applications:

  1. Risk stratification models for patient populations

  2. Resource allocation guidance for prevention programs

  3. Evidence base for policy advocacy and program funding

  4. Baseline metrics for intervention effectiveness tracking

Strategic Implications:

  • Prevention programs targeting prediabetic populations could reduce long-term costs

  • Age and demographic-specific screening protocols would improve early detection

  • Lifestyle intervention programs have quantifiable impact potential

  • Healthcare access improvements could significantly affect population health outcomes

Future Analysis Opportunities:

  • Predictive modeling using machine learning techniques

  • Cost-benefit analysis of intervention programs

  • Geographic analysis with additional location data

  • Longitudinal analysis with time-series data

Conclusion

Using advanced SQL analysis on this comprehensive diabetes dataset, we've uncovered actionable insights that can inform healthcare strategy at multiple levels. The clear patterns in demographic risk factors, lifestyle influences, and healthcare access effects provide a data-driven foundation for improving diabetes prevention and care.

This analysis demonstrates how powerful SQL queries can transform raw health data into strategic intelligence, enabling healthcare organizations to make evidence-based decisions about resource allocation, program development, and patient care protocols.

The prediabetic population identified through this analysis represents the greatest opportunity for intervention. This finding could reshape how healthcare systems approach diabetes prevention and ultimately improve outcomes while controlling costs.

I hope you enjoyed reading about this project. If you'd like to see more of my work, please connect with me on LinkedIn.

