Lab 1: Census Data Quality for Policy Decisions

Evaluating Data Reliability for Algorithmic Decision-Making

Author

Arzoo

Published

February 10, 2026

Assignment Overview

Scenario

You are a data analyst for the [Your State] Department of Human Services. The department is considering implementing an algorithmic system to identify communities that should receive priority for social service funding and outreach programs. Your supervisor has asked you to evaluate the quality and reliability of available census data to inform this decision.

Drawing on our Week 2 discussion of algorithmic bias, you need to assess not just what the data shows, but how reliable it is and what communities might be affected by data quality issues.

Learning Objectives

Apply dplyr functions to real census data for policy analysis
Evaluate data quality using margins of error
Connect technical analysis to algorithmic decision-making
Identify potential equity implications of data reliability issues
Create professional documentation for policy stakeholders

Submission Instructions

Submit by posting your updated portfolio link on Canvas. Your assignment should be accessible at your-portfolio-url/labs/lab_1/

Make sure to update your _quarto.yml navigation to include this assignment under a “Labs” menu.

Part 1: Portfolio Integration

Create this assignment in your portfolio repository under a labs/lab_1/ folder structure. Update your navigation menu to include:

- text: Assignments
  menu:
    - href: labs/lab_1/your_file_name.qmd
      text: "Lab 1: Census Data Exploration"

If the text contains a special character such as a colon, wrap it in double quotes so that Quarto reads it as text.

Setup

# Load required packages (hint: you need tidycensus, tidyverse, and knitr)
library(tidyverse)
library(tidycensus)
library(scales)
library(RColorBrewer)
library(knitr)

# Set your Census API key


# Choose your state for analysis - assign it to a variable called my_state
my_state <- "LA"

State Selection: I have chosen Louisiana for this analysis because:

Louisiana’s rural parishes show the highest income estimate uncertainty, with low-confidence counties concentrated in areas with smaller, more dispersed populations. These unreliable income estimates pose a direct risk to algorithmic systems — parishes with high MOE could be systematically misclassified for funding priority, meaning communities with genuine need may be overlooked or incorrectly ranked simply due to data limitations rather than actual economic conditions.

Part 2: County-Level Resource Assessment

2.1 Data Retrieval

Your Task: Use get_acs() to retrieve county-level data for your chosen state.

Requirements:
- Geography: county level
- Variables: median household income (B19013_001) and total population (B01003_001)
- Year: 2022
- Survey: acs5
- Output format: wide

Hint: Remember to give your variables descriptive names using the variables = c(name = "code") syntax.

# Write your get_acs() code here

county_level <- get_acs(
  state = my_state,
  geography = "county",
  variables = c(med_hh_inc = "B19013_001", tot_pop = "B01003_001"),
  year = 2022,
  survey = "acs5",
  output = "wide"
)


# Clean the county names to remove state name and "County" 
county_level <- county_level %>%
  mutate(
    county = str_remove(NAME, ", Louisiana"),
    county = str_remove(county, " Parish")
  )



# Hint: use mutate() with str_remove()

# Display the first few rows
county_level

# A tibble: 64 × 7
   GEOID NAME                   med_hh_incE med_hh_incM tot_popE tot_popM county
   <chr> <chr>                        <dbl>       <dbl>    <dbl>    <dbl> <chr> 
 1 22001 Acadia Parish, Louisi…       44977        2841    57674       NA Acadia
 2 22003 Allen Parish, Louisia…       52755        6411    22798       NA Allen 
 3 22005 Ascension Parish, Lou…       93800        3514   126973       NA Ascen…
 4 22007 Assumption Parish, Lo…       47023        3781    21067       NA Assum…
 5 22009 Avoyelles Parish, Lou…       38696        5175    39529       NA Avoye…
 6 22011 Beauregard Parish, Lo…       68525        6225    36553       NA Beaur…
 7 22013 Bienville Parish, Lou…       34268        3054    12958       NA Bienv…
 8 22015 Bossier Parish, Louis…       64598        3122   128877       NA Bossi…
 9 22017 Caddo Parish, Louisia…       47572        2032   236259       NA Caddo 
10 22019 Calcasieu Parish, Lou…       64370        3249   210770       NA Calca…
# ℹ 54 more rows
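
The cleaning step hard-codes ", Louisiana" and " Parish", so it only works for this state. A minimal base R sketch of a state-agnostic version follows. The helper name clean_county_name is hypothetical, and it assumes NAME always has the form "<name> County, <State>" or "<name> Parish, <State>".

```r
# Hypothetical helper: strip the " County" / " Parish" suffix and the state name.
# Assumes NAME looks like "<name> County, <State>" or "<name> Parish, <State>".
clean_county_name <- function(name) {
  sub(" (County|Parish), .*$", "", name)
}

clean_county_name(c(
  "Acadia Parish, Louisiana",
  "St. John the Baptist Parish, Louisiana",
  "Cook County, Illinois"
))
```

Names that match neither pattern are returned unchanged, so it is worth checking the result for leftover commas before relying on it.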

2.2 Data Quality Assessment

Your Task: Calculate margin of error percentages and create reliability categories.

Requirements:
- Calculate MOE percentage: (margin of error / estimate) * 100
- Create reliability categories:
  - High Confidence: MOE < 5%
  - Moderate Confidence: MOE 5-10%
  - Low Confidence: MOE > 10%
- Create a flag for unreliable estimates (MOE > 10%)

Hint: Use mutate() with case_when() for the categories.

# Calculate MOE percentage and reliability categories using mutate()
county_level <- county_level %>%
  mutate(
    MOE_perc = (med_hh_incM / med_hh_incE) * 100,
    confidence_level = case_when(
      MOE_perc < 5 ~ "High Confidence",
      MOE_perc <= 10 ~ "Moderate Confidence",  # 5-10%, including exactly 10%
      MOE_perc > 10 ~ "Low Confidence"
    )
  )
# Create a summary showing count of counties in each reliability category

county_level %>% 
  count(confidence_level, name = "county") %>% 
  mutate( share = county / sum(county) * 100)

# A tibble: 3 × 3
  confidence_level    county share
  <chr>                <int> <dbl>
1 High Confidence         15  23.4
2 Low Confidence          23  35.9
3 Moderate Confidence     26  40.6

# Hint: use count() and mutate() to add percentages
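
The requirements also ask for a flag on unreliable estimates (MOE > 10%). The sketch below is one way to compute the percentage, the category, and the flag together in base R, treating the 5% and 10% boundaries explicitly. The helper name classify_moe is hypothetical. The first three rows use the Natchitoches, Jefferson Davis, and Cameron figures from this lab; the fourth row is a made-up estimate that sits exactly on the 10% boundary.

```r
# Hypothetical helper: classify estimates by relative margin of error.
classify_moe <- function(estimate, moe) {
  moe_pct <- 100 * moe / estimate
  category <- ifelse(
    moe_pct < 5, "High Confidence",
    ifelse(moe_pct <= 10, "Moderate Confidence", "Low Confidence")
  )
  data.frame(
    moe_pct = round(moe_pct, 2),
    category = category,
    unreliable = moe_pct > 10  # flag required by the assignment
  )
}

classify_moe(
  estimate = c(41310, 52470, 69847, 50000),  # last value is illustrative only
  moe      = c(2062, 5233, 16919, 5000)
)
```

Under these rules an MOE of exactly 10% counts as Moderate Confidence and is not flagged, which matches the "MOE > 10%" wording of the requirement.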

2.3 High Uncertainty Counties

Your Task: Identify the 5 counties with the highest MOE percentages.

Requirements:
- Sort by MOE percentage (highest first)
- Select the top 5 counties
- Display: county name, median income, margin of error, MOE percentage, reliability category
- Format as a professional table using kable()

Hint: Use arrange(), slice(), and select() functions.

library(knitr)
library(kableExtra)
# Create table of top 5 counties by MOE percentage
reliability_table <- county_level %>% 
  arrange( desc(MOE_perc)) %>% 
  slice(1:5) %>% 
  select( County = county, 
          Median_Income= med_hh_incE,
          MOE = med_hh_incM, 
          MOE_perc, 
          Reliability = confidence_level)
reliability_table

# A tibble: 5 × 5
  County       Median_Income   MOE MOE_perc Reliability   
  <chr>                <dbl> <dbl>    <dbl> <chr>         
1 Cameron              69847 16919     24.2 Low Confidence
2 Red River            43821  8891     20.3 Low Confidence
3 Madison              34508  6612     19.2 Low Confidence
4 East Carroll         30856  4931     16.0 Low Confidence
5 St. James            62946  9793     15.6 Low Confidence

# Format as table with kable() - include appropriate column names and caption
reliability <- county_level %>%
  arrange(desc(MOE_perc)) %>%
  slice(1:5) %>%
  select(
    county,
    med_hh_incE,
    med_hh_incM,
    MOE_perc,
    confidence_level
  ) %>%
  kable(
    col.names = c(
      "County",
      "Median Household Income",
      "Margin of Error",
      "MOE (%)",
      "Reliability"
    ),
    digits = 2,
    caption = "Top 5 Counties with the Highest Median Household Income MOE Percentages",
    format.args = list(big.mark = ",")
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover"),
    full_width = FALSE
  ) %>%
  footnote(
    general = c(
      "Margin of Error (MOE) expressed as a percentage of the estimate.",
      "🟢 High Confidence: MOE < 5%",
      "🟡 Moderate Confidence: MOE 5–10%",
      "🔴 Low Confidence: MOE > 10%",
      "Source: American Community Survey 5-Year Estimates"
    ),
    general_title = "Notes:"
  )
reliability

Top 5 Counties with the Highest Median Household Income MOE Percentages
County	Median Household Income	Margin of Error	MOE (%)	Reliability
Cameron	69,847	16,919	24.22	Low Confidence
Red River	43,821	8,891	20.29	Low Confidence
Madison	34,508	6,612	19.16	Low Confidence
East Carroll	30,856	4,931	15.98	Low Confidence
St. James	62,946	9,793	15.56	Low Confidence
Notes:
Margin of Error (MOE) expressed as a percentage of the estimate.
🟢 High Confidence: MOE < 5%
🟡 Moderate Confidence: MOE 5–10%
🔴 Low Confidence: MOE > 10%
Source: American Community Survey 5-Year Estimates

Data Quality Commentary:

All five parishes with the highest income uncertainty fall into the Low Confidence category. Cameron Parish is the most extreme, with an MOE of 24.22%: at the ACS's 90% confidence level, the true median household income could be nearly $17,000 above or below the reported estimate of $69,847. An algorithmic system relying on this income data would be most likely to misclassify Cameron, Red River, Madison, East Carroll, and St. James parishes, all small, rural parishes where the ACS samples too few households to produce stable estimates. Small populations and geographic isolation drive up uncertainty in these parishes, so communities with genuine economic need could be systematically misranked by an automated funding allocation system.
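
To make the misranking risk concrete, the sketch below turns an MOE into an interval and checks whether two parish estimates are statistically distinguishable. It relies on the fact that ACS margins of error are published at the 90% confidence level, so the standard error is MOE / 1.645. The helper names moe_interval and diff_z are hypothetical, and the test treats the two estimates as independent.

```r
# ACS margins of error correspond to a 90% confidence level.
Z_90 <- 1.645

# Interval implied by an estimate and its MOE.
moe_interval <- function(estimate, moe) {
  c(lower = estimate - moe, upper = estimate + moe)
}

# Z statistic for the difference between two independent estimates.
diff_z <- function(est1, moe1, est2, moe2) {
  se1 <- moe1 / Z_90
  se2 <- moe2 / Z_90
  (est1 - est2) / sqrt(se1^2 + se2^2)
}

moe_interval(69847, 16919)               # Cameron
z <- diff_z(69847, 16919, 62946, 9793)   # Cameron vs. St. James
abs(z) > Z_90                            # TRUE would indicate a significant difference
```

For Cameron and St. James the statistic is roughly 0.58, well below 1.645, so the data cannot tell which of the two parishes has the higher median income even though the point estimates differ by about $6,900.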

Part 3: Neighborhood-Level Analysis

3.1 Focus Area Selection

Your Task: Select 2-3 counties from your reliability analysis for detailed tract-level study.

Strategy: Choose counties that represent different reliability levels (e.g., 1 high confidence, 1 moderate, 1 low confidence) to compare how data quality varies.

# Use filter() to select 2-3 counties from your county reliability data (county_level)
selected_counties <- county_level %>%
  filter(confidence_level %in% c(
    "High Confidence",
    "Moderate Confidence",
    "Low Confidence"
  )) %>%
  group_by(confidence_level) %>%
  slice_max(MOE_perc, n = 1, with_ties = FALSE) %>%
  ungroup()
selected_counties

# A tibble: 3 × 9
  GEOID NAME           med_hh_incE med_hh_incM tot_popE tot_popM county MOE_perc
  <chr> <chr>                <dbl>       <dbl>    <dbl>    <dbl> <chr>     <dbl>
1 22069 Natchitoches …       41310        2062    37478       NA Natch…     4.99
2 22023 Cameron Paris…       69847       16919     5447       NA Camer…    24.2 
3 22053 Jefferson Dav…       52470        5233    32277       NA Jeffe…     9.97
# ℹ 1 more variable: confidence_level <chr>

# Store the selected counties in a variable called selected_counties

# Display the selected counties with their key characteristics
selected_counties %>% 
    select(
      county,
      med_hh_incE,
      MOE_perc, 
      confidence_level
    ) %>% 
  kable(
    col.names = c(
      "County",
      "Median Household Income",
      "MOE (%)",
      "Reliability"
    ),
    digits = 2,
    caption = "Selected Counties for Tract-Level Analysis",
    format.args = list(big.mark = ",")
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover"),
    full_width = FALSE
  ) %>%
  footnote(
    general = c(
      "Margin of Error (MOE) expressed as a percentage of the estimate.",
      "🟢 High Confidence: MOE < 5%",
      "🟡 Moderate Confidence: MOE 5–10%",
      "🔴 Low Confidence: MOE > 10%",
      "Source: American Community Survey 5-Year Estimates"
    ),
    general_title = "Notes:"
  )

Selected Counties for Tract-Level Analysis
County	Median Household Income	MOE (%)	Reliability
Natchitoches	41,310	4.99	High Confidence
Cameron	69,847	24.22	Low Confidence
Jefferson Davis	52,470	9.97	Moderate Confidence
Notes:
Margin of Error (MOE) expressed as a percentage of the estimate.
🟢 High Confidence: MOE < 5%
🟡 Moderate Confidence: MOE 5–10%
🔴 Low Confidence: MOE > 10%
Source: American Community Survey 5-Year Estimates

# Show: county name, median income, MOE percentage, reliability category

Comment on the output:

The three selected parishes, Natchitoches (High Confidence, 4.99%), Jefferson Davis (Moderate Confidence, 9.97%), and Cameron (Low Confidence, 24.22%), represent the three reliability tiers, allowing a comparison of how data quality varies across Louisiana’s rural landscape. Because the selection takes the highest MOE within each tier, Natchitoches and Jefferson Davis both sit just below their tier cutoffs, while Cameron Parish, with its small and geographically dispersed coastal population, shows the highest income uncertainty in the state. This selection ensures our tract-level analysis captures the range of data quality challenges the department may encounter when deploying algorithmic tools statewide.

3.2 Tract-Level Demographics

Your Task: Get demographic data for census tracts in your selected counties.

Requirements:
- Geography: tract level
- Variables: white alone (B03002_003), Black/African American (B03002_004), Hispanic/Latino (B03002_012), total population (B03002_001)
- Use the same state and year as before
- Output format: wide
- Challenge: You’ll need county codes, not names. Look at the GEOID patterns in your county data for hints.
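
For the challenge, a county GEOID is five characters: a two-digit state FIPS code followed by a three-digit county FIPS code. A minimal sketch of pulling out the county part is below. The GEOIDs are the three selected parishes from the table in 3.1; in the lab itself the same call could be applied to selected_counties$GEOID.

```r
# County GEOIDs: 2-digit state FIPS + 3-digit county FIPS.
selected_geoids <- c("22069", "22023", "22053")  # Natchitoches, Cameron, Jefferson Davis

# Characters 3 to 5 hold the county code expected by the county argument.
county_codes <- substr(selected_geoids, 3, 5)
county_codes
```

Deriving the codes this way avoids retyping them by hand if the selection changes.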

# Define your race/ethnicity variables with descriptive names
tract_level <- get_acs(
  state = my_state, 
  geography = "tract", 
  county = c("069", "023", "053"),
  variables = c(
    med_hh_inc = "B19013_001", 
    tot_pop = "B03002_001", 
    white = "B03002_003", 
    black = "B03002_004", 
    hisp = "B03002_012"
  ), 
  year = 2022, 
  survey = "acs5", 
  output = "wide"
)

# Add readable tract and county name columns using str_extract() or similar
tract_level_clean <- tract_level %>%
  mutate(
    tract_number = str_extract(NAME, "\\d+\\.?\\d*"),
    county_name = str_extract(NAME, "(?<=; )[^;]+(?= Parish)")
  )
# Use get_acs() to retrieve tract-level data
# Hint: You may need to specify county codes in the county parameter
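
A quick way to check the two regular expressions used for tract_number and county_name is to run them on a single NAME value. The sketch below uses base R equivalents of str_extract(). The example string is an assumption about how 2022 tract names are formatted (semicolon-separated), inferred from the "; " in the pattern.

```r
# Assumed example of a 2022 ACS tract NAME value.
tract_name <- "Census Tract 4.01; Jefferson Davis Parish; Louisiana"

# First run of digits, optionally with a decimal part: the tract number.
tract_number <- regmatches(
  tract_name,
  regexpr("\\d+\\.?\\d*", tract_name, perl = TRUE)
)

# Text between "; " and " Parish": the parish name.
county_name <- regmatches(
  tract_name,
  regexpr("(?<=; )[^;]+(?= Parish)", tract_name, perl = TRUE)
)

c(tract_number = tract_number, county_name = county_name)
```

If NAME used commas instead of semicolons, the lookbehind would not match and county_name would come back empty, so the separator is worth confirming against the actual data.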


# Calculate percentage of each group using mutate()
# Create percentages for white, Black, and Hispanic populations

tract_level_clean <- tract_level_clean %>% 
  mutate( perc_white = ((whiteE/ tot_popE)*100), 
          perc_black = ((blackE/tot_popE)*100), 
          perc_hisp = ((hispE/tot_popE)*100))
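
These percentages are derived estimates, so they carry their own margin of error, which is not the same as the MOE of the numerator. The sketch below implements the Census Bureau approximation for the MOE of a proportion. The helper name moe_of_proportion is hypothetical, and tidycensus also ships a moe_prop() helper for the same calculation. The counts come from tract 4.01 in Jefferson Davis Parish, but the two MOE inputs are made up for illustration.

```r
# Hypothetical helper: approximate MOE of the proportion p = num / denom,
# where num is a subset of denom.
moe_of_proportion <- function(num, denom, moe_num, moe_denom) {
  p <- num / denom
  radicand <- moe_num^2 - p^2 * moe_denom^2
  # When the radicand is negative, fall back to the ratio formula (plus sign).
  radicand <- ifelse(radicand < 0, moe_num^2 + p^2 * moe_denom^2, radicand)
  sqrt(radicand) / denom
}

# 159 Hispanic/Latino residents out of 2,365; both MOE values are illustrative only.
moe_of_proportion(num = 159, denom = 2365, moe_num = 90, moe_denom = 300) * 100
```

The result is in percentage points and can be reported alongside perc_hisp in the same way the income MOE accompanies the income estimate.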

3.3 Demographic Analysis

Your Task: Analyze the demographic patterns in your selected areas.

# Find the tract with the highest percentage of Hispanic/Latino residents
# Hint: use arrange() and slice() to get the top tract
highest_hisp_tract <- tract_level_clean %>%
  mutate(perc_hisp = (hispE / tot_popE) * 100) %>%
  arrange(desc(perc_hisp)) %>%
  slice(1) %>%
  select(tract_number, county_name, tot_popE, hispE, perc_hisp)
highest_hisp_tract

# A tibble: 1 × 5
  tract_number county_name     tot_popE hispE perc_hisp
  <chr>        <chr>              <dbl> <dbl>     <dbl>
1 4.01         Jefferson Davis     2365   159      6.72

# Calculate average demographics by county using group_by() and summarize()
# Show: number of tracts, average percentage for each racial/ethnic group
county_summary <- tract_level_clean %>%
  mutate(
    perc_white = (whiteE / tot_popE) * 100,
    perc_black = (blackE / tot_popE) * 100,
    perc_hisp = (hispE / tot_popE) * 100
  ) %>%
  group_by(county_name) %>%
  summarize(
    num_tracts = n(),
    avg_perc_white = mean(perc_white, na.rm = TRUE),
    avg_perc_black = mean(perc_black, na.rm = TRUE),
    avg_perc_hisp = mean(perc_hisp, na.rm = TRUE),
    .groups = "drop"
  )

county_summary

# A tibble: 3 × 5
  county_name     num_tracts avg_perc_white avg_perc_black avg_perc_hisp
  <chr>                <int>          <dbl>          <dbl>         <dbl>
1 Cameron                  5           92.4           1.08          1.60
2 Jefferson Davis          8           75.5          18.2           2.69
3 Natchitoches            11           51.4          39.6           2.22

# Create a nicely formatted table of your results using kable()

county_summary %>%
  kable(
    col.names = c(
      "Parish",
      "Number of Tracts",
      "Avg % White",
      "Avg % Black",
      "Avg % Hispanic/Latino"
    ),
    digits = 2,
    caption = "Average Demographics by Parish",
    format.args = list(big.mark = ",")
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover"),
    full_width = FALSE
  ) %>%
  footnote(
    general = c(
      "Percentages represent the average across all census tracts within each parish.",
      "Source: American Community Survey 5-Year Estimates (2022)"
    ),
    general_title = "Notes:"
  )

Average Demographics by Parish
Parish	Number of Tracts	Avg % White	Avg % Black	Avg % Hispanic/Latino
Cameron	5	92.38	1.08	1.60
Jefferson Davis	8	75.48	18.17	2.69
Natchitoches	11	51.38	39.58	2.22
Notes:
Percentages represent the average across all census tracts within each parish.
Source: American Community Survey 5-Year Estimates (2022)

Part 4: Comprehensive Data Quality Evaluation

4.1 MOE Analysis for Demographic Variables

Your Task: Examine margins of error for demographic variables to see if some communities have less reliable data.

Requirements:
- Calculate MOE percentages for each demographic variable
- Flag tracts where any demographic variable has MOE > 15%
- Create summary statistics

# Calculate MOE percentages for white, Black, and Hispanic variables
# Hint: use the same formula as before (margin/estimate * 100)
tract_level_clean <- tract_level_clean %>% 
  mutate ( MOE_white = ((whiteM/whiteE)*100), 
           MOE_black = (( blackM/ blackE) *100),
           MOE_hisp = ((hispM/hispE)*100))

# Create a flag for tracts with high MOE on any demographic variable
tract_level_clean <- tract_level_clean %>%
  mutate(
    high_MOE_flag = ifelse(
      replace_na(MOE_white > 15, FALSE) |
      replace_na(MOE_black > 15, FALSE) |
      replace_na(MOE_hisp > 15, FALSE),
      "🔴 High MOE (>15%)",
      "Acceptable MOE"
    )
  )
# Use logical operators (| for OR) in an ifelse() statement

# Create summary statistics showing how many tracts have data quality issues
tract_level_clean %>%
  group_by(county_name, high_MOE_flag) %>%
  summarize(num_tracts = n(), .groups = "drop") %>%
  kable(
    col.names = c("Parish", "Data Quality", "Number of Tracts"),
    digits = 2,
    caption = "Data Quality Summary by Parish",
    format.args = list(big.mark = ",")
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover"),
    full_width = FALSE
  ) %>%
  footnote(
    general = c(
      "A tract is flagged if any demographic variable (White, Black, or Hispanic) has MOE > 15%.",
      "🔴 High MOE (>15%): Estimates may be unreliable.",
      "Acceptable MOE: Estimates are sufficiently reliable for analysis.",
      "Source: American Community Survey 5-Year Estimates (2022)"
    ),
    general_title = "Notes:"
  )

Data Quality Summary by Parish
Parish	Data Quality	Number of Tracts
Cameron	🔴 High MOE (>15%)	5
Jefferson Davis	🔴 High MOE (>15%)	8
Natchitoches	🔴 High MOE (>15%)	11
Notes:
A tract is flagged if any demographic variable (White, Black, or Hispanic) has MOE > 15%.
🔴 High MOE (>15%): Estimates may be unreliable.
Acceptable MOE: Estimates are sufficiently reliable for analysis.
Source: American Community Survey 5-Year Estimates (2022)

4.2 Pattern Analysis

Your Task: Investigate whether data quality problems are randomly distributed or concentrated in certain types of communities.

# Every tract is flagged as high MOE, so grouping by the flag gives a single group; compare parishes instead
moe_comparison <- tract_level_clean %>%
  group_by(county_name) %>%
  summarize(
    num_tracts          = n(),
    avg_population      = mean(tot_popE,  na.rm = TRUE),
    avg_perc_white       = mean(perc_white, na.rm = TRUE),
    avg_perc_black       = mean(perc_black, na.rm = TRUE),
    avg_perc_hisp        = mean(perc_hisp,  na.rm = TRUE),
    perc_unreliable_white = mean(replace_na(MOE_white > 15, FALSE)) * 100,
    perc_unreliable_black = mean(replace_na(MOE_black > 15, FALSE)) * 100,
    perc_unreliable_hisp  = mean(replace_na(MOE_hisp  > 15, FALSE)) * 100,
    .groups = "drop"
  )
# Calculate average characteristics for each group:
# - population size, demographic percentages

# Use group_by() and summarize() to create this comparison



# Create a professional table showing the patterns

moe_comparison %>%
  kable(
    col.names = c(
      "Parish",
      "Tracts",
      "Avg Population",
      "Avg % White",
      "Avg % Black",
      "Avg % Hispanic",
      "% Tracts Unreliable (White)",
      "% Tracts Unreliable (Black)",
      "% Tracts Unreliable (Hispanic)"
    ),
    digits = 2,
    caption = "Demographic Estimates and Data Reliability by Parish",
    format.args = list(big.mark = ",")
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover"),
    full_width = FALSE
  ) %>%
  footnote(
    general = c(
      "% Tracts Unreliable = share of tracts where MOE exceeds 15% of the estimate.",
      "High unreliability for minority groups reflects small subgroup populations in these parishes.",
      "🔴 Estimates with high MOE should be interpreted with caution.",
      "Source: American Community Survey 5-Year Estimates (2022)"
    ),
    general_title = "Notes:"
  )

Demographic Estimates and Data Reliability by Parish
Parish	Tracts	Avg Population	Avg % White	Avg % Black	Avg % Hispanic	% Tracts Unreliable (White)	% Tracts Unreliable (Black)	% Tracts Unreliable (Hispanic)
Cameron	5	1,089.40	92.38	1.08	1.60	80.00	100.00	100
Jefferson Davis	8	4,034.62	75.48	18.17	2.69	50.00	100.00	100
Natchitoches	11	3,407.09	51.38	39.58	2.22	81.82	90.91	100
Notes:
% Tracts Unreliable = share of tracts where MOE exceeds 15% of the estimate.
High unreliability for minority groups reflects small subgroup populations in these parishes.
🔴 Estimates with high MOE should be interpreted with caution.
Source: American Community Survey 5-Year Estimates (2022)

Pattern Analysis:

The analysis reveals that data quality problems are not randomly distributed: they are concentrated among minority demographic groups across all three parishes. Hispanic estimates exceed the 15% MOE threshold in every tract, and Black estimates do so in 91% to 100% of tracts, compared with 50% to 82% of tracts for white estimates. This is not a data collection failure. These groups make up smaller shares of the population in these parishes, so fewer of their households are sampled and their estimates carry greater statistical uncertainty. The pattern holds across Cameron, Jefferson Davis, and Natchitoches parishes, suggesting a structural limitation of the ACS in small rural tracts, where even white estimates are often unreliable. The practical implication is serious: an algorithm relying on demographic estimates for these communities would have the least reliable data precisely for the minority groups it may be intended to serve.
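
The link between subgroup share and uncertainty can be illustrated with a deliberately simplified model. The sketch below assumes a simple random sample of n residents in a tract, which is not how the ACS is actually designed, and the function name relative_moe and the sample size of 300 are hypothetical. Under that assumption, the relative MOE of a subgroup count at the 90% level is 1.645 * sqrt((1 - p) / (n * p)).

```r
# Approximate relative MOE (in %) of a subgroup count at the 90% level,
# assuming a simple random sample of n people and a subgroup share p.
relative_moe <- function(p, n, z = 1.645) {
  z * sqrt((1 - p) / (n * p)) * 100
}

n_sampled <- 300  # hypothetical number of sampled residents in one tract

# A group that is 92% of the tract vs. a group that is 2% of the tract.
relative_moe(p = c(0.92, 0.02), n = n_sampled)
```

With these assumed inputs the 92% group lands well under the 15% threshold and the 2% group far above it. Real ACS margins are typically larger than this model suggests because of weighting and survey design, which is consistent with white estimates also being flagged in many tracts.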

Part 5: Policy Recommendations

5.1 Analysis Integration and Professional Summary

Your Task: Write an executive summary that integrates findings from all four analyses.

Executive Summary Requirements:
1. Overall Pattern Identification: What are the systematic patterns across all your analyses?
2. Equity Assessment: Which communities face the greatest risk of algorithmic bias based on your findings?
3. Root Cause Analysis: What underlying factors drive both data quality issues and bias risk?
4. Strategic Recommendations: What should the Department implement to address these systematic issues?

Executive Summary:

Overall Pattern Identification: Across all four analyses, a consistent pattern emerges: data uncertainty in Louisiana’s rural parishes grows as population and subgroup sizes shrink. County-level income estimates show high uncertainty in sparsely populated coastal and inland parishes, while tract-level demographic data reveals that minority group estimates carry disproportionately high margins of error even within parishes that appear reliable at the aggregate level, such as Natchitoches.

Equity Assessment: Black and Hispanic residents across Cameron, Jefferson Davis, and Natchitoches parishes face the greatest risk of algorithmic harm. Because ACS estimates for these groups exceed the 15% MOE threshold in roughly 91% to 100% of tracts, any algorithm using these figures to allocate social services would be operating on unreliable inputs for the very communities most likely to need those services. This creates a feedback loop where data scarcity compounds existing inequity.

Root Cause Analysis: The underlying driver of both data quality issues and bias risk is the same: small subgroup populations in rural parishes lead to small ACS sample sizes, which in turn produce wide confidence intervals. This is not a data entry or methodology failure but a structural limitation of survey-based estimation in low-density areas. Small rural minority communities contribute few households to the sample, making their estimates the least stable.

Strategic Recommendations: The Department of Human Services should implement a tiered approach to algorithmic decision-making that explicitly accounts for data reliability. High-confidence parishes can proceed with automated prioritization, while moderate-confidence parishes should require human review of algorithmic outputs before funding decisions are finalized. In low-confidence parishes, and for any tract where minority demographic estimates carry MOE above 15%, the department should suspend automated classification entirely and instead conduct targeted supplemental surveys or partner with local organizations to gather ground-truth data before making resource allocation decisions.
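
The tiered approach described above can be sketched as a small decision function. This is one possible encoding, not a prescribed implementation: the function name review_tier and the action labels are hypothetical, and it assumes the tract-level flag overrides the parish-level tier.

```r
# Hypothetical helper: combine parish-level confidence with the tract-level flag.
review_tier <- function(parish_confidence, tract_high_moe) {
  ifelse(
    parish_confidence == "Low Confidence" | tract_high_moe,
    "Suspend automation; collect supplemental data",
    ifelse(
      parish_confidence == "Moderate Confidence",
      "Automate with human review",
      "Automated prioritization"
    )
  )
}

review_tier(
  parish_confidence = c("High Confidence", "Moderate Confidence",
                        "Low Confidence", "High Confidence"),
  tract_high_moe    = c(FALSE, FALSE, FALSE, TRUE)
)
```

The last case shows the override: a tract in a High Confidence parish is still routed away from automation when its demographic estimates exceed the 15% MOE threshold.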

5.2 Specific Recommendations

Your Task: Create a decision framework for algorithm implementation.

# Create a summary table using your county reliability data
# Include: county name, median income, MOE percentage, reliability category

# Add a new column with algorithm recommendations using case_when():
# - High Confidence: "Safe for algorithmic decisions"
# - Moderate Confidence: "Use with caution - monitor outcomes"  
# - Low Confidence: "Requires manual review or additional data"

# Format as a professional table with kable()

county_level %>%
  select(county, med_hh_incE, med_hh_incM, MOE_perc, confidence_level) %>%
  mutate(
    recommendation = case_when(
      confidence_level == "High Confidence"     ~ "🟢 Safe for algorithmic decisions",
      confidence_level == "Moderate Confidence" ~ "🟡 Use with caution - monitor outcomes",
      confidence_level == "Low Confidence"      ~ "🔴 Requires manual review or additional data"
    )
  ) %>%
  arrange(desc(MOE_perc)) %>%
  kable(
    col.names = c(
      "Parish",
      "Median Household Income",
      "Margin of Error",
      "MOE (%)",
      "Reliability",
      "Recommendation"
    ),
    digits = 2,
    caption = "Algorithm Implementation Decision Framework by Parish",
    format.args = list(big.mark = ",")
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover"),
    full_width = FALSE
  ) %>%
  footnote(
    general = c(
      "Recommendations based on Margin of Error (MOE) as a percentage of the estimate.",
      "🟢 High Confidence (MOE < 5%): Estimates are reliable for automated decisions.",
      "🟡 Moderate Confidence (MOE 5–10%): Use with human oversight and outcome monitoring.",
      "🔴 Low Confidence (MOE > 10%): Data unreliability poses risk of algorithmic bias.",
      "Source: American Community Survey 5-Year Estimates (2022)"
    ),
    general_title = "Notes:"
  )

Algorithm Implementation Decision Framework by Parish
Parish	Median Household Income	Margin of Error	MOE (%)	Reliability	Recommendation
Cameron	69,847	16,919	24.22	Low Confidence	🔴 Requires manual review or additional data
Red River	43,821	8,891	20.29	Low Confidence	🔴 Requires manual review or additional data
Madison	34,508	6,612	19.16	Low Confidence	🔴 Requires manual review or additional data
East Carroll	30,856	4,931	15.98	Low Confidence	🔴 Requires manual review or additional data
St. James	62,946	9,793	15.56	Low Confidence	🔴 Requires manual review or additional data
Caldwell	45,707	7,075	15.48	Low Confidence	🔴 Requires manual review or additional data
St. Helena	46,402	7,065	15.23	Low Confidence	🔴 Requires manual review or additional data
LaSalle	59,926	9,117	15.21	Low Confidence	🔴 Requires manual review or additional data
Concordia	38,929	5,562	14.29	Low Confidence	🔴 Requires manual review or additional data
Catahoula	48,259	6,502	13.47	Low Confidence	🔴 Requires manual review or additional data
Avoyelles	38,696	5,175	13.37	Low Confidence	🔴 Requires manual review or additional data
Union	47,068	6,133	13.03	Low Confidence	🔴 Requires manual review or additional data
Tensas	35,653	4,539	12.73	Low Confidence	🔴 Requires manual review or additional data
West Feliciana	71,985	9,132	12.69	Low Confidence	🔴 Requires manual review or additional data
West Carroll	45,035	5,686	12.63	Low Confidence	🔴 Requires manual review or additional data
Pointe Coupee	53,045	6,662	12.56	Low Confidence	🔴 Requires manual review or additional data
Allen	52,755	6,411	12.15	Low Confidence	🔴 Requires manual review or additional data
East Feliciana	64,709	7,398	11.43	Low Confidence	🔴 Requires manual review or additional data
Morehouse	37,875	4,038	10.66	Low Confidence	🔴 Requires manual review or additional data
Iberville	59,410	6,240	10.50	Low Confidence	🔴 Requires manual review or additional data
Evangeline	34,526	3,582	10.37	Low Confidence	🔴 Requires manual review or additional data
De Soto	49,807	5,118	10.28	Low Confidence	🔴 Requires manual review or additional data
Sabine	40,777	4,130	10.13	Low Confidence	🔴 Requires manual review or additional data
Jefferson Davis	52,470	5,233	9.97	Moderate Confidence	🟡 Use with caution - monitor outcomes
Franklin	41,129	4,063	9.88	Moderate Confidence	🟡 Use with caution - monitor outcomes
Grant	57,362	5,569	9.71	Moderate Confidence	🟡 Use with caution - monitor outcomes
Richland	48,125	4,550	9.45	Moderate Confidence	🟡 Use with caution - monitor outcomes
Winn	44,922	4,240	9.44	Moderate Confidence	🟡 Use with caution - monitor outcomes
West Baton Rouge	80,510	7,340	9.12	Moderate Confidence	🟡 Use with caution - monitor outcomes
Beauregard	68,525	6,225	9.08	Moderate Confidence	🟡 Use with caution - monitor outcomes
Lincoln	37,001	3,309	8.94	Moderate Confidence	🟡 Use with caution - monitor outcomes
Bienville	34,268	3,054	8.91	Moderate Confidence	🟡 Use with caution - monitor outcomes
St. John the Baptist	65,114	5,713	8.77	Moderate Confidence	🟡 Use with caution - monitor outcomes
Lafourche	61,381	4,953	8.07	Moderate Confidence	🟡 Use with caution - monitor outcomes
Assumption	47,023	3,781	8.04	Moderate Confidence	🟡 Use with caution - monitor outcomes
St. Charles	79,191	6,237	7.88	Moderate Confidence	🟡 Use with caution - monitor outcomes
Washington	41,803	3,193	7.64	Moderate Confidence	🟡 Use with caution - monitor outcomes
Jackson	40,406	2,801	6.93	Moderate Confidence	🟡 Use with caution - monitor outcomes
Acadia	44,977	2,841	6.32	Moderate Confidence	🟡 Use with caution - monitor outcomes
Vermilion	56,194	3,429	6.10	Moderate Confidence	🟡 Use with caution - monitor outcomes
Livingston	77,978	4,335	5.56	Moderate Confidence	🟡 Use with caution - monitor outcomes
Terrebonne	63,088	3,476	5.51	Moderate Confidence	🟡 Use with caution - monitor outcomes
St. Bernard	55,857	3,072	5.50	Moderate Confidence	🟡 Use with caution - monitor outcomes
Webster	34,263	1,859	5.43	Moderate Confidence	🟡 Use with caution - monitor outcomes
Ouachita	49,261	2,573	5.22	Moderate Confidence	🟡 Use with caution - monitor outcomes
Tangipahoa	55,274	2,869	5.19	Moderate Confidence	🟡 Use with caution - monitor outcomes
Calcasieu	64,370	3,249	5.05	Moderate Confidence	🟡 Use with caution - monitor outcomes
Claiborne	32,034	1,616	5.04	Moderate Confidence	🟡 Use with caution - monitor outcomes
Plaquemines	77,996	3,933	5.04	Moderate Confidence	🟡 Use with caution - monitor outcomes
Natchitoches	41,310	2,062	4.99	High Confidence	🟢 Safe for algorithmic decisions
St. Mary	47,322	2,292	4.84	High Confidence	🟢 Safe for algorithmic decisions
St. Landry	44,478	2,152	4.84	High Confidence	🟢 Safe for algorithmic decisions
Bossier	64,598	3,122	4.83	High Confidence	🟢 Safe for algorithmic decisions
St. Martin	50,806	2,405	4.73	High Confidence	🟢 Safe for algorithmic decisions
Caddo	47,572	2,032	4.27	High Confidence	🟢 Safe for algorithmic decisions
Iberia	55,190	2,319	4.20	High Confidence	🟢 Safe for algorithmic decisions
Vernon	56,547	2,244	3.97	High Confidence	🟢 Safe for algorithmic decisions
Ascension	93,800	3,514	3.75	High Confidence	🟢 Safe for algorithmic decisions
Rapides	55,407	2,017	3.64	High Confidence	🟢 Safe for algorithmic decisions
Orleans	51,116	1,705	3.34	High Confidence	🟢 Safe for algorithmic decisions
St. Tammany	76,914	2,541	3.30	High Confidence	🟢 Safe for algorithmic decisions
Lafayette	66,617	2,192	3.29	High Confidence	🟢 Safe for algorithmic decisions
East Baton Rouge	62,083	1,760	2.83	High Confidence	🟢 Safe for algorithmic decisions
Jefferson	63,257	1,592	2.52	High Confidence	🟢 Safe for algorithmic decisions
Notes:
Recommendations based on Margin of Error (MOE) as a percentage of the estimate.
🟢 High Confidence (MOE < 5%): Estimates are reliable for automated decisions.
🟡 Moderate Confidence (MOE 5–10%): Use with human oversight and outcome monitoring.
🔴 Low Confidence (MOE > 10%): Data unreliability poses risk of algorithmic bias.
Source: American Community Survey 5-Year Estimates (2022)
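The tier thresholds in the notes above can be sketched in a few lines of dplyr. The three rows are copied from the table. The column names (`median_income`, `income_moe`) and the helper `classify_reliability()` are illustrative assumptions and may not match the names used earlier in this lab.

```r
library(dplyr)

# Three rows copied from the table above (estimate and MOE in dollars)
parish_income <- data.frame(
  parish = c("Bienville", "Natchitoches", "Jefferson"),
  median_income = c(34268, 41310, 63257),
  income_moe = c(3054, 2062, 1592)
)

# Hypothetical helper: express the MOE as a percentage of the estimate,
# then assign the confidence tier described in the notes
classify_reliability <- function(df) {
  df %>%
    mutate(
      moe_pct = income_moe / median_income * 100,
      reliability = case_when(
        moe_pct < 5   ~ "High Confidence",
        moe_pct <= 10 ~ "Moderate Confidence",
        TRUE          ~ "Low Confidence"
      ),
      # Round only after classifying, so display rounding cannot move a tier
      moe_pct = round(moe_pct, 2)
    )
}

parish_reliability <- classify_reliability(parish_income)
print(parish_reliability)
```

Classifying on the unrounded percentage is a deliberate choice here: several parishes sit close to the 5% cutoff (Natchitoches at 4.99%, Claiborne at 5.04%), so the tier should depend on the computed value rather than on its rounded display.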

Key Recommendations:

Your Task: Use your analysis results to provide specific guidance to the department.

Counties suitable for immediate algorithmic implementation: Parishes classified as High Confidence, where the median household income MOE falls below 5% of the estimate, are appropriate candidates for algorithmic decision-making. In these areas, the ACS estimates are stable enough that automated prioritization is unlikely to produce systematic misclassification. The department should still monitor outcomes over time to detect any emergent disparities.
Counties requiring additional oversight: Parishes in the Moderate Confidence tier (MOE of 5–10% of the estimate) can be included in algorithmic workflows, but only with active human oversight. Program officers should review algorithmic outputs for these parishes before finalizing funding decisions, and outcome data should be tracked quarterly to identify whether any communities are being consistently under- or over-served relative to their actual need.
Counties needing alternative approaches: Low Confidence parishes (MOE above 10%), along with any tract where the MOE on minority demographic estimates exceeds 15%, should be removed from automated decision pipelines entirely. For these areas, the department should pursue one or more of the following: targeted supplemental ACS surveys, partnerships with local nonprofits and service providers to gather administrative data, or manual needs assessments conducted by field staff with community knowledge.
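One way to turn the three recommendations into a routing rule is sketched below. The function name `assign_pathway()`, its arguments, and the pathway labels are hypothetical. The income MOE percentages are taken from the table above, while the demographic MOE value in the last call is an invented input used only to show the tract-level override.

```r
library(dplyr)

# Hypothetical routing rule based on the recommendations above.
# income_moe_pct: income MOE as a percentage of the estimate
# demo_moe_pct:   demographic MOE percentage (defaults to 0, i.e. no override)
assign_pathway <- function(income_moe_pct, demo_moe_pct = 0) {
  case_when(
    income_moe_pct > 10 | demo_moe_pct > 15 ~ "manual assessment or supplemental data",
    income_moe_pct >= 5                     ~ "algorithm with human review",
    TRUE                                    ~ "algorithm with outcome monitoring"
  )
}

# Jefferson, Calcasieu, and Bienville income MOE percentages from the table
table_pathways <- assign_pathway(c(2.52, 5.05, 8.91))
print(table_pathways)

# Illustrative only: a low income MOE paired with a high demographic MOE
override_pathway <- assign_pathway(3.2, demo_moe_pct = 18)
print(override_pathway)
```

The override is checked first so that an unreliable demographic estimate removes a unit from the automated pipeline even when its income estimate looks stable.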
Questions for Further Investigation:

1. Do data reliability patterns persist over time? That is, have the same parishes consistently shown high MOE across multiple ACS releases (2015, 2017, 2019, 2022), or do reliability issues fluctuate with population shifts and post-disaster displacement (particularly relevant for Cameron Parish after Hurricane Laura)?
2. How do tract-level income MOE patterns compare to demographic MOE patterns? Are the tracts with the least reliable income estimates also the tracts with the least reliable racial composition estimates, which would suggest compounded data quality risk for the most vulnerable communities?
3. Would switching from ACS 5-year estimates to administrative data sources (SNAP enrollment, Medicaid claims, school lunch eligibility) produce more reliable proxies for economic need in low-confidence parishes, and how would that change which communities are prioritized for services?

Technical Notes

Data Sources:
- U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates
- Retrieved via tidycensus R package on [date]

Reproducibility:
- All analysis conducted in R version [your version]
- Census API key required for replication
- Complete code and documentation available at: [your portfolio URL]
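The version and date details requested above can be collected programmatically rather than typed by hand. This is a minimal sketch: `pkg_version()` and `repro_info` are hypothetical names, and the date recorded is the date the code runs, which matches the retrieval date only if the data are downloaded in the same session.

```r
# Hypothetical helper: return an installed package's version,
# or NA if the package is not installed
pkg_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Gather the details the Technical Notes ask for
repro_info <- list(
  retrieved_on = format(Sys.Date(), "%Y-%m-%d"),
  r_version    = R.version.string,
  tidycensus   = pkg_version("tidycensus"),
  tidyverse    = pkg_version("tidyverse")
)

str(repro_info)
```

Calling `sessionInfo()` at the end of the document is a common alternative when a full listing of loaded packages is preferred.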

Methodology Notes: [Describe any decisions you made about data processing, county selection, or analytical choices that might affect reproducibility]

Limitations: [Note any limitations in your analysis - sample size issues, geographic scope, temporal factors, etc.]

Submission Checklist

Before submitting your portfolio link on Canvas:

All code chunks run without errors
All “[Fill this in]” prompts have been completed
Tables are properly formatted and readable
Executive summary addresses all four required components
Portfolio navigation includes this assignment
Census API key is properly set
Document renders correctly to HTML

Remember: Submit your portfolio URL on Canvas, not the file itself. Your assignment should be accessible at your-portfolio-url/labs/lab_1/your_file_name.html