Lesson 6 of 10 · Intermediate

Data Collection & Gap-Filling Strategies

Practical strategies for collecting LCA data, estimating missing values, handling uncooperative suppliers, and dealing with processes not in any database.

25 min · Updated Jan 15, 2025

Prerequisites:

life-cycle-inventory-analysis

"How do I estimate missing data for raw materials?" and "What do I do when my specific process isn't in any database?" are among the most practical challenges every LCA practitioner faces. This guide provides battle-tested strategies for real-world data collection.

The Data Collection Reality

Ideal scenario: Complete primary data for all processes.

Reality: You'll have gaps, approximations, and missing information.

The good news: Every LCA has data gaps. The key is managing them transparently and minimizing their impact on conclusions.

FAQ: Data Collection Challenges

"How do I estimate missing data for raw materials?"

Strategy 1: Use proxy data

Find a similar material in your database:

| Missing Material | Possible Proxy | Adjustment Needed |
|---|---|---|
| Specialty polymer | Generic polymer family | Scale by density/properties |
| Regional steel | Global/EU steel | Adjust energy mix |
| Exotic wood species | Similar hardwood | May be acceptable as-is |
| Custom alloy | Base metal + additives | Combine datasets |

Example: Estimating a specialty plastic

You need data for PEEK (polyether ether ketone), but your database only has "generic engineering plastic."

Approach:

  1. Find PEEK's monomer chemistry (literature)
  2. Use stoichiometry to estimate feedstock requirements
  3. Adjust energy consumption based on processing temperature
  4. Apply uncertainty factor (±30-50%)

Strategy 2: Use stoichiometric calculations

For simple chemical reactions, calculate theoretical inputs:

Reaction: A + B → C + byproduct

If you know the reaction equation:
- Calculate mass balance
- Add process energy (estimate from similar reactions)
- Add emissions (from stoichiometry + combustion)
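
The mass-balance step can be sketched in a few lines of Python. This is a minimal illustration, not a validated model: the function name `feedstock_per_kg_product`, the `yield_fraction` parameter, and the esterification example (methanol + acetic acid → methyl acetate + water, with rounded molar masses) are assumptions chosen for illustration. Process energy and emissions still have to be estimated separately.

```python
def feedstock_per_kg_product(reactants, product, yield_fraction=1.0):
    """Estimate reactant mass (kg) needed per kg of product.

    reactants: dict of name -> (stoichiometric coefficient, molar mass in g/mol)
    product:   (stoichiometric coefficient, molar mass in g/mol)
    yield_fraction: assumed fraction of the theoretical yield achieved, in (0, 1]
    """
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield_fraction must be in (0, 1]")
    product_coeff, product_mm = product
    mol_product = 1000.0 / product_mm  # mol of product in 1 kg
    needs = {}
    for name, (coeff, molar_mass) in reactants.items():
        mol_needed = mol_product * coeff / product_coeff / yield_fraction
        needs[name] = mol_needed * molar_mass / 1000.0  # convert g back to kg
    return needs


# Illustrative reaction (assumption): CH3OH + CH3COOH -> CH3COOCH3 + H2O
reactants = {"methanol": (1, 32.04), "acetic acid": (1, 60.05)}
methyl_acetate = (1, 74.08)

theoretical = feedstock_per_kg_product(reactants, methyl_acetate)
realistic = feedstock_per_kg_product(reactants, methyl_acetate, yield_fraction=0.9)

for name in reactants:
    print(f"{name}: {theoretical[name]:.3f} kg at 100% yield, "
          f"{realistic[name]:.3f} kg at 90% yield, per kg product")
```

The theoretical figures are a lower bound on feedstock demand. The 90% yield is a placeholder, so replace it with a literature or plant value where one exists and record it as an assumption.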

Strategy 3: Scale from known processes

Use scaling factors:

| Property | Scaling Relationship |
|---|---|
| Energy | Often scales with mass or temperature |
| Process time | Linear with batch size |
| Emissions | Proportional to energy/material use |

Strategy 4: Use industry data

Sources for material emission factors:

  • Trade association reports (steel, aluminum, plastics, cement)
  • Industry EPD averages
  • Academic literature
  • Government inventories (EPA, EEA)

"Where can I find emission factors for my specific region/country?"

National/Regional Emission Factor Sources:

| Region | Resource | Coverage |
|---|---|---|
| United States | EPA Emission Factor Hub | Air pollutants, GHGs |
| United States | eGRID | Electricity grid by region |
| Europe | EEA Emission Inventories | Air pollutants by country |
| Europe | EF Database (via Nexus) | Product Environmental Footprint |
| International | IEA Statistics | Energy, electricity by country |
| International | IPCC Emission Factor Database | GHG by sector/activity |
| Japan | IDEA Database (AIST) | Comprehensive national data |
| China | CLCD (Chinese Life Cycle Database) | Chinese processes |
| Australia | National Greenhouse Accounts | GHG factors |
| India | BEE, CEA | Energy, electricity |

Electricity grid mixes (critical!):

| Source | Coverage | Access |
|---|---|---|
| ecoinvent | 60+ countries | Paid |
| IEA | 150+ countries | Paid (reports) |
| Ember Climate | 200+ countries | Free (data explorer) |
| ENTSO-E | Europe, hourly | Free |
| EPA eGRID | US by subregion | Free |

Creating custom regional data:

When no source exists:

  1. Get the activity data (national statistics)
  2. Find emission factors from similar regions
  3. Combine with local energy mix
  4. Document assumptions explicitly

Example: Regional manufacturing data

You need impacts for steel production in Country X (no database coverage).

Approach:

1. Get Country X electricity mix
2. Get Country X fuel mix for industry
3. Take European steel process from ecoinvent
4. Replace the electricity and fuel inputs with the Country X mixes
5. Adjust transport distances if significant
6. Apply uncertainty factors
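
The adaptation steps above amount to keeping the proxy process's quantities and swapping the emission factors of its energy inputs. The sketch below shows that arithmetic only. Every number in it is a placeholder rather than real steel or grid data, and `regionalize`, the dictionary layout, and the ±30% uncertainty are hypothetical choices for illustration.

```python
def regionalize(process, region_factors):
    """Recompute a proxy process's GHG total with region-specific energy factors.

    process: dict with 'other_kg_co2e' (non-energy contributions per unit output)
             and 'energy_inputs' (input name -> amount per unit output)
    region_factors: input name -> emission factor for the target region
    """
    total = process["other_kg_co2e"]
    for name, amount in process["energy_inputs"].items():
        total += amount * region_factors[name]
    return total


# Placeholder values per kg of steel (NOT real data)
proxy_steel = {
    "other_kg_co2e": 1.2,
    "energy_inputs": {"electricity_kwh": 0.5, "heat_mj": 10.0},
}
europe = {"electricity_kwh": 0.3, "heat_mj": 0.06}     # kg CO2e per kWh / per MJ
country_x = {"electricity_kwh": 0.8, "heat_mj": 0.07}  # kg CO2e per kWh / per MJ

baseline = regionalize(proxy_steel, europe)
adapted = regionalize(proxy_steel, country_x)

uncertainty = 0.30  # assumed +/-30% on the adapted value
low, high = adapted * (1 - uncertainty), adapted * (1 + uncertainty)

print(f"European proxy: {baseline:.2f} kg CO2e/kg")
print(f"Country X estimate: {adapted:.2f} kg CO2e/kg (range {low:.2f} to {high:.2f})")
```

In LCA software the same effect is achieved by copying the proxy process and relinking its electricity and fuel inputs to Country X providers. The point of the sketch is that the process quantities stay fixed while the background factors change.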

"How do I handle data gaps when suppliers won't share information?"

This is common. Suppliers may refuse due to:

  • Confidentiality concerns
  • Lack of LCA capability
  • Not understanding the request
  • Fear of liability

Strategies for uncooperative suppliers:

Level 1: Make it easier

  • Send a simple questionnaire (not full LCI forms)
  • Ask for publicly available data (EPDs, CSR reports)
  • Explain that approximate data is acceptable
  • Offer to sign an NDA

Level 2: Use public information

  • Check if supplier has published EPDs
  • Search for sustainability reports
  • Look for industry-average data
  • Check trade association statistics

Level 3: Estimate from product information

  • Use bills of materials (known from purchasing)
  • Infer from product specifications
  • Apply industry-average conversion efficiencies
  • Use weight and material type for proxy selection

Level 4: Use conservative assumptions

  • Assume worst-case scenarios
  • Use highest emission factors
  • Document as "upper bound estimate"

Level 5: Sensitivity analysis

  • Test if supplier data would change conclusions
  • If minor impact (<5%), generic data is acceptable
  • If major impact (>20%), flag as key uncertainty
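
The sensitivity test can be run as a simple what-if calculation: replace the generic value for the supplier's input with a plausible low and high value and see how far the total moves. The sketch below assumes a single additive impact total. The function names and example numbers are hypothetical, and the label for the 5-20% band is an assumption, since the thresholds above do not cover it.

```python
def supplier_data_sensitivity(baseline_total, generic_value, low_value, high_value):
    """Largest relative change in the total when the generic value for one
    supplier input is replaced by a plausible low or high value."""
    changes = []
    for alternative in (low_value, high_value):
        new_total = baseline_total - generic_value + alternative
        changes.append(abs(new_total - baseline_total) / baseline_total)
    return max(changes)


def classify(change):
    """Map a relative change to the thresholds used in this lesson."""
    if change < 0.05:
        return "generic data acceptable"
    if change > 0.20:
        return "key uncertainty"
    return "in between: document and use judgment"  # assumed label


# Placeholder numbers: product total of 100 kg CO2e
minor = supplier_data_sensitivity(100.0, generic_value=10.0, low_value=7.0, high_value=14.0)
major = supplier_data_sensitivity(100.0, generic_value=30.0, low_value=15.0, high_value=60.0)

print(f"Input A: {minor:.0%} -> {classify(minor)}")
print(f"Input B: {major:.0%} -> {classify(major)}")
```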

Supplier engagement template:

Dear [Supplier],

We're conducting an environmental assessment of our products
and would appreciate basic information about [product name].

Specifically, we need:
1. Primary materials and approximate quantities
2. Energy source for manufacturing (electricity, gas, etc.)
3. Location of production
4. Any existing environmental certifications (ISO 14001, EPD)

This information will be used internally for product improvement.
Data can be approximate and will be treated as confidential.

[Simple questionnaire attached - 1 page maximum]

"What do I do when my specific process isn't in any database?"

This happens often with:

  • Novel technologies
  • Small-scale/artisanal processes
  • Emerging materials
  • Specialized manufacturing

Building a custom process dataset:

Step 1: Map the process

Inputs:               Process:              Outputs:
- Raw materials  →    [Your Process]   →   - Main product
- Energy                                    - Byproducts
- Auxiliaries                               - Emissions
- Water                                     - Waste

Step 2: Collect what you can measure

  • Energy bills (electricity, gas)
  • Material purchase records
  • Waste manifests
  • Water bills
  • Product output quantities

Step 3: Estimate what you can't measure

  • Direct emissions from combustion (use emission factors)
  • Fugitive emissions (industry guidelines)
  • Wastewater quality (industry averages)

Step 4: Link to background data

Your measured inputs connect to database processes:

  • Electricity → Your regional grid mix
  • Natural gas → Database natural gas supply
  • Steel → Database steel production

Example: Custom manufacturing process

You make specialty widgets. No database process exists.

Your measurements:

| Flow | Quantity per 1,000 widgets |
|---|---|
| Steel input | 50 kg |
| Electricity | 200 kWh |
| Natural gas | 100 MJ |
| Scrap output | 5 kg |

Your custom LCI:

Unit process: Widget manufacturing

Inputs:
- Steel, hot rolled: 50 kg [from ecoinvent]
- Electricity, medium voltage: 200 kWh [regional grid]
- Natural gas, burned: 100 MJ [from ecoinvent]

Outputs:
- Widget: 1,000 units
- Steel scrap: 5 kg [to recycling]
- CO2 from gas: 5.6 kg [calculated from combustion]
- Heat: ~90 MJ [waste heat, usually ignored]
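
The calculated CO2 figure in this LCI comes from multiplying the gas input by a combustion emission factor. The sketch below reproduces that arithmetic and derives a few per-widget values. The factor of 0.056 kg CO2 per MJ is an assumed approximate default for natural gas (about 56 kg CO2/GJ), so use the factor from your fuel supplier or national inventory where available. The variable names are illustrative.

```python
# Assumed approximate combustion factor for natural gas (about 56 kg CO2 per GJ)
NATURAL_GAS_KG_CO2_PER_MJ = 0.056

# Measurements per 1,000 widgets, taken from the example above
widgets = 1000
steel_in_kg = 50.0
electricity_kwh = 200.0
gas_mj = 100.0
scrap_kg = 5.0

co2_kg = gas_mj * NATURAL_GAS_KG_CO2_PER_MJ   # direct emission from burning the gas
steel_in_product_kg = steel_in_kg - scrap_kg  # simple mass balance on steel
scrap_rate = scrap_kg / steel_in_kg

print(f"CO2 from gas: {co2_kg:.1f} kg per {widgets} widgets")
print(f"Steel in product: {steel_in_product_kg:.0f} kg (scrap rate {scrap_rate:.0%})")
print(f"Per widget: {steel_in_kg / widgets * 1000:.0f} g steel, "
      f"{electricity_kwh / widgets:.2f} kWh electricity, "
      f"{co2_kg / widgets * 1000:.1f} g direct CO2")
```

Checking that steel in equals steel in product plus scrap is a quick way to catch unit or transcription errors before the dataset is linked to background processes.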

Data Quality Hierarchy

When filling gaps, prefer data sources in this order:

| Priority | Data Type | Example |
|---|---|---|
| 1 | Measured primary data | Your factory meters |
| 2 | Supplier-specific data | Supplier EPD |
| 3 | Regional industry average | National industry association |
| 4 | Technology-matched proxy | Same process, different region |
| 5 | Generic proxy | Similar process type |
| 6 | Expert estimation | Stoichiometry, engineering judgment |
| 7 | Literature values | Peer-reviewed studies |

Documenting Data Gaps

Create a data gap register:

| Gap | Approach | Proxy Source | Uncertainty |
|---|---|---|---|
| Specialty coating | Similar polymer | ecoinvent acrylic | ±50% |
| Supplier electricity | Regional average | IEA data | ±20% |
| Transport distance | Estimate | Google Maps | ±30% |
| Waste treatment | Industry average | Literature | ±40% |
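
A register like this is easier to maintain and report from if it is kept as structured data rather than free text. The sketch below holds the four example entries in a small dataclass and sorts them by uncertainty so the weakest data points surface first. The class and field names are hypothetical, and a spreadsheet with the same columns would serve equally well.

```python
from dataclasses import dataclass


@dataclass
class DataGap:
    gap: str
    approach: str
    proxy_source: str
    uncertainty_pct: float  # symmetric +/- percentage


# Entries copied from the example register above
register = [
    DataGap("Specialty coating", "Similar polymer", "ecoinvent acrylic", 50),
    DataGap("Supplier electricity", "Regional average", "IEA data", 20),
    DataGap("Transport distance", "Estimate", "Google Maps", 30),
    DataGap("Waste treatment", "Industry average", "Literature", 40),
]

# Highest uncertainty first: candidates for sensitivity analysis or better data
by_uncertainty = sorted(register, key=lambda g: g.uncertainty_pct, reverse=True)

for g in by_uncertainty:
    print(f"{g.gap}: {g.approach} ({g.proxy_source}), +/-{g.uncertainty_pct:.0f}%")
```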

Include in your report:

  1. Percentage of data that is primary vs. secondary
  2. Key data gaps and how they were addressed
  3. Impact of gaps on results (sensitivity analysis)
  4. Recommendations for future data improvement

Practical Data Collection Tools

Simple Supplier Questionnaire

SUPPLIER DATA FORM

Product: ________________________
Date: __________________________

1. PRIMARY MATERIALS
   Material 1: _______ Amount per unit: _______
   Material 2: _______ Amount per unit: _______
   Material 3: _______ Amount per unit: _______

2. ENERGY (per unit of product)
   Electricity: _______ kWh
   Natural gas: _______ m³
   Other fuel: _______ (specify)

3. MANUFACTURING LOCATION
   Country: _______
   Region: _______

4. CERTIFICATIONS
   □ ISO 14001  □ EPD available  □ Other: _______

5. PACKAGING
   Type: _______ Weight: _______

Notes: ________________________________

Materiality Screening

Before investing effort in gap-filling, screen for importance:

Quick test:

Rough impact = Amount × typical emission factor

If rough impact < 1% of total, use generic proxy
If rough impact > 10% of total, invest in better data
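
The quick test is easy to automate across a bill of materials. The sketch below applies the 1% and 10% thresholds to a few inputs. The function name, the example amounts and emission factors, and the label for the 1-10% band are all placeholder assumptions, not reference data.

```python
def screen(amount, emission_factor, total_impact):
    """Return (share of total, recommendation) for one input flow."""
    rough_impact = amount * emission_factor
    share = rough_impact / total_impact
    if share < 0.01:
        return share, "use generic proxy"
    if share > 0.10:
        return share, "invest in better data"
    return share, "in between: use judgment"  # assumed label


# Placeholder inputs: (amount, emission factor in kg CO2e per unit)
total_impact = 500.0  # rough total in kg CO2e, e.g. from a first-pass model
inputs = {
    "adhesive (kg)": (0.5, 4.0),
    "steel (kg)": (50.0, 2.0),
    "cardboard (kg)": (10.0, 1.0),
}

results = {name: screen(amount, ef, total_impact) for name, (amount, ef) in inputs.items()}
for name, (share, advice) in results.items():
    print(f"{name}: {share:.1%} of total -> {advice}")
```

Because the total is itself a rough estimate at this stage, it is worth re-running the screen once better data for the largest contributors is in place.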

Uncertainty Scoring

Apply pedigree matrix scores:

| Criterion | Score 1 (best) | Score 5 (worst) |
|---|---|---|
| Reliability | Verified primary data | Non-qualified estimate |
| Completeness | All flows measured | Limited sampling |
| Temporal | < 3 years old | > 15 years old |
| Geographic | Same region | Unknown origin |
| Technology | Same technology | Different technology |

Key Takeaways

  1. Data gaps are normal—manage them, don't hide them
  2. Use proxy data wisely—choose closest match, document adjustments
  3. Regional data exists—government and industry sources often have emission factors
  4. Engage suppliers early—simple questionnaires work better than complex forms
  5. Build custom datasets when needed—primary measurements + database background
  6. Prioritize by materiality—invest effort where it matters most
  7. Document everything—transparency builds credibility

Gap-Filling Checklist

When addressing any data gap:

☐ Identify the gap and its potential impact
☐ Search for closest proxy in primary database
☐ Check secondary databases and literature
☐ Contact suppliers or industry associations
☐ Estimate using stoichiometry or scaling if needed
☐ Apply uncertainty factors
☐ Document source, adjustments, and rationale
☐ Test sensitivity to the proxy choice
☐ Flag significant gaps in the report


Next Steps

With data collection strategies in hand, the next lesson covers LCIA Method Selection—choosing the right impact assessment methodology for your study.