Data Collection & Gap-Filling Strategies
Practical strategies for collecting LCA data, estimating missing values, handling uncooperative suppliers, and dealing with processes not in any database.
"How do I estimate missing data for raw materials?" and "What do I do when my specific process isn't in any database?" are among the most practical challenges every LCA practitioner faces. This guide provides battle-tested strategies for real-world data collection.
The Data Collection Reality
Ideal scenario: Complete primary data for all processes.
Reality: You'll have gaps, approximations, and missing information.
The good news: Every LCA has data gaps. The key is managing them transparently and minimizing their impact on conclusions.
FAQ: Data Collection Challenges
"How do I estimate missing data for raw materials?"
Strategy 1: Use proxy data
Find a similar material in your database:
| Missing Material | Possible Proxy | Adjustment Needed |
|---|---|---|
| Specialty polymer | Generic polymer family | Scale by density/properties |
| Regional steel | Global/EU steel | Adjust energy mix |
| Exotic wood species | Similar hardwood | May be acceptable as-is |
| Custom alloy | Base metal + additives | Combine datasets |
Example: Estimating a specialty plastic
You need data for PEEK (polyether ether ketone), but your database only has "generic engineering plastic."
Approach:
- Find PEEK's monomer chemistry (literature)
- Use stoichiometry to estimate feedstock requirements
- Adjust energy consumption based on processing temperature
- Apply uncertainty factor (±30-50%)
Strategy 2: Use stoichiometric calculations
For simple chemical reactions, calculate theoretical inputs:
Reaction: A + B → C + byproduct
If you know the reaction equation:
- Calculate mass balance
- Add process energy (estimate from similar reactions)
- Add emissions (from stoichiometry + combustion)
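As a minimal sketch of the mass-balance step, the snippet below uses limestone calcination (CaCO3 → CaO + CO2) as an illustrative reaction that is not one of this guide's examples. The function name `mass_per_kg_product` is hypothetical, and complete conversion is assumed; a real process needs a yield correction plus a separate estimate of process energy.

```python
# Illustrative mass balance for CaCO3 -> CaO + CO2 (calcination).
# Molar masses in g/mol. Assumption: 100% conversion, no side reactions.
MOLAR_MASS = {"CaCO3": 100.09, "CaO": 56.08, "CO2": 44.01}


def mass_per_kg_product(species, product, coeff=1.0, product_coeff=1.0):
    """Return kg of `species` consumed or released per kg of `product`."""
    return (coeff * MOLAR_MASS[species]) / (product_coeff * MOLAR_MASS[product])


limestone_in = mass_per_kg_product("CaCO3", "CaO")  # feedstock requirement
co2_out = mass_per_kg_product("CO2", "CaO")         # process emission

print(f"CaCO3 input: {limestone_in:.3f} kg per kg CaO")
print(f"Process CO2: {co2_out:.3f} kg per kg CaO")
```

Checking that inputs minus by-products equals one kilogram of product is a quick way to catch a wrong coefficient before the numbers enter the inventory.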
Strategy 3: Scale from known processes
Use scaling factors:
| Property | Scaling Relationship |
|---|---|
| Energy | Often scales with mass or temperature |
| Process time | Often roughly linear with batch size |
| Emissions | Proportional to energy/material use |
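The simplest of these relationships, linear scaling by mass, can be sketched as follows. The function name `scale_by_mass`, the reference values, and the ±30% band are all hypothetical; linear scaling is itself an assumption that should be checked against the actual process.

```python
def scale_by_mass(reference_energy_kwh, reference_mass_kg, target_mass_kg,
                  uncertainty=0.3):
    """Linearly scale a known energy demand by mass.

    Returns (low, central, high), where low/high apply a symmetric
    +/- `uncertainty` fraction around the central estimate.
    """
    if reference_mass_kg <= 0:
        raise ValueError("reference mass must be positive")
    central = reference_energy_kwh * target_mass_kg / reference_mass_kg
    return central * (1 - uncertainty), central, central * (1 + uncertainty)


# Hypothetical: a known process uses 200 kWh for 50 kg; estimate for 80 kg.
low, central, high = scale_by_mass(200.0, 50.0, 80.0)
print(f"Estimated energy: {central:.0f} kWh (range {low:.0f} to {high:.0f})")
```

Carrying the low and high values through the model, not just the central one, makes the later sensitivity analysis straightforward.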
Strategy 4: Use industry data
Sources for material emission factors:
- Trade association reports (steel, aluminum, plastics, cement)
- Industry EPD averages
- Academic literature
- Government inventories (EPA, EEA)
Document everything. When using proxy or estimated data, record: the source, the rationale for selection, any adjustments made, and the estimated uncertainty.
"Where can I find emission factors for my specific region/country?"
National/Regional Emission Factor Sources:
| Region | Resource | Coverage |
|---|---|---|
| United States | EPA Emission Factor Hub | Air pollutants, GHGs |
| United States | eGRID | Electricity grid by region |
| Europe | EEA Emission Inventories | Air pollutants by country |
| Europe | EF Database (via Nexus) | Product Environmental Footprint |
| International | IEA Statistics | Energy, electricity by country |
| International | IPCC Emission Factor Database | GHG by sector/activity |
| Japan | IDEA Database (AIST) | Comprehensive national data |
| China | CLCD (Chinese Life Cycle Database) | Chinese processes |
| Australia | National Greenhouse Accounts | GHG factors |
| India | BEE, CEA | Energy, electricity |
Electricity grid mixes (critical!):
| Source | Coverage | Access |
|---|---|---|
| ecoinvent | 60+ countries | Paid |
| IEA | 150+ countries | Paid (reports) |
| Ember Climate | 200+ countries | Free (data explorer) |
| ENTSO-E | Europe hourly | Free |
| EPA eGRID | US by subregion | Free |
Creating custom regional data:
When no source exists:
- Get the activity data (national statistics)
- Find emission factors from similar regions
- Combine with local energy mix
- Document assumptions explicitly
Example: Regional manufacturing data
You need impacts for steel production in Country X (no database coverage).
Approach:
1. Get Country X electricity mix
2. Get Country X fuel mix for industry
3. Take European steel process from ecoinvent
4. Replace electricity input with Country X mix
5. Adjust transport distances if significant
6. Apply uncertainty factors
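Steps 1, 3, and 4 can be sketched in a few lines. Every number below is a placeholder rather than real data, and the names `grid_factor` and `swap_electricity` are hypothetical helpers; in practice LCA software performs this substitution inside the process model.

```python
# All values are hypothetical placeholders, not real data.
GRID_MIX_COUNTRY_X = {"coal": 0.50, "gas": 0.30, "hydro": 0.20}  # shares
SOURCE_FACTORS = {"coal": 1.0, "gas": 0.5, "hydro": 0.02}        # kg CO2e/kWh


def grid_factor(mix, factors):
    """Weighted-average emission factor of a grid mix (kg CO2e/kWh)."""
    if abs(sum(mix.values()) - 1.0) > 1e-6:
        raise ValueError("grid mix shares must sum to 1")
    return sum(share * factors[source] for source, share in mix.items())


def swap_electricity(total_impact, electricity_kwh, old_factor, new_factor):
    """Replace the electricity contribution of a proxy process."""
    return total_impact + electricity_kwh * (new_factor - old_factor)


country_x_factor = grid_factor(GRID_MIX_COUNTRY_X, SOURCE_FACTORS)

# Hypothetical proxy: 2.0 kg CO2e per kg steel, of which 0.5 kWh/kg is
# electricity modelled with a 0.3 kg CO2e/kWh grid.
adapted = swap_electricity(2.0, 0.5, 0.3, country_x_factor)
print(f"Country X grid: {country_x_factor:.3f} kg CO2e/kWh")
print(f"Adapted steel impact: {adapted:.3f} kg CO2e/kg")
```

Only the electricity contribution changes here; fuel mix and transport adjustments (steps 2 and 5) would follow the same subtract-and-replace pattern.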
"How do I handle data gaps when suppliers won't share information?"
This is common. Suppliers may refuse due to:
- Confidentiality concerns
- Lack of LCA capability
- Not understanding the request
- Fear of liability
Strategies for uncooperative suppliers:
Level 1: Make it easier
- Send a simple questionnaire (not full LCI forms)
- Ask for publicly available data (EPDs, CSR reports)
- Explain that approximate data is acceptable
- Offer to sign an NDA
Level 2: Use public information
- Check if supplier has published EPDs
- Search for sustainability reports
- Look for industry-average data
- Check trade association statistics
Level 3: Estimate from product information
- Use bills of materials (known from purchasing)
- Infer from product specifications
- Apply industry-average conversion efficiencies
- Use weight and material type for proxy selection
Level 4: Use conservative assumptions
- Assume worst-case scenarios
- Use highest emission factors
- Document as "upper bound estimate"
Level 5: Sensitivity analysis
- Test whether plausible supplier values would change your conclusions
- If the effect on results is minor (<5%), generic data is acceptable
- If the effect is major (>20%), flag it as a key uncertainty
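One way to run this test is to bracket the unknown supplier value with a low and a high estimate and see how far the total moves. The sketch below applies the <5% and >20% thresholds from the list above; the function name `classify_supplier_gap`, the example numbers, and the handling of the band between the thresholds are assumptions.

```python
def classify_supplier_gap(baseline_total, generic_value, low_value, high_value):
    """Classify a gap by the largest relative change in the total result.

    `generic_value` is the contribution currently modelled with generic data;
    `low_value` and `high_value` bracket what the supplier's data might be.
    """
    if baseline_total <= 0:
        raise ValueError("baseline total must be positive")
    max_change = max(abs(value - generic_value) / baseline_total
                     for value in (low_value, high_value))
    if max_change < 0.05:
        label = "generic data acceptable"
    elif max_change > 0.20:
        label = "key uncertainty"
    else:
        # Assumption: the guide does not define this middle band.
        label = "judgment call: report the range"
    return max_change, label


# Hypothetical totals in kg CO2e per functional unit.
minor = classify_supplier_gap(100.0, 10.0, 8.0, 13.0)
major = classify_supplier_gap(100.0, 30.0, 15.0, 60.0)
print(minor, major)
```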
For EPDs: Critical review typically requires at least an attempt at primary data for foreground processes. Document your efforts to obtain data even if unsuccessful.
Supplier engagement template:
Dear [Supplier],
We're conducting an environmental assessment of our products
and would appreciate basic information about [product name].
Specifically, we need:
1. Primary materials and approximate quantities
2. Energy source for manufacturing (electricity, gas, etc.)
3. Location of production
4. Any existing environmental certifications (ISO 14001, EPD)
This information will be used internally for product improvement.
Data can be approximate and will be treated as confidential.
[Simple questionnaire attached - 1 page maximum]
"What do I do when my specific process isn't in any database?"
This happens often with:
- Novel technologies
- Small-scale/artisanal processes
- Emerging materials
- Specialized manufacturing
Building a custom process dataset:
Step 1: Map the process
| Inputs | Process | Outputs |
|---|---|---|
| Raw materials | → [Your Process] → | Main product |
| Energy | | Byproducts |
| Auxiliaries | | Emissions |
| Water | | Waste |
Step 2: Collect what you can measure
- Energy bills (electricity, gas)
- Material purchase records
- Waste manifests
- Water bills
- Product output quantities
Step 3: Estimate what you can't measure
- Direct emissions from combustion (use emission factors)
- Fugitive emissions (industry guidelines)
- Wastewater quality (industry averages)
Step 4: Link to background data
Your measured inputs connect to database processes:
- Electricity → Your regional grid mix
- Natural gas → Database natural gas supply
- Steel → Database steel production
Example: Custom manufacturing process
You make specialty widgets. No database process exists.
Your measurements:
| Flow | Quantity per 1,000 widgets |
|---|---|
| Steel input | 50 kg |
| Electricity | 200 kWh |
| Natural gas | 100 MJ |
| Scrap output | 5 kg |
Your custom LCI:
Unit process: Widget manufacturing
Inputs:
- Steel, hot rolled: 50 kg [from ecoinvent]
- Electricity, medium voltage: 200 kWh [regional grid]
- Natural gas, burned: 100 MJ [from ecoinvent]
Outputs:
- Widget: 1,000 units
- Steel scrap: 5 kg [to recycling]
- CO2 from gas: 5.6 kg [calculated from combustion]
- Heat: ~90 MJ [waste heat, usually ignored]
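The calculated values in this inventory can be reproduced with a short script. The natural gas factor of 0.0561 kg CO2 per MJ is an assumption taken from the IPCC default for stationary combustion (56,100 kg/TJ); use the factor that matches your fuel and heating-value basis.

```python
# Assumption: IPCC default CO2 factor for natural gas combustion.
NATURAL_GAS_CO2_KG_PER_MJ = 0.0561

# Measured flows per batch of 1,000 widgets (from the table above).
batch = {
    "widgets": 1000,
    "steel_kg": 50.0,
    "electricity_kwh": 200.0,
    "natural_gas_mj": 100.0,
    "scrap_kg": 5.0,
}

# Direct combustion emission that was not measured on site.
co2_from_gas = batch["natural_gas_mj"] * NATURAL_GAS_CO2_KG_PER_MJ

# Share of steel input leaving as scrap: a quick plausibility check.
scrap_rate = batch["scrap_kg"] / batch["steel_kg"]

# Normalise every flow to one widget for use as a unit process.
per_widget = {flow: amount / batch["widgets"]
              for flow, amount in batch.items() if flow != "widgets"}

print(f"CO2 from gas: {co2_from_gas:.1f} kg per 1,000 widgets")
print(f"Scrap rate: {scrap_rate:.0%}")
print(per_widget)
```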
Data Quality Hierarchy
When filling gaps, prefer data sources in this order:
| Priority | Data Type | Example |
|---|---|---|
| 1 | Measured primary data | Your factory meters |
| 2 | Supplier-specific data | Supplier EPD |
| 3 | Regional industry average | National industry association |
| 4 | Technology-matched proxy | Same process, different region |
| 5 | Generic proxy | Similar process type |
| 6 | Expert estimation | Stoichiometry, engineering judgment |
| 7 | Literature values | Peer-reviewed studies |
Documenting Data Gaps
Create a data gap register:
| Gap | Approach | Proxy Source | Uncertainty |
|---|---|---|---|
| Specialty coating | Similar polymer | ecoinvent acrylic | ±50% |
| Supplier electricity | Regional average | IEA data | ±20% |
| Transport distance | Estimate | Google Maps | ±30% |
| Waste treatment | Industry average | Literature | ±40% |
Include in your report:
- Percentage of data that is primary vs. secondary
- Key data gaps and how they were addressed
- Impact of gaps on results (sensitivity analysis)
- Recommendations for future data improvement
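A register like this is easy to keep as structured data and export for the report. The sketch below is one possible layout, not a required format; the class `GapEntry` and the function `to_markdown` are hypothetical names, and the entries repeat the example rows from the register above.

```python
from dataclasses import dataclass


@dataclass
class GapEntry:
    gap: str
    approach: str
    proxy_source: str
    uncertainty: float  # symmetric +/- fraction, e.g. 0.5 for ±50%


REGISTER = [
    GapEntry("Specialty coating", "Similar polymer", "ecoinvent acrylic", 0.50),
    GapEntry("Supplier electricity", "Regional average", "IEA data", 0.20),
    GapEntry("Transport distance", "Estimate", "Google Maps", 0.30),
    GapEntry("Waste treatment", "Industry average", "Literature", 0.40),
]


def to_markdown(entries):
    """Render the register as a table, most uncertain gaps first."""
    lines = ["| Gap | Approach | Proxy Source | Uncertainty |",
             "|---|---|---|---|"]
    for e in sorted(entries, key=lambda e: e.uncertainty, reverse=True):
        lines.append(f"| {e.gap} | {e.approach} | {e.proxy_source} "
                     f"| ±{e.uncertainty:.0%} |")
    return "\n".join(lines)


table = to_markdown(REGISTER)
print(table)
```

Sorting by uncertainty puts the gaps most worth revisiting at the top of the report table.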
Practical Data Collection Tools
Simple Supplier Questionnaire
SUPPLIER DATA FORM
Product: ________________________
Date: __________________________
1. PRIMARY MATERIALS
Material 1: _______ Amount per unit: _______
Material 2: _______ Amount per unit: _______
Material 3: _______ Amount per unit: _______
2. ENERGY (per unit of product)
Electricity: _______ kWh
Natural gas: _______ m³
Other fuel: _______ (specify)
3. MANUFACTURING LOCATION
Country: _______
Region: _______
4. CERTIFICATIONS
□ ISO 14001 □ EPD available □ Other: _______
5. PACKAGING
Type: _______ Weight: _______
Notes: ________________________________
Materiality Screening
Before investing effort in gap-filling, screen for importance:
Quick test:
Rough impact = Amount × typical emission factor
If rough impact < 1% of total, use generic proxy
If rough impact > 10% of total, invest in better data
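The quick test translates directly into a small function. The name `screen_gap` and the example numbers are hypothetical, and the treatment of shares between 1% and 10% is an assumption, since the rule above only defines the two ends.

```python
def screen_gap(amount, emission_factor, total_impact):
    """Rough materiality screen for a single data gap.

    Returns (share_of_total, recommendation).
    """
    if total_impact <= 0:
        raise ValueError("total impact must be positive")
    share = amount * emission_factor / total_impact
    if share < 0.01:
        return share, "use generic proxy"
    if share > 0.10:
        return share, "invest in better data"
    # Assumption: the guide leaves the 1-10% band open.
    return share, "intermediate: proxy plus sensitivity check"


# Hypothetical: total footprint of 1,000 kg CO2e per functional unit.
small = screen_gap(amount=2.0, emission_factor=3.0, total_impact=1000.0)
large = screen_gap(amount=50.0, emission_factor=2.5, total_impact=1000.0)
print(small, large)
```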
Uncertainty Scoring
Apply pedigree matrix scores:
| Criterion | Score 1 (best) | Score 5 (worst) |
|---|---|---|
| Reliability | Verified primary data | Non-qualified estimate |
| Completeness | All flows measured | Limited sampling |
| Temporal | < 3 years old | > 15 years old |
| Geographic | Same region | Unknown origin |
| Technology | Same technology | Different technology |
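Pedigree scores can be stored alongside each dataset so weak spots are easy to find. The sketch below only validates the 1 to 5 scale and lists criteria at or above a chosen threshold; the class `PedigreeScore` and the threshold of 4 are assumptions, and converting scores into uncertainty factors depends on your database's documentation and is not shown here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PedigreeScore:
    reliability: int
    completeness: int
    temporal: int
    geographic: int
    technology: int

    def __post_init__(self):
        # Every criterion must sit on the 1 (best) to 5 (worst) scale.
        for name, value in asdict(self).items():
            if value not in range(1, 6):
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def weak_criteria(self, threshold=4):
        """Names of criteria scored at or above `threshold`."""
        return [name for name, value in asdict(self).items()
                if value >= threshold]


# Hypothetical proxy dataset: recent data, wrong region and technology.
score = PedigreeScore(reliability=2, completeness=3, temporal=1,
                      geographic=4, technology=5)
weak = score.weak_criteria()
print(weak)
```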
Key Takeaways
- Data gaps are normal—manage them, don't hide them
- Use proxy data wisely—choose closest match, document adjustments
- Regional data exists—government and industry sources often have emission factors
- Engage suppliers early—simple questionnaires work better than complex forms
- Build custom datasets when needed—primary measurements + database background
- Prioritize by materiality—invest effort where it matters most
- Document everything—transparency builds credibility
Gap-Filling Checklist
When addressing any data gap:
☐ Identify the gap and its potential impact
☐ Search for closest proxy in primary database
☐ Check secondary databases and literature
☐ Contact suppliers or industry associations
☐ Estimate using stoichiometry or scaling if needed
☐ Apply uncertainty factors
☐ Document source, adjustments, and rationale
☐ Test sensitivity to the proxy choice
☐ Flag significant gaps in the report
Next Steps
With data collection strategies in hand, the next lesson covers LCIA Method Selection—choosing the right impact assessment methodology for your study.