Semantic Matter

OpenAI v1-Compatible API for R&D

From Lab Notebooks toMachine-Readable Knowledge Graphs

Turn mixed lab records into FAIR-aligned data assets. Blobfish helps teams move from experiment records to analysis-ready data using repeatable extraction and mapping workflows.

FAIR by Design

Findable, Accessible, Interoperable, and Reusable data from day one.

Automated Pipelines

Connect captured metadata to modeling and analytics workflows.

Defensible IP

Audit-ready provenance and consistent vocabulary alignment.

Industrial R&D data in practice

Even with major digital programs, lab context is often still spread across logbooks, mixed-language notes, PDF reports, and equipment files.

xKnowledge loss when experts leave the organization
xSiloed data that resists cross-department analysis
xHeavy manual burden for QA and regulatory compliance
xInability to feed digital twins with reliable, structured inputs

Multi-Modal Input Data

Unstructured Information to Graph

Blobfish converts scans, tables, logs, and reports into structured metadata that teams can query, review, and reuse.

Blobfish image from presentation cover slide
The blobfish thrives in its natural habitat, adapted to extreme high-pressure conditions. Photograph By National Science Foundation-Ocean Observatory Initiative/University of Washington/CSSF

SemanticMatter lab assistant

Use a focused API workflow to ingest, interpret, and structure R&D records into a queryable knowledge graph.

01

Ingest

Scanned pages, scientific articles, equipment logs, and databases.

02

Identify

Extract key entities, variables, parameters, and relationships automatically.

03

Map

Align local terms to standard ontologies and create a unified schema.

04

Serve

Deliver FAIR-aligned data to downstream tools and decision makers.

Core capabilities

Built for scientific data workflows that need repeatable extraction, mapping, and review.

Metadata Generation

  • Text-to-Entity from tables, reports, and SQL.
  • Handles unstructured free text and source code.
  • Automated context extraction.

Provenance Capture

  • Human-in-the-loop validation workflows.
  • Visual provenance graph proposals.
  • Audit-ready history tracking.

Ontology Alignment

  • Text-to-OWL generation.
  • Align local terminology to standard vocabularies.
  • Community standard compliance.

Industrial Knowledge Graph

  • Cross-department Q&A and discovery.
  • Run graph queries for cross-team analysis.
  • Keep semantic context attached to the data.

OpenAI v1-Compatible API

  • Connect to existing LLM infrastructure.
  • Use standard orchestrators such as LangChain.
  • Route supported tasks to Blobfish handlers.
  • Run in controlled enterprise environments.

From the presentation

Selected screenshots from the Blobfish presentation showing the architecture and model-handler workflow in action.

Blobfish Component Architecture

Blobfish Component Architecture

OpenAI-compatible interface, model handlers, and specialized generator stack.

From lab records to decisions

Move faster with a reproducible data pipeline.

Step 1

Lab Experiment

Raw Data Generation

Step 2

Blobfish Ingestion

Structure and Validate

Step 3

Knowledge Graph

FAIR-Aligned Asset

Step 4

Business Insight

Analytics and Decisions

Proven impact

Regulatory Documentation and Audit

Automate the creation of audit-ready provenance trails from scattered lab notes.

  • Reduced manual curation time by 70%
  • 100% traceability for compliance

Digital Twin Integration

Feed simulation frameworks with structured, validated, and unit-aligned data inputs.

  • Zero formatting errors in simulation inputs
  • Faster iteration cycles

Multi-Site Lab Harmonization

Align terminology and data structures across global R&D centers automatically.

  • Unified vocabulary across 5+ sites
  • Seamless cross-team data reuse

Research Program Overview

Enable 'what did we learn?' queries across thousands of past experiments.

  • Unlock dark data from past projects
  • Prevent redundant experiments

Why Blobfish Matters

Blobfish addresses data quality bottlenecks in industrial R&D where teams need structured FAIR-aligned data for analytics and AI.

Defensible IP

Proprietary model handlers and ontology workflows create a deep moat against generic LLM wrappers.

Scalable Infrastructure

OpenAI v1-compatible APIs reduce integration work in existing MLOps stacks.

Massive Market Need

Driven by regulatory pressure (FAIR mandates) and the need for clean data to fuel AI initiatives.

Strategic Expansion

A phased rollout strategy grows from marine operations into chemicals, materials, energy, manufacturing, and food tech.

Industrial Use Cases: Strategic Expansion

Phase 1

Offshore Asset Integrity and Marine Operations

SemanticMatter expands Blobfish into marine operations, providing a FAIR data backbone for offshore asset integrity. By turning inspection reports, sensor feeds, and maintenance histories into a unified knowledge graph, operators get traceable, machine-readable inputs for digital twins, risk models, and operational decision support.

Phase 2

Chemicals and Materials Science

SemanticMatter brings Blobfish into the heart of chemicals and materials R&D, creating a FAIR knowledge graph over experiments, formulations, and characterization data. By turning human-written lab notes and instrument outputs into machine-readable, ontology-aligned metadata, organizations gain a traceable memory of what was tried, what worked, and why, ready to power optimization models, scale-up decisions, and regulatory submissions.

Phase 3

Energy, Manufacturing and Food Tech

SemanticMatter extends Blobfish into Energy, Manufacturing, and Food Tech as a FAIR data spine connecting lab, plant, and quality systems. By converting recipes, process logs, and QA data into a machine-readable, ontology-aligned knowledge graph, organizations gain a unified view of how process decisions impact yield, quality, and compliance, fueling optimization models, digital twins, and trustworthy reporting.

How it works

1

Ingest

Upload files or stream data via API. We handle PDFs, images, SQL, and more.

2

Understand and Structure

LLM agents identify entities and relationships using domain-specific context.

3

Align and Enrich

Map local variables to standard ontologies (for example, QUDT and BFO) and enrich with external metadata.

4

Validate

Human-in-the-loop workflows allow experts to review and approve generated graphs.

5

Deploy

Push validated knowledge graphs to data lakes, digital twins, or analytics platforms.

Frequently asked questions

How does Blobfish integrate with our existing LLM stack?

Blobfish exposes an OpenAI v1-compatible API. You can switch the base URL in clients such as LangChain or LlamaIndex to route relevant tasks to specialized handlers.

What standards and ontologies do you support?

Blobfish supports community standards such as QUDT, BFO, and SOSA/SSN, plus custom OWL and RDF ontologies.

Can we keep all data on-premise or in a private cloud?

Yes. Blobfish can run in private VPC or on-premise infrastructure so data remains under your control.

How does human validation fit into the workflow?

A dedicated human-in-the-loop validation interface lets domain experts review, correct, and approve generated knowledge graphs.

What is required to start a pilot?

A typical pilot starts with a scoped dataset, such as 50-100 lab notebooks or reports, followed by a technical scoping session.

Bring order to lab data

Machine-readable FAIR data for modern R&D

Work with teams building reusable knowledge assets for analytics, digital twins, and AI.