CrawlKit data platform dashboard

Case study

Data assets with

CrawlKit reframes data engineering around trusted assets: contracts, lineage, durable execution, and AI recommendations you can audit.

Client

CrawlKit

Industry

AI-Native Data Engineering Platform

Timeline

North-star product architecture

Scope

Product Strategy, Platform Architecture, Data Engineering UX, AI Safety

Case Study, Data Engineering, AI, Restate, Arrow, Iceberg, OpenLineage

CrawlKit started as a question about trust. Modern data teams already have schedulers, warehouses, notebooks, scripts, and dashboards. What they usually do not have is a single place to answer the questions that matter when something breaks: what created this asset, what did it depend on, what contract was it supposed to satisfy, who owns it, and what should happen next?

NoScope’s work reframed CrawlKit from a narrow tooling idea into an AI-native data engineering platform. The north star is not another catalog or pipeline scheduler. It is trusted asset materialization: a workflow that takes source data, executes transformations with deterministic infrastructure, validates the output against contracts, commits versioned table state, emits lineage, and leaves behind evidence that both humans and AI agents can inspect.
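To make the materialization flow concrete, here is a minimal sketch in plain Python. It is not CrawlKit's implementation: `Contract`, `Table`, `materialize`, and the event fields are hypothetical names chosen for illustration, and an in-memory list stands in for versioned table state and the lineage stream. The point is only the ordering: transform, validate against the contract, commit a snapshot only when validation passes, and record evidence either way.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]


@dataclass
class Contract:
    """Hypothetical contract: required columns and their expected Python types."""
    required: dict[str, type]

    def violations(self, rows: list[Row]) -> list[str]:
        problems = []
        for i, row in enumerate(rows):
            for column, expected in self.required.items():
                if not isinstance(row.get(column), expected):
                    problems.append(f"row {i}: {column} is not {expected.__name__}")
        return problems


@dataclass
class Table:
    """Stand-in for snapshot-based table state: each commit appends a snapshot."""
    snapshots: list[list[Row]] = field(default_factory=list)

    def commit(self, rows: list[Row]) -> int:
        self.snapshots.append(rows)
        return len(self.snapshots)  # snapshot id, starting at 1


def materialize(asset: str, inputs: list[str], source_rows: list[Row],
                transform: Callable[[list[Row]], list[Row]],
                contract: Contract, table: Table, lineage_log: list[dict]) -> dict:
    rows = transform(source_rows)
    problems = contract.violations(rows)
    # The event is the evidence left behind, whether or not the commit happens.
    event = {"asset": asset, "inputs": inputs, "violations": problems}
    if problems:
        event["status"] = "rejected"
        event["snapshot_id"] = None
    else:
        event["status"] = "committed"
        event["snapshot_id"] = table.commit(rows)
    lineage_log.append(event)
    return event


contract = Contract(required={"order_id": int, "total": float})
table, lineage = Table(), []
source = [{"order_id": 1, "total": "19.90"}, {"order_id": 2, "total": "5.00"}]

def to_float(rows: list[Row]) -> list[Row]:
    return [{**r, "total": float(r["total"])} for r in rows]

ok = materialize("orders_clean", ["raw.orders"], source, to_float, contract, table, lineage)
bad = materialize("orders_clean", ["raw.orders"], source, lambda rows: rows, contract, table, lineage)
```

In this sketch the second run fails the contract because `total` is still a string, so no snapshot is committed, yet the rejection is still recorded in the lineage log for later inspection.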

The architecture centers on a control plane for assets, contracts, connectors, policies, approvals, and run history. Durable execution is handled by Restate workflows so retries are safe and long-running operations can be resumed without duplicating side effects. The data plane uses Arrow and DataFusion for efficient execution, Parquet for storage, Iceberg for snapshot-based table state, and OpenLineage/OpenTelemetry for provenance and observability.
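The retry guarantee described above can be illustrated with a small journal pattern. This sketch does not use the Restate SDK and makes no claim about its API; `Journal`, `run_once`, `commit_snapshot`, and `workflow` are hypothetical names, and the in-memory dictionary only stands in for storage that a real durable runtime would persist. It shows the idea that a completed step is replayed from its recorded result rather than executed again.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Journal:
    """Hypothetical journal: remembers the result of each completed step."""
    results: dict[str, object] = field(default_factory=dict)

    def run_once(self, key: str, step: Callable[[], object]) -> object:
        # If the step already completed under this key, replay its result
        # instead of executing the side effect again.
        if key not in self.results:
            self.results[key] = step()
        return self.results[key]


commits: list[str] = []  # stands in for an external side effect (a table commit)


def commit_snapshot(run_id: str) -> str:
    commits.append(run_id)
    return f"snapshot-for-{run_id}"


def workflow(journal: Journal, run_id: str, fail_after_commit: bool) -> str:
    snapshot = journal.run_once(f"{run_id}/commit", lambda: commit_snapshot(run_id))
    if fail_after_commit:
        raise RuntimeError("crash after commit, before lineage was emitted")
    journal.run_once(f"{run_id}/lineage", lambda: f"lineage-for-{snapshot}")
    return snapshot


journal = Journal()
try:
    workflow(journal, "run-42", fail_after_commit=True)
except RuntimeError:
    pass  # an orchestrator would retry the run here

result = workflow(journal, "run-42", fail_after_commit=False)
```

After the retry, `commits` contains a single entry: the first attempt's commit was journaled, so the second attempt resumes from the lineage step without committing twice.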

The AI layer is deliberately constrained. It can plan assets, suggest SQL, draft contracts, explain failures, and recommend remediation, but every recommendation is bounded by contracts, lineage, policy, evidence links, and approval gates. That makes CrawlKit an example of NoScope’s core identity: AI can accelerate engineering work only when the surrounding system can verify what happened.
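One way to picture the approval gate is a check that every AI recommendation must pass before anything is applied. The sketch below is an assumption-heavy illustration, not the product's policy engine: `Recommendation`, `gate`, the action names, and the `ACTIONS_REQUIRING_APPROVAL` set are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical policy: these actions always need a human sign-off.
ACTIONS_REQUIRING_APPROVAL = {"alter_contract", "backfill"}


@dataclass
class Recommendation:
    """Hypothetical record of something the AI layer proposes."""
    action: str
    asset: str
    evidence_links: list[str] = field(default_factory=list)
    approved_by: Optional[str] = None


def gate(rec: Recommendation) -> tuple[bool, str]:
    # A recommendation with no evidence cannot be audited, so it never proceeds.
    if not rec.evidence_links:
        return False, "rejected: no evidence links"
    # Sensitive actions wait in the approval queue until a person signs off.
    if rec.action in ACTIONS_REQUIRING_APPROVAL and rec.approved_by is None:
        return False, "pending: human approval required"
    return True, "allowed"


unsupported = gate(Recommendation("explain_failure", "orders_clean"))
pending = gate(Recommendation("backfill", "orders_clean", ["run/42/log"]))
approved = gate(Recommendation("backfill", "orders_clean", ["run/42/log"], approved_by="owner"))
```

The design choice worth noting is that the gate sits outside the model: the AI can draft any recommendation, but only evidence-backed and, where policy demands, human-approved ones are allowed through.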

The product spec defines a workspace where data engineers, analytics engineers, platform engineers, governance leads, and data consumers all see the same resource lifecycle. Asset catalog, AI planner, transform editor, contract editor, run detail, lineage explorer, incident workspace, and approval queue all point at the same asset record.

The result is a platform story built around measurable operating outcomes: reducing mean time to diagnose failures, increasing trusted asset coverage, improving quality gate adoption, reducing incident recurrence, and lowering cost per successful materialization. Those metrics define the NoScope bar: software that runs, then explains what happened.

Results

What the numbers showed

North-star operating metrics defined for trusted data assets

30m

target time to first trusted asset

50%+

target reduction in mean time to diagnose (MTTD)

80%+

trusted asset coverage target

Platform architecture

From prompt to materialized asset, every step is inspectable

Data platform dashboard overview

Command center: assets, runs, incidents, contracts, and lineage in one console

Asset catalog and metrics

Asset catalog: ownership, freshness, contracts, and quality gates

Workflow infrastructure

Durable runtime: Restate workflows, idempotent commits, OpenLineage events

Testimonial

CrawlKit

"The product thesis is simple: every production data asset should be able to explain what created it, what it depends on, whether it is valid, who owns it, and what should happen next if it fails."
NoScope project team

Product architecture note

Need a platform that makes data trustworthy?

We design data systems around contracts, lineage, runtime guarantees, and the operating model your team actually needs.