KlarkLabs
Case Study
1 December 2025

Klark

B2B SaaS platform for business data aggregation

Client: Klark Industries

10K+ users

< 200ms response time

99.9% uptime

Tech stack

Next.js · TypeScript · MongoDB · AWS · OpenAI

Background

Klark Industries is an industrial company juggling a dozen heterogeneous data sources: SAP ERP, shared Excel files, CSV exports from multiple SaaS tools, and legacy internal databases. Operations teams spent several hours every week manually consolidating this data to produce their management reports. Leadership lacked a real-time view of key business indicators.

The mission: build a centralized platform that aggregates, normalizes, and exposes this data through interactive dashboards, with an AI layer for answering business questions in natural language.

Technical Challenges

Integrating Heterogeneous Sources

Each data source had its own formats, update frequencies, and reliability levels. The SAP ERP exposed a legacy SOAP API, CSV exports had no fixed schema, and some legacy systems only supported polling via FTP.

// lib/connectors/base-connector.ts
// (db is the shared MongoDB handle; RawDataPayload and NormalizedRecord
// are defined elsewhere in the codebase)
export abstract class BaseConnector {
  abstract readonly sourceId: string;
  abstract readonly pollingInterval: number;
 
  abstract fetchRawData(): Promise<RawDataPayload>;
  abstract transform(raw: RawDataPayload): Promise<NormalizedRecord[]>;
 
  async sync(): Promise<SyncResult> {
    const raw = await this.fetchRawData();
    const records = await this.transform(raw);
    await this.upsertToMongoDB(records);
    return { count: records.length, sourceId: this.sourceId };
  }
 
  private async upsertToMongoDB(records: NormalizedRecord[]) {
    const ops = records.map((r) => ({
      updateOne: {
        filter: { externalId: r.externalId, sourceId: this.sourceId },
        update: { $set: { ...r, syncedAt: new Date() } },
        upsert: true,
      },
    }));
    await db.collection("records").bulkWrite(ops, { ordered: false });
  }
}
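As an illustration of what a concrete connector's `transform` step might look like, here is a sketch for the schema-less CSV sources. The column names and the reduced `NormalizedRecord` shape are assumptions for the example, not the production schema:

```typescript
// Reduced normalized shape for illustration -- the real NormalizedRecord
// carries more fields (sourceId, currency, metadata, ...).
interface NormalizedRecord {
  externalId: string;
  amount: number;
  date: Date;
}

// Sketch of a CSV connector's transform step: rows with no fixed schema
// are mapped into NormalizedRecord, and malformed rows are dropped rather
// than poisoning the bulk upsert.
function transformCsvRows(rows: Record<string, string>[]): NormalizedRecord[] {
  const records: NormalizedRecord[] = [];
  for (const row of rows) {
    // tolerate a few plausible column spellings (assumed, not exhaustive)
    const externalId = row["id"] ?? row["ID"] ?? row["ref"];
    const amount = Number(row["amount"] ?? row["montant"]);
    const date = new Date(row["date"]);
    if (!externalId || Number.isNaN(amount) || Number.isNaN(date.getTime())) {
      continue;
    }
    records.push({ externalId, amount, date });
  }
  return records;
}
```

Because `sync()` upserts on `(externalId, sourceId)`, dropping a malformed row here simply means the previous good version of that record survives until the next poll.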

Dashboard Performance

With millions of records, MongoDB aggregations had to be carefully optimized. We implemented a pre-aggregation system that computes metrics every 5 minutes and stores them in dedicated collections.

// jobs/aggregate-metrics.ts
export async function aggregateSalesMetrics(orgId: string, period: "day" | "week" | "month") {
  const pipeline = [
    { $match: { organizationId: orgId, type: "sale", date: { $gte: getPeriodStart(period) } } },
    {
      $group: {
        _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
        totalRevenue: { $sum: "$amount" },
        orderCount: { $sum: 1 },
        avgOrderValue: { $avg: "$amount" },
      },
    },
    { $sort: { _id: 1 } },
  ];
 
  const results = await db.collection("records").aggregate(pipeline).toArray();
 
  await db.collection("metrics_cache").replaceOne(
    { orgId, type: "sales", period },
    { orgId, type: "sales", period, data: results, computedAt: new Date() },
    { upsert: true }
  );
}
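The `getPeriodStart` helper referenced in the `$match` stage is not shown in the snippet. A plausible implementation (an assumption: calendar boundaries in UTC, with "week" taken as a rolling 7 days) would be:

```typescript
// Sketch of the getPeriodStart helper used by the $match stage above.
// Assumptions: "day" = midnight UTC today, "week" = 7 days back at
// midnight UTC, "month" = first of the current month. The production
// cutoffs (timezone, ISO weeks, fiscal months) may differ.
type Period = "day" | "week" | "month";

function getPeriodStart(period: Period, now: Date = new Date()): Date {
  const d = new Date(now);
  switch (period) {
    case "day":
      d.setUTCHours(0, 0, 0, 0);
      return d;
    case "week":
      d.setUTCDate(d.getUTCDate() - 7);
      d.setUTCHours(0, 0, 0, 0);
      return d;
    case "month":
      d.setUTCDate(1);
      d.setUTCHours(0, 0, 0, 0);
      return d;
  }
}
```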

Natural Language AI Assistant

The platform's flagship feature is an assistant that answers questions like "What is my revenue this month compared to last year?" by dynamically generating and executing MongoDB queries.

// lib/ai/query-agent.ts
async function answerBusinessQuestion(question: string, orgContext: OrgContext) {
  const schemaDescription = await buildSchemaDescription(orgContext.orgId);
 
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `You are a MongoDB query generator. Given the schema: ${schemaDescription}
Generate a valid MongoDB aggregation pipeline as JSON. Only output JSON.`,
      },
      { role: "user", content: question },
    ],
    response_format: { type: "json_object" },
  });
 
  const pipeline = JSON.parse(completion.choices[0].message.content!);
  const results = await db.collection("records")
    .aggregate([{ $match: { organizationId: orgContext.orgId } }, ...pipeline])
    .toArray();
 
  return results;
}
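Executing a model-generated pipeline raises an obvious injection concern: nothing in the snippet above stops the model from emitting a stage that writes data or reads across collections. The case study doesn't describe the guardrails used, but a minimal validation pass before `aggregate()` could look like this (the blocklist is an assumption, not the production rule set):

```typescript
// Sketch of a guardrail for model-generated aggregation pipelines:
// reject any stage that writes data, reads other collections, or runs
// arbitrary JavaScript. Blocklist is illustrative, not exhaustive.
const FORBIDDEN_STAGES = new Set([
  "$out", "$merge",        // write to other collections
  "$lookup", "$unionWith", // read across collections
  "$where", "$function",   // arbitrary JavaScript execution
]);

function validatePipeline(pipeline: unknown): pipeline is object[] {
  if (!Array.isArray(pipeline)) return false;
  return pipeline.every((stage) => {
    if (typeof stage !== "object" || stage === null) return false;
    // each aggregation stage is a single-key object like { $group: ... }
    const keys = Object.keys(stage);
    return keys.length === 1 && !FORBIDDEN_STAGES.has(keys[0]);
  });
}
```

Prepending the tenant `$match` stage server-side, as the snippet already does, is the other half of the defense: the model never controls which organization's records are in scope.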

Deployed Solution

The final architecture runs on Next.js 15 for the frontend and API routes, MongoDB Atlas for persistence and vector search, AWS Lambda for periodic sync jobs, and CloudFront for static asset distribution. Connectors run on AWS EventBridge with schedules configurable per client.

The platform supports multi-tenancy with full data isolation per organization, role management (admin, editor, reader), and an audit log for all sensitive actions.
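For the three roles mentioned, a simple ordered-rank check is enough to gate sensitive actions. A sketch, assuming the roles are strictly ordered (the real platform may use finer-grained permissions):

```typescript
// Sketch of a role check for the admin / editor / reader model.
// Assumption: roles form a strict hierarchy (admin > editor > reader).
type Role = "reader" | "editor" | "admin";

const ROLE_RANK: Record<Role, number> = { reader: 0, editor: 1, admin: 2 };

// True when the user's role grants at least the required level.
function hasAtLeast(userRole: Role, required: Role): boolean {
  return ROLE_RANK[userRole] >= ROLE_RANK[required];
}
```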

Results

The initial deployment covered 8 connectors (SAP, Salesforce, HubSpot, Shopify, QuickBooks, and three internal databases). In production since December 2025, the platform processes over 2 million records per day with a median dashboard response time of 145ms on pre-aggregated views.

Operations teams reduced their weekly reporting time from 6 hours to under 30 minutes. The client reported a 4x ROI at the 6-month mark relative to the project cost.
