What I Would Do as the First Data Engineer (First 90 Days)

Key Takeaways

  • The first 90 days should focus on reducing confusion in the existing data environment, not rebuilding everything at once.
  • Early progress comes from understanding where data originates, how it moves, and who is responsible for key definitions.
  • High-impact workflows should be stabilized first so engineering effort aligns with real business risk.
  • A simple foundation and lightweight request process usually create more value than introducing a complex modern stack too early.
  • Success after 90 days looks like better trust, clearer ownership, and fewer fragile manual processes.

Most companies that hire their first data engineer are not starting from zero. Data already exists across the business, but it lives in spreadsheets, dashboards, and manual workflows that have grown organically. Different teams pull data from different systems, apply their own logic, and arrive at slightly different answers to the same question.

This creates a predictable set of problems. Reports break, numbers do not match, and simple questions take longer than they should. Over time, people stop trusting the data and default to their own versions of the truth. The goal of the first 90 days is not to rebuild everything, but to bring clarity to this environment and create a foundation the business can rely on.

1. Understand the Reality of the Data Landscape

Before building anything new, the focus should be on understanding how data actually flows through the business. This means going beyond official systems and looking at how teams truly operate day to day. In many cases, the most important workflows are not in a warehouse, but in a spreadsheet that someone quietly maintains.

At a high level, I would map out:

  • Where data originates
  • How it is transformed
  • Where it is consumed
  • Who owns each piece

Ownership is especially important, even if it is informal today. Without it, there is no way to enforce consistency or accountability.
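
This mapping does not need special tooling at first. A minimal sketch in Python, using hypothetical dataset and team names, shows how origins, consumers, and owners can be tracked in a plain inventory and how unowned datasets can be flagged:

```python
# Minimal data inventory: one record per dataset, tracking where it
# originates, who consumes it, and who owns its definition.
# All names here are hypothetical.
datasets = [
    {"name": "orders_raw", "source": "billing_db",
     "consumed_by": ["daily_sales"], "owner": "finance"},
    {"name": "daily_sales", "source": "orders_raw",
     "consumed_by": ["exec_dashboard"], "owner": None},
]

def unowned(inventory):
    """Return the datasets that have no accountable owner yet."""
    return [d["name"] for d in inventory if d["owner"] is None]

# Datasets without an owner are where consistency cannot yet be enforced.
print(unowned(datasets))
```

Even a list this crude makes the gaps visible: any dataset returned by `unowned` needs an owner assigned before its definitions can be enforced.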

Documentation begins here, not later. For high-impact reports, it should be clear what the report is used for, how metrics are defined, and what assumptions were made. This is where inconsistencies surface. Two dashboards showing the same metric often rely on different logic, and without documentation there is no way to reconcile them.
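
The reconciliation problem is easy to reproduce. A small illustration with invented numbers: two teams compute "revenue" from the same rows, one including refunded orders and one excluding them, and their dashboards disagree until the definition is written down:

```python
# Invented order data for illustration only.
orders = [
    {"amount": 100, "refunded": False},
    {"amount": 40,  "refunded": True},
    {"amount": 60,  "refunded": False},
]

# Team A's dashboard: gross revenue, all orders counted.
revenue_gross = sum(o["amount"] for o in orders)

# Team B's dashboard: net revenue, refunded orders excluded.
revenue_net = sum(o["amount"] for o in orders if not o["refunded"])

# Same data, same metric name, different numbers.
print(revenue_gross, revenue_net)
```

Neither team is wrong; they are answering different questions under the same label. Documenting which definition a report uses is what makes the difference explainable.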

By the end of this phase, there should be a clear and shared understanding of how data is actually used across the business.

2. Prioritize Based on Business Impact

Once the landscape is understood, the next step is to introduce focus. Not every report or dataset deserves attention at the same time, and trying to fix everything at once slows progress.

A simple way to prioritize is to group workflows by business impact.

High-priority workflows

These are the workflows the business cannot operate without. If they fail, the impact is immediate and visible.

  • Daily sales reporting used by leadership
  • Operational dashboards that drive same-day decisions
  • Financial data used for reconciliation or cash tracking

Medium-priority workflows

These are important for understanding performance, but do not disrupt daily operations if delayed.

  • Monthly and month-to-date reporting
  • Department level performance dashboards
  • Trend analysis used for planning

Low-priority workflows

These are useful, but optional. Many of them started as exploratory work and became permanent without clear ownership.

  • Marketing dashboards tied to past experiments
  • One-off analyses that turned into recurring reports
  • Dashboards that are rarely used but still maintained

This categorization ensures that early effort is spent stabilizing the parts of the data ecosystem that actually matter.

3. Establish a Simple and Intentional Foundation

With priorities clear, decisions about tools and architecture can be made with context. This is where restraint matters. Introducing a full modern data stack too early often adds complexity without solving the core problems.

The goal is to create a foundation that is simple and reliable. That usually means:

  • Defining where data should live at each stage
  • Standardizing how data is ingested and transformed
  • Ensuring outputs are consistent and accessible

At the same time, strong alignment with the business needs to be established. For each critical dataset, there should be a clear owner responsible for defining what the data means. The data engineer is responsible for implementing and enforcing those definitions, but not creating them.

From here, a roadmap can be built that focuses on high-priority workflows first. Early projects should replace fragile manual processes with structured, repeatable pipelines.
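
An early "pipeline" can be as simple as a scripted, repeatable version of a manual export. A minimal sketch, assuming a hypothetical raw CSV layout: parse the export, apply consistent names and types, and hand downstream reports clean rows instead of a hand-edited spreadsheet:

```python
import csv
import io

def standardize(rows):
    """Apply consistent column names, trimming, and typing to raw rows."""
    return [
        {
            "order_id": row["Order ID"].strip(),
            "amount": round(float(row["Amount"]), 2),
            "region": row["Region"].strip().lower(),
        }
        for row in rows
    ]

def run_pipeline(raw_csv_text):
    """One repeatable step: parse the raw export and return clean rows."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    return standardize(reader)

# A messy export with stray whitespace and inconsistent casing,
# exactly the kind of thing a manual process quietly tolerates.
RAW = "Order ID,Amount,Region\n A1 ,19.90,North\nA2,5.5, south \n"
clean = run_pipeline(RAW)
```

The value is not the code itself but that the same cleanup now runs identically every time, instead of depending on whoever edited the spreadsheet last.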

4. Bring Structure to Ad Hoc Work

Ad hoc requests are one of the biggest sources of inefficiency in small teams. They come through messages, meetings, and side conversations, often without enough context to be handled effectively.

Introducing a simple system can significantly improve this. It does not need to be complex, but it should be consistent.

At a minimum:

  • A single intake channel for requests
  • Clear information required before work begins
  • A lightweight prioritization process
  • Defined expectations for turnaround time

Not every request should be immediate, and not every question should be self-serve. Some require deeper analysis or new data that does not yet exist. Creating structure around this allows the team to stay focused on high-impact work without becoming reactive.
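
The intake checklist above can be enforced mechanically. A minimal sketch, with assumed field names, that refuses to start work until the required context is present:

```python
# Fields every request must supply before work begins.
# The field names are an assumption for illustration.
REQUIRED_FIELDS = {"requester", "question", "deadline", "priority"}

def missing_context(request):
    """Return the required fields a request is still missing or left blank."""
    provided = {k for k, v in request.items() if v}
    return sorted(REQUIRED_FIELDS - provided)

# A request with no deadline is not ready to be prioritized.
request = {
    "requester": "ops",
    "question": "Why did refunds spike last week?",
    "deadline": "",
    "priority": "medium",
}
gaps = missing_context(request)
```

Whether this lives in a form, a ticket template, or a script matters less than the consistency: requests with gaps go back to the requester instead of into the queue.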

5. What Progress Looks Like After 90 Days

After 90 days, success is not defined by how many tools were implemented. It is defined by whether the environment is more stable, understandable, and aligned with the business.

You should expect to see:

  • A clear view of where data comes from and how it flows
  • Documentation for critical reports and metrics
  • Fewer conflicting numbers across teams
  • Clear ownership between the business and the data function
  • Initial pipelines replacing manual processes in high-priority areas

The platform is not fully built at this point, but it is stable enough to build on with confidence.

A Stronger Starting Point

The first 90 days are about reducing chaos, not introducing complexity. Most companies do not need more tools at this stage. They need clarity, structure, and alignment between the business and the data function.

Once that foundation exists, building a scalable and reliable data platform becomes significantly easier.
