What-if capacity scenarios

How much does it cost to keep the lights on at Meta and sustain its growth? A lot. I designed a forecasting tool that cut modeling time by 80% using object-oriented UX.

Finance and infrastructure leaders can now quickly build scenarios and present them directly to Mark Zuckerberg and other executives.

CATEGORY

Enterprise design

TEAM

1 Tech lead

1 Front-end engineer

2 Backend engineers

TIMEFRAME

Dec 2023 - Feb 2024

In 2024, Meta allocated an unprecedented $35-40 billion for capital expenditures, mainly to accelerate the infrastructure behind its AI capabilities.

Strategic capacity planning—forecasting both financial and energy resources needed to sustain Meta's existing offerings while fueling AI innovation—became more crucial than ever.

ForecasterX is an internal tool that transforms Meta's complex data center, server, and network specifications into clear financial and energy projections. The platform empowers executives to make high-stakes decisions about where to invest resources, which hardware to purchase, and how to grow their operations—all essential to supporting Meta's AI ambitions.

ForecasterX forecast display

Problem: Forecasting takes too long and involves too many people

ForecasterX delivers a single quarterly baseline forecast using agreed-upon inputs from data centers, servers, networks, and company policies. The process involves generating multiple forecasts, selecting one, refining it, and finally aligning on it as a baseline. This extensive workflow—which ultimately drives all capacity planning decisions—creates a significant operational burden.

Creating forecasts takes too long: each one requires rounds of back-and-forth between the infrastructure teams who provide the data and the ML engineers who build the models, stretching development timelines.

Solution: Make it easy to create what-if scenarios

We need to enable input owners to easily create forecast scenarios by changing input data, running the model to generate the new forecast, and understanding how big a change it is from the current baseline. They can then select the best forecasts for leadership review, eventually settling on one as the plan of record for next quarter's baseline forecast.

I joined the project in December 2023 to enhance this capability. A rudimentary version, called "Canary service," had already been implemented—which presented our first challenge.

The existing functionality was called "Canary service"

Keep it simple, silly

I was confused about why the engineering team had named this feature "Canary service". As I ramped up, I realized the name foreshadowed a deeper complexity issue: an engineering testing analogy had become the feature's name, obscuring its purpose.

Through documentation review, hands-on testing, Q&A with the team, and object-oriented analysis, I uncovered the core elements enabling what-if scenario creation. There were a lot of them.

  • Model

  • Model run

  • Baseline

  • Version

  • Scenario

  • Data input

  • Stage

  • Canary run

  • Run ID

  • Diff

Leveraging user research, I identified the few key objects, actions, and attributes truly essential to users.

  • Model

  • Model run

  • Scenario

  • Data input

  • Canary run

Then, relationships and redundancies began to emerge:

  • A model is a fixed set of data inputs and outputs

  • An output is a forecast

  • A scenario is one model run

  • A comparison is a canary run: the delta between two scenarios

Technical objects like models and canaries don't have to be surfaced to users. This leads to a simplified conceptual model: a baseline scenario (the current forecast), a new scenario, its input data, and a comparison of the new scenario against the baseline.

By presenting this simplified conceptual model to the team, I successfully advocated for renaming the feature. We settled on Scenario Workbench.

A simplified conceptual model
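To make this concrete, here's a minimal TypeScript sketch of the simplified model (all names are hypothetical; the real internal types aren't public). Note that the comparison is derived from two scenarios rather than authored as its own object, which is what lets the canary machinery stay hidden.

```typescript
// Hypothetical sketch of the simplified conceptual model. Technical
// objects like the underlying model and canary runs stay out of view.
interface DataInput {
  name: string;     // e.g. a data center, server, or network spec feed
  version: number;  // the version the user selected or uploaded
}

interface Scenario {
  id: string;
  name: string;
  isBaseline: boolean;  // one agreed-upon scenario serves as the baseline
  inputs: DataInput[];  // the inputs this scenario was run with
}

// A comparison is derived, not authored: the delta between a new
// scenario and the baseline it branched from.
interface Comparison {
  baseline: Scenario;
  candidate: Scenario;
  deltas: Record<string, number>;  // metric name -> % change vs. baseline
}
```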

Scenario versus comparison

I mapped out the end-to-end user flow before starting any interface design. I considered whether the conceptual model should be indexed on a scenario or on a comparison. While users ultimately want to compare new scenarios against baseline scenarios, I designed around scenario creation rather than comparison. This approach leverages the more intuitive mental model of creating an object (a scenario, or forecast) versus creating an attribute (a comparison).

Creating a scenario versus creating a comparison

I refined the user flow with two key improvements. First, I secured buy-in for public/private visibility controls, allowing users to experiment with scenarios without signaling unintended organizational changes.

Second, I designed auto-saving that creates a draft as soon as users name their scenario, making work recovery more intuitive than hunting through system IDs and preventing frustrating data loss during the complex creation process.

Refined user flow
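As a rough sketch of the auto-save behavior (saveDraft and every name below are stand-ins, not the real service API): naming the scenario immediately creates a private draft, so recovery works by name rather than by an opaque system ID.

```typescript
// Hypothetical sketch: a private draft is created the moment the user
// names the scenario, then kept up to date as they edit.
type Visibility = "private" | "public";

interface Draft {
  id?: string;            // assigned by the backend on first save
  name: string;
  visibility: Visibility; // private drafts avoid signaling org changes
}

// Stand-in for the real persistence call.
async function saveDraft(draft: Draft): Promise<string> {
  return draft.id ?? crypto.randomUUID(); // browser API for a placeholder ID
}

let draft: Draft | null = null;

async function onScenarioNamed(name: string): Promise<void> {
  // Naming is the trigger: create the draft right away so in-progress
  // work survives a crash or an accidental tab close.
  draft = { name, visibility: "private" };
  draft.id = await saveDraft(draft);
}

async function onInputsEdited(): Promise<void> {
  if (draft) await saveDraft(draft); // subsequent edits update the draft
}
```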

Prioritize to simplify

Creating what-if scenarios is seemingly easy:

  1. Create a new scenario by adding a name and some metadata

  2. Select a baseline and modify key inputs

  3. Generate and compare results

Step 1: Scenario setup

However, the second step of changing inputs was more complex. At first, I attempted to give equal prominence to all modification methods—a mistake that led to complexity.

In version 1, I adapted the proof-of-concept approach with an editable JSON interface and side-by-side comparison view to clearly display changes.

  1. Select a baseline scenario to get its default data inputs

  2. Review the edited data input in a side-by-side comparison view

Version 1

In version 2, after the tech lead explained that direct JSON editing wasn't possible, I pivoted to a streamlined download-modify-upload workflow.

Version 2

In version 3, technical constraints eased, allowing users to modify files or select different versions from the connected data hub.

The tech lead noted users would have just created these versions via CLI and would know their numbers, so I designed a simple interface showing baseline data on the left with version selection or upload options on the right.

Version 3

These early iterations gave equal prominence to all modification methods, creating complexity. The breakthrough came when I prioritized version selection as the primary interaction, while placing secondary actions in a more contextual interface.

Final design of changing input data

To help users navigate dense information while creating scenarios and selecting input data to change, I optimized for space efficiency and cognitive clarity:

  1. The input data selected on the left is previewed on the right; as new versions are selected, the preview updates

  2. Version numbers appear in the preview pane to confirm the selection

  3. Input data that has an updated version is indicated with a blue dot

  4. The "Changed inputs" tab groups all modified inputs for quick review

Visualizing scenario results

Scenario creation requires processing time. The underlying model needs several hours to generate forecasts based on the modified input data. The interface needs an in-progress state while results are being calculated.

I initially placed the metadata (baseline scenario, modified input data, and scenario description) on the left. Later, I prioritized scenario results on the left and moved metadata to the right, recognizing users care more about outcomes than setup details.

In-progress view of scenario results
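Since a run takes hours, the page can simply poll for status at a coarse interval. A minimal sketch, assuming a hypothetical status route and response shape:

```typescript
// Hypothetical status shape and route; the real service is internal.
type RunStatus = "queued" | "running" | "complete" | "failed";

interface ScenarioStatus {
  status: RunStatus;
  progressPct?: number; // optional coarse progress for the UI
}

async function fetchStatus(scenarioId: string): Promise<ScenarioStatus> {
  const res = await fetch(`/api/scenarios/${scenarioId}/status`);
  return res.json();
}

// With multi-hour runs, a coarse polling interval is plenty.
async function pollUntilDone(
  scenarioId: string,
  intervalMs = 60_000
): Promise<ScenarioStatus> {
  for (;;) {
    const s = await fetchStatus(scenarioId);
    if (s.status === "complete" || s.status === "failed") return s;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```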

To generate ideas on the best way to visualize a scenario forecast, I interviewed 3 potential users to understand their preferences.

Based on the research, I organized scenario results by product and region dimensions. Each dimension is measured with both financial metrics (Operating Expenses and Capital Expenditures) and infrastructure metrics (megawatts and rack counts).
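As a concrete illustration, a hypothetical shape for these results might pair each product or region slice with both metric families:

```typescript
// Hypothetical shape of scenario results: every product/region slice
// reports both financial and infrastructure metrics.
interface SliceMetrics {
  opexUsd: number;    // Operating Expenses
  capexUsd: number;   // Capital Expenditures
  megawatts: number;  // energy footprint
  rackCount: number;  // physical capacity
}

interface ScenarioResults {
  byProduct: Record<string, SliceMetrics>; // e.g. per app or service
  byRegion: Record<string, SliceMetrics>;  // e.g. per data center region
}
```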

One user shared an example of a heat map she created in Excel to manually track changes in capacity forecasts. I leveraged this visualization technique in my explorations.

The initial results interface featured product and region toggle filters with vertical labels grouping these dimensions. Users also expressed the need to conduct deeper analysis in tools like notebooks, so I ensured the results could be seamlessly opened in their preferred analytical environments.

Version 1 of results

After finding toggles created usability confusion, I replaced them with tabs to clearly separate product and region dimensions. This simplified navigation despite adding a second layer of tabs.

Final design of scenario results

Although the primary user intent is to see how this new what-if scenario compares against the baseline, it's also important for them to see the raw output. For this, I leveraged the existing implementation so users do not have to learn a new visualization pattern.

Scenario output uses existing implementation

Debugging unexpected scenario results

All three interviewed users emphasized the need to troubleshoot unexpected scenario outcomes. If the output is surprising, it's crucial to quickly understand why.

The most likely culprit is wrong or problematic new input data. The design highlights input data changes in two ways:

First, the right side of the results tab prominently displays how many inputs changed in this scenario and what they are.

Scenario results page shows the number of changed inputs

Second, a dedicated input data tab consolidates all the changed input data and displays a clear comparison between the altered values and baseline data.

Dedicated input data tab for each scenario

Systemizing accessibility

Based on feedback from stakeholders, we replaced the semantically loaded orange and blue palette with neutral purple and teal colors, eliminating implicit positive/negative associations with directional changes.

But I realized there might be issues for users with red/green color blindness.

Right image has red-green color blindness filter applied

Initially, I explored a spectrum of opacity variations that would fulfill color-blindness needs. But research showed translucent colors pose unique challenges, with background colors significantly affecting contrast ratios.

The front-end engineer and I opted to linearly interpolate between the lightest and darkest acceptable shades of purple/teal to derive each cell's background color. A 100% delta displays the darkest accessible shade meeting color-blindness requirements, while a 1% delta appears in the lightest.
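A minimal sketch of that shading math, with assumed endpoint colors (the real accessible shades live in Meta's design system, not in this sketch): each RGB channel is interpolated between the lightest and darkest shades according to the delta's magnitude.

```typescript
type RGB = [number, number, number];

// Assumed endpoint values; the actual accessible shades come from the
// design system.
const LIGHTEST_PURPLE: RGB = [233, 226, 248];
const DARKEST_PURPLE: RGB = [84, 48, 150];

// Map a 1%..100% delta onto the light..dark range, clamped at both ends.
function cellBackground(deltaPct: number, light: RGB, dark: RGB): string {
  const t = Math.min(Math.max((Math.abs(deltaPct) - 1) / 99, 0), 1);
  const [r, g, b] = light.map((c, i) => Math.round(c + (dark[i] - c) * t));
  return `rgb(${r}, ${g}, ${b})`;
}

// A 1% delta renders in the lightest shade, a 100% delta in the darkest:
// cellBackground(1, LIGHTEST_PURPLE, DARKEST_PURPLE)   -> lightest purple
// cellBackground(100, LIGHTEST_PURPLE, DARKEST_PURPLE) -> darkest purple
```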

After this project, I proposed a heat map component for the design system that leveraged this approach.

Measuring impact

Architecting the back-end

I successfully applied object-oriented UX to simplify complexity, influencing both interface design and backend architecture—unlike typical projects where design follows technical limitations.

One example: the team renamed generic "runs/run IDs" to "scenarios" throughout the codebase to better match user mental models.

Data model influenced by design

I can't count how many times I've worked on projects where the internal jargon didn't match the UX. With OOUX, the data model is influenced by the design. The exercise of designing the UX and that of architecting the data backend go hand-in-hand.

Front-end engineer

Time. Time. Time.

The modus operandi of internal tools is to increase efficiency; time savings is the name of the game. Finance teams previously relied on engineers to implement data changes and run the model—typically a 24-hour process.

With Scenario Workbench, this now takes about four hours, reducing turnaround time by over 80%.