Hybrid i18n with Next and Astro (part 3)


In this series, I relate my quest for the perfect internationalization system (i18n) for hybrid frameworks, namely Next.js and Astro. This story involves nasty technical concepts, but also teapots 🫖, so it's probably worth the read.

The end goal is to improve the Devographics translation system, which powers the State of JavaScript, CSS, HTML, React and GraphQL surveys. It is used both by the Next.js survey app and by the new Astro results app, which is not released yet.

You are reading the third part of this series. You can read part 1, "Tokens are all you need", here. Part 2, "Server Components, we have a deeply nested problem" is available there.

From tokens to token expressions

In the previous article, we dealt with translations in Server Components. Let's head back to the browser.

We still need Client Components

Ideally, all translated texts should be rendered by server components. With server components, unused tokens stay on the server and never reach the client.

But that's not feasible in real life. In a non-trivial app, some translations will cross the "client boundary". To rephrase, translated texts are leaves in the component tree, and in React, leaves can't always be server components: many will have to be client components.

In this diagram, LikeButton and Links are client components that may render translated text. They live beyond the "client boundary", so they can't use server components down the tree.

A tree showing client components at the leaves

Illustration taken from Next.js Learn tutorial

This means we do need a strategy to send translation tokens to the browser.

Since you can pass rendered Server Components as props to client components, or rely on interleaving, you could in theory stick to server components only. It would be difficult though, and I feel like the benefits would be limited.

Extract and filter tokens from client components

In the first part of this series, we mentioned that loading too many translations in the browser was a common problem with internationalization.

We can't just send the full dictionary of translations. It weighs dozens of kilobytes, and may take hundreds of milliseconds to load! We must filter tokens that are actually used by client components in the current page.

To get past this issue, we need two steps:

Extract a list of the tokens used by each client component for a given page
Filter the dictionary fetched server-side, so we send only relevant tokens to the client.

The first step could be as simple as exporting a list of required tokens from each client component.

// in ClientComponent.tsx
// Only the token "home.title" is sent to the browser
"use client";
export function ClientComponent() {
  return (
    <button onClick={doSomething}>
      <T token="home.title" />
    </button>
  );
}
export const tokens = ["home.title"];

The second step is just a matter of using this list of tokens.

// /some-page/page.tsx
// Get the full dictionary server-side
// and keep only tokens that are relevant
// for current page
import { tokens, ClientComponent } from "./ClientComponent";
export default async function SomePage() {
  const dictionary = await loadDictionary(currentLanguage);
  const clientDictionary = filter(dictionary, tokens);
  return (
    <I18nContext dict={clientDictionary}>
      <ClientComponent />
    </I18nContext>
  );
}
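The `filter` call above is not shown in the article. As a minimal sketch, assuming the dictionary is a flat object mapping keys to translated strings, it could look like the following. The name `filterDictionary` and the sample data are made up for illustration.

```typescript
// Hypothetical helper standing in for `filter(dictionary, tokens)`.
// Assumption: a dictionary is a flat "key -> translated text" object.
type Dictionary = Record<string, string>;

function filterDictionary(dictionary: Dictionary, tokens: string[]): Dictionary {
  const filtered: Dictionary = {};
  for (const token of tokens) {
    // Tokens without a translation in this locale are simply skipped
    if (Object.prototype.hasOwnProperty.call(dictionary, token)) {
      filtered[token] = dictionary[token];
    }
  }
  return filtered;
}

// Made-up sample data
const fullDictionary: Dictionary = {
  "home.title": "Welcome home",
  "home.subtitle": "Make yourself comfortable",
  "about.title": "About us",
};

// Only the tokens exported by the client component reach the browser
const clientDictionary = filterDictionary(fullDictionary, ["home.title"]);
console.log(clientDictionary);
```

Looping over the token list rather than over the dictionary keeps the cost proportional to what the page actually needs, which matters when the full dictionary is large.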

Great, we are done! We are done, right? Right? Someone?

No we're not, for a good reason, and a bad reason.

Let's start with the good one: dynamic keys. The bad one (Next being mean to us) will be the topic of the next article of this series.

What happens when keys are dynamic

Translation keys are usually constant strings, like home.title, which let us retrieve the translated text "Welcome home" | "Mi casa es tu casa".

But in Devographics apps, many of our tokens are actually dynamic and depend on the current survey: State of JS 2023, State of CSS 2021 etc.

Instead of "home.title", a token may be ${surveyId}.home.title, relying on a variable surveyId.

During render, we need to inject the right surveyId to get the final key, for instance html2023.home.title.

Then we render the correct translated text for this survey: "Welcome to the State of HTML 2023 edition" | "Mi HTML survey es tu HTML survey".

This is a big problem, because it makes it much harder to list our tokens.

// in ClientComponent.tsx
export const tokens = [
  // ✅ OK just a string
  "home.title",
  // ❌ BAD we don't know "surveyId" statically!
  `${surveyId}.home.title`,
];

🎶 Clap along if you feel / like bundle magic is powerless 🎶

I'd like to stress that we are talking about translation keys, not values.

It's pretty common to have translation strings that depend on a variable, like Hello ${userName}. This is called interpolation, and most i18n libraries support it. That's fine. What's uncommon is to have translation tokens that are dynamic, like ${surveyId}.home.title.
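To make the distinction concrete, here is a small sketch contrasting the two cases. The `interpolate` helper, its `{name}` placeholder syntax, and the sample dictionary are assumptions for illustration, not the API of any particular i18n library.

```typescript
type Dictionary = Record<string, string>;

// Value interpolation: the key is static, only the translated text has a hole.
// The `{name}` placeholder syntax is an assumption made for this sketch.
function interpolate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (placeholder: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : placeholder
  );
}

// Made-up sample data
const sampleDictionary: Dictionary = {
  "greeting": "Hello {userName}",
  "html2023.home.title": "Welcome to the State of HTML 2023 edition",
};

// Static key, dynamic value: "greeting" is easy to list ahead of time
const greeting = interpolate(sampleDictionary["greeting"], { userName: "Ada" });
console.log(greeting);

// Dynamic key: the string "html2023.home.title" never appears in the source,
// so a static analysis of the code cannot list it
const surveyId: string = "html2023";
const dynamicKey = `${surveyId}.home.title`;
console.log(sampleDictionary[dynamicKey]);
```

In the first case the list of required tokens is still `["greeting"]`. In the second, nothing in the file tells us which key will be requested.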

In the wild

Paraglide treats i18n dictionaries as code, in order to benefit from modern build system tree-shaking capabilities to filter out unused tokens. That's super smart, I like the idea of reusing existing infrastructure.

But a key assumption is that dictionaries and token keys are known statically, and neither is true in our scenario. We load translation tokens from an API, and our tokens can be dynamic.

// can't be tree-shaken, at least not easily
message[`${surveyId}.home.title`]

The i18next-parser library uses comments to list all possible values for dynamic keys, but hardcoding should be avoided as much as possible.

// Smart but not satisfactory,
// we want to avoid hardcoding the list of surveys
/*
t('state_of_css.home.title')
t('state_of_js.home.title')
*/
t(surveyId + ".home.title");

Not exactly dynamic

I've spent some time chewing on this issue, and fortunately our case is not as desperate as it sounds. Our tokens are not exactly dynamic, but "semi-dynamic", and that changes everything.

The difference between "dynamic" and "semi-dynamic" is that in the latter scenario, the variables are known at build time.

For instance, the surveyId is a value we set in an environment variable when we load the app, such as html2023. While it is used as a JavaScript variable, it behaves more like a macro or a global constant.

To rephrase, at any point in time, we do know whether we are generating the html2023 result app or the css2021 survey form. We don't have to wait for a client-side render to guess the value of ${surveyId}.home.title.

Our problem can be solved: we simply need to rethink the way we represent these semi-dynamic tokens.

Token expressions

Let's rewrite ${surveyId}.home.title to "{{surveyId}}.home.title".

Tada 🎉.

The first version is a template literal, using a dynamic variable to get the current survey. The second is just a string, with a special syntax that I've named token expressions.

Token expressions can be used as-is by client code, because client components only care about the translated value, not the key used to get the translation.

<T token={surveyId + ".home.title"} />
// is replaced by:
<T token="{{surveyId}}.home.title" />
// now the token is a static string!

🎶 Clap along if you feel / like a bundler without a roof 🎶

We can finally export a static list of required token expressions from any client component:

// in ClientComponent.tsx
"use client";
export function ClientComponent() {
  return (
    <button onClick={doSomething}>
      <T token="{{surveyId}}.home.title" />
    </button>
  );
}
// The token expression is just a string,
// we can export/analyze it statically
export const tokens = ["{{surveyId}}.home.title"];

This tiny change is a huge step towards an optimal i18n system, as it allows us to properly handle multi-tenant applications whose translation keys can vary based on some build-time context.
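Here is a minimal sketch of how such expressions could be resolved server-side. The names `resolveTokenExpr` and `buildClientDictionary` are hypothetical, and keying the client dictionary by the expression itself is an assumption drawn from the remark that client components only care about the translated value.

```typescript
type Dictionary = Record<string, string>;
// Build-time values such as { surveyId: "html2023" }
type BuildContext = Record<string, string>;

// Replace each {{variable}} with its build-time value.
// Unknown variables are left untouched so the mistake stays visible.
function resolveTokenExpr(expr: string, context: BuildContext): string {
  return expr.replace(/\{\{(\w+)\}\}/g, (placeholder: string, name: string) =>
    Object.prototype.hasOwnProperty.call(context, name) ? context[name] : placeholder
  );
}

// Keep only the translations required by client components.
// Assumption: the result is keyed by the expression, so client code can look up
// "{{surveyId}}.home.title" without knowing which survey is being built.
function buildClientDictionary(
  dictionary: Dictionary,
  tokenExprs: string[],
  context: BuildContext
): Dictionary {
  const clientDictionary: Dictionary = {};
  for (const expr of tokenExprs) {
    const key = resolveTokenExpr(expr, context);
    if (Object.prototype.hasOwnProperty.call(dictionary, key)) {
      clientDictionary[expr] = dictionary[key];
    }
  }
  return clientDictionary;
}

// Made-up sample data
const serverDictionary: Dictionary = {
  "html2023.home.title": "Welcome to the State of HTML 2023 edition",
  "css2021.home.title": "Welcome to the State of CSS 2021 edition",
};
const resolvedKey = resolveTokenExpr("{{surveyId}}.home.title", { surveyId: "html2023" });
const clientSide = buildClientDictionary(
  serverDictionary,
  ["{{surveyId}}.home.title"],
  { surveyId: "html2023" }
);
console.log(resolvedKey, clientSide);
```

With this design the variable substitution happens once, on the server, and the browser receives a single entry regardless of how many surveys exist.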

"Dude, you're solving a dumb problem"

The astute reader might notice that we could have simplified our tokens by removing the contextual part. For instance html2023.home.title could simply be home.title. No dynamic value, no problem.

Just swap dictionaries, not keys!

Here are some arguments against this idea:

We have legacy code and want to support it as is. Yeah, life's tough.
It makes translators' lives easier to have contextual clues in the final token value.
We need to support true dynamic variables anyway, see next section.

This also explains why we didn't use macros to inject build-time variables. In addition to being incompatible with asynchronous server-side data fetching, they can't handle true dynamic tokens.

Actual dynamic variables

Sadly, some tokens still do not fall under our definition of "semi-dynamic" tokens with values known at build time. This happens when the rendered translated text depends on some user input.

For instance home.${selectedTab}. We don't know selectedTab at build time: it's a client-side state value.

But now that we are equipped with a language to describe tokens, we have nothing to worry about, we just need to enhance the syntax!

Let's introduce:

Range token expressions: form.[inputType].title will match all the possible input types in our application, and select the following tokens: [form.text.title, form.email.title].
Wildcard expressions: chat.xaxis_unit.[*] will match all tokens with the matching structure.

Range and wildcard expressions might lead to a slight overfetching if abused, but that's still much better than the initial situation.
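As a rough sketch, both kinds of expressions can be compiled to a regular expression and matched against the dictionary keys. The function names, the `ranges` parameter, and the choice that `[*]` matches exactly one dot-separated segment are all assumptions made for this example.

```typescript
type Dictionary = Record<string, string>;
// Possible values for each named range, e.g. { inputType: ["text", "email"] }
type Ranges = Record<string, string[]>;

// Escape characters that have a special meaning in a RegExp
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Compile "form.[inputType].title" or "chat.xaxis_unit.[*]" to a RegExp
function exprToRegExp(expr: string, ranges: Ranges): RegExp {
  // Splitting with a capture group keeps the bracket content at odd indexes
  const parts = expr.split(/\[([^\]]+)\]/);
  const source = parts
    .map((part, index) => {
      if (index % 2 === 0) return escapeRegExp(part); // literal text
      if (part === "*") return "[^.]+"; // assumption: one key segment
      const values = ranges[part] ?? [];
      // An unknown or empty range matches nothing
      return values.length > 0
        ? `(?:${values.map(escapeRegExp).join("|")})`
        : "(?!)";
    })
    .join("");
  return new RegExp(`^${source}$`);
}

// List the dictionary keys selected by an expression
function selectTokens(dictionary: Dictionary, expr: string, ranges: Ranges): string[] {
  const pattern = exprToRegExp(expr, ranges);
  return Object.keys(dictionary).filter((key) => pattern.test(key));
}

// Made-up sample data
const formDictionary: Dictionary = {
  "form.text.title": "Text",
  "form.email.title": "Email",
  "form.text.description": "Free text answer",
  "chat.xaxis_unit.count": "Count",
  "chat.xaxis_unit.percentage": "Percentage",
  "chat.yaxis_unit.count": "Count",
};
const rangeMatches = selectTokens(formDictionary, "form.[inputType].title", {
  inputType: ["text", "email"],
});
const wildcardMatches = selectTokens(formDictionary, "chat.xaxis_unit.[*]", {});
console.log(rangeMatches, wildcardMatches);
```

The overfetching mentioned above shows up here directly: every key matched by the pattern is sent, even if the user only ever selects one of them.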

In Astro

In case you had any doubt, this works perfectly fine in Astro. Here is the piece of code that loads the right locale server-side and filters the strings from a given page.

// For instance ["{{surveyId}}.home.title"]
import { tokenExprs } from "./SomeComponent";
const { locale } = Astro.params;
const localeWithStrings = await getLocaleWithStrings({
  contexts: ["common", "results", surveyId, surveyId + "_" + edition.year],
});
Astro.locals.i18n = astroI18nCtx(localeWithStrings);
const clientLocaleWithStrings = filterClientSideStrings(
  localeWithStrings,
  tokenExprs,
  { surveyId, editionId }
);

Here is the final dictionary. Notice the {{editionId}} parameter: in this example, it matches css2023 for the 2023 State of CSS survey.

Dictionary of client-side tokens

Summary

We've made a huge step towards a properly optimized i18n system for client components:

Tokens are now defined as static strings, even if they depend on variables: {{surveyId}}.home.title.
We can filter the complete server-side dictionary based on expressions used by client components, and send a smaller dictionary to the browser.

Each client component is responsible for exporting the tokens or token expressions it depends on.

Want a dramatic plot twist? This doesn't work in Next. We can't export a list of tokens from a file using the "use client" directive, and run map/filter operations over it server-side.

In the next article, we will design an API to declare tokens from a client component.

Update: part 4 "Optimized client-side translations" is out!

See you soon!