Project: Translation on boston.gov
This page documents implementation of the Google Translate Basic API on boston.gov.
Background
Previously, the Bing Translate Web Widget (a free JavaScript widget) was used to translate content on boston.gov. The widget was deprecated in July 2019, so a new translation solution is required.
The free Google Translate JavaScript Widget is no longer being serviced, and while it is still usable, the translations it yields proved too poor for it to be a viable option for Boston.gov. As an alternative, Google offers two versions of its Cloud Translation API: Basic (i.e. pre-trained) and Advanced (i.e. uses your domain-specific data, training required).
Basic API
Translates text that appears between tags. The output retains the (untranslated) HTML tags, with the translated text between the tags to the extent possible due to differences between the source and target languages. The order of HTML tags in the output may differ from the order in the input text due to word order changes in the translation.
Using just an API key, each translation is a RESTful API call to the server. Translating pages on request would require multiple API calls, and thus would prove costly on high-traffic pages. Using the required cloud libraries would require fewer API calls, with static pages stored on the server.
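As a rough illustration of the API-key approach, here is a minimal sketch of one Basic (v2) REST request. The function names buildTranslateRequest and translateHtml are hypothetical and not part of the boston.gov codebase. Passing several HTML fragments in one q array is one way to keep the number of calls per page down.

```javascript
// Sketch only: one Cloud Translation Basic (v2) REST request using an API key.
const TRANSLATE_ENDPOINT =
  "https://translation.googleapis.com/language/translate/v2";

// Build the fetch() arguments. `texts` is an array, so several page
// fragments can share a single API call.
function buildTranslateRequest(texts, targetLanguage, apiKey) {
  const url = `${TRANSLATE_ENDPOINT}?key=${encodeURIComponent(apiKey)}`;
  const options = {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // format "html" asks the API to leave tags untranslated.
    body: JSON.stringify({ q: texts, target: targetLanguage, format: "html" }),
  };
  return { url, options };
}

// Send the request and return the translated strings in input order.
async function translateHtml(texts, targetLanguage, apiKey) {
  const { url, options } = buildTranslateRequest(texts, targetLanguage, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Translate request failed: ${response.status}`);
  }
  const payload = await response.json();
  return payload.data.translations.map((t) => t.translatedText);
}
```

Because every character sent is billed, calling translateHtml on each page view is what makes the key-only approach expensive on busy pages.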
Advanced API
Cloud Translation - Advanced supports translating text using custom AutoML Translation models and creating glossaries to ensure that Cloud Translation translates a customer's domain-specific terminology correctly.
To use AutoML custom models with Cloud Translation - Advanced, you must first enable the AutoML API (automl.googleapis.com) for your project. If you plan to use a glossary or the batch features, you also need to create a Google Cloud Storage bucket and grant your service account access to it.
Based on feature/price comparisons, we decided to move forward with the Basic API.
Some municipalities use the Google Translate web translator directly, rather than the API or widget, to translate content on their sites. The State of Maryland's website, for example, employs a small bit of JavaScript which provides users with a Translate button and dropdown in the site header. When a user selects a language in the dropdown, all the text is run through the web translator and a translated version of the page is displayed. There is no call to an API, and thus no cost.
This solution has zero cost, but it offers neither the same flexibility in terms of implementation nor the ability to incorporate domain-specific data to populate glossaries.
Based on all the available options, we decided to move forward with a short and long term translation strategy.
Short term translation strategy
Use translate.google.com to provide translations for all languages provided/supported by the tool. Users should be able to translate any page or document (on pages) after selecting a language of their choice.
There is no cost associated with translation in this first iteration, given that we are proposing to use the free Google Translate web translator.
This short term strategy has no effect on any of our current content creation workflows.
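A minimal sketch of how a header language picker could hand the current page to the web translator. The function name buildTranslateUrl is hypothetical. The sl, tl and u parameter names are taken from the Google Translate URLs quoted later on this page, and Google may change them.

```javascript
// Sketch only: build a translate.google.com link for a given page.
function buildTranslateUrl(pageUrl, targetLanguage) {
  const url = new URL("https://translate.google.com/translate");
  url.searchParams.set("sl", "auto"); // let Google detect the source language
  url.searchParams.set("tl", targetLanguage); // e.g. "es", "ja"
  url.searchParams.set("u", pageUrl); // page to be translated
  return url.toString();
}
```

In a browser, a dropdown change handler could then navigate with window.location.assign(buildTranslateUrl(window.location.href, selectedLanguage)), where selectedLanguage is whatever value the picker supplies.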
Issues
Pre-translated content
We encountered an issue with existing multilingual content on boston.gov, i.e. pre-translated pages in Drupal. When, for example, a user visits the boston.gov homepage and selects a given language using the translate button in the site header, the URL of the homepage (www.boston.gov) is run through the Google web translator and a translated version of the homepage is displayed to the user. When the user subsequently navigates through the site, new pages opened by the user are also opened in the Google web translator. Thus, a user could navigate from the homepage (post-translation) to one of the existing multilingual pages on boston.gov. In this case, the Google web translator will re-translate the translated text, even if the translation setting of the translator and the language of the text on the page are the same. Most importantly, there are differences between the two versions of the text, i.e. between the pre-translated text (translated by a human) and the re-translated pre-translated text (re-translated by the Google web translator). To solve this issue, we followed these steps:
Long term translation strategy
Integrate the Basic Google Translate API into the current Drupal workflow such that when a new English language page is created in Drupal, multilingual copies of that page can be automatically generated, saved as drafts, and then subsequently quality checked and published by translators.
The cost of translation for the Basic API is $20 per million characters translated. Because we are only proposing to translate pages at the moment of publication, and then save those translated pages as unique nodes in Drupal, the translation cost for this implementation of the Google Translate API will likely be significantly lower than the cost of translating pages on each user request.
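To make the pricing concrete, here is a small sketch of the arithmetic. Only the $20 per million characters rate comes from this page. The function name and the page length and language count in the usage note are illustrative assumptions.

```javascript
// Rate quoted above for the Basic API.
const BASIC_API_USD_PER_MILLION_CHARS = 20;

// Cost of translating one piece of content into several languages once.
function estimateTranslationCostUsd(characterCount, languageCount) {
  const totalCharacters = characterCount * languageCount;
  return (totalCharacters * BASIC_API_USD_PER_MILLION_CHARS) / 1e6;
}
```

For example, a hypothetical 5,000-character page translated once into 5 languages is 25,000 characters, so estimateTranslationCostUsd(5000, 5) gives 0.5, i.e. $0.50 at publication time. Translating per request would incur that cost again on every translated view.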
This long term strategy requires some changes to our current content creation workflow:
Metrolist React pages issue resolution
While Metrolist loads under the offsite version of Google Translate, the React views (Search and AMI Estimator) do not populate.
Background
React Router matches on the current URL (window.location or document.location).
The Google Translate widget loads the entire page into an iframe.
Under normal circumstances, the React page loading inside of an iframe would not break anything, since iframes are self-contained. The location would still be e.g. https://www.boston.gov/metrolist/search even if included on another domain. However, Google needs to modify the content on the page in order to translate it, and it isn’t possible to modify the contents of an iframe from the parent page (unless they talk to each other using postMessage). Therefore, Google merely scrapes the content of the included page and dynamically inserts it into an iframe that it controls.
Because of the above process, the location under Google Translate is not www.boston.gov but rather translate.googleusercontent.com. The path of the page becomes /translate_c, which throws off React Router matching that expects /metrolist.
index.bundle.js?v=2.x:2 Warning: You are attempting to use a basename on a page whose URL path does not begin with the basename. Expected path "/translate_c?depth=1&pto=aue&rurl=translate.google.com&sl=auto&sp=nmt4&tl=ja&u=https://www.boston.gov/metrolist/ami-estimator&usg=ALkJrhjYWXizTPU7YYBqcKUYUV0LgW-l5g" to begin with "/metrolist".
Google Translate also adds a base tag to the page’s head set to the original URL (e.g. <base href="https://www.boston.gov/metrolist/ami-estimator" />). This is to make sure relative links won’t break from being on a different domain. Ironically this breaks navigation for Single-Page Apps, which use the HTML5 History API rather than doing a real server request. History API cannot update locations across domains:
Uncaught DOMException: Failed to execute 'pushState' on 'History': A history state object with URL 'https://www.boston.gov/metrolist/ami-estimator/household-income' cannot be created in a document with origin 'https://translate.googleusercontent.com' and URL 'https://translate.googleusercontent.com/translate_c?depth=1&pto=aue&rurl=translate.google.com&sl=auto&sp=nmt4&tl=ja&u=https://www.boston.gov/metrolist/ami-estimator&usg=ALkJrhjWcZACKPBWvA4U6iyZm2NCx47XBw'.
Clicking the link on /metrolist/ami-estimator/result to /metrolist/search from within Google Translate is not possible since Google puts an X-Frame-Options security header on the iframe.
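The basename mismatch described above can be reproduced outside the browser. This sketch uses a simplified prefix check, which is an assumption standing in for React Router's real matching logic. The URL has the same shape as the one in the console warning, with the usg token left out.

```javascript
// URL shape copied from the console warning (usg token omitted).
const translatedUrl = new URL(
  "https://translate.googleusercontent.com/translate_c" +
    "?depth=1&pto=aue&rurl=translate.google.com&sl=auto&sp=nmt4&tl=ja" +
    "&u=https://www.boston.gov/metrolist/ami-estimator"
);
const BASENAME = "/metrolist";

// Simplified stand-in for a router's "does the path start with the basename" test.
function pathMatchesBasename(pathname, basename) {
  return pathname === basename || pathname.startsWith(`${basename}/`);
}

// The page Google scraped is still recoverable from the `u` parameter.
const originalUrl = new URL(translatedUrl.searchParams.get("u"));

const matchesUnderGoogle = pathMatchesBasename(translatedUrl.pathname, BASENAME);
const matchesOnBostonGov = pathMatchesBasename(originalUrl.pathname, BASENAME);
```

Here matchesUnderGoogle is false because the pathname is /translate_c, while matchesOnBostonGov is true for the original /metrolist/ami-estimator path.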
Solution
The top-level basename of /metrolist was removed and the routes were updated from /, /search, and /ami-estimator to /metrolist/, /metrolist/search, and /metrolist/ami-estimator respectively. This removed the basename mismatch console warning, but it did not fix the routing.
We detect whether we are inside a Google Translate iframe, i.e. if the current domain is translate.googleusercontent.com (which should match 100% of the time, but in case that were to change we also check for translate.google.com or the path /translate_c), and if there is a query string present. If both conditions are met, we scan for a query string parameter pointing to /metrolist/*, then extract the path from the first match. Google Translate re-hosts the page content by scraping whatever is specified in the u parameter (“u” for “URL” most likely), e.g. u=https://www.boston.gov/metrolist/search. Given that parameter is found, we can extract /metrolist/search and then manually override the React Router location to think it is on /metrolist/search even if it is actually on /translate_c.
Additionally, we store references to the two Google URLs in localStorage (metrolistGoogleTranslateUrl and metrolistGoogleTranslateIframeUrl) for later use.
On forward/back navigation between AMI Estimator subroutes, we temporarily change the base from https://www.boston.gov/metrolist/ami-estimator (or equivalent dev environment) to https://translate.googleusercontent.com/metrolist/ami-estimator. Even though the latter URL does not exist, it satisfies the necessary security conditions for navigation by keeping us on the same domain. Then, after navigating, the base is immediately changed back to boston.gov so links and assets do not break.
Finally, the link on /metrolist/ami-estimator/result to /metrolist/search is swapped out with a new Google Translate URL. If left alone, then the untranslated Search page would load inside the iframe. So we read localStorage.metrolistGoogleTranslateUrl and replace the u parameter with the equivalent /metrolist/search URL for whatever domain it’s on. This URL has to be read from localStorage because if we try to read window.parent.location.href it will be blocked for security reasons: Uncaught DOMException: Blocked a frame with origin "https://translate.googleusercontent.com" from accessing a cross-origin frame. It also has to be loaded in a new tab/window with <a target="_blank"></a> because otherwise we get another security error: Refused to display 'https://translate.google.com/translate?depth=1&pto=aue&rurl=translate.google.com&sl=auto&sp=nmt4&tl=ja&u=https://www.boston.gov/metrolist/search' in a frame because it set 'X-Frame-Options' to 'deny'.
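The detection, path extraction and Search link swap described above could look roughly like the following. This is a sketch under stated assumptions, not the Metrolist source: all three function names are hypothetical, and the code takes a URL-like object (hostname, pathname, search) so it can run outside a browser.

```javascript
const GOOGLE_TRANSLATE_HOSTS = [
  "translate.googleusercontent.com",
  "translate.google.com",
];

// True when the page appears to be running inside Google Translate's iframe:
// a Google Translate host (or the /translate_c path) plus a query string.
function isInsideGoogleTranslate(loc) {
  const onGoogle =
    GOOGLE_TRANSLATE_HOSTS.includes(loc.hostname) ||
    loc.pathname === "/translate_c";
  return onGoogle && loc.search.length > 1;
}

// Return the /metrolist/* path from the first query parameter that points
// at one, or null if none is found.
function extractMetrolistPath(loc) {
  if (!isInsideGoogleTranslate(loc)) return null;
  for (const value of new URLSearchParams(loc.search).values()) {
    try {
      const { pathname } = new URL(value);
      if (pathname === "/metrolist" || pathname.startsWith("/metrolist/")) {
        return pathname;
      }
    } catch {
      // Not an absolute URL (e.g. "auto", "ja"); keep scanning.
    }
  }
  return null;
}

// Point the stored Google Translate URL's `u` parameter at the Search page
// on whatever domain the original page was served from.
function buildTranslatedSearchUrl(storedTranslateUrl) {
  const url = new URL(storedTranslateUrl);
  const original = url.searchParams.get("u");
  if (!original) return null;
  url.searchParams.set("u", `${new URL(original).origin}/metrolist/search`);
  return url.toString();
}
```

In the browser, extractMetrolistPath(window.location) would supply the path used to override the router location, and buildTranslatedSearchUrl(localStorage.getItem("metrolistGoogleTranslateUrl")) would supply the href for the link opened with target="_blank".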
Caveats
If Google Translate changes the way their code works, this could break.
Although this fix is verifiable on CI as far as translation goes, until the appropriate CORS headers are added to Acquia, the parts of the app that rely on API data will not resolve, so it will still appear broken. Although this also has to do with cross-origin restrictions, it is completely unrelated to the Translate issue, so it is safe to ignore. But to work around this and verify that the site is indeed 100% working, you can run Chrome without security enabled. Download Chrome Canary and run this command (macOS, but you can search for your platform equivalent):
open -n -a Google\ Chrome\ Canary --args --disable-web-security --user-data-dir=/tmp/chrome --disable-site-isolation-trials --allow-running-insecure-content