Have Freedom of Choice | Using ML → Democratizing NLP

I want to | When → So I can: better to teach someone to fish than to give them a fish

Dec 10, 2019

This job story was written for the Zeals Advent Calendar.

Take-Home Messages

Eliminate information asymmetry: lower the barriers to communication
Utilize open source software and free cloud services
• huggingface, fastai, AllenNLP, etc.
• Google Colab, Paperspace Gradient, etc.
Seek reviews for everything, not just for code

§1 Talk Is Cheap, but……

IMAO, actionable talk can be slightly better.

For a good intention such as "democratizing something," it would be a contradiction in terms if participating were difficult for anyone.

(So the talk shop here is…)
When I want freedom of choice in using machine learning (ML) for natural language processing (NLP), how can the job be done so that NLP becomes more democratic?

What has been floating in my mind:

Prefer concepts over tips/tricks
Accessible
Accountable

§2 Barriers and Inequalities

Besides the obvious ones such as socioeconomic strata, native tongues, and physical/mental challenges, I'd argue that (philosophically) many barriers and inequalities exist in the supply and demand of software and hardware, too. For testing edge cases, though, let me list some weird ones:

I want to code in Python with the speed of C and the type system of Haskell.¹
I want fast computation but I can’t afford or don’t like the limited choices.²
I want free, liberal, and secure Internet.³
I want……

These may sound willful and spoiled, but I'm sure you get the point and have (counter-)arguments up your sleeve. In principle, I prefer to stick to (actionable) general concepts rather than specific tips/tricks, but I will still provide a few as examples.

In short, first we must know what choices we have, and then we bargain.

We may not have been born in a developed country where everyone speaks perfect English and receives first-hand information, but we can try to catch up and help each other. We may not have our preferred choices of software/hardware, but we can join, or even start, a community for alternatives. So self-education and lifelong learning are crucial.

You can’t always get what you want. — https://multimedia.thenational.ae/assets/multimedia/interactives/flat-graphics/rolling-stones.jpg

I'm lucky to be a programmer: virtually every kind of choice is available to me. I can learn something and then contribute. Even better, plenty of talented people are doing the same. Most of the time I just point others to their great work. For example, huggingface recently released DistilBERT, a distilled version of BERT that is smaller, faster, and cheaper. Don't know what that is? First learn English (one of the fundamental inequalities), and then enroll in free online courses like fast.ai's.

On the other hand, it may be more important to know the limitations. For example, to the best of my knowledge, sophisticated as IDSIA-sacred is, it only tracks pseudo-random states implicitly, and its configurations (mostly hyper-parameters) are usually just anonymous dictionaries. Although one can use a named config, unfortunately I need explicit control over randomness with typed and immutable configurations, so that I can have reproducible training cycles that recycle preceding snapshot models for A/B testing. Therefore I reluctantly reinvent the wheel. The pseudocode looks something like this:

conf_0 = typed_conf(corpus, task, model, env, hyperparams, ...)
cycle_0 = training(conf_0, ..., prev_cycle=None)

conf_a = conf_0.replicate_except(change(hyperparams.someone, 'a'))
cycle_a = training(conf_a, ..., prev_cycle=cycle_0)

conf_b = conf_0.replicate_except(change(hyperparams.someone, 'b'))
cycle_b = training(conf_b, ..., prev_cycle=cycle_0)
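A minimal runnable sketch of that pseudocode, assuming Python's frozen dataclasses as the "typed and immutable" configuration. Everything here is hypothetical: `TypedConf`, `HyperParams`, `Cycle`, the toy `training` loop, and all field names are illustrative stand-ins, and `dataclasses.replace` plays the role of the pseudocode's `replicate_except(change(...))`. The point is only that the seed lives inside the frozen config and both A/B variants restart from the same `cycle_0` snapshot.

```python
import random
from dataclasses import dataclass, field, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class HyperParams:
    # Hypothetical fields; the post only says "hyperparams"
    lr: float = 1e-3
    batch_size: int = 64


@dataclass(frozen=True)
class TypedConf:
    # Typed, immutable stand-in for typed_conf(corpus, task, model, env, hyperparams, ...)
    corpus: str
    task: str
    model: str
    seed: int  # randomness is an explicit, recorded part of the config
    hyperparams: HyperParams = field(default_factory=HyperParams)


@dataclass(frozen=True)
class Cycle:
    conf: TypedConf
    prev_cycle: Optional["Cycle"]
    weights: Tuple[float, ...]  # toy "snapshot model"


def training(conf: TypedConf, prev_cycle: Optional[Cycle] = None, steps: int = 3) -> Cycle:
    # Hypothetical training loop: a dedicated RNG seeded from the config,
    # so no hidden global random state leaks between cycles
    rng = random.Random(conf.seed)
    # Recycle the previous snapshot (e.g. cycle_0) as the starting point
    weights = list(prev_cycle.weights) if prev_cycle else [0.0] * 4
    for _ in range(steps):
        weights = [w - conf.hyperparams.lr * rng.gauss(0.0, 1.0) for w in weights]
    return Cycle(conf, prev_cycle, tuple(weights))


conf_0 = TypedConf(corpus="imdb", task="sentiment", model="lstm", seed=42)
cycle_0 = training(conf_0)

# dataclasses.replace copies a frozen config, changing only the named field
conf_a = replace(conf_0, hyperparams=replace(conf_0.hyperparams, lr=1e-2))
cycle_a = training(conf_a, prev_cycle=cycle_0)
conf_b = replace(conf_0, hyperparams=replace(conf_0.hyperparams, lr=1e-4))
cycle_b = training(conf_b, prev_cycle=cycle_0)

# Re-running a cycle from the same config and snapshot reproduces it exactly
assert training(conf_a, prev_cycle=cycle_0).weights == cycle_a.weights
```

Because `conf_a` and `conf_b` share the seed and the starting snapshot, the only difference between `cycle_a` and `cycle_b` is the learning rate, which is what a controlled A/B comparison wants.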

The experiment may seem trivial, but random states and non-deterministic/unstable algorithms make implementations complex and hence prone to human error. Surely there are clever ways to do it with IDSIA-sacred or other tools; this is just an example anyway. Small and tedious as it may be, an inequality hides inside: you may not care about these details if thousands of programmers and GPUs are at your disposal, since irregularities (human errors and random outcomes) will likely be averaged out. Meager as this example is, it reflects both huggingface's and IDSIA-sacred's slogans: "one commit at a time; every experiment is sacred." In other words, with total control over reproducible experiments, we can get a considerable amount of work done with limited yet free services such as Google Colab and Paperspace Gradient. Not a single commit/experiment, nor its carbon dioxide emissions, will be wasted.

Ideally, a positive feedback loop will then emerge: not just of "graduate student descent" algorithms, but of more opportunities for everyone to curate and spread knowledge.

In case you really want to read my insignificant tips and stupid code snippets, please refer to
tianjianjiang/playground/IMDb_baseline.ipynb

The notebook above applies the following tricks:

(1) NEVER use parallel procedures that can generate unstable input;
(2) NEVER use cuDNN's dropout along with bidirectional LSTM (unless you can control the underlying C/C++ functions);
(3) Remember every state of the four possible pseudo-random number generators;
(4) Remember every change of the dynamic loss scale for MixedPrecision;
(5) Find a learning rate that takes batch size into account.
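The post doesn't name the four generators in (3). A common set in a PyTorch project, assumed here, is Python's `random`, NumPy's global generator, PyTorch's CPU generator, and PyTorch's CUDA generators. A sketch that snapshots and restores whichever of them are installed:

```python
import importlib.util
import random
from typing import Any, Dict


def capture_rng_states() -> Dict[str, Any]:
    """Snapshot every RNG we know about (assumed: Python, NumPy, PyTorch CPU, PyTorch CUDA)."""
    states: Dict[str, Any] = {"python": random.getstate()}
    if importlib.util.find_spec("numpy") is not None:
        import numpy as np
        states["numpy"] = np.random.get_state()
    if importlib.util.find_spec("torch") is not None:
        import torch
        states["torch"] = torch.get_rng_state()
        if torch.cuda.is_available():
            states["torch_cuda"] = torch.cuda.get_rng_state_all()
    return states


def restore_rng_states(states: Dict[str, Any]) -> None:
    """Put every captured RNG back exactly where it was."""
    random.setstate(states["python"])
    if "numpy" in states:
        import numpy as np
        np.random.set_state(states["numpy"])
    if "torch" in states:
        import torch
        torch.set_rng_state(states["torch"])
        if "torch_cuda" in states:
            torch.cuda.set_rng_state_all(states["torch_cuda"])


random.seed(0)
snapshot = capture_rng_states()
first = [random.random() for _ in range(3)]
restore_rng_states(snapshot)
assert [random.random() for _ in range(3)] == first  # same draws after restore
```

Saving such a snapshot next to each checkpoint is one way to resume a cycle with exactly the same random draws.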

The dropout issue in (2) is particularly challenging: training is ~10% slower without it, but using it can induce a delusion of success or failure, since accuracy can vary by more than 5% at random!
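One generic guard against that delusion is to judge a setting by its spread across several seeds rather than by a single run. In this sketch `train_and_evaluate` is a hypothetical placeholder that only simulates run-to-run noise with arbitrary toy numbers (not measured results); in practice it would be the real, seeded training pipeline.

```python
import random
import statistics
from typing import Iterable, Tuple


def train_and_evaluate(seed: int) -> float:
    """Hypothetical placeholder for one full training run returning test accuracy.

    It only simulates noise with arbitrary numbers; swap in a real pipeline seeded with `seed`.
    """
    rng = random.Random(seed)
    return 0.88 + rng.uniform(-0.03, 0.03)


def accuracy_over_seeds(seeds: Iterable[int]) -> Tuple[float, float, float, float]:
    """Mean, sample standard deviation, min, and max accuracy across seeds."""
    accs = [train_and_evaluate(s) for s in seeds]
    return statistics.mean(accs), statistics.stdev(accs), min(accs), max(accs)


mean_acc, std_acc, lo, hi = accuracy_over_seeds(range(5))
print(f"accuracy {mean_acc:.3f} ± {std_acc:.3f} (min {lo:.3f}, max {hi:.3f})")
```

If an A/B difference is smaller than the min-to-max range across seeds, it is probably noise rather than a real improvement or regression.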

For Japanese corpora, it shouldn't be too hard to apply these tricks to
• n-waves/multifit/notebooks/MLDoc-JA-multifit_fp16.ipynb
• n-waves/multifit/notebooks/CLS-JA.ipynb.

§3 Checks and Balances

In "AI ETHICS: Prof. Andrew Ng's deeplearning.ai x MLT", participants were asked to reason about their answers to this question:

Should a university accept people convicted of human rights offenses (e.g., dictators)?

https://blog.deeplearning.ai/blog/the-batch-amazons-surveillance-network-ai-that-gets-the-facts-right-deepfakes-get-regulated-predicting-volcanic-eruptions

As with any other thought experiment in moral reasoning, there won't be a straightforward answer, let alone a standard one. It is more about opening a dialogue among different opinions than dictating an ultimate answer. Furthermore, we want to formulate an actionable code of AI ethics in a grassroots way, instead of arguing that it is all the responsibility of governments/enterprises.

In my data ethics team, we came up with this simple code: just as many programmers do code review and the scientific community does peer review, we shall also seek reviews from diverse perspectives, including people impacted by the data. Sadly, we have all heard too many stories about bias and discrimination, and again, there are no simple answers to those issues. What everyone (at least every programmer) can do every day is get a second opinion.

For example, if we are going to create an e-learning chatbot for Taiwanese kids, hopefully one of us will review the corpus of elementary school textbooks and realize that it contains gender-stereotyped lessons like

"My father gets up early to read the newspaper, my mother gets up early to do the housework, and I get up early to go to school."

From that point onward, we can finally discuss what to do about it. (If you do image processing, you may also be misled about mom's pumps. #KuToo)

The process may seem superficial and may appear to lack a separation of powers. What I am betting on is not some grand prize, either. Most of the time I simply don't trust myself; as the saying goes, "there is nothing more suspicious than a single-author paper." And judging by how many people may be affected by technology today, perhaps programmers do have some superpower across the World Wide Web, and hence a certain responsibility. Maybe in the near future, continuous integration/deployment (CI/CD) will have to include routines like "Examining Gender and Race Bias in Sentiment Analysis Systems" or "Predicting Factuality of Reporting and Bias of News Media Sources."
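A rough sketch of what such a CI routine could look like, in the spirit of template-based bias probes: fill the same sentence templates with person phrases from different groups and fail the build if the model's mean sentiment differs by more than a threshold. The templates, groups, threshold, and toy lexicon scorer are all hypothetical placeholders, not the cited paper's corpus or method.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical templates and groups: sentences differ only in the {person} phrase
TEMPLATES: List[str] = [
    "{person} feels happy.",
    "{person} made me angry.",
    "I talked to {person} yesterday.",
]
GROUPS: Dict[str, List[str]] = {
    "female": ["my mother", "my sister", "my aunt"],
    "male": ["my father", "my brother", "my uncle"],
}


def group_gap(score: Callable[[str], float]) -> float:
    """Largest difference between the groups' mean sentiment scores."""
    means = [
        mean(score(t.format(person=p)) for t in TEMPLATES for p in people)
        for people in GROUPS.values()
    ]
    return max(means) - min(means)


def check_bias(score: Callable[[str], float], threshold: float = 0.05) -> float:
    """CI-style gate: fail the build if the gap exceeds the (arbitrary) threshold."""
    gap = group_gap(score)
    if gap > threshold:
        raise AssertionError(f"sentiment gap {gap:.3f} exceeds {threshold}")
    return gap


# Toy lexicon scorer standing in for the real model under test
POSITIVE, NEGATIVE = {"happy"}, {"angry"}


def lexicon_score(text: str) -> float:
    words = text.lower().replace(".", "").split()
    return float(sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words))


print(check_bias(lexicon_score))  # the scorer ignores person words, so the gap is 0.0
```

A real pipeline would plug its sentiment model in as `score` and run `check_bias` as one more test alongside the unit tests.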

By the way, Japanese AI communities will soon release a statement and hold a symposium on machine learning and fairness.

Addendum

Although electoral systems are a significant part of democracy, I have deliberately avoided talking about them until now, not only because popular voting systems are often flawed, but also because, theoretically, the rank aggregation problem is NP-hard. I prefer to keep my preferences local and personal. :)
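The post doesn't say which formulation of rank aggregation it means; one standard formulation, assumed here, is Kemeny's rule: pick the ranking with the smallest total Kendall tau distance (number of pairwise disagreements) to all voters' rankings. Computing it is NP-hard in general, which is why this brute-force sketch enumerates all n! orderings and is only usable for a handful of candidates.

```python
from itertools import combinations, permutations
from typing import Sequence, Tuple


def kendall_tau_distance(r1: Sequence[str], r2: Sequence[str]) -> int:
    """Count candidate pairs that the two rankings put in opposite order."""
    pos1 = {c: i for i, c in enumerate(r1)}
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(
        (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
        for a, b in combinations(r1, 2)
    )


def kemeny(votes: Sequence[Sequence[str]]) -> Tuple[str, ...]:
    """Brute-force Kemeny aggregation: try all n! orderings (tiny n only)."""
    return min(
        permutations(votes[0]),
        key=lambda r: sum(kendall_tau_distance(r, v) for v in votes),
    )


votes = [("a", "b", "c"), ("a", "b", "c"), ("b", "a", "c")]
print(kemeny(votes))  # ('a', 'b', 'c'): total distance 1, the unique minimum
```

With 10 candidates there are already 3,628,800 orderings to check, which is the practical face of the hardness result.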

Footnotes

¹ Sooner or later we will be able to use Swift for TensorFlow.
² Alternatives to GPUs ain't easy:
• Intel: https://01.org/dnnl
• AMD: https://rocm.github.io/dl.html
• Google: https://github.com/pytorch/xla
³ Watch Mr. Robot, the most accurate TV show about hacktivism.