Artificial Intelligence (AI)
Artificial intelligence (AI) is technology that makes things smarter. One way to define it: "letting machines exhibit human intelligence." It is a broad term for having computers carry out human tasks; opinions on the exact scope of AI vary, and over time it has produced ever more applications and variations.
The systems in operation today are a form of weak AI: a system can do one or several things about as well as a human, or even better. For example, we write code to build a learning system and train it to recognize objects or gestures. Natural language processing, AI for video-game behavior, and machine learning are all forms of weak AI.
- Speech recognition / sound detection
- Natural language processing / semantic analysis
- Style transfer – learning to paint in an artist's style
- Restoration / transformation
Machine Learning
The remarkable thing about machine learning is that it can learn on its own. Today's ML applications already perform well at tasks such as object recognition, and the same ML system can be applied to new objects in the future without rewriting the code, which is both convenient and powerful.
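To make the reuse idea concrete, here is a minimal sketch in plain Python (the `nearest_neighbor_label` helper is hypothetical, purely for illustration) of a learner that, once given labeled examples, classifies new, unseen inputs without any code changes:

```python
import math

def nearest_neighbor_label(train, query):
    """Return the label of the training example closest to the query point."""
    return min(train, key=lambda example: math.dist(example[0], query))[1]

# "Training" data: (feature vector, label) pairs.
train = [((0, 0), "small"), ((1, 0), "small"), ((5, 5), "large"), ((6, 5), "large")]

# New, unseen objects are handled by the very same code.
print(nearest_neighbor_label(train, (0.5, 0.5)))  # small
print(nearest_neighbor_label(train, (5.5, 5.5)))  # large
```

Real systems use far richer models, but the point carries over: new inputs only require calling the trained model again, not rewriting it.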
Deep Learning
This DL technique is known as deep neural networks (DNNs). In a DNN, the code structures that make up deep learning are arranged in layers that loosely mimic the human brain, learning patterns of patterns.
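As an illustration only (toy code with random rather than learned weights), a sketch of stacked layers where each layer's outputs feed the next, so later layers can pick up patterns in the patterns found by earlier ones:

```python
import random

def relu(values):
    # Non-linearity applied after each layer.
    return [max(0.0, v) for v in values]

def layer(inputs, weights, biases):
    # Each unit computes a weighted sum of the previous layer's outputs.
    return relu([sum(w * x for w, x in zip(unit, inputs)) + b
                 for unit, b in zip(weights, biases)])

def random_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
w1, b1 = random_layer(3, 4, rng)
w2, b2 = random_layer(4, 2, rng)

x = [0.2, 0.7, 0.1]           # input features
hidden = layer(x, w1, b1)     # first layer: patterns in the input
output = layer(hidden, w2, b2)  # second layer: patterns of patterns
print(output)
```

Training would adjust the weights from data; the layered structure itself is what the "deep" in deep learning refers to.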
The concept of artificial intelligence dates back to the 1950s, quite a long time ago. Around 1980, machine learning began to gain popularity, and around 2010 deep learning drove major advances in weak AI systems. You can see how the three terms relate to one another: each is essentially a subset of the previous. Deep learning powers machine learning, which in turn realizes artificial intelligence.
Insight Machine Learning
How to Grid Search Data Preparation Techniques
by Jason Brownlee on 14/07/2020 at 19:00
Machine learning predictive modeling performance is only as good as your data, and your data is only as good as the way you prepare it for modeling. The most common approach to data preparation is to study a dataset and review the expectations of a machine learning algorithm, then carefully choose the most appropriate data … The post How to Grid Search Data Preparation Techniques appeared first on Machine Learning Mastery.
Exploring Faster Screening with Fewer Tests via Bayesian Group Testing
by Google AI on 14/07/2020 at 16:58
Posted by Marco Cuturi and Jean-Philippe Vert, Research Scientists, Google Research, Brain Team

How does one find a needle in a haystack? At the turn of World War II, that question took on a very concrete form when doctors wondered how to efficiently detect diseases among those who had been drafted into the war effort. Inspired by this challenge, Robert Dorfman, a young statistician at that time (later to become Harvard professor of economics), proposed in a seminal paper a 2-stage approach to detect infected individuals, whereby individual blood samples first are pooled in groups of four before being tested for the presence or absence of a pathogen. If a group is negative, then it is safe to assume that everyone in the group is free of the pathogen. In that case, the reduction in the number of required tests is substantial: an entire group of four people has been cleared with a single test. On the other hand, if a group tests positive, which is expected to happen rarely if the pathogen’s prevalence is small, at least one or more people within that group must be positive; therefore, a few more tests to determine the infected individuals are needed.

Left: Sixteen individual tests are required to screen 16 people — only one person’s test is positive, while 15 return negative. Right: Following Dorfman’s procedure, samples are pooled into four groups of four individuals, and tests are executed on the pooled samples. Because only the second group tests positive, 12 individuals are cleared and only those four belonging to the positive group need to be retested. This approach requires only eight tests, instead of the 16 needed for an exhaustive testing campaign.

Dorfman’s proposal triggered many follow-up works with connections to several areas in computer science, such as information theory, combinatorics or compressive sensing, and several variants of his approach have been proposed, notably those leveraging binary splitting or side knowledge on individual infection probability rates. The field has grown to the extent that several sub-problems are recognized and deserving of an entire literature on their own. Some algorithms are tailored for the noiseless case in which tests are perfectly reliable, whereas some consider instead the more realistic case where tests are noisy and may produce false negatives or positives. Finally, some strategies are adaptive, proposing groups based on test results already observed (including Dorfman’s, since it proposes to re-test individuals that appeared in positive groups), whereas others stick to a non-adaptive setting in which groups are known beforehand or drawn at random.

In “Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design”, we present an approach to group testing that can operate in a noisy setting (i.e., where tests can be mistaken) to decide adaptively by looking at past results which groups to test next, with the goal to converge on a reliable detection as quickly, and with as few tests, as possible. Large scale simulations suggest that this approach may result in significant improvements over both adaptive and non-adaptive baselines, and is far more efficient than individual tests when disease prevalence is low. As such, this approach is particularly well suited for situations that require large numbers of tests to be conducted with limited resources, as may be the case for pandemics, such as that corresponding to the spread of COVID-19.
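The test-count arithmetic of Dorfman's 2-stage procedure in the example above (16 people, one infected, groups of four) can be checked with a short sketch; this is illustrative code, not from the post, and it assumes noiseless tests:

```python
def dorfman_tests(statuses, group_size):
    """Count tests used by Dorfman's 2-stage pooling (noiseless tests assumed)."""
    tests = 0
    for i in range(0, len(statuses), group_size):
        group = statuses[i:i + group_size]
        tests += 1              # one test on the pooled group
        if any(group):          # pool positive: retest each member individually
            tests += len(group)
    return tests

# 16 people, exactly one infected, pooled into four groups of four.
statuses = [False] * 16
statuses[5] = True
print(dorfman_tests(statuses, 4))   # 8, versus 16 individual tests
```

With no infections at all, the same population would need only four pooled tests.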
We have open-sourced the code to the community through our GitHub repo.

Noisy and Adaptive Group Testing in a Non-Asymptotic Regime
A group testing strategy is an algorithm that is tasked with guessing who, among a list of n people, carries a particular pathogen. To do so, the strategy provides instructions for pooling individuals into groups. Assuming a laboratory can execute k tests at a time, the strategy will form a k ⨉ n pooling matrix that defines these groups. Once the tests are carried out, the results are used to decide whether sufficient information has been gathered to determine who is or is not infected, and if not, how to form new groups for another round of testing.

We designed a group testing approach for the realistic setting where the testing strategy can be adaptive and where tests are noisy — the probability that the test of an infected sample is positive (sensitivity) is less than 100%, as is the specificity, the probability that a non-infected sample returns negative.

Screening More People with Fewer Tests Using Bayesian Optimal Experimental Design
The strategy we propose proceeds the way a detective would investigate a case. They first form several hypotheses about who may or may not be infected, using evidence from all tests (if any) that have been carried out so far and prior information on the infection rate (a). Using these hypotheses, our detectives produce an actionable item to continue the investigation, namely a next wave of groups that may help in validating or invalidating as many hypotheses as possible (b), and then loop back to (a) until the set of plausible hypotheses is small enough to unambiguously identify the target of the search. More precisely:

Given a population of n people, an infection state is a binary vector of length n that describes who is infected (marked with a 1), and who is not (marked with a 0). At a certain time, a population is in a given state (most likely a few 1’s and mostly 0’s).
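A noiseless version of the k ⨉ n pooling-matrix idea can be sketched as follows (hypothetical helper names, for illustration only; the real setting also models sensitivity and specificity noise):

```python
def pooled_results(pooling_matrix, statuses):
    """A pooled test is positive iff it includes at least one infected sample."""
    return [any(included and infected
                for included, infected in zip(row, statuses))
            for row in pooling_matrix]

# n = 8 people, k = 2 tests: pool persons 0-3, then persons 4-7.
pooling = [[1, 1, 1, 1, 0, 0, 0, 0],
           [0, 0, 0, 0, 1, 1, 1, 1]]
statuses = [0, 0, 0, 0, 0, 1, 0, 0]       # person 5 is infected
print(pooled_results(pooling, statuses))  # [False, True]
```

Each row of the matrix encodes one pooled test; the strategy's job is to choose these rows well, round after round.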
The goal of group testing is to identify that state using as few tests as possible. Given a prior belief on the infection rate (the disease is rare) and test results observed so far (if any), we expect that only a small share of those infection states will be plausible. Rather than evaluating the plausibility of all 2^n possible states (an extremely large number even for small n), we resort to a more efficient method to sample plausible hypotheses using a sequential Monte Carlo (SMC) sampler. Although quite costly by common standards (a few minutes using a GPU in our experimental setup), we show in this work that SMC samplers remain tractable even for large n, opening new possibilities for group testing. In short, in return for a few minutes of computations, our detectives get an extensive list of thousands of relevant hypotheses that may explain tests observed so far.

Equipped with a relevant list of hypotheses, our strategy proceeds, as detectives would, by selectively gathering additional evidence. If k tests can be carried out at the next iteration, our strategy will propose to test k new groups, which are computed using the framework of Bayesian optimal experimental design. Intuitively, if k=1 and one can only propose a single new group to test, there would be clear advantage in building that group such that its test outcome is as uncertain as possible, i.e., with a probability that it returns positive as close to 50% as possible, given the current set of hypotheses. Indeed, to progress in an investigation, it is best to maximize the surprise factor (or information gain) provided by new test results, as opposed to using them to confirm further what we already hold to be very likely. To generalize that idea to a set of k>1 new groups, we score this surprise factor by computing the mutual information of these “virtual” group tests vs. the distribution of hypotheses.
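The k=1 intuition can be sketched with a toy brute-force stand-in for the paper's optimizer (hypothetical function names; the actual system scores sets of groups by mutual information rather than enumerating all candidates):

```python
from itertools import combinations

def p_positive(group, hypotheses):
    """Fraction of sampled hypotheses under which this pooled test is positive."""
    return sum(any(h[i] for i in group) for h in hypotheses) / len(hypotheses)

def most_uncertain_group(n, size, hypotheses):
    # Pick the group whose predicted positive rate is closest to 50%.
    return min(combinations(range(n), size),
               key=lambda g: abs(p_positive(g, hypotheses) - 0.5))

# Four people; four sampled hypotheses of who is infected (1 = infected).
hypotheses = [(1, 0, 0, 0), (1, 0, 0, 0), (0, 0, 1, 0), (0, 0, 0, 0)]
group = most_uncertain_group(4, 2, hypotheses)
print(group, p_positive(group, hypotheses))  # (0, 1) 0.5
```

Testing the pool {0, 1} is maximally informative here: half the sampled hypotheses predict a positive result, so either outcome rules out a large share of them.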
We also consider a more involved approach that computes the expected area under the ROC curve (AUC) one would obtain from testing these new groups using the distribution of hypotheses. The maximization of these two criteria is carried out using a greedy approach, resulting in two group selectors, GMIMAX and GAUCMAX (greedy maximization of mutual information or AUC, respectively). The interaction between a laboratory (wet_lab) carrying out testing, and our strategy, composed of a sampler and a group selector, is summarized in the following drawing, which uses names of classes implemented in our open source package.

Our group testing framework describes an interaction between a testing environment, the wet_lab, whose pooled test results are used by the sampler to draw thousands of plausible hypotheses on the infection status of all individuals. These hypotheses are then used by an optimization procedure, group_selector, that figures out what groups may be the most relevant to test in order to narrow down on the true infection status. Once formed, these new groups are then tested again, closing the loop. At any point in the procedure, the hypotheses formed by the sampler can be averaged to obtain the average probability of infection for each patient. From these probabilities, a decision on whether a patient is infected or not can be done by thresholding these probabilities at a certain confidence level.

Benchmarking
We benchmarked our two strategies GMIMAX and GAUCMAX against various baselines in a wide variety of settings (infection rates, test noise levels), reporting performance as the number of tests increases. In addition to simple Dorfman strategies, the baselines we considered included a mix of non-adaptive strategies (origami assays, random designs) complemented at later stages with the so-called informative Dorfman approach. Our approaches significantly outperform the others in all settings.
We executed 5000 simulations on a sample population of 70 individuals with an infection rate of 2%. We have assumed sensitivity/specificity values of 85% / 97% for tests with groups of maximal size 10, which are representative of current PCR machines. This figure demonstrates that our approach outperforms the other baselines with as few as 24 tests (up to 8 tests used in 3 cycles), including both adaptive and non-adaptive varieties, and performs significantly better than individual tests (plotted in the sensitivity/specificity plane as a hexagon, requiring 70 tests), highlighting the savings potential offered by group testing. See preprint for other setups.

Conclusion
Screening a population for a pathogen is a fundamental problem, one that we face during the current COVID-19 epidemic. Seventy years ago, Dorfman proposed a simple approach currently adopted by various institutions. Here, we have proposed a method to extend the basic group testing approach in several ways. Our first contribution is to adopt a probabilistic perspective, and form thousands of plausible hypotheses of infection distributions given test outcomes, rather than trust test results to be 100% reliable as Dorfman did. This perspective allows us to seamlessly incorporate additional prior knowledge on infection, such as when we suspect some individuals to be more likely than others to carry the pathogen, based for instance on contact tracing data or answers to a questionnaire. This provides our algorithms, which can be compared to detectives investigating a case, the advantage of knowing what are the most likely infection hypotheses that agree with prior beliefs and tests carried out so far. Our second contribution is to propose algorithms that can take advantage of these hypotheses to form new groups, and therefore direct the gathering of new evidence, to narrow down as quickly as possible to the “true” infection hypothesis, and close the case with as little testing effort as possible.
Acknowledgements
We would like to thank our collaborators on this work, Olivier Teboul, in particular, for his help preparing figures, as well as Arnaud Doucet and Quentin Berthet. We also thank Kevin Murphy and Olivier Bousquet (Google) for their suggestions at the earliest stages of this project, as well as Dan Popovici for his unwavering support pushing this forward; Ignacio Anegon, Jeremie Poschmann and Laurent Tesson (INSERM) for providing us background information on RT-PCR tests and Nicolas Chopin (CREST) for giving guidance on his work to define SMCs for binary spaces.
Google at ICML 2020
by Google AI on 13/07/2020 at 18:00
Posted by Jaqui Herman and Cat Armato, Program Managers

Machine learning is a key strategic focus at Google, with highly active groups pursuing research in virtually all aspects of the field, including deep learning and more classical algorithms, exploring theory as well as application. We utilize scalable tools and architectures to build machine learning systems that enable us to solve deep scientific and engineering challenges in areas of language, speech, translation, music, visual processing and more.

As a leader in machine learning research, Google is proud to be a Platinum Sponsor of the thirty-seventh International Conference on Machine Learning (ICML 2020), a premier annual event taking place virtually this week. With over 100 accepted publications and Googlers participating in workshops, we look forward to our continued collaboration with the larger machine learning research community.

If you’re registered for ICML 2020, we hope you’ll visit the Google virtual booth to learn more about the exciting work, creativity and fun that goes into solving some of the field’s most interesting challenges. You can also learn more about the Google research being presented at ICML 2020 in the list below (Google affiliations bolded).

ICML Expo
- Google Dataset Search: Building an Open Ecosystem for Dataset Discovery – Natasha Noy
- End-to-end Bayesian inference workflows in TensorFlow Probability – Colin Carroll

Publications
- Population-Based Black-Box Optimization for Biological Sequence Design – Christof Angermueller, David Belanger, Andreea Gane, Zelda Mariet, David Dohan, Kevin Murphy, Lucy Colwell, D Sculley
- Predictive Coding for Locally-Linear Control – Rui Shu, Tung Nguyen, Yinlam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung Bui
- FedBoost: A Communication-Efficient Algorithm for Federated Learning – Jenny Hamer, Mehryar Mohri, Ananda Theertha Suresh
- Faster Graph Embeddings via Coarsening – Matthew Fahrbach, Gramoz Goranci, Richard Peng, Sushant Sachdeva, Chi Wang
- Revisiting Fundamentals of Experience Replay – William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney
- Boosting for Control of Dynamical Systems – Naman Agarwal, Nataly Brukhim, Elad Hazan, Zhou Lu
- Neural Clustering Processes – Ari Pakman, Yueqi Wang, Catalin Mitelut, JinHyung Lee, Liam Paninski
- The Tree Ensemble Layer: Differentiability Meets Conditional Computation – Hussein Hazimeh, Natalia Ponomareva, Petros Mol, Zhenyu Tan, Rahul Mazumder
- Representations for Stable Off-Policy Reinforcement Learning – Dibya Ghosh, Marc Bellemare
- REALM: Retrieval-Augmented Language Model Pre-Training – Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang
- Context Aware Local Differential Privacy – Jayadev Acharya, Keith Bonawitz, Peter Kairouz, Daniel Ramage, Ziteng Sun
- Scalable Deep Generative Modeling for Sparse Graphs – Hanjun Dai, Azade Nazi, Yujia Li, Bo Dai, Dale Schuurmans
- Deep k-NN for Noisy Labels – Dara Bahri, Heinrich Jiang, Maya Gupta†
- Revisiting Spatial Invariance with Low-Rank Local Connectivity – Gamaleldin F. Elsayed, Prajit Ramachandran, Jonathon Shlens, Simon Kornblith
- SCAFFOLD: Stochastic Controlled Averaging for Federated Learning – Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh
- Incremental Sampling Without Replacement for Sequence Models – Kensen Shi, David Bieber, Charles Sutton
- SoftSort: A Continuous Relaxation for the argsort Operator – Sebastian Prillo, Julian Martin Eisenschlos
- XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation (see blog post) – Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson
- Learning to Stop While Learning to Predict – Xinshi Chen, Hanjun Dai, Yu Li, Xin Gao, Le Song
- Bandits with Adversarial Scaling – Thodoris Lykouris, Vahab Mirrokni, Renato Paes Leme
- SimGANs: Simulator-Based Generative Adversarial Networks for ECG Synthesis to Improve Deep ECG Classification – Tomer Golany, Daniel Freedman, Kira Radinsky
- Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization – Geoffrey Negiar, Gideon Dresdner, Alicia Yi-Ting Tsai, Laurent El Ghaoui, Francesco Locatello, Robert M. Freund, Fabian Pedregosa
- Implicit differentiation of Lasso-type models for hyperparameter optimization – Quentin Bertrand, Quentin Klopfenstein, Mathieu Blondel, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon
- Infinite attention: NNGP and NTK for deep attention networks – Jiri Hron, Yasaman Bahri, Jascha Sohl-Dickstein, Roman Novak
- Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently – Asaf Cassel, Alon Cohen, Tomer Koren
- Adversarial Learning Guarantees for Linear Hypotheses and Neural Networks – Pranjal Awasthi, Natalie Frank, Mehryar Mohri
- Random Hypervolume Scalarizations for Provable Multi-Objective Black Box Optimization – Daniel Golovin, Qiuyi (Richard) Zhang
- Generating Programmatic Referring Expressions via Program Synthesis – Jiani Huang, Calvin Smith, Osbert Bastani, Rishabh Singh, Aws Albarghouthi, Mayur Naik
- Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach – Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, Craig Boutilier
- AutoML-Zero: Evolving Machine Learning Algorithms From Scratch (see blog post) – Esteban Real, Chen Liang, David R. So, Quoc V. Le
- How Good is the Bayes Posterior in Deep Neural Networks Really? – Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Swiatkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin†
- Which Tasks Should Be Learned Together in Multi-task Learning? – Trevor Standley, Amir R. Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, Silvio Savarese
- Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems – Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel
- Disentangling Trainability and Generalization in Deep Neural Networks – Lechao Xiao, Jeffrey Pennington, Samuel S. Schoenholz
- The Many Shapley Values for Model Explanation – Mukund Sundararajan, Amir Najmi
- Neural Contextual Bandits with UCB-based Exploration – Dongruo Zhou, Lihong Li, Quanquan Gu
- Automatic Shortcut Removal for Self-Supervised Representation Learning – Matthias Minderer, Olivier Bachem, Neil Houlsby, Michael Tschannen
- Federated Learning with Only Positive Labels – Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
- How Recurrent Networks Implement Contextual Processing in Sentiment Analysis – Niru Maheswaranathan, David Sussillo
- Supervised Learning: No Loss No Cry – Richard Nock, Aditya Krishna Menon
- Ready Policy One: World Building Through Active Learning – Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts
- Weakly-Supervised Disentanglement Without Compromises – Francesco Locatello, Ben Poole, Gunnar Raetsch, Bernhard Schölkopf, Olivier Bachem, Michael Tschannen
- Fast Differentiable Sorting and Ranking – Mathieu Blondel, Olivier Teboul, Quentin Berthet, Josip Djolonga
- Debiased Sinkhorn barycenters – Hicham Janati, Marco Cuturi, Alexandre Gramfort
- Interpretable, Multidimensional, Multimodal Anomaly Detection with Negative Sampling for Detection of Device Failure – John Sipple
- Accelerating Large-Scale Inference with Anisotropic Vector Quantization – Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng†, David Simcha, Felix Chern, Sanjiv Kumar
- An Optimistic Perspective on Offline Reinforcement Learning (see blog post) – Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
- The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization – Ben Adlam, Jeffrey Pennington
- Private Query Release Assisted by Public Data – Raef Bassily, Albert Cheu, Shay Moran, Aleksandar Nikolov, Jonathan Ullman, Zhiwei Steven Wu
- Learning and Evaluating Contextual Embedding of Source Code – Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi
- Evaluating Machine Accuracy on ImageNet – Vaishaal Shankar, Rebecca Roelofs, Horia Mania, Alex Fang, Benjamin Recht, Ludwig Schmidt
- Imputer: Sequence Modelling via Imputation and Dynamic Programming – William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi, Navdeep Jaitly
- Domain Aggregation Networks for Multi-Source Domain Adaptation – Junfeng Wen, Russell Greiner, Dale Schuurmans
- Planning to Explore via Self-Supervised World Models – Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak
- Context-Aware Dynamics Model for Generalization in Model-Based Reinforcement Learning – Kimin Lee, Younggyo Seo, Seunghyun Lee, Honglak Lee, Jinwoo Shin
- Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search – Binghong Chen, Chengtao Li, Hanjun Dai, Le Song
- On the Consistency of Top-k Surrogate Losses – Forest Yang, Sanmi Koyejo
- Dual Mirror Descent for Online Allocation Problems – Haihao Lu, Santiago Balseiro, Vahab Mirrokni
- Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors – Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-An Ma†, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran
- Batch Stationary Distribution Estimation – Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans
- Small-GAN: Speeding Up GAN Training Using Core-Sets – Samarth Sinha, Han Zhang, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Augustus Odena
- Data Valuation Using Reinforcement Learning – Jinsung Yoon, Sercan Ö. Arik, Tomas Pfister
- A Game Theoretic Perspective on Model-Based Reinforcement Learning – Aravind Rajeswaran, Igor Mordatch, Vikash Kumar
- Encoding Musical Style with Transformer Autoencoders – Kristy Choi, Curtis Hawthorne, Ian Simon, Monica Dinculescu, Jesse Engel
- The Shapley Taylor Interaction Index – Kedar Dhamdhere, Mukund Sundararajan, Ashish Agarwal
- Multidimensional Shape Constraints – Maya Gupta†, Erez Louidor, Olexander Mangylov†, Nobu Morioka, Taman Narayan, Sen Zhao
- Private Counting from Anonymous Messages: Near-Optimal Accuracy with Vanishing Communication Overhead – Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh
- Learning to Score Behaviors for Guided Policy Optimization – Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations – Florian Tramèr, Jens Behrmann, Nicholas Carlini, Nicolas Papernot, Jörn-Henrik Jacobsen
- Optimizing Black-Box Metrics with Adaptive Surrogates – Qijia Jiang, Olaoluwa Adigun, Harikrishna Narasimhan, Mahdi Milani Fard, Maya Gupta†
- Circuit-Based Intrinsic Methods to Detect Overfitting – Sat Chatterjee, Alan Mishchenko
- Automatic Reparameterisation of Probabilistic Programs – Maria I. Gorinova, Dave Moore, Matthew D. Hoffman
- Stochastic Flows and Geometric Optimization on the Orthogonal Group – Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani
- Black-Box Variational Inference as a Parametric Approximation to Langevin Dynamics – Matthew Hoffman, Yi-An Ma†
- Concise Explanations of Neural Networks Using Adversarial Training – Prasad Chalasani, Jiefeng Chen, Amrita Roy Chowdhury, Somesh Jha, Xi Wu
- p-Norm Flow Diffusion for Local Graph Clustering – Shenghao Yang, Di Wang, Kimon Fountoulakis
- Empirical Study of the Benefits of Overparameterization in Learning Latent Variable Models – Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, David Sontag
- Robust Pricing in Dynamic Mechanism Design – Yuan Deng, Sébastien Lahaie, Vahab Mirrokni
- Differentiable Product Quantization for Learning Compact Embedding Layers – Ting Chen, Lala Li, Yizhou Sun
- Adaptive Region-Based Active Learning – Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Ningshan Zhang
- Countering Language Drift with Seeded Iterated Learning – Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville
- Does Label Smoothing Mitigate Label Noise? – Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
- Acceleration Through Spectral Density Estimation – Fabian Pedregosa, Damien Scieur
- Momentum Improves Normalized SGD – Ashok Cutkosky, Harsh Mehta
- ConQUR: Mitigating Delusional Bias in Deep Q-Learning – Andy Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier
- Online Learning with Imperfect Hints – Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit
- Go Wide, Then Narrow: Efficient Training of Deep Thin Networks – Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc Le, Qiang Liu, Dale Schuurmans
- On Implicit Regularization in β-VAEs – Abhishek Kumar, Ben Poole
- Is Local SGD Better than Minibatch SGD? – Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro
- A Simple Framework for Contrastive Learning of Visual Representations – Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton
- Universal Average-Case Optimality of Polyak Momentum – Damien Scieur, Fabian Pedregosa
- An Imitation Learning Approach for Cache Replacement – Evan Zheran Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn
- Collapsed Amortized Variational Inference for Switching Nonlinear Dynamical Systems – Zhe Dong, Bryan A. Seybold, Kevin P. Murphy, Hung H. Bui
- Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels – Lu Jiang, Di Huang, Mason Liu, Weilong Yang
- Optimizing Data Usage via Differentiable Rewards – Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Jaime Carbonell, Graham Neubig
- Sparse Sinkhorn Attention – Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
- One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control – Wenlong Huang, Igor Mordatch, Deepak Pathak
- On Thompson Sampling with Langevin Algorithms – Eric Mazumdar, Aldo Pacchiano, Yi-An Ma†, Peter L. Bartlett, Michael I. Jordan
- Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection – Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, Qiang Liu
- On the Global Convergence Rates of Softmax Policy Gradient Methods – Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans
- Concept Bottleneck Models – Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang
- Supervised Quantile Normalization for Low-Rank Matrix Approximation – Marco Cuturi, Olivier Teboul, Jonathan Niles-Weed, Jean-Philippe Vert
- Missing Data Imputation Using Optimal Transport – Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi
- Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention Over Modules – Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio
- Stochastic Optimization for Regularized Wasserstein Estimators – Marin Ballu, Quentin Berthet, Francis Bach
- Low-Rank Bottleneck in Multi-head Attention Models – Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank Jakkam Reddi, Sanjiv Kumar
- Rigging the Lottery: Making All Tickets Winners – Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen
- Online Learning with Dependent Stochastic Feedback Graphs – Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Ningshan Zhang
- Calibration, Entropy Rates, and Memory in Language Models – Mark Braverman, Xinyi Chen, Sham Kakade, Karthik Narasimhan, Cyril Zhang, Yi Zhang
- Composable Sketches for Functions of Frequencies: Beyond the Worst Case – Edith Cohen, Ofir Geri, Rasmus Pagh
- Energy-Based Processes for Exchangeable Data – Mengjiao Yang, Bo Dai, Hanjun Dai, Dale Schuurmans
- Near-Optimal Regret Bounds for Stochastic Shortest Path – Alon Cohen, Haim Kaplan, Yishay Mansour, Aviv Rosenberg
- PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization (see blog post) – Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu
- The Complexity of Finding Stationary Points with Stochastic Gradient Descent – Yoel Drori, Ohad Shamir
- The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks – Jakub Swiatkowski, Kevin Roth, Bas Veeling, Linh Tran, Josh Dillon, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin†
- Regularized Optimal Transport is Ground Cost Adversarial – François-Pierre Paty, Marco Cuturi

Workshops
- New In ML – Invited Speaker: Nicolas Le Roux; Organizers: Zhen Xu, Sparkle Russell-Puleri, Zhengying Liu, Sinead A Williamson, Matthias W Seeger, Wei-Wei Tu, Samy Bengio, Isabelle Guyon
- LatinX in AI – Workshop Advisor: Pablo Samuel Castro
- Women in Machine Learning Un-Workshop – Invited Speaker: Doina Precup; Sponsor Expo Speaker: Jennifer Wei
- Queer in AI – Invited Speaker: Shakir Mohamed
- Workshop on Continual Learning – Organizers: Haytham Fayek, Arslan Chaudhry, David Lopez-Paz, Eugene Belilovsky, Jonathan Schwarz, Marc Pickett, Rahaf Aljundi, Sayna Ebrahimi, Razvan Pascanu, Puneet Dokania
- 5th ICML Workshop on Human Interpretability in Machine Learning (WHI) – Organizers: Kush Varshney, Adrian Weller, Alice Xiang, Amit Dhurandhar, Been Kim, Dennis Wei, Umang Bhatt
- Self-supervision in Audio and Speech – Organizers: Mirco Ravanelli, Dmitriy Serdyuk, R Devon Hjelm, Bhuvana Ramabhadran, Titouan Parcollet
- Workshop on eXtreme Classification: Theory and Applications – Invited Speakers: Sanjiv Kumar
- Healthcare Systems, Population Health, and the Role of Health-tech – Organizers: Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani
- Theoretical Foundations of Reinforcement Learning – Program Committee: Alon Cohen, Chris Dann
- Uncertainty and Robustness in Deep Learning Workshop (UDL) – Invited Speaker: Justin Gilmer; Organizers: Sharon Li, Balaji Lakshminarayanan, Dan Hendrycks, Thomas Dietterich, Jasper Snoek; Program Committee: Jeremiah Liu, Jie Ren, Rodolphe Jenatton, Zack Nado, Alexander Alemi, Florian Wenzel, Mike Dusenberry, Raphael Lopes
- Beyond First Order Methods in Machine Learning Systems – Industry Panel: Jonathan Hseu
- Object-Oriented Learning: Perception, Representation, and Reasoning – Invited Speakers: Thomas Kipf, Igor Mordatch
- Graph Representation Learning and Beyond (GRL+) – Organizers: Michael Bronstein, Andreea Deac, William L. Hamilton, Jessica B. Hamrick, Milad Hashemi, Stefanie Jegelka, Jure Leskovec, Renjie Liao, Federico Monti, Yizhou Sun, Kevin Swersky, Petar Veličković, Rex Ying, Marinka Žitnik; Speakers: Thomas Kipf; Program Committee: Bryan Perozzi, Kevin Swersky, Milad Hashemi, Thomas Kipf, Ting Cheng
- ML Interpretability for Scientific Discovery – Organizers: Subhashini Venugopalan, Michael Brenner, Scott Linderman, Been Kim; Program Committee: Akinori Mitani, Arunachalam Narayanaswamy, Avinash Varadarajan, Awa Dieng, Benjamin Sanchez-Lengeling, Bo Dai, Stephan Hoyer, Subham Sekhar Sahoo, Suhani Vora; Steering Committee: John Platt, Mukund Sundararajan, Jon Kleinberg
- Negative Dependence and Submodularity for Machine Learning – Organizers: Zelda Mariet, Mike Gartrell, Michal Derezinski
- 7th ICML Workshop on Automated Machine Learning (AutoML) – Organizers: Charles Weill, Katharina Eggensperger, Matthias Feurer, Frank Hutter, Marius Lindauer, Joaquin Vanschoren
- Federated Learning for User Privacy and Data Confidentiality – Keynote: Brendan McMahan; Program Committee: Peter Kairouz, Jakub Konecný
- MLRetrospectives: A Venue for Self-Reflection in ML Research – Speaker: Margaret Mitchell
- Machine Learning for Media Discovery – Speaker: Ed Chi
- INNF+: Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models – Organizers: Chin-Wei Huang, David Krueger, Rianne van den Berg, George Papamakarios, Chris Cremer, Ricky Chen, Danilo Rezende
- 4th Lifelong Learning Workshop – Program Committee: George Tucker, Marlos C. Machado
- 2nd ICML Workshop on Human in the Loop Learning (HILL) – Organizers: Shanghang Zhang, Xin Wang, Fisher Yu, Jiajun Wu, Trevor Darrell
- Machine Learning for Global Health – Organizers: Danielle Belgrave, Stephanie Hyland, Charles Onu, Nicholas Furnham, Ernest Mwebaze, Neil Lawrence

Committee
- Social Chair: Adam White

† Work performed while at Google
Framework for Data Preparation Techniques in Machine Learning
by Jason Brownlee on 12/07/2020 at 19:00
There is a vast number of data preparation techniques that could be used on a predictive modeling project. In some cases, the distribution of the data or the requirements of a machine learning model may suggest the data preparation needed, although this is rarely the case given the complexity and high-dimensionality of …

The post Framework for Data Preparation Techniques in Machine Learning appeared first on Machine Learning Mastery.
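When no single preparation technique is obviously right, one practical answer (the idea behind the companion grid-search post above) is to treat the data preparation step itself as a hyperparameter and let cross-validation choose it. A minimal sketch of that idea, assuming scikit-learn; this is an illustration, not code from the post:

```python
# Treat the data-preparation transform as just another hyperparameter:
# put it in a Pipeline step and grid-search over candidate transforms.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Synthetic stand-in dataset for the sketch.
X, y = make_classification(n_samples=200, n_features=10, random_state=1)

pipe = Pipeline([("prep", StandardScaler()), ("model", LogisticRegression())])

# The "prep" step is searched over, alongside a model hyperparameter.
grid = {
    "prep": [StandardScaler(), MinMaxScaler(), RobustScaler()],
    "model__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, grid, cv=3, scoring="accuracy")
search.fit(X, y)

# Which preparation technique won, and with what cross-validated score.
print(type(search.best_params_["prep"]).__name__, round(search.best_score_, 3))
```

The key design point is that the transform is fit inside each cross-validation fold, so the comparison between preparation techniques is not contaminated by data leakage.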
Grounding Natural Language Instructions to Mobile UI Actions
by Google AI on 10/07/2020 at 17:01
Posted by Yang Li, Research Scientist, Google Research

Mobile devices offer a myriad of functionalities that can assist in everyday activities. However, many of these functionalities are not easily discoverable or accessible, forcing users to look up how to perform a specific task (how to turn on traffic mode in Maps or change notification settings in YouTube, for example). While searching the web for detailed instructions is an option, it is still up to the user to follow those instructions step by step and navigate UI details through a small touchscreen, which can be tedious, time consuming, and a barrier to accessibility. What if one could design a computational agent to turn these language instructions into actions and automatically execute them on the user's behalf?

In "Mapping Natural Language Instructions to Mobile UI Action Sequences", published at ACL 2020, we present a first step towards automatic action sequence mapping, creating three new datasets used to train deep learning models that ground natural language instructions to executable mobile UI actions. This work lays the technical foundation for task automation on mobile devices that would alleviate the need to maneuver through UI details, which may be especially valuable for users who are visually or situationally impaired. We have also open-sourced our model code and data pipelines through our GitHub repository to spur further development in the research community.

Constructing Language Grounding Models

People often provide one another with instructions in order to coordinate joint efforts and accomplish tasks involving complex sequences of actions, for example, following a recipe to bake a cake, or having a friend walk you through setting up a home network.
Building computational agents able to help with similar interactions is an important goal that requires true language grounding in the environments in which the actions take place. The learning task addressed here is to predict a sequence of actions for a mobile platform given a set of instructions, the sequence of screens produced as the system transitions from one screen to another, and the set of interactive elements on those screens. Training such a model end-to-end would require paired language-action data, which is difficult to acquire at a large scale. Instead, we deconstruct the problem into two sequential steps: an action phrase-extraction step and a grounding step.

Figure: The workflow of grounding language instructions to executable actions.

The action phrase-extraction step identifies the operation, object and argument descriptions from multi-step instructions using a Transformer model with area attention for representing each description phrase. Area attention allows the model to attend to a group of adjacent words in the instruction (a span) as a whole when decoding a description.

Figure: The action phrase extraction model takes the word sequence of a natural language instruction and outputs a sequence of spans (denoted in red boxes) that indicate the phrases describing the operation, the object and the argument of each action in the task.

Next, the grounding step matches the extracted operation and object descriptions with a UI object on the screen. Again, we use a Transformer model, but in this case it contextually represents UI objects and grounds object descriptions to them.

Figure: The grounding model takes the extracted spans as input and grounds them to executable actions, including the object an action is applied to, given the UI screen at each step during execution.

Results

To investigate the feasibility of this task and the effectiveness of our approach, we construct three new datasets to train and evaluate our model.
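The core of the area attention idea mentioned above is attending over contiguous spans of words rather than individual tokens. A rough sketch of that mechanism, heavily simplified (mean-pooled span representations and plain dot-product attention; the paper's actual parameterization differs):

```python
import numpy as np

def span_representations(tokens: np.ndarray, max_len: int = 3):
    """Enumerate contiguous spans up to max_len and mean-pool their vectors.

    tokens: (T, d) array of token embeddings. Returns (spans, reps), where
    spans is a list of (start, end) pairs (end exclusive) and reps is (S, d).
    """
    T, _ = tokens.shape
    spans, reps = [], []
    for start in range(T):
        for end in range(start + 1, min(start + max_len, T) + 1):
            spans.append((start, end))
            reps.append(tokens[start:end].mean(axis=0))
    return spans, np.stack(reps)

def area_attention(query: np.ndarray, tokens: np.ndarray, max_len: int = 3):
    """Attend over span ("area") representations instead of single tokens."""
    spans, reps = span_representations(tokens, max_len)
    scores = reps @ query                   # dot-product score per span
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over all spans
    best = spans[int(np.argmax(weights))]   # most-attended span
    return best, weights @ reps             # span + attention-weighted output

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 4))            # 5 toy "word" embeddings
best_span, ctx = area_attention(tokens[2], tokens)
print(best_span, ctx.shape)
```

This makes the decoding story concrete: because every candidate span has its own representation, the model can commit to a whole phrase (say, words 2 to 4) in one attention step rather than stitching it together token by token.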
The first dataset includes 187 multi-step English instructions for operating Pixel phones along with their corresponding action-screen sequences, and enables assessment of full task performance on naturally occurring instructions; it is used for testing end-to-end grounding quality. For action phrase extraction training and evaluation, we obtain English "how-to" instructions, which are abundant on the web, and annotate the phrases that describe each action. To train the grounding model, we synthetically generate 295K single-step commands to UI actions, covering 178K different UI objects across 25K mobile UI screens from a public Android UI corpus.

A Transformer with area attention obtains 85.56% accuracy for predicting span sequences that completely match the ground truth. The phrase extractor and grounding model together obtain 89.21% partial and 70.59% complete accuracy for matching ground-truth action sequences on the more challenging task of mapping language instructions to executable actions end-to-end. We also evaluated alternative methods and representations of UI objects, such as using a graph convolutional network (GCN) or a feedforward network, and found that those that represent an object contextually in the screen lead to better grounding accuracy. The new datasets, models and results provide an important first step on the challenging problem of grounding natural language instructions to mobile UI actions.

Conclusion

This research, and language grounding in general, is an important step towards translating multi-stage instructions into actions on a graphical user interface. Successful application of task automation to the UI domain has the potential to significantly improve accessibility: language interfaces can help individuals who are visually impaired perform tasks with interfaces that are predicated on sight.
This also matters for situational impairment, when one cannot easily access a device while encumbered by tasks at hand.

By deconstructing the problem into action phrase extraction and language grounding, progress on either step can improve full task performance, and the approach alleviates the need for language-action paired datasets, which are difficult to collect at scale. For example, action span extraction is related to both semantic role labeling and extraction of multiple facts from text, and could benefit from innovations in span identification and multitask learning. Reinforcement learning, which has been applied in previous grounding work, may help improve out-of-sample prediction for grounding in UIs and enable direct grounding from hidden state representations. Although our datasets were based on Android UIs, our approach can be applied generally to instruction grounding on other user interface platforms. Lastly, our work provides a technical foundation for investigating user experiences in language-based human-computer interaction.

Acknowledgements

Many thanks to my co-authors Jiacong He, Xin Zhou, Yuan Zhang and Jason Baldridge on this work at Google Research. I would also like to thank Gang Li, who provided generous help creating the open-source datasets, and Ashwin Kakarla, Muqthar Mohammad and Mohd Majeed for their help with the annotations.