USING COMPUTING FIRST PRINCIPLES TO IMPROVE THE SYMBIOTIC PERFORMANCE IN ALGORITHMS AND PROCESSORS USED IN LOW-POWERED MACHINE LEARNING

Nsinga, Robert

Please use the following URL to reference this document: https://repo.lib.tokushima-u.ac.jp/117609

ID	117609
Alternative title	コンピューティングの第一原理を使用して、低電力の機械学習で使用されるアルゴリズムとプロセッサの共生パフォーマンスを向上させる
Author	Nsinga, Robert (Graduate School of Advanced Technology and Science, Tokushima University; Systems Innovation Engineering)
Keywords	embedded system; IEEE754-2008; floating point; digital signal processor; Q format notation
Resource type	Thesis or Dissertation
Abstract	Reducing electric power consumption and speeding up processing are attracting growing interest among deep learning researchers. Models have grown in complexity and size, using as much numerical precision as can be computationally supported, regardless of the cost of the cooling systems they require. Quantization has eased deployment to small devices that lack floating-point capability, but little has been suggested about the floating-point numbers themselves. This thesis evaluates hardware acceleration for embedded devices that cannot meet the energy requirements of floating-point arithmetic, proposes solutions that push the limits of power consumption, and measures their effectiveness in terms of energy demand and speed. Experts have declared the end of Moore’s law, as current nanotechnology can no longer increase performance relative to transistor density. Accelerators, although they offer a countermeasure, have raised their own power needs to unsustainable levels. At the same time, knowledge in areas such as distributed computing has grown enough to open up possibilities that could reduce power demands while maintaining, or possibly increasing, microprocessor performance. This thesis highlights some important challenges born out of the rapid rise of deep learning. We present experimental results showing that low-powered devices can serve as powerful tools in low-cost deep learning research. In doing so, we aim to slow the ongoing trend that favors expensive investment in deep learning computers. Using known properties of computer architecture, hardware acceleration, and digital arithmetic, we design algorithms whose performance symbiotically matches the theoretical limits of the hardware components that run them.
Computer processors are used for their ability to execute instructions defined in code or another machine-readable format. Some processors are general-purpose and others are domain-specific: the former handle a wide range of tasks, while the latter focus on specific ones. While executing any task, an ideal processor would engage all of its transistors so that no part is left underutilized. In practice this is not always the case, which is why domain-specific processors are optimized to carry out only the instructions to which they can fully commit their components. It is considered good practice to design algorithms that make maximum use of the available capacity in every execution. Our proposed method improves the symbiotic complementarity between peak algorithm performance and theoretical hardware capacity.
Date of issue	2022-09-20
Notes	Abstract, review summary, and full text of the thesis are publicly available
Full-text files	k3652_abstract.pdf (51.8 KB), k3652_review.pdf (33.3 KB), k3652_fulltext.pdf (3.59 MB)
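Language	eng
Author version flag	Includes the full text of the doctoral dissertation
MEXT report number	甲第3652号
Diploma number	甲先第436号
Date of degree conferral	2022-09-20
Degree name	Doctor of Engineering
Degree-granting institution	Tokushima University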