ID | 117609 |
Alternative title | Using first principles of computing to improve the symbiotic performance of algorithms and processors used in low-power machine learning |
Author | Nsinga, Robert
Graduate School of Advanced Technology and Science (Systems Innovation Engineering), Tokushima University |
Keywords | embedded system
IEEE754-2008
floating point
digital signal processor
Q format notation
|
Material type | Doctoral thesis |
Abstract | Reducing electric power consumption and speeding up processing are attracting growing interest among deep learning researchers. Models have grown in complexity and size, using as much numerical precision as can be computationally supported, regardless of how expensive the required cooling systems become. Quantization has eased deployment to small devices that lack floating-point capability, but little has been said about the floating-point numbers themselves. This thesis evaluates hardware acceleration for embedded devices that cannot meet the energy requirements of floating-point arithmetic, proposes solutions that challenge the limits of power consumption, and applies them to measure their effectiveness in terms of energy demand and processing speed.
Experts have declared the end of Moore's law, as the current state of nanotechnology comes to terms with its inability to keep improving the performance-per-transistor-density ratio. Accelerators, although they offer a countermeasure, have also raised their power needs to unsustainable levels. At the same time, knowledge in areas such as distributed computing has grown enough to branch off into approaches that could reduce power demands while maintaining, or even increasing, microprocessor performance. This thesis highlights some important challenges born of the rapid rise of deep learning. We present experimental results showing that low-powered devices can serve as powerful tools in low-cost deep learning research. In doing so, we aim to slow the ongoing trend that favors expensive investment in deep learning computers. Using known properties of computer architecture, hardware acceleration, and digital arithmetic, we implement ways to design algorithms whose performance symbiotically matches the theoretical limits afforded by the hardware components that run them. Computer processors are employed according to their ability to execute instructions defined in code or machine-readable format. Some processors are general-purpose and others are domain-specific; the former are good at a wide range of tasks, while the latter are focused on specific tasks. While executing any task, an ideal processor would engage all of its transistors so that no part is left underutilized. In practice this is not always the case, which is why domain-specific processors are optimized to carry out only the instructions to which they can fully commit their components. It is considered good practice to design algorithms that encourage maximum use of the available capacity during any execution. Our proposed method improves the symbiotic complementarity between peak algorithm performance and theoretical hardware capacity. |
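As an illustrative aside tied to the keywords above (Q format notation, digital signal processor), the following minimal C sketch shows the kind of Q15 fixed-point arithmetic commonly used on processors that lack floating-point hardware. It is not code from the thesis, and the helper names (q15_from_float, q15_mul) are hypothetical.

```c
/*
 * Illustrative Q15 fixed-point sketch (hypothetical helpers, not from the
 * thesis): signed 16-bit values with 15 fractional bits, a common "Q format"
 * on DSPs and microcontrollers without floating-point units.
 */
#include <stdint.h>
#include <stdio.h>

typedef int16_t q15_t;

/* Convert a float in [-1.0, 1.0) to Q15 by scaling by 2^15, with saturation. */
static q15_t q15_from_float(float x) {
    int32_t v = (int32_t)(x * 32768.0f);
    if (v >  32767) v =  32767;
    if (v < -32768) v = -32768;
    return (q15_t)v;
}

/* Convert back to float for printing. */
static float q15_to_float(q15_t x) {
    return (float)x / 32768.0f;
}

/* Q15 multiply: widen to 32 bits (Q30 intermediate), then shift back by 15.
 * Assumes arithmetic right shift of negative values, as on typical targets. */
static q15_t q15_mul(q15_t a, q15_t b) {
    int32_t p = (int32_t)a * (int32_t)b;
    return (q15_t)(p >> 15);
}

int main(void) {
    q15_t a = q15_from_float(0.5f);
    q15_t b = q15_from_float(-0.25f);
    printf("0.5 * -0.25 ~= %f\n", q15_to_float(q15_mul(a, b)));
    return 0;
}
```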
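The abstract's point about matching peak algorithm performance to theoretical hardware capacity can be made concrete with a roofline-style estimate. The C sketch below is not the thesis's method; the peak compute and memory bandwidth figures are hypothetical placeholders used only to show how an algorithm's arithmetic intensity bounds the attainable fraction of a processor's peak throughput.

```c
/*
 * Roofline-style back-of-the-envelope estimate (illustrative only):
 * attainable throughput is limited by either peak compute or by
 * memory bandwidth times arithmetic intensity (FLOPs per byte moved).
 * Device numbers below are hypothetical placeholders.
 */
#include <stdio.h>

int main(void) {
    double peak_gflops   = 4.0;   /* hypothetical peak compute, GFLOP/s */
    double peak_gbytes_s = 1.0;   /* hypothetical memory bandwidth, GB/s */

    /* Arithmetic intensities of four hypothetical kernels. */
    double intensity[] = { 0.25, 1.0, 4.0, 16.0 };

    for (int i = 0; i < 4; i++) {
        double attainable = peak_gbytes_s * intensity[i];
        if (attainable > peak_gflops) attainable = peak_gflops;
        printf("intensity %5.2f FLOP/byte -> attainable %.2f GFLOP/s (%.0f%% of peak)\n",
               intensity[i], attainable, 100.0 * attainable / peak_gflops);
    }
    return 0;
}
```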
Date of issue | 2022-09-20 |
Remarks | The abstract, examination summary, and full text of the thesis are publicly available |
Full-text file | |
Language | eng |
Author version flag | Includes the full text of the doctoral dissertation |
MEXT report number | 甲第3652号 |
Diploma number | 甲先第436号 |
Date of degree conferral | 2022-09-20 |
Degree name | Doctor of Engineering |
Degree-granting institution | Tokushima University |