2023 研究主題清單 (2023 Research List)


主持人 (PI) | 研究主題 (Research Topic) | 研究介紹 (Introduction) | 其他資訊 (Other Information)
張原豪
Yuan-Hao Chang
基於記憶體與儲存體運算之設計、優化與模擬

Design, Optimization, Simulation with In/Near Memory and Storage Computing
隨著製程技術的發展,記憶體裝置及儲存裝置的性能都獲得高度的發展,這些裝置除了存取執行時的資料或是儲存永遠性資料之外,裝置內的硬體能力已經逐漸足夠強大到可以支援額外的運算。因此,本計畫將研究如何適度的將運算工作適度的分割並分派給記憶體及儲存裝置來執行,以降低對CPU造成的執行壓力,並且降低資料搬移的時間成本及能耗負擔。

本計畫預期成果研究並提出利用記憶體與儲存體進行運算的關鍵技術,並且研究如何利用記憶體與儲存體計算的新架構進行大資料應用的運算加速以及效能優化。

This project aims to study memory-centric computing architectures for next-generation quantum simulation/verification, machine learning algorithms, and neural network accelerators. Its main theme is selecting an appropriate memory-centric approach for a target application and proposing optimization strategies based on the characteristics of the application and the underlying memory/storage computing hardware. We will tackle the critical issues of selecting and designing (1) appropriate memory-centric computing architectures, (2) emerging non-volatile memory technologies (e.g., ReRAM, PCM, MRAM, and flash memory) that support in-memory (or in-storage) computation, (3) optimization strategies based on the characteristics of the target application and hardware design, and (4) an efficient and effective simulation platform to estimate the efficiency of the proposed architecture.

Task 1: Exploiting the processing-in-memory architecture for applications
Task 2: Adopting the processing-near-memory architecture for big-data applications
Task 3: In-storage computing design and optimization
PI個人首頁(PI's Information) :
https://www.iis.sinica.edu.tw/~johnson/

實驗室網址(Research Information) :
https://www.iis.sinica.edu.tw/~johnson/Notebook.php

Email :
johnson@iis.sinica.edu.tw
廖純中
Churn-Jung Liau
應用邏輯

Applied Logic
應用邏輯於情感計算上之研究。

The research is related to the application of symbolic logic to affective computing.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/liaucj/

實驗室網址(Research Information) :
http://chess.iis.sinica.edu.tw/lab/

Email :
carol@iis.sinica.edu.tw
徐讚昇
Tsan-sheng Hsu
以應用為中心的資料密集計算基礎研究

Foundations of Data-Intensive Computing with Applications
資料密集計算的基礎研究

Motivated by applications such as endgame database construction, computer board game playing, medical informatics, and simulations of the spread of infectious diseases, we plan to investigate fundamental problems involving massive data sets using methods such as algorithms, parallelization, implementation techniques, and/or deep learning.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/~tshsu

實驗室網址(Research Information) :
http://chess.iis.sinica.edu.tw/lab/

Email :
carol@iis.sinica.edu.tw
穆信成
Shin-Cheng Mu
函數語言與命令語言程式之正確性推理

Reasoning about Functional and Imperative Programs
我的研究興趣是程式語言與函數程式設計(functional programming),近年來也包括 concurrent 程式的型別系統與命令式語言(imperative languages)的推理。它們的共同點是使用符號推理的方式確保程式的正確性。不論在哪個典範中,我們都希望把寫程式視作一個可用數學與邏輯方式推理的行為。程式的正確性可用型別系統或邏輯推演保證,甚至可用規格與需求開始,經由數學方法一步步推導出程式。

本領域可做的大方向包括
* 以函數語言為工具,開發 Hoare logic 與命令式語言程式推導使用的教學系統。我們開發了一個協助程式推導的整合環境 Guabao (https://scmlab.github.io/guabao/ ), 理論與實作方面都需要更多人投入。
* 設計幫助推理用的符號、程式語言、型別系統等。
* 挑選一些演算法問題,嘗試以數學方法實際證明演算法之正確性,或將演算法推導出來。
* 研究 concurrent 程式以及其型別系統 (session type) 與邏輯之關係。

如對以上題目有興趣,在三個月的實習期間,我們可用一到一個半月的時間學習相關理論(函數編程、型別、邏輯等),用剩下的時間研究新東西或開發系統。

My research interests concern programming languages and functional programming, and extend to Hoare logic and type systems for concurrent programs. The common theme is that programming is seen as a formal, mathematical activity. The correctness of a program can be guaranteed by logical reasoning or by a type system, or a program can even be derived stepwise from its specification.

Possible topics include:

* develop tools for reasoning about imperative programs, using a functional programming language. We have developed an integrated environment, Guabao (https://scmlab.github.io/guabao/ ), for which there is still plenty of theory and implementation work to be done;

* design symbols, languages, or type systems that aid programmers in reasoning about programs;

* pick an algorithm, and apply our approaches to prove its correctness or even to derive the algorithm;

* study the type system (session types) for concurrent programs and its relationship with logic.


More details can be discussed. If you are interested, we can spend the first 1 to 1.5 months of the internship studying the background knowledge, before diving into developing something new.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/scm/

實驗室網址(Research Information) :
https://scm.iis.sinica.edu.tw/ncs/

Email :

劉庭祿
Tyng-Luh Liu
新興電腦視覺與深度學習技術

Emerging Computer Vision and Deep Learning Techniques
My current research focuses on the following topics:
1. Computer vision techniques for 3D point clouds
2. Human-object interactions and related computer vision techniques
3. Diffusion models and their applications
4. NeRF-related computer vision techniques
5. Multi-modal deep learning techniques
6. X-shot, self-supervised, meta learning, etc.
7. Federated learning and its applications

PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/liutyng/

實驗室網址(Research Information) :
https://homepage.iis.sinica.edu.tw/~liutyng/

Email :
liutyng@iis.sinica.edu.tw
王新民
Hsin-Min Wang
語音辨識、語者暨語言辨識、語音合成與轉換、語者分離與分段、語音機器翻譯與問答系統

speech recognition, speaker/language recognition, speech synthesis and conversion, speaker separation and segmentation, speech translation, spoken question answering system
我們實驗室致力於符合我國語言使用情境(國語、臺語、客語、英語)的語音處理研究,學術研究與系統開發兼重。

1)在語音辨識方面,我們的辨識器要聽得懂年輕人的國語、老人家的臺語,對於喜歡繞英文的人來說,也要難不倒它;加上語者過濾的技術,更要能專一辨識出特定語者的聲音,達到客製化、不受環境干擾的效果。

2)在語音合成方面,我們的合成系統不僅要會說國語、臺語,也要能夠使用語音轉換的技術,客製出使用者指定的人聲,不僅有娛樂性,更是語音保存、有聲書製作不可或缺的技術。

3)目前的「純」機器翻譯及「純」文字問答並不稀奇,真正能便利地應用於智慧家庭、音箱的使用環境還得藉助語音。因此,不僅我們的翻譯系統要能處理國人所需的國臺、國英互譯,我們的問答系統也要達到快速聆聽、高效反應的效果。

Our laboratory is dedicated to research on speech processing in line with the context of language use in our country (Mandarin, Taiwanese, Hakka, and English), with emphasis on both academic research and system development.

1) In speech recognition, our recognizer must handle the Mandarin of young people and the Taiwanese of the elderly, and it should not be thrown off by speakers who mix English into their speech. Coupled with adaptation techniques, it should also be able to pick out the voice of one specific speaker, so that recognition is customized and robust to environmental interference.

2) In speech synthesis, our system must not only speak Mandarin and Taiwanese but also use voice conversion to produce a user-specified voice. This is not only entertaining but also indispensable for spoken language preservation and audiobook production.

3) Text-only machine translation and text-only Q&A are no longer unusual; for them to be truly convenient in smart homes and on smart speakers, they need speech. Therefore, our translation system must handle the Mandarin-Taiwanese and Mandarin-English translation that local users need, and our Q&A system must listen quickly and respond efficiently.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/whm/

實驗室網址(Research Information) :
http://slam.iis.sinica.edu.tw/

Email :
whm@iis.sinica.edu.tw
蘇黎
Li Su
以人為核心的音樂人工智慧

Human-centered music AI
音樂與文化科技實驗室致力於探討最前沿的數位訊號處理與深度學習技術,應用於各種音樂人工智慧的熱門議題。我們特別關注以人為核心的音樂人工智慧研究,包括但不限於:

1. 音樂內容辨識與理解、自動採譜
2. 音樂與多媒體內容生成與互動
3. 計算音樂學
4. 深度學習、認知科學與生物音樂學
5. 音樂人工智慧技術之應用與評估方法

我們歡迎資訊/電機等理工相關科系或音樂相關科系背景,或有志於跨領域研究之同學應徵。 熟悉深度學習、音訊處理、影像處理、電腦圖學、人機互動、UI/UX、認知科學、音樂學等任一領域者優先考慮。

The mission of the Music and Culture Technology Lab is to tackle cutting-edge research topics in music AI using novel deep learning and signal processing techniques. We concentrate specifically on human-centered music AI, including (but not limited to) the following research directions:

1. Music content understanding and automatic music transcription
2. Music and multimedia content generation/interaction
3. Computational musicology
4. Deep learning, cognitive science and biomusicology
5. Application and evaluation method of music AI technology

We welcome students with a background in EE/CS, musicology, or related fields, and those interested in interdisciplinary research, to join our intern program. Students who are familiar with deep learning, signal processing, image processing, computer graphics, HCI, UI/UX, cognitive science, or musicology will be given priority.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/lisu/

實驗室網址(Research Information) :
https://mctlab.iis.sinica.edu.tw/mctl/
https://github.com/Music-and-Culture-Technology-Lab

Email :
lisu@iis.sinica.edu.tw
吳真貞
Jan-Jan Wu
深度學習計算在異質多處理器環境之高效能排程技術

Deep Neural Network Scheduling for Heterogeneous System Architectures
將多個網絡組合成混合模型或多模型是提高 DNN 性能的可行方法。這些模型可以通過利用不同網絡的優勢來解決更複雜的任務。例如,多模型的應用包括自動駕駛汽車和語音助手。另一方面,異質系統架構在現代計算機中被廣泛採用。它混合了各種類型的計算設備,可更有效地利用資源並提高多種工作負載的效能。例如,谷歌雲服務器可能包含許多 CPU、GPU 和 TPU。如果可以有效地利用系統資源,異質系統架構將可提高 DNN 的計算效能。然而,TensorFlow、PyTorch 和 TVM 等現代深度學習平台主要是為同質系統設計的。他們只在一種類型的設備上運行 DNN。此外,這些平台也不支援混合模型和多模型。
為了解決這些問題,本計畫將發展可在異質多處理器環境中支援高效能且自動化的混合型/多模型的深度學習計算系統。神經網絡可以表示為計算圖。 問題變成如何將圖形映射到異質計算設備。 本計劃將分兩個階段解決此類映射問題:(1) 資源分配階段將圖節點分配給設備,(2) 排程階段確定圖節點的執行順序。我們針對此二階段映射問題提出數種高效率的演算法及系統實作。本計畫特色在於充分發揮各階層的平行度,包括 data parallelism, pipeline parallelism(例如,跨設備切割模型,工作負載以管道方式流經拆分的子模型),以及tensor parallelism(例如,AI 加速器使用 VLIW 來同時計算許多向量或矩陣).


Because of the demand for higher prediction accuracy, today’s neural networks are becoming deeper, wider, and more complex, typically with many layers and a large number of parameters. Moreover, combining multiple networks into a hybrid- or multi-model is a viable way to improve the performance of DNNs. These models can tackle more complex tasks by leveraging the strengths of different networks. On the other hand, heterogeneous system architectures (HSAs) are being widely adopted in modern computers. An HSA mixes various types of computing devices and communication technologies, allowing more efficient use of resources and improved performance for many types of workloads. Such HSAs provide ample opportunity to improve the performance of DNNs if the system resources can be utilized efficiently and effectively.
However, modern deep learning platforms such as TensorFlow, PyTorch, and TVM are mainly designed for homogeneous systems. They run DNNs on only one type of device, leaving the other devices of a heterogeneous system unused. Furthermore, hybrid- and multi-models are overlooked in these platforms. Hence, developers need to manually tune the performance on the target hardware, which usually requires expert knowledge and experience.
To address these issues, we will design a runtime system to handle the execution of hybrid-/multi-models on HSAs efficiently and automatically. A neural network can be represented as a computational graph, so the problem becomes how to map the graph(s) to the heterogeneous devices. We plan to tackle this mapping problem in two phases: (1) the resource allocation phase assigns graph nodes to devices, and (2) the scheduling phase determines the execution order of the graph nodes. Three core issues will be addressed in resource allocation: (1) We need to assign operations to appropriate computing devices to minimize the computation cost. (2) We need to assign the operations so that no two operations use the same computing device at the same time. (3) We must choose the appropriate communication medium when two related operations are mapped to different computing devices, so as to reduce the communication overhead.
The challenge in designing an efficient schedule is how to exploit the parallelism among the computing devices while respecting data dependencies. We consider three types of parallelism: data parallelism (DP), pipeline parallelism (PP), and tensor parallelism (TP). DP is a widely adopted technique that divides a large workload into smaller subsets and executes multiple copies of the neural network on these subsets simultaneously on the devices. PP divides the model across the devices, and the workload flows through the split sub-models in a pipeline manner. It can be useful for training very large or complex models and for speeding up streaming applications. TP divides the computation of a single layer across the devices, which process different parts of the tensors in parallel. For example, AI accelerators (e.g., Google’s EdgeTPU) employ VLIW to simultaneously compute many vectors or matrices. These three kinds of parallelism impose different constraints and resource requirements on the devices. Therefore, a sophisticated method is required to determine the best parallelism configuration for running the DNNs.
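The two-phase mapping described above (assign graph nodes to devices, then fix their execution order) can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the project's runtime system: the graph, the device names, the per-device costs, and the fixed transfer cost are all made up, and the greedy earliest-finish-time rule is a simple stand-in for the allocation and scheduling algorithms the project intends to develop.

```python
from graphlib import TopologicalSorter

# Hypothetical computational graph: each node maps to the set of nodes it depends on.
GRAPH = {
    "conv": set(),
    "pool": {"conv"},
    "fc_a": {"pool"},
    "fc_b": {"pool"},
    "merge": {"fc_a", "fc_b"},
}

# Made-up running time of every node on every device.
COST = {
    "conv": {"cpu": 8.0, "gpu": 2.0},
    "pool": {"cpu": 2.0, "gpu": 1.0},
    "fc_a": {"cpu": 3.0, "gpu": 2.0},
    "fc_b": {"cpu": 3.0, "gpu": 2.0},
    "merge": {"cpu": 1.0, "gpu": 1.0},
}

TRANSFER = 0.5  # assumed cost of moving a result between two different devices


def map_graph(graph, cost, transfer):
    """Greedy earliest-finish-time mapping; returns (node, device, start, end) tuples."""
    devices = sorted({dev for per_node in cost.values() for dev in per_node})
    device_free = {dev: 0.0 for dev in devices}
    placement, finish, schedule = {}, {}, []
    # Visiting nodes in topological order keeps every data dependency intact.
    for node in TopologicalSorter(graph).static_order():
        best = None
        for dev in devices:
            # Inputs are ready once every predecessor has finished and,
            # if it ran on another device, its result has been transferred.
            ready = max(
                (finish[p] + (0.0 if placement[p] == dev else transfer)
                 for p in graph[node]),
                default=0.0,
            )
            start = max(ready, device_free[dev])  # one operation per device at a time
            end = start + cost[node][dev]
            if best is None or end < best[0]:
                best = (end, start, dev)
        end, start, dev = best
        placement[node], finish[node], device_free[dev] = dev, end, end
        schedule.append((node, dev, start, end))
    return schedule


for node, dev, start, end in map_graph(GRAPH, COST, TRANSFER):
    print(f"{node:5s} on {dev}: {start:4.1f} -> {end:4.1f}")
```

In this toy instance the two independent `fc_*` nodes end up on different devices, which is the kind of cross-device parallelism the scheduling phase is meant to exploit. A greedy rule like this gives no optimality guarantee.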
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/wuj/

實驗室網址(Research Information) :
https://www.iis.sinica.edu.tw/pages/wuj/index_zh.html
https://www.iis.sinica.edu.tw/zh/page/ResearchGroup/ComputerSystem.html

Email :
wuj@iis.sinica.edu.tw
洪鼎詠
Ding-Yong Hong
深度學習軟體與硬體協同優化研究

Deep Learning Software/Hardware Co-optimization
我們將研究深度學習軟體與硬體協同優化方法。(1) 研究如何利用編譯器技術, 優化深度學習程式, 使其在CPU/GPU/AI加速器上達到最佳的運算效能。(2) 針對壓縮模型(pruning/quantization), 設計深度學習模型architecture/compiler/parallelization優化方案。

We aim to study hardware/software co-optimization for deep learning models. (1) Exploiting compiler techniques to accelerate deep learning applications on CPUs/GPUs/AI accelerators. (2) Enhancing compressed models (pruning/quantization) with compiler and parallelization techniques.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/dyhong/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/dyhong/

Email :
dyhong@iis.sinica.edu.tw
蔡懷寬
Huai-Kuang Tsai
生物資訊

Bioinformatics
近年來,生物資訊在醫學領域受到相當大的關注,不僅是定序技術的進步,或是在數據分析的能力也持續在更新。本實驗室除了與國內許多生物試驗單位合作外,更從世界知名大型研究機構取得資料,以大數據分析解開生物中複雜的調控機制與原理。

我們的研究方向著重於真核生物的基因體,並整合多種體學資料 (multi-omics) ,以不同的生物觀點,更具系统性的探討基因體層級的交互關係,與其在演化上的重要性。而我們最新的研究主題著重在人類重大疾病與大數據資料庫整併,藉由不同的序列資料尋找目前生物醫學上未解出的困境。我們的研究方法透過資料探勘和機器學習來建立分析模型,用來預測生物的基因調控,同時達到個人化精準醫療的開發及癌症預測應用。

本實驗室想要找對生物資料運用感興趣的大專生,你可以來自資工或生物背景,但應該熟悉至少一種程式語言及對生物學感興趣。我們會提供生物資訊學相關領域的知識訓練,因此只要您對於跨領域研究感興趣,也想解決目前生物領域面臨的瓶頸,歡迎您加入我們的研究團隊。


The Tsai lab studies big data from biological systems using bioinformatic techniques and statistical methods. We work with biologists to seek insights into the genomics of eukaryotic organisms. By integrating multi-omics data, we study genome-wide regulatory systems of gene expression and their significance in evolution. In addition, we are currently expanding into biomedical informatics, aiming to integrate disease information with sequencing data to develop applications in precision medicine. We use methods such as data mining and machine learning in our studies of regulatory mechanisms in genomics, with the aim of building predictive models with potential for practical application.

We are seeking interns with a background in either computer science or biological science. Applicants should have experience with at least one programming language and a strong interest in biology. We will provide training in bioinformatics-related domain knowledge, and we expect our interns to be able to learn from team members with different backgrounds. If you are passionate about taking up the challenge of solving biological problems with informatics techniques, we welcome you to join our team!
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/hktsai/

實驗室網址(Research Information) :
https://bits.iis.sinica.edu.tw/?id=1

Email :
hktsai@iis.sinica.edu.tw
林仲彥
Chung-Yen Lin
以人工智慧來解析生物醫學大數據

Harnessing Biomedical Big data and AI for a better quality of life
我們的團隊主要研究模式與非模式生物之多維基因體學(OMICS),包括基因體、轉錄體、單細胞轉錄體、蛋白質交互網路、腸道微生物與疾病關連等巨量資訊數據分析,同時也定序、重組與註解了多個重要經濟生物之基因體,目前致力於個人化基因體的重新組裝,探討不同程度的序列變異與疾病之間的關係,並利用人工智慧模型,以台灣人體資料庫為基礎,結合先天遺傳差異、身體檢測數值與後天環境等,來以全新的視角,來建立預測模型,希望能早期預防及解析老化與疾病等相關問題。我們的團隊,成員來自資料科學、生物醫學與資訊技術等各類專業領域,是一個跨領域的研究團隊,歡迎不同背景(資訊、統計、數學及生物相關)的人才一起合作。研究範圍以單細胞基因解析、水生經濟動物基因體育種、精準健康老化、病原智慧分型、新型抗菌/抗病毒藥物的開發篩選與合成驗證、及利用人類腸道與環境微生物來進行人工智慧疾病與治療成效預測等課題為主,同時發展新的高速計算工具及雲端分析平台,以及引入深度學習等策略,來探討基因、病原與環境的三角互動關係。

Our team's main goal is to analyze big omics data, which may reveal more of the secrets of biological regulation hidden in the massive data deluge. By combining open-source tools and self-developed programs/platforms, we have assembled, annotated, and decoded several aquatic genomes of high economic importance. Meanwhile, AI approaches have also been introduced into our metagenomic studies and the design of functional therapeutic peptides. One of our targets is to develop new approaches to fill the gaps in the assembled human genome to pave the way for personalized and precision medicine. New approaches such as deep learning will be introduced to revisit our earlier studies. Several platforms/applications that we have developed by combining AI with biological knowledge focus on smart typing of upper respiratory pathogens and on the identification and design of novel antibiotics.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/cylin/

實驗室網址(Research Information) :
http://eln.iis.sinica.edu.tw
https://hub.docker.com/u/lsbnb

Email :
cylin@iis.sinica.edu.tw
林仁俊
Jen-Chun Lin
關鍵幀之間人物3D姿勢與形體的補間

3D Human Pose and Shape Generation In-between Keyframes
在遊戲與動畫產業中,透過電腦自動生成令人信服的人物3D動畫是至關重要的。然而,現行主流的3D動畫製作方式仍是以動畫師手動對關鍵幀(keyframe)進行補間繪製而成,儘管目前在3D動畫製作上已有許多工具被開發出來,但動畫師在進行補間繪製時仍然需要遵循人體運動學的規範,因此,整個動畫製作的過程仍然非常的耗時且具挑戰性。雖然最近人體動作捕捉 (Human Motion Capture; MOCAP) 系統被發展出來加速電腦生成3D動畫的速度,即,透過讓演員穿戴MOCAP裝置來捕捉演員的3D動作以克服手繪補間需遵循人體運動學規範的限制,來加速生成3D動畫的時間,但它仍然需要專門的設備和動畫師手動的後處理,使得3D動畫製作成本大幅的提升。因此,在本研究議題上,我們旨在開發數據驅動(data driven)的技術,即,最大限度地利用現有的3D/2D資料來建構類神經網路以在關鍵幀之間自動生成符合人體運動學的3D姿勢與形體補間,以降低遊戲、動畫與電腦特效產業的成本。

實習生預計從3D姿勢與形體的補間/估測或其他相關主題中選定題目進行獨立研究。實習結束後,若同學表現優良則可繼續與實驗室合作研究並發表論文。

A convincing performance for computer-generated (CG) characters is essential in the game and animation industries. The process of keyframing, a primary method for creating animation, remains time-consuming and challenging, despite advancements in tools and techniques. Animators still need to adhere to detailed pose specifications to achieve naturalness in drawing body motion in-between keyframes. While advanced motion capture (MOCAP) systems can be more effective than keyframing techniques for creating CG characters with numerous degrees of freedom, they still require specialized equipment and manual post-processing, making animation costly. Therefore, designing a data-driven approach that utilizes existing 3D/2D data to automatically generate 3D human motion that is kinematically compliant in-between keyframes is desirable for creating CG character animations.

Interns are expected to conduct independent research on selected topics from 3D human pose and shape generation in-betweening, 3D human pose and shape estimation, or other related topics. After the internship, students with good performance can continue to work with the laboratory to research and publish papers.
PI個人首頁(PI's Information) :
https://sites.google.com/site/jenchunlin/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/jenchunlin/

Email :
jenchunlin@iis.sinica.edu.tw
鐘楷閔
Kai-Min Chung
密碼學、複雜度理論或量子密碼學之獨立研究

Independent Research on Cryptography, Complexity Theory, or Quantum Cryptography
The intern is expected to perform independent research on selected topics in (Quantum) Cryptography, Complexity Theory, or general theoretical computer science (TCS) that interest him/her. This often starts with surveying research papers and presenting them to the PI. Along the way, the intern can identify research questions with the PI, study the questions independently, and discuss them with the PI in research meetings. Candidate topics include, but are not limited to, Quantum Key Distribution (QKD), Post-quantum Cryptography, Lattice-based Cryptography, Differential Privacy, Non-malleable Codes, Device-independent Cryptography, PRAM Cryptography, Zero Knowledge, Randomness Extractors, etc.

PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/kmchung/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/~kmchung/

Email :
kmchung@iis.sinica.edu.tw
黃瀚萱
Hen-Hsen Huang
知識庫中的反事實因果關係之建立與推理

Counterfactual Causal Analysis in Knowledge Bases
知識圖譜由呈現事實的三元組構成,表達實體之間的關係。在典型的知識圖譜中,所有的事實都預設是可靠、真實的,但在現實上,知識圖譜很可能包含不確定的資訊,其中錯誤的事實甚至可能和其他事實互相衝突。在這個計畫中,我們預期將反事實知識引入知識圖譜,對不確定資訊進行反事實因果分析,藉以偵測與更正知識圖譜上不可靠的內容。除了可以確保知識圖譜的一致性,還可以進一步與深度學習模型的工作記憶體整合,讓線上模型自動更正不實資訊。

Typical knowledge bases are composed of factual triples that represent the relations among entities, under the assumption that all the facts are true and reliable. In the real world, however, a knowledge base may contain uncertain information, and untrue facts may even conflict with other facts. In this project, our goal is to introduce a different kind of knowledge, counterfactual knowledge, into knowledge bases to advance causal analysis over the uncertain information. The results are expected not only to help guard knowledge base integrity but also to have potential for misinformation correction in the machine's working memory.
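To make the notion of mutually inconsistent triples concrete, here is a minimal sketch. It is not the project's method and performs no counterfactual or causal analysis; it only flags facts that contradict each other under one simple assumption, namely that certain relations (the hypothetical `FUNCTIONAL` set below) admit a single object per subject. The relation names and sample triples are invented for the example.

```python
from collections import defaultdict

# Assumption for illustration: these relations allow only one object per subject.
FUNCTIONAL = {"born_in", "capital_of"}


def find_conflicts(triples):
    """Return {(subject, relation): objects} for functional relations with more than one object."""
    seen = defaultdict(set)
    for subject, relation, obj in triples:
        if relation in FUNCTIONAL:
            seen[(subject, relation)].add(obj)
    return {key: objs for key, objs in seen.items() if len(objs) > 1}


# Invented sample knowledge base: (subject, relation, object) triples.
triples = [
    ("Alan Turing", "born_in", "London"),
    ("Alan Turing", "born_in", "Paris"),         # conflicts with the fact above
    ("Alan Turing", "field", "logic"),
    ("Alan Turing", "field", "cryptanalysis"),   # fine: "field" is not functional
]

for (subject, relation), objs in find_conflicts(triples).items():
    print(subject, relation, sorted(objs))
```

A check like this can say that two facts disagree but not which one is wrong, which is the harder question the project description points at.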
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/hhhuang/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/hhhuang/

Email :
hhhuang@iis.sinica.edu.tw
古倫維
Lun-Wei Ku
(1) 多模態問題生成 (看圖對話) (2) 假新聞免疫 (3) 運動科技-智慧教練

(1) Multimodal Question Generation (2) Fake News Immune (3) SportTech - AI Coach
在這些研究主題中,將學習到自然語言處理之資訊擷取、文章分類、文字生成、知識庫使用、圖像文字結合等概念,另涵蓋自然語言基礎工具的使用及機器學習、深度學習的模型建立等先進技術,可與老師討論希望選擇的研究主題。實習期間會專注於上述研究主題並參與模型開發及論文撰寫。各主題研究內容詳述如下:

(1) 在多模態問題生成專案中,我們注重在圖像(照片、影片)問題生成,以人類見到圖像時腦中建構的世界為概念,接著產生自然可開啟對話的問題。相關技術可應用於社群網站提高人氣,或是關懷病人與老人。

(2) 在假新聞免疫研究中,我們著重於研究甚麼樣的新聞內容與呈現形式,讀者會傾向於相信或不相信,我們將進行內容理解,網路模擬及使用者端的研究。

(3)運動科技-智慧教練中,我們希望開發對特定運動姿態的小樣本或無樣本學習模型,並經由圖像文字結合技術,自動生成智慧教練指導語。此研究目標為真正可用的系統。

實驗室尚有其他研究主題正在進行,可到
http://www.lunweiku.com/ 參考相關論文。
實習結束後,表現優良的同學可繼續與實驗室合作研究並發表論文。

Interns will learn how to use basic natural language processing tools, extract information from texts, classify documents, and generate dialogs. Machine learning and deep learning technologies for NLP will also be covered. Interns can select the topic/team they wish to join.

(1) In the multimodal question generation project, we build on the idea that humans construct a mental picture of the world when they see an image sequence, and we aim to generate a natural question that opens a pleasant conversation. This topic extends our series of research on VIST (visual storytelling). The developed model can be used on social media platforms or in conversations with people who need mental care.

(2) In the fake news immunity project, we focus on studying why and how readers trust fake news. We will explore approaches that mitigate the impact of fake news. Moreover, the goal is to provide a mechanism that can dynamically adjust the reading environment to increase immunity to fake news.

(3) In the sports technology project, we want to apply few-shot or zero-shot learning to the gestures/elements of a specific sport. We then aim to generate automatic coaching instructions based on videos of those gestures/elements.

(4) Interns can also choose to develop demo applications for the existing technologies in our lab.

The research topics include, but are not limited to, the above.
After the internship, students with good performance can continue to work with the laboratory to research and publish papers.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/lwku/

實驗室網址(Research Information) :
http://academiasinicanlplab.github.io/

Email :
lwku@iis.sinica.edu.tw
倪儒本
Ruben Niederhagen
(See the English description.)

Cryptographic Engineering
(See the English description.)

In this internship, you will gain insight into how cryptographic schemes can be implemented securely and efficiently. You will learn about new and upcoming cryptographic schemes from the field of Post-Quantum Cryptography, and your task will be to implement one of these schemes on an architecture of your choice: a modern general-purpose CPU with SIMD units, an embedded CPU such as the Arm Cortex-M4, or a hardware design on an FPGA.

Strong skills in English speaking, understanding, reading, and writing are mandatory.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/ruben/

實驗室網址(Research Information) :
http://www.polycephaly.org/

Email :
ruben@iis.sinica.edu.tw
蔡孟宗
Meng-Tsung Tsai
串流式圖論演算法

Graph Streaming Algorithms
我的研究興趣在探討如何使用 O(n) 的記憶體空間處理各式的圖論計算問題,這裡 n 是指輸入圖的節點個數。

我們假設圖的邊是按照某個最糟的順序一條一條給演算法,而且只給一次。一張 n 個節點的圖,最多會有 Ω(n^2) 條邊,因為限制只能使用 O(n) 的記憶體空間,勢必得強迫演算法 "忘記" 大部分曾經讀進來的邊。在這個前提下,如何設計演算法完成各式的圖論計算問題?

在這嚴格的限制下,或許心裡的第一問題是:"是否大部分的圖論問題都不能在使用 O(n) 記憶體的狀況下完成計算?" 目前的研究文獻已經證實,許多圖論計算問題可以,但也有許多圖論計算問題,保證無法在這限制下計算出來。後者的情況,常常能找到方法在使用少量的空間下,找到 (1) 不錯的近似解、(2) 具有隨機成分的最佳解 (在很高的成功機率下)、或 (3) 具有隨機成分的不錯的近似解 (在很高的成功機率下)。這邊的機率與輸入的圖無關,只和演算法使用的隨機成分有關。

近期實驗室的研究成果有:

1. 存在一般圖上的 NP-complete 圖論計算問題,可以使用 O(n) 空間回答!
2. 對於將輸入圖拆分成盡可能少的無環子圖這個圖論計算問題,任何演算法都需要 Ω(n^2) 的記憶體空間才能找到最佳的拆分法!但存在演算法,只要 O(n) 的記憶體空間就能找到近似於最佳解的拆分法。

在這個專題,我們預期可以學習到如何使用數學工具回答:"在侷限的記憶體空間下,有哪些圖論計算問題可以被解決?有哪些圖論計算問題保證無法被解決?以及你喜歡的圖論計算問題是屬於哪一類?"

We are interested in whether a graph problem can be computed using O(n) space, where n denotes the number of vertices in the input graph.

We assume that the edges of the input graph are given to algorithms one by one, in an arbitrary order, and only once. Note that an n-vertex graph may have Ω(n^2) edges. If an algorithm uses O(n) space, then it has to "forget" much information of the input. Given the restriction, can we design algorithms to solve graph problems?
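As a concrete instance of this setting, the classic connectivity problem fits in O(n) space: a union-find structure over the n vertices is updated as each edge arrives, and the edge itself is then discarded. The sketch below is a textbook illustration, not one of the lab's results; the function name and the sample stream are made up, and "O(n) space" is counted in machine words.

```python
def count_components(n, edge_stream):
    """Count connected components in one pass, keeping only an n-entry array."""
    parent = list(range(n))  # union-find forest: the only state we keep

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps the trees shallow
            v = parent[v]
        return v

    components = n
    for u, v in edge_stream:  # each edge is seen exactly once, then forgotten
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    return components


def sample_stream():
    """A made-up edge stream on vertices 0..5, delivered one edge at a time."""
    yield from [(0, 1), (1, 2), (3, 4), (2, 0)]


print(count_components(6, sample_stream()))  # 3 components: {0,1,2}, {3,4}, {5}
```

The answer does not depend on the order in which the edges arrive, so the algorithm also works when the order is chosen adversarially.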

One may wonder whether there are many problems that can be solved using little space. It has been shown in the literature that: dozens of graph problems can be solved using little space, while dozens of graph problems cannot. In the latter case, the community usually can come up with a solution that approximates the best possible to within some factor, a solution that matches an optimal one with high probability, or a solution that approximates the best possible to within some factor with high probability. The probabilities here depend only on the randomness used in algorithms, and do not depend on the input graph.

The recent results obtained by our lab include:

1. There exists some NP-complete graph problem on general graphs that can be computed using O(n) space!
2. For any streaming algorithm, decomposing a graph into the least number of acyclic subgraphs requires Ω(n^2) space. However, this problem can be well approximated using O(n) space.

In this independent study, you can expect to learn how to apply mathematical methods to answer questions such as: Which graph problems can be solved using little space? Which provably cannot? And which category does your favorite graph problem belong to?
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/mttsai/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/mttsai/

Email :
kasuistry@gmail.com
王柏堯
Bow-Yaw Wang
密碼程式形式化驗證

Formal verification of cryptographic programs
本研究將以形式化技術,驗證密碼程式之正確性。

We will apply formal techniques to verify cryptographic programs.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/bywang/

實驗室網址(Research Information) :
https://github.com/fmlab-iis/cryptoline

Email :
bywang@iis.sinica.edu.tw
楊柏因
Bo-Yin Yang
後量子密碼學

Postquantum Cryptography
後量子密碼學是中大型量子電腦問世之後仍可保持安全性的公鑰密碼學。
本實驗室主要工作是在做後量子密碼學, 特別是其中的實作。
暑期研習的目標主要是動手做後量子密碼學, 如果暑假前即可開始最佳。

本實驗室不是 "量子密碼學", 想學量子密碼學的請出門左轉鐘楷閔教授研究室

Postquantum Cryptography (PQC) is the study of public-key cryptography that stays secure in the face of cryptographically relevant quantum computers. Our lab works mostly on postquantum cryptography, particularly PQC implementations. The goal of the internship is to get hands-on experience implementing PQC; it is best if you can start before the summer.

This is not the lab for Quantum Cryptography (QC); for QC, please see Prof. Kai-Min Chung's lab.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/byyang/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/byyang/

Email :
byyang@iis.sinica.edu.tw
陳郁方
Yu-Fang Chen
Topics on logics, automata, and formal verification.

邏輯,自動機,與形式化驗證相關研究探討
我們實習的主題將環繞邏輯(logic)和自動機(automata)在形式化驗證(formal verification)的應用。形式化驗證是開發高品質軟硬體的一個方法。他將軟硬體設計是否符合定義的規格這樣的問題,透過基於邏輯或自動機的數學方法加以證明。我們實驗室會聚焦於兩種主要的數學工具:有限狀態自動機(finite-state automata)和SMT (satisfiability modulo theories),探討如何藉由他們來自動的確保系統或程式的正確性。這樣的題目比較適合同時對於電腦科學和數學有興趣的同學。

實習的內容會依照參與同學的程度作調整。原則上會先由文獻探討和整理開始出發,初步目標會是讓大家能讀懂目前頂尖國際會議的論文。如果程度好的同學,有機會能在這過程中產生新的想法,進而產生能發表於國際會議的成果。


Our internship's theme will be applying logic and automata in formal verification. Formal verification is a method for developing high-quality hardware and software, which uses mathematical methods based on logic or automata to prove whether hardware and software designs conform to defined specifications. Our lab will focus on two main mathematical tools, finite-state automata and SMT (satisfiability modulo theories), and discuss how to use them to ensure a system's or program's correctness automatically. Such topics are more suitable for students who are interested in computer science and mathematics at the same time.

We will adjust the content of the internship according to the level of the participating students. In principle, it will start with a literature survey, and the initial goal will be to enable everyone to understand papers from current top international conferences. Students who progress well may generate new ideas along the way and go on to produce results publishable at international conferences.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/~yfc

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/~yfc

Email :
yfc@iis.sinica.edu.tw
呂俊賢
Chun-Shien Lu
深度學習安全-對抗性攻擊與防禦、後門攻擊與防禦、深偽偵測

Deep Learning Security: Adversarial Attack and Defense, Backdoor Attack and Defense, Deepfake Detection
   隨著深度學習技術的發展與演進,人工智慧(AI)的運用無所不在且跨越多個領域,其影響層面之寬廣實屬罕見。然而,因人工智慧的應用,伴隨而來的安全議題也益發重要,特別是在有安全需求的環境裡,忽略這些議題有時將導致災難性的傷害。近來的熱門研究發現,精心設計的adversarial examples (AEs)對於訓練良好的深度學習網路能達到相當程度的愚弄效果,而且這adversarial examples所引進的``adversarial perturbation (AP)’’,對於人眼或人耳感受不到與原資料(benign inputs)有差異,具備良好的imperceptible特質。根據文獻紀載(https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html),目前AI Security的著作發表數量從2014年以來呈現指數成長,顯示相關研究議題受到極大重視。
   為了不影響AE 的視覺品質,adversarial perturbations能量受控,從訊號處理角度觀點來看這類似早期資料隱藏所嵌入的高頻訊號,同樣都是考量不破壞被處理影像的視覺品質。然而不同的是,資料隱藏的研究可說是以影像內容為主體,而深度學習的安全除了影像內容之外還得考量深度學習模型,且眾所周知深度學習模型功能強大,但也太脆弱了,讓adversarial attacks有漏洞可鑽。有鑑於此,本實驗室打算從訊號處理與網路模型分別探討對抗攻擊的防禦方法。深度學習網路模型將以image classifier這task為主。

Security threats arising from AI applications have received remarkable attention recently, especially in security- or privacy-critical environments, where ignoring these issues may lead to disastrous consequences. Recent research reveals that carefully designed adversarial examples (AEs) can effectively fool well-trained deep neural networks (DNNs). Meanwhile, the "adversarial perturbations" introduced by adversarial examples are imperceptible: to human eyes or ears, the perturbed inputs are indistinguishable from the benign ones. According to the literature listed at https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html, the number of publications on AI security has grown exponentially since 2014.
   To avoid degrading the perceptual quality of adversarial examples, the energy of the adversarial perturbations is kept limited. From the viewpoint of signal processing, this is in a sense similar to the high-frequency noise embedding used in data hiding. Nevertheless, the major difference is that the host data in data hiding is the digital media (e.g., an image) itself, while deep learning security needs to take both the media data and the network model into consideration. In addition, it is well known that deep learning models are powerful in diverse areas but still fragile to adversarial attacks. In view of the above observations, we propose to study defenses against adversarial attacks from the aspects of signal processing and learning models, respectively. The deep learning models studied will mainly be image classifiers.
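
To make the idea of a bounded adversarial perturbation concrete, the sketch below applies the fast gradient sign method (FGSM), one standard construction from the literature, to a toy logistic classifier. It is only an illustration under stated assumptions: the source does not say which attacks or defenses the lab studies, and the weights `w`, bias `b`, input `x`, and budget `eps` are hypothetical values chosen for this example.

```python
import math

def predict(w, b, x):
    """Probability of class 1 under a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(w, b, x, y, eps):
    """One-step sign-gradient attack bounded by eps in the L-infinity norm.

    For cross-entropy loss on a logistic model, dLoss/dx_i = (p - y) * w_i.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    # Move each coordinate by at most eps in the direction that raises the loss.
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (not taken from any real dataset).
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.2, 0.1, 0.1], 1
x_adv = fgsm(w, b, x, y, eps=0.2)
print(predict(w, b, x), predict(w, b, x_adv))
```

Each coordinate of `x_adv` differs from `x` by at most `eps`, yet the predicted probability for the true class drops below 0.5, which is the fragility the paragraph above describes.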
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/lcs/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/lcs/

Email :
lcs@iis.sinica.edu.tw
楊得年
De-Nian Yang
元宇宙之多媒體社群深度學習

Multimedia, Social Networks, and Deep Learning in the Metaverse
(一)社群資料探勘、機器學習、與演算法設計:
• 基於虛擬、擴增實境(VR/AR)或元宇宙(metaverse)的推薦系統:如規劃避免3D暈眩或撞到障礙物的虛擬和現實路徑、團體購物中最大化共同和個人的喜好與虛擬世界社群網路中NFT交易推薦系統。
• 社群影響力分析與優化:如選定優惠券最佳兌換率投放目標、具性別平等意識之影響力最大化、個人化密度彈性群體查詢與圖(graph)中子結構的資訊融合。
• 其他應用領域的推薦系統:如社群直播組合推薦、為團體活動安排與活動潛在的參與者推薦。

(二)多媒體網路演算法設計與分析:
藉由分析問題NP困難度及不可近似性的方法,以及高階演算法設計技巧 (如近似演算法、競爭演算法等),來解決多媒體網路中的各類應用問題
• 虛擬實境和元宇宙網路:如規劃有線及無線網路資源配置和排程方式、選定3D多視角影片所需傳輸及合成之場景和最佳化多媒體網路資源效率及確保使用者的沉浸體驗。
• 軟體定義網路中的各類效能優化問題:如在具有網際網路工程之動態群組,基於使用者應用需求,設計路由及選擇資料源以最小化總頻寬消耗和更新規則的總數,並確保線路/節點容量限制。
• 行動邊緣運算情境:如結合數位雙生(Digital twin)藉設計高階演算法建置高效、可靠的社群物聯網(Social IoT)及群眾外包(Crowdsourcing)系統。


元宇宙是一個整合了多重虛擬世界的生態系,人們能透過其化身(Avatar)在元宇宙中社交、購物和創作,現實世界的物品也能透過數位雙生(Digital Twin)的形式存在其中,其為實體裝置之虛擬代表,被認為是物理世界和虛擬世界之間的橋樑。長期以來,我們關心未來元宇宙中的各種社群網路問題,包括虛擬與擴增實境(VR/AR)的朋友或NFT推薦系統、即時串流平台推薦系統、社群影響力分析及社群資料探勘;此外,我們也關心下個世代的網路優化問題,包括多媒體與軟體定義網路效能優化、有線及無線網路資源配置、單播/群播排程方式和建置可靠的社群物聯網(Social IoT)和群眾外包(crowdsourcing)。在這裡,你可以學到的技術包括圖神經網路(Graph Neural Network)、機器學習、張量分解技術、分析問題NP困難度及不可近似性的方法、整數/線性/半正定規劃、動態規劃、隨機湊整、對偶理論、抽樣方法等高階演算法設計技巧。歡迎想出國留學、增強實作能力、有元宇宙創業憧憬的同學,於今年夏天加入我們,一起探索未來元宇宙與社群物聯網的無限可能。




A. Social network data mining, machine learning, and algorithm design:
We study tensor decomposition, neural networks, machine learning, and other techniques for:
• Recommendation systems for virtual/augmented reality (VR/AR) or the metaverse (e.g., user-item configuration recommendation, planning paths that avoid 3D motion sickness and obstacles, and NFT trading recommendation in virtual-world social networks).
• Social influence analysis and optimization (e.g., coupon allocation for redemption maximization, fairness-aware influence maximization, personalized density-flexible group queries, and fusing graph substructure information into node features).
• Recommendation systems for other applications (e.g., social live-stream recommendation, group activity arrangement, and potential customer recommendation).

B. Algorithm design and analysis for multimedia and software-defined networks:
We analyze NP-hardness and inapproximability, design approximation and competitive algorithms, and use advanced algorithmic techniques to solve problems in various multimedia and software-defined networks.
• Virtual reality (VR) and metaverse applications (e.g., designing resource allocation and scheduling methods for wired/wireless networks, selecting the scenes to transmit and synthesize in 3D multi-view videos, and optimizing resource efficiency and users' immersive experience in multimedia networks).
• Various performance optimization problems in software-defined networks (e.g., designing routing schemes and choosing data sources according to application requirements so as to minimize the total bandwidth consumption and the total number of update rules while respecting link/node capacity constraints in dynamic groups with Internet engineering).
• Mobile edge computing scenarios (e.g., incorporating digital twins to build high-performance and reliable social IoT and crowdsourcing applications).
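
As one concrete instance of the approximation algorithms mentioned above, the sketch below shows greedy selection for the maximum coverage problem, which is NP-hard and for which the greedy rule gives a (1 − 1/e) approximation guarantee. The instance is hypothetical: `reach` maps made-up candidate seeds `a` to `d` to the users they can reach, loosely in the spirit of influence maximization, and the source does not state that this particular algorithm is used in the lab's work.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets, each time taking the one that covers the most new elements."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(sets, key=lambda name: len(sets[name] - covered))
        if not sets[best] - covered:
            break  # nothing new can be covered
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

# Hypothetical instance: which users each candidate seed can reach.
reach = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 6},
}
chosen, covered = greedy_max_coverage(reach, k=2)
print(chosen, sorted(covered))
```

The greedy rule looks only at the marginal gain of each candidate, which is why it first takes `a` and then prefers `c` over `b` even though `b` is the larger set.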


We welcome students who plan to study abroad, want to strengthen their implementation skills, or are interested in the metaverse to join us this summer. Let's explore the opportunities of the future metaverse and social IoT together.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/dnyang/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/dnyang/

Email :
dnyang@iis.sinica.edu.tw
呂及人
Chi-Jen Lu
深度學習的原理與應用

Deep learning: foundations and applications
研究深度學習的原理,並探索深度學習在強化學習、影像處理、自然語言等各個領域的應用。

Study the foundation of deep learning, and explore its diverse applications in various areas such as reinforcement learning, computer vision and natural language processing.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/cjlu/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/cjlu/

Email :
cjlu@iis.sinica.edu.tw
王建民
Chien-Min Wang
機器學習與遺傳式編程

Machine Learning and Genetic Programming
(1) 人機組隊之深度強化學習:人機組隊 (Human-Autonomy Teaming, HAT) 已成為最新興的 AI 研究趨勢之一,包括以人為中心的人智計算,和深度強化學習 (Deep Reinforcement Learning, DRL) 的自治 AI 算法。先進的 DRL 系統除了可達到智慧系統與人類進行更密切的合作外,同時亦可作為人類最佳的模範幫手、教練或競爭伙伴,以執行更合乎道德規範的互動性組隊,進而完成更合理且更適用的目標任務。HAT 基於人機之間的共享權限,以正確地學習共通指令、共同目標和競爭伙伴關係的模型;本研究成果將輔助 HAT 成為更有效的決策系統,同時達到高相容性和可靠性的人機系統。本研究計劃旨在構建一個具有交互運作、協作團隊和風險分析集合的仿真人 DRL 系統。在研究計畫中提出的創新方法,可作為未來 HAT 與 DRL 系統開發的重要基礎,以實現未來動態和自主環境中,更重要、與人相容、直觀和可靠的自主系統。

(2) 使用遺傳式編程探究監督式機器學習:本研究計畫透過嘗試解決兩個不同需求的應用問題,來探討監督式機器學習的兩個不同階段。第一個應用問題是要找出最能符合、解釋觀察樣本資料的機率分佈數學模型,此與監督式機器學習的第一階段(訓練/學習)目標一致;而第二個應用問題(網際服務品質時間序列預測)則要求模型除了要能符合學習/訓練資料之外,還需具有一般性的能力,以便未來能正確地應對未曾見過之資料或情況,此與監督式機器學習第二階段對模型的要求相同。此外,不同於時下熱門的深度學習方法使用類神經網路模型和倒傳遞式訓練,本研究計畫探索機器學習的另一種可能性與方向,也就是遺傳式編程 (Genetic Programming, GP) 。其使用數學表達式模型和演化式搜尋學習,有益於機器學習結果的理解、推導與運用,符合 Explainable AI 所提倡之概念。


(1) Deep Reinforcement Learning for Human-Autonomy Teaming: Human-Autonomy Teaming (HAT) has become one of the fastest-emerging AI research trends, combining Human-Centered Computing with autonomous AI algorithms such as Deep Reinforcement Learning (DRL). An advanced DRL system with a sophisticated design allows intelligent agents to cooperate more closely with humans while performing moral, reasonable, and applicable tasks as humans' exemplary assistants, tutors, and/or competition partners. Because HAT pursues collective goals by sharing authority, offering instructions, and/or competing between humans and machines, the research outcomes will help HAT become a more effective decision-support system while sustaining a highly compatible and reliable human-AI system. Furthermore, this research proposal aims to construct a human-level DRL system that integrates interactive operation, collaborative teaming, and risk analysis. The novel approach proposed in this study can serve as an important foundation for the future development of HAT with DRL systems, as an Explainable AI methodology for more significant, human-compatible, intuitive, and reliable applications in future dynamic and autonomous environments.

(2) Exploring Supervised Machine Learning with Genetic Programming: By solving two separate application problems, this research proposal investigates two distinct stages of supervised machine learning. The first application problem is to identify the probability distribution function that best fits a set of observation data, which matches the goal of the training/learning phase of supervised machine learning. The second application problem concentrates on Web service QoS time series prediction, which requires the generated model to be capable of dealing with unseen data or situations rather than merely fitting the provided data. Moreover, instead of adopting the widely used deep learning techniques, this research proposal explores another possibility and research direction, i.e., genetic programming, which employs an evolutionary searching/learning strategy and mathematical expression-based models that help in understanding and using the outcomes of the machine learning process.
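
The combination of expression-based models and evolutionary search described in (2) can be sketched as follows. This is a deliberately reduced, mutation-only variant of genetic programming with elitist truncation selection, written for illustration; the operator set `OPS`, the depth cap `MAX_DEPTH`, the population settings, and the target function are all assumptions of this sketch and are not taken from the project.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
MAX_DEPTH = 4  # hypothetical cap that keeps expressions small

def random_tree(rng, depth):
    """Grow a random expression over the variable x and small integer constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else rng.randint(-2, 2)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate an expression tree at the point x."""
    if tree == "x":
        return x
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return tree  # integer constant

def mutate(rng, tree, budget=MAX_DEPTH):
    """Replace one randomly chosen subtree, keeping total depth within MAX_DEPTH."""
    if not isinstance(tree, tuple) or budget <= 1 or rng.random() < 0.3:
        return random_tree(rng, min(2, budget))
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(rng, left, budget - 1), right)
    return (op, left, mutate(rng, right, budget - 1))

def error(tree, samples):
    """Sum of squared errors on the observed (x, y) pairs."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in samples)

def evolve(samples, generations=200, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda t: error(t, samples))
        history.append(error(population[0], samples))
        survivors = population[: pop_size // 3]  # the best third is kept unchanged
        children = [mutate(rng, rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = min(population, key=lambda t: error(t, samples))
    return best, history

# Hypothetical target: y = x*x + 1 sampled at a few integer points.
samples = [(x, x * x + 1) for x in range(-3, 4)]
best, history = evolve(samples)
print(best, error(best, samples))
```

Because the result is an explicit expression tree rather than a weight matrix, it can be read and simplified by hand, which is the interpretability benefit the paragraph above points to. Full genetic programming would normally add subtree crossover between parents.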
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/cmwang/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/page/research/ComputerSystem.html?lang=zh

Email :
iho@iis.sinica.edu.tw
修丕承
Pi-Cheng Hsiu
間歇性深度學習

Intermittent-aware Deep Learning
此計畫屬於嵌入式系統研究領域,特別關注「間歇性智慧物聯網系統」,讓物聯網裝置可靠環境中不穩定的供電,間歇性地執行深度學習模型,無須再安裝電池而永續運作。我們開發系統軟體,讓深度學習模型得以輕鬆部署且間歇性地執行於無電池的微型裝置,使得AI研究人員可以專注於深度網絡設計而不是模型佈署。學生將整合並佈署我們開發的「間歇性作業系統」與「間歇性深度學習推論工具」於超低功率嵌入式裝置,並學習到系統實作與開發的經驗。


This project’s scope lies in the area of embedded systems, with a special focus on enabling battery-less IoT devices to intermittently execute deep neural networks (DNNs) using ambient power. We develop system software that lets AI researchers easily deploy and efficiently execute their DNN models on battery-less tiny devices, so that they can focus on deep network design rather than model deployment. You are expected to gain rich hands-on experience in prototype implementation and system kernel hacking by integrating and deploying our previously developed intermittent operating system and intermittent deep learning tool on ultra-low-power embedded platforms.
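
The general idea of intermittent execution can be simulated in a few lines: progress is committed to non-volatile memory after each step, so that computation resumes where it stopped after a power failure instead of restarting. The sketch below is a generic illustration only and does not reproduce the lab's intermittent operating system or inference tool; `run_layers`, `run_intermittently`, the `nvm` dictionary, and the one-unit-per-layer energy model are all hypothetical.

```python
class PowerFailure(Exception):
    """Raised by the simulated energy harvester when energy runs out."""

def run_layers(layers, x, nvm, energy_budget):
    """Run layers in order, committing progress to `nvm` after each one.

    `nvm` stands in for non-volatile memory: it survives power failures.
    """
    start = nvm.get("next_layer", 0)
    value = nvm.get("activation", x)
    for i in range(start, len(layers)):
        if energy_budget <= 0:
            raise PowerFailure(i)
        value = layers[i](value)
        energy_budget -= 1
        # Commit the result and the resume point; a real device would need
        # this update to be atomic with respect to power loss.
        nvm["activation"], nvm["next_layer"] = value, i + 1
    return value

def run_intermittently(layers, x, energy_per_cycle):
    """Keep rebooting after power failures until inference completes."""
    nvm, power_cycles = {}, 0
    while True:
        power_cycles += 1
        try:
            return run_layers(layers, x, nvm, energy_per_cycle), power_cycles
        except PowerFailure:
            continue  # device reboots once enough energy is harvested again

# Hypothetical three-"layer" model; each layer costs one unit of energy.
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
result, cycles = run_intermittently(layers, 5, energy_per_cycle=2)
print(result, cycles)
```

With two units of energy per power cycle, the three layers complete across two cycles and give the same result as an uninterrupted run.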

PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/pchsiu/

實驗室網址(Research Information) :
https://emclab.citi.sinica.edu.tw

Email :
pchsiu@citi.sinica.edu.tw
柯向上
Hsiang-Shang ‘Josh’ Ko
附正確性證明的函式程式與型式化的數學基礎

Verified functional programming and formalised foundations of mathematics
每當我們寫出一個數學定理和證明,其實也就寫出一個具有型別 (type) 的函式程式 (functional program)。從一開始 Curry、Howard 等人察覺到幾套獨立發明的數理邏輯系統和計算系統竟有相同本質,到 Martin-Löf 發明 Type Theory 作為數學和程式寫作的大一統基礎,隨後衍生出眾多有成熟實作的證明輔助器 (proof assistants) 和依值型別程式語言 (dependently typed programming languages),現在我們已可用同一型式表達證明和程式,並讓電腦檢查其正確性。

實習的焦點會放在 Agda 這個程式語言學界常用的語言,由基本的 Agda 程式寫作延伸至 (Homotopy) Type Theory 這套新興的數學基礎,或是往證明與程式合一的依值型別程式寫作 (dependently typed programming) 方向探索。實習型式類似讀書會,自行研讀材料和動手實作,並定期與老師同學們分享討論;若有餘裕也可做個小專案。

參考讀物請見英文版介紹後半。

Whenever we write down a mathematical theorem and its proof, we have also written down a typed functional program. Based on the Curry–Howard correspondence, which stemmed from observations that several independently invented logical and computational systems were nevertheless essentially the same, Martin-Löf’s Type Theory is a common foundation for mathematics and programming, and has spawned numerous proof assistants and dependently typed programming languages, within which we can express proofs and programs uniformly, and check their correctness using a computer.

We will focus on Agda, which is a popular language in the programming languages research community. Starting with basic programming in Agda, we can explore either (Homotopy) Type Theory, a newly developed foundation of mathematics, or dependently typed programming, where proofs are embedded into programs. The format will be like a study group, where each member will study relevant materials, write Agda programs, and share and discuss their findings with the group. It is also possible to do a small project if time permits.
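
As a small illustration of the proofs-as-programs reading described above, the following sketch is written in Lean 4 rather than Agda (a substitution made only for this example; the internship itself uses Agda). The names `applyImp`, `swapConj`, and `swapPair` are hypothetical. The point is that a proof of an implication is used like a function, and a proof of a conjunction is built and taken apart like a pair.

```lean
-- Implication is a function type: from proofs of A → B and A,
-- a proof of B is obtained by function application.
theorem applyImp (A B : Prop) (f : A → B) (a : A) : B := f a

-- A proof of A ∧ B is a pair; swapping its components proves B ∧ A.
theorem swapConj (A B : Prop) (h : A ∧ B) : B ∧ A := ⟨h.2, h.1⟩

-- The same shape as an ordinary program on data.
def swapPair {α β : Type} (p : α × β) : β × α := (p.2, p.1)
```

The second theorem and the last definition have the same structure, which is the correspondence in miniature: the type checker that accepts the program is also checking the proof.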

References

* (Homotopy) Type Theory

Martín Hötzel Escardó [2019]. Introduction to univalent foundations of mathematics with Agda. DOI: 10.48550/arXiv.1911.00580. https://www.cs.bham.ac.uk/~mhe/HoTT-UF-in-Agda-Lecture-Notes/

Simon Thompson [1999]. Type Theory and Functional Programming. Addison-Wesley. ISBN: 9798482847145. https://www.cs.kent.ac.uk/people/staff/sjt/TTFP/

* Dependently typed programming in Agda

Ana Bove and Peter Dybjer [2009]. Dependent types at work. In International LerNet ALFA Summer School on Language Engineering and Rigorous Software Development 2008, volume 5520 of Lecture Notes in Computer Science, pages 57–99. Springer. DOI: 10.1007/978-3-642-03153-3_2. https://www.cse.chalmers.se/~peterd/papers/DependentTypesAtWork.pdf

Hsiang-Shang Ko [2021]. Programming metamorphic algorithms: An experiment in type-driven algorithm design. The Art, Science, and Engineering of Programming, 5(2):7:1–34. https://doi.org/10.22152/programming-journal.org/2021/5/7

Conor McBride [2011]. Ornamental algebras, algebraic ornaments. https://personal.cis.strath.ac.uk/conor.mcbride/pub/OAAO/LitOrn.pdf
PI個人首頁(PI's Information) :
https://josh-hs-ko.github.io

實驗室網址(Research Information) :
https://www.iis.sinica.edu.tw/pages/joshko/

Email :
joshko@iis.sinica.edu.tw
蕭邱漢
Chiu-Han Hsiao
以深度學習技術為基礎之腎臟細胞癌影像辨識與血漿生物標記融合技術

Integrated-Fusion-Based Deep Learning of Image Segmentation and Plasma Biomarker for RCC
腎臟細胞癌 (RCC) 是最常見的惡性腎腫瘤,屬於一種原發性惡性腺癌來源於腎小管上皮細胞,在男性和女性中都很常見,其他常見的有包括血管平滑肌脂肪瘤,腎轉移和伴有出血性或複雜性腎囊腫的腎淋巴瘤等。RCC 分為三種主要類型:為腎亮細胞癌 (ccRCC)、乳頭狀腎細胞癌 (pRCC) 和Chromophobe RCC (chRCC)。此外,腎囊性病變 (Bosniak) 根據影像學特徵可分為五類。因具有多種放射學特徵表現,在診斷上常使用電腦斷層掃描 (CT) 進行輔助診斷。
然而,在醫療AI領域中應用深度學習技術已被證實具有其優越性,例如癌症檢測和臨床預測上,但是,除非收集到足夠多且高品質之數據,否則很難提高模型準確度。另外,許多研究指出,檢查血漿蛋白變化的程度可視為一種生物標記,應用蛋白質組學方法適用於臨床以非侵入性方式偵測RCC。
本計畫目標為開發融合性應用深度學習技術之系統架構,並結合蛋白質修飾或轉譯修飾 (PTMs) 之生物標記及分析結果,希冀研究成果可補充目前之RCC臨床診斷指引和後續診療策略建議。
綜合上述,本計畫應用深度學習技術用以偵測腎臟和腫瘤,結合蛋白質分析技術,研發生物標記,以非侵入性方式提供具有預測性、預防性、個人化和參與性的腎臟細胞癌早期偵測系統,未來,可應用於多家醫院並且有效提供資訊用以協助臨床醫生診斷。

Renal cell carcinoma (RCC), a primary malignant adenocarcinoma derived from the renal tubular epithelium, is the most common malignant renal tumor and is commonly diagnosed in both men and women. It has a variety of radiographic appearances, so all renal masses, particularly other renal tumors, should be considered in the differential diagnosis. The differential most commonly includes angiomyolipoma (AML), which usually has a significant fat component, kidney metastasis, and renal lymphoma with hemorrhagic or complex renal cysts. There are three major subtypes of RCC: clear cell (ccRCC), papillary (pRCC), and chromophobe (chRCC). In addition, renal cystic lesions are classified into five categories (the Bosniak classification) based on imaging characteristics. Diagnosis therefore requires careful assessment by a physician using contrast-enhanced computed tomography (CT).

Recent advances in deep learning have demonstrated state-of-the-art performance in medical image processing tasks such as cancer detection and clinical prediction. However, it is difficult to improve a model's accuracy unless sufficient high-quality data is collected. Furthermore, numerous studies have examined changes in the abundance of plasma proteins, although most of them did not find clinically significant markers. The proteomic approach can therefore be considered a novel method for identifying biomarkers, and blood plasma is a suitable specimen for detecting RCC non-invasively.

The purpose of this proposal is to develop deep learning algorithms that detect RCC from CT images fused with proteomic analyses. The results can complement current clinical guidelines and outline potential future directions for RCC research and treatment. Additionally, protein modifications, or post-translational modifications (PTMs), will be considered as novel biomarkers and essential features of the models.

The proposed methods enable kidneys and tumors to be segmented automatically and precisely, supporting predictive, preventive, personalized, and participatory care with highly accurate identification results. The system is also intended to classify, segment, locate, and detect RCC from medical images and biomarkers across multiple hospitals, and to significantly assist clinicians in diagnosis in clinical settings.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/chiuhanhsiao/

實驗室網址(Research Information) :
http://www.citi.sinica.edu.tw/pages/chiuhanhsiao/

Email :
chiuhanhsiao@citi.sinica.edu.tw
陳駿丞
Jun-Cheng Chen
深偽偵測與深度生成模型之影像應用

Deepfake Detection and Deep Generative Models for Image Applications
隨著深度生成模型和各類生成式AI的快速發展,有心人士可以利用這些技術,大量製造深偽影片和假消息,破壞他人名譽與社會信任,如日前網紅使用基於生成對抗網路軟體之換臉事件,或用文字生成影像大量生成影像,尤其深偽生成演算法不斷推陳出新,開發有效的深偽偵測演算法,已經是非常重要的研究議題。要設計有效的深偽偵測器,更需要知道背後深度生成模型的背景知識(如生成對抗網路、擴散模型)。在這個實習中,你可以接觸到各種深度生成模型在影像處理和電腦視覺的應用(如影像超解析度、去躁、風格轉換等應用),當你對深度生成模型有一定了解後,我們手把手帶你設計一個深偽影像偵測器和其他相關之電腦視覺任務。

表現優秀的學生,實習結束後,可以獲得續聘兼任研究助理的機會繼續研究,並可發表論文至電腦視覺與影像處理國際知名會議或參加國際競賽為國爭光。

With the rapid development of deep generative models and other generative artificial intelligence, malicious actors can leverage these technologies to mass-produce deepfake videos and fake news, defaming others and destroying societal trust. A recent example is the incident in which an internet celebrity used face-swapping software to replace faces in videos with those of other celebrities. Because new deepfake generation algorithms keep appearing, designing an effective deepfake detector has become a very important research issue. To achieve this goal, it is essential to understand the deep generative models behind them (e.g., generative adversarial networks and diffusion models). During the internship, we will guide you through various deep generative models and their applications to image super-resolution, denoising, style transfer, and so on. Finally, we will work with you to design a deepfake image detector and tackle other related computer vision tasks.

Interns who perform well will be offered the opportunity to continue the research as part-time research assistants, with the aim of publishing at well-known international computer vision and image processing conferences or taking part in international competitions.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/pullpull/

實驗室網址(Research Information) :
http://www.citi.sinica.edu.tw/pages/pullpull/

Email :
pullpull@citi.sinica.edu.tw
王釧茹
Chuan-Ju Wang
表示學習演算法於人工智能之應用

Representation learning and its applications
異質性資料涵括各式結構化(如:消費記錄、產品規格)及非結構化數據(如:網友文字評論),其各自的資料結構及特徵空間大不相同,因此如何進行彼此間的關聯、整合及推論仍屬當代人工智能技術及其相關應用的一大挑戰。然而透過機器學習的非監督式學習法則有可能將異質性資料表現於共同的特徵空間之中,倘若又能在此空間中獲得優良的資料表示法,則可作為異質性資料分析的穩固基石。因此,本研究主題從深度學習及網路表示法的框架切入,深入探究其空間轉換的特性及其保留的訊息,並將針對不同的資料型式及應用情境設計對應之演算法。除了演算法設計及理解外,本實習亦具有以下三個特色:1) 將使用真實世界的資料進行資料分析及學習;2) 將學習如何在unix-like 環境下處理大量資料並運行實驗;3) 將學習如何使用網頁前端技術進行結果之視覺化呈現。



The research topics relate to processing and understanding heterogeneous data (including texts, pictures, audio signals, social relations, and user behaviors) and to using deep learning and/or network embedding techniques for various AI-enabled applications. In addition to model design, during the internship the participant will also 1) gain hands-on experience with real-world data, 2) learn how to deal with large-scale data and conduct experiments on Unix-like systems, and 3) learn how to visualize the learned results using front-end web programming techniques.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/cjwang/

實驗室網址(Research Information) :
http://cfda.csie.org

Email :
cjwang@citi.sinica.edu.tw
陳孟彰
Meng Chang Chen
深度學習應用於惡意軟體分析

Deep learning for malware analytics
(細節見下面)

As malware has caused great damage to computers and networks, malware analytics has become important for understanding the attack activities and program structure of malware. In this summer program, we need help implementing various deep learning models and experimenting on various real-world data. Students can learn not only deep learning techniques but also knowledge of malware.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/mcc/

實驗室網址(Research Information) :
https://www.mambaplus.tw/

Email :
mcc@iis.sinica.edu.tw
吳廸融
Ti-Rong Wu
深度強化式學習與電腦遊戲

Deep Reinforcement Learning and Computer Games
近年來深度強化式學習於電腦遊戲取得優異的成果,如擊敗世界圍棋冠軍李世石的AlphaGo。本研究將探討應用各種深度強化式學習之技術於電腦遊戲上,包含但不限於:棋盤類遊戲如圍棋、五子棋、黑白棋以及電玩遊戲等。

實習生將會參與開發電腦遊戲、使用深度強化式學習算法訓練電腦遊戲程式以及改善搜尋演算法效能等。歡迎對電腦遊戲以及深度強化式學習演算法有興趣的同學加入。我們也歡迎表現良好的同學於實習後繼續與實驗室合作,參與電腦遊戲競賽或發表論文。

Deep reinforcement learning (DRL) has achieved several successes recently in computer games, such as AlphaGo beating world Go champion Lee Sedol. Our research focuses on applying various DRL techniques to computer games, including but not limited to board games and video games such as Go, Gomoku, Othello, and Atari.

Interns will be involved in developing computer games, training game playing agents through DRL algorithms, and improving performance through search algorithms. Students who are interested in computer games and DRL are welcome to join us. After the internship, we also welcome students with good performance to continue to work with us, participate in computer game tournaments or publish papers.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/tirongwu/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/tirongwu/

Email :
tirongwu@iis.sinica.edu.tw
張佑榕
Ronald Y. Chang
基於AI的第六代無線通訊

AI-Based 6G Wireless
見英文介紹

This internship will explore the intersection of wireless communication and machine learning.  Ideal candidates should have good programming skills and have interest or experience in one or more of the following (techniques and applications):

1) Wireless Communication:  communication systems, signals and systems, engineering mathematics (linear algebra, probability, etc.), digital signal processing
2) Machine Learning:  convolutional neural networks (CNNs), graph neural networks (GNNs), reinforcement learning (RL), federated learning (FL)
3) Applications:  multiple-input multiple-output (MIMO) communication, cell-free networks, reconfigurable intelligent surface (RIS) assisted communication, satellite communication

Interns will participate in weekly and ad-hoc meetings, conduct research, and prepare research reports/slides/presentations/research papers. Paid extensions after the official two-month period are possible upon demonstration of satisfactory performance and mutual agreement.
PI個人首頁(PI's Information) :
https://www.citi.sinica.edu.tw/pages/rchang/index_en.html

實驗室網址(Research Information) :
https://www.citi.sinica.edu.tw/~rchang/

Email :
yjrchang@gmail.com
王志宇
Chih-Yu Wang
邊緣智慧最佳化與系統開發

Foundation of Edge Intelligence
從事邊緣智慧(Edge Intelligence)相關研究應用,邊緣智慧的科普文章可參考:

https://newsletter.sinica.edu.tw/26211/

實習內容將針對邊緣智慧的相關技術探討與最佳化設計(理論、軟硬體皆有相關題目),可參考PI近年研究發表。確切題目待面議。

Our goal is to identify, analyze, predict, and manage the strategic behaviors of humans in various networks such as communication networks, information networks, social networks, etc. The main theme of the internship this year is Edge Intelligence, including resource optimization, protocol design, system development, etc.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/cywang/

實驗室網址(Research Information) :
http://snaclab.citi.sinica.edu.tw

Email :
cywang@citi.sinica.edu.tw
鄭湘筠
Hsiang-Yun Cheng
適用於深度學習與巨量資料分析之新世代記憶體系統

Energy-efficient future memory systems for deep learning and big data analytics
近年深度學習與資料密集程式(圖論分析、基因序列分析)等,越來越盛行,這些程式在運算時往往需要大量的記憶體存儲空間與高效的資料存取,然而目前主流的運算系統無法滿足這些需求, 使得我們必須重新思考如何設計未來的電腦系統。其中一個極具潛力的設計方向是從傳統以運算單元為主的系統切換到以記憶體為主的運算系統,藉由在記憶體內或週邊直接做部分的運算減少資料傳輸造成的效能瓶頸。許多新興的記憶體技術,像是Intel Optane Memory、電阻式記憶體(ReRAM)等,兼具存儲與運算功能,為實現以記憶體為中心的運算系統帶來新的曙光,產業界也積極開發於3D堆疊式記憶體(3D-stacked memory)或DRAM晶片上加入簡單運算單元之技術以實現近記憶體運算,但由於硬體技術尚不成熟以及和傳統截然不同的運算模式,在系統設計上有許多尚待克服之挑戰。

本實習計畫的目標為針對深度學習與巨量資料分析之應用情境,探討不同層面上之設計挑戰,包括電路與元件階層、計算結構階層、及演算法階層,並以軟硬體協同設計的方式, 設計高效能低耗電之新世代記憶體系統。實習生可選擇參與下列研究主題,或其他相關研究議題。

1. 針對不同類型之神經網路,如應用於個人化推薦之深度學習模型、記憶增強型神經網路(Memory-augmented neural network, MANN)等,設計以記憶體內運算/近記憶體內運算加速的記憶體架構,並從軟體與硬體層面優化提升效能。
2. 設計適用於加速圖論分析相關演算法之新興記憶體系統。
3. 基於記憶體內運算,以軟硬體協同設計提升基因序列分析演算法之效能。


In recent years, data analytics applications that must process large volumes of data, such as deep learning, graph analytics, and genome data analytics, have become increasingly popular. These big data applications demand large memory capacity and efficient data access. Unfortunately, mainstream computing systems are not designed to meet their needs. This forces us to fundamentally rethink how to design future computing platforms. One promising solution is to shift from the contemporary processor-centric design toward a memory-centric design, in which part of the computation is performed inside or near the memory to reduce the performance bottleneck caused by data movement. Emerging memory technologies, such as Intel's Optane Memory and resistive RAM (ReRAM), offer computing-in-memory capability. Many major vendors in the industry are also developing techniques to add simple computing units to 3D-stacked memory or DRAM DIMMs to enable near-memory computing. Although promising, bringing such systems into practice remains challenging due to hardware constraints and their distinct computing behavior.

Our goal is to study the design challenges at different system layers, including device/circuit level, architecture level, and algorithm level, and propose cross-layer designs to fully exploit the potential of in-memory/near-memory computing systems. Candidate topics include, but are not limited to, the following:

1. For different types of neural networks, such as deep learning models for personalized recommendation and memory-augmented neural networks (MANN), develop energy-efficient in-memory/near-memory computing systems through software-hardware co-design.
2. Cross-layer co-design to accelerate graph analytic algorithms via in-memory/near-memory computing.
3. Cross-layer co-design to accelerate genome analysis via in-memory/near-memory computing.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/hycheng/

實驗室網址(Research Information) :
http://www.citi.sinica.edu.tw/pages/hycheng/

Email :
hycheng@citi.sinica.edu.tw
陳伶志
Ling-Jyh Chen
從智慧城市到幸福城市:使用微感測器與智慧優網探究城市的未來發展

Making Smart City a Happy City: using low-cost sensors and AIoT
智慧城市已是近年來城市發展的重要趨勢,但隨著越來越多的資訊科技被融入城市生活中,人們也開始反思這些新興建設的整體目標與實際效果是否符合期待,以及這些智慧科技是否對於城市的生活品質真能有所提昇。我們認為,城市是一個結合物理環境和社會經濟的複雜系統,城市的發展不應過度追求智慧,而更應聚焦都市中生活型態 (way of life) 與步調 (pace of life) 的品質提升,促進民眾的整體幸福感受。

在這個專案中,我們希望透過兼具時間與空間高解析度的微型感測物聯網,進行兼具學理、創意與應用價值的資料混搭與進階分析。我們將以人為本體,著重在民眾的生活步調、環境感知與空間知覺等面向,運用微型環境感測器,搭配人工智慧與時空大數據分析,探究幸福城市的量測指標與促進方案。我們歡迎對於物聯網、人工智慧、微型感測、大數據分析有強烈學習意願的夥伴一同參與我們的探究,更歡迎任何有關智慧城市感測的創新想法與挑戰構想,讓我們一起為增進城市的幸福感努力。

Smart cities have become an important trend in urban development in recent years. Still, as more and more information technologies are integrated into urban life, people are beginning to reflect on whether the overall goals and actual effects of these new constructions meet expectations and whether these smart technologies can truly improve the quality of life in the city. We believe a city is a complex system that combines the physical environment and the social economy. The development of a city should not pursue smartness for its own sake but should focus on improving the quality of the way of life and pace of life in the city, thereby promoting the overall well-being of its residents.

In this project, we hope to conduct data mashups and advanced analyses that have theoretical, creative, and practical value, using a micro-sensing Internet of Things with high temporal and spatial resolution. The project will be people-centric, focusing on people's pace of life, environmental perception, and spatial perception. We will use micro-environmental sensors combined with artificial intelligence and spatio-temporal big data analysis to explore measurement indicators and promotion schemes for a happy city. We welcome partners who are eager to learn about the Internet of Things, artificial intelligence, micro-sensing, and big data analysis to join our exploration. We will learn, work, enjoy the process, and produce good results together. For further questions, please feel free to contact us.
PI個人首頁(PI's Information) :
https://cclljj.github.io/

實驗室網址(Research Information) :
https://cclljj.github.io/research/

Email :
cclljj@iis.sinica.edu.tw
曹昱
Yu Tsao
整合聲音影像訊號之醫學資訊分析

Audio and Video Signal Processing for Medical Applications
我們將研究聲音及影像在各種醫學資料上的分析,包括心音肺音、ECG訊號、醫學影像、聽覺以及構音系統,預計所使用的方法為最新的人工智慧以及訊號處理演算法。

We will study the analysis of audio and image signals in various medical data, including heart and lung sounds, ECG signals, medical images, and the auditory and articulation systems. The methods used are expected to be the latest artificial intelligence and signal processing algorithms.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/yu.tsao/

實驗室網址(Research Information) :
https://www.citi.sinica.edu.tw/pages/yu.tsao/index_en.html

Email :
yu.tsao@citi.sinica.edu.tw
李政池
Gen-Cher Lee
端對端加密通訊應用設計

Application for end-to-end encrypted communication
「端對端加密通訊應用設計」研究主題主要建立在本實驗室建立的端對端加密通訊的協定基礎之上研究開發可能的創新應用。可學習及運用到的軟體技術包括橢圓曲線及後量子密碼學演算法的開發應用、跨平台的cross-compile程式開發技巧。開發應用於開發前端Android/iOS/Desktop及後端spring boot或nodejs server應用,可選擇學習的程式語言包括C/C++、Java/Kotlin、Swift/ObjC、Javascript,若有興趣亦可學習到微服務的開發與部署技能。

The research topic "Application Design for End-to-End Encrypted Communication" involves developing innovative applications based on the end-to-end encrypted communication protocols established by our lab. The software techniques that can be learned include developing applications of elliptic-curve and post-quantum cryptography algorithms and cross-compiling software for multiple platforms. Front-end Android/iOS/desktop applications and back-end Spring Boot/Node.js server development can be learned using C/C++, Java/Kotlin, Swift/ObjC, or JavaScript. If you are interested, you can also learn microservice development and deployment skills.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/ziv/

實驗室網址(Research Information) :
https://www.e2eelab.org

Email :
ziv@citi.sinica.edu.tw
黃彥男
Yennun Huang
入侵偵測與防禦/隱私保護/AIoT/智慧農業系統開發及AI資料科學分析

Intrusion Detection and Prevention/Data Privacy/AIoT/Smart Agriculture system development and AI scientific data analysis
1.入侵偵測與防禦系統研發
- 系統開發與測試
- 系統稽核紀錄和網路封包分析
- 人工智慧框架使用
- 自動化入侵規則生成

2. 資料安全隱私保護
- 從事資料安全與隱私保護之相關領域研究工作
- Python、Linux、Matlab、Deep Learning Toolbox基礎程式撰寫能力
- Deep Learning, Federated Learning技術相關研究

3.智慧物聯網
a.物聯網系統
-嵌入式系統開發
-無線感測網路應用開發
-全端系統開發
b.微機器學習
-智能移動偵測
-邊緣運算開發

4. 智慧農業計畫
- 資料蒐集、彙整、分析
- 使用機器學習方法訓練模型
- 資料科學相關
- ROS機器人開發
- 機器視覺

1. Intrusion Detection and Prevention System Development
- system development and testing
- system audit logs and network packets analysis
- AI framework usage
- automatic intrusion rule generation

2. Data Security and Privacy Protection
- Research in fields related to data security and privacy
- Basic programming ability in Python, Linux, Matlab, and Deep Learning Toolbox
- Research on deep learning and federated learning techniques

3. AIoT
a. IoT system
-Embedded System development
-Wireless sensor network application development
-Full Stack Development
b. tinyML
-Smart Motion Detection
-Edge Computing Development

4. Smart Agriculture Project
- data collection, compilation, and analysis
- Use machine learning methods to train models
- Data science related
- ROS robot development
- Machine Vision
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/yennunhuang/

實驗室網址(Research Information) :
http://www.citi.sinica.edu.tw/pages/yennunhuang/

Email :
yenjoanna@gmail.com
葉彌妍
Mi-Yen Yeh
深度學習與巨量資料探勘技術於人工智慧應用

Big Data Mining and Deep Learning Techniques for AI applications
研究主題包含深究多種機器學習與資料探勘的模型與技術,並且了解如何與人工智慧應用做連結。特別針對可以圖形結構表示的資料,例如社群網路、智識圖譜、時空關係圖、特徵因果關係圖等,學習該如何把想解決的人工智慧應用問題轉成圖上可定義的學習問題,並尋找合適的Graph-based學習模型來解。

Our research topic includes understanding various deep learning and data mining models and techniques and applying them to AI applications. One particular focus is on application data that can be modeled as a graph, such as social networks, knowledge graphs, spatial-temporal graphs, and feature causality graphs. We will study how to formulate a learning problem on graphs to map the corresponding AI application and find suitable graph-based learning models to solve it.
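
To give a flavor of what a graph-based learning model computes, the sketch below performs one round of neighborhood aggregation, the basic step that many graph neural networks repeat and combine with learned weights. It is a simplified illustration, not a model taken from the lab's work: the function `mean_aggregate`, the three-node graph `adj`, and the feature vectors are hypothetical.

```python
def mean_aggregate(adj, features):
    """One message-passing round: each node averages its own and its neighbours' features."""
    updated = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in adj.get(node, [])]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated

# Hypothetical 3-node social graph with 2-dimensional node features.
adj = {"u": ["v"], "v": ["u", "w"], "w": ["v"]}
features = {"u": [1.0, 0.0], "v": [0.0, 1.0], "w": [1.0, 1.0]}
smoothed = mean_aggregate(adj, features)
print(smoothed)
```

After one round, each node's representation mixes in information from its direct neighbours; stacking several rounds lets information travel along longer paths in the graph.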
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/miyen/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/miyen/

Email :

莊庭瑞
Tyng-Ruey Chuang
1. 相互豐富的研究資料與維基資料  2. 創新以及可永續的研究資料管理和協作  

1. Mutual Enrichment between Research Data and Wikidata  2. Innovative and Sustainable Research Data Management and Collaboration
1. 開放研究資料已不再是新的口號。從 Open Data 到 FAIR Data 有各種倡議與原則,但面臨研究實務上的議題時,針對不同學科領域的研究資料,存在著不同程度的想像空間及挑戰。好的研究,始於好的資料。除了使用 Wikidata 做為「研究資料寄存所」 (網址: https://data.depositar.io/about ) 的資料集關鍵字來源,以加強資料集之間的語意連結之外,本實驗室也陸續嘗試與不同學科領域的研究夥伴,進行各種資料的爬梳及結構化的處理。

2. 我們將從社群 (community)、技術 (technology)、協作 (collaboration)、以及研究 (research) 四面向, 協力發展台灣本地在研究資料管理的實踐社群。此實踐社群將以我們已開發的「研究資料寄存所」(網址: https://data.depositar.io/about )為實踐的場域之一。本計畫的預期成果包括:培養研究資料管理人才、參與開放資料軟體系統的國際協作專案、提昇研究資料管理實踐社群在台灣的規模與內涵、以及參與並貢獻所能到全球研究社群。

更多資訊可參閱以下:

1. Improving data discovery through Wikidata @ WikidataCon 2019

https://commons.wikimedia.org/wiki/File:Improving_data_discovery_through_Wikidata_-_WikidataCon_2019.pdf

2. Open Repositories for Scholarly Communication and Participatory Research @ "Open Science Initiatives in Asia" panel at the 18th Research Data Alliance Plenary Meeting

https://m.odw.tw/u/trc/m/rda-p18-panel/

3. Datasets and presentations from the depositar project team:

https://data.depositar.io/organization/depositar

1. We will study Wikidata and use it to enrich research datasets, and vice versa. We will study and use graph databases, for example TerminusDB, to build and maintain a knowledge store and to connect it to Wikidata. We will further enhance our research data repository (called depositar, website: https://data.depositar.io/about ) with Wikidata, and vice versa.

2. We will work on the community, technology, collaboration, and research aspects of research data management. We will help develop a community of practice for research data management in Taiwan. A research data repository we have developed (called depositar, website: https://data.depositar.io/about ) can serve as a starting place where communities practice research data management. The expected outcomes of such an effort include: cultivating research data management talent, participating in international collaborative projects for open data software systems, expanding the scale and capacity of the research data management community in Taiwan, and participating in and contributing to the global research community.

Please refer to the following for more information:

1. Improving data discovery through Wikidata @ WikidataCon 2019

https://commons.wikimedia.org/wiki/File:Improving_data_discovery_through_Wikidata_-_WikidataCon_2019.pdf

2. Open Repositories for Scholarly Communication and Participatory Research @ "Open Science Initiatives in Asia" panel at the 18th Research Data Alliance Plenary Meeting

https://m.odw.tw/u/trc/m/rda-p18-panel/

3. Datasets and presentations from the depositar project team:

https://data.depositar.io/organization/depositar
PI個人首頁(PI's Information) :
https://iis.sinica.edu.tw/~trc/public/

實驗室網址(Research Information) :
https://data.depositar.io/about
https://rdm.depositar.io

Email :
trc@iis.sinica.edu.tw
王大為
Da-Wei Wang
民眾自主意願表達健康資料授權之資訊架構設計

An information framework for respecting the autonomy of individuals
這個研究主題希望透過資訊技術的協助,設計並實現健康資料治理機制,讓民眾對於自己的個人資料使用的自主意願表達更容易執行,進而提升民眾對於資料使用的參與度,也可以讓當事人知道資料的使用情形,獲得更多的資訊回饋,促進資料利用的良性循環。

This research uses information technology to design and implement a health data governance mechanism that makes it easier for individuals to express and exercise their own wishes regarding the use of their personal data. This can increase individual participation in data use, let data subjects see how their data is being used and receive more feedback, and promote a virtuous cycle of data utilization.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/wdw/

實驗室網址(Research Information) :
http://chess.iis.sinica.edu.tw/lab/

Email :

馬偉雲
Wei-Yun Ma
自己打造一個 Chat GPT - 大語言模型的訓練、校正、應用

Build Your Own ChatGPT - Training, Adjustment, and Application of LLMs
2022年11月,OpenAI在AI界投下一個重磅炸彈,他們展示了Chat GPT。(https://openai.com/blog/chatgpt/) 它基於GPT3.5,能理解大多數的人類語言,
可以用任何語言包含中文回答任何問題,可以聊天、寫小說、甚至寫code。全球包含台灣的 AI 界無不震驚。事實上,這只是預訓練的大語言模型(Large Language Model,LLM)一個最廣為人知的例子,其他還有Google 的 LaMDA、Facebook的 M2M-100、OpenAI的GPT3、阿里巴巴的 PLUG等等。

台灣在這波大型語言模型的潮流下,可以說缺席得相當嚴重。原因之一是這些Model都太大了,沒有設備或運算力能開發,大廠們也沒有開源。所幸,就在2022年7月,BigScience這個多國合作長達一年的計畫向世界發布了一系列開源LLM語言模型- BLOOM series of models。開源了350M到176B參數的各種大小模型。雖然對176B的模型來說,後續進行預訓練擴展(用新的語料繼續預訓練過程)或fine-tuning所需的運算力對多數學研機構仍是難題,但至少能站在巨人的肩膀(BLOOM)上再謀發展,就能大幅度的降低所需經費和時間。

我們(中研院詞庫小組)過去已經有預訓練以及應用GPT2的經驗,因此在2023年我們和聯發科以及國網中心合作,正打造一個繁體中文的LLM,並在此基礎上展開最前沿的研究與應用。在今年暑假,我們也開放數個名額給實習生,參與這個具有研究挑戰與歷史性意義的專案。目前規劃以下題目讓同學們選擇(若同學有其他LLM的新研究題目或想法也歡迎討論):

1. 如何將LLM的知識傳遞給一般比較小的LM (Knowledge Distillation)

Knowledge Distillation (知識蒸餾)是一種模型壓縮技術,使得規模較小的 student 模型可以從更龐大的 teacher 模型中 "學習"。也就是說透過知識蒸餾的技術,將比較優秀但是很龐大的teacher 模型所習得的"知識"能"蒸餾"出來,傳授給較小的student 模型。這樣的技術套用在LLM非常合適。因為很多實際的應用情境是缺乏部署LLM的環境的,只能夠部署比較小的LM。因為我們專案環境足以建構與執行LLM,故使得進行LLM的知識蒸餾題目成為可能。我們已有若干新的研究想法,也歡迎同學加入討論,腦力激盪出更多新的火花。

2. 開發在預訓練階段進行校正的新技術 (Reward Model)

在預訓練的階段進行校正是目前如OpenAI的Instruct GPT 或是Chat GPT所採用的技術,研究人員為監督式學習後的模型進行後續校正,他們讓模型根據前文來產生後續的句子,在字的生成分布上做取樣(Sampling)。以這種方式可以產生多個不一樣的句子,研究人員再雇用人將這些不一樣的句子做人工的重排序(reranking),重排序的結果可作為reward再回傳給模型,透過強化學習來調整模型參數,達到校正的目的。但是,這些方法忽略了一個標記實務上常出現的問題:就是人工重排序的不一致的情形。 我們在初步的實驗中發現,這會導致訓練不穩定,因此怎麼設計校正的新技術是一個實用且具挑戰性的課題。

3. LLM的校園對話生成

LLM效能的試驗場域非常重要,如何應用與評估生成文字的品質不易有明確的定義,而且對於不同的應用可能也有不同的考量。我們決定用校園對話生成為LLM效能的試驗場域,這帶來以下的好處與創新:

* 校園對話的食衣住行育樂等主題,本身調性比較平和,一旦系統產生有敵意或偏見,較容易突顯出來。

* 集中在特定的校園會在聊天範圍上有一定的限制,例如聊到台大附近好吃的地方,LLM就不該回應一家清大附近的餐廳。這對人工或是系統判定是否為事實比較容易。

* 產生的對話系統可以實際地給學生運用,成為校園日常生活的小幫手以及聊天抒解壓力的一個管道。

* 預計實際地讓學生使用反饋,亦可以充分反饋或凸顯研究上的問題與挑戰。


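In November 2022, OpenAI dropped a bombshell on the AI community by unveiling Chat GPT (https://openai.com/blog/chatgpt/). Based on GPT3.5, it understands most human languages, can answer any question in any language, including Chinese, and can chat, write novels, and even write code. The AI community worldwide, Taiwan included, was stunned. In fact, this is only the best-known example of a pre-trained large language model (LLM); others include Google's LaMDA, Facebook's M2M-100, OpenAI's GPT3, and Alibaba's PLUG.

Taiwan has been largely absent from this wave of large language models. One reason is that these models are simply too large: there is not enough equipment or computing power to develop them, and the major companies have not open-sourced theirs. Fortunately, in July 2022, BigScience, a year-long multinational collaboration, released a series of open-source LLMs, the BLOOM series of models, with sizes ranging from 350M to 176B parameters. For the 176B model, the computing power needed for continued pre-training (continuing the pre-training process with new corpora) or fine-tuning is still a challenge for most academic and research institutions, but being able to stand on the shoulders of a giant (BLOOM) greatly reduces the funding and time required.

We (the CKIP Lab at Academia Sinica) already have experience pre-training and applying GPT2. In 2023 we are therefore collaborating with MediaTek and the National Center for High-performance Computing (NCHC) to build a Traditional Chinese LLM and to carry out cutting-edge research and applications on top of it. This summer we are opening several positions for interns to take part in this project, which is both a research challenge and of historic significance. The following topics are currently planned for students to choose from (students with other new LLM research topics or ideas are also welcome to discuss them):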

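1. How to transfer an LLM's knowledge to a smaller, general-purpose LM (Knowledge Distillation)

Knowledge distillation is a model compression technique that lets a smaller student model "learn" from a larger teacher model. In other words, the "knowledge" acquired by a strong but very large teacher model is "distilled" and passed on to a smaller student model. This technique is a good fit for LLMs, because many real-world application scenarios lack an environment in which an LLM can be deployed and can only host a smaller LM. Because our project environment is sufficient to build and run LLMs, knowledge distillation from LLMs becomes feasible. We already have several new research ideas, and students are welcome to join the discussion and brainstorm more.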
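
As a rough illustration of the idea, the sketch below computes a soft-target distillation loss: the student is penalized by the KL divergence between the teacher's and the student's temperature-softened output distributions. This is a generic textbook formulation, not the lab's method; the function names, the temperature value, and the toy logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T**2 factor is the usual rescaling that keeps gradient magnitudes
    comparable across temperatures (hypothetical default T=2.0).
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]   # toy next-token logits from a large model
    student = [2.5, 1.5, 1.0]   # toy logits from a small model
    print("distillation loss:", distillation_loss(student, teacher))
```

In practice this term is usually combined with the ordinary cross-entropy loss on the ground-truth tokens; how the two are weighted is a design choice the text above does not specify.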

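2. Developing new techniques for adjustment at the pre-training stage (Reward Model)

Adjustment at the pre-training stage is the technique currently adopted by, for example, OpenAI's Instruct GPT and Chat GPT. Researchers further adjust a model that has already gone through supervised learning: they let the model generate continuations from a given context by sampling from its token generation distribution. In this way multiple different sentences can be produced, and the researchers hire people to manually rerank them. The reranking result is fed back to the model as a reward, and reinforcement learning is used to tune the model parameters, achieving the adjustment. However, these methods overlook a problem that often arises in annotation practice: inconsistency among the human rerankings. In our preliminary experiments we found that this makes training unstable, so designing new adjustment techniques is a practical and challenging problem.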
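
The reranking step described above can be sketched as follows: a human ranking of sampled sentences is turned into (better, worse) pairs, a pairwise logistic loss scores how well a reward model's outputs agree with that ranking, and a simple rate measures how often two annotators order a pair differently. This is a minimal sketch under stated assumptions, not the lab's technique or OpenAI's implementation; all function names, the loss form, and the toy rankings are hypothetical.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranking):
    """ranking lists candidate ids from best to worst; return (better, worse) pairs."""
    return list(combinations(ranking, 2))


def pairwise_ranking_loss(rewards, ranking):
    """Mean of -log(sigmoid(r_better - r_worse)) over all pairs in the ranking.

    rewards[i] is the (hypothetical) reward model score for candidate i.
    """
    pairs = ranking_to_pairs(ranking)
    total = 0.0
    for better, worse in pairs:
        margin = rewards[better] - rewards[worse]
        total += math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
    return total / len(pairs)


def disagreement_rate(ranking_a, ranking_b):
    """Fraction of candidate pairs that two annotators order in opposite ways."""
    pairs_a = set(ranking_to_pairs(ranking_a))
    flipped = sum((worse, better) in pairs_a
                  for better, worse in ranking_to_pairs(ranking_b))
    return flipped / len(pairs_a)


if __name__ == "__main__":
    rewards = [2.0, 1.0, 0.0]      # toy scores for three sampled sentences
    annotator_a = [0, 1, 2]        # annotator A: sentence 0 is best
    annotator_b = [0, 2, 1]        # annotator B swaps the last two
    print("loss vs A:", pairwise_ranking_loss(rewards, annotator_a))
    print("loss vs B:", pairwise_ranking_loss(rewards, annotator_b))
    print("disagreement:", disagreement_rate(annotator_a, annotator_b))
```

When annotators disagree, the same reward scores receive conflicting training signals, which is one plausible way to picture the instability mentioned above.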

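3. Campus dialogue generation with LLMs

A test bed for LLM performance is very important. How to apply generated text and evaluate its quality is hard to define clearly, and different applications may call for different considerations. We have decided to use campus dialogue generation as the test bed for LLM performance, which brings the following benefits and innovations:

* Campus conversations revolve around everyday topics such as food, clothing, housing, transportation, education, and recreation, and are relatively mild in tone, so any hostility or bias produced by the system stands out more easily.

* Focusing on a specific campus constrains the scope of the chat. For example, when the conversation is about good places to eat near National Taiwan University, the LLM should not respond with a restaurant near National Tsing Hua University. This makes it easier for humans or systems to judge whether a response is factual.

* The resulting dialogue system can actually be used by students, serving as a helper for daily campus life and a channel for relieving stress through chat.

* We plan to have students actually use the system and give feedback, which can also reflect or highlight research problems and challenges.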
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/ma/

實驗室網址(Research Information) :
https://ckip.iis.sinica.edu.tw/

Email :
ma@iis.sinica.edu.tw
宋定懿
Ting-Yi Sung
機器學習的生物資訊研究

Machine learning in bioinformatics
我們實驗室進行生醫相關(如:蛋白體、蛋白基因體)的生物資訊研究,近年來我們亦展開利用機器學習方法針對重要的生醫課題進行預測。

今年我們將以機器學習方法對可能為藥物標的之序列進行重要相關課題的預測,從搜集公開的資料開始,建構品質較佳的資料集,發展機器學習模型,並評估不同方法的準確率。

Our lab has been focused on bioinformatics research for proteomics and proteogenomics. In recent years, we also started to work on using machine learning to predict important biomedical properties. Specifically, this year we work on predicting important biomedical properties of drug target candidates. The task involves collecting publicly available data to construct good-quality datasets, developing prediction models using various machine learning approaches, comparing their prediction performance, and evaluating the contributions of different features.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/tsung/

實驗室網址(Research Information) :
https://ms.iis.sinica.edu.tw/COmics/index.html

Email :
tsung@iis.sinica.edu.tw
王建堯
Chien Yao Wang
即時物件偵測之困難案例探討

Hard Cases Study of Real-time Object Detection
即時物件偵測是眾多電腦視覺應用的基本要素。
本實驗室開發了世界頂尖的即時物件偵測方法。
然而面對真實世界應用仍有相關子議題須解決。
暑期實習的目標便是探討即時物件偵測中諸如小物件偵測、影片物件偵測、局部物件偵測等困難案例的解決方案。

Real-time object detection is a key technology in various computer vision tasks.
Our lab has developed world-leading real-time object detection methods.
However, several sub-problems remain to be solved when applying real-time object detectors in real-world applications. The goal of the summer internship is to explore solutions to hard cases such as tiny object detection, video object detection, and partial object detection.
PI個人首頁(PI's Information) :
https://homepage.iis.sinica.edu.tw/pages/kinyiu/index_zh.html

實驗室網址(Research Information) :
https://homepage.iis.sinica.edu.tw/pages/kinyiu/index_zh.html

Email :
kinyiu@iis.sinica.edu.tw
廖弘源
Mark Liao
影像中遠距與失焦物件之分析

Analysis of distant and out-of-focus objects in images
電腦視覺技術的目的是讓機器能夠像人類一樣感知世界。
然而一般的攝影機不像人眼能夠主動針對不同的關注目標賦予焦點與注意力。
在真實環境收集的影像中常會有失焦的狀況發生,
尤其以鏡頭焦距外的遠距離物件更是如此。
距離與失焦問題造成了物件在影像中語意資訊的流失,
暑期實習的目標便是針對此現象進行分析並設計方法找出這些目標物件。

The purpose of computer vision technology is to allow machines to perceive the world as humans do. However, unlike the human eye, ordinary cameras cannot actively give focus and attention to different targets. Out-of-focus situations often occur in images collected in real environments, especially for distant objects beyond the focal length of the lens. Distance and defocus cause objects in the image to lose semantic information. The goal of the summer internship is to analyze this phenomenon and design methods to find distant and out-of-focus objects.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/liao/

實驗室網址(Research Information) :
https://homepage.iis.sinica.edu.tw/pages/liao/

Email :
liao@iis.sinica.edu.tw