2021 Research Topic List

Ling-Jyh Chen

Advanced Data Analysis using Fine-grained and Spatio-temporal AirBox Data
In the past few years, we have built a large-scale, multinational PM2.5 sensing system with more than 15,000 participating devices in 58 countries. Each device conducts environmental sensing and uploads its temperature, humidity, and fine particulate matter (PM2.5) readings to our server every five minutes. As a result, our system has become one of the best-known PM2.5 sensing data hubs worldwide.

In this summer project, we wish to use the fine-grained, spatio-temporal data from our system to conduct data mashups and advanced analyses with both research and practical value. Topics include (but are not limited to) real-time PM2.5 emission source tracking, data quality assurance for micro sensors, mesoscale PM2.5 dispersion modeling, mesoscale PM2.5 concentration forecasting, estimation of the socioeconomic costs of PM2.5 pollution, and integrated analysis of PM2.5 concentrations and real-time physiological signals. We also welcome innovative and even more challenging topics on related problems.
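As one hedged illustration of working with this data, for instance checking the quality of low-cost sensor readings, the sketch below flags a device whose reading deviates strongly from those of nearby devices in the same five-minute window, using the median absolute deviation (MAD). The threshold `k`, the 1.4826 scaling constant (which makes MAD estimate the standard deviation for normally distributed data), the device IDs, and the function name are all assumptions for illustration, not the project's actual method.

```python
from statistics import median


def flag_outliers(readings: dict[str, float], k: float = 3.0) -> set[str]:
    """Return the IDs of devices whose PM2.5 reading deviates from the
    median of all given readings by more than k scaled MADs.

    `readings` maps a hypothetical device ID to its PM2.5 value (ug/m3),
    all taken from the same 5-minute window and the same neighborhood.
    """
    values = list(readings.values())
    center = median(values)
    # Median absolute deviation, scaled to estimate the standard deviation.
    mad = 1.4826 * median(abs(v - center) for v in values)
    if mad == 0:
        # At least half of the devices agree exactly; flag anything that differs.
        return {dev for dev, v in readings.items() if v != center}
    return {dev for dev, v in readings.items() if abs(v - center) / mad > k}


neighbors = {"dev-a": 10.0, "dev-b": 12.0, "dev-c": 11.0, "dev-d": 13.0, "dev-e": 80.0}
print(flag_outliers(neighbors))  # {'dev-e'}
```

A robust statistic is used here because a single faulty sensor would otherwise inflate the mean and standard deviation it is being compared against.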

We are looking for self-motivated, creative, and open-minded people to join us. We will learn together, work together, enjoy the process together, and produce good results together in the end. For further questions, please feel free to contact us.
Churn-Jung Liau

Applied Logic

Theory and applications of symbolic logic for knowledge representation and reasoning
Bo-Yin Yang

Post-quantum Cryptography and Its Implementation
Post-quantum cryptography (PQC) studies (public-key) cryptosystems that can survive attacks by an adversary armed with a large quantum computer. Since 2016, the U.S. National Institute of Standards and Technology (NIST) has been running a competition to determine the next-generation standard for public-key cryptography; it is currently in its third round. If you are interested, please contact me to study cryptosystems based on lattices, multivariate quadratics, error-correcting codes, hash functions, or supersingular isogenies this summer, principally their implementations.

Our research has a chance of affecting the outcome of the NIST competition, which will influence billions of people in the next generation, so you might say that this work also contributes to world peace and the welfare of mankind. This year, you will also have the chance to learn from two world-renowned cryptographers visiting Taiwan, Professors Daniel J. Bernstein and Tanja Lange.
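To give a flavor of the hash-based family mentioned above, here is a minimal sketch of a Lamport one-time signature built only on SHA-256 from Python's standard library. It is a textbook toy for illustration, not one of the NIST candidate schemes, and it is neither constant-time nor production-ready; each key pair may safely sign only one message.

```python
import hashlib
import secrets


def keygen():
    """Generate a Lamport one-time key pair for 256-bit message digests."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk


def _digest_bits(message: bytes) -> list[int]:
    """Return the 256 bits of SHA-256(message), most significant bit first."""
    d = hashlib.sha256(message).digest()
    return [(byte >> (7 - i)) & 1 for byte in d for i in range(8)]


def sign(sk, message: bytes) -> list[bytes]:
    # Reveal one of the two secrets per digest bit. The key must never be reused.
    return [sk[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(pk, message: bytes, signature: list[bytes]) -> bool:
    # Each revealed secret must hash to the public value selected by the bit.
    bits = _digest_bits(message)
    return len(signature) == 256 and all(
        hashlib.sha256(s).digest() == pk[i][bit]
        for i, (s, bit) in enumerate(zip(signature, bits))
    )


sk, pk = keygen()
sig = sign(sk, b"hello, NIST")
print(verify(pk, b"hello, NIST", sig))   # True
print(verify(pk, b"hello, NIST!", sig))  # False (except with negligible probability)
```

Security rests only on the preimage resistance of the hash function, which is why hash-based signatures are considered a conservative post-quantum option.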
byyang@iis.sinica.edu.tw, hmlin@iis.sinica.edu.tw
De-Nian Yang

Analysis and Innovative Applications for Social Networks and Mobile Multimedia Networks


Yuan-Hao Chang

Study of embedded systems and storage systems
Research topics include embedded systems, operating systems, file systems, memory management, storage system management, non-volatile memory, and energy-harvesting systems.
Jun-Cheng Chen

Deep Generative Model for the Applications of AI Forensics, Privacy, Security


This research internship will focus on applying generative adversarial networks (GANs) and their variants to anomaly detection for deepfake and forgery detection, face anonymization for identity protection, generation of physical adversarial patches for visual privacy, and other related applications.

You will work closely with the PI and a senior research assistant, and can expect to gain knowledge of deep learning, computer vision, and AI security.
Meng Chang Chen

Extreme Value Theory for PM2.5 Prediction

Neural networks used in deep learning tend to learn a Gaussian-like model of the training data, which cannot adequately support applications that emphasize predicting anomalies. In this project, we plan to incorporate extreme value theory into PM2.5 prediction in order to predict such anomalies.
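As a rough illustration of how extreme value theory treats the tail differently from a Gaussian fit, the sketch below fits a Gumbel distribution (the type I extreme value distribution) to block maxima, such as daily maximum PM2.5 readings, using the method of moments, and then estimates a return level. The Gumbel family, the moment estimator, and the sample values are assumptions for illustration; the project may well use other EVT models, such as the generalized extreme value distribution or peaks-over-threshold methods.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(block_maxima: list[float]) -> tuple[float, float]:
    """Method-of-moments fit of a Gumbel(mu, beta) distribution.

    For a Gumbel variable, mean = mu + gamma * beta and
    variance = (pi ** 2 / 6) * beta ** 2.
    """
    beta = stdev(block_maxima) * math.sqrt(6) / math.pi
    mu = mean(block_maxima) - EULER_GAMMA * beta
    return mu, beta


def return_level(mu: float, beta: float, period: float) -> float:
    """Level exceeded on average once every `period` blocks (period > 1).

    Inverts the Gumbel CDF F(x) = exp(-exp(-(x - mu) / beta)).
    """
    p_non_exceed = 1.0 - 1.0 / period
    return mu - beta * math.log(-math.log(p_non_exceed))


# Hypothetical daily maximum PM2.5 values (ug/m3).
daily_max = [35.0, 42.0, 38.0, 55.0, 47.0, 61.0, 40.0, 44.0, 52.0, 70.0]
mu, beta = fit_gumbel(daily_max)
print(f"mu={mu:.1f}, beta={beta:.1f}")
print(f"30-day return level: {return_level(mu, beta, 30):.1f} ug/m3")
```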
Tsan-sheng Hsu

Data-Intensive Computing: Foundations and Applications

Motivated by applications, we plan to investigate problems involving massive data sets using methods such as algorithm design, parallelization, implementation techniques, machine learning, and deep learning.
Lun-Wei Ku
(1) Multimodal Story Generation (Visual Storytelling) (2) Fake News Intervention (3) Recommendation Systems (4) NLP Application Development

Interns will learn how to use basic natural language processing (NLP) tools, extract information from texts, classify documents, build recommendation systems, and generate dialogs. Machine learning and deep learning techniques for NLP will also be covered. Interns can select the topic/team they wish to join.

(1) In the multimodal storytelling project, we focus on visual storytelling, in which a machine generates a story from a given image sequence. In our current approach, we first detect the entities and actions in each image and then use these terms to compose a coherent story that is consistent with the images. This summer we will focus on free-length generation of both text and images, including new story illustrations.

(2) In the fake news intervention project, we study why and how readers come to trust fake news, and which kinds of news content and presentation make readers more or less likely to believe it. We will conduct content understanding, network simulation, and user studies, and explore approaches that mitigate the impact of fake news.

(3) In the recommendation system project, we will obtain real-world logs and apply NLP and deep learning techniques to these data to improve the performance of the recommendation system.

(4) Interns can also choose to develop demo applications for promising downstream uses of our lab's existing technologies.

The research topics include, but are not limited to, the above. After the internship, students with good performance can continue to work with the laboratory on research and paper publication. See http://www.lunweiku.com/ for related papers.
Kai-Min Chung

Independent Research on Cryptography, Complexity Theory, or Quantum Cryptography
The intern is expected to perform independent research on selected topics in (Quantum) Cryptography, Complexity Theory, or general theoretical computer science (TCS) that interest him/her. This often starts with surveying research papers and presenting them to the PI. Along the way, the intern can identify research questions with the PI, study the questions independently, and discuss them with the PI in research meetings. Candidate topics include, but are not limited to, Quantum Key Distribution (QKD), Post-quantum Cryptography, Lattice-based Cryptography, Differential Privacy, Non-malleable Codes, Device-independent Cryptography, PRAM Cryptography, Zero Knowledge, Randomness Extractors, etc.
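Randomness extractors appear in the candidate topics above. The simplest classical example, shown here as a minimal sketch, is von Neumann's extractor: given independent bits from a coin with an unknown but fixed bias p, it reads the bits in non-overlapping pairs, outputs the first bit of each unequal pair, and discards equal pairs. Because P(01) = P(10) = p(1 - p), the output bits are unbiased. This only illustrates the idea; extractors studied in research handle far weaker sources than independent biased bits.

```python
def von_neumann_extract(bits: list[int]) -> list[int]:
    """Turn independent, identically biased bits into unbiased bits.

    Pairs (0, 1) -> 0 and (1, 0) -> 1; pairs (0, 0) and (1, 1) are dropped.
    A trailing unpaired bit is ignored.
    """
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)
    return out


print(von_neumann_extract([0, 1, 1, 0, 1, 1, 0, 0, 1, 0]))  # [0, 1, 1]
```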

Chih-Yu Wang
Wireless Networking Optimization / Social Network Analysis


Our goal is to identify, analyze, predict, and manage the strategic behaviors of humans in various networks, such as communication networks, information networks, and social networks.
Chuan-Ju Wang

Representation learning and its applications
Heterogeneous data encompass both structured data (e.g., purchase records and product specifications) and unstructured data (e.g., users' text reviews), whose data structures and feature spaces differ greatly. How to relate, integrate, and reason across such data therefore remains a major challenge for current AI technologies and their applications. Unsupervised machine learning may be able to represent heterogeneous data in a common feature space; if good data representations can be learned in that space, they can serve as a solid foundation for heterogeneous data analysis. The research topics will therefore concern processing and understanding heterogeneous data (including texts, pictures, audio signals, social relations, and user behaviors) using deep learning and/or network embedding techniques for various AI-enabled applications. We will examine the properties of the learned space transformations and the information they preserve, and design algorithms for different data types and application scenarios. In addition to model design, during the internship the participant will also 1) gain hands-on experience with real-world data, 2) learn how to handle large-scale data and conduct experiments on Unix-like systems, and 3) learn how to visualize the learned results using front-end web programming techniques.
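As a small illustration of the network embedding side, methods in the DeepWalk family first turn a graph into a corpus of truncated random walks and then train a skip-gram model on those walks as if they were sentences. The sketch below shows only the walk-generation step; the toy user-item graph, the walk parameters, and the function name are assumptions, and the embedding-training step is omitted.

```python
import random


def random_walks(adj: dict[str, list[str]], walks_per_node: int,
                 walk_length: int, seed: int = 0) -> list[list[str]]:
    """Generate truncated, uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for start in nodes:
            walk = [start]
            # Stop early only if the current node has no neighbors.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


# Hypothetical user-item graph (undirected, stored as adjacency lists).
graph = {
    "user1": ["itemA", "itemB"],
    "user2": ["itemB"],
    "itemA": ["user1"],
    "itemB": ["user1", "user2"],
}
for w in random_walks(graph, walks_per_node=1, walk_length=4):
    print(" ".join(w))
```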
Ronald Y. Chang

AI-Based Next-Generation Wireless Communications

Several paid internships are available at CITI, Academia Sinica, the preeminent academic research institution in Taiwan. The intern will participate in one of the following projects: 1) space and cellular network integration toward sixth-generation (6G) wireless communication for early disaster detection/mitigation, or 2) machine learning for wireless communications and Internet of Things (IoT) applications. Intern responsibilities include attending weekly group meetings, conducting research at a level similar to that of full-time RAs, and preparing research reports, slides, presentations, and/or papers.

Preferred qualifications:
1) EE/CS/Communication Engineering major or a related area;
2) Strong knowledge of wireless communication and/or machine learning (neural networks, reinforcement learning, etc.);
3) Good programming skills;
4) Plans to pursue advanced study domestically or abroad.
Yu-Fang Chen

Developing techniques for automatic software testing and verification

Our lab focuses on developing core techniques for ensuring software correctness, ranging from theory to tool implementation, and regularly publishes at major venues in related fields such as PLDI, OSDI, CAV, ICSE, and LICS. The interns will help survey recent research topics in this direction.
Chi-Jen Lu

Deep learning: foundations and applications

Study the foundation of deep learning, and explore its diverse applications in various areas such as computer vision and natural language processing.
Yu Tsao

Biomedical Signal Processing for Computer-assisted Diagnosis
We use various sensors and devices to collect bio-signals (e.g., motion, sound, EMG, and ENG) and combine bio-signal processing with machine learning techniques to design and develop novel healthcare and computer-assisted diagnosis systems. During the internship, the intern will have the chance to discuss with medical center teams (Otolaryngology, Neurology, and Rehabilitation) and take part in the research and development. The research topics include (1) fall detection, (2) functional assessment of the middle ear, (3) electrophysiological diagnosis, (4) diagnosis and evaluation of frozen shoulder, and (5) personalized health monitoring systems.
Chung-Yen Lin

Harnessing Biomedical Big Data for Genome Biology with AI

Our team's main goal is to analyze big omics data, which may reveal the secrets of biological regulation hidden in the massive data deluge. By combining open-source tools and self-developed programs/platforms, we have assembled, annotated, and decoded several aquatic genomes of high economic importance. We are currently developing new approaches to fill the gaps in the assembled human genome to pave the way for personalized and precision medicine. New approaches such as deep learning will be introduced to revisit our studies. Several platforms/applications we have developed that combine AI and biological knowledge focus on smart typing of upper respiratory pathogens and on the identification, and even design, of novel antibiotics.
Li Su

Music and artificial intelligence, music information retrieval, human-computer interaction, computational musicology
The Music and Culture Technology Lab was founded in 2017. We are devoted to developing cutting-edge digital signal processing and deep learning techniques for problems that combine music and AI. Our major research achievements in 2020 included automatic music transcription, real-time interactive music systems, virtual musicians, generative music, and computational musicology for traditional music. Applications span music listening, analysis, production, and even performance. Our goal is to launch a deep and fruitful dialogue between technology and the humanities, offer new perspectives on music culture, and make music culture a part of our everyday life.
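Automatic music transcription ultimately maps estimated fundamental frequencies to discrete notes. A common convention, sketched below, uses the equal-tempered MIDI scale with A4 = 440 Hz as note number 69, so p = 69 + 12 log2(f / 440). The reference tuning and the helper names are assumptions; a real transcription system must first estimate f from audio, which is omitted here.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_midi(freq_hz: float, a4_hz: float = 440.0) -> int:
    """Round a fundamental frequency to the nearest equal-tempered MIDI note."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))


def midi_to_name(note: int) -> str:
    """Convert a MIDI note number to a name such as 'C4' (MIDI 60 = C4)."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


for f in (440.0, 261.63, 329.63):
    n = freq_to_midi(f)
    print(f, n, midi_to_name(n))
```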
Shin-Cheng Mu

Functional Programming and Reasoning about Programs
My research interests concern programming languages and functional programming, and extend to Hoare logic and type systems for concurrent programs. The common theme is using symbolic reasoning to ensure program correctness: whatever the paradigm, programming is seen as a formal, mathematical activity. The correctness of a program can be guaranteed by a type system or by logical reasoning, or a program can even be derived stepwise from its specification.

Possible topics include:

* design symbols, languages, or type systems that aid programmers in reasoning about programs;

* pick an algorithm and apply our approaches to prove its correctness, or even to derive it;

* study the type system (session types) for concurrent programs and its relationship with logic;

* develop tools, using a functional programming language, for teaching Hoare logic and reasoning about imperative programs.

More details can be discussed. If you are interested, we can spend the first 1 to 1.5 months of the internship studying the background knowledge, before diving into developing something new.
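A classic example of deriving a program from its specification is the maximum segment sum problem. The specification takes the maximum over the sums of all contiguous segments (cubic time as written), and equational reasoning transforms it into a linear-time scan. The sketch below is written in Python for consistency with the other examples here, although such derivations are usually carried out in a functional language; it only checks the two versions against each other on sample inputs instead of proving them equal. Empty segments are allowed, so the result is never negative.

```python
from itertools import accumulate


def mss_spec(xs: list[int]) -> int:
    """Specification: maximum sum over all contiguous segments (empty allowed)."""
    segments = (xs[i:j] for i in range(len(xs) + 1) for j in range(i, len(xs) + 1))
    return max(sum(seg) for seg in segments)


def mss_derived(xs: list[int]) -> int:
    """Derived linear-time program.

    Key step of the derivation: the best segment ending at position k is
    either empty or extends the best segment ending at k - 1, giving the
    recurrence best_k = max(0, best_{k-1} + x_k).
    """
    ending_here = accumulate(xs, lambda best, x: max(0, best + x), initial=0)
    return max(ending_here)


xs = [3, -4, 2, -1, 6, -3]
print(mss_spec(xs), mss_derived(xs))  # 7 7
```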
I-Chen Wu

Deep Reinforcement Learning and its Applications
My research interests include machine learning and computer games, particularly Deep Reinforcement Learning (DRL), a combination of Deep Learning (DL) and Reinforcement Learning (RL). Research directions and topics mainly concern DRL applications, classified as follows.

1. Lightweight-model applications: Environment models are well known or tractable, so backtracking and Monte-Carlo tree search (MCTS) are allowed. For MCTS-based programs, we have proposed a new Multiple Policy Value MCTS algorithm to improve playing strength.
⁃ Applications: games such as board games, card games, and puzzle games. In the future, e-tutoring is a potential application.
⁃ Research topics: AlphaZero, MCTS (including Multiple Policy Value MCTS), planning, and explainable AI (XAI).

2. Heavyweight-model applications: Environment models are well known but may be complex or intractable, so most training methods rely on training on trajectories.
⁃ Applications: video games, robots with simulators, ITM, scheduling problems, etc.
⁃ Research topics: many value-based, policy-based, and model-based RL methods.

3. Real-world-model applications: Environment models are unknown or too complex, and these cannot be trained with a large number of trajectories.
⁃ Applications: robots, drones, autonomous driving, etc.
⁃ Research topics: imitation learning, transfer learning, meta-learning, etc.
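AlphaZero-style tree search, listed under the lightweight-model topics, repeatedly selects the child that maximizes a PUCT score combining the current value estimate Q with an exploration bonus weighted by the policy network's prior P: Q(s, a) + c · P(s, a) · sqrt(N(s)) / (1 + N(s, a)). The sketch below shows only this selection step. The `Child` fields, the constant c = 1.5, and the example numbers are assumptions, and it does not implement the lab's Multiple Policy Value MCTS variant.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float            # P(s, a) from the policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # sum of backed-up values through this child

    @property
    def q(self) -> float:
        # Mean value; unvisited children default to 0 in this sketch.
        return self.value_sum / self.visits if self.visits else 0.0


def select_puct(children: list[Child], c_puct: float = 1.5) -> int:
    """Return the index of the child maximizing Q + U (AlphaZero-style PUCT)."""
    # N(s) is approximated here by the total visit count of the children.
    parent_visits = sum(ch.visits for ch in children)
    sqrt_n = math.sqrt(parent_visits)

    def score(ch: Child) -> float:
        return ch.q + c_puct * ch.prior * sqrt_n / (1 + ch.visits)

    return max(range(len(children)), key=lambda i: score(children[i]))


kids = [Child(prior=0.5, visits=5, value_sum=2.5), Child(prior=0.5)]
print(select_puct(kids))  # 1: the unvisited child gets a large exploration bonus
```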
Huai-Kuang Tsai

Machine learning in bioinformatics research


We study big data from biological systems using bioinformatics techniques and statistical methods. We work with biologists to seek insights into the genomics of eukaryotic organisms. By integrating multi-omics data, we study genome-wide systems regulating gene expression and their significance in evolution. In addition, we are currently expanding into biomedical informatics, aiming to integrate disease information with sequencing data to develop applications in precision medicine. We use methods such as data mining and machine learning in our studies of regulatory mechanisms in genomics, with the goal of building predictive models with potential for applications.
We are seeking interns with a background in computer science and an interest in bioinformatics. Applicants should be familiar with at least one programming language or have a good grasp of how database management systems work. A background in biology is not required, but you should be comfortable with cross-disciplinary research, i.e., able to work with researchers from different backgrounds. Our laboratory will provide training in the domain knowledge related to bioinformatics. If you are passionate about solving biological problems with informatics techniques, we welcome you to join our team!
Yu-Chya Lee
Disinformation Firewall Powered by Machine Learning for a Safer Online Environment
This project uses machine learning models as its base technology and applies them to the digital venues where the general public is most easily affected, in order to correct misleading information, alert the public to suspicious content, and strengthen people's ability to judge what they see. Using machine learning algorithms and AI models on data collected from social media and other mainstream digital platforms, we will build a real-time web platform that ordinary users can operate easily: they can actively check information, or receive reminders and alerts when they encounter a suspicious URL or message. By annotating content with credibility assessments, the platform aims to improve media literacy, so that people can fully enjoy freedom of speech and information in a free, democratic society while being protected by a disinformation firewall.

We start by collecting data from major social media, such as Facebook, Weibo, Reddit, Twitter, and YouTube, and from websites where humans act as active agents of information dissemination. We then proceed with data cleaning, natural language processing, and tagging to support fast lookup in the system. We also trace and analyze how data and news spread through social networks (i.e., information dissemination patterns), aiming to build a miniature social network for quick information monitoring and alerting.
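Tracing how a story spreads, as described above, often starts by turning reshare records into a cascade tree and measuring its size, depth, and maximum breadth. The sketch below does this with a breadth-first search. The edge format (parent post, child post), the post IDs, and the function name are hypothetical and do not reflect the project's actual data schema.

```python
from collections import defaultdict, deque


def cascade_stats(edges: list[tuple[str, str]], root: str) -> dict[str, int]:
    """Size, depth, and maximum breadth of the reshare cascade rooted at `root`.

    Each edge (parent, child) means post `child` reshared post `parent`.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    level_counts = defaultdict(int)
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        level_counts[depth] += 1
        for nxt in children[node]:
            if nxt not in seen:  # guard against duplicate or cyclic records
                seen.add(nxt)
                queue.append((nxt, depth + 1))

    return {
        "size": len(seen),
        "depth": max(level_counts),
        "max_breadth": max(level_counts.values()),
    }


shares = [("p0", "p1"), ("p0", "p2"), ("p1", "p3"), ("p1", "p4"), ("p4", "p5")]
print(cascade_stats(shares, "p0"))  # {'size': 6, 'depth': 3, 'max_breadth': 2}
```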

The Information Firewall sits on top of a huge social data farm encompassing a wide variety of news and media, and is powered by a self-developed search engine. The Information Firewall is expected to be delivered in stages, from technical modules such as a disinformation discriminator and generator to end-user fact-checking products that help educate the public and enable people to identify disinformation and make the most appropriate choices.
Chun-Shien Lu

Security and Privacy in Deep Learning
Motivation. There are an increasing number of multimedia datasets that can be applied across a variety of industries, and these datasets provide an opportunity for collaboration between data owners and the machine learning research community. However, many datasets contain sensitive information that discourages data owners from sharing the datasets with the machine learning research community.
Goal. We intend to develop approaches to privatize multimedia datasets and then publish the privatized datasets, with the purpose of both preserving dataset utility and ensuring privacy. More specifically, because faces carry rich information, we will consider facial image and video datasets. Our goal is thus to develop approaches to (formally) privatize facial images such that the privatized images retain their utility, e.g., high accuracy in face detection, face recognition, and model training.

Bow-Yaw Wang

Formal Verification on Cryptographic Programs
Computer security relies on the security of cryptosystems; if a cryptosystem implementation contains errors, computer security can no longer be attained. This research will use Cryptoline, a formal verification tool developed by our team, to verify the correctness of programs in cryptography libraries such as OpenSSL.
Tyng-Ruey Chuang
1. Mutual Enrichment between Research Data and Wikidata 2. Innovative and Sustainable Research Data Management and Collaboration
1. Open research data is no longer a new slogan. There are various initiatives and principles, from Open Data to FAIR Data, but research practice in different disciplines brings different possibilities and challenges for research data. Good research starts with good data. We already use Wikidata as a source of dataset keywords in our research data repository (called depositar, website: https://data.depositar.io/about ) to strengthen the semantic links between datasets, and we have been working with research partners in different disciplines to curate and structure various kinds of data. We will study Wikidata and use it to enrich research datasets, and vice versa. We will also study and use graph databases, for example TerminusDB, to build and maintain a knowledge store and connect it to Wikidata. We will further enhance depositar with Wikidata, and vice versa.

2. We will work on the community, technology, collaboration, and research aspects of research data management, and help develop a community of practice for research data management in Taiwan. The research data repository we have developed (depositar, website: https://data.depositar.io/about ) can serve as a starting place where the community practices research data management. The expected outcomes of this effort include cultivating research data management talent, participating in international collaborative projects on open data software systems, elevating the scale and capacity of the research data management community in Taiwan, and participating in and contributing to the global research community.

Please refer to the following two posters for more information:

1. Improving data discovery through Wikidata - WikidataCon 2019


2. The Use of A Data Repository in Soundscape Monitoring and Ecological Assessment - RDA 15th Plenary Meeting

Jen-Chun Lin

Learning To Visualize Music Through Shot Sequence


An experienced director usually switches among different types of shots to make visual storytelling more touching. However, while visual storytelling techniques are often used in professional recordings, amateur recordings by audience members filming the same event often lack such storytelling concepts and skills. To this end, in this project we aim to develop deep learning techniques, ranging from shot classification to music-to-shot translation, to help amateur creators produce more professional videos.

The intern is expected to perform independent research on selected topics in shot classification, music-to-shot translation, or relevant topics that interest him/her. After the internship, students with good performance can continue to work with the laboratory on research and paper publication.
Da-Wei Wang

Medical data analysis using machine learning technology
The research area is the application of statistical data analysis methods and machine learning technology in the medical field. Study topics include automatic speech recognition (ASR) and structured data analysis.
Jan-Jan Wu

Optimizing performance of deep learning on heterogeneous system architecture
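In recent years, there has been a growing trend of combining multiple neural network models to enhance deep learning capability, known as hybrid neural network models. For example, many applications combine CNNs and RNNs for video captioning, video question answering, automatic medical report generation, stock trading analysis, movie review analysis, and pollutant prediction. As more AI applications adopt hybrid models, optimizing their execution to shorten inference time has become a timely and critical research topic. Moreover, CPU+GPU heterogeneous system architectures are common in modern computers. The common practice is to run both the CNN and the RNN on the GPU; this GPU-only approach fails to fully exploit the computing power of the CPU+GPU heterogeneous architecture and results in longer inference time.

In addition, many emerging AI applications, such as recommendation systems and knowledge graphs, use GNNs/GCNs as the network models for deep learning training and inference. GNNs/GCNs involve complex irregular computation and a large amount of sparse matrix computation, which is hard to execute efficiently on conventional GPUs. However, recent CPUs provide powerful vector instructions (e.g., Intel AVX512 instructions can process eight 64-bit values simultaneously), and their gather/scatter instructions can quickly access data at non-contiguous memory addresses, opening new opportunities for irregular and sparse matrix computation. Tensor Core GPUs also include special hardware optimizations for sparse matrix computation, achieving a 2x speedup at 50% sparsity. How to use AI compiler techniques and optimized algorithm design so that complex models such as GNNs/GCNs fully exploit the hardware advantages of vector instructions or Tensor Core GPUs is also a highly challenging research problem.

Our lab's research directions are: (1) using optimization techniques in AI compilers (e.g., TVM and MLIR), together with resource allocation and scheduling algorithm design, to improve the execution performance of deep learning models (especially hybrid models) on heterogeneous platforms with multiple CPUs, multiple GPUs, or CPU+GPU+AI accelerator combinations; and (2) using the MLIR AI compiler framework to develop a series of GNN/GCN optimization techniques and implement them on heterogeneous system architectures combining AVX512, GPUs, and Tensor Cores.

In short, we develop AI compiler optimization techniques and resource management/task scheduling algorithms that map complex deep neural network models onto heterogeneous system architectures in order to reduce inference time.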
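The sparse computation at the heart of GNN/GCN inference is neighborhood aggregation, i.e., multiplying a sparse (normalized) adjacency matrix by a dense feature matrix. The sketch below stores the adjacency matrix in compressed sparse row (CSR) form and performs this product in plain Python, to make visible the irregular, indirect memory accesses (`col_idx[k]`) that gather instructions and sparse hardware support aim to accelerate. It is a teaching sketch, not an optimized kernel; the function names and the toy graph are assumptions, and the feature transform and nonlinearity of a full GCN layer are omitted.

```python
def to_csr(n: int, edges: list[tuple[int, int, float]]):
    """Build CSR arrays (row_ptr, col_idx, vals) from (row, col, weight) triples."""
    edges = sorted(edges)
    row_ptr = [0] * (n + 1)
    for r, _, _ in edges:
        row_ptr[r + 1] += 1
    for i in range(n):
        row_ptr[i + 1] += row_ptr[i]  # prefix sum gives each row's start offset
    col_idx = [c for _, c, _ in edges]
    vals = [w for _, _, w in edges]
    return row_ptr, col_idx, vals


def csr_spmm(row_ptr, col_idx, vals, features: list[list[float]]) -> list[list[float]]:
    """Compute A @ X for sparse A (CSR) and dense X (one row per node)."""
    n, d = len(row_ptr) - 1, len(features[0])
    out = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j, w = col_idx[k], vals[k]  # indirect (gather-style) access to row j
            for f in range(d):
                out[i][f] += w * features[j][f]
    return out


# Toy 3-node path graph 0-1-2 with self-loops; each row averages its neighbors.
A = to_csr(3, [(0, 0, 0.5), (0, 1, 0.5),
               (1, 0, 1 / 3), (1, 1, 1 / 3), (1, 2, 1 / 3),
               (2, 1, 0.5), (2, 2, 0.5)])
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(csr_spmm(*A, X))
```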
Hsin-Min Wang

speech recognition, speaker/language recognition, speech synthesis and conversion, speaker separation and segmentation, speech translation, spoken question answering system
Speech processing is a field with great (and lucrative) prospects but a high barrier to entry. Compared with image processing, computer vision, and natural language processing, it can be considered a blue ocean in today's AI landscape. We are committed to speech processing research suited to the Taiwanese linguistic context (Mandarin, Taiwanese, Hakka, and English), research that can connect academically with the top international venues while our systems retain the value of localized applications.

1) For speech recognition, our recognizer must be able to recognize both the Mandarin speech of young people and the Taiwanese speech of the elderly, and it should handle speakers who like to mix English into their speech without trouble. Combined with adaptation techniques, it should also recognize the voice of a specific speaker, achieving customization and robustness to environmental interference.

2) For speech synthesis, our system must not only speak Mandarin and Taiwanese but also use voice conversion technology to produce a user-specified voice, which is not only entertaining but also an indispensable technology for spoken language preservation and audiobook production.

3) Text-based machine translation and question answering are now common, and adding speech makes them truly convenient in smart homes and smart speaker environments. Therefore, our machine translation system must handle the translation between Mandarin and Taiwanese and between Mandarin and English that local people need, and our question answering system must listen quickly and respond efficiently.

Note: We work closely with Dr. Ming-Tat Ko on Taiwanese speech processing. Students who are highly enthusiastic and interested in the research of Taiwanese speech processing can also apply to join Dr. Ko's laboratory. If admitted, we will work together.
Yi-Hsuan Yang
Automatic Music Generation: Generating MIDI, Sounds, and Images

We are interested in both symbolic-domain (MIDI) and audio-domain music generation; the former concerns generating MIDI scores [1, 2, 3, 4], while the latter concerns generating sounds [5, 6, 7, 8]. We are also interested in multi-modal generation models that generate not only audio but also its visual counterpart [9, 10]. We welcome intern candidates who have solid backgrounds in, experience with, or understanding of deep generative models such as Transformers, GANs, and flow-based models [11], and who are strongly motivated to publish papers at top AI/ML conferences as a result of the internship. Experience in music playing and/or composition is a plus but not a must. Our lab collaborates closely with Taiwan AI Labs, Sony Japan, and research labs in other countries. Please feel free to email me to show your passion or to ask questions.

[1] CP Transformer. https://arxiv.org/abs/2101.02402
[2] Pop Music Transformer. https://arxiv.org/abs/2002.00212
[3] Guitar Transformer. https://arxiv.org/abs/2008.01431
[4] Jazz Transformer. https://arxiv.org/abs/2008.01307
[5] Jukebox. https://openai.com/blog/jukebox/
[6] UNAGAN. https://arxiv.org/abs/2005.08526
[7] Loop Combiner. https://arxiv.org/abs/2008.02011
[8] DeepSinger. https://arxiv.org/abs/2007.04590
[9] StyleGAN. https://arxiv.org/abs/1812.04948
[10] DALL.E. https://openai.com/blog/dall-e/
[11] https://courses.cs.washington.edu/courses/cse599i/20au/
Hsiang-Shang ‘Josh’ Ko

Interactive type-driven programming / Diagrammatic quantum programming

There are two possible topics.

1. Interactive type-driven programming

Basic type systems help to preclude a class of nonsensical programs (for example, passing integers to functions expecting strings). Stronger type systems preclude more nonsensical programs and, even better, allow only meaningful ones. Corresponding to higher-order logic, dependent types are highly expressive and capable of describing program correctness properties. When programming with dependent types in an interactive development environment (IDE), the IDE can reason about types on our behalf, tell us on request what properties any part of a program should satisfy, and generate programs (automatically or semi-automatically) based on type information.

We will study the ‘PLFA’ online tutorial:

* Philip Wadler, Wen Kokke, and Jeremy G. Siek [2020]. Programming language foundations in Agda. https://plfa.github.io.

Afterwards we will work on some more sophisticated dependently typed programs/algorithms, for example:

* Hsiang-Shang Ko [2021]. Programming metamorphic algorithms: An experiment in type-driven algorithm design. The Art, Science, and Engineering of Programming, 5(2):7:1–34. https://josh-hs-ko.github.io/#publication-9f9adfcc.

* Hsiang-Shang Ko and Jeremy Gibbons [2017]. Programming with ornaments. Journal of Functional Programming, 27:e2:1–43. https://josh-hs-ko.github.io/#publication-696aedff.

2. Diagrammatic quantum programming

The ‘Categorical Quantum Mechanics’ project of this millennium re-examines quantum theory and builds a higher-level formulation based on the highly abstract language of category theory. Despite being abstract, the essence of the new formulation is a graphical calculus, which is easy to manipulate (especially compared to the traditional linear algebraic calculations) and readily reveals the computational intuitions. The abstract nature of the formulation makes it applicable to probabilistic or non-deterministic programs, and we can even uniformly reason about quantum and classical/probabilistic properties within the same language.

We will mainly study the ‘PQP’ book:

* Bob Coecke and Aleks Kissinger [2017]. Picturing Quantum Processes. Cambridge University Press. ISBN: 9781107104228. https://doi.org/10.1017/9781316219317.

If time permits, we could try to compare the approaches of PQP and the standard textbook:

* Michael A. Nielsen and Isaac L. Chuang [2010]. Quantum Computation and Quantum Information. Cambridge University Press, 10th anniversary edition. ISBN: 9781107002173. https://doi.org/10.1017/CBO9780511976667.

Also we could try to (re-)write some correctness or complexity arguments about quantum algorithms in terms of PQP’s graphical calculus.
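As a small, hedged illustration of why the graphical calculus saves effort, the sketch below numerically checks the "yanking" (snake) equation. Bending a wire with a cup (the unnormalised Bell state Σ_i |ii⟩) and then a cap (the matching effect) gives back a straight wire, i.e. the identity. In PQP this is a one-step diagram rewrite. In linear algebra it is the index contraction Σ_b δ_ab δ_bc = δ_ac. The helper names `cup`, `cap`, `identity` and `snake` are our own and are not taken from the book.

```python
from itertools import product


def cup(d):
    """Unnormalised Bell state sum_i |i>|i> on two d-dimensional wires, as {(i, j): amplitude}."""
    return {(i, j): 1 if i == j else 0 for i, j in product(range(d), repeat=2)}


def cap(d):
    """Bell effect sum_i <i|<i|; with real amplitudes its entries equal the cup's."""
    return cup(d)


def identity(d):
    """The d x d identity matrix, i.e. a straight wire."""
    return [[1 if r == c else 0 for c in range(d)] for r in range(d)]


def snake(d):
    """Matrix of (cap (x) id) o (id (x) cup), mapping input wire a to output wire c.

    The cup creates wires (b, c) next to the input a, and the cap then joins a with b,
    so entry [c][a] = sum_b cap[a, b] * cup[b, c].
    """
    cu, ca = cup(d), cap(d)
    return [[sum(ca[(a, b)] * cu[(b, c)] for b in range(d)) for a in range(d)]
            for c in range(d)]


for d in (2, 3, 4):
    print(d, snake(d) == identity(d))  # the bent wire straightens for every dimension d
```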
Chien-Min Wang

Cloud Computing and Human-Centered Computing
(1) 整合記憶體內資料儲存的雲端計算平台:MapReduce是目前利用雲端計算來處理巨量資料方面,最常用的平行計算模型。然而我們發現有一類雲端應用,雖然非常適合MapReduce模型,但是其執行效能卻非常低落,而且計算規模也有很大的限制。這類應用包括用於基因定序的後綴陣列排序和近來很受重視的演化式計算。我們將對現有的Hadoop平台進行擴充和改進,融合記憶體內資料儲存,提出一個泛用的加強型雲端計算平台,以提升執行效能和規模擴充性。我們也將實作後綴陣列排序以及演化式計算,以驗證我們所提出的架構,對於這兩種應用的執行效能和應用規模有多大的提升。我們相信這樣的雲端計算平台不但對於學術研究有很大的貢獻,還能大幅拓展雲端計算平台的應用。

(2) 使用遺傳式編程探究監督式機器學習:本研究計畫透過嘗試解決兩個不同需求的應用問題,來探討監督式機器學習的兩個不同階段。不同於時下熱門的深度學習方法使用類神經網路模型和倒傳遞式訓練,本研究計畫探索機器學習的另一種可能性與方向,也就是遺傳式編程(Genetic programming)。其使用數學表達式模型和演化式搜尋學習,有益於機器學習結果的理解、推導與運用,符合Explainable AI所提倡之概念。要完成本計畫的目標,預期需探討的主要研究議題包括:適應(目標)函數設計、學習(演化)運算子新增與修改、觀察樣本資料處理、平行化或GPU加速、以及效能測試與方法驗證等。

(3) 人智運算的穿戴運算系統:研究穿戴式電腦及裝置在人智計算中的應用,特別是在社交網路方面的應用。我們計劃中的人智運算系統應具備的三種能力:具有瞭解周遭環境與人們情況的能力,可提供使生活更美好的服務,和透過感官與人類自然地互動。為了實現這三個能力,我們計劃中將從三個研究學科來發展:情境識別,雲服務,以及擴增實境。藉由研究相關的穿戴式電腦及裝置,開發更佳的人機整合功能,並透過社交網路系統之系統分析,研究並開發穿戴式社交網路系統,以提升使用者經驗為目標,並且提供更適合的情境感知技術與實境服務的增強實境功能。
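To make the suffix-array workload concrete, here is a toy, single-process sketch of suffix array construction written in MapReduce style:

- The map phase emits each suffix start position, keyed by the suffix's first character (a hypothetical range-partitioning choice).
- The shuffle groups the positions into partitions.
- Each reducer sorts its partition locally.

The sketch only illustrates the data flow. It is not the proposed Hadoop framework. The slicing `text[i:]` copies every suffix, which hints at why intermediate data can blow up on disk-based platforms.

```python
from collections import defaultdict


def map_phase(text):
    """Emit (partition_key, suffix_start) pairs; the key is the suffix's first character."""
    for i in range(len(text)):
        yield text[i], i


def shuffle(pairs):
    """Group suffix start positions by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, pos in pairs:
        groups[key].append(pos)
    return groups


def reduce_phase(text, positions):
    """Sort the suffixes of one partition lexicographically."""
    return sorted(positions, key=lambda i: text[i:])


def suffix_array(text):
    groups = shuffle(map_phase(text))
    result = []
    # Visiting partitions in key order makes the concatenated output globally sorted.
    for key in sorted(groups):
        result.extend(reduce_phase(text, groups[key]))
    return result


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```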

(1) A MapReduce Framework with an In-memory Data Store: MapReduce is a powerful programming model for processing large datasets with parallel, distributed algorithms on clouds. Hadoop is the most popular implementation of MapReduce and is widely adopted for processing large datasets. However, our previous experience with suffix array construction on Hadoop shows that it can cause excessive disk usage and access. This degrades performance and limits the scale of the application. In this project, we aim at efficient and scalable processing of expansive MapReduce (EMR) applications with in-memory data stores. EMR applications, including suffix array construction and evolutionary computation, are a group of applications that have performance and scalability issues with Hadoop. We shall integrate an in-memory data store with Hadoop and propose a MapReduce framework for EMR applications to enhance their performance and scalability. To validate the benefit of the proposed framework, we shall use suffix array construction and evolutionary computation as our testbeds.

(2) Exploring Supervised Machine Learning with Genetic Programming: Instead of adopting the widely used deep learning techniques (neural network models trained by backpropagation), this project explores another research direction: genetic programming. Genetic programming employs an evolutionary search/learning strategy and models based on mathematical expressions. Such models make the outcomes of machine learning easier to understand, derive, and use, in line with the ideas promoted by explainable AI. To reach our goal, we must carefully address the following research issues: the design of fitness functions, the invention and modification of evolutionary operators, data processing for observation samples, acceleration with parallel or GPU computing, and performance testing and validation.

(3) Wearable Computing Systems and Applications in Human-Centered Computing: The goal of this project is to investigate applications of wearable computers and devices in human-centered computing, especially applications on social networks. A human-centered computing system should have three abilities:

- understanding the context of its surroundings and of the people in it;
- providing services that make people's lives better;
- interacting with humans naturally through perception.

To realize these three abilities, we plan to adopt three corresponding research disciplines: context recognition, cloud services, and augmented reality. Wearable computers and social network services will be integrated to build the proposed wearable social network system, which will provide more convenient and user-friendly human-computer interaction.
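For topic (2), a minimal sketch of genetic programming for supervised learning (symbolic regression) may help:

- Individuals are expression trees over `x`.
- Fitness is mean squared error on observation samples.
- Evolution uses elitism, tournament selection and subtree mutation.

The operator set, population size, tree depth and target function are illustrative assumptions, not the project's design.

```python
import random

# Binary operators available to expression trees (an illustrative choice).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def random_tree(rng, depth):
    """Grow a random tree: a leaf ('x' or a constant in -2..2) or (op, left, right)."""
    if depth <= 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.6 else rng.randint(-2, 2)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Evaluate an expression tree at input x."""
    if tree == "x":
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))


def fitness(tree, samples):
    """Mean squared error over (x, y) observation samples; lower is better."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in samples) / len(samples)


def mutate(rng, tree, depth=3):
    """Subtree mutation: replace one randomly chosen subtree with a fresh random one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(rng, left, depth - 1), right)
    return (op, left, mutate(rng, right, depth - 1))


def evolve(samples, pop_size=60, generations=40, seed=0):
    """Elitist evolution with tournament selection and subtree mutation."""
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda t: fitness(t, samples))
        next_gen = ranked[: pop_size // 10]  # elitism: the best 10% survive unchanged
        while len(next_gen) < pop_size:
            # Tournament of size 3: mutate the fittest of three random individuals.
            parent = min(rng.sample(population, 3), key=lambda t: fitness(t, samples))
            next_gen.append(mutate(rng, parent))
        population = next_gen
    return min(population, key=lambda t: fitness(t, samples))


# Hypothetical observation samples from the target y = x^2 + x.
samples = [(x, x * x + x) for x in range(-3, 4)]
best = evolve(samples)
print(best, fitness(best, samples))
```

Because the best 10% always survive, the best fitness never gets worse from one generation to the next. The resulting expression tree can be read directly, which is the explainability argument made above.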
Wen-Liang Hwang

Research on Deep Neural Network Learning Theory

Study the alternating direction method of multipliers (ADMM) for deep neural network learning and its convergence problems.
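For readers new to ADMM, here is a hedged toy instance of its three-step iteration: an x-update, a z-update and a scaled dual update. It is applied to the scalar problem minimize ½(x − a)² + λ|z| subject to x = z, whose exact solution is the soft-threshold of a. With penalty ρ and scaled dual u, the updates are:

- x ← (a + ρ(z − u)) / (1 + ρ)
- z ← S_{λ/ρ}(x + u)
- u ← u + x − z

Applying ADMM to DNN training splits the network's variables in a more elaborate way. This sketch does not model that.

```python
def soft_threshold(v, k):
    """Proximal operator of k*|.|: shrink v toward zero by k."""
    if v > k:
        return v - k
    if v < -k:
        return v + k
    return 0.0


def admm_scalar_lasso(a, lam, rho=1.0, iters=200):
    """ADMM for min 0.5*(x - a)**2 + lam*|z|  s.t.  x = z, with scaled dual variable u."""
    x = z = u = 0.0
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)  # minimise the smooth term plus the penalty
        z = soft_threshold(x + u, lam / rho)   # proximal step on lam*|z|
        u = u + x - z                          # dual update on the constraint x = z
    return x, z


x, z = admm_scalar_lasso(a=3.0, lam=1.0)
print(round(x, 6), round(z, 6))  # 2.0 2.0, i.e. soft_threshold(3.0, 1.0)
```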
Yennun Huang

IoT / AIoT system development and AI scientific data analysis
1. 物聯網系統研發
- Arduino、Micro Linux韌體開發
- 藍芽偵測、連線、資料傳遞開發
- 未來將於TI之藍芽平台上開發應用, 須學習新開發平台
- 網頁前端與網頁後端開發
- 人機互動(Human-Computer Interaction)研究

2. AIoT 系統研究
- 工作內容為邊緣運算與聯邦式學習(Federated Learning)的相關研究
- 具良好服務品質的聯邦式學習之邊緣運算服務
- 基於聯邦式學習整合MLOps於邊緣運算平台的應用
- 多代理人(multiagent)為基礎的AIoT於邊緣運算平台之開發與測試

3. 資料科學系統程式開發
- Python、Linux、Matlab、Deep Learning Toolbox基礎程式撰寫能力
- 資料去識別化保護技術相關研究
- 架設與整合IoTtalk物聯網平台,與基礎資料科學知識
- 以上將從事資料安全、文字分析等deep learning, AI security等相關領域研究工作及論文發表

4. 智慧農業計畫
- 工作內容為資料蒐集、彙整、分析
- 使用機器學習方法訓練模型
- 資料科學相關

1. IoT System Development
- Arduino, Micro Linux Firmware
- Bluetooth device detection, connection, and data transfer
- Developing applications on Texas Instruments (TI) Bluetooth platforms, which requires learning a new development platform
- Web front-end and back-end development
- Research on Human-Computer Interaction

2. AIoT System Research
- Research on edge computing and federated learning
- Edge computing services for federated learning with good quality of service
- Integrating MLOps into edge computing platforms based on federated learning
- Development and testing of multi-agent-based AIoT on edge computing platforms

3. Data Science System Programming
- Basic programming skills in Python, Linux, MATLAB, and the Deep Learning Toolbox
- Research on data de-identification (privacy protection) techniques
- Setting up and integrating the IoTtalk IoT platform; basic data science knowledge
- Research and paper publication in related areas such as data security, text analysis with deep learning, and AI security

4. Smart Agriculture Project
- Work includes data collection, compilation and analysis
- Training models with machine learning methods
- Data science related
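Federated learning appears in several AIoT items above. The following is a minimal sketch of federated averaging (FedAvg):

- Each simulated edge client fits a one-dimensional linear model y = w·x + b on its private data with a few local gradient steps.
- The server then averages the returned parameters, weighted by each client's sample count.
- Raw data never leaves the clients.

The client data, learning rate and round counts are hypothetical, and real systems add client sampling, communication and privacy mechanisms.

```python
import random


def local_update(w, b, data, lr=0.05, epochs=5):
    """A few epochs of full-batch gradient descent on one client's private data (MSE loss)."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def fed_avg(clients, rounds=50):
    """Server loop: broadcast (w, b), run local updates, average weighted by sample count."""
    w, b = 0.0, 0.0
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        updates = [(local_update(w, b, data), len(data)) for data in clients]
        w = sum(uw * n for (uw, _), n in updates) / total
        b = sum(ub * n for (_, ub), n in updates) / total
    return w, b


def make_client(rng, n):
    """Hypothetical edge device: n private samples from y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.1)) for x in xs]


rng = random.Random(0)
clients = [make_client(rng, n) for n in (20, 35, 50)]
w, b = fed_avg(clients)
print(round(w, 2), round(b, 2))  # close to 2 and 1
```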
Meng-Tsung Tsai

Graph Streaming Algorithms
我的研究興趣在探討如何使用 O(n) 的記憶體空間處理各式的圖論計算問題,這裡 n 是指輸入圖的節點個數。

我們假設圖的邊是按照某個最糟的順序一條一條給演算法,而且只給一次。一張 n 個節點的圖,最多會有 Ω(n^2) 條邊,因為限制只能使用 O(n) 的記憶體空間,勢必得強迫演算法 "忘記" 大部分曾經讀進來的邊。在這個前提下,如何設計演算法完成各式的圖論計算問題?

在這嚴格的限制下,或許心裡的第一問題是:"是否大部分的圖論問題都不能在使用 O(n) 記憶體的狀況下完成計算?" 目前的研究文獻已經證實,許多圖論計算問題可以,但也有許多圖論計算問題,保證無法在這限制下計算出來。後者的情況,常常能找到方法在使用少量的空間下,找到 (1) 不錯的近似解、(2) 具有隨機成分的最佳解 (在很高的成功機率下)、或 (3) 具有隨機成分的不錯的近似解 (在很高的成功機率下)。這邊的機率與輸入的圖無關,只和演算法使用的隨機成分有關。


1. 存在 NP-complete 的圖論計算問題,可以使用 O(n) 空間回答!
2. 對於將輸入圖拆分成盡可能少的無環子圖這個圖論計算問題,任何演算法都需要 Ω(n^2) 的記憶體空間才能找到最佳的拆分法!但存在演算法,只要 O(n) 的記憶體空間就能找到近似於最佳解的拆分法。


We are interested in whether a graph problem can be computed using O(n) space, where n denotes the number of vertices in the input graph.

We assume that the edges of the input graph are given to the algorithm one by one, in an arbitrary (possibly worst-case) order, and only once. Note that an n-vertex graph may have Ω(n^2) edges, so an algorithm that uses O(n) space must "forget" most of the input. Under this restriction, how can we design algorithms that solve graph problems?
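As a concrete instance of a problem that fits this model, connectivity can be decided in one pass with O(n) words of memory. The algorithm keeps a union-find structure over the n vertices and discards each edge right after processing it. This is the textbook approach, shown only to illustrate the model; it is not one of the lab's results.

```python
def streaming_components(n, edge_stream):
    """Count connected components of an n-vertex graph given as a one-pass edge stream.

    Only the parent array (O(n) words) is stored; each edge is forgotten after use.
    """
    parent = list(range(n))

    def find(v):
        # Path halving keeps the trees shallow without extra memory.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n
    for u, v in edge_stream:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


# Edges arrive one at a time from an iterator, in arbitrary order.
edges = iter([(0, 1), (2, 3), (1, 2), (4, 5), (0, 3)])
print(streaming_components(7, edges))  # 3 components: {0,1,2,3}, {4,5}, {6}
```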

One may first wonder whether most graph problems simply cannot be solved using O(n) space. The literature has shown that dozens of graph problems can be solved using little space, while dozens of others provably cannot. In the latter case, the community can often find one of the following using little space:

(1) a solution that approximates the optimum to within some factor;
(2) an optimal solution with high probability;
(3) an approximate solution with high probability.

The probabilities here depend only on the randomness used by the algorithm, not on the input graph.

The recent results obtained by our lab include:

1. There exists some NP-complete graph problem that can be computed using O(n) space!
2. Any streaming algorithm that decomposes a graph into the fewest possible acyclic subgraphs requires Ω(n^2) space. However, this problem can be well approximated using O(n) space.

In this independent study, we expect to learn how to apply mathematical methods to answer two questions: can a given graph problem be solved using little space, and which category does your favorite graph problem belong to?
Hsiang-Yun Cheng

Energy-efficient future memory systems for data-intensive applications
近年來資料密集程式,像是深度學習、圖論分析、基因序列分析等,越來越盛行,這些資料密集程式在運算時往往需要大量的記憶體存儲空間與高效的資料存取,然而目前主流的運算系統無法滿足這些需求, 使得我們必須重新思考如何設計未來的電腦系統。許多新興的記憶體技術,像是 Intel 的Optane Memory、電阻式記憶體(ReRAM) 等,由於具備高密度低漏電之特性且兼具存儲與運算功能,提供了設計未來運算系統時新的可能: (1) 模糊化記憶體與存儲系統間的界線,使得程式能直接透過memory bus快速存取可持久保存之資料 (2) 從傳統運算單元為主的系統切換到記憶單元為主的系統設計,在記憶體內直接做運算減少資料傳輸造成的額外耗時與耗能。然而由於這些新興記憶體元件之不穩定性,在系統設計上有許多尚待克服之挑戰。本實習計劃的目標為針對資料密集程式之應用情境,探討不同層面上之設計挑戰,包括電路與元件階層、計算結構階層、及演算法階層,並以軟硬體協同設計的方式, 設計高效能低耗電之新世代記憶體系統。實習生可選擇參與下列研究主題,或其他相關研究議題。

1. 利用新興記憶體兼具存儲與運算能力之特性,設計高效能且滿足運算精準度需求之深度學習加速器。
2. 設計適用於加速圖論分析演算法之新興記憶體系統。
3. 利用新興記憶體兼具存儲與運算能力之特性,以軟硬體協同設計提升基因序列分析演算法之效能。

In recent years, data analytics applications that must process increasingly large volumes of data, such as deep learning, graph analytics, and genome data analytics, have become more and more popular. These big data applications demand large memory capacity and efficient data access. Unfortunately, mainstream computing systems with DRAM-based main memory are not designed to meet these needs, which forces us to fundamentally rethink how to design future computing platforms.

Emerging memory technologies, such as Intel's Optane Memory and resistive RAM (ReRAM), offer superior density, non-volatility, and computing-in-memory capability. These promising features open up new opportunities for designing future computing platforms:

(1) blurring the boundary between memory and storage to allow fast access to large persistent stores;
(2) shifting from contemporary processor-centric designs toward memory-centric designs that reduce costly data movement.

Despite this promise, bringing such systems into practice remains challenging due to the non-idealities of these new memory devices. Our goal is to study the design challenges at different system layers, including the device/circuit, architecture, and algorithm levels, and to propose cross-layer designs that fully exploit the potential of these new memory technologies. Candidate topics include, but are not limited to, the following:

1. Cross-layer co-design to improve the reliability and energy efficiency of computing-in-memory based deep learning accelerators.
2. Exploiting emerging memory technologies and cross-layer co-design to accelerate graph analytic algorithms.
3. Cross-layer co-design to accelerate genome data analytics in memory-centric systems.
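The computing-in-memory idea can be illustrated with an idealised model of a resistive (e.g., ReRAM) crossbar:

- Weights are stored as cell conductances G[i][j].
- Input voltages V[i] drive the rows.
- By Ohm's law and Kirchhoff's current law, each column current is I[j] = Σ_i V[i]·G[i][j].

The result is a matrix-vector product computed where the data is stored. The optional multiplicative Gaussian noise is a hypothetical stand-in for the device non-idealities mentioned above; real device models are considerably more involved.

```python
import random


def crossbar_mvm(voltages, conductances, noise_std=0.0, rng=None):
    """Column currents of an idealised crossbar: I[j] = sum_i V[i] * G[i][j].

    noise_std > 0 perturbs each conductance multiplicatively with Gaussian noise,
    a crude, hypothetical stand-in for device variability.
    """
    if rng is None:
        rng = random.Random(0)
    currents = [0.0] * len(conductances[0])
    for v, row in zip(voltages, conductances):
        for j, g in enumerate(row):
            if noise_std:
                g = g * (1.0 + rng.gauss(0.0, noise_std))
            currents[j] += v * g  # Ohm's law per cell; currents add up along each column
    return currents


# Weights stored as conductances: 3 input rows x 2 output columns.
G = [[1.0, 0.5], [0.0, 2.0], [0.25, 1.0]]
V = [1.0, 2.0, 4.0]
print(crossbar_mvm(V, G))                  # [2.0, 8.5]
print(crossbar_mvm(V, G, noise_std=0.05))  # close to, but generally not exactly, [2.0, 8.5]
```

The gap between the two outputs is the kind of accuracy loss that the cross-layer reliability topics above aim to contain.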
Ding-Yong Hong

Virtual Platform and Compiler Optimization for AI Accelerators
我們將研究AI加速器的(1)虛擬平台和(2)編譯器。在虛擬平台部份, 我們將針對AI加速器之模擬器的執行效能, 研究如何利用多核心處理器, GPU, 或新的記憶體技術等, 來設計一個高效能的全系統AI加速器模擬平台。 在編譯器部份, 我們將研究如何利用編譯器技術, 優化深度學習程式, 使其在AI加速器上能達到最佳的運算效能。我們也會研究如何結合環境中各種不同的運算資源: 例如CPU, GPU, AI加速器等, 協調這些可用資源的運算, 來達到高效能或低功耗的目標。

The goal of this research is to study (1) virtual platforms and (2) AI compilers for deep learning accelerators. We will focus on how to design an efficient and scalable full-system virtual platform for AI accelerators by exploiting host multicore/manycore processors, GPUs, and new memory technologies. In addition, we will develop compiler optimization techniques to accelerate deep learning programs on AI accelerators. We will also coordinate the available computing resources (e.g., CPUs, GPUs, and AI accelerators) to achieve high performance or low power consumption.
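One classic compiler optimization for deep learning kernels is loop tiling (blocking). It reorders a matrix multiplication so that small tiles of the operands are reused while they still sit in fast memory, such as an accelerator's on-chip buffer. The pure-Python sketch below only shows the loop transformation and checks that it preserves the result. The tile size is an arbitrary assumption, and Python itself gains no speed from the change.

```python
def matmul_naive(A, B):
    """Reference product C = A x B with the straightforward triple loop."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def matmul_tiled(A, B, tile=2):
    """Same product, with the i, j and p loops split into tile-sized blocks."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Update one block of C using the matching blocks of A and B.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = C[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc
    return C


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8], [7, 6], [5, 4]]
print(matmul_tiled(A, B))  # [[38, 32], [101, 86], [164, 140]]
```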
Tung Chou

Efficient implementations and attacks for post-quantum cryptography

Due to advances in the development of quantum computers, existing cryptosystems will be replaced by ones that resist attacks from quantum computers, i.e., post-quantum cryptosystems. This internship covers two types of research. The first is to build efficient software/hardware implementations of post-quantum cryptosystems or related protocols. The second is to study existing attacks and try to design more efficient ones.
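Many efficient lattice-based implementations multiply polynomials with a number-theoretic transform (NTT), which is an FFT over Z_q. The sketch below uses toy parameters: q = 17, n = 8, and ω = 2, which has multiplicative order 8 modulo 17. It computes a cyclic product modulo (X^n − 1, q) and checks the result against schoolbook multiplication. Real schemes use larger parameters, often negacyclic convolution, and constant-time, vectorised butterflies. This is an illustration only.

```python
Q, N, OMEGA = 17, 8, 2  # toy parameters: OMEGA has order N modulo Q


def ntt(a, omega, q=Q):
    """Recursive radix-2 Cooley-Tukey transform: out[i] = sum_j a[j] * omega^(i*j) mod q."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % q, q)
    odd = ntt(a[1::2], omega * omega % q, q)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % q
        out[i] = (even[i] + t) % q            # butterfly, upper half
        out[i + n // 2] = (even[i] - t) % q   # uses omega^(n/2) = -1
        w = w * omega % q
    return out


def intt(values, omega, q=Q):
    """Inverse transform: forward NTT with omega^-1, then scale by n^-1 (Python 3.8+ pow)."""
    n_inv = pow(len(values), -1, q)
    return [x * n_inv % q for x in ntt(values, pow(omega, -1, q), q)]


def cyclic_mul_ntt(a, b, omega=OMEGA, q=Q):
    """Product of a and b modulo (X^n - 1, q) via pointwise multiplication in the NTT domain."""
    fa, fb = ntt(a, omega, q), ntt(b, omega, q)
    return intt([x * y % q for x, y in zip(fa, fb)], omega, q)


def cyclic_mul_schoolbook(a, b, q=Q):
    """O(n^2) reference: coefficients wrap around because X^n = 1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c


a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
print(cyclic_mul_schoolbook(a, b))  # [5, 16, 0, 1, 11, 11, 0, 0]
print(cyclic_mul_ntt(a, b))         # same result via the NTT
```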
Ting-Yi Sung

Bioinformatics and big data analysis for proteomics and proteogenomics studies


我們實驗室所訓練出來的人才,也是國外亟需的人才,之前的博士班學生畢業後到美國如:Johns Hopkins U、U of Michigan醫學院擔任博士後,之後一位返國於國立大學任教;也有國外著名蛋白體研究學者主動來邀請成員加入其機構。我們竭誠歡迎有志學習、有熱情的同學,尤其是資訊領域的同學,加入暑期實習。

Proteins are the final products of genes and execute various biological functions in cells. In biomedicine, proteins are also the most prominent drug targets. Therefore, following the genomics era and with advances in mass spectrometry (MS) technology, proteomics research has received ever-increasing attention in cancer research. Moreover, although genomics studies can identify actionable genomic mutations for therapy, many actionable mutations do not respond to targeted therapy, and many responses are temporary. It has also been reported that proteomics studies can detect new, clinically associated cancer subtypes beyond those found by genomics studies. Proteomics and proteogenomics studies have therefore recently become essential in precision medicine for cancer research.

Mass spectrometry is the most commonly used experimental technology for proteomics and proteogenomics research. With advances in MS technology, high-throughput MS data are now generated, and the analysis of such big MS data is a very important topic. Our lab is one of the very few labs in Taiwan conducting bioinformatics research for proteomics. It has worked on mass spectrometry data analysis, including algorithm design and software development, for over fifteen years. We are also interested in proteogenomics studies that detect genomic or transcriptomic variations at the protein level from MS data, because such variations can be related to cancers. We are developing computational methods for identifying variant peptides in specific proteins. Although we have some preliminary results, we need to develop a data analysis pipeline to facilitate the discovery and validation of variant peptides. Furthermore, because a huge number of MS spectra with peptide annotations are publicly available, we are also interested in machine learning and data mining research for proteomics analysis.
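A basic building block of MS data analysis is computing theoretical peptide masses from an in-silico digest, which can then be compared with observed precursor masses. The sketch below does three things:

- It uses standard monoisotopic residue masses.
- It applies a simplified trypsin rule: cleave after K or R unless the next residue is P.
- It shows how a single amino-acid variant shifts a peptide's mass.

The protein sequence and the variant are made-up examples. Real pipelines also handle modifications, missed cleavages, charge states, and false discovery rate control.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the terminal H and OH
PROTON = 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mz(seq, charge):
    """m/z of the peptide carrying `charge` extra protons."""
    return (peptide_mass(seq) + charge * PROTON) / charge


def tryptic_digest(protein):
    """Simplified trypsin rule: cleave after K/R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


reference = "GASKPTLRVWKPEPTIDE"  # hypothetical protein fragment
variant = "GASKPTLRAWKPEPTIDE"    # hypothetical V -> A substitution
for ref_pep, var_pep in zip(tryptic_digest(reference), tryptic_digest(variant)):
    if ref_pep != var_pep:
        shift = peptide_mass(var_pep) - peptide_mass(ref_pep)
        print(ref_pep, "->", var_pep, f"mass shift {shift:+.4f} Da")
# VWKPEPTIDE -> AWKPEPTIDE mass shift -28.0313 Da
```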

Some of our former lab members received postdoctoral positions in the medical schools of Johns Hopkins University and the University of Michigan in the US, and one received a research associate position at a prestigious institute in the US. We invite students majoring in informatics or statistics who are interested in bioinformatics for cancer research to apply.
Wei-Yun Ma

Automatic Advertisement or News Generation/Chitchat Chatbot/Automatic Knowledge Acquisition/Fact Reasoning or Event Prediction
今年的實習我們邀請同學透過深度學習(Deep Learning)進行以下專案的其中之一,並鼓勵暑期實習生在實習期間能大膽地進行技術或應用的創新。

1. 廣告文案或新聞的自動生成:當輸入是一款手機的規格表,系統能自動生成出一篇具有說服力的廣告文案。或是當輸入是一場NBA的比賽數據表,系統能自動生成出一篇緊張刺激的播報新聞。我們希望透過深度學習當中的增強式學習(Reinforcement Learning)以及語言模型,打造一個這樣的文字生成系統,能夠一方面忠於輸入的表格內容,另一方面能發揮創造力,寫出多變化又文情並茂的文章。

2. 不限主題的閒聊機器人:即所謂 Chitchat Chatbot,也就是沒有特定目的的聊天。目前這類型的bot大多數的作法是利用深度學習當中的seq-to-seq model來建構,但是,這樣的作法通常無法產生有意義或是較為深入的回應,多數會流於插科打諢或是賣萌。其中的關鍵,在於bot缺少了對於聊天主題相關的基本常識,就像是user要跟bot討論劉德華,bot應該對劉德華的各個fact(身份,作品...)有足夠認識,回的response才會豐富有意義,不然巧婦難為無米之炊,沒有知識就不容易產生有意義的回應,我們希望將grounded knowledge以及更豐富的語義訊息encode在model之中。我們透過深度學習當中的增強式學習(Reinforcement Learning),已經訓練出一個不限主題的LINE閒聊機器人-詞庫小妍(LINE官方帳號:@359mcmgs)。

3. 自動知識學習系統:我們知道新的知識會夜以繼日的不斷產生,一個具有AI能力的系統最重要的功能之一就是能夠從大量的資料當中,分析資料,加以理解,組織成結構化知識。我們實驗室過去已經開發了人類的知識網(E-HowNet),打下堅實基礎,此專案的目標是進一步加以擴張,利用深度學習技術將關鍵的關係三元組合從閱讀的文章中自動抽取出來,如 (“哈登”,MemberOf,“火箭隊”) 或是 (“麥特戴蒙”,PlayerOf,“心靈捕手”)等等。

4. 事實推論或事件預測系統:對於一個新事物,人們往往會根據基本常識、已知的事實、經驗的法則等等進行新事物的推測,包含事實或是事件的推論,例如以下的事實推論:已知A說中文,A又是B的哥哥,那麼很高的機率B也會說中文。又例如以下的事件推論:“買麵包”後會有很高的機率會在近期“吃麵包”。在一個龐大的文本或是複雜的知識圖譜當中,推論的關係往往數量龐大,有時甚至複雜到超越人力所能規範與理解,我們希望藉由深度學習技術能自動化的在文本或是知識圖譜當中進行新事物的推測。

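This year we invite interns to work on one of the following projects using deep learning, and we encourage bold technical or application innovation during the internship.

1. Automatic generation of advertisements or news: Given the specification sheet of a phone, the system should generate a persuasive advertisement. Given the statistics of an NBA game, it should generate an exciting news report. We hope to build such a text generation system with reinforcement learning and language models. The system should stay faithful to the input table while being creative enough to write varied and expressive articles.

2. Open-domain chitchat chatbot: A chitchat chatbot converses without any specific goal. Most current bots of this kind are built with sequence-to-sequence (seq-to-seq) models, but these usually fail to produce meaningful or in-depth responses and tend to resort to banter or acting cute. The key problem is that the bot lacks basic knowledge about the topic under discussion. If a user wants to talk about Andy Lau (劉德華), the bot should know enough facts about him (identity, works, and so on) to give rich, meaningful responses; without knowledge, meaningful responses are hard to produce. We therefore aim to encode grounded knowledge and richer semantic information into the model. Using reinforcement learning, we have already trained an open-domain LINE chitchat bot, 詞庫小妍 (LINE official account: @359mcmgs).

3. Automatic knowledge acquisition: New knowledge is produced around the clock. One of the most important functions of an AI system is to analyze and understand large amounts of data and organize it into structured knowledge. Our lab has previously developed E-HowNet, a network of human knowledge, which provides a solid foundation. The goal of this project is to extend it further by using deep learning to automatically extract key relation triples from text, such as ("Harden", MemberOf, "Rockets") or ("Matt Damon", PlayerOf, "Good Will Hunting").

4. Fact reasoning or event prediction: When facing something new, people often make inferences about facts or events from common sense, known facts, and rules of thumb.
- Fact inference example: if A speaks Chinese and A is B's older brother, then B very likely also speaks Chinese.
- Event inference example: after "buying bread", one is very likely to "eat bread" soon.

In a large text corpus or a complex knowledge graph, the number of such inference relations is huge, sometimes too complex for people to specify or understand by hand. We hope to use deep learning to make such inferences automatically over text or knowledge graphs.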
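To make the fact-reasoning example concrete, the sketch below stores facts as (head, relation, tail) triples and applies one hand-written rule: Speaks(A, L) ∧ OlderBrotherOf(A, B) ⇒ Speaks(B, L). Derived facts receive a reduced confidence. The project aims to learn such inferences with deep learning. The symbolic rule, the relation names `Speaks` and `OlderBrotherOf`, and the 0.9 rule confidence are illustrative assumptions only.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Known facts with confidence scores (hypothetical knowledge base).
facts = {
    Triple("A", "Speaks", "Chinese"): 1.0,
    Triple("A", "OlderBrotherOf", "B"): 1.0,
    Triple("Harden", "MemberOf", "Rockets"): 1.0,
}


def infer_shared_language(kb, rule_conf=0.9):
    """Rule: Speaks(x, lang) and OlderBrotherOf(x, y) => Speaks(y, lang), with reduced confidence."""
    derived = {}
    for t1, c1 in kb.items():
        if t1.relation != "Speaks":
            continue
        for t2, c2 in kb.items():
            if t2.relation == "OlderBrotherOf" and t2.head == t1.head:
                new = Triple(t2.tail, "Speaks", t1.tail)
                conf = rule_conf * min(c1, c2)
                # Keep only new facts, and the highest confidence if derived more than once.
                if new not in kb and conf > derived.get(new, 0.0):
                    derived[new] = conf
    return derived


print(infer_shared_language(facts))
# {Triple(head='B', relation='Speaks', tail='Chinese'): 0.9}
```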
Ming-Tat Ko

Taiwanese Speech Recognition, Synthesis, and Conversion, Taiwanese-Chinese-English Speech Machine Translation, and Taiwanese Dictionary and Corpus Development

3)目前廣為使用的 Google 小姐並不會講臺語,

See the Chinese version.
Tyng-Luh Liu
Developing state-of-the-art techniques and algorithms for computer vision applications


Mi-Yen Yeh

Deep Learning for Big Graph Mining
在深度學習盛行且專注於文字與影像資料的同時,近幾年興起以處理圖(Graph)為主的圖神經網路學習模型,其中較有名的兩類模型為圖神經模型(Graph Neural Network, GNN)和圖卷積模型(Graph Convolutional Network, GCN)。圖型/網路資料結構可以很直覺地以節點(node)和連結(link)來表示個體與個體之間的關係,例如社群網路可表示人與人之間的交友關係, 而知識圖譜可表示不同實體之間的各種關係。當資料可以圖型表示時,很多應用可將問題表示成節點分類(Node classification)、連結預測(Link prediction)、尾端實體預測(Tail entity prediction)、圖型分類(Graph classification)等工作來解決。本實習希望能深究圖神經模型和圖卷積模型這種監督式模型在各種假設和應用的可能性,可能的應用主題包含:
(1)知識圖譜的建構、推論、與應用於自動問答;(2)建立資料特徵關係圖,利用圖卷積網路學習並預測廣告點擊率;(3) 將程式碼轉成對應之物件關係圖,利用圖神經網路模型做偵錯與修正; (4) 如何利用高維度資訊或多型態資訊,讓圖神經模型有更強的學習與預測能力。

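The internship provides an opportunity to study graph-based deep learning models such as GNNs (graph neural networks) and GCNs (graph convolutional networks). We will explore how to leverage these models in real-world applications that deal with graph-structured data. In such applications, many problems can be cast as node classification, link prediction, tail entity prediction, or graph classification. Possible topics include:

(1) constructing knowledge graphs, reasoning over them, and applying them to automatic question answering;
(2) building feature-relation graphs and using GCNs to learn and predict advertisement click-through rates;
(3) converting program code into object-relation graphs and using GNNs for bug detection and repair;
(4) exploiting high-dimensional or multimodal information to give graph neural models stronger learning and prediction abilities.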
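For orientation, one graph convolutional layer in the widely used Kipf and Welling formulation computes H' = ReLU(D̂^{-1/2}(A + I)D̂^{-1/2} H W). Here A is the adjacency matrix, I adds self-loops, and D̂ is the degree matrix of A + I. The pure-Python sketch below implements exactly that propagation rule on a toy graph. The graph, features and weights are made up, and training (e.g., for node classification) is omitted.

```python
import math


def matmul(X, Y):
    """Dense matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[d_inv_sqrt[i] * a_hat[i][j] * d_inv_sqrt[j] for j in range(n)] for i in range(n)]


def gcn_layer(adj, H, W):
    """One GCN layer: aggregate normalised neighbour features, project with W, apply ReLU."""
    Z = matmul(matmul(normalized_adjacency(adj), H), W)
    return [[max(0.0, z) for z in row] for row in Z]


# Toy path graph 0 - 1 - 2 with 2-dimensional node features and a 2 x 1 weight matrix.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0], [-0.5]]
print(gcn_layer(adj, H, W))  # one output feature per node
```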
Jan-Ming Ho

Bioinformatics and Financial Computing

Our research focuses on de novo genome assembly based on state-of-the-art sequencing technologies, and on developing algorithms for trading as well as risk prediction and management in financial markets.
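Many de novo assemblers for short reads are built on de Bruijn graphs. Every k-mer contributes an edge from its (k−1)-prefix to its (k−1)-suffix, and a genome is read off an Eulerian path through the graph. The sketch below shows this idea on error-free, repeat-free toy k-mers, using Hierholzer's algorithm. Real assemblers must also handle sequencing errors, repeats, uneven coverage and reverse complements. The genome string is a made-up example.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn(kmer_list):
    """Adjacency lists: one edge prefix -> suffix per k-mer (multiplicities kept)."""
    graph = defaultdict(list)
    for km in kmer_list:
        graph[km[:-1]].append(km[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; starts at a node whose out-degree exceeds its in-degree, if any."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(kmer_list):
    nodes = eulerian_path(de_bruijn(kmer_list))
    # Consecutive (k-1)-mer nodes overlap by k-2 characters, so append only the last one.
    return nodes[0] + "".join(node[-1] for node in nodes[1:])


genome = "ATGGCGTGCA"
print(assemble(kmers(genome, 4)))  # ATGGCGTGCA
```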
