2020 研究主題清單 (2020 Research List)


主持人 (PI) | 研究主題 (Research Topic) | 研究介紹 (Introduction) | 其他資訊 (Other Information)
林仲彥
Lin, Chung-Yen
生物醫學大數據資料解析

Harness Biomedical Big data for Genome Biology
我們的團隊主要研究模式與非模式生物之多維基因體學(OMICS),包括基因體、轉錄體、單細胞轉錄體、蛋白質交互網路、腸道微生物與疾病關連等巨量資訊數據分析,同時也定序、重組與註解了多個重要經濟生物之基因體,目前也致力於個人化基因體的重新組裝,探討不同程度的序列變異與疾病之間的關係。此外,研究團隊並專注於跨領域的研究工作,歡迎不同領域(資訊、統計、數學及生物相關)的人才一起合作。研究範圍以單細胞基因解析、水生經濟動物基因體育種及人類腸道與環境微生物之互動等課題為主,同時發展新的高速計算工具及雲端分析平台,以及引入深度學習等策略,來探討基因、病原與環境的三角互動關係。

The main goal of our team is to analyze omics big data, which may reveal the secrets of biological regulation hidden in the data deluge. By combining open-source tools with self-developed programs and platforms, we have assembled, annotated, and decoded several aquatic genomes of high economic importance. New approaches such as deep learning will be introduced to refine our studies.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/cylin/

實驗室網址(Research Information) :
http://eln.iis.sinica.edu.tw
https://hub.docker.com/u/lsbnb

Email :
cylin@iis.sinica.edu.tw
徐讚昇
Hsu, Tsan-sheng
資料密集運算

Data-Intensive Computing
研究巨量資料下的有效率計算問題

Research on efficient computation in the face of big data
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/~tshsu/

實驗室網址(Research Information) :
http://chess.iis.sinica.edu.tw/lab/

Email :
carol@iis.sinica.edu.tw
楊得年
Yang, De-Nian
社群網路與行多媒體網路分析與各式新穎應用

Analysis and Innovative Applications for Social Networks and Mobile Multimedia Networks
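Research directions:

(1) Virtual reality: With the rapid development of virtual reality (VR) technology, many new applications now combine physical-world resources with virtual-world service demands. For example: (i) in VR games, we must consider both the user's signal quality and transmission technology in the physical network (e.g., 5G, WiFi), to decide network resource allocation and scheduling, and the user's interactions with objects in the virtual world, to decide which scenes must be transmitted and synthesized, so as to ensure an immersive user experience; (ii) in VR shopping malls and social applications, people's preferences for items show in their shopping behavior, such as walking paths, item categories, viewing angles, and time spent viewing items, while relationships and interactions on social networks also influence their preferences and opinions. We will therefore consider a variety of shopping and social factors to recommend suitable products and friends with shared preferences.

(2) Emerging social network applications: As social networks have become widespread in recent years, social applications have multiplied. For example: (i) Social activity planning: when planning a social activity, besides the social distance between participants on the network, we also need to consider time, location, activity topic, and so on, for more effective and faster planning. (ii) Live-streaming community recommendation: today's live-streaming platforms let viewers and streamers interact in real time, so they are regarded as an emerging social application. Here we can analyze users' social behavior and recommend the best group composition and the best content to users, so that they get the best experience. (iii) Social coupons: many companies use social networks as a medium for viral marketing to maximize social influence or profit. In recent years, "social coupons" that incorporate the idea of social referral have emerged; allocating them appropriately can effectively increase users' willingness to redeem them and thus increase company profit. (iv) Social network addiction detection and behavioral therapy: although the prevalence of social networks brings much convenience, it also brings risks such as Internet addiction. To detect early the psychological symptoms a user may suffer, the user's behavior patterns on social networking sites can serve as features for analysis, from which the type of symptom can be inferred. Furthermore, to treat addiction early, we can analyze the user's news-feed posts and other behavior patterns on social media to estimate an addiction score for each post and then, following behavioral therapy theory, substitute posts with lower addiction scores into the user's feed to reduce the degree of addiction.

The following training will be arranged:

(1) Approximation algorithms: The optimization problems arising from the applications above are multi-dimensional and structurally complex, and are generally NP-hard, so an optimal solution cannot be found in polynomial time unless P = NP. For the NP-hard problems in VR and social networks, we explore two directions. (a) Approximation algorithms: does there exist a polynomial-time algorithm guaranteed to find a solution sufficiently close to the optimum? To design efficient approximation algorithms, we use advanced algorithm design techniques including integer/linear/semidefinite programming, dynamic programming, randomized rounding, the primal-dual method, and sampling, as well as mathematical tools such as graph theory and the probabilistic method. (b) Inapproximability: for very hard optimization problems, can we rigorously prove that no polynomial-time algorithm can guarantee an approximate solution within a certain factor? To prove inapproximability, we use techniques from complexity theory such as gap-preserving reductions and gap-producing reductions. We also use newer concepts such as online algorithms and streaming algorithms to meet the needs of special scenarios such as dynamic real-time optimization or big data analysis.

(2) Machine learning and deep learning: Machine learning is an important route to artificial intelligence, with wide-ranging applications such as computer vision, natural language processing, search engines, recommender systems, and medical diagnosis. In machine learning, data is often stored in tensors; tensor decomposition reveals the latent relationships among the variables in the data, which can then be used for the prediction task at hand. For example, a recommender system can store in a tensor the users' interactions with products (such as browsing and purchasing), which users shop together, and which products are on the shelf at shopping time; tensor decomposition then yields the relationships between users and products, which are used to score how well each product suits a user and to recommend product display arrangements. Training will also cover deep learning, which has been popular in recent years: with multi-layer neural networks it can capture features hidden deep in the data. On social networks and knowledge graphs in particular, graph neural networks (GNN) or graph convolutional networks (GCN) can represent node or edge features more accurately in a low-dimensional space, which benefits a variety of applications.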
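
As a small illustration of the approximation-algorithm question raised in this topic, the sketch below applies the classic greedy rule to a toy maximum-coverage instance, for example choosing k seed users whose combined reach covers as many people as possible. The greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal coverage. The function name `greedy_max_coverage` and the toy data are illustrative assumptions, not the lab's own method.

```python
def greedy_max_coverage(candidates, k):
    """Pick up to k candidate sets, each time taking the one that covers
    the most not-yet-covered elements (classic greedy rule)."""
    covered = set()
    chosen = []
    remaining = dict(candidates)
    for _ in range(k):
        if not remaining:
            break
        # Marginal gain = number of newly covered elements.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # nothing new can be covered
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered


# Hypothetical toy data: each user "reaches" a set of people.
reach = {
    "alice": {1, 2, 3, 4},
    "bob": {3, 4, 5},
    "carol": {5, 6},
    "dave": {1, 6},
}
seeds, covered = greedy_max_coverage(reach, k=2)
print(seeds, sorted(covered))
```

The greedy choice runs in polynomial time, which is the point of the exercise: it trades exact optimality for a provable bound on how far the answer can be from the optimum.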
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/dnyang/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/dnyang/

Email :
dnyang@iis.sinica.edu.tw
陳伶志
Chen, Ling-Jyh
以空氣盒子系統為基礎的環境感測資料分析研究

Advanced Data Analysis using Fine-grained and Spatio-temporal AirBox Data
在過去幾年中,我們已建立一個跨國性的大型細懸浮微粒(PM2.5)網路感測系統,擁有每天散佈在 57 個國家,超過 10,000 個 PM2.5 微型感測站,每個感測站以每五分鐘一筆的頻率上傳溫濕度與 PM2.5 的即時感測資料,目前已成為全球數一數二的 PM2.5 微型感測資料中心。

在這個專案中,我們希望透過兼具時間與空間高解析度的 PM2.5 感測資料,進行兼具學理、創意與應用價值的資料混搭與進階分析。內容可以是(但並不局限於)即時污染源的溯源、微型感測器的資料品質確保分析、中尺度的 PM2.5 擴散模式推估、中尺度的 PM2.5 濃度預報模式建構、PM2.5 衍生的社經資源成本推估、PM2.5 濃度與即時生理訊號的整合分析,甚或是其他更具創新與挑戰的研究議題。

我們歡迎對本項研究主題有興趣、有想法,並且願意接受挑戰的優秀人才加入我們的團隊,一同學習、努力、並對當前重大的環境議題做出貢獻。

In the past few years, we have successfully built a large-scale PM2.5 sensing system with more than 10,000 participating devices in more than 57 countries. Each device conducts environmental sensing and uploads its temperature, humidity, and fine particulate matter (PM2.5) readings to our server every five minutes. As a result, our system has become one of the best-known PM2.5 sensing data hubs worldwide.

In this summer project, we wish to use the fine-grained spatio-temporal data of our system to conduct advanced data analysis with both research and practical value. The topics include (but are not limited to) PM2.5 emission source tracking, fine-grained PM2.5 dispersion modeling, fine-grained PM2.5 concentration forecasting, estimation of the socioeconomic impacts of PM2.5 pollution, and investigation of the correlation between PM2.5 concentration and physiological signals. We also welcome innovative and even more challenging topics on related problems.
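
Most of the analyses listed above start from the same preprocessing step: turning five-minute readings into a cleaner, coarser series. The sketch below shows one possible way to do that. The record layout `(device_id, ISO timestamp, pm25)`, the function name `hourly_mean_pm25`, and the `min_samples` quality rule are assumptions made for illustration, not the AirBox project's actual data format or pipeline.

```python
from collections import defaultdict
from datetime import datetime


def hourly_mean_pm25(readings, min_samples=6):
    """Average 5-minute PM2.5 readings per device and hour.

    readings: iterable of (device_id, ISO timestamp, pm25 or None).
    Hours with fewer than min_samples valid readings are dropped,
    a simple (assumed) data-quality rule.
    """
    buckets = defaultdict(list)
    for device_id, stamp, pm25 in readings:
        if pm25 is None or pm25 < 0:
            continue  # skip missing or physically impossible values
        hour = datetime.fromisoformat(stamp).replace(minute=0, second=0, microsecond=0)
        buckets[(device_id, hour)].append(pm25)
    return {
        key: sum(values) / len(values)
        for key, values in buckets.items()
        if len(values) >= min_samples
    }


# Hypothetical readings: a full hour at 08:00, a nearly empty hour at 09:00.
sample = [
    ("dev-1", f"2020-01-01T08:{minute:02d}:00", 10.0 + index)
    for index, minute in enumerate(range(0, 60, 5))
]
sample += [
    ("dev-1", "2020-01-01T09:00:00", None),
    ("dev-1", "2020-01-01T09:05:00", 30.0),
]
result = hourly_mean_pm25(sample)
print(result)
```

Dropping sparse hours rather than averaging whatever arrived keeps a single stray reading from standing in for a whole hour, at the cost of gaps that later modeling has to handle.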

We are looking for self-motivated, creative, and open-minded people to join us. We will learn together, work together, enjoy the process together, and produce good results together in the end. For further questions, please feel free to contact us.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/cclljj/

實驗室網址(Research Information) :
https://sites.google.com/site/cclljj/NRL

Email :
cclljj@iis.sinica.edu.tw
呂及人
Lu, Chi-Jen
深度學習的原理與應用

Deep learning: foundations and applications
研究深度學習的原理,並拓展深度學習在影像、自然語言等各個領域的應用。

Study the foundation of deep learning, and explore its diverse applications in various areas such as computer vision and natural language processing.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/cjlu/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/cjlu/

Email :
cjlu@iis.sinica.edu.tw
蘇黎
Su, Li
音樂資訊檢索、音樂人工智慧、音樂互動系統、音樂訊號處理、計算音樂學

Music information retrieval, music artificial intelligence, music interactive systems, music signal processing, computational musicology
音樂與文化科技實驗室(Music and Culture Technology Lab)成立於2017年。我們致力於研發最先端的數位訊號處理、深度學習技術,應用在各種結合音樂與人工智慧的議題上,包括自動採譜、機器鑑賞、即時音樂互動、生成式音樂、計算音樂學等,其應用場域橫跨音樂之聆賞、分析、製作、展演等活動,期能展開科技與人文的深度對話,促進音樂文化融入生活。

The Music and Culture Technology Lab was founded in 2017. We devote ourselves to developing cutting-edge digital signal processing and deep learning techniques for music and AI, such as automatic music transcription, machine connoisseurship, real-time interactive music systems, generative music, and computational musicology. Applications span music listening, analysis, production, and even performance. Our goal is to launch a deep and fruitful dialogue between technology and the humanities, and to make music culture a part of everyday life.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/lisu/

實驗室網址(Research Information) :
https://sites.google.com/view/mctl/

Email :
lisu@iis.sinica.edu.tw
蘇克毅
Su, Keh-Yih
應用深度學習進行智慧型問答與跨文件處理

DNN-based Intelligent QA and Multi-Document Processing
自然語言理解(Natural Language Understanding, NLU)為人工智慧當前最重要的研究領域之一。其中「機器閱讀」是指電腦能夠自行透過閱讀學習知識(read to learn)、並能以學習的知識來增強自己的閱讀能力(learn to read),本研究室目前致力於建立機器閱讀系統,並應用於智慧型問答以及多文件處理。此研究專題的重點將放在 (1) 把領域背景知識融入深度學習網路(DNN),以增進 DNN 之學習效率;(2) 建立基於知識庫(Knowledge Base)的問題回答系統,加強問題辨識能力與推論能力;(3)結合遠監督式機器學習與 DNN,以期能減少領域知識的訓練資料量,並有效擴增知識庫。

Natural Language Understanding (NLU) is one of the most popular fields in AI research. One of its major tasks, machine reading, enables computers to obtain knowledge from given texts with the aid of logical inference, aiming not only to "read to learn" but also to "learn to read". We are building a machine reading system with applications to intelligent QA systems and multi-document processing. In this project, we expect to integrate domain knowledge into DNNs to improve their performance, and to apply distant supervision to reduce the amount of domain-specific training data required while also improving inference power and enlarging the knowledge base.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/kysu/

實驗室網址(Research Information) :
http://nlul.iis.sinica.edu.tw/

Email :
kysu@iis.sinica.edu.tw
王新民
Wang, Hsin-Min
語音、語言與音樂處理

Speech, Language and Music Processing
我的研究興趣是語音處理、自然語言處理、多媒體資訊檢索、機器學習,研究目標是開發多媒體音訊(主要是語音與音樂)分析、抽取、辨識、索引、檢索及生成技術。進行中的研究工作包括自動語音辨識、語者辨識、語音轉換(例如說話人語音轉換、中性語音轉表達性語音、受損語音轉正常語音)、語音合成、語音文件檢索/摘要、問答系統、音樂資訊檢索、音樂生成等。

My research interests include speech processing, natural language processing, multimedia information retrieval and machine learning. The research goal is to develop techniques for analysis, extraction, recognition, indexing, retrieval and generation of multimedia audio data (mainly speech and music). The ongoing research includes automatic speech recognition, speaker recognition, voice conversion (e.g., speaker voice conversion, neutral speech to expressive speech conversion, impaired speech to normal speech conversion), speech synthesis, spoken document retrieval and summarization, question answering, music information retrieval, music generation, etc.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/whm/

實驗室網址(Research Information) :
http://slam.iis.sinica.edu.tw/

Email :
whm@iis.sinica.edu.tw
呂俊賢
Lu, Chun-Shien
深度學習的安全與隱私

Security and Privacy in Deep Learning
隨著深度學習技術的發展與演進,人工智慧(AI)的運用無所不在且跨越多個領域,其影響層面之寬廣實屬罕見。然而,因人工智慧的應用,伴隨而來的安全與隱私議題卻也益發重要,特別是在有安全需求的環境裡,忽略這些議題有時將導致災難性的傷害。近來的研究發現,精心設計的adversarial examples (inputs)對於訓練良好的深度學習網路(DNN)能達到相當程度的愚弄效果,而且這adversarial examples所引進的``adversarial perturbations’’,對於人眼或人耳感受不到與原資料(benign inputs)有差異,具備良好的imperceptible特質。根據文獻紀載, 目前AI Security/Privacy的著作發表數量從2014年以來呈現指數成長,顯示相關研究議題受到極大重視。
我們觀察到AI Model與Security/Privacy/Data Hiding之間的關係,已規劃分別探討AI models本身隱含的Security與Privacy等問題。

Owing to the popularity and development of deep learning technologies, applications of artificial intelligence (AI) are ubiquitous and span multiple areas, and their impact is unprecedented. Nevertheless, the accompanying security and privacy threats have received remarkable attention recently, especially in security- or privacy-critical environments. Ignoring these issues may lead to disastrous damage. Recent research reveals that carefully designed adversarial examples (inputs) can effectively fool well-trained deep neural networks (DNNs). Meanwhile, the "adversarial perturbations" introduced by adversarial examples leave them indistinguishable from the benign inputs in terms of human perception. According to the literature, the number of publications on AI security and privacy has grown exponentially since 2014, which indicates that these issues have received much attention.
In view of the observations from the literature regarding the relationship among AI Model, Security, Privacy, and Data Hiding, we plan to address the issues of balancing model accuracy, AI security, and AI privacy.
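
To make the notion of an adversarial perturbation concrete, the sketch below applies the fast gradient sign method (FGSM), one well-known way of crafting adversarial examples, to a tiny logistic-regression classifier. The model weights, the input, and the step size `epsilon` are made-up values for illustration; real attacks target deep networks, and this is not a description of the lab's own techniques.

```python
import math


def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm(weights, bias, x, label, epsilon):
    """Fast gradient sign method: move each feature by epsilon in the
    direction that increases the cross-entropy loss."""
    p = predict(weights, bias, x)
    # For logistic regression, d(loss)/d(x_i) = (p - label) * w_i.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input (not taken from any real system).
weights, bias = [2.0, -3.0, 1.0], 0.0
x = [0.2, -0.1, 0.1]  # the model assigns this input to class 1
x_adv = fgsm(weights, bias, x, label=1, epsilon=0.2)

print(predict(weights, bias, x) > 0.5)      # prediction before the attack
print(predict(weights, bias, x_adv) > 0.5)  # prediction after the attack
```

Each feature moves by at most `epsilon`, yet the predicted class flips. In high-dimensional inputs such as images, the same bounded change can be too small for a person to notice.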
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/~lcs

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/~lcs

Email :
lcs@iis.sinica.edu.tw
陳孟彰
Chen, Meng Chang
使用深度學習使用於惡意軟體的分析

Deep Learning for Malware Analysis
資安威脅是數位時代重要議題,要如何經由有效惡意活動偵察、即早預警,達到主動式防禦,是目前全世界資安領域的共識。為了達到這個目標,重要關鍵是深度地掌握、充分地瞭解惡意活動的特徵、與其所使用的資源是如何關連此惡意活動目標。本計畫將結合外部資料(包括MITRE ATT&CK框架)資料與知識來建立惡意活動的本體論(ontology),透過惡意程式的側錄資料(execution trace)萃取出關鍵行為特徵,發展深度學習演算法,自動偵測並產生可以描述該執行活動之企圖(intent)的自然語言描述與報告。本總計畫預計包含三項子計畫:「惡意軟體攻擊意圖大數據動態分析及攻擊鑑識報告自動產生」、「使用圖神經網路技術自動建構惡意程式序列行為之高階語意用以攻擊偵測與本體論分析」及「結合動靜態分析的Python程式特徵化與自動攻擊生成技術研究」。三個子計畫與總計畫合作,主要目標為建立惡意行為ontology、蒐集巨量各式惡意程式、側錄不同語意層級的惡意活動、以動靜態方式分析應用程式的控制流程、萃取惡意行為表示式、對應高低階惡意行為間的關係、並自動產生自然語言攻擊鑑識報告。針對一惡意程式,本計畫完成後將可自動產出其生命週期中各階段行為的高階語意意圖與及低階執行序列,資訊安全人員可以了解其行為發生的脈絡發展,並進行鑑識與分析。本平台之產出可用於惡意程式偵測、惡意行為辨識、惡意活動脈絡分析、建立惡意行為ontology。藉此,本計畫成果可減輕資訊安全人員在分析大量程式所花費的時間與成本,直接產出高品質的網路威脅情報報告。


Cyber threats are one of the most important topics in the digital age. There is a worldwide consensus that adversarial threats should be detected effectively and alerts raised early in order to enable proactive defense. The key is to understand sufficiently the characteristics of malicious activities and how they use resources on the target devices to achieve their malicious goals. The goal of this research is to build an ontology of malicious activity from external knowledge bases, such as the MITRE ATT&CK framework; to develop deep learning techniques that analyze an execution trace, composed of a sequence of Windows API calls recorded from real-time malware execution; and finally to generate a cyber threat intelligence report describing the malware's adversary lifecycle and attack behavior. The project includes three sub-projects, namely, "Big data dynamic analysis and automatic threat report generation of malware attack intent", "Using Graph Neural Network Techniques to Automatically Build High-Level Semantics on Malware Sequential Behavior Data for Attack Detection and Ontology Analysis", and "Python Program Characterization and Trigger Synthesis with Static Symbolic Execution and Runtime Verification". The main research tasks are collecting a malware database, building a malware ontology, recording malicious activities through system and API calls, analyzing the control-flow graphs of programs, extracting malicious behaviors and attack paths, mapping high-level malicious behaviors to low-level traces, and generating cyber threat intelligence reports. Given a malware sample, the proposed platform will automatically produce its high- and low-level semantic intent in each phase of the attack lifecycle. This is useful to security administrators because the output is a straightforward, easy-to-understand description of what the program does.
The outcome of the research project will greatly help security analysts by reducing time and effort and by directly producing high-quality cyber threat intelligence reports.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/mcc/

實驗室網址(Research Information) :
http://www.plash.tw/

Email :
mcc@iis.sinica.edu.tw
陳郁方
Chen, Yu-Fang
字串限制式求解器--網頁程式安全的自動化檢測的核心技術/具有系統崩潰自動回復功能的快閃記憶體韌體設計

String constraint solving--an enabling technique for automatic web program security analysis/ A formally verified, crash recoverable FTL design
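Interns are expected to take part in one of the following two projects.

1. String constraint solving, an enabling technique for automatic web program security analysis:

Web program security vulnerabilities, such as injection attacks and cross-site scripting (XSS), are a major gateway for system intrusion. Take XSS as an example: almost every well-known website, including Facebook and Google, has at some point been found exploitable as a springboard for XSS. Unsuspecting users tend to trust these well-known companies and, after mistakenly clicking a crafted link, give the attacker full control over their browser (including downloading malware and monitoring browsing behavior).

The more advanced automated program analysis techniques today, such as symbolic execution, concolic execution, and software model checking, share one core technique: they reduce the question of whether a program execution path is feasible to a constraint-solving problem. The mainstream web vulnerabilities mentioned above, injection and XSS, are both driven by string-typed inputs, so their path feasibility problems reduce to string constraint solving. Our lab's string constraint solver is currently among the leading group, trading leads in performance with Microsoft's Z3 and Stanford University's CVC4. This year we will add direct support for a number of operations related to JavaScript string manipulation.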
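
The reduction from path feasibility to string constraint solving can be sketched with a deliberately naive solver that enumerates candidate strings up to a length bound. Real solvers such as Z3 and CVC4 reason symbolically rather than by enumeration; the function name `solve`, the alphabet, and the path condition below are hypothetical and serve only to show the shape of the problem.

```python
from itertools import product


def solve(constraints, alphabet, max_len):
    """Return the first string (shortest first) satisfying every
    constraint, or None if no string up to max_len does."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if all(check(candidate) for check in constraints):
                return candidate
    return None


# Path condition of a hypothetical handler: the branch is taken when the
# input, once wrapped in a tag, contains "<b>" exactly once, ends with
# ">", and is at most three characters long.
path_condition = [
    lambda s: ("<i>" + s + "</i>").count("<b>") == 1,
    lambda s: s.endswith(">"),
    lambda s: len(s) <= 3,
]
witness = solve(path_condition, alphabet="<>b", max_len=3)

# Adding a constraint that contradicts the others makes the path infeasible.
infeasible = solve(path_condition + [lambda s: "b" not in s], "<>b", 3)
print(repr(witness), infeasible)
```

A returned witness is a concrete input that drives execution down the path; `None` means the path is infeasible within the bound. Enumeration grows exponentially with the length bound, which is why dedicated solvers are needed in practice.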

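2. A formally verified, crash-recoverable FTL design:

Flash memory interacts with upper-layer applications or the operating system through its firmware, or more precisely through the Flash Translation Layer (FTL). In current FTL design, automatic crash recovery is a hot research topic. With flash memory on the market, it is common to see data corrupted to the point of being completely unusable after an unexpected power loss or crash. In this project we plan to design new firmware that, when the system crashes, automatically recovers to the state of the last checkpoint. This greatly reduces the harm from events such as unexpected power loss, and it also relieves the upper-layer file system of redundant journaling, improving its efficiency. Crash recovery is arguably the part of FTL design that most demands correctness: a single mishandled bit can corrupt the entire data structure. We therefore plan to use formal methods to give a mathematical proof of correctness for the designed FTL, ensuring that the design is free of errors.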
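
The recover-to-last-checkpoint behavior described for the FTL project can be pictured with a toy logical-to-physical page map. This is only a sketch under strong simplifying assumptions: the class `CheckpointedMap` is hypothetical, the "persisted" checkpoint is just a copy held in memory, and a real FTL must also handle atomic persistence, wear leveling, and garbage collection.

```python
import copy


class CheckpointedMap:
    """Toy logical-to-physical page map with checkpoint and rollback."""

    def __init__(self):
        self._live = {}        # volatile mapping, lost or corrupted on a crash
        self._checkpoint = {}  # last state assumed to be safely persisted

    def write(self, logical_page, physical_page):
        self._live[logical_page] = physical_page

    def lookup(self, logical_page):
        return self._live.get(logical_page)

    def checkpoint(self):
        # A real FTL would persist this atomically to flash.
        self._checkpoint = copy.deepcopy(self._live)

    def recover(self):
        # After a crash, discard everything written since the checkpoint.
        self._live = copy.deepcopy(self._checkpoint)


ftl = CheckpointedMap()
ftl.write(0, 100)
ftl.checkpoint()
ftl.write(0, 200)  # update that never reached a checkpoint
ftl.write(1, 300)
ftl.recover()      # simulate power loss followed by recovery
print(ftl.lookup(0), ftl.lookup(1))
```

The property a formal proof would establish is the one exercised here: after `recover`, every lookup returns exactly what it returned at the moment of the last `checkpoint`.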
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/~yfc

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/~yfc

Email :
yfc@iis.sinica.edu.tw
王大為
Wang, Da-Wei
醫療資料分析與應用

Health data analytics and applications
利用各式分析工具研討健康相關問題

Use various analytic techniques and tools to study health related problems
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/wdw/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/wdw

Email :
carol@iis.sinica.edu.tw
廖純中
Liau, Churn-Jung
應用邏輯

Applied Logic
我們想探討符號邏輯在各領域的應用,包含計算機科學,數學與哲學等。我們特別有興趣的主題包含modal logic, fuzzy (many-valued) logic, 與 categorical logic等。

We are interested in applications of symbolic logic to computer science, math, and philosophy, especially in modal logic, fuzzy (many-valued) logic, and categorical logic.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/liaucj/

實驗室網址(Research Information) :
http://chess.iis.sinica.edu.tw/lab/?cat=2

Email :
liaucj@iis.sinica.edu.tw
王建民
Wang, Chien-Min
雲端運算與人智運算

Cloud Computing and Human-Centered Computing
(1) 整合記憶體內資料儲存的雲端計算平台:MapReduce是目前利用雲端計算來處理巨量資料方面,最常用的平行計算模型。然而我們發現有一類雲端應用,雖然非常適合MapReduce模型,但是其執行效能卻非常低落,而且計算規模也有很大的限制。這類應用包括用於基因定序的後綴陣列排序和近來很受重視的演化式計算。我們將對現有的Hadoop平台進行擴充和改進,融合記憶體內資料儲存,提出一個泛用的加強型雲端計算平台,以提升執行效能和規模擴充性。我們也將實作後綴陣列排序以及演化式計算,以驗證我們所提出的架構,對於這兩種應用的執行效能和應用規模有多大的提升。我們相信這樣的雲端計算平台不但對於學術研究有很大的貢獻,還能大幅拓展雲端計算平台的應用。

(2) 人智運算的穿戴運算系統:研究穿戴式電腦及裝置在人智計算中的應用,特別是在社交網路方面的應用。我們計劃中的人智運算系統應具備的三種能力:具有解周遭環境與人們情況的能力,可提供使生活更美好的服務,和透過感官與人類自然地互動。為了實現這三個能力,我們計劃中將從三個研究學科來發展:情境識別,雲服務,以及擴增實境。我們計畫透過先進的系統設計,提供更適合未來人類生活以及具備更友善人機互動的應用程式。藉由研究相關的穿戴式電腦及裝置,開發更佳的人機整合功能,並透過社交網路系統之系統分析,研究並開發穿戴式社交網路系統。我們將著重於友善的使用者界面,以提升使用者經驗為目標,並且提供更適合的情境感知技術與實境服務的增強實境功能。

(1) A MapReduce Framework with an In-memory Data Store: MapReduce is a powerful programming model for processing large data sets with a parallel, distributed algorithm on clouds. The Hadoop framework is the most popular implementation of MapReduce and is widely adopted for processing large datasets. However, our previous experience with suffix array construction on Hadoop shows that it can result in excessive disk usage and access, which degrades performance and limits the scale of the application. In this project, we aim at efficient and scalable processing of expansive MapReduce (EMR) applications with in-memory data stores. EMR applications, including suffix array construction and evolutionary computation, are a group of applications that have performance and scalability issues with Hadoop. We shall integrate an in-memory data store with Hadoop and propose a MapReduce framework for EMR applications to enhance their performance and scalability. To validate the benefit of the proposed framework, we shall use suffix array construction and evolutionary computation as our testbed.
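
The suffix array construction mentioned above can be phrased in MapReduce style as follows. This is a single-process sketch meant to show the map, shuffle, and reduce roles, not Hadoop code and not the proposed framework; the function names and the partitioning by first character are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(text):
    """Emit (key, suffix_start) pairs, keyed by the suffix's first character."""
    for start in range(len(text)):
        yield text[start], start


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(text, starts):
    """Sort the suffixes of one group; groups are independent of each other."""
    return sorted(starts, key=lambda start: text[start:])


def suffix_array(text):
    groups = shuffle(map_phase(text))
    result = []
    # Partitions are ordered by first character, so concatenating the
    # sorted groups yields the fully sorted suffix order.
    for key in sorted(groups):
        result.extend(reduce_phase(text, groups[key]))
    return result


sa = suffix_array("banana")
print(sa)
```

Comparing suffixes by slicing, as `reduce_phase` does, materializes data that is quadratic in the input length. That kind of growth in intermediate data is one plausible reading of why such applications strain a disk-based shuffle.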

(2) Wearable Computing Systems and Applications in Human-Centered Computing: The goal of this project is to investigate the application of wearable computers and devices in human-centered computing, especially applications on social networks. A human-centered computing system should have three abilities: understanding the context of the surrounding area and people, providing services that make life better, and interacting with humans naturally through perception. To realize these three abilities, we plan to adopt three corresponding research disciplines: context recognition, cloud services, and augmented reality. Wearable computers and social network services will be integrated to build the proposed wearable social network system, which will provide more convenient and user-friendly human-computer interaction.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/cmwang/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/page/research/ComputerSystem.html?lang=zh

Email :
cmwang@iis.sinica.edu.tw
何建明
Ho, Jan-Ming
生物資訊與金融計算

Bioinformatics and Financial Computing
我們的研究聚焦在運用新世代定序技術作非模式物種的基因組譯,以及市場自動交易策略和風險預測與管理。

Our research focuses on de novo genome assembly based on state-of-the-art sequencing technology, and on developing algorithms for trading, risk prediction, and risk management in financial markets.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/hoho/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/hoho/

Email :
hoho@iis.sinica.edu.tw
鐘楷閔
Chung, Kai-Min
密碼學、複雜度理論或量子密碼學之獨立研究

Independent Research on Cryptography, Complexity Theory, or Quantum Cryptography
The intern is expected to perform independent research on selected topics of interest in (Quantum) Cryptography, Complexity Theory, or general theoretical computer science (TCS). This often starts with surveying research papers and presenting them to the PI. Along the way, the intern can identify research questions with the PI, study the questions independently, and discuss them with the PI in research meetings. Candidate topics include, but are not limited to, Quantum Key Distribution (QKD), Post-quantum Cryptography, Lattice-based Cryptography, Differential Privacy, Non-malleable Codes, Device-independent Cryptography, PRAM Cryptography, Zero Knowledge, Randomness Extractors, etc.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/kmchung/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/~kmchung/

Email :
kmchung@iis.sinica.edu.tw
王釧茹
Wang, Chuan-Ju
表示學習演算法於人工智能之應用

Representation learning and its applications
異質性資料涵括各式結構化(如:消費記錄、產品規格)及非結構化數據(如:網友文字評論),其各自的資料結構及特徵空間大不相同,因此如何進行彼此間的關聯、整合及推論仍屬當代人工智能技術及其相關應用的一大挑戰。然而透過機器學習的非監督式學習法則有可能將異質性資料表現於共同的特徵空間之中,倘若又能在此空間中獲得優良的資料表示法,則可作為異質性資料分析的穩固基石。因此,本研究主題從深度學習及網路表示法的框架切入,深入探究其空間轉換的特性及其保留的訊息,並將針對不同的資料型式及應用情境設計對應之演算法。除了演算法設計及理解外,本實習亦具有有以下三個特色:1) 將使用真實世界的資料進行資料分析及學習;2) 將學習如何在unix-like 環境下處理大量資料並運行實驗;3) 將學習如何使用網頁前端技術進行結果之視覺化呈現。



The research topics relate to processing and understanding heterogeneous data (including texts, pictures, audio signals, social relations, and user behaviors) and to using deep learning and/or network embedding techniques for various AI-enabled applications. In addition to model design, during the internship the participant will also 1) gain hands-on experience with real-world data, 2) learn how to deal with large-scale data and conduct experiments under Unix-like systems, and 3) learn how to visualize the learned results using front-end web programming techniques.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/cjwang/

實驗室網址(Research Information) :
http://cfda.csie.org/

Email :
cjwang@citi.sinica.edu.tw
古倫維
Ku, Lun-Wei
(1) 多模態故事生成 (看圖說故事) (2) 假新聞干預 (3) 推薦系統 (4)自然語言處理相關應用開發

(1) Multimodal Story Generation (2) Fake News Intervention (3) Recommendation System (4) Application Development
在這些研究主題中,將學習到自然語言處理之資訊擷取、文章分類、文字生成、知識庫使用,推薦系統等概念,另涵蓋自然語言基礎工具的使用及機器學習、深度學習的模型建立等先進技術,可與老師討論希望選擇的研究主題。實習期間會專注於上述研究主題並參與模型開發及論文撰寫。各主題研究內容詳述如下:

(1) 在多模態故事生成專案中,我們注重在圖像故事生成 (看圖說故事),目前研究方向上,首先會辨識出每張圖片中的物件及動作當作素材,並且使用這些素材來構成前後呼應並與圖片相關的故事。我們接下來會嘗試生成新的故事插圖。

(2) 在假新聞干預研究中,我們著重於研究甚麼樣的新聞內容與呈現形式,讀者會傾向於相信或不相信,我們將進行內容理解,網路模擬及使用者端的研究。

(3)在推薦系統中,我們研究深度學習的技巧,提升推薦系統的效果。

(4) 在應用開發上,我們將選擇實驗室既有技術可支援的潛力下游應用,開發展示技術的程式。

實驗室尚有其他研究主題正在進行,可到
http://www.lunweiku.com/ 參考相關論文。
實習結束後,表現優良的同學可繼續與實驗室合作研究並發表論文。


Interns will learn how to use basic natural language processing tools, extract information from texts, classify documents, build recommendation systems, and generate dialogs. Machine learning and deep learning technologies for NLP will also be covered. Interns can select the topic/team they wish to join.

(1) In the multimodal storytelling project, we focus on visual storytelling, in which a machine generates a story from a given image sequence. In the current project, we first detect terms (entities and actions) in each image and then use these terms to compose a coherent story. This summer we will focus on free-length context generation (both texts and images).

(2) In the fake news intervention project, we focus on studying why and how readers trust fake news. We will explore approaches that mitigate the impact of fake news.

(3) In the recommendation system project, we will obtain real-world logs and apply NLP and deep learning techniques to these data to enhance the performance of the recommendation system.

(4) Interns can also choose to develop demo applications for the existing technologies in our lab.

The research topics include, but are not limited to, the above.
After the internship, students with good performance can continue to work with the laboratory to research and publish papers.
PI個人首頁(PI's Information) :
http://www.lunweiku.com/

實驗室網址(Research Information) :
http://academiasinicanlplab.github.io/

Email :
lwku@iis.sinica.edu.tw
王柏堯
Wang, Bow-Yaw
形式化隱私模型

Formal Privacy Models
在人工智慧及大數據之環境中,如何保障個人隱私是一項無法避免的問題。本計畫將利用形式化工具,為資料分析演算法建立模型,並分析隱私相關性質。

In the age of artificial intelligence and big data analysis, privacy protection is an unavoidable social issue. In this project, we will construct formal models for data analysis algorithms and analyze their privacy properties.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/bywang/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/~bywang

Email :
bywang@iis.sinica.edu.tw
曹昱
Tsao, Yu
基於機器學習之生理聲學訊號處理

AI-based biomedical acoustic signal processing
Bio-ASP Lab (Biomedical Acoustic Signal Processing Lab) 成立於2011年11月,致力於開發新穎的聲音訊號處理技術以及人工智慧演算法,並將開發出的技術及演算法應用於醫學工程以及生物相關領域的研究。Bio-ASP Lab 近年的研究課題包括:(1) 基於AI之口語溝通輔具科技;(2) 基於深度學習之音訊處理;(3) 結合多模態之語音訊號處理技術;(4) 聲紋鑑識技術; (5) 聲景資料檢索。

The Bio-ASP Lab (Biomedical Acoustic Signal Processing Lab) in CITI, Academia Sinica was founded in November 2011. We are dedicated to developing novel acoustic signal processing and artificial intelligence algorithms and applying them to biomedical and biology-related tasks. Our main research focuses are: (1) AI for assistive speech communication technologies; (2) deep learning based speech signal processing; (3) multi-modal speech signal processing; (4) voice-based forensics; (5) soundscape information retrieval.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/yu.tsao/

實驗室網址(Research Information) :
http://bio-asplab.citi.sinica.edu.tw/
http://bio-asplab.citi.sinica.edu.tw/Lab.html

Email :
yu.tsao@citi.sinica.edu.tw
張佑榕
Chang, Ronald Y.
運用機器學習於無線通訊

Machine Learning for Wireless Communications
請見「研究主題英文介紹」

The candidate will investigate machine learning (especially deep neural network methods) for wireless communications and IoT applications. The candidate will write up research reports, slides, and/or papers. The ideal candidate has knowledge of wireless communication and/or machine learning, and plans to pursue advanced study abroad.
PI個人首頁(PI's Information) :
https://www.citi.sinica.edu.tw/pages/rchang/index_en.html

實驗室網址(Research Information) :
https://www.citi.sinica.edu.tw/~rchang

Email :
rchang@citi.sinica.edu.tw
穆信成
Mu, Shin-Cheng
函數編程與程式正確性推理之相關問題

Functional Programming and Program Reasoning
我的研究興趣是程式語言與函數編程,近年來也包括 concurrent 程式的型別系統與 Hoare logic. 它們的共同點是使用符號推理的方式確保程式的正確性。不論在哪個典範中,我們都希望把寫程式視作一個可用數學與邏輯方式推理的行為。程式的正確性可用型別系統或邏輯推演保證,甚至可用規格與需求開始,經由數學方法一步步推導出程式。

本領域可做的大方向包括
* 設計幫助推理用的符號、程式語言、型別系統等。
* 挑選一些演算法問題,嘗試以數學方法實際證明演算法之正確性,或將演算法推導出來。
* 研究 concurrent 程式以及其型別系統 (session type) 與邏輯之關係。
* 以函數編程語言為工具,開發 Hoare logic 與指令式語言編程課程使用的教學系統。

如對以上題目有興趣,在三個月的實習期間,我們可用一到一個半月的時間學習相關理論(函數編程、型別、邏輯等),用剩下的時間研究新東西或開發系統。

My research interest concerns programming language and functional programming, and extends to Hoare logic and type systems for concurrent programs. The common theme is that programming is seen as a formal, mathematical activity. Correctness of a program can be guaranteed by logical reasoning or type system. Or, a program can even be derived stepwise from its specification.

Possible topics include:

* design symbols, languages, or type systems that aid programmers in reasoning about programs;

* pick an algorithm, and apply our approaches to prove its correctness or even to derive an algorithm;

* study the type system (session type) for concurrent programs and its relationship with logic;

* develop tools for teaching Hoare logic and reasoning of imperative programs, using a functional programming language.
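
As a rough picture of what reasoning about a program against its specification looks like, the sketch below states a specification for the maximum segment sum problem and annotates an efficient loop with its invariant. Python is used only as neutral notation here, and the invariant is checked by run-time assertions on sample inputs, which tests it but does not prove it; in the approach described above, the same invariant would be established by calculation or by a type system. The problem choice and function names are illustrative assumptions.

```python
def max_segment_sum_spec(xs):
    """Specification: the largest sum over all contiguous segments
    (the empty segment counts, so the result is never negative)."""
    return max(
        sum(xs[i:j])
        for i in range(len(xs) + 1)
        for j in range(i, len(xs) + 1)
    )


def max_segment_sum(xs):
    best = 0    # max segment sum of xs[:n]
    suffix = 0  # max sum of a segment ending exactly at position n
    for n, x in enumerate(xs):
        # Loop invariant, stated in terms of the specification.
        assert best == max_segment_sum_spec(xs[:n])
        assert suffix == max(sum(xs[i:n]) for i in range(n + 1))
        suffix = max(0, suffix + x)
        best = max(best, suffix)
    return best


result = max_segment_sum([3, -4, 2, -1, 2, -5, 4])
print(result)
```

The specification is obviously correct but slow, while the loop is fast but needs an argument. The invariant is that argument, and deriving the loop from the specification amounts to discovering it.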

More details can be discussed. If you are interested, we can spend the first 1 to 1.5 months of the internship studying the background knowledge, before diving into developing something new.
PI個人首頁(PI's Information) :
https://scm.iis.sinica.edu.tw/home/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/scm/
https://scm.iis.sinica.edu.tw/ncs/

Email :
scm@iis.sinica.edu.tw
柯向上
Ko, Hsiang-Shang (Josh)
編寫「自證正確的程式」

Writing programs that prove themselves correct
程式語言理論的一個主要流派將程式視為數學物件(之描述),並用型式/數學方法去證明它們的各種正確性質。我的主要興趣是依值型別編程 (dependently typed programming),在型式化證明程式正確性的各種方法裡是比較極端且實驗性強的一種:主流作法是程式和證明分開寫,但依值型別編程的思維基礎 Curry–Howard correspondence 是觀察到程式與證明實為同一種構造,在一個夠強(能表示我們想處理的正確性質)的型別系統中,我們寫的程式只要通過型別檢查就相當於正確性獲得證明。一個可能的計畫是翻出我先前在 ‘ornaments’ 上的工作(將正確性編寫進資料結構內的技巧),推廣其理論並套用至更多編程例子上。(可參考 https://josh-hs-ko.github.io/#publication-696aedff 和 https://josh-hs-ko.github.io/#publication-d146668e 兩篇的感覺,但不需要先讀懂 — 預計會先用不少時間學習必要知識。)

In programming language theory, there is a major tradition where we regard programs as (representations of) mathematical objects and prove their various correctness properties formally/mathematically. My main interest is in dependently typed programming, a somewhat radical and still experimental approach to formal program correctness: instead of writing programs and proofs separately, we identify programs and proofs via the Curry–Howard correspondence, and write programs that prove themselves correct by passing type-checking under a very strong type system that is capable of expressing our desired correctness properties. One possible project to work on is to revisit my previous work on ‘ornaments’ (about encoding correctness properties into data structures), generalising the theory and applying the techniques to more programming examples. (For some references, see https://josh-hs-ko.github.io/#publication-696aedff and https://josh-hs-ko.github.io/#publication-d146668e, but reading these is not a prerequisite — I expect that we’ll spend quite some time going through the necessary background first.)
PI個人首頁(PI's Information) :
https://www.iis.sinica.edu.tw/pages/joshko/

實驗室網址(Research Information) :
https://josh-hs-ko.github.io

Email :
joshko@iis.sinica.edu.tw
張原豪
Chang, Yuan-Hao
嵌入式系統、非揮發性記憶體與類神經網路之電腦系統研究

Study on embedded system, non-volatile memory, and computer system for neural network
主題一: 嵌入式深度學習之系統設計與優化
主題二: 高效圖形處理系統設計與研究-從裝置到系統及應用
主題三: 非揮發性記憶體為基礎的系統設計以支援大數據及記憶體內運算
(1) 重新設計高效的儲存系統與大數據應用系統
(2) 基於記憶體內大數據運算之新記憶體/儲存體系統設計


Topic 1 - System Design and Optimization for Embedded Deep Learning:
This project focuses on embedded deep learning with the support of (1) emerging memory or non-volatile memory technologies and (2) heterogeneous computing architectures with various computation accelerators (e.g., GPU and FPGA). We will tackle critical issues of embedded deep learning in (1) storage and I/O bandwidth, (2) memory preprocessing, and (3) heterogeneous computing.

Topic 2 - GraphStor: Efficient & Recoverable Graph-processing System Design, from Devices to Systems and Applications:
GraphStor aims at designing an efficient and recoverable graph-processing system. As shown in Figure 1, it intervenes at every level from the graph analysis to the memory and storage system design: (1) Graph Analysis: GraphStor will investigate/analyze the processing behavior of large-scale graph applications and co-design graph algorithms with heterogeneous computing units. (2) Operating System: To unleash the potential of the emerging memory/storage devices (including non-volatile devices), GraphStor will reconsider the current design of both memory management (e.g., data prefetching, and I/O swapping optimization) and file system by achieving graph-friendly data selection and placement. (3) Storage System: GraphStor will achieve graph-friendly data indexing by rethinking the correlation between the graph-related data and different storage technologies such as NVM, Open-Channel SSDs, and SMR HDDs. (4) Device Verification: To achieve a verified high-performance storage stack, GraphStor will provide verified storage stack with new physical device design and implementation with a stronger guarantee of crash recovery.

Topic 3 - NVM-based System Design to Support Big Data Storage and In-memory Computing:
Big data has garnered much attention from academia and industry because of the explosive growth of data volume in fields such as social networks, the Internet of Things, and cloud storage services. Data warehousing and in-memory computing (or in-memory databases) are two widely discussed topics in this area, because big data requires cost-effective, large-capacity storage devices for data warehouses as well as high-performance system architectures (e.g., in-memory computing) for data processing. For instance, to maintain big data cost-effectively in a data warehouse, several vendors apply emerging storage devices, such as 3D flash memory and shingled magnetic recording (SMR), in their storage systems; however, the high-density design of these devices degrades their performance. In-memory computing, on the other hand, can pair emerging memory technologies (e.g., phase-change memory and STT-MRAM) with DRAM to reduce the energy consumption of the memory system, but at the cost of significant performance degradation. Providing a cost-effective and performance-efficient memory-storage system for big data applications has therefore become an emerging and important research topic. Building on our results on storage systems, we will continue our memory and storage system research in this direction, in close collaboration with academic and industrial institutes. We have two main focuses:
(1) Redesign cost-effective emerging storage system for high-performance big data application
(2) New memory-storage system design for in-memory big data computing
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/johnson/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/johnson/

Email :
johnson@iis.sinica.edu.tw
吳真貞
Wu, Jan-Jan
異質系統架構之人工智慧編譯器設計及優化技術研究

AI Compiler for Servers/Embedded Systems with Heterogeneous Architectures
近年來,許多深度學習模型 (例如CNN, RNN, LSTM, GAN) 已被廣泛應用於各個領域。除此之外,計算機架構也朝向多元發展 (例如CPU, GPU, GPGPU, DSP, FPGA, AI加速器等),無論是高階伺服器或是嵌入式系統,我們也觀察到這些平台上通常配備了異質或非對稱性 (heterogeneous/asymmetric) 的運算裝置,如CPUs + GPUs, and/or FPGAs。在這樣多個深度學習模型與多個運算架構的組合下,我們需要一個方法,能夠有效地將各種深度學習模型對應到各種計算機架構。為完成此目標,本實驗室將設計一個優化深度學習的編譯器 (AI Compiler)。針對異質性或非對稱性平台,深度學習編譯器將根據平台上運算裝置的特性或限制,以及分析深度學習模型圖像,編譯出最佳的協同運算方式。此編譯流程涉及異質運算,排程,以及編譯器優化技術。效能的可移植性 (performance portability) 和運算裝置的可擴展性 (extensibility) 將是設計此編譯器的重要研究目標。另外,我們也將研究如何利用機器學習理論來輔助我們找到最佳的編譯優化方案。

In recent years, deep learning has become a rapidly growing trend in big data analysis and has been successfully applied to various fields. Many deep learning models (e.g., CNN, RNN, LSTM, and GAN) have been shown to work very well for recognizing images, natural language, and so on. In this AI domain, we need a way to run such deep learning models efficiently on a wide diversity of computing architectures. High-end servers may be equipped with powerful computing devices, e.g., a combination of high-end CPUs, GPUs, FPGAs, and AI accelerators. Small embedded systems may have only a low-end CPU or DSP and a small memory capacity. Achieving the best performance for a given combination of computing devices and deep learning models requires different compilation strategies. Hence, there is a need for an optimizing compiler framework for deep learning, that is, an AI compiler. The goal of the AI compiler is to translate deep learning models and optimize their performance according to the capabilities of the hardware architecture, using a single optimizing compiler framework that can compile for a variety of hardware architectures. Performance portability and extensibility are also key issues in the design of the AI compiler.
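The scheduling side of this problem can be illustrated with a toy placement pass. The following is only a sketch under stated assumptions, not the lab's compiler: the operator names, the per-device run times, the transfer cost, and the function `place` are all hypothetical, and a greedy earliest-finish-time heuristic stands in for whatever optimization the actual AI compiler performs.

```python
from typing import Dict, List, Tuple

# Hypothetical per-operator run times (ms) on each device; illustrative numbers only.
COST: Dict[str, Dict[str, float]] = {
    "conv1":   {"cpu": 8.0, "gpu": 1.0},
    "relu1":   {"cpu": 0.5, "gpu": 0.4},
    "fc1":     {"cpu": 2.0, "gpu": 1.5},
    "softmax": {"cpu": 0.3, "gpu": 2.0},
}
# Operators in topological order with their predecessors (a simple chain here).
GRAPH: List[Tuple[str, List[str]]] = [
    ("conv1", []),
    ("relu1", ["conv1"]),
    ("fc1", ["relu1"]),
    ("softmax", ["fc1"]),
]
TRANSFER_MS = 1.0  # hypothetical cost of moving a tensor between devices


def place(graph, cost, transfer_ms):
    """Greedily assign each operator to the device where it finishes earliest."""
    device_free = {dev: 0.0 for dev in next(iter(cost.values()))}
    placement, finish = {}, {}
    for op, preds in graph:
        best = None
        for dev in device_free:
            # Inputs are ready once every predecessor has finished, plus a
            # transfer if that predecessor ran on a different device.
            ready = max(
                (finish[p] + (0.0 if placement[p] == dev else transfer_ms)
                 for p in preds),
                default=0.0,
            )
            end = max(ready, device_free[dev]) + cost[op][dev]
            if best is None or end < best[0]:
                best = (end, dev)
        finish[op], placement[op] = best
        device_free[best[1]] = best[0]
    return placement, finish


placement, finish = place(GRAPH, COST, TRANSFER_MS)
print(placement)
```

With these made-up costs the first three operators land on the GPU and the final softmax moves to the CPU, because its GPU run time outweighs the transfer penalty. A real compiler would also have to model memory limits and operator fusion, which this sketch ignores.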
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/wuj/

實驗室網址(Research Information) :
https://www.iis.sinica.edu.tw/page/research/ComputerSystem.html?lang=zh

Email :
wuj@iis.sinica.edu.tw
蔡懷寬
Tsai, Huai-Kuang
生物資訊

Bioinformatics
如果把基因體想像成一張唱片,播放出來的歌曲就可類比為基因體的轉錄產物。如果拿一首歌曲讓大家來試唱,每個人唱出來的歌聲都是獨一無二;但是,我們基因體序列的差異其實少於 0.1%。基因體絕對不是一堆ATCG的序列,裡面蘊藏許多有意義的樣式跟特性。我們研究的興趣是了解基因體怎麼來運作,如何來調控基因轉錄的產物。我們著重於轉錄因子以及長鏈非編碼RNA對於基因表現的調控,利用生物資訊方法整合各種生物大數據以及建立機器學習模型來分析與探勘基因體的特性。

If a genome is compared to a CD, the melody it plays is the genome's transcription products. Everyone's voice is unique even when we sing the same song; however, our genome sequences differ by less than 0.1%. Indeed, a genome is not just a sequence of ATCG, but contains many meaningful patterns and features. Our research focuses on how genomes function and how the expression of their transcription products (RNA) is regulated. Specifically, we study the roles that transcription factors and long non-coding RNAs play in transcriptional regulation. We apply bioinformatics approaches, including integration of multi-omics data and machine learning methods, to test our hypotheses and explore the hidden rules in the genome.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/hktsai/

實驗室網址(Research Information) :
http://bits.iis.sinica.edu.tw/?id=1

Email :
hktsai@iis.sinica.edu.tw
王志宇
Wang, Chih-Yu
機器學習在無線與社群網路中的應用

Machine Learning in Wireless and Social Networks
從事無線網路、社群網路、資料科學、機器學習、網路經濟等相關研究。著重於結合機器學習與無線網路和網路經濟相關研究,以學理研究為主,亦有參與產學合作的機會

We study the potential of machine learning in wireless and social networks. The research is mainly academically oriented, but there are also opportunities to join industrial collaboration projects if interested.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/cywang/

實驗室網址(Research Information) :
https://snaclab.citi.sinica.edu.tw/

Email :
cywang@citi.sinica.edu.tw
宋定懿
Sung, Ting-Yi
蛋白體及蛋白基因體之生物資訊及大數據分析

Bioinformatics and big data analysis for proteomics and proteogenomics studies
蛋白質是基因最後的產物,在細胞內執行各種不同的生物功能;在醫藥研究方面,蛋白質是最主要的藥物標的。因此在後基因體時代,蛋白體研究也因質譜儀實驗技術的精進而蓬勃發展;癌症研究由基因體分析跨入蛋白體、甚至蛋白基因體的研究。癌症基因體的研究已出現瓶頸,例如:無法回答為何相同基因呈現的癌症病人,使用針對該基因的藥物而有不同的藥效,有些病人有效,有些無效;此外,蛋白體分析比基因體可確認更多種癌症的亞型,能較精準地用藥。在此追求精準醫療的時刻,癌症相關的蛋白體、蛋白基因體的研究亦發重要。

蛋白體和蛋白基因體研究都是使用質譜儀的實驗技術,再針對質譜儀產生的大數據進行資料分析,以瞭解癌症腫瘤中蛋白體表現,甚而進行基因體層次的突變在蛋白質層次的驗證。本實驗室是台灣極少數進行蛋白體學生物資訊研究的實驗室,我們這16年來專攻蛋白體學研究上質譜儀大規模資料處理之計算方法及軟體系統開發。我們另一研究領域是蛋白基因體分析,希望從龐大的質譜實驗資料找出與癌症相關的特定蛋白質及其中變異胜肽,目前雖已有初步成果,但需建立一套完整的分析與計算方法,希望能讓患者在早期就能被偵測出來,以減少遺憾。我們實驗室亦參與由中研院、台大醫學院及其他機構共同執行的台灣癌症登月計畫(Taiwan Cancer Moonshot Project)。此外,目前已經有龐大被鑑定的質譜圖資料能公開取得,我們也希望透過無監督學習或機器學習進行蛋白體研究。

我們實驗室所訓練出來的人才,也是國外亟需的人才,出國唸書時都能申請到獎學金,或於博士畢業後到美國如:Johns Hopkins U、U of Michigan醫學院擔任博士後,也有國外著名蛋白體研究學者主動來邀請成員加入其機構。我們竭誠歡迎有志學習、有熱情的同學,尤其是資訊領域的同學,加入暑期實習。

Proteins are the final products of genes and execute various biological functions in cells. In biomedicine, proteins are also the most prominent drug targets. In the post-genomic era, with the advancement of mass spectrometry (MS) technology, proteomics has therefore received ever-increasing attention in cancer research. Although genomics studies can identify actionable genomic mutations for therapies, many actionable mutations do not respond to targeted therapy, and many responses are temporary. It has also been reported that proteomics studies can detect new, clinically associated cancer subtypes in addition to those found by genomics studies. Proteomics and proteogenomics studies have thus recently become essential in precision medicine for cancer research.

Mass spectrometry is the most commonly used experimental technology for proteomics and proteogenomics research. With the advancement of MS technology, high-throughput MS data are being generated, and the analysis of such big MS data is a very important topic. Our lab is one of the very few labs in Taiwan conducting research on bioinformatics for proteomics. For 16 years, our lab has worked on mass spectrometry data analysis, including algorithm design and software development. We are also interested in proteogenomics studies that detect genomic or transcriptomic variations at the protein level from MS data, because those variations can be related to cancers. We are developing computational methods for identifying variant peptides in specific proteins. Though we have some preliminary results, we need to develop a data analysis pipeline to facilitate the discovery of variant peptides and their validation. We are also participating in the Taiwan Cancer Moonshot Project, in collaboration with other institutes in Academia Sinica, National Taiwan University Medical School, and other organizations. Furthermore, because a huge number of MS spectra with peptide annotations are publicly available, we are also interested in research on clustering for proteomic subtyping of diseases and machine learning for proteomics analysis.
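To make the idea of detecting a variant peptide from MS data concrete, here is a minimal sketch of why a single amino-acid substitution is visible to the instrument: it shifts the peptide's mass. The peptide sequences below are hypothetical and the helper names are introduced only for illustration; this is not the lab's pipeline, which would also have to handle fragmentation spectra, modifications, and database search.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one H2O accounts for the peptide's free termini
PROTON = 1.00728   # mass added per positive charge


def peptide_mass(sequence: str) -> float:
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence: str, charge: int) -> float:
    """Mass-to-charge ratio of the peptide ion at the given positive charge."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


# Hypothetical reference peptide and a variant with an R -> K substitution.
reference = "SAMPLER"
variant = "SAMPLEK"
shift = peptide_mass(variant) - peptide_mass(reference)
print(f"mass shift caused by the substitution: {shift:.5f} Da")
```

The shift here is simply the difference between the lysine and arginine residue masses (about -28.006 Da). Note that leucine and isoleucine have identical masses, so a substitution between them cannot be detected from mass alone.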

Our lab members have consistently received fellowships when applying for further study in the US, or offers of post-doctoral positions at medical schools of US universities. We invite students who major in informatics or statistics and are interested in cancer research to apply.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/tsung/

實驗室網址(Research Information) :
http://ms.iis.sinica.edu.tw/Comics/

Email :
tsung@iis.sinica.edu.tw
陳祝嵩
Chen, Chu-song
深度模型整合與多媒體應用

Deep Model Integration and Its Applications to Multimedia
中研院資訊所 IVC Lab 由陳祝嵩教授領導,目前執行科技部AI專案計畫「異質性深度模型整合與檢索特徵學習」。研究主題包含兩個方向:
—深度學習—
含模型壓縮與簡化、深度模型整合、深度永續終身學習、強化學習環境探索、醫療大數據分析、整合影像與聲音的身份辨認等。
—電腦視覺—
含影像與視訊中的物體偵測、人物偵測(人臉、衣著),情緒辨認、美感判讀、空拍機影像偵測、瑕疵檢測、圖像修復、影像序列 3D 建模、醫學影像分析、及大型影像庫快速檢索等。

IVC Lab, led by Dr. Chu-Song Chen, is funded by the MOST project "Merging heterogeneous deep models and learning retrieval features" under the MOST Joint Research Center on AI Technology and All Vista Healthcare. The main research directions include:
-- Deep Learning --
deep model compression and simplification, deep model merging, continual lifelong learning, reinforcement learning for environment exploration, medical data analysis, multi-modal face verification.
-- Computer Vision --
object detection in videos, human detection (face, clothing), emotion/expression recognition, aesthetic value assessment, defect detection, 3D reconstruction from image sequences, medical image analysis, and image retrieval from large datasets.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/song/

實驗室網址(Research Information) :
http://imp.iis.sinica.edu.tw/

Email :
song@iis.sinica.edu.tw; chusong@csie.ntu.edu.tw
楊柏因
Yang, Bo-Yin
美國國家標準局徵選新一代後量子密碼學標準第三階段實作

Implementations of NIST postquantum standardization 3rd round candidates
美國國家標準局目前正在徵選新一代後量子密碼學標準;預計在2020年六月展開第三(最後)階段。第三階段中除將審視參選密碼系統的安全性之外亦將考量實作性能諸元。我們的工作包括實作及其正確性的驗證。請一起來動手做,你可能協助改變這世界的走向和改進數十億人的安全。

The U.S. National Institute of Standards and Technology (NIST) is running its standardization process for next-generation postquantum cryptography. The final (third) round is expected to start in June 2020. During this round, the efficiency of implementations will be considered in addition to security. Our work includes implementations and the verification of their correctness. Please come and help: you might change the direction of the world and improve security for billions of people.
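As a rough illustration of pairing an implementation with a correctness check, the sketch below compares two ways of multiplying polynomials in the ring Z_q[x]/(x^n + 1), an operation used by several lattice-based schemes. Everything here is an assumption made for illustration: the toy parameters `N = 8` and `Q = 17` belong to no real candidate, the function names are hypothetical, and randomized differential testing is a far weaker stand-in for the kind of correctness verification the project description refers to.

```python
import random

N, Q = 8, 17  # toy parameters, not those of any NIST candidate


def mul_reference(a, b):
    """Schoolbook product in Z_Q[x]/(x^N + 1), reducing at every step."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                # x^N is congruent to -1, so the term wraps around with a sign flip.
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c


def mul_lazy(a, b):
    """Same product, computed as a full 2N-1 term product with one final reduction."""
    full = [0] * (2 * N - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            full[i + j] += ai * bj
    return [
        (full[k] - (full[k + N] if k + N < 2 * N - 1 else 0)) % Q
        for k in range(N)
    ]


def differential_test(trials=1000, seed=0):
    """Return True if both implementations agree on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randrange(Q) for _ in range(N)]
        b = [rng.randrange(Q) for _ in range(N)]
        if mul_reference(a, b) != mul_lazy(a, b):
            return False
    return True


print(differential_test())
```

Agreement on random inputs only raises confidence; it cannot rule out bugs triggered by rare inputs, which is one reason stronger verification techniques matter for cryptographic code.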
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/byyang/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/byyang/

Email :
byyang@iis.sinica.edu.tw, by@crypto.tw
馬偉雲
Ma, Wei-Yun
廣告文案或新聞的自動生成/不限主題的閒聊機器人/自動知識學習系統/事實推論或事件預測系統

Automatic Advertisement or News Generation/Chitchat Chatbot/Automatic Knowledge Acquisition/Fact Reasoning or Event Prediction
今年的實習我們邀請同學透過深度學習(Deep Learning)進行以下專案的其中之一,並鼓勵暑期實習生在實習期間能大膽地進行技術或應用的創新。

1. 廣告文案或新聞的自動生成:當輸入是一款手機的規格表,系統能自動生成出一篇具有說服力的廣告文案。或是當輸入是一場NBA的比賽數據表,系統能自動生成出一篇緊張刺激的播報新聞。我們希望透過深度學習當中的增強式學習(Reinforcement Learning)以及語言模型,打造一個這樣的文字生成系統,能夠一方面忠於輸入的表格內容,另一方面能發揮創造力,寫出多變化又文情並茂的文章。

2. 不限主題的閒聊機器人:即所謂 Chitchat Chatbot,也就是沒有特定目的的聊天. 目前這類型的bot大多數的作法是利用深度學習當中的seq-to-seq model來建構,但是,這樣的作法通常無法產生有意義或是較為深入的回應,多數會流於插瞌打渾或是賣萌。其中的關鍵,在於bot缺少了對於聊天主題相關的基本常識,就像是user要跟bot討論劉德華,bot應該對劉德華的各個fact(身份,作品...)有足夠認識,回的response才會豐富有意義,不然巧婦難為無米之炊,沒有知識就不容易產生有意義的回應,我們希望將grounded knowledge以及更豐富的語義訊息encode在model之中。我們透過深度學習當中的增強式學習(Reinforcement Learning),已經訓練出一個不限主題的LINE閒聊機器人-詞庫小妍(LINE官方帳號:@359mcmgs)。

3. 自動知識學習系統:我們知道新的知識會夜以繼日的不斷產生,一個具有AI能力的系統最重要的功能之一就是能夠從大量的資料當中,分析資料,加以理解,組織成結構化知識。我們實驗室過去已經開發了人類的知識網(E-HowNet),打下堅實基礎,此專案的目標是進一步加以擴張,利用深度學習技術將關鍵的關係三元組合從閱讀的文章中自動抽取出來,如 (”哈登” ,MemberOf,”火箭隊”) 或是 (“麥特載蒙”,PlayerOf,”心靈捕手”)等等。

4. 事實推論或事件預測系統:對於一個新事物,人們往往會根據基本常識、已知的事實、經驗的法則等等進行新事物的推測,包含事實或是事件的推論,例如以下的事實推論:已知A說中文,A又是B的哥哥,那麼很高的機率B也會說中文。又例如以下的事件推論:“買麵包”後會有很高的機率會在近期“吃麵包”。在一個龐大的文本或是複雜的知識圖譜當中,推論的關係往往數量龐大,有時甚至複雜到超越人力所能規範與理解,我們希望藉由深度學習技術能自動化的在文本或是知識圖譜當中進行新事物的推測。


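This year we invite interns to work on one of the following projects using deep learning, and we encourage bold innovation in techniques or applications during the internship.

1. Automatic advertisement or news generation: Given the specification sheet of a mobile phone, the system generates a persuasive advertisement; given the statistics table of an NBA game, it generates an exciting news report. We aim to build such a text generation system with reinforcement learning and language models, so that the output stays faithful to the input table while remaining creative, varied, and well written.

2. Open-domain chitchat chatbot: That is, conversation without a specific purpose. Most bots of this kind are built with seq-to-seq models, which usually fail to produce meaningful or in-depth responses and tend to fall back on banter or cuteness. The key problem is that the bot lacks basic knowledge about the topic of conversation. For example, if a user wants to talk about Andy Lau, the bot needs to know enough facts about him (identity, works, and so on) to give rich and meaningful responses. We hope to encode grounded knowledge and richer semantic information in the model. Using reinforcement learning, we have already trained an open-domain LINE chatbot, 詞庫小妍 (LINE official account: @359mcmgs).

3. Automatic knowledge acquisition: New knowledge is produced around the clock, and one of the most important functions of an AI system is to analyze and understand large amounts of data and organize them into structured knowledge. Our lab has already developed E-HowNet, which provides a solid foundation. This project aims to extend it by using deep learning to automatically extract key relation triples from text, such as ("Harden", MemberOf, "Rockets") or ("Matt Damon", PlayerOf, "Good Will Hunting").

4. Fact reasoning or event prediction: Faced with something new, people infer facts or events from common sense, known facts, and rules of experience. An example of fact reasoning: if A speaks Chinese and A is B's older brother, then B very likely speaks Chinese as well. An example of event reasoning: "buying bread" is very likely to be followed by "eating bread" in the near future. In a huge corpus or a complex knowledge graph, such inference relations are numerous and sometimes too complex for people to specify and understand. We hope to use deep learning to automate this kind of inference over text or knowledge graphs.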
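The fact-reasoning example in topic 4 (A speaks Chinese, A is B's older brother, so B probably speaks Chinese) can be written down over relation triples like those in topic 3. The sketch below applies one hand-written rule, which is exactly the kind of manual specification the project hopes to replace with learned inference; the relation names `speaks` and `olderBrotherOf` and the function `infer_language` are hypothetical.

```python
# A knowledge base as a set of (head, relation, tail) triples.
facts = {
    ("A", "speaks", "Chinese"),
    ("A", "olderBrotherOf", "B"),
}


def infer_language(known):
    """Apply one rule: a person's younger sibling probably speaks the same language."""
    inferred = set()
    for head, relation, language in known:
        if relation != "speaks":
            continue
        for sibling_head, sibling_relation, sibling in known:
            if sibling_relation == "olderBrotherOf" and sibling_head == head:
                candidate = (sibling, "speaks", language)
                if candidate not in known:
                    inferred.add(candidate)
    return inferred


new_facts = infer_language(facts)
print(new_facts)
```

A single rule is easy to write, but the number of useful rules in a large knowledge graph quickly becomes unmanageable by hand, and a rule like this one holds only with some probability rather than always.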
PI個人首頁(PI's Information) :
https://www.iis.sinica.edu.tw/pages/ma/index_zh.html

實驗室網址(Research Information) :
https://ckip.iis.sinica.edu.tw/

Email :
ma@iis.sinica.edu.tw
鄭湘筠
Cheng, Hsiang-Yun
適用於資料密集程式之新世代記憶體系統設計

Energy-efficient NVM-based systems for data-intensive applications
可位元存取之非揮發性記憶體(NVM)是極具發展潛力之記憶體技術並可能成為未來記憶體與存儲系統的主流,許多公司開始逐漸推出相關產品,例如 Intel 的Optane Memory,來將非揮發性記憶體應用於電腦系統的記憶體架構中。與傳統記憶體相比,非揮發性記憶體具有高密度低漏電之特性,且兼具存儲與運算功能,能夠在記憶體內平行完成大量基於邏輯演算之數學運算以及矩陣乘法運算,然而,非揮發性記憶體在寫入資料時較耗時耗能,且在運算精準度與架構設計上有尚待克服之挑戰。本實習計劃的目標為針對資料密集程式之應用情境,探討非揮發性記憶體不同層面上之設計挑戰,包括電路與元件階層、計算結構階層、及演算法階層,並以軟硬體協同設計的方式, 設計高效能低耗電之新世代記憶體系統。實習生可選擇參與下列研究主題,或其他相關研究議題。

1. 利用非揮發性記憶體內部的運算能力,設計高效能且滿足運算精準度需求之深度學習加速器。
2. 利用非揮發性記憶體內部的運算能力,設計圖論分析演算法之加速器。
3. 根據非揮發性記憶體之特性,利用軟硬體協同設計提升基因序列比對之效能。
4. 根據非揮發性記憶體之特性,利用軟硬體協同設計提升資料檢索與圖論分析演算法之效能。


Byte-addressable non-volatile memories (NVMs) are promising and are likely to become mainstream in the near future. Products such as Intel’s Optane memory are emerging in the market and demonstrate the use of these new technologies in the memory hierarchy. Compared to traditional DRAM, emerging NVMs offer near-zero leakage power and high density, and retain data without a power supply. In addition, they can be exploited to perform logic-based arithmetic and matrix-vector multiplications directly in memory. Nevertheless, the write latency and energy of NVMs are high, and several design challenges need to be addressed to realize robust and efficient computation. Our goal is to study the design challenges of NVM-based systems at different system layers, including the device/circuit level, architecture level, and algorithm level, for data-intensive applications. Based on our analysis, we aim to propose cross-layer co-design frameworks for NVM-based systems to improve energy efficiency. Candidate topics include, but are not limited to, the following:

1. Cross-layer co-design to improve the reliability and energy efficiency of NVM-based deep learning accelerators.
2. Exploiting processing-in-NVM capability to accelerate graph analytics algorithms, and proposing cross-layer techniques to improve energy efficiency.
3. NVM-aware software-hardware co-design for DNA sequence alignment.
4. NVM-aware software-hardware co-design for database indexing and graph analytics.
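The accuracy challenge of computing matrix-vector products inside memory can be sketched with a toy model. The assumptions are loud ones: each weight is stored as one of a few evenly spaced conductance levels, cell variation is modelled as Gaussian noise, and the function names and numbers are hypothetical. Real devices and circuits behave in more complicated ways than this.

```python
import random


def quantize(weight, levels, w_max=1.0):
    """Snap a weight in [0, w_max] to one of `levels` (>= 2) evenly spaced states."""
    step = w_max / (levels - 1)
    return round(weight / step) * step


def crossbar_mvm(weights, x, levels=None, noise=0.0, rng=None):
    """Compute y = W x as accumulated cell currents.

    `levels` limits the number of programmable states per cell and `noise`
    is the standard deviation of a Gaussian disturbance added to each cell.
    """
    rng = rng or random.Random(0)
    y = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            g = quantize(w, levels) if levels else w
            if noise:
                g += rng.gauss(0.0, noise)
            acc += g * xi
        y.append(acc)
    return y


W = [[0.2, 0.7], [0.9, 0.4]]
x = [1.0, 0.5]
exact = crossbar_mvm(W, x)
coarse = crossbar_mvm(W, x, levels=4)
print(exact, coarse)
```

With only four levels per cell, the result drifts from roughly [0.55, 1.10] to roughly [0.67, 1.17]. Errors of this kind are one reason a deep learning accelerator built on such arrays needs cross-layer techniques to meet its accuracy requirements.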
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/hycheng/

實驗室網址(Research Information) :
http://www.citi.sinica.edu.tw/pages/hycheng/

Email :
hycheng@citi.sinica.edu.tw
陳駿丞
Chen, Jun-Cheng
AI Forensics for Manipulated Face Detection: Deepfake Detection, Face Anti-spoofing, and Adversarial Attack

竄改人臉檢測之人工智慧鑑識家: 深偽偵測、人臉反欺騙、對抗樣本攻擊
隨著深度學習的成熟,人工智慧技術開始對於我們的日常生活產生越來越大的影響。如深度人臉識別,它已經在幾個具有挑戰性的資料集中取得了超越人類的性能,例如著名的Labeled Face in the Wild數據集中的優異識別性能,該技術已在各種身份驗證場景中被廣泛採用,包括移動設備登錄,ATM,在線購物等。但儘管深度學習方法的優異性能,但許多研究學者發現在惡意攻擊下,它們的安全性不如我們預期可靠。這個暑期實習的目標是探索和研究相關的安全問題,這些問題可以幫助用戶檢測和抵抗這些惡意攻擊,包括人臉欺騙攻擊(例如,演示攻擊,顯示攻擊等)和最近的深偽攻擊。通過對該專案的研究,我們期望能達成的完備事實驗證和安全身份驗證奠定更堅實的基礎,並遏止假內容的散播。

With the recent breakthroughs in deep learning, artificial intelligence is entering our lives through practical intelligent applications and having an increasing impact on us. One prominent example is deep face recognition, which has already surpassed human performance on several challenging benchmarks, such as the Labeled Faces in the Wild (LFW) dataset, and has been widely adopted in various authentication scenarios, including mobile device login, ATMs, and online shopping. Although deep learning approaches have achieved superior performance in various supervised learning tasks, many researchers have found that under malicious attacks they are not as secure and robust as we expect. The goal of this summer internship project is to explore and work on the related security problems, helping users detect and resist these malicious attacks, including face spoofing attacks (e.g., presentation attacks and display attacks), in which spoofed content is used to deceive machines, and recent deepfake attacks, in which spoofed content is used to deceive humans. Through this project, we expect to establish a stronger foundation for fact verification and secure authentication in more diverse applications.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/pullpull/

實驗室網址(Research Information) :
http://www.citi.sinica.edu.tw/pages/pullpull/

Email :
pullpull@citi.sinica.edu.tw
黃文良
Hwang, Wen-Liang
深層神經網路分析,演算法,與數值實驗

Deep Neural Network analysis, algorithm, and numerical experiments
以運算子進行大規模數值最佳化

Large scale optimization with operators
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/whwang/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/whwang/

Email :
whwang@iis.sinica.edu.tw
高明達
Ko, Ming-Tat
台語語音辨識與合成

Taiwanese Speech Recognition and Synthesis
台語語音辨識系統與合成系統的研究與實做。內容包括:語音辨識技術,語音合成技術的研究,台語詞典的製作,台語語音語料庫的製作。

Research and implementation of Taiwanese speech recognition and synthesis. Topics include speech recognition and synthesis techniques, Taiwanese dictionary and Taiwanese speech corpus.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/mtko/

實驗室網址(Research Information) :
http://slam.iis.sinica.edu.tw/

Email :
mtko@iis.sinica.edu.tw
許聞廉
Hsu, Wen-Lian
「簡化法」在自然語言理解之各項應用

Applications of the reduction approach on natural language understanding
自然語言理解是人工智慧一直以來沒有處理得很好的問題。近年來自然語言處理的研究方法主要分為規則式(RB)以及機器學習(ML,包括統計及深度學習)兩種方法。然而,RB需仰賴大量人力建構及維護規則,ML不擅於處理「推論」。這些問題導致目前自然語言理解的研究面臨不少的瓶頸。我們實驗室新開發的自然語言「簡化法」是一個全新的方法,應可免除上述演算法的限制,我們將使用「簡化法」來處理許多自然語言的應用,包括中文及生物醫學領域之專有名詞辨識(Named Entity Recognition, NER)及關聯擷取(Relation Extraction, RE)、中文問答系統(Question Answering, QA)對談系統(Dialogue system)、機器閱讀(Machine reading)、小學數學問題解析(Analysis of mathematical word problems in primary school)、語音理解(Chinese speech understanding)、文件主題偵測(Topic detection)、意見分析(Opinion mining)等。

Natural language understanding has remained a tough problem in artificial intelligence (AI). Recent approaches to natural language processing can be roughly categorized into two types: rule-based (RB) methods and machine learning (ML) methods (including statistical and deep learning methods). The former rely on a large number of hand-crafted rules, and the latter are not good at inference. These problems have hindered research on natural language understanding for years. Our lab has recently designed a new "reduction" approach, which seems able to overcome the above limitations. We will focus on applying the reduction approach to the following problems: named entity recognition, question answering, machine reading, analysis of mathematical word problems in primary school, Chinese input method, topic detection, opinion mining, etc.
PI個人首頁(PI's Information) :
http://iasl.iis.sinica.edu.tw/hsu/zh/

實驗室網址(Research Information) :
http://iasl.iis.sinica.edu.tw/index.html

Email :
hsu@iis.sinica.edu.tw
黃彥男
HUANG, YENNUN
IoT/Web系統、嵌入式視覺系統開發及AI工業數據分析

IoT / Web systems, embedded vision system development, and AI industrial data analysis
1. IoT系統開發
- 以下經驗須具備其一, 須實際具備toolchain / crosscompile之經驗
-- Arduino開發經驗
-- Micro Linux韌體開發經驗
-- Raspberry PI GPIO介面開發經驗
- 具備藍芽偵測、連線、資料傳遞開發經驗
- 未來將於TI之藍芽平台上開發應用, 須學習新開發平台
- 具備Linux驅動程式開發經驗尤佳
- 具備web開發經驗尤佳

2. Web系統開發
- 具備以PHP / Javascript開發網站之能力, 須精熟
- 具備web串接API之經驗
- 對Python / Linux有基本了解, 有精進意願
- 熟悉MQTT等message queue流程, 或願意學習
- 未來將負責網站開發與維運, 網站非一般內容網站, 需要與IoT設備密切聯繫
- 具備IoT設備開發經驗尤佳

3. 嵌入式視覺系統開發及AI工業數據分析
- 電機,機械,資工,工管科系優
- 須具備程式撰寫能力(C, python, MATLAB)
- 具備嵌入式系統開發能力(Raspberry Pi)
- 具備數據分析基礎能力(資料探勘,訊號/影像處理,機器學習等)

1. IoT System Development
- Experience with toolchains / cross-compilation is required, along with at least one of the following:
-- Arduino
-- Micro Linux Firmware
-- GPIO interface on Raspberry PI
- Development experience with Bluetooth discovery, connection, and data transfer
- Will develop Bluetooth applications on Texas Instruments platforms, so learning a new development platform is required
- Linux driver development experience is a plus
- Web development experience is a plus

2. Web development
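- Must be proficient in PHP / JavaScript web development
- Experience integrating web applications with APIs
- Basic understanding of Python / Linux and willingness to improve
- Familiar with MQTT or similar message queue workflows, or willing to learn
- Will be responsible for website development and operations; the site is not an ordinary content site and communicates closely with IoT devices
- IoT device development experience is a plus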

3. Embedded vision system development and AI industrial data analysis
- Students from electrical engineering, mechanical engineering, computer science, or industrial management departments preferred
- Must have programming skills (C, Python, MATLAB)
- Embedded system development skills (Raspberry Pi)
- Basic data analysis skills (data mining, signal / image processing, machine learning, etc.)
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/yennunhuang/

實驗室網址(Research Information) :
http://www.citi.sinica.edu.tw/pages/yennunhuang/

Email :
starmoon6619@gmail.com
莊庭瑞
Chuang, Tyng-Ruey
Three Topics: 1. Maps Re-imagined; 2. Open Repository for Research Data; 3. Communal Data Sharing

Three summer interns sought:

1. To participate in our experimentation on re-imagining historical maps by using OpenStreetMap and other open source tools. For an introduction, please see our paper "Maps Re-imagined: Digital, Informational, and Perceptional Experimentations in Progress" (presented at Digital Humanities 2019):

https://www.iis.sinica.edu.tw/~trc/public/publications/DH2019/

2. To participate in our work in building an open repository for research data. For an introduction, please refer to "Improving Data Discovery through Wikidata" (presented at WikidataCon 2019) <http://media.academia.tw/u/trc/m/improving-data-discovery-through-wikidata-pdf/> and "Retooling An Open Data Repository for A Research Data Repository" (presented at ECAI 2018) <http://media.academia.tw/u/trc/m/depositar-ecai-pnc-2018-slide/>. The data repository is open to the public at:

https://data.depositar.io/about

3. To participate in our research into models of communal data sharing. For an introduction, please see our paper "Governance of Communal Data Sharing":

http://media.academia.tw/u/trc/m/chapter-12-good-data/

PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/~trc/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/~trc/
https://data.depositar.io/

Email :
trc@iis.sinica.edu.tw
葉彌妍
Yeh, Mi-Yen
巨量資料探勘與深度學習於人工智慧應用

Big Data Mining and Learning on AI applications
本實習將探討如何利用資料探勘與深度學習技術,設計相關演算法,以解決人工智慧應用的問題,可能的議題包含:(1)網路顯示廣告即時競標之獲勝價格與點擊率預測模型 (2)利用知識圖譜做問答知識推論 (3)以Graph Convolutional Network (GCN)為主之圖形資料學習應用 (4)資料探勘與機器學習之基礎演算法探究

In this internship, we will study big data mining and machine learning techniques and design algorithms to solve related AI application problems. Potential research topics include: (1) winning-price and click-through-rate prediction models for real-time bidding in display advertising, (2) knowledge inference for question answering based on knowledge graphs, (3) applications and advanced algorithm design for graph convolutional networks (GCNs), and (4) foundations of data mining and machine learning.
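For readers new to topic (3), a single graph convolution step can be sketched in a few lines. This follows the commonly used propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W); the tiny graph, the features, and the weights below are made up for illustration, and the pure-Python matrix code is chosen for self-containment rather than speed.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj):
    """Add self-loops, then scale symmetrically by the square roots of node degrees."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [
        [a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def gcn_layer(adj, features, weight):
    """One propagation step: mix neighbour features, project, apply ReLU."""
    z = matmul(matmul(normalized_adjacency(adj), features), weight)
    return [[max(0.0, value) for value in row] for row in z]


# A path graph 0 - 1 - 2 where only node 0 carries a feature.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0], [0.0], [0.0]]
weight = [[1.0, -1.0]]
out = gcn_layer(adj, features, weight)
print(out)
```

After one layer the feature of node 0 has reached its neighbour node 1 but not node 2, which is two hops away. Stacking layers widens the neighbourhood that each node can see.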
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/miyen/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/miyen/

Email :
miyen@iis.sinica.edu.tw
周彤
Chou, Tung
後量子密碼學

Post-quantum cryptography
本實習的內容為後量子密碼學研究。研究內容大致可分為「建設性」研究以及「破壞性」研究。建設性研究包含使用軟體或硬體來實作高效率且安全的密碼系統。破壞性研究則包含破解密碼系統背後的數學難題或是找出設計上的漏洞。參與本實習的同學將學習當代最新的後量子密碼系統如何運作,並且實際參與進行密碼系統的實作或攻擊。

This internship involves research on post-quantum cryptography. Broadly, there are two types of cryptographic research: constructive and destructive. Constructive research includes building efficient and secure software or hardware implementations of cryptosystems. Destructive research includes breaking the underlying "hard" problems or finding weaknesses in a design. Students joining this internship will learn how the latest post-quantum cryptosystems work and are expected to implement cryptosystems or carry out attacks themselves.
PI個人首頁(PI's Information) :
https://tungchou.github.io/

實驗室網址(Research Information) :
https://tungchou.github.io/

Email :
blueprint@citi.sinica.edu.tw