2019 研究主題清單 (2019 Research List)


主持人 (PI) | 研究主題 (Research Topic) | 研究介紹 (Introduction) | 其他資訊 (Other Information)
陳孟彰
Chen, Meng Chang
複合神經網路:理論與應用

Composite Neural Network: theory and implementation
本計畫針對複合神經網路進行研究,以探索其性能與各種性質,並提出演算法,以及建置雛型系統,讓非AI專家之使用者可以遵循使用,設計可解決其問題的複合網路。近年來深度學習技術在影像、聲音、自然語言等領域大放異彩,但是對複雜的應用,例如PM2.5濃度預測或是股價預測等,則至今尚無一個好的方法。探究其主要原因是複雜應用的影響因素多,因素之間或與結果關係複雜,所以沒有一個數學模式可以解釋,也自然沒有單一深度學習神經網路可以徹底解決問題。本計畫即為針對這個問題來進行研究,提出解答。

This project addresses the essential issues of composite neural networks, including the performance characteristics and algebraic properties of neural network composition, algorithms for discovering an optimal equivalent neural network, and the development of a GUI prototype that helps users design a composite network to solve their complicated problems.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/mcc/

實驗室網址(Research Information) :
http://ants.iis.sinica.edu.tw

Email :
mcc@iis.sinica.edu.tw
曹昱
Tsao, Yu
結合多模態 (聽覺、視覺、觸覺) 資訊之口語溝通輔具開發

Assistive Hearing and Speaking Devices based on Integrating Multi-modality Information (Audio, Visual and Tactile Cues)
語音辨識為多感官共同合作的過程,聽覺管道和視覺管道訊息的整合為辨識語音中不可或缺的歷程。聽者藉由「讀話」(speechreading),一邊聆聽語音,一邊觀察說話者的嘴型、臉部表情、手勢等,以獲得完整的語音訊息。而當欲聆聽的語音訊息不完整時,例如當聽者身處吵雜的情境中時,聽覺和視覺的訊息整合能力更有其重要性。研究發現,溝通障礙的族群整合視聽覺訊息的能力呈現困難,而聽力損失或老化亦是造成視聽覺整合能力不佳的影響因素。另一方面,在機器自動語音辨識的研究中亦發現,語音及影像事件的整合能更有效增強語音、提高自動語音辨識的結果。因此,本研究將應用多感官訊息整合的技術,整合語音、嘴型、臉部表情與手勢動作的資訊,提出一套創新型的聽覺輔具,以協助聽損者提升語音辨識的成效。研究成果亦將發展為一套整合多感官訊息之創新型聽能訓練平台,作為聽語專業人員為聽損者進行訓練及評量成效的工具。


Speech recognition is a process that draws on not only the auditory modality but also the visual modality. The role of visual information is particularly prominent when auditory speech is presented at a less favorable signal-to-noise ratio. In human communication, the integration of auditory and visual information is a mandatory process, and speech disorders may result from difficulties in integrating and processing heard and seen speech signals.

Our research aims to include visual and tactile information to enhance the recognition of speech signals for individuals with hearing impairment. Toward this end, we plan to work on a multimodal framework that extends the integrated deep and ensemble learning algorithm. The role of visual information in recognizing tonal speech, especially Mandarin Chinese, will first be explored in human-computer interfaces to devise the optimal parameters for integrating audio and visual information in the proposed framework. We will then evaluate the derived optimization criteria and test the proposed framework structure using objective speech quality indices and subjective human speech perception tests. It has been verified that brain responses can be used as an indicator of speech perception results. To achieve an unbiased and direct evaluation, we plan to incorporate electroencephalography (EEG) measurements into the evaluation system to check rehabilitation progress and optimize the rehabilitation program. The optimized framework structure will be developed into a multimodal training tool that hearing-impaired individuals can practice with consistently and regularly during speech-language intervention and aural rehabilitation.

The proposed project is expected to deliver three tools: one for enhancing the hearing-assistive framework, one for performance evaluation, and a multimodal training program for hearing-impaired individuals.

PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/yu.tsao/

實驗室網址(Research Information) :
http://www.citi.sinica.edu.tw/pages/yu.tsao/

Email :
yu.tsao@citi.sinica.edu.tw
徐讚昇
Hsu, Tsan-sheng
資料密集運算的高效率實做

Efficient implementation of massive data computing
一些實驗室因實際應用而發展出之資料密集運算演算法的高效率實做

This summer, we will seek ways to efficiently implement several data-intensive computing algorithms that our lab has designed for real applications.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/~tshsu

實驗室網址(Research Information) :
http://chess.iis.sinica.edu.tw/lab/

Email :
carol@iis.sinica.edu.tw
蘇克毅
Su, Keh-Yih
應用深度學習進行智慧型問答與跨文件處理

DNN-based Intelligent QA and Multi-Document Processing
自然語言理解(Natural Language Understanding, NLU)為人工智慧當前最重要的研究領域之一。其中「機器閱讀」是指電腦能夠自行透過閱讀學習知識(read to learn)、並能以學習的知識來增強自己的閱讀能力(learn to read),本研究室目前致力於建立機器閱讀系統,並應用於智慧型問答以及多文件處理。此研究專題的重點將放在 (1) 把領域背景知識融入深度學習網路(DNN),以增進 DNN 之學習效率;(2) 建立基於知識庫(Knowledge Base)的問題回答系統,加強問題辨識能力與推論能力;(3)結合遠監督式機器學習與 DNN,以期能減少領域知識的訓練資料量,並有效擴增知識庫。

Natural Language Understanding (NLU) is one of the most popular fields in AI research. One of its major tasks, Machine Reading, enables computers to obtain knowledge from given texts with the aid of logical inference, and aims not only to "read to learn" but also to "learn to read". We are building a machine reading system with applications to intelligent QA systems and multi-document processing. In this project, we expect to integrate domain knowledge into DNNs to increase their performance, and to apply distant supervision to reduce the amount of training data needed for domain knowledge while improving inference power and enlarging the knowledge base.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/kysu/

實驗室網址(Research Information) :
http://nlul.iis.sinica.edu.tw/

Email :
kysu@iis.sinica.edu.tw
蘇黎
Su, Li
音樂資訊檢索、音樂人工智慧、音樂互動系統、音樂訊號處理、計算音樂學

Music information retrieval, music artificial intelligence, music interactive systems, music signal processing, computational musicology
音樂與文化科技實驗室(Music and Culture Technology Lab)成立於2017年。我們致力於研發最先端的數位訊號處理、深度學習技術,應用在各種結合音樂與人工智慧的議題上,包括自動採譜、機器鑑賞、即時音樂互動、生成式音樂、計算音樂學等,其應用場域橫跨音樂之聆賞、分析、製作、展演等活動,期能展開科技與人文的深度對話,促進音樂文化融入生活。

The Music and Culture Technology Lab was founded in 2017. We devote ourselves to developing cutting-edge digital signal processing and deep learning techniques for music and AI, such as automatic music transcription, machine connoisseurship, real-time interactive music systems, generative music, and computational musicology. Applications span music listening, analysis, production, and even performance. Our goal is to launch a deep and fruitful dialogue between technology and the humanities, and to make music culture a part of everyday life.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/lisu/

實驗室網址(Research Information) :
https://sites.google.com/view/mctl/

Email :
lisu@iis.sinica.edu.tw
王新民
Wang, Hsin-Min
語音、語言與音樂處理

Speech, Language and Music Processing
我的研究興趣是語音處理、自然語言處理、多媒體資訊檢索、機器學習與模型識別,研究目標是開發多媒體音訊(主要是語音與音樂)分析、抽取、辨識、索引、檢索及生成技術。進行中的研究工作包括自動語音辨識、語者辨識、語音轉換(例如說話人語音轉換、中性語音轉表達性語音)、語音文件檢索/摘要、問答系統、音樂資訊檢索、音樂生成等。

My research interests include speech processing, natural language processing, multimedia information retrieval, machine learning, and pattern recognition. The research goal is to develop techniques for analysis, extraction, recognition, indexing, retrieval and generation of multimedia audio data (mainly speech and music). The ongoing research includes automatic speech recognition, speaker recognition, voice conversion (e.g., speaker voice conversion and neutral speech to expressive speech conversion), spoken document retrieval and summarization, question answering, music information retrieval, music generation, etc.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/whm/

實驗室網址(Research Information) :
http://slam.iis.sinica.edu.tw/

Email :
whm@iis.sinica.edu.tw
楊得年
Yang, De-Nian
網路與巨量資料分析、網路最佳化、機器學習、演算法設計

Network and Big Data Analysis, Network Optimization, Machine Learning, Algorithm Design
1. 網路、社群網路、資料探勘或應用數學相關研究
2. 或演算法設計、圖論、最佳化、賽局理論、機器學習、統計推論、隨機程序相關背景之數學研究
3. 或網路或巨量資料分析系統實作、程式撰寫

1. Network, Social Network, Data Mining, Applied Math Research
2. Algorithms, Graph Theory, Optimization, Game Theory, Machine Learning, Statistical Inference, Stochastic Processes
3. Programming and System Implementation for Network and Big Data Analysis
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/dnyang/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/dnyang/

Email :
dnyang@iis.sinica.edu.tw
修丕承
Hsiu, Pi-Cheng
嵌入式深度學習於電腦視覺應用

Embedded Deep Learning for Computer Vision Applications
中研院資創中心電腦視覺實驗室 & 嵌入式暨行動計算實驗室 和台大電機/台大資工的研究團隊 (包含Computer Vision、Deep Learning、Embedded Systems、Hardware Chips的團隊) 合作一個跨領域的計畫做Embedded Deep Learning for Computer Vision and Graphics,目標是讓計算能力有限的裝置也有能力做深度學習,執行特定的電腦視覺與圖學應用。學生將由修丕承博士與林彥宇博士共同指導,從事嵌入式系統與深度學習研究。

Deep learning technologies are at the core of the current AI revolution. The affordable GPU hardware and large annotated datasets jointly allow the training of deep learning models with hundreds of layers and millions of parameters. The deep learning-enabled breakthroughs result in great successes in numerous AI research fields. However, deep learning models typically have severe demands on the computing resources of devices, which makes leveraging the power of deep learning on resource-constrained platforms or in real-time applications infeasible. The mission of this project is to address this issue and increase the applicability of deep learning. Students involved will be co-advised by Dr. Pi-Cheng Hsiu and Dr. Yen-Yu Lin and gain excellent experience in embedded systems and deep learning research.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/pchsiu/

實驗室網址(Research Information) :
http://emclab.citi.sinica.edu.tw/
http://cvlab.citi.sinica.edu.tw/

Email :
pchsiu@citi.sinica.edu.tw
張原豪
Chang, Yuan-Hao
(1)嵌入式深度學習與(2)次世代記憶體內運算之系統設計與優化

(1) Embedded Deep Learning and (2) System Design and Optimization for Next-generation In-memory Computing
(1) 嵌入式深度學習
近年類神經網路被廣泛用於許多領域,而隨著大型神經網路的發展,神經網路訓練階段對記憶體的需求日漸增加。此趨勢下,由於具備低耗能與高空間價位比等特性,相變化記憶體 (Phase Change Memory) 被認為是足以取代動態隨機存取記憶體 (DRAM) 的非揮發性記憶體。然而將該記憶體用於主記憶體仍面臨過長的存取時間以及有限的寫入次數 (壽命) 等挑戰。本計畫利用我們觀察到之神經網路資料行為來設計資料行為感知的編程技術 (Data-Aware Programming Module) 以優化相變化記憶體之存取速度與壽命。
   本研究計畫主要的目的在研發能提升相變化記憶體類神經訓練系統 (PCM-Based NN-Training Systems) 之效能與壽命的關鍵技術。計畫第一年的主軸著重於提升寫入編程 (write programming) 的效能,我們利用觀察到的data flow行為設計新的寫入編程,同時為了讓該資料行為能從CPU傳至相變化記憶體的控制器,我們亦提出了一套控制器協同設計。在計畫第二年,我們將觀察到的data content行為整合進第一年提出的編程技術,並提出一套雙模式寫入 (dual-SET programming),使得寫入效能有更進一步的提升。計畫第三年的主軸為提升記憶體之壽命,我們不僅根據資料行為設計一套抹除平衡器 (wear leveler),更考慮相變化記憶體於長短期將遭遇之可靠性議題。

(2)次世代記憶體內運算之系統設計與優化
記憶體內運算一直以來都是相當熱門探討的議題,然而在傳統儲存系統架構,由於動態隨機存取記憶體(DRAM)需要消耗電力來刷新(Refresh)記憶體內的資料,避免資料遺失,因此,利用動態隨機存取記憶體之儲存架構一直無法在平台閒置(Idle)狀態時降低其系統耗能,以致於在低耗能記憶體內運算的研究上碰到了無法突破之瓶頸,有幸,近年來次世代的可位元存取非揮發性記憶體 (non-volatile memory)快速發展,依據其非揮發、位元組存取與超低閒置耗能等特性,是相當具有潛力同時取代未來DRAM。此外,由於傳統式儲存裝置也受到物理上的發展限制,因此有更高密度的儲存技術—疊瓦式硬碟(SMR)以及多流可位元修改三維快閃記憶體的技術被推出,因此,基於以上兩個主要之次世代儲存裝置,本計畫將重新設計記憶體內運算系統中與資料存取及應用相關之系統核心,藉由系統核心與相關應用的重新思考設計來發揮出次世代新興記憶儲存裝置之極致效能,並達成超綠能(Ultra-low green)之記憶體內運算系統設計


(1) Embedded Deep Learning
In recent years, neural networks (NNs) have been applied to a wide range of application domains. Their fast-growing scale places a fast-growing demand on main-memory capacity during model training. However, the high unit cost and leakage power of DRAM pose serious challenges to scaling up its capacity. Given this trend, phase-change memory (PCM) is becoming a popular candidate because of its low cost, high density, and near-zero leakage power. Nevertheless, compared with DRAM, PCM suffers from insufficient write performance and limited endurance. On the other hand, based on our observations, neural networks show special access behaviors when accessing the weights, biases, and outputs of each NN layer. This observation motivates the project to design and develop key enabling techniques for PCM-based NN systems, resolving these challenges through a data-aware programming module.
   This project aims to enable a data-aware programming module for PCM-based NN-training systems, resolving the major challenges of using PCM as the main memory of such systems. In the first year, we will focus on improving write performance by exploiting data flow behaviors. Specifically, we will not only conduct a series of experiments to observe the data flow behaviors, but also design a PCM programming approach that is aware of them. Meanwhile, to bridge the information gap between the CPU and the PCM controller, we will explore a controller co-design approach that passes NN information to the controller. In the second year, we will further improve write performance by designing another programming approach that trades off programming accuracy against performance with the help of the observed behaviors of data content. In addition, to make current PCM programming support the proposed approach, we will design the dual-SET programming operation. Notably, the programming approaches of the first two years can be integrated for further performance improvement. In the third year, we will focus on the limited-endurance issue. We will not only design an NN-training-friendly wear leveler by exploiting data behaviors, but also consider two urgent reliability issues of PCM devices, i.e., the long-term un-leveling and the short-term data corruption issues.
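
The idea of a wear leveler mentioned for the third year can be illustrated with a minimal table-based sketch. This is a generic scheme and not the project's NN-aware design: the class name `WearLeveler`, the `threshold` parameter, and the swap policy are all assumptions made for illustration.

```python
class WearLeveler:
    """Toy wear leveler (hypothetical): remaps a hot logical block
    to the least-worn physical block once the wear gap grows too large."""

    def __init__(self, num_blocks, threshold=4):
        self.mapping = list(range(num_blocks))  # logical block -> physical block
        self.wear = [0] * num_blocks            # write count per physical block
        self.threshold = threshold              # tolerated wear gap before a swap

    def write(self, logical):
        phys = self.mapping[logical]
        # Physical block with the fewest writes so far.
        coldest = min(range(len(self.wear)), key=self.wear.__getitem__)
        if self.wear[phys] - self.wear[coldest] >= self.threshold:
            # Swap the two logical blocks that own `phys` and `coldest`.
            other = self.mapping.index(coldest)
            self.mapping[logical], self.mapping[other] = coldest, phys
            self.wear[phys] += 1  # migrating the cold data costs one write
            phys = coldest
        self.wear[phys] += 1      # the requested write itself
        return phys


# One logical block is written 100 times; the writes get spread out.
leveler = WearLeveler(num_blocks=4, threshold=4)
for _ in range(100):
    leveler.write(0)
spread = max(leveler.wear) - min(leveler.wear)
```

Without remapping, a single physical block would absorb all 100 writes; with it, the gap between the most and least worn blocks stays small. A data-aware leveler would replace the fixed threshold with information about which data are written most often.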

(2) System Design and Optimization for Next-generation In-memory Computing
In-memory computing is currently a popular research topic in computer system design. To develop an ultra-green in-memory computing system, dynamic random access memory (DRAM) can be replaced by non-volatile memory, because DRAM must consume electric power to refresh its memory cells to prevent data loss, even when the platform is idle. Owing to several attractive features of persistent random access memory (PRAM), such as non-volatility, byte-addressability, and high access performance, PRAM has become a promising candidate to replace DRAM as next-generation memory. In addition to PRAM, shingled magnetic recording (SMR) and 3D NAND flash memory with multi-stream and bit-alterable technologies are next-generation storage technologies for the data storage and data backup of in-memory computing applications. In particular, due to the physical limitations of traditional storage drives, SMR is one of the candidates for replacing conventional hard disk drives (HDDs) to further increase storage density and overall per-drive storage capacity. Based on these emerging memory and storage technologies, this project aims to rethink and redesign the data management mechanisms of next-generation in-memory computing systems so as to exploit their benefits, and to build an ultra-green in-memory computing system based on PRAM, 3D flash memory, and SMR.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/~johnson/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/johnson/

Email :
johnson@iis.sinica.edu.tw
王志宇
Wang, Chih-Yu
機器學習在無線與社群網路中的應用

Machine Learning in Wireless and Social Networks
從事無線網路、社群網路、資料科學、機器學習、網路經濟等相關研究。著重於結合機器學習與無線網路和網路經濟相關研究,以學理研究為主,亦有參與產學合作的機會

We study the potential of machine learning in wireless and social networks. The research is mainly academic, but opportunities to join industrial collaboration projects are available to interested students.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/cywang/

實驗室網址(Research Information) :
https://snaclab.citi.sinica.edu.tw

Email :
cywang@citi.sinica.edu.tw
黃文良
Hwang, Wen-Liang
適用於影像分割的深度學習演算法

Deep Learning Methods for Image Segmentation
透過設計數值最佳化的模型,學習適合的 Graph Laplacians ,進一步使用 Normalized Cuts 的技術來實現圖形分割。

We design a numerical optimization model that learns an appropriate graph Laplacian matrix, and then perform image segmentation using the Normalized Cuts technique.
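
To make the Normalized Cuts criterion concrete, the sketch below evaluates the normalized-cut value of a bipartition, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), and finds the best split of a tiny graph by exhaustive search. The affinity matrix `W` is a made-up example, and brute force is used only because the graph is tiny; in practice the criterion is usually relaxed to a generalized eigenvalue problem on the graph Laplacian, and the Laplacian learned in this project would take the place of the hand-written weights.

```python
from itertools import combinations


def ncut(weights, part_a):
    """Normalized-cut value of the bipartition (part_a, remaining nodes)."""
    nodes = range(len(weights))
    a = set(part_a)
    b = set(nodes) - a
    cut = sum(weights[i][j] for i in a for j in b)
    assoc_a = sum(weights[i][j] for i in a for j in nodes)
    assoc_b = sum(weights[i][j] for i in b for j in nodes)
    return cut / assoc_a + cut / assoc_b


def best_bipartition(weights):
    """Exhaustive search over bipartitions; feasible only for tiny graphs."""
    n = len(weights)
    best = None
    for size in range(1, n // 2 + 1):
        for part in combinations(range(n), size):
            score = ncut(weights, part)
            if best is None or score < best[0]:
                best = (score, set(part))
    return best


# Hypothetical affinities for six pixels: two tight groups {0, 1, 2} and
# {3, 4, 5}, joined by a single weak edge between pixels 2 and 3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
score, segment = best_bipartition(W)
```

The search separates the two tight groups, because cutting the weak edge costs little relative to the total connection weight of each side.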
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/whwang/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/whwang/

Email :
whwang@iis.sinica.edu.tw
陳祝嵩
Chen, Chu-song
深度學習技術開發與整合

Deep Learning Techniques Development and Integration
深度學習模型之開發,簡化,與整合,及其在電腦視覺與多媒體領域上的應用。

Deep learning model development, reduction, and integration, together with their applications in computer vision and multimedia.
PI個人首頁(PI's Information) :
https://www.semanticscholar.org/author/Chu-Song-Chen/1720473

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/song/
https://scholar.google.com.tw/citations?user=WKk6fIQAAAAJ&hl=zh-TW&oi=ao

Email :
song@iis.sinica.edu.tw
王釧茹
Wang, Chuan-Ju
表示學習演算法於人工智能之應用

Representation learning and its applications
異質性資料涵括各式結構化(如:消費記錄、產品規格)及非結構化數據(如:網友文字評論),其各自的資料結構及特徵空間大不相同,因此如何進行彼此間的關聯、整合及推論仍屬當代人工智能技術及其相關應用的一大挑戰。然而透過機器學習的非監督式學習法則有可能將異質性資料表現於共同的特徵空間之中,倘若又能在此空間中獲得優良的資料表示法,則可作為異質性資料分析的穩固基石。因此,本研究主題從深度學習及網路表示法的框架切入,深入探究其空間轉換的特性及其保留的訊息,並將針對不同的資料型式及應用情境設計對應之演算法。除了演算法設計及理解外,本實習亦具有以下三個特色:1) 將使用真實世界的資料進行資料分析及學習;2) 將學習如何在unix-like 環境下處理大量資料並運行實驗;3) 將學習如何使用網頁前端技術進行結果之視覺化呈現。



The research topics relate to processing and understanding heterogeneous data (including texts, pictures, audio signals, social relations, and user behaviors) and to using deep learning and/or network embedding techniques for various AI-enabled applications. In addition to model design, during the internship the participant will also 1) gain hands-on experience with real-world data, 2) learn how to handle large-scale data and conduct experiments on Unix-like systems, and 3) learn how to visualize the learned results using front-end web programming techniques.
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/cjwang/

實驗室網址(Research Information) :
https://cfda.csie.org

Email :
cjwang@citi.sinica.edu.tw
林仲彥
Lin, Chung-Yen
生物醫學大數據資料解析

Analyze Biomedical Big data for Genome Biology
我們的團隊主要研究模式與非模式生物之多維基因體學(OMICS),包括基因體、轉錄體、單細胞轉錄體、蛋白質交互網路、腸道微生物與疾病關連等巨量資訊數據分析,同時也定序、重組與註解了多個重要經濟生物之基因體,研究團隊並專注於跨領域的研究工作,歡迎不同領域(資訊、統計、數學及生物相關)的人才一起合作。研究範圍以人體幹細胞、水生經濟動物及環境微生物為主,同時發展新的高速計算工具及雲端分析平台,以及引入深度學習等策略,來探討基因、病原與環境的三角互動關係。

The main goal of our team is to analyze omics big data, which may reveal the secrets of biological regulation hidden in the massive data deluge. By combining open-source tools with self-developed programs and platforms, we have assembled, annotated, and decoded several aquatic genomes of high economic importance. New approaches such as deep learning will be introduced to refine our studies.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/cylin/

實驗室網址(Research Information) :
http://eln.iis.sinica.edu.tw

Email :
cylin@iis.sinica.edu.tw
鐘楷閔
Chung, Kai-Min
密碼學、複雜度理論或量子密碼學之獨立研究

Independent Research on Cryptography, Complexity Theory, or Quantum Cryptography
The intern is expected to perform independent research on selected topics in (Quantum) Cryptography, Complexity Theory, or general theoretical computer science (TCS) that interest him/her. This often starts by surveying research papers and presenting them to the PI. Along the way, the intern can identify research questions with the PI, study the questions independently, and discuss them with the PI in research meetings. Candidate topics include, but are not limited to, Quantum Key Distribution (QKD), Post-quantum Cryptography, Lattice-based Cryptography, Differential Privacy, Non-malleable Codes, Device-independent Cryptography, PRAM Cryptography, Zero Knowledge, Randomness Extractors, etc.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/kmchung/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/~kmchung/

Email :
kmchung@iis.sinica.edu.tw
古倫維
Ku, Lun-Wei
(1) 多模態故事生成  (2) 基於知識庫問答系統  (3) 網路文章推薦系統

(1) Multimodal Story Generation  (2) Knowledge-based Question Answering  (3) Content-based Recommendation System
在這些研究主題中,將學習到自然語言處理之資訊擷取、文章分類、文字生成、推薦系統等概念,另涵蓋自然語言基礎工具的使用及機器學習、深度學習的模型建立等先進技術,可與老師討論希望選擇的研究主題。實習期間會專注於上述研究主題並參與模型開發及論文撰寫。各主題研究內容詳述如下:

(1) 在多模態故事生成專案中,我們注重在圖像故事生成 (看圖說故事),目前研究方向上,首先會辨識出每張圖片中的物件及動作當作素材,並且使用這些素材來構成前後呼應並與圖片相關的故事。

(2) 在知識庫問答系統(KBQA)中,我們研究如何將自然語言的問題與知識庫的關係計算相似性的分數。技術上包括自然語言理解(NLU)和信息檢索(IR)。在 NLU 方面,藉由建構神經網絡模型,利用Attention、Residual等技術來提升模型效能。在IR方面,藉由開發演算法來解決如何在近乎無限量級的推論關係中找到正確的關係。

(3)在推薦系統專案中,我們與廠商合作,研究真實文章和使用者的線上紀錄,並利用NLP,推薦系統和深度學習的技巧,提升推薦系統的效果。過程中,我們不僅會開發技術,視進度撰寫論文投稿頂級會議,也會實際到合作公司討論。

實習結束後,表現優良的同學可繼續與實驗室合作研究並發表論文。



Interns will learn how to use basic natural language processing tools, extract information from texts, classify documents, build recommendation systems, and generate dialogs. Machine learning and deep learning technologies for NLP will also be covered. Interns can select the topic/team they wish to join.

(1) In the multimodal storytelling project, we focus on visual storytelling, in which a machine generates a story from a given image sequence. In the current project, we first detect terms, i.e., entities and actions, in each image and then use these terms to compose a coherent story.

(2) In knowledge base question answering (KBQA), we work on matching natural language questions to structured relations in a knowledge base. In general, KBQA is a task that combines natural language understanding (NLU) and information retrieval (IR). On the NLU side, we build neural network models with attention, residual connections, and similar techniques to enhance model performance. On the IR side, we develop algorithms to find the correct relation among all the relations in the knowledge base, whose number is regarded as nearly unlimited.
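
The question-to-relation matching step can be sketched as scoring every candidate relation against the question and ranking the results. The example below substitutes a bag-of-words cosine similarity for the neural encoders described above, purely to show the shape of the computation; the relation names and their textual descriptions are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_relations(question, relations):
    """Score each candidate relation against the question, best first."""
    q = Counter(question.lower().split())
    scored = [
        (cosine(q, Counter(description.lower().split())), name)
        for name, description in relations.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical knowledge-base relations with short textual descriptions.
relations = {
    "people.person.place_of_birth": "where was person born place of birth",
    "people.person.spouse": "who is person married to spouse",
    "film.film.directed_by": "who directed film director",
}
ranking = rank_relations("where was Marie Curie born", relations)
best_score, best_relation = ranking[0]
```

A learned model would replace `cosine` with a similarity between encoded question and relation vectors, while the ranking step stays the same.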

(3) In the recommendation system project, we will obtain real-world articles and user logs collected by our partner company, and apply NLP, content-based filtering, and deep learning techniques to these data to enhance the performance of the recommendation system. During the process, we will not only write papers for submission to top conferences, depending on progress, but also hold discussions with the partner company, which deploys recommendation systems on actual websites.

After the internship, students with good performance can continue to work with the laboratory to research and publish papers.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/lwku/

實驗室網址(Research Information) :
http://academiasinicanlplab.github.io/

Email :
lwku@iis.sinica.edu.tw
廖純中
Liau, Churn-Jung
巨量資料運算暨管理實驗室

Massive Data Computation and Management Laboratory
詳見http://chess.iis.sinica.edu.tw/lab/?cat=2

Please see http://chess.iis.sinica.edu.tw/lab/?cat=2
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/liaucj/

實驗室網址(Research Information) :
http://chess.iis.sinica.edu.tw/lab/?cat=2

Email :
liaucj@iis.sinica.edu.tw
吳真貞
Wu, Jan-Jan
雲端與行動裝置之協同式深度學習系統

A Cloud and Mobile Devices Based Collaborative Deep Learning Framework
訓練一個好的深度學習模型通常需要大量且具有標籤的資料。根據目前資訊傳播及資料生成的速度,我們很容易就能夠透過網路收集到大量的資料。但是這些資料可能並不具有標籤,或者標籤的正確性需要被驗證。然而,以人工的方式去添加及驗證資料標籤的可靠程度是不太實際的。另一方面來說,手機是另一個對深度學習所需的資料很好的來源。手機使用者每天都在手機上產生大量的資料,並在有意或無意間透過使用者行為,為這些資料加上標籤。然而,基於保護隱私的原因,使用者可能不想提供這些資料。因此,我們需要一個有效率的方法,透過雲端伺服器與手機的合作,在兼顧使用者隱私及手機資源有限的情況下,解決因資料量不足無法訓練好的深度學習模型這個問題。

在本計畫中,我們提出一個系統架構,透過雲端伺服器與使用者裝置的合作,能夠自動為資料產生標籤,並透過多次運行我們提出的機制,提高產生的標籤的可靠程度。本計畫提出的系統架構首先利用雲端伺服器充足的運算能力及少量的標籤資料,訓練一個初步的共享卷積神經網路 (Convolutional Neural Network, CNN)。該CNN透過使用者下載到手機端,在手機端上藉由使用者資料學習改進。這些改進後的模型會被傳送到伺服器端,透過我們設計的機制,為伺服器上未標籤的資料提出建議的標籤,並藉此改進共享CNN。在這個過程中,使用者的資料並沒有離開手機裝置,因此可確保使用者隱私。該系統對使用者的誘因是能夠藉此獲得表現更好的共享模型,且不用大量消耗手機上電量與運算資源。

Training a deep neural network with only a small amount of labeled data is challenging. With today's high velocity of data generation, it is easy to gather a huge amount of data into a cloud system. However, these data may not be properly labelled, and labelling them manually is time-consuming and impractical. On the other hand, mobile devices are a possible source of data for deep learning. Mobile device users generate a large amount of data every day while advertently or inadvertently adding labels to the data. However, users may not wish to share these labelled data with the cloud server due to privacy concerns. As a result, we need an efficient approach to resolve the lack of sufficient labelled data for training a deep neural network.

In this project, we propose a framework that automatically generates labels for data used in deep learning, improves the quality of these generated labels on the cloud servers, and most importantly, enables collaboration between cloud servers and mobile devices. Our proposed framework first trains a generic deep convolutional neural network on the cloud server and pushes the shared generic model to all mobile devices. The model on each mobile device is improved by learning from user data collected on that device. The updates to the generic model are then summarized and uploaded to the cloud server, which aggregates this feedback to improve the shared generic model in the cloud. Note that all user data remains on the device, so user privacy is preserved. The incentive for users to participate in this collaborative learning is that they benefit from an improved model, which provides better inference results, without draining their batteries or providing their private data to the cloud.
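
The server-side aggregation step can be sketched as follows. The description above does not say how device updates are combined, so this example assumes one common choice, averaging the device models weighted by the number of local samples (often called federated averaging); the function name and the flat list representation of model weights are illustrative assumptions.

```python
def federated_average(client_updates):
    """Combine device models by a sample-count-weighted average.

    client_updates: list of (weights, num_samples) pairs, where `weights`
    is a list of floats of the same length for every device (assumed format).
    """
    total = sum(num_samples for _, num_samples in client_updates)
    dim = len(client_updates[0][0])
    merged = [0.0] * dim
    for weights, num_samples in client_updates:
        for k in range(dim):
            merged[k] += weights[k] * num_samples / total
    return merged


# Two hypothetical devices: the second trained on three times as much data,
# so its weights count three times as much in the shared model.
updates = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
global_weights = federated_average(updates)
```

Only the model weights and sample counts reach the server in this sketch; the raw user data never appears in `client_updates`.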
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/wuj/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/page/research/ComputerSystem.html?lang=zh

Email :
wuj@iis.sinica.edu.tw
王柏堯
Wang, Bow-Yaw
Transprivacy - 一個強化隱私之透明化架構

Transprivacy - A Transparent Framework for Privacy Enhancement
本計畫將利用區塊鏈以實作Transprivacy架構。

In this project, we will use Blockchain to implement the Transprivacy framework.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/~bywang

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/~bywang

Email :
bywang@iis.sinica.edu.tw
穆信成
Mu, Shin-Cheng
函數編程與程式正確性推理之相關問題

Functional Programming and Program Reasoning
我的研究興趣是程式語言與函數編程,近年來也包括 concurrent 程式的型別系統與 Hoare logic. 它們的共同點是使用符號推理的方式確保程式的正確性。不論在哪個典範中,我們都希望把寫程式視作一個可用數學與邏輯方式推理的行為。程式的正確性可用型別系統或邏輯推演保證,甚至可用規格與需求開始,經由數學方法一步步推導出程式。

本領域可做的大方向包括
* 設計幫助推理用的符號、程式語言、型別系統等。
* 挑選一些演算法問題,嘗試以數學方法實際證明演算法之正確性,或將演算法推導出來。
* 研究 concurrent 程式以及其型別系統 (session type) 與邏輯之關係。
* 以函數編程語言為工具,開發 Hoare logic 與指令式語言編程課程使用的教學系統。

如對以上題目有興趣,在三個月的實習期間,我們可用一到一個半月的時間學習相關理論(函數編程、型別、邏輯等),用剩下的時間研究新東西或開發系統。

My research interests concern programming languages and functional programming, and extend to Hoare logic and type systems for concurrent programs. The common theme is that programming is seen as a formal, mathematical activity. The correctness of a program can be guaranteed by logical reasoning or by a type system, or a program can even be derived stepwise from its specification.

Possible topics include:

* design symbols, languages, or type systems that aid programmers in reasoning about programs;

* pick an algorithm, and apply our approaches to prove its correctness or even to derive an algorithm;

* study the type system (session type) for concurrent programs and its relationship with logic;

* develop tools for teaching Hoare logic and reasoning of imperative programs, using a functional programming language.
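
As a small illustration of Hoare-style reasoning about an imperative program, the sketch below annotates integer division by repeated subtraction with a precondition, a loop invariant, and a postcondition. The example and the use of Python `assert` statements as runtime stand-ins for logical assertions are illustrative choices; a formal proof would establish these assertions once and for all rather than check them at run time.

```python
def divide(a, b):
    """Integer division by repeated subtraction, annotated Hoare-style."""
    assert a >= 0 and b > 0                    # precondition
    q, r = 0, a
    while r >= b:
        assert a == q * b + r and r >= 0       # loop invariant
        q, r = q + 1, r - b
    # Invariant plus the negated loop guard gives the postcondition.
    assert a == q * b + r and 0 <= r < b       # postcondition
    return q, r


quotient, remainder = divide(17, 5)
```

The invariant `a == q * b + r` holds initially (q = 0, r = a) and is preserved by each iteration, since (q + 1) * b + (r - b) equals q * b + r; when the loop exits with r < b, the postcondition follows.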

More details can be discussed. If you are interested, we can spend the first 1 to 1.5 months of the internship studying the background knowledge, before diving into developing something new.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/scm/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/scm/

Email :
scm@iis.sinica.edu.tw
蔡懷寬
Tsai, Huai-Kuang
生物資訊

Bioinformatics
如果把基因體想像成一張唱片,播放出來的歌曲就可類比為基因體的轉錄產物。如果拿一首歌曲讓大家來試唱,每個人唱出來的歌聲都是獨一無二;但是,我們基因體序列的差異其實少於 0.1%。基因體絕對不是一堆ATCG的序列,裡面蘊藏許多有意義的樣式跟特性。我們研究的興趣是了解基因體怎麼來運作,如何來調控基因轉錄的產物。我們著重於轉錄因子以及長鏈非編碼RNA對於基因表現的調控,利用生物資訊方法整合各種生物大數據以及建立機器學習模型來分析與探勘基因體的特性。

If a CD is a metaphor for a genome, the melody would be the transcription products of the genome. Everyone's voice is unique even when we sing the same song, yet our genome sequences differ by less than 0.1%. Indeed, a genome is not just a sequence of ATCG, but contains many meaningful patterns and features. Our research interests focus on the functions of genomes and how they regulate transcription products such as RNA expression. Specifically, we focus on how transcription factors and long non-coding RNAs play roles in transcriptional regulation. We apply bioinformatics approaches, including the integration of multi-omics data and machine learning methods, to examine our hypotheses and thus explore the hidden rules in the genome.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/hktsai/

實驗室網址(Research Information) :
http://bits.iis.sinica.edu.tw/?id=1

Email :
hktsai@iis.sinica.edu.tw
呂俊賢
Lu, Chun-Shien
人工智慧安全遇上資料隱藏

AI Safety Meets Data Hiding
AI safety has attracted considerable attention recently. We will explore:
(1) How do the small perturbations in adversarial attacks mimic the embedded information in data hiding?
(2) How can invariant representations be learned in deep learning models, and how does the learned invariant representation achieve robust hiding?
(3) How can robust adversarial attacks and robust defense schemes be learned?
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/lcs/

實驗室網址(Research Information) :
https://www.iis.sinica.edu.tw/~lcs

Email :
lcs@iis.sinica.edu.tw
黃彥男
HUANG, YEN-NUN
物聯網

Internet of Things
以機器學習為基礎之社群媒體自動化智慧代理內容投放系統 & 室內人員定位與異常偵測安全管理之穿戴式與嵌入式系統設計

Implementation of an Intelligent Agent for Automatic Social Media Posting Using Machine Learning Techniques & Physical Computing Implementation of Safety and Health Injury Prevention Using a Location-Based System and Wearable Devices
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/yennunhuang/

實驗室網址(Research Information) :
https://www.citi.sinica.edu.tw/pages/yennunhuang/index_zh.html

Email :
starmoon6619@citi.sinica.edu.tw
宋定懿
Sung, Ting-Yi
癌症蛋白體及蛋白基因體之生物資訊研究

Bioinformatics for Proteomics and Proteogenomics Analyses of Cancers
蛋白質是基因最後的產物,在細胞內執行各種不同的生物功能;在醫藥研究方面,蛋白質是最主要的藥物標的。因此在後基因體時代,蛋白體研究也因質譜儀實驗技術的精進而蓬勃發展,癌症研究由基因體分析跨入蛋白體、甚至蛋白基因體的研究。癌症基因體的研究已出現瓶頸,例如:無法回答為何相同基因呈現的癌症病人,使用針對該基因的藥物而有不同的藥效,也就是有些病人有效,有些無效;此外,蛋白體分析比基因體可確認更多種癌症的亞型,能較精準地用藥。在此追求精準醫療的時刻,癌症相關的蛋白體、蛋白基因體的研究益發重要,這也是美國國家癌症研究所大力提倡的研究課題。

蛋白體和蛋白基因體研究都是使用質譜儀的實驗技術,再針對質譜儀產生的大數據進行資料分析,以瞭解癌症腫瘤中蛋白體表現,甚而進行基因體層次的突變在蛋白質層次的驗證。本實驗室是台灣極少數進行蛋白體學生物資訊研究的實驗室,我們這15年來專攻蛋白體學研究上質譜儀資料處理之計算方法及軟體系統開發。自去年起,我們參與由中研院和台大醫學院為主共同執行的台灣癌症登月計畫(Taiwan Cancer Moonshot Project);癌症登月計畫是2016年一月由美國歐巴馬總統宣布成立,台灣也加入此計畫的聯盟。台灣團隊目前是針對肺癌進行研究,因為肺癌是癌症死因之首,且東西方肺癌病人有明顯不同的族群,例如:美國多為長期吸煙者,但台灣許多從未抽煙的婦女罹患肺癌。我們實驗室在整個計畫的團隊中,負責蛋白體的生物資訊研究與質譜大數據分析。

我們研究的目標,是希望找出與肺癌相關的特定蛋白質及其中變異胜肽,並建立一套合適的分析與計算方法,希望能讓患者在早期就能被偵測出來,以減少遺憾。我們實驗室所訓練出來的人才,也是國外亟需的人才,出國唸書時都能申請到獎學金,或到美國如:Johns Hopkins U、U of Michigan醫學院擔任博士後。我們竭誠歡迎有志學習、有熱情的同學,尤其是資訊領域的同學,加入暑期實習。


Proteins are the final products of genes and execute various biological functions in cells. Furthermore, in biomedicine, proteins are the most prominent drug targets. Therefore, in the post-genomic era, with the advancement of mass spectrometry (MS) technology, proteomics has received ever-increasing attention in cancer research. Although genomics studies can identify actionable genomic mutations for therapies, many actionable mutations do not respond to targeted therapy and many responses are temporary. It has also been reported that proteomics studies can detect new cancer subtypes with clinical association beyond those found by genomics studies. Therefore, proteomics and proteogenomics studies have recently become essential to precision medicine in cancer research, as advocated by the US National Cancer Institute.

Mass spectrometry is the most commonly used experimental technology for proteomics and proteogenomics research. As MS technology advances, it generates high-throughput data, and analyzing such big MS data is an important topic. Our lab is one of the very few labs in Taiwan conducting research on bioinformatics for proteomics. For 15 years we have focused on mass spectrometry data analysis, including algorithm design and software development. Lab members applying for further study in the US have received fellowships, and others have received offers of post-doctoral positions at US medical schools.

Since January 2018, we have been participating in the Taiwan Cancer Moonshot Project to conduct lung cancer research. Our lab is responsible for proteomics analysis and MS data analysis. We have been analyzing mass spectral data acquired from MS experiments on paired tumor and adjacent normal tissues of lung cancer patients. Since genomic mutations in lung cancer tumors have been reported in the literature, we are also interested in finding variant peptides in specific proteins, i.e., validating those genomic mutations at the proteomic level from MS data, and in developing a data analysis pipeline to facilitate the discovery and validation of variant peptides.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/tsung/

實驗室網址(Research Information) :
http://ms.iis.sinica.edu.tw/Comics/

Email :
tsung@iis.sinica.edu.tw
陳郁方
Chen, Yu-Fang
分析使用字串資料型態(string data type)變數的程式的關鍵技術

String constraint solving -- a core technique for static program analysis
1. 分析使用字串資料型態(string data type)變數的程式的關鍵技術

字串變數分析的關鍵技術為字串求解(string solving)。我們與歐洲幾個大學(瑞典Uppsala大學, 捷克Brno科大等)有固定合作,並和其他團隊維持友善的競爭關係(加拿大滑鐵盧大學, 史丹佛大學, 美國微軟, 英國牛津大學)。已經公開了一個效能很好的字串求解工具。目前我們會繼續發展新的演算法跟求解步驟推進目前工具的效能,並與目前世界上其他頂尖的工具(Z3)做整合。

2. 字串求解(String Solving)在網路程式上之應用

在目前計算機生態系統的分散式雲端架構下,各節點(客戶端、行動裝置等)之間的通訊多會使用字串資料型態變數(以下簡稱字串變數)來處理,因此驗證包含字串變數的程式的需求持續增加,尤其是在資安方面需要有好的方法檢查是否有資安漏洞。而字串求解(String solving)即為主要關鍵技術。我們計畫應用字串求解技術來檢查JavaScript程式的資安漏洞(如跨站腳本攻擊),這個計畫會需要與其他研究程式分析的團隊合作。可能的合作對象包含英國Imperial College London以及美國IBM TJ Watson 研究所。

String solving is a relatively new area in SMT (satisfiability modulo theories). It handles a variety of string constraints, including word equations, regular language membership constraints, and length constraints. In this field, we have proposed an efficient approach based on flat automata that combines flattening and CEGAR (counterexample-guided abstraction refinement) techniques. We plan to develop a novel decision procedure for quadratic word equations based on the Nielsen transformation. This procedure is expected to speed up the solving of quadratic word equations that other tools struggle with. We plan to integrate the resulting technology into Z3, a popular open-source SMT solver that provides flexible integration mechanisms for various theories. The result can serve as the back end for testing and verification systems that work on programs with string variables (e.g., scripting languages such as JavaScript and PHP).
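To make the three kinds of constraints concrete, here is a minimal brute-force sketch in Python. It is not the flat-automata or Nielsen-transformation procedure described above, and it does not use Z3: it simply enumerates candidate strings up to a length bound and checks a word equation, a regular membership constraint, and a length constraint together. The function name `solve_bounded`, the alphabet, and the example constraints are illustrative assumptions.

```python
import re
from itertools import product

def solve_bounded(alphabet, max_len, constraints):
    """Return every string x over `alphabet` with len(x) <= max_len
    that satisfies all predicates in `constraints` (naive enumeration)."""
    solutions = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            x = "".join(letters)
            if all(check(x) for check in constraints):
                solutions.append(x)
    return solutions

# Hypothetical constraints on a single string variable x:
constraints = [
    lambda x: x + "ab" == "ab" + x,                    # word equation: x.ab = ab.x
    lambda x: re.fullmatch(r"(ab)*", x) is not None,   # regular language membership
    lambda x: len(x) >= 2,                             # length constraint
]

solutions = solve_bounded("ab", 6, constraints)
print(solutions)
```

Enumeration grows exponentially with the length bound, which is one reason dedicated decision procedures are needed for realistic constraints.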
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/~yfc/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/~yfc/

Email :
yfc@iis.sinica.edu.tw
鄭湘筠
Cheng, Hsiang-Yun
適用於新世代記憶體技術與資料密集程式之記憶體系統設計

Memory system designs for emerging technologies and data-intensive applications
可位元存取之非揮發性記憶體(NVM)是極具發展潛力之記憶體技術並可能成為未來記憶體與存儲系統的主流,許多公司開始逐漸推出相關產品,例如 Intel 的Optane Memory,來將非揮發性記憶體應用於電腦系統的記憶體架構中。本實習計劃的目標為探討非揮發性記憶體對電腦系統設計所帶來的新挑戰,並針對資料密集程式之應用情境,設計高效能低耗電之非揮發性記憶體系統。實習生可選擇參與下列研究主題,或其他相關研究議題。

1. 利用非揮發性記憶體內部之運算能力,設計資料密集程式(深度學習、圖論分析等)的加速器
2. 根據非揮發性記憶體之特性,利用軟硬體協同設計設計改進資料密集程式之效能
3. 設計適用於非揮發性記憶體與存儲系統之資料管理機制以提高效能減少耗電

Byte-addressable non-volatile memories (NVMs) are promising and are likely to become the mainstream in the near future. Products such as Intel’s Optane memory are emerging in the market to demonstrate the usage of such new technologies in the memory hierarchy. Our goal is to tackle the design challenges introduced by NVMs to build energy-efficient memory systems for data-intensive applications. Candidate topics include, but are not limited to, the following:
1. Exploiting processing-in-NVM capability to accelerate data-intensive applications (including deep learning, graph analytics, etc.)
2. NVM-aware software-hardware co-design for data-intensive applications
3. Energy-efficient data management in NVM-based memory and storage systems
PI個人首頁(PI's Information) :
http://www.citi.sinica.edu.tw/pages/hycheng/

實驗室網址(Research Information) :
http://www.citi.sinica.edu.tw/pages/hycheng/

Email :
hycheng@citi.sinica.edu.tw
張佑榕
Chang, Ronald Y.
運用機器學習於智能通訊

Machine Learning for Intelligent Communications
Please see the English introduction below.

The candidate will investigate machine learning for intelligent communications and IoT applications. The candidate will develop data analytics frameworks to analyze and interpret the machine learning model in these applications. The candidate will write up research reports, slides, and/or papers. Knowledge of wireless communication and/or machine learning is a plus.

Related readings:
[1] R. Y. Chang, S.-J. Liu, and Y.-K. Cheng, "Device-Free Indoor Localization Using Wi-Fi Channel State Information for Internet of Things," IEEE Global Communications Conference (GLOBECOM), December 2018.
[2] Y.-K. Cheng and R. Y. Chang, "Device-Free Indoor People Counting Using Wi-Fi Channel State Information for Internet of Things," IEEE Global Communications Conference (GLOBECOM), December 2017.
PI個人首頁(PI's Information) :
https://www.citi.sinica.edu.tw/~rchang/

實驗室網址(Research Information) :
https://www.citi.sinica.edu.tw/~rchang/

Email :
rchang@citi.sinica.edu.tw
呂及人
Lu, Chi-Jen
深度學習的原理與應用

Deep learning: foundations and applications
研究深度學習的原理,並拓展深度學習在影像、自然語言等各個領域的應用。

Study the foundations of deep learning and explore its applications in areas such as computer vision and natural language processing.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/cjlu/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/cjlu/

Email :
cjlu@iis.sinica.edu.tw
高明達
Ko, Ming-Tat
臺語 語言與語音處理

Taiwanese Language and Speech Processing
研究的目標在於建立一個臺語語音辨識系統,工作包括臺語語音與文字語料的研發,語音辨識之深度演算法的研發。

The goal is to build a Taiwanese speech recognition system. The work includes developing Taiwanese speech and text corpora and a DNN-based speech recognition training system.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/mtko/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/page/research/ComputationTheoryandAlgorithms.html

Email :
mtko@iis.sinica.edu.tw
劉庭祿
Liu, Tyng-Luh
電腦視覺技術與人工智慧應用

Computer Vision Techniques for Artificial Intelligence Applications
You are expected to participate in research activities that focus on developing computer vision techniques for various artificial intelligence applications, including NLP+CV applications, automatic action and activity recognition, learning efficient DNN architecture, etc.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/liutyng/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/~liutyng/

Email :
liutyng@iis.sinica.edu.tw
張韻詩
Liu, Jane Win Shih
大數據室內定位系統(室內導航、室內定位、物件追蹤)

Big Data Indoor Positioning System (indoor navigation, indoor position, object tracking)
-開放災害管理資訊系統 (Open Disaster Management Information System)
-主動型智能防災應變系統 (Active emergency Response System)
-大數據室內定位系統 (Big Data Indoor Positioning System) (current main focus)
-災情截取及紀錄編輯技術 (Disaster Scenario and Record Capture – Authoring Technology)
-建築/環境資訊雲 (Building/Environment Information Cloud)
-使用者經驗設計 (User Experience)
-群眾外包 (Crowdsourcing)
-物聯網 (Internet of Things)
-災情監測資料 (Surveillance Data)

We are developing our own device, the Location Beacon (LBeacon), which transmits signals over Bluetooth Low Energy (BLE), along with an indoor navigation app for Android and iOS.
If you want to improve your coding skills, you are welcome to join us in developing the indoor positioning system.

Please contact me for details.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/janeliu/

實驗室網址(Research Information) :
http://www.openisdm.com

Email :
lynette@iis.sinica.edu.tw
王大為
Wang, Da-Wei
應用資料分析與機器學習來研究醫學資料

Data analytics and machine learning for medical data
應用資料分析與機器學習的方法來研究醫學資料

Our lab studies medical data using data analytics and machine learning.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/wdw/

實驗室網址(Research Information) :
http://chess.iis.sinica.edu.tw/lab/

Email :
carol@iis.sinica.edu.tw
王建民
Wang, Chien-Min
雲端運算與人智運算

Cloud Computing and Human-Centered Computing
(1) 整合記憶體內資料儲存的雲端計算平台:MapReduce是目前利用雲端計算來處理巨量資料方面,最常用的平行計算模型。然而我們發現有一類雲端應用,雖然非常適合MapReduce模型,但是其執行效能卻非常低落,而且計算規模也有很大的限制。這類應用包括用於基因定序的後綴陣列排序和近來很受重視的演化式計算。我們將對現有的Hadoop平台進行擴充和改進,融合記憶體內資料儲存,提出一個泛用的加強型雲端計算平台,以提升執行效能和規模擴充性。我們也將實作後綴陣列排序以及演化式計算,以驗證我們所提出的架構,對於這兩種應用的執行效能和應用規模有多大的提升。我們相信這樣的雲端計算平台不但對於學術研究有很大的貢獻,還能大幅拓展雲端計算平台的應用。

(2) 人智運算的穿戴運算系統:研究穿戴式電腦及裝置在人智計算中的應用,特別是在社交網路方面的應用。我們計劃中的人智運算系統應具備的三種能力:具有解周遭環境與人們情況的能力,可提供使生活更美好的服務,和透過感官與人類自然地互動。為了實現這三個能力,我們計劃中將從三個研究學科來發展:情境識別,雲服務,以及擴增實境。我們計畫透過先進的系統設計,提供更適合未來人類生活以及具備更友善人機互動的應用程式。藉由研究相關的穿戴式電腦及裝置,開發更佳的人機整合功能,並透過社交網路系統之系統分析,研究並開發穿戴式社交網路系統。我們將著重於友善的使用者界面,以提升使用者經驗為目標,並且提供更適合的情境感知技術與實境服務的增強實境功能。

(1) A MapReduce Framework with an In-Memory Data Store: MapReduce is a powerful programming model for processing large data sets with parallel, distributed algorithms on clouds. The Hadoop framework is the most popular implementation of MapReduce and is widely adopted for processing large datasets. However, our previous experience with suffix array construction on Hadoop shows that it can result in excessive disk usage and access, which degrades performance and limits the scale of the application. In this project, we aim at efficient and scalable processing of expansive MapReduce (EMR) applications with in-memory data stores. EMR applications, including suffix array construction and evolutionary computation, are a group of applications that have performance and scalability issues with Hadoop. We shall integrate an in-memory data store with Hadoop and propose a MapReduce framework for EMR applications to enhance their performance and scalability. To validate the benefit of the proposed framework, we shall use suffix array construction and evolutionary computation as our testbed.

(2) Wearable Computing Systems and Applications in Human-Centered Computing: The goal of this project is to investigate the application of wearable computers and devices in human-centered computing, especially applications on social networks. A human-centered computing system should have three abilities: understanding the context of the surrounding environment and people, providing services that make life better, and interacting with humans naturally through perception. To realize these three abilities, we plan to adopt three corresponding research disciplines: context recognition, cloud services, and augmented reality. Wearable computers and social network services will be integrated to build the proposed wearable social network system, which will provide more convenient and user-friendly human-computer interaction.
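For topic (1), the following Python sketch shows one way suffix array construction can be phrased in map, shuffle, and reduce steps. It runs in a single process and is only an illustration, not the project's Hadoop implementation; all function names are hypothetical. The helper `naive_intermediate_chars` is an assumption about one possible source of heavy intermediate data: if every suffix were shipped between phases as a full string, the data volume would be quadratic in the input length.

```python
from collections import defaultdict

def map_phase(text):
    """Emit (bucket key, suffix start) pairs; the key is the suffix's first character."""
    for i in range(len(text)):
        yield text[i], i

def shuffle(pairs):
    """Group emitted values by key, as a MapReduce shuffle would."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets

def reduce_phase(text, starts):
    """Sort the suffixes of one bucket lexicographically."""
    return sorted(starts, key=lambda i: text[i:])

def suffix_array(text):
    """Concatenate the sorted buckets in key order to obtain the suffix array."""
    buckets = shuffle(map_phase(text))
    result = []
    for key in sorted(buckets):
        result.extend(reduce_phase(text, buckets[key]))
    return result

def naive_intermediate_chars(n):
    """Characters shipped if every suffix of a length-n text is materialized as a string."""
    return n * (n + 1) // 2

sa = suffix_array("banana")
print(sa)
print(naive_intermediate_chars(len("banana")))
```

Passing only start indices between phases, as done here, keeps the intermediate data linear in the input; the reducers then need access to the text itself, which is where a shared in-memory store could plausibly help.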
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/cmwang/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/page/research/ComputerSystem.html?lang=zh

Email :
cmwang@iis.sinica.edu.tw
何建明
Ho, Jan-Ming
基因體資訊演算法設計與深度學習

Development of algorithms and deep learning models for genome informatics
工作內容包含程式撰寫、演算法設計、軟體專案開發以及深度學習開發。軟體專案會發表至GitHub與Docker上。
基因體序列是生命的密碼書,解密是充滿挑戰的過程。1) 第一個挑戰是基因體組裝(genome assembly),基因體序列是長度可達百億的字串,而目前的定序技術,如: short reads (Illumina), linked reads (10X genomics), long reads (PacBio/Nanopore) 只能產生平均長度數百到數萬的序列片段,且須考量錯誤率與重複區段,加上群體基因體分析的需求,如: 精準醫療(precision medicine)及各項生物基因體多樣性計畫,亟須快速且高品質的新式基因體組裝方法。2) 同時,隨著大量的基因體序列初稿產生,品質分析與驗證變得更重要,尤其是與人類醫學相關的研究與應用。3) 隨著眾多基因體序列組裝完成後,如何運用深度學習達成快速正確的基因與功能註解(如: Google DeepVariant)是現今重要的課題。


The work includes programming, algorithm design, software project development, and development of deep learning models. Our software is open source and will be published on both GitHub and Docker.
DNA sequences are the codebook of living organisms, and decoding them is full of challenges. 1) The first issue is (de novo) genome assembly. Genomic sequences are strings with lengths up to hundreds of billions of characters. Existing DNA sequencing technologies, such as short reads (Illumina), linked reads (10X Genomics), and long reads (PacBio/Nanopore), can only generate sequences with average lengths from hundreds to tens of thousands of bases (with error rates from 0.1% to ~10%). In addition, genomes contain repetitive regions; for example, the human genome is ~50% repetitive. It is therefore important to develop fast, high-quality genome assemblers that address hybrid assembly for population genomics, precision medicine, and large-scale biodiversity genome sequencing projects. 2) Meanwhile, as many genome assemblies become available as drafts, developing fast methods for assessing assembly quality becomes an important task, especially for biomedical applications. 3) Once high-quality genome assemblies are available, the next pressing issue is how to design deep learning models for fast and accurate genome annotation (e.g., Google DeepVariant).
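As a toy illustration of the assembly problem, the Python sketch below merges two error-free reads along their longest suffix-prefix overlap. The function names and the example reads are hypothetical, and the sketch ignores the sequencing errors and repeats mentioned above, which are what make real assembly hard.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no such overlap has at least `min_len` characters."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def merge(a, b, min_len=3):
    """Join two reads along their longest overlap; return None if they do not overlap."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None

# Hypothetical reads sharing the 3-base overlap "TAC".
read1, read2 = "ACGTAC", "TACGGA"
contig = merge(read1, read2)
print(overlap(read1, read2), contig)
```

With repeats, several reads can overlap the same suffix equally well, so a pairwise merge like this one becomes ambiguous.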
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/hoho/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/hoho/

Email :
yslin@iis.sinica.edu.tw
陳伶志
Chen, Ling-Jyh
以空氣盒子系統為基礎的環境感測資料分析研究

Advanced Data Analysis using Fine-grained and Spatio-temporal AirBox Data
在過去幾年中,我們已建立一個跨國性的大型細懸浮微粒(PM2.5)網路感測系統,擁有每天散佈在 41 個國家,超過 7,000 個 PM2.5 微型感測站,每個感測站以每五分鐘一筆的頻率上傳溫濕度與 PM2.5 的即時感測資料,目前已成為全球數一數二的 PM2.5 微型感測資料中心。

在這個專案中,我們希望透過兼具時間與空間高解析度的 PM2.5 感測資料,進行兼具學理、創意與應用價值的資料混搭與進階分析。內容可以是(但並不局限於)即時污染源的溯源、微型感測器的資料品質確保分析、中尺度的 PM2.5 擴散模式推估、中尺度的 PM2.5 濃度預報模式建構、PM2.5 衍生的社經資源成本推估、PM2.5 濃度與即時生理訊號的整合分析,甚或是其他更具創新與挑戰的研究議題。

我們歡迎對本項研究主題有興趣、有想法,並且願意接受挑戰的優秀人才加入我們的團隊,一同學習、努力、並對當前重大的環境議題做出貢獻。

In the last year, we have successfully built a large-scale PM2.5 sensing system with more than 7,000 participating devices in more than 41 countries. Each device senses its environment and uploads its temperature, humidity, and fine particulate matter (PM2.5) readings to our server every five minutes. As a result, our system has become one of the best-known data hubs for PM2.5 sensing worldwide.

In this summer project, we wish to use the fine-grained spatio-temporal data from our system to conduct advanced data analysis of both research and practical value. Topics include (but are not limited to) tracking PM2.5 emission sources, fine-grained PM2.5 dispersion modeling, fine-grained PM2.5 concentration forecasting, estimating the socioeconomic impact of PM2.5 pollution, and investigating the correlation between PM2.5 concentration and physiological signals. We also welcome innovative and even more challenging topics on related problems.

We are looking for self-motivated, creative, and open-minded people to join us. We will learn together, work together, enjoy the process together, and produce good results together. For further questions, please feel free to contact us.
PI個人首頁(PI's Information) :
https://sites.google.com/site/cclljj/

實驗室網址(Research Information) :
https://sites.google.com/site/cclljj/NRL
https://pm25.lass-net.org/

Email :
cclljj@iis.sinica.edu.tw
許聞廉
Hsu, Wen-Lian
「統計準則式模型」在自然語言理解之各項應用

Applications of the statistical principle-based approach on natural language understanding
自然語言理解是人工智慧一直以來沒有處理得很好的問題。近年來自然語言處理的研究方法主要分為規則式(RB)以及機器學習(ML,包括統計及neural)兩種方法。然而,RB需仰賴大量人力建構及維護規則,ML不擅於處理「推論」。這些問題導致目前自然語言處理及理解的研究面臨不少的困難。我們計畫將針對上述演算法的限制,開發統計準則式方法(Statistical Principle-based Approach, SPBA)應用於當前的自然語言理解的研究,包括中文及生物醫學領域之專有名詞辨識(Named Entity Recognition, NER)及關聯擷取(Relation Extraction, RE)、中文問答系統(Question Answering, QA)、對談系統(Dialogue system)、機器閱讀(Machine reading)、小學數學問題解析(Analysis of mathematical word problems in primary school)、語音理解(Chinese speech understanding)、文件主題偵測(Topic detection)、意見分析(Opinion mining)等。


Natural language understanding remains a tough problem in artificial intelligence (AI). Recent approaches to natural language processing can be roughly categorized into two types: rule-based (RB) methods and statistical machine learning (ML) methods. The former rely on a large number of hand-crafted rules, and the latter are not good at inference. These problems hinder research on natural language understanding. We shall focus on applications of the statistical principle-based approach (SPBA), which was developed specifically to avoid the common pitfalls of RB and ML methods. Possible applications include named entity recognition, question answering, machine reading, analysis of primary-school mathematical word problems, Chinese input methods, text topic detection, and opinion mining.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/hsu/

實驗室網址(Research Information) :
http://iasl.iis.sinica.edu.tw/index.html

Email :
hsu@iis.sinica.edu.tw
葉彌妍
Yeh, Mi-Yen
深度學習方法於資料探勘的應用

Deep Learning Methods for Data Mining Applications
本實習將探究如何用深度模型於資料探勘應用,並探討如何提升相關演算法的效能與品質。

We will study how to apply deep learning methods to data mining applications and how to improve the performance and quality of the resulting algorithms.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/miyen/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/miyen/

Email :
miyen@iis.sinica.edu.tw
馬偉雲
Ma, Wei-Yun
具推薦功能的對話機器人/不限主題的閒聊機器人/自動知識學習系統/事實推論或事件預測系統

Chatbot with Recommendation/Chitchat Chatbot/Automatic Knowledge Acquisition/Fact Reasoning or Event Prediction
今年的實習我們邀請同學透過深度學習(Deep Learning)進行以下專案的其中之一,並鼓勵暑期實習生在實習期間能大膽地進行技術或應用的創新。

1. 具推薦功能的對話機器人:透過深度學習當中的增強式學習(Reinforcement Learning),打造一個具推薦功能的機器人,跟一般的FAQ或是單純搜尋任務(如訂票)不同,機器人必須根據對話了解使用者的需求,綜合分析產品或服務,然後以具有邏輯與說服力的說明,來推薦其中的產品或服務。過去暑期intern在此項目有豐碩成果,所開發的美妝保養顧問-PerMu機器人拿到2017 痞克邦黑客松最佳產品力獎。另外,我們目前在跟LINE Taiwan進行新聞聊天機器人的合作,將LINE Today的新聞推薦給使用者,並具備閒聊新聞的功能。

2. 不限主題的閒聊機器人:即所謂 Chitchat Chatbot,也就是沒有特定目的的聊天. 目前這類型的bot大多數的作法是利用深度學習當中的seq-to-seq model來建構,但是,這樣的作法通常無法產生有意義或是較為深入的回應,多數會流於插瞌打渾或是賣萌。其中的關鍵,在於bot缺少了對於聊天主題相關的基本常識,就像是user要跟bot討論劉德華,bot應該對劉德華的各個fact(身份,作品...)有足夠認識,回的response才會豐富有意義,不然巧婦難為無米之炊,沒有知識就不容易產生有意義的回應,我們希望將grounded knowledge以及更豐富的語義訊息encode在model之中。

3. 自動知識學習系統:我們知道新的知識會夜以繼日的不斷產生,一個具有AI能力的系統最重要的功能之一就是能夠從大量的資料當中,分析資料,加以理解,組織成結構化知識。我們實驗室過去已經開發了人類的知識網(E-HowNet),打下堅實基礎,此專案的目標是進一步加以擴張,利用深度學習技術將關鍵的關係三元組合從閱讀的文章中自動抽取出來,如 (”哈登” ,MemberOf,”火箭隊”) 或是 (“麥特載蒙”,PlayerOf,”心靈捕手”)等等。

4. 事實推論或事件預測系統:對於一個新事物,人們往往會根據基本常識、已知的事實、經驗的法則等等進行新事物的推測,包含事實或是事件的推論,例如以下的事實推論:已知A說中文,A又是B的哥哥,那麼很高的機率B也會說中文。又例如以下的事件推論:“買麵包”後會有很高的機率會在近期“吃麵包”。在一個龐大的文本或是複雜的知識圖譜當中,推論的關係往往數量龐大,有時甚至複雜到超越人力所能規範與理解,我們希望藉由深度學習技術能自動化的在文本或是知識圖譜當中進行新事物的推測。


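In this year's internship, students are invited to work on one of the following projects using deep learning, and interns are encouraged to innovate boldly in techniques or applications.

1. Chatbot with Recommendation: Using reinforcement learning, build a chatbot that makes recommendations. Unlike an FAQ bot or a simple search task (such as ticket booking), the bot must understand the user's needs from the conversation, analyze the available products or services, and recommend among them with a logical and persuasive explanation. Past summer interns achieved strong results on this project: PerMu, a beauty and skincare advisor chatbot they developed, won the Best Product award (最佳產品力獎) at the 2017 PIXNET (痞克邦) Hackathon. We are also collaborating with LINE Taiwan on a news chatbot that recommends LINE Today news to users and can chat about the news.

2. Chitchat Chatbot: A chatbot for open-topic conversation with no specific goal. Most bots of this kind are built with seq-to-seq models, but this approach usually cannot produce meaningful or in-depth responses and tends toward banter or cuteness. The key problem is that the bot lacks basic knowledge about the topic of conversation. For example, if a user wants to discuss Andy Lau (劉德華), the bot should know enough facts about him (identity, works, and so on) for its responses to be rich and meaningful. We hope to encode grounded knowledge and richer semantic information in the model.

3. Automatic Knowledge Acquisition: New knowledge is produced around the clock, and one of the most important functions of an AI system is to analyze and understand large amounts of data and organize it into structured knowledge. Our lab has developed the knowledge network E-HowNet, which provides a solid foundation. The goal of this project is to extend it by using deep learning to automatically extract key relation triples from text, such as ("Harden", MemberOf, "Rockets") or ("Matt Damon", PlayerOf, "Good Will Hunting").

4. Fact Reasoning or Event Prediction: Faced with something new, people draw inferences about facts or events from common sense, known facts, and rules of experience. An example of fact inference: if A speaks Chinese and A is B's older brother, then B very likely speaks Chinese too. An example of event inference: "buying bread" is very likely to be followed soon by "eating bread". In a large text corpus or a complex knowledge graph, the inference relations are often so numerous, and sometimes so complex, that they exceed what people can specify and understand. We hope to use deep learning to make such inferences automatically over text or knowledge graphs.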
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/ma/index_zh.html

實驗室網址(Research Information) :
http://ckip.iis.sinica.edu.tw:8080/

Email :
ma@iis.sinica.edu.tw
莊庭瑞
Chuang, Tyng-Ruey
1. 創新以及可永續的研究資料管理和協作; 2. 敏感資料群組共享模式之研究

1. Innovative and Sustainable Research Data Management and Collaboration; 2. Communal Sharing of Sensitive Data
1. 創新以及可永續的研究資料管理和協作

以資料為基礎、以資料為驅動的協同研究合作,已成為常態。研究資料可說已嵌入研究工作的各環節,而且其作用角色,往往超越單一研究計畫、團隊、或機構,而是與整體科學的永續發展,息息相關。我們將從社群 (community)、技術 (technology)、協作 (collaboration)、以及研究 (research) 四面向, 協力發展台灣本地在研究資料管理的實踐社群。此實踐社群將以我們已開發的「研究資料寄存所」( https://data.depositar.io/ ) 為實踐的場域之一。

我們預計就開放科學 (Open Science) 議題進行案例研究,並將案例分析及文獻整理成果,呈現在一個以研究資料管理為主題的網站上。進行研究資料管理指南的在地化, 並探討不同學科領域在研究資料管理的現況。就研究資料管理議題,進行需求分析。擴增「研究資料寄存所」功能與改善使用體驗。結合「可重現的研究」以及「研究資料管理」兩個議題 ,進行探索研究。以研究發現的原始資料的寄存、其計算分析程式的寄存、以及線上運算資源的使用這三面向的結合,達到便捷的研究重現的目的。就資料論文 (data paper)、動態論文 (live paper)、以及線上資料服務 (online data service) 與研究資料寄存的相互依存與增進關係,進行研究。

2. 敏感資料群組共享模式之研究

巨量資料科技的發展為當代人口、社會行為以及公共衛生等仰賴對巨量資料進行二次利用之系統性研究帶來變革。然而,這些技術性的革命也同時帶來諸多適法性的爭議,例如:資料共享的正當法律程序、隱私權的維護以及個人資料的保護等。除此之外,如何在鼓勵資料共享的同時,透過對資訊流動的妥適管制確保資訊的私隱性,亦屬巨量資料科技下另一重要的倫理道德挑戰。而這些巨量資料科技下的適法性與倫理道德考量,或可透過促進公眾參與之方式,進而將如何衡平資料二次使用所帶來之風險與易用性,以制度化之模式達到對隱私權的保障。

本研究計畫預計提出一可行性架構,並將隱私權納入使該架構以系統化之方式符合法律規範與要求,包括正當法律程序、資料掌控之透明化、社會參與以及負責任的自我管理等。為了改良現有的隱私權框架,本計畫將進一步探討三個研究主題,包括:集結個人資料之規範、基準與管制;共有資料分享之具可保密性以及具可審計性;及以參與者為中心之資料分享管理架構。最終,本計畫期能建構適用於各種領域,並以參與者為中心且以社會為基準,亦能符應社會歸責之資料二次使用隱私框架。

1. Innovative and Sustainable Research Data Management and Collaboration

We plan to work on the community, technology, collaboration, and research aspects of research data management. We aim to help develop a community of practice for research data management in Taiwan. A research data repository we have developed ( https://data.depositar.io/ ) can serve as a starting place where communities practice research data management. The expected outcomes of this project include cultivating research data management talent, participating in international collaborative projects on open data software systems, raising the scale and capacity of the research data management community in Taiwan, and participating in and contributing to the global research community.

2. Communal Sharing of Sensitive Data

The recent information technology revolution has brought new legal challenges concerning due process in data sharing, the right to privacy, and personal data protection. How to manage information flows appropriately and encourage data sharing while keeping shared information private remains a challenge. These concerns have moved beyond traditional privacy frameworks that focus merely on anonymity and de-identification. Addressing them instead relies on establishing a more socially accountable and communicative framework that not only balances the risk and usability of secondary data usage but also institutionalizes that balance through improved public participation.

By critically reviewing existing data access models, techniques, and practices, this project aims to propose a workable framework that designs privacy into a comprehensive system accommodating the legitimate requirements of community participation, transparent data control, and responsible self-management in the big data era. Specifically, this project will survey and develop the governing principles of a communal approach to personal data management, in which members of a community pool sensitive information about themselves for mutual benefit and the public good.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/~trc/

實驗室網址(Research Information) :
https://data.depositar.io/
https://roadkill.tw/

Email :
trc@iis.sinica.edu.tw
廖弘源
Liao, Mark
智慧城市

Smart City
智慧城市之智慧交通車流與智慧商業人流分析嵌入式系統


Embedded system for AI-based traffic flow analysis and commercial crowd flow analysis in smart cities
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/liao/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/liao/

Email :
kinyiu@iis.sinica.edu.tw
劉進興
Liu, Jing-Sin
自駕車

Autonomous Driving Vehicles
研究車道變換的機器學習

Machine learning approach to lane change maneuver.
PI個人首頁(PI's Information) :
http://www.iis.sinica.edu.tw/pages/liu/

實驗室網址(Research Information) :
http://www.iis.sinica.edu.tw/pages/liu/

Email :
liu@iis.sinica.edu.tw