2016 Research List

PI | Research Topic | Introduction | Other Information
Chen, Sheng-Wei (Kuan-Ta)

Business Data Analytics and Visualization
Please see http://www.iis.sinica.edu.tw/~swc/talk/data_science_overview.html
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Wang, Hsin-Min

Speech, Language and Music Processing

Our research interests include speech processing, natural language processing, multimedia information retrieval, machine learning, and pattern recognition. Our research goal is to develop methods for analyzing, extracting, recognizing, indexing, and retrieving information from audio data, with special emphasis on speech and music.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Liu, Jane Win Shih

Disaster Resiliency through Big Open Data and Smart Things (DRBoaST)
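Chinese Abstract (translated)
The Disaster Resiliency through Big Open Data and Smart Things project (DRBoaST) will build on the results of Academia Sinica's sustainability science research project OpenISDM (January 2011 to December 2014) and continue to pursue academic and applied research on disaster prevention and relief. Compared with OpenISDM, DRBoaST will concentrate more on forward-looking disaster management issues and information technologies and on developing prototype systems, so that project results can be applied in actual disaster management and decision making, helping government agencies and private organizations achieve stronger disaster resilience and more efficient response mechanisms when emergencies occur.
DRBoaST will consist of seven subprojects: SIDiRC, RTEIC, DiSRC, ADiPLE, CSAI, DRCom, and TEPP. Four of them, DiSRC, ADiPLE, DRCom, and TEPP, are entirely new; they grew out of our observations and experience in carrying out OpenISDM and aim to develop key disaster management technologies that meet pressing needs in current relief practice. Subprojects SIDiRC, RTEIC, and CSAI aim to extend the results of OpenISDM by further investigating the underlying theory, developing key information technologies, and integrating project results to improve the related prototype systems, including a real-time earthquake information repository, a community disaster preparedness information management system, and a crowdsourced disaster information collection system, so that these systems can be put into actual use.
Anticipated results of DRBoaST include prototype systems such as a community-based disaster information cloud, a virtual real-time earthquake information cloud, a crowdsourcing-based disaster information collection system, an active smart home disaster preparedness system, disaster scenario capture and record authoring technology, and NDN-SDN-based disaster-resilient network components. To demonstrate the strengths and applicability of these systems, the project results will also include publications on the underlying theory and key technologies.
To ensure that DRBoaST research results are put into practice, the project will maintain the close collaboration with NCDR established over the past three years of OpenISDM. Other partners in Taiwan will include the Industrial Technology Research Institute (ITRI), Chung-Hua Telecom Research Lab, and other early targets of technology transfer, to facilitate technology transfer work. To increase the international impact of DRBoaST, we will also collaborate with international research institutions and projects working on big data processing, Internet of Things design, and the use of cloud and human computing in disaster management.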

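Chinese Keywords (translated)
Open disaster management information system, big data, real-time earthquake information cloud, community disaster information cloud, disaster-resilient communication architecture, active smart disaster preparedness system, disaster scenario capture and record authoring, Internet of Things, crowdsourcing, privacy protection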

English Abstract
The proposed multi-disciplinary, applied research project, titled “Disaster Resiliency through Big Open Data and Smart Things” (DRBoaST, 巨量開放資料與互聯網防備災架構), will build on the foundation established by project OpenISDM, which will end in December 2014. Like OpenISDM, the DRBoaST project aims to produce results that will advance the sciences and technologies in disaster risk reduction and related areas. Compared with OpenISDM, DRBoaST will devote even more effort to developing its novel ideas and proof-of-concept prototypes into deployable technologies that make our emergency management decision support infrastructures more disaster resilient, enhance our disaster preparedness, and advance the state of the practice of disaster response.
The DRBoaST project will have seven subprojects: SIDiRC (Strategies and Information for Disaster Resilient Communities), RTEIC (Real-Time Earthquake Information Cloud for Disaster Preparedness and Response), DiSRC (Disaster Scenario and Record Capture), ADiPLE (Active Disaster Prepared Smart Living Environment), CSAI (Crowdsourcing Situation Awareness Information), DRCom (Disaster Resilient Communication), and TEPP (Trustworthy Emergency Privacy Protection).
The proposed work of Subprojects DiSRC, ADiPLE, DRCom and TEPP is new. The innovative ideas and motivations behind the work sprung from observations on critical needs and technology gaps gained through our efforts within OpenISDM project. The proposed work will offer the project excellent opportunities to make significant technological contributions and strong impacts on disaster preparedness and response practices. Subprojects SIDiRC, RTEIC, and CSAI will extend and enhance some of the proof-of-concept prototypes built within OpenISDM project to make them deployable and ready for general use. The prototypes include virtual repositories of real-time data on earthquakes and earth behavior and community-specific data and information on susceptibility, and tools needed for crowdsourcing human sensor data to enhance physical sensor coverage. New thrusts and emphases within DRBoaST project will be on the synergistic use of the complementary contents and capabilities of these solutions and the assessment of their effectiveness.
Anticipated accomplishments and deliverables include a community-specific disaster information cloud; a virtual real-time earthquake information cloud; a platform, apps, and tools for crowdsourcing disaster surveillance data; an active disaster response system for smart living environments; prototype components of a disaster scenario record capture and authoring system; and the design and prototype components of an NDN-SDN-based network. Our accomplishments will also include technical and theoretical results that underpin these prototypes or enable us to bound the merits and limitations of our solutions.
The DRBoaST project plans to collaborate closely with NCDR, as OpenISDM has over the past three years. Collaborators in Taiwan will also include ITRI, Chung-Hua Telecom Research Lab, and other early targets of technology transition. Many initiatives abroad are exploring the use of big data, the Internet of Things, cloud computing, and human computing for disaster management, and the DRBoaST project aims to collaborate with them.

English Keywords
Open DMIS, Disaster Preparedness and Response, Disaster Resilient Communication, Real-time Earthquake Information, Community-Specific Decision Support, Crowdsourcing, Intelligent Guards against Disasters, Authoring Technology, Privacy Protection
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Lin, Chung-Yen

Bio Big-Data analysis for clinical samples of colorectal cancer

This study will focus on the relationship between colorectal cells and the metagenome around normal and cancer tissue. We will apply bioinformatics techniques to these biological big data (RNA-seq) to decipher the mechanisms of carcinogenesis and pathogenesis in colorectal cancer samples.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Tsai, Huai-Kuang


PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Chen, Ling-Jyh

Research of networked sensing systems

This project will focus on air quality monitoring using mobile opportunistic sensing techniques. In addition to conducting real-world experiments, we will perform extensive data analysis to construct the spatio-temporal distribution model of air pollution in the urban area.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Hsu, Wen-Lian

Semantic Template Learning for Text Categorization

With the growth of the Internet, we are often overwhelmed by the amount of information that can be obtained online. Although numerous keyword-based document retrieval systems have been developed, it remains difficult to accurately classify knowledge from a wide range of search results. For this reason, we proposed a flexible semantic template-based approach (STA) for text categorization that simulates this process in human perception. STA is a highly automated process that integrates various sources of knowledge to generate discriminative linguistic patterns representing the essential information in a text. These patterns, or templates, can be regarded as the fundamental knowledge for each topic and are readily comprehensible to humans. Our experiments demonstrate that STA can achieve better performance than other well-known text categorization methods on different tasks. We plan to apply STA to other applications.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Chen, Wen-Tsuen

Intelligent Sensing and Applications; Mobile Computing; High-speed Communications Networks; Parallel Algorithms and Systems; Software Engineering
Please refer to the PI's personal web page: http://www.iis.sinica.edu.tw/pages/chenwt/descriptions_zh.html
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
chenwt@iis.sinica.edu.tw; christy@iis.sinica.edu.tw
Lu, Chi-Jen

Machine learning and game theory

Many situations in daily life require us to make repeated decisions before knowing the resulting outcomes and paying the corresponding prices. This motivates the study of the so-called online decision problem, in which one must iteratively choose an action and then receive some corresponding loss for a number of rounds. It is a fundamental problem in the area of machine learning, and it has surprising applications in several other areas as well. We would like to design better online algorithms which can learn from the past and make better decisions as time goes by. We would also like to find more applications in other areas, especially in the area of game theory.
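
The online decision problem described above can be illustrated with a minimal sketch of the Hedge (multiplicative weights) algorithm, a standard textbook method and not necessarily the one studied in this project. The function name `hedge`, the learning rate `eta`, and the assumption that losses lie in [0, 1] are illustrative choices.

```python
import math


def hedge(loss_rounds, eta=0.5):
    """Run the Hedge (multiplicative weights) algorithm.

    loss_rounds: list of rounds; each round is a list of losses in [0, 1],
                 one per action, revealed only after the learner commits.
    Returns (expected loss of the learner, loss of the best fixed action).
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n          # one weight per action, all equal at first
    totals = [0.0] * n           # cumulative loss of each fixed action
    learner_loss = 0.0
    for losses in loss_rounds:
        z = sum(weights)
        probs = [w / z for w in weights]   # play action i with probability probs[i]
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        # The losses are now revealed: penalize actions in proportion to their loss.
        for i, l in enumerate(losses):
            totals[i] += l
            weights[i] *= math.exp(-eta * l)
    return learner_loss, min(totals)


if __name__ == "__main__":
    # Two actions: the first never loses, the second always loses.
    learner, best = hedge([[0.0, 1.0]] * 50)
    print("learner:", learner, "best fixed action:", best)
```

The gap between the two returned values is the regret. Keeping it small relative to the number of rounds is what "learning from the past" means in this setting.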
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Chen, Chu-song

Deep learning for image classification and retrieval

Developing deep learning techniques and applying them to object recognition, clothing image retrieval, and photo aesthetic quality estimation.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Su, Keh-Yih

Designing and Building a Domain Specific Machine Reading System

We have designed and implemented a prototype for solving some elementary school Math Word Problems (MWP). Since a Machine Reading (MR) system for solving MWP has practical applications (e.g., a Math Computer Tutor or a Helper for Math in Daily Life), we will first improve our prototype to handle more problem types (such as Time-Expression and Fraction) by enhancing the capability of the Logic Form Converter (LFC) and Inference Engine (IE), adopting a new statistical model, and automatically learning the patterns/parameters from the training corpus.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Chang, Yuan-Hao

Embedded Systems and Emerging Memory Technologies
My research interests lie in the area of computer systems with the emphasis on storage systems and operating systems. My research focus is on theories, algorithms, architectures, and tools for building computer systems, especially embedded systems.

One of my recent research focuses is storage system design for embedded systems. Embedded systems, especially battery-powered consumer electronics such as smartphones, usually adopt flash memory as their storage medium because of its shock resistance and low energy consumption. Driven by cost reduction and advances in manufacturing technology, future flash-based storage systems face critical performance and reliability challenges. We have worked on both the file-system designs in operating systems and the management firmware in storage devices. In the operating systems, we designed new cache systems for general-purpose file systems (e.g., FAT32, NTFS, and ext4) that use new non-volatile storage media to prevent data loss upon power loss and to improve file-system efficiency over flash storage devices; we also developed new designs for native flash file systems to improve the performance and reliability of data stored on flash-based storage devices. In the management firmware, we developed new management schemes to solve the problems inherent in next-generation flash-memory chips of fast-growing capacity. We are now exploring the integration of data compression and de-duplication technologies to further reduce the energy consumption of embedded storage systems and realize the concept of “green storage” in embedded systems.

Another recent research focus is operating system design for embedded systems. Owing to the fast-growing capability of hardware resources and the fast-growing complexity of operating systems and mobile applications, the development of embedded systems faces critical challenges in energy consumption and system performance. To reduce energy consumption, we developed new hibernation techniques that shut down the operating system when the battery is running low and resume it efficiently by exploiting the idle time of storage devices. We also developed new technologies that adjust CPU frequencies and device power modes according to system runtime information to save energy. We are now investigating new fast hibernation technologies with finer hibernation granularity, so as to minimize hibernation overhead and enable fast system resumption and migration on mobile embedded systems, and we are exploring the interplay between hardware resources in order to develop advanced resource scheduling technologies that further reduce the energy consumption of embedded systems.

PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Chen, Meng Chang

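1. Collection, monitoring, and management of open data: To promote applications of and research on open data, and to make it easier for researchers and developers to collect and use open data, this project builds the ODD open data processing dashboard website on top of web crawling, dashboard front-end development, and big data computing technologies. The site lets users conveniently access the open data they want to collect and continuously monitors whether data collection has been interrupted or is behaving abnormally. Finally, data visualization provides an effective interface for staff oversight and understanding.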
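2. High-resolution environmental quality information retrieval assisted by human activity recognition: To advance sustainability science and improve the effectiveness of heat-wave vulnerability assessment, it is an urgent and important research issue to use higher-resolution environmental quality information retrieval so that the general public and researchers can understand the distribution and impact of air pollutants in finer detail. Most weather information systems and air pollutant exposure systems support retrieval only over large areas or at the city level, yet air pollutant exposure can differ greatly among small districts within a city because of their different development characteristics. In addition, questionnaire-based environmental surveys often need to pose questions according to the respondent's specific situation. Large-area information provision and latitude/longitude positioning alone cannot meet these needs. This project therefore studies human activity recognition methods in depth, together with environmental quality knowledge mining for air quality variation, regional exposure, and personal exposure, and develops the H-EQIR exposure retrieval system to collect, analyze, mine, and apply user trajectory data and environmental quality data.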
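3. Recommending locations for air quality monitoring stations: To strengthen air quality monitoring and make better use of monitoring stations, this project integrates multiple government open data sources, including air quality, weather, and road traffic information, and uses machine learning to predict PM2.5 and PM10 values across geographic space. The features used are air quality information {PM2.5, PM10, O3, PSI, WBGT}, weather information {temperature, rainfall, humidity, air pressure, wind direction, wind force, ultraviolet}, and road traffic information {road length, highway length, number of intersections}. Taking Taipei City as the study area, we divide Taipei City and part of New Taipei City into a 30x38 grid to compute finer-grained geospatial information, and integrate related technologies including the OpenStreetMap open map, the Leaflet interactive map library, the Turf.js geospatial processing library, the Scikit-learn machine learning library, and the GeoJSON geographic data format. We expect to derive air quality conditions at higher resolution, raising retrieval over the same area from 17 grid points to 1,140 grid cells, which would support policy recommendations and the publication of high-resolution real-time air quality monitoring information. In addition, for grid cells without a monitoring station, we analyze which areas have air quality values that are harder to predict, which would support policy recommendations on where to build new monitoring stations.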
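
The 30x38 gridding in item 3 yields 30 × 38 = 1,140 cells. The following minimal sketch shows how a coordinate could be mapped to a grid cell. The function name `grid_cell`, the choice of 30 columns by 38 rows, and the bounding-box values are assumptions for illustration. The source does not give the grid orientation or extent, and the project itself names Turf.js rather than Python for geospatial processing.

```python
def grid_cell(lon, lat, bbox, n_cols=30, n_rows=38):
    """Map a point to (row, col) in an n_rows x n_cols grid over bbox.

    bbox is (min_lon, min_lat, max_lon, max_lat); returns None if the
    point lies outside the box.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
        return None
    # Scale to [0, n) and clamp so points on the upper edge fall in the last cell.
    col = min(int((lon - min_lon) / (max_lon - min_lon) * n_cols), n_cols - 1)
    row = min(int((lat - min_lat) / (max_lat - min_lat) * n_rows), n_rows - 1)
    return row, col


if __name__ == "__main__":
    # Hypothetical bounding box roughly around Taipei (illustrative values only).
    taipei_bbox = (121.45, 24.95, 121.67, 25.21)
    print("cells in grid:", 30 * 38)
    print("cell of a sample point:", grid_cell(121.56, 25.04, taipei_bbox))
```

Each cell would then carry the feature vector listed above (air quality, weather, and road traffic values) as input to a regression model.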

In Taiwan, the Environmental Protection Administration, Executive Yuan, R.O.C. (Taiwan) announces the air quality determined by the Taiwan Air Quality Monitoring Network. This network monitors the levels of PM10, SO2, CO, O3, and NO2 in the air. Air quality monitoring data constitute the basis of air quality protection and air pollution control. Effective air quality control depends on a well-maintained monitoring system that operates over the long term.

The innovations of this project design are as follows. (1) This project will develop human activity recognition to support advanced situational constraints on questionnaire subjects, because most questionnaire survey systems (even location-based ones) do not support such advanced constraint settings. (2) This project will develop rich air quality information provision and a high-resolution retrieval service by integrating the thematic project, because most weather report systems do not support high-quality environmental information retrieval services.

The phenomenon of information asymmetry often obstructs the development of society. The situation is also critical in sustainable development. Residents should be kept abreast of air pollution exposure for proper protection and damage prevention. High-resolution environmental quality retrieval can provide the crucial air quality variation, regional exposure, and personal exposure information to the general public. The project results can enhance the basic weather information to provide more plentiful and fine-grained air quality information. The systematic trans-disciplinary framework and human activity recognition mechanism established can certainly be applied to the assessment of heat-wave vulnerability in other areas.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Lee, Der-Tsai

The Multi-Service Center Problem
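Traditional facility location problems usually assume that all facilities provide the same service when multiple facilities are placed. For large-scale integrated services, however, having every facility provide every kind of service is too costly, so each facility can provide only a limited number of services. Placing these facilities therefore requires considering, at the same time, each user's different level of demand for each service and the distance to the facility providing it, in order to compute the optimal locations. Problems of this kind are collectively called multi-service facility location problems.
   Our research targets one of them, the multi-service center (p-service center) problem: p facilities are to be placed on a network, each providing a different, unique service. Each user's service distance is the sum, over all services, of the user's demand for the service multiplied by the distance to the corresponding facility, and the cost of a placement is the maximum service distance over all users. The goal is to find the placement of minimum cost.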

Traditionally, facility location problems consider only identical facilities when determining the optimal location of multiple facilities. However, when providing large-scale and integrated services, each single facility may only be able to provide limited kinds of services, since providing all of them is too expensive. Therefore, the optimal placement of these facilities has to consider simultaneously, for every customer, its requirement for and distance to each kind of service. Facility location problems of this kind are called multi-service location problems.
   We are interested in one problem of this kind, the p-service center problem. There are p facilities to be placed in the network, each of which provides one distinct kind of service. Each client in the network has its own requirement for each kind of service, and how well the client is served is measured by its p-service distance, the total transportation cost from the p facilities to the client. The objective of the p-service center problem is to find a placement of the p facilities that minimizes the maximum value among the p-service distances of all clients.
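
To make the objective concrete, here is a brute-force sketch that evaluates every placement on a tiny network. It is an illustration of the definition only, not an algorithm from this research. It assumes that facilities may be placed only at vertices (the problem as stated may also allow points along edges), that every vertex is a client, that the network is connected, and that a client's p-service distance is the sum of requirement times shortest-path distance over the p services. All function names are hypothetical.

```python
from itertools import product


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on an undirected graph given as (u, v, length) triples."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def p_service_center(n, edges, demand):
    """Return (cost, placement) minimizing the maximum p-service distance.

    demand[c][j] is client c's requirement for service j;
    placement[j] is the vertex hosting the facility for service j.
    """
    d = all_pairs_shortest_paths(n, edges)
    p = len(demand[0])
    best_cost, best_placement = float("inf"), None
    for placement in product(range(n), repeat=p):   # n**p candidates
        cost = max(
            sum(demand[c][j] * d[c][placement[j]] for j in range(p))
            for c in range(n)
        )
        if cost < best_cost:
            best_cost, best_placement = cost, placement
    return best_cost, best_placement


if __name__ == "__main__":
    path = [(0, 1, 1), (1, 2, 1)]                    # a path 0 - 1 - 2
    print(p_service_center(3, path, [[1, 1]] * 3))   # two services, uniform demand
```

Enumeration takes n^p evaluations, so it is useful only for checking small instances.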
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Wu, Jan-Jan

Cloud/cluster resource management for improving user quality of experience in wearable virtual reality
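To give users a sense of immersion, virtual reality (VR) must render scenes and objects within a short time in response to the user's movements (turning the head, moving), providing high-resolution images that raise user satisfaction. Because the computation required is heavy, high-end hardware (graphics cards) is needed. Taking the Oculus Rift as an example, the official recommendation is at least 8 GB of memory and a GTX 970 class graphics card. Running on lower-end machines may introduce latency and degrade the user experience.
One solution is cloud- or server-cluster-assisted VR (Cloud/Cluster-assisted VR), whose idea is to use the resources of servers in a data center to help handle the VR computation. Besides network transmission speed, a major issue is how to allocate resources in the cloud/cluster so that each VR application receives enough computing resources to guarantee the expected resolution and framerate and thus maintain the user experience.
This project plans to build a Cloud/Cluster-assisted VR system framework. The goal is to use cloud resources so that machines below the recommended hardware specification can run VR, while raising user satisfaction as much as possible. In addition, we will apply techniques such as machine learning to collect and analyze users' head movement data and build a personalized configuration for each user, reducing the discomfort caused by (prolonged) use of VR.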

Virtual Reality replicates an environment that simulates a physical presence in places in the real world or an imagined world, allowing the user to interact in that world. However, to provide a satisfactory user experience, high-end equipment with high computing capacity is required. Taking the Oculus Rift as an example, the recommended hardware specs are 8 GB of RAM and a GTX 970 graphics card or higher. The user experience may degrade on machines with specs lower than recommended.
Cloud or Server cluster-assisted VR is one of the solutions for improving user experience on machines that do not meet the requirement. The idea of cloud/cluster-assisted VR is to move the computation from local device to cloud servers. In addition to the quality of network connection, there are other important issues, such as how to allocate sufficient computing resources to different VR applications, such that the resolution and framerate can meet some threshold in order to maintain user experience.
This project aims to build an infrastructure for cloud/cluster-assisted VR.  In addition to scaling resolution and framerate, we also plan to apply techniques such as machine learning on the sensing data collected from VR headset in order to build personalized configuration for individual users to reduce uncomfortable feelings such as dizziness when using wearable VR.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Wang, Bow-Yaw
Implementing Cloud Programs on Spark

Spark Program Implementation

We will implement several parallel algorithms on the Spark platform.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Lu, Chun-Shien

Secure (Multimedia) Big Data Gathering, Transmission, and Reconstruction--From the Viewpoint of Compressive Sensing
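Compressive sensing/sampling (CS), a new kind of sampling theory, has become a popular research topic in recent years across information theory, signal processing, and related fields. For sparse signals, it breaks the traditional requirement that the sampling frequency be at least the Nyquist rate: CS needs to acquire only a small number of samples or measurements, and provided the signal to be recovered satisfies the sparsity condition, optimization methods can recover the original signal to a considerable degree. The hallmark of CS is that compression is achieved at the same time as sampling, which makes it suitable for devices with limited power or computing capability. CS has been widely applied in signal processing, networking, communications, machine learning, medical imaging, computational biology, and other fields.
Building on the properties of CS and our CS research results over the past few years, and in view of the big data trend and the need for secure and efficient data gathering, transmission, and reconstruction, we propose this three-year project. In the first year, we will study secure sampling of big (multimedia) data that does not require storing the raw data beforehand, using compressive sensing, an approach in which our laboratory has invested considerable effort over the years with some results. From the sampled data and a reconstruction algorithm, the original data can be rebuilt at the server/decoder, which is quite efficient in computation and storage for big data. For security and privacy, the first-year goal is to design a secure sensing matrix. In the second year, considering the computing and storage capacity of the cloud, we will transmit the collected samples to the cloud for processing; for privacy protection and security, a good encryption scheme is needed to protect these data, along with suitable methods that allow the encrypted data to be computed on in the cloud. The second-year goal is therefore to study secure CS signal recovery from samples/measurements. The third year concerns the reconstruction of large-scale (multimedia) data, specifically large-scale recovery from CS samples. The key is to reconstruct the original signal in a short time while maintaining high quality and keeping memory consumption as low as possible, a very challenging issue in compressive sensing.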

Compressive Sensing/Sampling (abbreviated as CS), a new paradigm for simultaneous sampling and compression, has recently attracted considerable attention in diverse fields, including signal processing and information theory. Without being restricted by the Nyquist rate, compressive sensing can, in theory, perfectly reconstruct the original signal from only a few samples or measurements, provided that the signal is sparse in the time/space domain or in a transform domain (such as DCT or wavelet). The unique characteristic of CS is that sampling and compression are achieved simultaneously, which makes CS well suited to resource-limited digital devices and sensors. Based on the assumption of signal sparsity, CS has been broadly applied to signal processing, networking, communications, machine learning, medical imaging, computational biology, and so on.
Based on these unique characteristics and our research results in compressive sensing, we present a three-year project to achieve secure and efficient big data gathering, transmission, and recovery. In the first year, we will study secure sampling for big (multimedia) data based on CS without needing to store raw data. From the sensed samples and sparse signal recovery algorithms, the original data can be reconstructed at the server/decoder. This is especially efficient for the processing and storage of big data. To preserve privacy and security, we plan to study secure sensing matrix design in the first year. In the second year, we will transmit the collected samples to the cloud for processing and storage; because security and privacy must be preserved, we need to develop a secure CS recovery algorithm. Finally, in the third year, we will study big (multimedia) data recovery in the context of CS. The key is how to achieve fast, high-quality recovery with low memory consumption, which is very challenging in compressive sensing.
As far as we know, such a proposal, composed of these three topics, has not appeared in the literature.
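
As a rough illustration of the sampling-and-recovery idea behind CS, the sketch below takes m < n random measurements of a sparse signal and recovers it with orthogonal matching pursuit (OMP). OMP is a standard greedy recovery method chosen here for brevity. It is not the secure sensing matrix design or recovery algorithm this project proposes. The Gaussian sensing matrix, the problem sizes, and the function name `omp` are assumptions, and the example requires NumPy.

```python
import numpy as np


def omp(A, y, k):
    """Greedy sparse recovery: find x with at most k non-zeros such that A @ x ≈ y.

    Assumes the columns of A have unit norm.
    """
    n = A.shape[1]
    residual = np.asarray(y, dtype=float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with what is still unexplained.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0
        support.append(int(np.argmax(correlations)))
        # Least-squares fit restricted to the selected columns.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, k = 100, 40, 3                  # signal length, measurements, sparsity
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)        # normalize columns
    x_true = np.zeros(n)
    x_true[[5, 37, 80]] = [1.5, -2.0, 0.8]
    y = A @ x_true                        # sampling and compression in one step
    x_hat = omp(A, y, k)
    print("max reconstruction error:", np.max(np.abs(x_hat - x_true)))
```

Only the m measurements in `y` need to be stored or transmitted, which is the property that makes CS attractive for resource-limited devices.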
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Liu, Tyng-Luh

Machine Learning and Computer Vision

My current research focuses on developing deep-net techniques for mainstream computer vision applications, such as image caption generation, large-scale object detection and recognition, and online visual information processing.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Mu, Shin-Cheng

Functional Programming and Program Derivation

In this mini-project we plan to pick an interesting algorithm and try to derive it and prove its correctness via functional program calculation. The applicant will be able to learn about functional programming and proofs of programs.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Ma, Wei-Yun

Deep Learning on Natural Language Processing of Big Data
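Have you ever wondered how Apple's Siri works? In fact, natural language processing (NLP) is the hero behind the scenes, and it is also one of the most important technologies of the big data era. Because language is people's main medium of communication, NLP applications are everywhere: search engines, recommendation services, question answering systems, machine translation, and so on. In recent years, one of the most important NLP research directions has been applying deep learning to big-data NLP. We focus in particular on designing new deep learning models that learn knowledge from the large amount of data on the web; for example, by reading the web, a system can automatically learn that "Jeremy Lin" (林書豪) and the "Hornets" (黃蜂隊) stand in a "member-team" relation. We also combine the learned knowledge with existing knowledge systems such as Wikipedia, Freebase, and E-HowNet to form an integrated knowledge system that serves as a key component of a true artificial intelligence system.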

Have you ever wondered how Apple Siri is built? In fact, Natural Language Processing (NLP) is the hero behind the scenes, and it is also one of the most important technologies of the big data era. Applications of NLP are everywhere because people communicate almost everything in language: search engines, recommendation services, QA systems, language translation, etc.
One of the most crucial NLP research topics these days is applying deep learning techniques to big data. We are especially studying how to design novel deep learning models that learn knowledge from the huge amount of unlabeled data on the Internet, such as the member-team relation between Jeremy Lin and the Charlotte Hornets. We also integrate the learned knowledge with existing knowledge bases, such as Wikipedia, Freebase, and E-HowNet, to form a unified knowledge system as the key component of a strong AI system.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Chuang, Tyng-Ruey

Open Geospatial Informatics



Proposed Intern Work: Research open geospatial information standards and systems. Program new functions as modules in existing geospatial systems.

Required Core Ability: Read and write technical documents; Work with researchers from various backgrounds on collaborative geospatial information systems.

Keywords (not prerequisites): GML, KML, OpenStreetMap, PostgreSQL, Gazetteer, POI, Semantic Web, Linked Data, Javascript, PHP, Python.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Yang, De-Nian

Innovative Applications of Social Networks and Multimedia Networks

Applications of Social Networks Analysis and Mining
Multimedia Network Optimization and Analysis
Implementation of Innovative Applications
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Yeh, Mi-Yen

Mining and Learning on Big Heterogeneous Data

Big data are usually heterogeneous since they contain data of rich types, varying from numerical to categorical, one-dimensional to multi-dimensional, and structured to unstructured. For this internship, we aim to let students learn how to apply data mining and machine learning techniques to analyze heterogeneous Big Data with the 4V characteristics (volume, velocity, variety, and veracity). Furthermore, we expect to design more effective and efficient algorithms to discover useful information from heterogeneous Big Data.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :

Wang, Chien-Min

Cloud Computing and Human Centered Computing
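1. Cloud distributed file systems oriented to indexed reads: For applications dominated by indexed reads, we extend and optimize existing cloud distributed file systems to meet three requirements at once: high performance, scalability, and compatibility. First, we believe an ideal cloud distributed file system should serve both applications dominated by streaming reads and those dominated by indexed reads, so we take the approach of extending an existing cloud distributed file system. Second, the design and implementation of a cloud distributed file system greatly affect the performance of applications that process large amounts of data. We will therefore first optimize the file system for applications dominated by indexed reads, such as those in bioinformatics. We believe such a file system will not only greatly improve indexed-read performance but also greatly broaden the applications of cloud computing platforms, so that more application services are ported to them and benefit from the powerful resources and convenience they offer.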

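2. Wearable computing systems for human-centered computing: We study the application of wearable computers and devices in human-centered computing, especially in social networks. The human-centered computing system we envision should have three abilities: understanding the surrounding environment and people's situations, providing services that make life better, and interacting with humans naturally through the senses. To realize these abilities, we will work in three research disciplines: context recognition, cloud services, and augmented reality. Through advanced system design, we plan to provide applications better suited to future human life and with friendlier human-computer interaction. By studying wearable computers and devices we will develop better human-computer integration, and through systems analysis of social network systems we will study and develop a wearable social network system. We will emphasize a friendly user interface, aim to improve the user experience, and provide more suitable context-awareness technology and augmented reality functions for reality services.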

1. Cloud Distributed File Systems for Indexed Reads: Indexed reads are very important for many applications such as bioinformatics. For example, a suffix array is a list of indexes to the starting positions of the suffixes of a text, sorted in alphabetical order. The popularity of suffix arrays in bioinformatics is evident from their application in a wide range of tasks. In this project, we aim to extend and optimize cloud distributed file systems for indexed reads. The expected results of this project include the following major components: (1) a memory-centric storage system; (2) optimization of cloud distributed file systems for indexed reads; (3) extension of the application programming interface. We believe that such extension and optimization of cloud distributed file systems would benefit applications with a huge number of indexed reads and extend the applications of cloud computing.

2. Wearable Computing Systems and Applications in Human-Centered Computing: The goal of this project is to investigate the application of wearable computers and devices in Human-Centered Computing, especially applications on social networks. A human-centered computing system should have three abilities: understanding the context of the surrounding area and humans, providing services that make lives better, and interacting with humans naturally through perception. To realize these three abilities, we plan to adopt three corresponding research disciplines: context recognition, cloud service, and augmented reality. Wearable computers and social network services will be integrated to build the proposed wearable social network system. The proposed system will provide more convenient and user-friendly human-computer interaction.
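
The suffix array mentioned under item 1 can be sketched in a few lines. This is a minimal in-memory illustration using a simple sort-based construction, not the project's file-system work, and the function names are hypothetical. It shows why such workloads are dominated by indexed reads: each binary-search probe reads the text at an arbitrary offset taken from the array.

```python
def build_suffix_array(text):
    """Return the starting positions of all suffixes of text, in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Binary-search the suffix array for suffixes that start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                       # first suffix whose m-prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:   # indexed read at offset sa[mid]
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:                       # first suffix whose m-prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)                                   # [5, 3, 1, 0, 4, 2]
    print(find_occurrences(text, sa, "ana"))    # [1, 3]
```

On genome-scale texts the array and the text no longer fit comfortably in memory, and the scattered offsets become scattered reads against the storage layer.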
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Sung, Ting-Yi

Bioinformatics for cancer-related mass spectrometry-based proteomics

All eight FDA-approved serum proteins for cancer diagnosis are glycoproteins, and aberrant glycosylation is frequently associated with cancer. We will focus on glycoproteins specific to lung cancer and liver cancer, which are the top two causes of cancer death in Taiwan.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :
Ku, Lun-Wei
(1) 網路文字內容主題與情感分析 (2) 英文混淆字最佳例句推薦

(1) Web post topic and sentiment analysis (2) Recommending example sentences for confusing English words to ESL learners
(1) 本研究主題旨在發展立場判斷的方法,不僅能同時判斷使用者及文章的立場;亦能克服文章標記不足之問題。此方法考慮了文章內容及使用者發文和按讚的行為。首先,我們對於文章內容進行語意分析,得到初始的立場標記;接著我們使用疊代式的運算,由發文內容判斷使用者的立場,或藉由按讚的人來判斷文章的立場。值得注意的是整個處理過程無需任何的標記資料就可以進行,因此能夠解決缺乏標記的問題。其中這些使用者互動的行為我們使用了真實記錄的資料,也嘗試著應用機器學習的技術-考慮已知的使用者行為判斷未知的使用者行為-來判斷潛在的按讚者。對於使用者立場的結果,我們希冀當前對於文章立場的優秀表現能夠佐證所判斷出的使用者立場,亦十分有效。事實上,除非是使用者本人揭露了自己的立場,否則我們幾乎無從得知他的真實想法。

(2) 近年來,網路資源的快速增長已經改變了語言學習行為。越來越多的人利用網路資源,而不是紙本書籍;然而,衍生問題是如何從大量的網路資源中,找到有用的信息。良好的例句在表明詞彙之間的細微差別是非常有用的,但是學習者(使用者)卻難以在大量的網路資源中找到描述詞彙之間差異的良好例句,因為大多數的網路詞典只有包含單一詞彙的解釋。為了解決這個問題,我們提出了一個系統,可以自動搜索一組最好例句,以利辨別容易混淆的詞彙。我們所提出的系統,對每個容易混淆的詞彙進行機器學習,得到每個詞彙用法的模型以及泛用閱讀困難度模型,以便提出簡單卻明顯的例句給學習者(使用者)。

(1) Web post stance classification is challenging not only because of the informality and variation of language on the web, but also because labeled data are lacking for fast-emerging new topics, and even the labeled data we do have are usually heavily skewed. In this work, we propose a web post stance classification approach to mitigate the latter two difficulties. The proposed approach considers post content as well as posting and liking behavior to classify post stances. Sentiment analysis is applied to posts to acquire their initial stance, and the post stance is then updated iteratively using correlated posting-related actions. The whole process works with no labeled data, which addresses the lack of labeled data. We use the real interactions of authors and readers for stance classification, and further attempt to find potential likers through stance propagation to better consider the preferences of all users for all posts. Experimental results show that the proposed approach not only substantially improves content-based web post stance classification, but also achieves better performance for the minority stance class, which addresses the skew problem.

(2) Recently, the rapid growth of web resources has changed language learning behavior. More and more people use web resources instead of printed books. However, it is now overwhelming to find useful information among them. In addition, when choosing between similar words, good example sentences that demonstrate the nuances among the words are extremely helpful, but learners can hardly find them because most web dictionaries explain only a single word at a time. To solve this problem, we propose a system that automatically searches for the best example sentences for a group of confusing words. The proposed system learns a word usage model for each word in the confusing word group and a universal difficulty model for all sentences, in order to propose simple but clear example sentences for learners.
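The iterative update described in item (1) above can be pictured with a toy Python sketch. This is only an assumed illustration, not the authors' method: the function name `propagate_stance`, the averaging rule, the equal 0.5/0.5 blend of content and social evidence, and the fixed number of rounds are all hypothetical choices made for the example. Stances are numbers in [-1, 1], with the initial post scores standing in for the output of sentiment analysis.

```python
def propagate_stance(initial, authors, likes, rounds=10):
    """Toy stance propagation with no labeled data.

    initial: post id -> content-based stance score in [-1, 1]
    authors: post id -> user id of the author
    likes:   post id -> set of user ids who liked the post
    Returns (post stances, user stances) after `rounds` updates.
    """
    post = dict(initial)
    user = {}
    for _ in range(rounds):
        # Step 1: a user's stance is the mean stance of the posts
        # that the user authored or liked.
        totals, counts = {}, {}
        for p, score in post.items():
            for u in {authors[p]} | likes.get(p, set()):
                totals[u] = totals.get(u, 0.0) + score
                counts[u] = counts.get(u, 0) + 1
        user = {u: totals[u] / counts[u] for u in totals}

        # Step 2: a post's stance blends its content score with the
        # mean stance of its author and likers.
        updated = {}
        for p in post:
            people = {authors[p]} | likes.get(p, set())
            social = sum(user[u] for u in people) / len(people)
            updated[p] = 0.5 * initial[p] + 0.5 * social
        post = updated
    return post, user


# p2 has neutral content, but its author and liker favor p1's stance.
initial = {"p1": 0.8, "p2": 0.0, "p3": -0.9}
authors = {"p1": "a", "p2": "b", "p3": "c"}
likes = {"p1": {"b"}, "p2": {"a"}}
posts, users = propagate_stance(initial, authors, likes)
print(posts["p2"] > 0)  # True
```

In this toy data the neutral post `p2` drifts toward the positive side because the users connected to it also support `p1`, which is the intuition behind using posting and liking behavior when content alone is uninformative.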
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :

Chung, Kai-Min
Cryptography, Complexity Theory, 或Quantum Cryptography之獨立研究

Independent Research on Cryptography, Complexity Theory, or Quantum Cryptography

The intern is expected to perform independent research on selected topics in Cryptography, Complexity Theory, or Quantum Cryptography that interest him/her. This often starts with surveying research papers and presenting them to the PI. Along the way, the intern can identify research questions with the PI, study the questions independently, and discuss them with the PI in research meetings. Candidate topics include, but are not limited to, Lattice-based Cryptography, Differential Privacy, Non-malleable Codes, Device-independent Cryptography, PRAM Cryptography, Zero Knowledge, and Randomness Extractors.

The intern also has the opportunity to join our group meetings, and is encouraged to interact with other group members to learn about different research topics.
PI個人首頁(PI's Information) :

實驗室網址(Research Information) :

Email :