Spark Streaming Scheduling
We are currently extending our multi-path scheduling design to streaming applications. We found that certain characteristics of streaming applications in Spark make our design an even better fit for them.
RDD Eviction and Memory Management
Spark improves system performance by keeping as many intermediate results (e.g., RDDs) as possible in memory rather than on spinning disks (e.g., HDDs). However, this all-in-memory design raises a cache management concern: because memory is limited, cached RDDs are evicted from RAM under an LRU policy by default, which may not be optimal for all use cases. Motivated by this, we develop a cache replacement algorithm that selects eviction victims by considering both the importance and the size of each cached RDD.
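To illustrate the idea of weighing importance against size, here is a minimal sketch in Python. It is not the algorithm described above: the `CachedRDD` record, the `importance` score, and the "lowest importance per megabyte first" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedRDD:
    rdd_id: int
    size_mb: float     # assumed to be strictly positive
    importance: float  # hypothetical score, e.g. expected number of reuses


def select_victims(cached, needed_mb):
    """Return the RDDs to evict so that at least needed_mb is freed.

    Assumed rule: evict the RDDs with the lowest importance per MB first,
    so large, rarely reused RDDs leave the cache before small, hot ones.
    """
    ranked = sorted(cached, key=lambda r: r.importance / r.size_mb)
    victims, freed = [], 0.0
    for rdd in ranked:
        if freed >= needed_mb:
            break
        victims.append(rdd)
        freed += rdd.size_mb
    return victims


cache = [
    CachedRDD(rdd_id=1, size_mb=100.0, importance=5.0),
    CachedRDD(rdd_id=2, size_mb=400.0, importance=2.0),
    CachedRDD(rdd_id=3, size_mb=50.0, importance=1.0),
]
print([r.rdd_id for r in select_victims(cache, needed_mb=420.0)])
```

Unlike LRU, a rule of this kind can keep a small, frequently reused RDD in memory even if it has not been touched recently.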
Spark Scheduling for Multi-Path Application
We found that the current Spark implementation lacks a task scheduling mechanism that yields an optimal task arrangement, which unnecessarily extends the makespan of stages. To address this issue, I am currently designing a novel scheduling policy for multi-path, multi-stage applications in Spark. It can effectively reduce the makespan of a Spark application by leveraging task information from multiple parallel stages, especially for unbalanced jobs (i.e., jobs in which some parallel paths of successive stages take much longer to execute than others).
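The effect of unbalanced parallel paths can be illustrated with a generic critical-path heuristic, sketched below in Python. This is not the scheduling policy described above, only a textbook stand-in: the stage names, the duration estimates, and the `critical_path_priority` helper are hypothetical.

```python
from functools import lru_cache


def critical_path_priority(durations, children):
    """Order stages by the length of the longest path that starts at them.

    durations: stage name -> estimated execution time (assumed to be known)
    children:  stage name -> stages that depend on it
    """
    @lru_cache(maxsize=None)
    def remaining(stage):
        # Time from the start of this stage to the end of the job along
        # the longest chain of dependent stages.
        downstream = (remaining(c) for c in children.get(stage, ()))
        return durations[stage] + max(downstream, default=0.0)

    return sorted(durations, key=remaining, reverse=True)


# Two parallel paths (A1 -> A2 and B1 -> B2) that merge in a final stage.
durations = {"A1": 10.0, "A2": 30.0, "B1": 5.0, "B2": 5.0, "join": 2.0}
children = {"A1": ["A2"], "A2": ["join"], "B1": ["B2"], "B2": ["join"]}
print(critical_path_priority(durations, children))
```

In this example the A path needs 40 time units before the join and the B path only 10, so giving resources to A1 first shortens the overall makespan, whereas serving the two paths evenly would leave the longer path as the bottleneck.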
MapReduce / Hadoop Scheduling
I am currently focusing on continuing work on a YARN resource scheduling improvement. The work adds an additional heuristic score to the current scheme to adapt it to iterative jobs in MapReduce.
Another part of the work is a collaboration with CCIS students to resolve I/O contention and improve matrix multiplication performance in MapReduce jobs.
Frequency Planning Based on Fixed Relay
We develop a new frequency planning scheme for mobile cellular networks, covering base stations, relays, and users, to eliminate channel interference and increase SNR (signal-to-noise ratio).
- Design the cellular architecture based on mathematical calculation and simulate its performance with MATLAB.
- Modify the existing cellular model to improve SNR, throughput, and other performance metrics.
- Han Gao, Zhengyu Yang, Janki Bhimani, Bo Sheng, Ningfang Mi. "AutoPath: Harnessing Parallel Execution Paths for Efficient Resource Allocation in Multi-stage Big Data Framework." International Conference on Computer Communication and Networks (ICCCN), 2017.
- Yao, Yi, Han Gao, Jiayin Wang, Ningfang Mi, and Bo Sheng. "OpERA: Opportunistic and Efficient Resource Allocation in Hadoop YARN by Harnessing Idle Resources." In Computer Communication and Networks (ICCCN), 2016 25th International Conference on, pp. 1-9. IEEE, 2016.
- Han Gao, Haiyuan Liu, and Siyu Wang. "Research of spectrum sharing method based on channel heterogeneity." In Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on, pp. 1699-1705. IEEE, 2012.
- Zhengyu Yang, Yi Yao, Han Gao, Jiayin Wang, Ningfang Mi, and Bo Sheng. "New YARN Non-Exclusive Resource Management Scheme through Opportunistic Idle Resource Assignment." Transactions on Cloud Computing (TCC).
- Yi Yao, Han Gao, Jiayin Wang, Bo Sheng, Ningfang Mi. "New Scheduling Algorithms for Improving Performance and Resource Utilization in Hadoop YARN Clusters." Transactions on Cloud Computing (TCC).
Thesis & Paper
- Scheduling Policy in Cloud Computing Framework MapReduce: A Survey
- Full Stack Engineering: Florent (startup)
- Software Engineering Intern: Amazon