Amr Ayad, Ziad Shawwash, and Alaa Abdalla


1 A Multi-agent Reinforcement Learning Approach to Develop the Water Value Function for Multireservoir Hydroelectric Systems
Optimization Days, Montreal, Energy and Environment Session TB, May 8th, 2012
Amr Ayad, Ziad Shawwash, and Alaa Abdalla

2 Introduction
The purpose of MARLOMMR* is to establish the marginal value of water and the value of water in storage for multireservoir hydroelectric power systems, as well as the optimal policy (releases).
The new algorithm uses the multi-agent reinforcement learning (MARL) technique to build a long/medium-term reservoir operation optimization model.
To validate the new model, a stochastic dynamic programming algorithm (SDPOM2R*) was developed and will be used to benchmark the MARLOMMR model. SDPOM2R will be used to validate other models as well.
* MARLOMMR: Multi-agent Reinforcement Learning Optimization Model for Multiple Reservoirs
* SDPOM2R: Stochastic Dynamic Programming Optimization Model for 2 Reservoirs
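To make the terms "value of water in storage" and "marginal value of water" concrete, here is a minimal sketch of stochastic dynamic programming on a toy single reservoir. It is not SDPOM2R: the storage grid, release options, inflow distribution, price, and horizon are all illustrative assumptions, and the names `solve_water_values` and `marginal_values` are hypothetical.

```python
# Toy single-reservoir SDP. Every number below is an illustrative assumption.
STORAGE = [0, 1, 2, 3, 4]        # discrete storage levels (volume units)
RELEASES = [0, 1, 2]             # discrete release decisions per stage
INFLOWS = [(0, 0.5), (1, 0.5)]   # (inflow, probability) pairs
PRICE = 10.0                     # revenue per unit of water released
STAGES = 3                       # number of stages in the planning horizon
S_MAX = STORAGE[-1]


def solve_water_values():
    """Backward induction. Returns value[t][s] and policy[t][s]."""
    # value[STAGES] is the terminal value of storage, assumed to be zero here.
    value = [[0.0] * len(STORAGE) for _ in range(STAGES + 1)]
    policy = [[0] * len(STORAGE) for _ in range(STAGES)]
    for t in range(STAGES - 1, -1, -1):
        for s in STORAGE:
            best_value, best_release = float("-inf"), 0
            for u in RELEASES:
                if u > s:  # cannot release more water than is stored
                    continue
                expected = 0.0
                for inflow, prob in INFLOWS:
                    s_next = min(s - u + inflow, S_MAX)  # excess water spills
                    expected += prob * (PRICE * u + value[t + 1][s_next])
                if expected > best_value:
                    best_value, best_release = expected, u
            value[t][s] = best_value
            policy[t][s] = best_release
    return value, policy


def marginal_values(stage_values):
    """Finite-difference value of one extra unit of storage at each level."""
    return [stage_values[i + 1] - stage_values[i]
            for i in range(len(stage_values) - 1)]
```

Calling `value, policy = solve_water_values()` and then `marginal_values(value[2])` shows the marginal value in the last stage falling to zero above two units of storage, because at most two units can be released before the horizon ends. A real model would replace the zero terminal value and the constant price with problem-specific data.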

3 Outline of the Presentation
- BC Hydro System
- Research Project
- MARL Technique
- Problem Definition
- Main Model: MARLOMMR
- Conclusions

4 BC Hydro System
- Commercial Crown corporation owned by the Province of British Columbia
- Serves approximately 95% of the province’s population and approximately 1.8 million customers
- Clean or renewable generation accounts for 90% of total supply
- Responsible for reliably generating between 42,000 and 52,000 GWh of electricity per year
- Peak load of approximately 11,000 MW
- Transmission network of over 18,500 kilometres and 57,000 kilometres of distribution lines
- Among the lowest electricity rates in North America

5 BC Hydro System
- 61 dams
- 37 hydroelectric stations (10,500 MW)
- Peace River system provides 34% of the energy requirement
- Columbia River system provides 31% of the energy requirement
- 1 gas-fired thermal plant (912 MW)
- 3 combustion turbine plants (110 MW)
- Many run-of-river (ROR), biomass, and other plants, ~1,450 MW (soon more ROR)
- Wind, 222 MW (soon ~740 MW+)
- 100+ generating units

6 BC Hydro System

7 Research Project “Water Value Capital Project” at BC Hydro
Amr Ayad, Ph.D. Student, September-11-18
- Jointly funded by NSERC and BC Hydro. The Principal Investigator is Prof. Ziad Shawwash.
- The main purpose is to create, compare, and test several models that use different techniques, in order to determine the best model or models for assigning the value of water in storage, especially for the large multi-year storage reservoirs. This value is used as a planning and decision-making tool.
- The work will not start entirely from scratch.

8 Research Project “Water Value Capital Project” at BC Hydro
Other than determining the water value and the marginal value of water, the focus is on:
- Deriving the optimal operation policy for the planning horizons
- Forecasting expected revenue, energy, and market transactions
- Capturing more of the system complexity
- Better representing the stochasticity and uncertainty involved
- Incorporating the CRT flood constraints and others

9 Technique: Multi-agent Reinforcement Learning
Background
- MARL is defined by Busoniu et al. (2008) as “A group of autonomous, interacting entities sharing a common environment, which they perceive with sensors and upon which they act with actuators”.
- MARL can be regarded as a fusion of temporal-difference reinforcement learning, game theory, and more general direct policy search techniques.
- The main issues with MARL are the stability of the agents’ learning dynamics and the type of interaction between the agents.

10 MARL Technique: Tasks
- Fully cooperative: proper coordination and tie-breaking in the joint-action value function
  - Indirect coordination: agents are indirectly guided toward biased actions that maximize the common return
  - Coordination-based methods: the global Q-function is decomposed
- Fully competitive: unlike the nature of our problem
- Mixed task: for the stateless cases
  - Agent-independent methods: each agent has its own Q-table and needs to replicate the other agents’ tables
  - Agent tracking: module to track
  - Agent-aware methods: adapt by heuristics
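The agent-independent idea above, where each agent keeps its own Q-table, can be sketched with the standard tabular Q-learning update. This is a minimal illustration rather than the MARLOMMR implementation: the class name `IndependentQAgent`, the learning rate, the discount factor, and the state and action labels are hypothetical, and replicating the other agents’ tables is not shown.

```python
from collections import defaultdict


class IndependentQAgent:
    """One agent with its own Q-table, learning as if it were alone."""

    def __init__(self, actions, alpha=0.1, gamma=0.95):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate (assumed value)
        self.gamma = gamma            # discount factor (assumed value)
        self.q = defaultdict(float)   # (state, action) -> estimated return

    def best_value(self, state):
        """Largest Q-value currently available in the given state."""
        return max(self.q[(state, a)] for a in self.actions)

    def update(self, state, action, reward, next_state):
        """Temporal-difference (Q-learning) update for one transition."""
        target = reward + self.gamma * self.best_value(next_state)
        error = target - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * error


# One agent per plant; the plant codes are the ones used later in the deck.
agents = {name: IndependentQAgent(actions=["hold", "release"])
          for name in ["GMS", "PCN"]}
```

Each agent would call `update` with its own observed transition after every step. Because the other agents are learning at the same time, the environment each agent sees is non-stationary, which is one reason the slide lists tracking and awareness of other agents as separate method families.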

11 MARL Technique
Related Issues
- Learning in MARL
- Reward allocation
- Communication of multiple agents
- Centralized vs. decentralized control in MARL
- Function approximation
Benefits
- Speedup and efficiency of the computation process
- Sharing of experience between agents
- Unlike in single-agent RL, the failure of one agent in its task does not mean failure of the optimization

12 MARL Technique: Challenges
- Computational complexity increases exponentially with the number of state-action pairs in single-agent RL, and the same issue exists in MARL.
- It is difficult to define a well-structured goal for multiple agents, because the optimization cannot be performed without taking the correlation of the agents’ returns into account.
- Because the agents learn simultaneously, each agent has to follow the other agents’ non-stationary behavior.
- Scalability of the algorithm to realistic problem sizes is a concern, as it is in single-agent RL.

13 MARL Technique: Challenges
- The exploration/exploitation balance is even harder to strike in MARL than in single-agent RL.
- Convergence to a strategy regardless of what the other agents are doing.
- Rationality and best response to the other agents’ behavior.
- Q-tables for the multiple agents need storage; otherwise, function approximation is used.
- As of today, this technique has been used to model systems in areas such as AI, game theory, robotics, and ITS, but not to solve a problem similar to the one at hand.
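One common way to handle the exploration/exploitation balance mentioned above is epsilon-greedy action selection with a decaying exploration rate. The sketch below is a generic illustration, not something the slides specify: the function names, the decay schedule, and the default parameters are assumptions.

```python
import random


def epsilon_at(episode, start=1.0, floor=0.05, decay=0.99):
    """Exploration rate that shrinks geometrically but never below `floor`."""
    return max(floor, start * decay ** episode)


def choose_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    `q_values` maps each available action to its current Q-value estimate.
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))      # explore
    return max(q_values, key=q_values.get)     # exploit


# Hypothetical Q-values for one agent in one state.
q_for_state = {"hold": 1.0, "release": 2.0}
action = choose_action(q_for_state, epsilon_at(episode=200))
```

In a multi-agent setting every agent explores at once, so one agent’s random actions add noise to what the others observe. Giving each agent its own schedule, or coordinating the schedules, is a design choice this sketch leaves open.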

14 Problem Definition
Large and Complex System
- Reservoirs and plants
- State-space
- Decision-space
- Planning horizon
- Main constraints
- Objective function
- Stochastic variables: implementation and representation
- Uncertainty (stochastic)
Main Constraints
- Maximum and minimum limits on turbine flow
- Maximum and minimum limits on total plant discharge
- Trade limits on exports and imports (transmission limits)
- Maximum and minimum limits on generation
- Environmental and other non-power constraints
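The box constraints listed above can be checked with a small feasibility test before a decision is accepted. This is a sketch under stated assumptions: the limit values, the units, and the names `LIMITS` and `violated_limits` are hypothetical and are not BC Hydro data. Environmental and other non-power constraints are left out.

```python
# Hypothetical (minimum, maximum) limits for one plant in one time step.
LIMITS = {
    "turbine_flow": (0.0, 1800.0),        # assumed unit: m3/s
    "plant_discharge": (100.0, 2500.0),   # turbine flow plus spill, m3/s
    "generation": (0.0, 2700.0),          # assumed unit: MW
    "net_export": (-1500.0, 2000.0),      # negative means import, MW
}


def violated_limits(decision, limits=LIMITS):
    """Return the names of all limits that the decision breaks."""
    violations = []
    for name, (low, high) in limits.items():
        if not low <= decision[name] <= high:
            violations.append(name)
    return violations


candidate = {
    "turbine_flow": 1500.0,
    "plant_discharge": 1600.0,
    "generation": 2200.0,
    "net_export": 300.0,
}
```

Here `violated_limits(candidate)` returns an empty list, so the candidate is feasible with respect to these limits. In a learning setting, infeasible actions could be removed from the action set or penalized in the reward; the slides do not say which approach the authors use.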

15 Main Model: MARLOMMR
Decomposition: Dantzig and Wolfe
* MARLOMMR: Multi-agent Reinforcement Learning Optimization Model for Multiple Reservoirs

16 Main Model: MARLOMMR
- Each plant is represented by a decentralized single agent.
- Plants (agents) are divided into groups according to the river system they are on. For example, MCA, REV, and ARD will be in one group, GMS and PCN in another, and so on.
- Plants that do not belong to any group might be added to an existing group or represented by single agents.
- All the plants in each group (river system) are considered neighbors and have a level of indirect communication with each other.
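The grouping described above can be written down as a simple lookup from plant to neighbors. The two groups come from the slide; the group labels, the function name `neighbors`, and the treatment of ungrouped plants as agents with no neighbors are assumptions made for illustration.

```python
# Groups taken from the slide; the labels "group_1" and "group_2" are placeholders.
RIVER_GROUPS = {
    "group_1": ["MCA", "REV", "ARD"],
    "group_2": ["GMS", "PCN"],
}


def neighbors(plant, groups=RIVER_GROUPS):
    """Return the other plants in the same group as `plant`.

    A plant that belongs to no group is treated as a single agent
    and therefore has no neighbors.
    """
    for members in groups.values():
        if plant in members:
            return [other for other in members if other != plant]
    return []
```

For example, `neighbors("REV")` returns `["MCA", "ARD"]`, while a plant code that appears in no group returns an empty list. Adding an ungrouped plant to an existing group, the other option mentioned on the slide, would only require appending its code to that group’s list.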

17 Main Model: MARLOMMR
Each group (squad) communicates with the others, either through communication between individual agents in different groups or through a central agent module for each group, which communicates on the group’s behalf with the corresponding central agents of the other groups.

18 Main Model: MARLOMMR
- The environment could be GOM* or another model, depending on the final structure of the problem.
- Decomposition and parallel processing are ideas under consideration.
- Focus areas are automating the learning parameters, the stability and dynamics of learning, and using the most efficient state-space discretization.
- Start off with one stochastic variable and 5 reservoirs.
- Dantzig and Wolfe
* GOM: Generalized Optimization Model
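State-space discretization, named above as a focus area, can be illustrated by mapping continuous reservoir storages to bin indices. This is a uniform-grid sketch only: the function names, the bounds, and the bin count are assumptions, and the slides do not say which discretization the authors consider most efficient.

```python
def discretize(value, low, high, n_bins):
    """Map a continuous value in [low, high] to a bin index in 0..n_bins-1."""
    if n_bins < 1 or high <= low:
        raise ValueError("need n_bins >= 1 and high > low")
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * n_bins)
    return min(index, n_bins - 1)   # the upper bound falls in the last bin


def joint_state(storages, bounds, n_bins):
    """Discretize one storage per reservoir into a tuple usable as a table key."""
    return tuple(discretize(s, low, high, n_bins)
                 for s, (low, high) in zip(storages, bounds))
```

With 5 reservoirs and `n_bins` levels each, a joint table has `n_bins ** 5` storage states before any stochastic variable is added, for example 100,000 states at 10 bins. That growth is the scalability concern raised on the challenges slides.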

19 Conclusions
- A stochastic dynamic programming model (SDPOM2R) was developed to handle the problem of two reservoirs (GMS and MCA), and it has been tried successfully with three reservoirs.
- The MARL model is currently under development.
- It is considered an extension of the currently applicable models, such as RLROM* (by Abdalla).
- The first version is expected this month (May).
- There are still challenges in applying the MARL technique.
* RLROM: Reinforcement Learning Reservoir Optimization Model

20 Questions?

