Memory Consumption Based Decision-making in the Area of Plan Based Scheduling

worked on by: Cora Glaß

Exposé

Plan-based scheduling on high-performance systems allows, for example, a company to run the simulations of multiple projects on one system without delaying its product development cycles through long waiting periods before the simulations are executed. Simulations are a useful tool for avoiding the construction of multiple costly prototypes just to test a product. They can run on the design model parameters alone, and their results can be used in the next development cycle to improve the design before actual and, in the best case, more efficient prototypes are built.
Currently, a plan-based scheduler schedules the tasks of a process on CPU cores based on the deadlines provided for the process and its tasks, as well as on the CPU capacity that is required and available. In my master's thesis I will extend the scheduler to also take the memory requirements and consumption of processes and their tasks into account. To do so, I will focus on the first two steps of the following enumeration:
  1. Memory Usage: obtain and process memory consumption data
  2. Task Mapping: organise the mapping so that tasks are mapped to CPU cores that can satisfy their CPU and memory requirements
  3. Data Mapping: optimize the mapping to make the most of the cache hierarchy
The first step is to retrieve the data that the operating system provides about the memory consumption of the tasks and to process it so that the scheduler can later make meaningful decisions.
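As a rough illustration of this first step, the following sketch reads memory figures from user space through the Linux `/proc/<pid>/status` file. This is an assumption made for illustration only: the prototype may obtain the data inside the kernel instead, and the selection of fields (`FIELDS`) and the function names are hypothetical.

```python
from pathlib import Path

# Fields of /proc/<pid>/status kept by this sketch (the kernel reports them in kB).
# Which fields matter for scheduling is an assumption, not taken from the thesis.
FIELDS = ("VmSize", "VmRSS", "VmData", "VmStk")


def parse_status(text: str) -> dict[str, int]:
    """Extract the selected Vm* fields (in kB) from the text of a status file."""
    usage = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in FIELDS:
            # A value looks like "    1234 kB"; keep only the number.
            usage[key] = int(value.split()[0])
    return usage


def read_memory_usage(pid: int) -> dict[str, int]:
    """Read the memory usage of a running process (Linux only)."""
    return parse_status(Path(f"/proc/{pid}/status").read_text())


# Shortened, made-up sample so the parser can be tried without a live process.
SAMPLE = "Name:\tsim\nVmSize:\t  204800 kB\nVmRSS:\t   51200 kB\nThreads:\t4\n"
print(parse_status(SAMPLE))
```

Keeping the parser separate from the file access makes it possible to test the processing on recorded data before wiring it to real processes.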
The second step is to design and build a mapping concept that ensures tasks are mapped to CPU cores that can provide the necessary resources, for example by dividing the resources of a node and reserving a share for each CPU core.
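A minimal sketch of such a mapping concept, assuming that a node's memory is divided into equal, fixed shares per core and that a simple first-fit strategy is used. The names `Core`, `split_node` and `map_task`, the first-fit choice and the numbers are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    mem_free_kb: int        # memory share still unreserved on this core
    cpu_free: float = 1.0   # fraction of CPU capacity still unreserved


def split_node(total_mem_kb: int, n_cores: int) -> list[Core]:
    """Divide a node's memory into equal, fixed shares, one per core."""
    share = total_mem_kb // n_cores
    return [Core(i, share) for i in range(n_cores)]


def map_task(cores, cpu_need, mem_need_kb):
    """First fit: reserve on the first core that satisfies both requirements."""
    for core in cores:
        if core.cpu_free >= cpu_need and core.mem_free_kb >= mem_need_kb:
            core.cpu_free -= cpu_need
            core.mem_free_kb -= mem_need_kb
            return core.core_id
    return None  # no core can host the task


cores = split_node(total_mem_kb=8_000_000, n_cores=4)  # 2,000,000 kB per core
first = map_task(cores, cpu_need=0.5, mem_need_kb=1_500_000)
# Core 0 still has CPU capacity left but too little memory, so core 1 is used.
second = map_task(cores, cpu_need=0.5, mem_need_kb=1_000_000)
print(first, second)
```

The second request shows why CPU capacity alone is not a sufficient criterion: a core can have spare CPU time and still be unable to host a task.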
The third step is planned as part of the future work section. If I have the capacity to add more complexity within the limited time of my master's thesis, I will work on the optimization based on the cache hierarchy to at least some extent. This requires understanding how the mapping and sizing of memory pages influence access time and the rate of cache flushes, and how main memory is shared between CPU cores and nodes in the Grid architecture on which my master's thesis is based.

Thesis Requirements

formulate requirements here (together with your adviser)

Weekly Status

Week - (CW -38)

Activities

  • Check exposé
  • Submitted the request to officially start my master's thesis (deadline 07.03.2024)

Next Steps

  • Start setting up the development environment and verify the scope, architecture and requirements of the project

Week - (CW 39)

Activities

  • Meeting
  • Get familiar with the Linux kernel code (source code: https://github.com/torvalds/linux)
  • First attempts to set up a functional development/test environment (VM/QEMU) that uses a self-built kernel image

Results

Next Steps

  • Continue with the setup and continue verifying the scope etc. in meetings

Problems

Week - (CW 40)

Activities

  • Meeting
  • Designing and implementing memory consumption model

Results

  • Concept of model and implementation of struct + helper functions

Next Steps

  • If the memory consumption has to be determined in a more complex way, implement this

Problems

Week - (CW 41-44)

Activities

  • Weekly meetings
  • Debugging multiple issues:
    • development setup booted into invalid physical memory (solved by rebuilding the whole setup)
    • an error was thrown when trying to assign a pointer to another value (solved by adding a memory reservation)
  • Started obtaining and storing the memory consumption of processes and their tasks

Results

  • Development setup works again

Next Steps

  • Continue with obtaining and storing the memory consumption

Problems

  • Stuck on multiple issues for a long time

Week - (CW 45)

Activities

  • Had to rebuild my development setup
  • Adjusted the code according to the discussed topic (during scheduling, check the memory consumption of the processes instead of their child processes)
  • Researching whether the total VM consumption provided by the kernel is sufficient or whether I need to check against flags, the stack etc.

Results

  • Development setup works again
  • First draft for step 1 of my prototype

Next Steps

  • If the memory consumption has to be determined in a more complex way, implement this

Problems

  • I tend to overcomplicate things. (At this point I only had to concentrate on the memory consumption of the process as a whole, not of its tasks.)

Week - (CW 46)

Activities

  • Weekly meeting
  • Did some research to verify that the memory consumption I determine in my code is correct
  • Start with the second step
    • Create concepts for what the machine details, memory consumption model data and requests should look like
    • Start designing the reservation structure

Results

  • Step 1 (Memory Usage) implementation is done
  • Design drafts for step 2 (Task Mapping)
  • Example data and implementation of data readers
  • Draft in progress for the reservation structure

Next Steps

  • Continue thinking about the optimal design structures and work on reservation structure implementation

Problems

Week - (CW 47)

Activities

  • Weekly meeting
  • Reconfirmed the architecture the prototype should support
  • Worked on the design for the reservation, as it is not functional in this context
  • Research how to implement the scheduling, important steps, necessary models and structures

Results

  • More knowledge about how to implement, but also still the need to design a good schedule concept

Next Steps

  • Continue working on a useful design (concerning the necessary structures, models, the interface to them etc.)

Problems

Week - (CW 48)

Activities

  • Weekly meeting
  • Reconfirmed the architecture the prototype should support
  • Worked on the design for the reservation, as it is not functional in this context
  • Research how to implement the scheduling, important steps, necessary models and structures

Results

  • More knowledge about how to implement, but also still the need to design a good schedule concept

Next Steps

  • Continue working on a useful design (concerning the necessary structures, models, the interface to them etc.)

Problems

Week - (CW 49)

Activities

  • Weekly meeting
  • sick

Week - (CW 50)

Activities

  • Mostly busy catching up with other university work
  • Decided to simplify the reservation handling, as memory is split between the cores in fixed shares and not handled dynamically

Week - (CW 51)

Activities

  • Weekly meeting (on Thursday)
  • Reservation structure completed and manually tested
  • starting on Job/Simulation structure (processes, tasks etc.)

Results

  • Reservation structure + functions and simple visualisation of content

Next Steps

Problems

Week - (CW 52) - sick

Week - (CW 1) - sick

Week - (CW 2)

Activities

  • Weekly meeting (on Thursday)
  • Added logic so that a process reserves its memory for its whole execution (previously, the logic assumed that memory is reserved per task and freed once the task is done)
  • Adjusted the reservation/scheduling so that forked processes are scheduled after the parent process reaches the fork task that creates them
  • Writing the master's thesis
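The difference between the per-task and the whole-execution reservation model can be sketched as follows. This is only an illustration under assumptions: time is modelled as integer slots, and the names `Reservation`, `process_reservation` and `fits` as well as all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    start: int    # first time slot covered (inclusive)
    end: int      # first time slot no longer covered (exclusive)
    mem_kb: int


def process_reservation(task_slots, mem_kb):
    """One reservation from the first task's start to the last task's end."""
    start = min(s for s, _ in task_slots)
    end = max(e for _, e in task_slots)
    return Reservation(start, end, mem_kb)


def fits(existing, new, capacity_kb):
    """Check that the memory reserved in every slot of `new` stays within capacity."""
    for t in range(new.start, new.end):
        used = sum(r.mem_kb for r in existing if r.start <= t < r.end)
        if used + new.mem_kb > capacity_kb:
            return False
    return True


# Process A runs tasks in slots [0, 2) and [6, 8) and needs 600 kB throughout.
a = process_reservation([(0, 2), (6, 8)], mem_kb=600)
# Process B asks for 600 kB in the gap [3, 5). With per-task reservations it
# would fit, but A keeps its memory during the gap, so a 1000 kB share is exceeded.
b = Reservation(3, 5, 600)
print(fits([a], b, capacity_kb=1000))
```

Under the whole-execution model, idle gaps between the tasks of a process cannot be reused for the memory of other processes, which makes the schedule more conservative.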

Results

  • Reservation/scheduling logic improved (processes reserving memory for whole runtime is now taken into account)
  • First drafts of the introduction and background sections; collected literature for the similar work section

Next Steps

  • Discuss open technical questions
  • Discuss open question about architecture
  • continue scheduling improvement

Problems

Week - (CW 3)

Activities

  • Weekly meeting (on Thursday)
    • got feedback for section 1-2 content
    • discussed open technical questions
  • Adjusted the reservation/scheduling so that the join task of a parent process is scheduled after the joining child process has finished its last task
  • Added deadline handling
  • Implemented PlanBuild, which creates plans (in CSV format) based on the reservations
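The join rule and the CSV plan output could look roughly like the sketch below. It is not the actual PlanBuild implementation: the column layout (`core, process, task, start, end`), the function names and the sample reservations are assumptions made for illustration.

```python
import csv
import io


def join_start(parent_ready: int, child_last_task_end: int) -> int:
    """Earliest slot for a join task: the parent is ready and the child is done."""
    return max(parent_ready, child_last_task_end)


def build_plan(reservations) -> str:
    """Write (core, process, task, start, end) rows as CSV, ordered by core and start."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["core", "process", "task", "start", "end"])  # hypothetical columns
    for row in sorted(reservations, key=lambda r: (r[0], r[3])):
        writer.writerow(row)
    return out.getvalue()


# The child finishes at slot 7, later than the parent is ready (slot 4),
# so the join task cannot start before slot 7.
start = join_start(parent_ready=4, child_last_task_end=7)
plan = build_plan([
    (0, "parent", "join", start, start + 2),
    (1, "child", "work", 3, 7),
    (0, "parent", "fork", 2, 3),
])
print(plan)
```

Sorting by core and start time gives each core a chronologically ordered list of tasks, which is convenient when the plan is inspected by hand.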

Results

  • Reservation/scheduling logic now takes the fork and join times of parents/children into account
  • Scheduler provides a plan in case no error occurred during validation or scheduling

Next Steps

  • Add logic so that, in case the scheduling fails, a retry is started that schedules differently
  • Refactoring
  • Recheck that everything works
  • Adjust section 1-2 based on the feedback of the weekly meeting
  • Write next sections

Problems
