In this chapter we discuss the services on which we test our implementation.
We begin with Tiramola, the system we aim to improve with our new approach.
We use Tiramola as it is, except for its decision-making system.
Tiramola is presented in Section~\ref{sec:tiramola}.
Our cluster is a Cassandra cluster; Cassandra is discussed in Section~\ref{sec:cassandra}.
We test the cluster with the YCSB benchmark, presented in Section~\ref{sec:ycsb}, and collect metrics with Ganglia, as shown in Section~\ref{sec:ganglia}.
\section{Tiramola}\label{sec:tiramola}
The platform on which we implement our work is Tiramola~\cite{tiramola}. Tiramola is a cloud-enabled, open-source framework for automatic resizing of NoSQL clusters. In the earliest versions of Tiramola, the decision to add or remove resources was modeled as a Markov Decision Process. In his diploma thesis, Konstadinos Lolos~\cite{tiramola-lolos} tested a different approach using Adaptive State Space Partitioning of Markov Decision Processes. In this thesis we approach Tiramola's decision problem using deep neural networks. First, however, let us introduce Tiramola. Tiramola offers the following features:
\begin{itemize}
\item A generic VM-based module that monitors cloud-based NoSQL clusters. This module is further modified to report real-time, client-side statistics, offering multi-grained, scalable monitoring.
\item An implementation of the decision-making module as a Markov Decision Process or with reinforcement-learning (Q-learning) algorithms, and later with Adaptive State Space Partitioning of Markov Decision Processes, enabling optimal policy generation relative to both changes in the environment and different cost functions.
\item A real-time system that integrates these modules. Using popular open-source implementations for NoSQL, cloud APIs and benchmarking tools, the system decides on the appropriate add/remove VM action according to the chosen optimization function and relative to cluster performance.
\end{itemize}

\subsection{Tiramola's Architecture}
Tiramola's architecture is illustrated in Figure~\ref{fig:tiramola.pdf}. The Decision Making module incorporates the user policy, defined through an optimization function, together with cluster-side and client-side monitored metrics, and periodically decides on cluster resize actions. It outputs resize actions to the Cloud Management module, which interacts with the cloud vendor in order to release or acquire virtual machines.
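
For reference, the Q-learning formulation mentioned in the feature list can be written with the standard update rule below. This is a generic textbook sketch, not Tiramola's exact formulation: the symbols $s_t$ (observed cluster state), $a_t$ (resize action, for example adding or removing VMs), $r_{t+1}$ (reward derived from the optimization function), $\alpha$ (learning rate) and $\gamma$ (discount factor) are our own notation, introduced here for illustration only.
\[
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Bigl[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Bigr]
\]
Under this sketch, at each decision period the module would select the action with the highest estimated value, $\arg\max_{a} Q(s_t, a)$, possibly with some exploration.
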
The Cluster Coordinator is then responsible for orchestrating the addition and removal commands for the particular NoSQL cluster at hand. The Monitoring module maintains up-to-date performance metrics collected from both cluster nodes and client nodes. We describe each module in detail below.
\diagram{Tiramola's architecture}{tiramola.pdf}

\subsubsection{Decision Making Module}
This module is responsible for deciding the appropriate cluster resize action according to the applied load, the cluster and user-perceived performance, and the optimization policy. Former versions of Tiramola formulated this process as a Markov Decision Process (MDP). We approach the subject using deep neural networks (REF) as predictors that continuously identify the most beneficial action relative to the current system state. The user goals are defined through a reward function that encodes the optimization objective each application wishes to adhere to. Upon reaching a resize decision, the module forwards the command to the Cloud Management module.

\subsubsection{Monitoring}
Tiramola uses Ganglia (Section~\ref{sec:ganglia}), a scalable distributed monitoring tool that allows remote collection of live or historical cluster statistics (such as CPU load averages, network, memory or disk space utilization, number of open client threads, etc.) through its XML API. Apart from the server-side metrics, modifications have been …
\item implements a Dynamo-style replication model with no single point of failure, but adds a more powerful ``column family'' data model.
\end{itemize}
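
To illustrate what a Dynamo-style replication model means in practice, the following Python sketch places nodes on a hash ring and stores each key on the next few distinct nodes clockwise from the key's position. It is a simplified illustration under our own assumptions, not Cassandra's implementation: the class \texttt{Ring}, the helper \texttt{token}, the use of MD5, the node names and the single token per node are all hypothetical choices made for brevity, and Cassandra's actual partitioner and replication strategies differ.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a ring position (MD5 is used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: every node owns one token and all nodes are peers."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sorted list of (token, node) pairs forms the ring.
        self._ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        """Insert a node; only keys adjacent to its token change owner."""
        bisect.insort(self._ring, (token(node), node))

    def remove_node(self, node):
        """Remove a node; its keys fall through to the next nodes on the ring."""
        self._ring.remove((token(node), node))

    def replicas(self, key):
        """Return the nodes holding `key`: the next distinct nodes clockwise."""
        if not self._ring:
            return []
        tokens = [t for t, _ in self._ring]
        # First node whose token is greater than the key's token (wrapping).
        start = bisect.bisect_right(tokens, token(key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=3)
    print(ring.replicas("user:42"))   # three distinct nodes hold this key
    ring.remove_node("node2")         # e.g. a resize action removes a VM
    print(ring.replicas("user:42"))   # the key is still served by three nodes
```

Because no node plays a special role in this scheme, losing any single node leaves every key with surviving replicas, which is the property the list above refers to as having no single point of failure.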
\newpage
Below we present some notable features of Cassandra:
\begin{itemize}
\item \textit{Fast linear-scale performance -} Cassandra scales linearly: throughput increases as nodes are added to the cluster, so response times remain low.
\item \textit{No single point of failure -} Cassandra has no single point of failure and is continuously available for business-critical applications that cannot afford a failure.
\item \textit{Elastic scalability -} Cassandra is highly scalable; more hardware can be added to accommodate more customers and more data as required.
\item \textit{Fast linear-scale performance -} Cassandra is linearly scalable. It increases your throughput as you increase the number of