First, we take the data from the HTTP web server log and store it in a CSV file. Then we scale the data for better accuracy and feed it to a machine learning model that performs clustering. Clustering helps us detect the malicious IPs. Once we have those IPs, a Jenkins job blocks them automatically.
Loading the log data: the log file is space-delimited, but the DataFrame reader expects a comma delimiter by default, so we have to tell it that the delimiter is a space.
After loading the data, we rename the columns to make them easier to work with.
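The loading and renaming steps could look roughly like the sketch below. It assumes pandas is the DataFrame library in use; the three-line sample log and the column positions (0 for the IP, 3 for the status) are hypothetical and much simpler than a real access log.

```python
import io

import pandas as pd

# Hypothetical, simplified sample; a real access log has more fields per line.
sample_log = io.StringIO(
    "192.168.1.10 - - 200\n"
    "192.168.1.10 - - 200\n"
    "10.0.0.5 - - 408\n"
)

# sep=" " overrides the default comma delimiter.
# header=None because the raw log has no header row.
dataset = pd.read_csv(sample_log, sep=" ", header=None)

# Columns are numbered 0, 1, 2, ... at this point, so give the
# ones we care about readable names (positions are assumptions).
dataset = dataset.rename(columns={0: "ip", 3: "status"})

print(dataset[["ip", "status"]])
```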
To analyse the log, we need the total number of requests coming from each IP with each status code.
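One way to get that per-IP, per-status request count is a pandas groupby, sketched here on made-up rows. The column names "ip", "status" and "count" follow the ones used in this post; the data itself is only an illustration.

```python
import pandas as pd

# Made-up rows standing in for the parsed log.
dataset = pd.DataFrame({
    "ip": ["10.0.0.5", "10.0.0.5", "10.0.0.5", "192.168.1.10", "192.168.1.10"],
    "status": [408, 408, 408, 200, 404],
})

# Count how many requests each (ip, status) pair produced and
# turn the result back into a flat table with a "count" column.
counts = (
    dataset.groupby(["ip", "status"])
    .size()
    .reset_index(name="count")
)

print(counts)
```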
Since we are analysing the log for a DDoS attack, we drop every column we do not need and keep only ‘ip’, ‘status’ and ‘count’. Our new dataset now has just three columns. The values in the ip column are strings, and the model does not accept strings, so we have to convert them to integers.
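The post does not say how the IP strings were turned into integers, so the sketch below is only one possible approach: it uses Python's standard ipaddress module, which also makes it easy to convert back to a string later. The helper names ip_to_int and int_to_ip are hypothetical.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value: int) -> str:
    """Convert a 32-bit integer back to a dotted IPv4 string."""
    return str(ipaddress.IPv4Address(value))


as_int = ip_to_int("192.168.1.10")
print(as_int)             # 3232235786
print(int_to_ip(as_int))  # 192.168.1.10
```

Because the conversion is reversible, the same helpers cover the later step of turning the integers back into readable addresses.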
For better model accuracy we have to scale the data. For scaling we use the StandardScaler class from sklearn.
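As a rough illustration of the scaling step, the sketch below runs StandardScaler over a small made-up feature matrix with the three columns described above (IP as integer, status, count). The numbers are invented; only the StandardScaler usage reflects the post.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows: [ip as integer, status code, request count].
features = np.array(
    [
        [167772165, 408, 4944],
        [3232235786, 200, 12],
        [3232235787, 200, 9],
        [3232235788, 404, 3],
    ],
    dtype=float,
)

# StandardScaler rescales every column to zero mean and unit variance,
# so the large IP integers do not dominate the smaller columns.
scaler = StandardScaler()
dataset_scaled = scaler.fit_transform(features)

print(dataset_scaled.round(3))
```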
Now our data is ready for training the model. Since we want to find groups of IPs with similar attributes, we go for clustering, and since we only need to decide whether an IP is malicious or not, we use two clusters. Now we train the model on the scaled dataset.
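The post does not name the clustering algorithm, so the following sketch assumes KMeans from sklearn with two clusters, which is a common choice for this kind of split. The feature rows are invented, with one row deliberately far from the rest.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up rows: [ip as integer, status code, request count].
# The first row plays the role of the noisy client.
features = np.array(
    [
        [167772165, 408, 4944],
        [3232235786, 200, 12],
        [3232235787, 200, 9],
        [3232235788, 200, 20],
        [3232235789, 200, 3],
        [3232235790, 200, 7],
    ],
    dtype=float,
)

dataset_scaled = StandardScaler().fit_transform(features)

# Two clusters: one for ordinary traffic, one for the outlier(s).
# KMeans is an assumption here; random_state only fixes the seed.
model = KMeans(n_clusters=2, n_init=10, random_state=42)
pred = model.fit_predict(dataset_scaled)

print(pred)
```

The cluster numbers themselves (0 or 1) are arbitrary; what matters is which rows end up grouped together.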
We now have “pred”, which holds the cluster label for each row. We attach it to dataset_scaled so that we can plot the graph, see how the groups are distributed, and pick out the malicious IP.
Next we convert the IPs from integers back to strings for readability, then plot the graph to see the clustering. We can see one IP that sits apart from all the others, so we check how many requests that IP sends to the server. Here we find that this IP is carrying out a DoS attack on the server.
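In the post the odd IP is spotted by looking at the plot. If that check had to be done in code, one possible approach is to treat the smaller of the two clusters as the suspicious one. That rule is an assumption made for this sketch, and the table below is invented.

```python
import pandas as pd

# Invented results table: the counts per IP plus the cluster label "pred".
results = pd.DataFrame({
    "ip": ["10.0.0.5", "192.168.1.10", "192.168.1.11", "192.168.1.12"],
    "status": [408, 200, 200, 200],
    "count": [4944, 12, 9, 3],
    "pred": [1, 0, 0, 0],
})

# Assumption: normal traffic forms the big cluster, so the label
# with the fewest rows marks the suspicious group.
minority_label = results["pred"].value_counts().idxmin()
suspects = results[results["pred"] == minority_label]

print(suspects[["ip", "status", "count"]])
```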
As the picture above shows, one IP hit the server 4944 times with status code 408. After getting the malicious IPs, we write them to a file so that they are all stored in one place and can be used in the later steps.
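Writing the IPs out can be as simple as one address per line. In this sketch the file name malicious_ips.txt and the temporary directory are hypothetical; the post does not say where the file lives.

```python
import os
import tempfile

# Example list; in the real pipeline this comes from the clustering step.
malicious_ips = ["10.0.0.5"]

# Hypothetical location and file name, chosen only for this sketch.
output_path = os.path.join(tempfile.gettempdir(), "malicious_ips.txt")

# One IP per line keeps the file trivial to read from a later job.
with open(output_path, "w") as handle:
    for ip in malicious_ips:
        handle.write(ip + "\n")

with open(output_path) as handle:
    saved = handle.read().splitlines()

print(saved)
```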
I wrote all the code in Jupyter because I find it easy to work with, but we are going to automate everything with Docker, Jenkins and GitHub. So we have to convert the .ipynb file to a .py file.
We push the log file and the Python code to GitHub. As soon as we push, Jenkins triggers its first job, which runs the Python code inside a Docker container and writes the malicious IPs it finds to a file.
Job 1 is triggered automatically as soon as we push log.csv and the Python file to GitHub.
After Job 1 builds successfully, it automatically triggers Job 2.
Job 2 reads the file of malicious IPs and blocks them so that they cannot connect to our server in the future.
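The post does not show how Job 2 does the blocking. The sketch below assumes a Linux server with iptables and only builds the commands as argument lists instead of running them, so it is safe to try anywhere. The helper name build_block_commands is hypothetical.

```python
import ipaddress


def build_block_commands(lines):
    """Return one iptables DROP command (as an argument list) per valid IPv4 line."""
    commands = []
    for line in lines:
        candidate = line.strip()
        if not candidate:
            continue  # skip blank lines
        try:
            ip = ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # skip anything that is not a valid IPv4 address
        # Assumption: blocking is done by dropping inbound packets with iptables.
        commands.append(["iptables", "-A", "INPUT", "-s", str(ip), "-j", "DROP"])
    return commands


# Example file contents, including a blank line and a bad entry.
file_lines = ["10.0.0.5\n", "\n", "not-an-ip\n"]
commands = build_block_commands(file_lines)
print(commands)
```

Validating each line before it reaches a firewall command keeps a malformed or tampered file from injecting unexpected arguments. Actually running the commands would need root privileges on the server.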
After Job 2 builds successfully, we can see that the malicious IP has been blocked by our server.
Now we only have to push our log in CSV format to GitHub. Jenkins pulls it automatically and launches the container that runs the Python program, and once the malicious IPs are found, they are blocked automatically. We do not have to do anything except push the log.csv file to GitHub.