Exercise 6

Part of the course Machine Learning for Materials and Chemistry.

Kernel methods can be implemented quite straightforwardly for small datasets. As the target function, use the Morse potential with all constants set to 1, e.g. in the common form V(r) = D_e (1 - exp(-a (r - r_e)))^2 with D_e = a = r_e = 1.

Task 6.1: Generate training data

Write a function training_data(npoints: int) which generates a training set of npoints points on the interval (0.2, 5), together with their y values, i.e. the Morse potential evaluated at each point.
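One possible starting point is sketched below. It is only an illustration under several assumptions that the task does not fix: the Morse potential is taken in the form D_e (1 - exp(-a (r - r_e)))^2, the points are drawn uniformly at random (an evenly spaced grid would be equally valid), the result is returned as a single array with x in the first column and y in the second, and the helper morse and the arguments low, high and seed are hypothetical additions.

```python
import numpy as np


def morse(r, d_e=1.0, a=1.0, r_e=1.0):
    """Morse potential; this functional form is assumed, all constants default to 1."""
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2


def training_data(npoints: int, low: float = 0.2, high: float = 5.0, seed=None) -> np.ndarray:
    """Return an array of shape (npoints, 2): column 0 holds x, column 1 holds y.

    The (x, y) column layout and the uniform random sampling are assumptions
    made for this sketch, not requirements of the task.
    """
    rng = np.random.default_rng(seed)
    # rng.uniform samples from [low, high); hitting low exactly is possible
    # in principle but practically never happens.
    x = rng.uniform(low, high, size=npoints)
    return np.column_stack((x, morse(x)))
```

Calling training_data(10, seed=0) then yields ten reproducible (x, y) pairs; the same function can also be reused to draw an independent test set.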

Task 6.2: Implement Kernel Ridge Regression

Do not use external machine learning libraries for this task; implement KRR from the equations in the slides using scipy/numpy only. Implement a function build_model(points: np.ndarray) which takes the data points from Task 6.1 and returns the model coefficients (alpha on the slides) for a Gaussian kernel. Write another function evaluate(points: np.ndarray, alphas: np.ndarray, testset: np.ndarray) which returns the mean absolute error of the model on a test set containing as many points as the training set.
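A minimal sketch of how these two functions might look is given below. It uses the standard KRR solution alpha = (K + lambda I)^(-1) y and the kernel k(x, x') = exp(-(x - x')^2 / (2 sigma^2)); the slides may use a different convention for the kernel width. The helper gaussian_kernel, the keyword arguments sigma and lam with their default values, and the (x, y) column layout of points and testset are assumptions of this sketch.

```python
import numpy as np


def gaussian_kernel(xa: np.ndarray, xb: np.ndarray, sigma: float) -> np.ndarray:
    """Kernel matrix with entries exp(-(xa_i - xb_j)^2 / (2 sigma^2))."""
    diff = xa[:, None] - xb[None, :]
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2))


def build_model(points: np.ndarray, sigma: float = 0.5, lam: float = 1e-8) -> np.ndarray:
    """Solve (K + lam * I) alpha = y for the coefficients alpha.

    points is assumed to have shape (n, 2) with x in column 0 and y in column 1;
    sigma and lam are hypothetical default hyperparameters.
    """
    x, y = points[:, 0], points[:, 1]
    kernel = gaussian_kernel(x, x, sigma)
    # Solving the linear system avoids forming the matrix inverse explicitly.
    return np.linalg.solve(kernel + lam * np.eye(len(x)), y)


def evaluate(points: np.ndarray, alphas: np.ndarray, testset: np.ndarray,
             sigma: float = 0.5) -> float:
    """Mean absolute error of the KRR prediction on testset (same layout as points)."""
    kernel_test = gaussian_kernel(testset[:, 0], points[:, 0], sigma)
    prediction = kernel_test @ alphas
    return float(np.mean(np.abs(prediction - testset[:, 1])))
```

Note that sigma must be the same in build_model and evaluate, otherwise the coefficients no longer match the kernel used for prediction.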

Task 6.3: Learning curves

Run your code for different sizes of the training set and plot the error as a function of the number of training points. What do you observe? Do you have any ideas how the learning curve could be made steeper or shifted lower? Hint: What if the domain were not (0.2, 5) but rather (0.2, 100)?
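The following self-contained sketch shows one way such a learning curve could be computed. Everything in it is an assumption rather than the intended solution: the Morse form D_e (1 - exp(-a (r - r_e)))^2 with all constants 1, uniform random sampling, the hyperparameters sigma and lam, averaging over a few random repeats, and the helper names krr_mae, learning_curve and plot_learning_curve. The plotting helper assumes matplotlib is installed.

```python
import numpy as np


def morse(r):
    """Morse potential with all constants set to 1 (assumed functional form)."""
    return (1.0 - np.exp(-(r - 1.0))) ** 2


def krr_mae(ntrain, low=0.2, high=5.0, sigma=0.5, lam=1e-8, seed=0):
    """Train Gaussian-kernel KRR on ntrain random points and return the
    mean absolute error on an independent test set of the same size."""
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(low, high, ntrain)
    x_test = rng.uniform(low, high, ntrain)

    def kernel(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma ** 2))

    alphas = np.linalg.solve(kernel(x_train, x_train) + lam * np.eye(ntrain),
                             morse(x_train))
    prediction = kernel(x_test, x_train) @ alphas
    return float(np.mean(np.abs(prediction - morse(x_test))))


def learning_curve(sizes, nrepeats=5, **kwargs):
    """Average the test error over nrepeats random draws for each training set size."""
    return np.array([np.mean([krr_mae(n, seed=s, **kwargs) for s in range(nrepeats)])
                     for n in sizes])


def plot_learning_curve(sizes, errors):
    """Plot error against training set size on log-log axes."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    plt.loglog(sizes, errors, "o-")
    plt.xlabel("number of training points")
    plt.ylabel("mean absolute error")
    plt.show()
```

A typical call would be sizes = [4, 8, 16, 32, 64], errors = learning_curve(sizes), plot_learning_curve(sizes, errors). Extra keyword arguments are forwarded to krr_mae, so learning_curve(sizes, high=100.0) can be used to explore the hint about the larger domain.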