Exercise 6
Part of the course Machine Learning for Materials and Chemistry.
Kernel methods can be implemented quite straightforwardly for small datasets. Use the Morse potential with all constants set to 1 as the target function.
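As a starting point, the target function can be written down directly. The sketch below assumes the common form of the Morse potential, V(r) = D_e (1 - exp(-a (r - r_e)))^2, which is zero at the minimum r = r_e. The function name morse and the parameter names d_e, a and r_e are illustrative choices, not names given in the exercise.

```python
import numpy as np


def morse(r, d_e=1.0, a=1.0, r_e=1.0):
    """Morse potential V(r) = D_e * (1 - exp(-a * (r - r_e)))**2.

    Assumption: this is the form with V(r_e) = 0; all constants default to 1
    as requested in the exercise.
    """
    r = np.asarray(r, dtype=float)
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2
```

With all constants equal to 1, the potential vanishes at r = 1 and approaches 1 for large r.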
Task 6.1: Generate training data
Write a function training_data(npoints: int) which generates a training set of npoints points over the interval (0.2, 5), together with their y values, i.e. the Morse potential at each point.
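One possible shape for this function is sketched below. It assumes that the points are drawn uniformly at random from the interval and that the result is returned as an array of shape (npoints, 2) with x in the first column and y in the second; the exercise does not prescribe either choice. The optional seed argument is an addition for reproducibility and is not part of the requested signature.

```python
import numpy as np


def training_data(npoints: int, seed=None) -> np.ndarray:
    """Return an (npoints, 2) array with columns x and y = Morse(x).

    Assumptions: uniform random sampling over the interval from 0.2 to 5,
    and the Morse form (1 - exp(-(x - 1)))**2 with all constants equal to 1.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.2, 5.0, size=npoints)
    y = (1.0 - np.exp(-(x - 1.0))) ** 2
    return np.column_stack((x, y))
```

Random sampling is used here so that a second call yields an independent test set; an evenly spaced grid from np.linspace would be an equally valid reading of the task.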
Task 6.2: Implement Kernel Ridge Regression
Do not use external libraries for this task, but rather implement KRR from the equations in the slides using scipy/numpy only. Implement a function build_model(points: np.ndarray)
which takes the data points from Task 6.1 and returns the model coefficients (alpha on the slides) for a Gaussian kernel. Write another function evaluate(points: np.ndarray, alphas: np.ndarray, testset: np.ndarray)
which returns the mean absolute error of the model on the points in testset, a test set with the same number of points as the training data.
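A minimal sketch of both functions follows. It assumes the standard KRR solution alpha = (K + lambda I)^(-1) y with the Gaussian kernel k(x, x') = exp(-(x - x')^2 / (2 sigma^2)); the slides may use a different convention for the kernel width or the regularization. The helper gaussian_kernel, the keyword arguments sigma and lam, and their default values are illustrative assumptions. Both points and testset are assumed to be arrays of shape (n, 2) with columns x and y.

```python
import numpy as np


def gaussian_kernel(xa, xb, sigma):
    """Kernel matrix with entries exp(-(xa_i - xb_j)**2 / (2 * sigma**2))."""
    diff = xa[:, None] - xb[None, :]
    return np.exp(-diff**2 / (2.0 * sigma**2))


def build_model(points, sigma=0.5, lam=1e-8):
    """Solve (K + lam * I) alpha = y for the KRR coefficients.

    sigma and lam are assumed hyperparameters, not values from the exercise.
    """
    x, y = points[:, 0], points[:, 1]
    K = gaussian_kernel(x, x, sigma)
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(K + lam * np.eye(len(x)), y)


def evaluate(points, alphas, testset, sigma=0.5):
    """Mean absolute error of the model on testset (columns x, y).

    sigma must match the value that was used in build_model.
    """
    K_test = gaussian_kernel(testset[:, 0], points[:, 0], sigma)
    predictions = K_test @ alphas
    return float(np.mean(np.abs(predictions - testset[:, 1])))
```

The prediction for a test point is the kernel-weighted sum over all training points, so evaluate needs the training x values in addition to the coefficients. The regularization term also keeps the linear system solvable when two training points lie very close together.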
Task 6.3: Learning curves
Run your code for different sizes of the training set and plot the error as a function of the number of training points. What do you observe? Do you have any ideas for how the learning curve could become steeper or lower? Hint: What if the domain was not (0.2, 5) but rather (0.2, 100)?
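The experiment could be organized roughly as in the following self-contained sketch. It repeats compact versions of the three functions so that it runs on its own, and it makes several assumptions that are not part of the exercise: the module-level constants SIGMA and LAMBDA, the extra domain and rng arguments of training_data, averaging over repeats random draws per training set size, and the helper names learning_curve and plot_learning_curve. The domain argument is there so that the hint can be tried by passing (0.2, 100).

```python
import numpy as np

SIGMA = 0.5    # assumed kernel width
LAMBDA = 1e-8  # assumed regularization strength


def training_data(npoints, domain=(0.2, 5.0), rng=None):
    """Uniform random x in the domain, y from the Morse potential (constants 1)."""
    rng = np.random.default_rng(rng)
    x = rng.uniform(domain[0], domain[1], size=npoints)
    return np.column_stack((x, (1.0 - np.exp(-(x - 1.0))) ** 2))


def kernel(xa, xb):
    """Gaussian kernel matrix between two sets of 1D points."""
    return np.exp(-(xa[:, None] - xb[None, :]) ** 2 / (2.0 * SIGMA**2))


def build_model(points):
    """KRR coefficients from (K + LAMBDA * I) alpha = y."""
    x, y = points[:, 0], points[:, 1]
    return np.linalg.solve(kernel(x, x) + LAMBDA * np.eye(len(x)), y)


def evaluate(points, alphas, testset):
    """Mean absolute error on a test set with columns x and y."""
    pred = kernel(testset[:, 0], points[:, 0]) @ alphas
    return float(np.mean(np.abs(pred - testset[:, 1])))


def learning_curve(sizes, domain=(0.2, 5.0), repeats=10, seed=0):
    """Average test MAE for each training set size in sizes."""
    rng = np.random.default_rng(seed)
    errors = []
    for n in sizes:
        maes = []
        for _ in range(repeats):
            train = training_data(n, domain, rng)
            test = training_data(n, domain, rng)  # same length as training set
            maes.append(evaluate(train, build_model(train), test))
        errors.append(np.mean(maes))
    return np.array(errors)


def plot_learning_curve(sizes, errors):
    """Log-log plot of the error against the number of training points."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    plt.loglog(sizes, errors, "o-")
    plt.xlabel("number of training points")
    plt.ylabel("mean absolute error")
    plt.show()
```

A typical call would be sizes = [4, 8, 16, 32, 64], then errors = learning_curve(sizes), then plot_learning_curve(sizes, errors). Learning curves are usually shown on log-log axes, where a power-law decay of the error appears as a straight line. Running learning_curve(sizes, domain=(0.2, 100.0)) alongside the default domain gives a direct comparison for the hint.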