Complete Getting Started
From creating a project to training a model.
In this guide, we will use a sample problem to go over all steps necessary for federated learning with FELT. This process consists of:
- 1. Initial setup
- 2. Preparing datasets
- 3. Starting local training
- 4. Aggregating results from local training
- 5. Using the final model
Video tutorial following a similar structure to this guide.
Right now, the application only works with the MetaMask wallet. For instructions on how to install MetaMask, please visit:
Once you have MetaMask installed, you can head to the FELToken application:
Here, in the top-right corner, you should see the CONNECT button with the MetaMask icon. Make sure that Polygon Mumbai is selected in MetaMask and click CONNECT. After that, you just need to approve the connection in the MetaMask pop-up window.
Right now, the app is deployed on the Polygon Mumbai testnet. First, you will need some MATIC tokens to pay for the transaction fees. You can obtain these using a Polygon faucet. Just visit the following link and paste your wallet address:
You will also need OCEAN tokens to pay for datasets and algorithms. You can collect them through the OCEAN faucet by submitting your wallet address here:
Right now, you will have to approve each transaction separately. If you want to make the process smoother, you can use the Activate Automation option. Keep in mind that you will have to top up your automation account with MATIC and OCEAN before using it. For more information, please follow this guide:
For the demonstration of federated learning, let’s imagine two towns collaborating on analyzing housing data. The data might contain sensitive information. Therefore, they can’t fully disclose the data. Each town publishes its dataset on Ocean, allowing only computation over data without direct access. We will try to predict a house price based on house parameters (size in square feet, number of bedrooms, bathrooms, material, etc.). Below you can see a demonstration of our data (original data file).
Example of house prices dataset. The target column we want to predict (prices) is the last. In data published on Ocean, we also need to remove the header row.
We already have the data published on Ocean (using the Mumbai chain) as the following assets, which we will use in this guide:
To use FELT with your own data, you first need to have the data in the correct format. Right now, we support only the CSV format, with the following rules:
- CSV contains only numerical data
- The last column is the target column
- Remove header row from data
- All datasets used during training must have the same number of columns
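The rules above can be checked before publishing a dataset. The following is a minimal sketch using only the Python standard library; the function name `prepare_csv` and the sample values are hypothetical and not part of FELT.

```python
import csv
import io


def prepare_csv(text, has_header=True):
    """Return (header-less CSV text, column count) after checking the rules.

    Raises ValueError when a cell is not numeric or rows differ in length.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if has_header:
        rows = rows[1:]  # rule: remove the header row
    rows = [row for row in rows if row]  # skip blank lines
    if not rows:
        raise ValueError("no data rows found")

    width = len(rows[0])
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(f"row {number} has {len(row)} columns, expected {width}")
        for cell in row:
            float(cell)  # rule: numerical data only; raises ValueError otherwise

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue(), width


# Hypothetical sample: two feature columns followed by the target column.
raw = "size,bedrooms,price\n1200,3,250000\n850,2,180000\n"
clean, columns = prepare_csv(raw)
```

Comparing the returned column count across all datasets is one way to confirm that they have the same number of columns before training.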
Now that we have our data ready, it’s time to start the training! Head to app.feltoken.ai. Before you begin, you will need to connect your MetaMask account, so click the Connect button in the top-right corner. Make sure that in MetaMask you are connected to the correct account and to the Mumbai testnet.
Then you choose between training on a single dataset or on multiple datasets. For our case, we will use the multiple datasets option. In the first step, you must fill in the name of the training (you can pick an arbitrary one). Then you fill in the DIDs of the data; for our demo, you can use:
Screenshot of how the form should look before you go to the next step.
Then you proceed to the next step, where you select the model. Right now, you can pick from scikit-learn models or calculations of basic statistics (mean, variance...). For our case, we can pick any regression model, for example Ridge regression.
Model selection: you can pick from multiple model types.
In the final step, you can pick the hyperparameters of your model. One of the most important options is the target column index. This is the index of the column that we want to predict. Setting the value to -1 will use the last column. Once you have selected your hyperparameters, you can click Submit.
Picking parameters for the selected model. Target column set to -1 means that we want to predict the last column.
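To illustrate what a target column index means, here is a small sketch that splits rows into features and target. It assumes the index follows Python's usual convention, where -1 refers to the last element; the function name and the sample rows are hypothetical, and this is not FELT's internal code.

```python
def split_features_target(rows, target_index=-1):
    """Split each row into (features, target) using a column index."""
    features, targets = [], []
    for row in rows:
        index = target_index % len(row)  # maps -1 to the last column
        targets.append(row[index])
        features.append(row[:index] + row[index + 1:])
    return features, targets


# Hypothetical rows: size, bedrooms, price
rows = [[1200, 3, 250000], [850, 2, 180000]]
features, targets = split_features_target(rows, target_index=-1)
```

With `target_index=-1`, the price column becomes the target and the remaining columns become the features.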
Once you hit the Submit button, you will see a progress bar. You will have to approve a few transactions using MetaMask to start the training. This is only necessary if you didn't activate the automation. Here is the list of all transactions you will have to confirm:
- 1. Approve OCEAN token spend to purchase the dataset
- 2. Purchase the dataset (now approve and purchase are separate transactions)
- 3. Approve OCEAN token spend to purchase FELT algorithm for training
- 4. Purchase the algorithm
- 5. Sign request to start the compute job (training)
You have to start a separate training for each DID; therefore, you will have to approve the above set of transactions twice.
Screenshot from starting training and approving transactions through MetaMask.
Once you start the local training, you can go to the launched jobs page (you can use the Launched jobs button). Here you can monitor the progress of the compute jobs. You have to click the Reload button to get the latest status.
Once both jobs finish, you can start the aggregation. On the right side of each local training, there is a checkbox that you can use to select which local trainings should be aggregated (you need to select at least 2). After selecting the jobs to aggregate, you can click the Aggregate button to start the aggregation.
Displaying job status; selecting jobs to aggregate and starting the aggregation.
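This guide does not describe how FELT combines the local models internally. As a conceptual illustration only, one simple way to aggregate linear models (such as Ridge regression) in federated learning is to average their parameters. The sketch below assumes each local model is represented as a plain dictionary; the function name, the dictionary layout, and the values are hypothetical and do not reflect FELT's file format or algorithm.

```python
def average_linear_models(models):
    """Average the coefficients and intercepts of several linear models.

    Each model is a dict: {"coef": [floats], "intercept": float}.
    """
    if len(models) < 2:
        raise ValueError("need at least 2 local models to aggregate")
    width = len(models[0]["coef"])
    if any(len(m["coef"]) != width for m in models):
        raise ValueError("all models must have the same number of coefficients")
    count = len(models)
    coef = [sum(m["coef"][i] for m in models) / count for i in range(width)]
    intercept = sum(m["intercept"] for m in models) / count
    return {"coef": coef, "intercept": intercept}


# Hypothetical local models from two towns
town_a = {"coef": [1.0, 3.0], "intercept": 2.0}
town_b = {"coef": [3.0, 5.0], "intercept": 4.0}
final_model = average_linear_models([town_a, town_b])
```

The check for matching coefficient counts mirrors the requirement that all datasets used during training have the same number of columns.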
After starting the aggregation, the progress bar will pop up. You will have to approve some transactions once again (if you didn't activate the automation):
- 1. Sign URLs to access local models
- 2. Approve OCEAN token to pay for provider fees
- 3. Order dataset for the compute job
- 4. Approve OCEAN token spend to purchase FELT algorithm for aggregation
- 5. Purchase the algorithm
- 6. Sign request to start the compute job (aggregation)
You can watch the aggregation progress. Once it finishes, you will see the Download final model button. You will sign the request and download the final model (in our case final-model-House Prices.json). The file is not in a standard machine learning file format. You will have to use the FELT library to import it.
pip install feltlabs==0.3.0
Then you can load the model using the feltlabs.model.load_model(model_path) function. This function takes the path of the model file as an argument and returns the model object. Right now, we support two types of models: federated learning and federated analytics. The behavior of each is slightly different.
When you use the federated learning option and import the model with the load_model(...) function, the function returns a model that can be used as a standard scikit-learn model object. The model can then be used for prediction by calling model.predict(data). You can check the following code for sample usage:
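A minimal sketch of such usage, relying only on `feltlabs.model.load_model` and `model.predict` as described above. The helper name `predict_with_felt_model` and its `loader` parameter are hypothetical additions that allow the helper to be exercised without `feltlabs` installed. Whether `predict` accepts plain Python lists or requires a NumPy array is an assumption to verify against the library.

```python
def predict_with_felt_model(model_path, rows, loader=None):
    """Load a final model file and return its predictions for `rows`.

    `loader` defaults to feltlabs.model.load_model; passing another
    callable is a hypothetical hook, not part of the FELT library.
    """
    if loader is None:
        # Imported lazily so the helper can be defined without feltlabs.
        from feltlabs.model import load_model
        loader = load_model
    model = loader(model_path)
    # Each row holds the feature columns only, without the target column.
    return model.predict(rows)


# Example call (requires feltlabs and the downloaded model file;
# `feature_rows` is a placeholder for your own feature data):
# prices = predict_with_felt_model("final-model-House Prices.json", feature_rows)
```

The rows passed for prediction should contain the same feature columns, in the same order, as the datasets used for training.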
That’s it. You just trained your first model on a distributed dataset! Now it’s up to your imagination to find projects where you can use this technology.
Similarly to federated learning models, federated analytics models can be loaded using the load_model(...) function. This time you don't have to pass any data to the model, and you can obtain the calculated value (sum, mean, variance, or std) by calling model.predict(None). See the example below:
# Using version: 0.3.0
from feltlabs.model import load_model
# Load model
model = load_model("final-model-mean.json")
# Call predict function without any input
mean = model.predict(None)
# Print the value of the mean calculated by the model
print(mean)
These are the main steps for getting started with FELT. In the following guides, you will find more detailed instructions for specific tasks.