
Rapidminer studio tutorial











Here is a video for Part 1: Rapid Miner Intro Part 1: Importing Files.

Open the Design perspective by clicking the Design button in the upper right. (See the installation guide if you need assistance with this step.) The Design perspective is your artistic canvas and the place where you will spend much of your time. It is here that you will combine operators into data mining processes. (The other main area is the Results perspective, where you can play with your results in a variety of ways.)

Locate the Repository view of the Design perspective. This is the place where your data, processes, and results are stored. Create a new folder in the Repository view: enter a name for the new folder, for example Getting Started, and click OK. Repeat the procedure from the step above (this time right-clicking on the new folder you created) and add a data folder and a processes folder. Your Repositories view should look similar to this.

Download the data set, customer-churn-data.xlsx, to your computer. In this section you will use the Import Wizard to import the data set, customer-churn-data.xlsx, into your repository. From the Repositories view, select Import Excel Sheet from the pull-down to import the training data set. The wizard guides you to pull the data into the repository. Browse to the location where you saved customer-churn-data.xlsx, select the file, and click Next.

Check the tabs at the top to verify that you are importing the correct Excel sheet. There is only one sheet, RapidMiner Data, in this file, but it is always a good idea to verify; if there were more sheets, they would appear as additional tabs. Step 2 also allows you to select a range of cells for import. For this tutorial, you want all cells (the default). Accept the default and click Next.

Define the data for import. In this step you can see that RapidMiner has preselected the first row as the row containing column names. (It has set the annotation to Name.) If that was incorrect, you could change it here, but for your data set this is unnecessary. There are four important aspects to this step: each row or example indicates one customer, and the entire spreadsheet constitutes the example set. The column names (or attributes, as RapidMiner calls them) are those identified by the Name annotation in the previous step. These are Gender, Age, Payment Method, Churn, and LastTransaction.
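Before running the wizard, you may want a quick sanity check of the file outside RapidMiner. The snippet below is not part of the tutorial; it is a minimal sketch that assumes Python with pandas (and openpyxl for .xlsx files) is available, and simply confirms that the spreadsheet contains the attributes listed above.

```python
# Optional sanity check of customer-churn-data.xlsx outside RapidMiner.
# Assumes pandas and openpyxl are installed; not required for the tutorial.
import pandas as pd

df = pd.read_excel("customer-churn-data.xlsx")  # reads the first sheet by default

# Each row is one example (a customer); the whole sheet is the example set.
print(df.shape)
# Expect the attributes named in the wizard:
# Gender, Age, Payment Method, Churn, LastTransaction
print(list(df.columns))
print(df.head())
```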


This tutorial showcases how we can easily create scalable workflows for the maritime domain, consisting of multiple INFORE components, using RapidMiner Studio and its Streaming Extension, which has recently been released under an open source license. This extension provides operators that allow RapidMiner users to design and deploy arbitrary streaming jobs. For the tutorial, a public AIS dataset provided by MarineTraffic is used. The dataset contains information about vessels (e.g., navigational status, location, speed, etc.).

The Synopsis Data Engine (SDE) of INFORE is built on top of Apache Flink and exploits parallel processing and data summarisation techniques for performing scalable interactive analytics. In this workflow, we use it to create synopses of the AIS data, producing simplified trajectories in order to improve the execution time of the workflow. The SDE is publicly available as an open source project, and you can read more about its internal architecture and implemented algorithms in the respective CIKM paper.

The Maritime Event Detector is an engine implemented by MarineTraffic that is able to detect maritime events, such as deviations from normalcy (e.g., a vessel switching off its transponder) and vessel activities (e.g., transhipments, fishing, bunkering). The detection of fishing events (e.g., carried out by trawlers and longliners) supported by this module is documented in the respective DEBS paper.

Complex Event Forecasting (Wayeb) is based on symbolic automata and Markov models, and in this workflow it is used to forecast maritime events (e.g., "a vessel is about to enter a port"). An open source version of the CEF has recently been released here.
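To make the idea of the SDE's synopses more concrete, here is a minimal sketch of trajectory simplification in plain Python. It is not the SDE: the real engine runs on Apache Flink, works in parallel, and implements the algorithms described in the CIKM paper, while the function name and threshold below are invented purely for illustration.

```python
# Illustrative only: drop AIS position reports that barely move, keeping a
# much smaller "simplified trajectory". This is NOT the SDE's algorithm.
from math import hypot

def simplify_trajectory(points, min_step=0.01):
    """Keep a point only if it is at least min_step (in degrees,
    an arbitrary threshold) away from the last kept point."""
    if not points:
        return []
    kept = [points[0]]
    for lon, lat in points[1:]:
        last_lon, last_lat = kept[-1]
        if hypot(lon - last_lon, lat - last_lat) >= min_step:
            kept.append((lon, lat))
    return kept

# A dense track collapses to a handful of points that still outline the route.
track = [(23.600, 37.950), (23.601, 37.951), (23.630, 37.970), (23.700, 38.010)]
print(simplify_trajectory(track))  # -> [(23.6, 37.95), (23.63, 37.97), (23.7, 38.01)]
```

Fewer points per vessel means the downstream detector and forecaster have less data to process, which is exactly the execution-time benefit described above.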


In this workflow, we consume AIS events in a streaming fashion from a Kafka topic, and we create synopses using the SDE component. The simplified trajectories derived from this component are then forwarded either to the Maritime Event Detector, to detect events that have occurred, or to the Complex Event Forecasting module, to forecast events that are about to happen. Using RapidMiner Studio with the Streaming Extension enabled, we are able to drag and drop the aforementioned modules into our workflow and connect them without the need to write any code. The video tutorial provided below demonstrates how this magic happens. The demo is presented by Fabian Temme (RapidMiner).
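For readers who want to see roughly what the first step of this workflow corresponds to in code, here is a small sketch using the kafka-python client. The topic name, broker address, and message fields are assumptions made for illustration; in the tutorial itself the Streaming Extension operators handle all of this without any coding.

```python
# Rough sketch of consuming AIS events from a Kafka topic with kafka-python.
# Topic name, broker address, and the JSON field names are assumptions.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "ais-events",                        # hypothetical topic name
    bootstrap_servers="localhost:9092",  # hypothetical broker
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for record in consumer:
    event = record.value
    # Each event would then feed the SDE to build a synopsis, and the
    # simplified trajectory would go on to event detection or forecasting.
    print(event.get("mmsi"), event.get("lon"), event.get("lat"), event.get("speed"))
```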












