Gegevens mining is the process of uncovering patterns inwards large sets of structured gegevens to predict future outcomes. Structured gegevens is gegevens that is organized into columns and rows so that it can be accessed and modified efficiently. Using a broad range of machine learning algorithms, you can use gegevens mining approaches for a broad multitude of use cases to increase revenues, reduce costs, and avoid risks.
If you are looking to analyze unstructured gegevens (e.g. gegevens from essays, articles, rekentuig loom files, etc.) see text mining.
Gegevens mining process and instruments
The Cross-Industry Standard Process for Gegevens Mining (CRISP-DM) is a conceptual contraption that exists spil a standard treatment to gegevens mining. The process outlines six phases:
- Business understanding
- Gegevens understanding
- Gegevens prep
The very first two phases, business understanding and gegevens understanding, are both preliminary activities. It is significant to very first define what you would like to know and what questions you would like to reaction and then make sure that your gegevens is centralized, reliable, accurate, and finish.
Once you’ve defined what you want to know and gathered your gegevens, it’s time to prepare your gegevens – this is where you can embark to use gegevens mining contraptions. Gegevens mining software can assist te gegevens prep, modeling, evaluation, and deployment. Gegevens prep includes activities like joining or reducing gegevens sets, treating missing gegevens, etc.
The modeling phase te gegevens mining is when you use a mathematical algorithm to find pattern(s) that may be present ter the gegevens. This pattern is a prototype that can be applied to fresh gegevens. Gegevens mining algorithms, at a high level, fall into two categories – supervised learning algorithms and unsupervised learning algorithms. Supervised learning algorithms require a known output, sometimes called a label or target. Supervised learning algorithms include Naive Bayes, Decision Tree, Neural Networks, SVMs, Logistic Regression, etc. Unsupervised learning algorithms do not require a predefined set of outputs but rather look for patterns or trends without any label or target. Thesis algorithms include k-Means Clustering, Anomaly Detection, and Association Mining.
Gegevens evaluation is the phase that will tell you how good or bad your proefje is. Cross-validation and testing for false positives are examples of evaluation technologies available ter gegevens mining instruments. The deployment phase is the point at which you embark using the results.