What is anomaly detection, and why do you need it?

Thu Feb 20 2025

Anomaly detection is a significant subject that has been studied in a variety of academic and application fields. Many anomaly detection approaches are specialized for certain application areas, while others are more generic. This article provides a broad review of research on anomaly detection techniques, with an emphasis on anomaly detection in the context of deep learning and surface defect detection.

What is anomaly detection?

Anomaly detection is the process of identifying out-of-the-ordinary features or events in data sets. It is commonly conducted on unlabeled data, which is known as "unsupervised anomaly detection". It is a significant research topic in machine learning, where the goal is to distinguish between normal and abnormal samples in a dataset. Many anomaly detection approaches have been developed for specific application areas, while others are more general.

What are anomalies?

Anomalies are data patterns that do not match a well-defined notion of normal behavior. The figure below shows anomalies in a basic two-dimensional data set. The data contains two normal regions, N1 and N2, because the majority of observations fall into these two areas. Anomalies are points that are sufficiently far away from these regions, such as points O1 and O2, as well as the points in region O3. Anomalies in data can be caused by a variety of factors, including illegal activity such as credit card fraud, cyber-intrusion, or terrorist action, as well as system failure, but all of these factors have one thing in common: they are of interest to the analyst. A fundamental feature of anomaly detection is the "interestingness" or "real-life significance" of anomalies. Anomaly detection is related to, but distinct from, noise reduction, which deals with undesired noise in data. Noise is a phenomenon in data that is not of interest to the analyst and that hinders data analysis. Noise reduction is motivated by the need to eliminate these undesired items before data analysis begins.
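The idea of points lying "sufficiently far" from the normal regions can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample coordinates, the use of region centroids, and the distance threshold are all made up for the example and are not taken from the figure.

```python
import math

# Hypothetical 2-D data: two dense "normal" regions, standing in for N1 and N2.
region_n1 = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2)]
region_n2 = [(5.0, 5.0), (5.1, 4.8), (4.9, 5.2), (5.2, 5.1)]

def centroid(points):
    """Average position of a group of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

centroids = [centroid(region_n1), centroid(region_n2)]
THRESHOLD = 1.0  # assumed cut-off; in practice it is tuned to the data

def is_anomaly(point):
    """Flag a point whose distance to the nearest normal region exceeds the threshold."""
    nearest = min(math.dist(point, c) for c in centroids)
    return nearest > THRESHOLD

print(is_anomaly((1.0, 1.0)))  # a point inside the first region
print(is_anomaly((9.0, 1.0)))  # a point far from both regions
```

Real detectors usually model the shape and density of the normal regions rather than a single centroid, but the decision rule has the same structure.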

What are the types of anomalies?

Not all anomalies are the same. Three different types of anomalies have been researched:

Point Anomalies

A point anomaly occurs when a single data point deviates from the predicted pattern, range, or norm. Point anomalies are individual occurrences that are abnormal in comparison to the majority of other individual instances, such as a patient's abnormal health signals.
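As a rough illustration of a point anomaly, the sketch below scores each value by how far it sits from the median, measured in units of the median absolute deviation (a robust variant of the z-score). The heart-rate readings and the 3.5 threshold are assumptions chosen for the example.

```python
import statistics

def point_anomalies(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    flagged = []
    for i, v in enumerate(values):
        score = 0.6745 * abs(v - med) / mad  # 0.6745 rescales MAD to match a normal std
        if score > threshold:
            flagged.append((i, v))
    return flagged

# Hypothetical resting heart-rate readings for one patient (beats per minute).
heart_rates = [72, 75, 71, 74, 73, 70, 76, 72, 140, 74]
print(point_anomalies(heart_rates))
```

The median-based score is used here because, in a small sample, a single extreme value inflates the mean and standard deviation enough to hide itself from a plain z-score.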

Contextual Anomalies

Contextual anomalies, also known as "conditional anomalies," relate to individual abnormal occurrences but in a specific context, i.e., data instances are anomalous in the specific context but otherwise normal. In real-world applications, the contexts might be very varied, for example, a quick temperature drop or increase in a specific temporal context or fast credit card transactions in atypical geographical settings.
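A minimal sketch of the temperature example: the same reading is judged against a baseline for its own context only. The seasonal history values and the three-standard-deviation rule are assumptions made for illustration.

```python
import statistics

# Hypothetical temperature history (degrees Celsius), grouped by context.
history = {
    "winter": [1.0, 3.0, 2.0, 0.0, 4.0, 2.0],
    "summer": [27.0, 30.0, 29.0, 31.0, 28.0, 29.0],
}

def is_contextual_anomaly(value, context, k=3.0):
    """Compare a reading only with past readings from the same context."""
    baseline = history[context]
    mean = statistics.mean(baseline)
    std = statistics.stdev(baseline)
    return abs(value - mean) > k * std

print(is_contextual_anomaly(29.0, "summer"))  # ordinary for summer
print(is_contextual_anomaly(29.0, "winter"))  # the same value, unusual for winter
```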

Collective Anomalies

Collective anomalies are a collection of data instances that are abnormal as a whole in comparison to other data instances; individual members of the collective anomaly may or may not be anomalies. For example, exceptionally dense subgraphs generated by scammers in social networks are considered anomalies as a collective, but the individual nodes in those subgraphs might be as normal as actual accounts.

What is anomaly detection in machine learning?

The detection of anomalies is one of the most prominent uses of machine learning. Finding and detecting anomalies aids in the prevention of fraud, adversarial attacks, and network breaches that can threaten your company's future. It is often carried out using statistical and machine learning methods.

The majority of companies that require anomaly detection today operate with massive volumes of data, such as transactions, text, images, and video content. You would have to spend days going through all of the transactions that occur within a bank every hour, and more are produced every second.

Another issue is that the data is frequently unstructured, which means that it was not organized in any particular way for analysis. Unstructured data includes items such as business documents, emails, and images. To collect, filter, arrange, analyze, and store data, you must employ technologies capable of processing large amounts of data. Machine learning approaches tend to give the best results when dealing with big data sets, and most types of data can be processed by machine learning techniques. Furthermore, you can select the algorithm based on your problem and even combine several strategies for better results. In real-world applications, machine learning helps streamline anomaly identification and saves resources.

Anomaly Localization and Interpretation

In many applications like image analysis or sensor streams, it is critical to identify the specific locations of anomalies, not just their overall presence. Advanced techniques can highlight anomalous regions within complex multimedia data. For image data, attention layers can draw focus to irregular image patches. In temporal sensor data, time series segmentation can detect anomalous subsequences. This localization provides interpretable outputs beyond binary alerts to better understand model decisions. Influence functions are another approach to discern feature contributions and trace detected anomalies in data back to specific inputs. Overall, precise localization coupled with interpretation techniques improves model transparency.

What are the different types of anomaly detection?

There are different types of anomaly detection with machine learning.

Supervised

Supervised anomaly detection requires a labeled training dataset in which each item is classified as normal or abnormal. The model uses these examples to extract patterns and then recognize abnormal patterns in previously unseen data. The quality of the training dataset is important in supervised learning, and a significant amount of manual work is needed because examples must be collected and labeled.

Unsupervised

Unsupervised methods are the most common type of anomaly detection, and neural networks such as autoencoders are among the best-known examples.

Artificial neural networks reduce the amount of human effort required to preprocess examples by eliminating the requirement for manual labeling. Even unstructured data can be processed using neural networks. They can learn patterns from unlabeled data and then apply what they have learned to detect abnormalities in fresh data.

Semi-supervised

Semi-supervised anomaly detection approaches combine the advantages of the two preceding methods. Engineers can use unsupervised learning approaches to accelerate feature learning and work with unstructured data. However, by integrating it with human supervision, developers can monitor and regulate what types of patterns the model learns. This generally improves the model's predictions.

Anomaly Detection in Novel Data Types

Here are some of the novel data types to which AI anomaly detection is being applied.

Graph-Based Anomaly Detection

Many real-world datasets can be represented as graphs, such as social networks, molecular structures, and transportation networks. Graph-based anomaly detection focuses on identifying anomalous nodes, edges, or subgraphs that exhibit unusual topological patterns or neighborhood configurations deviating from the norm.

With the rise of graph neural networks (GNNs), node embeddings can be learned to capture network context and then used for anomaly scoring. Variational autoencoders for graphs show promise for graph-based anomaly detection by modeling graph structure. Future research on scalable graph learning and inductive graph anomaly detection holds potential.

Anomaly Detection in Videos

Surveillance videos are a key data source for recognizing abnormal events and activities. AI anomaly detection in videos is challenging due to complex spatio-temporal dynamics. Recent methods use 3D convnets to extract spatio-temporal features from video clips and then estimate anomaly scores based on reconstruction errors or other predictive metrics.

Other approaches include predictive modeling using LSTMs, optical flow-based analysis, and generative adversarial models. Research on video anomaly detection is moving towards weakly supervised and online learning paradigms.

Text Anomaly Detection

With large language models, deep contextualized text representations can enable more semantic text anomaly detection. Autoencoder reconstruction errors and language model perplexity scores are promising techniques. Anomaly detection for specific NLP tasks such as text classification, using influence functions, is also being explored.
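To make the perplexity idea concrete: perplexity is the exponential of the average negative log-probability a language model assigns to each token, so text the model finds surprising scores higher. The sketch below assumes the per-token probabilities have already been obtained from some model; the numbers are invented for illustration.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence given the probability assigned to each token."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical per-token probabilities from a language model.
ordinary_sentence = [0.30, 0.25, 0.40, 0.35]
odd_sentence = [0.30, 0.001, 0.002, 0.35]

print(perplexity(ordinary_sentence))
print(perplexity(odd_sentence))
```

A text could then be flagged when its perplexity exceeds a threshold calibrated on known-normal text.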

Anomaly Detection in 3D Point Clouds

3D point clouds provide detailed spatial representations for objects and scenes. Identifying shape, texture and topological anomalies in such point clouds has applications from industrial inspection to autonomous navigation. Geometric deep learning techniques combined with autoencoders or GANs provide a promising approach for localization and characterization of 3D point cloud anomalies.

IoT Sensor Data Anomaly Detection

The proliferation of IoT devices produces high-velocity sensor streams. Outliers detected in such streams could indicate equipment faults, performance degradation, or cyberattacks. However, handling noisy streams with missing values poses challenges. Real-time capable methods using LSTMs, dynamic time warping classifiers, and one-class classifiers tailored for IoT data are being developed.

Methodological Advances

In this section, we give an overview of methodological advances in the field of anomaly detection.

Adversarial and Self-Supervised Learning

Recent research has obtained promising results by using adversarial and self-supervised learning objectives to impose constraints that yield better separation between normal and anomalous data representations.

Reconstruction-based approaches using autoencoders and GANs are also gaining traction for anomaly detection across modalities such as image, text, video, and graph data. Such methods do not require anomalous samples during training.
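The reconstruction principle can be sketched without a neural network. The example below swaps in a one-component PCA on 2-D points as a linear stand-in for an autoencoder: it is fitted on normal data only, compresses each point to a single number, rebuilds it, and uses the rebuilding error as the anomaly score. The data points and the threshold are assumptions made for illustration.

```python
import math

# Hypothetical "normal" training points lying close to the line y = 2x.
normal_points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0), (4.0, 8.1), (5.0, 9.9)]

def fit_principal_axis(points):
    """Return the data mean and the unit vector of largest variance (2-D PCA)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form orientation of the major axis of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    return (mean_x, mean_y), (math.cos(angle), math.sin(angle))

def reconstruction_error(point, mean, axis):
    """Encode the point as one number, decode it, and measure what was lost."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    code = dx * axis[0] + dy * axis[1]  # "encoder": projection onto the axis
    rebuilt = (mean[0] + code * axis[0], mean[1] + code * axis[1])  # "decoder"
    return math.dist(point, rebuilt)

mean, axis = fit_principal_axis(normal_points)
THRESHOLD = 0.5  # assumed cut-off on the reconstruction error

normal_errors = [reconstruction_error(p, mean, axis) for p in normal_points]
odd_error = reconstruction_error((3.0, 1.0), mean, axis)
print(max(normal_errors) < THRESHOLD, odd_error > THRESHOLD)
```

Autoencoders and GANs follow the same logic with nonlinear encoders and decoders: inputs that resemble the training data are rebuilt well, and inputs that do not are rebuilt poorly.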

Active Anomaly Detection

Active learning paradigms for anomaly detection aim to selectively sample likely anomalies to minimize manual labeling costs. Uncertainty-based sampling strategies and techniques like active adversarial learning show promise to guide data collection.
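One simple form of uncertainty-based sampling is sketched below: given a model's anomaly probability for each unlabeled sample, the samples closest to 0.5 are sent to a human for labeling first. The sample IDs, the scores, and the labeling budget are hypothetical.

```python
def select_for_labeling(scores, budget):
    """Pick the samples whose anomaly probability is closest to 0.5."""
    ranked = sorted(scores, key=lambda sample_id: abs(scores[sample_id] - 0.5))
    return ranked[:budget]

# Hypothetical anomaly probabilities produced by a current model.
scores = {"a": 0.02, "b": 0.48, "c": 0.97, "d": 0.55, "e": 0.10}
print(select_for_labeling(scores, budget=2))
```

Confidently scored samples such as "a" and "c" add little new information when labeled, which is why the ambiguous ones are prioritized.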

Few-Shot Anomaly Detection

One of the key challenges in anomaly detection is the lack of sufficient anomalous instances for training models. Few-shot anomaly detection techniques aim to detect anomalies using very few (even one or two) examples of anomalous data. Meta-learning algorithms are well suited to quick adaptation from a few examples.

Ensemble Models

Ensembles combining multiple outlier detection models can improve robustness through consensus-based anomaly scoring. Hybrid ensembles using a mix of classical ML models and deep neural networks are also being explored. Ensemble techniques allow adapting to gradually evolving data distributions over time.
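A minimal sketch of consensus-based scoring, assuming each detector outputs one raw score per sample on its own scale: the scores are min-max normalized per detector and then averaged. The three score lists are invented for illustration, and averaging is only one of several possible combination rules.

```python
def min_max(scores):
    """Rescale one detector's scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def ensemble_scores(per_detector_scores):
    """Average the normalized scores that each detector gives to each sample."""
    normalized = [min_max(s) for s in per_detector_scores]
    return [sum(column) / len(column) for column in zip(*normalized)]

# Hypothetical raw scores from three detectors for the same three samples.
detectors = [
    [0.1, 0.2, 0.9],
    [10.0, 12.0, 50.0],
    [1.0, 3.0, 2.0],
]
combined = ensemble_scores(detectors)
print(combined)
```

Normalizing first keeps a detector with a large numeric range from dominating the average.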

Explainable Models

For sensitive anomaly detection applications, explainability is critical. Decision tree-based models and techniques like LIME allow attributing anomalies to specific patterns in the input data. This provides actionable insights for users.

Evaluation Metrics

Quantitatively assessing anomaly detection requires specialized evaluation metrics. The confusion matrix aggregates true/false positives and negatives. Derived from it, precision and recall are the key metrics: precision measures how many flagged items are real anomalies (few false alarms), while recall measures how many real anomalies are caught. The F1 score combines precision and recall into a single measure. For methods that output scores or probabilities, ROC and PR curves show performance across decision thresholds, and the areas under these curves provide scalar metrics such as AUC-ROC and AUC-PR. Computation time, memory usage, and other resource metrics are also important for assessing real-world deployability. No single metric dominates, so a holistic view is required for model selection.
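The confusion-matrix metrics can be computed directly from labels and predictions. In the sketch below, 1 marks an anomaly and 0 a normal sample; the label vectors are made up for the example.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating 1 as 'anomaly'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical ground truth and detector output for eight samples.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]
print(confusion_counts(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
```

Accuracy is deliberately left out: with rare anomalies, a detector that never raises an alert can still reach a high accuracy.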

What is anomaly detection used for?

Anomaly detection is a technique for identifying unusual patterns that do not conform to expectations. Anomalies are another term for abnormalities. This method has several uses.

Intrusion detection

Cybersecurity is critical for many organizations that deal with sensitive information, intellectual property, and the personal information of their workers and clients. Intrusion detection systems scan the network for potentially threatening traffic and report it. If suspicious behavior is discovered, the IDS software alerts the team.

Defect detection

Companies might face millions of dollars in litigation if they provide their clients with defective systems or components. A single part that does not meet manufacturing standards can cause a plane to crash, killing hundreds of people.

Computer vision-based anomaly detection systems are capable of detecting microscopic cracks in sheet metal used in the manufacture of planes, even when there are thousands of identical pieces. Anomaly detection systems can also be linked to controls that keep track of internal systems such as fuel levels, engine temperatures, and other variables.

Health surveillance

Anomaly detection systems are extremely useful in the medical field. They assist doctors in diagnosing patients by recognizing unexpected patterns in MRI and test data. Typically, neural networks trained on thousands of samples are used here, and they may occasionally provide a more accurate diagnosis than doctors with 20 years of expertise.

Fraud detection

Machine learning fraud detection helps prevent money or property from being obtained illegally. Banks, credit unions, and insurance companies all use fraud detection software. Banks, for example, review loan applications before making a decision. If the system discovers that some of the documents are bogus, for example because your tax number is not in the system, it will inform the bank's employees.

System monitoring

Anomaly detection can spot unusual performance patterns, request flows, or metric spikes that point to potential software or hardware issues.

Applications of Anomaly Detection

As you have seen, anomaly detection has a wide field of applications. Here are some of the most important ones:

Anomaly Detection in Natural Language Processing (NLP)

Anomalies in NLP can manifest in different ways, depending on the context and the specific NLP task at hand. Here are some common types of anomalies in NLP:

Syntax Anomalies

Syntax anomalies refer to errors or inconsistencies in the grammatical structure of a sentence or phrase. These anomalies violate the rules of grammar and can hinder the proper understanding of the text by NLP models.

Semantic Anomalies

Semantic anomalies involve words or phrases that are contextually inappropriate or do not fit the overall meaning of the text. These anomalies can lead to misunderstandings and inaccurate interpretations by NLP algorithms.

Contextual Anomalies

Contextual anomalies occur when a word or phrase has different meanings depending on the surrounding text. NLP models may struggle to disambiguate such anomalies, leading to misinterpretations.

Anomaly Detection in Industrial IoT (IIoT)

In industrial settings, anomaly detection is crucial for predictive maintenance, ensuring machinery and equipment operate efficiently. Monitoring sensor data in real-time can help identify deviations from normal operating conditions.

Applications of Computer Vision for Anomaly Detection

Computer vision is a key tool in anomaly detection and has a variety of applications:

Manufacturing Defect Detection

Computer vision anomaly detection algorithms automate visual inspection for production line flaws and irregularities that humans can miss. Defects in product shape, texture, dimensions, and assembly can be automatically flagged by analyzing camera streams. This allows rapid corrective interventions that minimize waste. Intelligent anomaly detection transforms traditional quality control.

Infrastructure Monitoring

Civil infrastructure such as bridges, railways, and tunnels requires continuous monitoring for early signs of damage and deterioration. Computer vision models trained on normal infrastructure imagery can identify the onset of cracks, corrosion, obstructions, and structural distortions by detecting anomalous patterns that manual inspectors find difficult to discern consistently. This helps prevent catastrophic failures.

Medical Computer Vision Anomaly Detection

Computer vision is advancing medical diagnosis and treatment by automatically identifying anomalies in radiology images, CT scans, MRI scans and microscopy slides indicating potential diseases and conditions requiring intervention. For instance, MRI brain scan analysis helps identify tumors. Such assistive computer vision aids clinicians.

Anomaly Detection in Images

Anomaly detectors attempt to solve the difficult challenge of automatically detecting anomalies against a background image, which could be anything from a fabric to a mammogram. Thousands of detection approaches have been proposed, since each problem seems to require its own background model. Examining previous methodologies shows that the challenge can be reduced to detecting anomalies in residual images (obtained from the target image), in which noise and anomalies dominate. As a consequence, the general and intractable problem of background modeling is replaced by a simple noise model, which allows strict detection criteria to be calculated. Unsupervised anomaly detection is the most broadly applicable approach, since it can be applied to any image. The dense features of neural networks can be used to advantage when calculating the residual images.

Data Preprocessing Techniques for Anomaly Detection

Before performing anomaly detection, it's essential to preprocess the data. This involves data cleaning and outlier removal, ensuring accurate results. Feature scaling and normalization are critical to ensure all features have equal importance. Handling missing values and dealing with categorical data are other vital preprocessing steps. For time series data, specific techniques such as time-based preprocessing are used.
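Two of these steps, handling missing values and feature scaling, can be sketched for a single numeric column. The choice of median imputation and z-score standardization is an assumption made for the example; other strategies are equally common.

```python
import statistics

def impute_and_standardize(column):
    """Fill missing values (None) with the median, then rescale to zero mean and unit variance."""
    observed = [v for v in column if v is not None]
    fill = statistics.median(observed)
    filled = [fill if v is None else v for v in column]
    mean = statistics.mean(filled)
    std = statistics.pstdev(filled)
    if std == 0:
        return [0.0] * len(filled)  # a constant column carries no signal
    return [(v - mean) / std for v in filled]

# Hypothetical sensor column with one missing reading.
raw = [10.0, None, 12.0, 14.0]
scaled = impute_and_standardize(raw)
print(scaled)
```

In a real pipeline the median, mean, and standard deviation would be computed on training data only and then reused for new data.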

Anomaly Detection in Streaming Data

In real-world applications, data often arrives in a continuous stream. Detecting anomalies in such streaming data poses unique challenges. Real-time anomaly detection requires online and incremental learning approaches. Techniques like sliding windows and time-based approaches ensure anomalies are detected promptly.
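A sliding-window detector can be sketched as follows: each new value is compared with the mean and standard deviation of the most recent values only. The window size, warm-up length, threshold, and sample stream are assumptions, and leaving flagged values out of the window is one design choice among several.

```python
from collections import deque
import statistics

class SlidingWindowDetector:
    """Flags a value that is far from the mean of the most recent normal values."""

    def __init__(self, window_size=20, k=3.0, warmup=5):
        self.window = deque(maxlen=window_size)
        self.k = k
        self.warmup = warmup

    def update(self, value):
        recent = list(self.window)
        is_anomaly = False
        if len(recent) >= self.warmup:
            mean = statistics.mean(recent)
            std = statistics.pstdev(recent)
            is_anomaly = std > 0 and abs(value - mean) > self.k * std
        if not is_anomaly:
            # Only normal values update the baseline, so a spike does not distort it.
            self.window.append(value)
        return is_anomaly

# Hypothetical metric stream with one spike.
stream = [10, 11, 10, 12, 11, 10, 11, 45, 10, 11]
detector = SlidingWindowDetector()
flags = [detector.update(v) for v in stream]
print(flags)
```

Because the deque has a fixed maximum length, memory use stays constant no matter how long the stream runs.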

Causes and Sources of Anomalies in Data

Anomalies can arise in data for diverse reasons. Simple errors, noise, outliers, or missing values in the data collection and preprocessing pipeline may manifest as anomalies. More fundamentally, anomalies in data occur when there is novelty or concept change resulting in new data points unlike previous patterns. The emergence of a new class not seen during training is a common cause. Deliberate actions like cyberattacks, fraud attempts, system faults, or other disruptions also generate anomalous data deviating from normal behavior. Identifying the root causes and sources of anomalies provides useful diagnostics.

Ethical Considerations in Anomaly Detection

As with any AI application, anomaly detection raises ethical concerns. Privacy issues and data protection should be a top priority. Ensuring fairness and mitigating bias in anomaly detection models is crucial. Transparency and accountability in AI systems build trust with users.

What is a surface defect?

The quality of manufactured goods is significantly affected during industrial production by the defects and limits of the present technology, working conditions, and other variables. Surface defects are the most obvious sign of a product's quality being compromised. As a result, identifying product surface defects is critical to ensuring a high qualification ratio and consistent quality. The term "defect" refers to an absence, flaw, or region that differs from the normal sample.

Surface defect detection identifies scuffs, foreign body shielding, color contamination, holes, and other defects on the surface of the sample under test in order to collect the necessary information, such as the category, contour, location, and size of the surface defects. Manual defect detection was formerly the standard approach, but it is inefficient: the results are frequently influenced by human subjectivity and cannot meet the requirements of real-time detection. Other approaches have gradually replaced it.

Surface defect detection using deep learning

Artificial intelligence is now used extensively in a variety of applications. In industrial automation, effective surface defect detection is critical to quality management. The old manual detection approach takes time and does not work for large-scale items. Another significant difficulty in manual detection is that many variables affect detection accuracy. Deep learning, which has gained a lot of popularity in recent years, notably with convolutional neural networks (CNNs), has performed well in many computer vision tasks, including image recognition and classification. As a result of these improvements, CNNs are now routinely applied to industrial inspection tasks.

Deep learning techniques address defect detection and classification by employing appropriate classifiers. Deep convolutional neural networks are among the most effective technologies for image classification on large datasets, and several deep learning algorithms have been applied to surface defect inspection. One such design comprises a segmentation stage and a detection stage, each operated by a distinct fully convolutional network.

Types of Anomalies Detectable Via Computer Vision

Through computer vision users can detect different types of anomalies:

Shape Anomalies

Irregular shapes and form factors that deviate from expected geometric patterns are indicators of issues in manufactured products and infrastructure defects.

Texture Anomalies

Abnormal surface textures such as cracks, corrosion, spotting, and discoloration often signal incipient problems that computer vision can identify early on.

Dimensional Anomalies

Computer vision systems measure dimensions like length, width, radius etc. and flag significant deviations from tolerances. This aids quality control.
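Once dimensions have been measured, the tolerance check itself is simple. The sketch below assumes the measurements have already been extracted from the image; the part dimensions, nominal values, and tolerances are hypothetical.

```python
def out_of_tolerance(measured_mm, nominal_mm, tolerance_mm):
    """Return the measured dimensions that deviate from nominal by more than the tolerance."""
    return {
        name: value
        for name, value in measured_mm.items()
        if abs(value - nominal_mm[name]) > tolerance_mm[name]
    }

# Hypothetical specification and one measured part (all values in millimeters).
nominal = {"length": 100.0, "width": 50.0, "radius": 5.0}
tolerance = {"length": 0.5, "width": 0.3, "radius": 0.1}
measured = {"length": 100.2, "width": 50.6, "radius": 5.05}

print(out_of_tolerance(measured, nominal, tolerance))
```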

Contextual Anomalies

Certain anomalies manifest only in specific contexts. For instance, computer vision can identify passengers in prohibited zones of train platforms by learning expected platform usage patterns.

Challenges and Open Problems

However, anomaly detection still presents many challenges:

Imbalanced data – Defects and anomalies are inherently rare, making anomalous example data scarce.
Novelty detection – Models must detect new types of anomalies never seen during training.
Explainability – Interpretability is key for domains like healthcare, but many advanced models are black boxes.
Dataset bias – Models can inherit biases from datasets that underrepresent certain anomaly types.

Further research in areas like active learning, adversarial training, and hybrid models is important to address these issues.

Anomaly detection at Saiwa

Anomaly detection automates the difficult task of detecting anomalies or faults in a background image by identifying uncommon occurrences that differ from the normal cases that make up the majority of a dataset. At Saiwa, we have investigated several types of surface defects and will continue to add more anomaly types in the future. For each instance and dataset, several deep networks for classification and segmentation are used.

Currently, 15 different datasets and surface defect detection methods are available for testing. These datasets cover defects on surfaces such as metal, steel, polymer, and textured materials. You can freely test the algorithms on your own images using our simple UI, and if you like, you can leave us a customization request to retrain the networks on your own dataset or on other kinds of surfaces and defects.

The features of the Saiwa anomaly detection service:

Detection of several types of anomalies using a single interface.
15 different datasets covering various flaws on metal, steel, polymer, and textured surfaces.
State-of-the-art deep learning-based algorithms for each dataset.
Several deep neural networks for classification and segmentation.
Preview examples of faults for each dataset.
Image aggregation to apply the method to several images at the same time.
Preview and save the results.
Export and archive the results locally or on the user's cloud.
Customized services from the Saiwa team through the "Request for Customization" option.
