From Noise to Knowledge - Turning Cyber Threat Data into Actionable Insight with MCP and LLM

This research highlights a methodology for visualizing cyber threats from public reports using an LLM and MCP.

Introduction

This article introduces our approach to Cyber Threat Visualization, leveraging automation to transform raw, text-based threat reports into structured, actionable intelligence. By utilizing a suite of Model Context Protocol (MCP) tools, we automate the collection and parsing of publicly available threat data, particularly data related to malware and threat actor activity.

From this unstructured information, the workflow extracts key Indicators of Compromise (IoCs). Another MCP tool then converts them into STIX (Structured Threat Information eXpression), a standardized, machine-readable format. This conversion is the foundation for meaningful analysis and visualization: structured threat data can be turned into visual models that help analysts quickly identify relationships, patterns, and emerging threats, supporting faster decision-making and a stronger overall security posture.

What Is STIX and Why Use It

STIX (Structured Threat Information eXpression) is a standardized language that turns chaotic, text-based threat reports into structured, machine-readable data. It defines cyber threat objects—like malware, threat actors, and Indicators of Compromise (IoCs)—and the relationships between them.

The "why" is critical for our workflow:

Enables Automation: STIX provides a common format that allows security tools to automatically share, process, and understand threat intelligence without manual intervention.
Creates Context: Instead of just listing isolated IoCs, STIX connects the dots, linking a specific piece of malware to the threat actor who uses it and the campaign it's part of.
Powers Visualization: This structured, contextual data is the essential foundation for building visual models. It allows us to graph relationships and patterns that are otherwise hidden in text, turning complex reports into clear, actionable intelligence.
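To make the idea of objects and relationships concrete, the following is a minimal sketch that builds a threat actor, a malware object, and a "uses" relationship between them using only the Python standard library. The helper names (`stix_id`, `make_object`) and the example names are hypothetical and chosen for illustration; this is not the generator used in our workflow, and a production setup would more likely rely on a dedicated STIX library.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_id(object_type: str) -> str:
    """Return a STIX-style identifier of the form '<type>--<UUIDv4>'."""
    return f"{object_type}--{uuid.uuid4()}"


def make_object(object_type: str, **properties) -> dict:
    """Build a minimal STIX 2.1-style object with the common properties."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": object_type,
        "spec_version": "2.1",
        "id": stix_id(object_type),
        "created": now,
        "modified": now,
        **properties,
    }


# Hypothetical example entities (names are placeholders).
actor = make_object("threat-actor", name="Example Actor")
malware = make_object("malware", name="Example Malware", is_family=True)

# The relationship object is what "connects the dots" between the two.
uses = make_object(
    "relationship",
    relationship_type="uses",
    source_ref=actor["id"],
    target_ref=malware["id"],
)

bundle = {
    "type": "bundle",
    "id": stix_id("bundle"),
    "objects": [actor, malware, uses],
}
print(json.dumps(bundle, indent=2))
```

Because the relationship refers to the other two objects only by their identifiers, any tool that understands the format can rebuild the graph without reading the original report.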

Practical Application: Visualizing a Real-World Threat

As illustrated in the diagram below, the visualization process is driven by an integrated suite of MCP tools. A Large Language Model (LLM) acts as the central coordinator: it orchestrates the workflow and invokes the individual tools to generate the final output.

1. Workflow

For this experiment, we utilized a suite of custom-designed Model Context Protocol (MCP) tools to handle specific tasks within the workflow. Broadly, the function of these tools is to perform a series of automated actions:

Crawling: Systematically scan and retrieve data from cyber threat report websites.
STIX Generation: Convert the extracted IoCs into an accurate, standardized STIX JSON format.
Intelligence Centralization: Upload the generated STIX JSON files to an OpenCTI platform to consolidate and manage the threat intelligence.
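The three steps above can be pictured as a small tool registry that a coordinator calls in sequence. The sketch below is an illustration only: it does not use the MCP SDK, the tool names (`crawl`, `generate_stix`, `upload`) and their bodies are hypothetical placeholders, and the "plan" is a fixed list standing in for the decisions the LLM would make.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable[[dict], dict]] = {}


def tool(name: str):
    """Register a function under a tool name so the coordinator can call it."""
    def register(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        TOOLS[name] = func
        return func
    return register


@tool("crawl")
def crawl(state: dict) -> dict:
    # Placeholder: a real tool would fetch the report text from the web.
    return {**state, "report_text": "Sample.exe: " + "a" * 32}


@tool("generate_stix")
def generate_stix(state: dict) -> dict:
    # Placeholder: a real tool would build STIX objects from the IoCs.
    return {**state, "bundle": {"type": "bundle", "objects": []}}


@tool("upload")
def upload(state: dict) -> dict:
    # Placeholder: a real tool would send the bundle to OpenCTI.
    return {**state, "uploaded": True}


def run_plan(plan: List[str], state: dict) -> dict:
    """Call each named tool in order, passing the accumulated state along."""
    for name in plan:
        state = TOOLS[name](state)
    return state


result = run_plan(["crawl", "generate_stix", "upload"], {"query": "Ghost (Cring)"})
print(sorted(result))
```

Passing a single state dictionary from tool to tool keeps each step independent, which is one simple way to let a coordinator reorder or skip steps.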

2. Crawling & Data Extraction

The crawling process works as follows: if the requested information exists in the local knowledge base, it is retrieved from there. If not, crawling is initiated on the Threat Actor category of the SOCRadar blog.
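The lookup-then-crawl logic can be sketched as a cache-first function. Everything here is an assumption made for illustration: the knowledge base is modeled as a plain dictionary, `fake_crawl` is a stand-in for the real crawling tool, and storing the crawled text back into the knowledge base is our guess at a sensible behavior rather than something the workflow is confirmed to do.

```python
from typing import Callable, Dict, List


def get_report(topic: str,
               knowledge_base: Dict[str, str],
               crawl: Callable[[str], str]) -> str:
    """Return the report for `topic`, crawling only when it is not stored locally."""
    key = topic.strip().lower()
    if key in knowledge_base:
        return knowledge_base[key]
    text = crawl(topic)
    knowledge_base[key] = text  # assumption: keep the result for later requests
    return text


calls: List[str] = []


def fake_crawl(topic: str) -> str:
    """Stand-in for the crawling tool; records each call it receives."""
    calls.append(topic)
    return f"report about {topic}"


kb: Dict[str, str] = {}
first = get_report("Ghost", kb, fake_crawl)    # not stored yet: triggers a crawl
second = get_report("ghost ", kb, fake_crawl)  # same normalized key: served locally
print(len(calls))  # 1
```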

Extracting information from a public report

The crawling tool retrieves the relevant data from a specific article on the SOCRadar blog.

All of the information in the report is extracted. The LLM then focuses on extracting the IoC information and passes it to the MCP STIX generator.

Extracting the IoCs from a public report and converting them to STIX format

Below is a sample of the IoCs extracted from the report:

Cring.exe: c5d712f82d5d37bb284acd4468ab3533

Ghost.exe:
- 34b3009590ec2d361f07cac320671410
- d9c019182d88290e5489cdf3b607f982

ElysiumO.exe:
- 29e44e8994197bdb0c2be6fc5dfc15c2
- c9e35b5c1dc8856da25965b385a26ec4
- d1c5e7b8e937625891707f8b4b594314

Locker.exe: ef6a213f59f3fbee2894bd6734bbaed2

IOX related files:
- iex.txt/pro.txt: ac58a214ce7deb3a578c10b97f93d9c3
- x86.log: 
  * c3b8f6d102393b4542e9f951c9435255
  * 0a5c4ad3ec240fbfd00bdc1d36bd54eb
- sp.txt: ff52fdf84448277b1bc121f592f753c5
- main.txt: a2fd181f57548c215ac6891d000ec6b9
- isx.txt: 625bd7275e1892eac50a22f8b4a6355d
- sock.txt: db38ef2e3d4d8cb785df48f458b35090

Sample IoC Extracted from Report
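In our workflow the extraction itself is done by the LLM. As a complementary illustration, the sketch below shows how MD5 hashes like the ones above could be cross-checked with a regular expression and turned into STIX pattern strings of the form `[file:hashes.MD5 = '...']`. The function names are hypothetical, and the regex step is an assumed sanity check, not part of the described pipeline.

```python
import re
from typing import List

# Exactly 32 hexadecimal characters bounded on both sides, i.e. MD5-like.
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")


def extract_md5(text: str) -> List[str]:
    """Return unique MD5-like hex strings in order of first appearance."""
    seen: List[str] = []
    for match in MD5_RE.findall(text):
        value = match.lower()
        if value not in seen:
            seen.append(value)
    return seen


def md5_pattern(md5: str) -> str:
    """Build a STIX pattern string that matches a file by its MD5 hash."""
    return f"[file:hashes.MD5 = '{md5}']"


# Two entries copied from the sample above.
sample = """
Cring.exe: c5d712f82d5d37bb284acd4468ab3533
Locker.exe: ef6a213f59f3fbee2894bd6734bbaed2
"""

patterns = [md5_pattern(h) for h in extract_md5(sample)]
for pattern in patterns:
    print(pattern)
```

A check like this can catch truncated or malformed hashes before they reach the STIX generator, while the association between a hash and its file name or malware family remains the LLM's job.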

3. Visualization Result

Below is a simplified snippet of the generated STIX JSON bundle:

{
    "type": "bundle",
    "id": "bundle--648ba823-f078-440e-a479-db69863ec752",
    "objects": [
        {
            "type": "identity",
            "id": "identity--99f5aa61-12a0-4f75-ba56-74bd8b385806",
            "name": "CISA",
            "identity_class": "organization",
            "created": "2025-06-16T10:57:13.000Z",
            "modified": "2025-06-16T10:57:13.000Z"
        },
        {
            "type": "threat-actor",
            "id": "threat-actor--83dea824-b3be-4a3c-9e98-84d1a94d4001",
            "created": "2025-06-16T10:57:13.000Z",
            "modified": "2025-06-16T10:57:13.000Z",
            "name": "Ghost (Cring) Ransomware",
            "description": "Financially motivated ransomware threat actor active since 2021",
            "threat_actor_types": [
                "ransomware-operator",
                "criminal"
            ],
            "aliases": [],
            "first_seen": "2025-06-16T10:57:13.000Z"
        },
        {
            "type": "malware",
            "id": "malware--8d1909fe-0756-4b5a-840b-b2de85f9d4b1",
            "created": "2025-06-16T10:57:13.000Z",
            "modified": "2025-06-16T10:57:13.000Z",
            "name": "GHOSTPULSE",
            "malware_types": [
                "ransomware"
            ],
            "is_family": true
        },
        {
            "type": "relationship",
            "id": "relationship--bde0349b-d495-43c3-884c-99bcc4e29025",
            "created": "2025-06-16T10:57:13.000Z",
            "modified": "2025-06-16T10:57:13.000Z",
            "relationship_type": "uses",
            "source_ref": "threat-actor--83dea824-b3be-4a3c-9e98-84d1a94d4001",
            "target_ref": "malware--8d1909fe-0756-4b5a-840b-b2de85f9d4b1"
        },
        {
            "type": "malware",
            "id": "malware--508fd573-69fa-486f-b58e-264e34920a3a",
            "created": "2025-06-16T10:57:13.000Z",
            "modified": "2025-06-16T10:57:13.000Z",
            "name": "Lumma Stealer",
            "malware_types": [
                "infostealer"
            ],
            "is_family": true
        },
        {
            "type": "relationship",
            "id": "relationship--9ffc3130-8515-4bba-8b88-3a016ccaf8b6",
            "created": "2025-06-16T10:57:13.000Z",
            "modified": "2025-06-16T10:57:13.000Z",
            "relationship_type": "uses",
            "source_ref": "threat-actor--83dea824-b3be-4a3c-9e98-84d1a94d4001",
            "target_ref": "malware--508fd573-69fa-486f-b58e-264e34920a3a"
        },
        {
            "type": "malware",
            "id": "malware--07439311-efaf-4fdb-9dc1-2535a40cb40a",
            "created": "2025-06-16T10:57:13.000Z",
            "modified": "2025-06-16T10:57:13.000Z",
            "name": "Cring",
            "malware_types": [
                "ransomware"
            ],
            "is_family": true
        }
    ]
}

The STIX result can be visualized using a STIX visualization website.
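What a visualizer draws is essentially the set of relationship objects resolved to readable names. The following sketch, using a trimmed copy of the bundle above (one actor, one malware object, one relationship), shows one way to list those edges with the standard library. The function name `edges` is hypothetical, and this is an illustration of the idea rather than how any particular visualization tool works.

```python
import json
from typing import List, Tuple

# Trimmed copy of the bundle shown above.
BUNDLE_JSON = """
{
  "type": "bundle",
  "id": "bundle--648ba823-f078-440e-a479-db69863ec752",
  "objects": [
    {"type": "threat-actor",
     "id": "threat-actor--83dea824-b3be-4a3c-9e98-84d1a94d4001",
     "name": "Ghost (Cring) Ransomware"},
    {"type": "malware",
     "id": "malware--8d1909fe-0756-4b5a-840b-b2de85f9d4b1",
     "name": "GHOSTPULSE"},
    {"type": "relationship",
     "id": "relationship--bde0349b-d495-43c3-884c-99bcc4e29025",
     "relationship_type": "uses",
     "source_ref": "threat-actor--83dea824-b3be-4a3c-9e98-84d1a94d4001",
     "target_ref": "malware--8d1909fe-0756-4b5a-840b-b2de85f9d4b1"}
  ]
}
"""


def edges(bundle: dict) -> List[Tuple[str, str, str]]:
    """Return (source name, relationship type, target name) for each relationship."""
    # Fall back to the raw identifier when an object has no name.
    names = {obj["id"]: obj.get("name", obj["id"]) for obj in bundle["objects"]}
    result = []
    for obj in bundle["objects"]:
        if obj["type"] == "relationship":
            source = names.get(obj["source_ref"], obj["source_ref"])
            target = names.get(obj["target_ref"], obj["target_ref"])
            result.append((source, obj["relationship_type"], target))
    return result


bundle = json.loads(BUNDLE_JSON)
for source, rel_type, target in edges(bundle):
    print(f"{source} --{rel_type}--> {target}")
```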

The final step in the pipeline is to upload the STIX bundle into a centralized threat intelligence platform, such as OpenCTI. Once ingested, the true value of the process is realized through visualization.

Instead of a flat text file, the analyst is presented with an interactive knowledge graph. At the center is the threat actor entity "Ghost (Cring) Ransomware". Branching out from it are the related malware entities and the indicators, each indicator containing a specific pattern (a domain, an IP address, or a file hash). The relationships are drawn as labeled lines, such as "uses" and "indicates".

Uploading the threat information to the OpenCTI platform
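Since the bundle is generated automatically, it may be worth checking it before upload. The sketch below is an assumed pre-upload check, not a step confirmed in our pipeline: it lists relationships whose `source_ref` or `target_ref` does not point to an object in the same bundle. The function name and the shortened identifiers in the example are hypothetical.

```python
from typing import List, Optional, Tuple


def dangling_refs(bundle: dict) -> List[Tuple[str, str, Optional[str]]]:
    """List (relationship id, property, missing id) for unresolved references."""
    objects = bundle.get("objects", [])
    known = {obj["id"] for obj in objects}
    problems = []
    for obj in objects:
        if obj.get("type") != "relationship":
            continue
        for prop in ("source_ref", "target_ref"):
            ref = obj.get(prop)
            if ref not in known:
                problems.append((obj["id"], prop, ref))
    return problems


# Hypothetical bundle with placeholder identifiers: the relationship's
# target is not present among the objects.
example = {
    "type": "bundle",
    "objects": [
        {"type": "threat-actor", "id": "threat-actor--a"},
        {"type": "relationship", "id": "relationship--r",
         "relationship_type": "uses",
         "source_ref": "threat-actor--a",
         "target_ref": "malware--missing"},
    ],
}

for rel_id, prop, ref in dangling_refs(example):
    print(f"{rel_id}: {prop} points to unknown object {ref}")
```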

What We Learned from the Experiment

The experiment yielded several key findings that validate the efficacy of the proposed LLM-driven intelligence automation pipeline. The results were assessed based on efficiency, accuracy, data quality, and the ultimate analytical value of the output.

Result 1: Dramatic Reduction in Intelligence Processing Time. The most significant finding was the drastic improvement in operational efficiency. Manual analysis of a single threat report, including reading, identifying IoCs, structuring them, and entering them into a platform
Result 2: High-Fidelity Indicator of Compromise (IoC) Extraction. The LLM-powered extraction mechanism demonstrated a high degree of accuracy and contextual awareness. Unlike traditional regex-based parsers, which are often rigid, the LLM successfully identified context-heavy entities such as malware family names and their associations with specific C2 domains mentioned in prose.
Result 3: Enhanced Analytical Insight Through Visualization. Upon ingestion into OpenCTI, the structured data immediately yielded an interactive knowledge graph that provided significant analytical value.

Conclusion

Confronting the challenge of overwhelming unstructured threat data, the workflow detailed in this article presents a new paradigm for intelligence-led security. By leveraging a Large Language Model to orchestrate a suite of MCP tools, this system creates a robust pipeline that transforms chaotic, text-based reports into a structured, actionable knowledge base in STIX format. This automation does more than simply boost efficiency; it fundamentally redefines the cybersecurity professional's role, shifting focus from tedious data collection to high-value strategic analysis, proactive threat hunting, and rapid incident response. This fusion of AI-driven speed and human expertise is the cornerstone of next-generation defense, establishing a new baseline where the ability to automate the intelligence cycle is no longer an advantage, but an essential component for maintaining a resilient security posture in a complex digital world.