---
title: "Managing types"
description: "This guide walks you through the process of creating a new data type in the Flowsint ecosystem and integrating it throughout the platform. Types in Flowsint serve as the foundation for all data modeling, providing structure, validation, and schema generation for the entire system."
category: "Developers"
order: 8
author: "Flowsint Team"
tags: ["tutorial", "developers", "creating-a-new-type"]
version: "1.2.8"
last_updated_at: "2026-05-15"
---
## Understanding the type system

The Flowsint type system is built on Pydantic models and lives in the `flowsint-types` package. Every type is a Python class that inherits from `FlowsintType`, which itself inherits from `pydantic.BaseModel`, **and must be decorated with `@flowsint_type`** to be registered in the global type registry. This provides automatic validation, serialization, JSON schema generation, auto-discovery, and graph-specific functionality like automatic label generation. The architecture is deliberately flat: each type inherits directly from `FlowsintType` and defines its own fields and behavior.

The package structure is straightforward. Inside `flowsint-types/src/flowsint_types/`, you'll find individual Python files for each type. Most types get their own file, though closely related types sometimes share a file. For example, `wallet.py` contains `CryptoWallet`, `CryptoWalletTransaction`, and `CryptoNFT` because they work together as a conceptual unit.

Currently, Flowsint includes 39 built-in types covering everything from network entities like domains and IPs to identity information like individuals and organizations, security data like credentials and breaches, and financial information like bank accounts and crypto wallets.

### What is FlowsintType?

`FlowsintType` is the base class for all Flowsint entity types. It extends Pydantic's `BaseModel` with additional functionality specific to Flowsint's graph database and UI needs:
```python
from typing import Optional

from pydantic import BaseModel, Field


class FlowsintType(BaseModel):
    """Base class for all Flowsint entity types with nodeLabel support.

    nodeLabel is optional but computed at definition time.
    All classes that inherit from FlowsintType must be decorated with @flowsint_type
    to be registered in the global TYPE_REGISTRY and accessed by their class name.

    Usage:
        from flowsint_types.registry import flowsint_type

        @flowsint_type
        class Domain(FlowsintType):
            domain: str
    """

    nodeLabel: Optional[str] = Field(
        None,
        description="UI-readable label for this entity, the one used on the graph.",
        title="Label",
    )

    # Allow extra keys to support additional properties from user
    class ConfigDict:
        extra = "allow"
```
Each type sets `nodeLabel` automatically through a method decorated with `@model_validator`. This label is what appears on graph nodes in the Neo4j database and in the frontend UI, so every type should compute a meaningful label from its own fields.

The `ConfigDict` with `extra = "allow"` means types accept additional properties beyond their defined fields, which is useful for user-provided metadata.

### The `@flowsint_type` decorator

Every type **must** be decorated with `@flowsint_type` from `flowsint_types.registry`. This decorator registers the type in the global `TYPE_REGISTRY`, which enables:

- Auto-discovery of all types at startup via `load_all_types()`
- Lookup by class name (e.g., `TYPE_REGISTRY.get("Domain")`)
- Lookup by lowercase name (e.g., `TYPE_REGISTRY.get_lowercase("domain")`) for Neo4j matching
```python
from flowsint_types.registry import flowsint_type
from .flowsint_base import FlowsintType


@flowsint_type  # Required for registration
class MyType(FlowsintType):
    ...
```

Without this decorator, your type will not be discoverable by the system.
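
To make the registration mechanism concrete, here is a minimal standard-library sketch of how a decorator-based registry of this kind can work. It is not Flowsint's implementation: the `TypeRegistry` class below is a hypothetical stand-in that only mirrors the `get` and `get_lowercase` lookups described above.

```python
from typing import Dict, Optional


class TypeRegistry:
    """Hypothetical stand-in for TYPE_REGISTRY (not the real implementation)."""

    def __init__(self) -> None:
        self._by_name: Dict[str, type] = {}
        self._by_lowercase: Dict[str, type] = {}

    def register(self, cls: type) -> type:
        # Index the class under its exact name and its lowercase name
        self._by_name[cls.__name__] = cls
        self._by_lowercase[cls.__name__.lower()] = cls
        return cls

    def get(self, name: str) -> Optional[type]:
        return self._by_name.get(name)

    def get_lowercase(self, name: str) -> Optional[type]:
        return self._by_lowercase.get(name.lower())


TYPE_REGISTRY = TypeRegistry()


def flowsint_type(cls: type) -> type:
    """Class decorator: register the class, then return it unchanged."""
    return TYPE_REGISTRY.register(cls)


@flowsint_type
class Domain:
    pass


class Unregistered:  # no decorator, so the registry never sees it
    pass
```

In this sketch `TYPE_REGISTRY.get("Domain")` returns the class while `TYPE_REGISTRY.get("Unregistered")` returns `None`, which is the same failure mode as forgetting the decorator on a real type.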
## Creating a new type

Let's walk through the process of creating a new type from scratch. We'll use a hypothetical `Vehicle` type as our example.

### Setting up the file

Start by creating a new Python file in the types directory. The filename should be lowercase and match your type name in snake_case. For a `Vehicle` type, you would create `vehicle.py`:

```bash
cd flowsint-types/src/flowsint_types/
touch vehicle.py
```

### Basic structure

Every type follows the same structural pattern. Here's what a basic type looks like:
```python
from pydantic import Field, model_validator
from typing import Optional, Self

from .flowsint_base import FlowsintType
from .registry import flowsint_type


@flowsint_type
class Vehicle(FlowsintType):
    """Represents a vehicle with identifying information."""

    license_plate: str = Field(
        ...,
        description="Vehicle license plate number",
        title="License Plate",
        json_schema_extra={"primary": True},
    )
    brand: Optional[str] = Field(
        None,
        description="Vehicle manufacturer such as Toyota or Ford",
        title="Make"
    )
    model: Optional[str] = Field(
        None,
        description="Vehicle model name",
        title="Model"
    )
    year: Optional[int] = Field(
        None,
        description="Year of manufacture",
        title="Year"
    )

    @model_validator(mode='after')
    def compute_label(self) -> Self:
        """Compute a human-readable label for this vehicle."""
        if self.brand and self.model and self.year:
            self.nodeLabel = f"{self.license_plate} ({self.brand} {self.model} {self.year})"
        else:
            self.nodeLabel = self.license_plate
        return self
```
Let's break down the key components:

**Inheritance, imports, and decorator:**

- The class inherits from `FlowsintType`
- Import `FlowsintType` from `.flowsint_base`
- Import `flowsint_type` from `.registry` and apply it as a decorator
- Import `model_validator` from Pydantic and `Self` from `typing` for the label computation

**Docstring:**

- Every type starts with a clear docstring explaining what it represents

**Field definitions:**

- Each field is defined as a class attribute with type hints
- Use Pydantic's `Field()` function to provide metadata
- Required fields use the ellipsis (`...`) as their default value
- Optional fields use `Optional[Type]` in their type hint and `None` as the default value
- Always provide `description` (for API docs) and `title` (for UI labels)

**Primary field:**

- The `json_schema_extra={"primary": True}` marks the unique identifier for this type
- This field is used as the key when creating Neo4j nodes
- **Critical:** Every type must have exactly one primary field
- Choose a field that uniquely identifies instances of this type

**Label computation:**

- The `@model_validator(mode='after')` decorator runs after all field validation
- By convention the method is named `compute_label`, and it must return `self`
- It sets `self.nodeLabel` to a human-readable string that will appear in the UI and graph
- Handle cases where optional fields might be `None` to avoid ugly labels
- The label should help users quickly identify what this entity is
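
As a variation on the label logic, the sketch below builds the label from whichever optional details are present, instead of requiring all three as `Vehicle.compute_label` does. It is a plain function using only the standard library so it can be run without Flowsint installed; `vehicle_label` is a hypothetical helper, not part of the package.

```python
from typing import Optional


def vehicle_label(
    license_plate: str,
    brand: Optional[str] = None,
    model: Optional[str] = None,
    year: Optional[int] = None,
) -> str:
    """Build a label from the plate plus whichever optional details are set."""
    # Keep only the details that are present, in a stable order
    details = [str(part) for part in (brand, model, year) if part]
    if details:
        return f"{license_plate} ({' '.join(details)})"
    return license_plate


full = vehicle_label("AB-123-CD", "Toyota", "Corolla", 2019)
partial = vehicle_label("AB-123-CD", brand="Toyota")
bare = vehicle_label("AB-123-CD")
```

Here `full` is `"AB-123-CD (Toyota Corolla 2019)"`, `partial` is `"AB-123-CD (Toyota)"`, and `bare` falls back to the plate alone. Inside a real type the same logic would live in the `compute_label` validator and assign to `self.nodeLabel`.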
### Naming conventions

Flowsint follows strict naming conventions to maintain consistency across the codebase. Class names use PascalCase (like `Vehicle`, `SocialAccount`, or `CryptoWallet`). Field names use snake_case (like `license_plate`, `phone_number`, or `email_address`). This matches Python's standard conventions and makes the codebase more readable.

### Understanding primary fields and labels

Two concepts are crucial for every Flowsint type: the **primary field** and the **nodeLabel**. Understanding these will help you create types that work seamlessly with the graph database and UI.

**Why the primary field matters:**

- When creating Neo4j nodes, the primary field is used as the key in `MERGE` operations
- It ensures each entity is uniquely identified in the graph
- The graph service extracts this field to determine node uniqueness

**Rules for primary fields:**

- Every type must have exactly one primary field
- The primary field should uniquely identify instances
- It's typically a required field (using `...` as default)
- Common choices: IDs, usernames, emails, license plates, domain names

**Examples of good primary fields:**

- `Domain`: `domain` field (e.g., "example.com")
- `Email`: `email` field (e.g., "user@example.com")
- `Username`: `value` field (e.g., "john_doe")
- `Ip`: `address` field (e.g., "192.168.1.1")
- `SocialAccount`: `id` field (computed as "username@platform")
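
Pydantic copies the keys of a field's `json_schema_extra` into the JSON schema generated for that field, so the `primary` flag can be read back from a schema dictionary. This guide does not show how Flowsint's graph service does this; the sketch below is one possible way to enforce the "exactly one primary field" rule. The helper `find_primary_field` and the hand-written schema are illustrative assumptions.

```python
from typing import Any, Dict


def find_primary_field(schema: Dict[str, Any]) -> str:
    """Return the name of the single property flagged with "primary": True."""
    properties = schema.get("properties", {})
    primary = [name for name, spec in properties.items() if spec.get("primary") is True]
    if len(primary) != 1:
        raise ValueError(
            f"expected exactly one primary field, found {len(primary)}: {primary}"
        )
    return primary[0]


# Hand-written approximation of a JSON schema for the Vehicle type
vehicle_schema = {
    "title": "Vehicle",
    "type": "object",
    "properties": {
        "license_plate": {"type": "string", "title": "License Plate", "primary": True},
        "brand": {"type": "string", "title": "Make"},
        "year": {"type": "integer", "title": "Year"},
    },
    "required": ["license_plate"],
}

primary_name = find_primary_field(vehicle_schema)
```

A check like this could run in a unit test over every registered type, so that a type with zero or two primary fields fails early instead of producing ambiguous graph nodes.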
#### The nodeLabel field and compute_label

The `nodeLabel` is what users see in the UI and on graph nodes. It should be human-readable and help users quickly understand what an entity represents.

**How it works:**

1. `FlowsintType` provides a `nodeLabel` field (`Optional[str]`)
2. Your type defines a `compute_label` method to set this field
3. The method runs automatically after validation using `@model_validator(mode='after')`

**Basic pattern:**

```python
from pydantic import model_validator
from typing import Self


# Defined inside your type class
@model_validator(mode='after')
def compute_label(self) -> Self:
    """Compute a human-readable label."""
    self.nodeLabel = f"@{self.value}"
    return self
```
**Advanced patterns:**

When you have optional fields, handle `None` values gracefully:

```python
@model_validator(mode='after')
def compute_label(self) -> Self:
    """Compute label with optional display name."""
    if self.display_name:
        self.nodeLabel = f"{self.display_name} (@{self.username.value})"
    else:
        self.nodeLabel = f"@{self.username.value}"
    return self
```
For types with multiple identifiers, you might compute a composite ID:

```python
@model_validator(mode='after')
def compute_label_and_id(self) -> Self:
    """Compute both ID and label."""
    # Compute unique ID from username and platform
    if self.username and self.platform:
        self.id = f"{self.username.value}@{self.platform}"
    elif self.username:
        self.id = self.username.value

    # Compute display label
    if self.display_name:
        self.nodeLabel = f"{self.display_name} (@{self.username.value})"
    else:
        self.nodeLabel = f"@{self.username.value}"
    return self
```
**Best practices for labels:**

- Keep labels concise but informative
- Include the most identifying information first
- Handle `None` values for optional fields
- Use parentheses or separators to structure complex labels
- Think about what users need to see at a glance on the graph

**Real-world examples**

```python
# Simple: just the value
# Username: "@john_doe"
self.nodeLabel = f"@{self.value}"

# With context: show platform if available
# Username: "@john_doe (twitter)"
if self.platform:
    self.nodeLabel = f"@{self.value} ({self.platform})"
else:
    self.nodeLabel = f"@{self.value}"

# Rich: combine multiple fields
# Individual: "John Doe (john@example.com)"
if self.email:
    self.nodeLabel = f"{self.full_name} ({self.email})"
else:
    self.nodeLabel = self.full_name

# Complex: show key information
# Breach: "LinkedIn (2021) - 700,000,000 records" (the ":," format adds thousands separators)
self.nodeLabel = f"{self.title} ({self.breachdate.split('-')[0]}) - {self.pwncount:,} records"
```
### Working with different field types

Pydantic supports a wide range of field types beyond simple strings and integers. Here are the most common ones you'll use:
```python
from pydantic import Field, HttpUrl, model_validator
from typing import Optional, List, Dict, Any, Self
from datetime import datetime

from .flowsint_base import FlowsintType
from .registry import flowsint_type


@flowsint_type
class ExampleType(FlowsintType):
    """Demonstrates various field types."""

    # Primary identifier
    id: str = Field(
        ...,
        description="Unique identifier",
        title="ID",
        json_schema_extra={"primary": True}
    )

    # Primitive types
    text_field: str = Field(..., description="A text string", title="Text")
    number_field: int = Field(..., description="An integer number", title="Number")
    decimal_field: float = Field(..., description="A decimal number", title="Decimal")
    boolean_field: bool = Field(..., description="True or false value", title="Boolean")

    # Optional fields
    optional_text: Optional[str] = Field(None, description="Optional text", title="Optional Text")

    # Collections - note the use of default_factory
    tags: List[str] = Field(
        default_factory=list,
        description="List of tag strings",
        title="Tags"
    )
    metadata: Dict[str, Any] = Field(
        default_factory=dict,
        description="Arbitrary metadata dictionary",
        title="Metadata"
    )

    # Special Pydantic types
    website: HttpUrl = Field(..., description="A validated URL", title="Website")
    timestamp: datetime = Field(..., description="Date and time", title="Timestamp")

    @model_validator(mode='after')
    def compute_label(self) -> Self:
        """Compute label for this example."""
        self.nodeLabel = f"{self.id} - {self.text_field}"
        return self
```
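When working with mutable types like lists and dictionaries, use `default_factory` rather than a literal default. In plain Python classes and function signatures, a literal such as `[]` is created once and shared by every instance, which leads to subtle bugs. Pydantic copies unhashable defaults for each instance, so `default=[]` would not actually share state in a model, but `default_factory=list` states the intent explicitly and is the pattern used throughout this guide.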
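
The sharing problem is easiest to see in plain Python. The sketch below uses only the standard library, with `dataclasses` standing in for a model class, to contrast a class-level list with a `default_factory`. `SharedTags` and `FreshTags` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import List


class SharedTags:
    # Class attribute: a single list object shared by every instance
    tags: List[str] = []


@dataclass
class FreshTags:
    # default_factory is called once per instance, so each gets its own list
    tags: List[str] = field(default_factory=list)


a, b = SharedTags(), SharedTags()
a.tags.append("osint")  # also visible through b, because the list is shared

c, d = FreshTags(), FreshTags()
c.tags.append("osint")  # d is unaffected
```

After this runs, `b.tags` contains `"osint"` even though only `a` was modified, while `d.tags` is still empty.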
### Adding validation

Sometimes you need more sophisticated validation than just type checking. Pydantic lets you add custom validators using the `field_validator` decorator:
```python
from pydantic import Field, field_validator
from typing import Optional, Any, Self
import ipaddress

from .flowsint_base import FlowsintType
from .registry import flowsint_type


@flowsint_type
class Ip(FlowsintType):
    """Represents an IP address with geolocation and ISP information."""

    address: str = Field(
        ...,
        description="IP address",
        title="IP Address",
        json_schema_extra={"primary": True},
    )
    ...

    @field_validator("address")
    @classmethod
    def validate_ip_address(cls, v: str) -> str:
        """Validate that the address is a valid IP address."""
        try:
            ipaddress.ip_address(v)
            return v
        except ValueError:
            raise ValueError(f"Invalid IP address: {v}")
```
Validators receive the field value and can either return a (potentially modified) value or raise a `ValueError` with an error message. Field validators run before `mode='after'` model validators, so each field is validated and normalized before the label is computed.
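
To illustrate a validator that returns a modified value, the following standard-library sketch normalizes an address to its canonical text form instead of returning the input unchanged. `normalize_ip` is a hypothetical helper; in a real type the same body could sit inside the `@field_validator("address")` method.

```python
import ipaddress


def normalize_ip(value: str) -> str:
    """Return the canonical text form of an IP address, or raise ValueError."""
    try:
        # ip_address() parses IPv4 and IPv6; str() gives the canonical form
        return str(ipaddress.ip_address(value.strip()))
    except ValueError:
        raise ValueError(f"Invalid IP address: {value!r}") from None


v4 = normalize_ip(" 192.168.1.1 ")
v6 = normalize_ip("2001:0db8:0000:0000:0000:0000:0000:0001")
```

Here `v4` is `"192.168.1.1"` and `v6` is compressed to `"2001:db8::1"`. Normalizing the primary field this way means two spellings of the same address resolve to one value rather than two distinct keys.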
### Referencing other types

Types often need to reference other Flowsint types. You can import and use them just like any other Python type:
```python
from pydantic import Field, model_validator
from typing import Optional, Self

from .flowsint_base import FlowsintType
from .registry import flowsint_type
from .email import Email
from .phone import Phone


@flowsint_type
class Contact(FlowsintType):
    """Represents contact information for a person."""

    name: str = Field(
        ...,
        description="Contact name",
        title="Name",
        json_schema_extra={"primary": True}
    )
    email: Optional[Email] = Field(None, description="Email address", title="Email")
    phone: Optional[Phone] = Field(None, description="Phone number", title="Phone")

    @model_validator(mode='after')
    def compute_label(self) -> Self:
        """Compute label for this contact."""
        self.nodeLabel = self.name
        return self
```
For types with circular references or complex relationships, you may need to call `model_rebuild()` at the end of your file:

```python
from pydantic import Field, model_validator
from typing import Optional, Self

from .flowsint_base import FlowsintType
from .registry import flowsint_type


@flowsint_type
class CryptoWallet(FlowsintType):
    """Represents a cryptocurrency wallet."""

    address: str = Field(
        ...,
        description="Wallet address",
        title="Address",
        json_schema_extra={"primary": True}
    )

    @model_validator(mode='after')
    def compute_label(self) -> Self:
        """Compute label for this wallet."""
        self.nodeLabel = self.address
        return self


@flowsint_type
class CryptoWalletTransaction(FlowsintType):
    """Represents a transaction between wallets."""

    transaction_id: str = Field(
        ...,
        description="Unique transaction ID",
        title="Transaction ID",
        json_schema_extra={"primary": True}
    )
    source: CryptoWallet = Field(..., description="Source wallet", title="Source")
    target: Optional[CryptoWallet] = Field(None, description="Target wallet", title="Target")
    amount: float = Field(..., description="Transaction amount", title="Amount")

    @model_validator(mode='after')
    def compute_label(self) -> Self:
        """Compute label for this transaction."""
        self.nodeLabel = f"{self.amount} ({self.transaction_id[:8]}...)"
        return self


# Rebuild models to resolve forward references
CryptoWallet.model_rebuild()
CryptoWalletTransaction.model_rebuild()
```
## Exporting your type

Once you've created your type, the `@flowsint_type` decorator handles registration automatically. However, you also need to export it from the package for convenient imports.

### Updating the package exports

Open `flowsint-types/src/flowsint_types/__init__.py` and add two things. First, import your new type at the top of the file with the other imports:

```python
from .address import Location
from .affiliation import Affiliation
from .alias import Alias
# ... other imports ...
from .vehicle import Vehicle  # Add your import here
```

Second, add your type name to the `__all__` list:

```python
__all__ = [
    "Location",
    "Affiliation",
    "Alias",
    # ... other types ...
    "Vehicle",  # Add your type here
]
```

The `__all__` list explicitly defines what gets exported when someone does `from flowsint_types import *`. While wildcard imports aren't always recommended, this ensures your type is properly exposed by the package.

The `@flowsint_type` decorator registers your type in the `TYPE_REGISTRY` only when its module is imported. The explicit import in `__init__.py` ensures that this happens at startup, alongside all other types.
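
A common slip is adding a name to `__all__` without the matching import. The sketch below shows one way to catch that with the standard library. `missing_exports` is a hypothetical helper, and the in-memory `fake_pkg` module stands in for `flowsint_types` so the example runs without the package installed.

```python
import types
from typing import List


def missing_exports(module: types.ModuleType) -> List[str]:
    """Return names listed in __all__ that the module does not actually define."""
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]


# Fake package built in memory for the demonstration
fake_pkg = types.ModuleType("fake_flowsint_types")
fake_pkg.Location = type("Location", (), {})
fake_pkg.__all__ = ["Location", "Vehicle"]  # "Vehicle" is listed but never imported

problems = missing_exports(fake_pkg)
```

Here `problems` is `["Vehicle"]`. Pointed at the real package after `import flowsint_types`, an empty result would indicate that every exported name resolves.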
### Installing the package

After making these changes, you need to reinstall the package for them to take effect:

```bash
make prod
# or
cd flowsint-types
poetry install
```

This updates the package in your development environment so enrichers and the API can import your new type.
## Integrating with the API

The final step is making your type available through the API so frontends can discover it and create instances.

### Categorizing your type

The API organizes types into logical categories that appear in the frontend. In the `TypeRegistryService._get_category_definitions()` method (located in `flowsint-core/src/flowsint_core/core/services/type_registry_service.py`), you'll find a list of category dictionaries. Add your type to an appropriate existing category, or create a new one.

Each category's `children` list contains tuples of `(TypeName, label_key, icon)`:

- **TypeName**: The PascalCase class name of your type (e.g., `"Vehicle"`)
- **label_key**: The field name used as the display key (e.g., `"license_plate"`)
- **icon**: Optional icon override, or `None` to use the lowercase type name as the icon
```python
def _get_category_definitions(self) -> List[Dict[str, Any]]:
    """Get the category definitions for types."""
    return [
        {
            "id": uuid4(),
            "type": "global",
            "key": "global_category",
            "icon": "phrase",
            "label": "Global",
            "fields": [],
            "children": [
                ("Phrase", "text", None),
                ("Location", "address", None),
            ],
        },
        {
            "id": uuid4(),
            "type": "person",
            "key": "person_category",
            "icon": "individual",
            "label": "Identities & Entities",
            "fields": [],
            "children": [
                ("Individual", "full_name", None),
                ("Username", "value", "username"),
                ("Organization", "name", None),
            ],
        },
        ...
```
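
The sketch below shows the shape of the change on a trimmed-down copy of that structure: appending a `Vehicle` tuple to a category and applying the icon fallback rule. In the real service you would edit the literal list inside `_get_category_definitions()`; `add_child` and `resolve_icon` are hypothetical helpers used only to make the rules executable.

```python
from typing import Any, Dict, List, Optional, Tuple

Child = Tuple[str, str, Optional[str]]  # (TypeName, label_key, icon)


def add_child(categories: List[Dict[str, Any]], key: str, child: Child) -> None:
    """Append a child tuple to the category whose "key" matches."""
    for category in categories:
        if category["key"] == key:
            category["children"].append(child)
            return
    raise KeyError(f"no category with key {key!r}")


def resolve_icon(child: Child) -> str:
    """Use the explicit icon if given, otherwise the lowercase type name."""
    type_name, _label_key, icon = child
    return icon if icon is not None else type_name.lower()


categories = [
    {
        "key": "global_category",
        "label": "Global",
        "children": [("Phrase", "text", None), ("Location", "address", None)],
    },
]

add_child(categories, "global_category", ("Vehicle", "license_plate", None))
icons = [resolve_icon(child) for child in categories[0]["children"]]
```

After the call, `icons` is `["phrase", "location", "vehicle"]`: because the `Vehicle` tuple passes `None` as its icon, the frontend would look for an icon named `vehicle`.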
### Available categories

Flowsint currently organizes types into these standard categories:

- **Global** contains general-purpose types like Location and Phrase that don't fit neatly into other categories.
- **Identities & Entities** includes Individual, Username, and Organization for representing people and groups.
- **Organization** contains Organization for dedicated organizational lookups.
- **Communication & Contact** covers Phone, Email, Username, SocialAccount, and Message for communication-related data.
- **Network** encompasses all network-related types including ASN, CIDR, Domain, Website, Ip, Port, DNSRecord, SSLCertificate, and WebTracker.
- **Security & Access** groups security-relevant types like Credential, Session, Device, Malware, and Weapon.
- **Files & Documents** contains Document and File for representing digital files.
- **Financial Data** includes BankAccount and CreditCard for financial information.
- **Leaks** covers data breach information with the Leak type.
- **Crypto** contains cryptocurrency-related types including CryptoWallet, CryptoWalletTransaction, and CryptoNFT.

You can add your type to any of these categories or create a new category if none fit.

<Alert variant="info">
<AlertTitle>Registered but uncategorized types</AlertTitle>
<AlertDescription>
Some types are registered (via `@flowsint_type`) and used as enricher inputs or outputs, but are intentionally not placed in any built-in category: `Affiliation`, `Alias`, `Breach`, `Gravatar`, `ReputationScore`, `RiskProfile`, `Script`, and `Whois`. They show up in the graph as nodes produced by enrichers (e.g. `Whois` is produced by `domain_to_whois`) but they don't appear in the type picker until you add them to `_get_category_definitions()`.
</AlertDescription>
</Alert>
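
If you want to list which registered types are missing from the type picker, a set difference between the registered names and the categorized names is enough. The sketch below assumes you already have both as plain data; `uncategorized_types`, the sample names, and the `network_category` key are illustrative and not taken from the Flowsint codebase.

```python
from typing import Any, Dict, Iterable, List, Set


def uncategorized_types(
    registered: Iterable[str], categories: List[Dict[str, Any]]
) -> Set[str]:
    """Return registered type names that no category lists as a child."""
    categorized = {child[0] for category in categories for child in category["children"]}
    return set(registered) - categorized


registered_names = ["Domain", "Whois", "Breach"]
sample_categories = [
    {"key": "network_category", "children": [("Domain", "domain", None)]},
]

hidden = uncategorized_types(registered_names, sample_categories)
```

With this sample data `hidden` is `{"Whois", "Breach"}`, matching the situation described in the note above.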
## Complete examples

Let's look at some complete, real-world examples that illustrate different patterns.

### Simple type example

The simplest types have just one or two required fields and minimal complexity:

```python
from pydantic import Field, model_validator
from typing import Self

from .flowsint_base import FlowsintType
from .registry import flowsint_type


@flowsint_type
class Hashtag(FlowsintType):
    """Represents a social media hashtag."""

    tag: str = Field(
        ...,
        description="Hashtag text without the # symbol",
        title="Hashtag",
        json_schema_extra={"primary": True}
    )

    @model_validator(mode='after')
    def compute_label(self) -> Self:
        """Compute label for this hashtag."""
        self.nodeLabel = f"#{self.tag}"
        return self
```
### Type with validation

This example shows a Social Security Number type with format validation:

```python
from pydantic import Field, field_validator, model_validator
from typing import Self
import re

from .flowsint_base import FlowsintType
from .registry import flowsint_type


@flowsint_type
class SocialSecurityNumber(FlowsintType):
    """Represents a US Social Security Number."""

    ssn: str = Field(
        ...,
        description="Social Security Number in format XXX-XX-XXXX",
        title="SSN",
        json_schema_extra={"primary": True}
    )

    @field_validator('ssn')
    @classmethod
    def validate_ssn_format(cls, v: str) -> str:
        """Validate SSN format and normalize to standard format."""
        clean = v.replace("-", "").replace(" ", "")
        # fullmatch rejects trailing characters such as a newline
        if not re.fullmatch(r"\d{9}", clean):
            raise ValueError(
                "SSN must be exactly 9 digits (format: XXX-XX-XXXX or XXXXXXXXX)"
            )
        return f"{clean[:3]}-{clean[3:5]}-{clean[5:]}"

    @model_validator(mode='after')
    def compute_label(self) -> Self:
        """Compute label for this SSN."""
        # Mask most digits for privacy
        self.nodeLabel = f"SSN ***-**-{self.ssn[-4:]}"
        return self
```
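
To see what the validator and label produce for concrete inputs, the sketch below lifts the same two steps into standalone functions that run with only the standard library. `normalize_ssn` and `masked_label` are hypothetical names, and the sample number is made up.

```python
import re


def normalize_ssn(value: str) -> str:
    """Strip separators, check for exactly nine digits, return XXX-XX-XXXX."""
    clean = value.replace("-", "").replace(" ", "")
    if not re.fullmatch(r"\d{9}", clean):
        raise ValueError("SSN must be exactly 9 digits (format: XXX-XX-XXXX or XXXXXXXXX)")
    return f"{clean[:3]}-{clean[3:5]}-{clean[5:]}"


def masked_label(ssn: str) -> str:
    """Show only the last four digits."""
    return f"SSN ***-**-{ssn[-4:]}"


normalized = normalize_ssn("123 45 6789")
label = masked_label(normalized)
```

Here `normalized` is `"123-45-6789"` and `label` is `"SSN ***-**-6789"`. Because normalization happens before the label is computed, the last four characters are always digits.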
### Type with related types

This example shows how types can reference other types to build rich data models:

```python
from pydantic import Field, model_validator
from typing import Optional, Self

from .flowsint_base import FlowsintType
from .registry import flowsint_type
from .email import Email
from .domain import Domain


@flowsint_type
class Whois(FlowsintType):
    """Represents WHOIS domain registration information."""

    domain: Domain = Field(
        ...,
        description="Domain",
        title="Domain",
    )
    registrar: Optional[str] = Field(
        None,
        description="Name of the domain registrar",
        title="Registrar"
    )
    email: Optional[Email] = Field(
        None,
        description="Contact email address from WHOIS record",
        title="Contact Email"
    )
    creation_date: Optional[str] = Field(
        None,
        description="Date when the domain was first registered",
        title="Creation Date"
    )
    expiration_date: Optional[str] = Field(
        None,
        description="Date when the domain registration expires",
        title="Expiration Date"
    )

    @model_validator(mode='after')
    def compute_label(self) -> Self:
        """Compute label for this WHOIS record."""
        if self.registrar:
            self.nodeLabel = f"{self.domain.domain} (via {self.registrar})"
        else:
            self.nodeLabel = f"WHOIS: {self.domain.domain}"
        return self
```
### Complex type with collections

This example demonstrates a type with lists of other types and rich metadata:

```python
from pydantic import Field, model_validator
from typing import Optional, List, Dict, Any, Self

from .flowsint_base import FlowsintType
from .registry import flowsint_type
from .individual import Individual
from .address import Location


@flowsint_type
class Organization(FlowsintType):
    """Represents an organization with comprehensive business information."""

    name: str = Field(
        ...,
        description="Legal name of the organization",
        title="Organization Name",
        json_schema_extra={"primary": True}
    )
    registration_number: Optional[str] = Field(
        None,
        description="Official business registration number",
        title="Registration Number"
    )
    headquarters: Optional[Location] = Field(
        None,
        description="Primary headquarters location",
        title="Headquarters"
    )
    executives: List[Individual] = Field(
        default_factory=list,
        description="List of company executives and board members",
        title="Executives"
    )
    locations: List[Location] = Field(
        default_factory=list,
        description="All office and facility locations",
        title="Locations"
    )
    employee_count: Optional[int] = Field(
        None,
        description="Total number of employees",
        title="Employee Count"
    )
    revenue: Optional[float] = Field(
        None,
        description="Annual revenue in USD",
        title="Revenue"
    )
    industry: Optional[str] = Field(
        None,
        description="Primary industry sector",
        title="Industry"
    )
    metadata: Dict[str, Any] = Field(
        default_factory=dict,
        description="Additional metadata and custom fields",
        title="Metadata"
    )

    @model_validator(mode='after')
    def compute_label(self) -> Self:
        """Compute label for this organization."""
        if self.industry:
            self.nodeLabel = f"{self.name} ({self.industry})"
        else:
            self.nodeLabel = self.name
        return self
```

## Best practices and common patterns

### Documentation

Keep documentation at the forefront. Every type should have:

- A clear docstring explaining what it represents
- A descriptive `description` parameter for each field (for API docs)
- A meaningful `title` parameter for each field (for UI labels)

Future developers (including yourself) will thank you for this clarity.

### Required vs optional fields

Think carefully about what should be required versus optional:

- **Required fields** (using `...`): Only fields that uniquely identify an entity or are absolutely essential
- **Optional fields** (using `Optional[Type]` and `None`): Most other fields should be optional since intelligence gathering is incremental and you rarely have complete information upfront
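
The required/optional split above can be sketched with a small standalone model. This is only an illustration: `Vessel`, `imo_number`, and `flag` are hypothetical names, and plain Pydantic `BaseModel` stands in for `FlowsintType` so the snippet runs outside the Flowsint package.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Vessel(BaseModel):  # stand-in base class; real types inherit from FlowsintType
    """Hypothetical type used only to illustrate required vs optional fields."""

    # Required: the identifier without which the entity makes no sense.
    imo_number: str = Field(..., description="IMO number", title="IMO Number")
    # Optional: details that later enrichment may or may not discover.
    flag: Optional[str] = Field(None, description="Flag state", title="Flag")


# Early in an investigation only the identifier is known.
partial = Vessel(imo_number="9074729")

# Later, an enrichment step fills in what it found.
enriched = partial.model_copy(update={"flag": "PA"})

# Omitting the required field is rejected at construction time.
missing = []
try:
    Vessel()
except ValidationError as exc:
    missing = [err["loc"][0] for err in exc.errors() if err["type"] == "missing"]
```

Keeping the required set small means an instance can be created as soon as the identifier is known and filled in incrementally afterwards.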

### Always inherit from FlowsintType and use the decorator

Never inherit directly from Pydantic's `BaseModel`. Always use `FlowsintType` and the `@flowsint_type` decorator:

```python
# Correct
from .flowsint_base import FlowsintType
from .registry import flowsint_type

@flowsint_type
class MyType(FlowsintType):
    ...

# Wrong - missing decorator
from .flowsint_base import FlowsintType

class MyType(FlowsintType):  # Not registered!
    ...

# Wrong - wrong base class
from pydantic import BaseModel

class MyType(BaseModel):  # Missing FlowsintType features
    ...
```

### Always implement compute_label

Every type must implement a `compute_label` method to set the `nodeLabel` displayed in the UI and graph:

```python
@model_validator(mode='after')
def compute_label(self) -> Self:
    """Compute a human-readable label."""
    # Handle None values gracefully
    if self.optional_field:
        self.nodeLabel = f"{self.primary_field} ({self.optional_field})"
    else:
        self.nodeLabel = self.primary_field
    return self
```

**Best practices for labels:**

- Keep them concise but informative
- Handle None values for optional fields gracefully
- Put the most important information first
- Think about what users need to see at a glance on the graph

### Type hints and validation

Use type hints everywhere. They provide:

- Automatic validation
- Better IDE support and autocomplete
- Inline documentation
- Runtime type checking via Pydantic

For mutable default values like lists and dictionaries, always use `default_factory`:

```python
# Correct
tags: List[str] = Field(default_factory=list)
metadata: Dict[str, Any] = Field(default_factory=dict)

# Avoid - Pydantic copies these defaults for each instance, so nothing is
# actually shared, but the same pattern is a shared-state bug in plain
# classes and function defaults; default_factory makes the intent explicit
tags: List[str] = Field(default=[])
metadata: Dict[str, Any] = Field(default={})
```
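
A quick way to convince yourself that `default_factory` gives every instance its own container is the sketch below. `Tagged` is a hypothetical model, and plain `BaseModel` is used instead of `FlowsintType` only so the snippet is self-contained.

```python
from typing import Any, Dict, List

from pydantic import BaseModel, Field


class Tagged(BaseModel):  # hypothetical stand-in for a Flowsint type
    tags: List[str] = Field(default_factory=list)
    metadata: Dict[str, Any] = Field(default_factory=dict)


first = Tagged()
second = Tagged()

# Mutating one instance's containers must not leak into the other.
first.tags.append("osint")
first.metadata["source"] = "manual"

print(first.tags, second.tags)          # the second list stays empty
print(first.metadata, second.metadata)  # the second dict stays empty
```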

### Importing other types

When referencing other Flowsint types, use relative imports to avoid circular import issues:

```python
# Correct
from .email import Email
from .phone import Phone

# Avoid
from flowsint_types import Email, Phone  # Can cause circular imports
```

If you encounter circular import problems, you can use forward references (strings) in type hints and call `model_rebuild()` at the end of your module.
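
The forward-reference approach can be sketched in a single file as follows. `Person` and `Company` are hypothetical models built on plain `BaseModel`; in a real package each class would live in its own module, and the `model_rebuild()` call would sit at the bottom of the module that uses the string annotation.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str
    # "Company" is a string because the class is not defined yet at this point.
    employer: Optional["Company"] = None


class Company(BaseModel):
    name: str
    staff: List[Person] = Field(default_factory=list)


# Now that Company exists, resolve the string annotation on Person.
Person.model_rebuild()

acme = Company(name="Acme")
alice = Person(name="Alice", employer=acme)
acme.staff.append(alice)
```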

### Custom validation

Consider adding custom validators for complex validation logic that goes beyond simple type checking:

```python
@field_validator('email')
@classmethod
def validate_email(cls, v: str) -> str:
    """Validate and normalize email format."""
    # is_valid_email is not defined in this snippet; supply your own check
    if not is_valid_email(v):
        raise ValueError("Invalid email format")
    return v.lower()
```

This keeps validation logic close to the type definition and ensures data integrity throughout the system.
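
For a version of this pattern that runs on its own, here is a minimal sketch. The `Contact` model and the deliberately loose regex behind `is_valid_email` are assumptions made for illustration, not how Flowsint's own `Email` type validates addresses.

```python
import re

from pydantic import BaseModel, ValidationError, field_validator

# Intentionally permissive pattern: something@something.tld, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_valid_email(value: str) -> bool:
    """Hypothetical helper; swap in whatever check your project prefers."""
    return EMAIL_RE.fullmatch(value) is not None


class Contact(BaseModel):  # stand-in for a Flowsint type
    email: str

    @field_validator("email")
    @classmethod
    def validate_email(cls, v: str) -> str:
        if not is_valid_email(v):
            raise ValueError("Invalid email format")
        return v.lower()  # normalize so equal addresses compare equal


contact = Contact(email="Analyst@Example.COM")

try:
    Contact(email="not-an-email")
    rejected = False
except ValidationError:
    rejected = True
```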

### Order of execution

Remember the order in which Pydantic processes your type:

1. **Field validators** (`@field_validator`) run first, validating and potentially transforming individual fields
2. **Model validators** declared with `mode='after'` run once every field has been validated, operating on the whole model (a model validator declared with `mode='before'` runs earlier, on the raw input)
3. Your `compute_label` method is a `mode='after'` model validator, so it runs only after all fields are validated

This means you can safely access validated field values in `compute_label`.
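
The ordering can be observed directly by recording each hook as it fires. The sketch below uses a hypothetical `Demo` model on plain `BaseModel`, with a `label` field standing in for `nodeLabel`.

```python
from typing import List, Optional

from pydantic import BaseModel, field_validator, model_validator

calls: List[str] = []  # records the order in which validators run


class Demo(BaseModel):
    name: str
    label: Optional[str] = None

    @model_validator(mode="before")
    @classmethod
    def before_model(cls, data):
        calls.append("model:before")  # sees the raw, unvalidated input
        return data

    @field_validator("name")
    @classmethod
    def normalize_name(cls, v: str) -> str:
        calls.append("field:name")
        return v.strip()

    @model_validator(mode="after")
    def compute_label(self) -> "Demo":
        calls.append("model:after")
        self.label = self.name  # name is already stripped here
        return self


demo = Demo(name="  example.com ")
print(calls)       # ['model:before', 'field:name', 'model:after']
print(demo.label)  # example.com
```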

## Testing your type

Writing tests for your types ensures they work correctly and helps catch bugs early. Create a test file in `flowsint-types/tests/` that matches your type filename.

### Basic test structure

```python
# flowsint-types/tests/test_vehicle.py
from flowsint_types import Vehicle
import pytest


def test_vehicle_creation():
    """Test creating a vehicle with required fields."""
    vehicle = Vehicle(license_plate="ABC123")
    assert vehicle.license_plate == "ABC123"


def test_vehicle_with_optional_fields():
    """Test creating a vehicle with optional fields."""
    vehicle = Vehicle(
        license_plate="ABC123",
        brand="Toyota",
        model="Camry",
        year=2020
    )
    assert vehicle.brand == "Toyota"
    assert vehicle.year == 2020


def test_vehicle_missing_required_field():
    """Test that validation fails without required fields."""
    # Pydantic's ValidationError is a subclass of ValueError
    with pytest.raises(ValueError):
        Vehicle()  # Should fail - missing required field
```

### Testing label computation

The label is crucial for UI display, so test it thoroughly:

```python
def test_vehicle_label_basic():
    """Test label computation with only required fields."""
    vehicle = Vehicle(license_plate="ABC123")
    assert vehicle.nodeLabel == "ABC123"


def test_vehicle_label_with_details():
    """Test label computation with optional fields."""
    vehicle = Vehicle(
        license_plate="ABC123",
        brand="Toyota",
        model="Camry",
        year=2020
    )
    assert vehicle.nodeLabel == "ABC123 (Toyota Camry 2020)"


def test_vehicle_label_partial_details():
    """Test label computation with some optional fields."""
    vehicle = Vehicle(
        license_plate="ABC123",
        brand="Toyota"
    )
    # Should handle None values gracefully
    assert vehicle.nodeLabel == "ABC123"
```

### Testing field validators

If your type has custom validators, test both valid and invalid inputs:

```python
# tests/test_username.py
from flowsint_types import Username
import pytest


def test_username_valid():
    """Test valid username creation."""
    username = Username(value="john_doe")
    assert username.value == "john_doe"
    assert username.nodeLabel == "john_doe"


def test_username_validation_too_short():
    """Test that usernames under 3 characters are rejected."""
    with pytest.raises(ValueError, match="Must be 3-80 characters"):
        Username(value="ab")


def test_username_validation_invalid_chars():
    """Test that invalid characters are rejected."""
    with pytest.raises(ValueError, match="only letters, numbers, underscores, and hyphens"):
        Username(value="john@doe")


def test_username_validation_boundaries():
    """Test boundary conditions."""
    # Minimum length
    username = Username(value="abc")
    assert username.value == "abc"

    # Maximum length
    long_name = "a" * 80
    username = Username(value=long_name)
    assert username.value == long_name

    # Too long
    with pytest.raises(ValueError):
        Username(value="a" * 81)
```

### Testing types with nested objects

When your type contains other Flowsint types, test the relationships:

```python
# tests/test_social_account.py
from flowsint_types import SocialAccount, Username
import pytest


def test_social_account_creation():
    """Test creating a social account with a username object."""
    username = Username(value="john_doe")
    account = SocialAccount(
        username=username,
        platform="twitter",
        profile_url="https://twitter.com/john_doe"
    )
    assert account.username.value == "john_doe"
    assert account.platform == "twitter"
    assert account.id == "john_doe@twitter"


def test_social_account_label_with_display_name():
    """Test label computation with display name."""
    username = Username(value="john_doe")
    account = SocialAccount(
        username=username,
        platform="twitter",
        display_name="John Doe"
    )
    assert account.nodeLabel == "John Doe (@john_doe)"


def test_social_account_label_without_display_name():
    """Test label computation without display name."""
    username = Username(value="john_doe")
    account = SocialAccount(
        username=username,
        platform="twitter"
    )
    assert account.nodeLabel == "@john_doe"
```

### Testing serialization

Verify that your types serialize correctly to JSON:

```python
def test_vehicle_serialization():
    """Test that vehicle serializes to JSON correctly."""
    vehicle = Vehicle(
        license_plate="ABC123",
        brand="Toyota",
        model="Camry",
        year=2020
    )

    # Convert to dict
    data = vehicle.model_dump()
    assert data["license_plate"] == "ABC123"
    assert data["brand"] == "Toyota"
    assert data["nodeLabel"] == "ABC123 (Toyota Camry 2020)"

    # Convert to JSON string
    json_str = vehicle.model_dump_json()
    assert "ABC123" in json_str


def test_vehicle_deserialization():
    """Test creating vehicle from dictionary."""
    data = {
        "license_plate": "ABC123",
        "brand": "Toyota",
        "model": "Camry",
        "year": 2020
    }
    vehicle = Vehicle(**data)
    assert vehicle.license_plate == "ABC123"
    assert vehicle.nodeLabel == "ABC123 (Toyota Camry 2020)"
```
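
A full JSON round trip is also worth checking, since it exercises both serialization and re-validation in one step. The sketch below assumes `nodeLabel` behaves like an ordinary optional field, as the `model_dump()` assertions above suggest; `Vessel` is a hypothetical model on plain `BaseModel`, not a real Flowsint type.

```python
from typing import Optional

from pydantic import BaseModel, model_validator


class Vessel(BaseModel):  # stand-in for a Flowsint type
    name: str
    flag: Optional[str] = None
    nodeLabel: Optional[str] = None  # assumption: declared on the base class in Flowsint

    @model_validator(mode="after")
    def compute_label(self) -> "Vessel":
        self.nodeLabel = f"{self.name} ({self.flag})" if self.flag else self.name
        return self


original = Vessel(name="Ever Given", flag="PA")

# Serialize to a JSON string, then validate that string back into a model.
payload = original.model_dump_json()
restored = Vessel.model_validate_json(payload)

# The label is recomputed on load and should match the original.
print(restored.nodeLabel)
```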

### Running the tests

To run your tests:

```bash
cd flowsint-types
poetry run pytest tests/test_vehicle.py -v

# Run all tests
poetry run pytest -v

# Run with coverage
poetry run pytest --cov=flowsint_types tests/
```

### Best practices for testing

- **Test the happy path first**: Basic creation with valid data
- **Test validation**: Both valid and invalid inputs
- **Test edge cases**: Empty strings, very long strings, boundary values
- **Test label computation**: With and without optional fields
- **Test serialization**: To/from dict and JSON
- **Use descriptive test names**: The test name should describe what it tests
- **Use pytest fixtures** for complex setup that's reused across tests

Example with fixtures:

```python
import pytest
from flowsint_types import Username, SocialAccount


@pytest.fixture
def sample_username():
    """Fixture providing a sample username."""
    return Username(value="john_doe")


@pytest.fixture
def sample_account(sample_username):
    """Fixture providing a sample social account."""
    return SocialAccount(
        username=sample_username,
        platform="twitter",
        profile_url="https://twitter.com/john_doe"
    )


def test_with_fixtures(sample_account):
    """Test using fixtures."""
    assert sample_account.username.value == "john_doe"
    assert sample_account.platform == "twitter"
```

## Troubleshooting common issues

### Import errors

If you encounter import errors after creating your type, make sure you've run `poetry install` in the `flowsint-types` directory. The package needs to be reinstalled for changes to take effect:

```bash
cd flowsint-types
poetry install
```

### Type not appearing in the API

If your type doesn't appear in the API, verify that you've:

1. Decorated it with `@flowsint_type`
2. Imported it in `flowsint_types/__init__.py`
3. Added it to the `__all__` list in `flowsint_types/__init__.py`
4. Added it to the appropriate category in `_get_category_definitions()` in `flowsint-core/src/flowsint_core/core/services/type_registry_service.py`

### Type not found in TYPE_REGISTRY

If `TYPE_REGISTRY.get("MyType")` returns `None`:

- Ensure the `@flowsint_type` decorator is applied to the class
- Ensure the module is imported (either in `__init__.py` or via `load_all_types()`)
- Check for import errors in your type file that prevent the module from loading

### Validation errors

For validation errors, check that you're using:

- The ellipsis (`...`) for required fields
- `None` for optional fields
- `Optional[Type]` in type hints for optional fields
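
When the cause of a validation failure is not obvious, Pydantic's `ValidationError.errors()` reports which field failed and why. The helper below is a sketch: `Aircraft` and `explain` are hypothetical names, and plain `BaseModel` replaces `FlowsintType` so it runs standalone.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class Aircraft(BaseModel):  # stand-in for a Flowsint type
    tail_number: str = Field(..., title="Tail Number")
    build_year: Optional[int] = Field(None, title="Build Year")


def explain(payload: dict) -> List[str]:
    """Return one 'field: error_type' string per validation problem."""
    try:
        Aircraft(**payload)
    except ValidationError as exc:
        return [
            f"{'.'.join(str(part) for part in err['loc'])}: {err['type']}"
            for err in exc.errors()
        ]
    return []


# Missing required field plus an optional field of the wrong type.
problems = explain({"build_year": "not-a-year"})
for line in problems:
    print(line)
```

A `missing` entry points at a required field that was not supplied, while a parsing entry points at a value that does not match the declared type.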

### Nodes not appearing in the graph

If your type's instances aren't appearing in Neo4j:

- **Check the enricher**: Verify that enrichers using this type call `self.create_node(instance)`
- **Check the created node**: Confirm that the instance passed to `create_node` is valid and that no required field is missing

### Label not appearing correctly

If labels aren't displaying correctly in the UI or graph:

- **Missing compute_label**: Ensure you've implemented the `@model_validator(mode='after')` method
- **Wrong field name**: Make sure you set `self.nodeLabel`, not `self.label`
- **Not returning Self**: The method must return `self`
- **None handling**: Check that you handle None values for optional fields gracefully
- **Method name**: The method must be named `compute_label` exactly

### Circular imports

If you're seeing issues with circular imports:

- Use relative imports (`from .email import Email`) instead of absolute imports
- Use forward references (string type hints) if needed
- Call `model_rebuild()` at the end of your module to resolve forward references

### Enricher errors with your type

If enrichers fail when using your type:

- **Validation failures**: Your field validators might be too strict; check validator error messages in logs
- **Nested object issues**: When passing nested Flowsint types, pass the complete object rather than recreating it
- **Primary key extraction**: The graph service needs to extract a primitive value from your primary field

## Next steps

Once you've created and registered your type, you can use it in enrichers to build intelligence gathering workflows. Types serve as the input and output specifications for enrichers, and they define the structure of nodes in the Neo4j graph database.

### Key checklist for new types

Before considering your type complete, verify that you've:

- Decorated with `@flowsint_type`
- Inherited from `FlowsintType`
- Marked exactly one field as primary with `json_schema_extra={"primary": True}`
- Implemented `compute_label` method that sets `self.nodeLabel` and handles None values gracefully
- Provided `description` and `title` for all fields
- Used `default_factory` for list and dict fields
- Written tests for creation, validation, primary field, and label computation
- Exported your type in `flowsint_types/__init__.py`
- Added it to a category in `flowsint-core/src/flowsint_core/core/services/type_registry_service.py`
- Run `poetry install` to make the type available

### Exploring further

You might also want to explore:

- **Creating enrichers**: Use your type as input/output in custom enrichers
- **Custom types via API**: Flowsint supports runtime type creation using JSON Schema (see `flowsint-core/src/flowsint_core/core/models.py`)
- **Graph format**: Learn about the [node and edge format](/docs/developers/graph-format) used in the frontend
- **Type schemas**: Understand how Pydantic schemas are used for API validation

### Final thoughts

Remember that types are the foundation of everything in Flowsint:

- **Well-designed types** make enrichers easier to write
- **Clear primary fields** ensure proper node identification in the graph
- **Meaningful labels** make the UI and graph database more intuitive
- **Thorough validation** ensures data integrity throughout the platform

With these concepts mastered, you're ready to create powerful, robust types that will make the entire Flowsint platform more effective for intelligence gathering.