
The Another.IO profile generator received a significant update. Three areas changed: the internal correlation engine that links profile fields to each other, the list of supported countries, and the validation rules applied before a profile gets returned. This article explains what changed in each area, why it changed, and what the differences mean for people actually using the tool.

What "Deeper Correlations" Actually Means

Earlier versions treated most fields as independent. A name was generated. An address was generated. A phone number was generated. Each field was individually valid. But the connections between them were shallow. A German profile might have a perfectly formatted phone number and a valid postal code, except the postal code belonged to Munich while the street address was in Hamburg. The area code didn't match the city at all.

Nobody noticed until someone tried using the profiles for fraud detection testing and the inconsistencies triggered every rule in the system. That's the kind of feedback that forces an architectural change.

The updated engine generates fields in dependency chains rather than isolation. Country selection happens first, then narrows to a specific region, then generates everything else in a way that's internally consistent with that region. The practical effect is visible immediately.

Phone numbers now carry area codes that match the generated address. A profile placed in Lyon gets a phone number with the 04 prefix (the southeast zone, which includes Auvergne-Rhône-Alpes), not 01 (Paris region). Names are drawn from frequency tables specific to the country and, where data permits, the region within that country. A Catalan profile is more likely to produce a Catalan name than a Castilian one. A Quebec profile generates French-Canadian names, not anglophone ones.

Postal codes map to cities. Cities map to states or provinces. No more mismatches between a valid postal code and an unrelated city name. Credit card BIN prefixes and IBAN structures match the profile's country and, where possible, a plausible issuing bank for the region. Job titles are weighted by age bracket: a 22-year-old profile won't be assigned "Senior Vice President," and a 58-year-old won't get "Junior Intern."

The result is that profiles pass consistency checks that the old generator would have failed. Fraud detection systems, identity verification APIs, and manual reviewers all look for internal contradictions as a first-pass filter. Eliminating those contradictions means synthetic profiles behave more like real data in testing and research contexts, which is the entire point.

Why This Matters Per Use Case

The correlation improvements aren't cosmetic. They directly affect whether the generated profile works for its intended purpose, and different users feel the impact differently.

For QA and software testing, applications that validate address-postal code combinations now accept generated profiles instead of rejecting them at the form validation step. Tests that previously failed because the test data was internally contradictory now pass, which means QA teams can focus on actual application bugs rather than debugging their test data. Geo-targeted features (local store locations, shipping cost calculations, regional tax rates) can finally be tested with profiles that trigger the correct geographic logic.

For database seeding, staging environments populated with correlated profiles are more representative of what production actually looks like. If the production database has users clustered in specific metro areas with matching phone area codes and postal codes, the staging data now replicates that clustering. Queries that join on geographic fields return realistic result sets rather than random noise, which matters a lot when you're trying to reproduce a production bug in staging.

For security research, a social engineering pretext with a phone number, address, and name all pointing to the same city is harder to challenge than one where the fields scatter across three different regions. Investigation targets who verify caller ID against claimed location won't find a contradiction.

For privacy protection, a synthetic sign-up profile with internally consistent data is less likely to trigger fraud flags on the service you're signing up for. Services that cross-check address and phone number as an anti-fraud measure accept the profile instead of flagging it for manual review, which would defeat the purpose entirely.

From 12 Countries to 47

The country list grew substantially. The full list is in the API documentation, but the additions worth noting fall into geographic clusters.

Southeast Asia was among the most requested: Thailand, Vietnam, Indonesia, the Philippines, Malaysia, and Singapore. QA teams building products for ASEAN markets had been generating US profiles and mentally translating, which is about as useful as it sounds. Each country required its own name frequency tables, address format rules, phone number structures, and national ID formats.

Latin America expanded to include Mexico, Brazil (fully, not just CPF validation), Argentina, Colombia, Chile, Peru, and Ecuador. Brazil's address formatting (CEP postal codes, state abbreviations, neighbourhood fields) was the most complex addition in this group. The previous version validated CPFs but couldn't produce a complete Brazilian address. That gap is closed.
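
For readers unfamiliar with the CPF validation mentioned above, the check digits follow a publicly documented mod-11 scheme. The sketch below is illustrative only: the function names are hypothetical and this is not the generator's actual code.

```python
def cpf_check_digits(base9: str) -> str:
    """Return the two check digits for a 9-digit CPF base."""
    if len(base9) != 9 or not base9.isdigit():
        raise ValueError("expected exactly 9 digits")
    digits = [int(c) for c in base9]
    for _ in range(2):
        # Weights count down to 2: 10..2 for the first check digit,
        # 11..2 for the second (which also covers the first check digit).
        weights = range(len(digits) + 1, 1, -1)
        remainder = sum(d * w for d, w in zip(digits, weights)) % 11
        digits.append(0 if remainder < 2 else 11 - remainder)
    return f"{digits[9]}{digits[10]}"


def is_valid_cpf(cpf: str) -> bool:
    """Validate a CPF given with or without punctuation."""
    digits = "".join(c for c in cpf if c.isdigit())
    # Reject wrong lengths and repeated-digit sequences such as 111.111.111-11,
    # which satisfy the arithmetic but are conventionally treated as invalid.
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    return cpf_check_digits(digits[:9]) == digits[9:]
```

A generator can produce a valid CPF by picking nine random digits and appending the result of `cpf_check_digits`.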

Eastern Europe added Poland, the Czech Republic, Romania, Hungary, Bulgaria, and Croatia. Poland's street naming conventions differ between cities in ways that are genuinely surprising if you haven't dealt with Polish address data before. The national ID systems in this group all have check-digit algorithms that the generator now implements correctly.

The Middle East and North Africa brought in the UAE, Saudi Arabia, Egypt, Morocco, and Turkey. Arabic-script names are generated with both the Arabic form and a Latin-script romanisation, since many applications store both.

Sub-Saharan Africa rounds out the expansion: South Africa, Nigeria, Kenya, Ghana, and Tanzania. South Africa's ID number encodes date of birth, gender, and citizenship status, and the generator now produces values that decode correctly under that scheme. Getting that wrong would have been worse than not supporting the country at all.
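
To make the South African scheme concrete, here is a minimal sketch based on the commonly described 13-digit layout: six digits for the date of birth (YYMMDD), a four-digit sequence where values below 5000 indicate female, one citizenship digit (0 for citizen, 1 for permanent resident), one further digit, and a Luhn check digit. The function names are hypothetical, and fixing the twelfth digit at 8 is an assumption made for illustration.

```python
from datetime import date


def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # the rightmost payload digit is doubled first
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def make_sa_id(dob: date, female: bool, citizen: bool, sequence: int) -> str:
    """Build a 13-digit ID string; `sequence` is a hypothetical 0..4999 counter."""
    if not 0 <= sequence <= 4999:
        raise ValueError("sequence must be between 0 and 4999")
    gender_block = sequence if female else sequence + 5000
    # Twelfth digit fixed at 8: an assumption for this sketch.
    payload = f"{dob:%y%m%d}{gender_block:04d}{0 if citizen else 1}8"
    return payload + str(luhn_check_digit(payload))


def decode_sa_id(id_number: str) -> dict:
    """Decode the embedded fields, raising ValueError if the check digit is wrong."""
    if len(id_number) != 13 or not id_number.isdigit():
        raise ValueError("expected 13 digits")
    if luhn_check_digit(id_number[:12]) != int(id_number[12]):
        raise ValueError("check digit mismatch")
    return {
        "yymmdd": id_number[:6],  # the century is not encoded in the number
        "female": int(id_number[6:10]) < 5000,
        "citizen": id_number[10] == "0",
    }
```

A round trip through `make_sa_id` and `decode_sa_id` is a cheap way to confirm that the date of birth, gender, and citizenship fields in a profile agree with the ID number.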

Each country addition required building and validating name frequency tables (weighted by actual population frequency where census data was available), address format templates, phone number rules, national ID formats, and financial instrument structures. Countries lacking public census data for name frequencies use curated lists built from public directories, academic publications, and government open-data portals. The sources are documented per country in the API reference.

How the Correlation Engine Works Internally

The previous generator used a flat pipeline: select country, then generate each field independently using country-specific rules. The updated engine uses a directed acyclic graph where each field depends on one or more upstream fields.

Country is the root node, selected by the user or at random. Region is selected next, weighted by population density. City follows, drawn from cities within the region and weighted by population. The address gets a street name from actual street name pools for that city, with a randomised house number in a realistic range. The postal code is looked up from the city rather than generated independently. The phone number derives its area code from the region. Names come from region-weighted frequency tables.
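
The chain described above can be sketched as population-weighted choices followed by lookups. Everything in this example is an assumption for illustration: the `REFERENCE` structure, the population figures (rough placeholders, not census values), and the function names.

```python
import random

# Hypothetical reference data; populations are rough placeholders used as weights.
REFERENCE = {
    "FR": {
        "Auvergne-Rhône-Alpes": {
            "population": 8_000_000,
            "phone_prefix": "04",
            "cities": {
                "Lyon": {"population": 520_000, "postal_code": "69001"},
                "Grenoble": {"population": 160_000, "postal_code": "38000"},
            },
        },
        "Île-de-France": {
            "population": 12_000_000,
            "phone_prefix": "01",
            "cities": {
                "Paris": {"population": 2_100_000, "postal_code": "75001"},
            },
        },
    },
}


def weighted_pick(rng: random.Random, options: dict) -> str:
    """Pick one key, weighted by each entry's 'population' value."""
    names = list(options)
    weights = [options[name]["population"] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def generate_location(country: str, rng: random.Random) -> dict:
    regions = REFERENCE[country]
    region = weighted_pick(rng, regions)
    cities = regions[region]["cities"]
    city = weighted_pick(rng, cities)
    return {
        "country": country,
        "region": region,
        "city": city,
        # Looked up from the city, never generated independently.
        "postal_code": cities[city]["postal_code"],
        # Derived from the region, so it cannot disagree with the address.
        "phone_prefix": regions[region]["phone_prefix"],
    }
```

Because the postal code and phone prefix are read from the already chosen city and region, a mismatch like the Munich and Hamburg example cannot arise in this structure.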

Date of birth feeds into employment (job title weighted by age bracket and regional industry prevalence), financial data (bank BIN selected from banks operating in the region), and national ID (generated using the country's algorithm, embedding the date of birth where the format requires it). Each downstream node receives its parent nodes' output as input. The postal code node receives the city. The phone node receives the region. No downstream field can contradict an upstream one because it's derived from it rather than generated in parallel.
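
One way to express this ordering is a dependency map resolved with a topological sort, shown here using Python's standard `graphlib` module. The field names and edges are a simplified reading of the description above (the date of birth depending on country is an assumption), and the stub generators only record which parents they received.

```python
from graphlib import TopologicalSorter

# Each field maps to the upstream fields it depends on.
DEPENDENCIES = {
    "country": set(),
    "region": {"country"},
    "city": {"region"},
    "postal_code": {"city"},
    "phone": {"region"},
    "name": {"region"},
    "date_of_birth": {"country"},  # assumption: not stated in the article
    "job_title": {"date_of_birth", "region"},
    "national_id": {"country", "date_of_birth"},
}


def generation_order(deps: dict) -> list:
    """Return fields ordered so every parent precedes its children."""
    # Raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(deps).static_order())


def generate(deps: dict, generators: dict) -> dict:
    """Run each field's generator with its parents' output as input."""
    profile = {}
    for field in generation_order(deps):
        parents = {parent: profile[parent] for parent in deps[field]}
        profile[field] = generators[field](parents)
    return profile


def make_stub(field: str):
    """Hypothetical generator that just reports the parents it was given."""
    def stub(parents: dict) -> str:
        return f"{field}({', '.join(sorted(parents))})"
    return stub


STUB_GENERATORS = {field: make_stub(field) for field in DEPENDENCIES}
```

Calling `generate(DEPENDENCIES, STUB_GENERATORS)` yields one entry per field, and no generator runs before its parents have produced a value.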

Batch generation for a single country pre-loads reference data (street name pools, name frequency tables, postal code mappings) into memory once and reuses it across the batch. This avoids redundant lookups and keeps performance reasonable despite the added correlation logic.
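
A minimal sketch of the load-once idea, using `functools.lru_cache` as a stand-in for whatever caching the engine actually uses. The loader, the data it returns, and the `LOAD_COUNT` counter are all hypothetical.

```python
from functools import lru_cache

# Counts how often the (simulated) expensive load actually runs.
LOAD_COUNT = {}


@lru_cache(maxsize=None)
def load_reference_data(country: str) -> dict:
    """Stand-in for reading street pools, name tables, and postal mappings."""
    LOAD_COUNT[country] = LOAD_COUNT.get(country, 0) + 1
    # The cached dict is shared between callers, so treat it as read-only.
    return {
        "country": country,
        "street_names": ("Hauptstrasse", "Bahnhofstrasse", "Gartenweg"),
    }


def generate_batch(country: str, size: int) -> list:
    profiles = []
    for i in range(size):
        ref = load_reference_data(country)  # served from the cache after the first call
        streets = ref["street_names"]
        profiles.append({"country": country, "street": streets[i % len(streets)]})
    return profiles
```

After `generate_batch("DE", 100)`, `LOAD_COUNT["DE"]` is 1: the load cost is paid once and amortised across the batch.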

Performance Numbers

The correlation engine adds computational overhead. Measured across common batch sizes, single profile generation went from roughly 85ms to 125ms. The extra 40ms comes primarily from the email deliverability pre-check and the cross-field consistency validation pass. Ten profiles of the same country went from 400ms to 520ms: an increase of 120ms rather than ten times 40ms, because reference data caching reduces per-profile overhead after the first profile. A hundred same-country profiles went from 2.1 seconds to 2.8 seconds. A hundred mixed-country profiles went from 2.4 seconds to 3.9 seconds, since each new country in the batch triggers a reference data load.

For on-demand generation (one profile at a time in response to user actions), 40 extra milliseconds is imperceptible. For large automated batches, the throughput reduction is measurable but unlikely to bottleneck compared to network latency and downstream processing.

The mixed-country overhead is the price of correctness. The old generator could produce mixed-country batches quickly because it wasn't doing cross-referencing. The new generator loads regional reference data per country, which takes time but eliminates the inconsistencies that made those fast profiles useless for serious testing. If batch speed is a concern, grouping requests by country reduces the overhead significantly since the reference data loads once per country rather than once per profile.
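
On the client side, grouping can be as simple as bucketing requests by country, issuing one batch per country, and restoring the original order afterwards. In this sketch `generate_batch` is a hypothetical callable standing in for however your integration requests a same-country batch.

```python
from collections import defaultdict


def group_by_country(requests: list) -> dict:
    """Map each country code to the positions at which it was requested."""
    grouped = defaultdict(list)
    for position, country in enumerate(requests):
        grouped[country].append(position)
    return dict(grouped)


def generate_grouped(requests: list, generate_batch) -> list:
    """Issue one batch call per country, then restore the caller's order."""
    results = [None] * len(requests)
    for country, positions in group_by_country(requests).items():
        batch = generate_batch(country, len(positions))
        for position, profile in zip(positions, batch):
            results[position] = profile
    return results
```

For a request list such as `["DE", "FR", "DE", "JP", "FR"]`, this makes three batch calls instead of five single ones, and the returned list still lines up with the original positions.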

Stricter Validation

The third update area tightened the rules applied to profiles before they're returned.

Previous validation checked field format correctness (phone digit count, postal code pattern), Luhn checks on credit card numbers, and basic email syntax. The updated validation adds cross-field consistency checks: the system verifies that the postal code maps to the stated city, that the phone area code matches the region, and that the national ID format matches the country. Profiles failing any check are regenerated automatically before being returned. No bad data reaches the consumer.
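
The check-then-regenerate loop can be sketched as follows. The lookup tables hold two real German examples for illustration, but the function names, the profile shape, and the `max_attempts` cap are assumptions rather than documented behaviour.

```python
# Tiny illustrative lookup tables; a real system would load these per country.
POSTAL_TO_CITY = {"80331": "Munich", "20095": "Hamburg"}
CITY_TO_AREA_CODE = {"Munich": "089", "Hamburg": "040"}


def consistency_errors(profile: dict) -> list:
    """Return a list of cross-field problems; empty means consistent."""
    errors = []
    if POSTAL_TO_CITY.get(profile["postal_code"]) != profile["city"]:
        errors.append("postal_code does not map to city")
    area_code = CITY_TO_AREA_CODE.get(profile["city"])
    if area_code is None or not profile["phone"].startswith(area_code):
        errors.append("phone area code does not match city")
    return errors


def generate_valid(generate_candidate, max_attempts: int = 5) -> dict:
    """Regenerate until a candidate passes every check (cap is an assumption)."""
    for _ in range(max_attempts):
        profile = generate_candidate()
        if not consistency_errors(profile):
            return profile
    raise RuntimeError(f"no consistent profile after {max_attempts} attempts")
```

With dependency-ordered generation, candidates should rarely fail these checks, so the loop acts as a safety net rather than the main source of consistency.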

Age-date coherence is now enforced. The date of birth, age field, and any age-dependent fields (employment seniority, education level) are checked for mathematical consistency. A profile can't claim to be 25 with a date of birth that computes to 27. Character encoding validation catches names with non-ASCII characters (accented Latin, Cyrillic, CJK, Arabic) and tests for correct UTF-8 encoding, preventing garbled characters in downstream systems that handle encoding poorly.
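
The age check itself is small but easy to get wrong around birthdays. A minimal sketch, with hypothetical function names and an explicit reference date so the result is reproducible:

```python
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Whole years elapsed between dob and today."""
    years = today.year - dob.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


def age_is_coherent(profile: dict, today: date) -> bool:
    """True when the stated age matches the date of birth on the given day."""
    return profile["age"] == age_on(profile["date_of_birth"], today)
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the check deterministic in tests.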

Email deliverability pre-checks are new. For each generated address, the generator looks up the domain's MX record and attempts an SMTP handshake with the receiving mail server before including the address in the profile. Addresses that fail the handshake are replaced. This reduces bounce rates when synthetic email is used for registration testing, which was a recurring complaint from QA teams.

Migration Notes

The API endpoints and response format haven't changed. Existing integrations consuming the JSON response will continue to work without modification. All correlation and validation changes happen server-side and are transparent to clients.

Two things may affect existing workflows. Field value distributions shifted because names and addresses are now drawn from region-specific frequency tables rather than country-wide pools. Automated tests that assert specific name patterns may need updated assertions. And generation time increased slightly, so applications generating profiles in tight loops should benchmark against the updated endpoint.

One non-obvious consequence of the update: for the same seed value, the updated engine will generally not produce the same profile the old engine did. The deterministic seeding mechanism still works (same seed produces same output), but the output itself is different because the generation logic changed. Teams that stored reference profiles for regression testing should regenerate their baselines against the updated engine rather than comparing new output against old snapshots.
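
The effect is easy to reproduce with any seeded generator. In this toy sketch, both functions and their name pools are hypothetical stand-ins for the old and new engines: each is deterministic for a given seed, but they consume the random stream differently and so produce different profiles.

```python
import random

COUNTRY_NAMES = ["Anna", "Jonas", "Lena", "Paul"]
REGION_NAMES = {
    "Bavaria": ["Xaver", "Resi"],
    "Hamburg": ["Fiete", "Wiebke"],
}


def generate_v1(seed: int) -> dict:
    """Old-style logic: one draw from a country-wide pool."""
    rng = random.Random(seed)
    return {"name": rng.choice(COUNTRY_NAMES)}


def generate_v2(seed: int) -> dict:
    """New-style logic: pick a region first, then a name from that region."""
    rng = random.Random(seed)
    region = rng.choice(sorted(REGION_NAMES))
    return {"region": region, "name": rng.choice(REGION_NAMES[region])}
```

Within one version, the same seed always yields the same profile, so a regenerated baseline remains a stable regression reference. Across versions, the comparison is meaningless, which is why stored snapshots should be tied to the engine version that produced them.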

No client-side changes required. No API key rotation. No breaking changes to the response schema. The profiles are just better.