In the rapidly changing world of travel technology, where data from hundreds of suppliers must be constantly ingested and updated, performance and reliability are mission critical. Many companies rely on Tourplan, a leading travel management system, to process supplier content through its XML import capabilities. However, as supplier catalogs grew – sometimes to hundreds of thousands of products – the traditional XML import process began to break down. This bottleneck led developers to implement a smarter, chunked import pipeline to maintain system stability and performance.
TL;DR
When importing large supplier catalogs into Tourplan via XML, many companies experienced repeated failures and timeout errors. The problem stemmed from the system’s inability to process massive XML payloads in a single transaction. The introduction of a chunked import pipeline, in which data is broken down into smaller, manageable pieces, enabled successful processing without timeouts. This solution improved both reliability and scalability for Tourplan users who process large amounts of supplier data.
The challenge with large-scale XML import
At its core, Tourplan’s XML data import feature is designed to automate the ingestion of supplier content. Suppliers generate XML files containing huge data sets: hotels, excursions, pricing structures, seasonal rules, availability calendars and more. These files are essential for travel companies to keep inventory up to date.
However, many teams encountered a critical problem: as these XML files grew, the import process began to fail due to timeouts, memory pressure on the servers, or outright crashes. The failures came down to a few key limitations:
- Single-threaded import: Tourplan’s XML import processed the entire file at once, consuming enormous amounts of resources.
- No progress tracking: Once the import started, there was no way to pause it, resume it, or recover cleanly from a failure.
- Lack of batching: The import pipeline assumed that all data could be parsed and validated in memory.
The result? Failed imports, corrupted data sets and a backlog of supplier updates. Simply increasing server resources was not a viable long-term solution.
Symptoms that indicated a bottleneck
Several clear symptoms emerged indicating a systemic bottleneck in Tourplan’s import pipeline:
- Timeout errors: Import jobs would run for hours, only to be terminated by server watchdog timers.
- Partial data writes: Teams noticed that while some records were updated, others remained outdated, resulting in inconsistent pricing and availability.
- Increased support tickets: End users began noticing discrepancies in itineraries and quotes generated from outdated catalogs.
As the stakes rose and catalog sizes grew year after year, a more scalable, resilient approach was needed.
The solution: a chunked import pipeline
Rethinking the import architecture led to a critical realization: if the data could be processed in stages, or chunks, the risk of timeouts and performance crashes could be significantly reduced. This idea gave rise to the new, chunked import pipeline.
What is chunking? In the context of importing XML data, chunking refers to breaking large files into smaller, logically separated segments that are processed one at a time. Each segment can represent a specific type of data (hotels, rates, calendars) or a batch of a single data type (for example, 5,000 hotel records each).
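As an illustration only, each chunk can be described by a small metadata record that makes it independently processable and auditable. The sketch below assumes field names (import_id, payload_path, and so on) that are not defined by Tourplan; they simply show what such a record might carry.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChunkDescriptor:
    """Metadata describing one independently processable chunk of an import (illustrative)."""
    import_id: str      # identifier of the overall import run
    entity_type: str    # e.g. "hotel", "rate", "calendar"
    sequence: int       # position of this chunk within the import
    record_count: int   # number of records in the chunk (e.g. up to 5,000)
    payload_path: str   # where the extracted XML fragment is stored
    status: str = "pending"  # pending | processing | done | failed
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```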
The chunked import pipeline followed several best practices:
- Preprocessing step: The original XML file is first parsed and split using a parser that identifies logical breakpoints, such as closing tags or predefined object groupings.
- Queue-based processing: Each resulting chunk is added to a job queue and processed asynchronously to avoid overloading system memory.
- Fail-safe checkpoints: Each chunk carries metadata for auditing and can be retried independently if it fails.
- Track and log progress: Dashboards were implemented to track the success or failure of each chunk for complete transparency (a minimal status-store sketch follows this list).
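One way to back checkpoints and dashboards is to persist the state of every chunk in a queryable store. The following is a minimal sketch using SQLite as a stand-in; the table and column names are assumptions, and a production system would likely use whatever database the team already operates.

```python
import sqlite3
from typing import Optional


def init_status_store(db_path: str = "import_status.db") -> sqlite3.Connection:
    """Create (if needed) a table recording the state of every chunk in an import."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chunk_status (
               import_id  TEXT    NOT NULL,
               sequence   INTEGER NOT NULL,
               status     TEXT    NOT NULL DEFAULT 'pending',
               error      TEXT,
               updated_at TEXT    DEFAULT (datetime('now')),
               PRIMARY KEY (import_id, sequence)
           )"""
    )
    conn.commit()
    return conn


def mark_chunk(conn: sqlite3.Connection, import_id: str, sequence: int,
               status: str, error: Optional[str] = None) -> None:
    """Upsert the latest state of a chunk ('pending', 'processing', 'done' or 'failed')."""
    conn.execute(
        """INSERT INTO chunk_status (import_id, sequence, status, error)
           VALUES (?, ?, ?, ?)
           ON CONFLICT (import_id, sequence) DO UPDATE SET
               status = excluded.status,
               error = excluded.error,
               updated_at = datetime('now')""",
        (import_id, sequence, status, error),
    )
    conn.commit()

# A dashboard can then be as simple as:
#   SELECT status, COUNT(*) FROM chunk_status GROUP BY status;
```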
Technical implementation details
Let’s zoom in on how this was technically achieved.
1. Chunking mechanism
Using an XML streaming parser (such as SAX or StAX in Java, or lxml in Python), the file was read incrementally instead of being loaded into memory all at once. Logical nodes (e.g. individual hotel or rate elements) were extracted as they were parsed and grouped into fixed-size chunks.
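As a rough illustration of that streaming approach, here is a minimal Python sketch using lxml’s iterparse. The element name ("Hotel") and the chunk size of 5,000 are assumptions about a hypothetical supplier feed, not Tourplan’s actual schema.

```python
from lxml import etree


def iter_chunks(xml_path, tag="Hotel", chunk_size=5000):
    """Stream an XML file and yield lists of serialized elements,
    without ever loading the whole document into memory."""
    chunk = []
    # iterparse fires an event each time an element with the given tag closes
    for _, elem in etree.iterparse(xml_path, events=("end",), tag=tag):
        chunk.append(etree.tostring(elem))
        # free memory held by already-processed elements
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]
        if len(chunk) >= chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final partial chunk
        yield chunk


# Hypothetical usage: write each chunk to its own file for later queueing
# for i, chunk in enumerate(iter_chunks("supplier_catalog.xml")):
#     with open(f"chunk_{i:05d}.xml", "wb") as f:
#         f.write(b"<Hotels>" + b"".join(chunk) + b"</Hotels>")
```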
2. Asynchronous work queue
A job queue, powered by tools like RabbitMQ or AWS SQS, managed the submission of chunk jobs. Multiple workers could run in parallel, processing chunks on different CPU cores or cluster nodes, dramatically improving throughput.
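For illustration, a publisher along those lines might look like the sketch below, using RabbitMQ and the pika client. The queue name, message format and localhost broker are assumptions made for the example.

```python
import json

import pika


def enqueue_chunks(chunk_paths, queue_name="tourplan-import-chunks"):
    """Publish one durable message per chunk so workers can process them independently."""
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=queue_name, durable=True)

    for sequence, path in enumerate(chunk_paths):
        message = json.dumps({"sequence": sequence, "payload_path": path})
        channel.basic_publish(
            exchange="",
            routing_key=queue_name,
            body=message,
            properties=pika.BasicProperties(delivery_mode=2),  # persist the message
        )
    connection.close()
```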
3. Error handling framework
If one chunk failed, it was logged separately and could be reprocessed without redoing the entire import. This reduced risk and significantly shortened recovery times.
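A worker sketch of that idea follows: each chunk is processed in isolation, acknowledged on success, and republished with a bounded retry count on failure. The process_chunk stub, the "attempt" counter and the retry limit are illustrative assumptions, not Tourplan behaviour.

```python
import json
import logging

import pika

MAX_RETRIES = 3
QUEUE = "tourplan-import-chunks"


def process_chunk(payload_path: str) -> None:
    """Placeholder for the real per-chunk work (parse, validate, write to Tourplan)."""
    raise NotImplementedError


def on_message(channel, method, properties, body):
    job = json.loads(body)
    attempt = job.get("attempt", 0)
    try:
        process_chunk(job["payload_path"])
        channel.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        logging.exception("Chunk %s failed (attempt %d)", job.get("sequence"), attempt)
        channel.basic_ack(delivery_tag=method.delivery_tag)  # drop the failed message
        if attempt + 1 < MAX_RETRIES:
            job["attempt"] = attempt + 1
            channel.basic_publish(exchange="", routing_key=QUEUE, body=json.dumps(job))
        # otherwise the chunk stays recorded as failed for manual replay


def run_worker():
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)
    channel.basic_qos(prefetch_count=1)  # one chunk at a time per worker
    channel.basic_consume(queue=QUEUE, on_message_callback=on_message)
    channel.start_consuming()
```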

Benefits seen during production
After the rollout of the chunked import system, several travel organizations noted clear improvements:
- 90% fewer import failures: Imports that previously failed due to timeouts now completed without issue.
- Faster recovery: Failed chunks could be retried immediately, allowing faster error correction.
- Reduced server load: Because chunks were smaller and processed asynchronously, memory and CPU usage stabilized.
- Transparency: Import logs and dashboards provided clear insight into which data was processed and which was not.
This approach proved particularly effective during the busy travel season, when supplier updates are frequent and time-sensitive. Teams could schedule imports nightly or hourly without fear of taking systems down or generating corrupt itineraries.
Lessons learned
This experience provided several critical lessons for ETL (Extract, Transform, Load) processes in modern travel platforms:
- Scale matters: What works for thousands of records may break at millions; systems must evolve with data volume.
- Observability is essential: Logs, statistics and dashboards should form the basis of any automated import system.
- Design for failure: Every step should be retryable, and no job should ever assume “perfect execution.”
Future improvements and next steps
While the chunked pipeline was a game changer, the innovation didn’t stop there. Several companies are now investigating:
- Real-time supplier API integrations: Bypassing XML file dumps by synchronizing data via REST APIs.
- Data validation at the edge: Implementing pre-import validation using XSDs and JSON Schema to reduce garbage-in scenarios (see the validation sketch after this list).
- Autoscaling infrastructure: Using Kubernetes or serverless frameworks to dynamically scale the number of import workers based on job volume.
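As a sketch of the validation idea, lxml can check a feed against an XSD before it ever reaches the import queue. The file names below are placeholders for whatever contract a supplier actually publishes.

```python
from lxml import etree


def validate_against_xsd(xml_path: str, xsd_path: str) -> list:
    """Return a list of validation error messages (empty if the file is valid)."""
    schema = etree.XMLSchema(etree.parse(xsd_path))
    document = etree.parse(xml_path)
    if schema.validate(document):
        return []
    return [str(error) for error in schema.error_log]


# Hypothetical usage:
# errors = validate_against_xsd("supplier_catalog.xml", "supplier_catalog.xsd")
# if errors:
#     print("Rejecting feed before import:", *errors, sep="\n  ")
```

For very large files, the same check can be applied to individual chunk files rather than the full dump, so validation itself does not become the next memory bottleneck.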
Conclusion
Data is the lifeblood of any modern travel company. As supplier ecosystems become more complex, systems like Tourplan must evolve to handle increasingly large and frequent updates. Moving to a chunked import pipeline not only solved the problem of XML import timeouts, but also opened the door to a more robust, efficient, and scalable data management ecosystem.
Companies that have embraced this architecture now process imports faster, with greater accuracy and uptime, turning a former pain point into a competitive advantage.
#Tourplan #XML #import #pipeline


