Multi-Modal Intelligent Assistant System for Human Settlements

I. Problems Addressed and Product Positioning

Human settlement design (including planning, architecture, and landscape) commonly faces three challenges. First, general-purpose large language models adapt poorly to the field: they struggle to understand professional standards (e.g. green space planning standards), so the solutions they generate often deviate from practical requirements. Second, data is fragmented and hard to integrate: the normative documents, case studies, and geographic information needed for design are scattered, which makes manual compilation time-consuming. Third, cross-modal collaboration is difficult: textual requirements connect poorly to design drawing generation, so it is hard to balance creative concepts against practical implementability.

To address these challenges, a prototype multi-modal intelligent assistant system for human settlements has been developed. It is not intended to replace designers; it is positioned as a “professional design assistance tool”. Built on a large language model (the optimized Chinese-LLaMA-Alpaca-2), the system integrates a human settlements knowledge base (containing 1.5k standards and 2k professional book datasets) and a knowledge graph, and supports multi-modal processing of text (e.g. tender document drafting) and images (e.g. design drawing generation). It is suited to preliminary scheme screening in design firms, standards queries in planning institutions, and case reference for landscape teams, and provides foundational support for making human settlement design more intelligent.
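As a rough illustration of how a knowledge base can ground a language model's output, the sketch below retrieves the most relevant snippets for a request and places them in the prompt. It is a minimal sketch under stated assumptions, not the system's actual pipeline: the `Snippet` type, the `retrieve` and `build_prompt` functions, the word-overlap scoring, and the placeholder knowledge base entries are all hypothetical and do not come from the system itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    source: str  # hypothetical document label
    text: str    # placeholder text, not a quotation from any real standard


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {w.strip(".,;:()\"'").lower() for w in text.split()} - {""}


def retrieve(query: str, snippets: list[Snippet], k: int = 2) -> list[Snippet]:
    """Rank snippets by the number of words shared with the query; keep the top k."""
    q = tokenize(query)
    scored = [(len(q & tokenize(s.text)), s) for s in snippets]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated snippets
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def build_prompt(query: str, context: list[Snippet]) -> str:
    """Assemble a prompt that lists the retrieved references before the request."""
    lines = ["Use only the reference material below when answering.", ""]
    for s in context:
        lines.append(f"[{s.source}] {s.text}")
    lines += ["", f"Request: {query}"]
    return "\n".join(lines)


# Hypothetical stand-in for the standards and book collection.
KNOWLEDGE_BASE = [
    Snippet("standard-A", "Placeholder clause on green space ratio for school campus sites."),
    Snippet("standard-B", "Placeholder clause on waterfront setback distances."),
    Snippet("book-C", "Placeholder passage on campus activity platform design."),
]

query = "expansion of 60-class school campus"
context = retrieve(query, KNOWLEDGE_BASE)
prompt = build_prompt(query, context)
print(prompt)
```

A production system would more plausibly use embedding-based retrieval and knowledge graph lookups instead of word overlap; the simple scoring here only keeps the example self-contained.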

Figure 1 Agent Technical Route

II. Practical Value and Effectiveness

Based on testing, the system can already address core needs:

Efficiency Improvement: Drafting a tender framework originally required 2 working days. The system now generates an initial draft (including project background and strategic considerations) within 1 hour of receiving site-specific information (e.g. “expansion of 60-class school campus”), and the output aligns with local standards.

Solutions Balancing Innovation and Practicality: Generated design schemes (e.g. a three-dimensional activity platform for a campus) achieve a professionalism score of 8.0 out of 10.0, outperforming systems such as AutoGEN, while still allowing the creative direction to be adjusted.

Multi-Modal Collaboration: The system can generate preliminary design drawings (e.g. a waterfront space layout) from textual requirements and provide references by linking to the case database, reducing the cost of bridging “text and drawings”.

The system is currently suited to small and medium-scale projects, and its basic functions meet the needs of the design process.

Figure 2 Visual Agent Experimental Results

Figure 3 Case Study Experimental Results

The current system still has room for optimization in areas such as adaptation to large-scale projects (e.g. urban district planning), coverage of more regional standards data, and mobile terminal integration. If you are a landscape or architectural design firm seeking to improve scheme generation efficiency, or a design software team that needs to integrate professional intelligent modules, please feel free to contact us. We can provide customized services, such as adding design standards for specific regions (e.g. Southwest China), developing modules for tender customization or design drawing assistance, or opening interfaces that fit existing workflows, thereby helping to bring academic results into practical design scenarios.
