A case-based reasoning system for interior design using a new cosine similarity retrieval algorithm

ABSTRACT During the concept design stage, a case-based reasoning (CBR) system is well suited to searching for the most suitable interior design drawing case, which serves as a preliminary concept drawing that bridges the design requirements gap between the consigner and the designer. In the case base of the system, features of the pre-stored design drawing cases may be characterized by interval, nominal, ordinal and/or ratio scale data. However, the cosine similarity measure used for case reasoning has scarcely been applied to mixed measurement scales of interval, nominal, ordinal and ratio types. Therefore, the objective of this study is twofold. The first goal is to propose a new cosine similarity retrieval algorithm for case reasoning in which mixed measurement scales are considered and applied. The second goal is to propose a CBR system and apply the new cosine similarity retrieval algorithm to its search engine for interior design drawing retrieval. Finally, numerical experiments were carried out to demonstrate the capability and effectiveness of the proposed similarity matching algorithm on the case-based reasoning system.


Introduction
Case-Based Design (CBD) is the application of Case-Based Reasoning (CBR) to develop new design solutions by adapting previously solved cases. CBR is especially suitable for domains with a weak domain theory, i.e. domains that are empirical and difficult to formalize (Begum, Barua, Filla, & Ahmed, 2014). A CBR system can start working with a few reference cases in its case base and then learn by gradually adding newly solved cases to the case base. The domain of interior design exhibits both ill-defined and complex design generation problems; thus, the application of case-based design systems in this domain is increasingly necessary. In a previous study (Lin & Ke, 2015), the authors proposed a virtual reality based recommender system to retrieve an interior design drawing case from a case base of pre-stored drawings. The retrieved design drawing case can be treated as a prototype drawing for developing a new design solution. That study applied a binary measurement scale, and the retrieval engine of the recommender system used the cosine similarity measure for binary string pattern matching between two sequences of 0's and 1's, with 0 for the absence and 1 for the presence of a specific design feature. In further practical similarity measurement, the author found that features of the pre-stored design drawing cases could be characterized by interval, nominal, ordinal and/or ratio scale data. Although the cosine similarity measure is the most commonly used similarity measure for numerical features, it fails in this setting, as the Euclidean distance is not defined for features of ordinal, nominal or mixed scale types. No study has been reported in the literature on the use of the cosine similarity measure in a search engine that deals with mixed measurement scales of interval, nominal, ordinal and ratio types at the same time. Therefore, the objective of this study is twofold. The first goal was to propose a new cosine similarity retrieval algorithm for case reasoning, in which mixed measurement scales are considered. The second goal was to propose a CBR system and apply the new cosine similarity retrieval algorithm to the search engine of the CBR system for interior design drawing case retrieval.

Case-Based Reasoning
Case-Based Reasoning (CBR) is a problem-solving approach that uses past experiences as knowledge for solving similar new problems (Aamodt & Plaza, 1994; Weenawadee & Sarun, 2013). It deals with a new problem by first retrieving a similar past case and then reusing and adapting that case to solve the new problem. In a CBR system, the four phases of the CBR "4R" cycle are performed iteratively: retrieve, reuse, revise and retain. The knowledge of previous experience is represented as a case, and domain experts store such cases in the case base. Typically, each case contains a description of the problem plus a solution and/or the outcome. A new (unsolved) case is the description of a new problem to be solved, and it becomes the input to the system. In the retrieve phase, the cases in the case base are matched against the new case on several features, and the case most similar to the target problem is retrieved. In the reuse phase, a solution for the new case is suggested from the solutions of the retrieved cases and reused to solve the new problem. In the revise phase, the proposed solution is revised or adapted to the new problem where necessary; a very similar past case may need only a small adjustment. In the retain phase, if the revised or adapted case is judged useful for future problem solving, the new problem and its solution are stored as a new case in the case base.
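The four phases described above can be illustrated with a minimal Python sketch. It is not the system developed in this study: the `Case` layout, the `cbr_cycle` function, the toy `match_ratio` similarity and the feature names are all hypothetical and serve only to show how retrieve, reuse, revise and retain fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    problem: Dict[str, str]  # feature name -> rating (hypothetical layout)
    solution: str            # e.g. an identifier of a stored drawing


def cbr_cycle(
    query: Dict[str, str],
    case_base: List[Case],
    similarity: Callable[[Dict[str, str], Dict[str, str]], float],
    adapt: Callable[[Dict[str, str], str], str],
) -> Case:
    """Run one retrieve / reuse / revise / retain pass and return the new case."""
    if not case_base:
        raise ValueError("the case base must contain at least one seed case")
    # Retrieve: pick the stored case most similar to the query.
    best = max(case_base, key=lambda c: similarity(query, c.problem))
    # Reuse: take the retrieved solution as the proposed solution.
    proposed = best.solution
    # Revise: adapt the proposed solution to the new problem.
    revised = adapt(query, proposed)
    # Retain: store the new problem and its revised solution as a new case.
    new_case = Case(problem=dict(query), solution=revised)
    case_base.append(new_case)
    return new_case


def match_ratio(query: Dict[str, str], problem: Dict[str, str]) -> float:
    """Toy similarity: fraction of query features whose ratings match exactly."""
    return sum(problem.get(f) == r for f, r in query.items()) / len(query)


# Hypothetical seed cases and query.
case_base = [
    Case({"style": "modern", "hue": "warm"}, "drawing-A"),
    Case({"style": "classic", "hue": "cool"}, "drawing-B"),
]
new_case = cbr_cycle(
    {"style": "classic", "hue": "cool"},
    case_base,
    match_ratio,
    adapt=lambda q, s: s + "-revised",
)
print(new_case.solution, len(case_base))
```

Passing the similarity and adaptation steps as functions keeps the cycle itself independent of how cases are compared, so a different retrieval measure can be swapped in without changing the loop.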

Cosine similarity measure for fuzzy sets
A commonly used similarity measure for the retrieval engine is cosine similarity (Larsen & Aone, 1999; Nahm, Bilenko, & Mooney, 2002). To deal with vague and uncertain human knowledge, fuzzy set based information retrieval methods are increasingly recognized as more realistic than classical set based information retrieval methods (Baeza-Yates & Ribeiro-Neto, 2011; Cross, 1994; Donald & Erin Colvin, 2017; Zadrożny & Nowacka, 2009).
A cosine similarity measure for fuzzy sets (Salton & McGill, 1983) is defined as the inner product of two vectors divided by the product of their lengths. Let $A = (\mu_A(x_1), \mu_A(x_2), \ldots, \mu_A(x_j), \ldots, \mu_A(x_n))$ and $B = (\mu_B(x_1), \mu_B(x_2), \ldots, \mu_B(x_j), \ldots, \mu_B(x_n))$ be two fuzzy sets in the universe of discourse $X = \{x_1, x_2, \ldots, x_n\}$, $x_j \in X$. The cosine similarity measure (angular coefficient) between A and B is then defined as follows.
Definition 2.1: Cosine Similarity Measure for Fuzzy Sets (Salton & McGill, 1983; Ye, 2011):
$$C(A, B) = \frac{\sum_{j=1}^{n} \mu_A(x_j)\,\mu_B(x_j)}{\sqrt{\sum_{j=1}^{n} \mu_A(x_j)^2}\;\sqrt{\sum_{j=1}^{n} \mu_B(x_j)^2}}$$
To improve the performance of the retrieval engine, the author proposes a new cosine similarity matching algorithm based on this measure, which is described in a later section.
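The verbal definition above (inner product divided by the product of the vector lengths) can be sketched in Python as follows. The function name `fuzzy_cosine` is illustrative, and returning 0.0 for an all-zero membership vector is an assumption of this sketch, since the ratio is undefined in that case.

```python
import math
from typing import Sequence


def fuzzy_cosine(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Cosine similarity between two fuzzy sets given as membership vectors."""
    if len(mu_a) != len(mu_b):
        raise ValueError("both fuzzy sets must be defined on the same universe")
    dot = sum(a * b for a, b in zip(mu_a, mu_b))
    norm_a = math.sqrt(sum(a * a for a in mu_a))
    norm_b = math.sqrt(sum(b * b for b in mu_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat an empty fuzzy set as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical membership degrees over a universe of three elements.
print(fuzzy_cosine([0.2, 0.5, 1.0], [0.2, 0.5, 1.0]))
print(fuzzy_cosine([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

Because membership degrees are non-negative, the value always lies between 0 (no overlap) and 1 (proportional membership vectors).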

Interior design
Design is a problem-solving process. Interior design plays a major role in the quality of building design. The layout and space division, the type and colour of coated surfaces (floors, ceilings and walls), lighting, furniture and other important elements are influential factors that determine the beauty and work efficiency of interior spaces (Azad, 2015). An interior design prototype drawing is best created early in the concept design stage; thereafter, any adjustment, correction or addition to the prototype drawing can be developed and implemented, bridging the design requirements gap between the consigner and the designer. 'Seeing design as redesign' has long been a popular cliché in the interior design industry: no interior design starts entirely from scratch. Thus, interior design can be seen as an improvement process that follows a continuous redesign of a base design. Normally, interior designers draw upon their past experience and expertise to solve new design problems. They work from design solutions that are already at hand and exist as cases. Interior design cases are reusable solutions that serve as an effective tool for solving recurring interior design problems, and they make design more efficient because little time has to be spent finding solutions to problems that have already been solved.

New fuzzy similarity retrieval algorithm
The most important part of a CBR system is the retrieval engine. Thus, improving the performance of the retrieval engine is critical for the CBR system (Leake, 1996). A commonly used similarity measure for the retrieval engine is cosine similarity (Larsen & Aone, 1999; Nahm et al., 2002). To improve the performance of the retrieval engine, the author proposes a new cosine similarity retrieval algorithm, which is described as follows.
Step 1: Translating Various Measurement Scales into Fuzzy Measurement Scales. From a practical perspective, four measurement scales are considered for interior design drawing case retrieval in a CBR system: nominal, ordinal, interval and ratio. These four traditional measurement scales need to be translated into fuzzy measurement scales. The ratings of the pre-stored cases and of the query case, given on various numerical or categorical scales, are first translated into their corresponding fuzzy linguistic terms. The linguistic terms are then transformed into their corresponding membership functions. By referring to several types of triangular fuzzy numbers for linguistic terms (Chen & Ku, 2008), the author parameterizes these linguistic terms with triangular fuzzy numbers.
Step 2: Computing Local Similarity Measure. The local similarity is the similarity measured on each feature between the new query vector Q and the stored case vector $S_i$. Let $sim(r_{qj}, r_{ij})$ denote the local similarity (feature similarity) between the stored case vector $S_i$ and the new query vector Q with regard to feature $F_j$, where $r_{qj}$ and $r_{ij}$ are the ratings of $F_j$ in the query case Q and the historical case $S_i$, respectively. Depending on the measurement scale, the local similarity is calculated as follows.
(1) Cosine Similarity Measure for Fuzzy Nominal Scale. A nominal scale is a scale with no order in rank, whereas an ordinal scale is a scale with an order in rank. Let $sim(r_{qj}, r_{ij})$ denote the feature similarity measure between the stored case $S_i$ and the query case Q with regard to the feature $F_j$. If the feature $F_j$ is a nominal variable, the local similarity between the two j-th feature ratings $r_{qj}$ and $r_{ij}$ is given by:
$$sim(r_{qj}, r_{ij}) = \begin{cases} 1, & \text{if } r_{qj} = r_{ij} \\ 0, & \text{otherwise} \end{cases}$$
That is, it assigns the value 1 if the two feature ratings match and the value 0 otherwise.
(2) Cosine Similarity Measure for Fuzzy Ordinal, Interval and Ratio Scale. Ratings on ordinal, interval and ratio scales can be translated into corresponding triangular fuzzy numbers. Thus, a cosine similarity measure for triangular fuzzy numbers is proposed in an analogous manner to the cosine similarity measure (angular coefficient) between fuzzy sets (Salton & McGill, 1983; Ye, 2011). Let $A = (a_1, a_2, a_3)$ be a triangular fuzzy number in the set of real numbers R; the three parameters of A can be considered as a vector with three elements. Assume that there are two triangular fuzzy numbers $A = (a_1, a_2, a_3)$ and $B = (b_1, b_2, b_3)$ in the set of real numbers R. Based on the extension of the cosine similarity measure for fuzzy sets, a cosine similarity measure between A and B is proposed as follows:
$$sim(A, B) = \frac{a_1 b_1 + a_2 b_2 + a_3 b_3}{\sqrt{a_1^2 + a_2^2 + a_3^2}\;\sqrt{b_1^2 + b_2^2 + b_3^2}}$$
Step 3: Computing Global Similarity Measure. To retrieve a suitable pre-stored case, the local similarities (feature similarities) are aggregated into a global similarity. The global similarity measure is a normalized measure obtained by summing the individual local similarity measures multiplied by their weight factors. Let $Sim(Q, S_i)$ denote the global similarity between the pre-stored case $S_i$ and the query case Q; it is derived by the weighted summation of the local similarity measures:
$$Sim(Q, S_i) = \frac{\sum_{j=1}^{n} w_j \times sim(r_{qj}, r_{ij})}{\sum_{j=1}^{n} w_j},$$
where $w_j$ are the local weights defined by domain experts to reflect the importance of the corresponding features, n is the number of features in each case, and $sim(r_{qj}, r_{ij})$ is the local similarity measure for the individual feature $F_j$ in the query case Q and the stored case $S_i$.
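Steps 2 and 3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author's implementation: a triangular fuzzy number is treated as a plain three-element vector, as the text describes, and the function names, the example fuzzy numbers and the zero-vector convention are all hypothetical.

```python
import math
from typing import Hashable, Sequence, Tuple

TFN = Tuple[float, float, float]  # triangular fuzzy number as a 3-element vector


def sim_nominal(r_q: Hashable, r_i: Hashable) -> float:
    """Local similarity for a nominal feature: 1 on an exact match, else 0."""
    return 1.0 if r_q == r_i else 0.0


def sim_triangular(a: TFN, b: TFN) -> float:
    """Local similarity for ordinal, interval or ratio features:
    cosine of the angle between the parameter vectors of two fuzzy numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: the ratio is undefined for a zero vector
    return dot / (norm_a * norm_b)


def global_similarity(local_sims: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of local similarities, normalized by the sum of weights."""
    if len(local_sims) != len(weights):
        raise ValueError("one weight is needed per feature")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * s for w, s in zip(weights, local_sims)) / total_weight


# Hypothetical ratings for one nominal and one ordinal feature.
local = [
    sim_nominal("modern", "modern"),
    sim_triangular((0.0, 0.0, 0.5), (0.0, 0.5, 1.0)),
]
print(local, global_similarity(local, [1.0, 1.0]))
```

Keeping the two local measures separate from the aggregation means a feature can move from one measurement scale to another without touching the global formula.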

Architecture of the proposed CBR system
After proposing the cosine similarity retrieval algorithm, the author further developed a virtual reality based CBR system for interior design drawing case retrieval. As shown in Figure 1, the proposed CBR system consists of a case base, a virtual reality based display module, a query module and a retrieval engine. The case base stores a group of historical interior design drawings drawn with major interior design software tools. The retrieval engine calculates the similarity of the cases in the case base to the query using the new similarity matching model and retrieves the case that best suits the user's request and matches the user's design preferences. The case retrieval phase is typically the main step of the CBR cycle, and the majority of CBR systems can be described as sophisticated retrieval engines. The query module collects the preference query from the designer and transforms it into a query profile. The display module provides a virtual reality based platform for displaying retrieved design drawings using the FancyDesigner® interior design drawing tool (http://www.fancydesigner.com.tw/). When a designer requests a solution pre-stored as a case, a similarity measure is computed between the current problem statement and the pre-stored cases with their associated solutions to support retrieval.
Figure 1. The architecture of the proposed CBR system (modified after Aamodt & Plaza, 1994).

Numerical experiments of the proposed CBR system
When a designer needs to find an interior design drawing case that matches the consigner's design requirements, the designer inputs preference ratings for the design features of the drawing case through a questionnaire-based query module. The designer's query is then used to retrieve the most suitable case, which is displayed to the consigner through the display module. If the consigner is not satisfied with the retrieved case, the designer can modify and adapt it to form a new case that satisfies the current query and store the new case in the case base.

Experimental setup
To show the effectiveness of the proposed cosine similarity retrieval algorithm, the author applied it to the proposed CBR system for searching interior design drawing cases. The numerical experiments were carried out in the following steps.
Step 1: Collecting Old Cases to the Case Base. In CBR systems, cases are regarded as an important source of knowledge. A design case is a design solution that can be applied to a design problem (query). Designers' expertise is embodied in a collection of past design cases (seed cases) pre-stored in the case base. The information used to search the case base for the best-matching cases is a design problem (query). Using the information in a given design query, a user can browse the case base and search for the most suitable case for reuse.
Step 2: Extracting Features of Cases. Case indexing assigns features to cases for use in the retrieval process. The extracted features serve as indexes that identify case problem descriptions. Let $F = \{F_1, F_2, \ldots, F_n\}$ be the set of extracted features, where n is the number of features and $F_j$ is the j-th feature (j = 1, 2, …, n). These features are used for case-based reasoning in the CBR system. In most situations, cases are represented as feature-value pairs. The case base is a group of related records, and each record is a group of related design features of a drawing. A feature is a distinctive property (or characteristic, attribute) used to distinguish two objects. Feature values are ratings of features provided by the users. An individual case $S_i$ is assumed to comprise a finite list of feature-value pairs: $S_i = \{(F_1, r_{i1}), (F_2, r_{i2}), \ldots, (F_j, r_{ij}), \ldots, (F_n, r_{in})\}$. Let the case base CB be a finite set of individual cases, $CB = \{S_1, S_2, \ldots, S_m\}$, where $S_i$ is the i-th pre-stored case (i = 1, 2, …, m) and m is the number of cases. From careful analysis, a set of ten design features was extracted for classifying the interior design drawings in the case base. The ten features are: $F = \{F_1, F_2, \ldots, F_{10}\}$ = {Budget, Working Duration, Design Style, Design Theme, Design Hue, Brightness, Saturation, Colour temperature, Daylighting source, Lighting type}.
Step 3: Establishing Case Profiles and Query Profile. In this study, a case profile is a collection of records representing important features and descriptions of the case. There are two classes of entities in the CBR system: pre-stored case profiles consisting of pre-specified feature-value pairs, and a query profile consisting of feature-value pairs for the user's preferences. The feature ratings of the stored cases and of the user's query case are transformed into a set of case-feature-rating vectors (case profiles) and a query-feature-rating vector (query profile), and the set of case-feature-rating vectors is represented as a utility matrix. With profile vectors for both the query and the past cases, we can estimate the degree to which a user would prefer a case by computing the similarity between the user's vector and the case's vector.
(1) Collecting case-feature rating matrix and query-feature rating vector. The profiles of pre-stored cases can be represented by a case-feature-rating matrix, in which each row is the vector representation of a design case. As shown in Figure 2, the case-feature-rating entries are described as an m×n ratings matrix $R_{m \times n}$, where each entry $r_{ij}$ ($1 \le i \le m$, $1 \le j \le n$) is the rating assigned to case i on feature j; the rows represent the m cases and the columns represent the n features.
The designer's preference query can be described as the following feature-value pairs: $Q = \{(F_1, r_{q1}), (F_2, r_{q2}), \ldots, (F_j, r_{qj}), \ldots, (F_n, r_{qn})\}$. Table 1 shows the set of extracted features, the collected content profiles in the case base and the query profile of a problem description. As shown in Table 1, the values in the case-feature-rating matrix and the query-feature-rating vector can be of nominal, ordinal, interval or ratio scale.
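The case-feature-rating matrix and the query-feature-rating vector can be pictured with a small Python sketch. The ten feature names come from the text; every rating below is a hypothetical placeholder and does not reproduce Table 1, and the helper `as_pairs` is introduced only for illustration.

```python
from typing import Dict, List

FEATURES: List[str] = [
    "Budget", "Working Duration", "Design Style", "Design Theme", "Design Hue",
    "Brightness", "Saturation", "Colour temperature", "Daylighting source",
    "Lighting type",
]

# Hypothetical raw ratings (placeholders, not the values of Table 1).
case_profiles: Dict[str, List[str]] = {
    "S1": ["medium", "short", "modern", "nature", "blue",
           "high", "low", "low", "medium", "general"],
    "S2": ["high", "long", "classic", "urban", "red",
           "medium", "high", "high", "low", "accent"],
}
query_profile: List[str] = ["medium", "short", "modern", "urban", "blue",
                            "high", "medium", "low", "medium", "task"]


def as_pairs(ratings: List[str]) -> Dict[str, str]:
    """Turn one rating vector into feature-value pairs {F_j: r_j}."""
    if len(ratings) != len(FEATURES):
        raise ValueError("one rating is required per feature")
    return dict(zip(FEATURES, ratings))


# The m x n ratings matrix R: one row per case, one column per feature.
rating_matrix: List[List[str]] = list(case_profiles.values())
m, n = len(rating_matrix), len(FEATURES)
print(m, n, as_pairs(query_profile)["Design Style"])
```

Storing the matrix row-wise mirrors the description above, where each row is the vector representation of one design case and can be compared column by column with the query vector.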
(2) Translating traditional measurement scales into fuzzy measurement scales. A linguistic variable represents a natural language expression whose value is not numerical. Instead, a linguistic variable can be decomposed into a set of linguistic terms, each quantified by a membership function. Depending on the available a priori information or the phenomenon to be described, a membership function may take many forms. Typical shapes include triangular, trapezoidal and Gaussian membership functions. The membership functions used in this study are triangular, owing to the popularity of this shape in specifying fuzzy sets (Nahm et al., 2002). Because more knowledge is available about Budget and Working Duration than about the other features in this study, the author defined five linguistic terms for these two features. The semantics of the linguistic terms are given by fuzzy numbers. Following Chen and Ku (2008), the author translated the linguistic variables and terms into five triangular fuzzy numbers ranging between zero and one (Tables 2 and 3 and Figure 3). For the features Design Style, Design Theme and Colour Hue, the author uses binary bit strings on a nominal scale (Tables 4-6).
For the features Brightness, Saturation, Colour Temperature and Daylighting Source, three linguistic terms on an ordinal scale are defined, with a range between zero and one (Table 7 and Figure 4).
For the feature Lighting Type, the author uses binary strings on a nominal measurement scale (Table 8). Lighting Type is classified as task, accent or general lighting, depending largely on the distribution of the light produced by the fixture.
An example calculation sheet for translating measurement scales into linguistic terms is shown in Table 9.
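The translation from linguistic terms to triangular fuzzy numbers can be sketched as a lookup followed by a triangular membership function. The term sets below are evenly spaced placeholders chosen for illustration; they are assumptions and are not the values given in Tables 2, 3 and 7. The names `to_tfn` and `tri_membership` are likewise hypothetical.

```python
from typing import Dict, Tuple

TFN = Tuple[float, float, float]  # (lower bound, peak, upper bound)

# Hypothetical term sets on [0, 1]; not the values of Tables 2, 3 and 7.
FIVE_TERMS: Dict[str, TFN] = {
    "very low": (0.0, 0.0, 0.25),
    "low": (0.0, 0.25, 0.5),
    "medium": (0.25, 0.5, 0.75),
    "high": (0.5, 0.75, 1.0),
    "very high": (0.75, 1.0, 1.0),
}
THREE_TERMS: Dict[str, TFN] = {
    "low": (0.0, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.0),
}


def to_tfn(term: str, term_set: Dict[str, TFN]) -> TFN:
    """Look up the triangular fuzzy number assigned to a linguistic term."""
    if term not in term_set:
        raise ValueError(f"unknown linguistic term: {term!r}")
    return term_set[term]


def tri_membership(x: float, tfn: TFN) -> float:
    """Degree of membership of x in the triangular fuzzy number (a, b, c)."""
    a, b, c = tfn
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0  # also covers shoulder terms where a == b or b == c
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


print(to_tfn("medium", FIVE_TERMS))
print(tri_membership(0.375, to_tfn("medium", FIVE_TERMS)))
```

Nominal features such as Design Style or Lighting Type do not pass through this lookup, since they are compared by exact match rather than through fuzzy numbers.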
Step 4: Computing Similarity Measures.
(1) Computing local similarity measures on features between a case and a query. The local similarity is calculated for each feature j of the case $S_i$ and the query Q. By applying the local similarity measure $sim(r_{qj}, r_{ij})$, the similarity on feature j between a stored case $S_i$ in the case base and the new query Q can be calculated. Taking the query Q and case $S_6$ as an example, the calculation sheet for the translated fuzzy numbers, the local similarity measures and the global similarity measure is shown in Table 10.
(2) Computing global similarity measure between a stored case and a query case. The global similarity function measures the similarity between each stored case $S_i$ (i = 1, …, m) and the new query case Q. Suppose the features are equally weighted, each weight being equal to 1. Then the global similarity between the stored case $S_6$ and the query case Q is calculated as:
$$Sim(Q, S_6) = \frac{\sum_{j=1}^{n} w_j \times sim(r_{qj}, r_{6j})}{\sum_{j=1}^{n} w_j} = \frac{1 \times (0.84 + 0.84 + 1 + 0 + 1 + 1 + 0.89 + 0.89 + 1 + 0)}{10} = \frac{7.46}{10} \approx 0.75.$$
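The arithmetic of this example can be checked with a few lines of Python. The ten local similarity values and the unit weights are those quoted in the text for query Q and case S6; the variable names are illustrative.

```python
# Local similarities sim(r_qj, r_6j) for the ten features, as quoted in the text.
local_sims = [0.84, 0.84, 1, 0, 1, 1, 0.89, 0.89, 1, 0]
weights = [1] * len(local_sims)  # equal weighting, each weight equal to 1

weighted_sum = sum(w * s for w, s in zip(weights, local_sims))
global_sim = weighted_sum / sum(weights)

# The weighted sum is about 7.46, so the global similarity rounds to 0.75.
print(round(weighted_sum, 2), round(global_sim, 2))
```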
Step 5: Retrieving the Case. In the previous step, the query case and the stored cases were fed into the retrieval engine, where the transformed query vector was compared with each case vector of the case-feature-rating matrix through the new fuzzy similarity retrieval algorithm. In this step, the resulting global similarity values are sorted in descending order, and the case with the highest similarity is retrieved as the one that best meets the user's preferences. The design drawing associated with the retrieved case is then displayed to the user as a prototype drawing through a FancyDesigner® based virtual reality platform (http://www.fancydesigner.com.tw/). Thereafter, any necessary review or revision of the retrieved drawing can be made.
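The ranking described in this step can be sketched as a sort over the global similarity values. The case identifiers and scores below are hypothetical placeholders rather than experimental results, and the function names are illustrative.

```python
from typing import Dict, List, Tuple


def rank_cases(global_sims: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order (case id, global similarity) pairs from most to least similar."""
    return sorted(global_sims.items(), key=lambda item: item[1], reverse=True)


def retrieve_best(global_sims: Dict[str, float]) -> Tuple[str, float]:
    """Return the stored case with the highest global similarity."""
    if not global_sims:
        raise ValueError("no stored cases to retrieve from")
    return rank_cases(global_sims)[0]


# Hypothetical global similarities between a query and three stored cases.
scores = {"case_a": 0.41, "case_b": 0.75, "case_c": 0.62}
print(rank_cases(scores))
print(retrieve_best(scores))
```

Keeping the full ranking, rather than only the top case, would allow the next-best drawing to be shown if the first one is rejected; that use is a design suggestion and is not described in the text.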
Step 6: Revising the Case. Typically, the initially formulated case base might not be complete or accurate; thus, continuous learning is crucial to the CBR system. Learning in the CBR system means revising, extending and refining previous experiences and incorporating them into the case base to make them more usable and accessible. Normally, the case retrieved from the case base will not be an exact match for the current design problem, so it is necessary to adapt the retrieved case to better meet the new problem (design requirements).
Step 7: Retaining the Case. A CBR system should seek to increase its knowledge by retaining new cases. The CBR system retains a revised problem description and its solution as a new case, which is stored in the case base for future reference. By retaining useful cases during this step, a CBR system acquires the ability to learn; retention is a necessary prerequisite for enabling a system to learn from experience. As long as the CBR system retains new cases, it learns and grows. The revising and learning characteristics of CBR are useful when users find it hard to articulate their requirements in a new query. Thus, a CBR system can begin with a small number of reference cases and allow the case base to be developed incrementally: the initial version can be revised, retained and re-stored to increase the number of reference cases.

Experimental results and discussion
Precision is a vital measure for evaluating the effectiveness of the case retrieval process of a CBR system. In this study, the performance of the similarity matching algorithm is evaluated by precision, defined as the proportion of correct cases ($N_{correct}$) among all retrieved cases ($N_{retrieved}$):
$$Precision = \frac{N_{correct}}{N_{correct} + N_{false}} = \frac{N_{correct}}{N_{retrieved}}.$$
It represents the proportion of relevant retrieved cases among all retrieved cases and is a direct measurement of the successful hit count of the retrieval process. The number of retrieved cases that achieve user satisfaction is used as the hit count: the more satisfactory cases are retrieved, the higher the precision. To compare the performance of our previous matching algorithm (Lin & Ke, 2015) with that of the proposed matching algorithm, experiments were carried out. Table 11 shows the experimental results comparing the two similarity matching algorithms in terms of precision. The results show that the proposed matching algorithm improved retrieval precision over the previous algorithm, reaching 73% precision during retrieval. These results indicate that the proposed similarity matching algorithm on the proposed case-based reasoning system is effective and capable of handling case-based retrieval of interior design drawings.
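The precision measure defined above is a simple ratio, sketched here in Python. The counts in the example are hypothetical and are not the counts behind Table 11; the function name `precision` is illustrative.

```python
def precision(n_correct: int, n_retrieved: int) -> float:
    """Proportion of retrieved cases that the user judged correct."""
    if n_retrieved <= 0:
        raise ValueError("at least one case must have been retrieved")
    if not 0 <= n_correct <= n_retrieved:
        raise ValueError("n_correct must lie between 0 and n_retrieved")
    return n_correct / n_retrieved


# Hypothetical counts: 3 satisfactory cases out of 4 retrieved.
print(precision(3, 4))
```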

Conclusions and future recommendations
A new fuzzy cosine similarity retrieval algorithm has been proposed for searching interior design drawing cases in a case-based reasoning system, in which mixed measurement scales are considered and applied. Numerical experiments were carried out, and the retrieval performance of the proposed algorithm was measured by precision. Compared with the previous study, the increase in precision supports the effectiveness of the proposed algorithm. The experimental results demonstrate that the proposed similarity matching algorithm on the case-based reasoning system is effective and suitable for handling case-based retrieval of interior design drawings.
In future work, as the number of query cases increases, more cases will be revised and stored in the case base, so that the range of available cases grows. In addition, more design features will be extracted from more data sources to characterize complex interior design drawing cases. With an increased number of stored cases and extracted features, the next step of this study will be to achieve more accurate retrieval performance and better user satisfaction.

Notes on contributor
Kuo-Sui Lin received his MSc degree in industrial and management from National Tsing Hua University, Taiwan, in 1987, and his PhD degree in the field of management from the University of Paisley, United Kingdom, in 1999. After a career of nearly 25 years in the practice fields of manufacturing, system integration and venture capital, he moved to academia. Since December 2010, he has been with the College of Management of Aletheia University. He is now an assistant professor in the Department of Information Management, and his current research interests include Fuzzy Set Theory and Applications, Engineering and Managerial Economics, Intelligent Information Systems, Project Management and Multiple Criteria Decision Making (MCDM).
Table 11. Experimental results comparing two similarity matching algorithms in terms of precision.

Disclosure statement
No potential conflict of interest was reported by the author.