
Security-control methods for statistical databases: a comparative study

Published: 01 December 1989

Abstract

This paper considers the problem of providing security to statistical databases against disclosure of confidential information. Security-control methods suggested in the literature are classified into four general approaches: conceptual, query restriction, data perturbation, and output perturbation.
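Query restriction, the second approach listed above, is easy to make concrete, and so is one way it can be evaded. The sketch below applies query-set-size control (answer only if K ≤ |C| ≤ N − K) to a hypothetical eight-record table with K = 2. It then recovers a blocked individual value with an individual tracker in the sense of the tracker literature (see, e.g., [22]): the too-small set C = A AND B is obtained as SUM(A) − SUM(A AND NOT B). The records, the threshold, and all helper names are assumptions for illustration only.

```python
# Hypothetical toy records (names, attributes, salaries are illustrative only).
RECORDS = [
    {"name": "Allen", "sex": "M", "dept": "CS", "salary": 62_000},
    {"name": "Baker", "sex": "F", "dept": "CS", "salary": 71_000},
    {"name": "Cole", "sex": "M", "dept": "Math", "salary": 58_000},
    {"name": "Diaz", "sex": "F", "dept": "Math", "salary": 66_000},
    {"name": "Evans", "sex": "M", "dept": "CS", "salary": 54_000},
    {"name": "Ford", "sex": "F", "dept": "Math", "salary": 60_000},
    {"name": "Gray", "sex": "M", "dept": "Math", "salary": 57_000},
    {"name": "Hale", "sex": "F", "dept": "Math", "salary": 75_000},
]
K = 2  # hypothetical threshold: answer only if K <= |query set| <= N - K


class QueryRefused(Exception):
    """Raised when query-set-size control blocks a query."""


def sum_salary(formula):
    """SUM(salary) over records satisfying `formula`, under query-set-size control."""
    query_set = [r for r in RECORDS if formula(r)]
    n = len(RECORDS)
    if not K <= len(query_set) <= n - K:
        raise QueryRefused(f"query set size {len(query_set)} not in [{K}, {n - K}]")
    return sum(r["salary"] for r in query_set)


def in_cs(r):
    return r["dept"] == "CS"


def is_female(r):
    return r["sex"] == "F"


# Direct query: C = CS AND F matches only Baker, so it is refused.
try:
    sum_salary(lambda r: in_cs(r) and is_female(r))
    direct_refused = False
except QueryRefused:
    direct_refused = True

# Individual tracker: SUM(C) = SUM(CS) - SUM(CS AND NOT F).
# Both query sets (sizes 3 and 2) pass the size check.
tracked = sum_salary(in_cs) - sum_salary(lambda r: in_cs(r) and not is_female(r))
print(direct_refused, tracked)  # True 71000
```

Each padded query is harmless on its own under the size rule; only their difference discloses, which is the kind of threat trackers pose to query-set-size control.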
Criteria for evaluating the performance of the various security-control methods are identified. Security-control methods that are based on each of the four approaches are discussed, together with their performance with respect to the identified evaluation criteria. A detailed comparative analysis of the most promising methods for protecting dynamic-online statistical databases is also presented.
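In a dynamic online setting a user can simply repeat a query, and this is where data perturbation and output perturbation behave differently. The sketch below uses a hypothetical salary column and Gaussian noise (the noise distribution and level are assumptions, not prescriptions from the paper): answers computed from a stored perturbed copy are identical on every repetition, whereas independent noise added to each output averages away, so an output-perturbation scheme needs some way to keep repeated answers consistent.

```python
import random

# Hypothetical confidential attribute; values are illustrative only.
SALARIES = [41_000, 52_000, 38_500, 61_000, 47_250, 55_900, 43_100, 58_800]
NOISE_SD = 5_000.0  # assumed noise level for both schemes

# Data perturbation: noise is added once and the perturbed copy is stored.
_data_rng = random.Random(0)
PERTURBED = [v + _data_rng.gauss(0.0, NOISE_SD) for v in SALARIES]


def sum_data_perturbed(indices):
    """Answer from the stored perturbed copy; repeated queries agree."""
    return sum(PERTURBED[i] for i in indices)


# Output perturbation: fresh noise is added to each true answer.
_output_rng = random.Random(1)


def sum_output_perturbed(indices):
    """True SUM plus independent noise drawn anew on every call."""
    return sum(SALARIES[i] for i in indices) + _output_rng.gauss(0.0, NOISE_SD)


def averaged(query, indices, repeats=10_000):
    """Repeat one query and average the answers (a simple averaging attack)."""
    return sum(query(indices) for _ in range(repeats)) / repeats


target = [2]  # a single record; query-set-size control is omitted in this sketch
print(sum_data_perturbed(target) == sum_data_perturbed(target))  # True
print(round(averaged(sum_output_perturbed, target)))  # approximately 38500
```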
To date, no single security-control method prevents both exact and partial disclosures. There are, however, a few perturbation-based methods that prevent exact disclosure and enable the database administrator to exercise "statistical disclosure control." Some of these methods, though, introduce bias into query responses or suffer from the 0/1 query-set-size problem (i.e., partial disclosure is possible in the case of a null query set or a query set of size 1).
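The 0/1 query-set-size problem can be seen with even a simple perturbation rule. The sketch below assumes a hypothetical output perturbation that multiplies the true SUM by a random factor in [0.8, 1.2]; it is not one of the methods compared in the paper, and the records are invented. Because a perturbation of this kind scales with the true answer, a null query set always returns exactly 0, and a query set of size 1 returns a value that confines the individual's salary to an interval (a partial disclosure).

```python
import random

# Hypothetical records: (department, salary). Illustrative only.
RECORDS = [
    ("sales", 41_000), ("sales", 52_000), ("sales", 47_250),
    ("research", 61_000), ("research", 55_900),
    ("legal", 88_000),  # sole member of "legal": a query set of size 1
]
SPREAD = 0.2  # hypothetical perturbation: factor drawn from [0.8, 1.2]
_rng = random.Random(42)


def perturbed_sum(dept):
    """True SUM(salary) for `dept`, multiplied by a random factor near 1."""
    true_sum = sum(salary for d, salary in RECORDS if d == dept)
    return true_sum * _rng.uniform(1.0 - SPREAD, 1.0 + SPREAD)


# Null query set: the answer is exactly 0, so the user learns nobody matches.
empty_answer = perturbed_sum("marketing")

# Query set of size 1: each answer pins the salary to a bounded interval.
answer = perturbed_sum("legal")
low, high = answer / (1.0 + SPREAD), answer / (1.0 - SPREAD)
print(empty_answer, low <= 88_000 <= high)  # 0.0 True
```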
We recommend directing future research efforts toward developing new methods that prevent exact disclosure and provide statistical-disclosure control while not suffering from the bias problem or the 0/1 query-set-size problem. Furthermore, efforts directed toward developing a bias-correction mechanism and solving the general problem of small query-set size would help salvage a few of the current perturbation-based methods.
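What a bias-correction mechanism can look like is easiest to see for a variance computed from noise-perturbed data. The sketch below assumes fixed-data perturbation with independent, zero-mean noise whose standard deviation σ is known to users (both are assumptions, and the data are synthetic). The mean of the perturbed values stays unbiased, but their variance is inflated by about σ², so subtracting σ² gives an approximately unbiased estimate. This is a standard textbook correction used for illustration, not a mechanism proposed in the paper.

```python
import random
import statistics

rng = random.Random(7)

# Synthetic confidential attribute and a noise level assumed to be published
# by the database administrator.
true_values = [rng.gauss(50_000, 8_000) for _ in range(20_000)]
NOISE_SD = 6_000.0

# Fixed-data perturbation: each stored value receives independent zero-mean noise.
perturbed = [v + rng.gauss(0.0, NOISE_SD) for v in true_values]

true_var = statistics.pvariance(true_values)
naive_var = statistics.pvariance(perturbed)  # inflated by roughly NOISE_SD ** 2
corrected_var = naive_var - NOISE_SD ** 2    # bias-corrected estimate

print(round(true_var), round(naive_var), round(corrected_var))
```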

References

[1]
ABUL-ELA, A.-L., GREENBERG, B. G., AND HORVITZ, D. G. 1967. A multi-proportions randomized response model. J. Am. Stat. Assoc. 62, 319 (Sept.), 990-1008.
[2]
ACHUGBUE, J. O., AND CHIN, F. Y. 1979. The effectiveness of output modification by rounding for protection of statistical databases. INFOR 17, 3 (Aug.), 209-218.
[3]
BECK, L. L. 1980. A security mechanism for statistical databases. ACM Trans. Database Syst. 5, 3 (Sept.), 316-338.
[4]
CHIN, F. Y. 1978. Security in statistical databases for queries with small counts. ACM Trans. Database Syst. 3, 1, 92-104.
[5]
CHIN, F. Y., KOSSOWSKI, P., AND LOH, S. C. 1984. Efficient inference control for range sum queries. Theor. Comput. Sci. 32, 77-86.
[6]
CHIN, F. Y., AND ÖZSOYOĞLU, G. 1982. Auditing and inference control in statistical databases. IEEE Trans. Softw. Eng. SE-8, 6 (Apr.), 574-582.
[7]
CHIN, F. Y., AND ÖZSOYOĞLU, G. 1981. Statistical database design. ACM Trans. Database Syst. 6, 1 (Mar.), 113-139.
[8]
CHIN, F. Y., AND ÖZSOYOĞLU, G. 1979. Security in partitioned dynamic statistical databases. In Proceedings of the IEEE COMPSAC, pp. 594-601.
[9]
COX, L. H. 1980. Suppression methodology and statistical disclosure control. J. Am. Stat. Assoc. 75, 370 (June), 377-385.
[10]
DALENIUS, T. 1981. A simple procedure for controlled rounding. Statistik Tidskrift 3, 202-208.
[11]
DALENIUS, T. 1977. Towards a methodology for statistical disclosure control. Statistik Tidskrift 15, 429-444.
[12]
DALENIUS, T. 1974. The invasion of privacy problem and statistics production. An overview. Statistik Tidskrift 12, 213-225.
[13]
DENNING, D. E. 1985. Commutative filters for reducing inference threats in multilevel database systems. In Proceedings of the 1985 Symposium on Security and Privacy, IEEE Computer Society, pp. 134-146.
[14]
DENNING, D. E. 1984. Cryptographic check-sums for multilevel database security. In Proceedings of the 1984 Symposium on Security and Privacy, IEEE Computer Society, pp. 52-61.
[15]
DENNING, D. E. 1983. A security model for the statistical database problem. In Proceedings of the 2nd International Workshop on Statistical Database Management, pp. 1-16.
[16]
DENNING, D. E. 1982. Cryptography and Data Security. Addison-Wesley, Reading, Mass.
[17]
DENNING, D. E. 1981. Restricting queries that might lead to compromise. In Proceedings of IEEE Symposium on Security and Privacy (Apr.), pp. 33-40.
[18]
DENNING, D. E. 1980. Secure statistical databases with random sample queries. ACM Trans. Database Syst. 5, 3 (Sept.), 291-315.
[19]
DENNING, D. E., AND SCHLÖRER, J. 1983. Inference control for statistical databases. Computer 16, 7 (July), 69-82.
[20]
DENNING, D. E., AND SCHLÖRER, J. 1980. A fast procedure for finding a tracker in a statistical database. ACM Trans. Database Syst. 5, 1 (Mar.), 88-102.
[21]
DENNING, D. E., SCHLÖRER, J., AND WEHRLE, E. 1982. Memoryless inference controls for statistical databases. Computer Science Dept., Purdue Univ.
[22]
DENNING, D. E., DENNING, P. J., AND SCHWARTZ, M. D. 1979. The tracker: A threat to statistical database security. ACM Trans. Database Syst. 4, 1 (Mar.), 76-96.
[23]
DOBKIN, D., JONES, A. K., AND LIPTON, R. J. 1979. Secure databases: Protection against user influence. ACM Trans. Database Syst. 4, 1 (Mar.), 97-106.
[24]
FELLEGI, I. P. 1972. On the question of statistical confidentiality. J. Am. Stat. Assoc. 67, 337 (Mar.), 7-18.
[25]
FELLEGI, I. P., AND PHILLIPS, J. r. 1974. Statistical confidentiality: Some theory and applications to data dissemination. Ann. Ec. Soc. Meas. 3, 2 (Apr.), 399-409.
[26]
FRIEDMAN, A. D., AND HOFFMAN, L. J. 1980. Towards a fail-safe approach to secure databases. In Proceedings of IEEE Symposium on Security and Privacy (Apr.).
[27]
GHOSH, S. P. 1986. Statistical relational tables for statistical database management. IEEE Trans. Softw. Eng. SE-12, 12, 1106-1116.
[28]
GHOSH, S. P. 1985. An application of statistical databases in manufacturing testing. IEEE Trans. Softw. Eng. SE-11, 7, 591-596.
[29]
GHOSH, S. P. 1984. An application of statistical databases in manufacturing testing. In Proceedings of IEEE COMPDEC Conference.
[30]
GREENBERG, B. G., ABERNATHY, J. R., AND HORVITZ, D. G. 1969a. Application of randomized response technique in obtaining quantitative data. In Proceedings of Social Statistics Section, American Statistical Association (Aug.), 40-43.
[31]
GREENBERG, B. G., ABUL-ELA, A.-L., SIMMONS, W. R., AND HORVITZ, D. G. 1969b. The unrelated question randomized response model: Theoretical framework. J. Am. Stat. Assoc. 64, 326 (June), 520-539.
[32]
HAQ, M. I. UL. 1977. On safeguarding statistical disclosure by giving approximate answers to queries. In Proceedings of International Computer Symposium (North-Holland), pp. 491-495.
[33]
HAQ, M. I. UL. 1975. Insuring individual's privacy from statistical database users. In Proceedings of National Computer Conference (Montvale, N.J.), vol. 44. AFIPS Press, Arlington, Va., pp. 941-946.
[34]
HOFFMAN, L. J. 1977. Modern Methods for Computer Security and Privacy. Prentice-Hall, Englewood Cliffs, N.J.
[35]
HOFFMAN, L. J., AND MILLER, W. F. 1970. Getting a personal dossier from a statistical data bank. Datamation 16, 5 (May), 74-75.
[36]
JONGE, W. DE 1983. Compromising statistical databases: Responding to queries about means. ACM Trans. Database Syst. 8, 1 (Mar.), 60-80.
[37]
KAM, J. B., AND ULLMAN, J. D. 1977. A model of statistical databases and their security. ACM Trans. Database Syst. 2, 1, 1-10.
[38]
LEFONS, D., SILVESTRI, A., AND TANGORRA, F. 1983. An analytic approach to statistical databases. In Proceedings of 9th Conference on Very Large Databases (Florence, Italy), pp. 260-273.
[39]
LEISS, E. 1982. Randomizing, a practical method for protecting statistical databases against compromise. In Proceedings of 8th Conference on Very Large Databases, pp. 189-196.
[40]
LIEW, C. K., CHOI, W. J., AND LIEW, C. J. 1985. A data distortion by probability distribution. ACM Trans. Database Syst. 10, 3, 395-411.
[41]
MATLOFF, N. E. 1986. Another look at the use of noise addition for database security. In Proceedings of IEEE Symposium on Security and Privacy, pp. 173-180.
[42]
MCLEISH, M. 1983. An information theoretic approach to statistical databases and their security: A preliminary report. In Proceedings of the 2nd International Workshop on Statistical Database Management, pp. 355-359.
[43]
MILLER, A. R. 1971. The Assault on Privacy: Computers, Data Banks and Dossiers. University of Michigan Press, Ann Arbor, Mich.
[44]
MORGENSTERN, M. 1987. Security and Inference in Multi-level Database and Knowledge-Base Systems. In Proceedings of ACM Special Interest Group on Management of Data, pp. 357-373.
[45]
ÖZSOYOĞLU, G., AND CHIN, F. Y. 1982. Enhancing the security of statistical databases with a question-answering system and a kernel design. IEEE Trans. Softw. Eng. SE-8, 3, 223-234.
[46]
ÖZSOYOĞLU, G., AND CHUNG, J. 1986. Information loss in the lattice model of summary tables due to cell suppression. In Proceedings of IEEE Symposium on Security and Privacy, pp. 75-83.
[47]
ÖZSOYOĞLU, G., AND ÖZSOYOĞLU, M. 1981. Update handling techniques in statistical databases. In Proceedings of the 1st LBL Workshop on Statistical Database Management (Berkeley, Calif., Dec.), pp. 249-284.
[48]
ÖZSOYOĞLU, G., AND SU, T. A. 1985. Rounding and inference control in conceptual models for statistical databases. In Proceedings of IEEE Symposium on Security and Privacy, pp. 160-173.
[49]
PALLEY, M. A. 1986. Security of statistical databases compromise through attribute correlational modeling. In Proceedings of IEEE Conference on Data Engineering, pp. 67-74.
[50]
PALLEY, M. A., AND SIMONOFF, J. S. 1987. The use of regression methodology for compromise of confidential information in statistical databases. ACM Trans. Database Syst. 12, 4 (Dec.), 593-608.
[51]
REISS, S. P. 1980. Practical data-swapping: The first steps. In Proceedings of IEEE Symposium on Security and Privacy, pp. 36-44.
[52]
REISS, S. P. 1984. Practical data swapping: The first steps. ACM Trans. Database Syst. 9, 1 (Mar.), 20-37.
[53]
ROWE, N. 1984. Diophantine inference from statistical aggregates on few-valued attributes. In Proceedings of IEEE Conference on Data Engineering, pp. 107-110.
[54]
SANDE, G. 1983. Automated cell suppression to preserve confidentiality of business statistics. In Proceedings of the 2nd International Workshop on Statistical Database Management, pp. 346-353.
[55]
SCHLÖRER, J. 1983. Information loss in partitioned statistical databases. Comput. J. 26, 3, 218-223.
[56]
SCHLÖRER, J. 1981. Security of statistical databases: Multidimensional transformation. ACM Trans. Database Syst. 6, 1 (Mar.), 95-112.
[57]
SCHLÖRER, J. 1980. Disclosure from statistical databases: Quantitative aspects of trackers. ACM Trans. Database Syst. 5, 4 (Dec.), 467-492.
[58]
SCHLÖRER, J. 1976. Confidentiality of statistical records: A threat monitoring scheme of on-line dialogue. Methods Inform. Med. 15, 1, 36-42.
[59]
SCHLÖRER, J. 1975. Identification and retrieval of personal records from a statistical data bank. Methods Inform. Med. 14, 1, 7-13.
[60]
SCHWARTZ, M. D., DENNING, D. E., AND DENNING, P. J. 1979. Linear queries in statistical databases. ACM Trans. Database Syst. 4, 2, 156-167.
[61]
SU, T., AND ÖZSOYOĞLU, G. 1987. Data dependencies and inference control in multilevel relational database systems. In Proceedings of the 1987 Symposium on Security and Privacy, IEEE Computer Society, pp. 202-211.
[62]
TENDICK, P., AND MATLOFF, N. S. 1987. Recent results on the noise addition method for database security. Presented at the Joint ASA/IMS Statistical Meetings, San Francisco.
[63]
TRAUB, J. F., YEMINI, Y., AND WOZNIAKOWSKI, H. 1984. The statistical security of a statistical database. ACM Trans. Database Syst. 9, 4 (Dec.), 672-679.
[64]
TRUEBLOOD, R. P. 1984. Security issues in knowledge systems. In Proceedings of 1st International Workshop on Expert Database Systems, vol. 2, pp. 834-840.
[65]
TURN, R., AND SHAPIRO, N. Z. 1978. Privacy and security in databank systems: Measure of effectiveness, costs, and protector-intruder interactions. Computers and Security, C. T. Dinardo, Ed. AFIPS Press, Arlington, Va., pp. 49-57.
[66]
WARNER, S. L. 1971. The linear randomized response model. J. Am. Stat. Assoc. 66, 336 (Dec.), 884-888.
[67]
WARNER, S. L. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 60, 309 (Mar.), 63-69.
[68]
YU, C. T., AND CHIN, F. Y. 1977. A study on the protection of statistical databases. In Proceedings of ACM SIGMOD International Conference on Management of Data (Aug.), pp. 169-181.


Reviews

Mary McLeish

A statistical database (SDB) is any traditional database system in which queries are restricted to statistical aggregates (such as sample mean and count); an example is the US Census Bureau database. It is often required that the system be secure from users' attempts to infer confidential information about an individual from the aggregate query responses. Considerable work has been carried out over the last 15 years to discover sufficient conditions on the queries to keep these databases secure, and this has proved to be a very difficult task. With the increasing use of large database systems and knowledge bases for expert systems, the issue has become even more relevant in recent years. This careful survey of work on statistical database security examines the different approaches that have been used: conceptual, query restriction, data perturbation, and output perturbation. The authors discuss the main conceptual models, due to Chin, Özsoyoğlu, Denning, and Schlörer. Different ways to restrict queries are surveyed and compared: restrictions may be placed on the size of the query set or on the overlap between successive query sets, and the methodology can depend on the data storage technique. Query-set-size control releases a statistic only if the size of the query set satisfies certain restrictions. The paper mentions work in this area by Denning, Schlörer, Schwartz, Hoffman, Miller, de Jonge, and Traub and notes the problems caused by fast trackers. The work by Dobkin and others on query-set-overlap restrictions shows the difficulties that arise in making this solution practical. Auditing involves keeping track of all queries made by a user over time and checking for possible compromise whenever a new query is issued; the paper discusses methods proposed by Chin, Özsoyoğlu, and McLeish. Data partitioning clusters the individual entities of the population into a number of mutually exclusive subsets.
Security-control methods that use this technique, due to Yu, Chin, Schlörer, and Özsoyoğlu, are mentioned. (McLeish's more general results on dynamic partitioned models are not mentioned.) Sande and Cox's work on cell suppression methods for the Canadian census indicates that this method is computationally complex. Data and output perturbation methods provide a very different solution to the security problem: none of the reported answers will be exact. The bias problem (Matloff), data swapping (Reiss), probability-distribution methods (Liew et al.), an analytical method (Lefons et al.), and fixed-data perturbation methods (Traub and Warner) are all reviewed. Output perturbation methods include random-sample queries (Denning) and a method due to Beck that introduces a varying perturbation to the data. The paper presents a variety of rounding techniques: systematic rounding, random rounding, and controlled rounding. The effectiveness of all these methods depends, of course, on the strictness of the security required. Sometimes an exact compromise (determining the exact value of a protected attribute) is the only situation in which disclosure is of concern; at other times, revealing a close estimate of the value (statistical disclosure) is dangerous. The authors discuss the approaches to security control mentioned above in light of these different measures of disclosure. A comparative analysis contrasts random-sample queries with Beck's varying-output-perturbation method and some fixed-data-perturbation methods. Precision and security criteria for COUNT and SUM queries are presented along with consistency, cost, and robustness criteria. The paper also brings some newer threats to the attention of SDB researchers. Logical inference is a new problem arising from logic programming environments.
Earlier work concentrated on SUM and COUNT queries, but regression analyses present a whole new set of confidentiality problems. The last section makes the point that no single method is adequate for security control, and the authors suggest future work to overcome the major practical problems of many current methods. The paper presents and synthesizes a great deal of published material in this surprisingly large research area, and particular methods are discussed in some detail. The bibliography is not as extensive as in some survey papers.
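The regression threat raised in the review can be illustrated with a deliberately simple case. If a statistical database answers SUM queries over products of attributes (an assumption here), five aggregate queries yield the statistics needed for a least-squares line, and a user who knows an individual's non-confidential attribute can then estimate that person's confidential value. The table and names are hypothetical, and the data are exactly linear so that the prediction is exact; with real data the result would be an estimate, i.e., a possible statistical disclosure. This is a simplified illustration of the general idea, not the procedure of Palley and Simonoff [50].

```python
# Hypothetical table: (years_of_service, salary). Years of service is treated as
# public; salary is confidential. The toy data are exactly linear on purpose.
RECORDS = [(2, 41_000), (5, 50_000), (7, 56_000), (10, 65_000),
           (12, 71_000), (15, 80_000), (20, 95_000), (8, 59_000)]


def aggregate(term):
    """Stand-in for an SDB SUM query over the whole table (assumed permitted)."""
    return sum(term(x, y) for x, y in RECORDS)


# Five aggregate queries give the sufficient statistics of a simple linear fit.
n = aggregate(lambda x, y: 1)
sx = aggregate(lambda x, y: x)
sy = aggregate(lambda x, y: y)
sxy = aggregate(lambda x, y: x * y)
sxx = aggregate(lambda x, y: x * x)

# Ordinary least-squares slope and intercept from the aggregates alone.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

# The attacker knows the target has 8 years of service and predicts the salary.
predicted = intercept + slope * 8
print(predicted)  # 59000.0
```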


Information

Published In

ACM Computing Surveys, Volume 21, Issue 4
Dec. 1989
107 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/76894

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in CSUR Volume 21, Issue 4


Bibliometrics

Article Metrics

  • Downloads (Last 12 months)671
  • Downloads (Last 6 weeks)107
Reflects downloads up to 25 Dec 2024
