US20160071017A1 - Method of operating artificial intelligence machines to improve predictive model training and performance - Google Patents

Method of operating artificial intelligence machines to improve predictive model training and performance

Publication number
US20160071017A1
Authority
US
United States
Prior art keywords
data
field
artificial intelligence
records
series
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/941,586
Inventor
Akli Adjaoute
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Brighterion Inc
Original Assignee
Brighterion Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/514,381 external-priority patent/US20150032589A1/en
Priority claimed from US14/521,667 external-priority patent/US20150046332A1/en
Priority claimed from US14/815,848 external-priority patent/US20150339672A1/en
Priority claimed from US14/815,934 external-priority patent/US20150339673A1/en
Application filed by Brighterion Inc filed Critical Brighterion Inc
Priority to US14/941,586 priority Critical patent/US20160071017A1/en
Publication of US20160071017A1 publication Critical patent/US20160071017A1/en
Assigned to BRIGHTERION INC reassignment BRIGHTERION INC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ADJAOUTE, AKLI
Assigned to ADJAOUTE, AKLI reassignment ADJAOUTE, AKLI ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BRIGHTERION, INC
Assigned to BRIGHTERION, INC. reassignment BRIGHTERION, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ADJAOUTE, AKLI
Priority to US16/674,980 priority patent/US10984423B2/en
Priority to US17/200,997 priority patent/US20210248612A1/en
Abandoned legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/048 Fuzzy inferencing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation

Definitions

  • Machine learning can use various techniques such as supervised learning, unsupervised learning, and reinforcement learning.
  • In supervised learning, the learner is supplied with labeled training instances (a set of examples), where both the input and the correct output are given. For example, historical stock prices are used to guess future prices. Each example used for training is labeled with the value of interest, in this case the stock price.
  • A supervised learning algorithm learns from the labeled values using information such as the day of the week, the season, the company's financial data, the industry, etc. After the algorithm has found the best pattern it can, it uses that pattern to make predictions.
  • In unsupervised learning, data points have no labels associated with them. Instead, the goal of unsupervised learning is to identify and explore regularities and dependencies in data, e.g., the structure of the underlying data distributions.
  • The quality of a structure is measured by a cost function, which is usually minimized to infer the optimal parameters characterizing the hidden structure in the data.
  • Reliable and robust inference requires a guarantee that the extracted structures are typical for the data source, e.g., similar structures have to be extracted from a second sample set of the same data source.
  • Reinforcement learning maps situations to actions to maximize a scalar reward or reinforcement signal.
  • The learner is not told directly which actions to take, but instead must discover which actions yield the best rewards by trial and error.
  • An action may affect not only the immediate reward, but also the next situation, and consequently all subsequent rewards.
  • Trial and error search, and delayed reward, are two important distinguishing characteristics of reinforcement learning.
  • Supervised learning algorithms learn from a known dataset and thereafter make predictions.
  • The training dataset includes input data together with the response values it produces.
  • Supervised learning algorithms are used to build predictive models that generate responses to new data. Larger training datasets generally yield better prediction models.
  • Supervised learning includes classification, in which the data must be separated into classes, and regression, in which the response is continuous.
  • Common classification algorithms include support vector machines (SVM), neural networks, Naïve Bayes classifiers, and decision trees.
  • Common regression algorithms include linear regression, nonlinear regression, generalized linear models, decision trees, and neural networks.
  • Method embodiments of the present invention improve the training and performance of predictive models included in artificial intelligence machines.
  • A first method of operating an artificial intelligence machine produces predictive model language documents describing improved predictive models that generate better business decisions from raw data record inputs.
  • A second method of operating an artificial intelligence machine, including processors for predictive model algorithms, produces and outputs better business decisions from raw data record inputs. Both methods enrich the raw data records fed to their processors by deleting data fields whose data values have little benefit in decision making, and by deriving and adding new data fields, from information sources then available, that do benefit the decision making of the artificial intelligence machine through improved accuracies of prediction.
  • FIG. 1 is a flowchart of a method embodiment of the present invention that provides user-service consumers with data science as-a-service operating on artificial intelligence machines;
  • FIG. 2 is a flowchart diagram of an algorithm for triple data encryption standard encryption and decryption as used in the method of FIG. 1 ;
  • FIG. 3A is a flowchart diagram of an algorithm for data cleanup as used in the method of FIG. 1 ;
  • FIG. 3B is a flowchart diagram of an algorithm for replacing a numeric value as used in the method of FIG. 3A ;
  • FIG. 3C is a flowchart diagram of an algorithm for replacing a symbolic value as used in the method of FIG. 3A ;
  • FIG. 4 is a flowchart diagram of an algorithm for building training sets, test sets, and blind sets, and further for down sampling if needed and as used in the method of FIG. 1 ;
  • FIG. 5A is a flowchart diagram of an algorithm for a first part of the data enrichment as used in the method of FIG. 1 ;
  • FIG. 6 is a flowchart diagram of a method of using the PMML Documents of FIG. 1 with an algorithm for the run-time operation of parallel predictive model technologies in artificial intelligence machines;
  • FIG. 7 is a flowchart diagram of an algorithm for the decision engine of FIG. 6 ;
  • FIG. 8 is a flowchart diagram of an algorithm for using ordered rules and thresholds to decide amongst prediction classes
  • FIG. 9 is a flowchart diagram of a method that combines the methods of FIGS. 1-8 and their algorithms to artificial intelligence machines that provide an on-line service for scoring, predictions, and decisions to user-service consumers requiring data science and artificial intelligence services without their being required to invest in and maintain specialized equipment and software;
  • FIG. 10 is a flowchart diagram illustrating an artificial intelligence machine apparatus for executing an algorithm for reconsideration of an otherwise final adverse decision, for example, in a payment authorization system a transaction request for a particular amount $X has already been preliminarily “declined” according to some other decision model;
  • FIG. 11 is a flowchart diagram of an algorithm for the operational use of smart agents in artificial intelligence machines
  • FIGS. 12-29 provide greater detail regarding the construction and functioning of algorithms that are employed in FIGS. 1-11 ;
  • FIG. 12 is a schematic diagram of a neural network architecture used in a model
  • FIG. 13 is a diagram of a single neuron in a neural network used in a model
  • FIG. 14 is a flowchart of an algorithm for training a neural network
  • FIG. 15 is an example illustrating a table of distance measures that is used in a neural network training process
  • FIG. 16 is a flowchart of an algorithm for propagating an input record through a neural network
  • FIG. 17 is a flowchart of an algorithm for updating a training process of a neural network
  • FIG. 18 is a flowchart of an algorithm for creating intervals of normal values for a field in a training table
  • FIG. 19 is a flowchart of an algorithm for determining dependencies between each field in a training table
  • FIG. 20 is a flowchart of an algorithm for verifying dependencies between fields in an input record
  • FIG. 21 is a flowchart of an algorithm for updating a smart-agent technology
  • FIG. 22 is a flowchart of an algorithm for generating a data mining technology to create a decision tree based on similar records in a training table
  • FIG. 23 is an example illustrating a decision tree for a database maintained by an insurance company to predict the risk of an insurance contract based on the type of a car and the age of its driver;
  • FIG. 24 is a flowchart of an algorithm for generating a case-based reasoning technology to find a case in a database that best resembles a new transaction;
  • FIG. 25 is an example illustrating a table of global similarity measures used by a case-based reasoning technology
  • FIG. 26 is an example illustrating a table of local similarity measures used by a case-based reasoning technology
  • FIG. 27 is an example illustrating a rule for use with a rule-based reasoning technology
  • FIG. 28 is an example illustrating a fuzzy rule to specify if a person is tall
  • FIG. 29 is a flowchart of an algorithm for applying rule-based reasoning, fuzzy logic, and constraint programming to assess the normality/abnormality of, and classify, a transaction or an activity
  • FIG. 30 is a flowchart diagram of an algorithm executed by an apparatus needed to implement a method embodiment of the present invention for improving predictive model training and performance by data enrichment of transaction records.
  • Computer-implemented method embodiments of the present invention provide an artificial intelligence and machine-learning service that is delivered on-demand to user-service consumers, their clients, and other users through network servers.
  • The methods are typically implemented with special algorithms executed by computer apparatus and delivered on non-transitory storage media to the providers and user-service consumers, who then sell or use the service themselves.
  • Users in occasional or even regular need of artificial intelligence and machine learning prediction technologies can get the essential data-science services required on the Cloud from an appropriate provider, instead of installing specialized hardware and maintaining their own software. Users are thereby freed from needing to operate and manage complex software and hardware.
  • The intermediaries manage user access to their particular applications, including quality, security, availability, and performance.
  • FIG. 1 represents a predictive model learning method 100 that provides artificial intelligence and machine learning as-a-service by generating predictive models from service-consumer-supplied training data input records.
  • A computer file 102 is first hashed or encrypted by a triple-DES algorithm or similar protection. It is also possible to send a non-encrypted file through an encrypted channel: users of the platform would upload their data over SSL/TLS from a browser, or with SCP or SFTP from a command-line interface. The file is then received by a network server from a service consumer needing predictive models.
  • the records 102 received represent an encryption of individual supervised and/or unsupervised records each comprising a predefined plurality of predefined data fields that communicate data values, and structured and unstructured text. Such text often represents that found in webpages, blogs, automated news feeds, etc., and very often such contains errors and inconsistencies.
  • Structured text has an easily digested form and unstructured text does not.
  • Text mining can use a simple bag-of-words model, e.g., a count of how many times each word occurs.
  • Alternatively, it can use more complex approaches that pull the context from language structures, e.g., the metadata of a post on Twitter, where the unstructured data is the text of the post.
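The bag-of-words idea mentioned above can be illustrated with a minimal sketch. The tokenizing rule (lowercase runs of letters, digits, and apostrophes) and the function name `bag_of_words` are assumptions made for illustration; the source does not specify how words are delimited.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count how many times each lowercased word occurs in the text."""
    # Assumed tokenizer: runs of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


counts = bag_of_words("The card was declined. The card was reissued.")
print(counts.most_common(3))
```

Word order and context are discarded by this model, which is why the more complex, context-aware approaches are listed as an alternative.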
  • These records 102 are decrypted in a step 104 with an apparatus for executing a decoding algorithm, e.g., a standard triple-DES device that uses three keys.
  • An example is illustrated in FIG. 2.
  • a series of results are transformed into a set of non-transitory, raw-data records 106 that are collectively stored in a machine-readable storage mechanism.
  • a step 108 cleans up and improves the integrity of the data stored in the raw-data records 106 with an apparatus for executing a data integrity analysis algorithm.
  • An example is illustrated in FIGS. 3A, 3B, and 3C.
  • Step 108 compares and corrects any data values in each data field according to user-service consumer preferences like min, max, average, null, and default, and a predefined data dictionary of valid data values.
  • Step 108 discerns the context of the structured and unstructured text with an apparatus for executing a contextual dictionary algorithm.
  • Step 108 transforms each result into a set of flat-data records 110 that are collectively stored in a machine-readable storage mechanism.
  • Method 108 improves the training of predictive models by converting and transforming a variety of inconsistent and incoherent supervised and unsupervised training data for predictive models received by a network server as electronic data files, and storing that in a computer data storage mechanism. It then transforms these into another single, error-free, uniformly formatted record file in computer data storage with an apparatus for executing a data integrity analysis algorithm that harmonizes a range of supervised and unsupervised training data into flat-data records in which every field of every record file is modified to be coherent and well-populated with information.
  • the data values in each data field in the inconsistent and incoherent supervised and unsupervised training data are compared and corrected according to a user-service consumer preference and a predefined data dictionary of valid data values.
  • An apparatus for executing an algorithm substitutes data values in the data fields of incoming supervised and unsupervised training data with at least one value representing a minimum, a maximum, a null, an average, and a default.
  • The context of any text included in the inconsistent and incoherent supervised and unsupervised training data is discerned, recognized, detected, and discriminated with an apparatus for executing a contextual dictionary algorithm that employs a thesaurus of alternative contexts of ambiguous words to find a common context denominator, and to then record the context determined into the computer data storage mechanism for later access by a predictive model.
  • Data cleaning herein deals with detecting and removing errors and inconsistencies from data in order to improve the quality of data.
  • Data quality problems are present in single data collections, such as files and databases, or multiple data sources. For example,
  • a test is made to see if a number of records 114 in the set of flat-data records 110 exceeds a predefined threshold, e.g., about one hundred million.
  • the particular cutoff number to use is inexact and is empirically determined by what produces the best commercial efficiencies.
  • Step 116 samples a portion of the set of flat-data records 110 .
  • An example is illustrated in FIG. 4 .
  • Step 116 stores a set of samples 118 in a machine-readable storage mechanism for use in the remaining steps.
  • Step 116 consequently employs an apparatus for executing a special sampling algorithm that limits the number of records that must be processed by the remaining steps, but at the same time preserves important training data. The details are described herein in connection with FIG. 4 .
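As a rough sketch of the threshold test and sampling step described above: the details of the actual sampling algorithm are given with FIG. 4 and are not reproduced here, so the policy below (keep every record flagged as important, then fill the remaining room with a random sample) is only an assumption. The names `downsample` and `is_important` are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def downsample(records: Sequence[dict], threshold: int,
               is_important: Callable[[dict], bool], seed: int = 0) -> List[dict]:
    """Return all records if under the threshold, otherwise a reduced sample."""
    if len(records) <= threshold:
        return list(records)
    # Assumed policy: never discard records flagged as important.
    keep = [r for r in records if is_important(r)]
    rest = [r for r in records if not is_important(r)]
    room = max(threshold - len(keep), 0)
    rng = random.Random(seed)
    return keep + rng.sample(rest, min(room, len(rest)))


records = [{"id": i, "fraud": i % 50 == 0} for i in range(1000)]
sample = downsample(records, threshold=100, is_important=lambda r: r["fraud"])
print(len(sample), sum(r["fraud"] for r in sample))
```

In practice the threshold would be on the order of the one hundred million records mentioned in the text; the small numbers here only keep the example quick to run.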
  • a modeling data 120 is given a new, amplified texture by a step 122 for enhancing, enriching, and concentrating the sampled or unsampled data stored in the flat-data records with an apparatus for executing a data enrichment algorithm.
  • An example apparatus is illustrated in FIG. 4 , which outputs training sets 420 , 421 , and 440 ; and test sets 422 , 423 , and 442 ; and blind sets 424 , 425 , and 444 derived from either the flat data 110 or sampled data 118 .
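A simple way to picture the division into training, test, and blind sets is a shuffled three-way split. The 60/20/20 proportions and the function name `split_records` below are assumptions for illustration only; the source does not state the proportions used in FIG. 4.

```python
import random
from typing import List, Sequence, Tuple


def split_records(records: Sequence, train_pct: int = 60, test_pct: int = 20,
                  seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle the records and split them into training, test, and blind sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_test = len(shuffled) * test_pct // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    blind = shuffled[n_train + n_test:]  # held back until final evaluation
    return train, test, blind


train, test, blind = split_records(range(100))
print(len(train), len(test), len(blind))
```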
  • Such step 122 removes data that may exist in particular data fields that is less important to building predictive models. Entire data fields themselves are removed here that are predetermined to be unavailing to building good predictive models that follow.
  • Step 122 calculates and combines any data it has into new data fields that are predetermined to be more important to building such predictive models. It converts text with an apparatus for executing a context mining algorithm, as suggested by FIG. 6 . Even more details of this are suggested in my U.S. patent application Ser. No. 14/613,383, filed Feb. 4, 2015, and titled, ARTIFICIAL INTELLIGENCE FOR CONTEXT CLASSIFIER. Step 122 then transforms a plurality of results from the execution of these algorithms into a set of enriched-data records 124 that are collectively stored in a machine-readable storage mechanism.
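The two enrichment operations described above, removing unhelpful fields and deriving new ones, can be sketched as follows. The field names (`amount`, `avg_amount_30d`, `terminal_color`) and the derived ratio are hypothetical examples, not fields named in the source.

```python
from typing import Callable, Dict, Set


def enrich(record: dict, drop_fields: Set[str],
           derived: Dict[str, Callable[[dict], object]]) -> dict:
    """Remove low-value fields and add fields computed from existing ones."""
    out = {k: v for k, v in record.items() if k not in drop_fields}
    for name, compute in derived.items():
        out[name] = compute(record)
    return out


raw = {"amount": 120.0, "avg_amount_30d": 40.0, "terminal_color": "gray"}
enriched = enrich(
    raw,
    drop_fields={"terminal_color"},
    derived={"amount_ratio": lambda r: r["amount"] / r["avg_amount_30d"]},
)
print(enriched)
```

Which fields to drop and which to derive would be predetermined for the application, as the text states; the sketch only shows the mechanics.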
  • a step 126 uses the set of enriched-data records 124 to build a plurality of smart-agent predictive models for each entity represented.
  • Step 126 employs an apparatus for executing a smart-agent building algorithm. The details of this are shown in FIG. 6 . Further related information is included in my U.S. Pat. No. 7,089,592 B2, issued Aug. 8, 2006, titled, SYSTEMS AND METHODS FOR DYNAMIC DETECTION AND PREVENTION OF ELECTRONIC FRAUD, which is incorporated herein by reference. (Herein, Adjaoute '592.) Special attention should be placed on FIGS. 11-30 and the descriptions of smart-agents in connection with FIG. 21 and the smart-agent technology in Columns 16 - 18 .
  • Each field or attribute in a data record is represented by a corresponding smart-agent.
  • Each smart-agent representing a field will build what-is-normal (normality) and what-is-abnormal (abnormality) metrics regarding other smart-agents.
  • The apparatus for creating smart-agents is either supervised or unsupervised.
  • When supervised, an expert provides information about each domain.
  • Each numeric field is characterized by a list of intervals of normal values, and each symbolic field is characterized by a list of normal values. It is possible for a field to have only one interval. If there are no intervals for an attribute, the system apparatus can skip testing the validity of its values, e.g., when an event occurs.
  • A doctor can give the temperature of the human body as within an interval [35° C.: 41° C.], and the hair colors can be {black, blond, red}.
  • An unsupervised learning process uses the following algorithm:
  • $\Theta_{min}$ represents the minimum number of elements an interval must include. This means that an interval will only be taken into account if it encapsulates enough values, so that its values are considered normal because they are frequent;
  • The system apparatus defines two parameters that can be modified:
  • $\Theta_{min}$ is computed with the following method:
  • $\Theta_{min} = f_{Imin} \times (\text{number of records in the table})$.
  • $\Theta_{dist}$ represents the maximum width of an interval. This prevents the system apparatus from regrouping numeric values that are too disparate. For an attribute $a$, let $min_a$ be the smallest value of $a$ over the whole table and $max_a$ the biggest one. Then:
  • Each field is verified against the intervals of normal values that the system created, or that were fixed by an expert. It first checks that at least one interval exists. If not, the field is not verified. If so, the field's value is tested against the intervals, and a warning is generated for the field if the value falls outside all of them.
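The minimum-count rule and the field verification just described can be sketched in a few lines. The function names and the three return labels are assumptions for illustration; the interval-building step itself is not shown because its details are not fully given here.

```python
from typing import List, Sequence, Tuple


def theta_min(f_i_min: float, n_records: int) -> float:
    """Minimum number of elements an interval must include."""
    return f_i_min * n_records


def verify_field(value: float, intervals: Sequence[Tuple[float, float]]) -> str:
    """Check a numeric value against a field's intervals of normal values."""
    if not intervals:
        return "skipped"      # no intervals: the field is not verified
    if any(lo <= value <= hi for lo, hi in intervals):
        return "ok"
    return "warning"          # value lies outside every normal interval


body_temperature: List[Tuple[float, float]] = [(35.0, 41.0)]
print(theta_min(0.01, 5000))
print(verify_field(37.2, body_temperature), verify_field(43.0, body_temperature))
```

The body-temperature interval reuses the expert-supplied example from the text; a symbolic field would be checked the same way against a set of normal values instead of intervals.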
  • The default value for $\Theta_x$ is 1%: the system apparatus will only consider the significant values of each attribute.
  • The default value for $\Theta_{xy}$ is 85%: the system apparatus will only consider the significant relations found.
  • a first level is a hash of the attribute's name (Att1 in eq); a second level is a hash for each attribute the values that imply some correlations (v1 in eq); a third level is a hash of the names of the attributes with correlations (Att2 in eq) to the first attribute; a fourth and last level has values of the second attribute that are correlated (v2 in eq).
  • Each leaf represents a relation.
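One way to picture the four-level structure is as nested hash maps keyed by first attribute, first value, second attribute, and second value. In this sketch each leaf holds a co-occurrence count; that choice, and the function name `build_relations`, are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def build_relations(records: Iterable[Mapping[str, str]]):
    """Build relations[att1][v1][att2][v2] -> number of records with both values."""
    relations = defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(int))))
    for record in records:
        for att1, v1 in record.items():
            for att2, v2 in record.items():
                if att1 != att2:
                    relations[att1][v1][att2][v2] += 1
    return relations


rows = [
    {"car": "sports", "risk": "high"},
    {"car": "sports", "risk": "high"},
    {"car": "sedan", "risk": "low"},
]
rel = build_relations(rows)
print(rel["car"]["sports"]["risk"]["high"])
```

Because the leaves are plain counters, a new event can be folded in by incrementing the affected leaves without rebuilding the structure.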
  • The system apparatus stores the cardinalities $c_{xi}$, $c_{yj}$, and $c_{ij}$. This allows the system apparatus to incrementally update the relations during its lifetime. Also it gives:
  • the system apparatus incrementally learns with new events:
  • a step 127 selects amongst a plurality of smart-agent predictive models and updates a corresponding particular smart-agent's real-time profile and long-term profile.
  • profiles are stored in a machine-readable storage mechanism with the data from the enriched-data records 124 .
  • Each corresponds to a transaction activity of a particular entity.
  • Step 127 employs an apparatus for executing a smart-agent algorithm that compares a current transaction, activity, behavior to previously memorialized transactions, activities and profiles such as illustrated in FIG. 7 .
  • Step 127 then transforms and stores a series of results as smart-agent predictive model in a markup language document in a machine-readable storage mechanism.
  • Such smart-agent predictive model markup language documents are XML types and best communicated in a registered file extension format, “.IFM”, marketed by Brighterion, Inc. (San Francisco, Calif.).
  • Steps 126 and 127 can both be implemented by the apparatus of FIG. 11 that executes algorithm 1100 .
  • a step 128 exports the .IFM-type smart-agent predictive model markup language documents to a user-service consumer, e.g., using an apparatus for executing a data-science-as-a-service algorithm from a network server, as illustrated in FIGS. 6 and 9 .
  • Method 100 further includes a step 130 for building a data mining predictive model (e.g. 612 , FIG. 6 ) by applying the same data from the samples of the enriched-data records 124 as an input to an apparatus for generating a data mining algorithm.
  • a data-tree result 131 is transformed by a step 132 into a data-mining predictive model markup language document that is stored in a machine-readable storage mechanism.
  • The predictive model markup language (PMML) is an XML-based file format developed by the Data Mining Group (dmg.org) to provide a way for applications to describe and exchange models produced by data mining and machine learning algorithms. It supports common models such as logistic regression and feed-forward neural networks. Further information related to data mining is included in Adjaoute '592. Special attention should be placed on FIGS. 11-30 and the descriptions of the data-mining technology in Columns 18-20.
  • Method 100 further includes an alternative step 134 for building a neural network predictive model (e.g. 613 , FIG. 6 ) by applying the same data from the samples of the enriched-data records 124 as an input to an apparatus for generating a neural network algorithm.
  • a nodes/weight result 135 is transformed by a step 136 into a neural-network predictive model markup language document that is stored in a machine-readable storage mechanism.
  • Further information related to neural networks is included in Adjaoute '592. Special attention should be placed on FIGS. 13-15 and the descriptions of the neural network technology in Columns 14 - 16 .
  • Method 100 further includes an alternative step 138 for building a case-based-reasoning predictive model (e.g. 614 , FIG. 6 ) by applying the same data from the samples of the enriched-data records 124 as an input to an apparatus for generating a case-based reasoning algorithm.
  • a cases result 139 is transformed into a case-based-reasoning predictive model markup language document 140 that is stored in a machine-readable storage mechanism. Further information related to case-based-reasoning is included in Adjaoute '592. Special attention should be placed on FIGS. 24-25 and the descriptions of the case-based-reasoning technology in Columns 20 - 21 .
  • Method 100 further includes an alternative step 142 for building a clustering predictive model (e.g. 615 , FIG. 6 ) by applying the same data from the samples of the enriched-data records 124 as an input to an apparatus for generating a clustering algorithm.
  • a clusters result 143 is transformed by a step 144 into a clustering predictive model markup language document that is stored in a machine-readable storage mechanism.
  • Clustering here involves the unsupervised classification of observations, data items, feature vectors, and other patterns into groups.
  • In supervised learning, a collection of labeled patterns is used to determine class descriptions which, in turn, can then be used to label a new pattern.
  • In unsupervised clustering, the challenge is in grouping a given collection of unlabeled patterns into meaningful clusters.
  • Typical pattern clustering algorithms involve the following steps:
  • Pattern representation extraction and/or selection
  • Feature selection algorithms identify the most effective subsets of the original features to use in clustering.
  • Feature extraction makes transformations of the input features into new relevant features. Either one or both of these techniques is used to obtain an appropriate set of features to use in clustering.
  • Pattern representation refers to the number of classes and available patterns to the clustering algorithm. Pattern proximity is measured by a distance function defined on pairs of patterns.
  • A clustering is either a partition of the data into exclusive groups or a fuzzy clustering. Using fuzzy logic, a fuzzy clustering method assigns degrees of membership in several clusters to each input pattern. Both similarity measures and dissimilarity measures are used here in creating clusters.
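To make "degrees of membership in several clusters" concrete, the sketch below uses the membership formula from the well-known fuzzy c-means method with Euclidean distance. The source does not name a particular fuzzy clustering algorithm, so this choice, the fuzzifier `m = 2`, and the function name are assumptions.

```python
from typing import List, Sequence


def fuzzy_memberships(point: Sequence[float],
                      centers: Sequence[Sequence[float]],
                      m: float = 2.0) -> List[float]:
    """Return one membership degree per cluster center; the degrees sum to 1."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
             for center in centers]
    if any(d == 0.0 for d in dists):
        # The point coincides with a center: full membership goes there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]


print(fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)]))
```

An exclusive (hard) partition is the limiting case in which each pattern has membership 1 in exactly one cluster and 0 in all others.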
  • Method 100 further includes an alternative step 146 for building a business rules predictive model (e.g. 616 , FIG. 6 ) by applying the same data from the samples of the enriched-data records 124 as an input to an apparatus for generating a business rules algorithm.
  • a rules result 147 is transformed by a step 148 into a business rules predictive model markup language document that is stored in a machine-readable storage mechanism. Further information related to rule-based-reasoning is included in Adjaoute '592. Special attention should be placed on FIG. 27 and the descriptions of the rule-based-reasoning technology in Columns 20 - 21 .
  • Each of Documents 128 , 132 , 136 , 140 , 144 , and 146 is a tangible machine-readable transformation of a trained model and can be sold, transported, installed, used, adapted, maintained, and modified by a user-service consumer or provider.
  • FIG. 2 represents an apparatus 200 for executing an encryption algorithm 202 and a matching decoding algorithm 204 , e.g., a standard triple-DES device that uses two keys.
  • the Data Encryption Standard (DES) is a widely understood and once predominant symmetric-key algorithm for the encryption of electronic data.
  • DES is the archetypal block cipher—an algorithm that takes data and transforms it through a series of complicated operations into another cipher text bit string of the same length. In the case of DES, the block size is 64 bits.
  • DES also uses a key to customize the transformation, so that decryption can supposedly only be performed by those who know the particular key used to encrypt.
  • the key ostensibly consists of 64 bits; however, only 56 of these are actually used by the algorithm. Eight bits are used solely for checking parity, and are thereafter discarded. Hence the effective key length is 56 bits.
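The relationship between the 64-bit key and its 56 effective bits can be shown with a short sketch. It relies on the standard DES convention that the least significant bit of each key byte is an odd-parity bit; the function names are hypothetical, and no encryption is performed here.

```python
def has_odd_parity(key: bytes) -> bool:
    """Check that each of the 8 key bytes contains an odd number of 1 bits."""
    if len(key) != 8:
        raise ValueError("a DES key is 8 bytes (64 bits) long")
    return all(bin(byte).count("1") % 2 == 1 for byte in key)


def effective_key(key: bytes) -> int:
    """Drop the parity bit of each byte and pack the remaining 7 bits per byte."""
    value = 0
    for byte in key:
        value = (value << 7) | (byte >> 1)
    return value  # fits in 8 * 7 = 56 bits


key = bytes.fromhex("0123456789ABCDEF")
print(has_odd_parity(key), effective_key(key) < 2 ** 56)
```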
  • Triple DES is a common name in cryptography for the Triple Data Encryption Algorithm (TDEA or Triple DEA) symmetric-key block cipher, which applies the Data Encryption Standard (DES) cipher algorithm three times to each data block.
  • the original DES cipher's key size of 56-bits was generally sufficient when that algorithm was designed, but the availability of increasing computational power made brute-force attacks feasible.
  • Triple DES provides a relatively simple method of increasing the key size of DES to protect against such attacks, without the need to design a completely new block cipher algorithm.
  • algorithms 202 and 204 transform data in separate records in storage memory back and forth between private data (P) and triple encrypted data (C).
  • FIGS. 3A , 3 B, and 3 C represent an algorithm 300 for cleaning up the raw data 106 in stored data records, field-by-field, record-by-record. What is meant by “cleaning up” is that inconsistent, missing, and illegal data in each field are removed or reconstituted. Some types of fields are very restricted in what is legal or allowed.
  • a record 302 is fetched from the raw data 304 and for each field 306 a test 306 sees if the data value reported is numeric or symbolic. If numeric, a data dictionary 308 is used by a step 310 to see if such data value is listed as valid. If symbolic, another data dictionary 312 is used by a step 314 to see if such data value is listed as valid.
  • a test 316 is used to branch if not valid to a step 318 that replaces the numeric value.
  • FIG. 3B illustrates such in greater detail.
  • a test 320 is used to check if the numeric value is within an acceptable range. If not, step 318 is used to replace the numeric value.
  • a test 322 is used to branch if not valid to a step 324 that replaces the symbolic value.
  • FIG. 3C illustrates such in greater detail.
  • a test 326 is used to check if the symbolic value is an allowable one. If yes, a step 328 checks if the value is allowed in a set. If yes, then a return 330 proceeds to the next field. If no, step 324 replaces the symbolic value.
  • If in step 326 the symbolic value in the field is not an allowed value, a step 332 asks if the present field is a zip code field. If yes, a step 334 asks if it's a valid zip code. If yes, the processing moves on to the next field with step 330. Otherwise, it calls on step 324 to replace the symbolic value.
  • If the field is not a zip code field, a step 338 asks if the field is reserved for telephone and fax numbers. If yes, a step 340 asks if it's a valid telephone or fax number. If yes, the processing moves on to the next field with step 330. Otherwise, it calls on step 324 to replace the symbolic value.
  • If in step 338 the field is not a field reserved for telephone and fax numbers, then a step 344 asks if the present field is reserved for dates and time. If yes, a step 346 asks if it's a date or time. If yes, the processing moves on to the next field with step 330. Otherwise, it calls on step 324 to replace the symbolic value.
  • If the field is not reserved for dates and time, a step 350 applies a Smith-Waterman algorithm to the data value.
  • the Smith-Waterman algorithm does a local-sequence alignment. It's used to determine if there are any similar regions between two strings or sequences. For example, to recognize “Avenue” as being the same as “Ave.”; and “St.” as the same as “Street”; and “Mr.” as the same as “Mister”. A consistent, coherent terminology is then enforceable in each data field without data loss.
  • the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure without looking at the total sequence. Then the processing moves on to a next field with step 330 .
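  • The local alignment described above can be sketched as follows. This is a minimal illustration only, not the scoring used by the embodiments: the match, mismatch, and gap scores, the function names, and the normalization by the shorter string are all assumptions made for the example.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local-alignment score between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score of any local alignment ending at a[i-1], b[j-1].
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at zero is what makes the alignment local rather than global.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization: score divided by the best score the shorter string could reach."""
    a, b = a.lower().strip("."), b.lower().strip(".")
    shorter = min(len(a), len(b))
    return smith_waterman(a, b, match=2) / (2 * shorter) if shorter else 0.0
```

  • With these assumed scores, similarity("Ave.", "Avenue") and similarity("St.", "Street") both evaluate to 1.0. Abbreviations that drop interior letters score lower, so any acceptance threshold would need to be tuned per field.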
  • FIG. 3B represents what happens inside step 318 , replace numeric value.
  • the numeric value to use as a replacement depends on any flags or preferences that were set to use a default, the average, a minimum, a maximum, or a null.
  • a step 360 tests if user preferences were set to use a default value. If yes, then a step 361 sets a default value and returns to do a next field in step 330 .
  • a step 362 tests if user preferences were set to use an average value. If yes, then a step 361 sets an average value and returns to do the next field in step 330 .
  • a step 364 tests if user preferences were set to use a minimum value. If yes, then a step 361 sets a minimum value and returns to do the next field in step 330.
  • a step 366 tests if user preferences were set to use a maximum value. If yes, then a step 361 sets a maximum value and returns to do the next field in step 330 .
  • a step 368 tests if user preferences were set to use a null value. If yes, then a step 361 sets a null value and returns to do the next field in step 330 . Otherwise, a step 370 removes the record and moves on to the next record.
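  • The preference cascade of FIG. 3B can be sketched as a single function. The preference strings, the `DROP_RECORD` sentinel, and the choice to compute the average, minimum, and maximum over the valid values already seen for the field are assumptions made for illustration; the description does not state how those statistics are obtained.

```python
from statistics import mean
from typing import Optional, Sequence

# Hypothetical sentinel: no preference matched, so the whole record is removed (step 370).
DROP_RECORD = object()


def replace_numeric(valid_values: Sequence[float], preference: Optional[str], default: float = 0.0):
    """Choose a replacement for an invalid numeric value according to the user preference."""
    if preference == "default":   # step 360
        return default
    if preference == "average":   # step 362
        return mean(valid_values)
    if preference == "minimum":   # step 364
        return min(valid_values)
    if preference == "maximum":   # step 366
        return max(valid_values)
    if preference == "null":      # step 368
        return None
    return DROP_RECORD            # step 370
```

  • A caller would check for `DROP_RECORD` and discard the record, otherwise write the returned value into the field and move on to the next field.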
  • FIG. 3C represents what happens inside step 324 , replace symbolic value.
  • the symbolic value to use as a replacement depends on if flags were set to use a default, the average, or null.
  • a step 374 tests if user preferences were set to use a default value. If yes, then a step 375 sets a default value and returns to do the next field in step 330 .
  • a step 376 tests if user preferences were set to use an average value. If yes, then a step 377 sets an average value and returns to do the next field in step 330 .
  • a step 378 tests if user preferences were set to use a null value. If yes, then a step 379 sets a null value and returns to do the next field in step 330 . Otherwise, a step 380 removes the record and moves on to a next record.
  • FIG. 4 represents the apparatus for executing sampling algorithm 116 .
  • a sampling algorithm 400 takes cleaned, raw-data 402 and asks in step 404 if the data are supervised. If so, a step 406 creates one data set for each class, “C1” 408 through “Cn” 410. Stratified selection is used if needed.
  • Each application carries its own class set, e.g., stocks portfolio managers use buy-sell-hold classes; loans managers use loan interest rate classes; risk assessment managers use fraud-no_fraud-suspicious classes; marketing managers use product-category-to-suggest classes; and, cybersecurity uses normal_behavior-abnormal_behavior classes. Other classes are possible and useful.
  • a step 412 and 413 asks if the class is abnormal (e.g., uncharacteristic). If not, a step 414 and 415 down-sample and produce sampled records of the class 416 and 417 . Then a step 418 and 419 splits the remaining data into separate training sets 420 and 421 , separate test sets 422 and 423 , and separate blind sets 424 and 425 .
  • If the data are unsupervised, a step 430 creates one data set with all the records and stores them in a memory device 432.
  • a step 434 down-samples all of them and stores those in a memory device 436 .
  • a step 438 splits the remaining data into a separate training set 440, a separate test set 442, and a separate blind set 444.
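  • The supervised branch of FIG. 4 can be sketched as below. The down-sampling rate, the 60/20/20 split, the fixed seed, and all function names are assumptions chosen for the example; the description leaves these as configurable. Classes marked abnormal are kept whole, and every other class is down-sampled before the three-way split.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def down_sample(records: Sequence, rate: float, rng: random.Random) -> List:
    """Keep a random fraction of the records (at least one if any exist)."""
    if not records:
        return []
    k = max(1, round(len(records) * rate))
    return rng.sample(list(records), k)


def split_three(records: Sequence, rng: random.Random,
                fractions: Tuple[float, float] = (0.6, 0.2)) -> Tuple[List, List, List]:
    """Shuffle, then split into training, test, and blind sets (blind gets the remainder)."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_test = int(len(shuffled) * fractions[1])
    return shuffled[:n_train], shuffled[n_train:n_train + n_test], shuffled[n_train + n_test:]


def sample_supervised(by_class: Dict[str, list], abnormal: Set[str],
                      rate: float = 0.5, seed: int = 0) -> Dict[str, Tuple[List, List, List]]:
    """Per class: down-sample unless the class is abnormal, then split three ways."""
    rng = random.Random(seed)
    result = {}
    for label, records in by_class.items():
        kept = list(records) if label in abnormal else down_sample(records, rate, rng)
        result[label] = split_three(kept, rng)
    return result
```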
  • FIGS. 5A and 5B together represent an apparatus 500 with at least one processor for executing a specialized data enrichment algorithm that works both to enrich the profiling criteria for smart-agents and to enrich the data fields for all the other general predictive models. They all are intended to work together in parallel with the smart-agents in operational use.
  • a plurality of training sets, herein 502 and 502 , for each class C 1 . . . Cn are input for each data field of a record in a step 506 .
  • Such supervised and unsupervised training sets correspond to training sets 420 , 421 , and 440 ( FIG. 4 ). More generally, flat data 110 , 120 and sampled data 118 ( FIG. 1 ).
  • a step 508 asks if there are too many distinct data values, e.g., more than a threshold data value stored in memory. For example, data that is so random as to reveal no information and nothing systemic. If so, a step 510 excludes that field and thereby reduces the list of fields.
  • a step 512 asks if there is a single data value. Again, if so such field is not too useful in later steps, and step 510 excludes that field as well. Otherwise, a step 514 asks if the Shannon entropy is too small, e.g., less than a threshold data value stored in memory. The Shannon entropy is calculable using a conventional formula:
    $$H = -\sum_{i=1}^{n} p_i \log_2 p_i$$
    where $p_i$ is the relative frequency of the i-th of the n distinct data values observed in the field.
  • If so, step 510 excludes that field. Otherwise, a step 516 reduces the number of fields in the set of fields carried forward as those that actually provide useful information.
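  • The three exclusion tests of steps 508, 512, and 514 can be sketched as follows. The function names and the two threshold parameters are hypothetical, and base-2 logarithms are assumed, so the entropy is measured in bits.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(values: Sequence) -> float:
    """Entropy in bits of the empirical distribution of the values in one field."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def keep_field(values: Sequence, max_distinct: int, min_entropy: float) -> bool:
    """Return False when the field should be excluded (step 510)."""
    distinct = len(set(values))
    if distinct > max_distinct:   # step 508: too many distinct values
        return False
    if distinct == 1:             # step 512: a single repeating value
        return False
    return shannon_entropy(values) >= min_entropy   # step 514: entropy too small
```

  • For example, a field holding "a" in 99 records and "b" in one has an entropy of roughly 0.08 bits and would be dropped under a 0.5-bit threshold, while an evenly split two-valued field has exactly 1 bit and would be kept.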
  • a step 517 asks if the field type under inspection at that instant is symbolic or numeric. If symbolic, a step 518 provides AI behavior grouping. For example, colors or the names of boys. Otherwise, a step 520 does a numeric fuzzification in which a numeric value is turned into a membership of one or more fuzzy sets. Then a step 522 produces a reduced set of transformed fields.
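  • Numeric fuzzification, as in step 520, can be illustrated with triangular membership functions. The triangular shape, the set names, and the breakpoints below are assumptions for the sketch; the description only states that a numeric value becomes a membership of one or more fuzzy sets.

```python
from typing import Dict, Tuple


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Membership rises linearly from left to peak, then falls linearly to right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets for a transaction-amount field: (left, peak, right).
FUZZY_SETS: Dict[str, Tuple[float, float, float]] = {
    "low": (-100.0, 0.0, 100.0),
    "medium": (0.0, 100.0, 200.0),
    "high": (100.0, 200.0, 300.0),
}


def fuzzify(x: float, sets: Dict[str, Tuple[float, float, float]] = FUZZY_SETS) -> Dict[str, float]:
    """Turn one numeric value into its degree of membership in every fuzzy set."""
    return {name: triangular(x, *bounds) for name, bounds in sets.items()}
```

  • Under these assumed sets, an amount of 50 belongs half to "low" and half to "medium", which is the kind of graded membership a crisp range test cannot express.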
  • a step 524 asks if the number of criteria or data fields remaining meets a predefined target number. The target number represents a judgment of the optimum spectrum of profiling criteria data fields that will be needed to produce high performance smart-agents and good predictive models.
  • a step 526 outputs a final list of profiling criteria and data fields needed by the smart-agent steps 126 and 127 in FIG. 1 and all the other predictive model steps 130 , 131 , 134 , 135 , 138 , 139 , 142 , 143 , 146 , and 147 .
  • a step 528 begins a process to generate additional profiling criteria and newly derived data fields.
  • a step 530 chooses an aggregation type.
  • a step 532 chooses a time range for a newly derived field or profiling criteria.
  • a step 534 chooses a filter.
  • a step 536 chooses constraints.
  • a step 538 chooses the fields to aggregate.
  • a step 540 chooses a recursive level.
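  • The choices made in steps 530 through 538 can be pictured as a small specification that is evaluated against a record history. This is a simplified sketch: the class name, the record layout with a "timestamp" key, and the set of aggregation types are hypothetical, and the constraints of step 536 and the recursive level of step 540 are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional

# Hypothetical aggregation types (step 530).
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "count": len,
    "sum": sum,
    "max": max,
    "average": lambda xs: sum(xs) / len(xs),
}


@dataclass
class DerivedFieldSpec:
    aggregation: str                                        # step 530
    window: timedelta                                       # step 532: time range
    field: str                                              # step 538: field to aggregate
    record_filter: Optional[Callable[[dict], bool]] = None  # step 534: filter


def derive(spec: DerivedFieldSpec, history: List[dict], now: datetime) -> float:
    """Compute the derived value from records inside the time window that pass the filter."""
    values = [
        r[spec.field]
        for r in history
        if now - spec.window <= r["timestamp"] <= now
        and (spec.record_filter is None or spec.record_filter(r))
    ]
    return float(AGGREGATIONS[spec.aggregation](values)) if values else 0.0
```

  • A spec such as "sum of amount over the last 7 days for online records" then becomes one new column that can be assessed in the quality step that follows.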
  • a step 542 assesses the quality of the newly derived field by importing test set classes C1 . . . Cn 544 and 546. It assesses the profiling criteria and data field quality for large enough coverage in a step 548, the maximum transaction/event false positive rate (TFPR) below a limit in a step 550, the average TFPR below a limit in a step 552, the transaction/event detection rate (TDR) above a threshold in a step 554, the transaction/event review rate (TRR) trend below a threshold in a step 556, the number of conditions below a threshold in a step 560, the number of records above a threshold in a step 562, and an optimal time window in a step 564.
  • If the newly derived profiling criteria or data field passes these quality tests, a step 566 adds it to the list. Otherwise, the newly derived profiling criteria or data field is discarded in a step 568 and the process returns to step 528 to try a new iteration with updated parameters.
  • Thresholds and limits are stored in computer storage memory mechanisms as modifiable digital data values that are non-transitory. Thresholds are predetermined and can be “tuned” later to optimize overall operational performance, for example, by manipulating the data values stored in a computer memory storage mechanism through an administrator's console dashboard. Thresholds are digitally compared to incoming data, or newly derived data, using conventional devices.
  • Once the predictive model technologies have been individually trained by both supervised and unsupervised data and then packaged into a PMML Document, one or more of them can be put to work in parallel to render a risk or a decision score for each new record presented to them.
  • the smart-agent predictive model technology will be employed by a user-consumer. But when more than one predictive model technology is added in to leverage their respective synergies, a decision engine algorithm is needed to single out which predicted class produced in parallel by several predictive model technologies would be the best to rely on.
  • FIG. 6 is a flowchart diagram of a method 600 for using the PMML Documents ( 128 , 132 , 136 , 140 , 144 , and 148 ) of FIG. 1 with an algorithm for the run-time operation of parallel predictive model technologies.
  • Method 600 depends on an apparatus to execute an algorithm to use the predictive technologies produced by method 100 ( FIG. 1 ) and exported as PMML Documents.
  • Method 600 can provide a substantial commercial advantage in a real-time, record-by-record application by a business.
  • One or more PMML Documents 601 - 606 are imported and put to work in parallel as predictive model technologies 611 - 616 to simultaneously predict a class and its confidence in that class for each new record in a raw data record input 618 that are presented to them.
  • a resulting enriched data 624 with newly derived fields in the records is then passed in parallel for simultaneous consideration and evaluation by all the predictive model technologies 611 - 616 present. Each will transform its inputs into a predicted class 631 - 636 and a confidence 641 - 646 stored in a computer memory storage mechanism.
  • a record-by-record decision engine 650 inputs user strategies in the form of flag settings 652 and rules 654 to decide which to output as a prevailing predicted class output 660 and to compute a normalized confidence output 661.
  • Such record-by-record decision engine 650 is detailed here next in FIG. 7 .
  • FIELD OF APPLICATION | OUTPUT CLASSES
    stocks | use class buy, sell, hold, etc.
    loans | use class provide a loan with an interest, or not
    risk | use class fraud, no fraud, suspicious
    marketing | use class category of product to suggest
    cybersecurity | use class normal behavior, abnormal, etc.
  • Method 600 works with at least two of the predictive models from steps 128 , 132 , 136 , 140 , 144 , and 148 (of FIG. 1 ).
  • the predictive models each simultaneously produce a score and a score-confidence level in parallel sets, all from a particular record in a plurality of enriched-data records. These combine into a single result to return to a user-service consumer as a decision.
  • Further information related to combining models is included in Adjaoute '592. Special attention should be placed on FIG. 30 and the description in Column 22 on combining the technologies. There, the neural network, smart-agent, data mining, and case-based reasoning technologies all come together to produce a final decision, such as whether a particular electronic transaction is fraudulent or, in a different application, whether there is a network intrusion.
  • FIG. 7 is a flowchart diagram of an apparatus with an algorithm 700 for the decision engine 650 of FIG. 6 .
  • Algorithm 700 chooses which predicted class 631 - 636 , or a composite of them, should be output as prevailing predicted class 660 .
  • Switches or flag settings 652 are used to control the decision outcome and are fixed by the user-service consumer in operating their business based on the data science embodied in Documents 601 - 606 .
  • Rules 654 too can include business rules like, “always follow the smart agent's predicted class if its confidence exceeds 90%.”
  • a step 702 inspects the rule type then in force.
  • Compiled flag settings rules are fuzzy rules (business rules) developed with fuzzy logic. Fuzzy rules are used to merge the predicted classes from all the predictive models and technologies 631 - 636 and decide on one final prediction, herein, prevailing predicted class 660 .
  • Rules 654 are either manually written by analytical engineers, or they are automatically generated when analyzing the enriched training data 124 ( FIG. 1 ) in steps 126 , 130 , 134 , 138 , 142 , and 146 .
  • If in step 702 it is decided to follow the compiled flag settings rules, a step 704 invokes the compiled flag settings rules and returns with a corresponding decision 706 for output as prevailing predicted class 660.
  • If in step 702 it is decided to follow “smart agents”, then a step 708 invokes the smart agents and returns with a corresponding decision 710 for output as prevailing predicted class 660.
  • If in step 702 it is decided to follow “predefined rules”, then a step 712 asks if the flag settings should be applied first. If not, a step 714 applies a winner-take-all test to all the individual predicted classes 631-636 (FIG. 6). A step tests if one particular class wins. If yes, a step 718 outputs that winner class for output as prevailing predicted class 660.
  • a step 720 applies the flag settings to the individual predicted classes 631-636 (FIG. 6). Then a step 722 asks if there is a winner rule. If yes, a step 724 outputs that winner rule decision for output as prevailing predicted class 660. Otherwise, a step 726 outputs an “otherwise” rule decision for output as prevailing predicted class 660.
  • a step 730 applies the flags to the individual predicted classes 631 - 636 ( FIG. 6 ). Then a step 732 asks if there is a winner rule. If yes, then a step 734 outputs that winner rule decision for output as prevailing predicted class 660 . Otherwise, a step 736 asks if the decision should be winner-take-all. If no, a step 738 outputs an “otherwise” rule decision for output as prevailing predicted class 660 .
  • If yes, a step 740 applies winner-take-all to each of the individual predicted classes 631-636 (FIG. 6). Then a step 742 asks if there is now a winner class. If not, step 738 outputs an “otherwise” rule decision for output as prevailing predicted class 660. Otherwise, a step 744 outputs a winning class decision for output as prevailing predicted class 660.
  • Compiled flag settings rules in step 704 are fuzzy rules, e.g., business rules with fuzzy logic. Such fuzzy rules are targeted to merge the predictions 631 - 636 into one final prediction 660 . Such rules are either written by analytical engineers or are generated automatically by analyses of the training data.
  • Applying the flag settings, as in step 730, uses an algorithm built on a set of ordered rules that indicate how to handle the predictions output by each prediction technology.
  • FIG. 8 illustrates this further.
  • FIG. 8 shows flag settings 800 as a set of ordered rules 801-803 that indicate how to handle each technology prediction 631-636 (FIG. 6). For each technology 611-616, there is at least one rule 801-803 that provides a corresponding threshold 811-813. Each is then compared to prediction confidences 641-646.
  • If a corresponding incoming confidence 820 is higher than or equal to a given threshold 811-813 provided by a rule 801-803, the technology 611-616 associated with rule 801-803 is declared “winner” and its class and confidence are used as the final prediction.
  • If no rule is satisfied, an “otherwise rule” determines what to do. In that case, a clause indicates how to classify the transaction (fraud/not-fraud) and it sets the confidence to zero.
  • a first rule looks at a smart-agent confidence (e.g., 641 ) of 0.7, but that is below a given corresponding threshold (e.g., 811 ) of 0.75 so inspection continues.
  • a second rule looks at a data mining confidence (e.g., 642 ) of 0.8 which is above a given threshold (e.g., 812 ) of 0.7. Inspection stops here and decision engine 650 uses the Data Mining prediction (e.g., 632 ) to define the final prediction (e.g., 660 ). Thus it is decided in this example that the incoming transaction is fraudulent with a confidence of 0.8.
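  • The ordered-rule inspection of FIG. 8 can be sketched as a short loop. The function name, the tuple layout, and the technology labels are assumptions for illustration; the thresholds and confidences reproduce the worked example above.

```python
from typing import Dict, List, Optional, Tuple

Prediction = Tuple[str, float]   # (predicted class, confidence)


def apply_flag_settings(rules: List[Tuple[str, float]],
                        predictions: Dict[str, Prediction],
                        otherwise_class: str) -> Tuple[Optional[str], str, float]:
    """Walk the rules in order; the first technology meeting its threshold wins."""
    for technology, threshold in rules:
        if technology not in predictions:
            continue
        predicted_class, confidence = predictions[technology]
        if confidence >= threshold:
            return technology, predicted_class, confidence
    # "Otherwise" rule: fall back to a fixed class with the confidence set to zero.
    return None, otherwise_class, 0.0


rules = [("smart-agent", 0.75), ("data mining", 0.7)]
predictions = {"smart-agent": ("fraud", 0.7), "data mining": ("fraud", 0.8)}
winner = apply_flag_settings(rules, predictions, otherwise_class="not-fraud")
```

  • Here the smart-agent confidence of 0.7 misses its 0.75 threshold, so inspection continues to the data mining rule, which is satisfied and supplies the final class and confidence.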
  • a winner-take-all technique groups the individual predictions 631 - 636 by their prediction output classes.
  • Each Prediction Technology is assigned its own weight, one used when it predicts a fraudulent transaction, another used when it predicts a valid transaction. All similar predictions are grouped together by summing their weighted confidence. The sum of the weighted confidences is divided by the sum of the weights used in order to obtain a final confidence between 0.0 and 1.0.
  • Weights:
    Prediction Technology | Weight-Fraud | Weight-Valid
    Smart-agents | 2 | 2
    Data Mining | 1 | 1
    Case Based Reasoning | 2 | 2
  • Predictions:
    Class | Prediction Technology | Confidence
    Fraud | Smart-agents | 0.7
    Fraud | Data Mining | 0.8
    Valid | Case Based Reasoning | 0.4
  • In this example, two prediction technologies (e.g., 611 and 612) are predicting (e.g., 631 and 632) the “fraud” class, so their cumulated weighted confidence is computed as 2*0.7+1*0.8, which is 2.2, and stored in computer memory.
  • The one remaining technology predicts the “valid” class with a weighted confidence of 2*0.4, which is 0.8. Since the “fraud” total is greater, this particular transaction in this example is decided to belong to the “fraud” class.
  • The confidence is then normalized for output by dividing it by the sum of the weights that were associated with the fraud class (2 and 1). So the final confidence (e.g., 661) is computed as 2.2/(2+1), giving 0.73.
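  • The winner-take-all arithmetic can be sketched as below, using the weights and predictions from the example. The function name and dictionary layout are hypothetical, and ties between classes are resolved by insertion order here, which the description does not specify.

```python
from collections import defaultdict
from typing import Dict, Tuple


def winner_take_all(weights: Dict[str, Dict[str, float]],
                    predictions: Dict[str, Tuple[str, float]]) -> Tuple[str, float]:
    """Group predictions by class, sum weighted confidences, and normalize the winner."""
    weighted_sum: Dict[str, float] = defaultdict(float)
    weight_sum: Dict[str, float] = defaultdict(float)
    for technology, (predicted_class, confidence) in predictions.items():
        w = weights[technology][predicted_class]   # weight depends on the class predicted
        weighted_sum[predicted_class] += w * confidence
        weight_sum[predicted_class] += w
    winner = max(weighted_sum, key=weighted_sum.get)
    return winner, weighted_sum[winner] / weight_sum[winner]


weights = {
    "smart-agents": {"fraud": 2, "valid": 2},
    "data mining": {"fraud": 1, "valid": 1},
    "case based reasoning": {"fraud": 2, "valid": 2},
}
predictions = {
    "smart-agents": ("fraud", 0.7),
    "data mining": ("fraud", 0.8),
    "case based reasoning": ("valid", 0.4),
}
final_class, final_confidence = winner_take_all(weights, predictions)
```

  • The fraud group accumulates 2.2 against 0.8 for the valid group, and dividing 2.2 by the fraud weights (2+1) gives the normalized confidence of about 0.73.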
  • Some models 611 - 616 may have been trained to output more than just two binary classes.
  • a fuzzification can provide more than two slots, e.g., for buy/sell/hold, or declined/suspect/approved. It may help to group classes by type of prediction (fraud or not-fraud).
  • Embodiments of the present invention integrate the constituent opinions of the technologies and make a single prediction class. How they integrate the constituent predictions 631-636 depends on a user-service consumer's selections of which technologies to favor and how to favor them, and such selections are made prior to training the technologies, e.g., through a model training interface.
  • a default selection includes the results of the neural network technology, the smart-agent technology, the data mining technology, and the case-based reasoning technology.
  • the user-service consumer may decide to use any combination of technologies, or to select an expert mode with four additional technologies: (1) rule-based reasoning technology; (2) fuzzy logic technology; (3) genetic algorithms technology; and (4) constraint programming technology.
  • One strategy that could be defined by a user-service consumer assigns one vote to each predictive technology 611-616.
  • a final decision 660 then stems from a majority decision reached by equal votes by the technologies within decision engine 650 .
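  • The equal-vote strategy can be sketched in a few lines. Treating the most frequent class as the majority and returning None on a tie are assumptions for the example; the description does not say how ties are broken.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(predicted_classes: List[str]) -> Optional[str]:
    """One vote per technology; return the most frequent class, or None on a tie."""
    ranked = Counter(predicted_classes).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None   # tie: a fallback strategy would be needed
    return ranked[0][0]
```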
  • Another strategy definable by a user-service consumer assigns priority values to each one of technologies 611-616, so that technologies with higher priorities more heavily determine the final decision. For example, if a higher-priority technology determines that a transaction is fraudulent and another technology with a lower priority determines that the transaction is not fraudulent, then method embodiments of the present invention use the priority values to discriminate between the results of the two technologies and determine that the transaction is indeed fraudulent.
  • a further strategy definable by a user-service consumer specifies instead a set of meta-rules to help choose a final decision 660 for output. These all indicate an output prediction class and its confidence level as a percentage (0-100, or 0-1.0) proportional to how confident the system apparatus is in the prediction.
  • FIG. 9 illustrates a method 900 of business decision making that requires the collaboration of two businesses, a service provider 901 and a user-consumer 902 .
  • the two businesses communicate with one another via secure Internet between network servers.
  • the many data records and data files passed between them are hashed or encrypted by a triple-DES algorithm, or similar protection. It is also possible to send a non-encrypted file through an encrypted channel.
  • Users of the platform would upload their data through SSL/TLS from a browser or from a command line interface (SCP or SFTP).
  • the service-provider business 901 combines method 100 (FIG. 1) and method 600 (FIG. 6) and their constituent algorithms. It accepts supervised and unsupervised training data 904 and strategies 905 from the user-service consumer business 902. Method 100 then processes these as described above with FIGS. 1-8 to produce a full set of fully trained predictive models that are passed to method 600.
  • New records from operations 906 provided, e.g., in real-time as they occur, are passed after being transformed by encryption from the user-service consumer business 902 to the service provider business 901 and method 600 .
  • An on-going run of scores, predictions, and decisions 908 (produced by method 600 according to the predictive models of method 100 and the strategies 905 and training data 904 ) are returned to user-service consumer business 902 after being transformed by encryption.
  • method 900 is trained for a wide range of uses, e.g., to classify fraud/no-fraud in payment transaction networks, to predict buy/sell/hold in stock trading, to detect malicious insider activity, and to call for preventative maintenance with machine and device failure predictions.
  • another method of operating an artificial intelligence machine to improve their decisions from included predictive models begins by deleting with at least one processor a selected data field and any data values contained in the selected data field from each of a first series of data training records stored in a memory of the artificial intelligence machine to exclude each data field in the first series of data training records that has more than a threshold number of random data values, or that has only one repeating data value, or that has too small a Shannon entropy, and using an information gain to select the most useful data fields, and then transforming a surviving number of data fields in all the first series of data training records into a corresponding reduced-field series of data training records stored in the memory of the artificial intelligence machine.
  • a next step includes adding with the at least one processor a new derivative data field to all the reduced-field series of data training records stored in the memory and initializing each added new derivative data field with a new data value, and including an apparatus for executing an algorithm to either change real scalar numeric data values into fuzzy values, or if symbolic, to change a behavior group data value, and testing that a minimum number of data fields survive, and if not, then to generate a new derivative data field and fix within each an aggregation type, a time range, a filter, a set of aggregation constraints, a set of data fields to aggregate, and a recursive level, and then assessing the quality of a newly derived data field by testing it with a test set of data, and then transforming the results into an enriched-field series of data training records stored in the memory of the artificial intelligence machine.
  • a next step includes verifying with the at least one processor that each predictive model if trained with the enriched-field series of data training records stored in the memory produces decisions having fewer errors than the same predictive model trained only with the first series of data training records.
  • a further step includes recording a data-enrichment descriptor into the memory to include an identity of selected data fields in a data training record format of the first series of data training records that were subsequently deleted, and which newly derived data fields were subsequently added, and how each newly derived data field was derived and from which information sources.
  • a next step includes causing the at least one processor of the artificial intelligence machine to start extracting decisions from a new series of data records of new events by receiving and storing the new series of data records in the memory of the artificial intelligence machine.
  • a further step includes causing the at least one processor to fetch the data-enrichment descriptor and use it to select which data fields to delete and then deleting all the data values included in the selected data fields from each of a new series of data records of new events.
  • Each data field deleted matches a data field in the first series of data training records that had more than a threshold number of random data values, or that had only one repeating data value, or that had too small a Shannon entropy.
  • a next step includes adding with the at least one processor a new derivative data field to each record of the new series of data records stored in the memory according to the data-enrichment descriptor, and initializing each added new derivative data field with a new data value stored in the memory.
  • Each new derivative data field added matches a new derivative data field added to the enriched-field series of data training records in which real scalar numeric data values were changed into fuzzy values, or if symbolic, were changed into a behavior group data value stored in the memory, and were tested that a minimum number of data fields survive, and if not, then that generated a new derivative data field and fixed within each an aggregation type, a time range, a filter, a set of aggregation constraints, a set of data fields to aggregate, and a recursive level.
  • the method concludes by producing and outputting a series of predictive decisions with the at least one processor that operates at least one predictive model algorithm derived from one originally built and trained with records having a same record format described by the data-enrichment descriptor and stored in the memory of the artificial intelligence machine.
  • FIG. 10 represents an apparatus for executing an algorithm 1000 for reclassifying a decision 660 ( FIG. 6 ) for business profitability reasons. For example, when a payment card transaction for a particular transaction amount $X has already been preliminarily “declined” and included in a decision 1002 (and 660 , FIG. 6 ) according to some other decision model.
  • a test 1004 compares a dollar transaction “threshold amount-A” 1006 to a computation 1008 of the running average business a particular user has been doing with the account involved. The rationale for doing this is that valuable customers who do more than an average amount (threshold-A 1006) of business with their payment card should not be so easily or trivially declined.
  • Some artificial intelligence deliberation and reconsideration is appropriate.
  • a “transaction declined” decision 1010 is issued as final (transaction-declined 110 ). Such is then forwarded by a financial network to the merchant point-of-sale (POS).
  • threshold-B transaction amount 1016 is compared to the transaction amount $X. Essentially, threshold-B transaction amount 1016 is set at a level that would relieve qualified accountholders of ever being denied a petty transaction, e.g., under $250, and yet not involve a great amount of risk should the “positive” scoring indication from the “other decision model” not prove much later to be “false”. If the transaction amount $X is less than threshold-B transaction amount 1016 , a “transaction approved” decision 1018 is issued as final. Such is then forwarded by the financial network to the merchant CP/CNP, unattended terminal, ATM, online payments, etc.
  • a transaction-preliminarily-approved decision 1020 is carried forward to a familiar transaction pattern test 1022 .
  • An abstract 1024 of this account's transaction patterns is compared to the instant transaction. For example, if this accountholder seems to be a new parent with a new baby as evidenced in purchases of particular items, then all future purchases that could be associated are reasonably predictable. Or, in another example, if the accountholder seems to be on business in a foreign country as evidenced in purchases of particular items and travel arrangements, then all future purchases that could be reasonably associated are to be expected and scored as lower risk. And, in one more example, if the accountholder seems to be a professional gambler as evidenced in cash advances at casinos, purchases of specific things and arrangements, then future purchases that could be reasonably associated are likewise to be expected and scored as lower risk.
  • a “transaction declined” decision 1026 is issued as final. Such is then forwarded by the financial network 106 to the merchant (CP and/or CNP) and/or unattended terminal/ATM. Otherwise, a transaction-preliminarily-approved decision 1028 is carried forward to a threshold-C test 1030.
  • threshold-C transaction amount 1032 is compared to the transaction amount $X. Essentially, threshold-C transaction amount 1032 is set at a level that would relieve qualified accountholders of being denied a moderate transaction, e.g., under $2500, and yet not involve a great amount of risk because the accountholder's transactional behavior is within their individual norms. If the transaction amount $X is less than threshold-C transaction amount 1032 , a “transaction approved” decision 1034 is issued as final (transaction-approved). Such is then forwarded by the financial network 106 to the merchant (CP and/or CNP) and/or unattended terminal/ATM.
  • a transaction-preliminarily-approved decision 1036 is carried forward to a familiar user device recognition test 1038 .
  • An abstract 1040 of this account's user devices is compared to those used in the instant transaction.
  • a “transaction declined” decision 1042 is issued as final. Such is then forwarded by the financial network 106 to the merchant (CP and/or CNP) and/or unattended terminal/ATM. Otherwise, a transaction-preliminarily-approved decision 1044 is carried forward to a threshold-D test 1046.
  • a threshold-D transaction amount 1048 is compared to the transaction amount $X. Basically, the threshold-D transaction amount 1048 is set at a higher level that would avoid denying substantial transactions to qualified accountholders, e.g., under $10,000, and yet not involve a great amount of risk because the accountholder's user devices are recognized and their instant transactional behavior is within their individual norms. If the transaction amount $X is less than threshold-D transaction amount 1048, a “transaction approved” decision 1050 is issued as final. Such is then forwarded by the financial network 106 to the merchant (CP and/or CNP) and/or unattended terminal/ATM.
  • transaction amount $X is just too large to override a denial if the other decision model decision 1002 was “positive”, e.g., for fraud, or some other reason.
  • a “transaction declined” decision 1052 is issued as final (transaction-declined 110 ). Such is then forwarded by the financial network 106 to the merchant (CP and/or CNP) and/or unattended terminal/ATM.
  • threshold-B 1016 is less than threshold-C 1032 , which in turn is less than threshold-D 1048 . It could be that tests 1022 and 1038 would serve profits better if swapped in FIG. 10 . Embodiments of the present invention would therefore include this variation as well. It would seem that threshold-A 1006 should be empirically derived and driven by business goals.
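  • The later stages of this override cascade can be summarized in a short sketch. It covers only the pattern test 1022 , the threshold-C test 1030 , the device test 1038 , and the threshold-D test 1046 , and assumes the earlier stages have already run. The function name, the boolean inputs, and the default dollar limits (taken from the "e.g." figures above) are illustrative assumptions, not part of FIG. 10 .

```python
def override_tail(amount, pattern_ok, device_ok,
                  threshold_c=2500.0, threshold_d=10000.0):
    """Return 'approved' or 'declined' for the later stages of the cascade.

    pattern_ok and device_ok are hypothetical booleans standing in for the
    outcomes of the transaction-pattern test and the user-device test.
    """
    if not pattern_ok:
        return "declined"      # decision 1026
    if amount < threshold_c:
        return "approved"      # decision 1034
    if not device_ok:
        return "declined"      # decision 1042
    if amount < threshold_d:
        return "approved"      # decision 1050
    return "declined"          # decision 1052: too large to override
```

  • Swapping the two boolean checks in this sketch corresponds to the variation in which tests 1022 and 1038 trade places.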
  • the further data processing required by technology 1000 occurs in real-time while merchants (CP and CNP), ATMs, and all unattended terminals, together with their users, wait for approved/declined data messages to arrive through the financial network.
  • the abstracts for this-account's-running-average-totals 1008 , this account's-transaction-patterns 1024 , and this-account's-devices 1040 must all be accessible and on-hand very quickly.
  • a simple look-up is preferred to having to compute the values.
  • the smart agents and the behavioral profiles they maintain and that we've described in this Application and those we incorporate herein by reference are up to doing this job well. Conventional methods and apparatus may struggle to provide this information quickly enough.
  • FIG. 10 represents for the first time in machine learning an apparatus that allows a different threshold for each customer. It further enables different thresholds for the same customer based on the context, e.g., a Threshold- 1 while traveling, a Threshold- 2 while buying things familiar with his purchase history, a Threshold- 3 while in same area where they live, a Threshold- 4 during holidays, a Threshold- 5 for nights, a Threshold- 6 during business hours, etc.
  • FIG. 11 represents an algorithm that executes as smart-agent production apparatus 1100 , and is included in the build of smart-agents in steps 126 and 127 ( FIG. 1 ), or as step 611 ( FIG. 6 ) in operation.
  • the results are either exported as an .IFM-type XML document in step 128 , or used locally as in method 600 ( FIG. 6 ).
  • Step 126 ( FIG. 1 ) builds a population of smart-agents and their profiles that are represented in FIG. 11 as smart-agents S 1 1102 and Sn 1104 .
  • Step 127 ( FIG. 1 ) initializes that build.
  • Such population can reach into the millions for large systems, e.g., those that handle payment transaction requests nationally and internationally for millions of cardholders (entities).
  • a step 1110 gets the corresponding smart-agent that matches this identification from the initial population of smart-agents 1102 , 1104 it received in step 128 ( FIG. 1 ).
  • a step 1112 asks if any were not found.
  • a step 1114 uses default profiles optimally defined for each entity to create and initialize smart-agents and profiles for entities that do not have a match in the initial population of smart-agents 1102 , 1104 .
  • a step 1116 uses the matching smart-agent and profile to assess record 1106 and issues a score 1118 .
  • a step 1120 updates the matching smart-agent profile with the new information in record 1106 .
  • a step 1122 dynamically creates/removes/updates and otherwise adjusts attributes in any matching smart-agent profile based on a content of records 1106 .
  • a step 1124 adjusts an aggregation type (count, sum, distinct, ratio, average, minimum, maximum, standard deviation, . . . ) in a matching smart-agent profile.
  • a step 1126 adjusts a time range in a matching smart-agent profile.
  • a step 1128 adjusts a filter based on a reduced set of transformed fields in a matching smart-agent profile.
  • a step 1130 adjusts a multi-dimensional aggregation constraint in a matching smart-agent profile.
  • a step 1132 adjusts an aggregation field, if needed, in the matching smart-agent profile.
  • a step 1134 adjusts a recursive level in the matching smart-agent profile.
  • FIGS. 12-29 provide greater detail regarding the construction and functioning of algorithms that are employed in FIGS. 1-11 .
  • FIG. 12 is a schematic diagram of the neural network architecture used in method embodiments of the present invention.
  • Neural network 1200 consists of a set of processing elements or neurons that are logically arranged into three layers: (1) input layer 1201 ; (2) hidden layer 1202 ; and (3) output layer 1203 .
  • the architecture of neural network 1200 is similar to a back propagation neural network, but its training, utilization, and learning algorithms are different.
  • the neurons in input layer 1201 receive input fields from a training table. Each of the input fields is multiplied by a weight such as weight “Wij” 1204 a to obtain a state or output that is passed along another weighted connection with weights “Vjt” 1205 between neurons in hidden layer 1202 and output layer 1203 .
  • the inputs to neurons in each layer come exclusively from output of neurons in a previous layer, and the output from these neurons propagate to the neurons in the following layers.
  • FIG. 13 is a diagram of a single neuron in the neural network used in method embodiments of the present invention.
  • Neuron 1300 receives input “i” from a neuron in a previous layer. Input “i” is multiplied by a weight “Wih” and processed by neuron 1300 to produce state “s”. State “s” is then multiplied by weight “V hi ” to produce output “i” that is processed by neurons in the following layers.
  • Neuron 1300 contains limiting thresholds 1301 that determine how an input is propagated to neurons in the following layers.
  • FIG. 14 is a flowchart of an algorithm 1400 for training neural networks with a single hidden layer that builds incrementally during a training process. The hidden layers may also grow in number later during any updates.
  • Each training process computes a distance between all the records in a training table, and groups some of the records together.
  • a training set “S” and input weights “bi” are initialized. Training set “S” is initialized to contain all the records in the training table. Each field “i” in the training table is assigned a weight “bi” to indicate its importance.
  • the input weights “bi” are selected by a client.
  • a distance matrix D is created. Distance matrix D is a square and symmetric matrix of size N×N, where N is the total number of records in training set “S”. Each element “Dij” in row “i” and column “j” of distance matrix D contains the distance between record “i” and record “j” in training set “S”.
  • the distance between two records in training set “S” is computed using a distance measure.
  • FIG. 15 illustrates a table of distance measures 1500 that is used in a neural network training process.
  • Table 1500 lists distance measures that are used to compute the distance between two records Xi and Xj in training set “S”.
  • the default distance measure used in the training process is a Weighted-Euclidean distance measure that uses input weights “bi” to assign priority values to the fields in a training table.
  • a distance matrix D is computed such that each element at row “i” and column “j” contains d(Xi, Xj) between records Xi and Xj in training set “S”. Each row “i” of distance matrix D is then sorted so that it contains the distances of all the records in training set “S” ordered from the closest one to the farthest one.
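  • The distance matrix and its sorted rows can be sketched as follows. Table 1500 is not reproduced here, so the exact form of the Weighted-Euclidean measure is an assumption: the sketch takes it to be the square root of the weighted sum of squared field differences. All function names are illustrative.

```python
import math

def weighted_euclidean(x, y, b):
    """Assumed form: sqrt(sum_i b[i] * (x[i] - y[i])**2), b[i] = field weight."""
    return math.sqrt(sum(w * (xi - yi) ** 2 for w, xi, yi in zip(b, x, y)))

def distance_matrix(records, b):
    """Square, symmetric matrix D with D[i][j] = d(records[i], records[j])."""
    n = len(records)
    return [[weighted_euclidean(records[i], records[j], b) for j in range(n)]
            for i in range(n)]

def sorted_neighbours(D):
    """For each row i, record indices ordered from closest to farthest."""
    return [sorted(range(len(row)), key=row.__getitem__) for row in D]

# Three toy records with two numeric fields of equal importance.
records = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0)]
D = distance_matrix(records, b=(1.0, 1.0))
order = sorted_neighbours(D)
```

  • Sorting index lists rather than the distances themselves keeps track of which record each sorted distance belongs to, which the later neuron-building steps need.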
  • a new neuron is added to the hidden layer of the neural network, and the largest subset “Sk” of input records having the same output is determined.
  • the neuron group is formed at step 97 .
  • the input weights “Wh” are equal to the value of the input record in row “k” of the distance matrix D, and the output weights “Vh” are equal to zero except for the weight assigned between the created neuron in the hidden layer and the neuron in the output layer representing the output class value of any records belonging to subset “Sk”.
  • a subset “Sk” is removed from training set “S”, and all the previously existing output weights “Vh” between the hidden layer and the output layer are doubled.
  • the training set is checked to see if it still contains input records, and if so, the training process goes back. Otherwise, the training process is finished and the neural network is ready for use.
  • FIG. 16 is a flowchart of an algorithm 1600 for propagating an input record through a neural network.
  • An input record is propagated through a network to predict if its output signifies a fraudulent transaction.
  • a distance between the input record and the weight pattern “Wh” between the input layer and the hidden layer in the neural network is computed.
  • the distance “d” is compared to the limiting thresholds low and high of the first neuron in the hidden layer. If the distance is between the limiting thresholds, then the weights “Wh” are added to the weights “Vh” between the hidden layer and the output layer of the neural network. If there are more neurons in the hidden layer, then the propagation algorithm goes back to repeat steps for the other neurons in the hidden layer. Finally, the predicted output class is determined according to the neuron at the output layer that has the higher weight.
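  • One possible reading of this propagation step is sketched below. It interprets "added to the weights" as accumulating, per output class, the output weights “Vh” of every hidden neuron whose distance falls between its limiting thresholds. The plain Euclidean distance, the dictionary layout of a neuron, and the function name are assumptions made for illustration.

```python
import math

def propagate(record, hidden, n_classes):
    """Predict an output class index for one input record.

    hidden is a list of dicts with hypothetical keys:
      'w'    input weight pattern Wh (same length as record)
      'low'  lower limiting threshold
      'high' upper limiting threshold
      'v'    output weights Vh, one per output class
    """
    totals = [0.0] * n_classes
    for neuron in hidden:
        d = math.dist(record, neuron["w"])
        if neuron["low"] <= d <= neuron["high"]:
            for c, v in enumerate(neuron["v"]):
                totals[c] += v
    # Class whose output neuron accumulated the highest weight;
    # if no hidden neuron fires, this falls back to class 0.
    return max(range(n_classes), key=totals.__getitem__)
```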
  • FIG. 17 is a flowchart of an algorithm 1700 for updating the training process of a neural network.
  • the training process is updated whenever a neural network needs to learn some new input record.
  • Neural networks are updated automatically, as soon as data from a new record is evaluated by method embodiments of the present invention. Alternatively, the neural network may be updated offline.
  • a new training set for updating a neural network is created.
  • the new training set contains all the new data records that were not utilized when first training the network using the training algorithm illustrated in FIG. 14 .
  • the training set is checked to see if it contains any new output classes not found in the neural network. If there are no new output classes, the updating process proceeds with the training algorithm illustrated in FIG. 14 . If there are new output classes, then new neurons are added to the output layer of the neural network, so that each new output class has a corresponding neuron at the output layer. When the new neurons are added, the weights from these neurons to the existing neurons at the hidden layer of the neural network are initialized to zero.
  • the weights from the hidden neurons to be created during the training algorithm are initialized as 2h, where “h” is the number of hidden neurons in the neural network prior to the insertion of each new hidden neuron.
  • the training algorithm illustrated in FIG. 14 is started to form the updated neural network technology.
  • Smart-agent technology uses multiple smart-agents in unsupervised mode, e.g., to learn how to create profiles and clusters.
  • Each field in a training table has its own smart-agent that cooperates with others to combine some partial pieces of knowledge they have about data for a given field, and validate the data being examined by another smart-agent.
  • the smart-agents can identify unusual data and unexplained relationships. For example, by analyzing a healthcare database, the smart-agents would be able to identify unusual medical treatment combinations used to combat a certain disease, or to identify that a certain disease is only linked to children.
  • the smart-agents would also be able to detect certain treatment combinations just by analyzing the database records with fields such as symptoms, geographic information of patients, medical procedures, and so on.
  • Smart-agent technology creates intervals of normal values for each one of the fields in a training table to evaluate if the values of the fields of a given electronic transaction are normal. And the technology determines any dependencies between each field in a training table to evaluate if the values of the fields of a given electronic transaction or record are coherent with the known field dependencies. Both goals can generate warnings.
  • FIG. 18 is a flowchart of an algorithm for creating intervals of normal values for a field in a training table.
  • the algorithm illustrated in the flowchart is run for each field “a” in a training table.
  • a list “La” of distinct couples (“vai”, “nai”) is created, where “vai” represents the i-th distinct value for field “a” and “nai” represents its cardinality, e.g., the number of times value “vai” appears in a training table.
  • the field is determined to be symbolic or numeric.
  • each member of “La” is copied into a new list “Ia” whenever “nai” is superior to a threshold “Θmin” that represents the minimum number of elements a normal interval must include.
  • the relations (a, Ia) are saved in memory storage. Whenever a data record is to be evaluated by the smart-agent technology, the value of the field “a” in the data record is compared to the normal intervals created in “Ia” to determine if the value of the field “a” is outside the normal range of values for that given field.
  • the list “La” of distinct couples (“vai”, nai) is ordered starting with the smallest value Va.
  • the total cardinality “na” of all the values from “va1” to “vak” is compared to “Θmin” to determine the final value of the list of normal intervals “Ia”. If the list “Ia” is not empty, the relations (a, Ia) are saved.
  • the value of the field “a” in the data record is compared to the normal intervals created in “Ia” to determine if the value of the field “a” is outside the normal range of values for that given field. If the value of the field “a” is outside the normal range of values for that given field, a warning is generated to indicate that the data record is likely fraudulent.
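  • For a symbolic field, the interval-building and evaluation steps reduce to counting distinct values and keeping those that occur often enough. The sketch below covers only that symbolic case; the grouping of ordered numeric values into intervals is not attempted. The function names and the toy column are hypothetical.

```python
from collections import Counter

def symbolic_normal_values(column, theta_min):
    """Build Ia for a symbolic field: values whose cardinality exceeds theta_min."""
    counts = Counter(column)                  # the couples (vai, nai)
    return {v for v, n in counts.items() if n > theta_min}

def is_abnormal(value, normal_values):
    """True when a record's field value falls outside the normal values."""
    return value not in normal_values

column = ["US", "US", "US", "FR", "FR", "BR"]
normal = symbolic_normal_values(column, theta_min=1)
```

  • With this toy column, a record carrying the value "BR" would generate a warning, while "US" and "FR" would not.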
  • FIG. 19 is a flowchart of an algorithm 1900 for determining dependencies between each field in a training table.
  • a list Lx of couples (vxi, nxi) is created for each field “x” in a training table.
  • the values vxi in Lx for which (nxi/nT)>Θx are determined, where nT is the total number of records in a training table and Θx is a threshold value specified by the user. In a preferred embodiment, Θx has a default value of 1%.
  • a list Ly of couples (vyi, nyi) for each field y, y≠x, is created.
  • Θxy has a default value of 85%.
  • All the relations are saved in a tree made with four levels of hash tables to increase the speed of the smart-agent technology.
  • the first level in the tree hashes the field name of the first field
  • the second level hashes the values for the first field implying some correlations with other fields
  • the third level hashes the field name with whom the first field has some correlations
  • the fourth level in the tree hashes the values of the second field that are correlated with the values of the first field.
  • Each leaf of the tree represents a relation, and at each leaf, the cardinalities nxi, nyj, and nij are stored. This allows the smart-agent technology to be automatically updated and to determine the accuracy, prevalence, and the expected predictability of any given relation formed in a training table.
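  • The four-level hash structure can be sketched with nested dictionaries. The text above does not spell out the test that uses Θxy, so the sketch assumes a relation from value vxi of field x to value vyj of field y is kept when nij/nxi exceeds Θxy, where nij counts the records holding both values. That condition, the function name, and the record layout are assumptions.

```python
from collections import Counter, defaultdict

def build_relations(records, theta_x=0.01, theta_xy=0.85):
    """records: list of dicts mapping field name -> value.

    Returns tree[field_x][value_x][field_y][value_y] = (nxi, nyj, nij),
    i.e., four hash levels with the cardinalities stored at each leaf.
    """
    n_total = len(records)
    fields = list(records[0])
    singles = {f: Counter(r[f] for r in records) for f in fields}
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for x in fields:
        for y in fields:
            if x == y:
                continue
            pairs = Counter((r[x], r[y]) for r in records)
            for (vx, vy), nij in pairs.items():
                nxi, nyj = singles[x][vx], singles[y][vy]
                # Assumed tests: vx is frequent enough, and vx implies vy often enough.
                if nxi / n_total > theta_x and nij / nxi > theta_xy:
                    tree[x][vx][y][vy] = (nxi, nyj, nij)
    return tree
```

  • Because the raw counts are kept at the leaves rather than the ratios, an update only needs to increment nT, nxi, nyj, and nij and can recompute the ratios on demand.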
  • FIG. 21 is a flowchart of an algorithm 2100 for updating smart-agents.
  • the total number of records nT in a training table is incremented by a new number of input records to be included in the update of the smart-agent technology.
  • the parameters nxi, nyj, and nij are retrieved and then incremented.
  • FIG. 22 represents one way to implement a data mining algorithm as in steps 130 - 132 ( FIG. 1 ). More detail is incorporated herein by reference to Adjaoute '592, and especially that relating to its FIG. 22 .
  • the data mining algorithm and the data tree of step 131 are highly advantaged by having been trained by the enriched data 124 . Such results in far superior training compared to conventional training with data like raw data 106 .
  • Data mining identifies several otherwise hidden data relationships, including: (1) associations, wherein one event is correlated to another event such as purchase of gourmet cooking books close to the holiday season; (2) sequences, wherein one event leads to another later event such as purchase of gourmet cooking books followed by the purchase of gourmet food ingredients; (3) classification, e.g., the recognition of patterns and a resulting new organization of data such as profiles of customers who make purchases of gourmet cooking books; (4) clustering, e.g., finding and visualizing groups of facts not previously known; and (5) forecasting, e.g., discovering patterns in the data that can lead to predictions about the future.
  • One goal of data mining technology is to create a decision tree based on records in a training database to facilitate and speed up the case-based reasoning technology.
  • the case-based reasoning technology determines if a given input record associated with an electronic transaction is similar to any typical records encountered in a training table. Each record is referred to as a “case”. If no similar cases are found, a warning is issued to flag the input record.
  • the data mining technology creates a decision tree as an indexing mechanism for the case-based reasoning technology. Data mining technology can also be used to automatically create and maintain business rules for a rule-based reasoning technology.
  • the decision tree is an “N-ary” tree, wherein each node contains a subset of similar records in a training database. (An N-ary tree is a tree in which each node has no more than N children.) In preferred embodiments, the decision tree is a binary tree. Each subset is split into two other subsets, based on the result of an intersection between the set of records in the subset and a test on a field. For symbolic fields, the test is if the values of the fields in the records in the subset are equal, and for numeric fields, the test is if the values of the fields in the records in the subset are smaller than a given value. Applying the test on a subset splits the subset in two others, depending on if they satisfy the test or not. The newly created subsets become the children of the subset they originated from in the tree. The data mining technology creates the subsets recursively until each subset that is a terminal node in the tree represents a unique output class.
  • FIG. 22 is a flowchart of an algorithm 2200 for generating the data mining technology to create a decision tree based on similar records in a training table.
  • Sets “S”, R, and U are initialized.
  • Set “S” is a set that contains all the records in a training table
  • set R is the root of the decision tree
  • set U is the set of nodes in the tree that are not terminal nodes. Both R and U are initialized to contain all the records in a training table.
  • a first node Ni (containing all the records in the training database) is removed from U.
  • the triplet (field, test, value) that best splits the subset Si associated with the node Ni into two subsets is determined.
  • the triplet that best splits the subset Si is the one that creates the smallest depth tree possible, that is, the triplet would either create one or two terminal nodes, or create two nodes that, when split, would result in a lower number of children nodes than other triplets.
  • the triplet is determined by using an impurity function such as Entropy or the Gini index to find the information conveyed by each field value in the database.
  • the field value that conveys the least degree of information contains the least uncertainty and determines the triplet to be used for splitting the subsets.
  • a node Nij is created and associated to the first subset Sij formed.
  • the node Nij is then linked to node Ni, and named with the triplet (field, test, value).
  • a check is performed to evaluate if all the records in subset Sij at node Nij belong to the same output class c ij . If they do, then the prediction of node Nij is set to c ij . If not, then node Nij is added to U.
  • the algorithm then proceeds to check if there are still subsets Sij to be split in the tree, and if so, the algorithm goes back. When all subsets have been associated with nodes, the algorithm continues for the remaining nodes in U until U is determined to be empty.
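  • The search for a splitting triplet can be illustrated with the Gini index. The sketch below is a simplified, greedy stand-in: it scores each candidate (field, test, value) by the weighted Gini impurity of the two resulting subsets and returns the lowest-scoring one, rather than reasoning about overall tree depth. The function names and the convention that string fields are symbolic are assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of output-class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_triplet(records, labels):
    """Return the (field, test, value) with the lowest weighted Gini impurity.

    The test is '==' for symbolic (str) fields and '<' for numeric fields.
    """
    best, best_score = None, float("inf")
    for field in records[0]:
        for value in {r[field] for r in records}:
            symbolic = isinstance(value, str)
            test = "==" if symbolic else "<"
            mask = [(r[field] == value) if symbolic else (r[field] < value)
                    for r in records]
            left = [l for l, m in zip(labels, mask) if m]
            right = [l for l, m in zip(labels, mask) if not m]
            if not left or not right:
                continue                      # not a real split
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (field, test, value), score
    return best
```

  • Applying the function recursively to each impure subset, and stopping when a subset holds a single output class, reproduces the loop over the set U described above.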
  • FIG. 23 represents a decision tree 2300 in an example for a database 2301 maintained by an insurance company to predict a risk of an insurance contract based on a type of a car and an age of its driver.
  • Database 2301 has three fields: (1) age, (2) car type, and (3) risk.
  • the risk field is the output class that needs to be predicted for any new incoming data record.
  • the age and the car type fields are used as inputs.
  • the data mining technology builds a decision tree, e.g., one that can ease a search of cases in case-based reasoning to determine if an incoming transaction fits any profiles of similar cases existing in its database.
  • the decision tree starts with a root node N0 ( 2302 ).
  • a test 2303 is determined that best splits database 2301 into two nodes, a node N 1 ( 2304 ) with a subset 2305 , and a node N 2 ( 2306 ) with a subset 2307 .
  • Node N 1 ( 2304 ) is a terminal node type, since all data records in subset 2305 have the same class output that indicates a high insurance risk for drivers that are younger than twenty-five.
  • the data mining technology then splits a node N 2 ( 2306 ) into two additional nodes, a node N 3 ( 2308 ) containing a subset 2309 , and a node N 4 ( 2310 ) containing a subset 2311 .
  • Both nodes N 3 ( 2308 ) and N 4 ( 2310 ) were split from node N 2 ( 2306 ) based on a test 2312 , that checks if the car type is a sports car.
  • nodes N 3 ( 2308 ) and N 4 ( 2310 ) are terminal nodes, with node N 3 ( 2308 ) signifying a high insurance risk and node N 4 ( 2310 ) representing a low insurance risk.
  • the decision tree formed by the data mining technology is preferably a depth two binary tree, significantly reducing the size of the search problem for the case-based reasoning technology. Instead of searching for similar cases to an incoming data record associated with an electronic transaction in the entire database, the case-based reasoning technology only has to use the predefined index specified by the decision tree.
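  • Written out as code, the depth-two tree of FIG. 23 is just two nested tests. The sketch assumes that the sports-car branch of test 2312 is the one leading to the high-risk node N 3 , which the text implies but does not state; the function name and the string labels are illustrative.

```python
def insurance_risk(age, car_type):
    """Walk the depth-two decision tree for one incoming record."""
    if age < 25:                  # test 2303 -> terminal node N1
        return "high"
    if car_type == "sports":      # test 2312 -> terminal node N3 (assumed branch)
        return "high"
    return "low"                  # terminal node N4
```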
  • the case-based reasoning technology stores past data records or cases to identify and classify a new case. It reasons by analogy and classification. Case-based reasoning technologies create a list of generic cases that best represent the cases in its training table. A typical case is generated by computing similarities between all the cases in its training table and selecting those cases that best represent distinct cases. Whenever a new case is presented in a record, a decision tree is used to determine if any input record it has on file in its database is similar to something encountered in its training table.
  • FIG. 24 is a flowchart of an algorithm for generating a case-based reasoning technology used later to find a record in a database that best resembles an input record corresponding to a new transaction.
  • An input record is propagated through a decision tree according to tests defined for each node in the tree until it reaches a terminal node. If an input record is not fully defined, that is, the input record does not contain values assigned to certain fields, then the input record is propagated to the last node in the tree that satisfies all the tests.
  • the cases retrieved from this node are all the cases belonging to the node's leaves.
  • a similarity measure is computed between the input record and each one of the cases retrieved.
  • the similarity measure returns a value that indicates how close the input record is to a given case retrieved.
  • the case with the highest similarity measure is then selected as the case that best represents the input record.
  • the solution is revised by using a function specified by the user to modify any weights assigned to fields in the database. Finally, the input record is included in the training database and the decision tree is updated for learning new patterns.
  • FIG. 25 represents a table 2500 of global similarity measures used by case-based reasoning technology.
  • the table lists an example of six similarity measures that could be used in case-based reasoning to compute a similarity between cases.
  • the Global Similarity Measure is a computation of the similarity between case values V 1i and V 2i and is based on local similarity measures sim i for each field y i .
  • the global similarity measures may also employ weights w i for different fields.
  • FIG. 26 is an example table of Local Similarity Measures useful in case-based reasoning.
  • Table 2600 lists fourteen different Local Similarity Measures that are used by the global similarity measures listed.
  • the local similarity measures depend on the field type and valuation.
  • the field type is: (1) symbolic or nominal; (2) ordinal, when the values are ordered; (3) taxonomic, when the values follow a hierarchy; and (4) numeric, which can take discrete or continuous values.
  • the Local Similarity Measures are based on a number of parameters, including: (1) the values of a given field for two cases, V 1 and V 2 ; (2) the lower (V 1 − and V 2 −) and higher (V 1 + and V 2 +) limits of V 1 and V 2 ; (3) the set of all values that can be reached by the field; (4) the central points of V 1 and V 2 , V1c and V2c; (5) the absolute value “ec” of a given interval; and (6) the height “h” of a level in a taxonomic descriptor.
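  • How a weighted global measure combines per-field local measures can be sketched as follows. Tables 2500 and 2600 are not reproduced here, so the two local measures used (exact match for symbolic fields, range-scaled absolute difference for numeric fields) and the weighted-average global measure are generic choices assumed for illustration. All names are hypothetical.

```python
def local_similarity(a, b, value_range=None):
    """Exact match for symbolic values; scaled absolute difference for numeric."""
    if value_range is None:
        return 1.0 if a == b else 0.0
    return max(0.0, 1.0 - abs(a - b) / value_range)

def global_similarity(case1, case2, weights, ranges):
    """Weighted average of the local similarities, in [0, 1]."""
    total = sum(weights.values())
    return sum(
        w * local_similarity(case1[f], case2[f], ranges.get(f))
        for f, w in weights.items()
    ) / total

def best_case(record, cases, weights, ranges):
    """Select the retrieved case with the highest global similarity."""
    return max(cases,
               key=lambda c: global_similarity(record, c, weights, ranges))
```

  • Fields listed in ranges are treated as numeric; every other weighted field is compared symbolically.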
  • Genetic algorithms technologies include a library of genetic algorithms that incorporate biological evolution concepts to find if a class is true, e.g., a business transaction is fraudulent, there is network intrusion, etc. Genetic algorithms are used to analyze many data records and predictions generated by other predictive technologies and recommend their own efficient strategies for quickly reaching a decision.
  • Rule-based reasoning, fuzzy logic, and constraint programming technologies include business rules, constraints, and fuzzy rules to determine the output class of a current data record, e.g., if an electronic transaction is fraudulent.
  • Such business rules, constraints, and fuzzy rules are derived from past data records in a training database or created from predictable but unusual data records that may arise in the future.
  • the business rules are automatically created by the data mining technology, or they are specified by a user.
  • the fuzzy rules are derived from business rules, with constraints specified by a user that specify which combinations of values for fields in a database are allowed and which are not.
  • FIG. 27 represents a rule 2700 for use with the rule-based reasoning technology.
  • Rule 2700 is an IF-THEN rule containing an antecedent and consequence. The antecedent uses tests or conditions on data records to analyze them. The consequence describes the actions to be taken if the data satisfies the tests.
  • An example of rule 2700 that determines if a credit card transaction is fraudulent for a credit card belonging to a single user may include “IF (credit card user makes a purchase at 8 AM in New York City) and (credit card user makes a purchase at 8 AM in Atlanta) THEN (credit card number may have been stolen)”.
  • the use of the words “may have been” in the consequence sets a trigger that other rules need to be checked to determine if the credit card transaction is indeed fraudulent or not.
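  • The example rule can be expressed as a small check over a batch of purchases. The tuple layout, the function name, and the exact-timestamp comparison are assumptions; a production rule would more likely compare purchases within a travel-time window.

```python
from datetime import datetime

def same_time_different_city(purchases):
    """purchases: iterable of (card_number, when, city) tuples.

    Antecedent: the same card is used at the same time in two cities.
    Consequence: the card number is flagged as possibly stolen, so that
    other rules can then be checked.
    """
    seen = {}
    flagged = set()
    for card, when, city in purchases:
        key = (card, when)
        if key in seen and seen[key] != city:
            flagged.add(card)
        seen.setdefault(key, city)
    return flagged

eight_am = datetime(2015, 11, 14, 8, 0)
purchases = [
    ("card-A", eight_am, "New York City"),
    ("card-A", eight_am, "Atlanta"),
    ("card-B", eight_am, "Atlanta"),
]
flagged = same_time_different_city(purchases)
```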
  • FIG. 28 represents a fuzzy rule 2800 to specify if a person is tall.
  • Fuzzy rule 2800 uses fuzzy logic to handle the concept of partial truth, e.g., truth values between “completely true” and “completely false” for a person who may or may not be considered tall.
  • Fuzzy rule 2800 contains a middle ground, in addition to the binary patterns of yes/no. Fuzzy rule 2800 derives here from an example rule such as
  • FIG. 29 is a flowchart of an algorithm 2900 for applying rule-based reasoning, fuzzy logic, and constraint programming to determine if an electronic transaction is fraudulent.
  • the rules and constraints are specified by a user-service consumer and/or derived by data mining technology.
  • the data record associated with a current electronic transaction is matched against the rules and the constraints to determine which rules and constraints apply to the data.
  • the data is tested against the rules and constraints to determine if the transaction is fraudulent.
  • the rules and constraints are updated to reflect the new electronic transaction.
  • the present inventor Dr. Akli Adjaoute and his Company, Brighterion, Inc. (San Francisco, Calif.), have been highly successful in developing fraud detection computer models and applications for banks, payment processors, and other financial institutions.
  • these fraud detection computer models and applications are trained to follow and develop an understanding of the normal transaction behavior of single individual accountholders. Such training is sourced from multi-channel transaction training data or single-channel. Once trained, the fraud detection computer models and applications are highly effective when used in real-time transaction fraud detection that comes from the same channels used in training.
  • Some embodiments of the present invention train several single-channel fraud detection computer models and applications with corresponding different channel training data.
  • the resulting, differently trained fraud detection computer models and applications are run several in parallel so each can view a mix of incoming real-time transaction message reports flowing in from broad diverse sources from their unique perspectives. One may compute a “hit” the others will miss, and that's the point.
  • If one differently trained fraud detection computer model and application produces a hit, it is considered herein a warning that the accountholder has been compromised or has gone rogue.
  • the other differently trained fraud detection computer models and applications should be and are sensitized to expect fraudulent activity from this accountholder in the other payment transaction channels. Hits across all channels are added up and too many is reason to shut down all payment channels for the affected accountholder.
  • a method of cross-channel financial fraud protection comprises training a variety of real-time, risk-scoring fraud model technologies with training data selected for each from a common transaction history. This then can specialize each member in the monitoring of a selected channel. After training, the heterogeneous real-time, risk-scoring fraud model technologies are arranged in parallel so that all receive the same mixed channel flow of real-time transaction data or authorization requests.
  • Parallel, diversity trained, real-time, risk-scoring fraud model technologies are hosted on a network server platform for real-time risk scoring of a mixed channel flow of real-time transaction data or authorization requests.
  • Risk thresholds are directly updated for particular accountholders in every member of the parallel arrangement of diversity trained real-time, risk-scoring fraud model technologies when any one of them detects a suspicious or outright fraudulent transaction data or authorization request for the accountholder. So, a compromise, takeover, or suspicious activity of an accountholder's account in any one channel is thereafter prevented from being employed to perpetrate a fraud in any of the other channels.
  • Such method of cross-channel financial fraud protection can further include building a population of real-time, long-term, and recursive profiles for each accountholder in each of the real-time, risk-scoring fraud model technologies. Then, during real-time use, the method maintains and updates the real-time, long-term, and recursive profiles for each accountholder in each and all of the real-time, risk-scoring fraud model technologies with newly arriving data.
  • Fifteen-minute vectors are a way to cross-pollinate risks calculated in one channel with the others.
  • The 15-minute vectors can represent an amalgamation or fuzzification of transactions in all channels, or channel-by-channel. Once a 15-minute vector has aged, it is shifted into a 100-minute vector, a one-hour vector, and a whole day vector by a simple shift register means. These vectors represent velocity counts that are very effective in catching fraud as it is occurring in real time.
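  • The shift-register idea behind the velocity counts can be sketched as follows. This is a simplified illustration under stated assumptions: the class name `VelocityCounter` is hypothetical, only the one-hour and whole-day roll-ups are modeled, and a single count per 15-minute bucket stands in for whatever vector of values an embodiment would keep.

```python
from collections import deque

BUCKETS_PER_HOUR = 4    # four 15-minute buckets
BUCKETS_PER_DAY = 96    # ninety-six 15-minute buckets


class VelocityCounter:
    """Counts events per 15-minute bucket and rolls them up into longer windows."""

    def __init__(self):
        self.current = 0                            # bucket still being filled
        self.aged = deque(maxlen=BUCKETS_PER_DAY)   # shift register of closed buckets

    def record(self, n=1):
        self.current += n

    def roll(self):
        """Call every 15 minutes: shift the aged bucket in; the oldest falls off."""
        self.aged.append(self.current)
        self.current = 0

    def last_hour(self):
        return sum(list(self.aged)[-BUCKETS_PER_HOUR:])

    def last_day(self):
        return sum(self.aged)


vc = VelocityCounter()
for count in [1, 0, 2, 5, 7]:   # five consecutive 15-minute periods
    vc.record(count)
    vc.roll()
```

  • A sudden rise in `last_hour()` relative to `last_day()` is the kind of velocity signal such vectors are meant to expose.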
  • embodiments of the present invention include adaptive learning that combines three learning techniques to evolve the artificial intelligence classifiers.
  • First is the automatic creation of profiles, or smart-agents, from historical data, e.g., long-term profiling.
  • the second is real-time learning, e.g., enrichment of the smart-agents based on real-time activities.
  • the third is adaptive learning carried by incremental learning algorithms.
  • a smart-agent is created for each individual card in that data in a first learning step, e.g., long-term profiling.
  • Each profile is created from the card's activities and transactions that took place over the two-year period.
  • Each profile for each smart-agent comprises knowledge extracted field-by-field, such as merchant category code (MCC), time, amount for an MCC over a period of time, recursive profiling, zip codes, type of merchant, monthly aggregation, activity during the week, weekend, holidays, card not present (CNP) versus card present (CP), domestic versus cross-border, etc. This profile highlights all the normal activities of the smart-agent (specific payment card).
  • Smart-agent technology learns specific behaviors of each cardholder and creates a smart-agent to follow the behavior of each cardholder. Because it learns from each activity of a cardholder, the smart-agent updates its profiles and makes effective changes at runtime. It is the only technology with an ability to identify and stop, in real-time, previously unknown fraud schemes. It has the highest detection rate and lowest false positives because it separately follows and learns the behaviors of each cardholder.
  • Smart-agents have a further advantage in data size reduction. Once, say, twenty-seven terabytes of historical data are transformed into smart-agents, only 200 gigabytes are needed to represent twenty-seven million distinct smart-agents corresponding to all the distinct cardholders.
  • Incremental learning technologies are embedded in the machine algorithms and smart-agent technology to continually re-train from any false positives and negatives that occur along the way. Each corrects itself to avoid repeating the same classification errors.
  • Data mining logic incrementally changes the decision trees by creating a new link or updating the existing links and weights.
  • Neural networks update the weight matrix, and case based reasoning logic updates generic cases or creates new ones.
  • Smart-agents update their profiles by adjusting the normal/abnormal thresholds, or by creating exceptions.
  • FIG. 30 represents a flowchart of an algorithm 3000 executed by an apparatus needed to implement a method embodiment of the present invention for improving predictive model training and performance by data enrichment of transaction records.
  • the data enrichment of transaction records is done first with supervised and unsupervised training data 124 ( FIG. 1 ) and training sets 420 + 422 + 424 , 421 + 423 + 425 , and 440 + 442 + 444 ( FIG. 4 ) during training to build predictive models 127 , 131 , 135 , 139 , 143 , and 147 ( FIG. 1 ), and 601 - 606 ( FIG. 6 ). These are ultimately deployed as predictive models 611 - 616 ( FIG. 6 ) for use in real time with a raw feed of new event, non-training data records 906 ( FIG. 9 ).
  • FIG. 30 shows on the left that method 500 ( FIG. 5 ) includes a step 3001 to delete some data fields not particularly useful, a step 3002 to add some data fields that are helpful, a step 3003 to test that the data fields added in step 3002 do improve the final predictions, and a step 3004 to loop until all the original data fields are scrutinized.
  • embodiments of the present invention include a method 3000 of operating an artificial intelligence machine 100 to produce predictive model language documents 128 , 132 , 136 , 140 , 144 , and 148 describing improved predictive models that generate better business decisions 660 , 661 from raw data record inputs 618 .
  • a first phase includes deleting 3001 with at least one processor a selected data field and any data values contained in the selected data field from each of a first series of data records (e.g., training sets 420 + 422 + 424 , 421 + 423 + 425 , and 440 + 442 + 444 [ FIG.
  • a next phase includes adding 3002 with the at least one processor a new derivative data field to all the reduced-field series of data records stored in the memory of the artificial intelligence machine and initializing each added new derivative data field with a new data value, and including an apparatus for executing an algorithm to either change real scalar numeric data values into fuzzy values, or if symbolic, to change a behavior group data value, and testing that a minimum number of data fields survive, and if not, then to generate a new derivative data field and fix within each an aggregation type, a time range, a filter, a set of aggregation constraints, a set of data fields to aggregate, and a recursive level, and then assessing the quality of a newly derived data field by testing it with a test set of data, and then transforming the results into an enriched-field series of data records stored in the memory of the artificial intelligence machine.
  • a next phase includes verifying 3003 with the at least one processor that a predictive model trained with the enriched-field series of data records stored in the memory of the artificial intelligence machine produces more accurate predictions from the artificial intelligence machine having fewer errors than the same predictive model trained only with the first series of data records.
  • Another phase of the method includes verifying with the at least one processor that a predictive model 611 - 616 fed a non-training set of the enriched-field series of data records 906 stored in the memory of the artificial intelligence machine produces more accurate predictions 660 , 661 with fewer errors than the same predictive model fed with data records with unmodified data fields.
  • a still further phase of the method includes recording as a data-enrichment descriptor 3006 and 3008 into the memory of the artificial intelligence machine including the at least one processor an identity of any data fields in a data record format of the first series of data records that were subsequently deleted and can be ignored, and which newly derived data fields were subsequently added, and how each newly derived data field was derived and from which information sources.
  • Another phase includes passing along the data-enrichment descriptor with the at least one processor information stored in the memory of the artificial intelligence machine to an artificial intelligence machine including processors for predictive model algorithms to produce and output better business decisions from its own feed of new events as raw data record inputs stored in the memory of the artificial intelligence machine.
  • a method 622 ( FIG. 6 ) of operating an artificial intelligence machine including processors for predictive model algorithms that produces and that outputs better business decisions 660 , 661 from a new series of data records of new events as raw data record inputs 618 and 906 , includes a phase to recover with at least one processor a recording of a data-enrichment descriptor stored in a memory of an artificial intelligence machine including an identity 3006 of any data fields in a data record format of a series of data records that were subsequently deleted by an artificial intelligence machine including processors for predictive model building, and which of any newly derived data fields 3008 were subsequently added, and how each newly derived data field was derived and from which information sources.
  • a next phase includes accepting a new series of data records 906 of new events with the artificial intelligence machine including at least one processor to receive and store records in the memory of the artificial intelligence machine.
  • a next phase of the method 3000 includes ignoring or deleting 3010 with the at least one processor all data fields and all data values contained in the data fields from each of a new series of data records of new events, stored in the memory of the artificial intelligence machine, according to the data-enrichment descriptor 3006 .
  • a next phase that includes adding 3011 with the at least one processor a new derivative data field to each record of the new series of data records stored in the memory of the artificial intelligence machine according to the data-enrichment descriptor 3008 , and initializing each added new derivative data field with a new data value stored in the memory of the artificial intelligence machine.
  • the method further includes producing and outputting a series of predictive decisions 660 , 661 with the at least one processor that operates at least one predictive model algorithm 611 - 616 derived from one originally built and trained with records (e.g., training sets 420 + 422 + 424 , 421 + 423 + 425 , and 440 + 442 + 444 [ FIG. 4 ]) having a same record format described by the data-enrichment descriptor and stored in the memory of the artificial intelligence machine.
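  • The way a recorded data-enrichment descriptor might be replayed on new event records at run time can be sketched as follows. This is a hedged illustration: the `EnrichmentDescriptor` class, the `apply_descriptor` function, and the field names are hypothetical and stand in for descriptor items 3006 and 3008; the source does not specify a storage format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EnrichmentDescriptor:
    """Hypothetical stand-in for the data-enrichment descriptor (3006 and 3008)."""
    deleted_fields: List[str] = field(default_factory=list)
    # derived field name -> function computing it from a raw record
    derived_fields: Dict[str, Callable[[dict], object]] = field(default_factory=dict)


def apply_descriptor(record: dict, desc: EnrichmentDescriptor) -> dict:
    """Drop ignored fields, then add and initialize each derived field."""
    enriched = {k: v for k, v in record.items() if k not in desc.deleted_fields}
    for name, derive in desc.derived_fields.items():
        enriched[name] = derive(record)
    return enriched


desc = EnrichmentDescriptor(
    deleted_fields=["terminal_serial"],
    derived_fields={
        "is_cross_border": lambda r: r["card_country"] != r["merchant_country"],
    },
)
raw = {"amount": 120.0, "terminal_serial": "X9",
       "card_country": "US", "merchant_country": "FR"}
enriched = apply_descriptor(raw, desc)
```

  • Applying the same descriptor during training and at run time keeps the record format seen by the deployed predictive model identical to the one it was trained on.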
  • the method excludes each data field stored in the memory of the artificial intelligence machine that has more than a threshold number of random data values, or that has only one repeating data value, or that has too small a Shannon entropy, and then transforms a surviving number of data fields into a corresponding reduced-field series of data records stored in the memory of the artificial intelligence machine.
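  • The three exclusion tests above can be sketched as follows. This is an illustrative sketch only: the function names and both thresholds are hypothetical, and treating "random data values" as a high ratio of distinct values is an assumption, since the source does not define the test.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def surviving_fields(records, min_entropy=0.1, max_distinct_ratio=0.9):
    """Return field names worth keeping; both thresholds are illustrative."""
    keep = []
    for name in records[0]:
        column = [r[name] for r in records]
        distinct = len(set(column))
        if distinct == 1:                                # only one repeating value
            continue
        if distinct / len(column) > max_distinct_ratio:  # nearly all values unique
            continue
        if shannon_entropy(column) < min_entropy:        # too little information
            continue
        keep.append(name)
    return keep


records = [
    {"txn_id": i, "currency": "USD", "mcc": ["5411", "5812", "5411", "4111"][i % 4]}
    for i in range(20)
]
kept = surviving_fields(records)
```

  • Here the unique `txn_id` and the constant `currency` fields are dropped, and only `mcc` survives into the reduced-field records.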
  • the method adds a new derivative data field to a reduced-field series of data records stored in the memory of the artificial intelligence machine and initializes each added new derivative data field with a new data value, and either changes real scalar numeric data values into fuzzy values, or if symbolic, changes a behavior group data value stored in the memory of the artificial intelligence machine, and tests that a minimum number of data fields survive in the records stored in the memory of the artificial intelligence machine, and if not, then generates a new derivative data field and fixes within each an aggregation type, a time range, a filter, a set of aggregation constraints, a set of data fields to aggregate, and a recursive level, tests the quality of each newly derived data field, and then transforms the results into an enriched-field series of data records stored in the memory of the artificial intelligence machine.
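  • A derived field defined by an aggregation type, a time range, and a filter, together with a simple fuzzification of a numeric value, might look like the following sketch. The choices made here (a sum of amounts over a trailing 24 hours per card, and the cut points 50 and 500) are hypothetical examples, not values taken from the source. The function adds the new field to the record dictionaries in place.

```python
from datetime import datetime, timedelta


def fuzzify_amount(amount, low=50.0, high=500.0):
    """Map a real scalar to a coarse fuzzy label; cut points are illustrative."""
    if amount < low:
        return "low"
    return "medium" if amount < high else "high"


def add_derived_sum(records, key="card", value="amount",
                    window=timedelta(hours=24), name="amount_sum_24h"):
    """Add a field holding the sum of `value` over the trailing window per `key`."""
    ordered = sorted(records, key=lambda r: r["time"])
    for i, rec in enumerate(ordered):
        rec[name] = sum(
            prior[value]
            for prior in ordered[: i + 1]          # this record and earlier ones
            if prior[key] == rec[key] and rec["time"] - prior["time"] <= window
        )
    return ordered


t0 = datetime(2015, 1, 1, 12, 0)
records = [
    {"card": "A", "amount": 20.0, "time": t0},
    {"card": "A", "amount": 600.0, "time": t0 + timedelta(hours=2)},
    {"card": "A", "amount": 30.0, "time": t0 + timedelta(hours=30)},
]
enriched = add_derived_sum(records)
for rec in enriched:
    rec["amount_fuzzy"] = fuzzify_amount(rec["amount"])
```

  • Whether such a derived field is kept would then depend on the quality test against a test set that the method describes.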

Abstract

A method of improving the training and performance of predictive models. A first method of operating an artificial intelligence machine produces predictive model language documents describing improved predictive models that generate better business decisions from raw data record inputs. A second method of operating an artificial intelligence machine including processors for predictive model algorithms produces and outputs better business decisions from raw data record inputs. Both methods enrich the raw data records their processors are fed by deleting data fields with data values that have little benefit in decision making, and by deriving and adding new data fields, from information sources then available, that do benefit the decision making of the artificial intelligence machine through improved accuracies of prediction.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates to ARTIFICIAL INTELLIGENCE MACHINES, and more specifically to methods of improving the training and performance of the predictive models these machines include by enriching the data records to produce better decisions.
  • 2. Background
  • Machine learning can use various techniques such as supervised learning, unsupervised learning, and reinforcement learning. In supervised learning the learner is supplied with labeled training instances (a set of examples), where both the input and the correct output are given. For example, historical stock prices are used to guess future prices. Each example used for training is labeled with the value of interest, in this case the stock price. A supervised learning algorithm learns from the labeled values using information such as the day of the week, the season, the company's financial data, the industry, etc. After the algorithm has found the best pattern it can, it uses that pattern to make predictions.
  • In unsupervised learning, data points have no labels associated with them. Instead, the goal of unsupervised learning is to identify and explore regularities and dependencies in data, e.g., the structure of the underlying data distributions. The quality of a structure is measured by a cost function which is usually minimized to infer optimal parameters characterizing the hidden structure in the data. Reliable and robust inference requires a guarantee that the extracted structures are typical for the data source, e.g., similar structures have to be extracted from a second sample set of the same data source.
  • Reinforcement learning maps situations to actions to maximize a scalar reward or reinforcement signal. The learner does not need to be directly told which actions to take, but instead must discover which actions yield the best rewards by trial and error. An action may affect not only the immediate reward, but also the next situation, and consequently all subsequent rewards. Trial and error search, and delayed reward, are two important distinguishing characteristics of reinforcement learning.
  • Supervised learning algorithms use a known dataset to thereafter make predictions. The training dataset includes input data that produces response values. Supervised learning algorithms are used to build predictive models for new responses to new data. Larger training datasets generally yield better prediction models. Supervised learning includes classification, in which the data must be separated into classes, and regression, for continuous responses. Common classification algorithms include support vector machines (SVM), neural networks, Naïve Bayes classifiers, and decision trees. Common regression algorithms include linear regression, nonlinear regression, generalized linear models, decision trees, and neural networks.
  • SUMMARY OF THE INVENTION
  • Briefly, method embodiments of the present invention improve the training and performance of predictive models included in artificial intelligence machines. A first method of operating an artificial intelligence machine produces predictive model language documents describing improved predictive models that generate better business decisions from raw data record inputs. A second method of operating an artificial intelligence machine including processors for predictive model algorithms produces and outputs better business decisions from raw data record inputs. Both methods enrich the raw data records their processors are fed by deleting data fields with data values that have little benefit in decision making, and by deriving and adding new data fields, from information sources then available, that do benefit the decision making of the artificial intelligence machine through improved accuracies of prediction.
  • The above and still further objects, features, and advantages of the present invention will become apparent upon consideration of the following detailed description of specific embodiments thereof, especially when taken in conjunction with the accompanying drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a flowchart of a method embodiment of the present invention that provides user-service consumers with data science as-a-service operating on artificial intelligence machines;
  • FIG. 2 is a flowchart diagram of an algorithm for triple data encryption standard encryption and decryption as used in the method of FIG. 1;
  • FIG. 3A is a flowchart diagram of an algorithm for data cleanup as used in the method of FIG. 1;
  • FIG. 3B is a flowchart diagram of an algorithm for replacing a numeric value as used in the method of FIG. 3A;
  • FIG. 3C is a flowchart diagram of an algorithm for replacing a symbolic value as used in the method of FIG. 3A;
  • FIG. 4 is a flowchart diagram of an algorithm for building training sets, test sets, and blind sets, and further for down sampling if needed and as used in the method of FIG. 1;
  • FIG. 5A is a flowchart diagram of an algorithm for a first part of the data enrichment as used in the method of FIG. 1;
  • FIG. 5B is a flowchart diagram of an algorithm for a second part of the data enrichment as used in the method of FIG. 1 and where more derived fields are needed to suit quality targets;
  • FIG. 6 is a flowchart diagram of a method of using the PMML Documents of FIG. 1 with an algorithm for the run-time operation of parallel predictive model technologies in artificial intelligence machines;
  • FIG. 7 is a flowchart diagram of an algorithm for the decision engine of FIG. 6;
  • FIG. 8 is a flowchart diagram of an algorithm for using ordered rules and thresholds to decide amongst prediction classes;
  • FIG. 9 is a flowchart diagram of a method that combines the methods of FIGS. 1-8 and their algorithms to artificial intelligence machines that provide an on-line service for scoring, predictions, and decisions to user-service consumers requiring data science and artificial intelligence services without their being required to invest in and maintain specialized equipment and software;
  • FIG. 10 is a flowchart diagram illustrating an artificial intelligence machine apparatus for executing an algorithm for reconsideration of an otherwise final adverse decision, for example, in a payment authorization system a transaction request for a particular amount $X has already been preliminarily “declined” according to some other decision model;
  • FIG. 11 is a flowchart diagram of an algorithm for the operational use of smart agents in artificial intelligence machines;
  • FIGS. 12-29 provide greater detail regarding the construction and functioning of algorithms that are employed in FIGS. 1-11;
  • FIG. 12 is a schematic diagram of a neural network architecture used in a model;
  • FIG. 13 is a diagram of a single neuron in a neural network used in a model;
  • FIG. 14 is a flowchart of an algorithm for training a neural network;
  • FIG. 15 is an example illustrating a table of distance measures that is used in a neural network training process;
  • FIG. 16 is a flowchart of an algorithm for propagating an input record through a neural network;
  • FIG. 17 is a flowchart of an algorithm for updating a training process of a neural network;
  • FIG. 18 is a flowchart of an algorithm for creating intervals of normal values for a field in a training table;
  • FIG. 19 is a flowchart of an algorithm for determining dependencies between each field in a training table;
  • FIG. 20 is a flowchart of an algorithm for verifying dependencies between fields in an input record;
  • FIG. 21 is a flowchart of an algorithm for updating a smart-agent technology;
  • FIG. 22 is a flowchart of an algorithm for generating a data mining technology to create a decision tree based on similar records in a training table;
  • FIG. 23 is an example illustrating a decision tree for a database maintained by an insurance company to predict a risk of an insurance contract based on a type of a car and the age of its driver;
  • FIG. 24 is a flowchart of an algorithm for generating a case-based reasoning technology to find a case in a database that best resembles a new transaction;
  • FIG. 25 is an example illustrating a table of global similarity measures used by a case-based reasoning technology;
  • FIG. 26 is an example illustrating a table of local similarity measures used by a case-based reasoning technology;
  • FIG. 27 is an example illustrating a rule for use with a rule-based reasoning technology;
  • FIG. 28 is an example illustrating a fuzzy rule to specify if a person is tall;
  • FIG. 29 is a flowchart of an algorithm for applying rule-based reasoning, fuzzy logic, and constraint programming to assess the normality/abnormality of, and classify, a transaction or an activity; and
  • FIG. 30 is a flowchart diagram of an algorithm executed by an apparatus needed to implement a method embodiment of the present invention for improving predictive model training and performance by data enrichment of transaction records.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Computer-implemented method embodiments of the present invention provide an artificial intelligence and machine-learning service that is delivered on-demand to user-service consumers, their clients, and other users through network servers. The methods are typically implemented with special algorithms executed by computer apparatus and delivered on non-transitory storage mediums to the providers and user-service consumers who then sell or use the service themselves.
  • Users in occasional or even regular need of artificial intelligence and machine learning Prediction Technologies can get the essential data-science services required on the Cloud from an appropriate provider, instead of installing specialized hardware and maintaining their own software. Users are thereby freed from needing to operate and manage complex software and hardware. The intermediaries manage user access to their particular applications, including quality, security, availability, and performance.
  • FIG. 1 represents a predictive model learning method 100 that provides artificial intelligence and machine learning as-a-service by generating predictive models from service-consumer-supplied training data input records. A computer file 102 is previously hashed or encrypted by a triple-DES algorithm, or similar protection. It is also possible to send a non-encrypted file through an encrypted channel. Users of the platform would upload their data through SSL/TLS from a browser or from a command line interface (SCP or SFTP). This is then received by a network server from a service consumer needing predictive models. Such files encode the supervised and/or unsupervised data of the service consumer that are essential for use in later steps as training inputs. The records 102 received represent an encryption of individual supervised and/or unsupervised records each comprising a predefined plurality of predefined data fields that communicate data values, and structured and unstructured text. Such text often represents that found in webpages, blogs, automated news feeds, etc., and very often contains errors and inconsistencies.
  • Structured text has an easily digested form and unstructured text does not. Text mining can use a simple bag-of-words model, such as counting how many times each word occurs, or more complex approaches that pull the context from language structures, e.g., the metadata of a post on Twitter where the unstructured data is the text of the post.
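  • The simple bag-of-words model mentioned above amounts to counting word occurrences, as in this minimal sketch. The function name `bag_of_words` and the tokenization rule (lowercase runs of letters, digits, and apostrophes) are assumptions made for illustration.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


counts = bag_of_words("Card declined. The card was declined twice.")
```

  • Word order and context are discarded in this representation, which is why the more complex approaches are needed when context matters.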
  • These records 102 are decrypted in a step 104 with an apparatus for executing a decoding algorithm, e.g., a standard triple-DES device that uses three keys. An example is illustrated in FIG. 2. A series of results are transformed into a set of non-transitory, raw-data records 106 that are collectively stored in a machine-readable storage mechanism.
  • A step 108 cleans up and improves the integrity of the data stored in the raw-data records 106 with an apparatus for executing a data integrity analysis algorithm. An example is illustrated in FIGS. 3A, 3B, and 3C. Step 108 compares and corrects any data values in each data field according to user-service consumer preferences like min, max, average, null, and default, and a predefined data dictionary of valid data values. Step 108 discerns the context of the structured and unstructured text with an apparatus for executing a contextual dictionary algorithm. Step 108 transforms each result into a set of flat-data records 110 that are collectively stored in a machine-readable storage mechanism.
  • Method 108 improves the training of predictive models by converting and transforming a variety of inconsistent and incoherent supervised and unsupervised training data for predictive models received by a network server as electronic data files, and storing that in a computer data storage mechanism. It then transforms these into another single, error-free, uniformly formatted record file in computer data storage with an apparatus for executing a data integrity analysis algorithm that harmonizes a range of supervised and unsupervised training data into flat-data records in which every field of every record file is modified to be coherent and well-populated with information.
  • The data values in each data field in the inconsistent and incoherent supervised and unsupervised training data are compared and corrected according to a user-service consumer preference and a predefined data dictionary of valid data values. An apparatus for executing an algorithm substitutes data values in the data fields of incoming supervised and unsupervised training data with at least one value representing a minimum, a maximum, a null, an average, and a default.
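  • The substitution of invalid or missing data values according to a consumer preference and a data dictionary can be sketched as follows. This is an assumption-laden illustration: the function names, the preference keywords, and the permissible ranges are hypothetical, and FIGS. 3A, 3B, and 3C describe the actual flow.

```python
from statistics import mean


def clean_numeric(values, lo, hi, missing="average", default=0.0):
    """Replace missing or out-of-range values per a (hypothetical) preference."""
    valid = [v for v in values if v is not None and lo <= v <= hi]
    substitutes = {
        "min": min(valid) if valid else default,
        "max": max(valid) if valid else default,
        "average": mean(valid) if valid else default,
        "null": None,
        "default": default,
    }
    fill = substitutes[missing]
    return [v if v is not None and lo <= v <= hi else fill for v in values]


def clean_symbolic(values, dictionary, default="unknown"):
    """Keep values found in the data dictionary; replace the rest with a default."""
    return [v if v in dictionary else default for v in values]


ages = clean_numeric([34, None, 212, 46], lo=0, hi=120, missing="average")
colors = clean_symbolic(["black", "blnd", "red"], {"black", "blond", "red"})
```

  • In this example the missing age and the impossible age of 212 are both replaced with the average of the valid ages, and the misspelled color falls back to the default.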
  • The context of any text included in the inconsistent and incoherent supervised and unsupervised training data is discerned, recognized, detected, and discriminated with an apparatus for executing a contextual dictionary algorithm that employs a thesaurus of alternative contexts of ambiguous words to find a common context denominator, and to then record the context determined into the computer data storage mechanism for later access by a predictive model.
  • Further details regarding data clean-up are provided below in connection with FIGS. 3A, 3B, and 3C. Data cleaning herein deals with detecting and removing errors and inconsistencies from data in order to improve the quality of data. Data quality problems are present in single data collections, such as files and databases, or multiple data sources. For example,
  • Single-Source Data
    level        | problem                          | data errors
    attribute    | illegal values                   | birth date = 30.13.70
    record       | violated attribute dependencies  | age = 32, birth date = 12.02.76
    record type  | uniqueness violation             | (name = “john smith”, SSN = “123456”); (name = “peter miller”, SSN = “123456”)
    source       | referential integrity violation  |
    attribute    | missing values                   | phone = 9999-999999
                 | misspellings                     | city = “SO”
                 | abbreviations                    | Occupation = “database programmer.”
                 | embedded values                  | name = “j. smith 12.02.70 new York”
                 | misfielded values                | city = “USA”
    record       | violated attribute dependencies  | city = “mill valley”, zip = 765662
    record type  | word transpositions              | name1 = “j. smith”, name2 = “miller p.”
                 | duplicated records               | (name = “john smith”, . . . ); (name = “j. smith”, . . . )
                 | contradicting records            | (name = “john smith”, birth date = 12.02.76); (name = “john smith”, birth date = 12.12.76)
    source       | wrong references                 | employee = (name = “john smith”, dept. no = 17)

    problems                       | metadata                           | examples/heuristics
    illegal values                 | cardinality                        | e.g., cardinality (gender) > 2 indicates problem
                                   | max, min                           | max, min should not be outside of permissible range
                                   | variance, deviation                | variance, deviation of statistical values should not be higher than threshold
    misspellings                   | attribute values                   | sorting on values often brings misspelled values next to correct values
    missing values                 | null values                        | percentage/number of null values
                                   | attribute values + default values  | presence of default value may indicate real value is missing
    varying value representations  | attribute values                   | comparing attribute value set of a column of one table against that of a column of another table
    duplicates                     | cardinality + uniqueness           | attribute cardinality = # rows should hold
                                   | attribute values                   | sorting values by number of occurrences; more than 1 occurrence indicates duplicates
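  • Several of the metadata heuristics tabulated above (cardinality, fraction of null values, and duplicate detection by occurrence count) can be computed with a few lines of code. The sketch below is illustrative only; the function name `profile_column` and the choice of null markers are assumptions.

```python
from collections import Counter


def profile_column(values, null_markers=(None, "")):
    """Compute simple metadata for a column: cardinality, nulls, duplicates."""
    present = [v for v in values if v not in null_markers]
    counts = Counter(present)
    return {
        "cardinality": len(counts),
        "null_fraction": 1 - len(present) / len(values),
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
    }


gender = profile_column(["F", "M", "f", "M", None])
ssn = profile_column(["123456", "123456", "987654"])
```

  • A gender cardinality above two, or an SSN cardinality below the number of rows, would each point to one of the problems listed in the table.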
  • In a step 112, a test is made to see if a number of records 114 in the set of flat-data records 110 exceeds a predefined threshold, e.g., about one hundred million. The particular cutoff number to use is inexact and is empirically determined by what produces the best commercial efficiencies.
  • If the number of records 114 is too large, a step 116 then samples a portion of the set of flat-data records 110. An example is illustrated in FIG. 4. Step 116 stores a set of samples 118 in a machine-readable storage mechanism for use in the remaining steps. Step 116 consequently employs an apparatus for executing a special sampling algorithm that limits the number of records that must be processed by the remaining steps, but at the same time preserves important training data. The details are described herein in connection with FIG. 4.
  • Modeling data 120 is given a new, amplified texture by a step 122 for enhancing, enriching, and concentrating the sampled or unsampled data stored in the flat-data records with an apparatus for executing a data enrichment algorithm. An example apparatus is illustrated in FIG. 4, which outputs training sets 420, 421, and 440; and test sets 422, 423, and 442; and blind sets 424, 425, and 444 derived from either the flat data 110 or sampled data 118. Such step 122 removes data that may exist in particular data fields that is less important to building predictive models. Entire data fields themselves are removed here that are predetermined to be unavailing to building good predictive models that follow.
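  • The threshold test of step 112 and the down-sampling of step 116 might be approximated as in the following sketch. The sampling algorithm of FIG. 4 is not reproduced here; always retaining records flagged by a `keep` predicate is only one hypothetical way to preserve important training data, and the function name, thresholds, and seed are illustrative.

```python
import random


def maybe_sample(records, threshold, sample_size, keep=lambda r: False, seed=0):
    """Return records unchanged when under the threshold; otherwise down-sample.

    Records for which `keep` is true (for example, rare fraud cases) are always
    retained; the remainder is filled by uniform random sampling.
    """
    if len(records) <= threshold:
        return list(records)
    must_keep = [r for r in records if keep(r)]
    rest = [r for r in records if not keep(r)]
    rng = random.Random(seed)
    n = max(0, min(len(rest), sample_size - len(must_keep)))
    return must_keep + rng.sample(rest, n)


records = [{"id": i, "fraud": i % 100 == 0} for i in range(1000)]
sampled = maybe_sample(records, threshold=500, sample_size=200,
                       keep=lambda r: r["fraud"])
```

  • With these inputs all ten fraud records survive and the remaining 190 slots are drawn at random from the non-fraud records.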
  • Step 122 calculates and combines any data it has into new data fields that are predetermined to be more important to building such predictive models. It converts text with an apparatus for executing a context mining algorithm, as suggested by FIG. 6. Even more details of this are suggested in my U.S. patent application Ser. No. 14/613,383, filed Feb. 4, 2015, and titled, ARTIFICIAL INTELLIGENCE FOR CONTEXT CLASSIFIER. Step 122 then transforms a plurality of results from the execution of these algorithms into a set of enriched-data records 124 that are collectively stored in a machine-readable storage mechanism.
  • A step 126 uses the set of enriched-data records 124 to build a plurality of smart-agent predictive models for each entity represented. Step 126 employs an apparatus for executing a smart-agent building algorithm. The details of this are shown in FIG. 6. Further related information is included in my U.S. Pat. No. 7,089,592 B2, issued Aug. 8, 2006, titled, SYSTEMS AND METHODS FOR DYNAMIC DETECTION AND PREVENTION OF ELECTRONIC FRAUD, which is incorporated herein by reference. (Herein, Adjaoute '592.) Special attention should be placed on FIGS. 11-30 and the descriptions of smart-agents in connection with FIG. 21 and the smart-agent technology in Columns 16-18.
  • Unsupervised Learning of Normal and Abnormal Behavior
  • Each field or attribute in a data record is represented by a corresponding smart-agent. Each smart-agent representing a field will build what-is-normal (normality) and what-is-abnormal (abnormality) metrics regarding other smart-agents.
  • Apparatus for creating smart-agents is supervised or unsupervised. When supervised, an expert provides information about each domain. Each numeric field is characterized by a list of intervals of normal values, and each symbolic field is characterized by a list of normal values. It is possible for a field to have only one interval. If there are no intervals for an attribute, the system apparatus can skip testing the validity of its values, e.g., when an event occurs.
  • As an example, a doctor (expert) can give the temperature of the human body as within an interval [35° C.: 41° C.], and the hair colors can be {black, blond, red}.
  • An unsupervised learning process uses the following algorithm:
  • 1) For each field “a” of a Table:
    i) Retrieve all the distinct values and their cardinalities and create a list “La” of couples (vai, nai);
    ii) Analyze the intermediate list “La” to create the list of intervals of normal values Ia with this method:
      (a) If “a” is a symbolic attribute, copy each member of “La” into Ia when nai is superior to a threshold Θmin;
      (b) If “a” is a numeric attribute:
        1. Order the list “La” starting with the smallest values of “a”;
        2. While “La” is not empty:
          i. Remove the first element ea = (va1, na1) of “La”;
          ii. Create an interval with this element: I′ = [va1, va1];
          iii. While it is possible, enlarge this interval with the first elements of “La” and remove them from “La”: I′ = [va1, vak]. The loop stops before the size of the interval vak − va1 becomes greater than a threshold Θdist;
      (c) Given: na′ = na1 + ... + nak;
      (d) If na′ is superior to a threshold Θmin, Ia = I′; otherwise, Ia = Ø;
    iii) If Ia is not empty, save the relation (a, Ia).
  • Θmin represents the minimum number of elements an interval must include. This means that an interval will only be taken into account if it encapsulates enough values, so that its values are considered normal because they are frequent;
  • the system apparatus defines two parameters that are modifiable:
  • the maximum number of intervals for each attribute, nmax;
  • the minimum frequency of values in each interval, fImin;
  • Θmin is computed with the following method:

  • Θmin = fImin × (number of records in the table).
  • Θdist represents the maximum width of an interval. This prevents the system apparatus from regrouping numeric values that are too disparate. For an attribute a, let's call mina the smallest value of a on the whole table and maxa the biggest one. Then:

  • Θdist = (maxa − mina)/nmax
  • For example, consider a numeric attribute of temperature with the following values:
  • 75 80 85 72 69 72 83 64 81 71 65 75 68 70

    The first step is to sort and group the values into “La”: “La”={(64, 1) (65, 1) (68, 1) (69, 1) (70, 1) (71, 1) (72, 2) (75, 2) (80, 1) (81, 1) (83, 1) (85, 1)}
    Then the system apparatus creates the intervals of normal values:
  • Consider fImin=10% and nmax=5 then Θmin=1.4 and Θdist=(85−64)/5=4.2
      • Ia={[64, 68] [69, 72] [75] [80, 83]}
        The interval [85, 85] was removed because its cardinality (1) is smaller than Θmin.
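  • The interval-building procedure for a numeric attribute can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `learn_normal_intervals` and its parameter names are assumptions made for this example, and each interval is kept when its cardinality is strictly greater than Θmin.

```python
from collections import Counter

def learn_normal_intervals(values, f_min=0.10, n_max=5):
    """Return the (low, high) intervals of normal values for a numeric field."""
    counts = sorted(Counter(values).items())          # the list "La" of (value, cardinality)
    theta_min = f_min * len(values)                   # minimum cardinality of an interval
    theta_dist = (max(values) - min(values)) / n_max  # maximum width of an interval
    intervals = []
    i = 0
    while i < len(counts):
        low, total = counts[i]                        # start a new interval [low, low]
        high = low
        i += 1
        # Enlarge the interval while its width stays within theta_dist.
        while i < len(counts) and counts[i][0] - low <= theta_dist:
            high = counts[i][0]
            total += counts[i][1]
            i += 1
        if total > theta_min:                         # keep only sufficiently frequent intervals
            intervals.append((low, high))
    return intervals

temperatures = [75, 80, 85, 72, 69, 72, 83, 64, 81, 71, 65, 75, 68, 70]
intervals = learn_normal_intervals(temperatures)
print(intervals)  # [(64, 68), (69, 72), (75, 75), (80, 83)]
```

    With fImin=10% and nmax=5, the sketch yields the same four intervals as the temperature example and drops [85, 85].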
  • When a new event occurs, the values of each field are verified against the intervals of normal values that the system apparatus created or that were fixed by an expert. The system apparatus first checks that at least one interval exists for the field. If not, the field is not verified. If so, the value is tested against the intervals, and a warning is generated for the field when the value falls outside all of them.
  • During creation, dependencies between two fields are expressed as follows:
  • When the field 1 is equal to the value v1, then the field 2 takes the value v2 in significant frequency p.
  • Example: when species is human the body_temperature is 37.2° C. with a 99.5% accuracy.
  • To find all the dependencies, the system apparatus analyses a database with the following algorithm:
  • Given cT is the number of records in the whole database. For each attribute X in the table:
  • Retrieve the list of distinct values for X with the cardinality of each value:
  • Lx={(x1, cx1), . . . (xi, cxi), . . . (xn, cxn)}
  • For each distinct value xi in the list:
    Verify if the value is typical enough: (cxi/cT)>Θx?
  • If true, for each attribute Y in the table, Y≠X
  • Retrieve the list of distinct values for Y with the cardinality of each value:
      • Ly={(y1, cy1), . . . (yj, cyj), . . . (yn, cyn)}
        For each value yj;
  • Retrieve the number of records cij where (X=xi) and (Y=yj). If the relation is significant, save it: if (cij/cxi)>Θxy then save the relation [(X=xi) ⇒ (Y=yj)] with the cardinalities cxi, cyj and cij.
  • The accuracy of this relation is given by the quotient (cij/cxi)
  • Verify the coherence of all the relations: for each relation [(X=xi) ⇒ (Y=yj)] (1)
  • Search if there is a relation [(Y=yj) ⇒ (X=xk)] (2)
  • If xi≠xk remove both relations (1) and (2) from the model; otherwise it will trigger a warning at each event since (1) and (2) cannot both be true.
  • The default value for Θx is 1%: the system apparatus will only consider the significant value of each attribute.
  • The default value for Θxy is 85%: the system apparatus will only consider the significant relations found.
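  • A minimal sketch of the dependency search, under stated assumptions: records are held as Python dictionaries, the function names `find_relations` and `drop_incoherent` are invented for this illustration, and the species/blood records are hypothetical data, not data from this disclosure.

```python
from collections import Counter
from itertools import permutations

def find_relations(records, theta_x=0.01, theta_xy=0.85):
    """Return {((X, xi), (Y, yj)): (cxi, cyj, cij)} for the significant relations."""
    c_t = len(records)
    value_counts = Counter()   # (attribute, value) -> cardinality
    pair_counts = Counter()    # ((X, xi), (Y, yj)) -> records where both hold
    for rec in records:
        items = list(rec.items())
        value_counts.update(items)
        pair_counts.update(permutations(items, 2))   # ordered pairs, so Y != X
    relations = {}
    for (left, right), c_ij in pair_counts.items():
        c_xi, c_yj = value_counts[left], value_counts[right]
        if c_xi / c_t > theta_x and c_ij / c_xi > theta_xy:
            relations[(left, right)] = (c_xi, c_yj, c_ij)
    return relations

def drop_incoherent(relations):
    """Remove pairs [(X=xi) => (Y=yj)] and [(Y=yj) => (X=xk)] where xi != xk."""
    bad = set()
    for left, right in relations:
        for left2, right2 in relations:
            if left2 == right and right2[0] == left[0] and right2[1] != left[1]:
                bad.update({(left, right), (left2, right2)})
    return {k: v for k, v in relations.items() if k not in bad}

# Hypothetical records: 20 in total.
records = ([{"species": "human", "blood": "warm"}] * 9
           + [{"species": "human", "blood": "cold"}]
           + [{"species": "lizard", "blood": "cold"}] * 10)
relations = drop_incoherent(find_relations(records))
for (x, y), counts in relations.items():
    print(x, "=>", y, counts)
```

    The defaults of 1% for Θx and 85% for Θxy follow the text; both tests use a strict “greater than” comparison.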
  • A relation is defined by: (Att1=v1) ⇒ (Att2=v2) (eq).
  • All the relations are stored in a tree made with four levels of hash tables, e.g., to increase the speed of the system apparatus. A first level is a hash of the attribute's name (Att1 in eq); a second level is a hash, for each attribute, of the values that imply some correlations (v1 in eq); a third level is a hash of the names of the attributes with correlations (Att2 in eq) to the first attribute; a fourth and last level has the values of the second attribute that are correlated (v2 in eq).
  • Each leaf represents a relation. At each leaf, the system apparatus stores the cardinalities cxi, cyj and cij. This will allow the system apparatus to incrementally update the relations during its lifetime. Also it gives:
  • the accuracy of a relation: cij/cxi;
  • the prevalence of a relation: cij/cT;
  • the expected predictability of a relation: cyj/cT.
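  • As an illustrative sketch only, the four-level tree can be modeled with nested dictionaries, each leaf holding the three cardinalities. The helper names `make_tree`, `store`, and `metrics` are assumptions made for this example, as are the sample counts.

```python
from collections import defaultdict

def make_tree():
    # Levels: Att1 -> v1 -> Att2 -> v2 -> (cxi, cyj, cij)
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

def store(tree, att1, v1, att2, v2, c_xi, c_yj, c_ij):
    """Save the relation (Att1=v1) => (Att2=v2) with its cardinalities at a leaf."""
    tree[att1][v1][att2][v2] = (c_xi, c_yj, c_ij)

def metrics(tree, att1, v1, att2, v2, c_t):
    """Derive the three measures of a stored relation from its leaf."""
    c_xi, c_yj, c_ij = tree[att1][v1][att2][v2]
    return {
        "accuracy": c_ij / c_xi,
        "prevalence": c_ij / c_t,
        "expected_predictability": c_yj / c_t,
    }

tree = make_tree()
store(tree, "A", 1, "B", 4, c_xi=4, c_yj=3, c_ij=3)   # sample counts, with cT = 10
m = metrics(tree, "A", 1, "B", 4, c_t=10)
print(m)  # {'accuracy': 0.75, 'prevalence': 0.3, 'expected_predictability': 0.3}
```

    Because only raw counts are stored, the measures can be recomputed at any time after the counts are incremented.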
  • Consider an example with two attributes, A and B:
  • A B
    1 4
    1 4
    1 4
    1 3
    2 1
    2 1
    2 2
    3 2
    3 2
    3 2

    There are ten records: cT=10.
    Consider all the possible relations:
  • Relation             cxi   cyj   cij   (cxi/cT)   Accuracy
    (A = 1) ⇒ (B = 4)    4     3     3     40%        75%       (1)
    (A = 2) ⇒ (B = 1)    2     2     2     20%        100%      (2)
    (A = 3) ⇒ (B = 2)    3     4     3     30%        100%      (3)
    (B = 4) ⇒ (A = 1)    3     4     3     30%        100%      (4)
    (B = 3) ⇒ (A = 1)    1     4     1     10%        100%      (5)
    (B = 1) ⇒ (A = 2)    2     3     2     20%        100%      (6)
    (B = 2) ⇒ (A = 3)    4     3     3     40%        75%       (7)

    With the default values for Θx and Θxy, for each possible relation, the first test (cxi/cT)>Θx is successful (since Θx=1%), but the relations (1) and (7) would be rejected (since Θxy=85%).
    Then the system apparatus verifies the coherence of each remaining relation with an algorithm:
  • (A = 2) ⇒ (B = 1) is coherent with (B = 1) ⇒ (A = 2);
    (A = 3) ⇒ (B = 2) is not coherent since there is no more relation (B = 2) ⇒ . . . ;
    (B = 4) ⇒ (A = 1) is not coherent since there is no more relation (A = 1) ⇒ . . . ;
    (B = 3) ⇒ (A = 1) is not coherent since there is no more relation (A = 1) ⇒ . . . ;
    (B = 1) ⇒ (A = 2) is coherent with (A = 2) ⇒ (B = 1).

    The system apparatus classifies the normality/abnormality of each new event in real-time during live production and detection.
  • For each event couple attribute/value (X, xi):
  • Look in the model for all the relations starting with [(X=xi) ⇒ . . . ]
      • For all the other couples attribute/value (Y, yj), Y≠X, of the event:
        • Look in the model for a relation [(X=xi) ⇒ (Y=v)];
        • If yj≠v then trigger a warning “[(X=xi) ⇒ (Y=yj)] not respected”.
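  • The real-time check can be sketched as below. The model layout `{(X, xi): {Y: v}}` and the function name `check_event` are assumptions made for this illustration; the layout relies on at most one value v per attribute Y being implied by a given (X, xi), which holds whenever Θxy exceeds 50%.

```python
def check_event(event, model):
    """event: {attribute: value}; model: {(X, xi): {Y: v}} of learned relations."""
    warnings = []
    for x, xi in event.items():
        # All the relations starting with (X = xi).
        for y, v in model.get((x, xi), {}).items():
            if y != x and y in event and event[y] != v:
                warnings.append(f"[({x}={xi}) => ({y}={v})] not respected")
    return warnings

model = {("species", "human"): {"body_temperature": 37.2}}
normal = check_event({"species": "human", "body_temperature": 37.2}, model)
abnormal = check_event({"species": "human", "body_temperature": 41.0}, model)
print(normal)    # []
print(abnormal)  # ['[(species=human) => (body_temperature=37.2)] not respected']
```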
  • Incremental Learning
  • The system apparatus incrementally learns with new events:
  • Increment cT by the number of records in the new table T.
    For each relation [(X=xi) ⇒ (Y=yj)] previously created:
      • Retrieve its parameters: cxi, cyj and cij
      • Increment cxi by the number of records in T where X=xi;
      • Increment cyj by the number of records in T where Y=yj;
      • Increment cij by the number of records in T where (X=xi) and (Y=yj);
      • Verify if the relation is still significant:
        • If (cxi/cT)<Θx, remove this relation;
          If (cij/cxi)<Θxy, remove this relation.
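  • A minimal sketch of this incremental update, assuming relations are kept as a dictionary keyed by ((X, xi), (Y, yj)) with the three cardinalities as the value. The function name `update_relations` and the sample counts are assumptions made for this example.

```python
def update_relations(relations, c_t, new_records, theta_x=0.01, theta_xy=0.85):
    """Fold the records of a new table T into the stored counts.

    relations: {((X, xi), (Y, yj)): (cxi, cyj, cij)}; returns (relations, new cT).
    """
    c_t += len(new_records)
    kept = {}
    for ((x, xi), (y, yj)), (c_xi, c_yj, c_ij) in relations.items():
        c_xi += sum(1 for r in new_records if r.get(x) == xi)
        c_yj += sum(1 for r in new_records if r.get(y) == yj)
        c_ij += sum(1 for r in new_records if r.get(x) == xi and r.get(y) == yj)
        # Drop the relation once it is no longer significant.
        if c_xi / c_t < theta_x or c_ij / c_xi < theta_xy:
            continue
        kept[((x, xi), (y, yj))] = (c_xi, c_yj, c_ij)
    return kept, c_t

stored = {(("A", 3), ("B", 2)): (3, 4, 3)}   # sample counts, learned with cT = 10
confirmed, total = update_relations(stored, 10, [{"A": 3, "B": 2}])
broken, _ = update_relations(stored, 10, [{"A": 3, "B": 1}])
print(confirmed, total)  # {(('A', 3), ('B', 2)): (4, 5, 4)} 11
print(broken)            # {}  (accuracy fell to 3/4, below 85%)
```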
  • In FIG. 1, a step 127 selects amongst a plurality of smart-agent predictive models and updates a corresponding particular smart-agent's real-time profile and long-term profile. Such profiles are stored in a machine-readable storage mechanism with the data from the enriched-data records 124. Each corresponds to a transaction activity of a particular entity. Step 127 employs an apparatus for executing a smart-agent algorithm that compares a current transaction, activity, or behavior to previously memorialized transactions, activities, and profiles, such as illustrated in FIG. 7. Step 127 then transforms and stores a series of results as a smart-agent predictive model in a markup language document in a machine-readable storage mechanism. Such smart-agent predictive model markup language documents are XML types and are best communicated in a registered file extension format, “.IFM”, marketed by Brighterion, Inc. (San Francisco, Calif.).
  • Steps 126 and 127 can both be implemented by the apparatus of FIG. 11 that executes algorithm 1100.
  • A step 128 exports the .IFM-type smart-agent predictive model markup language documents to a user-service consumer, e.g., using an apparatus for executing a data-science-as-a-service algorithm from a network server, as illustrated in FIGS. 6 and 9.
  • In alternative method embodiments of the present invention, Method 100 further includes a step 130 for building a data mining predictive model (e.g. 612, FIG. 6) by applying the same data from the samples of the enriched-data records 124 as an input to an apparatus for generating a data mining algorithm. For example, as illustrated in FIG. 22. A data-tree result 131 is transformed by a step 132 into a data-mining predictive model markup language document that is stored in a machine-readable storage mechanism. For example, as an industry standardized predictive model markup language (PMML) document. PMML is an XML-based file format developed by the Data Mining Group (dmg.org) to provide a way for applications to describe and exchange models produced by data mining and machine learning algorithms. It supports common models such as logistic regression and feed-forward neural networks. Further information related to data mining is included in Adjaoute '592. Special attention should be placed on FIGS. 11-30 and the descriptions of the data-mining technology in Columns 18-20.
  • Method 100 further includes an alternative step 134 for building a neural network predictive model (e.g. 613, FIG. 6) by applying the same data from the samples of the enriched-data records 124 as an input to an apparatus for generating a neural network algorithm. For example, as illustrated in FIG. 12-17. A nodes/weight result 135 is transformed by a step 136 into a neural-network predictive model markup language document that is stored in a machine-readable storage mechanism. Further information related to neural networks is included in Adjaoute '592. Special attention should be placed on FIGS. 13-15 and the descriptions of the neural network technology in Columns 14-16.
  • Method 100 further includes an alternative step 138 for building a case-based-reasoning predictive model (e.g. 614, FIG. 6) by applying the same data from the samples of the enriched-data records 124 as an input to an apparatus for generating a case-based reasoning algorithm. As suggested by the algorithm of FIGS. 25-26. A cases result 139 is transformed into a case-based-reasoning predictive model markup language document 140 that is stored in a machine-readable storage mechanism. Further information related to case-based-reasoning is included in Adjaoute '592. Special attention should be placed on FIGS. 24-25 and the descriptions of the case-based-reasoning technology in Columns 20-21.
  • Method 100 further includes an alternative step 142 for building a clustering predictive model (e.g. 615, FIG. 6) by applying the same data from the samples of the enriched-data records 124 as an input to an apparatus for generating a clustering algorithm. A clusters result 143 is transformed by a step 144 into a clustering predictive model markup language document that is stored in a machine-readable storage mechanism.
  • Clustering here involves the unsupervised classification of observations, data items, feature vectors, and other patterns into groups. In supervised learning, a collection of labeled patterns is used to determine class descriptions which, in turn, can then be used to label a new pattern. In the case of unsupervised clustering, the challenge is in grouping a given collection of unlabeled patterns into meaningful clusters.
  • Typical pattern clustering algorithms involve the following steps:
  • (1) Pattern representation: extraction and/or selection;
  • (2) Pattern proximity measure appropriate to the data domain;
  • (3) Clustering, and
  • (4) Assessment of the outputs.
  • Feature selection algorithms identify the most effective subsets of the original features to use in clustering. Feature extraction makes transformations of the input features into new relevant features. Either one or both of these techniques is used to obtain an appropriate set of features to use in clustering. Pattern representation refers to the number of classes and available patterns to the clustering algorithm. Pattern proximity is measured by a distance function defined on pairs of patterns.
  • A clustering is either a partition of data into exclusive groups or a fuzzy clustering. Using fuzzy logic, a fuzzy clustering method assigns degrees of membership in several clusters to each input pattern. Both similarity measures and dissimilarity measures are used here in creating clusters.
  • Method 100 further includes an alternative step 146 for building a business rules predictive model (e.g. 616, FIG. 6) by applying the same data from the samples of the enriched-data records 124 as an input to an apparatus for generating a business rules algorithm. As suggested by the algorithm of FIG. 27-29. A rules result 147 is transformed by a step 148 into a business rules predictive model markup language document that is stored in a machine-readable storage mechanism. Further information related to rule-based-reasoning is included in Adjaoute '592. Special attention should be placed on FIG. 27 and the descriptions of the rule-based-reasoning technology in Columns 20-21.
  • Each of Documents 128, 132, 136, 140, 144, and 148 is a tangible machine-readable transformation of a trained model and can be sold, transported, installed, used, adapted, maintained, and modified by a user-service consumer or provider.
  • FIG. 2 represents an apparatus 200 for executing an encryption algorithm 202 and a matching decoding algorithm 204, e.g., a standard triple-DES device that uses two keys. The Data Encryption Standard (DES) is a widely understood and once predominant symmetric-key algorithm for the encryption of electronic data. DES is the archetypal block cipher—an algorithm that takes data and transforms it through a series of complicated operations into another cipher text bit string of the same length. In the case of DES, the block size is 64 bits. DES also uses a key to customize the transformation, so that decryption can supposedly only be performed by those who know the particular key used to encrypt. The key ostensibly consists of 64 bits; however, only 56 of these are actually used by the algorithm. Eight bits are used solely for checking parity, and are thereafter discarded. Hence the effective key length is 56 bits.
  • Triple DES (3DES) is a common name in cryptography for the Triple Data Encryption Algorithm (TDEA or Triple DEA) symmetric-key block cipher, which applies the Data Encryption Standard (DES) cipher algorithm three times to each data block. The original DES cipher's key size of 56-bits was generally sufficient when that algorithm was designed, but the availability of increasing computational power made brute-force attacks feasible. Triple DES provides a relatively simple method of increasing the key size of DES to protect against such attacks, without the need to design a completely new block cipher algorithm.
  • In FIG. 2, algorithms 202 and 204 transform data in separate records in storage memory back and forth between private data (P) and triple encrypted data (C).
  • FIGS. 3A, 3B, and 3C represent an algorithm 300 for cleaning up the raw data 106 in stored data records, field-by-field, record-by-record. What is meant by “cleaning up” is that inconsistent, missing, and illegal data in each field are removed or reconstituted. Some types of fields are very restricted in what is legal or allowed. A record 302 is fetched from the raw data 304 and for each field 306 a test 306 sees if the data value reported is numeric or symbolic. If numeric, a data dictionary 308 is used by a step 310 to see if such data value is listed as valid. If symbolic, another data dictionary 312 is used by a step 314 to see if such data value is listed as valid.
  • For numeric data values, a test 316 is used to branch if not numeric to a step 318 that replaces the numeric value. FIG. 3B illustrates such in greater detail. A test 320 is used to check if the numeric value is within an acceptable range. If not, step 318 is used to replace the numeric value.
  • For symbolic data values, a test 322 is used to branch if not numeric to a step 324 that replaces the symbolic value. FIG. 3C illustrates such in greater detail. A test 326 is used to check if the symbolic value is an allowable one. If yes, a step 328 checks if the value is allowed in a set. If yes, then a return 330 proceeds to the next field. If no, step 324 replaces the symbolic value.
  • If in step 326 the symbolic value in the field is not an allowed value, a step 332 asks if the present field is a zip code field. If yes, a step 334 asks if it's a valid zip code. If yes, the processing moves on to the next field with step 330. Otherwise, it calls on step 324 to replace the symbolic value.
  • If in step 332 the field is not a zip code field, then a step 338 asks if the field is reserved for telephone and fax numbers. If yes, a step 340 asks if it's a valid telephone or fax number. If yes, the processing moves on to the next field with step 330. Otherwise, it calls on step 324 to replace the symbolic value.
  • If in step 338 the field is not a field reserved for telephone and fax numbers, then a step 344 asks if the present field is reserved for dates and time. If yes, a step 346 asks if it's a date or time. If yes, the processing moves on to the next field with step 330. Otherwise, it calls on step 324 to replace the symbolic value.
  • If in step 344 the field is not a field reserved for dates and time, then a step 350 applies a Smith-Waterman algorithm to the data value. The Smith-Waterman algorithm does a local-sequence alignment. It's used to determine if there are any similar regions between two strings or sequences. For example, to recognize “Avenue” as being the same as “Ave.”; and “St.” as the same as “Street”; and “Mr.” as the same as “Mister”. A consistent, coherent terminology is then enforceable in each data field without data loss. The Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure without looking at the total sequence. Then the processing moves on to a next field with step 330.
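  • For illustration, a compact Smith-Waterman local-alignment score can be computed as sketched below. The scoring values (match +2, mismatch −1, gap −1), the function names, and the normalisation by the shorter string are assumptions made for this example; the disclosure does not specify them.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)          # previous row of the scoring matrix
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def similarity(a, b, match=2):
    """Score normalised by the best possible score of the shorter string."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    return smith_waterman(a.lower(), b.lower(), match=match) / (match * shortest)

print(smith_waterman("Avenue", "Ave."))  # 6: the shared region "Ave"
print(smith_waterman("Street", "St."))   # 4: the shared region "St"
print(similarity("Avenue", "Ave."))      # 0.75
```

    A cleanup step could then map a value to the dictionary entry with the highest similarity above some chosen cutoff; that cutoff would be a tuning choice, not something stated here.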
  • FIG. 3B represents what happens inside step 318, replace numeric value. The numeric value to use as a replacement depends on any flags or preferences that were set to use a default, the average, a minimum, a maximum, or a null. A step 360 tests if user preferences were set to use a default value. If yes, then a step 361 sets a default value and returns to do a next field in step 330. A step 362 tests if user preferences were set to use an average value. If yes, then a step 361 sets an average value and returns to do the next field in step 330. A step 364 tests if user preferences were set to use a minimum value. If yes, then a step 361 sets a minimum value and returns to do the next field in step 330. A step 366 tests if user preferences were set to use a maximum value. If yes, then a step 361 sets a maximum value and returns to do the next field in step 330. A step 368 tests if user preferences were set to use a null value. If yes, then a step 361 sets a null value and returns to do the next field in step 330. Otherwise, a step 370 removes the record and moves on to the next record.
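  • The cascade of preference tests for replacing a numeric value can be sketched as follows. The preference strings, the `DROP_RECORD` sentinel, and the function name `replace_numeric` are assumptions made for this illustration.

```python
from statistics import mean

DROP_RECORD = object()  # sentinel: no preference matched, so the record is removed

def replace_numeric(preference, valid_values, default=0.0):
    """Pick a replacement for an invalid numeric value, testing preferences in order."""
    if preference == "default":
        return default
    if preference == "average":
        return mean(valid_values)
    if preference == "minimum":
        return min(valid_values)
    if preference == "maximum":
        return max(valid_values)
    if preference == "null":
        return None
    return DROP_RECORD

observed = [10, 20, 30]   # valid values already seen in this field
print(replace_numeric("average", observed))               # 20
print(replace_numeric("maximum", observed))               # 30
print(replace_numeric(None, observed) is DROP_RECORD)     # True
```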
  • FIG. 3C represents what happens inside step 324, replace symbolic value. The symbolic value to use as a replacement depends on if flags were set to use a default, the average, or null. A step 374 tests if user preferences were set to use a default value. If yes, then a step 375 sets a default value and returns to do the next field in step 330. A step 376 tests if user preferences were set to use an average value. If yes, then a step 377 sets an average value and returns to do the next field in step 330. A step 378 tests if user preferences were set to use a null value. If yes, then a step 379 sets a null value and returns to do the next field in step 330. Otherwise, a step 380 removes the record and moves on to a next record.
  • FIG. 4 represents the apparatus for executing sampling algorithm 116. A sampling algorithm 400 takes cleaned, raw-data 402 and asks in step 404 if the data are supervised. If so, a step 406 creates one data set for each class, “C1” 408 through “Cn” 410. Stratified selection is used if needed. Each application carries its own class set, e.g., stocks portfolio managers use buy-sell-hold classes; loans managers use loan interest rate classes; risk assessment managers use fraud-no_fraud-suspicious classes; marketing managers use product-category-to-suggest classes; and, cybersecurity uses normal_behavior-abnormal_behavior classes. Other classes are possible and useful. For all classes, steps 412 and 413 ask if the class is abnormal (e.g., uncharacteristic). If not, steps 414 and 415 down-sample and produce sampled records of the class 416 and 417. Then steps 418 and 419 split the remaining data into separate training sets 420 and 421, separate test sets 422 and 423, and separate blind sets 424 and 425.
  • If in step 404 the data were determined to be unsupervised, a step 430 creates one data set with all the records and stores them in a memory device 432. A step 434 down-samples all of them and stores those in a memory device 436. Then a step 438 splits the remaining data into a separate training set 440, a separate test set 442, and a separate blind set 444.
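  • The final split into training, test, and blind sets can be sketched as below. The 60/20/20 proportions, the fixed random seed, and the function name `split_records` are assumptions made for this example; the disclosure does not state the proportions used.

```python
import random

def split_records(records, train=0.6, test=0.2, seed=0):
    """Shuffle the records and split them into training, test, and blind sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)        # seeded for reproducibility
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])         # the remainder is the blind set

training_set, test_set, blind_set = split_records(range(100))
print(len(training_set), len(test_set), len(blind_set))  # 60 20 20
```

    In the supervised case, the same function would be applied once per class data set C1 . . . Cn so that each class is represented in all three sets.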
  • Later applications described herein also require data cleanup and data enrichment, but they do not require the split training sets produced by sampling algorithm 400. Instead, they process new incoming records that are cleaned and enriched to make a prediction, a score, or a decision, one record at a time.
  • FIGS. 5A and 5B together represent an apparatus 500 with at least one processor for executing a specialized data enrichment algorithm that works both to enrich the profiling criteria for smart-agents and to enrich the data fields for all the other general predictive models. They all are intended to work together in parallel with the smart-agents in operational use.
  • In FIG. 5A, a plurality of training sets, herein 502 and 502, for each class C1 . . . Cn are input for each data field of a record in a step 506. Such supervised and unsupervised training sets correspond to training sets 420, 421, and 440 (FIG. 4). More generally, flat data 110, 120 and sampled data 118 (FIG. 1). A step 508 asks if there are too many distinct data values, e.g., more than a threshold data value stored in memory. For example, data that is so random as to reveal no information and nothing systemic. If so, a step 510 excludes that field and thereby reduces the list of fields. Otherwise, a step 512 asks if there is a single data value. Again, if so such field is not too useful in later steps, and step 510 excludes that field as well. Otherwise, a step 514 asks if the Shannon entropy is too small, e.g., less than a threshold data value stored in memory. The Shannon entropy is calculable using a conventional formula:
  • $$H(X) = \sum_{i=1}^{n} p(x_i)\, I(x_i) = \sum_{i=1}^{n} p(x_i) \log_b \frac{1}{p(x_i)} = -\sum_{i=1}^{n} p(x_i) \log_b p(x_i)$$
  • The entropy of a message is its amount of uncertainty. It increases when the message is closer to random, and decreases when it is less random. The idea here is that the less likely an event is, the more information it provides when it occurs. If the Shannon entropy is too small, step 510 excludes that field. Otherwise, a step 516 reduces the number of fields in the set of fields carried forward as those that actually provide useful information.
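  • The three field-exclusion tests of steps 508, 512, and 514 can be sketched as follows. The threshold values and the function names are assumptions made for this illustration; in practice the thresholds would be the tunable values stored in memory.

```python
import math
from collections import Counter

def shannon_entropy(values, base=2):
    """H(X) = -sum p(x) log_b p(x), estimated from the observed frequencies."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

def keep_field(values, max_distinct=1000, min_entropy=0.1):
    """Return False when the field should be excluded (step 510)."""
    distinct = len(set(values))
    if distinct > max_distinct:      # step 508: too many distinct values
        return False
    if distinct == 1:                # step 512: a single value carries no information
        return False
    return shannon_entropy(values) >= min_entropy   # step 514

print(shannon_entropy(["a", "a", "b", "b"]))   # 1.0 bit
print(keep_field(["x"] * 10))                  # False
print(keep_field(["a"] * 999 + ["b"]))         # False: entropy is about 0.011 bit
print(keep_field(["a", "b"] * 5))              # True
```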
  • A step 517 asks if the field type under inspection at that instant is symbolic or numeric. If symbolic, a step 518 provides AI behavior grouping. For example, colors or the names of boys. Otherwise, a step 520 does a numeric fuzzification in which a numeric value is turned into a membership of one or more fuzzy sets. Then a step 522 produces a reduced set of transformed fields. A step 524 asks if the number of criteria or data fields remaining meets a predefined target number. The target number represents a judgment of the optimum spectrum of profiling criteria data fields that will be needed to produce high performance smart-agents and good predictive models.
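  • Numeric fuzzification can be illustrated with triangular membership functions, as sketched below. The three fuzzy sets, their breakpoints for a value ranging from 0 to 100, and the function names are hypothetical choices made for this example.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy sets for a numeric field ranging from 0 to 100.
FUZZY_SETS = {
    "low": (-1, 0, 50),
    "medium": (0, 50, 100),
    "high": (50, 100, 101),
}

def fuzzify(x, fuzzy_sets=FUZZY_SETS):
    """Turn a numeric value into its membership degree in each fuzzy set."""
    return {name: triangular(x, *params) for name, params in fuzzy_sets.items()}

print(fuzzify(25))  # {'low': 0.5, 'medium': 0.5, 'high': 0.0}
print(fuzzify(50))  # {'low': 0.0, 'medium': 1.0, 'high': 0.0}
```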
  • If yes, a step 526 outputs a final list of profiling criteria and data fields needed by the smart-agent steps 126 and 127 in FIG. 1 and all the other predictive model steps 130, 131, 134, 135, 138, 139, 142, 143, 146, and 147.
  • If not, the later steps in Method 100 need richer data to work with than is on-hand at the moment. The enrichment provided represents the most distinctive advantage that embodiments of the present invention have over conventional methods and systems. A step 528 (FIG. 5B) begins a process to generate additional profiling criteria and newly derived data fields. A step 530 chooses an aggregation type. A step 532 chooses a time range for a newly derived field or profiling criteria. A step 534 chooses a filter. A step 536 chooses constraints. A step 538 chooses the fields to aggregate. A step 540 chooses a recursive level.
  • A step 542 assesses the quality of the newly derived field by importing test set classes C1 . . . Cn 544 and 546. It assesses the profiling criteria and data field quality for large enough coverage in a step 548, the maximum transaction/event false positive rate (TFPR) below a limit in a step 550, the average TFPR below a limit in a step 552, the transaction/event detection rate (TDR) above a threshold in a step 554, the transaction/event review rate (TRR) trend below a threshold in a step 556, the number of conditions below a threshold in a step 560, the number of records above a threshold in a step 562, and an optimal time window in a step 564.
  • If the newly derived profiling criteria or data field has been qualified, a step 566 adds it to the list. Otherwise, the newly derived profiling criteria or data field is discarded in a step 568, and the process returns to step 528 to try a new iteration with updated parameters.
  • Thresholds and limits are stored in computer storage memory mechanisms as modifiable digital data values that are non-transitory. Thresholds are predetermined and are “tuned” later to optimize overall operational performance, for example, by manipulating the data values stored in a computer memory storage mechanism through an administrator's console dashboard. Thresholds are digitally compared to incoming data or newly derived data using conventional devices.
  • Using the Data Science
  • Once the predictive model technologies have been individually trained by both supervised and unsupervised data and then packaged into a PMML Document, one or more of them can be put to work in parallel to render a risk or a decision score for each new record presented to them. At a minimum, only the smart-agent predictive model technology will be employed by a user-consumer. But when more than one predictive model technology is added in to leverage their respective synergies, a decision engine algorithm is needed to single out which predicted class produced in parallel by several predictive model technologies would be the best to rely on.
  • FIG. 6 is a flowchart diagram of a method 600 for using the PMML Documents (128, 132, 136, 140, 144, and 148) of FIG. 1 with an algorithm for the run-time operation of parallel predictive model technologies.
  • Method 600 depends on an apparatus to execute an algorithm to use the predictive technologies produced by method 100 (FIG. 1) and exported as PMML Documents. Method 600 can provide a substantial commercial advantage in a real-time, record-by-record application by a business. One or more PMML Documents 601-606 are imported and put to work in parallel as predictive model technologies 611-616 to simultaneously predict a class and its confidence in that class for each new record in a raw data record input 618 that are presented to them.
  • It is important that these records receive a data-cleanup 620 and a data-enrichment, as were described for steps 108 and 122 in FIG. 1. A resulting enriched data 624 with newly derived fields in the records is then passed in parallel for simultaneous consideration and evaluation by all the predictive model technologies 611-616 present. Each will transform its inputs into a predicted class 631-636 and a confidence 641-646 stored in a computer memory storage mechanism.
  • A record-by-record decision engine 650 inputs user strategies in the form of flag settings 652 and rules 654 to decide which predicted class to output as a prevailing predicted class output 660 and to compute a normalized confidence output 661. Such record-by-record decision engine 650 is detailed here next in FIG. 7.
  • Typical examples of prevailing predicted classes 660:
  • FIELD OF APPLICATION      OUTPUT CLASSES
    stocks use class          buy, sell, hold, etc.
    loans use class           provide a loan with an interest, or not
    risk use class            fraud, no fraud, suspicious
    marketing use class       category of product to suggest
    cybersecurity use class   normal behavior, abnormal, etc.
  • Method 600 works with at least two of the predictive models from steps 128, 132, 136, 140, 144, and 148 (of FIG. 1). The predictive models each simultaneously produce a score and a score-confidence level in parallel sets, all from a particular record in a plurality of enriched-data records. These combine into a single result to return to a user-service consumer as a decision.
  • Further information related to combining models is included in Adjaoute '592. Special attention should be placed on FIG. 30 and the description in Column 22 on combining the technologies. There, the neural network, smart-agent, data mining, and case-based reasoning technologies all come together to produce a final decision, such as whether a particular electronic transaction is fraudulent or, in a different application, whether there is a network intrusion.
  • FIG. 7 is a flowchart diagram of an apparatus with an algorithm 700 for the decision engine 650 of FIG. 6. Algorithm 700 chooses which predicted class 631-636, or a composite of them, should be output as prevailing predicted class 660. Switches or flag settings 652 are used to control the decision outcome and are fixed by the user-service consumer in operating their business based on the data science embodied in Documents 601-606. Rules 654 too can include business rules like, “always follow the smart agent's predicted class if its confidence exceeds 90%.”
  • A step 702 inspects the rule type then in force. Compiled flag settings rules are fuzzy rules (business rules) developed with fuzzy logic. Fuzzy rules are used to merge the predicted classes from all the predictive models and technologies 631-636 and decide on one final prediction, herein, prevailing predicted class 660. Rules 654 are either manually written by analytical engineers, or they are automatically generated when analyzing the enriched training data 124 (FIG. 1) in steps 126, 130, 134, 138, 142, and 146.
  • If in step 702 it is decided to follow “compiled rules”, then a step 704 invokes the compiled flag settings rules and returns with a corresponding decision 706 for output as prevailing predicted class 660.
  • If in step 702 it is decided to follow “smart agents”, then a step 708 invokes the smart agents and returns with a corresponding decision 710 for output as prevailing predicted class 660.
  • If in step 702 it is decided to follow “predefined rules”, then a step 712 asks if the flag settings should be applied first. If not, a step 714 applies a winner-take-all test to all the individual predicted classes 631-636 (FIG. 6). A step 716 tests if one particular class wins. If yes, a step 718 outputs that winner class for output as prevailing predicted class 660.
  • If not in step 716, a step 720 applies the flag settings to the individual predicted classes 631-636 (FIG. 6). Then a step 722 asks if there is a winner rule. If yes, a step 724 outputs that winner rule decision for output as prevailing predicted class 660. Otherwise, a step 726 outputs an “otherwise” rule decision for output as prevailing predicted class 660.
  • If in step 712 flag settings are to be applied first, a step 730 applies the flags to the individual predicted classes 631-636 (FIG. 6). Then a step 732 asks if there is a winner rule. If yes, then a step 734 outputs that winner rule decision for output as prevailing predicted class 660. Otherwise, a step 736 asks if the decision should be winner-take-all. If no, a step 738 outputs an “otherwise” rule decision for output as prevailing predicted class 660.
  • If in step 736 it should be winner-take-all, a step 740 applies winner-take-all to each of the individual predicted classes 631-636 (FIG. 6). Then a step 742 asks if there is now a winner class. If not, step 738 outputs an “otherwise” rule decision for output as prevailing predicted class 660. Otherwise, a step 744 outputs a winning class decision for output as prevailing predicted class 660.
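  • As an illustrative, non-limiting sketch, the branching of algorithm 700 can be written as a single dispatch function. The function name `decision_engine`, its parameters, and the convention that each strategy is a callable returning either a class or `None` (no winner) are assumptions made for this example and are not taken from the figures.

```python
def decision_engine(rule_type, predictions, flags_first, allow_winner_take_all,
                    compiled_rules, smart_agents, apply_flags,
                    winner_take_all, otherwise):
    """Sketch of the FIG. 7 flow. Each strategy argument is a hypothetical
    callable taking the predictions and returning a class, or None if no winner."""
    if rule_type == "compiled rules":            # steps 704, 706
        return compiled_rules(predictions)
    if rule_type == "smart agents":              # steps 708, 710
        return smart_agents(predictions)

    # rule_type == "predefined rules"
    if not flags_first:                          # step 712 answers "no"
        winner = winner_take_all(predictions)    # steps 714, 716
        if winner is not None:
            return winner                        # step 718
        winner = apply_flags(predictions)        # steps 720, 722
        if winner is not None:
            return winner                        # step 724
        return otherwise(predictions)            # step 726

    winner = apply_flags(predictions)            # steps 730, 732
    if winner is not None:
        return winner                            # step 734
    if not allow_winner_take_all:                # step 736 answers "no"
        return otherwise(predictions)            # step 738
    winner = winner_take_all(predictions)        # steps 740, 742
    if winner is not None:
        return winner                            # step 744
    return otherwise(predictions)                # step 738


# Toy strategies, for illustration only.
no_winner = lambda preds: None
result = decision_engine(
    "predefined rules", ["fraud", "valid"],
    flags_first=True, allow_winner_take_all=True,
    compiled_rules=no_winner, smart_agents=no_winner,
    apply_flags=no_winner,
    winner_take_all=lambda preds: "fraud",
    otherwise=lambda preds: "valid",
)
print(result)  # fraud
```

    Passing the strategies in as callables keeps the control flow of the flowchart separate from how each individual test is implemented.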
  • Compiled flag settings rules in step 704 are fuzzy rules, e.g., business rules with fuzzy logic. Such fuzzy rules are targeted to merge the predictions 631-636 into one final prediction 660. Such rules are either written by analytical engineers or are generated automatically by analyses of the training data.
  • When applying flag settings to the individual predictions, as in step 730, an algorithm applies a set of ordered rules that indicate how to handle the predictions output by each prediction technology. FIG. 8 illustrates this further.
  • FIG. 8 shows flag settings 800 as a set of ordered rules 801-803 that indicate how to handle each technology prediction 631-636 (FIG. 6). For each technology 611-616, there is at least one rule 801-803 that provides a corresponding threshold 811-813. Each is then compared to prediction confidences 641-646.
  • When a corresponding incoming confidence 820 is higher than or equal to a given threshold 811-813 provided by a rule 801-803, the technology 611-616 associated with rule 801-803 is declared “winner” and its class and confidence are used as the final prediction. When none of the technologies 611-616 win, an “otherwise rule” determines what to do. In this case, a clause indicates how to classify the transaction (fraud/not-fraud) and it sets the confidence to zero.
  • Consider the following example:
  • Flag Settings                                      Predictions
    Prediction Type  Prediction Technology  Threshold  Class  Prediction Technology  Confidence
    All              Smart-agents           0.75       Fraud  Smart-agents           0.7
    All              Data Mining            0.7        Fraud  Data Mining            0.8
    . . .            . . .                  . . .      . . .  . . .                  . . .
  • A first rule, e.g., 801, looks at a smart-agent confidence (e.g., 641) of 0.7, but that is below a given corresponding threshold (e.g., 811) of 0.75 so inspection continues.
  • A second rule (e.g., 802) looks at a data mining confidence (e.g., 642) of 0.8 which is above a given threshold (e.g., 812) of 0.7. Inspection stops here and decision engine 650 uses the Data Mining prediction (e.g., 632) to define the final prediction (e.g., 660). Thus it is decided in this example that the incoming transaction is fraudulent with a confidence of 0.8.
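  • As an illustrative, non-limiting sketch, the ordered flag-setting rules of FIG. 8 can be evaluated as below, using the thresholds and confidences of this example. The names `FlagRule`, `Prediction`, and `apply_flag_settings`, and the default class used by the “otherwise” rule, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class FlagRule:
    technology: str   # prediction technology the rule inspects
    threshold: float  # minimum confidence for that technology to be declared winner


@dataclass
class Prediction:
    technology: str
    predicted_class: str
    confidence: float


def apply_flag_settings(rules, predictions, otherwise_class="not-fraud"):
    """Return (class, confidence, winning technology) for the first satisfied rule."""
    by_technology = {p.technology: p for p in predictions}
    for rule in rules:  # rules are inspected in their listed order
        p = by_technology.get(rule.technology)
        if p is not None and p.confidence >= rule.threshold:
            return p.predicted_class, p.confidence, p.technology
    # "Otherwise" rule: a fixed class (hypothetical default) and zero confidence.
    return otherwise_class, 0.0, None


rules = [FlagRule("smart-agents", 0.75), FlagRule("data-mining", 0.7)]
predictions = [
    Prediction("smart-agents", "fraud", 0.7),
    Prediction("data-mining", "fraud", 0.8),
]
result = apply_flag_settings(rules, predictions)
print(result)  # ('fraud', 0.8, 'data-mining')
```

    The smart-agent rule is skipped because 0.7 is below 0.75, and inspection stops at the data mining rule because 0.8 meets its 0.7 threshold.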
  • It is possible to define rules that apply only to specific kinds of predictions. For example, a higher threshold is associated with predictions of fraud, as opposed to prediction classes of non-frauds.
  • A winner-take-all technique groups the individual predictions 631-636 by their prediction output classes. Each Prediction Technology is assigned its own weight, one used when it predicts a fraudulent transaction, another used when it predicts a valid transaction. All similar predictions are grouped together by summing their weighted confidence. The sum of the weighted confidences is divided by the sum of the weights used in order to obtain a final confidence between 0.0 and 1.0.
  • For example:
  • Weights                                            Predictions
    Prediction Technology  Weight-Fraud  Weight-Valid  Class  Prediction Technology  Confidence
    Smart-agents           2             2             Fraud  Smart-agents           0.7
    Data Mining            1             1             Fraud  Data Mining            0.8
    Case Based Reasoning   2             2             Valid  Case Based Reasoning   0.4
  • Here in the Example, two prediction technologies (e.g., 611 and 612) are predicting (e.g., 631 and 632) a “fraud” class for the transaction. So their cumulated weighted confidence here is computed as: 2*0.7+1*0.8 which is 2.2, and stored in computer memory. Only case-based-reasoning (e.g., 614) predicts (e.g., class 634) a “valid” transaction, so its weighted confidence here is computed with its weight of 2 as: 2*0.4 which is 0.8, and is also stored in computer memory for comparison later.
  • Since the first computed value of 2.2 is greater than the second computed value of 0.8, this particular transaction in this example is decided to belong to the “fraud” class. The confidence is then normalized for output by dividing it by the sum of the weights that were associated with the fraud (2 and 1). So the final confidence (e.g., 661) is computed by 2.2/(2+1) giving: 0.73.
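  • As an illustrative, non-limiting sketch, the winner-take-all computation can be written as below, taking the weights and confidences from the table of this example. The function name `winner_take_all` and the data layout are assumptions made for this example; ties between classes are not handled here.

```python
from collections import defaultdict


def winner_take_all(predictions, weights):
    """predictions: list of (technology, predicted class, confidence).
    weights: {technology: {class: weight}}.
    Returns (winning class, normalized confidence, weighted sums per class)."""
    weighted_sum = defaultdict(float)  # sum of weight * confidence per class
    weight_sum = defaultdict(float)    # sum of the weights used per class
    for technology, predicted_class, confidence in predictions:
        w = weights[technology][predicted_class]
        weighted_sum[predicted_class] += w * confidence
        weight_sum[predicted_class] += w
    winner = max(weighted_sum, key=weighted_sum.get)
    confidence = weighted_sum[winner] / weight_sum[winner]  # between 0.0 and 1.0
    return winner, confidence, dict(weighted_sum)


weights = {
    "smart-agents": {"fraud": 2, "valid": 2},
    "data-mining": {"fraud": 1, "valid": 1},
    "case-based-reasoning": {"fraud": 2, "valid": 2},
}
predictions = [
    ("smart-agents", "fraud", 0.7),
    ("data-mining", "fraud", 0.8),
    ("case-based-reasoning", "valid", 0.4),
]
winner, confidence, sums = winner_take_all(predictions, weights)
print(winner, round(confidence, 2))  # fraud 0.73
```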
  • Some models 611-616 may have been trained to output more than just two binary classes. A fuzzification can provide more than two slots, e.g., for buy/sell/hold, or declined/suspect/approved. It may help to group classes by type of prediction (fraud or not-fraud).
  • For example:
  • Weights                                            Predictions                               Classes
    Prediction Technology  Weight-Fraud  Weight-Valid  Class  Prediction Technology  Confidence  Value  Type
    Smart-agents           2             2             00     Smart-agents           0.6         00     Fraud
    Data Mining            1             1             01     Data Mining            0.5         01     Fraud
    Case Based Reasoning   2             2             G      Case Based Reasoning   0.7         G      Valid
  • In a first example, similar classes are grouped together. So fraud=2*0.6+1*0.5=1.7, and valid=2*0.7=1.4. The transaction in this example is marked as fraudulent.
  • In a second example, all the classes are treated as distinct, so each class is scored on its own: “00”=2*0.6=1.2, “01”=1*0.5=0.5, and “G”=2*0.7=1.4. The winner is the class “G”, and the transaction is marked as valid in this example.
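  • As an illustrative, non-limiting sketch, the two examples above differ only in the key used to accumulate the weighted confidences. The function name `score_classes` and the data layout are assumptions made for this example.

```python
def score_classes(predictions, weights, class_type, group_by_type):
    """Sum weight * confidence per class type (grouped) or per class value (distinct)."""
    totals = {}
    for technology, class_value, confidence in predictions:
        kind = class_type[class_value]          # "fraud" or "valid"
        key = kind if group_by_type else class_value
        totals[key] = totals.get(key, 0.0) + weights[technology][kind] * confidence
    return totals


weights = {
    "smart-agents": {"fraud": 2, "valid": 2},
    "data-mining": {"fraud": 1, "valid": 1},
    "case-based-reasoning": {"fraud": 2, "valid": 2},
}
class_type = {"00": "fraud", "01": "fraud", "G": "valid"}
predictions = [
    ("smart-agents", "00", 0.6),
    ("data-mining", "01", 0.5),
    ("case-based-reasoning", "G", 0.7),
]

grouped = score_classes(predictions, weights, class_type, group_by_type=True)
distinct = score_classes(predictions, weights, class_type, group_by_type=False)
print(max(grouped, key=grouped.get))    # fraud
print(max(distinct, key=distinct.get))  # G
```

    Grouping lets two weaker fraud predictions outweigh one stronger valid prediction, whereas scoring each class separately lets the single strongest class win.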
  • Embodiments of the present invention integrate the constituent opinions of the technologies and make a single prediction class. How they integrate the constituent predictions 631-636 depends on a user-service consumer's selections of which technologies to favor and how to favor them, and such selections are made prior to training the technologies, e.g., through a model training interface.
  • A default selection includes the results of the neural network technology, the smart-agent technology, the data mining technology, and the case-based reasoning technology. Alternatively, the user-service consumer may decide to use any combination of technologies, or to select an expert mode with four additional technologies: (1) rule-based reasoning technology; (2) fuzzy logic technology; (3) genetic algorithms technology; and (4) constraint programming technology.
  • One strategy that could be defined by a user-service consumer assigns one vote to each predictive technology 611-616. A final decision 660 then stems from a majority decision reached by equal votes by the technologies within decision engine 650.
  • Another strategy definable by a user-service consumer assigns priority values to each one of technologies 611-616, with higher priorities weighing more heavily in the final decision. For example, if a technology with a higher priority determines that a transaction is fraudulent and another technology with a lower priority determines that the transaction is not fraudulent, then method embodiments of the present invention use the priority values to discriminate between the results of the two technologies and determine that the transaction is indeed fraudulent.
  • A further strategy definable by a user-service consumer specifies instead a set of meta-rules to help choose a final decision 660 for output. These all indicate an output prediction class and its confidence level as a percentage (0-100, or 0-1.0) proportional to how confident the system apparatus is in the prediction.
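  • As an illustrative, non-limiting sketch, the equal-vote and priority strategies described above can be expressed as below. The function names, the sample predictions, and the priority values are hypothetical and chosen only for this example.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: {technology: predicted class}. One vote per technology.
    Returns the class with the most votes, or None on a tie for first place."""
    ranked = Counter(predictions.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def highest_priority(predictions, priorities):
    """Return the class predicted by the technology with the highest priority value."""
    technology = max(predictions, key=lambda t: priorities[t])
    return predictions[technology]


# Hypothetical predictions and priorities, for illustration only.
predictions = {
    "neural-network": "not-fraud",
    "smart-agents": "fraud",
    "data-mining": "not-fraud",
}
priorities = {"neural-network": 1, "smart-agents": 3, "data-mining": 2}

print(majority_vote(predictions))                 # not-fraud
print(highest_priority(predictions, priorities))  # fraud
```

    The same three predictions yield different final decisions under the two strategies, which is why the strategy is left to the user-service consumer.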
  • FIG. 9 illustrates a method 900 of business decision making that requires the collaboration of two businesses, a service provider 901 and a user-consumer 902. The two businesses communicate with one another via secure Internet between network servers. The many data records and data files passed between them are hashed or encrypted by a triple-DES algorithm, or similar protection. It is also possible to send a non-encrypted file through an encrypted channel. Users of the platform would upload their data through SSL/TLS from a browser or from a command line interface (SCP or SFTP).
  • The service-provider business 901 combines method 100 (FIG. 1) and method 600 (FIG. 6) and their constituent algorithms. It accepts supervised and unsupervised training data 904 and strategies 905 from the user-service consumer business 902. Method 100 then processes these as described above with FIGS. 1-8 to produce a full set of fully trained predictive models that are passed to method 600.
  • New records from operations 906 provided, e.g., in real-time as they occur, are passed after being transformed by encryption from the user-service consumer business 902 to the service provider business 901 and method 600. An on-going run of scores, predictions, and decisions 908 (produced by method 600 according to the predictive models of method 100 and the strategies 905 and training data 904) are returned to user-service consumer business 902 after being transformed by encryption.
  • With some adjustment and reconfiguration, method 900 can be trained for a wide range of uses, e.g., to classify fraud/no-fraud in payment transaction networks, to predict buy/sell/hold in stock trading, to detect malicious insider activity, and to call for preventative maintenance with machine and device failure predictions.
  • Referring again to FIG. 9, another method of operating an artificial intelligence machine to improve its decisions from included predictive models begins by deleting with at least one processor a selected data field and any data values contained in the selected data field from each of a first series of data training records stored in a memory of the artificial intelligence machine to exclude each data field in the first series of data training records that has more than a threshold number of random data values, or that has only one repeating data value, or that has too small a Shannon entropy, and using an information gain to select the most useful data fields, and then transforming a surviving number of data fields in all the first series of data training records into a corresponding reduced-field series of data training records stored in the memory of the artificial intelligence machine.
  • A next step includes adding with the at least one processor a new derivative data field to all the reduced-field series of data training records stored in the memory and initializing each added new derivative data field with a new data value, and including an apparatus for executing an algorithm to either change real scalar numeric data values into fuzzy values, or if symbolic, to change a behavior group data value, and testing that a minimum number of data fields survive, and if not, then to generate a new derivative data field and fix within each an aggregation type, a time range, a filter, a set of aggregation constraints, a set of data fields to aggregate, and a recursive level, and then assessing the quality of a newly derived data field by testing it with a test set of data, and then transforming the results into an enriched-field series of data training records stored in the memory of the artificial intelligence machine.
  • A next step includes verifying with the at least one processor that each predictive model if trained with the enriched-field series of data training records stored in the memory produces decisions having fewer errors than the same predictive model trained only with the first series of data training records.
  • A further step includes recording a data-enrichment descriptor into the memory to include an identity of selected data fields in a data training record format of the first series of data training records that were subsequently deleted, and which newly derived data fields were subsequently added, and how each newly derived data field was derived and from which information sources.
  • A next step includes causing the at least one processor of the artificial intelligence machine to start extracting decisions from a new series of data records of new events by receiving and storing the new series of data records in the memory of the artificial intelligence machine.
  • A further step includes causing the at least one processor to fetch the data-enrichment descriptor and use it to select which data fields to delete and then deleting all the data values included in the selected data fields from each of a new series of data records of new events. Each data field deleted matches a data field in the first series of data training records that had more than a threshold number of random data values, or that had only one repeating data value, or that had too small a Shannon entropy.
  • A next step includes adding with the at least one processor a new derivative data field to each record of the new series of data records stored in the memory according to the data-enrichment descriptor, and initializing each added new derivative data field with a new data value stored in the memory. Each new derivative data field added matches a new derivative data field added to the enriched-field series of data training records in which real scalar numeric data values were changed into fuzzy values, or if symbolic, were changed into a behavior group data value stored in the memory, and were tested that a minimum number of data fields survive, and if not, then that generated a new derivative data field and fixed within each an aggregation type, a time range, a filter, a set of aggregation constraints, a set of data fields to aggregate, and a recursive level.
  • The method concludes by producing and outputting a series of predictive decisions with the at least one processor that operates at least one predictive model algorithm derived from one originally built and trained with records having a same record format described by the data-enrichment descriptor and stored in the memory of the artificial intelligence machine.
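  • As an illustrative, non-limiting sketch, the field-exclusion tests of this method and the later reuse of a data-enrichment descriptor on new records can be expressed as below. The thresholds, the use of a distinct-value ratio as a stand-in for “too many random data values”, the descriptor layout, and all names are assumptions made for this example; information gain and the derivation of new fields are not covered.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the observed values of one data field."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_fields(records, min_entropy=0.1, max_distinct_ratio=0.9):
    """Split field names into (kept, deleted) using hypothetical thresholds."""
    kept, deleted = [], []
    for field in records[0]:
        values = [r[field] for r in records]
        distinct = len(set(values))
        too_random = distinct / len(values) > max_distinct_ratio
        single_value = distinct == 1
        low_entropy = shannon_entropy(values) < min_entropy
        (deleted if too_random or single_value or low_entropy else kept).append(field)
    return kept, deleted


def apply_descriptor(record, descriptor):
    """Delete from a new record the fields that were deleted during training."""
    return {k: v for k, v in record.items() if k not in descriptor["deleted_fields"]}


training = [
    {"txn_id": "t1", "currency": "USD", "channel": "web"},
    {"txn_id": "t2", "currency": "USD", "channel": "pos"},
    {"txn_id": "t3", "currency": "USD", "channel": "web"},
    {"txn_id": "t4", "currency": "USD", "channel": "pos"},
]
kept, deleted = select_fields(training)
descriptor = {"deleted_fields": deleted}  # hypothetical descriptor layout
new_record = apply_descriptor(
    {"txn_id": "t9", "currency": "USD", "channel": "web"}, descriptor
)
print(kept, deleted, new_record)
```

    Recording the deleted fields once and replaying that record at run time keeps new records in the same format as the records the models were trained on.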
  • FIG. 10 represents an apparatus for executing an algorithm 1000 for reclassifying a decision 660 (FIG. 6) for business profitability reasons. For example, a payment card transaction for a particular transaction amount $X may have already been preliminarily “declined” and included in a decision 1002 (and 660, FIG. 6) according to some other decision model. A test 1004 compares a dollar transaction “threshold amount-A” 1006 to a computation 1008 of the running average business a particular user has been doing with the account involved. The rationale for doing this is that valuable customers who do more than an average amount (threshold-A 1006) of business with their payment card should not be so easily or trivially declined. Some artificial intelligence deliberation and reconsideration is appropriate.
  • If, however, test 1004 decides that the accountholder has not earned special processing, a “transaction declined” decision 1010 is issued as final (transaction-declined 110). Such is then forwarded by a financial network to the merchant point-of-sale (POS).
  • But when test 1004 decides that the accountholder has earned special processing, a transaction-preliminarily-approved decision 1012 is carried forward to a test 1014. A threshold-B transaction amount 1016 is compared to the transaction amount $X. Essentially, threshold-B transaction amount 1016 is set at a level that would relieve qualified accountholders of ever being denied a petty transaction, e.g., under $250, and yet not involve a great amount of risk should the “positive” scoring indication from the “other decision model” not prove much later to be “false”. If the transaction amount $X is less than threshold-B transaction amount 1016, a “transaction approved” decision 1018 is issued as final. Such is then forwarded by the financial network to the merchant CP/CNP, unattended terminal, ATM, online payments, etc.
  • If the transaction amount $X is more than threshold-B transaction amount 1016, a transaction-preliminarily-approved decision 1020 is carried forward to a familiar transaction pattern test 1022. An abstract 1024 of this account's transaction patterns is compared to the instant transaction. For example, if this accountholder seems to be a new parent with a new baby as evidenced in purchases of particular items, then all future purchases that could be associated are reasonably predictable. Or, in another example, if the accountholder seems to be on business in a foreign country as evidenced in purchases of particular items and travel arrangements, then all future purchases that could be reasonably associated are to be expected and scored as lower risk. And, in one more example, if the accountholder seems to be a professional gambler as evidenced in cash advances at casinos, purchases of specific things and arrangements, then future purchases that could be reasonably associated are likewise to be expected and scored as lower risk.
  • So if the transaction type is not a familiar one, then a “transaction declined” decision 1026 is issued as final. Such is then forwarded by the financial network 106 to the merchant (CP and/or CNP) and/or unattended terminal/ATM. Otherwise, a transaction-preliminarily-approved decision 1028 is carried forward to a threshold-C test 1030.
  • A threshold-C transaction amount 1032 is compared to the transaction amount $X. Essentially, threshold-C transaction amount 1032 is set at a level that would relieve qualified accountholders of being denied a moderate transaction, e.g., under $2500, and yet not involve a great amount of risk because the accountholder's transactional behavior is within their individual norms. If the transaction amount $X is less than threshold-C transaction amount 1032, a “transaction approved” decision 1034 is issued as final (transaction-approved). Such is then forwarded by the financial network 106 to the merchant (CP and/or CNP) and/or unattended terminal/ATM.
  • If the transaction amount $X is more than threshold-C transaction amount 1032, a transaction-preliminarily-approved decision 1036 is carried forward to a familiar user device recognition test 1038. An abstract 1040 of this account's user devices is compared to those used in the instant transaction.
  • So if the user device is not recognizable as one employed by the accountholder, then a “transaction declined” decision 1042 is issued as final. Such is then forwarded by the financial network 106 to the merchant (CP and/or CNP) and/or unattended terminal/ATM. Otherwise, a transaction-preliminarily-approved decision 1044 is carried forward to a threshold-D test 1046.
  • A threshold-D transaction amount 1048 is compared to the transaction amount $X. Basically, the threshold-D transaction amount 1048 is set at a higher level that would avoid denying substantial transactions to qualified accountholders, e.g., under $10,000, and yet not involve a great amount of risk because the accountholder's user devices are recognized and their instant transactional behavior is within their individual norms. If the transaction amount $X is less than threshold-D transaction amount 1048, a “transaction approved” decision 1050 is issued as final. Such is then forwarded by the financial network 106 to the merchant (CP and/or CNP) and/or unattended terminal/ATM.
  • Otherwise, the transaction amount $X is just too large to override a denial if the other decision model decision 1002 was “positive”, e.g., for fraud, or some other reason. In such case, a “transaction declined” decision 1052 is issued as final (transaction-declined 110). Such is then forwarded by the financial network 106 to the merchant (CP and/or CNP) and/or unattended terminal/ATM.
  • In general, threshold-B 1016 is less than threshold-C 1032, which in turn is less than threshold-D 1048. It could be that tests 1022 and 1038 would serve profits better if swapped in FIG. 10. Embodiments of the present invention would therefore include this variation as well. It would seem that threshold-A 1006 should be empirically derived and driven by business goals.
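  • As an illustrative, non-limiting sketch, the cascade of tests in FIG. 10 can be expressed as below. The default values for threshold-B, threshold-C, and threshold-D follow the example amounts given above; the value used for threshold-A, the `AccountAbstract` structure, and the function name are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class AccountAbstract:
    running_average: float   # computation 1008 for this account
    familiar_pattern: bool   # outcome of pattern test 1022 for the instant transaction
    known_device: bool       # outcome of device test 1038 for the instant transaction


def reconsider_decline(amount, account, threshold_a,
                       threshold_b=250.0, threshold_c=2500.0, threshold_d=10000.0):
    """Reconsider a preliminarily declined transaction of the given amount."""
    if account.running_average <= threshold_a:   # test 1004
        return "declined"                        # decision 1010
    if amount < threshold_b:                     # test 1014
        return "approved"                        # decision 1018
    if not account.familiar_pattern:             # test 1022
        return "declined"                        # decision 1026
    if amount < threshold_c:                     # test 1030
        return "approved"                        # decision 1034
    if not account.known_device:                 # test 1038
        return "declined"                        # decision 1042
    if amount < threshold_d:                     # test 1046
        return "approved"                        # decision 1050
    return "declined"                            # decision 1052


# Hypothetical account and threshold-A, for illustration only.
account = AccountAbstract(running_average=1200.0, familiar_pattern=True, known_device=False)
print(reconsider_decline(100.0, account, threshold_a=500.0))   # approved
print(reconsider_decline(1000.0, account, threshold_a=500.0))  # approved
print(reconsider_decline(5000.0, account, threshold_a=500.0))  # declined
```

    Each additional piece of corroborating evidence (familiar pattern, recognized device) raises the amount that can be approved despite the earlier decline.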
  • The further data processing required by technology 1000 occurs in real-time while merchants (CP and CNP, ATM, and all unattended terminals) and users wait for approved/declined data messages to arrive through the financial network. The consequence of this is that the abstracts for this-account's-running-average-totals 1008, this account's-transaction-patterns 1024, and this-account's-devices 1040 must all be accessible and on-hand very quickly. A simple look-up is preferred to having to compute the values. The smart agents and the behavioral profiles they maintain and that we've described in this Application and those we incorporate herein by reference are up to doing this job well. Conventional methods and apparatus may struggle to provide this information quickly enough.
  • FIG. 10 represents for the first time in machine learning an apparatus that allows a different threshold for each customer. It further enables different thresholds for the same customer based on the context, e.g., a Threshold-1 while traveling, a Threshold-2 while buying things consistent with their purchase history, a Threshold-3 while in the same area where they live, a Threshold-4 during holidays, a Threshold-5 for nights, a Threshold-6 during business hours, etc.
  • FIG. 11 represents an algorithm that executes as smart-agent production apparatus 1100, and is included in the build of smart-agents in steps 126 and 127 (FIG. 1), or as step 611 (FIG. 6) in operation. The results are either exported as an .IFM-type XML document in step 128, or used locally as in method 600 (FIG. 6). Step 126 (FIG. 1) builds a population of smart-agents and their profiles that are represented in FIG. 11 as smart-agents S1 1102 and Sn 1104. Step 127 (FIG. 1) initializes that build. Such population can reach into the millions for large systems, e.g., those that handle payment transaction requests nationally and internationally for millions of cardholders (entities).
  • Each new record 1106 received, from training records 124, or from data enrichment 622 in FIG. 6, is inspected by a step 1108 that identifies the entity unique to the record that has caused the record to be generated. A step 1110 gets the corresponding smart-agent that matches this identification from the initial population of smart-agents 1102, 1104 it received in step 128 (FIG. 1). A step 1112 asks if any were not found. A step 1114 uses default profiles optimally defined for each entity to create and initialize smart-agents and profiles for entities that do not have a match in the initial population of smart-agents 1102, 1104. A step 1116 uses the matching smart-agent and profile to assess record 1106 and issues a score 1118. A step 1120 updates the matching smart-agent profile with the new information in record 1106.
  • A step 1122 dynamically creates/removes/updates and otherwise adjusts attributes in any matching smart-agent profile based on a content of records 1106. A step 1124 adjusts an aggregation type (count, sum, distinct, ratio, average, minimum, maximum, standard deviation, . . . ) in a matching smart-agent profile. A step 1126 adjusts a time range in a matching smart-agent profile. A step 1128 adjusts a filter based on a reduced set of transformed fields in a matching smart-agent profile. A step 1130 adjusts a multi-dimensional aggregation constraint in a matching smart-agent profile. A step 1132 adjusts an aggregation field, if needed, in the matching smart-agent profile. A step 1134 adjusts a recursive level in the matching smart-agent profile.
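  • As an illustrative, non-limiting sketch, the look-up, scoring, and profile update of steps 1108-1120 can be expressed as below. The profile here holds a single running-average aggregation, and the scoring formula, the neutral score for a new entity, and all names are assumptions made for this example rather than the profiles actually maintained by the smart-agents.

```python
from dataclasses import dataclass


@dataclass
class SmartAgent:
    entity_id: str
    count: int = 0            # number of records seen for this entity
    mean_amount: float = 0.0  # one "average" aggregation kept in the profile

    def score(self, amount):
        """Hypothetical score in [0, 1): grows as amount departs from the profile mean."""
        if self.count == 0:
            return 0.5  # no history yet: neutral score (assumption)
        deviation = abs(amount - self.mean_amount)
        return deviation / (deviation + self.mean_amount + 1e-9)

    def update(self, amount):
        """Fold the new record into the running average."""
        self.count += 1
        self.mean_amount += (amount - self.mean_amount) / self.count


def process_record(record, population):
    entity_id = record["entity_id"]         # step 1108: identify the entity
    agent = population.get(entity_id)       # step 1110: fetch the matching smart-agent
    if agent is None:                       # steps 1112, 1114: create from a default profile
        agent = SmartAgent(entity_id)
        population[entity_id] = agent
    score = agent.score(record["amount"])   # step 1116: assess the record
    agent.update(record["amount"])          # step 1120: update the profile
    return score


population = {}
scores = [
    process_record({"entity_id": "card-1", "amount": amount}, population)
    for amount in (100.0, 100.0, 1000.0)
]
print([round(s, 2) for s in scores])  # [0.5, 0.0, 0.9]
```

    Because each profile is updated incrementally, scoring a new record is a dictionary look-up plus a few arithmetic operations rather than a recomputation over the entity's history.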
  • FIGS. 12-29 provide greater detail regarding the construction and functioning of algorithms that are employed in FIGS. 1-11.
  • Neural Network Technology
  • FIG. 12 is a schematic diagram of the neural network architecture used in method embodiments of the present invention. Neural network 1200 consists of a set of processing elements or neurons that are logically arranged into three layers: (1) input layer 1201; (2) output layer 1202; and (3) hidden layer 1203. The architecture of neural network 1200 is similar to a back propagation neural network, but its training, utilization, and learning algorithms are different. The neurons in input layer 1201 receive input fields from a training table. Each of the input fields is multiplied by a weight such as weight “Wij” 1204 a to obtain a state or output that is passed along another weighted connection with weights “Vjt” 1205 between neurons in hidden layer 1203 and output layer 1202. The inputs to neurons in each layer come exclusively from the output of neurons in a previous layer, and the output from these neurons propagates to the neurons in the following layers.
  • FIG. 13 is a diagram of a single neuron in the neural network used in method embodiments of the present invention. Neuron 1300 receives input “i” from a neuron in a previous layer. Input “i” is multiplied by a weight “Wih” and processed by neuron 1300 to produce state “s”. State “s” is then multiplied by weight “Vhi” to produce output “i” that is processed by neurons in the following layers. Neuron 1300 contains limiting thresholds 1301 that determine how an input is propagated to neurons in the following layers.
  • FIG. 14 is a flowchart of an algorithm 1400 for training neural networks with a single hidden layer that builds incrementally during a training process. The hidden layers may also grow in number later during any updates. Each training process computes a distance between all the records in a training table, and groups some of the records together. In a first step, a training set “S” and input weights “bi” are initialized. Training set “S” is initialized to contain all the records in the training table. Each field “i” in the training table is assigned a weight “bi” to indicate its importance. The input weights “bi” are selected by a client. A distance matrix D is created. Distance matrix D is a square and symmetric matrix of size N×N, where N is the total number of records in training set “S”. Each element “Dij” in row “i” and column “j” of distance matrix D contains the distance between record “i” and record “j” in training set “S”. The distance between two records in training set “S” is computed using a distance measure.
  • FIG. 15 illustrates a table of distance measures 1500 that is used in a neural network training process. Table 1500 lists distance measures that are used to compute the distance between two records Xi and Xj in training set “S”. The default distance measure used in the training process is a Weighted-Euclidean distance measure that uses input weights “bi” to assign priority values to the fields in a training table.
  • In FIG. 14, a distance matrix D is computed such that each element at row “i” and column “j” contains d(Xi, Xj) between records Xi and Xj in training set “S”. Each row “i” of distance matrix D is then sorted so that it contains the distances of all the records in training set “S” ordered from the closest one to the farthest one.
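  • As an illustrative, non-limiting sketch, the distance matrix D and its sorted rows can be computed as below. The exact form of the Weighted-Euclidean measure in table 1500 is not reproduced in this text, so the formula used here (the square root of the sum of bi times the squared field difference) is an assumption, as are the function names and sample records.

```python
import math


def weighted_euclidean(x, y, b):
    """Assumed form: sqrt(sum_i b_i * (x_i - y_i)^2), with b_i the input weight of field i."""
    return math.sqrt(sum(bi * (xi - yi) ** 2 for xi, yi, bi in zip(x, y, b)))


def distance_matrix(records, b):
    """Square, symmetric N x N matrix of distances between all pairs of records."""
    n = len(records)
    return [[weighted_euclidean(records[i], records[j], b) for j in range(n)]
            for i in range(n)]


def sorted_rows(D):
    """For each row i, the record indices ordered from the closest to the farthest."""
    return [sorted(range(len(row)), key=row.__getitem__) for row in D]


# Hypothetical numeric records with two fields and equal input weights.
records = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
b = (1.0, 1.0)
D = distance_matrix(records, b)
order = sorted_rows(D)
print(D[0])      # [0.0, 5.0, 10.0]
print(order[2])  # [2, 1, 0]
```

    Keeping the sorted order as indices, rather than only the sorted distances, preserves the link between each distance and the record (and therefore the output class) it belongs to.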
  • A new neuron is added to the hidden layer of the neural network the largest subset “Sk” of input records having the same output is determined. Once the largest subset “Sk” is determined, the neuron group is formed at step 97. The neuron group consists of two limiting thresholds, Blow and Θhigh, input weights “Wh”, and output weights “Vh”, such that Θlow=Dk, “j” and Θhigh=Dk, l, where “k” is the row in the sorted distance matrix D that contains the largest subset “Sk” of input records having the same output, “j” is the index of the first column in the subset “Sk” of row “k”, and 1 is the index of the last column in the subset “Sk” of row “k”. The input weights “Wh” are equal to the value of the input record in row “k” of the distance matrix D, and the output weights “Vh” are equal to zero except for the weight assigned between the created neuron in the hidden layer and the neuron in the output layer representing the output class value of any records belonging to subset “Sk”. A subset “Sk” is removed from training set “S”, and all the previously existing output weights “Vh” between the hidden layer and the output layer are doubled. Finally, the training set is checked to see if it still contains input records, and if so, the training process goes back. Otherwise, the training process is finished and the neural network is ready for use.
  • FIG. 16 is a flowchart of an algorithm 1600 for propagating an input record through a neural network. An input record is propagated through a network to predict if its output signifies a fraudulent transaction. A distance between the input record and the weight pattern “Wh” between the input layer and the hidden layer in the neural network is computed. The distance “d” is compared to the limiting thresholds Θlow and Θhigh of the first neuron in the hidden layer. If the distance is between the limiting thresholds, then the weights “Wh” are added to the weights “Vh” between the hidden layer and the output layer of the neural network. If there are more neurons in the hidden layer, then the propagation algorithm goes back to repeat steps for the other neurons in the hidden layer. Finally, the predicted output class is determined according to the neuron at the output layer that has the highest weight.
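  • One possible reading of the propagation loop is sketched below. It is an interpretation under stated assumptions, not the method of FIG. 16: each hidden neuron is modeled as a hypothetical dictionary with keys "W", "low", "high", and "V", and a neuron whose distance falls between its thresholds is assumed to contribute its output weights to per-class totals.

```python
def propagate(record, hidden_neurons, distance):
    """Return the output class with the highest accumulated weight, or None."""
    totals = {}
    for neuron in hidden_neurons:
        d = distance(record, neuron["W"])
        # The neuron contributes only when d lies between its limiting thresholds.
        if neuron["low"] <= d <= neuron["high"]:
            for output_class, weight in neuron["V"].items():
                totals[output_class] = totals.get(output_class, 0.0) + weight
    return max(totals, key=totals.get) if totals else None

# Hypothetical one-field network with two hidden neurons.
neurons = [
    {"W": (0.0,), "low": 0.0, "high": 1.0, "V": {"fraud": 2.0}},
    {"W": (10.0,), "low": 0.0, "high": 2.0, "V": {"ok": 1.0}},
]
absolute_difference = lambda a, b: abs(a[0] - b[0])
prediction = propagate((0.5,), neurons, absolute_difference)
```

    Returning None when no neuron matches is a choice made for this sketch; the text does not say how such a record is handled.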
  • FIG. 17 is a flowchart of an algorithm 1700 for updating the training process of a neural network. The training process is updated whenever a neural network needs to learn some new input record. Neural networks are updated automatically, as soon as data from a new record is evaluated by method embodiments of the present invention. Alternatively, the neural network may be updated offline.
  • A new training set for updating a neural network is created. The new training set contains all the new data records that were not utilized when first training the network using the training algorithm illustrated in FIG. 14. The training set is checked to see if it contains any new output classes not found in the neural network. If there are no new output classes, the updating process proceeds with the training algorithm illustrated in FIG. 14. If there are new output classes, then new neurons are added to the output layer of the neural network, so that each new output class has a corresponding neuron at the output layer. When the new neurons are added, the weights from these neurons to the existing neurons at the hidden layer of the neural network are initialized to zero. The weights from the hidden neurons to be created during the training algorithm are initialized as 2h, where “h” is the number of hidden neurons in the neural network prior to the insertion of each new hidden neuron. With this initialization, the training algorithm illustrated in FIG. 14 is started to form the updated neural network technology.
  • Evaluating if a given input record belongs to one class or other is done quickly and reliably with the training, propagation, and updating algorithms described.
  • Smart-Agent Technology
  • Smart-agent technology uses multiple smart-agents in unsupervised mode, e.g., to learn how to create profiles and clusters. Each field in a training table has its own smart-agent that cooperates with others to combine some partial pieces of knowledge they have about data for a given field, and validate the data being examined by another smart-agent. The smart-agents can identify unusual data and unexplained relationships. For example, by analyzing a healthcare database, the smart-agents would be able to identify unusual medical treatment combinations used to combat a certain disease, or to identify that a certain disease is only linked to children. The smart-agents would also be able to detect certain treatment combinations just by analyzing the database records with fields such as symptoms, geographic information of patients, medical procedures, and so on.
  • Smart-agent technology creates intervals of normal values for each one of the fields in a training table to evaluate if the values of the fields of a given electronic transaction are normal. And the technology determines any dependencies between each field in a training table to evaluate if the values of the fields of a given electronic transaction or record are coherent with the known field dependencies. Both goals can generate warnings.
  • FIG. 18 is a flowchart of an algorithm for creating intervals of normal values for a field in a training table. The algorithm illustrated in the flowchart is run for each field “a” in a training table. A list “La” of distinct couples (“vai”, “nai”) is created, where “vai” represents the ith distinct value for field “a” and “nai” represents its cardinality, i.e., the number of times value “vai” appears in a training table. At step 119, the field is determined to be symbolic or numeric. If the field is symbolic, each member of “La” is copied into a new list “Ia” whenever “nai” is greater than a threshold “Θmin” that represents the minimum number of elements a normal interval must include. “Θmin” is computed as “Θmin”=fmin*M, where M is the total number of records in a training table and fmin is a parameter specified by the user representing the minimum frequency of values in each normal interval. Finally, the relations (a, Ia) are saved in memory storage. Whenever a data record is to be evaluated by the smart-agent technology, the value of the field “a” in the data record is compared to the normal intervals created in “Ia” to determine if the value of the field “a” is outside the normal range of values for that given field.
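  • For a symbolic field, the selection of normal values can be sketched as below. This is a minimal illustration; the function name and the sample data are hypothetical, while the threshold Θmin = fmin*M follows the description above.

```python
from collections import Counter

def normal_symbolic_values(values, f_min):
    """Return the set Ia of values whose cardinality exceeds theta_min = f_min * M."""
    theta_min = f_min * len(values)  # M is the number of records
    counts = Counter(values)         # the list La of (value, cardinality) couples
    return {v for v, n in counts.items() if n > theta_min}

# Hypothetical field with ten records.
card_types = ["visa"] * 6 + ["amex"] * 3 + ["other"]
Ia = normal_symbolic_values(card_types, f_min=0.2)
is_unusual = "other" not in Ia
```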
  • If the field “a” is determined to be numeric, then the list “La” of distinct couples (“vai”, “nai”) is ordered starting with the smallest value “va1”. At step 122, the first element e=(va1, na1) is removed from the list “La”, and an interval NI=[va1, va1] is formed. At step 124, the interval NI is enlarged to NI=[va1, vak] until vak−va1>Θdist, where Θdist represents the maximum width of a normal interval. Θdist is computed as Θdist=(maxa−mina)/nmax, where nmax is a parameter specified by the user to denote the maximum number of intervals for each field in a training table. The values that are too dissimilar are not grouped together in the same interval.
  • The total cardinality “na” of all the values from “va1” to “vak” is compared to “Θmin” to determine the final value of the list of normal intervals “Ia”. If the list “Ia” is not empty, the relations (a, Ia) are saved. Whenever a data record is to be evaluated by the smart-agent technology, the value of the field “a” in the data record is compared to the normal intervals created in “Ia” to determine if the value of the field “a” is outside the normal range of values for that given field. If the value of the field “a” is outside the normal range of values for that given field, a warning is generated to indicate that the data record is likely fraudulent.
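  • The numeric case can be sketched in the same spirit. The sketch assumes that the grouping step is repeated on the remaining values until the list is exhausted, which the text implies but does not state, and that an interval is kept when its total cardinality exceeds Θmin. The function name and sample data are hypothetical.

```python
from collections import Counter

def normal_numeric_intervals(values, f_min, n_max):
    """Group sorted distinct values into intervals no wider than theta_dist."""
    theta_min = f_min * len(values)
    theta_dist = (max(values) - min(values)) / n_max
    pairs = sorted(Counter(values).items())  # (value, cardinality), smallest first
    intervals = []
    i = 0
    while i < len(pairs):
        start, total = pairs[i]
        end = start
        i += 1
        # Enlarge the interval while the next value stays within theta_dist of its start.
        while i < len(pairs) and pairs[i][0] - start <= theta_dist:
            end = pairs[i][0]
            total += pairs[i][1]
            i += 1
        if total > theta_min:  # keep only intervals that are populated enough
            intervals.append((start, end))
    return intervals

amounts = [10, 10, 11, 12, 12, 50, 90, 91, 91, 92]
Ia = normal_numeric_intervals(amounts, f_min=0.2, n_max=8)
```

    With this sample data the isolated value 50 falls in no normal interval, so a record carrying it would raise a warning.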
  • FIG. 19 is a flowchart of an algorithm 1900 for determining dependencies between each field in a training table. A list Lx of couples (vxi, nxi) is created for each field “x” in a training table. The values vxi in Lx for which (nxi/nT)>Θx are determined, where nT is the total number of records in a training table and Θx is a threshold value specified by the user. In a preferred embodiment, Θx has a default value of 1%. At step 132, a list Ly of couples (vyi, nyi) is created for each field “y”, Y≠X. The number of records nij where (x=xi) and (y=yj) is retrieved from a training table. If the relation is significant, that is, if (nij/nxi)>Θxy, where Θxy is a threshold value specified by the user, then the relation (X=xi) ⇒ (Y=yj) is saved with the cardinalities nxi, nyj, and nij, and accuracy (nij/nxi). In a preferred embodiment, Θxy has a default value of 85%.
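  • The dependency search can be sketched as below, assuming records are held as dictionaries that all share the same field names. The function name and the tuple layout of a saved relation are hypothetical; the two thresholds and their defaults of 1% and 85% come from the description above.

```python
from collections import Counter

def find_relations(records, theta_x=0.01, theta_xy=0.85):
    """Return tuples (x, xi, y, yj, n_xi, n_yj, n_ij) for every significant relation."""
    n_t = len(records)
    fields = list(records[0].keys())
    relations = []
    for x in fields:
        x_counts = Counter(r[x] for r in records)
        for y in fields:
            if y == x:
                continue
            y_counts = Counter(r[y] for r in records)
            pair_counts = Counter((r[x], r[y]) for r in records)
            for (xi, yj), n_ij in pair_counts.items():
                n_xi = x_counts[xi]
                # xi must be frequent enough, and yj must follow it often enough.
                if n_xi / n_t > theta_x and n_ij / n_xi > theta_xy:
                    relations.append((x, xi, y, yj, n_xi, y_counts[yj], n_ij))
    return relations

# Hypothetical training table with two fields.
table = (
    [{"country": "US", "currency": "USD"}] * 9
    + [{"country": "US", "currency": "EUR"}]
    + [{"country": "FR", "currency": "EUR"}] * 2
)
relations = find_relations(table)
```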
  • All the relations are saved in a tree made with four levels of hash tables to increase the speed of the smart-agent technology. The first level in the tree hashes the field name of the first field, the second level hashes the values for the first field implying some correlations with other fields, the third level hashes the field name with whom the first field has some correlations, and finally, the fourth level in the tree hashes the values of the second field that are correlated with the values of the first field. Each leaf of the tree represents a relation, and at each leaf, the cardinalities nxi, nyj, and nij are stored. This allows the smart-agent technology to be automatically updated and to determine the accuracy, prevalence, and the expected predictability of any given relation formed in a training table.
  • FIG. 20 is a flowchart of an algorithm 2000 for verifying the dependencies between the fields in an input record. For each field “x” in the input record corresponding to an electronic transaction, the relations starting with [(X=xi) ⇒ . . . ] are found in the smart-agent technology tree. For all the other fields “y” in a transaction, the relations [(X=xi) ⇒ (Y=v)] are found in the tree. A warning is triggered anytime yj≠v. The warning indicates that the values of the fields in the input record are not coherent with the known field dependencies, which is often a characteristic of fraudulent transactions.
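  • The four-level hash structure and the coherence check can be sketched with nested dictionaries. This is an illustrative assumption about the layout: the levels are field name, field value, dependent field name, and dependent field value, and each leaf stores the three cardinalities. The function names and the sample relation are hypothetical.

```python
def build_tree(relations):
    """Store relations as tree[x][xi][y][yj] = (n_xi, n_yj, n_ij)."""
    tree = {}
    for x, xi, y, yj, n_xi, n_yj, n_ij in relations:
        tree.setdefault(x, {}).setdefault(xi, {}).setdefault(y, {})[yj] = (n_xi, n_yj, n_ij)
    return tree

def check_record(record, tree):
    """Return one warning per field pair whose values contradict a stored relation."""
    warnings = []
    for x, xi in record.items():
        for y, expected in tree.get(x, {}).get(xi, {}).items():
            if y in record and record[y] not in expected:
                warnings.append((x, xi, y, record[y]))
    return warnings

# Hypothetical relation: country=US implies currency=USD.
tree = build_tree([("country", "US", "currency", "USD", 10, 9, 9)])
warnings = check_record({"country": "US", "currency": "EUR"}, tree)
```

    Each lookup is a constant-time hash access per level, which is the speed benefit the four-level layout is meant to provide.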
  • FIG. 21 is a flowchart of an algorithm 2100 for updating smart-agents. The total number of records nT in a training table is incremented by a new number of input records to be included in the update of the smart-agent technology. For the first relation (X=xi) ⇒ (Y=yj) previously created in the technology, the parameters nxi, nyj, and nij are retrieved and respectively incremented. The relation is verified to see if it is still significant for including it in a smart-agent tree. If the relation is not significant, then it is removed from the tree. Finally, a check is performed to see if there are more previously created relations (X=xi) ⇒ (Y=yj) in the technology. If there are, then algorithm 2100 goes back and iterates until there are no more relations in the tree to be updated.
  • Data Mining Technology
  • FIG. 22 represents one way to implement a data mining algorithm as in steps 130-132 (FIG. 1). More detail is incorporated herein by reference to Adjaoute '592, and especially that relating to its FIG. 22. Here the data mining algorithm and the data tree of step 131 are highly advantaged by having been trained by the enriched data 124. Such results in far superior training compared to conventional training with data like raw data 106.
  • Data mining identifies several otherwise hidden data relationships, including: (1) associations, wherein one event is correlated to another event such as purchase of gourmet cooking books close to the holiday season; (2) sequences, wherein one event leads to another later event such as purchase of gourmet cooking books followed by the purchase of gourmet food ingredients; (3) classification, e.g., the recognition of patterns and a resulting new organization of data such as profiles of customers who make purchases of gourmet cooking books; (4) clustering, e.g., finding and visualizing groups of facts not previously known; and (5) forecasting, e.g., discovering patterns in the data that can lead to predictions about the future.
  • One goal of data mining technology is to create a decision tree based on records in a training database to facilitate and speed up the case-based reasoning technology. The case-based reasoning technology determines if a given input record associated with an electronic transaction is similar to any typical records encountered in a training table. Each record is referred to as a “case”. If no similar cases are found, a warning is issued to flag the input record. The data mining technology creates a decision tree as an indexing mechanism for the case-based reasoning technology. Data mining technology can also be used to automatically create and maintain business rules for a rule-based reasoning technology.
  • The decision tree is an “N-ary” tree, wherein each node contains a subset of similar records in a training database. (An N-ary tree is a tree in which each node has no more than N children.) In preferred embodiments, the decision tree is a binary tree. Each subset is split into two other subsets, based on the result of an intersection between the set of records in the subset and a test on a field. For symbolic fields, the test is if the values of the fields in the records in the subset are equal, and for numeric fields, the test is if the values of the fields in the records in the subset are smaller than a given value. Applying the test on a subset splits the subset in two others, depending on if they satisfy the test or not. The newly created subsets become the children of the subset they originated from in the tree. The data mining technology creates the subsets recursively until each subset that is a terminal node in the tree represents a unique output class.
  • FIG. 22 is a flowchart of an algorithm 2200 for generating the data mining technology to create a decision tree based on similar records in a training table. Sets “S”, R, and U are initialized. Set “S” is a set that contains all the records in a training table, set R is the root of the decision tree, and set U is the set of nodes in the tree that are not terminal nodes. Both R and U are initialized to contain all the records in a training table. Next, a first node Ni (containing all the records in the training database) is removed from U. The triplet (field, test, value) that best splits the subset Si associated with the node Ni into two subsets is determined. The triplet that best splits the subset Si is the one that creates the smallest depth tree possible, that is, the triplet would either create one or two terminal nodes, or create two nodes that, when split, would result in a lower number of children nodes than other triplets. The triplet is determined by using an impurity function such as Entropy or the Gini index to find the information conveyed by each field value in the database. The field value that conveys the least degree of information contains the least uncertainty and determines the triplet to be used for splitting the subsets.
  • A node Nij is created and associated to the first subset Sij formed. The node Nij is then linked to node Ni, and named with the triplet (field, test, value). Next, a check is performed to evaluate if all the records in subset Sij at node Nij belong to the same output class cij. If they do, then the prediction of node Nij is set to cij. If not, then node Nij is added to U. The algorithm then proceeds to check if there are still subsets Sij to be split in the tree, and if so, the algorithm goes back. When all subsets have been associated with nodes, the algorithm continues for the remaining nodes in U until U is determined to be empty.
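  • Choosing a (field, test, value) triplet with an impurity function can be sketched as follows. The sketch picks the triplet with the lowest weighted Gini impurity, which is one common criterion and only an assumption about how algorithm 2200 ranks candidates. The function names and sample records are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure subset)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def passes(record, field, test, value):
    """Apply a (field, test, value) triplet to one record."""
    return record[field] < value if test == "<" else record[field] == value

def best_split(records, input_fields, target):
    """Return the triplet whose two subsets have the lowest weighted impurity."""
    best, best_score = None, float("inf")
    for field in input_fields:
        for value in sorted({r[field] for r in records}):
            # Numeric fields use a "smaller than" test, symbolic fields an equality test.
            test = "<" if isinstance(value, (int, float)) else "=="
            left = [r[target] for r in records if passes(r, field, test, value)]
            right = [r[target] for r in records if not passes(r, field, test, value)]
            if not left or not right:
                continue  # a split must produce two non-empty subsets
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
            if score < best_score:
                best, best_score = (field, test, value), score
    return best

cases = [
    {"age": 20, "car": "sports", "risk": "high"},
    {"age": 22, "car": "family", "risk": "high"},
    {"age": 30, "car": "sports", "risk": "high"},
    {"age": 35, "car": "sports", "risk": "high"},
    {"age": 40, "car": "family", "risk": "low"},
    {"age": 50, "car": "truck", "risk": "low"},
]
triplet = best_split(cases, ["age", "car"], "risk")
```

    Applying the same function recursively to each impure subset would grow the rest of the tree.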
  • FIG. 23 represents a decision tree 2300 in an example for a database 2301 maintained by an insurance company to predict a risk of an insurance contract based on a type of a car and an age of its driver. Database 2301 has three fields: (1) age, (2) car type, and (3) risk. The risk field is the output class that needs to be predicted for any new incoming data record. The age and the car type fields are used as inputs. The data mining technology builds a decision tree, e.g., one that can ease a search of cases in case-based reasoning to determine if an incoming transaction fits any profiles of similar cases existing in its database. The decision tree starts with a root node N0 (2302). Once the data records in database 2301 are analyzed, a test 2303 is determined that best splits database 2301 into two nodes, a node N1 (2304) with a subset 2305, and a node N2 (2306) with a subset 2307. Node N1 (2304) is a terminal node type, since all data records in subset 2305 have the same class output that indicates a high insurance risk for drivers that are younger than twenty-five.
  • The data mining technology then splits a node N2 (2306) into two additional nodes, a node N3 (2308) containing a subset 2309, and a node N4 (2310) containing a subset 2311. Both nodes N3 (2308) and N4 (2310) were split from node N2 (2306) based on a test 2312, that checks if the car type is a sports car. As a result, nodes N3 (2308) and N4 (2310) are terminal nodes, with node N3 (2308) signifying a high insurance risk and node N4 (2310) representing a low insurance risk.
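  • Read as code, the example tree reduces to two nested tests. The sketch assumes that node N3, the high-risk node, is the branch where the sports-car test succeeds; the function name and the string labels are hypothetical.

```python
def predict_risk(age, car_type):
    """Walk the depth-two example tree: age test first, then car-type test."""
    if age < 25:
        return "high"  # node N1: drivers younger than twenty-five
    if car_type == "sports":
        return "high"  # node N3 (assumed to be the sports-car branch)
    return "low"       # node N4

risk = predict_risk(40, "family")
```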
  • The decision tree formed by the data mining technology is preferably a depth two binary tree, significantly reducing the size of the search problem for the case-based reasoning technology. Instead of searching for similar cases to an incoming data record associated with an electronic transaction in the entire database, the case-based reasoning technology only has to use the predefined index specified by the decision tree.
  • Case-Based Reasoning Technology
  • The case-based reasoning technology stores past data records or cases to identify and classify a new case. It reasons by analogy and classification. Case-based reasoning technologies create a list of generic cases that best represent the cases in its training table. A typical case is generated by computing similarities between all the cases in its training table and selecting those cases that best represent distinct cases. Whenever a new case is presented in a record, a decision tree is used to determine if any input record it has on file in its database is similar to something encountered in its training table.
  • FIG. 24 is a flowchart of an algorithm for generating a case-based reasoning technology used later to find a record in a database that best resembles an input record corresponding to a new transaction. An input record is propagated through a decision tree according to tests defined for each node in the tree until it reaches a terminal node. If an input record is not fully defined, that is, the input record does not contain values assigned to certain fields, then the input record is propagated to a last node in a tree that satisfies all the tests. The cases retrieved from this node are all the cases belonging to the node's leaves.
  • A similarity measure is computed between the input record and each one of the cases retrieved. The similarity measure returns a value that indicates how close the input record is to a given case retrieved. The case with the highest similarity measure is then selected as the case that best represents the input record. The solution is revised by using a function specified by the user to modify any weights assigned to fields in the database. Finally, the input record is included in the training database and the decision tree is updated for learning new patterns.
  • FIG. 25 represents a table 2500 of global similarity measures usable by case-based reasoning technology. The table lists an example of six similarity measures that could be used in case-based reasoning to compute a similarity between cases. The Global Similarity Measure is a computation of the similarity between case values V1i and V2i and is based on local similarity measures simi for each field yi. The global similarity measures may also employ weights wi for different fields.
  • FIG. 26 is an example table of Local Similarity Measures useful in case-based reasoning. Table 2600 lists fourteen different Local Similarity Measures that are used by the global similarity measures listed. The local similarity measures depend on the field type and valuation. The field type is: (1) symbolic or nominal; (2) ordinal, when the values are ordered; (3) taxonomic, when the values follow a hierarchy; and (4) numeric, which can take discrete or continuous values. The Local Similarity Measures are based on a number of parameters, including: (1) the values of a given field for two cases, V1 and V2; (2) the lower (V1− and V2−) and higher (V1+ and V2+) limits of V1 and V2; (3) the set of all values that can be reached by the field; (4) the central points of V1 and V2, V1c and V2c; (5) the absolute value “ec” of a given interval; and (6) the height “h” of a level in a taxonomic descriptor.
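  • How local and global similarity measures combine to retrieve the best case can be sketched as below. The formulas of tables 2500 and 2600 are not reproduced in this text, so the sketch assumes a weighted average as the global measure, exact match for symbolic fields, and a range-normalized difference for numeric fields. All names are hypothetical.

```python
def local_similarity(v1, v2, value_range=None):
    """Assumed local measures: scaled difference for numbers, equality otherwise."""
    if isinstance(v1, (int, float)) and isinstance(v2, (int, float)):
        return 1.0 - abs(v1 - v2) / value_range
    return 1.0 if v1 == v2 else 0.0

def global_similarity(case1, case2, weights, ranges):
    """Weighted average of the local similarities over the weighted fields."""
    total = sum(weights.values())
    return sum(
        w * local_similarity(case1[f], case2[f], ranges.get(f))
        for f, w in weights.items()
    ) / total

def best_case(record, cases, weights, ranges):
    """Return the retrieved case with the highest global similarity to the record."""
    return max(cases, key=lambda c: global_similarity(record, c, weights, ranges))

stored = [{"age": 30, "car": "sports"}, {"age": 60, "car": "family"}]
weights = {"age": 1.0, "car": 1.0}
ranges = {"age": 50}  # assumed span of the numeric field
match = best_case({"age": 32, "car": "sports"}, stored, weights, ranges)
```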
  • Genetic Algorithms Technology
  • Genetic algorithms technologies include a library of genetic algorithms that incorporate biological evolution concepts to find if a class is true, e.g., a business transaction is fraudulent, there is network intrusion, etc. Genetic algorithms are used to analyze many data records and predictions generated by other predictive technologies and recommend their own efficient strategies for quickly reaching a decision.
  • Rule-Based Reasoning, Fuzzy Logic, and Constraint Programming Technologies
  • Rule-based reasoning, fuzzy logic, and constraint programming technologies include business rules, constraints, and fuzzy rules to determine the output class of a current data record, e.g., if an electronic transaction is fraudulent. Such business rules, constraints, and fuzzy rules are derived from past data records in a training database or created from predictable but unusual data records that may arise in the future. The business rules are automatically created by the data mining technology, or they are specified by a user. The fuzzy rules are derived from business rules, with constraints specified by a user that specify which combinations of values for fields in a database are allowed and which are not.
  • FIG. 27 represents a rule 2700 for use with the rule-based reasoning technology. Rule 2700 is an IF-THEN rule containing an antecedent and consequence. The antecedent uses tests or conditions on data records to analyze them. The consequence describes the actions to be taken if the data satisfies the tests. An example of rule 2700 that determines if a credit card transaction is fraudulent for a credit card belonging to a single user may include “IF (credit card user makes a purchase at 8 AM in New York City) and (credit card user makes a purchase at 8 AM in Atlanta) THEN (credit card number may have been stolen)”. The use of the words “may have been” in the consequence sets a trigger that other rules need to be checked to determine if the credit card transaction is indeed fraudulent or not.
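  • The example rule can be sketched as a small function over purchase records. The tuple layout (card number, time, city) and the function name are hypothetical, and the result is only a set of cards to review, in line with the "may have been stolen" wording.

```python
def flag_possibly_stolen(purchases):
    """Flag cards used at the same time in two different cities."""
    first_city = {}
    flagged = set()
    for card, time, city in purchases:
        # Antecedent: an earlier purchase at the same time on the same card, elsewhere.
        seen_city = first_city.setdefault((card, time), city)
        if seen_city != city:
            flagged.add(card)  # consequence: the card number may have been stolen
    return flagged

purchases = [
    ("1234", "08:00", "New York City"),
    ("1234", "08:00", "Atlanta"),
    ("5678", "08:00", "Atlanta"),
]
to_review = flag_possibly_stolen(purchases)
```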
  • FIG. 28 represents a fuzzy rule 2800 to specify if a person is tall. Fuzzy rule 2800 uses fuzzy logic to handle the concept of partial truth, e.g., truth values between “completely true” and “completely false” for a person who may or may not be considered tall. Fuzzy rule 2800 contains a middle ground, in addition to the binary patterns of yes/no. Fuzzy rule 2800 derives here from an example rule such as
      • “IF height >6 ft., THEN person is tall”.
        Fuzzy logic derives fuzzy rules by “fuzzification” of the antecedents and “de-fuzzification” of the consequences of business rules.
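  • Fuzzification of the crisp rule above can be sketched with a membership function that returns a degree of truth between 0 and 1. The linear ramp and its 5.5 ft and 6.5 ft endpoints are assumptions chosen for illustration; fuzzy rule 2800 may use a different shape.

```python
def tall_membership(height_ft, low=5.5, high=6.5):
    """Degree to which a person is tall: 0.0 below low, 1.0 above high, linear between."""
    if height_ft <= low:
        return 0.0
    if height_ft >= high:
        return 1.0
    return (height_ft - low) / (high - low)

degree = tall_membership(6.0)
```

    A de-fuzzification step would then map such degrees back to a crisp decision, for example by thresholding.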
  • FIG. 29 is a flowchart of an algorithm 2900 for applying rule-based reasoning, fuzzy logic, and constraint programming to determine if an electronic transaction is fraudulent. The rules and constraints are specified by a user-service consumer and/or derived by data mining technology. The data record associated with a current electronic transaction is matched against the rules and the constraints to determine which rules and constraints apply to the data. The data is tested against the rules and constraints to determine if the transaction is fraudulent. The rules and constraints are updated to reflect the new electronic transaction.
  • The present inventor, Dr. Akli Adjaoute, and his company, Brighterion, Inc. (San Francisco, Calif.), have been highly successful in developing fraud detection computer models and applications for banks, payment processors, and other financial institutions. In particular, these fraud detection computer models and applications are trained to follow and develop an understanding of the normal transaction behavior of single individual accountholders. Such training is sourced from multi-channel or single-channel transaction training data. Once trained, the fraud detection computer models and applications are highly effective when used in real-time transaction fraud detection that comes from the same channels used in training.
  • Some embodiments of the present invention train several single-channel fraud detection computer models and applications with corresponding different channel training data. The resulting, differently trained fraud detection computer models and applications are run several in parallel so each can view a mix of incoming real-time transaction message reports flowing in from broad diverse sources from their unique perspectives. One may compute a “hit” the others will miss, and that's the point.
  • If one differently trained fraud detection computer model and application produces a hit, it is considered herein a warning that the accountholder has been compromised or has gone rogue. The other differently trained fraud detection computer models and applications should be and are sensitized to expect fraudulent activity from this accountholder in the other payment transaction channels. Hits across all channels are added up and too many is reason to shut down all payment channels for the affected accountholder.
  • In general, a method of cross-channel financial fraud protection comprises training a variety of real-time, risk-scoring fraud model technologies with training data selected for each from a common transaction history. This then can specialize each member in the monitoring of a selected channel. After training, the heterogeneous real-time, risk-scoring fraud model technologies are arranged in parallel so that all receive the same mixed channel flow of real-time transaction data or authorization requests.
  • Parallel, diversity trained, real-time, risk-scoring fraud model technologies are hosted on a network server platform for real-time risk scoring of a mixed channel flow of real-time transaction data or authorization requests. Risk thresholds are directly updated for particular accountholders in every member of the parallel arrangement of diversity trained real-time, risk-scoring fraud model technologies when any one of them detects a suspicious or outright fraudulent transaction data or authorization request for the accountholder. So, a compromise, takeover, or suspicious activity of an accountholder's account in any one channel is thereafter prevented from being employed to perpetrate a fraud in any of the other channels.
  • Such method of cross-channel financial fraud protection can further include building a population of real-time, long-term, and recursive profiles for each accountholder in each of the real-time, risk-scoring fraud model technologies. Then during real-time use, maintaining and updating the real-time, long-term, and recursive profiles for each accountholder in each and all of the real-time, risk-scoring fraud model technologies with newly arriving data.
  • If during real-time use a compromise, takeover, or suspicious activity of the accountholder's account in any one channel is detected, then updating the real-time, long-term, and recursive profiles for each accountholder in each and all of the other real-time, risk-scoring fraud model technologies to further include an elevated risk flag. The elevated risk flags are included in a final risk score calculation 728 for the current transaction or authorization request.
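  • The cross-channel bookkeeping described above can be sketched as a small class. It is a simplified illustration: the class name, the hit limit, and the channel names are hypothetical, and real risk scoring would involve far more than a counter.

```python
from collections import defaultdict

class CrossChannelMonitor:
    """Track hits per accountholder and raise risk flags in the other channels."""

    def __init__(self, channels, max_hits):
        self.channels = list(channels)
        self.max_hits = max_hits
        self.hits = defaultdict(int)      # accountholder -> hits across all channels
        self.elevated = defaultdict(set)  # accountholder -> channels flagged as elevated risk

    def report_hit(self, account, channel):
        """Record a hit; return True when all channels should be shut down."""
        self.hits[account] += 1
        # Sensitize every other channel to expect fraud from this accountholder.
        self.elevated[account].update(c for c in self.channels if c != channel)
        return self.hits[account] > self.max_hits

monitor = CrossChannelMonitor(["card", "web", "atm"], max_hits=2)
shut_down = monitor.report_hit("acct-1", "card")
```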
  • Fifteen-minute vectors are a way to cross-pollinate risks calculated in one channel with the others. The 15-minute vectors can represent an amalgamation or fuzzification of transactions in all channels, or channel-by-channel. Once a 15-minute vector has aged, it is shifted into a 100-minute vector, a one-hour vector, and a whole day vector by a simple shift register means. These vectors represent velocity counts that are very effective in catching fraud as it is occurring in real time.
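  • A shift-register style velocity counter can be sketched with a fixed-length deque of 15-minute buckets. The sketch assumes that longer windows are read as sums over the most recent buckets (four buckets for one hour, ninety-six for a day); the class and method names are hypothetical.

```python
from collections import deque

class VelocityCounter:
    """Fixed-length register of 15-minute transaction counts, newest bucket last."""

    def __init__(self, buckets_per_day=96):
        self.buckets = deque([0] * buckets_per_day, maxlen=buckets_per_day)

    def record(self):
        """Count one transaction in the current 15-minute bucket."""
        self.buckets[-1] += 1

    def shift(self):
        """Age the current bucket; the oldest bucket drops off the far end."""
        self.buckets.append(0)

    def count(self, n_buckets):
        """Velocity over the most recent n_buckets (4 = one hour, 96 = one day)."""
        return sum(list(self.buckets)[-n_buckets:])

vc = VelocityCounter()
for _ in range(3):
    vc.record()
vc.shift()
vc.record()
last_hour = vc.count(4)
```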
  • In every case, embodiments of the present invention include adaptive learning that combines three learning techniques to evolve the artificial intelligence classifiers. First is the automatic creation of profiles, or smart-agents, from historical data, e.g., long-term profiling. The second is real-time learning, e.g., enrichment of the smart-agents based on real-time activities. The third is adaptive learning carried by incremental learning algorithms.
  • For example, two years of historical credit card transactions data needed over twenty-seven terabytes of database storage. A smart-agent is created for each individual card in that data in a first learning step, e.g., long-term profiling. Each profile is created from the card's activities and transactions that took place over the two-year period. Each profile for each smart-agent comprises knowledge extracted field-by-field, such as merchant category code (MCC), time, amount for an MCC over a period of time, recursive profiling, zip codes, type of merchant, monthly aggregation, activity during the week, weekend, holidays, card not present (CNP) versus card present (CP), domestic versus cross-border, etc. This profile highlights all the normal activities of the smart-agent (specific payment card).
  • Smart-agent technology learns specific behaviors of each cardholder and creates a smart-agent to follow the behavior of each cardholder. Because it learns from each activity of a cardholder, the smart-agent updates its profiles and makes effective changes at runtime. It is the only technology with an ability to identify and stop, in real-time, previously unknown fraud schemes. It has the highest detection rate and lowest false positives because it separately follows and learns the behaviors of each cardholder.
  • Smart-agents have a further advantage in data size reduction. Once, say, twenty-seven terabytes of historical data is transformed into smart-agents, only 200 gigabytes are needed to represent twenty-seven million distinct smart-agents corresponding to all the distinct cardholders.
  • Incremental learning technologies are embedded in the machine algorithms and smart-agent technology to continually re-train from any false positives and negatives that occur along the way. Each corrects itself to avoid repeating the same classification errors. Data mining logic incrementally changes the decision trees by creating a new link or updating the existing links and weights. Neural networks update the weight matrix, and case based reasoning logic updates generic cases or creates new ones. Smart-agents update their profiles by adjusting the normal/abnormal thresholds, or by creating exceptions.
  • FIG. 30 represents a flowchart of an algorithm 3000 executed by an apparatus needed to implement a method embodiment of the present invention for improving predictive model training and performance by data enrichment of transaction records.
  • The data enrichment of transaction records is done first with supervised and unsupervised training data 124 (FIG. 1) and training sets 420+422+424, 421+423+425, and 440+442+444 (FIG. 4) during training to build predictive models 127, 131, 135, 139, 143, and 147 (FIG. 1), and 601-606 (FIG. 6). These are ultimately deployed as predictive models 611-616 (FIG. 6) for use in real time with a raw feed of new event, non-training data records 906 (FIG. 9).
  • FIG. 30 shows on the left that method 500 (FIG. 5) includes a step 3001 to delete some data fields that are not particularly useful, a step 3002 to add some data fields that are helpful, a step 3003 to test that the data fields added in step 3002 do improve the final predictions, and a step 3004 to loop until all the original data fields are scrutinized.
  • In summary, embodiments of the present invention include a method 3000 of operating an artificial intelligence machine 100 to produce predictive model language documents 128, 132, 136, 140, 144, and 148 describing improved predictive models that generate better business decisions 660, 661 from raw data record inputs 618. A first phase includes deleting 3001 with at least one processor a selected data field and any data values contained in the selected data field from each of a first series of data records (e.g., training sets 420+422+424, 421+423+425, and 440+442+444 [FIG. 4]) stored in a memory of the artificial intelligence machine to exclude each data field in the first series of data records that has more than a threshold number of random data values, or that has only one repeating data value, or has too small a Shannon entropy, and then transforming a surviving number of data fields in all the first series of data records into a corresponding reduced-field series of data records stored in the memory of the artificial intelligence machine.
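  • The field-deletion phase can be sketched as a filter over the columns of a record set. The sketch reads "more than a threshold number of random data values" as a high ratio of distinct values to records, which is an interpretation, and the function names and threshold values are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy, in bits, of the value distribution of one data field."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def fields_to_keep(records, min_entropy, max_distinct_ratio):
    """Drop constant, near-random, and low-entropy fields; return surviving names."""
    keep = []
    for field in records[0]:
        column = [r[field] for r in records]
        distinct = len(set(column))
        if distinct == 1:
            continue  # only one repeating data value
        if distinct / len(column) > max_distinct_ratio:
            continue  # too many distinct values, treated here as random
        if shannon_entropy(column) < min_entropy:
            continue  # too small a Shannon entropy
        keep.append(field)
    return keep

rows = [{"id": i, "currency": "USD", "mcc": "5411" if i % 2 else "5812"} for i in range(8)]
surviving = fields_to_keep(rows, min_entropy=0.5, max_distinct_ratio=0.9)
```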
  • A next phase includes adding 3002 with the at least one processor a new derivative data field to all the reduced-field series of data records stored in the memory of the artificial intelligence machine and initializing each added new derivative data field with a new data value, and including an apparatus for executing an algorithm to either change real scalar numeric data values into fuzzy values, or if symbolic, to change a behavior group data value, and testing that a minimum number of data fields survive, and if not, then to generate a new derivative data field and fix within each an aggregation type, a time range, a filter, a set of aggregation constraints, a set of data fields to aggregate, and a recursive level, and then assessing the quality of a newly derived data field by testing it with a test set of data, and then transforming the results into an enriched-field series of data records stored in the memory of the artificial intelligence machine.
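  • One way to picture a derivative data field fixed by an aggregation type, a time range, a filter, an aggregation constraint, and a field to aggregate is the sketch below. The class `DerivedFieldSpec`, the record keys `ts`, `card`, and `amount`, and the field name `amount_sum_1h` are hypothetical, and the recursive level named in the text is not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DerivedFieldSpec:
    name: str                      # name of the new derivative data field
    aggregation: str               # aggregation type: "count", "sum", or "mean"
    window_seconds: int            # time range, looking back from each record
    source_field: str              # data field to aggregate
    group_by: str                  # aggregation constraint: same entity only
    keep: Callable[[dict], bool]   # filter applied to candidate records


def add_derived_field(records, spec):
    """Return time-ordered copies of `records` with `spec.name` added.

    Each record's window holds the records of the same group (itself
    included) whose timestamp `ts` lies within `window_seconds` before it.
    """
    ordered = sorted(records, key=lambda r: r["ts"])
    enriched = []
    for current in ordered:
        window = [
            prior[spec.source_field]
            for prior in ordered
            if prior[spec.group_by] == current[spec.group_by]
            and 0 <= current["ts"] - prior["ts"] <= spec.window_seconds
            and spec.keep(prior)
        ]
        if spec.aggregation == "count":
            value = len(window)
        elif spec.aggregation == "sum":
            value = sum(window)
        elif spec.aggregation == "mean":
            value = sum(window) / len(window) if window else 0.0
        else:
            raise ValueError(f"unknown aggregation: {spec.aggregation}")
        enriched.append({**current, spec.name: value})
    return enriched


# Hypothetical reduced-field records: timestamp in seconds, card, amount.
transactions = [
    {"ts": 0, "card": "A", "amount": 10},
    {"ts": 1800, "card": "A", "amount": 20},
    {"ts": 4000, "card": "A", "amount": 5},
    {"ts": 4100, "card": "B", "amount": 7},
]
spec = DerivedFieldSpec(
    name="amount_sum_1h",
    aggregation="sum",
    window_seconds=3600,
    source_field="amount",
    group_by="card",
    keep=lambda record: record["amount"] > 0,
)
enriched = add_derived_field(transactions, spec)
```

  • The nested scan is quadratic in the number of records and is written for clarity only. A production system would more plausibly keep running per-entity state.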
  • And a next phase includes verifying 3003 with the at least one processor that a predictive model trained with the enriched-field series of data records stored in the memory of the artificial intelligence machine produces more accurate predictions from the artificial intelligence machine having fewer errors than the same predictive model trained only with the first series of data records.
  • Another phase of the method includes verifying with the at least one processor that a predictive model 611-616 fed a non-training set of the enriched-field series of data records 906 stored in the memory of the artificial intelligence machine produces more accurate predictions 660, 661 with fewer errors than the same predictive model fed with data records with unmodified data fields.
  • A still further phase of the method includes recording as a data-enrichment descriptor 3006 and 3008 into the memory of the artificial intelligence machine including the at least one processor an identity of any data fields in a data record format of the first series of data records that were subsequently deleted and can be ignored, and which newly derived data fields were subsequently added, and how each newly derived data field was derived and from which information sources.
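  • As a rough illustration of what such a data-enrichment descriptor might hold, the sketch below records the deleted fields and, for each derived field, how it was derived and from which sources. The key names and the choice of JSON as a storage format are assumptions, not details taken from the specification.

```python
import json


def build_descriptor(deleted_fields, derived_fields):
    """Assemble a descriptor dict; the layout is a hypothetical example."""
    return {
        "deleted_fields": sorted(deleted_fields),
        "derived_fields": derived_fields,
    }


descriptor = build_descriptor(
    deleted_fields=["txn_id", "currency"],
    derived_fields=[
        {
            "name": "amount_sum_1h",
            "aggregation": "sum",
            "window_seconds": 3600,
            "source_fields": ["amount"],
            "group_by": "card",
            "information_source": "transaction history",
        }
    ],
)

# Serialize for hand-off to the machine that runs the deployed models,
# then restore it as the receiving side would.
serialized = json.dumps(descriptor, indent=2, sort_keys=True)
restored = json.loads(serialized)
```

  • Because the descriptor is plain data, the training-side machine and the run-time machine need to share only its format, not any code.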
  • Another phase includes passing along the data-enrichment descriptor with the at least one processor information stored in the memory of the artificial intelligence machine to an artificial intelligence machine including processors for predictive model algorithms to produce and output better business decisions from its own feed of new events as raw data record inputs stored in the memory of the artificial intelligence machine.
  • A method 622 (FIG. 6) of operating an artificial intelligence machine including processors for predictive model algorithms that produces and that outputs better business decisions 660, 661 from a new series of data records of new events as raw data record inputs 618 and 906, includes a phase to recover with at least one processor a recording of a data-enrichment descriptor stored in a memory of an artificial intelligence machine including an identity 3006 of any data fields in a data record format of a series of data records that were subsequently deleted by an artificial intelligence machine including processors for predictive model building, and which of any newly derived data fields 3008 were subsequently added, and how each newly derived data field was derived and from which information sources. A next phase includes accepting a new series of data records 906 of new events with the artificial intelligence machine including at least one processor to receive and store records in the memory of the artificial intelligence machine. A next phase of the method 3000 includes ignoring or deleting 3010 with the at least one processor all data fields and all data values contained in the data fields from each of a new series of data records of new events, stored in the memory of the artificial intelligence machine, according to the data-enrichment descriptor 3006. A next phase includes adding 3011 with the at least one processor a new derivative data field to each record of the new series of data records stored in the memory of the artificial intelligence machine according to the data-enrichment descriptor 3008, and initializing each added new derivative data field with a new data value stored in the memory of the artificial intelligence machine.
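  • The run-time use of the descriptor (steps 3010 and 3011) can be sketched as follows. The descriptor layout, the `DERIVATIONS` registry, and the `amount_band` derivation are hypothetical; `amount_band` is a crisp low/medium/high label used as a simple placeholder and is not a fuzzy-logic implementation.

```python
def amount_band(record, params):
    """Hypothetical derivation: label an amount as low, medium, or high."""
    amount = record[params["source"]]
    if amount < params["low_below"]:
        return "low"
    if amount < params["high_from"]:
        return "medium"
    return "high"


# Registry mapping derivation names in the descriptor to functions.
DERIVATIONS = {"amount_band": amount_band}


def apply_descriptor(record, descriptor):
    """Return a new record with listed fields removed and derived fields added.

    Derived values are computed from the incoming record before deletion,
    so a derivation may still read a field that the descriptor drops.
    """
    drop = set(descriptor["deleted_fields"])
    out = {key: value for key, value in record.items() if key not in drop}
    for spec in descriptor["derived_fields"]:
        derive = DERIVATIONS[spec["derivation"]]
        out[spec["name"]] = derive(record, spec["params"])
    return out


# Hypothetical descriptor recovered from memory, and one new event record.
descriptor = {
    "deleted_fields": ["txn_id", "currency"],
    "derived_fields": [
        {
            "name": "amount_band",
            "derivation": "amount_band",
            "params": {"source": "amount", "low_below": 20, "high_from": 100},
        }
    ],
}
new_record = {"txn_id": "t9", "currency": "USD", "card": "A", "amount": 42.0}
model_input = apply_descriptor(new_record, descriptor)
```

  • The resulting `model_input` has the same record format as the enriched training records, which is what allows the deployed predictive model to consume it.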
  • The method further includes producing and outputting a series of predictive decisions 660, 661 with the at least one processor that operates at least one predictive model algorithm 611-616 derived from one originally built and trained with records (e.g., training sets 420+422+424, 421+423+425, and 440+442+444 [FIG. 4]) having a same record format described by the data-enrichment descriptor and stored in the memory of the artificial intelligence machine.
  • The method excludes each data field stored in the memory of the artificial intelligence machine that has more than a threshold number of random data values, or that has only one repeating data value, or that has too small a Shannon entropy, and then transforms a surviving number of data fields into a corresponding reduced-field series of data records stored in the memory of the artificial intelligence machine.
  • The method adds a new derivative data field to a reduced-field series of data records stored in the memory of the artificial intelligence machine and initializes each added new derivative data field with a new data value, and either changes real scalar numeric data values into fuzzy values, or if symbolic, changes a behavior group data value stored in the memory of the artificial intelligence machine, and tests that a minimum number of data fields survive in the reduced-field series stored in the memory of the artificial intelligence machine, and if not, then generates a new derivative data field and fixes within each an aggregation type, a time range, a filter, a set of aggregation constraints, a set of data fields to aggregate, and a recursive level, and tests the quality of each newly derived data field, and then transforms the results into an enriched-field series of data records stored in the memory of the artificial intelligence machine.
  • Although particular embodiments of the present invention have been described and illustrated, such is not intended to limit the invention. Modifications and changes will no doubt become apparent to those skilled in the art, and it is intended that the invention only be limited by the scope of the appended claims.

Claims (7)

1. A method of operating an artificial intelligence machine to improve their decisions from included predictive models, comprising:
deleting with at least one processor a selected data field and any data values contained in the selected data field from each of a first series of data training records stored in a memory of the artificial intelligence machine to exclude each data field in the first series of data training records that has more than a threshold number of random data values, or that has only one repeating data value, or that has too small a Shannon entropy, and using any information gained to select the most useful data fields, and then transforming a surviving number of data fields in all the first series of data training records into a corresponding reduced-field series of data training records stored in the memory of the artificial intelligence machine;
adding with the at least one processor a new derivative data field to all the reduced-field series of data training records stored in the memory and initializing each added new derivative data field with a new data value, and including an apparatus for executing an algorithm to either change real scaler numeric data values into fuzzy values, or if symbolic, to change a behavior group data value, and testing that a minimum number of data fields survive, and if not, then to generate a new derivative data field and fix within each an aggregation type, a time range, a filter, a set of aggregation constraints, a set of data fields to aggregate, and a recursive level, and then assessing the quality of a newly derived data field by testing it with a test set of data, and then transforming the results into an enriched-field series of data training records stored in the memory of the artificial intelligence machine;
verifying with the at least one processor that each predictive model if trained with the enriched-field series of data training records stored in the memory produces decisions having fewer errors than the same predictive model trained only with the first series of data training records;
recording a data-enrichment descriptor into the memory to include an identity of selected data fields in a data training record format of the first series of data training records that were subsequently deleted, and which newly derived data fields were subsequently added, and how each newly derived data field was derived and from which information sources;
causing the at least one processor of the artificial intelligence machine to start extracting decisions from a new series of data records of new events by receiving and storing the new series of data records in the memory of the artificial intelligence machine;
causing the at least one processor to fetch the data-enrichment descriptor and use it to select which data fields to delete and then deleting all the data values included in the selected data fields from each of a new series of data records of new events;
wherein, each data field deleted matches a data field in the first series of data training records had more than a threshold number of random data values, or that had only one repeating data value, or that had too small a Shannon entropy;
adding with the at least one processor a new derivative data field to each record of the new series of data records stored in the memory according to the data-enrichment descriptor, and initializing each added new derivative data field with a new data value stored in the memory;
wherein, each new derivative data field added matches a new derivative data field added to the enriched-field series of data training records in which real scaler numeric data values were changed into fuzzy values, or if symbolic, were changed into a behavior group data value stored in the memory, and were tested that a minimum number of data fields survive, and if not, then that generated a new derivative data field and fixed within each an aggregation type, a time range, a filter, a set of aggregation constraints, a set of data fields to aggregate, and a recursive level; and
producing and outputting a series of predictive decisions with the at least one processor that operates at least one predictive model algorithm derived from one originally built and trained with records having a same record format described by the data-enrichment descriptor and stored in the memory of the artificial intelligence machine.
2. A method of operating an artificial intelligence machine to produce predictive model language documents describing improved predictive models that generate better business decisions from raw data record inputs, comprising:
deleting with at least one processor a selected data field and any data values contained in the selected data field from each of a first series of data records stored in a memory of the artificial intelligence machine to exclude each data field in the first series of data records that has more than a threshold number of random data values, or that has only one repeating data value, or has too small a Shannon entropy, and then transforming a surviving number of data fields in all the first series of data records into a corresponding reduced-field series of data records stored in the memory of the artificial intelligence machine;
adding with the at least one processor a new derivative data field to all the reduced-field series of data records stored in the memory of the artificial intelligence machine and initializing each added new derivative data field with a new data value, and including an apparatus for executing an algorithm to either change real scaler numeric data values into fuzzy values, or if symbolic, to change a behavior group data value, and testing that a minimum number of data fields survive, and if not, then to generate a new derivative data field and fix within each an aggregation type, a time range, a filter, a set of aggregation constraints, a set of data fields to aggregate, and a recursive level, and then assessing the quality of a newly derived data field by testing it with a test set of data, and then transforming the results into an enriched-field series of data records stored in the memory of the artificial intelligence machine; and
verifying with the at least one processor that a predictive model trained with the enriched-field series of data records stored in the memory of the artificial intelligence machine produces more accurate predictions from the artificial intelligence machine having fewer errors than the same predictive model trained only with the first series of data records.
3. The method of claim 2, further comprising:
verifying with the at least one processor that a predictive model supplied with a non-training set of the enriched-field series of data records stored in the memory of the artificial intelligence machine produces more accurate predictions with fewer errors than the same predictive model fed with data records with unmodified data fields.
4. The method of claim 2, further comprising:
recording as a data-enrichment descriptor into the memory of the artificial intelligence machine including the at least one processor an identity of any data fields in a data record format of the first series of data records that were subsequently deleted, and which newly derived data fields were subsequently added, and how each newly derived data field was derived and from which information sources; and
passing along the data-enrichment descriptor with the at least one processor information stored in the memory of the artificial intelligence machine to an artificial intelligence machine including processors for predictive model algorithms to produce and output better business decisions from its own feed of new events as raw data record inputs stored in the memory of the artificial intelligence machine.
5. A method of operating an artificial intelligence machine including processors for predictive model algorithms that produces and that outputs better business decisions from a new series of data records of new events as raw data record inputs, comprising:
recovering with at least one processor a recording of a data-enrichment descriptor stored in a memory of the artificial intelligence machine including an identity of any data fields in a data record format of a series of data records that were subsequently deleted by an artificial intelligence machine including processors for predictive model building, and which of any newly derived data fields were subsequently added, and how each newly derived data field was derived and from which information sources;
accepting a new series of data records of new events with the artificial intelligence machine including at least one processor to receive and store records in the memory of the artificial intelligence machine;
deleting with the at least one processor all data fields and all data values contained in the data fields from each of a new series of data records of new events, stored in the memory of the artificial intelligence machine, according to the data-enrichment descriptor;
adding with the at least one processor a new derivative data field to each record of the new series of data records stored in the memory of the artificial intelligence machine according to the data-enrichment descriptor, and initializing each added new derivative data field with a new data value stored in the memory of the artificial intelligence machine; and
producing and outputting a series of predictive decisions with the at least one processor that operates at least one predictive model algorithm derived from one originally built and trained with records having a same record format described by the data-enrichment descriptor and stored in the memory of the artificial intelligence machine.
6. The method of claim 5 which includes causing the at least one processor in the step of deleting to:
exclude each data field stored in the memory of the artificial intelligence machine that has more than a threshold number of random data values, or that has only one repeating data value, or that has too small a Shannon entropy, and then transforming a surviving number of data fields into a corresponding reduced-field series of data records stored in the memory of the artificial intelligence machine.
7. The method of claim 6 which includes causing the at least one processor in the step of adding to:
add a new derivative data field to a reduced-field series of data records stored in the memory of the artificial intelligence machine and initialize each added new derivative data field with a new data value, and to either change real scaler numeric data values into fuzzy values, or if symbolic, to change a behavior group data value stored in the memory of the artificial intelligence machine, and testing that a minimum number of data fields survive in that stored in the memory of the artificial intelligence machine, and if not, then to generate a new derivative data field and fix within each an aggregation type, a time range, a filter, a set of aggregation constraints, a set of data fields to aggregate, and a recursive level, and which the quality of each newly derived data field was test, and then transforming the results into an enriched-field series of data records stored in the memory of the artificial intelligence machine.
US14/941,586 2014-10-15 2015-11-14 Method of operating artificial intelligence machines to improve predictive model training and performance Abandoned US20160071017A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US14/941,586 US20160071017A1 (en) 2014-10-15 2015-11-14 Method of operating artificial intelligence machines to improve predictive model training and performance
US16/674,980 US10984423B2 (en) 2014-10-15 2019-11-05 Method of operating artificial intelligence machines to improve predictive model training and performance
US17/200,997 US20210248612A1 (en) 2014-10-15 2021-03-15 Method of operating artificial intelligence machines to improve predictive model training and performance

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US14/514,381 US20150032589A1 (en) 2014-08-08 2014-10-15 Artificial intelligence fraud management solution
US14/521,667 US20150046332A1 (en) 2014-08-08 2014-10-23 Behavior tracking smart agents for artificial intelligence fraud protection and management
US14/815,848 US20150339672A1 (en) 2014-08-08 2015-07-31 Automation tool development method for building computer fraud management applications
US14/815,934 US20150339673A1 (en) 2014-10-28 2015-07-31 Method for detecting merchant data breaches with a computer network server
US14/941,586 US20160071017A1 (en) 2014-10-15 2015-11-14 Method of operating artificial intelligence machines to improve predictive model training and performance

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US14/521,667 Continuation-In-Part US20150046332A1 (en) 2014-08-08 2014-10-23 Behavior tracking smart agents for artificial intelligence fraud protection and management
US14/815,848 Continuation-In-Part US20150339672A1 (en) 2014-08-08 2015-07-31 Automation tool development method for building computer fraud management applications

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/674,980 Continuation US10984423B2 (en) 2014-10-15 2019-11-05 Method of operating artificial intelligence machines to improve predictive model training and performance

Publications (1)

Publication Number Publication Date
US20160071017A1 true US20160071017A1 (en) 2016-03-10

Family

ID=55437807

Family Applications (3)

Application Number Title Priority Date Filing Date
US14/941,586 Abandoned US20160071017A1 (en) 2014-10-15 2015-11-14 Method of operating artificial intelligence machines to improve predictive model training and performance
US16/674,980 Active US10984423B2 (en) 2014-10-15 2019-11-05 Method of operating artificial intelligence machines to improve predictive model training and performance
US17/200,997 Abandoned US20210248612A1 (en) 2014-10-15 2021-03-15 Method of operating artificial intelligence machines to improve predictive model training and performance

Family Applications After (2)

Application Number Title Priority Date Filing Date
US16/674,980 Active US10984423B2 (en) 2014-10-15 2019-11-05 Method of operating artificial intelligence machines to improve predictive model training and performance
US17/200,997 Abandoned US20210248612A1 (en) 2014-10-15 2021-03-15 Method of operating artificial intelligence machines to improve predictive model training and performance

Country Status (1)

Country Link
US (3) US20160071017A1 (en)

Family Cites Families (126)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB8918553D0 (en) 1989-08-15 1989-09-27 Digital Equipment Int Message control system
DE4230419A1 (en) 1992-09-11 1994-03-17 Siemens Ag Neural network with rule-base network configuration
US5819226A (en) 1992-09-08 1998-10-06 Hnc Software Inc. Fraud detection using predictive modeling
US7251624B1 (en) 1992-09-08 2007-07-31 Fair Isaac Corporation Score based decisioning
SE500769C2 (en) 1993-06-21 1994-08-29 Televerket Procedure for locating mobile stations in digital telecommunications networks
US5420910B1 (en) 1993-06-29 1998-02-17 Airtouch Communications Inc Method and apparatus for fraud control in cellular telephone systems utilizing rf signature comparison
US5442730A (en) 1993-10-08 1995-08-15 International Business Machines Corporation Adaptive job scheduling using neural network priority functions
US5692107A (en) 1994-03-15 1997-11-25 Lockheed Missiles & Space Company, Inc. Method for generating predictive models in a computer system
AU5424396A (en) 1995-03-15 1996-10-02 Coral Systems, Inc. Apparatus and method for preventing fraudulent activity in a communication network
US6601048B1 (en) 1997-09-12 2003-07-29 Mci Communications Corporation System and method for detecting and managing fraud
GB2303275B (en) 1995-07-13 1997-06-25 Northern Telecom Ltd Detecting mobile telephone misuse
US5822741A (en) 1996-02-05 1998-10-13 Lockheed Martin Corporation Neural network/conceptual clustering fraud detection architecture
US6026397A (en) 1996-05-22 2000-02-15 Electronic Data Systems Corporation Data analysis system and method
US5930392A (en) 1996-07-12 1999-07-27 Lucent Technologies Inc. Classification technique using random decision forests
US6453246B1 (en) 1996-11-04 2002-09-17 3-Dimensional Pharmaceuticals, Inc. System, method, and computer program product for representing proximity data in a multi-dimensional space
GB2321364A (en) 1997-01-21 1998-07-22 Northern Telecom Ltd Retraining neural network
US6336109B2 (en) 1997-04-15 2002-01-01 Cerebrus Solutions Limited Method and apparatus for inducing rules from data classifiers
US6272479B1 (en) 1997-07-21 2001-08-07 Kristin Ann Farry Method of evolving classifier programs for signal processing and control
US7403922B1 (en) 1997-07-28 2008-07-22 Cybersource Corporation Method and apparatus for evaluating fraud risk in an electronic commerce transaction
US7096192B1 (en) 1997-07-28 2006-08-22 Cybersource Corporation Method and system for detecting fraud in a credit card transaction over a computer network
US6029154A (en) 1997-07-28 2000-02-22 Internet Commerce Services Corporation Method and system for detecting fraud in a credit card transaction over the internet
US6122624A (en) 1998-05-28 2000-09-19 Automated Transaction Corp. System and method for enhanced fraud detection in automated electronic purchases
US6347374B1 (en) 1998-06-05 2002-02-12 Intrusion.Com, Inc. Event detection
US6161130A (en) 1998-06-23 2000-12-12 Microsoft Corporation Technique which utilizes a probabilistic classifier to detect "junk" e-mail by automatically updating a training and re-training the classifier based on the updated training set
CA2349177A1 (en) 1998-11-03 2000-05-11 British Telecommunications Public Limited Company Apparatus for processing communications
US6321338B1 (en) 1998-11-09 2001-11-20 Sri International Network surveillance
US6254000B1 (en) 1998-11-13 2001-07-03 First Data Corporation System and method for providing a card transaction authorization fraud warning
AU768096B2 (en) 1998-11-18 2003-12-04 Lightbridge, Inc. Event manager for use in fraud detection
US6424997B1 (en) 1999-01-27 2002-07-23 International Business Machines Corporation Machine learning based electronic messaging system
US6430539B1 (en) 1999-05-06 2002-08-06 Hnc Software Predictive modeling of consumer financial behavior
US8666757B2 (en) 1999-07-28 2014-03-04 Fair Isaac Corporation Detection of upcoding and code gaming fraud and abuse in prospective payment healthcare systems
US7478089B2 (en) 2003-10-29 2009-01-13 Kontera Technologies, Inc. System and method for real-time web page context analysis for the real-time insertion of textual markup objects and dynamic content
US8972590B2 (en) 2000-09-14 2015-03-03 Kirsten Aldrich Highly accurate security and filtering software
US6850606B2 (en) 2001-09-25 2005-02-01 Fair Isaac Corporation Self-learning real-time prioritization of telecommunication fraud control actions
US7036146B1 (en) 2000-10-03 2006-04-25 Sandia Corporation System and method for secure group transactions
US6782375B2 (en) 2001-01-16 2004-08-24 Providian Bancorp Services Neural network based decision processor and method
US7089592B2 (en) 2001-03-15 2006-08-08 Brighterion, Inc. Systems and methods for dynamic detection and prevention of electronic fraud
US20020188533A1 (en) 2001-05-25 2002-12-12 Capital One Financial Corporation Methods and systems for managing financial accounts having adjustable account parameters
US7865427B2 (en) 2001-05-30 2011-01-04 Cybersource Corporation Method and apparatus for evaluating fraud risk in an electronic commerce transaction
US20070174164A1 (en) 2001-06-01 2007-07-26 American Express Travel Related Services Company, Inc. Network/Processor Fraud Scoring for Card Not Present Transactions
US20030009495A1 (en) 2001-06-29 2003-01-09 Akli Adjaoute Systems and methods for filtering electronic content
US7835919B1 (en) 2001-08-10 2010-11-16 Freddie Mac Systems and methods for home value scoring
AU2002327677A1 (en) 2001-09-19 2003-04-01 Meta Tv, Inc. Interactive user interface for television applications
US7813937B1 (en) 2002-02-15 2010-10-12 Fair Isaac Corporation Consistency modeling of healthcare claims to detect fraud and abuse
US6889207B2 (en) 2002-06-18 2005-05-03 Bellsouth Intellectual Property Corporation Content control in a device environment
US7657482B1 (en) 2002-07-15 2010-02-02 Paymentech, L.P. System and apparatus for transaction fraud processing
US8972582B2 (en) 2002-10-03 2015-03-03 Nokia Corporation Method and apparatus enabling reauthentication in a cellular communication system
US7720761B2 (en) 2002-11-18 2010-05-18 Jpmorgan Chase Bank, N. A. Method and system for enhancing credit line management, price management and other discretionary levels setting for financial accounts
US8266215B2 (en) 2003-02-20 2012-09-11 Sonicwall, Inc. Using distinguishing properties to classify messages
US7406502B1 (en) 2003-02-20 2008-07-29 Sonicwall, Inc. Method and system for classifying a message based on canonical equivalent of acceptable items included in the message
US7483947B2 (en) 2003-05-02 2009-01-27 Microsoft Corporation Message rendering for identification of content features
JP2004334526A (en) 2003-05-07 2004-11-25 Intelligent Wave Inc Calculation program and method for illegal determination score value, and calculation system for illegal determination score value of credit card
US7272853B2 (en) 2003-06-04 2007-09-18 Microsoft Corporation Origination/destination features and lists for spam prevention
AU2004267843B2 (en) 2003-08-22 2011-03-24 Mastercard International Incorporated Methods and systems for predicting business behavior from profiling consumer card transactions
US20060041464A1 (en) 2004-08-19 2006-02-23 Transunion Llc. System and method for developing an analytic fraud model
US20060149674A1 (en) * 2004-12-30 2006-07-06 Mike Cook System and method for identity-based fraud detection for transactions using a plurality of historical identity records
US8768766B2 (en) 2005-03-07 2014-07-01 Turn Inc. Enhanced online advertising system
US20070174214A1 (en) 2005-04-13 2007-07-26 Robert Welsh Integrated fraud management systems and methods
US7631362B2 (en) 2005-09-20 2009-12-08 International Business Machines Corporation Method and system for adaptive identity analysis, behavioral comparison, compliance, and application protection using usage information
US7668769B2 (en) 2005-10-04 2010-02-23 Basepoint Analytics, LLC System and method of detecting fraud
US20070112667A1 (en) 2005-10-31 2007-05-17 Dun And Bradstreet System and method for providing a fraud risk score
EP1816595A1 (en) 2006-02-06 2007-08-08 MediaKey Ltd. A method and a system for identifying potentially fraudulent customers in relation to network based commerce activities, in particular involving payment, and a computer program for performing said method
US8650080B2 (en) 2006-04-10 2014-02-11 International Business Machines Corporation User-browser interaction-based fraud detection system
US8027439B2 (en) 2006-09-18 2011-09-27 Fair Isaac Corporation Self-calibrating fraud detection
WO2008045354A2 (en) 2006-10-05 2008-04-17 Richard Zollino Method for analyzing credit card transaction data
US20080104101A1 (en) 2006-10-27 2008-05-01 Kirshenbaum Evan R Producing a feature in response to a received expression
US20080162259A1 (en) 2006-12-29 2008-07-03 Ebay Inc. Associated community platform
US7716610B2 (en) 2007-01-05 2010-05-11 International Business Machines Corporation Distributable and serializable finite state machine
US7433960B1 (en) 2008-01-04 2008-10-07 International Business Machines Corporation Systems, methods and computer products for profile based identity verification over the internet
US7882027B2 (en) 2008-03-28 2011-02-01 American Express Travel Related Services Company, Inc. Consumer behaviors at lender level
US10230803B2 (en) 2008-07-30 2019-03-12 Excalibur Ip, Llc System and method for improved mapping and routing
US8041597B2 (en) 2008-08-08 2011-10-18 Fair Isaac Corporation Self-calibrating outlier model and adaptive cascade model for fraud detection
US20100082751A1 (en) 2008-09-29 2010-04-01 Microsoft Corporation User perception of electronic messaging
US9400879B2 (en) 2008-11-05 2016-07-26 Xerox Corporation Method and system for providing authentication through aggregate analysis of behavioral and time patterns
US8572736B2 (en) 2008-11-12 2013-10-29 YeeJang James Lin System and method for detecting behavior anomaly in information access
US8126791B2 (en) 2008-11-14 2012-02-28 Mastercard International Incorporated Methods and systems for providing a decision making platform
US20100191634A1 (en) * 2009-01-26 2010-07-29 Bank Of America Corporation Financial transaction monitoring
EP2399230A1 (en) 2009-02-20 2011-12-28 Moqom Limited Merchant alert system and method for fraud prevention
US8090648B2 (en) 2009-03-04 2012-01-03 Fair Isaac Corporation Fraud detection based on efficient frequent-behavior sorted lists
US8145562B2 (en) 2009-03-09 2012-03-27 Moshe Wasserblat Apparatus and method for fraud prevention
US8600873B2 (en) 2009-05-28 2013-12-03 Visa International Service Association Managed real-time transaction fraud analysis and decisioning
US20110016041A1 (en) 2009-07-14 2011-01-20 Scragg Ernest M Triggering Fraud Rules for Financial Transactions
US9529864B2 (en) 2009-08-28 2016-12-27 Microsoft Technology Licensing, Llc Data mining electronic communications
US20110055264A1 (en) 2009-08-28 2011-03-03 Microsoft Corporation Data mining organization communications
US8805737B1 (en) 2009-11-02 2014-08-12 Sas Institute Inc. Computer-implemented multiple entity dynamic summarization systems and methods
US20120137367A1 (en) 2009-11-06 2012-05-31 Cataphora, Inc. Continuous anomaly detection based on behavior modeling and heterogeneous information analysis
WO2011094734A2 (en) 2010-02-01 2011-08-04 Jumptap, Inc. Integrated advertising system
US20110238566A1 (en) 2010-02-16 2011-09-29 Digital Risk, Llc System and methods for determining and reporting risk associated with financial instruments
US8626663B2 (en) 2010-03-23 2014-01-07 Visa International Service Association Merchant fraud risk score
US8473415B2 (en) 2010-05-04 2013-06-25 Kevin Paul Siegel System and method for identifying a point of compromise in a payment transaction processing system
US9215244B2 (en) 2010-11-18 2015-12-15 The Boeing Company Context aware network security monitoring for threat detection
US8744979B2 (en) 2010-12-06 2014-06-03 Microsoft Corporation Electronic communications triage using recipient's historical behavioral and feedback
US20120203698A1 (en) 2011-02-07 2012-08-09 Dustin Duncan Method and System for Fraud Detection and Notification
US11386096B2 (en) 2011-02-22 2022-07-12 Refinitiv Us Organization Llc Entity fingerprints
US8458069B2 (en) 2011-03-04 2013-06-04 Brighterion, Inc. Systems and methods for adaptive identification of sources of fraud
US8751399B2 (en) 2011-07-15 2014-06-10 Wal-Mart Stores, Inc. Multi-channel data driven, real-time anti-money laundering system for electronic payment cards
US8555077B2 (en) 2011-11-23 2013-10-08 Elwha Llc Determining device identity using a behavioral fingerprint
US10902426B2 (en) 2012-02-06 2021-01-26 Fair Isaac Corporation Multi-layered self-calibrating analytics
US9032258B2 (en) 2012-09-14 2015-05-12 Infineon Technologies Ag Safety system challenge-and-response using modified watchdog timer
US20140149128A1 (en) 2012-11-29 2014-05-29 Verizon Patent And Licensing Inc. Healthcare fraud detection with machine learning
US20140180974A1 (en) 2012-12-21 2014-06-26 Fair Isaac Corporation Transaction Risk Detection
US9218568B2 (en) 2013-03-15 2015-12-22 Business Objects Software Ltd. Disambiguating data using contextual and historical information
US9264442B2 (en) 2013-04-26 2016-02-16 Palo Alto Research Center Incorporated Detecting anomalies in work practice data by combining multiple domains of information
US9898741B2 (en) 2013-07-17 2018-02-20 Visa International Service Association Real time analytics system
US20150161609A1 (en) 2013-12-06 2015-06-11 Cube, Co. System and method for risk and fraud mitigation while processing payment card transactions
US9547834B2 (en) 2014-01-08 2017-01-17 Bank Of America Corporation Transaction performance monitoring
US9384629B2 (en) 2014-03-31 2016-07-05 Fresh Idea Global Limited Automated money laundering detection, notification, and reporting techniques implemented at casino gaming networks
US20180053114A1 (en) 2014-10-23 2018-02-22 Brighterion, Inc. Artificial intelligence for context classifier
US10438206B2 (en) 2014-05-27 2019-10-08 The Toronto-Dominion Bank Systems and methods for providing merchant fraud alerts
US20150046224A1 (en) 2014-08-08 2015-02-12 Brighterion, Inc. Reducing false positives with transaction behavior forecasting
EP3278213A4 (en) 2015-06-05 2019-01-30 C3 IoT, Inc. Systems, methods, and devices for an enterprise internet-of-things application development platform
US10362113B2 (en) 2015-07-02 2019-07-23 Prasenjit Bhadra Cognitive intelligence platform for distributed M2M/ IoT systems
US10324773B2 (en) 2015-09-17 2019-06-18 Salesforce.Com, Inc. Processing events generated by internet of things (IoT)
US11423414B2 (en) 2016-03-18 2022-08-23 Fair Isaac Corporation Advanced learning system for detection and prevention of money laundering
US9721296B1 (en) 2016-03-24 2017-08-01 Www.Trustscience.Com Inc. Learning an entity's trust model and risk tolerance to calculate a risk score
US10104567B2 (en) 2016-05-31 2018-10-16 At&T Intellectual Property I, L.P. System and method for event based internet of things (IOT) device status monitoring and reporting in a mobility network
CN107644279A (en) 2016-07-21 2018-01-30 阿里巴巴集团控股有限公司 The modeling method and device of evaluation model
US20180040064A1 (en) 2016-08-04 2018-02-08 Xero Limited Network-based automated prediction modeling
US20180048710A1 (en) 2016-08-11 2018-02-15 Afero, Inc. Internet of things (iot) storage device, system and method
US10339606B2 (en) 2016-09-07 2019-07-02 American Express Travel Related Services Company, Inc. Systems and methods for an automatically-updating fraud detection variable
KR101765235B1 (en) 2016-11-28 2017-08-04 한국건설기술연구원 FACILITY MAINTENANCE SYSTEM USING INTERNET OF THINGS (IoT) BASED SENSOR AND UNMANNED AIR VEHICLE (UAV), AND METHOD FOR THE SAME
US11238528B2 (en) 2016-12-22 2022-02-01 American Express Travel Related Services Company, Inc. Systems and methods for custom ranking objectives for machine learning models applicable to fraud and credit risk assessments
US10087063B2 (en) 2017-01-20 2018-10-02 Afero, Inc. Internet of things (IOT) system and method for monitoring and collecting data in a beverage dispensing system
US20180253657A1 (en) 2017-03-02 2018-09-06 Liang Zhao Real-time credit risk management system
US10586280B2 (en) 2018-01-30 2020-03-10 PointPredictive Inc. Risk-based machine learning classsifier
US10838705B2 (en) 2018-02-12 2020-11-17 Afero, Inc. System and method for service-initiated internet of things (IoT) device updates

Cited By (103)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138631B1 (en) * 2015-10-30 2021-10-05 Amazon Technologies, Inc. Predictive user segmentation modeling and browsing interaction analysis for digital advertising
US12026716B1 (en) 2016-03-25 2024-07-02 State Farm Mutual Automobile Insurance Company Document-based fraud detection
US11049109B1 (en) 2016-03-25 2021-06-29 State Farm Mutual Automobile Insurance Company Reducing false positives using customer data and machine learning
US11170375B1 (en) 2016-03-25 2021-11-09 State Farm Mutual Automobile Insurance Company Automated fraud classification using machine learning
US12125039B2 (en) 2016-03-25 2024-10-22 State Farm Mutual Automobile Insurance Company Reducing false positives using customer data and machine learning
US11037159B1 (en) 2016-03-25 2021-06-15 State Farm Mutual Automobile Insurance Company Identifying chargeback scenarios based upon non-compliant merchant computer terminals
US12073408B2 (en) 2016-03-25 2024-08-27 State Farm Mutual Automobile Insurance Company Detecting unauthorized online applications using machine learning
US11334894B1 (en) 2016-03-25 2022-05-17 State Farm Mutual Automobile Insurance Company Identifying false positive geolocation-based fraud alerts
US11699158B1 (en) 2016-03-25 2023-07-11 State Farm Mutual Automobile Insurance Company Reducing false positive fraud alerts for online financial transactions
US11348122B1 (en) 2016-03-25 2022-05-31 State Farm Mutual Automobile Insurance Company Identifying fraudulent online applications
US11989740B2 (en) 2016-03-25 2024-05-21 State Farm Mutual Automobile Insurance Company Reducing false positives using customer feedback and machine learning
US11004079B1 (en) 2016-03-25 2021-05-11 State Farm Mutual Automobile Insurance Company Identifying chargeback scenarios based upon non-compliant merchant computer terminals
US11687937B1 (en) 2016-03-25 2023-06-27 State Farm Mutual Automobile Insurance Company Reducing false positives using customer data and machine learning
US11978064B2 (en) 2016-03-25 2024-05-07 State Farm Mutual Automobile Insurance Company Identifying false positive geolocation-based fraud alerts
US10949854B1 (en) 2016-03-25 2021-03-16 State Farm Mutual Automobile Insurance Company Reducing false positives using customer feedback and machine learning
US11687938B1 (en) 2016-03-25 2023-06-27 State Farm Mutual Automobile Insurance Company Reducing false positives using customer feedback and machine learning
US11741480B2 (en) 2016-03-25 2023-08-29 State Farm Mutual Automobile Insurance Company Identifying fraudulent online applications
EP3451250A4 (en) * 2016-04-27 2019-10-02 The Fourth Paradigm (Beijing) Tech Co Ltd Method and device for presenting prediction model, and method and device for adjusting prediction model
US11562256B2 (en) * 2016-04-27 2023-01-24 The Fourth Paradigm (Beijing) Tech Co Ltd Method and device for presenting prediction model, and method and device for adjusting prediction model
US11132688B2 (en) * 2016-06-23 2021-09-28 Capital One Services, Llc Neural network systems and methods for generating distributed representations of electronic transaction information
US20190012671A1 (en) * 2016-06-23 2019-01-10 Capital One Services, Llc Neural network systems and methods for generating distributed representations of electronic transaction information
US20220012745A1 (en) * 2016-06-23 2022-01-13 Capital One Services, Llc Neural network systems and methods for generating distributed representations of electronic transaction information
US10496996B2 (en) * 2016-06-23 2019-12-03 Capital One Services, Llc Neural network systems and methods for generating distributed representations of electronic transaction information
US10496997B2 (en) * 2016-06-23 2019-12-03 Capital One Services, Llc Neural network systems and methods for generating distributed representations of electronic transaction information
US11961087B2 (en) * 2016-06-23 2024-04-16 Capital One Services, Llc Neural network systems and methods for generating distributed representations of electronic transaction information
US20230186312A1 (en) * 2016-06-23 2023-06-15 Capital One Services, Llc Neural network systems and methods for generating distributed representations of electronic transaction information
US11615419B2 (en) * 2016-06-23 2023-03-28 Capital One Services, Llc Neural network systems and methods for generating distributed representations of electronic transaction information
US20200065818A1 (en) * 2016-06-23 2020-02-27 Capital One Services, Llc Neural network systems and methods for generating distributed representations of electronic transaction information
WO2017223522A1 (en) * 2016-06-23 2017-12-28 Mohammad Shami Neural network systems and methods for generating distributed representations of electronic transaction information
US20170372318A1 (en) * 2016-06-23 2017-12-28 Capital One Services, Llc Neural network systems and methods for generating distributed representations of electronic transaction information
US10685301B2 (en) * 2016-07-06 2020-06-16 Mastercard International Incorporated Method and system for providing sales information and insights through a conversational interface
US20180012163A1 (en) * 2016-07-06 2018-01-11 Mastercard International Incorporated Method and system for providing sales information and insights through a conversational interface
CN109416803A (en) * 2016-07-06 2019-03-01 万事达卡国际公司 Method and system for providing sales information and insights through a conversational interface
US11694122B2 (en) 2016-07-18 2023-07-04 Nantomics, Llc Distributed machine learning systems, apparatus, and methods
US11461690B2 (en) 2016-07-18 2022-10-04 Nantomics, Llc Distributed machine learning systems, apparatus, and methods
US20180053213A1 (en) * 2016-08-18 2018-02-22 Bryan Joseph Wrzesinski System and method for copyright content monetization
US11017298B2 (en) * 2016-09-21 2021-05-25 Scianta Analytics Llc Cognitive modeling apparatus for detecting and adjusting qualitative contexts across multiple dimensions for multiple actors
US11348016B2 (en) * 2016-09-21 2022-05-31 Scianta Analytics, LLC Cognitive modeling apparatus for assessing values qualitatively across a multiple dimension terrain
US11367088B2 (en) * 2016-11-11 2022-06-21 Jpmorgan Chase Bank, N.A. System and method for providing data science as a service
US11562382B2 (en) * 2016-11-11 2023-01-24 Jpmorgan Chase Bank, N.A. System and method for providing data science as a service
US20190066133A1 (en) * 2016-11-11 2019-02-28 Jpmorgan Chase Bank, N.A. System and method for providing data science as a service
US11367142B1 (en) * 2017-09-28 2022-06-21 DataInfoCom USA, Inc. Systems and methods for clustering data to forecast risk and other metrics
US11210140B1 (en) 2017-10-19 2021-12-28 Pure Storage, Inc. Data transformation delegation for a graphical processing unit (‘GPU’) server
US12067466B2 (en) 2017-10-19 2024-08-20 Pure Storage, Inc. Artificial intelligence and machine learning hyperscale infrastructure
US11556280B2 (en) 2017-10-19 2023-01-17 Pure Storage, Inc. Data transformation for a machine learning model
US10649988B1 (en) * 2017-10-19 2020-05-12 Pure Storage, Inc. Artificial intelligence and machine learning infrastructure
US11861423B1 (en) 2017-10-19 2024-01-02 Pure Storage, Inc. Accelerating artificial intelligence (‘AI’) workflows
US11803338B2 (en) 2017-10-19 2023-10-31 Pure Storage, Inc. Executing a machine learning model in an artificial intelligence infrastructure
US11768636B2 (en) 2017-10-19 2023-09-26 Pure Storage, Inc. Generating a transformed dataset for use by a machine learning model in an artificial intelligence infrastructure
US10671435B1 (en) 2017-10-19 2020-06-02 Pure Storage, Inc. Data transformation caching in an artificial intelligence infrastructure
US10671434B1 (en) 2017-10-19 2020-06-02 Pure Storage, Inc. Storage based artificial intelligence infrastructure
US10360214B2 (en) 2017-10-19 2019-07-23 Pure Storage, Inc. Ensuring reproducibility in an artificial intelligence infrastructure
WO2018218259A1 (en) * 2017-10-19 2018-11-29 Pure Storage, Inc. Ensuring reproducibility in an artificial intelligence infrastructure
US11403290B1 (en) 2017-10-19 2022-08-02 Pure Storage, Inc. Managing an artificial intelligence infrastructure
US11455168B1 (en) 2017-10-19 2022-09-27 Pure Storage, Inc. Batch building for deep learning training workloads
US11625730B2 (en) * 2017-11-28 2023-04-11 Equifax Inc. Synthetic online entity detection
US11023969B2 (en) * 2018-02-06 2021-06-01 Chicago Mercantile Exchange Inc. Message transmission timing optimization
US20210233174A1 (en) * 2018-02-06 2021-07-29 Chicago Mercantile Exchange Inc. Message transmission timing optimization
WO2019190919A1 (en) * 2018-03-26 2019-10-03 Adp, Llc Intelligent security risk assessment
US11550905B2 (en) 2018-03-26 2023-01-10 Adp, Inc Intelligent security risk assessment
US11494692B1 (en) 2018-03-26 2022-11-08 Pure Storage, Inc. Hyperscale artificial intelligence and machine learning infrastructure
US11514354B2 (en) 2018-04-20 2022-11-29 Accenture Global Solutions Limited Artificial intelligence based performance prediction system
US11429725B1 (en) * 2018-04-26 2022-08-30 Citicorp Credit Services, Inc. (Usa) Automated security risk assessment systems and methods
US11496480B2 (en) 2018-05-01 2022-11-08 Brighterion, Inc. Securing internet-of-things with smart-agent technology
US20190370695A1 (en) * 2018-05-31 2019-12-05 Microsoft Technology Licensing, Llc Enhanced pipeline for the generation, validation, and deployment of machine-based predictive models
US10963627B2 (en) * 2018-06-11 2021-03-30 Adobe Inc. Automatically generating digital enterprise content variants
US11263484B2 (en) * 2018-09-20 2022-03-01 Innoplexus Ag System and method for supervised learning-based prediction and classification on blockchain
US11515018B2 (en) * 2018-11-08 2022-11-29 Express Scripts Strategic Development, Inc. Systems and methods for patient record matching
US20210319098A1 (en) * 2018-12-31 2021-10-14 Intel Corporation Securing systems employing artificial intelligence
US11469878B2 (en) * 2019-01-28 2022-10-11 The Toronto-Dominion Bank Homomorphic computations on encrypted data within a distributed computing environment
US11403327B2 (en) * 2019-02-20 2022-08-02 International Business Machines Corporation Mixed initiative feature engineering
US11790368B2 (en) * 2019-03-05 2023-10-17 International Business Machines Corporation Auto-evolving database endorsement policies
US20200286084A1 (en) * 2019-03-05 2020-09-10 International Business Machines Corporation Auto-evolving database endorsement policies
US11151660B1 (en) * 2019-04-03 2021-10-19 Progressive Casualty Insurance Company Intelligent routing control
US12079815B2 (en) 2019-04-19 2024-09-03 Paypal, Inc. Graphical user interface for editing classification rules
US11106689B2 (en) * 2019-05-02 2021-08-31 Tata Consultancy Services Limited System and method for self-service data analytics
US11914621B2 (en) * 2019-05-08 2024-02-27 Komodo Health Determining an association metric for record attributes associated with cardinalities that are not necessarily the same for training and applying an entity resolution model
US20200356816A1 (en) * 2019-05-08 2020-11-12 Komodo Health Determining an association metric for record attributes associated with cardinalities that are not necessarily the same for training and applying an entity resolution model
US11099529B2 (en) 2019-07-23 2021-08-24 International Business Machines Corporation Prediction optimization for system level production control
US11544713B1 (en) * 2019-09-30 2023-01-03 United Services Automobile Association (Usaa) Fraud detection using augmented analytics
US11741475B2 (en) * 2019-10-02 2023-08-29 Visa International Service Association System, method, and computer program product for evaluating a fraud detection system
US11244321B2 (en) * 2019-10-02 2022-02-08 Visa International Service Association System, method, and computer program product for evaluating a fraud detection system
US20220122085A1 (en) * 2019-10-02 2022-04-21 Visa International Service Association System, Method, and Computer Program Product for Evaluating a Fraud Detection System
US11443235B2 (en) 2019-11-14 2022-09-13 International Business Machines Corporation Identifying optimal weights to improve prediction accuracy in machine learning techniques
EP4071570A4 (en) * 2019-12-05 2023-12-06 OMRON Corporation Prediction system, information processing device, and information processing program
US11475326B2 (en) * 2020-03-11 2022-10-18 International Business Machines Corporation Analyzing test result failures using artificial intelligence models
US11790256B2 (en) 2020-03-11 2023-10-17 International Business Machines Corporation Analyzing test result failures using artificial intelligence models
US11941496B2 (en) * 2020-03-19 2024-03-26 International Business Machines Corporation Providing predictions based on a prediction accuracy model using machine learning
US11455199B2 (en) * 2020-05-26 2022-09-27 Micro Focus Llc Determinations of whether events are anomalous
US20210390424A1 (en) * 2020-06-10 2021-12-16 At&T Intellectual Property I, L.P. Categorical inference for training a machine learning model
US20220043850A1 (en) * 2020-08-07 2022-02-10 Basf Se Practical supervised classification of data sets
US11914629B2 (en) * 2020-08-07 2024-02-27 Basf Se Practical supervised classification of data sets
US20220070282A1 (en) * 2020-08-31 2022-03-03 Ashkan SOBHANI Methods, systems, and media for network model checking using entropy based bdd compression
US11522978B2 (en) * 2020-08-31 2022-12-06 Huawei Technologies Co., Ltd. Methods, systems, and media for network model checking using entropy based BDD compression
CN113221503A (en) * 2020-12-31 2021-08-06 芯和半导体科技(上海)有限公司 Passive device modeling simulation engine based on machine learning
CN112860303A (en) * 2021-02-07 2021-05-28 济南大学 Model incremental updating method and system
US11537598B1 (en) 2021-08-12 2022-12-27 International Business Machines Corporation Effective ensemble model prediction system
CN114090601A (en) * 2021-11-23 2022-02-25 北京百度网讯科技有限公司 Data screening method, device, equipment and storage medium
US20230269263A1 (en) * 2022-02-24 2023-08-24 Bank Of America Corporation Adversarial Machine Learning Attack Detection and Prevention System
US20230316349A1 (en) * 2022-04-05 2023-10-05 Tide Platform Limited Machine-learning model to classify transactions and estimate liabilities
US20230401578A1 (en) * 2022-06-10 2023-12-14 Oracle Financial Services Software Limited Automatic modification of transaction constraints
US20240127384A1 (en) * 2022-10-04 2024-04-18 Mohamed bin Zayed University of Artificial Intelligence Cooperative health intelligent emergency response system for cooperative intelligent transport systems
US12125117B2 (en) * 2022-10-04 2024-10-22 Mohamed bin Zayed University of Artificial Intelligence Cooperative health intelligent emergency response system for cooperative intelligent transport systems

Also Published As

Publication number Publication date
US20200111100A1 (en) 2020-04-09
US20210248612A1 (en) 2021-08-12
US10984423B2 (en) 2021-04-20

Similar Documents

Publication Publication Date Title
US10984423B2 (en) Method of operating artificial intelligence machines to improve predictive model training and performance
US11734607B2 (en) Data clean-up method for improving predictive model training
US11853854B2 (en) Method of automating data science services
US11748758B2 (en) Method for improving operating profits with better automated decision making with artificial intelligence
US11763310B2 (en) Method of reducing financial losses in multiple payment channels upon a recognition of fraud first appearing in any one payment channel
US20160086185A1 (en) Method of alerting all financial channels about risk in real-time
Seera et al. An intelligent payment card fraud detection system
Zhang et al. Machine learning and sampling scheme: An empirical study of money laundering detection
Zanin et al. Credit card fraud detection through parenclitic network analysis
Gicić et al. Credit scoring for a microcredit data set using the synthetic minority oversampling technique and ensemble classifiers
US20190325528A1 (en) Increasing performance in anti-money laundering transaction monitoring using artificial intelligence
US20020049685A1 (en) Prediction analysis apparatus and program storage medium therefor
Askari et al. IFDTC4.5: Intuitionistic fuzzy logic based decision tree for E-transactional fraud detection
Hossain et al. A differentiate analysis for credit card fraud detection
Jagric et al. Does non-linearity matter in retail credit risk modeling?
Sindhuraj et al. Loan eligibility prediction using adaptive hybrid optimization driven-deep neuro fuzzy network
Kazemian et al. Comparisons of machine learning techniques for detecting fraudulent criminal identities
Supriya et al. A Hybrid Federated Learning Model for Insurance Fraud Detection
US20240144091A1 (en) Method of automating data science services
US20230325630A1 (en) Graph learning-based system with updated vectors
Krishnavardhan et al. Flower pollination optimization algorithm with stacked temporal convolution network-based classification for financial anomaly fraud detection
Danenas Intelligent financial fraud detection and analysis: a survey of recent patents
Tran On some studies of Fraud Detection Pipeline and related issues from the scope of Ensemble Learning and Graph-based Learning
Eteng et al. A stacked ensemble approach with resampling techniques for highly effective fraud detection in imbalanced datasets
Afriyie et al. Supervised Machine Learning Algorithm Approach to Detecting and Predicting Fraud in Credit Card Transactions

Legal Events

Date Code Title Description
AS Assignment

Owner name: BRIGHTERION INC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ADJAOUTE, AKLI;REEL/FRAME:041545/0279

Effective date: 20170309

AS Assignment

Owner name: ADJAOUTE, AKLI, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BRIGHTERION, INC;REEL/FRAME:042048/0621

Effective date: 20170418

STCV Information on status: appeal procedure

Free format text: ON APPEAL -- AWAITING DECISION BY THE BOARD OF APPEALS

AS Assignment

Owner name: BRIGHTERION, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ADJAOUTE, AKLI;REEL/FRAME:045686/0918

Effective date: 20180501

STCB Information on status: application discontinuation

Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION