Decision tree is a traditional algorithm using tree structure to do the
classification and regression. The algorithm break down a dataset into smaller
and smaller subsets. This algorithm has been widely used for many years, however, in some private
situation one cannot only use the traditional information to do the classification. The privacy of
important information should be also take into consideration. Thus, we here implement a revised decision tree
algorithm to do the task in privacy situation.
The following section will fist introduce the
traditional ID3 algorithm, and then illustrate the two-party joint decision
tree construction algorithms.

ID3 Algorithm

The algorithm used for building decision tree is ID3 by J.R.Quinlan
whose method employs a top-down, greedy search through the space of possible branches with no backtracking.
ID3 uses Entropy and Information Gain to construct a decision tree. Next we use an example to illustrate how the algorithm implement

Dataset

The dataset used for explain how the ID3 algorithm work is the
well-known play golf dataset shown as follows. The
right side displays the whole structure of the constructed decision
tree.

Play Golf Dataset

Outlook

Temp

Humidity

Windy

Play Golf

Rainy

Hot

High

False

No

Rainy

Hot

High

True

No

Overcast

Hot

High

False

Yes

Sunny

Mild

High

False

Yes

Sunny

Cool

Normal

False

Yes

Sunny

Cool

Normal

True

No

Overcast

Cool

Normal

True

Yes

Rainy

Mild

High

False

No

Rainy

Cool

Normal

False

Yes

Sunny

Mild

Normal

False

Yes

Rainy

Mild

Normal

True

Yes

Overcast

Mild

High

True

Yes

Overcast

Hot

Normal

False

Yes

Sunny

Mild

High

True

No

Privacy Two-party ID3 Algorithm

The two-party ID3 algorithm is based on the traditional ID3 algorithm
introduced above. The algorithm is proposed in order to deal with some
practical situations. Assume that there are two organizations, and they
both have some data, and they want to corperate with each other to
mine some knowledge based on their data. However, these data involve
confidential information of their organizations and they cannot share all
the detailed information with each other, but they both still want to
discover some interesting rules based on their data. The following parts
we will introduce one intuitive approach to solve the problem.

Assuming that Angelina is the database manager of organization $A$, and Bob is the
database manager of organization $B$. They both possessed part of the dataset, and
in this example we assume that the joint of their dataset is the whole dataset.

Angelina

Angelina Dataset

Outlook

Temp

Humidity

Windy

Play Golf

Rainy

Hot

High

False

No

Rainy

Hot

High

True

No

Overcast

Hot

High

False

Yes

Sunny

Mild

High

False

Yes

Sunny

Cool

Normal

False

Yes

Sunny

Cool

Normal

True

No

Bob

Bob Dataset

Outlook

Temp

Humidity

Windy

Play Golf

Overcast

Cool

Normal

True

Yes

Rainy

Mild

High

False

No

Rainy

Cool

Normal

False

Yes

Sunny

Mild

Normal

False

Yes

Rainy

Mild

Normal

True

Yes

Overcast

Mild

High

True

Yes

Overcast

Hot

Normal

False

Yes

Sunny

Mild

High

True

No

If they build the decision tree based on their own
dataset, they both will get a "tree" like:

Angelina

Bob

Angelina

Play Golf

Yes

No

Outlook

Sunny

2

1

3

Overcast

1

0

1

Rainy

0

2

2

3

3

6

Bob

transInfo --> Outlook

Yes

No

Sunny

1

1

2

Overcast

3

0

3

Rainy

2

1

3

6

2

8

Thus, the information gain calculation is change to:

Thus the rest part of construction is similar with the traditional ID3 algorithm, and the only difference is
every time when calculate the information gain, one should take both the Angelina data and transInfo into consideration.