The goal here is to write a Python script that uses the k-means method to partition a dataset into k meaningful clusters. The dataset (shown in the 3D graph below) contains levels of three kinds of steroid hormones found in female or male foxes, some of which live in protected regions. The following are code examples showing how to use sklearn.cluster.KMeans. They are extracted from open source Python projects.
Active 4 years, 6 months ago
I have a dataset which looks like this:
{'dns_query_count': '11', 'http_hostnames_count': '7', 'dest_port_count': '3', 'ip_count': '11', 'signature_count': '0', 'src_ip': '10.0.64.42', 'http_user_agent_count': '2'}
Each row has already been converted from CSV to a dict.
Then I use DictVectorizer to convert it to a feature matrix.
Then I try to run KMeans on it.
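The snippets for these two steps are not shown, so here is a minimal sketch of what they might look like. The sample rows, the `to_features` helper, and the choice of `n_clusters=2` are all assumptions made for illustration. Note that the counts arrive as strings; DictVectorizer one-hot encodes string values, so the sketch casts them to numbers first and leaves `src_ip` out of the features because it is an identifier rather than a measurement.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction import DictVectorizer

# Hypothetical rows in the same shape as the record shown in the question.
rows = [
    {'dns_query_count': '11', 'http_hostnames_count': '7', 'dest_port_count': '3',
     'ip_count': '11', 'signature_count': '0', 'src_ip': '10.0.64.42',
     'http_user_agent_count': '2'},
    {'dns_query_count': '12', 'http_hostnames_count': '6', 'dest_port_count': '3',
     'ip_count': '10', 'signature_count': '0', 'src_ip': '10.0.64.43',
     'http_user_agent_count': '2'},
    {'dns_query_count': '250', 'http_hostnames_count': '90', 'dest_port_count': '40',
     'ip_count': '200', 'signature_count': '5', 'src_ip': '10.0.64.44',
     'http_user_agent_count': '9'},
    {'dns_query_count': '240', 'http_hostnames_count': '95', 'dest_port_count': '42',
     'ip_count': '210', 'signature_count': '6', 'src_ip': '10.0.64.45',
     'http_user_agent_count': '8'},
]


def to_features(row):
    """Cast the count fields to float and drop the src_ip identifier."""
    return {key: float(value) for key, value in row.items() if key != 'src_ip'}


# sparse=False returns a dense NumPy array with one column per feature name.
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform([to_features(row) for row in rows])

# n_clusters=2 is an arbitrary choice for this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)
```

Row `i` of `X` corresponds to `rows[i]`, which is what makes it possible to map cluster assignments back to the original records later.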
My question is: how do I get information about which row of my data belongs to which cluster?
I expect to get something like this:
{'dns_query_count': '11', 'http_hostnames_count': '7', 'dest_port_count': '3', 'ip_count': '11', 'signature_count': '0', 'src_ip': '10.0.64.42', 'http_user_agent_count': '2', 'cluster': '1'}
Can someone give me a step-by-step example of how to go from raw data like I showed to the same data with information about which cluster each row belongs to?
For example, I used Weka for this dataset and it showed me what I want - I can click datapoints on the graphs and read exactly which datapoints belong to which cluster. How do I get similar results with sklearn?
Coolface
1 Answer
This will show how you can retrieve the cluster id for each row and the cluster centers. I have also measured the distance from each row to each centroid so you can see that the rows are properly assigned to the clusters.
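The answer's original code is not shown here, so the following is only a sketch of the approach it describes, not the answerer's exact code. It assumes hypothetical sample rows, casts the string counts to numbers, and leaves `src_ip` out of the features. After fitting, `labels_` holds the cluster id for each row (in input order), `cluster_centers_` holds the centroids, and `transform` returns the distance from each row to each centroid.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction import DictVectorizer

# Hypothetical rows shaped like the record in the question.
rows = [
    {'dns_query_count': '11', 'http_hostnames_count': '7', 'dest_port_count': '3',
     'ip_count': '11', 'signature_count': '0', 'src_ip': '10.0.64.42',
     'http_user_agent_count': '2'},
    {'dns_query_count': '12', 'http_hostnames_count': '6', 'dest_port_count': '3',
     'ip_count': '10', 'signature_count': '0', 'src_ip': '10.0.64.43',
     'http_user_agent_count': '2'},
    {'dns_query_count': '250', 'http_hostnames_count': '90', 'dest_port_count': '40',
     'ip_count': '200', 'signature_count': '5', 'src_ip': '10.0.64.44',
     'http_user_agent_count': '9'},
    {'dns_query_count': '240', 'http_hostnames_count': '95', 'dest_port_count': '42',
     'ip_count': '210', 'signature_count': '6', 'src_ip': '10.0.64.45',
     'http_user_agent_count': '8'},
]

# Numeric features only; src_ip is kept in the original row, not in X.
features = [{k: float(v) for k, v in row.items() if k != 'src_ip'} for row in rows]
X = DictVectorizer(sparse=False).fit_transform(features)

# n_clusters=2 is an assumption that suits this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = kmeans.labels_            # cluster id per row, same order as rows
centers = kmeans.cluster_centers_  # one centroid per cluster
distances = kmeans.transform(X)    # shape (n_rows, n_clusters)

# Attach the cluster id to a copy of each original row.
labeled_rows = [dict(row, cluster=int(label)) for row, label in zip(rows, labels)]

for row, dist in zip(labeled_rows, distances):
    print(row['src_ip'], 'cluster', row['cluster'], 'distances', dist.round(2))
```

Each row should sit closest to the centroid of the cluster it was assigned to, which is the check the answer describes. The cluster numbers themselves are arbitrary, so a different `random_state` may swap which group is 0 and which is 1.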
jay s